
Example 6 with WindowingStrategy

use of org.apache.beam.sdk.values.WindowingStrategy in project beam by apache.

the class SparkSideInputReader method get.

@Override
@Nullable
public <T> T get(PCollectionView<T> view, BoundedWindow window) {
    // --- validate sideInput.
    checkNotNull(view, "The PCollectionView passed to sideInput cannot be null");
    KV<WindowingStrategy<?, ?>, SideInputBroadcast<?>> windowedBroadcastHelper = sideInputs.get(view.getTagInternal());
    checkNotNull(windowedBroadcastHelper, "SideInput for view " + view + " is not available.");
    // --- sideInput window
    final BoundedWindow sideInputWindow = view.getWindowMappingFn().getSideInputWindow(window);
    // --- match the appropriate sideInput window.
    // a tag will point to all matching sideInputs, that is all windows.
    // now that we've obtained the appropriate sideInputWindow, all that's left is to filter by it.
    Iterable<WindowedValue<?>> availableSideInputs = (Iterable<WindowedValue<?>>) windowedBroadcastHelper.getValue().getValue();
    Iterable<?> sideInputForWindow = StreamSupport.stream(availableSideInputs.spliterator(), false)
        // skip null candidates and keep only those assigned to the side input window
        .filter(sideInputCandidate -> sideInputCandidate != null
            && Iterables.contains(sideInputCandidate.getWindows(), sideInputWindow))
        .map(WindowedValue::getValue)
        .collect(Collectors.toList());
    switch(view.getViewFn().getMaterialization().getUrn()) {
        case Materializations.ITERABLE_MATERIALIZATION_URN:
            {
                ViewFn<IterableView, T> viewFn = (ViewFn<IterableView, T>) view.getViewFn();
                return viewFn.apply(() -> sideInputForWindow);
            }
        case Materializations.MULTIMAP_MATERIALIZATION_URN:
            {
                ViewFn<MultimapView, T> viewFn = (ViewFn<MultimapView, T>) view.getViewFn();
                Coder<?> keyCoder = ((KvCoder<?, ?>) view.getCoderInternal()).getKeyCoder();
                return viewFn.apply(InMemoryMultimapSideInputView.fromIterable(keyCoder, (Iterable) sideInputForWindow));
            }
        default:
            throw new IllegalStateException(String.format("Unknown side input materialization format requested '%s'", view.getViewFn().getMaterialization().getUrn()));
    }
}
Also used : KvCoder(org.apache.beam.sdk.coders.KvCoder) Coder(org.apache.beam.sdk.coders.Coder) IterableView(org.apache.beam.sdk.transforms.Materializations.IterableView) MultimapView(org.apache.beam.sdk.transforms.Materializations.MultimapView) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy) ViewFn(org.apache.beam.sdk.transforms.ViewFn) WindowedValue(org.apache.beam.sdk.util.WindowedValue) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Nullable(org.checkerframework.checker.nullness.qual.Nullable)
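
The filtering step in Example 6 can be illustrated without any Beam dependency. The sketch below is a simplified stand-in, not the runner's code: `Windowed` and `valuesInWindow` are hypothetical names, and windows are represented as plain strings. It keeps only the non-null candidates whose window set contains the target window, then unwraps their values.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class SideInputWindowFilter {

    /** Hypothetical stand-in for a windowed value: a payload plus the windows it belongs to. */
    static final class Windowed<T> {
        final T value;
        final Collection<String> windows;

        Windowed(T value, String... windows) {
            this.value = value;
            this.windows = Arrays.asList(windows);
        }
    }

    /** Returns the payloads of all non-null candidates assigned to targetWindow. */
    static <T> List<T> valuesInWindow(Iterable<Windowed<T>> candidates, String targetWindow) {
        return StreamSupport.stream(candidates.spliterator(), false)
            .filter(Objects::nonNull)
            .filter(candidate -> candidate.windows.contains(targetWindow))
            .map(candidate -> candidate.value)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Windowed<String>> all = Arrays.asList(
            new Windowed<>("a", "w1"),
            null,
            new Windowed<>("b", "w1", "w2"),
            new Windowed<>("c", "w2"));
        System.out.println(valuesInWindow(all, "w2")); // prints [b, c]
    }
}
```

A single tag can hold values for many windows, so the lookup by tag has to be followed by this per-window filter.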

Example 7 with WindowingStrategy

use of org.apache.beam.sdk.values.WindowingStrategy in project beam by apache.

the class SparkCombineFnTest method testSlidingCombineFnNonMerging.

@Test
public void testSlidingCombineFnNonMerging() throws Exception {
    WindowingStrategy<Object, IntervalWindow> strategy = WindowingStrategy.of(SlidingWindows.of(Duration.millis(3000)).every(Duration.millis(1000)));
    SparkCombineFn<KV<String, Integer>, Integer, Long, Long> sparkCombineFn = SparkCombineFn.keyed(combineFn, opts, Collections.emptyMap(), strategy, SparkCombineFn.WindowedAccumulator.Type.NON_MERGING);
    Instant now = Instant.ofEpochMilli(0);
    WindowedValue<KV<String, Integer>> first = input("key", 1, now.plus(Duration.millis(5000)), strategy.getWindowFn());
    WindowedValue<KV<String, Integer>> second = input("key", 2, now.plus(Duration.millis(1500)), strategy.getWindowFn());
    WindowedValue<KV<String, Integer>> third = input("key", 3, now.plus(Duration.millis(500)), strategy.getWindowFn());
    SparkCombineFn.WindowedAccumulator<KV<String, Integer>, Integer, Long, ?> c1 = sparkCombineFn.createCombiner(first);
    SparkCombineFn.WindowedAccumulator<KV<String, Integer>, Integer, Long, ?> c2 = sparkCombineFn.createCombiner(third);
    sparkCombineFn.mergeValue(c1, second);
    SparkCombineFn.WindowedAccumulator<KV<String, Integer>, Integer, Long, ?> c3 = sparkCombineFn.mergeCombiners(c1, c2);
    Iterable<WindowedValue<Long>> output = sparkCombineFn.extractOutput(c3);
    assertEquals(7, Iterables.size(output));
    List<String> format = StreamSupport.stream(output.spliterator(), false).map(val -> val.getValue() + ":" + val.getTimestamp().getMillis()).collect(Collectors.toList());
    assertUnorderedEquals(Lists.newArrayList("3:999", "5:1999", "5:2999", "2:3999", "1:5999", "1:6999", "1:7999"), format);
}
Also used : KV(org.apache.beam.sdk.values.KV) SerializablePipelineOptions(org.apache.beam.runners.core.construction.SerializablePipelineOptions) WindowedValue(org.apache.beam.sdk.util.WindowedValue) Combine(org.apache.beam.sdk.transforms.Combine) Duration(org.joda.time.Duration) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) ArrayList(java.util.ArrayList) Sessions(org.apache.beam.sdk.transforms.windowing.Sessions) SlidingWindows(org.apache.beam.sdk.transforms.windowing.SlidingWindows) Map(java.util.Map) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) StreamSupport(java.util.stream.StreamSupport) CombineFnUtil(org.apache.beam.sdk.util.CombineFnUtil) Before(org.junit.Before) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) WindowFn(org.apache.beam.sdk.transforms.windowing.WindowFn) Lists(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists) Test(org.junit.Test) Collectors(java.util.stream.Collectors) CombineWithContext(org.apache.beam.sdk.transforms.CombineWithContext) List(java.util.List) Stream(java.util.stream.Stream) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Instant(org.joda.time.Instant) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Collections(java.util.Collections) Assert.assertEquals(org.junit.Assert.assertEquals) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy)
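
The seven expected strings in Example 7 can be reproduced by hand. The sketch below is a plain-Java approximation, not Beam's implementation: `SlidingWindowSums`, `windowEnds`, `sumPerWindow` and `format` are hypothetical names. It assumes each window covers `[start, start + size)` with starts aligned to multiples of the period, and that each output carries the window's last timestamp (`end - 1`), which is what the expected values in the test suggest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SlidingWindowSums {

    /** Returns the exclusive end of every sliding window that contains the timestamp. */
    static List<Long> windowEnds(long timestamp, long size, long period) {
        List<Long> ends = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, period);
        for (long start = lastStart; start > timestamp - size; start -= period) {
            ends.add(start + size);
        }
        return ends;
    }

    /** Sums values per window, keyed by the window's last timestamp (end - 1). */
    static Map<Long, Long> sumPerWindow(long[][] timestampedValues, long size, long period) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] tv : timestampedValues) {
            for (long end : windowEnds(tv[0], size, period)) {
                sums.merge(end - 1, tv[1], Long::sum);
            }
        }
        return sums;
    }

    /** Formats each entry as "sum:timestamp", matching the strings asserted in the test. */
    static List<String> format(Map<Long, Long> sums) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, Long> e : sums.entrySet()) {
            out.add(e.getValue() + ":" + e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        // {timestamp in ms, value}: the three inputs from the test.
        long[][] input = {{5000, 1}, {1500, 2}, {500, 3}};
        System.out.println(format(sumPerWindow(input, 3000, 1000)));
        // prints [3:999, 5:1999, 5:2999, 2:3999, 1:5999, 1:6999, 1:7999]
    }
}
```

With a 3000 ms size and a 1000 ms period, every element lands in three windows. The elements at 500 ms and 1500 ms share the windows ending at 2000 and 3000, which accounts for the two sums of 5.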

Example 8 with WindowingStrategy

use of org.apache.beam.sdk.values.WindowingStrategy in project beam by apache.

the class ReshuffleTest method testReshufflePreservesTimestamps.

/**
 * Tests that timestamps are preserved after applying a {@link Reshuffle} with the default {@link
 * WindowingStrategy}.
 */
@Test
@Category(ValidatesRunner.class)
public void testReshufflePreservesTimestamps() {
    PCollection<KV<String, TimestampedValue<String>>> input = pipeline
        .apply(Create.timestamped(
                TimestampedValue.of("foo", BoundedWindow.TIMESTAMP_MIN_VALUE),
                TimestampedValue.of("foo", new Instant(0)),
                TimestampedValue.of("bar", new Instant(33)),
                TimestampedValue.of("bar", GlobalWindow.INSTANCE.maxTimestamp()))
            .withCoder(StringUtf8Coder.of()))
        .apply(WithKeys.<String, String>of(input12 -> input12).withKeyType(TypeDescriptors.strings()))
        .apply("ReifyOriginalTimestamps", Reify.timestampsInValue());
    // The outer TimestampedValue is the reified timestamp post-reshuffle. The inner
    // TimestampedValue is the pre-reshuffle timestamp.
    PCollection<TimestampedValue<TimestampedValue<String>>> output = input.apply(Reshuffle.of()).apply("ReifyReshuffledTimestamps", Reify.timestampsInValue()).apply(Values.create());
    PAssert.that(output).satisfies(input1 -> {
        for (TimestampedValue<TimestampedValue<String>> elem : input1) {
            Instant originalTimestamp = elem.getValue().getTimestamp();
            Instant afterReshuffleTimestamp = elem.getTimestamp();
            assertThat("Reshuffle must preserve element timestamps", afterReshuffleTimestamp, equalTo(originalTimestamp));
        }
        return null;
    });
    pipeline.run();
}
Also used : TypeDescriptors.integers(org.apache.beam.sdk.values.TypeDescriptors.integers) KV(org.apache.beam.sdk.values.KV) IsIterableContainingInAnyOrder.containsInAnyOrder(org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder) Duration(org.joda.time.Duration) RunWith(org.junit.runner.RunWith) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) Sessions(org.apache.beam.sdk.transforms.windowing.Sessions) TestPipeline(org.apache.beam.sdk.testing.TestPipeline) Is.is(org.hamcrest.core.Is.is) Window(org.apache.beam.sdk.transforms.windowing.Window) ValidatesRunner(org.apache.beam.sdk.testing.ValidatesRunner) GlobalWindow(org.apache.beam.sdk.transforms.windowing.GlobalWindow) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) TimestampedValue(org.apache.beam.sdk.values.TimestampedValue) KvCoder(org.apache.beam.sdk.coders.KvCoder) KvMatcher.isKv(org.apache.beam.sdk.TestUtils.KvMatcher.isKv) PAssert(org.apache.beam.sdk.testing.PAssert) FixedWindows(org.apache.beam.sdk.transforms.windowing.FixedWindows) Lists(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) VarLongCoder(org.apache.beam.sdk.coders.VarLongCoder) PCollection(org.apache.beam.sdk.values.PCollection) Category(org.junit.experimental.categories.Category) Serializable(java.io.Serializable) UsesTestStream(org.apache.beam.sdk.testing.UsesTestStream) AssignShardFn(org.apache.beam.sdk.transforms.Reshuffle.AssignShardFn) List(java.util.List) Rule(org.junit.Rule) Matchers.equalTo(org.hamcrest.Matchers.equalTo) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) TypeDescriptors(org.apache.beam.sdk.values.TypeDescriptors) Instant(org.joda.time.Instant) VarIntCoder(org.apache.beam.sdk.coders.VarIntCoder) TestStream(org.apache.beam.sdk.testing.TestStream) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) Assert.assertEquals(org.junit.Assert.assertEquals) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy)

Example 9 with WindowingStrategy

use of org.apache.beam.sdk.values.WindowingStrategy in project beam by apache.

the class BeamFnMapTaskExecutorFactory method createOperationTransformForFetchAndFilterStreamingSideInputNodes.

private Function<Node, Node> createOperationTransformForFetchAndFilterStreamingSideInputNodes(MutableNetwork<Node, Edge> network, IdGenerator idGenerator, InstructionRequestHandler instructionRequestHandler, FnDataService beamFnDataService, Endpoints.ApiServiceDescriptor dataApiServiceDescriptor, DataflowExecutionContext executionContext, String stageName) {
    return new TypeSafeNodeFunction<FetchAndFilterStreamingSideInputsNode>(FetchAndFilterStreamingSideInputsNode.class) {

        @Override
        public Node typedApply(FetchAndFilterStreamingSideInputsNode input) {
            OutputReceiverNode output = (OutputReceiverNode) Iterables.getOnlyElement(network.successors(input));
            DataflowOperationContext operationContext = executionContext.createOperationContext(NameContext.create(stageName, input.getNameContext().originalName(), input.getNameContext().systemName(), input.getNameContext().userName()));
            return OperationNode.create(new FetchAndFilterStreamingSideInputsOperation<>(new OutputReceiver[] { output.getOutputReceiver() }, operationContext, instructionRequestHandler, beamFnDataService, dataApiServiceDescriptor, idGenerator, (Coder<WindowedValue<Object>>) output.getCoder(), (WindowingStrategy<?, BoundedWindow>) input.getWindowingStrategy(), executionContext.getStepContext(operationContext), input.getPCollectionViewsToWindowMappingFns()));
        }
    };
}
Also used : OutputReceiverNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.OutputReceiverNode) WindowedValueCoder(org.apache.beam.sdk.util.WindowedValue.WindowedValueCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) Coder(org.apache.beam.sdk.coders.Coder) FetchAndFilterStreamingSideInputsNode(org.apache.beam.runners.dataflow.worker.graph.Nodes.FetchAndFilterStreamingSideInputsNode) OutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver) TypeSafeNodeFunction(org.apache.beam.runners.dataflow.worker.graph.Networks.TypeSafeNodeFunction) CloudObject(org.apache.beam.runners.dataflow.util.CloudObject) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy)

Example 10 with WindowingStrategy

use of org.apache.beam.sdk.values.WindowingStrategy in project beam by apache.

the class BatchGroupAlsoByWindowReshuffleDoFnTest method makeRunner.

private static <K, InputT, OutputT, W extends BoundedWindow> DoFnRunner<KV<K, Iterable<WindowedValue<InputT>>>, KV<K, OutputT>> makeRunner(GroupAlsoByWindowDoFnFactory<K, InputT, OutputT> fnFactory, WindowingStrategy<?, W> windowingStrategy, TupleTag<KV<K, OutputT>> outputTag, DoFnRunners.OutputManager outputManager) {
    final StepContext stepContext = new TestStepContext(STEP_NAME);
    StateInternalsFactory<K> stateInternalsFactory = key -> stepContext.stateInternals();
    BatchGroupAlsoByWindowFn<K, InputT, OutputT> fn = fnFactory.forStrategy(windowingStrategy, stateInternalsFactory);
    return new GroupAlsoByWindowFnRunner<>(PipelineOptionsFactory.create(), fn, NullSideInputReader.empty(), outputManager, outputTag, stepContext);
}
Also used : Arrays(java.util.Arrays) KV(org.apache.beam.sdk.values.KV) StateInternalsFactory(org.apache.beam.runners.core.StateInternalsFactory) WindowedValue(org.apache.beam.sdk.util.WindowedValue) InMemoryStateInternals(org.apache.beam.runners.core.InMemoryStateInternals) DoFnRunner(org.apache.beam.runners.core.DoFnRunner) Duration(org.joda.time.Duration) RunWith(org.junit.runner.RunWith) StepContext(org.apache.beam.runners.core.StepContext) TimerInternals(org.apache.beam.runners.core.TimerInternals) PipelineOptionsFactory(org.apache.beam.sdk.options.PipelineOptionsFactory) DoFnRunners(org.apache.beam.runners.core.DoFnRunners) GroupAlsoByWindowFnRunner(org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner) TupleTag(org.apache.beam.sdk.values.TupleTag) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) StateInternals(org.apache.beam.runners.core.StateInternals) ExpectedException(org.junit.rules.ExpectedException) PaneInfo(org.apache.beam.sdk.transforms.windowing.PaneInfo) NullSideInputReader(org.apache.beam.runners.core.NullSideInputReader) GroupAlsoByWindowDoFnFactory(org.apache.beam.runners.dataflow.worker.util.GroupAlsoByWindowProperties.GroupAlsoByWindowDoFnFactory) FixedWindows(org.apache.beam.sdk.transforms.windowing.FixedWindows) Test(org.junit.Test) JUnit4(org.junit.runners.JUnit4) List(java.util.List) Rule(org.junit.Rule) Matchers.contains(org.hamcrest.Matchers.contains) Matchers.equalTo(org.hamcrest.Matchers.equalTo) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) Instant(org.joda.time.Instant) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) WindowingStrategy(org.apache.beam.sdk.values.WindowingStrategy)

Aggregations

WindowingStrategy (org.apache.beam.sdk.values.WindowingStrategy): 36
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 25
BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow): 21
KV (org.apache.beam.sdk.values.KV): 19
KvCoder (org.apache.beam.sdk.coders.KvCoder): 17
Coder (org.apache.beam.sdk.coders.Coder): 16
List (java.util.List): 15
TupleTag (org.apache.beam.sdk.values.TupleTag): 14
Instant (org.joda.time.Instant): 13
Test (org.junit.Test): 13
PCollection (org.apache.beam.sdk.values.PCollection): 11
ArrayList (java.util.ArrayList): 10
HashMap (java.util.HashMap): 9
Map (java.util.Map): 9
SerializablePipelineOptions (org.apache.beam.runners.core.construction.SerializablePipelineOptions): 9
IntervalWindow (org.apache.beam.sdk.transforms.windowing.IntervalWindow): 9
Duration (org.joda.time.Duration): 9
IOException (java.io.IOException): 8
Collectors (java.util.stream.Collectors): 8
StringUtf8Coder (org.apache.beam.sdk.coders.StringUtf8Coder): 8