
Example 51 with PCollectionView

use of org.apache.beam.sdk.values.PCollectionView in project beam by apache.

the class FetchAndFilterStreamingSideInputsOperation method buildPCollectionViewsWithSdkSupportedWindowMappingFn.

private Iterable<PCollectionView<?>> buildPCollectionViewsWithSdkSupportedWindowMappingFn(IdGenerator idGenerator, InstructionRequestHandler instructionRequestHandler, FnDataService beamFnDataService, ApiServiceDescriptor dataServiceApiServiceDescriptor, Coder<BoundedWindow> mainInputWindowCoder, Map<PCollectionView<?>, RunnerApi.FunctionSpec> pCollectionViewsToWindowMappingFns) {
    ImmutableList.Builder<PCollectionView<?>> wrappedViews = ImmutableList.builder();
    for (Map.Entry<PCollectionView<?>, RunnerApi.FunctionSpec> entry : pCollectionViewsToWindowMappingFns.entrySet()) {
        WindowMappingFn windowMappingFn = new FnApiWindowMappingFn(idGenerator, instructionRequestHandler, dataServiceApiServiceDescriptor, beamFnDataService, entry.getValue(), mainInputWindowCoder, entry.getKey().getWindowingStrategyInternal().getWindowFn().windowCoder());
        wrappedViews.add(new ForwardingPCollectionView<Materializations.MultimapView>((PCollectionView) entry.getKey()) {

            @Override
            public WindowMappingFn<?> getWindowMappingFn() {
                return windowMappingFn;
            }
        });
    }
    return wrappedViews.build();
}
Also used : PCollectionView(org.apache.beam.sdk.values.PCollectionView) WindowMappingFn(org.apache.beam.sdk.transforms.windowing.WindowMappingFn) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) Map(java.util.Map)
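
The method above wraps each view in an anonymous ForwardingPCollectionView subclass that overrides only getWindowMappingFn() and leaves every other call to the original view. A minimal, Beam-free sketch of that forwarding pattern might look as follows. The View interface, ForwardingView, wrapAll and simpleView are hypothetical names invented for this illustration and are not part of the Beam API; strings stand in for the real window mapping functions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ForwardingViewSketch {
    // Hypothetical stand-in for a view: a name plus a window-mapping label.
    interface View {
        String name();
        String windowMapping();
    }

    // Forwards every call to the delegate; subclasses override selectively.
    static class ForwardingView implements View {
        private final View delegate;

        ForwardingView(View delegate) {
            this.delegate = delegate;
        }

        @Override
        public String name() {
            return delegate.name();
        }

        @Override
        public String windowMapping() {
            return delegate.windowMapping();
        }
    }

    // Wraps each view so that only windowMapping() is replaced by the mapped value.
    static List<View> wrapAll(Map<View, String> viewsToMappings) {
        List<View> wrapped = new ArrayList<>();
        for (Map.Entry<View, String> entry : viewsToMappings.entrySet()) {
            final String replacement = entry.getValue();
            wrapped.add(new ForwardingView(entry.getKey()) {
                @Override
                public String windowMapping() {
                    return replacement;
                }
            });
        }
        return wrapped;
    }

    // Helper that builds a plain view for the demonstration.
    static View simpleView(String name, String mapping) {
        return new View() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public String windowMapping() {
                return mapping;
            }
        };
    }

    public static void main(String[] args) {
        Map<View, String> input = new LinkedHashMap<>();
        input.put(simpleView("side", "sdk-default"), "runner-side");
        View wrapped = wrapAll(input).get(0);
        // name() is forwarded unchanged, windowMapping() is overridden.
        System.out.println(wrapped.name() + " -> " + wrapped.windowMapping());
    }
}
```

The anonymous subclass keeps the override next to the loop that needs it, which is presumably why the original code takes the same shape instead of declaring a named wrapper class.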

Example 52 with PCollectionView

use of org.apache.beam.sdk.values.PCollectionView in project beam by apache.

the class PartialGroupByKeyParDoFnsTest method testCreateWithCombinerAndBatchSideInputs.

@Test
public void testCreateWithCombinerAndBatchSideInputs() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    Coder<String> keyCoder = StringUtf8Coder.of();
    Coder<Integer> valueCoder = BigEndianIntegerCoder.of();
    KvCoder<String, Integer> kvCoder = KvCoder.of(keyCoder, valueCoder);
    TestOutputReceiver receiver = new TestOutputReceiver(new ElementByteSizeObservableCoder(WindowedValue.getValueOnlyCoder(kvCoder)), counterSet, NameContextsForTests.nameContextForTest());
    StepContext stepContext = BatchModeExecutionContext.forTesting(options, "testStage").getStepContext(TestOperationContext.create(counterSet));
    when(mockSideInputReader.isEmpty()).thenReturn(false);
    ParDoFn pgbk = PartialGroupByKeyParDoFns.create(options, kvCoder, AppliedCombineFn.withInputCoder(Sum.ofIntegers(), CoderRegistry.createDefault(), kvCoder, ImmutableList.<PCollectionView<?>>of(), WindowingStrategy.globalDefault()), mockSideInputReader, receiver, stepContext);
    assertTrue(pgbk instanceof BatchSideInputPGBKParDoFn);
}
Also used : ElementByteSizeObservableCoder(org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.ElementByteSizeObservableCoder) KvCoder(org.apache.beam.sdk.coders.KvCoder) BigEndianIntegerCoder(org.apache.beam.sdk.coders.BigEndianIntegerCoder) Coder(org.apache.beam.sdk.coders.Coder) StringUtf8Coder(org.apache.beam.sdk.coders.StringUtf8Coder) IterableCoder(org.apache.beam.sdk.coders.IterableCoder) PCollectionView(org.apache.beam.sdk.values.PCollectionView) StepContext(org.apache.beam.runners.core.StepContext) PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) BatchSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.BatchSideInputPGBKParDoFn) ParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoFn) StreamingSideInputPGBKParDoFn(org.apache.beam.runners.dataflow.worker.PartialGroupByKeyParDoFns.StreamingSideInputPGBKParDoFn) SimplePartialGroupByKeyParDoFn(org.apache.beam.runners.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn) TestOutputReceiver(org.apache.beam.runners.dataflow.worker.util.common.worker.TestOutputReceiver) Test(org.junit.Test)

Example 53 with PCollectionView

use of org.apache.beam.sdk.values.PCollectionView in project beam by apache.

the class StreamingSideInputDoFnRunnerTest method testMultipleSideInputs.

@Test
public void testMultipleSideInputs() throws Exception {
    PCollectionView<String> view1 = createView();
    PCollectionView<String> view2 = createView();
    IntervalWindow window = new IntervalWindow(new Instant(0), new Instant(10));
    Windmill.GlobalDataId id = Windmill.GlobalDataId.newBuilder().setTag(view1.getTagInternal().getId()).setVersion(ByteString.copyFrom(CoderUtils.encodeToByteArray(IntervalWindow.getCoder(), window))).build();
    Set<Windmill.GlobalDataRequest> requestSet = new HashSet<>();
    requestSet.add(Windmill.GlobalDataRequest.newBuilder().setDataId(id).build());
    Map<IntervalWindow, Set<Windmill.GlobalDataRequest>> blockedMap = new HashMap<>();
    blockedMap.put(window, requestSet);
    ValueState<Map<IntervalWindow, Set<GlobalDataRequest>>> blockedMapState = state.state(StateNamespaces.global(), StreamingSideInputFetcher.blockedMapAddr(WINDOW_FN.windowCoder()));
    blockedMapState.write(blockedMap);
    when(stepContext.getSideInputNotifications()).thenReturn(Arrays.asList(id));
    when(stepContext.issueSideInputFetch(any(PCollectionView.class), any(BoundedWindow.class), any(SideInputState.class))).thenReturn(true);
    when(execContext.getSideInputReaderForViews(Mockito.<Iterable<? extends PCollectionView<?>>>any())).thenReturn(mockSideInputReader);
    when(mockSideInputReader.contains(eq(view1))).thenReturn(true);
    when(mockSideInputReader.contains(eq(view2))).thenReturn(true);
    when(mockSideInputReader.get(eq(view1), any(BoundedWindow.class))).thenReturn("data1");
    when(mockSideInputReader.get(eq(view2), any(BoundedWindow.class))).thenReturn("data2");
    ListOutputManager outputManager = new ListOutputManager();
    List<PCollectionView<String>> views = Arrays.asList(view1, view2);
    StreamingSideInputFetcher<String, IntervalWindow> sideInputFetcher = createFetcher(views);
    StreamingSideInputDoFnRunner<String, String, IntervalWindow> runner = createRunner(outputManager, views, sideInputFetcher);
    sideInputFetcher.watermarkHold(createWindow(0)).add(new Instant(0));
    sideInputFetcher.elementBag(createWindow(0)).add(createDatum("e1", 0));
    runner.startBundle();
    runner.processElement(createDatum("e2", 2));
    runner.finishBundle();
    assertThat(outputManager.getOutput(mainOutputTag), contains(createDatum("e1:data1:data2", 0), createDatum("e2:data1:data2", 2)));
    assertThat(blockedMapState.read(), Matchers.nullValue());
    assertThat(sideInputFetcher.watermarkHold(createWindow(0)).read(), Matchers.nullValue());
    assertThat(sideInputFetcher.elementBag(createWindow(0)).read(), Matchers.emptyIterable());
}
Also used : Set(java.util.Set) HashSet(java.util.HashSet) HashMap(java.util.HashMap) Instant(org.joda.time.Instant) SideInputState(org.apache.beam.runners.dataflow.worker.StateFetcher.SideInputState) ListOutputManager(org.apache.beam.runners.dataflow.worker.util.ListOutputManager) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) PCollectionView(org.apache.beam.sdk.values.PCollectionView) GlobalDataRequest(org.apache.beam.runners.dataflow.worker.windmill.Windmill.GlobalDataRequest) Windmill(org.apache.beam.runners.dataflow.worker.windmill.Windmill) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Map(java.util.Map) Test(org.junit.Test)

Example 54 with PCollectionView

use of org.apache.beam.sdk.values.PCollectionView in project beam by apache.

the class StreamingSideInputDoFnRunnerTest method testSideInputReady.

@Test
public void testSideInputReady() throws Exception {
    PCollectionView<String> view = createView();
    when(stepContext.getSideInputNotifications()).thenReturn(Arrays.<Windmill.GlobalDataId>asList());
    when(stepContext.issueSideInputFetch(eq(view), any(BoundedWindow.class), eq(SideInputState.UNKNOWN))).thenReturn(true);
    when(execContext.getSideInputReaderForViews(Mockito.<Iterable<? extends PCollectionView<?>>>any())).thenReturn(mockSideInputReader);
    when(mockSideInputReader.contains(eq(view))).thenReturn(true);
    when(mockSideInputReader.get(eq(view), any(BoundedWindow.class))).thenReturn("data");
    ListOutputManager outputManager = new ListOutputManager();
    List<PCollectionView<String>> views = Arrays.asList(view);
    StreamingSideInputFetcher<String, IntervalWindow> sideInputFetcher = createFetcher(views);
    StreamingSideInputDoFnRunner<String, String, IntervalWindow> runner = createRunner(outputManager, views, sideInputFetcher);
    runner.startBundle();
    runner.processElement(createDatum("e", 0));
    runner.finishBundle();
    assertThat(outputManager.getOutput(mainOutputTag), contains(createDatum("e:data", 0)));
}
Also used : PCollectionView(org.apache.beam.sdk.values.PCollectionView) Windmill(org.apache.beam.runners.dataflow.worker.windmill.Windmill) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) ListOutputManager(org.apache.beam.runners.dataflow.worker.util.ListOutputManager) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Test(org.junit.Test)

Example 55 with PCollectionView

use of org.apache.beam.sdk.values.PCollectionView in project beam by apache.

the class StreamingSideInputDoFnRunnerTest method testSideInputNotification.

@Test
public void testSideInputNotification() throws Exception {
    PCollectionView<String> view = createView();
    // Record in state that the window [0, 10) is blocked waiting on this view's side input.
    IntervalWindow window = new IntervalWindow(new Instant(0), new Instant(10));
    Windmill.GlobalDataId id = Windmill.GlobalDataId.newBuilder().setTag(view.getTagInternal().getId()).setVersion(ByteString.copyFrom(CoderUtils.encodeToByteArray(IntervalWindow.getCoder(), window))).build();
    Set<Windmill.GlobalDataRequest> requestSet = new HashSet<>();
    requestSet.add(Windmill.GlobalDataRequest.newBuilder().setDataId(id).build());
    Map<IntervalWindow, Set<Windmill.GlobalDataRequest>> blockedMap = new HashMap<>();
    blockedMap.put(window, requestSet);
    ValueState<Map<IntervalWindow, Set<GlobalDataRequest>>> blockedMapState = state.state(StateNamespaces.global(), StreamingSideInputFetcher.blockedMapAddr(WINDOW_FN.windowCoder()));
    blockedMapState.write(blockedMap);
    ListOutputManager outputManager = new ListOutputManager();
    List<PCollectionView<String>> views = Arrays.asList(view);
    StreamingSideInputFetcher<String, IntervalWindow> sideInputFetcher = createFetcher(views);
    StreamingSideInputDoFnRunner<String, String, IntervalWindow> runner = createRunner(outputManager, views, sideInputFetcher);
    sideInputFetcher.watermarkHold(createWindow(0)).add(new Instant(0));
    sideInputFetcher.elementBag(createWindow(0)).add(createDatum("e", 0));
    when(stepContext.getSideInputNotifications()).thenReturn(Arrays.asList(id));
    when(stepContext.issueSideInputFetch(eq(view), any(BoundedWindow.class), eq(SideInputState.UNKNOWN))).thenReturn(false);
    when(stepContext.issueSideInputFetch(eq(view), any(BoundedWindow.class), eq(SideInputState.KNOWN_READY))).thenReturn(true);
    when(execContext.getSideInputReaderForViews(Mockito.<Iterable<? extends PCollectionView<?>>>any())).thenReturn(mockSideInputReader);
    when(mockSideInputReader.contains(eq(view))).thenReturn(true);
    when(mockSideInputReader.get(eq(view), any(BoundedWindow.class))).thenReturn("data");
    // No new elements are processed; the bundle only releases the buffered element "e".
    runner.startBundle();
    runner.finishBundle();
    assertThat(outputManager.getOutput(mainOutputTag), contains(createDatum("e:data", 0)));
    assertThat(blockedMapState.read(), Matchers.nullValue());
    assertThat(sideInputFetcher.watermarkHold(createWindow(0)).read(), Matchers.nullValue());
    assertThat(sideInputFetcher.elementBag(createWindow(0)).read(), Matchers.emptyIterable());
}
Also used : Set(java.util.Set) HashSet(java.util.HashSet) HashMap(java.util.HashMap) Instant(org.joda.time.Instant) ListOutputManager(org.apache.beam.runners.dataflow.worker.util.ListOutputManager) ByteString(org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString) PCollectionView(org.apache.beam.sdk.values.PCollectionView) GlobalDataRequest(org.apache.beam.runners.dataflow.worker.windmill.Windmill.GlobalDataRequest) Windmill(org.apache.beam.runners.dataflow.worker.windmill.Windmill) BoundedWindow(org.apache.beam.sdk.transforms.windowing.BoundedWindow) IntervalWindow(org.apache.beam.sdk.transforms.windowing.IntervalWindow) Map(java.util.Map) Test(org.junit.Test)
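
The three StreamingSideInputDoFnRunnerTest methods above exercise one behavior: an element whose side input is not yet available is buffered per window, and it is emitted, joined with the side input value, once a readiness notification arrives. The following is a simplified, Beam-free sketch of that idea under stated assumptions. BlockedElementsSketch and its methods are hypothetical names for this illustration, windows are plain strings, and the real runner additionally manages watermark holds and persistent state, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockedElementsSketch {
    // Elements waiting for a side input, keyed by (hypothetical) window name.
    private final Map<String, List<String>> bufferedByWindow = new HashMap<>();
    // Side input values that have become available, keyed by window name.
    private final Map<String, String> sideInputByWindow = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // Emits immediately if the window's side input is ready, otherwise buffers.
    void processElement(String window, String element) {
        if (sideInputByWindow.containsKey(window)) {
            output.add(element + ":" + sideInputByWindow.get(window));
        } else {
            bufferedByWindow.computeIfAbsent(window, w -> new ArrayList<>()).add(element);
        }
    }

    // Called when a notification says the side input for the window is ready:
    // stores the value and releases everything that was blocked on it.
    void sideInputReady(String window, String data) {
        sideInputByWindow.put(window, data);
        List<String> unblocked = bufferedByWindow.remove(window);
        if (unblocked != null) {
            for (String element : unblocked) {
                output.add(element + ":" + data);
            }
        }
    }

    List<String> output() {
        return output;
    }

    boolean hasBuffered(String window) {
        return bufferedByWindow.containsKey(window);
    }

    public static void main(String[] args) {
        BlockedElementsSketch runner = new BlockedElementsSketch();
        runner.processElement("w0", "e1");      // buffered: side input not ready
        runner.sideInputReady("w0", "data");    // releases e1
        runner.processElement("w0", "e2");      // emitted directly
        System.out.println(runner.output());    // prints [e1:data, e2:data]
    }
}
```

As in the tests, the buffer for a window is empty after the release, which corresponds to the assertions that the blocked map and element bag are cleared.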

Aggregations

PCollectionView (org.apache.beam.sdk.values.PCollectionView): 67
Map (java.util.Map): 29
HashMap (java.util.HashMap): 28
Test (org.junit.Test): 28
TupleTag (org.apache.beam.sdk.values.TupleTag): 27
BoundedWindow (org.apache.beam.sdk.transforms.windowing.BoundedWindow): 22
Coder (org.apache.beam.sdk.coders.Coder): 21
KV (org.apache.beam.sdk.values.KV): 20
Instant (org.joda.time.Instant): 20
KvCoder (org.apache.beam.sdk.coders.KvCoder): 18
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 18
PCollection (org.apache.beam.sdk.values.PCollection): 18
DoFn (org.apache.beam.sdk.transforms.DoFn): 16
ArrayList (java.util.ArrayList): 15
IntervalWindow (org.apache.beam.sdk.transforms.windowing.IntervalWindow): 14
List (java.util.List): 13
ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap): 13
IOException (java.io.IOException): 12
RunnerApi (org.apache.beam.model.pipeline.v1.RunnerApi): 12
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 10