Example 1 with SourceOperationResponse

Use of com.google.api.services.dataflow.model.SourceOperationResponse in the Apache Beam project.

From class WorkerCustomSources, method performSplitTyped.

private static <T> SourceOperationResponse performSplitTyped(PipelineOptions options, BoundedSource<T> source, long desiredBundleSizeBytes, int numBundlesLimit, long apiByteLimit) throws Exception {
    // Try to split normally
    List<BoundedSource<T>> bundles = splitAndValidate(source, desiredBundleSizeBytes, options);
    // If serialized size is too big, try splitting with a proportionally larger desiredBundleSize
    // to reduce the oversplitting.
    long serializedSize = DataflowApiUtils.computeSerializedSizeBytes(wrapIntoSourceSplitResponse(bundles));
    // If the split response is too large, scale the desired bundle size so that the retried
    // response is expected to be about apiByteLimit / 2.
    if (serializedSize > apiByteLimit) {
        double expansion = 2 * (double) serializedSize / apiByteLimit;
        long expandedBundleSizeBytes = (long) (desiredBundleSizeBytes * expansion);
        LOG.warn(
            "Splitting source {} into bundles of estimated size {} bytes produced {} bundles, which"
                + " have total serialized size {} bytes. As this is too large for the Google Cloud"
                + " Dataflow API, retrying splitting once with increased desiredBundleSizeBytes {}"
                + " to reduce the number of splits.",
            source,
            desiredBundleSizeBytes,
            bundles.size(),
            serializedSize,
            expandedBundleSizeBytes);
        desiredBundleSizeBytes = expandedBundleSizeBytes;
        bundles = splitAndValidate(source, desiredBundleSizeBytes, options);
        serializedSize = DataflowApiUtils.computeSerializedSizeBytes(wrapIntoSourceSplitResponse(bundles));
        LOG.info(
            "Splitting with desiredBundleSizeBytes {} produced {} bundles "
                + "with total serialized size {} bytes",
            desiredBundleSizeBytes,
            bundles.size(),
            serializedSize);
    }
    int numBundlesBeforeRebundling = bundles.size();
    // If there are still more than numBundlesLimit bundles, combine the sources into
    // numBundlesLimit compressed serialized bundles.
    if (bundles.size() > numBundlesLimit) {
        LOG.warn(
            "Splitting source {} into bundles of estimated size {} bytes produced {} bundles. "
                + "Rebundling into {} bundles.",
            source,
            desiredBundleSizeBytes,
            bundles.size(),
            numBundlesLimit);
        bundles = limitNumberOfBundles(bundles, numBundlesLimit);
    }
    SourceOperationResponse response = new SourceOperationResponse().setSplit(wrapIntoSourceSplitResponse(bundles));
    long finalResponseSize = DataflowApiUtils.computeSerializedSizeBytes(response);
    LOG.info("Splitting source {} produced {} bundles with total serialized response size {}", source, bundles.size(), finalResponseSize);
    if (finalResponseSize > apiByteLimit) {
        String message = String.format(
            "Total size of the BoundedSource objects generated by split() operation is larger "
                + "than the allowable limit. When splitting %s into bundles of %d bytes "
                + "it generated %d BoundedSource objects with total serialized size of %d bytes "
                + "which is larger than the limit %d. "
                + "For more information, please check the corresponding FAQ entry at "
                + "https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline",
            source,
            desiredBundleSizeBytes,
            numBundlesBeforeRebundling,
            finalResponseSize,
            apiByteLimit);
        throw new IllegalArgumentException(message);
    }
    return response;
}
Also used: BoundedSource (org.apache.beam.sdk.io.BoundedSource), Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString), ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString), Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString), Base64.encodeBase64String (com.google.api.client.util.Base64.encodeBase64String), SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse)
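The retry in performSplitTyped multiplies the bundle size by 2 * serializedSize / apiByteLimit. If the response size is roughly proportional to the number of bundles, the second split should therefore land near apiByteLimit / 2, leaving headroom below the limit. The sketch below isolates that arithmetic, plus one simple way to cap the number of bundles. The class SplitSizingSketch, its methods, and the round-robin grouping are hypothetical illustrations; they are not Beam's limitNumberOfBundles, whose implementation is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers illustrating the size arithmetic in performSplitTyped (not Beam API). */
final class SplitSizingSketch {
  private SplitSizingSketch() {}

  /**
   * Mirrors the retry computation: when the response exceeds apiByteLimit, scale the desired
   * bundle size so that, assuming response size is proportional to bundle count, the retried
   * response is about apiByteLimit / 2.
   */
  static long expandedBundleSize(long desiredBundleSizeBytes, long serializedSize, long apiByteLimit) {
    if (serializedSize <= apiByteLimit) {
      return desiredBundleSizeBytes; // already small enough: no retry needed
    }
    double expansion = 2 * (double) serializedSize / apiByteLimit;
    return (long) (desiredBundleSizeBytes * expansion);
  }

  /**
   * Assumption: a round-robin grouping that caps the number of groups at limit.
   * Beam's limitNumberOfBundles may combine bundles differently.
   */
  static <T> List<List<T>> groupIntoAtMost(List<T> items, int limit) {
    if (limit <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    int groups = Math.min(limit, items.size());
    List<List<T>> result = new ArrayList<>(groups);
    for (int i = 0; i < groups; i++) {
      result.add(new ArrayList<>());
    }
    // Item i goes to group i % groups, so group sizes differ by at most one.
    for (int i = 0; i < items.size(); i++) {
      result.get(i % groups).add(items.get(i));
    }
    return result;
  }
}
```

For example, a 300-byte response against a 100-byte limit gives an expansion of 6, so a 1000-byte desired bundle size becomes 6000 bytes on the retry.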

Example 2 with SourceOperationResponse

Use of com.google.api.services.dataflow.model.SourceOperationResponse in the Apache Beam project.

From class WorkItemStatusClient, method reportSuccess.

/**
 * Return the {@link WorkItemServiceState} resulting from sending a success completion status.
 */
public synchronized WorkItemServiceState reportSuccess() throws IOException {
    checkState(!finalStateSent, "cannot reportSuccess after sending a final state");
    checkState(worker != null, "setWorker should be called before reportSuccess");
    WorkItemStatus status = createStatusUpdate(true);
    if (worker instanceof SourceOperationExecutor) {
        // TODO: Find out a generic way for the DataflowWorkExecutor to report work-specific results
        // into the work update.
        SourceOperationResponse response = ((SourceOperationExecutor) worker).getResponse();
        if (response != null) {
            status.setSourceOperationResponse(response);
        }
    }
    LOG.info("Success processing work item {}", uniqueWorkId());
    return execute(status);
}
Also used: WorkItemStatus (com.google.api.services.dataflow.model.WorkItemStatus), SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse)
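reportSuccess attaches the executor's SourceOperationResponse to the status update only when the worker is a SourceOperationExecutor and its response is non-null; other executors report a plain success. The following is a minimal, self-contained sketch of that pattern. WorkExecutor, ResponseProducer, StatusUpdate, and StatusClientSketch are hypothetical stand-ins for the Dataflow worker and model classes, and a String stands in for SourceOperationResponse.

```java
/** Hypothetical stand-in for DataflowWorkExecutor. */
interface WorkExecutor {}

/** Hypothetical stand-in for SourceOperationExecutor: an executor that may produce a response. */
interface ResponseProducer extends WorkExecutor {
  String getResponse(); // may return null
}

/** Hypothetical stand-in for WorkItemStatus. */
final class StatusUpdate {
  final boolean completed;
  String sourceOperationResponse; // stays null unless the worker produced a response

  StatusUpdate(boolean completed) {
    this.completed = completed;
  }
}

final class StatusClientSketch {
  private WorkExecutor worker;
  private boolean finalStateSent;

  void setWorker(WorkExecutor worker) {
    this.worker = worker;
  }

  synchronized StatusUpdate reportSuccess() {
    if (finalStateSent) {
      throw new IllegalStateException("cannot reportSuccess after sending a final state");
    }
    if (worker == null) {
      throw new IllegalStateException("setWorker should be called before reportSuccess");
    }
    StatusUpdate status = new StatusUpdate(true);
    // Only executors that produce a response contribute work-specific results.
    if (worker instanceof ResponseProducer) {
      String response = ((ResponseProducer) worker).getResponse();
      if (response != null) {
        status.sourceOperationResponse = response;
      }
    }
    finalStateSent = true; // assumption: a success report counts as the final state
    return status;
  }
}
```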

Example 3 with SourceOperationResponse

Use of com.google.api.services.dataflow.model.SourceOperationResponse in the Apache Beam project.

From class NoOpSourceOperationExecutorTest, method testNoOpSourceOperationExecutor.

@Test
public void testNoOpSourceOperationExecutor() throws Exception {
    executor.execute();
    SourceOperationResponse response = executor.getResponse();
    assertEquals("SOURCE_SPLIT_OUTCOME_USE_CURRENT", response.getSplit().getOutcome());
}
Also used: SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse), Test (org.junit.Test)
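The test expects a no-op executor to answer with the outcome SOURCE_SPLIT_OUTCOME_USE_CURRENT, which tells the service to process the current source as is, without splitting. A minimal sketch of an executor with that behavior follows, assuming only the execute()/getResponse() shape used in the test. NoOpSplitExecutorSketch and SplitResponse are hypothetical stand-ins, not the actual NoOpSourceOperationExecutor or the Dataflow SourceSplitResponse model class.

```java
/** Hypothetical stand-in for SourceSplitResponse: only the outcome field is modeled. */
final class SplitResponse {
  final String outcome;

  SplitResponse(String outcome) {
    this.outcome = outcome;
  }
}

/** Hypothetical executor that never splits: it always asks the service to use the current source. */
final class NoOpSplitExecutorSketch {
  static final String USE_CURRENT = "SOURCE_SPLIT_OUTCOME_USE_CURRENT";

  private SplitResponse response; // null until execute() runs

  void execute() {
    // No splitting work is performed; just record the "use current" outcome.
    response = new SplitResponse(USE_CURRENT);
  }

  SplitResponse getResponse() {
    return response;
  }
}
```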

Example 4 with SourceOperationResponse

Use of com.google.api.services.dataflow.model.SourceOperationResponse in the Apache Beam project.

From class WorkerCustomSourcesTest, method performSplit.

static SourceSplitResponse performSplit(com.google.api.services.dataflow.model.Source source, PipelineOptions options, @Nullable Long desiredBundleSizeBytes, @Nullable Integer numBundlesLimitForTest, @Nullable Long apiByteLimitForTest) throws Exception {
    SourceSplitRequest splitRequest = new SourceSplitRequest();
    splitRequest.setSource(source);
    if (desiredBundleSizeBytes != null) {
        splitRequest.setOptions(new SourceSplitOptions().setDesiredBundleSizeBytes(desiredBundleSizeBytes));
    }
    SourceOperationResponse response =
        WorkerCustomSources.performSplitWithApiLimit(
            splitRequest,
            options,
            MoreObjects.firstNonNull(numBundlesLimitForTest, WorkerCustomSources.DEFAULT_NUM_BUNDLES_LIMIT),
            MoreObjects.firstNonNull(apiByteLimitForTest, WorkerCustomSources.DATAFLOW_SPLIT_RESPONSE_API_SIZE_LIMIT));
    return response.getSplit();
}
Also used: SourceSplitOptions (com.google.api.services.dataflow.model.SourceSplitOptions), SourceSplitRequest (com.google.api.services.dataflow.model.SourceSplitRequest), SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse)
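The test helper lets callers override the bundle-count and byte limits, falling back to the production constants through Guava's MoreObjects.firstNonNull when an argument is null. The same defaulting can be written with the JDK's Objects.requireNonNullElse (Java 9+), as in this sketch. SplitLimitsSketch is hypothetical, and its constant values are placeholders, not the real WorkerCustomSources.DEFAULT_NUM_BUNDLES_LIMIT or DATAFLOW_SPLIT_RESPONSE_API_SIZE_LIMIT.

```java
import java.util.Objects;

/** Hypothetical illustration of test-overridable limits with production defaults. */
final class SplitLimitsSketch {
  // Placeholder values for illustration only; the real Beam constants may differ.
  static final int DEFAULT_NUM_BUNDLES_LIMIT = 100;
  static final long DEFAULT_API_BYTE_LIMIT = 1_000_000L;

  final int numBundlesLimit;
  final long apiByteLimit;

  SplitLimitsSketch(Integer numBundlesLimitForTest, Long apiByteLimitForTest) {
    // Like MoreObjects.firstNonNull(a, b): use the override if present, otherwise the default.
    this.numBundlesLimit = Objects.requireNonNullElse(numBundlesLimitForTest, DEFAULT_NUM_BUNDLES_LIMIT);
    this.apiByteLimit = Objects.requireNonNullElse(apiByteLimitForTest, DEFAULT_API_BYTE_LIMIT);
  }
}
```

Both requireNonNullElse and firstNonNull throw a NullPointerException only when both arguments are null, so the two forms are interchangeable here.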

Aggregations (number of examples above that use each type)

SourceOperationResponse (com.google.api.services.dataflow.model.SourceOperationResponse): 4
Base64.encodeBase64String (com.google.api.client.util.Base64.encodeBase64String): 1
SourceSplitOptions (com.google.api.services.dataflow.model.SourceSplitOptions): 1
SourceSplitRequest (com.google.api.services.dataflow.model.SourceSplitRequest): 1
WorkItemStatus (com.google.api.services.dataflow.model.WorkItemStatus): 1
Structs.addString (org.apache.beam.runners.dataflow.util.Structs.addString): 1
Structs.getString (org.apache.beam.runners.dataflow.util.Structs.getString): 1
BoundedSource (org.apache.beam.sdk.io.BoundedSource): 1
ByteString (org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.ByteString): 1
Test (org.junit.Test): 1