Use of com.google.api.services.dataflow.model.SourceOperationResponse in project beam by apache.
The class WorkerCustomSources, method performSplitTyped.
private static <T> SourceOperationResponse performSplitTyped(
    PipelineOptions options, BoundedSource<T> source, long desiredBundleSizeBytes,
    int numBundlesLimit, long apiByteLimit) throws Exception {
  // Try to split normally.
  List<BoundedSource<T>> bundles = splitAndValidate(source, desiredBundleSizeBytes, options);
  long serializedSize =
      DataflowApiUtils.computeSerializedSizeBytes(wrapIntoSourceSplitResponse(bundles));
  // If the split response is too large for the API, retry once with a proportionally larger
  // desiredBundleSizeBytes to reduce the oversplitting, scaled so the retried response is
  // expected to land near apiByteLimit / 2.
  if (serializedSize > apiByteLimit) {
    double expansion = 2 * (double) serializedSize / apiByteLimit;
    long expandedBundleSizeBytes = (long) (desiredBundleSizeBytes * expansion);
    LOG.warn(
        "Splitting source {} into bundles of estimated size {} bytes produced {} bundles, which"
            + " have total serialized size {} bytes. As this is too large for the Google Cloud"
            + " Dataflow API, retrying splitting once with increased desiredBundleSizeBytes {}"
            + " to reduce the number of splits.",
        source, desiredBundleSizeBytes, bundles.size(), serializedSize, expandedBundleSizeBytes);
    desiredBundleSizeBytes = expandedBundleSizeBytes;
    bundles = splitAndValidate(source, desiredBundleSizeBytes, options);
    serializedSize =
        DataflowApiUtils.computeSerializedSizeBytes(wrapIntoSourceSplitResponse(bundles));
    LOG.info(
        "Splitting with desiredBundleSizeBytes {} produced {} bundles"
            + " with total serialized size {} bytes",
        desiredBundleSizeBytes, bundles.size(), serializedSize);
  }
  int numBundlesBeforeRebundling = bundles.size();
  // If there are still too many bundles, rebundle the sources into at most numBundlesLimit
  // compressed serialized bundles.
  if (bundles.size() > numBundlesLimit) {
    LOG.warn(
        "Splitting source {} into bundles of estimated size {} bytes produced {} bundles. "
            + "Rebundling into {} bundles.",
        source, desiredBundleSizeBytes, bundles.size(), numBundlesLimit);
    bundles = limitNumberOfBundles(bundles, numBundlesLimit);
  }
  SourceOperationResponse response =
      new SourceOperationResponse().setSplit(wrapIntoSourceSplitResponse(bundles));
  long finalResponseSize = DataflowApiUtils.computeSerializedSizeBytes(response);
  LOG.info(
      "Splitting source {} produced {} bundles with total serialized response size {}",
      source, bundles.size(), finalResponseSize);
  if (finalResponseSize > apiByteLimit) {
    String message = String.format(
        "Total size of the BoundedSource objects generated by split() operation is larger"
            + " than the allowable limit. When splitting %s into bundles of %d bytes"
            + " it generated %d BoundedSource objects with total serialized size of %d bytes"
            + " which is larger than the limit %d. For more information, please check the"
            + " corresponding FAQ entry at"
            + " https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline",
        source, desiredBundleSizeBytes, numBundlesBeforeRebundling, finalResponseSize,
        apiByteLimit);
    throw new IllegalArgumentException(message);
  }
  return response;
}
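The retry heuristic above is plain arithmetic: scaling desiredBundleSizeBytes by 2 * serializedSize / apiByteLimit cuts the bundle count by roughly that factor, so the retried response is expected to land near half the API limit. A minimal, self-contained sketch of that arithmetic with hypothetical numbers (the 20 MiB limit and 64 MiB bundle size below are illustrative, not Dataflow's actual constants):

public class ExpansionFactorSketch {
  public static void main(String[] args) {
    long apiByteLimit = 20L << 20;            // hypothetical 20 MiB API limit
    long serializedSize = 60L << 20;          // hypothetical 60 MiB first-attempt response
    long desiredBundleSizeBytes = 64L << 20;  // hypothetical original bundle size

    // Same formula as performSplitTyped: aim for a response near apiByteLimit / 2.
    double expansion = 2 * (double) serializedSize / apiByteLimit;           // 6.0
    long expandedBundleSizeBytes = (long) (desiredBundleSizeBytes * expansion);

    System.out.println("expansion = " + expansion);
    System.out.println("expandedBundleSizeBytes = " + expandedBundleSizeBytes);
  }
}

With these numbers, ~6x larger bundles mean ~6x fewer of them, so the retried response would be expected to shrink from roughly 60 MiB to about 10 MiB, comfortably under the hypothetical 20 MiB limit.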
Use of com.google.api.services.dataflow.model.SourceOperationResponse in project beam by apache.
The class WorkItemStatusClient, method reportSuccess.
/**
* Return the {@link WorkItemServiceState} resulting from sending a success completion status.
*/
public synchronized WorkItemServiceState reportSuccess() throws IOException {
  checkState(!finalStateSent, "cannot reportSuccess after sending a final state");
  checkState(worker != null, "setWorker should be called before reportSuccess");
  WorkItemStatus status = createStatusUpdate(true);
  if (worker instanceof SourceOperationExecutor) {
    // TODO: Find out a generic way for the DataflowWorkExecutor to report work-specific
    // results into the work update.
    SourceOperationResponse response = ((SourceOperationExecutor) worker).getResponse();
    if (response != null) {
      status.setSourceOperationResponse(response);
    }
  }
  LOG.info("Success processing work item {}", uniqueWorkId());
  return execute(status);
}
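The two checkState calls encode a small lifecycle contract: a worker must be attached before reporting, and a final state may be sent at most once. A minimal, self-contained sketch of that guard pattern using Guava's Preconditions (SketchStatusClient and attachWorker are hypothetical names, not Beam's API):

import static com.google.common.base.Preconditions.checkState;

class SketchStatusClient {
  private boolean finalStateSent = false;
  private Object worker;

  synchronized void attachWorker(Object worker) {
    this.worker = worker;
  }

  synchronized void reportSuccess() {
    checkState(!finalStateSent, "cannot reportSuccess after sending a final state");
    checkState(worker != null, "attachWorker should be called before reportSuccess");
    // ... build and send the success status here ...
    finalStateSent = true;
  }
}

In the real client the flag is presumably flipped as part of sending the final update; the sketch sets it inline for brevity.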
Use of com.google.api.services.dataflow.model.SourceOperationResponse in project beam by apache.
The class NoOpSourceOperationExecutorTest, method testNoOpSourceOperationExecutor.
@Test
public void testNoOpSourceOperationExecutor() throws Exception {
  executor.execute();
  SourceOperationResponse response = executor.getResponse();
  assertEquals("SOURCE_SPLIT_OUTCOME_USE_CURRENT", response.getSplit().getOutcome());
}
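The test assumes an executor field initialized in the test class's setup. For reference, the response shape the assertion inspects can be built with the chained setters of the Dataflow model classes; this is a sketch, not code from the test itself:

import com.google.api.services.dataflow.model.SourceOperationResponse;
import com.google.api.services.dataflow.model.SourceSplitResponse;

// A no-op split reports SOURCE_SPLIT_OUTCOME_USE_CURRENT, telling the service
// to keep processing the source as-is rather than replace it with sub-sources.
SourceOperationResponse expected =
    new SourceOperationResponse()
        .setSplit(new SourceSplitResponse().setOutcome("SOURCE_SPLIT_OUTCOME_USE_CURRENT"));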
Use of com.google.api.services.dataflow.model.SourceOperationResponse in project beam by apache.
The class WorkerCustomSourcesTest, method performSplit.
static SourceSplitResponse performSplit(
    com.google.api.services.dataflow.model.Source source, PipelineOptions options,
    @Nullable Long desiredBundleSizeBytes, @Nullable Integer numBundlesLimitForTest,
    @Nullable Long apiByteLimitForTest) throws Exception {
  SourceSplitRequest splitRequest = new SourceSplitRequest();
  splitRequest.setSource(source);
  if (desiredBundleSizeBytes != null) {
    splitRequest.setOptions(
        new SourceSplitOptions().setDesiredBundleSizeBytes(desiredBundleSizeBytes));
  }
  SourceOperationResponse response = WorkerCustomSources.performSplitWithApiLimit(
      splitRequest, options,
      MoreObjects.firstNonNull(numBundlesLimitForTest, WorkerCustomSources.DEFAULT_NUM_BUNDLES_LIMIT),
      MoreObjects.firstNonNull(apiByteLimitForTest, WorkerCustomSources.DATAFLOW_SPLIT_RESPONSE_API_SIZE_LIMIT));
  return response.getSplit();
}
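A hedged usage sketch of this helper: passing null for the last three parameters falls through to the production defaults via MoreObjects.firstNonNull (and skips setting SourceSplitOptions entirely), so a test that only needs the split result could call it as below. The cloudSource and options variables are assumed to be in scope:

SourceSplitResponse split =
    performSplit(
        cloudSource,  // a com.google.api.services.dataflow.model.Source
        options,      // PipelineOptions
        null,         // desiredBundleSizeBytes: leave SourceSplitOptions unset
        null,         // numBundlesLimitForTest: use DEFAULT_NUM_BUNDLES_LIMIT
        null);        // apiByteLimitForTest: use DATAFLOW_SPLIT_RESPONSE_API_SIZE_LIMIT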