Example 1 with ShuffleDataSegmentPusher

Use of org.apache.druid.indexing.worker.shuffle.ShuffleDataSegmentPusher in project druid by druid-io.

In class PartialSegmentGenerateTask, method generateSegments.

private List<DataSegment> generateSegments(final TaskToolbox toolbox, final ParallelIndexSupervisorTaskClient taskClient, final InputSource inputSource, final File tmpDir) throws IOException, InterruptedException, ExecutionException, TimeoutException {
    final DataSchema dataSchema = ingestionSchema.getDataSchema();
    // This FireDepartment is created only to obtain a metrics object for the monitor below.
    final FireDepartment fireDepartmentForMetrics = new FireDepartment(dataSchema, new RealtimeIOConfig(null, null), null);
    final FireDepartmentMetrics fireDepartmentMetrics = fireDepartmentForMetrics.getMetrics();
    final RowIngestionMeters buildSegmentsMeters = toolbox.getRowIngestionMetersFactory().createRowIngestionMeters();
    // Register a metrics monitor whose dimensions carry this task's ID.
    toolbox.addMonitor(new RealtimeMetricsMonitor(Collections.singletonList(fireDepartmentForMetrics), Collections.singletonMap(DruidMetrics.TASK_ID, new String[] { getId() })));
    final ParallelIndexTuningConfig tuningConfig = ingestionSchema.getTuningConfig();
    final PartitionsSpec partitionsSpec = tuningConfig.getGivenOrDefaultPartitionsSpec();
    final long pushTimeout = tuningConfig.getPushTimeout();
    final SegmentAllocatorForBatch segmentAllocator = createSegmentAllocator(toolbox, taskClient);
    final SequenceNameFunction sequenceNameFunction = segmentAllocator.getSequenceNameFunction();
    final ParseExceptionHandler parseExceptionHandler = new ParseExceptionHandler(buildSegmentsMeters, tuningConfig.isLogParseExceptions(), tuningConfig.getMaxParseExceptions(), tuningConfig.getMaxSavedParseExceptions());
    final boolean useMaxMemoryEstimates = getContextValue(Tasks.USE_MAX_MEMORY_ESTIMATES, Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES);
    // The appenderator is given a ShuffleDataSegmentPusher built from the supervisor task ID,
    // this task's ID, and the toolbox's IntermediaryDataManager.
    final Appenderator appenderator = BatchAppenderators.newAppenderator(
        getId(), toolbox.getAppenderatorsManager(), fireDepartmentMetrics, toolbox, dataSchema, tuningConfig,
        new ShuffleDataSegmentPusher(supervisorTaskId, getId(), toolbox.getIntermediaryDataManager()),
        buildSegmentsMeters, parseExceptionHandler, useMaxMemoryEstimates);
    // Tracks whether processing failed so that the finally block can pick closeNow() over close().
    boolean exceptionOccurred = false;
    try (final BatchAppenderatorDriver driver = BatchAppenderators.newDriver(appenderator, toolbox, segmentAllocator)) {
        driver.startJob();
        // An input format is passed only when the input source reports that it needs one.
        final SegmentsAndCommitMetadata pushed = InputSourceProcessor.process(
            dataSchema, driver, partitionsSpec, inputSource,
            inputSource.needsFormat() ? ParallelIndexSupervisorTask.getInputFormat(ingestionSchema) : null,
            tmpDir, sequenceNameFunction, inputRowIteratorBuilder, buildSegmentsMeters, parseExceptionHandler, pushTimeout);
        return pushed.getSegments();
    } catch (Exception e) {
        exceptionOccurred = true;
        throw e;
    } finally {
        if (exceptionOccurred) {
            appenderator.closeNow();
        } else {
            appenderator.close();
        }
    }
}
Also used: RealtimeIOConfig (org.apache.druid.segment.indexing.RealtimeIOConfig), ShuffleDataSegmentPusher (org.apache.druid.indexing.worker.shuffle.ShuffleDataSegmentPusher), SegmentsAndCommitMetadata (org.apache.druid.segment.realtime.appenderator.SegmentsAndCommitMetadata), BatchAppenderatorDriver (org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver), TimeoutException (java.util.concurrent.TimeoutException), IOException (java.io.IOException), ExecutionException (java.util.concurrent.ExecutionException), DataSchema (org.apache.druid.segment.indexing.DataSchema), FireDepartment (org.apache.druid.segment.realtime.FireDepartment), FireDepartmentMetrics (org.apache.druid.segment.realtime.FireDepartmentMetrics), SegmentAllocatorForBatch (org.apache.druid.indexing.common.task.SegmentAllocatorForBatch), Appenderator (org.apache.druid.segment.realtime.appenderator.Appenderator), PartitionsSpec (org.apache.druid.indexer.partitions.PartitionsSpec), ParseExceptionHandler (org.apache.druid.segment.incremental.ParseExceptionHandler), RealtimeMetricsMonitor (org.apache.druid.segment.realtime.RealtimeMetricsMonitor), SequenceNameFunction (org.apache.druid.indexing.common.task.SequenceNameFunction), RowIngestionMeters (org.apache.druid.segment.incremental.RowIngestionMeters)
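
Example sketch: close() versus closeNow() on failure

generateSegments ends with a try/catch/finally block that sets an exceptionOccurred flag and then calls appenderator.closeNow() if processing threw, or appenderator.close() otherwise. The following is a minimal, self-contained sketch of that control flow only, with no Druid dependencies. RecordingResource, process, and the return value 42 are hypothetical stand-ins invented for illustration; they are not Druid classes and say nothing about what the real close methods do internally.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOnFailureSketch {

    /** Hypothetical resource with two shutdown paths; it only records which one was called. */
    static final class RecordingResource {
        final List<String> events = new ArrayList<>();

        void close() {
            events.add("close");
        }

        void closeNow() {
            events.add("closeNow");
        }
    }

    /**
     * Does some work with the resource and always releases it in the finally block.
     * The flag set in the catch clause decides which shutdown path is used.
     */
    static int process(RecordingResource resource, boolean fail) {
        boolean exceptionOccurred = false;
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            return 42; // placeholder result for the success path
        } catch (RuntimeException e) {
            // Remember the failure, then rethrow so the caller still sees it.
            exceptionOccurred = true;
            throw e;
        } finally {
            if (exceptionOccurred) {
                resource.closeNow();
            } else {
                resource.close();
            }
        }
    }

    public static void main(String[] args) {
        RecordingResource ok = new RecordingResource();
        System.out.println(process(ok, false) + " " + ok.events);

        RecordingResource failed = new RecordingResource();
        try {
            process(failed, true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " " + failed.events);
        }
    }
}
```

The flag is needed because a finally block cannot tell on its own whether the try block completed normally. The sketch catches RuntimeException to stay free of checked exceptions, whereas the original method catches Exception and declares its checked exceptions in the signature.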
