Search in sources :

Example 1 with SegmentAllocatorForBatch

use of org.apache.druid.indexing.common.task.SegmentAllocatorForBatch in project druid by druid-io.

the class PartialSegmentGenerateTask method generateSegments.

private List<DataSegment> generateSegments(final TaskToolbox toolbox, final ParallelIndexSupervisorTaskClient taskClient, final InputSource inputSource, final File tmpDir) throws IOException, InterruptedException, ExecutionException, TimeoutException {
    final DataSchema dataSchema = ingestionSchema.getDataSchema();
    final FireDepartment fireDepartmentForMetrics = new FireDepartment(dataSchema, new RealtimeIOConfig(null, null), null);
    final FireDepartmentMetrics fireDepartmentMetrics = fireDepartmentForMetrics.getMetrics();
    final RowIngestionMeters buildSegmentsMeters = toolbox.getRowIngestionMetersFactory().createRowIngestionMeters();
    toolbox.addMonitor(new RealtimeMetricsMonitor(Collections.singletonList(fireDepartmentForMetrics), Collections.singletonMap(DruidMetrics.TASK_ID, new String[] { getId() })));
    final ParallelIndexTuningConfig tuningConfig = ingestionSchema.getTuningConfig();
    final PartitionsSpec partitionsSpec = tuningConfig.getGivenOrDefaultPartitionsSpec();
    final long pushTimeout = tuningConfig.getPushTimeout();
    final SegmentAllocatorForBatch segmentAllocator = createSegmentAllocator(toolbox, taskClient);
    final SequenceNameFunction sequenceNameFunction = segmentAllocator.getSequenceNameFunction();
    final ParseExceptionHandler parseExceptionHandler = new ParseExceptionHandler(buildSegmentsMeters, tuningConfig.isLogParseExceptions(), tuningConfig.getMaxParseExceptions(), tuningConfig.getMaxSavedParseExceptions());
    final boolean useMaxMemoryEstimates = getContextValue(Tasks.USE_MAX_MEMORY_ESTIMATES, Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES);
    final Appenderator appenderator = BatchAppenderators.newAppenderator(getId(), toolbox.getAppenderatorsManager(), fireDepartmentMetrics, toolbox, dataSchema, tuningConfig, new ShuffleDataSegmentPusher(supervisorTaskId, getId(), toolbox.getIntermediaryDataManager()), buildSegmentsMeters, parseExceptionHandler, useMaxMemoryEstimates);
    boolean exceptionOccurred = false;
    try (final BatchAppenderatorDriver driver = BatchAppenderators.newDriver(appenderator, toolbox, segmentAllocator)) {
        driver.startJob();
        final SegmentsAndCommitMetadata pushed = InputSourceProcessor.process(dataSchema, driver, partitionsSpec, inputSource, inputSource.needsFormat() ? ParallelIndexSupervisorTask.getInputFormat(ingestionSchema) : null, tmpDir, sequenceNameFunction, inputRowIteratorBuilder, buildSegmentsMeters, parseExceptionHandler, pushTimeout);
        return pushed.getSegments();
    } catch (Exception e) {
        exceptionOccurred = true;
        throw e;
    } finally {
        if (exceptionOccurred) {
            appenderator.closeNow();
        } else {
            appenderator.close();
        }
    }
}
Also used : RealtimeIOConfig(org.apache.druid.segment.indexing.RealtimeIOConfig) ShuffleDataSegmentPusher(org.apache.druid.indexing.worker.shuffle.ShuffleDataSegmentPusher) SegmentsAndCommitMetadata(org.apache.druid.segment.realtime.appenderator.SegmentsAndCommitMetadata) BatchAppenderatorDriver(org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver) TimeoutException(java.util.concurrent.TimeoutException) IOException(java.io.IOException) ExecutionException(java.util.concurrent.ExecutionException) DataSchema(org.apache.druid.segment.indexing.DataSchema) FireDepartment(org.apache.druid.segment.realtime.FireDepartment) FireDepartmentMetrics(org.apache.druid.segment.realtime.FireDepartmentMetrics) SegmentAllocatorForBatch(org.apache.druid.indexing.common.task.SegmentAllocatorForBatch) Appenderator(org.apache.druid.segment.realtime.appenderator.Appenderator) PartitionsSpec(org.apache.druid.indexer.partitions.PartitionsSpec) ParseExceptionHandler(org.apache.druid.segment.incremental.ParseExceptionHandler) RealtimeMetricsMonitor(org.apache.druid.segment.realtime.RealtimeMetricsMonitor) SequenceNameFunction(org.apache.druid.indexing.common.task.SequenceNameFunction) RowIngestionMeters(org.apache.druid.segment.incremental.RowIngestionMeters)

Example 2 with SegmentAllocatorForBatch

use of org.apache.druid.indexing.common.task.SegmentAllocatorForBatch in project druid by druid-io.

the class SinglePhaseSubTask method generateAndPushSegments.

/**
 * This method reads input data row by row and adds the read row to a proper segment using {@link BaseAppenderatorDriver}.
 * If there is no segment for the row, a new one is created.  Segments can be published in the middle of reading inputs
 * if one of below conditions are satisfied.
 *
 * <ul>
 * <li>
 * If the number of rows in a segment exceeds {@link DynamicPartitionsSpec#maxRowsPerSegment}
 * </li>
 * <li>
 * If the number of rows added to {@link BaseAppenderatorDriver} so far exceeds {@link DynamicPartitionsSpec#maxTotalRows}
 * </li>
 * </ul>
 * <p>
 * At the end of this method, all the remaining segments are published.
 *
 * @return true if generated segments are successfully published, otherwise false
 */
private Set<DataSegment> generateAndPushSegments(final TaskToolbox toolbox, final ParallelIndexSupervisorTaskClient taskClient, final InputSource inputSource, final File tmpDir) throws IOException, InterruptedException {
    final DataSchema dataSchema = ingestionSchema.getDataSchema();
    final GranularitySpec granularitySpec = dataSchema.getGranularitySpec();
    final FireDepartment fireDepartmentForMetrics = new FireDepartment(dataSchema, new RealtimeIOConfig(null, null), null);
    final FireDepartmentMetrics fireDepartmentMetrics = fireDepartmentForMetrics.getMetrics();
    toolbox.addMonitor(new RealtimeMetricsMonitor(Collections.singletonList(fireDepartmentForMetrics), Collections.singletonMap(DruidMetrics.TASK_ID, new String[] { getId() })));
    final ParallelIndexTuningConfig tuningConfig = ingestionSchema.getTuningConfig();
    final DynamicPartitionsSpec partitionsSpec = (DynamicPartitionsSpec) tuningConfig.getGivenOrDefaultPartitionsSpec();
    final long pushTimeout = tuningConfig.getPushTimeout();
    final boolean explicitIntervals = !granularitySpec.inputIntervals().isEmpty();
    final boolean useLineageBasedSegmentAllocation = getContextValue(SinglePhaseParallelIndexTaskRunner.CTX_USE_LINEAGE_BASED_SEGMENT_ALLOCATION_KEY, SinglePhaseParallelIndexTaskRunner.LEGACY_DEFAULT_USE_LINEAGE_BASED_SEGMENT_ALLOCATION);
    // subtaskSpecId is used as the sequenceName, so that retry tasks for the same spec
    // can allocate the same set of segments.
    final String sequenceName = useLineageBasedSegmentAllocation ? Preconditions.checkNotNull(subtaskSpecId, "subtaskSpecId") : getId();
    final SegmentAllocatorForBatch segmentAllocator = SegmentAllocators.forLinearPartitioning(toolbox, sequenceName, new SupervisorTaskAccess(getSupervisorTaskId(), taskClient), getIngestionSchema().getDataSchema(), getTaskLockHelper(), ingestionSchema.getIOConfig().isAppendToExisting(), partitionsSpec, useLineageBasedSegmentAllocation);
    final boolean useMaxMemoryEstimates = getContextValue(Tasks.USE_MAX_MEMORY_ESTIMATES, Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES);
    final Appenderator appenderator = BatchAppenderators.newAppenderator(getId(), toolbox.getAppenderatorsManager(), fireDepartmentMetrics, toolbox, dataSchema, tuningConfig, rowIngestionMeters, parseExceptionHandler, useMaxMemoryEstimates);
    boolean exceptionOccurred = false;
    try (final BatchAppenderatorDriver driver = BatchAppenderators.newDriver(appenderator, toolbox, segmentAllocator);
        final CloseableIterator<InputRow> inputRowIterator = AbstractBatchIndexTask.inputSourceReader(tmpDir, dataSchema, inputSource, inputSource.needsFormat() ? ParallelIndexSupervisorTask.getInputFormat(ingestionSchema) : null, inputRow -> {
            if (inputRow == null) {
                return false;
            }
            if (explicitIntervals) {
                final Optional<Interval> optInterval = granularitySpec.bucketInterval(inputRow.getTimestamp());
                return optInterval.isPresent();
            }
            return true;
        }, rowIngestionMeters, parseExceptionHandler)) {
        driver.startJob();
        final Set<DataSegment> pushedSegments = new HashSet<>();
        while (inputRowIterator.hasNext()) {
            final InputRow inputRow = inputRowIterator.next();
            // Segments are created as needed, using a single sequence name. They may be allocated from the overlord
            // (in append mode) or may be created on our own authority (in overwrite mode).
            final AppenderatorDriverAddResult addResult = driver.add(inputRow, sequenceName);
            if (addResult.isOk()) {
                final boolean isPushRequired = addResult.isPushRequired(partitionsSpec.getMaxRowsPerSegment(), partitionsSpec.getMaxTotalRowsOr(DynamicPartitionsSpec.DEFAULT_MAX_TOTAL_ROWS));
                if (isPushRequired) {
                    // There can be some segments waiting for being published even though any rows won't be added to them.
                    // If those segments are not published here, the available space in appenderator will be kept to be small
                    // which makes the size of segments smaller.
                    final SegmentsAndCommitMetadata pushed = driver.pushAllAndClear(pushTimeout);
                    pushedSegments.addAll(pushed.getSegments());
                    LOG.info("Pushed [%s] segments", pushed.getSegments().size());
                    LOG.infoSegments(pushed.getSegments(), "Pushed segments");
                }
            } else {
                throw new ISE("Failed to add a row with timestamp[%s]", inputRow.getTimestamp());
            }
            fireDepartmentMetrics.incrementProcessed();
        }
        final SegmentsAndCommitMetadata pushed = driver.pushAllAndClear(pushTimeout);
        pushedSegments.addAll(pushed.getSegments());
        LOG.info("Pushed [%s] segments", pushed.getSegments().size());
        LOG.infoSegments(pushed.getSegments(), "Pushed segments");
        appenderator.close();
        return pushedSegments;
    } catch (TimeoutException | ExecutionException e) {
        exceptionOccurred = true;
        throw new RuntimeException(e);
    } catch (Exception e) {
        exceptionOccurred = true;
        throw e;
    } finally {
        if (exceptionOccurred) {
            appenderator.closeNow();
        } else {
            appenderator.close();
        }
    }
}
Also used : RealtimeIOConfig(org.apache.druid.segment.indexing.RealtimeIOConfig) SegmentsAndCommitMetadata(org.apache.druid.segment.realtime.appenderator.SegmentsAndCommitMetadata) DataSegment(org.apache.druid.timeline.DataSegment) FireDepartment(org.apache.druid.segment.realtime.FireDepartment) ISE(org.apache.druid.java.util.common.ISE) ExecutionException(java.util.concurrent.ExecutionException) HashSet(java.util.HashSet) TimeoutException(java.util.concurrent.TimeoutException) BatchAppenderatorDriver(org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver) AppenderatorDriverAddResult(org.apache.druid.segment.realtime.appenderator.AppenderatorDriverAddResult) TimeoutException(java.util.concurrent.TimeoutException) IOException(java.io.IOException) ExecutionException(java.util.concurrent.ExecutionException) DataSchema(org.apache.druid.segment.indexing.DataSchema) FireDepartmentMetrics(org.apache.druid.segment.realtime.FireDepartmentMetrics) DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) SegmentAllocatorForBatch(org.apache.druid.indexing.common.task.SegmentAllocatorForBatch) Appenderator(org.apache.druid.segment.realtime.appenderator.Appenderator) ArbitraryGranularitySpec(org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) InputRow(org.apache.druid.data.input.InputRow) RealtimeMetricsMonitor(org.apache.druid.segment.realtime.RealtimeMetricsMonitor) Interval(org.joda.time.Interval)

Aggregations

IOException (java.io.IOException)2 ExecutionException (java.util.concurrent.ExecutionException)2 TimeoutException (java.util.concurrent.TimeoutException)2 SegmentAllocatorForBatch (org.apache.druid.indexing.common.task.SegmentAllocatorForBatch)2 DataSchema (org.apache.druid.segment.indexing.DataSchema)2 RealtimeIOConfig (org.apache.druid.segment.indexing.RealtimeIOConfig)2 FireDepartment (org.apache.druid.segment.realtime.FireDepartment)2 FireDepartmentMetrics (org.apache.druid.segment.realtime.FireDepartmentMetrics)2 RealtimeMetricsMonitor (org.apache.druid.segment.realtime.RealtimeMetricsMonitor)2 Appenderator (org.apache.druid.segment.realtime.appenderator.Appenderator)2 BatchAppenderatorDriver (org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver)2 SegmentsAndCommitMetadata (org.apache.druid.segment.realtime.appenderator.SegmentsAndCommitMetadata)2 HashSet (java.util.HashSet)1 InputRow (org.apache.druid.data.input.InputRow)1 DynamicPartitionsSpec (org.apache.druid.indexer.partitions.DynamicPartitionsSpec)1 PartitionsSpec (org.apache.druid.indexer.partitions.PartitionsSpec)1 SequenceNameFunction (org.apache.druid.indexing.common.task.SequenceNameFunction)1 ShuffleDataSegmentPusher (org.apache.druid.indexing.worker.shuffle.ShuffleDataSegmentPusher)1 ISE (org.apache.druid.java.util.common.ISE)1 ParseExceptionHandler (org.apache.druid.segment.incremental.ParseExceptionHandler)1