Use of org.apache.druid.indexing.firehose.IngestSegmentFirehoseFactory in project druid by druid-io.
In class CompactionTaskRunTest, method testRunRegularIndexTaskWithIngestSegmentFirehose:
/**
* Run a regular index task that's equivalent to the compaction task in {@link #testRunWithDynamicPartitioning()},
* using {@link IngestSegmentFirehoseFactory}.
*
* This is not entirely CompactionTask-related, but it is conceptually similar and requires
* setup similar to what this test suite already has.
*
* It could be moved to a separate test class if needed.
*/
@Test
public void testRunRegularIndexTaskWithIngestSegmentFirehose() throws Exception
{
  runIndexTask();

  IndexTask indexTask = new IndexTask(
      null,
      null,
      new IndexTask.IndexIngestionSpec(
          new DataSchema(
              "test",
              getObjectMapper().convertValue(
                  new StringInputRowParser(DEFAULT_PARSE_SPEC, null),
                  Map.class
              ),
              new AggregatorFactory[]{new LongSumAggregatorFactory("val", "val")},
              new UniformGranularitySpec(Granularities.HOUR, Granularities.MINUTE, null),
              null,
              getObjectMapper()
          ),
          new IndexTask.IndexIOConfig(
              // Read back the segments written by runIndexTask() above.
              new IngestSegmentFirehoseFactory(
                  DATA_SOURCE,
                  Intervals.of("2014-01-01/2014-01-02"),
                  null,
                  null,
                  null,
                  null,
                  null,
                  getIndexIO(),
                  coordinatorClient,
                  segmentCacheManagerFactory,
                  RETRY_POLICY_FACTORY
              ),
              false,
              false
          ),
          IndexTaskTest.createTuningConfig(5000000, null, null, Long.MAX_VALUE, null, false, true)
      ),
      null
  );

  // This is a regular index task, so we need to explicitly add this context to store the CompactionState.
  indexTask.addToContext(Tasks.STORE_COMPACTION_STATE_KEY, true);

  final Pair<TaskStatus, List<DataSegment>> resultPair = runTask(indexTask);
  Assert.assertTrue(resultPair.lhs.isSuccess());

  final List<DataSegment> segments = resultPair.rhs;
  Assert.assertEquals(3, segments.size());

  for (int i = 0; i < 3; i++) {
    Assert.assertEquals(
        Intervals.of("2014-01-01T0%d:00:00/2014-01-01T0%d:00:00", i, i + 1),
        segments.get(i).getInterval()
    );
    Assert.assertEquals(
        getDefaultCompactionState(Granularities.HOUR, Granularities.MINUTE, ImmutableList.of()),
        segments.get(i).getLastCompactionState()
    );
    if (lockGranularity == LockGranularity.SEGMENT) {
      Assert.assertEquals(
          new NumberedOverwriteShardSpec(32768, 0, 2, (short) 1, (short) 1),
          segments.get(i).getShardSpec()
      );
    } else {
      Assert.assertEquals(new NumberedShardSpec(0, 1), segments.get(i).getShardSpec());
    }
  }
}
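For context, an index task drains a firehose factory like the one above roughly as follows. This is a minimal sketch, not code from the test: the variables factory, parser, and tempDir are assumed to be in scope, and it uses the (since-deprecated) Firehose API of this Druid era.

// Minimal sketch, assuming `factory`, `parser`, and `tempDir` exist in scope;
// this is not part of the test above. Firehose is Closeable, so try-with-resources works.
try (Firehose firehose = factory.connect(parser, tempDir)) {
  while (firehose.hasMore()) {
    final InputRow row = firehose.nextRow(); // a row re-read from the existing segments
    // ... the task would hand `row` to its segment generator here ...
  }
}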
Use of org.apache.druid.indexing.firehose.IngestSegmentFirehoseFactory in project druid by druid-io.
In class AbstractBatchIndexTask, method findInputSegments:
/**
* If the given firehoseFactory is an {@link IngestSegmentFirehoseFactory}, this method finds the segments to lock
* from the firehoseFactory itself. Those segments will be read by this task regardless of intervalsToRead,
* so they must be locked.
* <p>
* However, if the given firehoseFactory is not an IngestSegmentFirehoseFactory, this task will overwrite some
* segments with data read from an input source outside of Druid. As a result, only the segments falling in
* intervalsToRead should be locked.
* <p>
* The order of segments within the returned list is unspecified, but each segment is guaranteed to appear in the list
* only once.
*/
protected static List<DataSegment> findInputSegments(
    String dataSource,
    TaskActionClient actionClient,
    List<Interval> intervalsToRead,
    FirehoseFactory firehoseFactory
) throws IOException
{
  if (firehoseFactory instanceof IngestSegmentFirehoseFactory) {
    // intervalsToRead is ignored here.
    final List<WindowedSegmentId> inputSegments = ((IngestSegmentFirehoseFactory) firehoseFactory).getSegments();
    if (inputSegments == null) {
      // No explicit segment ids: lock every visible segment in the firehose's input interval.
      final Interval inputInterval = Preconditions.checkNotNull(
          ((IngestSegmentFirehoseFactory) firehoseFactory).getInterval(),
          "input interval"
      );
      return ImmutableList.copyOf(
          actionClient.submit(new RetrieveUsedSegmentsAction(dataSource, inputInterval, null, Segments.ONLY_VISIBLE))
      );
    } else {
      // Explicit segment ids: retrieve the visible segments overlapping their intervals,
      // then keep only the segments whose ids were actually requested.
      final List<String> inputSegmentIds =
          inputSegments.stream().map(WindowedSegmentId::getSegmentId).collect(Collectors.toList());
      final Collection<DataSegment> dataSegmentsInIntervals = actionClient.submit(
          new RetrieveUsedSegmentsAction(
              dataSource,
              null,
              inputSegments.stream()
                           .flatMap(windowedSegmentId -> windowedSegmentId.getIntervals().stream())
                           .collect(Collectors.toSet()),
              Segments.ONLY_VISIBLE
          )
      );
      return dataSegmentsInIntervals.stream()
                                    .filter(segment -> inputSegmentIds.contains(segment.getId().toString()))
                                    .collect(Collectors.toList());
    }
  } else {
    return ImmutableList.copyOf(
        actionClient.submit(new RetrieveUsedSegmentsAction(dataSource, null, intervalsToRead, Segments.ONLY_VISIBLE))
    );
  }
}
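Callers typically derive their lock requests from the returned list. A minimal sketch, assuming dataSource, actionClient, intervalsToRead, and firehoseFactory are in scope; the intervalsToLock variable is illustrative, not Druid API:

// Sketch: collapse the found input segments into the distinct intervals to lock.
final List<DataSegment> inputSegments =
    findInputSegments(dataSource, actionClient, intervalsToRead, firehoseFactory);
final Set<Interval> intervalsToLock = inputSegments.stream()
    .map(DataSegment::getInterval)
    .collect(Collectors.toSet());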