Example 1 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From class DataSchemaTest, method testWithDimensionSpec.

@Test
public void testWithDimensionSpec() {
    TimestampSpec tsSpec = Mockito.mock(TimestampSpec.class);
    GranularitySpec gSpec = Mockito.mock(GranularitySpec.class);
    DimensionsSpec oldDimSpec = Mockito.mock(DimensionsSpec.class);
    DimensionsSpec newDimSpec = Mockito.mock(DimensionsSpec.class);
    AggregatorFactory aggFactory = Mockito.mock(AggregatorFactory.class);
    Mockito.when(aggFactory.getName()).thenReturn("myAgg");
    TransformSpec transSpec = Mockito.mock(TransformSpec.class);
    Map<String, Object> parserMap = Mockito.mock(Map.class);
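    // DataSchema applies metric-name exclusions to the dimensions spec (hence the
    // aggFactory.getName() stub above), so the mock must return itself from
    // withDimensionExclusions().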
    Mockito.when(newDimSpec.withDimensionExclusions(ArgumentMatchers.any(Set.class))).thenReturn(newDimSpec);
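    // jsonMapper is an ObjectMapper field defined elsewhere in the test class.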
    DataSchema oldSchema = new DataSchema("dataSource", tsSpec, oldDimSpec, new AggregatorFactory[] { aggFactory }, gSpec, transSpec, parserMap, jsonMapper);
    DataSchema newSchema = oldSchema.withDimensionsSpec(newDimSpec);
    Assert.assertSame(oldSchema.getDataSource(), newSchema.getDataSource());
    Assert.assertSame(oldSchema.getTimestampSpec(), newSchema.getTimestampSpec());
    Assert.assertSame(newDimSpec, newSchema.getDimensionsSpec());
    Assert.assertSame(oldSchema.getAggregators(), newSchema.getAggregators());
    Assert.assertSame(oldSchema.getGranularitySpec(), newSchema.getGranularitySpec());
    Assert.assertSame(oldSchema.getTransformSpec(), newSchema.getTransformSpec());
    Assert.assertSame(oldSchema.getParserMap(), newSchema.getParserMap());
}
Also used: ImmutableSet(com.google.common.collect.ImmutableSet), Set(java.util.Set), ArbitraryGranularitySpec(org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec), GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec), TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec), DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec), DoubleSumAggregatorFactory(org.apache.druid.query.aggregation.DoubleSumAggregatorFactory), AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory), TransformSpec(org.apache.druid.segment.transform.TransformSpec), InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest), Test(org.junit.Test), IdUtilsTest(org.apache.druid.common.utils.IdUtilsTest)
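
For orientation, here is a minimal sketch of the same copy-with-replacement pattern using concrete specs instead of mocks. It is not taken from the Druid test suite: the method name, the "wikipedia" data source, the column names, and the aggregator are illustrative assumptions, and the six-argument DataSchema constructor is assumed from its use in Example 2 below.

import java.util.Arrays;
import org.apache.druid.data.input.impl.DimensionsSpec;
import org.apache.druid.data.input.impl.TimestampSpec;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.query.aggregation.AggregatorFactory;
import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
import org.apache.druid.segment.indexing.DataSchema;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
import org.apache.druid.segment.transform.TransformSpec;
import org.junit.Assert;
import org.junit.Test;

@Test
public void testWithDimensionsSpecConcrete() {
    DataSchema original = new DataSchema(
        "wikipedia",                                 // dataSource (illustrative)
        new TimestampSpec("timestamp", "iso", null), // time column and its format
        new DimensionsSpec(DimensionsSpec.getDefaultSchemas(Arrays.asList("page", "user"))),
        new AggregatorFactory[] { new LongSumAggregatorFactory("added", "added") },
        new UniformGranularitySpec(Granularities.DAY, Granularities.NONE, null),
        TransformSpec.NONE
    );
    // Only the dimensions are replaced; dataSource, timestamp spec, aggregators,
    // granularity spec, and transform spec all carry over to the copy, which is
    // exactly what the mock-based assertions above verify.
    DataSchema updated = original.withDimensionsSpec(
        new DimensionsSpec(DimensionsSpec.getDefaultSchemas(Arrays.asList("page", "user", "city")))
    );
    Assert.assertSame(original.getAggregators(), updated.getAggregators());
    Assert.assertSame(original.getGranularitySpec(), updated.getGranularitySpec());
}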

Example 2 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From class CompactionTask, method createDataSchema.

private static DataSchema createDataSchema(
    String dataSource,
    List<NonnullPair<QueryableIndex, DataSegment>> queryableIndexAndSegments,
    @Nullable DimensionsSpec dimensionsSpec,
    @Nullable ClientCompactionTaskTransformSpec transformSpec,
    @Nullable AggregatorFactory[] metricsSpec,
    @Nonnull ClientCompactionTaskGranularitySpec granularitySpec
) {
    // Check index metadata and decide which values to propagate (i.e. carry over)
    // for rollup and queryGranularity.
    final SettableSupplier<Boolean> rollup = new SettableSupplier<>();
    final SettableSupplier<Granularity> queryGranularity = new SettableSupplier<>();
    decideRollupAndQueryGranularityCarryOver(rollup, queryGranularity, queryableIndexAndSegments);
    final Interval totalInterval = JodaUtils.umbrellaInterval(queryableIndexAndSegments.stream().map(p -> p.rhs.getInterval()).collect(Collectors.toList()));
    final Granularity queryGranularityToUse;
    if (granularitySpec.getQueryGranularity() == null) {
        queryGranularityToUse = queryGranularity.get();
        log.info("Generate compaction task spec with segments original query granularity [%s]", queryGranularityToUse);
    } else {
        queryGranularityToUse = granularitySpec.getQueryGranularity();
        log.info("Generate compaction task spec with new query granularity overrided from input [%s]", queryGranularityToUse);
    }
    final GranularitySpec uniformGranularitySpec = new UniformGranularitySpec(
        Preconditions.checkNotNull(granularitySpec.getSegmentGranularity()),
        queryGranularityToUse,
        granularitySpec.isRollup() == null ? rollup.get() : granularitySpec.isRollup(),
        Collections.singletonList(totalInterval)
    );
    // find unique dimensions
    final DimensionsSpec finalDimensionsSpec = dimensionsSpec == null ? createDimensionsSpec(queryableIndexAndSegments) : dimensionsSpec;
    final AggregatorFactory[] finalMetricsSpec = metricsSpec == null ? createMetricsSpec(queryableIndexAndSegments) : metricsSpec;
    return new DataSchema(
        dataSource,
        new TimestampSpec(ColumnHolder.TIME_COLUMN_NAME, "millis", null),
        finalDimensionsSpec,
        finalMetricsSpec,
        uniformGranularitySpec,
        transformSpec == null ? null : new TransformSpec(transformSpec.getFilter(), null)
    );
}
Also used: LockGranularity(org.apache.druid.indexing.common.LockGranularity), Granularity(org.apache.druid.java.util.common.granularity.Granularity), AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory), TransformSpec(org.apache.druid.segment.transform.TransformSpec), ClientCompactionTaskTransformSpec(org.apache.druid.client.indexing.ClientCompactionTaskTransformSpec), SettableSupplier(org.apache.druid.common.guava.SettableSupplier), DataSchema(org.apache.druid.segment.indexing.DataSchema), UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec), GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec), ClientCompactionTaskGranularitySpec(org.apache.druid.client.indexing.ClientCompactionTaskGranularitySpec), TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec), DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec), Interval(org.joda.time.Interval)
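
The carry-over decision above is delegated to decideRollupAndQueryGranularityCarryOver, whose body is not shown here. As a hedged sketch of the rollup half of that decision only (the helper name and the every-segment rule are illustrative assumptions, not the verbatim Druid implementation):

import java.util.List;
import org.apache.druid.java.util.common.NonnullPair;
import org.apache.druid.segment.Metadata;
import org.apache.druid.segment.QueryableIndex;
import org.apache.druid.timeline.DataSegment;

// Carry rollup over only if every segment being compacted was itself written with
// rollup enabled; a missing Metadata is conservatively treated as non-rollup.
static boolean carryOverRollup(List<NonnullPair<QueryableIndex, DataSegment>> indexAndSegments) {
    return indexAndSegments.stream().allMatch(pair -> {
        final Metadata metadata = pair.lhs.getMetadata();
        return metadata != null && Boolean.TRUE.equals(metadata.isRollup());
    });
}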

Example 3 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From class IndexTask, method determineShardSpecs.

/**
 * Determines intervals and shardSpecs for input data.  This method first checks whether it needs to determine
 * intervals and shardSpecs itself.  Intervals must be determined if they are not specified in {@link GranularitySpec}.
 * ShardSpecs must be determined if perfect rollup must be guaranteed even though the number of shards is not
 * specified in {@link IndexTuningConfig}.
 * <p/>
 * If neither intervals nor shardSpecs have to be determined, this method simply returns {@link ShardSpecs} for the
 * given intervals.  Here, if {@link HashedPartitionsSpec#numShards} is not specified, {@link NumberedShardSpec} is
 * used.
 * <p/>
 * If either intervals or shardSpecs need to be determined, this method reads the entire input to determine one of
 * them.  If perfect rollup must be guaranteed, {@link HashBasedNumberedShardSpec} is used for hash partitioning
 * of input data.  In the future we may want to also support single-dimension partitioning.
 *
 * @return a partition analysis indicating how many shardSpecs need to be created per interval.
 */
private PartitionAnalysis determineShardSpecs(final TaskToolbox toolbox, final InputSource inputSource, final File tmpDir, @Nonnull final PartitionsSpec partitionsSpec) throws IOException {
    final ObjectMapper jsonMapper = toolbox.getJsonMapper();
    final GranularitySpec granularitySpec = ingestionSchema.getDataSchema().getGranularitySpec();
    // Must determine intervals if unknown, since we acquire all locks before processing any data.
    final boolean determineIntervals = granularitySpec.inputIntervals().isEmpty();
    // Must determine partitions if rollup is guaranteed and the user didn't provide a specific value.
    final boolean determineNumPartitions = partitionsSpec.needsDeterminePartitions(false);
    // if we were given number of shards per interval and the intervals, we don't need to scan the data
    if (!determineNumPartitions && !determineIntervals) {
        log.info("Skipping determine partition scan");
        if (partitionsSpec.getType() == SecondaryPartitionType.HASH) {
            return PartialHashSegmentGenerateTask.createHashPartitionAnalysisFromPartitionsSpec(
                granularitySpec,
                (HashedPartitionsSpec) partitionsSpec,
                // not overriding numShards
                null
            );
        } else if (partitionsSpec.getType() == SecondaryPartitionType.LINEAR) {
            return createLinearPartitionAnalysis(granularitySpec, (DynamicPartitionsSpec) partitionsSpec);
        } else {
            throw new UOE("%s", partitionsSpec.getClass().getName());
        }
    } else {
        // determine intervals containing data and prime HLL collectors
        log.info("Determining intervals and shardSpecs");
        return createShardSpecsFromInput(jsonMapper, ingestionSchema, inputSource, tmpDir, granularitySpec, partitionsSpec, determineIntervals);
    }
}
Also used: DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec), GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec), ArbitraryGranularitySpec(org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec), UOE(org.apache.druid.java.util.common.UOE), ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper)
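
To make the two flags concrete, here is a hedged sketch of a configuration that takes the skip branch above; the interval and shard count are illustrative assumptions:

import java.util.Collections;
import org.apache.druid.indexer.partitions.HashedPartitionsSpec;
import org.apache.druid.indexer.partitions.PartitionsSpec;
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.segment.indexing.granularity.GranularitySpec;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;

// Explicit input intervals mean determineIntervals is false.
GranularitySpec granularitySpec = new UniformGranularitySpec(
    Granularities.DAY,
    Granularities.NONE,
    Collections.singletonList(Intervals.of("2024-01-01/2024-01-02"))
);
// A hashed spec with numShards fixed up front means determineNumPartitions is false.
PartitionsSpec partitionsSpec = new HashedPartitionsSpec(null, 4, null);
// With both flags false, determineShardSpecs() builds the partition analysis
// directly from the spec instead of scanning the input.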

Example 4 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From class SinglePhaseParallelIndexTaskRunner, method findIntervalAndVersion.

private NonnullPair<Interval, String> findIntervalAndVersion(DateTime timestamp) throws IOException {
    final GranularitySpec granularitySpec = getIngestionSchema().getDataSchema().getGranularitySpec();
    // This method is called whenever subtasks need to allocate a new segment via the supervisor task.
    // As a result, this code is never called in the Overlord. For now, using the materialized intervals
    // here is OK for performance reasons.
    final Set<Interval> materializedBucketIntervals = granularitySpec.materializedBucketIntervals();
    // List locks whenever allocating a new segment because locks might be revoked and no longer valid.
    final List<TaskLock> locks = getToolbox().getTaskActionClient().submit(new LockListAction());
    final TaskLock revokedLock = locks.stream().filter(TaskLock::isRevoked).findAny().orElse(null);
    if (revokedLock != null) {
        throw new ISE("Lock revoked: [%s]", revokedLock);
    }
    final Map<Interval, String> versions = locks.stream().collect(Collectors.toMap(TaskLock::getInterval, TaskLock::getVersion));
    Interval interval;
    String version;
    if (!materializedBucketIntervals.isEmpty()) {
        // If granularity spec has explicit intervals, we just need to find the version associated to the interval.
        // This is because we should have gotten all required locks up front when the task starts up.
        final Optional<Interval> maybeInterval = granularitySpec.bucketInterval(timestamp);
        if (!maybeInterval.isPresent()) {
            throw new IAE("Could not find interval for timestamp [%s]", timestamp);
        }
        interval = maybeInterval.get();
        if (!materializedBucketIntervals.contains(interval)) {
            throw new ISE("Unspecified interval[%s] in granularitySpec[%s]", interval, granularitySpec);
        }
        version = ParallelIndexSupervisorTask.findVersion(versions, interval);
        if (version == null) {
            throw new ISE("Cannot find a version for interval[%s]", interval);
        }
    } else {
        // We don't have explicit intervals. We can use the segment granularity to figure out what
        // interval we need, but we might not have already locked it.
        interval = granularitySpec.getSegmentGranularity().bucket(timestamp);
        version = ParallelIndexSupervisorTask.findVersion(versions, interval);
        if (version == null) {
            final int maxAllowedLockCount = getIngestionSchema().getTuningConfig().getMaxAllowedLockCount();
            if (maxAllowedLockCount >= 0 && locks.size() >= maxAllowedLockCount) {
                throw new MaxAllowedLocksExceededException(maxAllowedLockCount);
            }
            // We don't have a lock for this interval, so we should lock it now.
            final TaskLock lock = Preconditions.checkNotNull(
                getToolbox().getTaskActionClient().submit(
                    new TimeChunkLockTryAcquireAction(TaskLockType.EXCLUSIVE, interval)
                ),
                "Cannot acquire a lock for interval[%s]",
                interval
            );
            if (lock.isRevoked()) {
                throw new ISE(StringUtils.format("Lock for interval [%s] was revoked.", interval));
            }
            version = lock.getVersion();
        }
    }
    return new NonnullPair<>(interval, version);
}
Also used: LockListAction(org.apache.druid.indexing.common.actions.LockListAction), NonnullPair(org.apache.druid.java.util.common.NonnullPair), IAE(org.apache.druid.java.util.common.IAE), TaskLock(org.apache.druid.indexing.common.TaskLock), GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec), MaxAllowedLocksExceededException(org.apache.druid.indexing.common.task.batch.MaxAllowedLocksExceededException), TimeChunkLockTryAcquireAction(org.apache.druid.indexing.common.actions.TimeChunkLockTryAcquireAction), ISE(org.apache.druid.java.util.common.ISE), Interval(org.joda.time.Interval)
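
The explicit-intervals branch reduces to a GranularitySpec#bucketInterval lookup. A hedged sketch with illustrative interval and timestamp values (bucketInterval returns Guava's Optional, as in the snippet above):

import java.util.Collections;
import com.google.common.base.Optional;
import org.apache.druid.java.util.common.DateTimes;
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.segment.indexing.granularity.GranularitySpec;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
import org.joda.time.Interval;

GranularitySpec spec = new UniformGranularitySpec(
    Granularities.DAY,
    Granularities.NONE,
    Collections.singletonList(Intervals.of("2024-01-01/2024-01-03"))
);
// Returns the DAY bucket containing the timestamp, or absent when the timestamp
// falls outside the configured intervals (the IAE branch above).
Optional<Interval> bucket = spec.bucketInterval(DateTimes.of("2024-01-02T15:00:00Z"));
// Here bucket.get() covers 2024-01-02/2024-01-03.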

Example 5 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From class PartialHashSegmentGenerateTask, method createSegmentAllocator.

@Override
SegmentAllocatorForBatch createSegmentAllocator(TaskToolbox toolbox, ParallelIndexSupervisorTaskClient taskClient) throws IOException {
    final GranularitySpec granularitySpec = ingestionSchema.getDataSchema().getGranularitySpec();
    final ParallelIndexTuningConfig tuningConfig = ingestionSchema.getTuningConfig();
    final HashedPartitionsSpec partitionsSpec = (HashedPartitionsSpec) tuningConfig.getGivenOrDefaultPartitionsSpec();
    return SegmentAllocators.forNonLinearPartitioning(
        toolbox,
        getDataSource(),
        getSubtaskSpecId(),
        granularitySpec,
        new SupervisorTaskAccess(supervisorTaskId, taskClient),
        createHashPartitionAnalysisFromPartitionsSpec(granularitySpec, partitionsSpec, intervalToNumShardsOverride)
    );
}
Also used: HashedPartitionsSpec(org.apache.druid.indexer.partitions.HashedPartitionsSpec), GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec)

Aggregations

GranularitySpec (org.apache.druid.segment.indexing.granularity.GranularitySpec): 32 usages
DataSchema (org.apache.druid.segment.indexing.DataSchema): 21 usages
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec): 15 usages
TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec): 14 usages
Test (org.junit.Test): 14 usages
UniformGranularitySpec (org.apache.druid.segment.indexing.granularity.UniformGranularitySpec): 13 usages
InputFormat (org.apache.druid.data.input.InputFormat): 11 usages
InputRow (org.apache.druid.data.input.InputRow): 11 usages
InputSource (org.apache.druid.data.input.InputSource): 11 usages
AggregatorFactory (org.apache.druid.query.aggregation.AggregatorFactory): 11 usages
LongSumAggregatorFactory (org.apache.druid.query.aggregation.LongSumAggregatorFactory): 11 usages
TransformSpec (org.apache.druid.segment.transform.TransformSpec): 9 usages
InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest): 9 usages
Interval (org.joda.time.Interval): 9 usages
Map (java.util.Map): 8 usages
SamplerResponse (org.apache.druid.client.indexing.SamplerResponse): 8 usages
SamplerResponseRow (org.apache.druid.client.indexing.SamplerResponse.SamplerResponseRow): 8 usages
RecordSupplierInputSource (org.apache.druid.indexing.seekablestream.RecordSupplierInputSource): 8 usages
CsvInputFormat (org.apache.druid.data.input.impl.CsvInputFormat): 7 usages
InlineInputSource (org.apache.druid.data.input.impl.InlineInputSource): 7 usages