Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.
From the class DataSchemaTest, the method testWithDimensionSpec:
@Test
public void testWithDimensionSpec() {
  TimestampSpec tsSpec = Mockito.mock(TimestampSpec.class);
  GranularitySpec gSpec = Mockito.mock(GranularitySpec.class);
  DimensionsSpec oldDimSpec = Mockito.mock(DimensionsSpec.class);
  DimensionsSpec newDimSpec = Mockito.mock(DimensionsSpec.class);
  AggregatorFactory aggFactory = Mockito.mock(AggregatorFactory.class);
  Mockito.when(aggFactory.getName()).thenReturn("myAgg");
  TransformSpec transSpec = Mockito.mock(TransformSpec.class);
  Map<String, Object> parserMap = Mockito.mock(Map.class);
  Mockito.when(newDimSpec.withDimensionExclusions(ArgumentMatchers.any(Set.class))).thenReturn(newDimSpec);

  DataSchema oldSchema = new DataSchema(
      "dataSource",
      tsSpec,
      oldDimSpec,
      new AggregatorFactory[]{aggFactory},
      gSpec,
      transSpec,
      parserMap,
      jsonMapper
  );
  DataSchema newSchema = oldSchema.withDimensionsSpec(newDimSpec);

  Assert.assertSame(oldSchema.getDataSource(), newSchema.getDataSource());
  Assert.assertSame(oldSchema.getTimestampSpec(), newSchema.getTimestampSpec());
  Assert.assertSame(newDimSpec, newSchema.getDimensionsSpec());
  Assert.assertSame(oldSchema.getAggregators(), newSchema.getAggregators());
  Assert.assertSame(oldSchema.getGranularitySpec(), newSchema.getGranularitySpec());
  Assert.assertSame(oldSchema.getTransformSpec(), newSchema.getTransformSpec());
  Assert.assertSame(oldSchema.getParserMap(), newSchema.getParserMap());
}
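The test above exercises withDimensionsSpec with a mocked GranularitySpec. Outside of tests the field is normally populated with a concrete implementation; below is a minimal, hypothetical sketch using the same four-argument UniformGranularitySpec constructor that appears in the CompactionTask example further down this page.

// A sketch only; not part of DataSchemaTest. All values are hypothetical.
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.segment.indexing.granularity.GranularitySpec;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
import java.util.Collections;

public class GranularitySpecSketch
{
  public static GranularitySpec daySegmentsMinuteRollup()
  {
    // DAY segment granularity, MINUTE query granularity, rollup enabled,
    // and a single explicit input interval.
    return new UniformGranularitySpec(
        Granularities.DAY,
        Granularities.MINUTE,
        true,
        Collections.singletonList(Intervals.of("2020-01-01/2020-02-01"))
    );
  }
}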
Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.
From the class CompactionTask, the method createDataSchema:
private static DataSchema createDataSchema(
    String dataSource, List<NonnullPair<QueryableIndex, DataSegment>> queryableIndexAndSegments,
    @Nullable DimensionsSpec dimensionsSpec, @Nullable ClientCompactionTaskTransformSpec transformSpec,
    @Nullable AggregatorFactory[] metricsSpec, @Nonnull ClientCompactionTaskGranularitySpec granularitySpec
) {
  // Check the index metadata and decide which values to propagate (i.e. carry over) for rollup & queryGranularity.
  final SettableSupplier<Boolean> rollup = new SettableSupplier<>();
  final SettableSupplier<Granularity> queryGranularity = new SettableSupplier<>();
  decideRollupAndQueryGranularityCarryOver(rollup, queryGranularity, queryableIndexAndSegments);
  final Interval totalInterval = JodaUtils.umbrellaInterval(
      queryableIndexAndSegments.stream().map(p -> p.rhs.getInterval()).collect(Collectors.toList())
  );
  final Granularity queryGranularityToUse;
  if (granularitySpec.getQueryGranularity() == null) {
    queryGranularityToUse = queryGranularity.get();
    log.info("Generating compaction task spec with the segments' original query granularity [%s]", queryGranularityToUse);
  } else {
    queryGranularityToUse = granularitySpec.getQueryGranularity();
    log.info("Generating compaction task spec with the query granularity [%s] overridden from the input", queryGranularityToUse);
  }
  final GranularitySpec uniformGranularitySpec = new UniformGranularitySpec(
      Preconditions.checkNotNull(granularitySpec.getSegmentGranularity()),
      queryGranularityToUse,
      granularitySpec.isRollup() == null ? rollup.get() : granularitySpec.isRollup(),
      Collections.singletonList(totalInterval)
  );
  // Find unique dimensions if they were not provided explicitly.
  final DimensionsSpec finalDimensionsSpec =
      dimensionsSpec == null ? createDimensionsSpec(queryableIndexAndSegments) : dimensionsSpec;
  final AggregatorFactory[] finalMetricsSpec =
      metricsSpec == null ? createMetricsSpec(queryableIndexAndSegments) : metricsSpec;
  return new DataSchema(
      dataSource,
      new TimestampSpec(ColumnHolder.TIME_COLUMN_NAME, "millis", null),
      finalDimensionsSpec,
      finalMetricsSpec,
      uniformGranularitySpec,
      transformSpec == null ? null : new TransformSpec(transformSpec.getFilter(), null)
  );
}
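The totalInterval computed above is simply the umbrella interval of every segment being compacted. Below is a standalone sketch of that step, using JodaUtils.umbrellaInterval with made-up segment intervals.

// A standalone sketch of the umbrella-interval step; the segment intervals are hypothetical.
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.JodaUtils;
import org.joda.time.Interval;
import java.util.Arrays;
import java.util.List;

public class UmbrellaIntervalSketch
{
  public static void main(String[] args)
  {
    // Intervals of three segments being compacted; note the gap on 2020-01-03.
    List<Interval> segmentIntervals = Arrays.asList(
        Intervals.of("2020-01-01/2020-01-02"),
        Intervals.of("2020-01-02/2020-01-03"),
        Intervals.of("2020-01-04/2020-01-05")
    );
    // The smallest single interval covering all of them, gaps included: 2020-01-01/2020-01-05.
    Interval totalInterval = JodaUtils.umbrellaInterval(segmentIntervals);
    System.out.println(totalInterval);
  }
}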
Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.
From the class IndexTask, the method determineShardSpecs:
/**
 * Determines intervals and shardSpecs for the input data. This method first checks whether it must determine
 * intervals and shardSpecs by itself. Intervals must be determined if they are not specified in
 * {@link GranularitySpec}. ShardSpecs must be determined if perfect rollup must be guaranteed even though the
 * number of shards is not specified in {@link IndexTuningConfig}.
 * <p/>
 * If neither intervals nor shardSpecs have to be determined, this method simply returns {@link ShardSpecs} for the
 * given intervals. Here, if {@link HashedPartitionsSpec#numShards} is not specified, {@link NumberedShardSpec} is
 * used.
 * <p/>
 * If either intervals or shardSpecs need to be determined, this method reads the entire input to determine one of
 * them. If perfect rollup must be guaranteed, {@link HashBasedNumberedShardSpec} is used for hash partitioning
 * of the input data. In the future we may want to also support single-dimension partitioning.
 *
 * @return a {@link PartitionAnalysis} indicating how many shardSpecs need to be created per interval
 */
private PartitionAnalysis determineShardSpecs(
    final TaskToolbox toolbox,
    final InputSource inputSource,
    final File tmpDir,
    @Nonnull final PartitionsSpec partitionsSpec
) throws IOException
{
  final ObjectMapper jsonMapper = toolbox.getJsonMapper();
  final GranularitySpec granularitySpec = ingestionSchema.getDataSchema().getGranularitySpec();
  // Must determine intervals if unknown, since we acquire all locks before processing any data.
  final boolean determineIntervals = granularitySpec.inputIntervals().isEmpty();
  // Must determine partitions if rollup is guaranteed and the user didn't provide a specific value.
  final boolean determineNumPartitions = partitionsSpec.needsDeterminePartitions(false);
  // if we were given number of shards per interval and the intervals, we don't need to scan the data
  if (!determineNumPartitions && !determineIntervals) {
    log.info("Skipping determine partition scan");
    if (partitionsSpec.getType() == SecondaryPartitionType.HASH) {
      return PartialHashSegmentGenerateTask.createHashPartitionAnalysisFromPartitionsSpec(
          granularitySpec,
          (HashedPartitionsSpec) partitionsSpec,
          null // not overriding numShards
      );
    } else if (partitionsSpec.getType() == SecondaryPartitionType.LINEAR) {
      return createLinearPartitionAnalysis(granularitySpec, (DynamicPartitionsSpec) partitionsSpec);
    } else {
      throw new UOE("%s", partitionsSpec.getClass().getName());
    }
  } else {
    // determine intervals containing data and prime HLL collectors
    log.info("Determining intervals and shardSpecs");
    return createShardSpecsFromInput(
        jsonMapper, ingestionSchema, inputSource, tmpDir, granularitySpec, partitionsSpec, determineIntervals
    );
  }
}
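Whether the determine-partition scan can be skipped hinges on granularitySpec.inputIntervals().isEmpty(). The small sketch below, using the UniformGranularitySpec constructor from the CompactionTask example and hypothetical intervals, shows how explicit versus missing input intervals flip that flag.

// A sketch of the determineIntervals check; constructor and granularities as in the earlier
// examples, intervals hypothetical.
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.segment.indexing.granularity.GranularitySpec;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
import java.util.Collections;

public class DetermineIntervalsSketch
{
  public static void main(String[] args)
  {
    GranularitySpec withIntervals = new UniformGranularitySpec(
        Granularities.DAY,
        Granularities.NONE,
        true,
        Collections.singletonList(Intervals.of("2020-01-01/2020-01-08"))
    );
    GranularitySpec withoutIntervals = new UniformGranularitySpec(
        Granularities.DAY,
        Granularities.NONE,
        true,
        Collections.emptyList()
    );
    // false: intervals were given, so the task can skip scanning the input for them.
    System.out.println(withIntervals.inputIntervals().isEmpty());
    // true: the task must read the input to determine intervals before acquiring locks.
    System.out.println(withoutIntervals.inputIntervals().isEmpty());
  }
}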
Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.
From the class SinglePhaseParallelIndexTaskRunner, the method findIntervalAndVersion:
private NonnullPair<Interval, String> findIntervalAndVersion(DateTime timestamp) throws IOException
{
  final GranularitySpec granularitySpec = getIngestionSchema().getDataSchema().getGranularitySpec();
  // This method is called whenever subtasks need to allocate a new segment via the supervisor task.
  // As a result, this code is never called in the Overlord. For now, using the materialized intervals
  // here is OK for performance reasons.
  final Set<Interval> materializedBucketIntervals = granularitySpec.materializedBucketIntervals();
  // List locks whenever allocating a new segment because locks might be revoked and no longer valid.
  final List<TaskLock> locks = getToolbox().getTaskActionClient().submit(new LockListAction());
  final TaskLock revokedLock = locks.stream().filter(TaskLock::isRevoked).findAny().orElse(null);
  if (revokedLock != null) {
    throw new ISE("Lock revoked: [%s]", revokedLock);
  }
  final Map<Interval, String> versions = locks
      .stream()
      .collect(Collectors.toMap(TaskLock::getInterval, TaskLock::getVersion));
  Interval interval;
  String version;
  if (!materializedBucketIntervals.isEmpty()) {
    // If the granularity spec has explicit intervals, we just need to find the version associated with the
    // interval, because we should have gotten all required locks up front when the task started up.
    final Optional<Interval> maybeInterval = granularitySpec.bucketInterval(timestamp);
    if (!maybeInterval.isPresent()) {
      throw new IAE("Could not find interval for timestamp [%s]", timestamp);
    }
    interval = maybeInterval.get();
    if (!materializedBucketIntervals.contains(interval)) {
      throw new ISE("Unspecified interval[%s] in granularitySpec[%s]", interval, granularitySpec);
    }
    version = ParallelIndexSupervisorTask.findVersion(versions, interval);
    if (version == null) {
      throw new ISE("Cannot find a version for interval[%s]", interval);
    }
  } else {
    // We don't have explicit intervals. We can use the segment granularity to figure out what
    // interval we need, but we might not have already locked it.
    interval = granularitySpec.getSegmentGranularity().bucket(timestamp);
    version = ParallelIndexSupervisorTask.findVersion(versions, interval);
    if (version == null) {
      final int maxAllowedLockCount = getIngestionSchema().getTuningConfig().getMaxAllowedLockCount();
      if (maxAllowedLockCount >= 0 && locks.size() >= maxAllowedLockCount) {
        throw new MaxAllowedLocksExceededException(maxAllowedLockCount);
      }
      // We don't have a lock for this interval, so we should lock it now.
      final TaskLock lock = Preconditions.checkNotNull(
          getToolbox().getTaskActionClient().submit(new TimeChunkLockTryAcquireAction(TaskLockType.EXCLUSIVE, interval)),
          "Cannot acquire a lock for interval[%s]",
          interval
      );
      if (lock.isRevoked()) {
        throw new ISE(StringUtils.format("Lock for interval [%s] was revoked.", interval));
      }
      version = lock.getVersion();
    }
  }
  return new NonnullPair<>(interval, version);
}
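The two branches above obtain the interval for a timestamp in different ways: bucketInterval() looks it up among the explicit buckets, while getSegmentGranularity().bucket() derives it directly from the segment granularity. Below is a minimal sketch of both lookups with a hypothetical spec and timestamp.

// A sketch of the two interval lookups used above; the spec and timestamp are hypothetical.
import org.apache.druid.java.util.common.DateTimes;
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.segment.indexing.granularity.GranularitySpec;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
import org.joda.time.DateTime;
import org.joda.time.Interval;
import java.util.Collections;

public class IntervalLookupSketch
{
  public static void main(String[] args)
  {
    GranularitySpec granularitySpec = new UniformGranularitySpec(
        Granularities.DAY,
        Granularities.NONE,
        true,
        Collections.singletonList(Intervals.of("2020-01-01/2020-01-08"))
    );
    DateTime timestamp = DateTimes.of("2020-01-03T15:30:00Z");
    // Path 1: explicit intervals exist, so look the timestamp up among the materialized buckets.
    if (granularitySpec.bucketInterval(timestamp).isPresent()) {
      Interval bucket = granularitySpec.bucketInterval(timestamp).get();
      System.out.println("bucketed interval: " + bucket); // 2020-01-03/2020-01-04 with DAY segments
    }
    // Path 2: no explicit intervals, so derive the interval directly from the segment granularity.
    Interval derived = granularitySpec.getSegmentGranularity().bucket(timestamp);
    System.out.println("derived interval: " + derived);   // also 2020-01-03/2020-01-04
  }
}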
Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.
From the class PartialHashSegmentGenerateTask, the method createSegmentAllocator:
@Override
SegmentAllocatorForBatch createSegmentAllocator(TaskToolbox toolbox, ParallelIndexSupervisorTaskClient taskClient)
    throws IOException
{
  final GranularitySpec granularitySpec = ingestionSchema.getDataSchema().getGranularitySpec();
  final ParallelIndexTuningConfig tuningConfig = ingestionSchema.getTuningConfig();
  final HashedPartitionsSpec partitionsSpec = (HashedPartitionsSpec) tuningConfig.getGivenOrDefaultPartitionsSpec();
  return SegmentAllocators.forNonLinearPartitioning(
      toolbox, getDataSource(), getSubtaskSpecId(), granularitySpec,
      new SupervisorTaskAccess(supervisorTaskId, taskClient),
      createHashPartitionAnalysisFromPartitionsSpec(granularitySpec, partitionsSpec, intervalToNumShardsOverride)
  );
}
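The partition analysis handed to the allocator is, conceptually, a mapping from each bucket interval of the granularity spec to a shard count. The following is a plain-Java stand-in for that idea, not Druid's HashPartitionAnalysis class, with hypothetical intervals and shard counts.

// A conceptual stand-in, not Druid's HashPartitionAnalysis; all values are hypothetical.
import org.apache.druid.java.util.common.Intervals;
import org.joda.time.Interval;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashPartitionSketch
{
  // Every bucket interval gets the configured number of shards, mirroring, at a high level,
  // how each interval ends up with a shard count in the hash partition analysis.
  static Map<Interval, Integer> intervalToNumShards(List<Interval> bucketIntervals, int numShards)
  {
    Map<Interval, Integer> analysis = new HashMap<>();
    for (Interval bucket : bucketIntervals) {
      analysis.put(bucket, numShards);
    }
    return analysis;
  }

  public static void main(String[] args)
  {
    System.out.println(intervalToNumShards(
        Arrays.asList(Intervals.of("2020-01-01/2020-01-02"), Intervals.of("2020-01-02/2020-01-03")),
        4
    ));
  }
}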