Example 1 with DynamicPartitionsSpec

Use of org.apache.druid.indexer.partitions.DynamicPartitionsSpec in project druid by druid-io.

From the class NewestSegmentFirstIteratorTest, method testFindPartitionsSpecFromConfigWithNullTuningConfigReturnDynamicPartitinosSpecWithMaxTotalRowsOfLongMax.

@Test
public void testFindPartitionsSpecFromConfigWithNullTuningConfigReturnDynamicPartitinosSpecWithMaxTotalRowsOfLongMax() {
    final DataSourceCompactionConfig config =
        new DataSourceCompactionConfig("datasource", null, null, null, null, null, null, null, null, null, null, null);
    Assert.assertEquals(
        new DynamicPartitionsSpec(null, Long.MAX_VALUE),
        NewestSegmentFirstIterator.findPartitionsSpecFromConfig(
            ClientCompactionTaskQueryTuningConfig.from(config.getTuningConfig(), config.getMaxRowsPerSegment())
        )
    );
}
Also used : DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) DataSourceCompactionConfig(org.apache.druid.server.coordinator.DataSourceCompactionConfig) Test(org.junit.Test)
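
The pattern this test pins down recurs throughout Druid's compaction scheduling: a missing maxTotalRows is normalized to Long.MAX_VALUE (effectively "no limit") rather than being carried around as null. A minimal sketch of that normalization, assuming only the DynamicPartitionsSpec constructor and getters used above (the helper name is hypothetical):

// Hypothetical helper mirroring the normalization the assertion expects:
// a null maxTotalRows is replaced with Long.MAX_VALUE, i.e. unlimited.
static DynamicPartitionsSpec normalizeMaxTotalRows(DynamicPartitionsSpec spec) {
    Long maxTotalRows = spec.getMaxTotalRows() == null ? Long.MAX_VALUE : spec.getMaxTotalRows();
    return new DynamicPartitionsSpec(spec.getMaxRowsPerSegment(), maxTotalRows);
}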

Example 2 with DynamicPartitionsSpec

Use of org.apache.druid.indexer.partitions.DynamicPartitionsSpec in project druid by druid-io.

From the class NewestSegmentFirstIteratorTest, method testFindPartitionsSpecFromConfigWithDeprecatedMaxTotalRowsAndPartitionsSpecIgnoreDeprecatedOne.

@Test
public void testFindPartitionsSpecFromConfigWithDeprecatedMaxTotalRowsAndPartitionsSpecIgnoreDeprecatedOne() {
    final DataSourceCompactionConfig config = new DataSourceCompactionConfig(
        "datasource",
        null, null, null, null,
        new UserCompactionTaskQueryTuningConfig(
            null, null, 1000L, null,
            new DynamicPartitionsSpec(null, null),
            null, null, null, null, null, null, null, null, null, null, null, null
        ),
        null, null, null, null, null, null
    );
    Assert.assertEquals(
        new DynamicPartitionsSpec(null, Long.MAX_VALUE),
        NewestSegmentFirstIterator.findPartitionsSpecFromConfig(
            ClientCompactionTaskQueryTuningConfig.from(config.getTuningConfig(), config.getMaxRowsPerSegment())
        )
    );
}
Also used : DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) DataSourceCompactionConfig(org.apache.druid.server.coordinator.DataSourceCompactionConfig) UserCompactionTaskQueryTuningConfig(org.apache.druid.server.coordinator.UserCompactionTaskQueryTuningConfig) Test(org.junit.Test)
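
What this test adds over Example 1: the 1000L passed into the tuning config, which the test name identifies as the deprecated maxTotalRows, is present alongside an explicit DynamicPartitionsSpec, and the explicit spec wins. A hedged sketch of that precedence rule (the helper is illustrative, not Druid's actual API):

// Illustrative precedence: an explicit partitionsSpec overrides the
// deprecated top-level maxTotalRows, which is ignored entirely.
static Long effectiveMaxTotalRows(Long deprecatedMaxTotalRows, DynamicPartitionsSpec spec) {
    if (spec != null) {
        // The deprecated value is dropped; an unset limit means "unlimited".
        return spec.getMaxTotalRows() == null ? Long.MAX_VALUE : spec.getMaxTotalRows();
    }
    return deprecatedMaxTotalRows;
}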

Example 3 with DynamicPartitionsSpec

Use of org.apache.druid.indexer.partitions.DynamicPartitionsSpec in project druid by druid-io.

From the class NewestSegmentFirstIteratorTest, method testFindPartitionsSpecFromConfigWithNullMaxTotalRowsReturnLongMaxValue.

@Test
public void testFindPartitionsSpecFromConfigWithNullMaxTotalRowsReturnLongMaxValue() {
    final DataSourceCompactionConfig config = new DataSourceCompactionConfig(
        "datasource",
        null, null, null, null,
        new UserCompactionTaskQueryTuningConfig(
            null, null, null, null,
            new DynamicPartitionsSpec(null, null),
            null, null, null, null, null, null, null, null, null, null, null, null
        ),
        null, null, null, null, null, null
    );
    Assert.assertEquals(
        new DynamicPartitionsSpec(null, Long.MAX_VALUE),
        NewestSegmentFirstIterator.findPartitionsSpecFromConfig(
            ClientCompactionTaskQueryTuningConfig.from(config.getTuningConfig(), config.getMaxRowsPerSegment())
        )
    );
}
Also used : DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) DataSourceCompactionConfig(org.apache.druid.server.coordinator.DataSourceCompactionConfig) UserCompactionTaskQueryTuningConfig(org.apache.druid.server.coordinator.UserCompactionTaskQueryTuningConfig) Test(org.junit.Test)
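
Examples 1 through 3 all reduce to the same normalization sketched after Example 1: whether or not a tuning config or an explicit partitionsSpec is supplied, a null maxTotalRows surfaces as Long.MAX_VALUE in the resulting DynamicPartitionsSpec.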

Example 4 with DynamicPartitionsSpec

Use of org.apache.druid.indexer.partitions.DynamicPartitionsSpec in project druid by druid-io.

From the class IndexTask, method determineShardSpecs.

/**
 * Determines intervals and shardSpecs for the input data.  This method first checks whether it must determine
 * intervals and shardSpecs itself.  Intervals must be determined if they are not specified in
 * {@link GranularitySpec}.  ShardSpecs must be determined if perfect rollup must be guaranteed but the number of
 * shards is not specified in {@link IndexTuningConfig}.
 * <p/>
 * If neither intervals nor shardSpecs have to be determined, this method simply returns {@link ShardSpecs} for the
 * given intervals.  Here, if {@link HashedPartitionsSpec#numShards} is not specified, {@link NumberedShardSpec} is
 * used.
 * <p/>
 * If either intervals or shardSpecs need to be determined, this method reads the entire input to determine them.
 * If perfect rollup must be guaranteed, {@link HashBasedNumberedShardSpec} is used for hash partitioning of the
 * input data.  In the future we may want to also support single-dimension partitioning.
 *
 * @return a {@link PartitionAnalysis} indicating how shardSpecs should be created per interval
 */
private PartitionAnalysis determineShardSpecs(
    final TaskToolbox toolbox,
    final InputSource inputSource,
    final File tmpDir,
    @Nonnull final PartitionsSpec partitionsSpec
) throws IOException {
    final ObjectMapper jsonMapper = toolbox.getJsonMapper();
    final GranularitySpec granularitySpec = ingestionSchema.getDataSchema().getGranularitySpec();
    // Must determine intervals if unknown, since we acquire all locks before processing any data.
    final boolean determineIntervals = granularitySpec.inputIntervals().isEmpty();
    // Must determine partitions if rollup is guaranteed and the user didn't provide a specific value.
    final boolean determineNumPartitions = partitionsSpec.needsDeterminePartitions(false);
    // if we were given number of shards per interval and the intervals, we don't need to scan the data
    if (!determineNumPartitions && !determineIntervals) {
        log.info("Skipping determine partition scan");
        if (partitionsSpec.getType() == SecondaryPartitionType.HASH) {
            return PartialHashSegmentGenerateTask.createHashPartitionAnalysisFromPartitionsSpec(
                granularitySpec,
                (HashedPartitionsSpec) partitionsSpec,
                // not overriding numShards
                null
            );
        } else if (partitionsSpec.getType() == SecondaryPartitionType.LINEAR) {
            return createLinearPartitionAnalysis(granularitySpec, (DynamicPartitionsSpec) partitionsSpec);
        } else {
            throw new UOE("%s", partitionsSpec.getClass().getName());
        }
    } else {
        // determine intervals containing data and prime HLL collectors
        log.info("Determining intervals and shardSpecs");
        return createShardSpecsFromInput(
            jsonMapper,
            ingestionSchema,
            inputSource,
            tmpDir,
            granularitySpec,
            partitionsSpec,
            determineIntervals
        );
    }
}
Also used : DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) ArbitraryGranularitySpec(org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec) UOE(org.apache.druid.java.util.common.UOE) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper)
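
The cheap branch above is reachable for dynamic partitioning because a DynamicPartitionsSpec reports that no determine-partitions scan is needed, while hash partitioning needs a scan whenever numShards is left unspecified. A small hedged check of that behavior (expected result inferred from the method's own comments, not verified against every Druid version):

// A hedged sketch: dynamic (linear) partitioning should not require a
// determine-partitions pass, so determineShardSpecs can skip reading input.
PartitionsSpec spec = new DynamicPartitionsSpec(null, null);
boolean needsScan = spec.needsDeterminePartitions(false); // expected: false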

Example 5 with DynamicPartitionsSpec

Use of org.apache.druid.indexer.partitions.DynamicPartitionsSpec in project druid by druid-io.

From the class CompactionTaskParallelRunTest, method testRunParallelWithDynamicPartitioningMatchCompactionState.

@Test
public void testRunParallelWithDynamicPartitioningMatchCompactionState() throws Exception {
    runIndexTask(null, true);
    final Builder builder = new Builder(DATA_SOURCE, getSegmentCacheManagerFactory(), RETRY_POLICY_FACTORY);
    final CompactionTask compactionTask = builder
        .inputSpec(new CompactionIntervalSpec(INTERVAL_TO_INDEX, null))
        .tuningConfig(AbstractParallelIndexSupervisorTaskTest.DEFAULT_TUNING_CONFIG_FOR_PARALLEL_INDEXING)
        .build();
    final Set<DataSegment> compactedSegments = runTask(compactionTask);
    for (DataSegment segment : compactedSegments) {
        Assert.assertSame(
            lockGranularity == LockGranularity.TIME_CHUNK ? NumberedShardSpec.class : NumberedOverwriteShardSpec.class,
            segment.getShardSpec().getClass()
        );
        // Expect a compaction state to be present, since compaction tasks store their compaction state by default
        Map<String, String> expectedLongSumMetric = new HashMap<>();
        expectedLongSumMetric.put("type", "longSum");
        expectedLongSumMetric.put("name", "val");
        expectedLongSumMetric.put("fieldName", "val");
        expectedLongSumMetric.put("expression", null);
        CompactionState expectedState = new CompactionState(
            new DynamicPartitionsSpec(null, Long.MAX_VALUE),
            new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("ts", "dim"))),
            ImmutableList.of(expectedLongSumMetric),
            null,
            compactionTask.getTuningConfig().getIndexSpec().asMap(getObjectMapper()),
            getObjectMapper().readValue(
                getObjectMapper().writeValueAsString(
                    new UniformGranularitySpec(
                        Granularities.HOUR,
                        Granularities.MINUTE,
                        true,
                        ImmutableList.of(segment.getInterval())
                    )
                ),
                Map.class
            )
        );
        Assert.assertEquals(expectedState, segment.getLastCompactionState());
    }
}
Also used : HashMap(java.util.HashMap) Builder(org.apache.druid.indexing.common.task.CompactionTask.Builder) DataSegment(org.apache.druid.timeline.DataSegment) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) NumberedOverwriteShardSpec(org.apache.druid.timeline.partition.NumberedOverwriteShardSpec) CompactionState(org.apache.druid.timeline.CompactionState) Map(java.util.Map) ImmutableMap(com.google.common.collect.ImmutableMap) HashMap(java.util.HashMap) NumberedShardSpec(org.apache.druid.timeline.partition.NumberedShardSpec) HashBasedNumberedShardSpec(org.apache.druid.timeline.partition.HashBasedNumberedShardSpec) AbstractParallelIndexSupervisorTaskTest(org.apache.druid.indexing.common.task.batch.parallel.AbstractParallelIndexSupervisorTaskTest) Test(org.junit.Test)
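
The CompactionState recorded on each segment is what lets auto-compaction avoid redundant work: a segment whose stored state already equals the target configuration can be skipped. A minimal illustration of that comparison, assuming only getLastCompactionState and CompactionState equality as exercised in the test above (the helper name is hypothetical):

// Hypothetical helper: recompact only when the recorded compaction state
// differs from the desired configuration.
static boolean needsRecompaction(DataSegment segment, CompactionState desired) {
    return !desired.equals(segment.getLastCompactionState());
}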

Aggregations

DynamicPartitionsSpec (org.apache.druid.indexer.partitions.DynamicPartitionsSpec): 52 uses
Test (org.junit.Test): 34 uses
IndexSpec (org.apache.druid.segment.IndexSpec): 19 uses
List (java.util.List): 15 uses
Map (java.util.Map): 15 uses
ImmutableList (com.google.common.collect.ImmutableList): 13 uses
StringUtils (org.apache.druid.java.util.common.StringUtils): 13 uses
DataSegment (org.apache.druid.timeline.DataSegment): 13 uses
ImmutableMap (com.google.common.collect.ImmutableMap): 12 uses
HashMap (java.util.HashMap): 11 uses
Function (java.util.function.Function): 11 uses
Pair (org.apache.druid.java.util.common.Pair): 11 uses
Closeable (java.io.Closeable): 10 uses
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec): 10 uses
RoaringBitmapSerdeFactory (org.apache.druid.segment.data.RoaringBitmapSerdeFactory): 10 uses
Duration (org.joda.time.Duration): 10 uses
Interval (org.joda.time.Interval): 10 uses
ArrayList (java.util.ArrayList): 9 uses
UUID (java.util.UUID): 9 uses
UniformGranularitySpec (org.apache.druid.segment.indexing.granularity.UniformGranularitySpec): 9 uses