Example 11 with HashedPartitionsSpec

Use of org.apache.druid.indexer.partitions.HashedPartitionsSpec in project druid by druid-io.

From class PartialHashSegmentGenerateTaskTest, method testCreateHashPartitionAnalysisFromPartitionsSpecWithNumShardsMap.

@Test
public void testCreateHashPartitionAnalysisFromPartitionsSpecWithNumShardsMap() {
    final List<Interval> intervals = ImmutableList.of(
        Intervals.of("2020-01-01/2020-01-02"),
        Intervals.of("2020-01-02/2020-01-03"),
        Intervals.of("2020-01-03/2020-01-04")
    );
    final Map<Interval, Integer> intervalToNumShards = ImmutableMap.of(
        Intervals.of("2020-01-01/2020-01-02"), 1,
        Intervals.of("2020-01-02/2020-01-03"), 2,
        Intervals.of("2020-01-03/2020-01-04"), 3
    );
    final HashPartitionAnalysis partitionAnalysis =
        PartialHashSegmentGenerateTask.createHashPartitionAnalysisFromPartitionsSpec(
            new UniformGranularitySpec(Granularities.DAY, Granularities.NONE, intervals),
            new HashedPartitionsSpec(null, null, null),
            intervalToNumShards
        );
    Assert.assertEquals(intervals.size(), partitionAnalysis.getNumTimePartitions());
    for (Interval interval : intervals) {
        Assert.assertEquals(intervalToNumShards.get(interval).intValue(), partitionAnalysis.getBucketAnalysis(interval).intValue());
    }
}
Also used : HashPartitionAnalysis(org.apache.druid.indexing.common.task.batch.partition.HashPartitionAnalysis) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) HashedPartitionsSpec(org.apache.druid.indexer.partitions.HashedPartitionsSpec) Interval(org.joda.time.Interval) Test(org.junit.Test)
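In this test the spec is built with all three constructor arguments null; the shard counts come entirely from the intervalToNumShards map. The three arguments are maxRowsPerSegment, numShards, and partitionDimensions, and the examples on this page always leave at least one of the first two null. A minimal standalone sketch (the class name HashedPartitionsSpecSketch is ours, not from the Druid repo):

import com.google.common.collect.ImmutableList;
import org.apache.druid.indexer.partitions.HashedPartitionsSpec;

public class HashedPartitionsSpecSketch {
    public static void main(String[] args) {
        // Fix the shard count up front and leave maxRowsPerSegment null.
        final HashedPartitionsSpec byShards =
                new HashedPartitionsSpec(null, 3, ImmutableList.of("dim1", "dim2"));
        System.out.println(byShards.getNumShards());           // 3
        System.out.println(byShards.getPartitionDimensions()); // [dim1, dim2]
    }
}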

Example 12 with HashedPartitionsSpec

Use of org.apache.druid.indexer.partitions.HashedPartitionsSpec in project druid by druid-io.

From class HashPartitionAdjustingCorePartitionSizeTest, method testEqualNumberOfPartitionsToBuckets.

@Test
public void testEqualNumberOfPartitionsToBuckets() throws IOException {
    final File inputDir = temporaryFolder.newFolder();
    for (int i = 0; i < 10; i++) {
        try (final Writer writer = Files.newBufferedWriter(new File(inputDir, "test_" + i).toPath(), StandardCharsets.UTF_8)) {
            writer.write(StringUtils.format("2020-01-01T00:00:00,%s,b1,%d\n", "aa" + (i + 10), 10 * (i + 1)));
        }
    }
    final DimensionBasedPartitionsSpec partitionsSpec = new HashedPartitionsSpec(null, 5, ImmutableList.of("dim1"));
    final Set<DataSegment> segments = runTestTask(
        TIMESTAMP_SPEC,
        DIMENSIONS_SPEC,
        INPUT_FORMAT,
        null,
        INTERVAL_TO_INDEX,
        inputDir,
        "test_*",
        partitionsSpec,
        maxNumConcurrentSubTasks,
        TaskState.SUCCESS
    );
    Assert.assertEquals(5, segments.size());
    segments.forEach(segment -> {
        Assert.assertSame(HashBasedNumberedShardSpec.class, segment.getShardSpec().getClass());
        final HashBasedNumberedShardSpec shardSpec = (HashBasedNumberedShardSpec) segment.getShardSpec();
        Assert.assertEquals(5, shardSpec.getNumCorePartitions());
        Assert.assertEquals(5, shardSpec.getNumBuckets());
        Assert.assertEquals(ImmutableList.of("dim1"), shardSpec.getPartitionDimensions());
    });
}
Also used : HashBasedNumberedShardSpec(org.apache.druid.timeline.partition.HashBasedNumberedShardSpec) HashedPartitionsSpec(org.apache.druid.indexer.partitions.HashedPartitionsSpec) DimensionBasedPartitionsSpec(org.apache.druid.indexer.partitions.DimensionBasedPartitionsSpec) File(java.io.File) DataSegment(org.apache.druid.timeline.DataSegment) Writer(java.io.Writer) Test(org.junit.Test)
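The ten input rows carry ten distinct dim1 values, so with numShards = 5 every hash bucket should receive data, and the core partition count should match the bucket count. A hedged sketch of that per-segment check pulled out as a helper (the method name is ours, not from the repo; imports as in the test above):

// Assumed helper, not part of the Druid test: checks that no hash bucket
// was left empty, i.e. nothing was pruned from the core partition set.
private static void assertFullyPopulatedBuckets(Set<DataSegment> segments, int expectedBuckets) {
    for (DataSegment segment : segments) {
        final HashBasedNumberedShardSpec shardSpec =
                (HashBasedNumberedShardSpec) segment.getShardSpec();
        Assert.assertEquals(expectedBuckets, shardSpec.getNumBuckets());
        Assert.assertEquals(expectedBuckets, shardSpec.getNumCorePartitions());
    }
}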

Example 13 with HashedPartitionsSpec

Use of org.apache.druid.indexer.partitions.HashedPartitionsSpec in project druid by druid-io.

From class HashPartitionMultiPhaseParallelIndexingTest, method testRun.

@Test
public void testRun() throws Exception {
    final Integer maxRowsPerSegment = numShards == null ? 10 : null;
    final Set<DataSegment> publishedSegments = runTestTask(
        new HashedPartitionsSpec(maxRowsPerSegment, numShards, ImmutableList.of("dim1", "dim2")),
        TaskState.SUCCESS,
        false
    );
    final Map<Interval, Integer> expectedIntervalToNumSegments = computeExpectedIntervalToNumSegments(maxRowsPerSegment, numShards);
    assertHashedPartition(publishedSegments, expectedIntervalToNumSegments);
}
Also used : HashedPartitionsSpec(org.apache.druid.indexer.partitions.HashedPartitionsSpec) DataSegment(org.apache.druid.timeline.DataSegment) Interval(org.joda.time.Interval) Test(org.junit.Test)
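The test class is parameterized over numShards, and the first line of the test picks exactly one sizing knob: an explicit shard count when numShards is set, otherwise a row target of 10. A sketch of that selection in isolation (specFor is a hypothetical name, not from the repo):

// Hypothetical helper mirroring the parameterization above: exactly one of
// maxRowsPerSegment / numShards is non-null when the spec is constructed.
static HashedPartitionsSpec specFor(@Nullable Integer numShards) {
    final Integer maxRowsPerSegment = numShards == null ? 10 : null;
    return new HashedPartitionsSpec(maxRowsPerSegment, numShards, ImmutableList.of("dim1", "dim2"));
}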

Example 14 with HashedPartitionsSpec

Use of org.apache.druid.indexer.partitions.HashedPartitionsSpec in project druid by druid-io.

From class HashPartitionMultiPhaseParallelIndexingTest, method testRunWithHashPartitionFunction.

@Test
public void testRunWithHashPartitionFunction() throws Exception {
    final Integer maxRowsPerSegment = numShards == null ? 10 : null;
    final Set<DataSegment> publishedSegments = runTestTask(
        new HashedPartitionsSpec(
            maxRowsPerSegment,
            numShards,
            ImmutableList.of("dim1", "dim2"),
            HashPartitionFunction.MURMUR3_32_ABS
        ),
        TaskState.SUCCESS,
        false
    );
    final Map<Interval, Integer> expectedIntervalToNumSegments = computeExpectedIntervalToNumSegments(maxRowsPerSegment, numShards);
    assertHashedPartition(publishedSegments, expectedIntervalToNumSegments);
}
Also used : HashedPartitionsSpec(org.apache.druid.indexer.partitions.HashedPartitionsSpec) DataSegment(org.apache.druid.timeline.DataSegment) Interval(org.joda.time.Interval) Test(org.junit.Test)
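The only change from Example 13 is the fourth constructor argument, which names the hash partition function explicitly rather than leaving it unset. Side by side (variable names are ours):

// Three-argument form, as in Example 13: no explicit partition function.
final HashedPartitionsSpec implicitFn =
        new HashedPartitionsSpec(null, 3, ImmutableList.of("dim1", "dim2"));
// Four-argument form: MURMUR3_32_ABS requested explicitly.
final HashedPartitionsSpec explicitFn = new HashedPartitionsSpec(
        null, 3, ImmutableList.of("dim1", "dim2"), HashPartitionFunction.MURMUR3_32_ABS);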

Example 15 with HashedPartitionsSpec

Use of org.apache.druid.indexer.partitions.HashedPartitionsSpec in project druid by druid-io.

From class HashPartitionMultiPhaseParallelIndexingTest, method testAppendLinearlyPartitionedSegmensToHashPartitionedDatasourceSuccessfullyAppend.

@Test
public void testAppendLinearlyPartitionedSegmensToHashPartitionedDatasourceSuccessfullyAppend() {
    final Set<DataSegment> publishedSegments = new HashSet<>();
    publishedSegments.addAll(runTestTask(new HashedPartitionsSpec(null, numShards, ImmutableList.of("dim1", "dim2")), TaskState.SUCCESS, false));
    // Append
    publishedSegments.addAll(runTestTask(new DynamicPartitionsSpec(5, null), TaskState.SUCCESS, true));
    // And append again
    publishedSegments.addAll(runTestTask(new DynamicPartitionsSpec(10, null), TaskState.SUCCESS, true));
    final Map<Interval, List<DataSegment>> intervalToSegments = new HashMap<>();
    publishedSegments.forEach(segment -> intervalToSegments.computeIfAbsent(segment.getInterval(), k -> new ArrayList<>()).add(segment));
    for (Entry<Interval, List<DataSegment>> entry : intervalToSegments.entrySet()) {
        final List<DataSegment> segments = entry.getValue();
        final List<DataSegment> hashedSegments = segments
            .stream()
            .filter(segment -> segment.getShardSpec().getClass() == HashBasedNumberedShardSpec.class)
            .collect(Collectors.toList());
        final List<DataSegment> linearSegments = segments
            .stream()
            .filter(segment -> segment.getShardSpec().getClass() == NumberedShardSpec.class)
            .collect(Collectors.toList());
        for (DataSegment hashedSegment : hashedSegments) {
            final HashBasedNumberedShardSpec hashShardSpec = (HashBasedNumberedShardSpec) hashedSegment.getShardSpec();
            for (DataSegment linearSegment : linearSegments) {
                Assert.assertEquals(hashedSegment.getInterval(), linearSegment.getInterval());
                Assert.assertEquals(hashedSegment.getVersion(), linearSegment.getVersion());
                final NumberedShardSpec numberedShardSpec = (NumberedShardSpec) linearSegment.getShardSpec();
                Assert.assertEquals(hashShardSpec.getNumCorePartitions(), numberedShardSpec.getNumCorePartitions());
                Assert.assertTrue(hashShardSpec.getPartitionNum() < numberedShardSpec.getPartitionNum());
            }
        }
    }
}
Also used : Arrays(java.util.Arrays) Comparators(org.apache.druid.java.util.common.guava.Comparators) Intervals(org.apache.druid.java.util.common.Intervals) HashBasedNumberedShardSpec(org.apache.druid.timeline.partition.HashBasedNumberedShardSpec) RunWith(org.junit.runner.RunWith) HashMap(java.util.HashMap) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) ArrayList(java.util.ArrayList) HashSet(java.util.HashSet) Interval(org.joda.time.Interval) CSVParseSpec(org.apache.druid.data.input.impl.CSVParseSpec) ImmutableList(com.google.common.collect.ImmutableList) Map(java.util.Map) DynamicPartitionsSpec(org.apache.druid.indexer.partitions.DynamicPartitionsSpec) PartitionsSpec(org.apache.druid.indexer.partitions.PartitionsSpec) Parameterized(org.junit.runners.Parameterized) Nullable(javax.annotation.Nullable) HashPartitionFunction(org.apache.druid.timeline.partition.HashPartitionFunction) Before(org.junit.Before) ParseSpec(org.apache.druid.data.input.impl.ParseSpec) DateTimes(org.apache.druid.java.util.common.DateTimes) ScanResultValue(org.apache.druid.query.scan.ScanResultValue) Files(java.nio.file.Files) InputFormat(org.apache.druid.data.input.InputFormat) NumberedShardSpec(org.apache.druid.timeline.partition.NumberedShardSpec) StringUtils(org.apache.druid.java.util.common.StringUtils) Set(java.util.Set) CsvInputFormat(org.apache.druid.data.input.impl.CsvInputFormat) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) HashedPartitionsSpec(org.apache.druid.indexer.partitions.HashedPartitionsSpec) Test(org.junit.Test) IOException(java.io.IOException) Collectors(java.util.stream.Collectors) LockGranularity(org.apache.druid.indexing.common.LockGranularity) File(java.io.File) StandardCharsets(java.nio.charset.StandardCharsets) TaskState(org.apache.druid.indexer.TaskState) List(java.util.List) DataSegment(org.apache.druid.timeline.DataSegment) Writer(java.io.Writer) Entry(java.util.Map.Entry) Assert(org.junit.Assert)
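The assertions above only work once the published segments are regrouped by interval; the computeIfAbsent idiom that does this is worth isolating (groupByInterval is a name we chose, not from the repo):

// Hypothetical helper isolating the grouping step from the test above.
static Map<Interval, List<DataSegment>> groupByInterval(Set<DataSegment> segments) {
    final Map<Interval, List<DataSegment>> intervalToSegments = new HashMap<>();
    for (DataSegment segment : segments) {
        intervalToSegments.computeIfAbsent(segment.getInterval(), k -> new ArrayList<>()).add(segment);
    }
    return intervalToSegments;
}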

Aggregations

HashedPartitionsSpec (org.apache.druid.indexer.partitions.HashedPartitionsSpec): 43 uses
Test (org.junit.Test): 31 uses
Interval (org.joda.time.Interval): 20 uses
DataSegment (org.apache.druid.timeline.DataSegment): 15 uses
List (java.util.List): 14 uses
ImmutableList (com.google.common.collect.ImmutableList): 12 uses
PartitionsSpec (org.apache.druid.indexer.partitions.PartitionsSpec): 12 uses
Map (java.util.Map): 11 uses
SingleDimensionPartitionsSpec (org.apache.druid.indexer.partitions.SingleDimensionPartitionsSpec): 11 uses
HashBasedNumberedShardSpec (org.apache.druid.timeline.partition.HashBasedNumberedShardSpec): 11 uses
ArrayList (java.util.ArrayList): 10 uses
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec): 9 uses
StringUtils (org.apache.druid.java.util.common.StringUtils): 9 uses
File (java.io.File): 8 uses
HashMap (java.util.HashMap): 8 uses
DynamicPartitionsSpec (org.apache.druid.indexer.partitions.DynamicPartitionsSpec): 8 uses
UniformGranularitySpec (org.apache.druid.segment.indexing.granularity.UniformGranularitySpec): 8 uses
HashPartitionFunction (org.apache.druid.timeline.partition.HashPartitionFunction): 8 uses
ImmutableMap (com.google.common.collect.ImmutableMap): 7 uses
IOException (java.io.IOException): 7 uses