Example 36 with LinearShardSpec

use of org.apache.druid.timeline.partition.LinearShardSpec in project druid by druid-io.

the class FixedBucketsHistogramQuantileSqlAggregatorTest method createQuerySegmentWalker.

@Override
public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker() throws IOException {
    ApproximateHistogramDruidModule.registerSerde();
    final QueryableIndex index = IndexBuilder
        .create(CalciteTests.getJsonMapper())
        .tmpDir(temporaryFolder.newFolder())
        .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
        .schema(
            new IncrementalIndexSchema.Builder()
                .withMetrics(
                    new CountAggregatorFactory("cnt"),
                    new DoubleSumAggregatorFactory("m1", "m1"),
                    new FixedBucketsHistogramAggregatorFactory(
                        "fbhist_m1",
                        "m1",
                        20,
                        0,
                        10,
                        FixedBucketsHistogram.OutlierHandlingMode.IGNORE,
                        false
                    )
                )
                .withRollup(false)
                .build()
        )
        .rows(CalciteTests.ROWS1)
        .buildMMappedIndex();
    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
        DataSegment.builder()
            .dataSource(CalciteTests.DATASOURCE1)
            .interval(index.getDataInterval())
            .version("1")
            .shardSpec(new LinearShardSpec(0))
            .size(0)
            .build(),
        index
    );
}
Also used: CountAggregatorFactory (org.apache.druid.query.aggregation.CountAggregatorFactory), DoubleSumAggregatorFactory (org.apache.druid.query.aggregation.DoubleSumAggregatorFactory), SpecificSegmentsQuerySegmentWalker (org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker), QueryableIndex (org.apache.druid.segment.QueryableIndex), LinearShardSpec (org.apache.druid.timeline.partition.LinearShardSpec), IndexBuilder (org.apache.druid.segment.IndexBuilder), FixedBucketsHistogramAggregatorFactory (org.apache.druid.query.aggregation.histogram.FixedBucketsHistogramAggregatorFactory)
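
For orientation: LinearShardSpec carries just a partition number and places no upper bound on how many partitions an interval may eventually hold, which is why these examples use it for append-style segments. A minimal sketch of that pattern (the test name, dataSource, and interval are illustrative; only builder calls already shown on this page are used):

@Test
public void testLinearShardSpecPartitionNumbers() {
    // Two segments in the same interval and version, distinguished only by
    // their linear partition numbers.
    DataSegment first = DataSegment.builder()
        .dataSource("example")
        .interval(Intervals.of("2001/P1D"))
        .version("1")
        .shardSpec(new LinearShardSpec(0))
        .size(0)
        .build();
    DataSegment second = DataSegment.builder()
        .dataSource("example")
        .interval(Intervals.of("2001/P1D"))
        .version("1")
        .shardSpec(new LinearShardSpec(1))
        .size(0)
        .build();
    Assert.assertEquals(0, first.getShardSpec().getPartitionNum());
    Assert.assertEquals(1, second.getShardSpec().getPartitionNum());
}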

Example 37 with LinearShardSpec

use of org.apache.druid.timeline.partition.LinearShardSpec in project druid by druid-io.

the class SequenceMetadataTest method testPublishAnnotatedSegmentsThrowExceptionIfOverwriteSegmentsNotNullAndNotEmpty.

@Test
public void testPublishAnnotatedSegmentsThrowExceptionIfOverwriteSegmentsNotNullAndNotEmpty() throws Exception {
    DataSegment dataSegment = DataSegment.builder()
        .dataSource("foo")
        .interval(Intervals.of("2001/P1D"))
        .shardSpec(new LinearShardSpec(1))
        .version("b")
        .size(0)
        .build();
    Set<DataSegment> notNullNotEmptySegment = ImmutableSet.of(dataSegment);
    SequenceMetadata<Integer, Integer> sequenceMetadata = new SequenceMetadata<>(1, "test", ImmutableMap.of(), ImmutableMap.of(), true, ImmutableSet.of());
    TransactionalSegmentPublisher transactionalSegmentPublisher = sequenceMetadata.createPublisher(mockSeekableStreamIndexTaskRunner, mockTaskToolbox, true);
    expectedException.expect(ISE.class);
    expectedException.expectMessage("Stream ingestion task unexpectedly attempted to overwrite segments: " + SegmentUtils.commaSeparatedIdentifiers(notNullNotEmptySegment));
    transactionalSegmentPublisher.publishAnnotatedSegments(notNullNotEmptySegment, null, ImmutableSet.of(), null);
}
Also used: TransactionalSegmentPublisher (org.apache.druid.segment.realtime.appenderator.TransactionalSegmentPublisher), LinearShardSpec (org.apache.druid.timeline.partition.LinearShardSpec), DataSegment (org.apache.druid.timeline.DataSegment), Test (org.junit.Test)
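
The expected message above is assembled from each segment's identifier via SegmentUtils.commaSeparatedIdentifiers. For reference, those identifiers come from DataSegment.getId(), which encodes dataSource, interval, version, and the linear partition number; a short sketch (the test name is hypothetical, and the id string is inferred from the format asserted in the examples below):

@Test
public void testSegmentIdOfLinearShardSegment() {
    DataSegment dataSegment = DataSegment.builder()
        .dataSource("foo")
        .interval(Intervals.of("2001/P1D"))
        .shardSpec(new LinearShardSpec(1))
        .version("b")
        .size(0)
        .build();
    // Layout: dataSource_intervalStart_intervalEnd_version_partitionNum
    Assert.assertEquals(
        "foo_2001-01-01T00:00:00.000Z_2001-01-02T00:00:00.000Z_b_1",
        dataSegment.getId().toString()
    );
}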

Example 38 with LinearShardSpec

use of org.apache.druid.timeline.partition.LinearShardSpec in project druid by druid-io.

the class SequenceMetadataTest method testPublishAnnotatedSegmentsSucceedIfDropSegmentsAndOverwriteSegmentsNullAndEmpty.

@Test
public void testPublishAnnotatedSegmentsSucceedIfDropSegmentsAndOverwriteSegmentsNullAndEmpty() throws Exception {
    Mockito.when(mockSeekableStreamIndexTaskRunner.deserializePartitionsFromMetadata(ArgumentMatchers.any(), ArgumentMatchers.any()))
           .thenReturn(mockSeekableStreamEndSequenceNumbers);
    Mockito.when(mockSeekableStreamEndSequenceNumbers.getPartitionSequenceNumberMap()).thenReturn(ImmutableMap.of());
    Mockito.when(mockTaskToolbox.getTaskActionClient()).thenReturn(mockTaskActionClient);
    DataSegment dataSegment = DataSegment.builder()
        .dataSource("foo")
        .interval(Intervals.of("2001/P1D"))
        .shardSpec(new LinearShardSpec(1))
        .version("b")
        .size(0)
        .build();
    Set<DataSegment> notNullNotEmptySegment = ImmutableSet.of(dataSegment);
    SequenceMetadata<Integer, Integer> sequenceMetadata = new SequenceMetadata<>(1, "test", ImmutableMap.of(), ImmutableMap.of(), true, ImmutableSet.of());
    TransactionalSegmentPublisher transactionalSegmentPublisher = sequenceMetadata.createPublisher(mockSeekableStreamIndexTaskRunner, mockTaskToolbox, false);
    transactionalSegmentPublisher.publishAnnotatedSegments(null, null, notNullNotEmptySegment, ImmutableMap.of());
}
Also used: TransactionalSegmentPublisher (org.apache.druid.segment.realtime.appenderator.TransactionalSegmentPublisher), LinearShardSpec (org.apache.druid.timeline.partition.LinearShardSpec), DataSegment (org.apache.druid.timeline.DataSegment), Test (org.junit.Test)

Example 39 with LinearShardSpec

use of org.apache.druid.timeline.partition.LinearShardSpec in project druid by druid-io.

the class IndexerSQLMetadataStorageCoordinatorTest method testAnotherAllocatePendingSegmentAfterRevertingCompaction.

/**
 * Slightly different from the test above, but this one involves reverted compaction:
 *   1) used segments of version = A, id = 0, 1, 2
 *   2) overwrote segments of version = B, id = 0 <= compaction
 *   3) marked segments unused for version = A, id = 0, 1, 2 <= overshadowing
 *   4) pending segment of version = B, id = 1 <= appending new data, aborted
 *   5) reverted compaction, mark segments used for version = A, id = 0, 1, 2, and mark compacted segments unused
 *   6) used segments of version = A, id = 0, 1, 2
 *   7) pending segment of version = B, id = 1
 */
@Test
public void testAnotherAllocatePendingSegmentAfterRevertingCompaction() {
    String maxVersion = "Z";
    // 1.0) simulate one append load
    final PartialShardSpec partialShardSpec = NumberedPartialShardSpec.instance();
    final String dataSource = "ds";
    final Interval interval = Intervals.of("2017-01-01/2017-02-01");
    final SegmentIdWithShardSpec identifier = coordinator.allocatePendingSegment(
        dataSource,
        "seq",
        null,
        interval,
        partialShardSpec,
        "A",
        true
    );
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A", identifier.toString());
    // Assume it publishes; create its corresponding segment
    DataSegment segment = new DataSegment(
        "ds",
        Intervals.of("2017-01-01T00Z/2017-02-01T00Z"),
        "A",
        ImmutableMap.of(),
        ImmutableList.of("dim1"),
        ImmutableList.of("m1"),
        new LinearShardSpec(0),
        9,
        100
    );
    Assert.assertTrue(insertUsedSegments(ImmutableSet.of(segment)));
    List<String> ids = retrieveUsedSegmentIds();
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A", ids.get(0));
    // 1.1) simulate one more append load  (as if previous segment was published, note different sequence name)
    final SegmentIdWithShardSpec identifier1 = coordinator.allocatePendingSegment(
        dataSource,
        "seq2",
        identifier.toString(),
        interval,
        partialShardSpec,
        maxVersion,
        true
    );
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_1", identifier1.toString());
    // Assume it publishes; create its corresponding segment
    segment = new DataSegment(
        "ds",
        Intervals.of("2017-01-01T00Z/2017-02-01T00Z"),
        "A",
        ImmutableMap.of(),
        ImmutableList.of("dim1"),
        ImmutableList.of("m1"),
        new LinearShardSpec(1),
        9,
        100
    );
    Assert.assertTrue(insertUsedSegments(ImmutableSet.of(segment)));
    ids = retrieveUsedSegmentIds();
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_1", ids.get(1));
    // 1.2) simulate one more append load  (as if previous segment was published, note different sequence name)
    final SegmentIdWithShardSpec identifier2 = coordinator.allocatePendingSegment(
        dataSource,
        "seq3",
        identifier1.toString(),
        interval,
        partialShardSpec,
        maxVersion,
        true
    );
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_2", identifier2.toString());
    // Assume it publishes; create its corresponding segment
    segment = new DataSegment(
        "ds",
        Intervals.of("2017-01-01T00Z/2017-02-01T00Z"),
        "A",
        ImmutableMap.of(),
        ImmutableList.of("dim1"),
        ImmutableList.of("m1"),
        new LinearShardSpec(2),
        9,
        100
    );
    // state so far:
    // pendings: A: 0,1,2
    // used segments A: 0,1,2
    // unused segments:
    Assert.assertTrue(insertUsedSegments(ImmutableSet.of(segment)));
    ids = retrieveUsedSegmentIds();
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_2", ids.get(2));
    // 2)
    // Now simulate a compaction (batch reindex) of the previous three segments into a single
    // segment for the same interval:
    DataSegment compactedSegment = new DataSegment(
        "ds",
        Intervals.of("2017-01-01T00Z/2017-02-01T00Z"),
        "B",
        ImmutableMap.of(),
        ImmutableList.of("dim1"),
        ImmutableList.of("m1"),
        new LinearShardSpec(0),
        9,
        100
    );
    Assert.assertTrue(insertUsedSegments(ImmutableSet.of(compactedSegment)));
    ids = retrieveUsedSegmentIds();
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_B", ids.get(3));
    // 3) When overshadowing, segments are still marked as "used" in the segments table
    // state so far:
    // pendings: A: 0,1,2
    // used segments: A: 0,1,2; B: 0 <- new compacted segment, overshadows previous version A
    // unused segment:
    // 4) pending segment of version = B, id = 1 <= appending new data, aborted
    final SegmentIdWithShardSpec identifier3 = coordinator.allocatePendingSegment(
        dataSource,
        "seq4",
        identifier2.toString(),
        interval,
        partialShardSpec,
        maxVersion,
        true
    );
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_B_1", identifier3.toString());
    // no corresponding segment, pending aborted
    // state so far:
    // pendings: A: 0,1,2; B:1 (note that B_1 does not make it into segments since its task aborted)
    // used segments: A: 0,1,2; B: 0 <-  compacted segment, overshadows previous version A
    // unused segment:
    // 5) reverted compaction (by marking B_0 as unused)
    // Reverting the compaction is a manual metadata update; here it amounts to dropping the
    // compacted segment (marking it unused), since the version-A segments were never unmarked:
    markAllSegmentsUnused(ImmutableSet.of(compactedSegment));
    // State so far:
    // pending segments: version A, ids 0,1,2; version B, id 1
    // used segments: version A, ids 0,1,2
    // unused segment: version B, id 0
    List<String> pendings = retrievePendingSegmentIds();
    Assert.assertEquals(4, pendings.size());
    List<String> used = retrieveUsedSegmentIds();
    Assert.assertEquals(3, used.size());
    List<String> unused = retrieveUnusedSegmentIds();
    Assert.assertEquals(1, unused.size());
    // Simulate one more append load
    final SegmentIdWithShardSpec identifier4 = coordinator.allocatePendingSegment(
        dataSource,
        "seq5",
        identifier1.toString(),
        interval,
        partialShardSpec,
        maxVersion,
        true
    );
    // The used segments A_0..A_2 and the pending segment B_1 give a maximum partition number
    // of 2 for this interval, so the next partition number is 3; the version follows the
    // existing chunk (A), yielding ..._A_3.
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_3", identifier4.toString());
    // Assume it publishes; create its corresponding segment
    segment = new DataSegment(
        "ds",
        Intervals.of("2017-01-01T00Z/2017-02-01T00Z"),
        "A",
        ImmutableMap.of(),
        ImmutableList.of("dim1"),
        ImmutableList.of("m1"),
        new LinearShardSpec(3),
        9,
        100
    );
    // State so far:
    // pending segments: version A, ids 0,1,2,3; version B, id 1
    // used segments: version A, ids 0,1,2,3
    // unused segment: version B, id 0
    Assert.assertTrue(insertUsedSegments(ImmutableSet.of(segment)));
    ids = retrieveUsedSegmentIds();
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_3", ids.get(3));
}
Also used: LinearShardSpec (org.apache.druid.timeline.partition.LinearShardSpec), HashBasedNumberedPartialShardSpec (org.apache.druid.timeline.partition.HashBasedNumberedPartialShardSpec), PartialShardSpec (org.apache.druid.timeline.partition.PartialShardSpec), NumberedPartialShardSpec (org.apache.druid.timeline.partition.NumberedPartialShardSpec), NumberedOverwritePartialShardSpec (org.apache.druid.timeline.partition.NumberedOverwritePartialShardSpec), SegmentIdWithShardSpec (org.apache.druid.segment.realtime.appenderator.SegmentIdWithShardSpec), DataSegment (org.apache.druid.timeline.DataSegment), Interval (org.joda.time.Interval), Test (org.junit.Test)
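
The ids asserted throughout this test follow Druid's segment id layout: dataSource_intervalStart_intervalEnd_version, with the partition number appended only when it is greater than zero. A minimal sketch of that convention using SegmentId (org.apache.druid.timeline.SegmentId; the test name is illustrative):

@Test
public void testSegmentIdLayout() {
    // Partition 0 omits the trailing partition number ...
    Assert.assertEquals(
        "ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A",
        SegmentId.of("ds", Intervals.of("2017-01-01/2017-02-01"), "A", 0).toString()
    );
    // ... while higher partitions append it.
    Assert.assertEquals(
        "ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_3",
        SegmentId.of("ds", Intervals.of("2017-01-01/2017-02-01"), "A", 3).toString()
    );
}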

Example 40 with LinearShardSpec

use of org.apache.druid.timeline.partition.LinearShardSpec in project druid by druid-io.

the class IndexerSQLMetadataStorageCoordinatorTest method testNoPendingSegmentsAndOneUsedSegment.

@Test
public void testNoPendingSegmentsAndOneUsedSegment() {
    String maxVersion = "Z";
    // create one used segment
    DataSegment segment = new DataSegment(
        "ds",
        Intervals.of("2017-01-01T00Z/2017-02-01T00Z"),
        "A",
        ImmutableMap.of(),
        ImmutableList.of("dim1"),
        ImmutableList.of("m1"),
        new LinearShardSpec(0),
        9,
        100
    );
    Assert.assertTrue(insertUsedSegments(ImmutableSet.of(segment)));
    List<String> ids = retrieveUsedSegmentIds();
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A", ids.get(0));
    // simulate one aborted append load
    final PartialShardSpec partialShardSpec = NumberedPartialShardSpec.instance();
    final String dataSource = "ds";
    final Interval interval = Intervals.of("2017-01-01/2017-02-01");
    final SegmentIdWithShardSpec identifier = coordinator.allocatePendingSegment(
        dataSource,
        "seq",
        null,
        interval,
        partialShardSpec,
        maxVersion,
        true
    );
    Assert.assertEquals("ds_2017-01-01T00:00:00.000Z_2017-02-01T00:00:00.000Z_A_1", identifier.toString());
}
Also used: LinearShardSpec (org.apache.druid.timeline.partition.LinearShardSpec), HashBasedNumberedPartialShardSpec (org.apache.druid.timeline.partition.HashBasedNumberedPartialShardSpec), PartialShardSpec (org.apache.druid.timeline.partition.PartialShardSpec), NumberedPartialShardSpec (org.apache.druid.timeline.partition.NumberedPartialShardSpec), NumberedOverwritePartialShardSpec (org.apache.druid.timeline.partition.NumberedOverwritePartialShardSpec), DataSegment (org.apache.druid.timeline.DataSegment), SegmentIdWithShardSpec (org.apache.druid.segment.realtime.appenderator.SegmentIdWithShardSpec), Interval (org.joda.time.Interval), Test (org.junit.Test)
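
LinearShardSpec also round-trips through Jackson, since the ShardSpec interface registers its subtypes for polymorphic serde. A hedged sketch (the exact JSON field set is an assumption based on the class's fields, and the test name is illustrative):

@Test
public void testLinearShardSpecSerde() throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // Expected to produce JSON along the lines of {"type":"linear","partitionNum":7}.
    String json = mapper.writeValueAsString(new LinearShardSpec(7));
    ShardSpec roundTripped = mapper.readValue(json, ShardSpec.class);
    Assert.assertEquals(7, roundTripped.getPartitionNum());
}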

Aggregations

LinearShardSpec (org.apache.druid.timeline.partition.LinearShardSpec): 42 uses
DataSegment (org.apache.druid.timeline.DataSegment): 30 uses
Test (org.junit.Test): 18 uses
QueryableIndex (org.apache.druid.segment.QueryableIndex): 14 uses
Interval (org.joda.time.Interval): 14 uses
GeneratorSchemaInfo (org.apache.druid.segment.generator.GeneratorSchemaInfo): 12 uses
SegmentGenerator (org.apache.druid.segment.generator.SegmentGenerator): 12 uses
SpecificSegmentsQuerySegmentWalker (org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker): 12 uses
CountAggregatorFactory (org.apache.druid.query.aggregation.CountAggregatorFactory): 11 uses
DoubleSumAggregatorFactory (org.apache.druid.query.aggregation.DoubleSumAggregatorFactory): 9 uses
Setup (org.openjdk.jmh.annotations.Setup): 9 uses
SegmentIdWithShardSpec (org.apache.druid.segment.realtime.appenderator.SegmentIdWithShardSpec): 8 uses
MetadataStorageTablesConfig (org.apache.druid.metadata.MetadataStorageTablesConfig): 7 uses
IndexBuilder (org.apache.druid.segment.IndexBuilder): 7 uses
DataSegmentPusher (org.apache.druid.segment.loading.DataSegmentPusher): 7 uses
HdfsDataSegmentPusher (org.apache.druid.storage.hdfs.HdfsDataSegmentPusher): 7 uses
HdfsDataSegmentPusherConfig (org.apache.druid.storage.hdfs.HdfsDataSegmentPusherConfig): 7 uses
LocalFileSystem (org.apache.hadoop.fs.LocalFileSystem): 7 uses
Path (org.apache.hadoop.fs.Path): 7 uses
File (java.io.File): 6 uses