Example 11 with ScanResultValue

use of org.apache.druid.query.scan.ScanResultValue in project druid by druid-io.

the class AbstractMultiPhaseParallelIndexingTest method querySegment.

List<ScanResultValue> querySegment(DataSegment dataSegment, List<String> columns, File tempSegmentDir) {
    Segment segment = loadSegment(dataSegment, tempSegmentDir);
    final QueryRunner<ScanResultValue> runner = SCAN_QUERY_RUNNER_FACTORY.createRunner(segment);
    return runner.run(QueryPlus.wrap(new ScanQuery(new TableDataSource("dataSource"), new SpecificSegmentSpec(new SegmentDescriptor(dataSegment.getInterval(), dataSegment.getVersion(), dataSegment.getShardSpec().getPartitionNum())), null, null, 0, 0, 0, null, null, null, columns, false, null))).toList();
}
Also used : TableDataSource(org.apache.druid.query.TableDataSource) SpecificSegmentSpec(org.apache.druid.query.spec.SpecificSegmentSpec) ScanResultValue(org.apache.druid.query.scan.ScanResultValue) SegmentDescriptor(org.apache.druid.query.SegmentDescriptor) ScanQuery(org.apache.druid.query.scan.ScanQuery) DataSegment(org.apache.druid.timeline.DataSegment) Segment(org.apache.druid.segment.Segment)

Example 12 with ScanResultValue

use of org.apache.druid.query.scan.ScanResultValue in project druid by druid-io.

the class KafkaIndexTaskTest method testKafkaRecordEntityInputFormat.

@Test(timeout = 60_000L)
public void testKafkaRecordEntityInputFormat() throws Exception {
    // Insert data
    insertData(Iterables.limit(records, 3));
    final KafkaIndexTask task = createTask(null, new DataSchema("test_ds", new TimestampSpec("timestamp", "iso", null), new DimensionsSpec(Arrays.asList(new StringDimensionSchema("dim1"), new StringDimensionSchema("dim1t"), new StringDimensionSchema("dim2"), new LongDimensionSchema("dimLong"), new FloatDimensionSchema("dimFloat"), new StringDimensionSchema("kafka.topic"), new LongDimensionSchema("kafka.offset"), new StringDimensionSchema("kafka.header.encoding"))), new AggregatorFactory[] { new DoubleSumAggregatorFactory("met1sum", "met1"), new CountAggregatorFactory("rows") }, new UniformGranularitySpec(Granularities.DAY, Granularities.NONE, null), null), new KafkaIndexTaskIOConfig(0, "sequence0", new SeekableStreamStartSequenceNumbers<>(topic, ImmutableMap.of(0, 0L), ImmutableSet.of()), new SeekableStreamEndSequenceNumbers<>(topic, ImmutableMap.of(0, 5L)), kafkaServer.consumerProperties(), KafkaSupervisorIOConfig.DEFAULT_POLL_TIMEOUT_MILLIS, true, null, null, new TestKafkaInputFormat(INPUT_FORMAT)));
    Assert.assertTrue(task.supportsQueries());
    final ListenableFuture<TaskStatus> future = runTask(task);
    while (countEvents(task) != 3) {
        Thread.sleep(25);
    }
    Assert.assertEquals(Status.READING, task.getRunner().getStatus());
    final QuerySegmentSpec interval = OBJECT_MAPPER.readValue("\"2008/2012\"", QuerySegmentSpec.class);
    List<ScanResultValue> scanResultValues = scanData(task, interval);
    // verify that all 3 inserted records are returned by the scan
    Assert.assertEquals(3, Iterables.size(scanResultValues));
    int i = 0;
    for (ScanResultValue result : scanResultValues) {
        final Map<String, Object> event = ((List<Map<String, Object>>) result.getEvents()).get(0);
        Assert.assertEquals((long) i++, event.get("kafka.offset"));
        Assert.assertEquals(topic, event.get("kafka.topic"));
        Assert.assertEquals("application/json", event.get("kafka.header.encoding"));
    }
    // insert remaining data
    insertData(Iterables.skip(records, 3));
    // Wait for task to exit
    Assert.assertEquals(TaskState.SUCCESS, future.get().getStatusCode());
    // Check metrics
    Assert.assertEquals(4, task.getRunner().getRowIngestionMeters().getProcessed());
    Assert.assertEquals(0, task.getRunner().getRowIngestionMeters().getUnparseable());
    Assert.assertEquals(0, task.getRunner().getRowIngestionMeters().getThrownAway());
}
Also used : UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) QuerySegmentSpec(org.apache.druid.query.spec.QuerySegmentSpec) List(java.util.List) ImmutableList(com.google.common.collect.ImmutableList) SeekableStreamEndSequenceNumbers(org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers) DoubleSumAggregatorFactory(org.apache.druid.query.aggregation.DoubleSumAggregatorFactory) LongDimensionSchema(org.apache.druid.data.input.impl.LongDimensionSchema) FloatDimensionSchema(org.apache.druid.data.input.impl.FloatDimensionSchema) AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory) CountAggregatorFactory(org.apache.druid.query.aggregation.CountAggregatorFactory) TaskStatus(org.apache.druid.indexer.TaskStatus) StringDimensionSchema(org.apache.druid.data.input.impl.StringDimensionSchema) DataSchema(org.apache.druid.segment.indexing.DataSchema) ScanResultValue(org.apache.druid.query.scan.ScanResultValue) SeekableStreamStartSequenceNumbers(org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) Test(org.junit.Test) IndexTaskTest(org.apache.druid.indexing.common.task.IndexTaskTest)

Example 13 with ScanResultValue

use of org.apache.druid.query.scan.ScanResultValue in project druid by druid-io.

the class KafkaIndexTaskTest method testRunTransactionModeRollback.

@Test(timeout = 60_000L)
public void testRunTransactionModeRollback() throws Exception {
    final KafkaIndexTask task = createTask(null, new KafkaIndexTaskIOConfig(0, "sequence0", new SeekableStreamStartSequenceNumbers<>(topic, ImmutableMap.of(0, 0L), ImmutableSet.of()), new SeekableStreamEndSequenceNumbers<>(topic, ImmutableMap.of(0, 13L)), kafkaServer.consumerProperties(), KafkaSupervisorIOConfig.DEFAULT_POLL_TIMEOUT_MILLIS, true, null, null, INPUT_FORMAT));
    final ListenableFuture<TaskStatus> future = runTask(task);
    // Insert 2 records initially
    try (final KafkaProducer<byte[], byte[]> kafkaProducer = kafkaServer.newProducer()) {
        kafkaProducer.initTransactions();
        kafkaProducer.beginTransaction();
        for (ProducerRecord<byte[], byte[]> record : Iterables.limit(records, 2)) {
            kafkaProducer.send(record).get();
        }
        kafkaProducer.commitTransaction();
    }
    while (countEvents(task) != 2) {
        Thread.sleep(25);
    }
    Assert.assertEquals(2, countEvents(task));
    Assert.assertEquals(Status.READING, task.getRunner().getStatus());
    // verify the 2 indexed records
    final QuerySegmentSpec firstInterval = OBJECT_MAPPER.readValue("\"2008/2010\"", QuerySegmentSpec.class);
    Iterable<ScanResultValue> scanResultValues = scanData(task, firstInterval);
    Assert.assertEquals(2, Iterables.size(scanResultValues));
    // Insert 3 more records and rollback
    try (final KafkaProducer<byte[], byte[]> kafkaProducer = kafkaServer.newProducer()) {
        kafkaProducer.initTransactions();
        kafkaProducer.beginTransaction();
        for (ProducerRecord<byte[], byte[]> record : Iterables.limit(Iterables.skip(records, 2), 3)) {
            kafkaProducer.send(record).get();
        }
        kafkaProducer.flush();
        kafkaProducer.abortTransaction();
    }
    Assert.assertEquals(2, countEvents(task));
    Assert.assertEquals(Status.READING, task.getRunner().getStatus());
    final QuerySegmentSpec rollbackedInterval = OBJECT_MAPPER.readValue("\"2010/2012\"", QuerySegmentSpec.class);
    scanResultValues = scanData(task, rollbackedInterval);
    // verify that no records were indexed in the rolled-back time period
    Assert.assertEquals(0, Iterables.size(scanResultValues));
    // Insert remaining data
    try (final KafkaProducer<byte[], byte[]> kafkaProducer = kafkaServer.newProducer()) {
        kafkaProducer.initTransactions();
        kafkaProducer.beginTransaction();
        for (ProducerRecord<byte[], byte[]> record : Iterables.skip(records, 5)) {
            kafkaProducer.send(record).get();
        }
        kafkaProducer.commitTransaction();
    }
    final QuerySegmentSpec endInterval = OBJECT_MAPPER.readValue("\"2008/2049\"", QuerySegmentSpec.class);
    Iterable<ScanResultValue> scanResultValues1 = scanData(task, endInterval);
    Assert.assertEquals(2, Iterables.size(scanResultValues1));
    Assert.assertEquals(TaskState.SUCCESS, future.get().getStatusCode());
    Assert.assertEquals(task.getRunner().getEndOffsets(), task.getRunner().getCurrentOffsets());
    // Check metrics
    Assert.assertEquals(3, task.getRunner().getRowIngestionMeters().getProcessed());
    Assert.assertEquals(3, task.getRunner().getRowIngestionMeters().getUnparseable());
    Assert.assertEquals(1, task.getRunner().getRowIngestionMeters().getThrownAway());
    // Check published metadata and segments in deep storage
    assertEqualsExceptVersion(ImmutableList.of(sdd("2008/P1D", 0, ImmutableList.of("a")), sdd("2009/P1D", 0, ImmutableList.of("b")), sdd("2013/P1D", 0, ImmutableList.of("f")), sdd("2049/P1D", 0, ImmutableList.of("f"))), publishedDescriptors());
    Assert.assertEquals(new KafkaDataSourceMetadata(new SeekableStreamEndSequenceNumbers<>(topic, ImmutableMap.of(0, 13L))), newDataSchemaMetadata());
}
Also used : ScanResultValue(org.apache.druid.query.scan.ScanResultValue) SeekableStreamStartSequenceNumbers(org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers) QuerySegmentSpec(org.apache.druid.query.spec.QuerySegmentSpec) TaskStatus(org.apache.druid.indexer.TaskStatus) SeekableStreamEndSequenceNumbers(org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers) Test(org.junit.Test) IndexTaskTest(org.apache.druid.indexing.common.task.IndexTaskTest)

Example 14 with ScanResultValue

use of org.apache.druid.query.scan.ScanResultValue in project druid by druid-io.

the class SetAndVerifyContextQueryRunnerTest method testTimeoutZeroIsNotImmediateTimeoutExplicitServersideMax.

@Test
public void testTimeoutZeroIsNotImmediateTimeoutExplicitServersideMax() {
    Query<ScanResultValue> query = new Druids.ScanQueryBuilder().dataSource("foo").intervals(new MultipleIntervalSegmentSpec(ImmutableList.of(Intervals.ETERNITY))).context(ImmutableMap.of(QueryContexts.TIMEOUT_KEY, 0)).build();
    ServerConfig defaultConfig = new ServerConfig() {

        @Override
        public long getMaxQueryTimeout() {
            return 10000L;
        }
    };
    QueryRunner<ScanResultValue> mockRunner = EasyMock.createMock(QueryRunner.class);
    SetAndVerifyContextQueryRunner<ScanResultValue> queryRunner = new SetAndVerifyContextQueryRunner<>(defaultConfig, mockRunner);
    Query<ScanResultValue> transformed = queryRunner.withTimeoutAndMaxScatterGatherBytes(query, defaultConfig);
    // timeout is set to 0, so withTimeoutAndMaxScatterGatherBytes should set QUERY_FAIL_TIME to be the current
    // time + max query timeout at the time the method was called
    // this means that the fail time should be greater than the current time when checking
    Assert.assertTrue(System.currentTimeMillis() < (Long) transformed.getContextValue(DirectDruidClient.QUERY_FAIL_TIME));
}
Also used : ServerConfig(org.apache.druid.server.initialization.ServerConfig) ScanResultValue(org.apache.druid.query.scan.ScanResultValue) MultipleIntervalSegmentSpec(org.apache.druid.query.spec.MultipleIntervalSegmentSpec) Test(org.junit.Test)
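
The comment in the test above describes the rule being checked: a timeout of 0 does not mean "fail immediately"; the fail time becomes the current time plus the server-side maximum. The following is a minimal sketch of that arithmetic only, not Druid's implementation. The class and method names (FailTimeSketch, computeFailTime) are hypothetical, and passing a non-zero timeout through unchanged is an assumption made for illustration.

```java
// Hypothetical, self-contained sketch; it does not use any Druid classes.
final class FailTimeSketch {

    private FailTimeSketch() {
    }

    // Returns the absolute time (in millis) after which the query should fail.
    // A timeout of 0 is treated as "not set" and replaced by the server-side maximum,
    // which mirrors the behaviour described in the test comment.
    // Assumption: a non-zero timeout is used as given.
    static long computeFailTime(long nowMillis, long timeoutMillis, long maxQueryTimeoutMillis) {
        long effectiveTimeout = (timeoutMillis == 0L) ? maxQueryTimeoutMillis : timeoutMillis;
        return nowMillis + effectiveTimeout;
    }
}
```

With the values from the test (timeout 0, maximum 10000L), the computed fail time lies 10 seconds after the call, which is why the assertion expects it to be greater than the current time.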

Example 15 with ScanResultValue

use of org.apache.druid.query.scan.ScanResultValue in project hive by apache.

the class DruidScanQueryRecordReader method nextKeyValue.

@Override
public boolean nextKeyValue() throws IOException {
    if (compactedValues.hasNext()) {
        return true;
    }
    if (getQueryResultsIterator().hasNext()) {
        ScanResultValue current = getQueryResultsIterator().next();
        // noinspection unchecked
        compactedValues = ((List<List<Object>>) current.getEvents()).iterator();
        return nextKeyValue();
    }
    return false;
}
Also used : ScanResultValue(org.apache.druid.query.scan.ScanResultValue) List(java.util.List)
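
The reader above flattens batches of rows: each ScanResultValue carries a list of events, and nextKeyValue recurses until it finds a batch that still has rows. Below is a minimal, dependency-free sketch of the same pattern, written as a loop instead of recursion. BatchFlattener, hasNextRow, nextRow and drain are hypothetical names; the batches iterator stands in for getQueryResultsIterator() and each batch for the result of getEvents().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the record reader; not part of Druid or Hive.
final class BatchFlattener {
    // Plays the role of the query results iterator.
    private final Iterator<List<List<Object>>> batches;
    // Plays the role of compactedValues: rows of the batch currently being read.
    private Iterator<List<Object>> currentRows = Collections.emptyIterator();

    BatchFlattener(Iterator<List<List<Object>>> batches) {
        this.batches = batches;
    }

    // Loop form of the recursive nextKeyValue(): skips empty batches
    // until a row is available or the input is exhausted.
    boolean hasNextRow() {
        while (!currentRows.hasNext()) {
            if (!batches.hasNext()) {
                return false;
            }
            currentRows = batches.next().iterator();
        }
        return true;
    }

    List<Object> nextRow() {
        if (!hasNextRow()) {
            throw new NoSuchElementException("no more rows");
        }
        return currentRows.next();
    }

    // Convenience helper for the sketch: collects every row from every batch.
    static List<List<Object>> drain(List<List<List<Object>>> input) {
        BatchFlattener flattener = new BatchFlattener(input.iterator());
        List<List<Object>> rows = new ArrayList<>();
        while (flattener.hasNextRow()) {
            rows.add(flattener.nextRow());
        }
        return rows;
    }
}
```

The loop avoids growing the call stack when many consecutive batches are empty, which the recursive form would do once per empty batch.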

Aggregations

ScanResultValue (org.apache.druid.query.scan.ScanResultValue): 15
Test (org.junit.Test): 10
List (java.util.List): 5
ScanQuery (org.apache.druid.query.scan.ScanQuery): 5
MultipleIntervalSegmentSpec (org.apache.druid.query.spec.MultipleIntervalSegmentSpec): 5
ImmutableList (com.google.common.collect.ImmutableList): 4
SegmentDescriptor (org.apache.druid.query.SegmentDescriptor): 4
ServerConfig (org.apache.druid.server.initialization.ServerConfig): 4
TaskStatus (org.apache.druid.indexer.TaskStatus): 3
IndexTaskTest (org.apache.druid.indexing.common.task.IndexTaskTest): 3
SeekableStreamEndSequenceNumbers (org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers): 3
SeekableStreamStartSequenceNumbers (org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers): 3
TableDataSource (org.apache.druid.query.TableDataSource): 3
MultipleSpecificSegmentSpec (org.apache.druid.query.spec.MultipleSpecificSegmentSpec): 3
QuerySegmentSpec (org.apache.druid.query.spec.QuerySegmentSpec): 3
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec): 2
FloatDimensionSchema (org.apache.druid.data.input.impl.FloatDimensionSchema): 2
LongDimensionSchema (org.apache.druid.data.input.impl.LongDimensionSchema): 2
StringDimensionSchema (org.apache.druid.data.input.impl.StringDimensionSchema): 2
TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec): 2