Example 11 with TransformSpec

Use of org.apache.druid.segment.transform.TransformSpec in project druid by druid-io.

From class KafkaIndexTaskTest, method testRunWithTransformSpec.

@Test(timeout = 60_000L)
public void testRunWithTransformSpec() throws Exception {
    final KafkaIndexTask task = createTask(
        null,
        NEW_DATA_SCHEMA.withTransformSpec(
            new TransformSpec(
                new SelectorDimFilter("dim1", "b", null),
                ImmutableList.of(new ExpressionTransform("dim1t", "concat(dim1,dim1)", ExprMacroTable.nil()))
            )
        ),
        new KafkaIndexTaskIOConfig(
            0,
            "sequence0",
            new SeekableStreamStartSequenceNumbers<>(topic, ImmutableMap.of(0, 0L), ImmutableSet.of()),
            new SeekableStreamEndSequenceNumbers<>(topic, ImmutableMap.of(0, 5L)),
            kafkaServer.consumerProperties(),
            KafkaSupervisorIOConfig.DEFAULT_POLL_TIMEOUT_MILLIS,
            true,
            null,
            null,
            INPUT_FORMAT
        )
    );
    final ListenableFuture<TaskStatus> future = runTask(task);
    // Wait for the task to start reading
    while (task.getRunner().getStatus() != Status.READING) {
        Thread.sleep(10);
    }
    // Insert data
    insertData();
    // Wait for task to exit
    Assert.assertEquals(TaskState.SUCCESS, future.get().getStatusCode());
    // Check metrics: the transform's filter (dim1 == "b") keeps 1 of the 5 consumed rows; the other 4 are thrown away
    Assert.assertEquals(1, task.getRunner().getRowIngestionMeters().getProcessed());
    Assert.assertEquals(0, task.getRunner().getRowIngestionMeters().getUnparseable());
    Assert.assertEquals(4, task.getRunner().getRowIngestionMeters().getThrownAway());
    // Check published metadata
    final List<SegmentDescriptor> publishedDescriptors = publishedDescriptors();
    assertEqualsExceptVersion(ImmutableList.of(sdd("2009/P1D", 0)), publishedDescriptors);
    Assert.assertEquals(new KafkaDataSourceMetadata(new SeekableStreamEndSequenceNumbers<>(topic, ImmutableMap.of(0, 5L))), newDataSchemaMetadata());
    // Check segments in deep storage
    Assert.assertEquals(ImmutableList.of("b"), readSegmentColumn("dim1", publishedDescriptors.get(0)));
    Assert.assertEquals(ImmutableList.of("bb"), readSegmentColumn("dim1t", publishedDescriptors.get(0)));
}
Also used: SelectorDimFilter (org.apache.druid.query.filter.SelectorDimFilter), SegmentDescriptor (org.apache.druid.query.SegmentDescriptor), SeekableStreamStartSequenceNumbers (org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers), ExpressionTransform (org.apache.druid.segment.transform.ExpressionTransform), TaskStatus (org.apache.druid.indexer.TaskStatus), TransformSpec (org.apache.druid.segment.transform.TransformSpec), SeekableStreamEndSequenceNumbers (org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers), Test (org.junit.Test), IndexTaskTest (org.apache.druid.indexing.common.task.IndexTaskTest)
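
For orientation, the TransformSpec constructed inline above can be built on its own. The sketch below is not part of the test; it shows the same spec and, in comments, the roughly equivalent JSON form of a transformSpec in an ingestion spec per the Druid documentation.

// A minimal sketch: the same TransformSpec as in testRunWithTransformSpec.
// The filter keeps only rows where dim1 == "b" (hence processed == 1 and
// thrownAway == 4 of the 5 consumed records), and the transform derives
// dim1t = concat(dim1, dim1), which is why the segment reads back "b" and "bb".
TransformSpec spec = new TransformSpec(
    new SelectorDimFilter("dim1", "b", null),
    ImmutableList.of(new ExpressionTransform("dim1t", "concat(dim1,dim1)", ExprMacroTable.nil()))
);
// Roughly equivalent JSON inside an ingestion spec:
// "transformSpec": {
//   "filter": { "type": "selector", "dimension": "dim1", "value": "b" },
//   "transforms": [ { "type": "expression", "name": "dim1t", "expression": "concat(dim1,dim1)" } ]
// }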

Example 12 with TransformSpec

Use of org.apache.druid.segment.transform.TransformSpec in project druid by druid-io.

From class KafkaIndexTaskTest, method testSerde.

@Test
public void testSerde() throws Exception {
    // This is both a serde test and a regression test for https://github.com/apache/druid/issues/7724.
    final KafkaIndexTask task = createTask("taskid", NEW_DATA_SCHEMA.withTransformSpec(new TransformSpec(null, ImmutableList.of(new ExpressionTransform("beep", "nofunc()", ExprMacroTable.nil())))), new KafkaIndexTaskIOConfig(0, "sequence", new SeekableStreamStartSequenceNumbers<>(topic, ImmutableMap.of(), ImmutableSet.of()), new SeekableStreamEndSequenceNumbers<>(topic, ImmutableMap.of()), ImmutableMap.of(), KafkaSupervisorIOConfig.DEFAULT_POLL_TIMEOUT_MILLIS, true, null, null, INPUT_FORMAT));
    final Task task1 = OBJECT_MAPPER.readValue(OBJECT_MAPPER.writeValueAsBytes(task), Task.class);
    Assert.assertEquals(task, task1);
}
Also used: Task (org.apache.druid.indexing.common.task.Task), SeekableStreamStartSequenceNumbers (org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers), ExpressionTransform (org.apache.druid.segment.transform.ExpressionTransform), TransformSpec (org.apache.druid.segment.transform.TransformSpec), SeekableStreamEndSequenceNumbers (org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers), Test (org.junit.Test), IndexTaskTest (org.apache.druid.indexing.common.task.IndexTaskTest)
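
The round trip above exercises the entire task object. A smaller sketch of the same serde check on just the TransformSpec is below; it assumes OBJECT_MAPPER is a Druid-configured mapper (ExpressionTransform needs an injected ExprMacroTable to deserialize), and the deliberately undefined nofunc() survives because the expression travels as a string rather than being evaluated during serde.

// Minimal serde sketch (assumes a Druid-configured ObjectMapper).
TransformSpec spec = new TransformSpec(
    null, // no filter
    ImmutableList.of(new ExpressionTransform("beep", "nofunc()", ExprMacroTable.nil()))
);
TransformSpec roundTripped = OBJECT_MAPPER.readValue(
    OBJECT_MAPPER.writeValueAsBytes(spec),
    TransformSpec.class
);
Assert.assertEquals(spec, roundTripped); // relies on TransformSpec.equals()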

Example 13 with TransformSpec

Use of org.apache.druid.segment.transform.TransformSpec in project druid by druid-io.

From class InputRowSchemasTest, method test_createColumnsFilter_normal.

@Test
public void test_createColumnsFilter_normal() {
    final ColumnsFilter columnsFilter = InputRowSchemas.createColumnsFilter(
        new TimestampSpec("ts", "auto", null),
        new DimensionsSpec(ImmutableList.of(StringDimensionSchema.create("foo"))),
        new TransformSpec(
            new SelectorDimFilter("bar", "x", null),
            ImmutableList.of(new ExpressionTransform("baz", "qux + 3", ExprMacroTable.nil()))
        ),
        new AggregatorFactory[]{new LongSumAggregatorFactory("billy", "bob")}
    );
    Assert.assertEquals(ColumnsFilter.inclusionBased(ImmutableSet.of("ts", "foo", "bar", "qux", "bob")), columnsFilter);
}
Also used: SelectorDimFilter (org.apache.druid.query.filter.SelectorDimFilter), TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec), LongSumAggregatorFactory (org.apache.druid.query.aggregation.LongSumAggregatorFactory), ColumnsFilter (org.apache.druid.data.input.ColumnsFilter), DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec), ExpressionTransform (org.apache.druid.segment.transform.ExpressionTransform), TransformSpec (org.apache.druid.segment.transform.TransformSpec), Test (org.junit.Test), NullHandlingTest (org.apache.druid.common.config.NullHandlingTest)
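
The expected inclusion set is exactly the union of the input columns each component reads; output names never appear in it. A short sketch of that derivation, assuming createColumnsFilter works as the test's expectation implies:

// "ts"  <- input column of the TimestampSpec
// "foo" <- the declared dimension
// "bar" <- input of the selector filter inside the TransformSpec
// "qux" <- input of the expression "qux + 3" ("baz" is its output, so absent)
// "bob" <- input field of the longSum aggregator ("billy" is its output)
ColumnsFilter expected = ColumnsFilter.inclusionBased(
    ImmutableSet.of("ts", "foo", "bar", "qux", "bob")
);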

Example 14 with TransformSpec

Use of org.apache.druid.segment.transform.TransformSpec in project druid by druid-io.

From class InputRowSchemasTest, method test_createColumnsFilter_schemaless.

@Test
public void test_createColumnsFilter_schemaless() {
    final ColumnsFilter columnsFilter = InputRowSchemas.createColumnsFilter(
        new TimestampSpec("ts", "auto", null),
        DimensionsSpec.builder()
                      .setDimensionExclusions(ImmutableList.of("ts", "foo", "bar", "qux", "bob"))
                      .build(),
        new TransformSpec(
            new SelectorDimFilter("bar", "x", null),
            ImmutableList.of(new ExpressionTransform("baz", "qux + 3", ExprMacroTable.nil()))
        ),
        new AggregatorFactory[]{new LongSumAggregatorFactory("billy", "bob")}
    );
    Assert.assertEquals(ColumnsFilter.exclusionBased(ImmutableSet.of("foo")), columnsFilter);
}
Also used: SelectorDimFilter (org.apache.druid.query.filter.SelectorDimFilter), TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec), LongSumAggregatorFactory (org.apache.druid.query.aggregation.LongSumAggregatorFactory), ColumnsFilter (org.apache.druid.data.input.ColumnsFilter), ExpressionTransform (org.apache.druid.segment.transform.ExpressionTransform), TransformSpec (org.apache.druid.segment.transform.TransformSpec), Test (org.junit.Test), NullHandlingTest (org.apache.druid.common.config.NullHandlingTest)
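
The schemaless variant flips the logic: with no declared dimension list, the filter is exclusion-based, and any listed exclusion that some component still reads gets removed from the exclusion set. A sketch of that reasoning, under the same assumption as above:

// Declared exclusions: {ts, foo, bar, qux, bob}.
// "ts", "bar", "qux", and "bob" are still read by the timestamp spec, filter,
// expression, or aggregator, so they are taken back out of the exclusions.
// Only "foo" is read by nothing and stays excluded.
ColumnsFilter expected = ColumnsFilter.exclusionBased(ImmutableSet.of("foo"));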

Example 15 with TransformSpec

Use of org.apache.druid.segment.transform.TransformSpec in project druid by druid-io.

From class IngestSegmentFirehoseFactory, method connect.

@Override
public Firehose connect(InputRowParser inputRowParser, File temporaryDirectory) throws ParseException {
    log.debug("Connecting firehose: dataSource[%s], interval[%s], segmentIds[%s]", dataSource, interval, segmentIds);
    final List<TimelineObjectHolder<String, DataSegment>> timeLineSegments = getTimeline();
    // Download all segments locally.
    // Note: this requires enough local storage space to fit all of the segments, even though
    // IngestSegmentFirehose iterates over the segments in series. We may want to change this
    // to download files lazily, perhaps sharing code with PrefetchableTextFilesFirehoseFactory.
    final SegmentCacheManager segmentCacheManager = segmentCacheManagerFactory.manufacturate(temporaryDirectory);
    Map<DataSegment, File> segmentFileMap = Maps.newLinkedHashMap();
    for (TimelineObjectHolder<String, DataSegment> holder : timeLineSegments) {
        for (PartitionChunk<DataSegment> chunk : holder.getObject()) {
            final DataSegment segment = chunk.getObject();
            segmentFileMap.computeIfAbsent(segment, k -> {
                try {
                    return segmentCacheManager.getSegmentFiles(segment);
                } catch (SegmentLoadingException e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }
    final List<String> dims = ReingestionTimelineUtils.getDimensionsToReingest(
        dimensions,
        inputRowParser.getParseSpec().getDimensionsSpec(),
        timeLineSegments
    );
    final List<String> metricsList = metrics == null
        ? ReingestionTimelineUtils.getUniqueMetrics(timeLineSegments)
        : metrics;
    final List<WindowedStorageAdapter> adapters = Lists.newArrayList(
        Iterables.concat(Iterables.transform(
            timeLineSegments,
            new Function<TimelineObjectHolder<String, DataSegment>, Iterable<WindowedStorageAdapter>>() {

        @Override
        public Iterable<WindowedStorageAdapter> apply(final TimelineObjectHolder<String, DataSegment> holder) {
            return Iterables.transform(holder.getObject(), new Function<PartitionChunk<DataSegment>, WindowedStorageAdapter>() {

                @Override
                public WindowedStorageAdapter apply(final PartitionChunk<DataSegment> input) {
                    final DataSegment segment = input.getObject();
                    try {
                        final File segmentFile = Preconditions.checkNotNull(
                            segmentFileMap.get(segment),
                            "File for segment %s",
                            segment.getId()
                        );
                        return new WindowedStorageAdapter(
                            new QueryableIndexStorageAdapter(indexIO.loadIndex(segmentFile)),
                            holder.getInterval()
                        );
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
    })));
    final TransformSpec transformSpec = TransformSpec.fromInputRowParser(inputRowParser);
    return new IngestSegmentFirehose(adapters, transformSpec, dims, metricsList, dimFilter);
}
Also used: IngestSegmentFirehose (org.apache.druid.segment.realtime.firehose.IngestSegmentFirehose), SegmentLoadingException (org.apache.druid.segment.loading.SegmentLoadingException), QueryableIndexStorageAdapter (org.apache.druid.segment.QueryableIndexStorageAdapter), IOException (java.io.IOException), DataSegment (org.apache.druid.timeline.DataSegment), TransformSpec (org.apache.druid.segment.transform.TransformSpec), SegmentCacheManager (org.apache.druid.segment.loading.SegmentCacheManager), Function (com.google.common.base.Function), TimelineObjectHolder (org.apache.druid.timeline.TimelineObjectHolder), PartitionChunk (org.apache.druid.timeline.partition.PartitionChunk), File (java.io.File), WindowedStorageAdapter (org.apache.druid.segment.realtime.firehose.WindowedStorageAdapter)
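
The last two statements are the only TransformSpec touchpoint here: the spec is recovered from the parser via TransformSpec.fromInputRowParser and handed to the firehose, which applies it row by row during re-ingestion. A minimal sketch of that application, assuming the Transformer API on TransformSpec (toTransformer(), whose transform() returns null for rows the filter drops); rawRow and the surrounding wiring are hypothetical, for illustration only:

// Hypothetical wiring, not taken from IngestSegmentFirehose itself.
Transformer transformer = transformSpec.toTransformer();
InputRow transformed = transformer.transform(rawRow); // null if the filter drops the row
if (transformed != null) {
    // deliver the filtered/augmented row to the caller
}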

Aggregations

TransformSpec (org.apache.druid.segment.transform.TransformSpec): 23
Test (org.junit.Test): 19
ExpressionTransform (org.apache.druid.segment.transform.ExpressionTransform): 16
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec): 13
TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec): 13
SelectorDimFilter (org.apache.druid.query.filter.SelectorDimFilter): 12
GranularitySpec (org.apache.druid.segment.indexing.granularity.GranularitySpec): 10
AggregatorFactory (org.apache.druid.query.aggregation.AggregatorFactory): 9
DataSchema (org.apache.druid.segment.indexing.DataSchema): 9
InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest): 9
Map (java.util.Map): 8
LongSumAggregatorFactory (org.apache.druid.query.aggregation.LongSumAggregatorFactory): 8
UniformGranularitySpec (org.apache.druid.segment.indexing.granularity.UniformGranularitySpec): 8
TaskStatus (org.apache.druid.indexer.TaskStatus): 7
DataSegment (org.apache.druid.timeline.DataSegment): 7
ImmutableMap (com.google.common.collect.ImmutableMap): 6
InputFormat (org.apache.druid.data.input.InputFormat): 6
ArrayList (java.util.ArrayList): 5
SamplerResponse (org.apache.druid.client.indexing.SamplerResponse): 5
SamplerResponseRow (org.apache.druid.client.indexing.SamplerResponse.SamplerResponseRow): 5