Use of org.apache.druid.data.input.impl.StringInputRowParser in project druid by druid-io.
Class CompactionTaskRunTest, method testRunRegularIndexTaskWithIngestSegmentFirehose:
/**
 * Run a regular index task that is equivalent to the compaction task in {@link #testRunWithDynamicPartitioning()},
 * using {@link IngestSegmentFirehoseFactory}.
 *
 * This is not strictly CompactionTask-related, but it is conceptually similar and requires much of the same
 * setup this test suite already provides.
 *
 * It could be moved to a separate test class if needed.
 */
@Test
public void testRunRegularIndexTaskWithIngestSegmentFirehose() throws Exception
{
  runIndexTask();

  IndexTask indexTask = new IndexTask(
      null,
      null,
      new IndexTask.IndexIngestionSpec(
          new DataSchema(
              "test",
              getObjectMapper().convertValue(
                  new StringInputRowParser(DEFAULT_PARSE_SPEC, null),
                  Map.class
              ),
              new AggregatorFactory[]{new LongSumAggregatorFactory("val", "val")},
              new UniformGranularitySpec(Granularities.HOUR, Granularities.MINUTE, null),
              null,
              getObjectMapper()
          ),
          new IndexTask.IndexIOConfig(
              new IngestSegmentFirehoseFactory(
                  DATA_SOURCE,
                  Intervals.of("2014-01-01/2014-01-02"),
                  null,
                  null,
                  null,
                  null,
                  null,
                  getIndexIO(),
                  coordinatorClient,
                  segmentCacheManagerFactory,
                  RETRY_POLICY_FACTORY
              ),
              false,
              false
          ),
          IndexTaskTest.createTuningConfig(5000000, null, null, Long.MAX_VALUE, null, false, true)
      ),
      null
  );

  // This is a regular index so we need to explicitly add this context to store the CompactionState
  indexTask.addToContext(Tasks.STORE_COMPACTION_STATE_KEY, true);

  final Pair<TaskStatus, List<DataSegment>> resultPair = runTask(indexTask);
  Assert.assertTrue(resultPair.lhs.isSuccess());

  final List<DataSegment> segments = resultPair.rhs;
  Assert.assertEquals(3, segments.size());

  for (int i = 0; i < 3; i++) {
    Assert.assertEquals(
        Intervals.of("2014-01-01T0%d:00:00/2014-01-01T0%d:00:00", i, i + 1),
        segments.get(i).getInterval()
    );
    Assert.assertEquals(
        getDefaultCompactionState(Granularities.HOUR, Granularities.MINUTE, ImmutableList.of()),
        segments.get(i).getLastCompactionState()
    );
    if (lockGranularity == LockGranularity.SEGMENT) {
      Assert.assertEquals(
          new NumberedOverwriteShardSpec(32768, 0, 2, (short) 1, (short) 1),
          segments.get(i).getShardSpec()
      );
    } else {
      Assert.assertEquals(new NumberedShardSpec(0, 1), segments.get(i).getShardSpec());
    }
  }
}
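The StringInputRowParser detail worth noting above is that the legacy DataSchema constructor does not take the parser object itself; it takes the parser serialized into a Map (the "parser map"). The following is a minimal sketch of that conversion, not part of the test: it assumes a plain DefaultObjectMapper and substitutes a hypothetical CSVParseSpec for the test's DEFAULT_PARSE_SPEC, with made-up column names.

// Sketch only. Imports mirror the test above, plus:
// com.fasterxml.jackson.databind.ObjectMapper, org.apache.druid.jackson.DefaultObjectMapper,
// org.apache.druid.data.input.impl.{CSVParseSpec, DimensionsSpec, StringInputRowParser, TimestampSpec}.
ObjectMapper mapper = new DefaultObjectMapper();
StringInputRowParser parser = new StringInputRowParser(
    new CSVParseSpec(
        new TimestampSpec("ts", "auto", null),                                            // hypothetical columns
        new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim"))),
        null,
        ImmutableList.of("ts", "dim", "val"),
        false,
        0
    ),
    null  // null encoding falls back to UTF-8
);
// The legacy DataSchema constructor used above expects this Map form, not the typed parser.
final Map<String, Object> parserMap = mapper.convertValue(parser, Map.class);
// parserMap is roughly {"type": "string", "parseSpec": {"format": "csv", ...}}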
Use of org.apache.druid.data.input.impl.StringInputRowParser in project druid by druid-io.
Class IndexGeneratorJobTest, method constructFeed:
@Parameterized.Parameters(
    name = "useCombiner={0}, partitionType={1}, interval={2}, shardInfoForEachSegment={3}, "
           + "data={4}, inputFormatName={5}, inputRowParser={6}, maxRowsInMemory={7}, "
           + "maxBytesInMemory={8}, aggs={9}, datasourceName={10}, forceExtendableShardSpecs={11}"
)
public static Collection<Object[]> constructFeed()
{
  final Object[][] baseConstructors = new Object[][]{
      {
          false, "single", "2014-10-22T00:00:00Z/P2D",
          new String[][][]{
              {
                  {null, "c.example.com"}, {"c.example.com", "e.example.com"}, {"e.example.com", "g.example.com"},
                  {"g.example.com", "i.example.com"}, {"i.example.com", null}
              },
              {
                  {null, "c.example.com"}, {"c.example.com", "e.example.com"}, {"e.example.com", "g.example.com"},
                  {"g.example.com", "i.example.com"}, {"i.example.com", null}
              }
          },
          ImmutableList.of(
              "2014102200,a.example.com,100", "2014102200,b.exmaple.com,50", "2014102200,c.example.com,200",
              "2014102200,d.example.com,250", "2014102200,e.example.com,123", "2014102200,f.example.com,567",
              "2014102200,g.example.com,11", "2014102200,h.example.com,251", "2014102200,i.example.com,963",
              "2014102200,j.example.com,333", "2014102300,a.example.com,100", "2014102300,b.exmaple.com,50",
              "2014102300,c.example.com,200", "2014102300,d.example.com,250", "2014102300,e.example.com,123",
              "2014102300,f.example.com,567", "2014102300,g.example.com,11", "2014102300,h.example.com,251",
              "2014102300,i.example.com,963", "2014102300,j.example.com,333"
          ),
          null,
          new StringInputRowParser(
              new CSVParseSpec(
                  new TimestampSpec("timestamp", "yyyyMMddHH", null),
                  new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("host"))),
                  null,
                  ImmutableList.of("timestamp", "host", "visited_num"),
                  false,
                  0
              ),
              null
          ),
          null, null, AGGS1, "website"
      },
      {
          false, "hashed", "2014-10-22T00:00:00Z/P1D",
          new Integer[][][]{{{0, 4}, {1, 4}, {2, 4}, {3, 4}}},
          ImmutableList.of(
              "2014102200,a.example.com,100", "2014102201,b.exmaple.com,50", "2014102202,c.example.com,200",
              "2014102203,d.example.com,250", "2014102204,e.example.com,123", "2014102205,f.example.com,567",
              "2014102206,g.example.com,11", "2014102207,h.example.com,251", "2014102208,i.example.com,963",
              "2014102209,j.example.com,333", "2014102210,k.example.com,253", "2014102211,l.example.com,321",
              "2014102212,m.example.com,3125", "2014102213,n.example.com,234", "2014102214,o.example.com,325",
              "2014102215,p.example.com,3533", "2014102216,q.example.com,500", "2014102216,q.example.com,87"
          ),
          null,
          new HadoopyStringInputRowParser(
              new CSVParseSpec(
                  new TimestampSpec("timestamp", "yyyyMMddHH", null),
                  new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("host"))),
                  null,
                  ImmutableList.of("timestamp", "host", "visited_num"),
                  false,
                  0
              )
          ),
          null, null, AGGS1, "website"
      },
      {
          true, "hashed", "2014-10-22T00:00:00Z/P1D",
          new Integer[][][]{{{0, 4}, {1, 4}, {2, 4}, {3, 4}}},
          ImmutableList.of(
              "2014102200,a.example.com,100", "2014102201,b.exmaple.com,50", "2014102202,c.example.com,200",
              "2014102203,d.example.com,250", "2014102204,e.example.com,123", "2014102205,f.example.com,567",
              "2014102206,g.example.com,11", "2014102207,h.example.com,251", "2014102208,i.example.com,963",
              "2014102209,j.example.com,333", "2014102210,k.example.com,253", "2014102211,l.example.com,321",
              "2014102212,m.example.com,3125", "2014102213,n.example.com,234", "2014102214,o.example.com,325",
              "2014102215,p.example.com,3533", "2014102216,q.example.com,500", "2014102216,q.example.com,87"
          ),
          null,
          new StringInputRowParser(
              new CSVParseSpec(
                  new TimestampSpec("timestamp", "yyyyMMddHH", null),
                  new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("host"))),
                  null,
                  ImmutableList.of("timestamp", "host", "visited_num"),
                  false,
                  0
              ),
              null
          ),
          null, null, AGGS1, "website"
      },
      {
          false, "single", "2014-10-22T00:00:00Z/P2D",
          new String[][][]{
              {
                  {null, "c.example.com"}, {"c.example.com", "e.example.com"}, {"e.example.com", "g.example.com"},
                  {"g.example.com", "i.example.com"}, {"i.example.com", null}
              },
              {
                  {null, "c.example.com"}, {"c.example.com", "e.example.com"}, {"e.example.com", "g.example.com"},
                  {"g.example.com", "i.example.com"}, {"i.example.com", null}
              }
          },
          ImmutableList.of(
              "2014102200,a.example.com,100", "2014102200,b.exmaple.com,50", "2014102200,c.example.com,200",
              "2014102200,d.example.com,250", "2014102200,e.example.com,123", "2014102200,f.example.com,567",
              "2014102200,g.example.com,11", "2014102200,h.example.com,251", "2014102200,i.example.com,963",
              "2014102200,j.example.com,333", "2014102300,a.example.com,100", "2014102300,b.exmaple.com,50",
              "2014102300,c.example.com,200", "2014102300,d.example.com,250", "2014102300,e.example.com,123",
              "2014102300,f.example.com,567", "2014102300,g.example.com,11", "2014102300,h.example.com,251",
              "2014102300,i.example.com,963", "2014102300,j.example.com,333"
          ),
          SequenceFileInputFormat.class.getName(),
          new HadoopyStringInputRowParser(
              new CSVParseSpec(
                  new TimestampSpec("timestamp", "yyyyMMddHH", null),
                  new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("host"))),
                  null,
                  ImmutableList.of("timestamp", "host", "visited_num"),
                  false,
                  0
              )
          ),
          null, null, AGGS1, "website"
      },
      {
          // Tests that new indexes inherit the dimension order from previous index
          false, "hashed", "2014-10-22T00:00:00Z/P1D",
          new Integer[][][]{{
              // use a single partition, dimension order inheritance is not supported across partitions
              {0, 1}
          }},
          ImmutableList.of(
              "{\"ts\":\"2014102200\", \"X\":\"x.example.com\"}",
              "{\"ts\":\"2014102201\", \"Y\":\"y.example.com\"}",
              "{\"ts\":\"2014102202\", \"M\":\"m.example.com\"}",
              "{\"ts\":\"2014102203\", \"Q\":\"q.example.com\"}",
              "{\"ts\":\"2014102204\", \"B\":\"b.example.com\"}",
              "{\"ts\":\"2014102205\", \"F\":\"f.example.com\"}"
          ),
          null,
          new StringInputRowParser(
              new JSONParseSpec(new TimestampSpec("ts", "yyyyMMddHH", null), DimensionsSpec.EMPTY, null, null, null),
              null
          ),
          // force 1 row max per index for easier testing
          1,
          null, AGGS2, "inherit_dims"
      },
      {
          // Tests that pre-specified dim order is maintained across indexes.
          false, "hashed", "2014-10-22T00:00:00Z/P1D",
          new Integer[][][]{{{0, 1}}},
          ImmutableList.of(
              "{\"ts\":\"2014102200\", \"X\":\"x.example.com\"}",
              "{\"ts\":\"2014102201\", \"Y\":\"y.example.com\"}",
              "{\"ts\":\"2014102202\", \"M\":\"m.example.com\"}",
              "{\"ts\":\"2014102203\", \"Q\":\"q.example.com\"}",
              "{\"ts\":\"2014102204\", \"B\":\"b.example.com\"}",
              "{\"ts\":\"2014102205\", \"F\":\"f.example.com\"}"
          ),
          null,
          new StringInputRowParser(
              new JSONParseSpec(
                  new TimestampSpec("ts", "yyyyMMddHH", null),
                  new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("B", "F", "M", "Q", "X", "Y"))),
                  null,
                  null,
                  null
              ),
              null
          ),
          // force 1 row max per index for easier testing
          1,
          null, AGGS2, "inherit_dims2"
      }
  };

  // Run each baseConstructor with/without forceExtendableShardSpecs.
  final List<Object[]> constructors = new ArrayList<>();
  for (Object[] baseConstructor : baseConstructors) {
    for (int forceExtendableShardSpecs = 0; forceExtendableShardSpecs < 2; forceExtendableShardSpecs++) {
      final Object[] fullConstructor = new Object[baseConstructor.length + 1];
      System.arraycopy(baseConstructor, 0, fullConstructor, 0, baseConstructor.length);
      fullConstructor[baseConstructor.length] = forceExtendableShardSpecs == 0;
      constructors.add(fullConstructor);
    }
  }
  return constructors;
}
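For reference, here is a minimal standalone sketch, not part of the parameterized test, of what the CSVParseSpec used in these parameters does with one of the sample rows. It assumes the same imports as the test above and StringInputRowParser's single-record parse(String) helper; the Hadoop job itself goes through the Hadoop input format and parseBatch code paths instead.

StringInputRowParser parser = new StringInputRowParser(
    new CSVParseSpec(
        new TimestampSpec("timestamp", "yyyyMMddHH", null),
        new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("host"))),
        null,
        ImmutableList.of("timestamp", "host", "visited_num"),
        false,
        0
    ),
    null
);

InputRow row = parser.parse("2014102200,a.example.com,100");
// row.getTimestamp()       -> 2014-10-22T00:00:00.000Z (assuming UTC)
// row.getDimension("host") -> [a.example.com]
// visited_num stays available on the row for the LongSumAggregatorFactory in AGGS1 to sum.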
Use of org.apache.druid.data.input.impl.StringInputRowParser in project druid by druid-io.
Class JobHelperTest, method setup:
@Before
public void setup() throws Exception
{
  tmpDir = temporaryFolder.newFile();
  dataFile = temporaryFolder.newFile();
  config = new HadoopDruidIndexerConfig(
      new HadoopIngestionSpec(
          new DataSchema(
              "website",
              HadoopDruidIndexerConfig.JSON_MAPPER.convertValue(
                  new StringInputRowParser(
                      new CSVParseSpec(
                          new TimestampSpec("timestamp", "yyyyMMddHH", null),
                          new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("host"))),
                          null,
                          ImmutableList.of("timestamp", "host", "visited_num"),
                          false,
                          0
                      ),
                      null
                  ),
                  Map.class
              ),
              new AggregatorFactory[]{new LongSumAggregatorFactory("visited_num", "visited_num")},
              new UniformGranularitySpec(Granularities.DAY, Granularities.NONE, ImmutableList.of(this.interval)),
              null,
              HadoopDruidIndexerConfig.JSON_MAPPER
          ),
          new HadoopIOConfig(
              ImmutableMap.of("paths", dataFile.getCanonicalPath(), "type", "static"),
              null,
              tmpDir.getCanonicalPath()
          ),
          new HadoopTuningConfig(
              tmpDir.getCanonicalPath(),
              null, null, null, null, null, null, null, null,
              false, false, false, false,
              // Map of job properties
              ImmutableMap.of(
                  "fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
                  "fs.s3.awsAccessKeyId", "THISISMYACCESSKEY"
              ),
              false, false,
              null, null,
              false, false,
              null, null, null, null, null
          )
      )
  );
  HadoopDruidIndexerConfig.PROPERTIES.setProperty(VALID_DRUID_PROP, "true");
  HadoopDruidIndexerConfig.PROPERTIES.setProperty(VALID_HADOOP_PREFIX + VALID_HADOOP_PROP, "true");
  HadoopDruidIndexerConfig.PROPERTIES.setProperty(INVALID_PROP, "true");
}
Use of org.apache.druid.data.input.impl.StringInputRowParser in project druid by druid-io.
Class StreamChunkParserTest, method testBothParserAndInputFormatParseProperlyUsingInputFormat:
@Test
public void testBothParserAndInputFormatParseProperlyUsingInputFormat() throws IOException
{
  final InputRowParser<ByteBuffer> parser = new StringInputRowParser(
      new JSONParseSpec(TIMESTAMP_SPEC, DimensionsSpec.EMPTY, JSONPathSpec.DEFAULT, Collections.emptyMap(), false),
      StringUtils.UTF8_STRING
  );
  final TrackingJsonInputFormat inputFormat = new TrackingJsonInputFormat(JSONPathSpec.DEFAULT, Collections.emptyMap());
  final StreamChunkParser<ByteEntity> chunkParser = new StreamChunkParser<>(
      parser,
      inputFormat,
      new InputRowSchema(TIMESTAMP_SPEC, DimensionsSpec.EMPTY, ColumnsFilter.all()),
      TransformSpec.NONE,
      temporaryFolder.newFolder(),
      row -> true,
      rowIngestionMeters,
      parseExceptionHandler
  );
  parseAndAssertResult(chunkParser);
  Assert.assertTrue(inputFormat.props.used);
}
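Because StreamChunkParser receives raw byte chunks from the stream, the StringInputRowParser above is exercised through its ByteBufferInputRowParser side (parseBatch) rather than a single-line parse(String). The following is a rough sketch of that path in isolation; it assumes the same imports as the test, and the timestamp column name and sample payload are made up stand-ins for TIMESTAMP_SPEC and the test data.

TimestampSpec timestampSpec = new TimestampSpec("t", "auto", null);  // stand-in for TIMESTAMP_SPEC
InputRowParser<ByteBuffer> parser = new StringInputRowParser(
    new JSONParseSpec(timestampSpec, DimensionsSpec.EMPTY, JSONPathSpec.DEFAULT, Collections.emptyMap(), false),
    StringUtils.UTF8_STRING
);

byte[] json = StringUtils.toUtf8("{\"t\":\"2021-01-01T00:00:00Z\",\"dim\":\"a\"}");
// parseBatch decodes the buffer with the configured encoding and returns the parsed rows.
List<InputRow> rows = parser.parseBatch(ByteBuffer.wrap(json));
// rows.get(0).getDimension("dim") -> [a]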
Use of org.apache.druid.data.input.impl.StringInputRowParser in project druid by druid-io.
Class RealtimePlumberSchoolTest, method setUp:
@Before
public void setUp() throws Exception
{
  tmpDir = FileUtils.createTempDir();

  ObjectMapper jsonMapper = new DefaultObjectMapper();

  schema = new DataSchema(
      "test",
      jsonMapper.convertValue(
          new StringInputRowParser(
              new JSONParseSpec(new TimestampSpec("timestamp", "auto", null), DimensionsSpec.EMPTY, null, null, null),
              null
          ),
          Map.class
      ),
      new AggregatorFactory[]{new CountAggregatorFactory("rows")},
      new UniformGranularitySpec(Granularities.HOUR, Granularities.NONE, null),
      null,
      jsonMapper
  );

  schema2 = new DataSchema(
      "test",
      jsonMapper.convertValue(
          new StringInputRowParser(
              new JSONParseSpec(new TimestampSpec("timestamp", "auto", null), DimensionsSpec.EMPTY, null, null, null),
              null
          ),
          Map.class
      ),
      new AggregatorFactory[]{new CountAggregatorFactory("rows")},
      new UniformGranularitySpec(Granularities.YEAR, Granularities.NONE, null),
      null,
      jsonMapper
  );

  announcer = EasyMock.createMock(DataSegmentAnnouncer.class);
  announcer.announceSegment(EasyMock.anyObject());
  EasyMock.expectLastCall().anyTimes();

  segmentPublisher = EasyMock.createNiceMock(SegmentPublisher.class);
  dataSegmentPusher = EasyMock.createNiceMock(DataSegmentPusher.class);
  handoffNotifierFactory = EasyMock.createNiceMock(SegmentHandoffNotifierFactory.class);
  handoffNotifier = EasyMock.createNiceMock(SegmentHandoffNotifier.class);
  EasyMock.expect(handoffNotifierFactory.createSegmentHandoffNotifier(EasyMock.anyString()))
          .andReturn(handoffNotifier)
          .anyTimes();
  EasyMock.expect(
      handoffNotifier.registerSegmentHandoffCallback(EasyMock.anyObject(), EasyMock.anyObject(), EasyMock.anyObject())
  ).andReturn(true).anyTimes();

  emitter = EasyMock.createMock(ServiceEmitter.class);

  EasyMock.replay(announcer, segmentPublisher, dataSegmentPusher, handoffNotifierFactory, handoffNotifier, emitter);

  tuningConfig = new RealtimeTuningConfig(
      null, 1, null, null, null, null, null,
      new IntervalStartVersioningPolicy(),
      rejectionPolicy,
      null, null, null, null,
      0, 0, false,
      null, null, null, null
  );

  realtimePlumberSchool = new RealtimePlumberSchool(
      emitter,
      new DefaultQueryRunnerFactoryConglomerate(new HashMap<>()),
      dataSegmentPusher,
      announcer,
      segmentPublisher,
      handoffNotifierFactory,
      DirectQueryProcessingPool.INSTANCE,
      NoopJoinableFactory.INSTANCE,
      TestHelper.getTestIndexMergerV9(segmentWriteOutMediumFactory),
      TestHelper.getTestIndexIO(),
      MapCache.create(0),
      FireDepartmentTest.NO_CACHE_CONFIG,
      new CachePopulatorStats(),
      TestHelper.makeJsonMapper()
  );

  metrics = new FireDepartmentMetrics();
  plumber = (RealtimePlumber) realtimePlumberSchool.findPlumber(schema, tuningConfig, metrics);
}
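The JSONParseSpec in both schemas uses the "auto" timestamp format and an empty DimensionsSpec, so the rows this plumber test ingests are schemaless JSON with either ISO-8601 or epoch-millis timestamps. A short illustrative sketch, reusing the imports above; the event payload is made up.

StringInputRowParser parser = new StringInputRowParser(
    new JSONParseSpec(new TimestampSpec("timestamp", "auto", null), DimensionsSpec.EMPTY, null, null, null),
    null
);
// "auto" accepts ISO strings as well as numeric epoch values in the timestamp field,
// and the empty DimensionsSpec lets remaining JSON fields be discovered as dimensions.
InputRow row = parser.parse("{\"timestamp\":\"2014-01-01T01:02:03Z\",\"dim\":\"foo\"}");
// row.getTimestamp()      -> 2014-01-01T01:02:03.000Z
// row.getDimension("dim") -> [foo]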