
Example 1 with InputRowSchema

Use of org.apache.druid.data.input.InputRowSchema in project druid by druid-io.

From class ProtobufReader, method parseInputRows:

@Override
protected List<InputRow> parseInputRows(DynamicMessage intermediateRow) throws ParseException, JsonProcessingException {
    Map<String, Object> record;
    if (flattenSpec == null || JSONPathSpec.DEFAULT.equals(flattenSpec)) {
        // No flattening requested: map the top-level protobuf fields directly, keyed by JSON name.
        try {
            record = CollectionUtils.mapKeys(intermediateRow.getAllFields(), k -> k.getJsonName());
        } catch (Exception ex) {
            throw new ParseException(null, ex, "Protobuf message could not be parsed");
        }
    } else {
        // Flattening requested: render the message as JSON, then apply the flattenSpec.
        try {
            String json = JsonFormat.printer().print(intermediateRow);
            record = recordFlattener.flatten(OBJECT_MAPPER.readValue(json, JsonNode.class));
        } catch (InvalidProtocolBufferException e) {
            throw new ParseException(null, e, "Protobuf message could not be parsed");
        }
    }
    return Collections.singletonList(MapInputRowParser.parse(inputRowSchema, record));
}
Also used : DynamicMessage(com.google.protobuf.DynamicMessage) ParseException(org.apache.druid.java.util.common.parsers.ParseException) ObjectFlattener(org.apache.druid.java.util.common.parsers.ObjectFlattener) CollectionUtils(org.apache.druid.utils.CollectionUtils) InputRowSchema(org.apache.druid.data.input.InputRowSchema) Iterators(com.google.common.collect.Iterators) ByteBuffer(java.nio.ByteBuffer) JSONPathSpec(org.apache.druid.java.util.common.parsers.JSONPathSpec) Map(java.util.Map) JsonNode(com.fasterxml.jackson.databind.JsonNode) CloseableIterator(org.apache.druid.java.util.common.parsers.CloseableIterator) InvalidProtocolBufferException(com.google.protobuf.InvalidProtocolBufferException) MapInputRowParser(org.apache.druid.data.input.impl.MapInputRowParser) JSONFlattenerMaker(org.apache.druid.java.util.common.parsers.JSONFlattenerMaker) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) JsonProcessingException(com.fasterxml.jackson.core.JsonProcessingException) IOException(java.io.IOException) IOUtils(org.apache.commons.io.IOUtils) InputRow(org.apache.druid.data.input.InputRow) List(java.util.List) IntermediateRowParsingReader(org.apache.druid.data.input.IntermediateRowParsingReader) CloseableIterators(org.apache.druid.java.util.common.CloseableIterators) JsonFormat(com.google.protobuf.util.JsonFormat) ObjectFlatteners(org.apache.druid.java.util.common.parsers.ObjectFlatteners) InputEntity(org.apache.druid.data.input.InputEntity) Collections(java.util.Collections)
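
Which branch parseInputRows takes is decided entirely by the flattenSpec. A minimal sketch of a non-default spec that would route parsing through the JsonFormat flattening branch, using the same JSONPathSpec and JSONPathFieldSpec API shown in Example 5 below (the column name and path are illustrative):

JSONPathSpec flattenSpec = new JSONPathSpec(
    // useFieldDiscovery: also pick up top-level fields not listed explicitly
    true,
    // extract the nested value $.foo.bar into a flat column named "fooBar"
    Collections.singletonList(new JSONPathFieldSpec(JSONPathFieldType.PATH, "fooBar", "$.foo.bar"))
);

Passing null or JSONPathSpec.DEFAULT instead keeps the cheaper direct mapping of top-level field names.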

Example 2 with InputRowSchema

Use of org.apache.druid.data.input.InputRowSchema in project druid by druid-io.

From class ProtobufInputFormatTest, method testParseNestedData:

@Test
public void testParseNestedData() throws Exception {
    // configure parser with desc file
    ProtobufInputFormat protobufInputFormat = new ProtobufInputFormat(flattenSpec, decoder);
    // create binary of proto test event
    DateTime dateTime = new DateTime(2012, 7, 12, 9, 30, ISOChronology.getInstanceUTC());
    ProtoTestEventWrapper.ProtoTestEvent event = ProtobufInputRowParserTest.buildNestedData(dateTime);
    final ByteEntity entity = new ByteEntity(ProtobufInputRowParserTest.toByteBuffer(event));
    InputRow row = protobufInputFormat
        .createReader(new InputRowSchema(timestampSpec, dimensionsSpec, null), entity, null)
        .read()
        .next();
    ProtobufInputRowParserTest.verifyNestedData(row, dateTime);
}
Also used : ByteEntity(org.apache.druid.data.input.impl.ByteEntity) InputRow(org.apache.druid.data.input.InputRow) InputRowSchema(org.apache.druid.data.input.InputRowSchema) DateTime(org.joda.time.DateTime) Test(org.junit.Test)
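
The test pulls one row with read().next() and never closes the iterator, which is acceptable in a short-lived test. As a minimal sketch of the safer pattern for non-test code (assuming schema and entity are built the same way as in the test above), the CloseableIterator returned by read() should be closed:

InputEntityReader reader = protobufInputFormat
    .createReader(new InputRowSchema(timestampSpec, dimensionsSpec, null), entity, null);
// read() throws IOException, so the caller handles or declares it
try (CloseableIterator<InputRow> iterator = reader.read()) {
    while (iterator.hasNext()) {
        InputRow row = iterator.next();
        // process the row here
    }
}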

Example 3 with InputRowSchema

Use of org.apache.druid.data.input.InputRowSchema in project druid by druid-io.

From class ProtobufInputFormatTest, method testParseFlatData:

@Test
public void testParseFlatData() throws Exception {
    // configure parser with desc file
    ProtobufInputFormat protobufInputFormat = new ProtobufInputFormat(null, decoder);
    // create binary of proto test event
    DateTime dateTime = new DateTime(2012, 7, 12, 9, 30, ISOChronology.getInstanceUTC());
    ProtoTestEventWrapper.ProtoTestEvent event = ProtobufInputRowParserTest.buildFlatData(dateTime);
    final ByteEntity entity = new ByteEntity(ProtobufInputRowParserTest.toByteBuffer(event));
    InputRow row = protobufInputFormat
        .createReader(new InputRowSchema(timestampSpec, dimensionsSpec, null), entity, null)
        .read()
        .next();
    ProtobufInputRowParserTest.verifyFlatData(row, dateTime);
}
Also used : ByteEntity(org.apache.druid.data.input.impl.ByteEntity) InputRow(org.apache.druid.data.input.InputRow) InputRowSchema(org.apache.druid.data.input.InputRowSchema) DateTime(org.joda.time.DateTime) Test(org.junit.Test)

Example 4 with InputRowSchema

Use of org.apache.druid.data.input.InputRowSchema in project druid by druid-io.

From class ParquetReaderResourceLeakTest, method testFetchOnReadCleanupAfterExhaustingIterator:

@Test
public void testFetchOnReadCleanupAfterExhaustingIterator() throws IOException {
    InputRowSchema schema = new InputRowSchema(
        new TimestampSpec("timestamp", "iso", null),
        new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("page", "language", "user", "unpatrolled"))),
        ColumnsFilter.all()
    );
    FetchingFileEntity entity = new FetchingFileEntity(new File("example/wiki/wiki.parquet"));
    ParquetInputFormat parquet = new ParquetInputFormat(JSONPathSpec.DEFAULT, false, new Configuration());
    File tempDir = temporaryFolder.newFolder();
    InputEntityReader reader = parquet.createReader(schema, entity, tempDir);
    // nothing has been fetched yet, so the temp dir starts empty
    Assert.assertEquals(0, Objects.requireNonNull(tempDir.list()).length);
    try (CloseableIterator<InputRow> iterator = reader.read()) {
        // reading fetches the file into the temp dir
        Assert.assertTrue(Objects.requireNonNull(tempDir.list()).length > 0);
        while (iterator.hasNext()) {
            iterator.next();
        }
    }
    // closing the iterator must clean up the fetched copy
    Assert.assertEquals(0, Objects.requireNonNull(tempDir.list()).length);
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) InputRow(org.apache.druid.data.input.InputRow) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) InputRowSchema(org.apache.druid.data.input.InputRowSchema) InputEntityReader(org.apache.druid.data.input.InputEntityReader) File(java.io.File) Test(org.junit.Test)
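
ColumnsFilter.all() above asks the reader to materialize every column. A minimal sketch of the narrower alternative, assuming Druid's ColumnsFilter.inclusionBased factory (the column names are illustrative):

InputRowSchema narrowSchema = new InputRowSchema(
    new TimestampSpec("timestamp", "iso", null),
    new DimensionsSpec(DimensionsSpec.getDefaultSchemas(ImmutableList.of("page", "language"))),
    // read only these columns; others are skipped entirely
    ColumnsFilter.inclusionBased(ImmutableSet.of("timestamp", "page", "language"))
);

For a columnar format like Parquet, this lets the reader skip decoding columns the ingestion spec never references.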

Example 5 with InputRowSchema

Use of org.apache.druid.data.input.InputRowSchema in project druid by druid-io.

From class ProtobufReaderTest, method setUp:

@Before
public void setUp() {
    TimestampSpec timestampSpec = new TimestampSpec("timestamp", "iso", null);
    DimensionsSpec dimensionsSpec = new DimensionsSpec(Lists.newArrayList(
        new StringDimensionSchema("event"),
        new StringDimensionSchema("id"),
        new StringDimensionSchema("someOtherId"),
        new StringDimensionSchema("isValid")
    ));
    flattenSpec = new JSONPathSpec(
        true,
        Lists.newArrayList(
            new JSONPathFieldSpec(JSONPathFieldType.ROOT, "eventType", "eventType"),
            new JSONPathFieldSpec(JSONPathFieldType.PATH, "foobar", "$.foo.bar"),
            new JSONPathFieldSpec(JSONPathFieldType.PATH, "bar0", "$.bar[0].bar")
        )
    );
    inputRowSchema = new InputRowSchema(timestampSpec, dimensionsSpec, null);
    inputRowSchemaWithComplexTimestamp = new InputRowSchema(new TimestampSpec("otherTimestamp", "iso", null), dimensionsSpec, null);
    decoder = new FileBasedProtobufBytesDecoder("prototest.desc", "ProtoTestEvent");
}
Also used : TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) JSONPathSpec(org.apache.druid.java.util.common.parsers.JSONPathSpec) JSONPathFieldSpec(org.apache.druid.java.util.common.parsers.JSONPathFieldSpec) InputRowSchema(org.apache.druid.data.input.InputRowSchema) StringDimensionSchema(org.apache.druid.data.input.impl.StringDimensionSchema) Before(org.junit.Before)
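
A minimal sketch of how these fixtures would typically be combined in a test body; the ProtobufReader constructor argument order is assumed from the fields configured above, not verified against the Druid source, and the input bytes are hypothetical:

// someSerializedProtobufMessage: hypothetical ByteBuffer of an encoded ProtoTestEvent
ByteEntity entity = new ByteEntity(someSerializedProtobufMessage);
ProtobufReader reader = new ProtobufReader(inputRowSchema, entity, decoder, flattenSpec);
try (CloseableIterator<InputRow> iterator = reader.read()) {
    InputRow row = iterator.next();
    // assertions against row would go here
}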

Aggregations

InputRowSchema (org.apache.druid.data.input.InputRowSchema): 63
Test (org.junit.Test): 55
InputRow (org.apache.druid.data.input.InputRow): 52
InputEntityReader (org.apache.druid.data.input.InputEntityReader): 39
TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec): 37
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec): 36
JSONPathSpec (org.apache.druid.java.util.common.parsers.JSONPathSpec): 29
JSONPathFieldSpec (org.apache.druid.java.util.common.parsers.JSONPathFieldSpec): 26
InputRowListPlusRawValues (org.apache.druid.data.input.InputRowListPlusRawValues): 24
InputSourceReader (org.apache.druid.data.input.InputSourceReader): 10
ByteEntity (org.apache.druid.data.input.impl.ByteEntity): 9
CsvInputFormat (org.apache.druid.data.input.impl.CsvInputFormat): 9
InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest): 9
File (java.io.File): 7
KafkaRecordEntity (org.apache.druid.data.input.kafka.KafkaRecordEntity): 5
ArrayList (java.util.ArrayList): 4
Collections (java.util.Collections): 4
List (java.util.List): 4
Map (java.util.Map): 4
Nullable (javax.annotation.Nullable): 4