
Example 11 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From the class InputSourceSamplerTest, method testMultipleJsonStringInOneBlock.

/**
 * This case tests sampling of multiple JSON lines in one text block.
 * Currently only RecordSupplierInputSource supports this kind of input; see https://github.com/apache/druid/pull/10383 for more information.
 *
 * This test combines an illegal JSON block and a legal JSON block to verify that:
 * 1. no line in the illegal JSON block is parsed
 * 2. the illegal JSON block does not affect the processing of the 2nd record
 * 3. all lines in the legal JSON block are parsed successfully
 */
@Test
public void testMultipleJsonStringInOneBlock() throws IOException {
    if (!ParserType.STR_JSON.equals(parserType) || !useInputFormatApi) {
        return;
    }
    final TimestampSpec timestampSpec = new TimestampSpec("t", null, null);
    final DimensionsSpec dimensionsSpec = new DimensionsSpec(ImmutableList.of(StringDimensionSchema.create("dim1PlusBar")));
    final TransformSpec transformSpec = new TransformSpec(null, ImmutableList.of(new ExpressionTransform("dim1PlusBar", "concat(dim1 + 'bar')", TestExprMacroTable.INSTANCE)));
    final AggregatorFactory[] aggregatorFactories = { new LongSumAggregatorFactory("met1", "met1") };
    final GranularitySpec granularitySpec = new UniformGranularitySpec(Granularities.DAY, Granularities.HOUR, true, null);
    final DataSchema dataSchema = createDataSchema(timestampSpec, dimensionsSpec, aggregatorFactories, granularitySpec, transformSpec);
    List<String> jsonBlockList = ImmutableList.of(
        // include the line which can't be parsed into a JSON object, to form an illegal json block
        String.join("", STR_JSON_ROWS),
        // exclude the last line to form a legal json block
        STR_JSON_ROWS.stream().limit(STR_JSON_ROWS.size() - 1).collect(Collectors.joining()));
    SamplerResponse response = inputSourceSampler.sample(new RecordSupplierInputSource("topicName", new TestRecordSupplier(jsonBlockList), true), createInputFormat(), dataSchema, new SamplerConfig(200, 3000));
    // 
    // the 1st json block contains STR_JSON_ROWS.size() lines, and the 2nd json block contains STR_JSON_ROWS.size() - 1 lines
    // together there should be STR_JSON_ROWS.size() * 2 - 1 lines
    // 
    int illegalRows = STR_JSON_ROWS.size();
    int legalRows = STR_JSON_ROWS.size() - 1;
    Assert.assertEquals(illegalRows + legalRows, response.getNumRowsRead());
    Assert.assertEquals(legalRows, response.getNumRowsIndexed());
    Assert.assertEquals(illegalRows + 2, response.getData().size());
    List<SamplerResponseRow> data = response.getData();
    List<Map<String, Object>> rawColumnList = this.getRawColumns();
    int index = 0;
    // 
    // the first illegalRows rows come from the first json block, which fails to parse
    // 
    String parseExceptionMessage;
    if (useInputFormatApi) {
        parseExceptionMessage = "Timestamp[bad_timestamp] is unparseable! Event: {t=bad_timestamp, dim1=foo, met1=6}";
    } else {
        parseExceptionMessage = "Timestamp[bad_timestamp] is unparseable! Event: {t=bad_timestamp, dim1=foo, met1=6}";
    }
    for (; index < illegalRows; index++) {
        assertEqualsSamplerResponseRow(new SamplerResponseRow(rawColumnList.get(index), null, true, parseExceptionMessage), data.get(index));
    }
    // 
    // the following are the parsed rows from the legal json block
    // 
    assertEqualsSamplerResponseRow(new SamplerResponseRow(rawColumnList.get(0), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1PlusBar", "foobar").put("met1", 11L).build(), null, null), data.get(index++));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(rawColumnList.get(3), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1PlusBar", "foo2bar").put("met1", 4L).build(), null, null), data.get(index));
}
Also used: SamplerResponse(org.apache.druid.client.indexing.SamplerResponse) LongSumAggregatorFactory(org.apache.druid.query.aggregation.LongSumAggregatorFactory) AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory) TransformSpec(org.apache.druid.segment.transform.TransformSpec) DataSchema(org.apache.druid.segment.indexing.DataSchema) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) RecordSupplierInputSource(org.apache.druid.indexing.seekablestream.RecordSupplierInputSource) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) SamplerResponseRow(org.apache.druid.client.indexing.SamplerResponse.SamplerResponseRow) ExpressionTransform(org.apache.druid.segment.transform.ExpressionTransform) Map(java.util.Map) ImmutableMap(com.google.common.collect.ImmutableMap) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)
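
For orientation, each element of jsonBlockList is a single text block that concatenates several JSON objects, and each block is handed to the sampler as one record. Per the assertions above, every line of the block that still contains the bad row is reported as unparseable, while every line of the block without it is indexed. A rough sketch with hypothetical rows follows (the real STR_JSON_ROWS fixture is defined elsewhere in the test class and differs from this):

// Hypothetical rows for illustration only.
String illegalBlock =
        "{\"t\":\"2019-04-22T12:00\",\"dim1\":\"foo\",\"met1\":1}"
        + "{\"t\":\"bad_timestamp\",\"dim1\":\"foo\",\"met1\":6}"; // bad row kept, so the whole block fails to parse
String legalBlock =
        "{\"t\":\"2019-04-22T12:00\",\"dim1\":\"foo\",\"met1\":1}"; // bad row dropped, so every line parses
List<String> jsonBlockList = ImmutableList.of(illegalBlock, legalBlock);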

Example 12 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From the class InputSourceSamplerTest, method testWithTransformsAutoDimensions.

@Test
public void testWithTransformsAutoDimensions() throws IOException {
    final TimestampSpec timestampSpec = new TimestampSpec("t", null, null);
    final DimensionsSpec dimensionsSpec = new DimensionsSpec(null);
    final TransformSpec transformSpec = new TransformSpec(null, ImmutableList.of(new ExpressionTransform("dim1PlusBar", "concat(dim1, 'bar')", TestExprMacroTable.INSTANCE)));
    final AggregatorFactory[] aggregatorFactories = { new LongSumAggregatorFactory("met1", "met1") };
    final GranularitySpec granularitySpec = new UniformGranularitySpec(Granularities.DAY, Granularities.HOUR, true, null);
    final DataSchema dataSchema = createDataSchema(timestampSpec, dimensionsSpec, aggregatorFactories, granularitySpec, transformSpec);
    final InputSource inputSource = createInputSource(getTestRows(), dataSchema);
    final InputFormat inputFormat = createInputFormat();
    SamplerResponse response = inputSourceSampler.sample(inputSource, inputFormat, dataSchema, null);
    Assert.assertEquals(6, response.getNumRowsRead());
    Assert.assertEquals(5, response.getNumRowsIndexed());
    Assert.assertEquals(4, response.getData().size());
    List<SamplerResponseRow> data = response.getData();
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(0), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo").put("dim2", null).put("met1", 6L).build(), null, null), data.get(0));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(3), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo2").put("dim2", null).put("met1", 4L).build(), null, null), data.get(1));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(4), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo").put("dim2", "bar").put("met1", 5L).build(), null, null), data.get(2));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(5), null, true, getUnparseableTimestampString()), data.get(3));
}
Also used: RecordSupplierInputSource(org.apache.druid.indexing.seekablestream.RecordSupplierInputSource) InlineInputSource(org.apache.druid.data.input.impl.InlineInputSource) InputSource(org.apache.druid.data.input.InputSource) SamplerResponse(org.apache.druid.client.indexing.SamplerResponse) LongSumAggregatorFactory(org.apache.druid.query.aggregation.LongSumAggregatorFactory) AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory) TransformSpec(org.apache.druid.segment.transform.TransformSpec) DataSchema(org.apache.druid.segment.indexing.DataSchema) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) JsonInputFormat(org.apache.druid.data.input.impl.JsonInputFormat) InputFormat(org.apache.druid.data.input.InputFormat) CsvInputFormat(org.apache.druid.data.input.impl.CsvInputFormat) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) SamplerResponseRow(org.apache.druid.client.indexing.SamplerResponse.SamplerResponseRow) ExpressionTransform(org.apache.druid.segment.transform.ExpressionTransform) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)

Example 13 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project druid by druid-io.

From the class InputSourceSamplerTest, method testWithNoRollup.

@Test
public void testWithNoRollup() throws IOException {
    final TimestampSpec timestampSpec = new TimestampSpec("t", null, null);
    final DimensionsSpec dimensionsSpec = new DimensionsSpec(null);
    final AggregatorFactory[] aggregatorFactories = { new LongSumAggregatorFactory("met1", "met1") };
    final GranularitySpec granularitySpec = new UniformGranularitySpec(Granularities.DAY, Granularities.HOUR, false, null);
    final DataSchema dataSchema = createDataSchema(timestampSpec, dimensionsSpec, aggregatorFactories, granularitySpec, null);
    final InputSource inputSource = createInputSource(getTestRows(), dataSchema);
    final InputFormat inputFormat = createInputFormat();
    SamplerResponse response = inputSourceSampler.sample(inputSource, inputFormat, dataSchema, null);
    Assert.assertEquals(6, response.getNumRowsRead());
    Assert.assertEquals(5, response.getNumRowsIndexed());
    Assert.assertEquals(6, response.getData().size());
    List<SamplerResponseRow> data = response.getData();
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(0), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo").put("dim2", null).put("met1", 1L).build(), null, null), data.get(0));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(1), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo").put("dim2", null).put("met1", 2L).build(), null, null), data.get(1));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(2), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo").put("dim2", null).put("met1", 3L).build(), null, null), data.get(2));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(3), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo2").put("dim2", null).put("met1", 4L).build(), null, null), data.get(3));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(4), new SamplerTestUtils.MapAllowingNullValuesBuilder<String, Object>().put("__time", 1555934400000L).put("dim1", "foo").put("dim2", "bar").put("met1", 5L).build(), null, null), data.get(4));
    assertEqualsSamplerResponseRow(new SamplerResponseRow(getRawColumns().get(5), null, true, getUnparseableTimestampString()), data.get(5));
}
Also used: RecordSupplierInputSource(org.apache.druid.indexing.seekablestream.RecordSupplierInputSource) InlineInputSource(org.apache.druid.data.input.impl.InlineInputSource) InputSource(org.apache.druid.data.input.InputSource) SamplerResponse(org.apache.druid.client.indexing.SamplerResponse) LongSumAggregatorFactory(org.apache.druid.query.aggregation.LongSumAggregatorFactory) AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory) DataSchema(org.apache.druid.segment.indexing.DataSchema) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) JsonInputFormat(org.apache.druid.data.input.impl.JsonInputFormat) InputFormat(org.apache.druid.data.input.InputFormat) CsvInputFormat(org.apache.druid.data.input.impl.CsvInputFormat) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) SamplerResponseRow(org.apache.druid.client.indexing.SamplerResponse.SamplerResponseRow) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)
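
The only schema difference from the rollup tests above is the third argument of the UniformGranularitySpec constructor. With rollup enabled (Examples 11 and 12), input rows that fall into the same query-granularity bucket and share dimension values are merged and met1 is summed by the LongSumAggregatorFactory, so the six input rows produce only four response rows (three rolled-up rows plus the unparseable one). With rollup disabled here, every parsed input row comes back as its own response row. A minimal side-by-side sketch, assuming the same constants used above and the constructor order (segmentGranularity, queryGranularity, rollup, inputIntervals):

// rollup on: rows sharing a truncated timestamp and identical dimensions are aggregated
GranularitySpec rollupSpec = new UniformGranularitySpec(Granularities.DAY, Granularities.HOUR, true, null);
// rollup off: rows are kept as-is, one indexed row per parsed input row
GranularitySpec noRollupSpec = new UniformGranularitySpec(Granularities.DAY, Granularities.HOUR, false, null);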

Example 14 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project hive by apache.

From the class DruidStorageHandler, method updateKafkaIngestion.

private void updateKafkaIngestion(Table table) {
    final String overlordAddress = HiveConf.getVar(getConf(), HiveConf.ConfVars.HIVE_DRUID_OVERLORD_DEFAULT_ADDRESS);
    final String dataSourceName = Preconditions.checkNotNull(DruidStorageHandlerUtils.getTableProperty(table, Constants.DRUID_DATA_SOURCE), "Druid datasource name is null");
    final String kafkaTopic = Preconditions.checkNotNull(DruidStorageHandlerUtils.getTableProperty(table, DruidConstants.KAFKA_TOPIC), "kafka topic is null");
    final String kafkaServers = Preconditions.checkNotNull(DruidStorageHandlerUtils.getTableProperty(table, DruidConstants.KAFKA_BOOTSTRAP_SERVERS), "kafka connect string is null");
    Properties tableProperties = new Properties();
    tableProperties.putAll(table.getParameters());
    final GranularitySpec granularitySpec = DruidStorageHandlerUtils.getGranularitySpec(getConf(), tableProperties);
    List<FieldSchema> columns = table.getSd().getCols();
    List<String> columnNames = new ArrayList<>(columns.size());
    List<TypeInfo> columnTypes = new ArrayList<>(columns.size());
    for (FieldSchema schema : columns) {
        columnNames.add(schema.getName());
        columnTypes.add(TypeInfoUtils.getTypeInfoFromTypeString(schema.getType()));
    }
    Pair<List<DimensionSchema>, AggregatorFactory[]> dimensionsAndAggregates = DruidStorageHandlerUtils.getDimensionsAndAggregates(columnNames, columnTypes);
    if (!columnNames.contains(DruidConstants.DEFAULT_TIMESTAMP_COLUMN)) {
        throw new IllegalStateException("Timestamp column (' " + DruidConstants.DEFAULT_TIMESTAMP_COLUMN + "') not specified in create table; list of columns is : " + columnNames);
    }
    DimensionsSpec dimensionsSpec = new DimensionsSpec(dimensionsAndAggregates.lhs, null, null);
    String timestampFormat = DruidStorageHandlerUtils.getTableProperty(table, DruidConstants.DRUID_TIMESTAMP_FORMAT);
    String timestampColumnName = DruidStorageHandlerUtils.getTableProperty(table, DruidConstants.DRUID_TIMESTAMP_COLUMN);
    if (timestampColumnName == null) {
        timestampColumnName = DruidConstants.DEFAULT_TIMESTAMP_COLUMN;
    }
    final TimestampSpec timestampSpec = new TimestampSpec(timestampColumnName, timestampFormat, null);
    final InputRowParser inputRowParser = DruidKafkaUtils.getInputRowParser(table, timestampSpec, dimensionsSpec);
    final Map<String, Object> inputParser = JSON_MAPPER.convertValue(inputRowParser, new TypeReference<Map<String, Object>>() {
    });
    final DataSchema dataSchema = new DataSchema(dataSourceName, inputParser, dimensionsAndAggregates.rhs, granularitySpec, null, DruidStorageHandlerUtils.JSON_MAPPER);
    IndexSpec indexSpec = DruidStorageHandlerUtils.getIndexSpec(getConf());
    KafkaSupervisorSpec spec = DruidKafkaUtils.createKafkaSupervisorSpec(table, kafkaTopic, kafkaServers, dataSchema, indexSpec);
    // Fetch existing Ingestion Spec from Druid, if any
    KafkaSupervisorSpec existingSpec = fetchKafkaIngestionSpec(table);
    String targetState = DruidStorageHandlerUtils.getTableProperty(table, DruidConstants.DRUID_KAFKA_INGESTION);
    if (targetState == null) {
        // Case when user has not specified any ingestion state in the current command
        // if a kafka supervisor is already running, keep it running (last known state START); otherwise STOP.
        targetState = existingSpec == null ? "STOP" : "START";
    }
    if ("STOP".equalsIgnoreCase(targetState)) {
        if (existingSpec != null) {
            stopKafkaIngestion(overlordAddress, dataSourceName);
        }
    } else if ("START".equalsIgnoreCase(targetState)) {
        if (existingSpec == null || !existingSpec.equals(spec)) {
            DruidKafkaUtils.updateKafkaIngestionSpec(overlordAddress, spec);
        }
    } else if ("RESET".equalsIgnoreCase(targetState)) {
        // Case when the user requested RESET: if table properties changed, update the spec first, then reset the supervisor.
        if (existingSpec != null && !existingSpec.equals(spec)) {
            DruidKafkaUtils.updateKafkaIngestionSpec(overlordAddress, spec);
        }
        resetKafkaIngestion(overlordAddress, dataSourceName);
    } else {
        throw new IllegalArgumentException(String.format("Invalid value for property [%s], Valid values are [START, STOP, RESET]", DruidConstants.DRUID_KAFKA_INGESTION));
    }
    // We do not want to keep state in two separate places so remove from hive table properties.
    table.getParameters().remove(DruidConstants.DRUID_KAFKA_INGESTION);
}
Also used: IndexSpec(org.apache.druid.segment.IndexSpec) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) ArrayList(java.util.ArrayList) Properties(java.util.Properties) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) DataSchema(org.apache.druid.segment.indexing.DataSchema) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) List(java.util.List) ImmutableList(com.google.common.collect.ImmutableList) InputRowParser(org.apache.druid.data.input.impl.InputRowParser) Map(java.util.Map) KafkaSupervisorSpec(org.apache.hadoop.hive.druid.json.KafkaSupervisorSpec)
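
The START/STOP/RESET handling at the end of updateKafkaIngestion reads like a small decision table. The sketch below is an illustrative distillation of that logic, not code from the Hive project; the enum and method names are invented for clarity.

// Illustrative distillation of the targetState handling above (names are hypothetical).
enum IngestionAction { STOP_SUPERVISOR, SUBMIT_SPEC, RESET_OFFSETS, NO_OP }

static IngestionAction decide(String targetState, boolean supervisorExists, boolean specChanged) {
    if (targetState == null) {
        // no explicit request: keep a running supervisor running, otherwise stay stopped
        targetState = supervisorExists ? "START" : "STOP";
    }
    if ("STOP".equalsIgnoreCase(targetState)) {
        return supervisorExists ? IngestionAction.STOP_SUPERVISOR : IngestionAction.NO_OP;
    } else if ("START".equalsIgnoreCase(targetState)) {
        return (!supervisorExists || specChanged) ? IngestionAction.SUBMIT_SPEC : IngestionAction.NO_OP;
    } else if ("RESET".equalsIgnoreCase(targetState)) {
        // the real code re-submits a changed spec first, then always resets the supervisor
        return IngestionAction.RESET_OFFSETS;
    } else {
        throw new IllegalArgumentException("Valid values are START, STOP, RESET");
    }
}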

Example 15 with GranularitySpec

Use of org.apache.druid.segment.indexing.granularity.GranularitySpec in project hive by apache.

From the class DruidOutputFormat, method getHiveRecordWriter.

@Override
public FileSinkOperator.RecordWriter getHiveRecordWriter(JobConf jc, Path finalOutPath, Class<? extends Writable> valueClass, boolean isCompressed, Properties tableProperties, Progressable progress) throws IOException {
    final int targetNumShardsPerGranularity = Integer.parseUnsignedInt(tableProperties.getProperty(Constants.DRUID_TARGET_SHARDS_PER_GRANULARITY, "0"));
    final int maxPartitionSize = targetNumShardsPerGranularity > 0 ? -1 : HiveConf.getIntVar(jc, HiveConf.ConfVars.HIVE_DRUID_MAX_PARTITION_SIZE);
    // If datasource is in the table properties, it is an INSERT/INSERT OVERWRITE as the datasource
    // name was already persisted. Otherwise, it is a CT/CTAS and we need to get the name from the
    // job properties that are set by configureOutputJobProperties in the DruidStorageHandler
    final String dataSource = tableProperties.getProperty(Constants.DRUID_DATA_SOURCE) == null ? jc.get(Constants.DRUID_DATA_SOURCE) : tableProperties.getProperty(Constants.DRUID_DATA_SOURCE);
    final String segmentDirectory = jc.get(DruidConstants.DRUID_SEGMENT_INTERMEDIATE_DIRECTORY);
    final GranularitySpec granularitySpec = DruidStorageHandlerUtils.getGranularitySpec(jc, tableProperties);
    final String columnNameProperty = tableProperties.getProperty(serdeConstants.LIST_COLUMNS);
    final String columnTypeProperty = tableProperties.getProperty(serdeConstants.LIST_COLUMN_TYPES);
    if (StringUtils.isEmpty(columnNameProperty) || StringUtils.isEmpty(columnTypeProperty)) {
        throw new IllegalStateException(String.format("List of columns names [%s] or columns type [%s] is/are not present", columnNameProperty, columnTypeProperty));
    }
    ArrayList<String> columnNames = Lists.newArrayList(columnNameProperty.split(","));
    if (!columnNames.contains(DruidConstants.DEFAULT_TIMESTAMP_COLUMN)) {
        throw new IllegalStateException("Timestamp column (' " + DruidConstants.DEFAULT_TIMESTAMP_COLUMN + "') not specified in create table; list of columns is : " + tableProperties.getProperty(serdeConstants.LIST_COLUMNS));
    }
    ArrayList<TypeInfo> columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
    Pair<List<DimensionSchema>, AggregatorFactory[]> dimensionsAndAggregates = DruidStorageHandlerUtils.getDimensionsAndAggregates(columnNames, columnTypes);
    final InputRowParser inputRowParser = new MapInputRowParser(new TimeAndDimsParseSpec(new TimestampSpec(DruidConstants.DEFAULT_TIMESTAMP_COLUMN, "auto", null), new DimensionsSpec(dimensionsAndAggregates.lhs, Lists.newArrayList(Constants.DRUID_TIMESTAMP_GRANULARITY_COL_NAME, Constants.DRUID_SHARD_KEY_COL_NAME), null)));
    Map<String, Object> inputParser = DruidStorageHandlerUtils.JSON_MAPPER.convertValue(inputRowParser, new TypeReference<Map<String, Object>>() {
    });
    final DataSchema dataSchema = new DataSchema(Preconditions.checkNotNull(dataSource, "Data source name is null"), inputParser, dimensionsAndAggregates.rhs, granularitySpec, null, DruidStorageHandlerUtils.JSON_MAPPER);
    final String workingPath = jc.get(DruidConstants.DRUID_JOB_WORKING_DIRECTORY);
    final String version = jc.get(DruidConstants.DRUID_SEGMENT_VERSION);
    String basePersistDirectory = HiveConf.getVar(jc, HiveConf.ConfVars.HIVE_DRUID_BASE_PERSIST_DIRECTORY);
    if (Strings.isNullOrEmpty(basePersistDirectory)) {
        basePersistDirectory = System.getProperty("java.io.tmpdir");
    }
    Integer maxRowInMemory = HiveConf.getIntVar(jc, HiveConf.ConfVars.HIVE_DRUID_MAX_ROW_IN_MEMORY);
    IndexSpec indexSpec = DruidStorageHandlerUtils.getIndexSpec(jc);
    RealtimeTuningConfig realtimeTuningConfig = new RealtimeTuningConfig(maxRowInMemory, null, null, null, new File(basePersistDirectory, dataSource), new CustomVersioningPolicy(version), null, null, null, indexSpec, null, true, 0, 0, true, null, 0L, null, null);
    LOG.debug(String.format("running with Data schema [%s] ", dataSchema));
    return new DruidRecordWriter(dataSchema, realtimeTuningConfig, DruidStorageHandlerUtils.createSegmentPusherForDirectory(segmentDirectory, jc), maxPartitionSize, new Path(workingPath, SEGMENTS_DESCRIPTOR_DIR_NAME), finalOutPath.getFileSystem(jc));
}
Also used: IndexSpec(org.apache.druid.segment.IndexSpec) MapInputRowParser(org.apache.druid.data.input.impl.MapInputRowParser) TimeAndDimsParseSpec(org.apache.druid.data.input.impl.TimeAndDimsParseSpec) TimestampSpec(org.apache.druid.data.input.impl.TimestampSpec) ArrayList(java.util.ArrayList) List(java.util.List) Path(org.apache.hadoop.fs.Path) RealtimeTuningConfig(org.apache.druid.segment.indexing.RealtimeTuningConfig) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) DataSchema(org.apache.druid.segment.indexing.DataSchema) GranularitySpec(org.apache.druid.segment.indexing.granularity.GranularitySpec) DimensionsSpec(org.apache.druid.data.input.impl.DimensionsSpec) InputRowParser(org.apache.druid.data.input.impl.InputRowParser) CustomVersioningPolicy(org.apache.druid.segment.realtime.plumber.CustomVersioningPolicy) Map(java.util.Map) File(java.io.File)

Aggregations

GranularitySpec (org.apache.druid.segment.indexing.granularity.GranularitySpec) 32
DataSchema (org.apache.druid.segment.indexing.DataSchema) 21
DimensionsSpec (org.apache.druid.data.input.impl.DimensionsSpec) 15
TimestampSpec (org.apache.druid.data.input.impl.TimestampSpec) 14
Test (org.junit.Test) 14
UniformGranularitySpec (org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) 13
InputFormat (org.apache.druid.data.input.InputFormat) 11
InputRow (org.apache.druid.data.input.InputRow) 11
InputSource (org.apache.druid.data.input.InputSource) 11
AggregatorFactory (org.apache.druid.query.aggregation.AggregatorFactory) 11
LongSumAggregatorFactory (org.apache.druid.query.aggregation.LongSumAggregatorFactory) 11
TransformSpec (org.apache.druid.segment.transform.TransformSpec) 9
InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest) 9
Interval (org.joda.time.Interval) 9
Map (java.util.Map) 8
SamplerResponse (org.apache.druid.client.indexing.SamplerResponse) 8
SamplerResponseRow (org.apache.druid.client.indexing.SamplerResponse.SamplerResponseRow) 8
RecordSupplierInputSource (org.apache.druid.indexing.seekablestream.RecordSupplierInputSource) 8
CsvInputFormat (org.apache.druid.data.input.impl.CsvInputFormat) 7
InlineInputSource (org.apache.druid.data.input.impl.InlineInputSource) 7