Example 1 with HadoopIOConfig

Use of org.apache.druid.indexer.HadoopIOConfig in project druid by druid-io.

From the class StaticPathSpecTest, method testAddingPaths.

@Test
public void testAddingPaths() throws Exception {
    Job job = new Job();
    StaticPathSpec pathSpec = new StaticPathSpec("/a/c,/a/b/{c,d}", null);
    DataSchema schema = new DataSchema("ds", null, new AggregatorFactory[0], null, null, jsonMapper);
    HadoopIOConfig io = new HadoopIOConfig(null, null, null);
    pathSpec.addInputPaths(new HadoopDruidIndexerConfig(new HadoopIngestionSpec(schema, io, null)), job);
    String paths = job.getConfiguration().get(MultipleInputs.DIR_FORMATS);
    String formatter = TextInputFormat.class.getName();
    String[] expected = { "/a/c;" + formatter, "/a/b/c;" + formatter, "/a/b/d;" + formatter };
    Assert.assertArrayEquals(expected, paths.split(","));
}
Also used: DataSchema(org.apache.druid.segment.indexing.DataSchema) HadoopIngestionSpec(org.apache.druid.indexer.HadoopIngestionSpec) Job(org.apache.hadoop.mapreduce.Job) HadoopDruidIndexerConfig(org.apache.druid.indexer.HadoopDruidIndexerConfig) HadoopIOConfig(org.apache.druid.indexer.HadoopIOConfig) Test(org.junit.Test)
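
The test above feeds the paths to a StaticPathSpec object directly. For comparison, here is a minimal sketch that hands the same comma-separated, glob-style paths to HadoopIOConfig as a raw inputSpec map, mirroring the ImmutableMap form used in Example 5; the "type": "static" key is an assumption about how the map would select StaticPathSpec and is not taken from the test.

import com.google.common.collect.ImmutableMap;
import java.util.Map;
import org.apache.druid.indexer.HadoopIOConfig;

public class StaticInputSpecSketch {
    public static void main(String[] args) {
        // Same paths as the test, passed as the raw inputSpec map that HadoopIOConfig
        // keeps verbatim ("type": "static" is assumed, not shown in the test above).
        Map<String, Object> inputSpec = ImmutableMap.of(
            "type", "static",
            "paths", "/a/c,/a/b/{c,d}"
        );
        HadoopIOConfig ioConfig = new HadoopIOConfig(inputSpec, null, null);
        System.out.println(inputSpec);
    }
}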

Example 2 with HadoopIOConfig

Use of org.apache.druid.indexer.HadoopIOConfig in project druid by druid-io.

From the class GranularityPathSpecTest, method testIntervalTrimming.

@Test
public void testIntervalTrimming() throws Exception {
    UserGroupInformation.setLoginUser(UserGroupInformation.createUserForTesting("test", new String[] { "testGroup" }));
    HadoopIngestionSpec spec = new HadoopIngestionSpec(
        new DataSchema(
            "foo",
            null,
            new AggregatorFactory[0],
            new UniformGranularitySpec(
                Granularities.DAY,
                Granularities.ALL,
                ImmutableList.of(Intervals.of("2015-01-01T11Z/2015-01-02T05Z"))
            ),
            null,
            jsonMapper
        ),
        new HadoopIOConfig(null, null, null),
        DEFAULT_TUNING_CONFIG
    );
    granularityPathSpec.setDataGranularity(Granularities.HOUR);
    granularityPathSpec.setPathFormat("yyyy/MM/dd/HH");
    granularityPathSpec.setFilePattern(".*");
    granularityPathSpec.setInputFormat(TextInputFormat.class);
    Job job = Job.getInstance();
    String formatStr = "file:%s/%s;org.apache.hadoop.mapreduce.lib.input.TextInputFormat";
    createFile(
        testFolder,
        "test/2015/01/01/00/file1",
        "test/2015/01/01/10/file2",
        "test/2015/01/01/18/file3",
        "test/2015/01/02/00/file1",
        "test/2015/01/02/03/file2",
        "test/2015/01/02/05/file3",
        "test/2015/01/02/07/file4",
        "test/2015/01/02/09/file5"
    );
    granularityPathSpec.setInputPath(testFolder.getRoot().getPath() + "/test");
    granularityPathSpec.addInputPaths(HadoopDruidIndexerConfig.fromSpec(spec), job);
    String actual = job.getConfiguration().get("mapreduce.input.multipleinputs.dir.formats");
    String expected = Joiner.on(",").join(Lists.newArrayList(StringUtils.format(formatStr, testFolder.getRoot(), "test/2015/01/01/18/file3"), StringUtils.format(formatStr, testFolder.getRoot(), "test/2015/01/02/00/file1"), StringUtils.format(formatStr, testFolder.getRoot(), "test/2015/01/02/03/file2")));
    Assert.assertEquals("Did not find expected input paths", expected, actual);
}
Also used: HadoopIngestionSpec(org.apache.druid.indexer.HadoopIngestionSpec) DataSchema(org.apache.druid.segment.indexing.DataSchema) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory) Job(org.apache.hadoop.mapreduce.Job) HadoopIOConfig(org.apache.druid.indexer.HadoopIOConfig) Test(org.junit.Test)
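
The trimming asserted here follows from how the hourly data granularity slices the ingestion interval into directory buckets before the file pattern is applied. A small sketch of that slicing, under the assumption that Granularities.HOUR.getIterable(...) enumerates the buckets the same way the path spec walks them:

import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.joda.time.Interval;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class HourlyBucketSketch {
    public static void main(String[] args) {
        // The ingestion interval from the test; only hourly buckets inside it are scanned.
        Interval interval = Intervals.of("2015-01-01T11Z/2015-01-02T05Z");
        DateTimeFormatter pathFormat = DateTimeFormat.forPattern("yyyy/MM/dd/HH").withZoneUTC();
        for (Interval bucket : Granularities.HOUR.getIterable(interval)) {
            // Prints test/2015/01/01/11 through test/2015/01/02/04, which is why file3 under
            // 2015/01/01/18 and the first two files of 2015/01/02 survive while the rest are trimmed.
            System.out.println("test/" + pathFormat.print(bucket.getStart()));
        }
    }
}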

Example 3 with HadoopIOConfig

Use of org.apache.druid.indexer.HadoopIOConfig in project druid by druid-io.

From the class MaterializedViewSupervisorSpec, method createTask.

public HadoopIndexTask createTask(Interval interval, String version, List<DataSegment> segments) {
    String taskId = StringUtils.format("%s_%s_%s", TASK_PREFIX, dataSourceName, DateTimes.nowUtc());
    // generate parser
    Map<String, Object> parseSpec = new HashMap<>();
    parseSpec.put("format", "timeAndDims");
    parseSpec.put("dimensionsSpec", dimensionsSpec);
    Map<String, Object> parser = new HashMap<>();
    parser.put("type", "map");
    parser.put("parseSpec", parseSpec);
    // generate HadoopTuningConfig
    HadoopTuningConfig tuningConfigForTask = new HadoopTuningConfig(
        tuningConfig.getWorkingPath(),
        version,
        tuningConfig.getPartitionsSpec(),
        tuningConfig.getShardSpecs(),
        tuningConfig.getIndexSpec(),
        tuningConfig.getIndexSpecForIntermediatePersists(),
        tuningConfig.getAppendableIndexSpec(),
        tuningConfig.getMaxRowsInMemory(),
        tuningConfig.getMaxBytesInMemory(),
        tuningConfig.isLeaveIntermediate(),
        tuningConfig.isCleanupOnFailure(),
        tuningConfig.isOverwriteFiles(),
        tuningConfig.isIgnoreInvalidRows(),
        tuningConfig.getJobProperties(),
        tuningConfig.isCombineText(),
        tuningConfig.getUseCombiner(),
        tuningConfig.getMaxRowsInMemory(),
        tuningConfig.getNumBackgroundPersistThreads(),
        tuningConfig.isForceExtendableShardSpecs(),
        true,
        tuningConfig.getUserAllowedHadoopPrefix(),
        tuningConfig.isLogParseExceptions(),
        tuningConfig.getMaxParseExceptions(),
        tuningConfig.isUseYarnRMJobStatusFallback(),
        tuningConfig.getAwaitSegmentAvailabilityTimeoutMillis()
    );
    // generate granularity
    ArbitraryGranularitySpec granularitySpec = new ArbitraryGranularitySpec(Granularities.NONE, ImmutableList.of(interval));
    // generate DataSchema
    DataSchema dataSchema = new DataSchema(dataSourceName, parser, aggregators, granularitySpec, TransformSpec.NONE, objectMapper);
    // generate DatasourceIngestionSpec
    DatasourceIngestionSpec datasourceIngestionSpec = new DatasourceIngestionSpec(baseDataSource, null, ImmutableList.of(interval), segments, null, null, null, false, null);
    // generate HadoopIOConfig
    Map<String, Object> inputSpec = new HashMap<>();
    inputSpec.put("type", "dataSource");
    inputSpec.put("ingestionSpec", datasourceIngestionSpec);
    HadoopIOConfig hadoopIOConfig = new HadoopIOConfig(inputSpec, null, null);
    // generate HadoopIngestionSpec
    HadoopIngestionSpec spec = new HadoopIngestionSpec(dataSchema, hadoopIOConfig, tuningConfigForTask);
    // generate HadoopIndexTask
    HadoopIndexTask task = new HadoopIndexTask(taskId, spec, hadoopCoordinates, hadoopDependencyCoordinates, classpathPrefix, objectMapper, context, authorizerMapper, chatHandlerProvider);
    return task;
}
Also used: DataSchema(org.apache.druid.segment.indexing.DataSchema) DatasourceIngestionSpec(org.apache.druid.indexer.hadoop.DatasourceIngestionSpec) HadoopIngestionSpec(org.apache.druid.indexer.HadoopIngestionSpec) HashMap(java.util.HashMap) HadoopTuningConfig(org.apache.druid.indexer.HadoopTuningConfig) ArbitraryGranularitySpec(org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec) HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask) HadoopIOConfig(org.apache.druid.indexer.HadoopIOConfig)
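
Taken on its own, the ioConfig wiring for the "dataSource" input type looks like the sketch below. The datasource name and interval are placeholder values, and passing null instead of a concrete segment list is an assumption (createTask above supplies the real list); the other constructor arguments mirror the call above positionally.

import com.google.common.collect.ImmutableList;
import java.util.HashMap;
import java.util.Map;
import org.apache.druid.indexer.HadoopIOConfig;
import org.apache.druid.indexer.hadoop.DatasourceIngestionSpec;
import org.apache.druid.java.util.common.Intervals;

public class DatasourceInputSpecSketch {
    public static void main(String[] args) {
        // Placeholder datasource and interval; the supervisor passes its own baseDataSource,
        // interval, and discovered segments in createTask above.
        DatasourceIngestionSpec ingestionSpec = new DatasourceIngestionSpec(
            "base",
            null,
            ImmutableList.of(Intervals.of("2015-01-01/2015-01-02")),
            // null segments is an assumption; createTask passes the concrete list instead.
            null,
            null,
            null,
            null,
            false,
            null
        );
        Map<String, Object> inputSpec = new HashMap<>();
        inputSpec.put("type", "dataSource");
        inputSpec.put("ingestionSpec", ingestionSpec);
        HadoopIOConfig hadoopIOConfig = new HadoopIOConfig(inputSpec, null, null);
        System.out.println(inputSpec);
    }
}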

Example 4 with HadoopIOConfig

Use of org.apache.druid.indexer.HadoopIOConfig in project druid by druid-io.

From the class MaterializedViewSupervisorTest, method testCheckSegmentsAndSubmitTasks.

@Test
public void testCheckSegmentsAndSubmitTasks() throws IOException {
    Set<DataSegment> baseSegments = Sets.newHashSet(
        new DataSegment(
            "base",
            Intervals.of("2015-01-02T00Z/2015-01-03T00Z"),
            "2015-01-03",
            ImmutableMap.of(),
            ImmutableList.of("dim1", "dim2"),
            ImmutableList.of("m1"),
            new HashBasedNumberedShardSpec(0, 1, 0, 1, null, null, null),
            9,
            1024
        )
    );
    indexerMetadataStorageCoordinator.announceHistoricalSegments(baseSegments);
    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
    EasyMock.expect(taskStorage.getActiveTasks()).andReturn(ImmutableList.of()).anyTimes();
    EasyMock.expect(taskStorage.getStatus("test_task1")).andReturn(Optional.of(TaskStatus.failure("test_task1", "Dummy task status failure err message"))).anyTimes();
    EasyMock.expect(taskStorage.getStatus("test_task2")).andReturn(Optional.of(TaskStatus.running("test_task2"))).anyTimes();
    EasyMock.replay(taskStorage);
    Pair<Map<Interval, HadoopIndexTask>, Map<Interval, String>> runningTasksPair = supervisor.getRunningTasks();
    Map<Interval, HadoopIndexTask> runningTasks = runningTasksPair.lhs;
    Map<Interval, String> runningVersion = runningTasksPair.rhs;
    DataSchema dataSchema = new DataSchema("test_datasource", null, null, null, TransformSpec.NONE, objectMapper);
    HadoopIOConfig hadoopIOConfig = new HadoopIOConfig(new HashMap<>(), null, null);
    HadoopIngestionSpec spec = new HadoopIngestionSpec(dataSchema, hadoopIOConfig, null);
    HadoopIndexTask task1 = new HadoopIndexTask("test_task1", spec, null, null, null, objectMapper, null, null, null);
    runningTasks.put(Intervals.of("2015-01-01T00Z/2015-01-02T00Z"), task1);
    runningVersion.put(Intervals.of("2015-01-01T00Z/2015-01-02T00Z"), "test_version1");
    HadoopIndexTask task2 = new HadoopIndexTask("test_task2", spec, null, null, null, objectMapper, null, null, null);
    runningTasks.put(Intervals.of("2015-01-02T00Z/2015-01-03T00Z"), task2);
    runningVersion.put(Intervals.of("2015-01-02T00Z/2015-01-03T00Z"), "test_version2");
    supervisor.checkSegmentsAndSubmitTasks();
    Map<Interval, HadoopIndexTask> expectedRunningTasks = new HashMap<>();
    Map<Interval, String> expectedRunningVersion = new HashMap<>();
    expectedRunningTasks.put(Intervals.of("2015-01-02T00Z/2015-01-03T00Z"), task2);
    expectedRunningVersion.put(Intervals.of("2015-01-02T00Z/2015-01-03T00Z"), "test_version2");
    Assert.assertEquals(expectedRunningTasks, runningTasks);
    Assert.assertEquals(expectedRunningVersion, runningVersion);
}
Also used: HashBasedNumberedShardSpec(org.apache.druid.timeline.partition.HashBasedNumberedShardSpec) HadoopIngestionSpec(org.apache.druid.indexer.HadoopIngestionSpec) HashMap(java.util.HashMap) DataSegment(org.apache.druid.timeline.DataSegment) HadoopIOConfig(org.apache.druid.indexer.HadoopIOConfig) DataSchema(org.apache.druid.segment.indexing.DataSchema) Map(java.util.Map) ImmutableMap(com.google.common.collect.ImmutableMap) SortedMap(java.util.SortedMap) HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask) Interval(org.joda.time.Interval) Test(org.junit.Test)
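
What the final assertions check, in isolation: after checkSegmentsAndSubmitTasks runs, the interval whose task was mocked as failed (test_task1) disappears from both bookkeeping maps, while the still-running task2 entry is kept. A simplified, self-contained sketch of that pruning rule, with String keys and a boolean failure flag standing in for the supervisor's Interval keys and TaskStorage status lookups:

import java.util.HashMap;
import java.util.Map;

public class PruneFailedTasksSketch {
    // Illustrative only; the real logic lives inside MaterializedViewSupervisor.checkSegmentsAndSubmitTasks().
    static <K, V> void pruneFailed(Map<K, V> runningTasks, Map<K, String> runningVersions, Map<K, Boolean> failed) {
        runningTasks.keySet().removeIf(k -> Boolean.TRUE.equals(failed.get(k)));
        runningVersions.keySet().removeIf(k -> Boolean.TRUE.equals(failed.get(k)));
    }

    public static void main(String[] args) {
        Map<String, String> runningTasks = new HashMap<>();
        runningTasks.put("2015-01-01/2015-01-02", "test_task1");
        runningTasks.put("2015-01-02/2015-01-03", "test_task2");
        Map<String, String> runningVersions = new HashMap<>();
        runningVersions.put("2015-01-01/2015-01-02", "test_version1");
        runningVersions.put("2015-01-02/2015-01-03", "test_version2");
        Map<String, Boolean> failed = new HashMap<>();
        failed.put("2015-01-01/2015-01-02", true);   // test_task1 was mocked as a failure
        failed.put("2015-01-02/2015-01-03", false);  // test_task2 is still running
        pruneFailed(runningTasks, runningVersions, failed);
        System.out.println(runningTasks);     // {2015-01-02/2015-01-03=test_task2}
        System.out.println(runningVersions);  // {2015-01-02/2015-01-03=test_version2}
    }
}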

Example 5 with HadoopIOConfig

Use of org.apache.druid.indexer.HadoopIOConfig in project druid by druid-io.

From the class TaskSerdeTest, method testHadoopIndexTaskSerde.

@Test
public void testHadoopIndexTaskSerde() throws Exception {
    final HadoopIndexTask task = new HadoopIndexTask(
        null,
        new HadoopIngestionSpec(
            new DataSchema(
                "foo",
                null,
                new AggregatorFactory[0],
                new UniformGranularitySpec(Granularities.DAY, null, ImmutableList.of(Intervals.of("2010-01-01/P1D"))),
                null,
                jsonMapper
            ),
            new HadoopIOConfig(ImmutableMap.of("paths", "bar"), null, null),
            null
        ),
        null,
        null,
        "blah",
        jsonMapper,
        null,
        AuthTestUtils.TEST_AUTHORIZER_MAPPER,
        null
    );
    final String json = jsonMapper.writeValueAsString(task);
    final HadoopIndexTask task2 = (HadoopIndexTask) jsonMapper.readValue(json, Task.class);
    Assert.assertEquals("foo", task.getDataSource());
    Assert.assertEquals(task.getId(), task2.getId());
    Assert.assertEquals(task.getGroupId(), task2.getGroupId());
    Assert.assertEquals(task.getDataSource(), task2.getDataSource());
    Assert.assertEquals(task.getSpec().getTuningConfig().getJobProperties(), task2.getSpec().getTuningConfig().getJobProperties());
    Assert.assertEquals("blah", task.getClasspathPrefix());
    Assert.assertEquals("blah", task2.getClasspathPrefix());
}
Also used: HadoopIngestionSpec(org.apache.druid.indexer.HadoopIngestionSpec) DataSchema(org.apache.druid.segment.indexing.DataSchema) UniformGranularitySpec(org.apache.druid.segment.indexing.granularity.UniformGranularitySpec) HadoopIOConfig(org.apache.druid.indexer.HadoopIOConfig) Test(org.junit.Test)

Aggregations

HadoopIOConfig (org.apache.druid.indexer.HadoopIOConfig): 6
HadoopIngestionSpec (org.apache.druid.indexer.HadoopIngestionSpec): 6
DataSchema (org.apache.druid.segment.indexing.DataSchema): 6
Test (org.junit.Test): 5
UniformGranularitySpec (org.apache.druid.segment.indexing.granularity.UniformGranularitySpec): 3
Job (org.apache.hadoop.mapreduce.Job): 3
HashMap (java.util.HashMap): 2
HadoopIndexTask (org.apache.druid.indexing.common.task.HadoopIndexTask): 2
AggregatorFactory (org.apache.druid.query.aggregation.AggregatorFactory): 2
ImmutableMap (com.google.common.collect.ImmutableMap): 1
Map (java.util.Map): 1
SortedMap (java.util.SortedMap): 1
HadoopDruidIndexerConfig (org.apache.druid.indexer.HadoopDruidIndexerConfig): 1
HadoopTuningConfig (org.apache.druid.indexer.HadoopTuningConfig): 1
DatasourceIngestionSpec (org.apache.druid.indexer.hadoop.DatasourceIngestionSpec): 1
ArbitraryGranularitySpec (org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec): 1
DataSegment (org.apache.druid.timeline.DataSegment): 1
HashBasedNumberedShardSpec (org.apache.druid.timeline.partition.HashBasedNumberedShardSpec): 1
Interval (org.joda.time.Interval): 1