
Example 1 with HadoopIndexTask

use of org.apache.druid.indexing.common.task.HadoopIndexTask in project druid by druid-io.

the class MaterializedViewSupervisorSpec method createTask.

public HadoopIndexTask createTask(Interval interval, String version, List<DataSegment> segments) {
    String taskId = StringUtils.format("%s_%s_%s", TASK_PREFIX, dataSourceName, DateTimes.nowUtc());
    // generate parser
    Map<String, Object> parseSpec = new HashMap<>();
    parseSpec.put("format", "timeAndDims");
    parseSpec.put("dimensionsSpec", dimensionsSpec);
    Map<String, Object> parser = new HashMap<>();
    parser.put("type", "map");
    parser.put("parseSpec", parseSpec);
    // generate HadoopTuningConfig
    HadoopTuningConfig tuningConfigForTask = new HadoopTuningConfig(
        tuningConfig.getWorkingPath(),
        version,
        tuningConfig.getPartitionsSpec(),
        tuningConfig.getShardSpecs(),
        tuningConfig.getIndexSpec(),
        tuningConfig.getIndexSpecForIntermediatePersists(),
        tuningConfig.getAppendableIndexSpec(),
        tuningConfig.getMaxRowsInMemory(),
        tuningConfig.getMaxBytesInMemory(),
        tuningConfig.isLeaveIntermediate(),
        tuningConfig.isCleanupOnFailure(),
        tuningConfig.isOverwriteFiles(),
        tuningConfig.isIgnoreInvalidRows(),
        tuningConfig.getJobProperties(),
        tuningConfig.isCombineText(),
        tuningConfig.getUseCombiner(),
        // passed a second time deliberately: this slot is the deprecated
        // rowFlushBoundary compatibility parameter of HadoopTuningConfig
        tuningConfig.getMaxRowsInMemory(),
        tuningConfig.getNumBackgroundPersistThreads(),
        tuningConfig.isForceExtendableShardSpecs(),
        // useExplicitVersion
        true,
        tuningConfig.getUserAllowedHadoopPrefix(),
        tuningConfig.isLogParseExceptions(),
        tuningConfig.getMaxParseExceptions(),
        tuningConfig.isUseYarnRMJobStatusFallback(),
        tuningConfig.getAwaitSegmentAvailabilityTimeoutMillis()
    );
    // generate granularity
    ArbitraryGranularitySpec granularitySpec = new ArbitraryGranularitySpec(Granularities.NONE, ImmutableList.of(interval));
    // generate DataSchema
    DataSchema dataSchema = new DataSchema(dataSourceName, parser, aggregators, granularitySpec, TransformSpec.NONE, objectMapper);
    // generate DatasourceIngestionSpec
    DatasourceIngestionSpec datasourceIngestionSpec = new DatasourceIngestionSpec(baseDataSource, null, ImmutableList.of(interval), segments, null, null, null, false, null);
    // generate HadoopIOConfig
    Map<String, Object> inputSpec = new HashMap<>();
    inputSpec.put("type", "dataSource");
    inputSpec.put("ingestionSpec", datasourceIngestionSpec);
    HadoopIOConfig hadoopIOConfig = new HadoopIOConfig(inputSpec, null, null);
    // generate HadoopIngestionSpec
    HadoopIngestionSpec spec = new HadoopIngestionSpec(dataSchema, hadoopIOConfig, tuningConfigForTask);
    // generate HadoopIndexTask
    HadoopIndexTask task = new HadoopIndexTask(taskId, spec, hadoopCoordinates, hadoopDependencyCoordinates, classpathPrefix, objectMapper, context, authorizerMapper, chatHandlerProvider);
    return task;
}
Also used : DataSchema(org.apache.druid.segment.indexing.DataSchema) DatasourceIngestionSpec(org.apache.druid.indexer.hadoop.DatasourceIngestionSpec) HadoopIngestionSpec(org.apache.druid.indexer.HadoopIngestionSpec) HashMap(java.util.HashMap) HadoopTuningConfig(org.apache.druid.indexer.HadoopTuningConfig) ArbitraryGranularitySpec(org.apache.druid.segment.indexing.granularity.ArbitraryGranularitySpec) HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask) HadoopIOConfig(org.apache.druid.indexer.HadoopIOConfig)
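The parser above is assembled from nested plain maps rather than typed parse-spec objects; Druid deserializes it later via the DataSchema's ObjectMapper. Below is a minimal, standalone sketch (not from the Druid source) that serializes the same nested-map structure with Jackson to make the resulting JSON shape visible; the dimensionsSpec payload is a made-up stand-in for the supervisor's field.

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParserShapeDemo {
    public static void main(String[] args) throws Exception {
        Map<String, Object> parseSpec = new HashMap<>();
        parseSpec.put("format", "timeAndDims");
        // hypothetical stand-in for the supervisor's dimensionsSpec field
        parseSpec.put("dimensionsSpec", Map.of("dimensions", List.of("dim1", "dim2")));

        Map<String, Object> parser = new HashMap<>();
        parser.put("type", "map");
        parser.put("parseSpec", parseSpec);

        // prints, e.g. (key order is not guaranteed for a HashMap):
        // {"type":"map","parseSpec":{"format":"timeAndDims","dimensionsSpec":{"dimensions":["dim1","dim2"]}}}
        System.out.println(new ObjectMapper().writeValueAsString(parser));
    }
}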

Example 2 with HadoopIndexTask

use of org.apache.druid.indexing.common.task.HadoopIndexTask in project druid by druid-io.

the class MaterializedViewSupervisorTest method testCreateTask.

/**
 * Verifies that creating a HadoopIndexTask completes without raising an exception.
 */
@Test
public void testCreateTask() {
    List<DataSegment> baseSegments = Collections.singletonList(
        new DataSegment(
            "base",                                              // dataSource
            Intervals.of("2015-01-02T00Z/2015-01-03T00Z"),       // interval
            "2015-01-03",                                        // version
            ImmutableMap.of(),                                   // loadSpec
            ImmutableList.of("dim1", "dim2"),                    // dimensions
            ImmutableList.of("m1"),                              // metrics
            new HashBasedNumberedShardSpec(0, 1, 0, 1, null, null, null),
            9,                                                   // binaryVersion
            1024                                                 // size in bytes
        )
    );
    HadoopIndexTask task = spec.createTask(Intervals.of("2015-01-02T00Z/2015-01-03T00Z"), "2015-01-03", baseSegments);
    Assert.assertNotNull(task);
}
Also used : HashBasedNumberedShardSpec(org.apache.druid.timeline.partition.HashBasedNumberedShardSpec) DataSegment(org.apache.druid.timeline.DataSegment) HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask) Test(org.junit.Test)
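The interval literal used by the test is parsed by Druid's Intervals.of, a thin wrapper around Joda-Time interval parsing in the UTC ISO chronology. A standalone sketch of the equivalent plain Joda-Time call, assuming only joda-time on the classpath:

import org.joda.time.Interval;

public class IntervalDemo {
    public static void main(String[] args) {
        // equivalent of Intervals.of("2015-01-02T00Z/2015-01-03T00Z"), spelled out
        Interval day = Interval.parse("2015-01-02T00:00:00Z/2015-01-03T00:00:00Z");
        System.out.println(day.toDurationMillis()); // 86400000 ms, i.e. exactly one day
    }
}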

Example 3 with HadoopIndexTask

use of org.apache.druid.indexing.common.task.HadoopIndexTask in project druid by druid-io.

the class MaterializedViewSupervisor method checkSegmentsAndSubmitTasks.

/**
 * Finds the intervals in which the derived dataSource should rebuild its segments,
 * then picks the latest of those intervals, creates a new HadoopIndexTask for each, and submits it.
 */
@VisibleForTesting
void checkSegmentsAndSubmitTasks() {
    synchronized (taskLock) {
        List<Interval> intervalsToRemove = new ArrayList<>();
        for (Map.Entry<Interval, HadoopIndexTask> entry : runningTasks.entrySet()) {
            Optional<TaskStatus> taskStatus = taskStorage.getStatus(entry.getValue().getId());
            if (!taskStatus.isPresent() || !taskStatus.get().isRunnable()) {
                intervalsToRemove.add(entry.getKey());
            }
        }
        for (Interval interval : intervalsToRemove) {
            runningTasks.remove(interval);
            runningVersion.remove(interval);
        }
        if (runningTasks.size() == maxTaskCount) {
            // if the number of running tasks reaches the max task count, the supervisor won't submit new tasks.
            return;
        }
        Pair<SortedMap<Interval, String>, Map<Interval, List<DataSegment>>> toBuildIntervalAndBaseSegments = checkSegments();
        SortedMap<Interval, String> sortedToBuildVersion = toBuildIntervalAndBaseSegments.lhs;
        Map<Interval, List<DataSegment>> baseSegments = toBuildIntervalAndBaseSegments.rhs;
        missInterval = sortedToBuildVersion.keySet();
        submitTasks(sortedToBuildVersion, baseSegments);
    }
}
Also used : ArrayList(java.util.ArrayList) TaskStatus(org.apache.druid.indexer.TaskStatus) DataSegment(org.apache.druid.timeline.DataSegment) SortedMap(java.util.SortedMap) ArrayList(java.util.ArrayList) List(java.util.List) HashMap(java.util.HashMap) Map(java.util.Map) TreeMap(java.util.TreeMap) SortedMap(java.util.SortedMap) HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask) Interval(org.joda.time.Interval) VisibleForTesting(com.google.common.annotations.VisibleForTesting)
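The method follows a two-pass pattern: first collect the intervals whose tasks are finished, remove them only after iteration (so runningTasks is never mutated mid-iteration), and then look for new work only if capacity remains. A dependency-free sketch of the same pattern with hypothetical types (Task stands in for a HadoopIndexTask plus its TaskStorage status lookup; requires Java 16+ for records). Note the sketch checks capacity with >= where the supervisor uses ==, a deliberately defensive deviation:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CleanupThenSubmitSketch {
    // hypothetical stand-in for a HadoopIndexTask plus its status lookup
    record Task(String id, boolean runnable) {}

    public static void main(String[] args) {
        int maxTaskCount = 2;
        Map<String, Task> runningTasks = new HashMap<>();
        runningTasks.put("2015-01-02/2015-01-03", new Task("a", false)); // finished
        runningTasks.put("2015-01-03/2015-01-04", new Task("b", true));  // still running

        // pass 1: collect finished intervals so the map is not mutated mid-iteration
        List<String> intervalsToRemove = new ArrayList<>();
        for (Map.Entry<String, Task> e : runningTasks.entrySet()) {
            if (!e.getValue().runnable()) {
                intervalsToRemove.add(e.getKey());
            }
        }
        intervalsToRemove.forEach(runningTasks::remove);

        // pass 2: only look for new work when below capacity
        if (runningTasks.size() >= maxTaskCount) {
            return;
        }
        System.out.println("capacity for " + (maxTaskCount - runningTasks.size()) + " more task(s)");
    }
}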

Example 4 with HadoopIndexTask

use of org.apache.druid.indexing.common.task.HadoopIndexTask in project druid by druid-io.

the class MaterializedViewSupervisor method clearTasks.

private void clearTasks() {
    for (HadoopIndexTask task : runningTasks.values()) {
        if (taskMaster.getTaskQueue().isPresent()) {
            taskMaster.getTaskQueue().get().shutdown(task.getId(), "killing all tasks");
        }
    }
    runningTasks.clear();
    runningVersion.clear();
}
Also used : HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask)
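One design observation: the Optional-wrapped task queue is re-fetched and re-checked on every loop iteration. A hypothetical refactor (not the Druid source) that resolves it once; behavior is unchanged as long as the queue reference is stable for the duration of the call. Optional here is Guava's com.google.common.base.Optional, as in the original, and TaskQueue is org.apache.druid.indexing.overlord.TaskQueue.

private void clearTasks() {
    // resolve the Optional once instead of once per task
    Optional<TaskQueue> taskQueue = taskMaster.getTaskQueue();
    if (taskQueue.isPresent()) {
        for (HadoopIndexTask task : runningTasks.values()) {
            taskQueue.get().shutdown(task.getId(), "killing all tasks");
        }
    }
    runningTasks.clear();
    runningVersion.clear();
}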

Example 5 with HadoopIndexTask

use of org.apache.druid.indexing.common.task.HadoopIndexTask in project druid by druid-io.

the class MaterializedViewSupervisor method submitTasks.

private void submitTasks(SortedMap<Interval, String> sortedToBuildVersion, Map<Interval, List<DataSegment>> baseSegments) {
    for (Map.Entry<Interval, String> entry : sortedToBuildVersion.entrySet()) {
        if (runningTasks.size() < maxTaskCount) {
            HadoopIndexTask task = spec.createTask(entry.getKey(), entry.getValue(), baseSegments.get(entry.getKey()));
            try {
                if (taskMaster.getTaskQueue().isPresent()) {
                    taskMaster.getTaskQueue().get().add(task);
                    runningVersion.put(entry.getKey(), entry.getValue());
                    runningTasks.put(entry.getKey(), task);
                }
            } catch (EntryExistsException e) {
                log.error("task %s already exsits", task);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}
Also used : EntryExistsException(org.apache.druid.metadata.EntryExistsException) HashMap(java.util.HashMap) Map(java.util.Map) TreeMap(java.util.TreeMap) SortedMap(java.util.SortedMap) HadoopIndexTask(org.apache.druid.indexing.common.task.HadoopIndexTask) EntryExistsException(org.apache.druid.metadata.EntryExistsException) IOException(java.io.IOException) Interval(org.joda.time.Interval)
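Because sortedToBuildVersion is a SortedMap, the loop fills the remaining task slots in a deterministic interval order. A standalone sketch of that ordering, using a hypothetical newest-first comparator to match the javadoc's "latest intervals" wording (the comparator actually used when the supervisor builds this map in checkSegments is not shown on this page); assumes joda-time on the classpath:

import java.util.TreeMap;
import org.joda.time.Interval;

public class SubmitOrderSketch {
    public static void main(String[] args) {
        // hypothetical comparator: newest interval first
        TreeMap<Interval, String> sortedToBuildVersion = new TreeMap<>(
            (a, b) -> b.getStart().compareTo(a.getStart())
        );
        sortedToBuildVersion.put(Interval.parse("2015-01-02T00:00:00Z/2015-01-03T00:00:00Z"), "v1");
        sortedToBuildVersion.put(Interval.parse("2015-01-03T00:00:00Z/2015-01-04T00:00:00Z"), "v2");

        // iteration starts with 2015-01-03/04, so with limited task slots the
        // most recent intervals get submitted first
        sortedToBuildVersion.forEach((ivl, version) -> System.out.println(ivl + " -> " + version));
    }
}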

Aggregations

HadoopIndexTask (org.apache.druid.indexing.common.task.HadoopIndexTask) 6
HashMap (java.util.HashMap) 4
Map (java.util.Map) 3
SortedMap (java.util.SortedMap) 3
DataSegment (org.apache.druid.timeline.DataSegment) 3
Interval (org.joda.time.Interval) 3
TreeMap (java.util.TreeMap) 2
HadoopIOConfig (org.apache.druid.indexer.HadoopIOConfig) 2
HadoopIngestionSpec (org.apache.druid.indexer.HadoopIngestionSpec) 2
DataSchema (org.apache.druid.segment.indexing.DataSchema) 2
HashBasedNumberedShardSpec (org.apache.druid.timeline.partition.HashBasedNumberedShardSpec) 2
Test (org.junit.Test) 2
VisibleForTesting (com.google.common.annotations.VisibleForTesting) 1
ImmutableMap (com.google.common.collect.ImmutableMap) 1
IOException (java.io.IOException) 1
ArrayList (java.util.ArrayList) 1
List (java.util.List) 1
HadoopTuningConfig (org.apache.druid.indexer.HadoopTuningConfig) 1
TaskStatus (org.apache.druid.indexer.TaskStatus) 1
DatasourceIngestionSpec (org.apache.druid.indexer.hadoop.DatasourceIngestionSpec) 1