Example 11 with Dag

use of org.apache.gobblin.service.modules.flowgraph.Dag in project incubator-gobblin by apache.

the class DagManager method setActive.

/**
 * When a {@link DagManager} becomes active, it loads the serialized representations of the currently running {@link Dag}s
 * from the checkpoint directory, deserializes the {@link Dag}s and adds them to a queue to be consumed by
 * the {@link DagManagerThread}s.
 * @param active a boolean to indicate if the {@link DagManager} is the leader.
 */
public synchronized void setActive(boolean active) {
    if (this.isActive == active) {
        log.info("DagManager already {}, skipping further actions.", (!active) ? "inactive" : "active");
        return;
    }
    this.isActive = active;
    try {
        if (this.isActive) {
            log.info("Activating DagManager.");
            log.info("Scheduling {} DagManager threads", numThreads);
            // Initializing state store for persisting Dags.
            this.dagStateStore = createDagStateStore(config, topologySpecMap);
            DagStateStore failedDagStateStore = createDagStateStore(ConfigUtils.getConfigOrEmpty(config, FAILED_DAG_STATESTORE_PREFIX).withFallback(config), topologySpecMap);
            Set<String> failedDagIds = Collections.synchronizedSet(failedDagStateStore.getDagIds());
            ContextAwareMeter allSuccessfulMeter = null;
            ContextAwareMeter allFailedMeter = null;
            if (instrumentationEnabled) {
                MetricContext metricContext = Instrumented.getMetricContext(ConfigUtils.configToState(ConfigFactory.empty()), getClass());
                allSuccessfulMeter = metricContext.contextAwareMeter(MetricRegistry.name(ServiceMetricNames.GOBBLIN_SERVICE_PREFIX, ServiceMetricNames.SUCCESSFUL_FLOW_METER));
                allFailedMeter = metricContext.contextAwareMeter(MetricRegistry.name(ServiceMetricNames.GOBBLIN_SERVICE_PREFIX, ServiceMetricNames.FAILED_FLOW_METER));
            }
            // On startup, the service creates DagManagerThreads that are scheduled at a fixed rate.
            this.dagManagerThreads = new DagManagerThread[numThreads];
            for (int i = 0; i < numThreads; i++) {
                DagManagerThread dagManagerThread = new DagManagerThread(jobStatusRetriever, dagStateStore, failedDagStateStore, runQueue[i], cancelQueue[i], resumeQueue[i], instrumentationEnabled, defaultQuota, perUserQuota, failedDagIds, allSuccessfulMeter, allFailedMeter, this.defaultJobStartSlaTimeMillis);
                this.dagManagerThreads[i] = dagManagerThread;
                this.scheduledExecutorPool.scheduleAtFixedRate(dagManagerThread, 0, this.pollingInterval, TimeUnit.SECONDS);
            }
            FailedDagRetentionThread failedDagRetentionThread = new FailedDagRetentionThread(failedDagStateStore, failedDagIds, failedDagRetentionTime);
            this.scheduledExecutorPool.scheduleAtFixedRate(failedDagRetentionThread, 0, retentionPollingInterval, TimeUnit.MINUTES);
            List<Dag<JobExecutionPlan>> dags = dagStateStore.getDags();
            log.info("Loading {} dags from dag state store", dags.size());
            for (Dag<JobExecutionPlan> dag : dags) {
                addDag(dag, false, false);
            }
        } else {
            // Mark the DagManager inactive.
            log.info("Inactivating the DagManager. Shutting down all DagManager threads");
            this.scheduledExecutorPool.shutdown();
            // The DMThread's metrics mappings follow the lifecycle of the DMThread itself and so are lost by DM deactivation-reactivation but the RootMetricContext is a (persistent) singleton.
            // To avoid IllegalArgumentException by the RMC preventing (re-)add of a metric already known, remove all metrics that a new DMThread thread would attempt to add (in DagManagerThread::initialize) whenever running post-re-enablement
            RootMetricContext.get().removeMatching(getMetricsFilterForDagManager());
            try {
                this.scheduledExecutorPool.awaitTermination(TERMINATION_TIMEOUT, TimeUnit.SECONDS);
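            } catch (InterruptedException e) {
                log.error("Exception encountered when shutting down DagManager threads.", e);
                // Restore the interrupt flag so callers can still observe the interruption.
                Thread.currentThread().interrupt();
            }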
        }
    } catch (IOException e) {
        log.error("Exception encountered when activating the new DagManager", e);
        throw new RuntimeException(e);
    }
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) Dag(org.apache.gobblin.service.modules.flowgraph.Dag) IOException(java.io.IOException) RootMetricContext(org.apache.gobblin.metrics.RootMetricContext) MetricContext(org.apache.gobblin.metrics.MetricContext) ContextAwareMeter(org.apache.gobblin.metrics.ContextAwareMeter)
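
The activate/deactivate pattern in setActive (schedule pollers at a fixed rate, then shut the pool down and wait for termination) can be illustrated with the JDK alone. The following is a minimal sketch, not Gobblin code: the class PollerLifecycleSketch, its fields, and the 5-second termination timeout are hypothetical, and each poller just increments a counter. It assumes a fresh pool is built on every activation, since a terminated executor cannot accept new tasks; the excerpt above does not show how DagManager handles that.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an activate/deactivate lifecycle; not part of Gobblin.
class PollerLifecycleSketch {
    private final int numThreads;
    private final long intervalMillis;
    private final AtomicInteger polls = new AtomicInteger();
    private ScheduledExecutorService pool;
    private boolean active;

    PollerLifecycleSketch(int numThreads, long intervalMillis) {
        this.numThreads = numThreads;
        this.intervalMillis = intervalMillis;
    }

    synchronized void setActive(boolean newActive) {
        if (active == newActive) {
            return; // Already in the requested state.
        }
        active = newActive;
        if (active) {
            // Assumption: build a new pool per activation, because a shut-down pool rejects new tasks.
            pool = Executors.newScheduledThreadPool(numThreads);
            for (int i = 0; i < numThreads; i++) {
                pool.scheduleAtFixedRate(polls::incrementAndGet, 0, intervalMillis, TimeUnit.MILLISECONDS);
            }
        } else {
            // shutdown() cancels periodic tasks by default; then wait for in-flight runs to finish.
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Activates, lets the pollers run for a while, deactivates, and returns the number of polls seen.
    static int runOnce(int numThreads, long intervalMillis, long durationMillis) {
        PollerLifecycleSketch sketch = new PollerLifecycleSketch(numThreads, intervalMillis);
        sketch.setActive(true);
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        sketch.setActive(false);
        return sketch.polls.get();
    }

    public static void main(String[] args) {
        System.out.println("polls observed: " + runOnce(2, 10, 100));
    }
}
```

With an initial delay of 0, every poller runs at least once right after activation, so the count is at least numThreads unless the run is interrupted.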

Example 12 with Dag

use of org.apache.gobblin.service.modules.flowgraph.Dag in project incubator-gobblin by apache.

the class MysqlDagStateStore method convertDagIntoState.

/**
 * For a {@link Dag} to work with {@link MysqlStateStore}, it must be packaged into a {@link State} object.
 * This is done by serializing the {@link Dag} and storing the result under the key {@link #DAG_KEY_IN_STATE}.
 *
 * The serialization step is required for readability and portability of the serde library.
 * @param dag The dag to be converted.
 * @return A {@link State} object that contains a single key-value pair for the {@link Dag}.
 */
private State convertDagIntoState(Dag<JobExecutionPlan> dag) {
    State outputState = new State();
    // Collect the JobExecutionPlan of every node and store the serialized list under a single key.
    List<JobExecutionPlan> jobExecutionPlanList = dag.getNodes().stream().map(Dag.DagNode::getValue).collect(Collectors.toList());
    outputState.setProp(DAG_KEY_IN_STATE, serDe.serialize(jobExecutionPlanList));
    return outputState;
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) State(org.apache.gobblin.configuration.State) Dag(org.apache.gobblin.service.modules.flowgraph.Dag)
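
The idea in convertDagIntoState (serialize the node values, then store the result under one well-known key) can be sketched without Gobblin. The code below is an illustration under stated assumptions: java.util.Properties stands in for State, plain job names stand in for JobExecutionPlan objects, Java serialization plus Base64 stands in for the serDe (whose implementation is not shown in the excerpt), and the key name "dag" is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch; Properties stands in for Gobblin's State.
class DagStateSketch {
    static final String DAG_KEY = "dag"; // Hypothetical key name.

    // Serializes the list to a Base64 string so it can be stored as a single property value.
    static String serialize(List<String> jobNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(jobNames));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The object stream is closed, and therefore flushed, before the bytes are read.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    @SuppressWarnings("unchecked")
    static List<String> deserialize(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Packages the serialized list into a state object holding exactly one key-value pair.
    static Properties convertIntoState(List<String> jobNames) {
        Properties state = new Properties();
        state.setProperty(DAG_KEY, serialize(jobNames));
        return state;
    }

    public static void main(String[] args) {
        Properties state = convertIntoState(Arrays.asList("extract", "load"));
        System.out.println(deserialize(state.getProperty(DAG_KEY)));
    }
}
```

Storing only the node values keeps the persisted form independent of the in-memory graph structure; this sketch does not show how the graph is rebuilt on load.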

Example 13 with Dag

use of org.apache.gobblin.service.modules.flowgraph.Dag in project incubator-gobblin by apache.

the class Orchestrator method deleteFromExecutor.

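/**
 * Compiles the given {@link Spec} into a {@link Dag} of {@link JobExecutionPlan}s and deletes each compiled
 * JobSpec from the executor it was assigned to. A failure to delete one JobSpec is logged and does not stop
 * the remaining deletions.
 */
private void deleteFromExecutor(Spec spec, Properties headers) {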
    Dag<JobExecutionPlan> jobExecutionPlanDag = specCompiler.compileFlow(spec);
    if (jobExecutionPlanDag.isEmpty()) {
        _log.warn("Cannot determine an executor to delete Spec: " + spec);
        return;
    }
    // Delete all compiled JobSpecs on their respective Executor
    for (Dag.DagNode<JobExecutionPlan> dagNode : jobExecutionPlanDag.getNodes()) {
        JobExecutionPlan jobExecutionPlan = dagNode.getValue();
        Spec jobSpec = jobExecutionPlan.getJobSpec();
        try {
            SpecProducer<Spec> producer = jobExecutionPlan.getSpecExecutor().getProducer().get();
            _log.info(String.format("Going to delete JobSpec: %s on Executor: %s", jobSpec, producer));
            producer.deleteSpec(jobSpec.getUri(), headers);
        } catch (Exception e) {
            _log.error(String.format("Could not delete JobSpec: %s for flow: %s", jobSpec, spec), e);
        }
    }
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) Dag(org.apache.gobblin.service.modules.flowgraph.Dag) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec) TopologySpec(org.apache.gobblin.runtime.api.TopologySpec) JobSpec(org.apache.gobblin.runtime.api.JobSpec) Spec(org.apache.gobblin.runtime.api.Spec) InvocationTargetException(java.lang.reflect.InvocationTargetException) IOException(java.io.IOException)

Example 14 with Dag

use of org.apache.gobblin.service.modules.flowgraph.Dag in project incubator-gobblin by apache.

the class FlowGraphPath method concatenate.

/**
 * Concatenate two {@link Dag}s. Modify the {@link ConfigurationKeys#JOB_DEPENDENCIES} in the {@link JobSpec}s of the child
 * {@link Dag} to reflect the concatenation operation.
 * @param dagLeft The parent dag.
 * @param dagRight The child dag.
 * @return The concatenated dag with modified {@link ConfigurationKeys#JOB_DEPENDENCIES}.
 */
@VisibleForTesting
static Dag<JobExecutionPlan> concatenate(Dag<JobExecutionPlan> dagLeft, Dag<JobExecutionPlan> dagRight) {
    // Compute the fork nodes - set of nodes with no dependents in the concatenated dag.
    Set<DagNode<JobExecutionPlan>> forkNodes = dagLeft.getEndNodes().stream().filter(endNode -> isNodeForkable(endNode)).collect(Collectors.toSet());
    Set<DagNode<JobExecutionPlan>> dependencyNodes = dagLeft.getDependencyNodes(forkNodes);
    if (!dependencyNodes.isEmpty()) {
        List<String> dependenciesList = dependencyNodes.stream().map(dagNode -> dagNode.getValue().getJobSpec().getConfig().getString(ConfigurationKeys.JOB_NAME_KEY)).collect(Collectors.toList());
        String dependencies = Joiner.on(",").join(dependenciesList);
        for (DagNode<JobExecutionPlan> childNode : dagRight.getStartNodes()) {
            JobSpec jobSpec = childNode.getValue().getJobSpec();
            jobSpec.setConfig(jobSpec.getConfig().withValue(ConfigurationKeys.JOB_DEPENDENCIES, ConfigValueFactory.fromAnyRef(dependencies)));
        }
    }
    return dagLeft.concatenate(dagRight, forkNodes);
}
Also used : JobExecutionPlanDagFactory(org.apache.gobblin.service.modules.spec.JobExecutionPlanDagFactory) Getter(lombok.Getter) FlowTemplate(org.apache.gobblin.service.modules.template.FlowTemplate) URISyntaxException(java.net.URISyntaxException) ConfigValueFactory(com.typesafe.config.ConfigValueFactory) ConfigUtils(org.apache.gobblin.util.ConfigUtils) ArrayList(java.util.ArrayList) DatasetDescriptor(org.apache.gobblin.service.modules.dataset.DatasetDescriptor) JobSpec(org.apache.gobblin.runtime.api.JobSpec) Files(com.google.common.io.Files) Optional(com.google.common.base.Optional) Map(java.util.Map) Path(org.apache.hadoop.fs.Path) JobTemplate(org.apache.gobblin.runtime.api.JobTemplate) URI(java.net.URI) FlowEdge(org.apache.gobblin.service.modules.flowgraph.FlowEdge) SpecExecutor(org.apache.gobblin.runtime.api.SpecExecutor) Iterator(java.util.Iterator) Dag(org.apache.gobblin.service.modules.flowgraph.Dag) Config(com.typesafe.config.Config) Set(java.util.Set) ConfigurationKeys(org.apache.gobblin.configuration.ConfigurationKeys) Maps(com.google.common.collect.Maps) Collectors(java.util.stream.Collectors) SpecNotFoundException(org.apache.gobblin.runtime.api.SpecNotFoundException) List(java.util.List) DagNode(org.apache.gobblin.service.modules.flowgraph.Dag.DagNode) VisibleForTesting(com.google.common.annotations.VisibleForTesting) JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) Joiner(com.google.common.base.Joiner) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec)
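
The dependency rewrite performed by concatenate (join the job names of the parent's dependency nodes with commas and set that string on every start node of the child) can be sketched with plain collections. This is an illustration only: mutable maps stand in for the immutable Config held by each JobSpec, the class and method names are hypothetical, and the literal key "job.dependencies" is an assumption about the value of ConfigurationKeys.JOB_DEPENDENCIES.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; plain maps stand in for the JobSpec configs of the child dag's start nodes.
class ConcatenateSketch {
    static final String JOB_DEPENDENCIES = "job.dependencies"; // Assumed key name.

    // Writes the comma-joined parent job names into each child start node's config.
    static void linkChildren(List<String> parentJobNames, List<Map<String, String>> childStartConfigs) {
        if (parentJobNames.isEmpty()) {
            return; // No dependency nodes, so the child configs stay untouched.
        }
        String dependencies = String.join(",", parentJobNames);
        for (Map<String, String> config : childStartConfigs) {
            config.put(JOB_DEPENDENCIES, dependencies);
        }
    }

    public static void main(String[] args) {
        Map<String, String> childA = new HashMap<>();
        Map<String, String> childB = new HashMap<>();
        linkChildren(Arrays.asList("extract", "convert"), Arrays.asList(childA, childB));
        System.out.println(childA.get(JOB_DEPENDENCIES));
    }
}
```

The early return mirrors the isEmpty check in the original method: when the parent contributes no dependency nodes, the child's start nodes keep whatever configuration they already had.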

Aggregations

Dag (org.apache.gobblin.service.modules.flowgraph.Dag) 14
JobExecutionPlan (org.apache.gobblin.service.modules.spec.JobExecutionPlan) 12
Config (com.typesafe.config.Config) 7
FlowSpec (org.apache.gobblin.runtime.api.FlowSpec) 7
JobSpec (org.apache.gobblin.runtime.api.JobSpec) 6
Spec (org.apache.gobblin.runtime.api.Spec) 6
Test (org.testng.annotations.Test) 6
IOException (java.io.IOException) 5
TopologySpec (org.apache.gobblin.runtime.api.TopologySpec) 5
File (java.io.File) 4
ArrayList (java.util.ArrayList) 4
Path (org.apache.hadoop.fs.Path) 4
Joiner (com.google.common.base.Joiner) 3
Optional (com.google.common.base.Optional) 3
InvocationTargetException (java.lang.reflect.InvocationTargetException) 3
URISyntaxException (java.net.URISyntaxException) 3
List (java.util.List) 3
Collectors (java.util.stream.Collectors) 3
ConfigurationKeys (org.apache.gobblin.configuration.ConfigurationKeys) 3
SpecExecutor (org.apache.gobblin.runtime.api.SpecExecutor) 3