Use of co.cask.cdap.api.mapreduce.MapReduce in project cdap by caskdata.
The class MapReduceProgramRunner, method run:
@Override
public ProgramController run(final Program program, ProgramOptions options) {
  // Extract and verify parameters
  ApplicationSpecification appSpec = program.getApplicationSpecification();
  Preconditions.checkNotNull(appSpec, "Missing application specification.");

  ProgramType processorType = program.getType();
  Preconditions.checkNotNull(processorType, "Missing processor type.");
  Preconditions.checkArgument(processorType == ProgramType.MAPREDUCE, "Only MAPREDUCE process type is supported.");

  MapReduceSpecification spec = appSpec.getMapReduce().get(program.getName());
  Preconditions.checkNotNull(spec, "Missing MapReduceSpecification for %s", program.getName());

  Arguments arguments = options.getArguments();
  RunId runId = ProgramRunners.getRunId(options);

  WorkflowProgramInfo workflowInfo = WorkflowProgramInfo.create(arguments);
  DatasetFramework programDatasetFramework = workflowInfo == null
    ? datasetFramework
    : NameMappedDatasetFramework.createFromWorkflowProgramInfo(datasetFramework, workflowInfo, appSpec);

  // Set up the dataset framework context, if required
  if (programDatasetFramework instanceof ProgramContextAware) {
    ProgramId programId = program.getId();
    ((ProgramContextAware) programDatasetFramework).setContext(new BasicProgramContext(programId.run(runId)));
  }

  MapReduce mapReduce;
  try {
    mapReduce = new InstantiatorFactory(false).get(TypeToken.of(program.<MapReduce>getMainClass())).create();
  } catch (Exception e) {
    LOG.error("Failed to instantiate MapReduce class for {}", spec.getClassName(), e);
    throw Throwables.propagate(e);
  }

  // List of all Closeable resources that need to be cleaned up
  List<Closeable> closeables = new ArrayList<>();
  try {
    PluginInstantiator pluginInstantiator = createPluginInstantiator(options, program.getClassLoader());
    if (pluginInstantiator != null) {
      closeables.add(pluginInstantiator);
    }

    final BasicMapReduceContext context = new BasicMapReduceContext(
      program, options, cConf, spec, workflowInfo, discoveryServiceClient, metricsCollectionService,
      txSystemClient, programDatasetFramework, streamAdmin, getPluginArchive(options), pluginInstantiator,
      secureStore, secureStoreManager, messagingService);

    // Inject runtime fields (properties, metrics, datasets) into the user's MapReduce instance
    Reflections.visit(mapReduce, mapReduce.getClass(),
                      new PropertyFieldSetter(context.getSpecification().getProperties()),
                      new MetricsFieldSetter(context.getMetrics()),
                      new DataSetFieldSetter(context));

    // Note: this sets the logging context on the thread level
    LoggingContextAccessor.setLoggingContext(context.getLoggingContext());

    // Set the job queue in hConf if it is provided
    Configuration hConf = new Configuration(this.hConf);
    String schedulerQueue = options.getArguments().getOption(Constants.AppFabric.APP_SCHEDULER_QUEUE);
    if (schedulerQueue != null && !schedulerQueue.isEmpty()) {
      hConf.set(JobContext.QUEUE_NAME, schedulerQueue);
    }

    Service mapReduceRuntimeService = new MapReduceRuntimeService(
      injector, cConf, hConf, mapReduce, spec, context, program.getJarLocation(), locationFactory,
      streamAdmin, txSystemClient, authorizationEnforcer, authenticationContext);
    mapReduceRuntimeService.addListener(
      createRuntimeServiceListener(program.getId(), runId, closeables, arguments, options.getUserArguments()),
      Threads.SAME_THREAD_EXECUTOR);

    final ProgramController controller = new MapReduceProgramController(mapReduceRuntimeService, context);

    LOG.debug("Starting MapReduce Job: {}", context);
    // In local mode or when security is enabled, the service can be started directly; otherwise the user
    // launching the program runner (typically the YARN user) would
    // be running the job, but the data directory will be owned by cdap, so start as the configured HDFS user.
    if (MapReduceTaskContextProvider.isLocal(hConf) || UserGroupInformation.isSecurityEnabled()) {
      mapReduceRuntimeService.start();
    } else {
      ProgramRunners.startAsUser(cConf.get(Constants.CFG_HDFS_USER), mapReduceRuntimeService);
    }
    return controller;
  } catch (Exception e) {
    closeAllQuietly(closeables);
    throw Throwables.propagate(e);
  }
}
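
For context, the runner above instantiates whatever class the application registered as a MapReduce program. The following is a minimal sketch of such a program, assuming the CDAP 4.x AbstractMapReduce API; the class name, the dataset names "lines" and "counts", and the mapper/reducer are hypothetical (the datasets are assumed to be FileSets configured for text input and output).

import co.cask.cdap.api.data.batch.Input;
import co.cask.cdap.api.data.batch.Output;
import co.cask.cdap.api.mapreduce.AbstractMapReduce;
import co.cask.cdap.api.mapreduce.MapReduceContext;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountMapReduce extends AbstractMapReduce {

  @Override
  protected void configure() {
    setName("WordCountMapReduce");
    setDescription("Counts words from the 'lines' dataset into the 'counts' dataset.");
  }

  @Override
  protected void initialize() throws Exception {
    MapReduceContext context = getContext();
    // Wire CDAP datasets as the Hadoop job's input and output (both names are hypothetical)
    context.addInput(Input.ofDataset("lines"));
    context.addOutput(Output.ofDataset("counts"));
    // Configure the underlying Hadoop job
    Job job = context.getHadoopJob();
    job.setMapperClass(Tokenizer.class);
    job.setReducerClass(Summer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
  }

  /** Emits (word, 1) for every whitespace-separated token of a text line. */
  public static final class Tokenizer extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      for (String token : line.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          context.write(new Text(token), ONE);
        }
      }
    }
  }

  /** Sums the counts emitted for each word. */
  public static final class Summer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : counts) {
        sum += count.get();
      }
      context.write(word, new IntWritable(sum));
    }
  }
}

The MapReduceSpecification that the runner looks up by program name is built from this configure() method, and the Reflections.visit(...) call above is what injects any annotated fields into such an instance before the job is started.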
Use of co.cask.cdap.api.mapreduce.MapReduce in project cdap by caskdata.
The class MapReduceRuntimeService, method destroy:
/**
 * Calls the destroy method of {@link ProgramLifecycle}.
 */
private void destroy(final boolean succeeded, final String failureInfo) throws Exception {
  // If any exception happens during output committing, we want the MapReduce to fail.
  // For that to happen it is not sufficient to set the status to failed; we have to throw an exception,
  // otherwise the shutdown completes successfully and the completed() callback is called.
  // Thus: remember the exception and throw it at the end.
  final AtomicReference<Exception> failureCause = new AtomicReference<>();

  // TODO (CDAP-1952): this should be done in the output committer, to make the M/R fail if addPartition fails
  try {
    context.execute(new TxRunnable() {
      @Override
      public void run(DatasetContext ctxt) throws Exception {
        ClassLoader oldClassLoader = ClassLoaders.setContextClassLoader(job.getConfiguration().getClassLoader());
        try {
          for (Map.Entry<String, ProvidedOutput> output : context.getOutputs().entrySet()) {
            commitOutput(succeeded, output.getKey(), output.getValue().getOutputFormatProvider(), failureCause);
            if (succeeded && failureCause.get() != null) {
              // The MapReduce was successful but this output committer failed: call onFailure() for all committers
              for (ProvidedOutput toFail : context.getOutputs().values()) {
                commitOutput(false, toFail.getAlias(), toFail.getOutputFormatProvider(), failureCause);
              }
              break;
            }
          }
          // If there was a failure, we must throw an exception to fail the transaction;
          // this will roll back all the outputs and also make sure that postCommit() is not called.
          // Throwing the failure cause: it will be wrapped in a TransactionFailureException and handled
          // in the outer catch.
          Exception cause = failureCause.get();
          if (cause != null) {
            failureCause.set(null);
            throw cause;
          }
        } finally {
          ClassLoaders.setContextClassLoader(oldClassLoader);
        }
      }
    });
  } catch (TransactionFailureException e) {
    LOG.error("Transaction failure when committing dataset outputs", e);
    if (failureCause.get() != null) {
      failureCause.get().addSuppressed(e);
    } else {
      failureCause.set(e);
    }
  }

  final boolean success = succeeded && failureCause.get() == null;
  context.setState(getProgramState(success, failureInfo));

  final TransactionControl txControl = mapReduce instanceof ProgramLifecycle
    ? Transactions.getTransactionControl(TransactionControl.IMPLICIT, MapReduce.class, mapReduce, "destroy")
    : TransactionControl.IMPLICIT;
  try {
    if (TransactionControl.IMPLICIT == txControl) {
      context.execute(new TxRunnable() {
        @Override
        public void run(DatasetContext context) throws Exception {
          doDestroy(success);
        }
      });
    } else {
      doDestroy(success);
    }
  } catch (Throwable e) {
    if (e instanceof TransactionFailureException && e.getCause() != null
        && !(e instanceof TransactionConflictException)) {
      e = e.getCause();
    }
    LOG.warn("Error executing the destroy method of the MapReduce program {}", context.getProgram().getName(), e);
  }

  // This is needed to make the run fail if there was an exception; see the comment at the beginning of this method.
  if (failureCause.get() != null) {
    throw failureCause.get();
  }
}
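
The txControl lookup above defaults destroy() to an implicit transaction, but a program can opt out by annotating its destroy() method, in which case doDestroy(success) invokes it outside of a transaction. A minimal sketch of a program doing that, assuming CDAP 4.x's @TransactionPolicy annotation and a Transactional program context; the class name, the "state" dataset, and the cleanup logic are hypothetical.

import co.cask.cdap.api.TxRunnable;
import co.cask.cdap.api.annotation.TransactionControl;
import co.cask.cdap.api.annotation.TransactionPolicy;
import co.cask.cdap.api.common.Bytes;
import co.cask.cdap.api.data.DatasetContext;
import co.cask.cdap.api.dataset.lib.KeyValueTable;
import co.cask.cdap.api.mapreduce.AbstractMapReduce;
import org.apache.tephra.TransactionFailureException;

public class CleanupMapReduce extends AbstractMapReduce {

  @Override
  @TransactionPolicy(TransactionControl.EXPLICIT)
  public void destroy() {
    // With the EXPLICIT policy, the runtime service calls this method outside of a transaction,
    // so the program starts its own short transaction only where it needs one.
    try {
      getContext().execute(new TxRunnable() {
        @Override
        public void run(DatasetContext context) throws Exception {
          // Hypothetical cleanup: clear a marker left behind earlier in the run
          KeyValueTable state = context.getDataset("state");
          state.delete(Bytes.toBytes("in-progress"));
        }
      });
    } catch (TransactionFailureException e) {
      throw new RuntimeException("Failed to clean up run state", e);
    }
  }
}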