
Example 1 with MapReduceSpecification

Use of io.cdap.cdap.api.mapreduce.MapReduceSpecification in project cdap by caskdata.

The class DistributedMapReduceProgramRunner, method setupLaunchConfig.

@Override
protected void setupLaunchConfig(ProgramLaunchConfig launchConfig, Program program, ProgramOptions options, CConfiguration cConf, Configuration hConf, File tempDir) {
    ApplicationSpecification appSpec = program.getApplicationSpecification();
    MapReduceSpecification spec = appSpec.getMapReduce().get(program.getName());
    // Get the resource for the container that runs the mapred client that will launch the actual mapred job.
    Map<String, String> clientArgs = RuntimeArguments.extractScope("task", "client", options.getUserArguments().asMap());
    // Add runnable. Only one instance for the MR driver
    launchConfig.addRunnable(spec.getName(), new MapReduceTwillRunnable(spec.getName()), 1, clientArgs, spec.getDriverResources(), 0);
    if (clusterMode == ClusterMode.ON_PREMISE || cConf.getBoolean(Constants.AppFabric.PROGRAM_REMOTE_RUNNER, false)) {
        // Add extra resources, classpath and dependencies
        launchConfig
            .addExtraResources(MapReduceContainerHelper.localizeFramework(hConf, new HashMap<>()))
            .addExtraClasspath(MapReduceContainerHelper.addMapReduceClassPath(hConf, new ArrayList<>()))
            .addExtraDependencies(YarnClientProtocolProvider.class);
    }
}
Also used : ApplicationSpecification(io.cdap.cdap.api.app.ApplicationSpecification) HashMap(java.util.HashMap) MapReduceSpecification(io.cdap.cdap.api.mapreduce.MapReduceSpecification)
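
The "task"/"client" scope extraction above controls which user arguments reach the MR client container. A minimal sketch of that convention, with hypothetical argument keys, assuming the usual behaviour that keys prefixed with "<scope>.<name>." come back with the prefix stripped while unscoped keys pass through:

// Illustrative only; assumes the RuntimeArguments class used above and java.util.HashMap are imported.
private static Map<String, String> clientScopedArgs() {
    Map<String, String> userArgs = new HashMap<>();
    // Hypothetical keys: the "task.client." prefix scopes a value to the MR client/driver container.
    userArgs.put("task.client.system.resources.memory", "1024");
    userArgs.put("threshold", "5");
    // Under the stated assumption, the prefixed key comes back as "system.resources.memory"
    // and "threshold" passes through unchanged.
    return RuntimeArguments.extractScope("task", "client", userArgs);
}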

Example 2 with MapReduceSpecification

Use of io.cdap.cdap.api.mapreduce.MapReduceSpecification in project cdap by caskdata.

The class DistributedMapReduceProgramRunner, method validateOptions.

@Override
protected void validateOptions(Program program, ProgramOptions options) {
    super.validateOptions(program, options);
    // Extract and verify parameters
    ApplicationSpecification appSpec = program.getApplicationSpecification();
    Preconditions.checkNotNull(appSpec, "Missing application specification.");
    ProgramType processorType = program.getType();
    Preconditions.checkNotNull(processorType, "Missing processor type.");
    Preconditions.checkArgument(processorType == ProgramType.MAPREDUCE, "Only MapReduce process type is supported.");
    MapReduceSpecification spec = appSpec.getMapReduce().get(program.getName());
    Preconditions.checkNotNull(spec, "Missing MapReduceSpecification for %s", program.getName());
}
Also used : ApplicationSpecification(io.cdap.cdap.api.app.ApplicationSpecification) MapReduceSpecification(io.cdap.cdap.api.mapreduce.MapReduceSpecification) ProgramType(io.cdap.cdap.proto.ProgramType)
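
The same map lookup works for enumeration as well as validation; for example, a small hypothetical helper that lists every MapReduce program declared in an application, using only the accessors already shown in these examples:

// Illustrative sketch; ApplicationSpecification and MapReduceSpecification imported as above.
private static void listMapReducePrograms(ApplicationSpecification appSpec) {
    for (MapReduceSpecification mrSpec : appSpec.getMapReduce().values()) {
        // getName() is the program name; getClassName() is the implementing MapReduce class.
        System.out.println(mrSpec.getName() + " -> " + mrSpec.getClassName());
    }
}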

Example 3 with MapReduceSpecification

Use of io.cdap.cdap.api.mapreduce.MapReduceSpecification in project cdap by caskdata.

The class DefaultStoreTest, method testLoadingProgram.

@Test
public void testLoadingProgram() throws Exception {
    ApplicationSpecification appSpec = Specifications.from(new FooApp());
    ApplicationId appId = NamespaceId.DEFAULT.app(appSpec.getName());
    store.addApplication(appId, appSpec);
    ProgramDescriptor descriptor = store.loadProgram(appId.mr("mrJob1"));
    Assert.assertNotNull(descriptor);
    MapReduceSpecification mrSpec = descriptor.getSpecification();
    Assert.assertEquals("mrJob1", mrSpec.getName());
    Assert.assertEquals(FooMapReduceJob.class.getName(), mrSpec.getClassName());
}
Also used : ApplicationSpecification(io.cdap.cdap.api.app.ApplicationSpecification) MapReduceSpecification(io.cdap.cdap.api.mapreduce.MapReduceSpecification) ProgramDescriptor(io.cdap.cdap.app.program.ProgramDescriptor) ApplicationId(io.cdap.cdap.proto.id.ApplicationId) Test(org.junit.Test)
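
FooApp and FooMapReduceJob are test fixtures defined in the DefaultStoreTest sources; a hypothetical reconstruction of their shape, assuming the standard AbstractApplication and AbstractMapReduce configure() hooks, looks roughly like:

// Hypothetical fixtures for illustration; the real classes live in the CDAP test sources.
public class FooApp extends AbstractApplication {
    @Override
    public void configure() {
        setName("FooApp");
        addMapReduce(new FooMapReduceJob());
    }
}

public class FooMapReduceJob extends AbstractMapReduce {
    @Override
    protected void configure() {
        setName("mrJob1");
        setDescription("Foo MR job");
    }
}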

Example 4 with MapReduceSpecification

Use of io.cdap.cdap.api.mapreduce.MapReduceSpecification in project cdap by caskdata.

The class DefaultMapReduceConfigurer, method createSpecification.

public MapReduceSpecification createSpecification() {
    Set<String> datasets = new HashSet<>();
    Reflections.visit(mapReduce, mapReduce.getClass(), new PropertyFieldExtractor(properties), new DataSetFieldExtractor(datasets));
    return new MapReduceSpecification(mapReduce.getClass().getName(), name, description,
                                      inputDataset, outputDataset, datasets, properties,
                                      driverResources, mapperResources, reducerResources,
                                      getPlugins());
}
Also used : MapReduceSpecification(io.cdap.cdap.api.mapreduce.MapReduceSpecification) PropertyFieldExtractor(io.cdap.cdap.internal.specification.PropertyFieldExtractor) DataSetFieldExtractor(io.cdap.cdap.internal.specification.DataSetFieldExtractor) HashSet(java.util.HashSet)
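
The constructor arguments mirror what a MapReduce program supplies at configure time. A sketch of a configure() method that would feed this configurer, assuming AbstractMapReduce exposes the usual name/description/resources setters (the class name and resource sizes are made up):

// Illustrative only; assumes io.cdap.cdap.api.Resources and the AbstractMapReduce setters below exist.
public class WordCountMapReduce extends AbstractMapReduce {
    @Override
    protected void configure() {
        setName("WordCountMapReduce");            // becomes MapReduceSpecification.getName()
        setDescription("Counts words");           // becomes the spec's description
        setDriverResources(new Resources(1024));  // becomes getDriverResources()
        setMapperResources(new Resources(512));
        setReducerResources(new Resources(512));
    }
}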

Example 5 with MapReduceSpecification

Use of io.cdap.cdap.api.mapreduce.MapReduceSpecification in project cdap by caskdata.

The class MapReduceTaskContextProvider, method createCacheLoader.

/**
 * Creates a {@link CacheLoader} for the task context cache.
 */
private CacheLoader<ContextCacheKey, BasicMapReduceTaskContext> createCacheLoader(final Injector injector) {
    DiscoveryServiceClient discoveryServiceClient = injector.getInstance(DiscoveryServiceClient.class);
    DatasetFramework datasetFramework = injector.getInstance(DatasetFramework.class);
    SecureStore secureStore = injector.getInstance(SecureStore.class);
    SecureStoreManager secureStoreManager = injector.getInstance(SecureStoreManager.class);
    MessagingService messagingService = injector.getInstance(MessagingService.class);
    // Multiple instances of BasicMapReduceTaskContext can share the same program.
    AtomicReference<Program> programRef = new AtomicReference<>();
    MetadataReader metadataReader = injector.getInstance(MetadataReader.class);
    MetadataPublisher metadataPublisher = injector.getInstance(MetadataPublisher.class);
    FieldLineageWriter fieldLineageWriter = injector.getInstance(FieldLineageWriter.class);
    RemoteClientFactory remoteClientFactory = injector.getInstance(RemoteClientFactory.class);
    return new CacheLoader<ContextCacheKey, BasicMapReduceTaskContext>() {

        @Override
        public BasicMapReduceTaskContext load(ContextCacheKey key) throws Exception {
            TaskAttemptID taskAttemptId = key.getTaskAttemptID();
            // taskAttemptId could be null if used from a org.apache.hadoop.mapreduce.Partitioner or
            // from a org.apache.hadoop.io.RawComparator, in which case we can get the JobId from the conf. Note that the
            // JobId isn't in the conf for the OutputCommitter#setupJob method, in which case we use the taskAttemptId
            Path txFile = MainOutputCommitter.getTxFile(key.getConfiguration(), taskAttemptId != null ? taskAttemptId.getJobID() : null);
            FileSystem fs = txFile.getFileSystem(key.getConfiguration());
            Transaction transaction = null;
            if (fs.exists(txFile)) {
                try (FSDataInputStream txFileInputStream = fs.open(txFile)) {
                    transaction = new TransactionCodec().decode(ByteStreams.toByteArray(txFileInputStream));
                }
            }
            MapReduceContextConfig contextConfig = new MapReduceContextConfig(key.getConfiguration());
            MapReduceClassLoader classLoader = MapReduceClassLoader.getFromConfiguration(key.getConfiguration());
            Program program = programRef.get();
            if (program == null) {
                // Creation of program is relatively cheap, so just create and do compare and set.
                programRef.compareAndSet(null, createProgram(contextConfig, classLoader.getProgramClassLoader()));
                program = programRef.get();
            }
            WorkflowProgramInfo workflowInfo = contextConfig.getWorkflowProgramInfo();
            DatasetFramework programDatasetFramework = workflowInfo == null
                ? datasetFramework
                : NameMappedDatasetFramework.createFromWorkflowProgramInfo(datasetFramework, workflowInfo,
                                                                           program.getApplicationSpecification());
            // Setup dataset framework context, if required
            if (programDatasetFramework instanceof ProgramContextAware) {
                ProgramRunId programRunId = program.getId().run(ProgramRunners.getRunId(contextConfig.getProgramOptions()));
                ((ProgramContextAware) programDatasetFramework).setContext(new BasicProgramContext(programRunId));
            }
            MapReduceSpecification spec = program.getApplicationSpecification().getMapReduce().get(program.getName());
            MetricsCollectionService metricsCollectionService = null;
            MapReduceMetrics.TaskType taskType = null;
            String taskId = null;
            ProgramOptions options = contextConfig.getProgramOptions();
            // taskAttemptId can be null if called from a Partitioner or a RawComparator (see above)
            if (taskAttemptId != null) {
                taskId = taskAttemptId.getTaskID().toString();
                if (MapReduceMetrics.TaskType.hasType(taskAttemptId.getTaskType())) {
                    taskType = MapReduceMetrics.TaskType.from(taskAttemptId.getTaskType());
                    // only map and reduce tasks need the metrics collection service
                    metricsCollectionService = injector.getInstance(MetricsCollectionService.class);
                    options = new SimpleProgramOptions(
                        options.getProgramId(), options.getArguments(),
                        new BasicArguments(RuntimeArguments.extractScope(
                            "task", taskType.toString().toLowerCase(),
                            contextConfig.getProgramOptions().getUserArguments().asMap())),
                        options.isDebug());
                }
            }
            CConfiguration cConf = injector.getInstance(CConfiguration.class);
            TransactionSystemClient txClient = injector.getInstance(TransactionSystemClient.class);
            NamespaceQueryAdmin namespaceQueryAdmin = injector.getInstance(NamespaceQueryAdmin.class);
            return new BasicMapReduceTaskContext(
                program, options, cConf, taskType, taskId, spec, workflowInfo,
                discoveryServiceClient, metricsCollectionService, txClient, transaction,
                programDatasetFramework, classLoader.getPluginInstantiator(),
                contextConfig.getLocalizedResources(), secureStore, secureStoreManager,
                accessEnforcer, authenticationContext, messagingService, mapReduceClassLoader,
                metadataReader, metadataPublisher, namespaceQueryAdmin, fieldLineageWriter,
                remoteClientFactory);
        }
    };
}
Also used : RemoteClientFactory(io.cdap.cdap.common.internal.remote.RemoteClientFactory) DiscoveryServiceClient(org.apache.twill.discovery.DiscoveryServiceClient) TaskAttemptID(org.apache.hadoop.mapreduce.TaskAttemptID) DatasetFramework(io.cdap.cdap.data2.dataset2.DatasetFramework) NameMappedDatasetFramework(io.cdap.cdap.internal.app.runtime.workflow.NameMappedDatasetFramework) TransactionSystemClient(org.apache.tephra.TransactionSystemClient) FileSystem(org.apache.hadoop.fs.FileSystem) NamespaceQueryAdmin(io.cdap.cdap.common.namespace.NamespaceQueryAdmin) SecureStoreManager(io.cdap.cdap.api.security.store.SecureStoreManager) BasicArguments(io.cdap.cdap.internal.app.runtime.BasicArguments) MapReduceMetrics(io.cdap.cdap.app.metrics.MapReduceMetrics) Path(org.apache.hadoop.fs.Path) DefaultProgram(io.cdap.cdap.app.program.DefaultProgram) Program(io.cdap.cdap.app.program.Program) MetricsCollectionService(io.cdap.cdap.api.metrics.MetricsCollectionService) MapReduceSpecification(io.cdap.cdap.api.mapreduce.MapReduceSpecification) MetadataReader(io.cdap.cdap.api.metadata.MetadataReader) MetadataPublisher(io.cdap.cdap.data2.metadata.writer.MetadataPublisher) AtomicReference(java.util.concurrent.atomic.AtomicReference) BasicProgramContext(io.cdap.cdap.internal.app.runtime.BasicProgramContext) SecureStore(io.cdap.cdap.api.security.store.SecureStore) CConfiguration(io.cdap.cdap.common.conf.CConfiguration) SimpleProgramOptions(io.cdap.cdap.internal.app.runtime.SimpleProgramOptions) ProgramOptions(io.cdap.cdap.app.runtime.ProgramOptions) MessagingService(io.cdap.cdap.messaging.MessagingService) Transaction(org.apache.tephra.Transaction) WorkflowProgramInfo(io.cdap.cdap.internal.app.runtime.workflow.WorkflowProgramInfo) TransactionCodec(org.apache.tephra.TransactionCodec) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) CacheLoader(com.google.common.cache.CacheLoader) ProgramRunId(io.cdap.cdap.proto.id.ProgramRunId) SimpleProgramOptions(io.cdap.cdap.internal.app.runtime.SimpleProgramOptions) ProgramContextAware(io.cdap.cdap.data.ProgramContextAware) FieldLineageWriter(io.cdap.cdap.data2.metadata.writer.FieldLineageWriter)
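
The returned CacheLoader only becomes useful once wrapped in a Guava LoadingCache. A hypothetical sketch of how the provider might build and query such a cache (the expiry value is invented, and contextCacheKey stands for an already-constructed key):

// Illustrative only; uses the standard Guava CacheBuilder/LoadingCache API.
LoadingCache<ContextCacheKey, BasicMapReduceTaskContext> taskContexts =
    CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)   // hypothetical expiry
        .build(createCacheLoader(injector));

// Each distinct key (task attempt + configuration) has its context built once by load() and reused afterwards.
BasicMapReduceTaskContext context = taskContexts.getUnchecked(contextCacheKey);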

Aggregations

MapReduceSpecification (io.cdap.cdap.api.mapreduce.MapReduceSpecification): 11 usages
ApplicationSpecification (io.cdap.cdap.api.app.ApplicationSpecification): 5 usages
JsonObject (com.google.gson.JsonObject): 2 usages
MetricsCollectionService (io.cdap.cdap.api.metrics.MetricsCollectionService): 2 usages
Plugin (io.cdap.cdap.api.plugin.Plugin): 2 usages
ServiceSpecification (io.cdap.cdap.api.service.ServiceSpecification): 2 usages
SparkSpecification (io.cdap.cdap.api.spark.SparkSpecification): 2 usages
CConfiguration (io.cdap.cdap.common.conf.CConfiguration): 2 usages
ProgramContextAware (io.cdap.cdap.data.ProgramContextAware): 2 usages
DatasetFramework (io.cdap.cdap.data2.dataset2.DatasetFramework): 2 usages
BasicProgramContext (io.cdap.cdap.internal.app.runtime.BasicProgramContext): 2 usages
NameMappedDatasetFramework (io.cdap.cdap.internal.app.runtime.workflow.NameMappedDatasetFramework): 2 usages
WorkflowProgramInfo (io.cdap.cdap.internal.app.runtime.workflow.WorkflowProgramInfo): 2 usages
MessagingService (io.cdap.cdap.messaging.MessagingService): 2 usages
ProgramType (io.cdap.cdap.proto.ProgramType): 2 usages
ApplicationId (io.cdap.cdap.proto.id.ApplicationId): 2 usages
ProgramId (io.cdap.cdap.proto.id.ProgramId): 2 usages
CacheLoader (com.google.common.cache.CacheLoader): 1 usage
Service (com.google.common.util.concurrent.Service): 1 usage
JsonElement (com.google.gson.JsonElement): 1 usage