Search in sources :

Example 1 with DatasetFramework

use of co.cask.cdap.data2.dataset2.DatasetFramework in project cdap by caskdata.

the class MapReduceProgramRunner method run.

@Override
public ProgramController run(final Program program, ProgramOptions options) {
    // Extract and verify parameters
    ApplicationSpecification appSpec = program.getApplicationSpecification();
    Preconditions.checkNotNull(appSpec, "Missing application specification.");
    ProgramType processorType = program.getType();
    Preconditions.checkNotNull(processorType, "Missing processor type.");
    Preconditions.checkArgument(processorType == ProgramType.MAPREDUCE, "Only MAPREDUCE process type is supported.");
    MapReduceSpecification spec = appSpec.getMapReduce().get(program.getName());
    Preconditions.checkNotNull(spec, "Missing MapReduceSpecification for %s", program.getName());
    Arguments arguments = options.getArguments();
    RunId runId = ProgramRunners.getRunId(options);
    WorkflowProgramInfo workflowInfo = WorkflowProgramInfo.create(arguments);
    DatasetFramework programDatasetFramework = workflowInfo == null ? datasetFramework : NameMappedDatasetFramework.createFromWorkflowProgramInfo(datasetFramework, workflowInfo, appSpec);
    // Setup dataset framework context, if required
    if (programDatasetFramework instanceof ProgramContextAware) {
        ProgramId programId = program.getId();
        ((ProgramContextAware) programDatasetFramework).setContext(new BasicProgramContext(programId.run(runId)));
    }
    MapReduce mapReduce;
    try {
        mapReduce = new InstantiatorFactory(false).get(TypeToken.of(program.<MapReduce>getMainClass())).create();
    } catch (Exception e) {
        LOG.error("Failed to instantiate MapReduce class for {}", spec.getClassName(), e);
        throw Throwables.propagate(e);
    }
    // List of all Closeable resources that needs to be cleanup
    List<Closeable> closeables = new ArrayList<>();
    try {
        PluginInstantiator pluginInstantiator = createPluginInstantiator(options, program.getClassLoader());
        if (pluginInstantiator != null) {
            closeables.add(pluginInstantiator);
        }
        final BasicMapReduceContext context = new BasicMapReduceContext(program, options, cConf, spec, workflowInfo, discoveryServiceClient, metricsCollectionService, txSystemClient, programDatasetFramework, streamAdmin, getPluginArchive(options), pluginInstantiator, secureStore, secureStoreManager, messagingService);
        Reflections.visit(mapReduce, mapReduce.getClass(), new PropertyFieldSetter(context.getSpecification().getProperties()), new MetricsFieldSetter(context.getMetrics()), new DataSetFieldSetter(context));
        // note: this sets logging context on the thread level
        LoggingContextAccessor.setLoggingContext(context.getLoggingContext());
        // Set the job queue to hConf if it is provided
        Configuration hConf = new Configuration(this.hConf);
        String schedulerQueue = options.getArguments().getOption(Constants.AppFabric.APP_SCHEDULER_QUEUE);
        if (schedulerQueue != null && !schedulerQueue.isEmpty()) {
            hConf.set(JobContext.QUEUE_NAME, schedulerQueue);
        }
        Service mapReduceRuntimeService = new MapReduceRuntimeService(injector, cConf, hConf, mapReduce, spec, context, program.getJarLocation(), locationFactory, streamAdmin, txSystemClient, authorizationEnforcer, authenticationContext);
        mapReduceRuntimeService.addListener(createRuntimeServiceListener(program.getId(), runId, closeables, arguments, options.getUserArguments()), Threads.SAME_THREAD_EXECUTOR);
        final ProgramController controller = new MapReduceProgramController(mapReduceRuntimeService, context);
        LOG.debug("Starting MapReduce Job: {}", context);
        // be running the job, but the data directory will be owned by cdap.
        if (MapReduceTaskContextProvider.isLocal(hConf) || UserGroupInformation.isSecurityEnabled()) {
            mapReduceRuntimeService.start();
        } else {
            ProgramRunners.startAsUser(cConf.get(Constants.CFG_HDFS_USER), mapReduceRuntimeService);
        }
        return controller;
    } catch (Exception e) {
        closeAllQuietly(closeables);
        throw Throwables.propagate(e);
    }
}
Also used : ApplicationSpecification(co.cask.cdap.api.app.ApplicationSpecification) CConfiguration(co.cask.cdap.common.conf.CConfiguration) Configuration(org.apache.hadoop.conf.Configuration) Closeable(java.io.Closeable) ArrayList(java.util.ArrayList) MapReduce(co.cask.cdap.api.mapreduce.MapReduce) NameMappedDatasetFramework(co.cask.cdap.internal.app.runtime.workflow.NameMappedDatasetFramework) DatasetFramework(co.cask.cdap.data2.dataset2.DatasetFramework) InstantiatorFactory(co.cask.cdap.common.lang.InstantiatorFactory) MetricsFieldSetter(co.cask.cdap.internal.app.runtime.MetricsFieldSetter) ProgramType(co.cask.cdap.proto.ProgramType) RunId(org.apache.twill.api.RunId) ProgramController(co.cask.cdap.app.runtime.ProgramController) MapReduceSpecification(co.cask.cdap.api.mapreduce.MapReduceSpecification) Arguments(co.cask.cdap.app.runtime.Arguments) MessagingService(co.cask.cdap.messaging.MessagingService) MetricsCollectionService(co.cask.cdap.api.metrics.MetricsCollectionService) Service(com.google.common.util.concurrent.Service) ProgramId(co.cask.cdap.proto.id.ProgramId) BasicProgramContext(co.cask.cdap.internal.app.runtime.BasicProgramContext) DataSetFieldSetter(co.cask.cdap.internal.app.runtime.DataSetFieldSetter) PropertyFieldSetter(co.cask.cdap.common.lang.PropertyFieldSetter) WorkflowProgramInfo(co.cask.cdap.internal.app.runtime.workflow.WorkflowProgramInfo) PluginInstantiator(co.cask.cdap.internal.app.runtime.plugin.PluginInstantiator) ProgramContextAware(co.cask.cdap.data.ProgramContextAware)

Example 2 with DatasetFramework

use of co.cask.cdap.data2.dataset2.DatasetFramework in project cdap by caskdata.

the class MapReduceTaskContextProvider method createCacheLoader.

/**
   * Creates a {@link CacheLoader} for the task context cache.
   */
private CacheLoader<ContextCacheKey, BasicMapReduceTaskContext> createCacheLoader(final Injector injector) {
    final DiscoveryServiceClient discoveryServiceClient = injector.getInstance(DiscoveryServiceClient.class);
    final DatasetFramework datasetFramework = injector.getInstance(DatasetFramework.class);
    final SecureStore secureStore = injector.getInstance(SecureStore.class);
    final SecureStoreManager secureStoreManager = injector.getInstance(SecureStoreManager.class);
    final MessagingService messagingService = injector.getInstance(MessagingService.class);
    // Multiple instances of BasicMapReduceTaskContext can shares the same program.
    final AtomicReference<Program> programRef = new AtomicReference<>();
    return new CacheLoader<ContextCacheKey, BasicMapReduceTaskContext>() {

        @Override
        public BasicMapReduceTaskContext load(ContextCacheKey key) throws Exception {
            MapReduceContextConfig contextConfig = new MapReduceContextConfig(key.getConfiguration());
            MapReduceClassLoader classLoader = MapReduceClassLoader.getFromConfiguration(key.getConfiguration());
            Program program = programRef.get();
            if (program == null) {
                // Creation of program is relatively cheap, so just create and do compare and set.
                programRef.compareAndSet(null, createProgram(contextConfig, classLoader.getProgramClassLoader()));
                program = programRef.get();
            }
            WorkflowProgramInfo workflowInfo = contextConfig.getWorkflowProgramInfo();
            DatasetFramework programDatasetFramework = workflowInfo == null ? datasetFramework : NameMappedDatasetFramework.createFromWorkflowProgramInfo(datasetFramework, workflowInfo, program.getApplicationSpecification());
            // Setup dataset framework context, if required
            if (programDatasetFramework instanceof ProgramContextAware) {
                ProgramRunId programRunId = program.getId().run(ProgramRunners.getRunId(contextConfig.getProgramOptions()));
                ((ProgramContextAware) programDatasetFramework).setContext(new BasicProgramContext(programRunId));
            }
            MapReduceSpecification spec = program.getApplicationSpecification().getMapReduce().get(program.getName());
            MetricsCollectionService metricsCollectionService = null;
            MapReduceMetrics.TaskType taskType = null;
            String taskId = null;
            TaskAttemptID taskAttemptId = key.getTaskAttemptID();
            // from a org.apache.hadoop.io.RawComparator
            if (taskAttemptId != null) {
                taskId = taskAttemptId.getTaskID().toString();
                if (MapReduceMetrics.TaskType.hasType(taskAttemptId.getTaskType())) {
                    taskType = MapReduceMetrics.TaskType.from(taskAttemptId.getTaskType());
                    // if this is not for a mapper or a reducer, we don't need the metrics collection service
                    metricsCollectionService = injector.getInstance(MetricsCollectionService.class);
                }
            }
            CConfiguration cConf = injector.getInstance(CConfiguration.class);
            TransactionSystemClient txClient = injector.getInstance(TransactionSystemClient.class);
            return new BasicMapReduceTaskContext(program, contextConfig.getProgramOptions(), cConf, taskType, taskId, spec, workflowInfo, discoveryServiceClient, metricsCollectionService, txClient, contextConfig.getTx(), programDatasetFramework, classLoader.getPluginInstantiator(), contextConfig.getLocalizedResources(), secureStore, secureStoreManager, authorizationEnforcer, authenticationContext, messagingService);
        }
    };
}
Also used : DiscoveryServiceClient(org.apache.twill.discovery.DiscoveryServiceClient) TaskAttemptID(org.apache.hadoop.mapreduce.TaskAttemptID) DatasetFramework(co.cask.cdap.data2.dataset2.DatasetFramework) NameMappedDatasetFramework(co.cask.cdap.internal.app.runtime.workflow.NameMappedDatasetFramework) TransactionSystemClient(org.apache.tephra.TransactionSystemClient) SecureStoreManager(co.cask.cdap.api.security.store.SecureStoreManager) MapReduceMetrics(co.cask.cdap.app.metrics.MapReduceMetrics) Program(co.cask.cdap.app.program.Program) DefaultProgram(co.cask.cdap.app.program.DefaultProgram) MetricsCollectionService(co.cask.cdap.api.metrics.MetricsCollectionService) MapReduceSpecification(co.cask.cdap.api.mapreduce.MapReduceSpecification) AtomicReference(java.util.concurrent.atomic.AtomicReference) BasicProgramContext(co.cask.cdap.internal.app.runtime.BasicProgramContext) SecureStore(co.cask.cdap.api.security.store.SecureStore) CConfiguration(co.cask.cdap.common.conf.CConfiguration) MessagingService(co.cask.cdap.messaging.MessagingService) WorkflowProgramInfo(co.cask.cdap.internal.app.runtime.workflow.WorkflowProgramInfo) CacheLoader(com.google.common.cache.CacheLoader) ProgramRunId(co.cask.cdap.proto.id.ProgramRunId) ProgramContextAware(co.cask.cdap.data.ProgramContextAware)

Example 3 with DatasetFramework

use of co.cask.cdap.data2.dataset2.DatasetFramework in project cdap by caskdata.

the class ContextManager method createContext.

// this method is called by the mappers/reducers of jobs launched by Hive.
private static Context createContext(Configuration conf) throws IOException {
    // Create context needs to happen only when running in as a MapReduce job.
    // In other cases, ContextManager will be initialized using saveContext method.
    CConfiguration cConf = ConfigurationUtil.get(conf, Constants.Explore.CCONF_KEY, CConfCodec.INSTANCE);
    Configuration hConf = ConfigurationUtil.get(conf, Constants.Explore.HCONF_KEY, HConfCodec.INSTANCE);
    Injector injector = createInjector(cConf, hConf);
    ZKClientService zkClientService = injector.getInstance(ZKClientService.class);
    zkClientService.startAndWait();
    DatasetFramework datasetFramework = injector.getInstance(DatasetFramework.class);
    StreamAdmin streamAdmin = injector.getInstance(StreamAdmin.class);
    SystemDatasetInstantiatorFactory datasetInstantiatorFactory = injector.getInstance(SystemDatasetInstantiatorFactory.class);
    AuthenticationContext authenticationContext = injector.getInstance(AuthenticationContext.class);
    AuthorizationEnforcer authorizationEnforcer = injector.getInstance(AuthorizationEnforcer.class);
    return new Context(datasetFramework, streamAdmin, zkClientService, datasetInstantiatorFactory, authenticationContext, authorizationEnforcer);
}
Also used : DatasetFramework(co.cask.cdap.data2.dataset2.DatasetFramework) AuthenticationContext(co.cask.cdap.security.spi.authentication.AuthenticationContext) StreamAdmin(co.cask.cdap.data2.transaction.stream.StreamAdmin) SystemDatasetInstantiatorFactory(co.cask.cdap.data.dataset.SystemDatasetInstantiatorFactory) AuthenticationContext(co.cask.cdap.security.spi.authentication.AuthenticationContext) CConfiguration(co.cask.cdap.common.conf.CConfiguration) Configuration(org.apache.hadoop.conf.Configuration) ZKClientService(org.apache.twill.zookeeper.ZKClientService) Injector(com.google.inject.Injector) AuthorizationEnforcer(co.cask.cdap.security.spi.authorization.AuthorizationEnforcer) CConfiguration(co.cask.cdap.common.conf.CConfiguration)

Example 4 with DatasetFramework

use of co.cask.cdap.data2.dataset2.DatasetFramework in project cdap by caskdata.

the class JobQueueDatasetTest method checkDatasetType.

@Test
public void checkDatasetType() throws DatasetManagementException {
    DatasetFramework dsFramework = getInjector().getInstance(DatasetFramework.class);
    Assert.assertTrue(dsFramework.hasType(NamespaceId.SYSTEM.datasetType(JobQueueDataset.class.getName())));
}
Also used : DatasetFramework(co.cask.cdap.data2.dataset2.DatasetFramework) Test(org.junit.Test)

Example 5 with DatasetFramework

use of co.cask.cdap.data2.dataset2.DatasetFramework in project cdap by caskdata.

the class JobQueueDatasetTest method before.

@Before
public void before() throws Exception {
    DatasetFramework dsFramework = getInjector().getInstance(DatasetFramework.class);
    TransactionSystemClient txClient = getInjector().getInstance(TransactionSystemClient.class);
    TransactionExecutorFactory txExecutorFactory = new DynamicTransactionExecutorFactory(txClient);
    jobQueue = dsFramework.getDataset(Schedulers.JOB_QUEUE_DATASET_ID, new HashMap<String, String>(), null);
    Assert.assertNotNull(jobQueue);
    this.txExecutor = txExecutorFactory.createExecutor(Collections.singleton((TransactionAware) jobQueue));
}
Also used : DatasetFramework(co.cask.cdap.data2.dataset2.DatasetFramework) TransactionSystemClient(org.apache.tephra.TransactionSystemClient) HashMap(java.util.HashMap) DynamicTransactionExecutorFactory(co.cask.cdap.data.runtime.DynamicTransactionExecutorFactory) TransactionExecutorFactory(co.cask.cdap.data2.transaction.TransactionExecutorFactory) DynamicTransactionExecutorFactory(co.cask.cdap.data.runtime.DynamicTransactionExecutorFactory) Before(org.junit.Before)

Aggregations

DatasetFramework (co.cask.cdap.data2.dataset2.DatasetFramework)26 Test (org.junit.Test)21 TransactionSystemClient (org.apache.tephra.TransactionSystemClient)11 CConfiguration (co.cask.cdap.common.conf.CConfiguration)10 Location (org.apache.twill.filesystem.Location)9 SystemDatasetInstantiator (co.cask.cdap.data.dataset.SystemDatasetInstantiator)8 LineageWriterDatasetFramework (co.cask.cdap.data2.metadata.writer.LineageWriterDatasetFramework)8 DatasetManagementException (co.cask.cdap.api.dataset.DatasetManagementException)7 IOException (java.io.IOException)7 LocationFactory (org.apache.twill.filesystem.LocationFactory)6 Transactional (co.cask.cdap.api.Transactional)5 DatasetManager (co.cask.cdap.api.dataset.DatasetManager)5 DefaultDatasetManager (co.cask.cdap.data2.datafabric.dataset.DefaultDatasetManager)5 MultiThreadDatasetCache (co.cask.cdap.data2.dataset2.MultiThreadDatasetCache)5 CoreDatasetsModule (co.cask.cdap.data2.dataset2.lib.table.CoreDatasetsModule)5 LogPathIdentifier (co.cask.cdap.logging.appender.system.LogPathIdentifier)5 FileMetaDataWriter (co.cask.cdap.logging.meta.FileMetaDataWriter)5 Injector (com.google.inject.Injector)5 Table (co.cask.cdap.api.dataset.table.Table)4 MetricsCollectionService (co.cask.cdap.api.metrics.MetricsCollectionService)4