Examples with HadoopClassExcluder - co.cask.cdap.common.twill.HadoopClassExcluder

Example 1 with HadoopClassExcluder

Use of co.cask.cdap.common.twill.HadoopClassExcluder in project cdap by caskdata.

The buildJobJar method of the MapReduceRuntimeService class.

/**
 * Creates a jar that contains everything Hadoop needs to run the MapReduce program.
 *
 * @return a new {@link File} containing the job jar
 */
private File buildJobJar(Job job, File tempDir) throws IOException, URISyntaxException {
    File jobJar = new File(tempDir, "job.jar");
    LOG.debug("Creating Job jar: {}", jobJar);
    // For local mode, nothing is needed in the job jar since we use the classloader in the configuration object.
    if (MapReduceTaskContextProvider.isLocal(job.getConfiguration())) {
        JarOutputStream output = new JarOutputStream(new FileOutputStream(jobJar));
        output.close();
        return jobJar;
    }
    // Exclude libraries that are definitely not needed in the job jar:
    // Hadoop - already available from the cluster
    // Spark - never used by MapReduce
    final HadoopClassExcluder hadoopClassExcluder = new HadoopClassExcluder();
    ApplicationBundler appBundler = new ApplicationBundler(new ClassAcceptor() {

        @Override
        public boolean accept(String className, URL classUrl, URL classPathUrl) {
            if (className.startsWith("org.apache.spark") || classPathUrl.toString().contains("spark-assembly")) {
                return false;
            }
            return hadoopClassExcluder.accept(className, classUrl, classPathUrl);
        }
    });
    Set<Class<?>> classes = Sets.newHashSet();
    classes.add(MapReduce.class);
    classes.add(MapperWrapper.class);
    classes.add(ReducerWrapper.class);
    classes.add(SLF4JBridgeHandler.class);
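    // If configured, include the job's custom InputFormat/OutputFormat classes in the job jar
    // so that they are available before the job classloader takes over the classloading.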
    if (cConf.getBoolean(Constants.AppFabric.MAPREDUCE_INCLUDE_CUSTOM_CLASSES)) {
        try {
            Class<? extends InputFormat<?, ?>> inputFormatClass = job.getInputFormatClass();
            classes.add(inputFormatClass);
            // If it is a MapReduceStreamInputFormat, also add the configured StreamEventDecoder class.
            if (MapReduceStreamInputFormat.class.isAssignableFrom(inputFormatClass)) {
                Class<? extends StreamEventDecoder> decoderType = MapReduceStreamInputFormat.getDecoderClass(job.getConfiguration());
                if (decoderType != null) {
                    classes.add(decoderType);
                }
            }
        } catch (Throwable t) {
            LOG.debug("InputFormat class not found: {}", t.getMessage(), t);
            // Ignore
        }
        try {
            Class<? extends OutputFormat<?, ?>> outputFormatClass = job.getOutputFormatClass();
            classes.add(outputFormatClass);
        } catch (Throwable t) {
            LOG.debug("OutputFormat class not found: {}", t.getMessage(), t);
            // Ignore
        }
    }
    // Add KMS class
    if (SecureStoreUtils.isKMSBacked(cConf) && SecureStoreUtils.isKMSCapable()) {
        classes.add(SecureStoreUtils.getKMSSecureStore());
    }
    try {
        Class<?> hbaseTableUtilClass = HBaseTableUtilFactory.getHBaseTableUtilClass(cConf);
        classes.add(hbaseTableUtilClass);
    } catch (ProvisionException e) {
        LOG.warn("Not including HBaseTableUtil classes in submitted Job Jar since they are not available");
    }
    // Bundle with this class's classloader as the context classloader, then restore the previous one.
    ClassLoader oldClassLoader = ClassLoaders.setContextClassLoader(getClass().getClassLoader());
    try {
        appBundler.createBundle(Locations.toLocation(jobJar), classes);
    } finally {
        ClassLoaders.setContextClassLoader(oldClassLoader);
    }
    LOG.debug("Built MapReduce Job Jar at {}", jobJar.toURI());
    return jobJar;
}
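
The anonymous ClassAcceptor in buildJobJar rejects Spark classes first and hands every other decision to HadoopClassExcluder. A minimal, self-contained sketch of that delegation pattern follows. It does not use the Twill or CDAP libraries: ClassFilter, excludingSpark, excludingHadoop, and url are hypothetical names introduced for illustration, and the Hadoop rule shown is a simplified placeholder, not the actual logic of HadoopClassExcluder.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Illustrative sketch only; none of these types come from Twill or CDAP.
class AcceptorChainSketch {

    /** Hypothetical stand-in with the same accept(...) shape used in buildJobJar. */
    interface ClassFilter {
        boolean accept(String className, URL classUrl, URL classPathUrl);
    }

    /** Rejects Spark classes first, then defers every other decision to the delegate. */
    static ClassFilter excludingSpark(ClassFilter delegate) {
        return (className, classUrl, classPathUrl) -> {
            if (className.startsWith("org.apache.spark")
                    || classPathUrl.toString().contains("spark-assembly")) {
                return false;
            }
            return delegate.accept(className, classUrl, classPathUrl);
        };
    }

    /** Simplified placeholder rule (assumption): reject anything under org.apache.hadoop. */
    static ClassFilter excludingHadoop() {
        return (className, classUrl, classPathUrl) -> !className.startsWith("org.apache.hadoop.");
    }

    /** Helper that converts a string to a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ClassFilter filter = excludingSpark(excludingHadoop());
        URL appJar = url("file:/tmp/lib/example.jar");              // hypothetical paths
        URL sparkJar = url("file:/tmp/lib/spark-assembly-1.6.jar");

        System.out.println(filter.accept("com.example.MyMapper", appJar, appJar));          // true
        System.out.println(filter.accept("org.apache.spark.SparkContext", appJar, appJar)); // false
        System.out.println(filter.accept("com.example.Helper", sparkJar, sparkJar));        // false
        System.out.println(filter.accept("org.apache.hadoop.fs.Path", appJar, appJar));     // false
    }
}
```

Checking the cheap, specific rule before delegating keeps the wrapper independent of how the delegate decides, which is why the original can reuse HadoopClassExcluder unchanged.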

Also used: HadoopClassExcluder (co.cask.cdap.common.twill.HadoopClassExcluder), JarOutputStream (java.util.jar.JarOutputStream), ClassAcceptor (org.apache.twill.api.ClassAcceptor), URL (java.net.URL), ProvisionException (com.google.inject.ProvisionException), FileOutputStream (java.io.FileOutputStream), File (java.io.File), JarFile (java.util.jar.JarFile), ApplicationBundler (org.apache.twill.internal.ApplicationBundler)
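
For local mode, buildJobJar writes a jar with no entries. The following sketch, using only java.io and java.util.jar, shows one way to write such a jar with try-with-resources and then confirm that it is empty. The names EmptyJarSketch, writeEmptyJar, and countEntries, as well as the temp-file location, are assumptions for illustration and are not part of CDAP.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Illustrative sketch only; not taken from the CDAP code base.
class EmptyJarSketch {

    /** Writes a jar with no entries to the given file and returns that file. */
    static File writeEmptyJar(File target) {
        // try-with-resources closes the stream even if an exception is thrown.
        try (JarOutputStream output = new JarOutputStream(new FileOutputStream(target))) {
            // Intentionally empty: no entries are added.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    /** Returns the number of entries in the given jar file. */
    static int countEntries(File jar) {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File jobJar = File.createTempFile("job", ".jar"); // hypothetical location
        jobJar.deleteOnExit();
        writeEmptyJar(jobJar);
        System.out.println("entries: " + countEntries(jobJar));
    }
}
```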
