Examples with MultiInputFormat - com.twitter.elephantbird.mapreduce.input.MultiInputFormat

Example 1 with MultiInputFormat

Use of com.twitter.elephantbird.mapreduce.input.MultiInputFormat in project elephant-bird by twitter.

From the class HiveMultiInputFormat, method initialize.

private void initialize(FileSplit split, JobConf job) throws IOException {
    LOG.info("Initializing HiveMultiInputFormat for " + split + " with job " + job);
    String thriftClassName = null;
    Properties properties = null;
    if (!"".equals(HiveConf.getVar(job, HiveConf.ConfVars.PLAN))) {
        // Running as a Hive query. Use MapredWork for metadata.
        Map<String, PartitionDesc> partitionDescMap = Utilities.getMapRedWork(job).getPathToPartitionInfo();
        if (!partitionDescMap.containsKey(split.getPath().getParent().toUri().toString())) {
            throw new RuntimeException("Failed locating partition description for " + split.getPath().toUri().toString());
        }
        properties = partitionDescMap.get(split.getPath().getParent().toUri().toString()).getTableDesc().getProperties();
    } else if (job.get(HCatConstants.HCAT_KEY_JOB_INFO, null) != null) {
        // Running as an HCatalog query. Use InputJobInfo for metadata.
        InputJobInfo inputJobInfo = (InputJobInfo) HCatUtil.deserialize(job.get(HCatConstants.HCAT_KEY_JOB_INFO));
        properties = inputJobInfo.getTableInfo().getStorerInfo().getProperties();
    } else if (job.get(Constants.SERIALIZATION_CLASS, null) != null) {
        // Running as a Presto query. Read the class name directly from the job configuration.
        thriftClassName = job.get(Constants.SERIALIZATION_CLASS);
    }
    if (properties != null) {
        thriftClassName = properties.getProperty(Constants.SERIALIZATION_CLASS);
    }
    if (thriftClassName == null) {
        throw new RuntimeException("Required property " + Constants.SERIALIZATION_CLASS + " is null.");
    }
    try {
        Class thriftClass = job.getClassByName(thriftClassName);
        setInputFormatInstance(new MultiInputFormat(new TypeRef(thriftClass) {
        }));
    } catch (ClassNotFoundException e) {
        // Keep the original exception as the cause so the stack trace is not lost.
        throw new RuntimeException("Failed getting class for " + thriftClassName, e);
    }
}
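
The lookup order in initialize can be hard to follow because of the Hive, HCatalog and Presto dependencies. The following is a minimal standalone sketch of the same decision logic, not the elephant-bird implementation: the class SerializationClassResolver, its methods, and the use of a plain Map in place of JobConf are hypothetical, and the key "serialization.class" is assumed to be the value of Constants.SERIALIZATION_CLASS. As in the method above, table properties (when present) are the only source consulted, the job configuration is used only when no table properties were found, and a missing class name is an error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper that mirrors the lookup order of initialize()
// without any Hive, HCatalog or elephant-bird dependency.
class SerializationClassResolver {

    // Assumption: stands in for Constants.SERIALIZATION_CLASS.
    static final String KEY = "serialization.class";

    // tableProps is null when neither Hive nor HCatalog metadata was found.
    public static String resolve(Properties tableProps, Map<String, String> jobConf) {
        String className;
        if (tableProps != null) {
            // Hive or HCatalog case: only the table properties are consulted.
            className = tableProps.getProperty(KEY);
        } else {
            // Presto-style case: fall back to the job configuration.
            className = jobConf.get(KEY);
        }
        if (className == null) {
            throw new IllegalStateException("Required property " + KEY + " is null.");
        }
        return className;
    }

    // Loads the class and keeps the original exception as the cause.
    public static Class<?> load(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Failed getting class for " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(KEY, "java.lang.String");

        // No table properties: the job configuration value is used.
        String name = resolve(null, jobConf);
        System.out.println(name + " -> " + load(name));
    }
}
```

In the real method the resolved class is then wrapped in a TypeRef and passed to the MultiInputFormat constructor; that step is left out here because it needs the elephant-bird classes on the classpath.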

Also used: MultiInputFormat (com.twitter.elephantbird.mapreduce.input.MultiInputFormat), TypeRef (com.twitter.elephantbird.util.TypeRef), PartitionDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc), Properties (java.util.Properties), InputJobInfo (org.apache.hcatalog.mapreduce.InputJobInfo)
