
Example 1 with OrcConf

Use of org.apache.orc.OrcConf in project hive by apache.

From the class SpecialCases, the method addSpecialCasesParametersToOutputJobProperties:

/**
   * Method to do any file-format specific special casing while
   * instantiating a storage handler to write. We set any parameters
   * we want to be visible to the job in jobProperties, and this will
   * be available to the job via jobconf at run time.
   *
   * This is mostly intended to be used by StorageHandlers that wrap
   * File-based OutputFormats such as FosterStorageHandler that wraps
   * RCFile, ORC, etc.
   *
   * @param jobProperties : map to write to
   * @param jobInfo : information about this output job to read from
   * @param ofclass : the output format in use
   */
public static void addSpecialCasesParametersToOutputJobProperties(Map<String, String> jobProperties, OutputJobInfo jobInfo, Class<? extends OutputFormat> ofclass) {
    if (ofclass == RCFileOutputFormat.class) {
        // RCFile specific parameter
        jobProperties.put(HiveConf.ConfVars.HIVE_RCFILE_COLUMN_NUMBER_CONF.varname, Integer.toOctalString(jobInfo.getOutputSchema().getFields().size()));
    } else if (ofclass == OrcOutputFormat.class) {
        // Special cases for ORC
        // We need to check table properties to see if a couple of parameters,
        // such as compression parameters are defined. If they are, then we copy
        // them to job properties, so that it will be available in jobconf at runtime
        // See HIVE-5504 for details
        Map<String, String> tableProps = jobInfo.getTableInfo().getTable().getParameters();
        for (OrcConf property : OrcConf.values()) {
            String propName = property.getAttribute();
            if (tableProps.containsKey(propName)) {
                jobProperties.put(propName, tableProps.get(propName));
            }
        }
    } else if (ofclass == AvroContainerOutputFormat.class) {
        // Special cases for Avro. As with ORC, we make table properties that
        // Avro is interested in available in jobconf at runtime
        Map<String, String> tableProps = jobInfo.getTableInfo().getTable().getParameters();
        for (AvroSerdeUtils.AvroTableProperties property : AvroSerdeUtils.AvroTableProperties.values()) {
            String propName = property.getPropName();
            if (tableProps.containsKey(propName)) {
                String propVal = tableProps.get(propName);
                jobProperties.put(propName, propVal);
            }
        }
        Properties properties = new Properties();
        properties.put("name", jobInfo.getTableName());
        List<String> colNames = jobInfo.getOutputSchema().getFieldNames();
        List<TypeInfo> colTypes = new ArrayList<TypeInfo>();
        for (HCatFieldSchema field : jobInfo.getOutputSchema().getFields()) {
            colTypes.add(TypeInfoUtils.getTypeInfoFromTypeString(field.getTypeString()));
        }
        String schemaLiteralProp = AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName();
        if (jobProperties.get(schemaLiteralProp) == null || jobProperties.get(schemaLiteralProp).isEmpty()) {
            jobProperties.put(schemaLiteralProp, AvroSerDe.getSchemaFromCols(properties, colNames, colTypes, null).toString());
        }
    }
}
Also used : OrcConf(org.apache.orc.OrcConf) ArrayList(java.util.ArrayList) Properties(java.util.Properties) OrcOutputFormat(org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) HCatFieldSchema(org.apache.hive.hcatalog.data.schema.HCatFieldSchema) Map(java.util.Map) AvroSerdeUtils(org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils)
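
The ORC branch above boils down to one pattern: any table-level property whose name matches an OrcConf attribute is copied into the job properties so it becomes visible in the jobconf at runtime (see HIVE-5504). The following is a minimal, standalone sketch of that pattern, not Hive code: the class name, property values, and the plain HashMaps are illustrative assumptions, and only org.apache.orc.OrcConf from orc-core is needed on the classpath.

import java.util.HashMap;
import java.util.Map;

import org.apache.orc.OrcConf;

public class OrcTablePropsSketch {
    public static void main(String[] args) {
        // Hypothetical table properties, e.g. set through TBLPROPERTIES when the table was created.
        Map<String, String> tableProps = new HashMap<>();
        // "orc.compress"
        tableProps.put(OrcConf.COMPRESS.getAttribute(), "SNAPPY");
        // "orc.stripe.size"
        tableProps.put(OrcConf.STRIPE_SIZE.getAttribute(), "67108864");
        // not an OrcConf attribute, so it will not be propagated
        tableProps.put("unrelated.property", "ignored");

        // Same loop as the ORC branch above: copy every table property whose
        // name matches an OrcConf attribute into the job properties.
        Map<String, String> jobProperties = new HashMap<>();
        for (OrcConf property : OrcConf.values()) {
            String propName = property.getAttribute();
            if (tableProps.containsKey(propName)) {
                jobProperties.put(propName, tableProps.get(propName));
            }
        }

        // Prints only the two ORC-recognized properties (map iteration order is unspecified).
        System.out.println(jobProperties);
    }
}

Iterating OrcConf.values() rather than hard-coding property names means any ORC setting the table declares is forwarded, without this code having to know the full list.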

Aggregations

ArrayList (java.util.ArrayList) 1
Map (java.util.Map) 1
Properties (java.util.Properties) 1
OrcOutputFormat (org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat) 1
AvroSerdeUtils (org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils) 1
TypeInfo (org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) 1
HCatFieldSchema (org.apache.hive.hcatalog.data.schema.HCatFieldSchema) 1
OrcConf (org.apache.orc.OrcConf) 1
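
For the Avro branch, the interesting step is the fallback that synthesizes avro.schema.literal from the output column names and Hive type strings when the table does not already provide one. Below is a minimal sketch of that call in isolation, reusing the same AvroSerDe.getSchemaFromCols and TypeInfoUtils.getTypeInfoFromTypeString calls the method above makes; the class name, table name, and columns are made-up examples.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.hive.serde2.avro.AvroSerDe;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class AvroSchemaLiteralSketch {
    public static void main(String[] args) {
        // Record name for the generated schema; the method above uses jobInfo.getTableName().
        Properties properties = new Properties();
        properties.put("name", "example_table");

        // Hypothetical output columns and their Hive type strings.
        List<String> colNames = Arrays.asList("id", "name");
        List<TypeInfo> colTypes = new ArrayList<>();
        colTypes.add(TypeInfoUtils.getTypeInfoFromTypeString("int"));
        colTypes.add(TypeInfoUtils.getTypeInfoFromTypeString("string"));

        // Same call the method above uses to fill in avro.schema.literal when it is missing.
        String schemaLiteral =
                AvroSerDe.getSchemaFromCols(properties, colNames, colTypes, null).toString();
        System.out.println(schemaLiteral);
    }
}

The resulting JSON schema string is what the method stores under avro.schema.literal in the job properties, so the Avro writer can pick it up from the jobconf at runtime.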