
Example 1 with RCFile

Use of org.apache.hadoop.hive.ql.io.RCFile in project elephant-bird by Twitter.

From the class RCFileUtil, method readMetadata.

/**
 * Reads {@link ColumnarMetadata} stored in an RCFile.
 * @throws IOException if metadata is not stored or in case of any other error.
 */
public static ColumnarMetadata readMetadata(Configuration conf, Path rcfile) throws IOException {
    Metadata metadata = null;
    Configuration confCopy = new Configuration(conf);
    // set up conf to read all the columns
    ColumnProjectionUtils.setFullyReadColumns(confCopy);
    RCFile.Reader reader = new RCFile.Reader(rcfile.getFileSystem(confCopy), rcfile, confCopy);
    // ugly hack to get metadata. RCFile should provide access to metadata
    try {
        Field f = RCFile.Reader.class.getDeclaredField("metadata");
        f.setAccessible(true);
        metadata = (Metadata) f.get(reader);
    } catch (Throwable t) {
        throw new IOException("Could not access metadata field in RCFile reader", t);
    } finally {
        // close the reader even if the reflective access fails
        reader.close();
    }
    Text metadataKey = new Text(COLUMN_METADATA_PROTOBUF_KEY);
    if (metadata == null || metadata.get(metadataKey) == null) {
        throw new IOException("could not find ColumnarMetadata in " + rcfile);
    }
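    // Text.getBytes() returns the backing array; only the first getLength() bytes are valid
    Text value = metadata.get(metadataKey);
    return ColumnarMetadata.newBuilder().mergeFrom(value.getBytes(), 0, value.getLength()).build();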
}
Also used: Field (java.lang.reflect.Field), RequiredField (org.apache.pig.LoadPushDown.RequiredField), RCFile (org.apache.hadoop.hive.ql.io.RCFile), Configuration (org.apache.hadoop.conf.Configuration), Metadata (org.apache.hadoop.io.SequenceFile.Metadata), Text (org.apache.hadoop.io.Text), IOException (java.io.IOException)
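
The reflection step in readMetadata can be tried without Hadoop or Hive on the classpath. The following is a minimal sketch of the same pattern: read a private field by name, wrap any failure in an IOException, and close the reader in a finally block. The names PrivateFieldDemo, FakeReader and readPrivateField are hypothetical stand-ins for illustration and are not part of elephant-bird or Hive.

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.reflect.Field;

class PrivateFieldDemo {

    // Hypothetical stand-in for a reader that keeps its metadata private.
    static class FakeReader implements Closeable {
        private final String metadata;
        boolean closed = false;

        FakeReader(String metadata) {
            this.metadata = metadata;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Reads a private field of FakeReader by name and always closes the reader.
    static Object readPrivateField(FakeReader reader, String fieldName) throws IOException {
        try {
            Field f = FakeReader.class.getDeclaredField(fieldName);
            f.setAccessible(true); // bypass the private modifier
            return f.get(reader);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IOException("Could not access field " + fieldName, e);
        } finally {
            reader.close(); // runs on both the success and the failure path
        }
    }

    public static void main(String[] args) throws IOException {
        FakeReader reader = new FakeReader("schema-v1");
        System.out.println(readPrivateField(reader, "metadata")); // prints "schema-v1"
        System.out.println(reader.closed);                        // prints "true"
    }
}
```

Because the field is looked up by its string name, a rename in the target class only shows up at run time as an IOException, which is why the original comment calls this approach a hack.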
