Example 6 with Metadata

Use of org.apache.hadoop.io.SequenceFile.Metadata in the elephant-bird project by Twitter.

Class RCFileUtil, method readMetadata.

/**
 * Reads the {@link ColumnarMetadata} stored in an RCFile.
 * @throws IOException if the metadata is not stored or if any other error occurs.
 */
public static ColumnarMetadata readMetadata(Configuration conf, Path rcfile) throws IOException {
    Metadata metadata = null;
    Configuration confCopy = new Configuration(conf);
    // set up conf to read all the columns
    ColumnProjectionUtils.setFullyReadColumns(confCopy);
    RCFile.Reader reader = new RCFile.Reader(rcfile.getFileSystem(confCopy), rcfile, confCopy);
    // Workaround: RCFile.Reader does not expose its metadata, so read the private field via reflection.
    try {
        Field f = RCFile.Reader.class.getDeclaredField("metadata");
        f.setAccessible(true);
        metadata = (Metadata) f.get(reader);
    } catch (Throwable t) {
        throw new IOException("Could not access metadata field in RCFile reader", t);
    } finally {
        // close the reader even when the reflective access fails
        reader.close();
    }
    Text metadataKey = new Text(COLUMN_METADATA_PROTOBUF_KEY);
    if (metadata == null || metadata.get(metadataKey) == null) {
        throw new IOException("could not find ColumnarMetadata in " + rcfile);
    }
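    // Text.getBytes() returns the backing array, which is only valid up to getLength()
    Text value = metadata.get(metadataKey);
    return ColumnarMetadata.parseFrom(java.util.Arrays.copyOf(value.getBytes(), value.getLength()));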
}
Also used: Field (java.lang.reflect.Field), RequiredField (org.apache.pig.LoadPushDown.RequiredField), RCFile (org.apache.hadoop.hive.ql.io.RCFile), Configuration (org.apache.hadoop.conf.Configuration), Metadata (org.apache.hadoop.io.SequenceFile.Metadata), Text (org.apache.hadoop.io.Text), IOException (java.io.IOException)
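
The reflection step in readMetadata can be tried in isolation, without Hadoop or Hive on the classpath. The sketch below is only an illustration of that technique: OpaqueReader is a hypothetical stand-in for a library class with a private field, and PrivateFieldAccess.readPrivateField is a helper invented for this example, not part of elephant-bird or RCFile.

```java
import java.io.IOException;
import java.lang.reflect.Field;

// Hypothetical stand-in for a library class that keeps its metadata private.
final class OpaqueReader {
    private final String metadata;

    OpaqueReader(String metadata) {
        this.metadata = metadata;
    }
}

// Hypothetical helper, named for this example only.
final class PrivateFieldAccess {
    private PrivateFieldAccess() {}

    /**
     * Reads a private field declared directly on owner and casts it to type.
     * Reflection failures are wrapped in IOException, mirroring readMetadata.
     */
    static <T> T readPrivateField(Class<?> owner, Object target, String fieldName, Class<T> type)
            throws IOException {
        try {
            Field f = owner.getDeclaredField(fieldName);
            f.setAccessible(true); // bypass the private modifier
            return type.cast(f.get(target));
        } catch (ReflectiveOperationException | RuntimeException e) {
            throw new IOException(
                "Could not access field '" + fieldName + "' on " + owner.getName(), e);
        }
    }
}
```

Calling PrivateFieldAccess.readPrivateField(OpaqueReader.class, reader, "metadata", String.class) returns the value passed to the constructor. Unlike the original, this sketch catches ReflectiveOperationException and RuntimeException rather than Throwable, so that JVM errors such as OutOfMemoryError still propagate. Whether that narrower catch suits the original code is a judgment call.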

Aggregations

Metadata (org.apache.hadoop.io.SequenceFile.Metadata) 6
Configuration (org.apache.hadoop.conf.Configuration) 3
Text (org.apache.hadoop.io.Text) 3
Test (org.junit.Test) 3
IOException (java.io.IOException) 2
Path (org.apache.hadoop.fs.Path) 2
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec) 2
DefaultCodec (org.apache.hadoop.io.compress.DefaultCodec) 2
RecordWriter (org.apache.hadoop.mapreduce.RecordWriter) 2
Field (java.lang.reflect.Field) 1
MalformedURLException (java.net.MalformedURLException) 1
URL (java.net.URL) 1
ArrayList (java.util.ArrayList) 1
Entry (java.util.Map.Entry) 1
FileSystem (org.apache.hadoop.fs.FileSystem) 1
RCFile (org.apache.hadoop.hive.ql.io.RCFile) 1
MapFile (org.apache.hadoop.io.MapFile) 1
Option (org.apache.hadoop.io.MapFile.Writer.Option) 1
MapWritable (org.apache.hadoop.io.MapWritable) 1
SequenceFile (org.apache.hadoop.io.SequenceFile) 1