Search in sources :

Example 76 with ParquetMetadata

use of org.apache.parquet.hadoop.metadata.ParquetMetadata in project alluxio by Alluxio.

the class ParquetReader method create.

/**
 * Creates a parquet reader.
 *
 * @param uri the URI to the input
 * @return the reader
 * @throws IOException when failed to create the reader
 */
public static ParquetReader create(AlluxioURI uri) throws IOException {
    Path inputPath = new JobPath(uri.getScheme(), uri.getAuthority().toString(), uri.getPath());
    Configuration conf = ReadWriterUtils.readNoCacheConf();
    InputFile inputFile = HadoopInputFile.fromPath(inputPath, conf);
    org.apache.parquet.hadoop.ParquetReader<Record> reader = AvroParquetReader.<Record>builder(inputFile).disableCompatibility().withDataModel(GenericData.get()).withConf(conf).build();
    Schema schema;
    ParquetMetadata footer;
    try (ParquetFileReader r = new ParquetFileReader(inputFile, ParquetReadOptions.builder().build())) {
        footer = r.getFooter();
        schema = new AvroSchemaConverter().convert(footer.getFileMetaData().getSchema());
    }
    return new ParquetReader(reader, schema, footer);
}
Also used : JobPath(alluxio.job.plan.transform.format.JobPath) Path(org.apache.hadoop.fs.Path) AvroSchemaConverter(org.apache.parquet.avro.AvroSchemaConverter) JobPath(alluxio.job.plan.transform.format.JobPath) Configuration(org.apache.hadoop.conf.Configuration) ParquetMetadata(org.apache.parquet.hadoop.metadata.ParquetMetadata) Schema(org.apache.avro.Schema) TableSchema(alluxio.job.plan.transform.format.TableSchema) ParquetFileReader(org.apache.parquet.hadoop.ParquetFileReader) AvroParquetReader(org.apache.parquet.avro.AvroParquetReader) InputFile(org.apache.parquet.io.InputFile) HadoopInputFile(org.apache.parquet.hadoop.util.HadoopInputFile) Record(org.apache.avro.generic.GenericData.Record)

Aggregations

ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata)76 Path (org.apache.hadoop.fs.Path)39 BlockMetaData (org.apache.parquet.hadoop.metadata.BlockMetaData)27 Configuration (org.apache.hadoop.conf.Configuration)21 MessageType (org.apache.parquet.schema.MessageType)21 ArrayList (java.util.ArrayList)19 IOException (java.io.IOException)18 Test (org.junit.Test)17 FileSystem (org.apache.hadoop.fs.FileSystem)16 Map (java.util.Map)11 FileMetaData (org.apache.parquet.hadoop.metadata.FileMetaData)11 File (java.io.File)10 FileStatus (org.apache.hadoop.fs.FileStatus)10 ColumnPath (org.apache.parquet.hadoop.metadata.ColumnPath)9 HashMap (java.util.HashMap)8 ColumnChunkMetaData (org.apache.parquet.hadoop.metadata.ColumnChunkMetaData)7 List (java.util.List)6 FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream)6 ColumnDescriptor (org.apache.parquet.column.ColumnDescriptor)6 ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader)6