
Example 1 with AvroSchemaConverter

Use of org.apache.parquet.avro.AvroSchemaConverter in project alluxio by Alluxio.

From the class ParquetReader, method create:

/**
 * Creates a parquet reader.
 *
 * @param uri the URI to the input
 * @return the reader
 * @throws IOException when failed to create the reader
 */
public static ParquetReader create(AlluxioURI uri) throws IOException {
    Path inputPath = new JobPath(uri.getScheme(), uri.getAuthority().toString(), uri.getPath());
    Configuration conf = ReadWriterUtils.readNoCacheConf();
    InputFile inputFile = HadoopInputFile.fromPath(inputPath, conf);
    org.apache.parquet.hadoop.ParquetReader<Record> reader = AvroParquetReader.<Record>builder(inputFile)
        .disableCompatibility()
        .withDataModel(GenericData.get())
        .withConf(conf)
        .build();
    Schema schema;
    ParquetMetadata footer;
    try (ParquetFileReader r = new ParquetFileReader(inputFile, ParquetReadOptions.builder().build())) {
        footer = r.getFooter();
        schema = new AvroSchemaConverter().convert(footer.getFileMetaData().getSchema());
    }
    return new ParquetReader(reader, schema, footer);
}
Also used:
- Path (org.apache.hadoop.fs.Path)
- JobPath (alluxio.job.plan.transform.format.JobPath)
- AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter)
- Configuration (org.apache.hadoop.conf.Configuration)
- ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata)
- Schema (org.apache.avro.Schema)
- TableSchema (alluxio.job.plan.transform.format.TableSchema)
- ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader)
- AvroParquetReader (org.apache.parquet.avro.AvroParquetReader)
- InputFile (org.apache.parquet.io.InputFile)
- HadoopInputFile (org.apache.parquet.hadoop.util.HadoopInputFile)
- Record (org.apache.avro.generic.GenericData.Record)
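The reader built in create is record-oriented: callers typically drain it with a null-terminated read loop inside try-with-resources. A minimal sketch of that usage pattern, using a stand-in reader class (hypothetical, defined here only so the snippet is self-contained; the real org.apache.parquet.hadoop.ParquetReader exposes the same read()-until-null contract):

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Stand-in for org.apache.parquet.hadoop.ParquetReader<T> (hypothetical,
 * for illustration only): read() hands back records until it returns null.
 */
class RecordReader<T> implements Closeable {
    private final Iterator<T> it;

    RecordReader(List<T> records) {
        this.it = records.iterator();
    }

    public T read() {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public void close() {
        // a real reader would release file handles here
    }
}

public class ReadLoopSketch {
    /** Drains the reader with the conventional null-terminated read loop. */
    static <T> List<T> readAll(RecordReader<T> reader) {
        List<T> out = new ArrayList<>();
        try (reader) {
            T rec;
            while ((rec = reader.read()) != null) {
                out.add(rec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new RecordReader<>(List.of("r1", "r2"))));
    }
}
```

The try-with-resources block matters with the real reader too: it holds open file handles that must be closed even if a read fails.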

Example 2 with AvroSchemaConverter

Use of org.apache.parquet.avro.AvroSchemaConverter in project parquet-mr by Apache.

From the class Schemas, method fromParquet:

public static Schema fromParquet(Configuration conf, URI location) throws IOException {
    Path path = new Path(location);
    FileSystem fs = path.getFileSystem(conf);
    // Note: this readFooter overload is deprecated in recent parquet-mr releases; kept as in the source.
    ParquetMetadata footer = ParquetFileReader.readFooter(fs.getConf(), path);
    String schemaString = footer.getFileMetaData().getKeyValueMetaData().get("parquet.avro.schema");
    if (schemaString == null) {
        // try the older property
        schemaString = footer.getFileMetaData().getKeyValueMetaData().get("avro.schema");
    }
    if (schemaString != null) {
        return new Schema.Parser().parse(schemaString);
    } else {
        return new AvroSchemaConverter().convert(footer.getFileMetaData().getSchema());
    }
}
Also used:
- Path (org.apache.hadoop.fs.Path)
- AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter)
- ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata)
- FileSystem (org.apache.hadoop.fs.FileSystem)
- Schema (org.apache.avro.Schema)
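The lookup order in fromParquet is the point of the method: it prefers the Avro schema the writer stored under the "parquet.avro.schema" key, falls back to the older "avro.schema" key, and only converts the Parquet schema with AvroSchemaConverter when neither is present. A minimal sketch of that key fallback (the helper name findSchemaString is made up for illustration; null here means "convert the Parquet schema instead"):

```java
import java.util.Map;

public class SchemaKeyFallback {
    /**
     * Mirrors the lookup order in Schemas.fromParquet: the current
     * "parquet.avro.schema" key wins; the legacy "avro.schema" key is
     * the fallback; null signals "no stored Avro schema, convert instead".
     */
    static String findSchemaString(Map<String, String> keyValueMetaData) {
        String schemaString = keyValueMetaData.get("parquet.avro.schema");
        if (schemaString == null) {
            // older writers used this property name
            schemaString = keyValueMetaData.get("avro.schema");
        }
        return schemaString;
    }

    public static void main(String[] args) {
        // Only the legacy key is present, so the fallback is used.
        System.out.println(findSchemaString(Map.of("avro.schema", "{\"type\":\"string\"}")));
    }
}
```

Preferring the stored schema string over conversion preserves Avro-specific details (aliases, docs, logical-type annotations) that a round trip through the Parquet schema would lose.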

Aggregations

- Schema (org.apache.avro.Schema): 2
- Path (org.apache.hadoop.fs.Path): 2
- AvroSchemaConverter (org.apache.parquet.avro.AvroSchemaConverter): 2
- ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata): 2
- JobPath (alluxio.job.plan.transform.format.JobPath): 1
- TableSchema (alluxio.job.plan.transform.format.TableSchema): 1
- Record (org.apache.avro.generic.GenericData.Record): 1
- Configuration (org.apache.hadoop.conf.Configuration): 1
- FileSystem (org.apache.hadoop.fs.FileSystem): 1
- AvroParquetReader (org.apache.parquet.avro.AvroParquetReader): 1
- ParquetFileReader (org.apache.parquet.hadoop.ParquetFileReader): 1
- HadoopInputFile (org.apache.parquet.hadoop.util.HadoopInputFile): 1
- InputFile (org.apache.parquet.io.InputFile): 1