
Example 1 with PrettyPrintWriter

Use of org.apache.parquet.tools.util.PrettyPrintWriter in the Apache parquet-mr project.

Class ShowSchemaCommand, method execute.

@Override
public void execute(CommandLine options) throws Exception {
    super.execute(options);
    String[] args = options.getArgs();
    String input = args[0];
    Configuration conf = new Configuration();
    Path path = new Path(input);
    FileSystem fs = path.getFileSystem(conf);
    // For a directory, use the first entry of the listing filtered by HiddenFileFilter.
    Path file;
    if (fs.isDirectory(path)) {
        FileStatus[] statuses = fs.listStatus(path, HiddenFileFilter.INSTANCE);
        if (statuses.length == 0) {
            throw new RuntimeException("Directory " + path + " is empty");
        }
        file = statuses[0].getPath();
    } else {
        file = path;
    }
    // Read the footer with NO_FILTER and print the schema.
    ParquetMetadata metaData = ParquetFileReader.readFooter(conf, file, NO_FILTER);
    MessageType schema = metaData.getFileMetaData().getSchema();
    Main.out.println(schema);
    // With option 'd', also print the detailed metadata through a PrettyPrintWriter.
    if (options.hasOption('d')) {
        PrettyPrintWriter out = PrettyPrintWriter.stdoutPrettyPrinter().build();
        MetadataUtils.showDetails(out, metaData);
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), Configuration (org.apache.hadoop.conf.Configuration), ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), FileSystem (org.apache.hadoop.fs.FileSystem), PrettyPrintWriter (org.apache.parquet.tools.util.PrettyPrintWriter), MessageType (org.apache.parquet.schema.MessageType)
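
The directory branch in ShowSchemaCommand picks the first entry that survives the hidden-file filter and fails when nothing is left. The sketch below isolates that selection logic without any Hadoop dependency. FirstVisibleFile, isHidden and pick are hypothetical names, and the rule that a hidden name starts with '.' or '_' is an assumption about what HiddenFileFilter accepts, not something the snippet above states.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the "first non-hidden file" selection; not part of parquet-mr.
final class FirstVisibleFile {

    // Assumption: a file name counts as hidden when it starts with '.' or '_'.
    static boolean isHidden(String name) {
        return name.startsWith(".") || name.startsWith("_");
    }

    // Returns the first non-hidden name in listing order, or empty if every name is hidden.
    static Optional<String> pick(List<String> names) {
        return names.stream().filter(n -> !isHidden(n)).findFirst();
    }

    public static void main(String[] args) {
        List<String> listing = List.of("_SUCCESS", ".part-0.crc", "part-0.parquet", "part-1.parquet");
        // Mirrors the snippet: an empty filtered listing is reported as an empty directory.
        String file = pick(listing)
                .orElseThrow(() -> new RuntimeException("Directory is empty"));
        System.out.println(file); // prints "part-0.parquet"
    }
}
```

Note that the result depends on the order of the listing, so which file supplies the schema may vary with the file system in use.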

Example 2 with PrettyPrintWriter

Use of org.apache.parquet.tools.util.PrettyPrintWriter in the Apache parquet-mr project.

Class DumpCommand, method execute.

@Override
public void execute(CommandLine options) throws Exception {
    super.execute(options);
    String[] args = options.getArgs();
    String input = args[0];
    Configuration conf = new Configuration();
    Path inpath = new Path(input);
    ParquetMetadata metaData = ParquetFileReader.readFooter(conf, inpath, NO_FILTER);
    MessageType schema = metaData.getFileMetaData().getSchema();
    // Each of these options switches off a behavior that is on by default.
    boolean showmd = !options.hasOption('m');
    boolean showdt = !options.hasOption('d');
    boolean cropoutput = !options.hasOption('n');
    // Option 'c' restricts the dump to the named columns; null when the option is absent.
    Set<String> showColumns = null;
    if (options.hasOption('c')) {
        String[] cols = options.getOptionValues('c');
        showColumns = new HashSet<>(Arrays.asList(cols));
    }
    PrettyPrintWriter out = prettyPrintWriter(cropoutput);
    dump(out, metaData, schema, inpath, showmd, showdt, showColumns);
}
Also used: Path (org.apache.hadoop.fs.Path), Configuration (org.apache.hadoop.conf.Configuration), ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), PrettyPrintWriter (org.apache.parquet.tools.util.PrettyPrintWriter), MessageType (org.apache.parquet.schema.MessageType)
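
DumpCommand treats 'm', 'd' and 'n' as switches that turn a default-on behavior off, and builds a column set only when 'c' is given. A minimal, dependency-free sketch of that option handling follows. DumpFlags and its field names are hypothetical, the set of characters stands in for the CommandLine object, and reading a null column set as "no column filter" is an assumption, since the dump method itself is not shown above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder for the flags derived in DumpCommand.execute; not part of parquet-mr.
final class DumpFlags {
    final boolean showMetadata;
    final boolean showData;
    final boolean cropOutput;
    final Set<String> columns; // null when no 'c' option was given (assumed: no column filter)

    // 'given' stands in for the options present on the command line.
    DumpFlags(Set<Character> given, String[] cols) {
        // Presence of a flag disables the corresponding behavior.
        this.showMetadata = !given.contains('m');
        this.showData = !given.contains('d');
        this.cropOutput = !given.contains('n');
        // Duplicate column names collapse because the values go into a set.
        this.columns = cols == null ? null : new HashSet<>(Arrays.asList(cols));
    }

    public static void main(String[] args) {
        DumpFlags flags = new DumpFlags(Set.of('m'), new String[] {"id", "name", "id"});
        System.out.println(flags.showMetadata);   // prints "false"
        System.out.println(flags.showData);       // prints "true"
        System.out.println(flags.columns.size()); // prints "2"
    }
}
```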

Example 3 with PrettyPrintWriter

Use of org.apache.parquet.tools.util.PrettyPrintWriter in the Apache parquet-mr project.

Class ShowMetaCommand, method execute.

@Override
public void execute(CommandLine options) throws Exception {
    super.execute(options);
    String[] args = options.getArgs();
    String input = args[0];
    Configuration conf = new Configuration();
    Path inputPath = new Path(input);
    FileStatus inputFileStatus = inputPath.getFileSystem(conf).getFileStatus(inputPath);
    // Read the footers for the input path.
    List<Footer> footers = ParquetFileReader.readFooters(conf, inputFileStatus, false);
    // Pretty printer on stdout with auto columns, collapsed whitespace, and a column padding of 1.
    PrettyPrintWriter out = PrettyPrintWriter.stdoutPrettyPrinter()
            .withAutoColumn()
            .withWhitespaceHandler(WhiteSpaceHandler.COLLAPSE_WHITESPACE)
            .withColumnPadding(1)
            .build();
    // Print the file name and metadata details for each footer.
    for (Footer f : footers) {
        out.format("file: %s%n", f.getFile());
        MetadataUtils.showDetails(out, f.getParquetMetadata());
        out.flushColumns();
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), Configuration (org.apache.hadoop.conf.Configuration), Footer (org.apache.parquet.hadoop.Footer), PrettyPrintWriter (org.apache.parquet.tools.util.PrettyPrintWriter)

Aggregations

Configuration (org.apache.hadoop.conf.Configuration): 3
Path (org.apache.hadoop.fs.Path): 3
PrettyPrintWriter (org.apache.parquet.tools.util.PrettyPrintWriter): 3
FileStatus (org.apache.hadoop.fs.FileStatus): 2
ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata): 2
MessageType (org.apache.parquet.schema.MessageType): 2
FileSystem (org.apache.hadoop.fs.FileSystem): 1
Footer (org.apache.parquet.hadoop.Footer): 1