Example 76 with CompressionCodec

Use of org.apache.hadoop.io.compress.CompressionCodec in the Apache Hadoop project.

Class CompressionEmulationUtil, method configureCompressionEmulation.

/**
   * Extracts compression/decompression related configuration parameters from 
   * the source configuration to the target configuration.
   */
static void configureCompressionEmulation(Configuration source, Configuration target) {
    // enable output compression
    target.setBoolean(FileOutputFormat.COMPRESS, source.getBoolean(FileOutputFormat.COMPRESS, false));
    // set the job output compression codec
    String jobOutputCompressionCodec = source.get(FileOutputFormat.COMPRESS_CODEC);
    if (jobOutputCompressionCodec != null) {
        target.set(FileOutputFormat.COMPRESS_CODEC, jobOutputCompressionCodec);
    }
    // set the job output compression type
    String jobOutputCompressionType = source.get(FileOutputFormat.COMPRESS_TYPE);
    if (jobOutputCompressionType != null) {
        target.set(FileOutputFormat.COMPRESS_TYPE, jobOutputCompressionType);
    }
    // enable map output compression
    target.setBoolean(MRJobConfig.MAP_OUTPUT_COMPRESS, source.getBoolean(MRJobConfig.MAP_OUTPUT_COMPRESS, false));
    // set the map output compression codecs
    String mapOutputCompressionCodec = source.get(MRJobConfig.MAP_OUTPUT_COMPRESS_CODEC);
    if (mapOutputCompressionCodec != null) {
        target.set(MRJobConfig.MAP_OUTPUT_COMPRESS_CODEC, mapOutputCompressionCodec);
    }
    // enable input decompression
    //TODO replace with mapInputBytes and hdfsBytesRead
    Path[] inputs = org.apache.hadoop.mapred.FileInputFormat.getInputPaths(new JobConf(source));
    boolean needsCompressedInput = false;
    CompressionCodecFactory compressionCodecs = new CompressionCodecFactory(source);
    for (Path input : inputs) {
        CompressionCodec codec = compressionCodecs.getCodec(input);
        if (codec != null) {
            // one compressed input is enough to enable input decompression emulation
            needsCompressedInput = true;
            break;
        }
    }
    setInputCompressionEmulationEnabled(target, needsCompressedInput);
}
Also used : Path(org.apache.hadoop.fs.Path) CompressionCodecFactory(org.apache.hadoop.io.compress.CompressionCodecFactory) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec) JobConf(org.apache.hadoop.mapred.JobConf)
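
The copy-only-if-set pattern used in configureCompressionEmulation can be sketched without Hadoop on the classpath. The following is a minimal illustration that uses java.util.Properties in place of Configuration. The class ConfigCopy, its two methods, and the example.* keys are hypothetical names chosen for this sketch; they are not Hadoop identifiers.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Hadoop.
final class ConfigCopy {
    private ConfigCopy() {}

    // Copies a string setting only when the source defines it,
    // so an unset key stays unset in the target.
    static void copyIfSet(Properties source, Properties target, String key) {
        String value = source.getProperty(key);
        if (value != null) {
            target.setProperty(key, value);
        }
    }

    // Copies a boolean flag, always writing a value: the source value when present,
    // otherwise the supplied default.
    static void copyBoolean(Properties source, Properties target, String key, boolean defaultValue) {
        String raw = source.getProperty(key);
        boolean value = (raw == null) ? defaultValue : Boolean.parseBoolean(raw);
        target.setProperty(key, String.valueOf(value));
    }

    public static void main(String[] args) {
        Properties source = new Properties();
        Properties target = new Properties();
        source.setProperty("example.compress.codec", "gzip"); // assumed key name

        copyBoolean(source, target, "example.compress", false);
        copyIfSet(source, target, "example.compress.codec");
        copyIfSet(source, target, "example.compress.type"); // absent in source, so skipped

        System.out.println(target);
    }
}
```

The split mirrors the method above: boolean switches are always written with a default, while codec and type settings are propagated only when the source job actually configured them.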

Example 77 with CompressionCodec

Use of org.apache.hadoop.io.compress.CompressionCodec in the Apache Hadoop project.

Class Anonymizer, method createJsonGenerator.

// Creates a JSON generator
private JsonGenerator createJsonGenerator(Configuration conf, Path path) throws IOException {
    FileSystem outFS = path.getFileSystem(conf);
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
    OutputStream output;
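    Compressor compressor = null;
    if (codec != null) {
        // Borrow a compressor from the shared pool. Note that this method never hands it
        // back through CodecPool.returnCompressor, and the caller has no reference to do so.
        compressor = CodecPool.getCompressor(codec);
        output = codec.createOutputStream(outFS.create(path), compressor);
    } else {
        // no codec matches the file suffix: write uncompressed
        output = outFS.create(path);
    }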
    JsonGenerator outGen = outFactory.createGenerator(output, JsonEncoding.UTF8);
    outGen.useDefaultPrettyPrinter();
    return outGen;
}
Also used : CompressionCodecFactory(org.apache.hadoop.io.compress.CompressionCodecFactory) FileSystem(org.apache.hadoop.fs.FileSystem) OutputStream(java.io.OutputStream) Compressor(org.apache.hadoop.io.compress.Compressor) JsonGenerator(com.fasterxml.jackson.core.JsonGenerator) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec)

Example 78 with CompressionCodec

Use of org.apache.hadoop.io.compress.CompressionCodec in the Apache Hadoop project.

Class FSImageUtil, method wrapInputStreamForCompression.

/**
 * Wraps the stream in a decompressing stream for the named codec.
 * Returns the stream unchanged when the codec name is empty.
 */
public static InputStream wrapInputStreamForCompression(Configuration conf, String codec, InputStream in) throws IOException {
    if (codec.isEmpty()) {
        return in;
    }
    FSImageCompression compression = FSImageCompression.createCompression(conf, codec);
    CompressionCodec imageCodec = compression.getImageCodec();
    return imageCodec.createInputStream(in);
}
Also used : CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec)

Example 79 with CompressionCodec

Use of org.apache.hadoop.io.compress.CompressionCodec in the Apache CarbonData project.

Class FileFactory, method getDataInputStream.

public static DataInputStream getDataInputStream(String path, FileType fileType, int bufferSize) throws IOException {
    path = path.replace("\\", "/");
    boolean gzip = path.endsWith(".gz");
    boolean bzip2 = path.endsWith(".bz2");
    InputStream stream;
    switch(fileType) {
        case LOCAL:
            path = getUpdatedFilePath(path, fileType);
            if (gzip) {
                stream = new GZIPInputStream(new FileInputStream(path));
            } else if (bzip2) {
                stream = new BZip2CompressorInputStream(new FileInputStream(path));
            } else {
                stream = new FileInputStream(path);
            }
            break;
        case HDFS:
        case ALLUXIO:
        case VIEWFS:
            Path pt = new Path(path);
            FileSystem fs = pt.getFileSystem(configuration);
            if (bufferSize == -1) {
                stream = fs.open(pt);
            } else {
                stream = fs.open(pt, bufferSize);
            }
            String codecName = null;
            if (gzip) {
                codecName = GzipCodec.class.getName();
            } else if (bzip2) {
                codecName = BZip2Codec.class.getName();
            }
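            if (null != codecName) {
                CompressionCodecFactory ccf = new CompressionCodecFactory(configuration);
                CompressionCodec codec = ccf.getCodecByClassName(codecName);
                // getCodecByClassName returns null when the codec is not registered
                if (codec == null) {
                    throw new IOException("compression codec not available: " + codecName);
                }
                stream = codec.createInputStream(stream);
            }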
            break;
        default:
            throw new UnsupportedOperationException("unsupported file system");
    }
    return new DataInputStream(new BufferedInputStream(stream));
}
Also used : Path(org.apache.hadoop.fs.Path) DataInputStream(java.io.DataInputStream) GZIPInputStream(java.util.zip.GZIPInputStream) BufferedInputStream(java.io.BufferedInputStream) FileInputStream(java.io.FileInputStream) BZip2CompressorInputStream(org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) InputStream(java.io.InputStream) GzipCodec(org.apache.hadoop.io.compress.GzipCodec) CompressionCodecFactory(org.apache.hadoop.io.compress.CompressionCodecFactory) FileSystem(org.apache.hadoop.fs.FileSystem) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec)
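
The LOCAL branch above picks a decompressing stream from the file suffix. A minimal, JDK-only sketch of that idea follows, restricted to gzip because bzip2 needs a third-party library. The class SuffixStreams and its methods are hypothetical names for this illustration and do not come from CarbonData or Hadoop; Path here is java.nio.file.Path, not the Hadoop class. It assumes Java 9 or later for InputStream.readAllBytes.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
final class SuffixStreams {
    private SuffixStreams() {}

    private static boolean isGzip(Path file) {
        return file.getFileName().toString().endsWith(".gz");
    }

    // Opens a local file, adding a gzip decoder when the name ends with ".gz".
    static InputStream open(Path file) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        if (!isGzip(file)) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle when the gzip header is invalid
            throw e;
        }
    }

    // Writes UTF-8 text, gzip-compressing when the name ends with ".gz".
    static void write(Path file, String text) throws IOException {
        OutputStream raw = Files.newOutputStream(file);
        try (OutputStream out = isGzip(file) ? new GZIPOutputStream(raw) : raw) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the whole file as UTF-8 text, decompressing if needed.
    static String readAll(Path file) throws IOException {
        try (InputStream in = open(file)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("suffix-streams");
        Path compressed = dir.resolve("sample.txt.gz");
        write(compressed, "hello");
        System.out.println(readAll(compressed));
    }
}
```

Suffix detection is cheap but trusts the file name. Hadoop's CompressionCodecFactory.getCodec(Path), used in the earlier examples, applies the same suffix-based lookup across all registered codecs.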

Example 80 with CompressionCodec

Use of org.apache.hadoop.io.compress.CompressionCodec in the Apache Jena project.

Class AbstractCompressedNodeTupleInputFormatTests, method getOutputStream.

@Override
protected OutputStream getOutputStream(File f) throws IOException {
    CompressionCodec codec = this.getCompressionCodec();
    if (codec instanceof Configurable) {
        ((Configurable) codec).setConf(this.prepareConfiguration());
    }
    FileOutputStream fileOutput = new FileOutputStream(f, false);
    return codec.createOutputStream(fileOutput);
}
Also used : FileOutputStream(java.io.FileOutputStream) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec) Configurable(org.apache.hadoop.conf.Configurable)

Aggregations

CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec) 111
Path (org.apache.hadoop.fs.Path) 54
FileSystem (org.apache.hadoop.fs.FileSystem) 41
Configuration (org.apache.hadoop.conf.Configuration) 38
CompressionCodecFactory (org.apache.hadoop.io.compress.CompressionCodecFactory) 37
InputStream (java.io.InputStream) 18
IOException (java.io.IOException) 17
Test (org.junit.Test) 17
FSDataInputStream (org.apache.hadoop.fs.FSDataInputStream) 15
Text (org.apache.hadoop.io.Text) 14
Configurable (org.apache.hadoop.conf.Configurable) 10
GzipCodec (org.apache.hadoop.io.compress.GzipCodec) 10
JobConf (org.apache.hadoop.mapred.JobConf) 10
SequenceFile (org.apache.hadoop.io.SequenceFile) 9
OutputStream (java.io.OutputStream) 8
DefaultCodec (org.apache.hadoop.io.compress.DefaultCodec) 8
FileInputStream (java.io.FileInputStream) 7
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream) 6
ByteString (com.google.protobuf.ByteString) 5
DataInputStream (java.io.DataInputStream) 5