Example 1 with OutputSerialization

Use of com.amazonaws.services.s3.model.OutputSerialization in project presto by prestodb.

In class S3SelectCsvRecordReader, method buildSelectObjectRequest.

@Override
public SelectObjectContentRequest buildSelectObjectRequest(Properties schema, String query, Path path) {
    // Target object and the SQL expression to run against it.
    SelectObjectContentRequest selectObjectRequest = new SelectObjectContentRequest();
    URI uri = path.toUri();
    selectObjectRequest.setBucketName(PrestoS3FileSystem.getBucketName(uri));
    selectObjectRequest.setKey(PrestoS3FileSystem.keyFromPath(path));
    selectObjectRequest.setExpression(query);
    selectObjectRequest.setExpressionType(ExpressionType.SQL);
    // CSV dialect taken from the table schema; quote and escape characters may be null.
    String fieldDelimiter = getFieldDelimiter(schema);
    String quoteChar = schema.getProperty(QUOTE_CHAR, null);
    String escapeChar = schema.getProperty(ESCAPE_CHAR, null);
    // Input side: how S3 Select should parse the stored object.
    // lineDelimiter and COMMENTS_CHAR_STR are members not shown in this excerpt.
    CSVInput selectObjectCSVInputSerialization = new CSVInput();
    selectObjectCSVInputSerialization.setRecordDelimiter(lineDelimiter);
    selectObjectCSVInputSerialization.setFieldDelimiter(fieldDelimiter);
    selectObjectCSVInputSerialization.setComments(COMMENTS_CHAR_STR);
    selectObjectCSVInputSerialization.setQuoteCharacter(quoteChar);
    selectObjectCSVInputSerialization.setQuoteEscapeCharacter(escapeChar);
    InputSerialization selectObjectInputSerialization = new InputSerialization();
    // Only GZIP and BZIP2 are passed through; any other detected codec is rejected.
    // A null codec (no compression detected) leaves the compression type unset.
    CompressionCodec codec = compressionCodecFactory.getCodec(path);
    if (codec instanceof GzipCodec) {
        selectObjectInputSerialization.setCompressionType(CompressionType.GZIP);
    } else if (codec instanceof BZip2Codec) {
        selectObjectInputSerialization.setCompressionType(CompressionType.BZIP2);
    } else if (codec != null) {
        throw new PrestoException(NOT_SUPPORTED, "Compression extension not supported for S3 Select: " + path);
    }
    selectObjectInputSerialization.setCsv(selectObjectCSVInputSerialization);
    selectObjectRequest.setInputSerialization(selectObjectInputSerialization);
    // Output side: results come back as CSV using the same delimiters and quoting as the input.
    OutputSerialization selectObjectOutputSerialization = new OutputSerialization();
    CSVOutput selectObjectCSVOutputSerialization = new CSVOutput();
    selectObjectCSVOutputSerialization.setRecordDelimiter(lineDelimiter);
    selectObjectCSVOutputSerialization.setFieldDelimiter(fieldDelimiter);
    selectObjectCSVOutputSerialization.setQuoteCharacter(quoteChar);
    selectObjectCSVOutputSerialization.setQuoteEscapeCharacter(escapeChar);
    selectObjectOutputSerialization.setCsv(selectObjectCSVOutputSerialization);
    selectObjectRequest.setOutputSerialization(selectObjectOutputSerialization);
    return selectObjectRequest;
}
Also used: SelectObjectContentRequest (com.amazonaws.services.s3.model.SelectObjectContentRequest), GzipCodec (org.apache.hadoop.io.compress.GzipCodec), InputSerialization (com.amazonaws.services.s3.model.InputSerialization), CSVInput (com.amazonaws.services.s3.model.CSVInput), BZip2Codec (org.apache.hadoop.io.compress.BZip2Codec), PrestoException (com.facebook.presto.spi.PrestoException), CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec), URI (java.net.URI), OutputSerialization (com.amazonaws.services.s3.model.OutputSerialization), CSVOutput (com.amazonaws.services.s3.model.CSVOutput)
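
The compression branch in buildSelectObjectRequest can be hard to follow without the Hadoop and AWS classes at hand, so here is a dependency-free sketch of the same decision. It is not the Presto code: SelectCompressionResolver, its nested Compression enum, and the UNSUPPORTED_SUFFIXES list are hypothetical stand-ins, and the sketch matches on file suffix directly (".gz", ".bz2") instead of calling Hadoop's CompressionCodecFactory. The list of rejected suffixes is an assumption chosen for illustration only.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper; mirrors the gzip / bzip2 / reject / none branches
// of the method above without depending on Hadoop or the AWS SDK.
final class SelectCompressionResolver {
    // Stand-in for the SDK's compression type; NONE means "leave it unset".
    enum Compression { NONE, GZIP, BZIP2 }

    // Assumed examples of compressed suffixes that this sketch rejects.
    private static final List<String> UNSUPPORTED_SUFFIXES =
            List.of(".snappy", ".lz4", ".zst", ".deflate");

    private SelectCompressionResolver() {}

    static Compression resolve(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Compression.GZIP;
        }
        if (lower.endsWith(".bz2")) {
            return Compression.BZIP2;
        }
        for (String suffix : UNSUPPORTED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                // Corresponds to the "codec != null" branch that throws NOT_SUPPORTED.
                throw new UnsupportedOperationException(
                        "Compression extension not supported for S3 Select: " + key);
            }
        }
        // No compression detected: the request is sent without a compression type.
        return Compression.NONE;
    }
}
```

Under these assumptions, SelectCompressionResolver.resolve("warehouse/part-0000.csv.gz") yields GZIP, an uncompressed key yields NONE, and a key ending in ".snappy" is rejected with an exception, in the same way the original method refuses codecs other than gzip and bzip2.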