
Example 1 with InputSerialization

Use of com.amazonaws.services.s3.model.InputSerialization in project presto by prestodb.

Found in class S3SelectCsvRecordReader, method buildSelectObjectRequest.

@Override
public SelectObjectContentRequest buildSelectObjectRequest(Properties schema, String query, Path path) {
    SelectObjectContentRequest selectObjectRequest = new SelectObjectContentRequest();
    URI uri = path.toUri();
    selectObjectRequest.setBucketName(PrestoS3FileSystem.getBucketName(uri));
    selectObjectRequest.setKey(PrestoS3FileSystem.keyFromPath(path));
    selectObjectRequest.setExpression(query);
    selectObjectRequest.setExpressionType(ExpressionType.SQL);
    String fieldDelimiter = getFieldDelimiter(schema);
    String quoteChar = schema.getProperty(QUOTE_CHAR, null);
    String escapeChar = schema.getProperty(ESCAPE_CHAR, null);
    CSVInput selectObjectCSVInputSerialization = new CSVInput();
    selectObjectCSVInputSerialization.setRecordDelimiter(lineDelimiter);
    selectObjectCSVInputSerialization.setFieldDelimiter(fieldDelimiter);
    selectObjectCSVInputSerialization.setComments(COMMENTS_CHAR_STR);
    selectObjectCSVInputSerialization.setQuoteCharacter(quoteChar);
    selectObjectCSVInputSerialization.setQuoteEscapeCharacter(escapeChar);
    InputSerialization selectObjectInputSerialization = new InputSerialization();
    CompressionCodec codec = compressionCodecFactory.getCodec(path);
    if (codec instanceof GzipCodec) {
        selectObjectInputSerialization.setCompressionType(CompressionType.GZIP);
    } else if (codec instanceof BZip2Codec) {
        selectObjectInputSerialization.setCompressionType(CompressionType.BZIP2);
    } else if (codec != null) {
        throw new PrestoException(NOT_SUPPORTED, "Compression extension not supported for S3 Select: " + path);
    }
    selectObjectInputSerialization.setCsv(selectObjectCSVInputSerialization);
    selectObjectRequest.setInputSerialization(selectObjectInputSerialization);
    OutputSerialization selectObjectOutputSerialization = new OutputSerialization();
    CSVOutput selectObjectCSVOutputSerialization = new CSVOutput();
    selectObjectCSVOutputSerialization.setRecordDelimiter(lineDelimiter);
    selectObjectCSVOutputSerialization.setFieldDelimiter(fieldDelimiter);
    selectObjectCSVOutputSerialization.setQuoteCharacter(quoteChar);
    selectObjectCSVOutputSerialization.setQuoteEscapeCharacter(escapeChar);
    selectObjectOutputSerialization.setCsv(selectObjectCSVOutputSerialization);
    selectObjectRequest.setOutputSerialization(selectObjectOutputSerialization);
    return selectObjectRequest;
}
Also used: SelectObjectContentRequest (com.amazonaws.services.s3.model.SelectObjectContentRequest), GzipCodec (org.apache.hadoop.io.compress.GzipCodec), InputSerialization (com.amazonaws.services.s3.model.InputSerialization), CSVInput (com.amazonaws.services.s3.model.CSVInput), BZip2Codec (org.apache.hadoop.io.compress.BZip2Codec), PrestoException (com.facebook.presto.spi.PrestoException), CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec), URI (java.net.URI), OutputSerialization (com.amazonaws.services.s3.model.OutputSerialization), CSVOutput (com.amazonaws.services.s3.model.CSVOutput)
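buildSelectObjectRequest derives the bucket name and object key from the file's URI through PrestoS3FileSystem helpers. The sketch below shows one plausible split using only java.net.URI. The S3Location class is hypothetical, and it assumes the bucket is the URI host and the key is the URI path minus its leading slash. Presto's own helpers may cover more cases: a bucket name that is not a valid hostname (for example one containing an underscore) makes URI.getHost() return null, which this sketch simply rejects.

```java
import java.net.URI;

// Hypothetical helper: splits an s3:// URI into bucket and key.
// Assumption: bucket = URI host, key = URI path without its leading '/'.
// Presto's PrestoS3FileSystem.getBucketName/keyFromPath may handle more edge cases.
public final class S3Location {
    private final String bucket;
    private final String key;

    private S3Location(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    public static S3Location fromUri(URI uri) {
        // getHost() is null when the authority is not a valid hostname
        String bucket = uri.getHost();
        if (bucket == null) {
            throw new IllegalArgumentException("No bucket (host) in URI: " + uri);
        }
        String path = uri.getPath();
        // Drop the leading slash so "/dir/file.csv" becomes "dir/file.csv"
        String key = path.startsWith("/") ? path.substring(1) : path;
        return new S3Location(bucket, key);
    }

    public String bucket() {
        return bucket;
    }

    public String key() {
        return key;
    }
}
```

For s3://my-bucket/warehouse/orders/part-0.csv.gz, this yields the bucket my-bucket and the key warehouse/orders/part-0.csv.gz, the two values the example passes to setBucketName and setKey.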

Example 2 with InputSerialization

Use of com.amazonaws.services.s3.model.InputSerialization in project urban-eureka by errir503.

Found in class S3SelectCsvRecordReader, method buildSelectObjectRequest.

The buildSelectObjectRequest method in this project is a verbatim copy of Example 1.
Also used: SelectObjectContentRequest (com.amazonaws.services.s3.model.SelectObjectContentRequest), GzipCodec (org.apache.hadoop.io.compress.GzipCodec), InputSerialization (com.amazonaws.services.s3.model.InputSerialization), CSVInput (com.amazonaws.services.s3.model.CSVInput), BZip2Codec (org.apache.hadoop.io.compress.BZip2Codec), PrestoException (com.facebook.presto.spi.PrestoException), CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec), URI (java.net.URI), OutputSerialization (com.amazonaws.services.s3.model.OutputSerialization), CSVOutput (com.amazonaws.services.s3.model.CSVOutput)
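In Examples 1 and 2, buildSelectObjectRequest translates the Hadoop codec chosen for the file into an S3 Select compression type and rejects any other codec. The self-contained sketch below reproduces that decision on a plain file name so it runs without Hadoop or the AWS SDK. It assumes codecs are picked by file suffix, as Hadoop's CompressionCodecFactory.getCodec(Path) does; the SelectCompression class, its Type enum and the list of rejected suffixes (the default extensions of Hadoop's Snappy, LZ4, Deflate and ZStandard codecs) are hypothetical. Unlike the Presto method, which leaves the compression type unset when no codec matches, this sketch returns NONE explicitly.

```java
// Hypothetical sketch of the codec branch in buildSelectObjectRequest.
// Assumption: the codec is chosen by file suffix, as Hadoop's
// CompressionCodecFactory.getCodec(Path) does; only the suffixes below are handled.
public final class SelectCompression {
    // Mirrors the compression values the examples set on InputSerialization
    public enum Type { NONE, GZIP, BZIP2 }

    public static Type forFileName(String fileName) {
        if (fileName.endsWith(".gz")) {
            return Type.GZIP;
        }
        if (fileName.endsWith(".bz2")) {
            return Type.BZIP2;
        }
        // Assumed list: Hadoop has codecs for these, but S3 Select cannot decompress them
        if (fileName.endsWith(".snappy") || fileName.endsWith(".lz4")
                || fileName.endsWith(".deflate") || fileName.endsWith(".zst")) {
            throw new UnsupportedOperationException(
                    "Compression extension not supported for S3 Select: " + fileName);
        }
        // No recognized codec: the object is sent uncompressed
        return Type.NONE;
    }
}
```

Throwing for a known but unsupported codec, rather than falling back to NONE, matches the intent of the PrestoException in the original: sending compressed bytes as plain CSV would produce garbage rows instead of an error.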

Example 3 with InputSerialization

Use of software.amazon.awssdk.services.s3.model.InputSerialization in project aws-sdk-java-v2 by aws.

Found in class SelectObjectContentTest, method runSimpleQuery.

private static CompletableFuture<Void> runSimpleQuery(S3AsyncClient s3, SelectObjectContentResponseHandler handler) {
    InputSerialization inputSerialization = InputSerialization.builder()
            .csv(CSVInput.builder().build()).compressionType(CompressionType.NONE).build();
    OutputSerialization outputSerialization = OutputSerialization.builder()
            .csv(CSVOutput.builder().build()).build();
    SelectObjectContentRequest select = SelectObjectContentRequest.builder()
            .bucket("test-bucket").key("test-key")
            .expression("test-query").expressionType(ExpressionType.SQL)
            .inputSerialization(inputSerialization).outputSerialization(outputSerialization)
            .build();
    return s3.selectObjectContent(select, handler);
}
Also used: SelectObjectContentRequest (software.amazon.awssdk.services.s3.model.SelectObjectContentRequest), InputSerialization (software.amazon.awssdk.services.s3.model.InputSerialization), OutputSerialization (software.amazon.awssdk.services.s3.model.OutputSerialization)
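Example 3 uses the AWS SDK for Java 2.x, where model objects are immutable and assembled with builder() ... build(), whereas Examples 1 and 2 use the 1.x style of creating a mutable object and calling setters on it. The hypothetical SelectRequestSketch class below shows that builder pattern in miniature with plain Java. It is not the SDK's implementation, and the null checks in build() are this sketch's own choice.

```java
import java.util.Objects;

// Hypothetical class illustrating the 2.x style used in Example 3:
// an immutable request assembled through a fluent builder, instead of
// the 1.x style (Examples 1 and 2) of calling setters on a mutable object.
public final class SelectRequestSketch {
    private final String bucket;
    private final String key;
    private final String expression;

    private SelectRequestSketch(Builder b) {
        // This sketch rejects missing required fields when build() is called
        this.bucket = Objects.requireNonNull(b.bucket, "bucket");
        this.key = Objects.requireNonNull(b.key, "key");
        this.expression = Objects.requireNonNull(b.expression, "expression");
    }

    public static Builder builder() {
        return new Builder();
    }

    public String bucket() { return bucket; }
    public String key() { return key; }
    public String expression() { return expression; }

    public static final class Builder {
        private String bucket;
        private String key;
        private String expression;

        // Each setter returns the builder so calls can be chained
        public Builder bucket(String bucket) { this.bucket = bucket; return this; }
        public Builder key(String key) { this.key = key; return this; }
        public Builder expression(String expression) { this.expression = expression; return this; }

        public SelectRequestSketch build() {
            return new SelectRequestSketch(this);
        }
    }
}
```

Because the built object has no setters, a request can be shared between threads or reused across calls without one caller changing it under another, which is one motivation for the 2.x design.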

Example 4 with InputSerialization

Use of com.amazonaws.services.s3.model.InputSerialization in project pxf by greenplum-db.

Found in class S3SelectAccessorTest, method testFileHeaderInfoIsUse.

@Test
public void testFileHeaderInfoIsUse() {
    RequestContext context = getDefaultRequestContext();
    context.addOption("FILE_HEADER", "USE");
    InputSerialization inputSerialization = new S3SelectAccessor().getInputSerialization(context);
    assertEquals("USE", inputSerialization.getCsv().getFileHeaderInfo());
}
Also used: InputSerialization (com.amazonaws.services.s3.model.InputSerialization), RequestContext (org.greenplum.pxf.api.model.RequestContext), Test (org.junit.jupiter.api.Test)
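Example 4 checks that the FILE_HEADER option ends up as the CSV input's fileHeaderInfo. S3 Select accepts three values for that field: NONE, IGNORE and USE. The hypothetical FileHeaderOption helper below validates the option before it is copied into the request. It assumes options arrive as a string map and that an absent option should leave the field unset; how PXF's S3SelectAccessor actually parses and defaults this option may differ.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: validates a FILE_HEADER option before it is copied
// into CSVInput's fileHeaderInfo. S3 Select accepts NONE, IGNORE and USE.
public final class FileHeaderOption {
    public enum FileHeaderInfo { NONE, IGNORE, USE }

    // Returns null when the option is absent (assumption: leave the field unset)
    public static FileHeaderInfo parse(Map<String, String> options) {
        String raw = options.get("FILE_HEADER");
        if (raw == null) {
            return null;
        }
        try {
            // Accept any letter case, e.g. "use" or "USE"
            return FileHeaderInfo.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "FILE_HEADER must be NONE, IGNORE or USE, got: " + raw, e);
        }
    }
}
```

Rejecting an unknown value up front gives the user a clear error instead of a failed S3 request later.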

Example 5 with InputSerialization

Use of com.amazonaws.services.s3.model.InputSerialization in project pxf by greenplum-db.

Found in class S3SelectAccessorTest, method testJSONInputSerialization.

@Test
public void testJSONInputSerialization() {
    RequestContext context = getRequestContext("s3:json");
    context.setFormat("json");
    InputSerialization inputSerialization = new S3SelectAccessor().getInputSerialization(context);
    assertNotNull(inputSerialization.getJson());
    assertNull(inputSerialization.getCsv());
    assertNull(inputSerialization.getParquet());
}
Also used: InputSerialization (com.amazonaws.services.s3.model.InputSerialization), RequestContext (org.greenplum.pxf.api.model.RequestContext), Test (org.junit.jupiter.api.Test)
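Example 5 asserts that a JSON request populates only the JSON part of the InputSerialization, leaving CSV and Parquet null. An S3 Select input serialization describes a single input format. The hypothetical InputFormatSketch class below models that invariant in plain Java, using placeholder strings where the SDK would hold CSVInput, JSONInput or ParquetInput objects.

```java
import java.util.Locale;

// Hypothetical model of the invariant Example 5 checks: exactly one of
// csv/json/parquet is set. Placeholder strings stand in for the SDK's
// CSVInput, JSONInput and ParquetInput objects.
public final class InputFormatSketch {
    private String csv;
    private String json;
    private String parquet;

    private InputFormatSketch() {
    }

    public static InputFormatSketch forFormat(String format) {
        InputFormatSketch s = new InputFormatSketch();
        switch (format.toLowerCase(Locale.ROOT)) {
            case "csv":
                s.csv = "csv-placeholder";
                break;
            case "json":
                s.json = "json-placeholder";
                break;
            case "parquet":
                s.parquet = "parquet-placeholder";
                break;
            default:
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
        return s;
    }

    public String getCsv() { return csv; }
    public String getJson() { return json; }
    public String getParquet() { return parquet; }
}
```

Setting the single matching field inside one switch, instead of separate if statements, makes it structurally impossible for two formats to be populated at once.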

Aggregations

InputSerialization (com.amazonaws.services.s3.model.InputSerialization): 20
RequestContext (org.greenplum.pxf.api.model.RequestContext): 14
Test (org.junit.jupiter.api.Test): 14
OutputSerialization (com.amazonaws.services.s3.model.OutputSerialization): 5
SelectObjectContentRequest (com.amazonaws.services.s3.model.SelectObjectContentRequest): 5
CSVInput (com.amazonaws.services.s3.model.CSVInput): 4
CSVOutput (com.amazonaws.services.s3.model.CSVOutput): 4
URI (java.net.URI): 4
BZip2Codec (org.apache.hadoop.io.compress.BZip2Codec): 3
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec): 3
GzipCodec (org.apache.hadoop.io.compress.GzipCodec): 3
PrestoException (com.facebook.presto.spi.PrestoException): 2
InputSerialization (software.amazon.awssdk.services.s3.model.InputSerialization): 2
OutputSerialization (software.amazon.awssdk.services.s3.model.OutputSerialization): 2
SelectObjectContentRequest (software.amazon.awssdk.services.s3.model.SelectObjectContentRequest): 2
ParquetInput (com.amazonaws.services.s3.model.ParquetInput): 1
PrestoException (io.prestosql.spi.PrestoException): 1
SQLException (java.sql.SQLException): 1