
Example 6 with QueryStatusChecker

Use of com.amazonaws.athena.connector.lambda.QueryStatusChecker in project foundry-athena-query-federation-connector by palantir.

From class FoundryRecordHandler, method doReadRecords:

@Override
@SuppressWarnings("MustBeClosedChecker")
public RecordResponse doReadRecords(BlockAllocator allocator, ReadRecordsRequest request) throws Exception {
    log.info("doReadRecords: {}:{}", request.getSchema(), request.getSplit().getSpillLocation());
    log.debug("Reading records with constraints: {}", request.getConstraints());
    SpillConfig spillConfig = getSpillConfig(request);
    S3Spiller spiller = new S3Spiller(amazonS3, spillConfig, allocator);
    List<String> columnNames = request.getSchema().getFields().stream().map(Field::getName).collect(Collectors.toList());
    // create a temporary block to obtain a handle to the BufferAllocator and allocator id
    BufferAllocator bufferAllocator;
    String allocatorId;
    try (Block block = allocator.createBlock(request.getSchema())) {
        bufferAllocator = block.getFieldVectors().get(0).getAllocator();
        allocatorId = block.getAllocatorId();
    }
    throttlingInvoker.setBlockSpiller(spiller);
    try (QueryStatusChecker queryStatusChecker = new QueryStatusChecker(athena, athenaInvoker, request.getQueryId());
        InputStream is = throttlingInvoker.invoke(() -> recordService.fetchSlice(
                foundryAuthProvider.getAuthHeader(),
                FetchSliceRequest.builder()
                        .slice(Slices.INSTANCE.fromSplit(request.getSplit()))
                        .columnNames(columnNames)
                        .maxBatchSize(SafeLong.of(spillConfig.getMaxBlockBytes()))
                        .build()))) {
        // we do not auto-close the reader to avoid releasing the buffers before serialization in the case
        // the block is held in memory
        PeekableArrowStreamReader reader = new PeekableArrowStreamReader(is, bufferAllocator);
        VectorSchemaRoot vectorSchemaRoot = reader.getVectorSchemaRoot();
        Block block = new Block(allocatorId, request.getSchema(), vectorSchemaRoot);
        reader.loadNextBatch();
        // spill if we have more blocks to read or the current block is too large to return
        if (reader.hasNextBatch() || block.getSize() > spillConfig.getMaxInlineBlockSize()) {
            do {
                spiller.spillBlock(block);
            } while (queryStatusChecker.isQueryRunning() && reader.loadNextBatch());
            // we have spilled so we can clean up the reader
            reader.close();
            return new RemoteReadRecordsResponse(request.getCatalogName(), request.getSchema(), spiller.getSpillLocations(), spillConfig.getEncryptionKey());
        } else {
            // no more batches so immediately return the block
            return new ReadRecordsResponse(request.getCatalogName(), block);
        }
    }
}
Also used: VectorSchemaRoot(org.apache.arrow.vector.VectorSchemaRoot) SpillConfig(com.amazonaws.athena.connector.lambda.data.SpillConfig) QueryStatusChecker(com.amazonaws.athena.connector.lambda.QueryStatusChecker) InputStream(java.io.InputStream) RemoteReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.RemoteReadRecordsResponse) ReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse) Block(com.amazonaws.athena.connector.lambda.data.Block) BufferAllocator(org.apache.arrow.memory.BufferAllocator)
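
The pattern above gates the spill loop on queryStatusChecker.isQueryRunning() so the connector stops producing blocks once Athena reports the query finished or was cancelled. Below is a minimal sketch of that lifecycle; hasMoreWork() and doNextUnitOfWork() are hypothetical stand-ins for connector-specific logic:

import com.amazonaws.athena.connector.lambda.QueryStatusChecker;
import com.amazonaws.athena.connector.lambda.ThrottlingInvoker;
import com.amazonaws.services.athena.AmazonAthena;

void processUntilDoneOrCancelled(AmazonAthena athena, ThrottlingInvoker athenaInvoker, String queryId) throws Exception {
    // QueryStatusChecker is AutoCloseable, so try-with-resources releases its polling resources.
    try (QueryStatusChecker checker = new QueryStatusChecker(athena, athenaInvoker, queryId)) {
        while (checker.isQueryRunning() && hasMoreWork()) {
            doNextUnitOfWork(); // e.g. load and spill the next record batch
        }
    }
}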

Example 7 with QueryStatusChecker

Use of com.amazonaws.athena.connector.lambda.QueryStatusChecker in project aws-athena-query-federation by awslabs.

From class BigQueryRecordHandlerTest, method testReadWithConstraint:

@Test
public void testReadWithConstraint() throws Exception {
    try (ReadRecordsRequest request = new ReadRecordsRequest(
            federatedIdentity,
            BigQueryTestUtils.PROJECT_1_NAME,
            "queryId",
            new TableName("dataset1", "table1"),
            BigQueryTestUtils.getBlockTestSchema(),
            Split.newBuilder(S3SpillLocation.newBuilder()
                            .withBucket(bucket)
                            .withPrefix(prefix)
                            .withSplitId(UUID.randomUUID().toString())
                            .withQueryId(UUID.randomUUID().toString())
                            .withIsDirectory(true)
                            .build(),
                    keyFactory.create()).build(),
            new Constraints(Collections.EMPTY_MAP),
            // The block size limits are ignored when calling readWithConstraint directly.
            0, 0)) {
        // Always return true from the evaluator so that all rows are kept.
        ConstraintEvaluator evaluator = mock(ConstraintEvaluator.class);
        when(evaluator.apply(any(String.class), any(Object.class))).thenAnswer((InvocationOnMock invocationOnMock) -> true);
        // Populate the schema and data that the mocked Google BigQuery client will return.
        com.google.cloud.bigquery.Schema tableSchema = BigQueryTestUtils.getTestSchema();
        List<FieldValueList> tableRows = Arrays.asList(
                BigQueryTestUtils.getBigQueryFieldValueList(false, 1000, "test1", 123123.12312),
                BigQueryTestUtils.getBigQueryFieldValueList(true, 500, "test2", 5345234.22111),
                BigQueryTestUtils.getBigQueryFieldValueList(false, 700, "test3", 324324.23423),
                BigQueryTestUtils.getBigQueryFieldValueList(true, 900, null, null),
                BigQueryTestUtils.getBigQueryFieldValueList(null, null, "test5", 2342.234234),
                BigQueryTestUtils.getBigQueryFieldValueList(true, 1200, "test6", 1123.12312),
                BigQueryTestUtils.getBigQueryFieldValueList(false, 100, "test7", 1313.12312),
                BigQueryTestUtils.getBigQueryFieldValueList(true, 120, "test8", 12313.1312),
                BigQueryTestUtils.getBigQueryFieldValueList(false, 300, "test9", 12323.1312));
        Page<FieldValueList> fieldValueList = new BigQueryPage<>(tableRows);
        TableResult result = new TableResult(tableSchema, tableRows.size(), fieldValueList);
        // Mock out the Google BigQuery Job.
        Job mockBigQueryJob = mock(Job.class);
        when(mockBigQueryJob.isDone()).thenReturn(false).thenReturn(true);
        when(mockBigQueryJob.getQueryResults()).thenReturn(result);
        when(bigQuery.create(any(JobInfo.class))).thenReturn(mockBigQueryJob);
        QueryStatusChecker queryStatusChecker = mock(QueryStatusChecker.class);
        when(queryStatusChecker.isQueryRunning()).thenReturn(true);
        // Execute the test
        bigQueryRecordHandler.readWithConstraint(spillWriter, request, queryStatusChecker);
        PowerMockito.mockStatic(System.class);
        PowerMockito.when(System.getenv(anyString())).thenReturn("test");
        logger.info("Project Name: " + BigQueryUtils.getProjectName(request.getCatalogName()));
        // Ensure that there was a spill so that we can read the spilled block.
        assertTrue(spillWriter.spilled());
    }
}
Also used: ConstraintEvaluator(com.amazonaws.athena.connector.lambda.domain.predicate.ConstraintEvaluator) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) QueryStatusChecker(com.amazonaws.athena.connector.lambda.QueryStatusChecker) InvocationOnMock(org.mockito.invocation.InvocationOnMock) S3Object(com.amazonaws.services.s3.model.S3Object) com.google.cloud.bigquery(com.google.cloud.bigquery) PrepareForTest(org.powermock.core.classloader.annotations.PrepareForTest) Test(org.junit.Test)
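
Because the handler receives the checker as a plain collaborator, a test can just as easily simulate a query that is cancelled mid-read instead of one that always reports running. A small Mockito sketch using the same objects as the test above:

// Report RUNNING for the first two polls and a terminal state on the third,
// so the handler should stop spilling before it exhausts the mocked rows.
QueryStatusChecker queryStatusChecker = mock(QueryStatusChecker.class);
when(queryStatusChecker.isQueryRunning()).thenReturn(true, true, false);
bigQueryRecordHandler.readWithConstraint(spillWriter, request, queryStatusChecker);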

Example 8 with QueryStatusChecker

Use of com.amazonaws.athena.connector.lambda.QueryStatusChecker in project aws-athena-query-federation by awslabs.

From class JdbcMetadataHandlerTest, method setup:

@Before
public void setup() {
    this.jdbcConnectionFactory = Mockito.mock(JdbcConnectionFactory.class);
    this.connection = Mockito.mock(Connection.class, Mockito.RETURNS_DEEP_STUBS);
    Mockito.when(this.jdbcConnectionFactory.getConnection(Mockito.any(JdbcCredentialProvider.class))).thenReturn(this.connection);
    this.secretsManager = Mockito.mock(AWSSecretsManager.class);
    this.athena = Mockito.mock(AmazonAthena.class);
    Mockito.when(this.secretsManager.getSecretValue(Mockito.eq(new GetSecretValueRequest().withSecretId("testSecret"))))
            .thenReturn(new GetSecretValueResult().withSecretString("{\"username\": \"testUser\", \"password\": \"testPassword\"}"));
    DatabaseConnectionConfig databaseConnectionConfig = new DatabaseConnectionConfig("testCatalog", "fakedatabase", "fakedatabase://jdbc:fakedatabase://hostname/${testSecret}", "testSecret");
    this.jdbcMetadataHandler = new JdbcMetadataHandler(databaseConnectionConfig, this.secretsManager, this.athena, jdbcConnectionFactory) {

        @Override
        public Schema getPartitionSchema(final String catalogName) {
            return PARTITION_SCHEMA;
        }

        @Override
        public void getPartitions(final BlockWriter blockWriter, final GetTableLayoutRequest getTableLayoutRequest, QueryStatusChecker queryStatusChecker) {
        }

        @Override
        public GetSplitsResponse doGetSplits(BlockAllocator blockAllocator, GetSplitsRequest getSplitsRequest) {
            return null;
        }
    };
    this.federatedIdentity = Mockito.mock(FederatedIdentity.class);
    this.blockAllocator = Mockito.mock(BlockAllocator.class);
}
Also used: JdbcConnectionFactory(com.amazonaws.athena.connectors.jdbc.connection.JdbcConnectionFactory) GetSplitsRequest(com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) AWSSecretsManager(com.amazonaws.services.secretsmanager.AWSSecretsManager) Schema(org.apache.arrow.vector.types.pojo.Schema) Connection(java.sql.Connection) DatabaseConnectionConfig(com.amazonaws.athena.connectors.jdbc.connection.DatabaseConnectionConfig) GetSecretValueResult(com.amazonaws.services.secretsmanager.model.GetSecretValueResult) QueryStatusChecker(com.amazonaws.athena.connector.lambda.QueryStatusChecker) FederatedIdentity(com.amazonaws.athena.connector.lambda.security.FederatedIdentity) GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) GetTableLayoutRequest(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) BlockAllocator(com.amazonaws.athena.connector.lambda.data.BlockAllocator) GetSecretValueRequest(com.amazonaws.services.secretsmanager.model.GetSecretValueRequest) BlockWriter(com.amazonaws.athena.connector.lambda.data.BlockWriter) JdbcCredentialProvider(com.amazonaws.athena.connectors.jdbc.connection.JdbcCredentialProvider) AmazonAthena(com.amazonaws.services.athena.AmazonAthena) Before(org.junit.Before)
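
The test stubs getPartitions as a no-op; a concrete handler would typically write one row per partition and consult the checker between writes. A minimal sketch, assuming a hypothetical listPartitions(...) metastore lookup and a partition_name column declared in PARTITION_SCHEMA:

// Inside a MetadataHandler subclass.
@Override
public void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker)
        throws Exception {
    for (String partition : listPartitions(request.getTableName())) { // hypothetical metastore call
        if (!queryStatusChecker.isQueryRunning()) {
            return; // the Athena query ended; skip the remaining partitions
        }
        blockWriter.writeRows((Block block, int rowNum) -> {
            // setValue returns false when the value fails the request's constraints
            return block.setValue("partition_name", rowNum, partition) ? 1 : 0;
        });
    }
}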

Example 9 with QueryStatusChecker

Use of com.amazonaws.athena.connector.lambda.QueryStatusChecker in project aws-athena-query-federation by awslabs.

From class MetadataHandler, method doGetTableLayout:

/**
 * Used to get the partitions that must be read from the request table in order to satisfy the requested predicate.
 *
 * @param allocator Tool for creating and managing Apache Arrow Blocks.
 * @param request Provides details of the catalog, database, and table being queried as well as any filter predicate.
 * @return A GetTableLayoutResponse which primarily contains:
 * 1. An Apache Arrow Block with 0 or more partitions to read. 0 partitions implies there are 0 rows to read.
 * 2. Set<String> of partition column names which should correspond to columns in your Apache Arrow Block.
 * @note Partitions are partially opaque to Amazon Athena: it understands your partition columns and how to
 * filter out partitions that do not meet the query's constraints, but nothing else about their contents. Any
 * additional columns you add to the partition data are ignored by Athena and simply passed on to doGetSplits(...),
 * which Athena calls for each partition you return in order to determine which reads to perform and whether
 * those reads can be parallelized. This means the contents of this response are more for you than for Athena.
 */
public GetTableLayoutResponse doGetTableLayout(final BlockAllocator allocator, final GetTableLayoutRequest request) throws Exception {
    SchemaBuilder constraintSchema = SchemaBuilder.newBuilder();
    SchemaBuilder partitionSchemaBuilder = SchemaBuilder.newBuilder();
    /**
     * Add our partition columns to the response schema so the engine knows how to interpret the list of
     * partitions we are going to return.
     */
    for (String nextPartCol : request.getPartitionCols()) {
        Field partitionCol = request.getSchema().findField(nextPartCol);
        partitionSchemaBuilder.addField(nextPartCol, partitionCol.getType());
        constraintSchema.addField(nextPartCol, partitionCol.getType());
    }
    enhancePartitionSchema(partitionSchemaBuilder, request);
    Schema partitionSchema = partitionSchemaBuilder.build();
    if (partitionSchema.getFields().isEmpty() && partitionSchema.getCustomMetadata().isEmpty()) {
        // Even though our table doesn't support complex layouts, partitioning or metadata, we need to convey that there is at least
        // 1 partition to read as part of the query or Athena will assume partition pruning found no candidate layouts to read.
        Block partitions = BlockUtils.newBlock(allocator, PARTITION_ID_COL, Types.MinorType.INT.getType(), 1);
        return new GetTableLayoutResponse(request.getCatalogName(), request.getTableName(), partitions);
    }
    /**
     * Now use the constraint that was in the request to do some partition pruning. Here we are just
     * generating some fake values for the partitions but in a real implementation you'd use your metastore
     * or knowledge of the actual table's physical layout to do this.
     */
    try (ConstraintEvaluator constraintEvaluator = new ConstraintEvaluator(allocator, constraintSchema.build(), request.getConstraints());
        QueryStatusChecker queryStatusChecker = new QueryStatusChecker(athena, athenaInvoker, request.getQueryId())) {
        Block partitions = allocator.createBlock(partitionSchemaBuilder.build());
        partitions.constrain(constraintEvaluator);
        SimpleBlockWriter blockWriter = new SimpleBlockWriter(partitions);
        getPartitions(blockWriter, request, queryStatusChecker);
        return new GetTableLayoutResponse(request.getCatalogName(), request.getTableName(), partitions);
    }
}
Also used: Field(org.apache.arrow.vector.types.pojo.Field) GetTableLayoutResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) QueryStatusChecker(com.amazonaws.athena.connector.lambda.QueryStatusChecker) Schema(org.apache.arrow.vector.types.pojo.Schema) SchemaBuilder(com.amazonaws.athena.connector.lambda.data.SchemaBuilder) Block(com.amazonaws.athena.connector.lambda.data.Block) ConstraintEvaluator(com.amazonaws.athena.connector.lambda.domain.predicate.ConstraintEvaluator) SimpleBlockWriter(com.amazonaws.athena.connector.lambda.data.SimpleBlockWriter)
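
When a connector needs to carry extra, Athena-ignored data from doGetTableLayout through to doGetSplits, enhancePartitionSchema (called above before the partition schema is built) is the hook. A minimal sketch with a hypothetical storage_path column:

// Inside a MetadataHandler subclass.
@Override
public void enhancePartitionSchema(SchemaBuilder partitionSchemaBuilder, GetTableLayoutRequest request) {
    // Athena ignores this column, but getPartitions can populate it and
    // doGetSplits can read it back when building Split properties.
    partitionSchemaBuilder.addStringField("storage_path"); // hypothetical column name
}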

Example 10 with QueryStatusChecker

Use of com.amazonaws.athena.connector.lambda.QueryStatusChecker in project aws-athena-query-federation by awslabs.

From class RecordHandler, method doReadRecords:

/**
 * Used to read the row data associated with the provided Split.
 *
 * @param allocator Tool for creating and managing Apache Arrow Blocks.
 * @param request Details of the read request, including:
 * 1. The Split
 * 2. The Catalog, Database, and Table the read request is for.
 * 3. The filtering predicate (if any)
 * 4. The columns required for projection.
 * @return A RecordResponse which is either a ReadRecordsResponse or a RemoteReadRecordsResponse containing the row
 * data for the requested Split.
 */
public RecordResponse doReadRecords(BlockAllocator allocator, ReadRecordsRequest request) throws Exception {
    logger.info("doReadRecords: {}:{}", request.getSchema(), request.getSplit().getSpillLocation());
    SpillConfig spillConfig = getSpillConfig(request);
    try (ConstraintEvaluator evaluator = new ConstraintEvaluator(allocator, request.getSchema(), request.getConstraints());
        S3BlockSpiller spiller = new S3BlockSpiller(amazonS3, spillConfig, allocator, request.getSchema(), evaluator);
        QueryStatusChecker queryStatusChecker = new QueryStatusChecker(athena, athenaInvoker, request.getQueryId())) {
        readWithConstraint(spiller, request, queryStatusChecker);
        if (!spiller.spilled()) {
            return new ReadRecordsResponse(request.getCatalogName(), spiller.getBlock());
        } else {
            return new RemoteReadRecordsResponse(request.getCatalogName(), request.getSchema(), spiller.getSpillLocations(), spillConfig.getEncryptionKey());
        }
    }
}
Also used: SpillConfig(com.amazonaws.athena.connector.lambda.data.SpillConfig) RemoteReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.RemoteReadRecordsResponse) QueryStatusChecker(com.amazonaws.athena.connector.lambda.QueryStatusChecker) ReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse) S3BlockSpiller(com.amazonaws.athena.connector.lambda.data.S3BlockSpiller) ConstraintEvaluator(com.amazonaws.athena.connector.lambda.domain.predicate.ConstraintEvaluator)
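
A concrete handler only implements readWithConstraint; the base class above takes care of constraint evaluation, spilling, and choosing between the inline and remote responses. A minimal sketch, with SourceRow and scan(...) as hypothetical stand-ins for the connector's data source:

// Inside a RecordHandler subclass.
@Override
protected void readWithConstraint(BlockSpiller spiller, ReadRecordsRequest request, QueryStatusChecker queryStatusChecker)
        throws Exception {
    for (SourceRow row : scan(request.getSplit())) { // hypothetical source iterator
        if (!queryStatusChecker.isQueryRunning()) {
            return; // the query was cancelled or completed; stop producing rows
        }
        spiller.writeRows((Block block, int rowNum) -> {
            // setValue applies the ConstraintEvaluator; returning 0 drops the row
            boolean matched = block.setValue("id", rowNum, row.getId());
            matched &= block.setValue("name", rowNum, row.getName());
            return matched ? 1 : 0;
        });
    }
}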

Aggregations

QueryStatusChecker (com.amazonaws.athena.connector.lambda.QueryStatusChecker) 10
ConstraintEvaluator (com.amazonaws.athena.connector.lambda.domain.predicate.ConstraintEvaluator) 5
BlockAllocator (com.amazonaws.athena.connector.lambda.data.BlockAllocator) 4
Constraints (com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) 4
Schema (org.apache.arrow.vector.types.pojo.Schema) 4
Test (org.junit.Test) 4
Block (com.amazonaws.athena.connector.lambda.data.Block) 3
BlockAllocatorImpl (com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) 3
BlockWriter (com.amazonaws.athena.connector.lambda.data.BlockWriter) 3
SpillConfig (com.amazonaws.athena.connector.lambda.data.SpillConfig) 3
TableName (com.amazonaws.athena.connector.lambda.domain.TableName) 3
GetTableLayoutRequest (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) 3
ReadRecordsRequest (com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) 3
com.google.cloud.bigquery (com.google.cloud.bigquery) 3
Before (org.junit.Before) 3
InvocationOnMock (org.mockito.invocation.InvocationOnMock) 3
PrepareForTest (org.powermock.core.classloader.annotations.PrepareForTest) 3
S3BlockSpiller (com.amazonaws.athena.connector.lambda.data.S3BlockSpiller) 2
SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder) 2
GetSplitsRequest (com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) 2