Example 41 with ReadRecordsRequest

use of com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest in project aws-athena-query-federation by awslabs.

the class DataLakeGen2MuxRecordHandlerTest method readWithConstraint.

@Test
public void readWithConstraint() {
    BlockSpiller blockSpiller = Mockito.mock(BlockSpiller.class);
    ReadRecordsRequest readRecordsRequest = Mockito.mock(ReadRecordsRequest.class);
    Mockito.when(readRecordsRequest.getCatalogName()).thenReturn(DataLakeGen2Constants.NAME);
    this.jdbcRecordHandler.readWithConstraint(blockSpiller, readRecordsRequest, queryStatusChecker);
    Mockito.verify(this.dataLakeGen2RecordHandler, Mockito.times(1)).readWithConstraint(Mockito.eq(blockSpiller), Mockito.eq(readRecordsRequest), Mockito.eq(queryStatusChecker));
}
Also used : ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) BlockSpiller(com.amazonaws.athena.connector.lambda.data.BlockSpiller) Test(org.junit.Test)
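
The test above verifies that a multiplexing handler forwards a read to the delegate registered for the request's catalog name. The sketch below illustrates that dispatch idea without Mockito or the connector SDK. `MuxSketch`, `RecordReader`, `MuxReader`, and the catalog names are hypothetical stand-ins for illustration only, not the project's classes.

```java
import java.util.HashMap;
import java.util.Map;

public class MuxSketch {
    /** Hypothetical stand-in for a per-catalog record handler. */
    public interface RecordReader {
        String read(String catalogName);
    }

    /** Routes each read to the delegate registered under the catalog name. */
    public static final class MuxReader implements RecordReader {
        private final Map<String, RecordReader> delegates;

        public MuxReader(Map<String, RecordReader> delegates) {
            // Defensive copy so later changes to the caller's map do not affect routing.
            this.delegates = new HashMap<>(delegates);
        }

        @Override
        public String read(String catalogName) {
            RecordReader delegate = delegates.get(catalogName);
            if (delegate == null) {
                throw new IllegalArgumentException("No handler for catalog: " + catalogName);
            }
            return delegate.read(catalogName);
        }
    }

    public static void main(String[] args) {
        Map<String, RecordReader> delegates = new HashMap<>();
        delegates.put("catalogA", name -> "handled by A: " + name);
        delegates.put("catalogB", name -> "handled by B: " + name);

        RecordReader mux = new MuxReader(delegates);
        System.out.println(mux.read("catalogA"));
    }
}
```

In the real test, the same property is checked with `Mockito.verify(...)` on the mocked delegate rather than by inspecting a return value.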

Example 42 with ReadRecordsRequest

use of com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest in project aws-athena-query-federation by awslabs.

the class DynamoDBRecordHandlerTest method testReadScanSplit.

@Test
public void testReadScanSplit() throws Exception {
    Split split = Split.newBuilder(SPILL_LOCATION, keyFactory.create()).add(TABLE_METADATA, TEST_TABLE).add(SEGMENT_ID_PROPERTY, "0").add(SEGMENT_COUNT_METADATA, "1").build();
    // The block-size limits below are too big to trigger a spill.
    ReadRecordsRequest request = new ReadRecordsRequest(TEST_IDENTITY, TEST_CATALOG_NAME, TEST_QUERY_ID, TEST_TABLE_NAME, schema, split, new Constraints(ImmutableMap.of()), 100_000_000_000L, 100_000_000_000L);
    RecordResponse rawResponse = handler.doReadRecords(allocator, request);
    assertTrue(rawResponse instanceof ReadRecordsResponse);
    ReadRecordsResponse response = (ReadRecordsResponse) rawResponse;
    logger.info("testReadScanSplit: rows[{}]", response.getRecordCount());
    assertEquals(1000, response.getRecords().getRowCount());
    logger.info("testReadScanSplit: {}", BlockUtils.rowToString(response.getRecords(), 0));
}
Also used : ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) ReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse) RecordResponse(com.amazonaws.athena.connector.lambda.records.RecordResponse) Split(com.amazonaws.athena.connector.lambda.domain.Split) Test(org.junit.Test)

Example 43 with ReadRecordsRequest

use of com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest in project aws-athena-query-federation by awslabs.

the class ElasticsearchRecordHandlerTest method doReadRecordsNoSpill.

@Test
public void doReadRecordsNoSpill() throws Exception {
    logger.info("doReadRecordsNoSpill: enter");
    SearchHit[] searchHit = new SearchHit[2];
    searchHit[0] = new SearchHit(1);
    searchHit[1] = new SearchHit(2);
    SearchHits searchHits = new SearchHits(searchHit, new TotalHits(2, TotalHits.Relation.EQUAL_TO), 4);
    when(mockResponse.getHits()).thenReturn(searchHits);
    Map<String, ValueSet> constraintsMap = new HashMap<>();
    constraintsMap.put("myshort", SortedRangeSet.copyOf(Types.MinorType.SMALLINT.getType(), ImmutableList.of(Range.range(allocator, Types.MinorType.SMALLINT.getType(), (short) 1955, false, (short) 1972, true)), false));
    List<String> expectedProjection = new ArrayList<>();
    mapping.getFields().forEach(field -> expectedProjection.add(field.getName()));
    String expectedPredicate = "(_exists_:myshort) AND myshort:({1955 TO 1972])";
    // 100GB limits: don't expect this to spill.
    ReadRecordsRequest request = new ReadRecordsRequest(fakeIdentity(), "elasticsearch", "queryId-" + System.currentTimeMillis(), new TableName("movies", "mishmash"), mapping, split, new Constraints(constraintsMap), 100_000_000_000L, 100_000_000_000L);
    RecordResponse rawResponse = handler.doReadRecords(allocator, request);
    // Capture the SearchRequest object from the call to client.getDocuments().
    // The former contains information such as the projection and predicate.
    ArgumentCaptor<SearchRequest> argumentCaptor = ArgumentCaptor.forClass(SearchRequest.class);
    verify(mockClient).getDocuments(argumentCaptor.capture());
    SearchRequest searchRequest = argumentCaptor.getValue();
    // Get the actual projection and compare to the expected one.
    List<String> actualProjection = ImmutableList.copyOf(searchRequest.source().fetchSource().includes());
    assertEquals("Projections do not match", expectedProjection, actualProjection);
    // Get the actual predicate and compare to the expected one.
    String actualPredicate = searchRequest.source().query().queryName();
    assertEquals("Predicates do not match", expectedPredicate, actualPredicate);
    assertTrue(rawResponse instanceof ReadRecordsResponse);
    ReadRecordsResponse response = (ReadRecordsResponse) rawResponse;
    logger.info("doReadRecordsNoSpill: rows[{}]", response.getRecordCount());
    assertEquals(2, response.getRecords().getRowCount());
    for (int i = 0; i < response.getRecords().getRowCount(); ++i) {
        logger.info("doReadRecordsNoSpill - Row: {}, {}", i, BlockUtils.rowToString(response.getRecords(), i));
    }
    logger.info("doReadRecordsNoSpill: exit");
}
Also used : TotalHits(org.apache.lucene.search.TotalHits) SearchRequest(org.elasticsearch.action.search.SearchRequest) SearchHit(org.elasticsearch.search.SearchHit) HashMap(java.util.HashMap) ReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse) RemoteReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.RemoteReadRecordsResponse) ArrayList(java.util.ArrayList) Mockito.anyString(org.mockito.Mockito.anyString) RecordResponse(com.amazonaws.athena.connector.lambda.records.RecordResponse) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) SearchHits(org.elasticsearch.search.SearchHits) ValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet) Test(org.junit.Test)
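
The expected predicate in the test above, `(_exists_:myshort) AND myshort:({1955 TO 1972])`, uses Lucene range syntax: a curly brace marks an exclusive bound and a square bracket an inclusive one, matching the `Range.range(..., 1955, false, 1972, true)` constraint. The sketch below rebuilds that string from the same four inputs. `RangePredicateSketch` and `rangePredicate` are hypothetical names, and this is not the connector's actual predicate-building code.

```java
public class RangePredicateSketch {
    /**
     * Builds a Lucene-style range predicate for one field.
     * '{' / '}' mark exclusive bounds, '[' / ']' mark inclusive bounds.
     */
    public static String rangePredicate(String field, long low, boolean lowInclusive,
                                        long high, boolean highInclusive) {
        String open = lowInclusive ? "[" : "{";
        String close = highInclusive ? "]" : "}";
        return "(_exists_:" + field + ") AND " + field + ":("
                + open + low + " TO " + high + close + ")";
    }

    public static void main(String[] args) {
        // Same bounds as the test: low exclusive, high inclusive.
        System.out.println(rangePredicate("myshort", 1955, false, 1972, true));
        // prints: (_exists_:myshort) AND myshort:({1955 TO 1972])
    }
}
```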

Example 44 with ReadRecordsRequest

use of com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest in project aws-athena-query-federation by awslabs.

the class DynamoDBRecordHandlerTest method testReadQuerySplit.

@Test
public void testReadQuerySplit() throws Exception {
    Map<String, String> expressionNames = ImmutableMap.of("#col_1", "col_1");
    Map<String, AttributeValue> expressionValues = ImmutableMap.of(":v0", toAttributeValue(1));
    Split split = Split.newBuilder(SPILL_LOCATION, keyFactory.create()).add(TABLE_METADATA, TEST_TABLE).add(HASH_KEY_NAME_METADATA, "col_0").add("col_0", toJsonString(toAttributeValue("test_str_0"))).add(RANGE_KEY_FILTER_METADATA, "#col_1 >= :v0").add(EXPRESSION_NAMES_METADATA, toJsonString(expressionNames)).add(EXPRESSION_VALUES_METADATA, toJsonString(expressionValues)).build();
    // The block-size limits below are too big to trigger a spill.
    ReadRecordsRequest request = new ReadRecordsRequest(TEST_IDENTITY, TEST_CATALOG_NAME, TEST_QUERY_ID, TEST_TABLE_NAME, schema, split, new Constraints(ImmutableMap.of()), 100_000_000_000L, 100_000_000_000L);
    RecordResponse rawResponse = handler.doReadRecords(allocator, request);
    assertTrue(rawResponse instanceof ReadRecordsResponse);
    ReadRecordsResponse response = (ReadRecordsResponse) rawResponse;
    logger.info("testReadQuerySplit: rows[{}]", response.getRecordCount());
    assertEquals(2, response.getRecords().getRowCount());
    logger.info("testReadQuerySplit: {}", BlockUtils.rowToString(response.getRecords(), 0));
}
Also used : ItemUtils.toAttributeValue(com.amazonaws.services.dynamodbv2.document.ItemUtils.toAttributeValue) AttributeValue(com.amazonaws.services.dynamodbv2.model.AttributeValue) ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) ReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse) Jackson.toJsonString(com.amazonaws.util.json.Jackson.toJsonString) RecordResponse(com.amazonaws.athena.connector.lambda.records.RecordResponse) Split(com.amazonaws.athena.connector.lambda.domain.Split) Test(org.junit.Test)

Example 45 with ReadRecordsRequest

use of com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest in project aws-athena-query-federation by awslabs.

the class ExampleRecordHandlerTest method doReadRecordsSpill.

@Test
public void doReadRecordsSpill() throws Exception {
    logger.info("doReadRecordsSpill: enter");
    for (int i = 0; i < 2; i++) {
        EncryptionKey encryptionKey = (i % 2 == 0) ? keyFactory.create() : null;
        logger.info("doReadRecordsSpill: Using encryptionKey[{}]", encryptionKey);
        Map<String, ValueSet> constraintsMap = new HashMap<>();
        constraintsMap.put("col3", SortedRangeSet.copyOf(Types.MinorType.FLOAT8.getType(), ImmutableList.of(Range.greaterThan(allocator, Types.MinorType.FLOAT8.getType(), -10000D)), false));
        constraintsMap.put("unknown", EquatableValueSet.newBuilder(allocator, Types.MinorType.FLOAT8.getType(), false, true).add(1.1D).build());
        constraintsMap.put("unknown2", new AllOrNoneValueSet(Types.MinorType.FLOAT8.getType(), false, true));
        // Max block size is ~1.5MB, so we should see some spill.
        ReadRecordsRequest request = new ReadRecordsRequest(IdentityUtil.fakeIdentity(), "catalog", "queryId-" + System.currentTimeMillis(), new TableName("schema", "table"), schemaForRead, Split.newBuilder(makeSpillLocation(), encryptionKey).add("year", "10").add("month", "10").add("day", "10").build(), new Constraints(constraintsMap), 1_600_000L, 1000L);
        ObjectMapperUtil.assertSerialization(request);
        RecordResponse rawResponse = recordService.readRecords(request);
        ObjectMapperUtil.assertSerialization(rawResponse);
        assertTrue(rawResponse instanceof RemoteReadRecordsResponse);
        try (RemoteReadRecordsResponse response = (RemoteReadRecordsResponse) rawResponse) {
            logger.info("doReadRecordsSpill: remoteBlocks[{}]", response.getRemoteBlocks().size());
            assertTrue(response.getNumberBlocks() > 1);
            int blockNum = 0;
            for (SpillLocation next : response.getRemoteBlocks()) {
                S3SpillLocation spillLocation = (S3SpillLocation) next;
                try (Block block = spillReader.read(spillLocation, response.getEncryptionKey(), response.getSchema())) {
                    logger.info("doReadRecordsSpill: blockNum[{}] and recordCount[{}]", blockNum++, block.getRowCount());
                    // assertTrue(++blockNum < response.getRemoteBlocks().size() && block.getRowCount() > 10_000);
                    logger.info("doReadRecordsSpill: {}", BlockUtils.rowToString(block, 0));
                    assertNotNull(BlockUtils.rowToString(block, 0));
                }
            }
        }
    }
    logger.info("doReadRecordsSpill: exit");
}
Also used : RemoteReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.RemoteReadRecordsResponse) SpillLocation(com.amazonaws.athena.connector.lambda.domain.spill.SpillLocation) S3SpillLocation(com.amazonaws.athena.connector.lambda.domain.spill.S3SpillLocation) HashMap(java.util.HashMap) AllOrNoneValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.AllOrNoneValueSet) EncryptionKey(com.amazonaws.athena.connector.lambda.security.EncryptionKey) Matchers.anyString(org.mockito.Matchers.anyString) RecordResponse(com.amazonaws.athena.connector.lambda.records.RecordResponse) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) Block(com.amazonaws.athena.connector.lambda.data.Block) ValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet) EquatableValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.EquatableValueSet) Test(org.junit.Test)

Aggregations

ReadRecordsRequest (com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest): 85
Test (org.junit.Test): 82
Constraints (com.amazonaws.athena.connector.lambda.domain.predicate.Constraints): 55
TableName (com.amazonaws.athena.connector.lambda.domain.TableName): 40
Split (com.amazonaws.athena.connector.lambda.domain.Split): 35
RecordResponse (com.amazonaws.athena.connector.lambda.records.RecordResponse): 33
BlockSpiller (com.amazonaws.athena.connector.lambda.data.BlockSpiller): 31
HashMap (java.util.HashMap): 28
Schema (org.apache.arrow.vector.types.pojo.Schema): 28
ValueSet (com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet): 27
ReadRecordsResponse (com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse): 27
S3SpillLocation (com.amazonaws.athena.connector.lambda.domain.spill.S3SpillLocation): 23
Matchers.anyString (org.mockito.Matchers.anyString): 23
RemoteReadRecordsResponse (com.amazonaws.athena.connector.lambda.records.RemoteReadRecordsResponse): 20
ArrayList (java.util.ArrayList): 16
InvocationOnMock (org.mockito.invocation.InvocationOnMock): 16
Connection (java.sql.Connection): 15
Block (com.amazonaws.athena.connector.lambda.data.Block): 14
SpillLocation (com.amazonaws.athena.connector.lambda.domain.spill.SpillLocation): 12
EquatableValueSet (com.amazonaws.athena.connector.lambda.domain.predicate.EquatableValueSet): 11