
Example 1 with GetSplitsRequest

use of com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest in project aws-athena-query-federation by awslabs.

the class CompositeHandlerTest method doGetSplits.

@Test
public void doGetSplits() throws Exception {
    GetSplitsRequest req = mock(GetSplitsRequest.class);
    when(req.getRequestType()).thenReturn(MetadataRequestType.GET_SPLITS);
    SpillLocationVerifier mockVerifier = mock(SpillLocationVerifier.class);
    doNothing().when(mockVerifier).checkBucketAuthZ(any(String.class));
    Whitebox.setInternalState(mockMetadataHandler, "verifier", mockVerifier);
    compositeHandler.handleRequest(allocator, req, new ByteArrayOutputStream(), objectMapper);
    verify(mockMetadataHandler, times(1)).doGetSplits(any(BlockAllocatorImpl.class), any(GetSplitsRequest.class));
}
Also used : GetSplitsRequest(com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) BlockAllocatorImpl(com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) SpillLocationVerifier(com.amazonaws.athena.connector.lambda.domain.spill.SpillLocationVerifier) ByteArrayOutputStream(java.io.ByteArrayOutputStream) Test(org.junit.Test)

Example 2 with GetSplitsRequest

use of com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest in project aws-athena-query-federation by awslabs.

the class GlueMetadataHandlerTest method setUp.

@Before
public void setUp() throws Exception {
    logger.info("{}: enter", testName.getMethodName());
    handler = new GlueMetadataHandler(mockGlue, new LocalKeyFactory(), mock(AWSSecretsManager.class), mock(AmazonAthena.class), "glue-test", "spill-bucket", "spill-prefix") {

        @Override
        public GetTableLayoutResponse doGetTableLayout(BlockAllocator blockAllocator, GetTableLayoutRequest request) {
            throw new UnsupportedOperationException();
        }

        @Override
        public void getPartitions(BlockWriter blockWriter, GetTableLayoutRequest request, QueryStatusChecker queryStatusChecker) throws Exception {
            throw new UnsupportedOperationException();
        }

        @Override
        public GetSplitsResponse doGetSplits(BlockAllocator blockAllocator, GetSplitsRequest request) {
            throw new UnsupportedOperationException();
        }
    };
    allocator = new BlockAllocatorImpl();
    // doListTables pagination.
    when(mockGlue.getTables(any(GetTablesRequest.class))).thenAnswer((InvocationOnMock invocationOnMock) -> {
        GetTablesRequest request = (GetTablesRequest) invocationOnMock.getArguments()[0];
        String nextToken = request.getNextToken();
        int pageSize = request.getMaxResults() == null ? UNLIMITED_PAGE_SIZE_VALUE : request.getMaxResults();
        assertEquals(accountId, request.getCatalogId());
        assertEquals(schema, request.getDatabaseName());
        GetTablesResult mockResult = mock(GetTablesResult.class);
        if (pageSize == UNLIMITED_PAGE_SIZE_VALUE) {
            // Simulate full list of tables returned from Glue.
            when(mockResult.getTableList()).thenReturn(unPaginatedTables);
            when(mockResult.getNextToken()).thenReturn(null);
        } else {
            // Simulate paginated list of tables returned from Glue.
            List<Table> paginatedTables = unPaginatedTables.stream().sorted(Comparator.comparing(Table::getName)).filter(table -> nextToken == null || table.getName().compareTo(nextToken) >= 0).limit(pageSize + 1).collect(Collectors.toList());
            if (paginatedTables.size() > pageSize) {
                when(mockResult.getNextToken()).thenReturn(paginatedTables.get(pageSize).getName());
                when(mockResult.getTableList()).thenReturn(paginatedTables.subList(0, pageSize));
            } else {
                when(mockResult.getNextToken()).thenReturn(null);
                when(mockResult.getTableList()).thenReturn(paginatedTables);
            }
        }
        return mockResult;
    });
}
Also used : Table(com.amazonaws.services.glue.model.Table) GetSplitsRequest(com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) GetTablesResult(com.amazonaws.services.glue.model.GetTablesResult) LocalKeyFactory(com.amazonaws.athena.connector.lambda.security.LocalKeyFactory) GetTableLayoutResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) QueryStatusChecker(com.amazonaws.athena.connector.lambda.QueryStatusChecker) BlockAllocatorImpl(com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) InvocationOnMock(org.mockito.invocation.InvocationOnMock) BlockAllocator(com.amazonaws.athena.connector.lambda.data.BlockAllocator) GetTableLayoutRequest(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) BlockWriter(com.amazonaws.athena.connector.lambda.data.BlockWriter) GetTablesRequest(com.amazonaws.services.glue.model.GetTablesRequest) Before(org.junit.Before)
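The mocked getTables answer above pages through a sorted list by fetching one element more than the page size and using that extra element's name as the next token. The following is a minimal, self-contained sketch of that same idea on plain strings. It is not code from aws-athena-query-federation: the class TokenPaginationSketch and its methods fetchPage and fetchAll are hypothetical names chosen for illustration, and it assumes names are unique and the page size is at least 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the Athena Query Federation SDK.
public class TokenPaginationSketch {
    /** One page of results plus the token that resumes the listing (null when exhausted). */
    public static final class Page {
        public final List<String> items;
        public final String nextToken;

        Page(List<String> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    /** Returns up to pageSize names in sorted order, starting at nextToken (inclusive). */
    public static Page fetchPage(List<String> allNames, String nextToken, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        // Fetch one extra element so we can tell whether another page exists.
        List<String> window = allNames.stream()
                .sorted(Comparator.naturalOrder())
                .filter(name -> nextToken == null || name.compareTo(nextToken) >= 0)
                .limit(pageSize + 1L)
                .collect(Collectors.toList());
        if (window.size() > pageSize) {
            // The extra element is not returned; its name becomes the next token.
            return new Page(new ArrayList<>(window.subList(0, pageSize)), window.get(pageSize));
        }
        return new Page(window, null);
    }

    /** Drains every page, the way a caller of a paginated list API would. */
    public static List<String> fetchAll(List<String> allNames, int pageSize) {
        List<String> out = new ArrayList<>();
        String token = null;
        do {
            Page page = fetchPage(allNames, token, pageSize);
            out.addAll(page.items);
            token = page.nextToken;
        } while (token != null);
        return out;
    }
}
```

Because the token is the name of the first item on the next page, the filter uses an inclusive comparison (>= 0). An exclusive comparison would silently drop one item per page.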

Example 3 with GetSplitsRequest

use of com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest in project aws-athena-query-federation by awslabs.

the class ExampleMetadataHandlerTest method doGetSplits.

/**
 * The goal of this test is to cover the happy case for getting splits and, specifically, to exercise the
 * continuation token logic.
 */
@Test
public void doGetSplits() {
    logger.info("doGetSplits: enter");
    String yearCol = "year";
    String monthCol = "month";
    String dayCol = "day";
    // This is the schema that ExampleMetadataHandler has laid out for a 'Partition', so we need to populate this
    // minimal set of info here.
    Schema schema = SchemaBuilder.newBuilder().addField(yearCol, new ArrowType.Int(16, false)).addField(monthCol, new ArrowType.Int(16, false)).addField(dayCol, new ArrowType.Int(16, false)).addField(ExampleMetadataHandler.PARTITION_LOCATION, new ArrowType.Utf8()).addField(ExampleMetadataHandler.SERDE, new ArrowType.Utf8()).build();
    List<String> partitionCols = new ArrayList<>();
    partitionCols.add(yearCol);
    partitionCols.add(monthCol);
    partitionCols.add(dayCol);
    Map<String, ValueSet> constraintsMap = new HashMap<>();
    constraintsMap.put(dayCol, SortedRangeSet.copyOf(Types.MinorType.INT.getType(), ImmutableList.of(Range.greaterThan(allocator, Types.MinorType.INT.getType(), 20)), false));
    Block partitions = allocator.createBlock(schema);
    int numPartitions = 100;
    for (int i = 0; i < numPartitions; i++) {
        BlockUtils.setValue(partitions.getFieldVector(yearCol), i, 2016 + i);
        BlockUtils.setValue(partitions.getFieldVector(monthCol), i, (i % 12) + 1);
        BlockUtils.setValue(partitions.getFieldVector(dayCol), i, (i % 28) + 1);
        BlockUtils.setValue(partitions.getFieldVector(ExampleMetadataHandler.PARTITION_LOCATION), i, String.valueOf(i));
        BlockUtils.setValue(partitions.getFieldVector(ExampleMetadataHandler.SERDE), i, "TextInputType");
    }
    partitions.setRowCount(numPartitions);
    String continuationToken = null;
    GetSplitsRequest originalReq = new GetSplitsRequest(IdentityUtil.fakeIdentity(), "queryId", "catalog_name", new TableName("schema", "table_name"), partitions, partitionCols, new Constraints(constraintsMap), continuationToken);
    int numContinuations = 0;
    do {
        GetSplitsRequest req = new GetSplitsRequest(originalReq, continuationToken);
        ObjectMapperUtil.assertSerialization(req);
        logger.info("doGetSplits: req[{}]", req);
        metadataHandler.setEncryption(numContinuations % 2 == 0);
        logger.info("doGetSplits: Toggle encryption {}", numContinuations % 2 == 0);
        MetadataResponse rawResponse = metadataHandler.doGetSplits(allocator, req);
        ObjectMapperUtil.assertSerialization(rawResponse);
        assertEquals(MetadataRequestType.GET_SPLITS, rawResponse.getRequestType());
        GetSplitsResponse response = (GetSplitsResponse) rawResponse;
        continuationToken = response.getContinuationToken();
        logger.info("doGetSplits: continuationToken[{}] - numSplits[{}] - maxSplits[{}]", new Object[] { continuationToken, response.getSplits().size(), MAX_SPLITS_PER_REQUEST });
        for (Split nextSplit : response.getSplits()) {
            if (numContinuations % 2 == 0) {
                assertNotNull(nextSplit.getEncryptionKey());
            } else {
                assertNull(nextSplit.getEncryptionKey());
            }
            assertNotNull(nextSplit.getProperty(SplitProperties.LOCATION.getId()));
            assertNotNull(nextSplit.getProperty(SplitProperties.SERDE.getId()));
            assertNotNull(nextSplit.getProperty(SplitProperties.SPLIT_PART.getId()));
        }
        assertTrue("Continuation criteria violated", (response.getSplits().size() == MAX_SPLITS_PER_REQUEST && response.getContinuationToken() != null) || response.getSplits().size() < MAX_SPLITS_PER_REQUEST);
        if (continuationToken != null) {
            numContinuations++;
        }
    } while (continuationToken != null);
    assertTrue(numContinuations > 0);
    logger.info("doGetSplits: exit");
}
Also used : GetSplitsRequest(com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) HashMap(java.util.HashMap) Schema(org.apache.arrow.vector.types.pojo.Schema) ArrowType(org.apache.arrow.vector.types.pojo.ArrowType) ArrayList(java.util.ArrayList) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) MetadataResponse(com.amazonaws.athena.connector.lambda.metadata.MetadataResponse) Block(com.amazonaws.athena.connector.lambda.data.Block) Split(com.amazonaws.athena.connector.lambda.domain.Split) ValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet) Test(org.junit.Test)
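The test above keeps calling doGetSplits with the previous response's continuation token until the token comes back null, and asserts that every response either is full and carries a token or is smaller than the per-request maximum. The following is a minimal, self-contained sketch of that loop and invariant on plain integers. It is not the SDK's implementation: ContinuationLoopSketch, Batch, getSplits, drain, and the cap MAX_PER_REQUEST = 3 are hypothetical, and the token is assumed to encode the index at which to resume.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Athena Query Federation SDK.
public class ContinuationLoopSketch {
    // Assumed cap on splits per response, standing in for MAX_SPLITS_PER_REQUEST.
    static final int MAX_PER_REQUEST = 3;

    /** One response: a batch of split ids and the token for the next request (or null). */
    public static final class Batch {
        public final List<Integer> splits;
        public final String continuationToken;

        Batch(List<Integer> splits, String continuationToken) {
            this.splits = splits;
            this.continuationToken = continuationToken;
        }
    }

    /** Returns up to MAX_PER_REQUEST ids, resuming from the index encoded in the token. */
    public static Batch getSplits(int totalPartitions, String continuationToken) {
        int start = continuationToken == null ? 0 : Integer.parseInt(continuationToken);
        int end = Math.min(start + MAX_PER_REQUEST, totalPartitions);
        List<Integer> splits = new ArrayList<>();
        for (int i = start; i < end; i++) {
            splits.add(i);
        }
        // A full batch always carries a token, so the caller may see one final empty batch.
        String next = splits.size() == MAX_PER_REQUEST ? String.valueOf(end) : null;
        return new Batch(splits, next);
    }

    /** The invariant the test asserts: full batches have a token, or the batch is short. */
    public static boolean satisfiesContinuationCriteria(Batch batch) {
        int size = batch.splits.size();
        return (size == MAX_PER_REQUEST && batch.continuationToken != null) || size < MAX_PER_REQUEST;
    }

    /** Caller-side loop: keep requesting until the token is null. */
    public static List<Integer> drain(int totalPartitions) {
        List<Integer> all = new ArrayList<>();
        String token = null;
        do {
            Batch batch = getSplits(totalPartitions, token);
            all.addAll(batch.splits);
            token = batch.continuationToken;
        } while (token != null);
        return all;
    }
}
```

Under this invariant, a total that is an exact multiple of the cap ends with one extra request that returns an empty batch and a null token. The do-while loop handles that case without special treatment.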

Example 4 with GetSplitsRequest

use of com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest in project aws-athena-query-federation by awslabs.

the class HbaseMetadataHandlerTest method doGetSplits.

@Test
public void doGetSplits() throws IOException {
    List<HRegionInfo> regionServers = new ArrayList<>();
    regionServers.add(TestUtils.makeRegion(1, "schema1", "table1"));
    regionServers.add(TestUtils.makeRegion(2, "schema1", "table1"));
    regionServers.add(TestUtils.makeRegion(3, "schema1", "table1"));
    regionServers.add(TestUtils.makeRegion(4, "schema1", "table1"));
    when(mockClient.getTableRegions(any())).thenReturn(regionServers);
    List<String> partitionCols = new ArrayList<>();
    Block partitions = BlockUtils.newBlock(allocator, "partitionId", Types.MinorType.INT.getType(), 0);
    String continuationToken = null;
    GetSplitsRequest originalReq = new GetSplitsRequest(IDENTITY, QUERY_ID, DEFAULT_CATALOG, TABLE_NAME, partitions, partitionCols, new Constraints(new HashMap<>()), null);
    GetSplitsRequest req = new GetSplitsRequest(originalReq, continuationToken);
    logger.info("doGetSplits: req[{}]", req);
    MetadataResponse rawResponse = handler.doGetSplits(allocator, req);
    assertEquals(MetadataRequestType.GET_SPLITS, rawResponse.getRequestType());
    GetSplitsResponse response = (GetSplitsResponse) rawResponse;
    continuationToken = response.getContinuationToken();
    logger.info("doGetSplits: continuationToken[{}] - numSplits[{}]", new Object[] { continuationToken, response.getSplits().size() });
    assertEquals("Unexpected number of splits", 4, response.getSplits().size());
    assertTrue("Continuation criteria violated", response.getContinuationToken() == null);
}
Also used : HRegionInfo(org.apache.hadoop.hbase.HRegionInfo) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) GetSplitsRequest(com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) HashMap(java.util.HashMap) GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) ArrayList(java.util.ArrayList) MetadataResponse(com.amazonaws.athena.connector.lambda.metadata.MetadataResponse) Block(com.amazonaws.athena.connector.lambda.data.Block) Matchers.anyString(org.mockito.Matchers.anyString) Test(org.junit.Test)

Example 5 with GetSplitsRequest

use of com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest in project aws-athena-query-federation by awslabs.

the class MySqlMetadataHandlerTest method doGetSplitsContinuation.

@Test
public void doGetSplitsContinuation() throws Exception {
    BlockAllocator blockAllocator = new BlockAllocatorImpl();
    Constraints constraints = Mockito.mock(Constraints.class);
    TableName tableName = new TableName("testSchema", "testTable");
    Schema partitionSchema = this.mySqlMetadataHandler.getPartitionSchema("testCatalogName");
    Set<String> partitionCols = partitionSchema.getFields().stream().map(Field::getName).collect(Collectors.toSet());
    GetTableLayoutRequest getTableLayoutRequest = new GetTableLayoutRequest(this.federatedIdentity, "testQueryId", "testCatalogName", tableName, constraints, partitionSchema, partitionCols);
    PreparedStatement preparedStatement = Mockito.mock(PreparedStatement.class);
    Mockito.when(this.connection.prepareStatement(MySqlMetadataHandler.GET_PARTITIONS_QUERY)).thenReturn(preparedStatement);
    String[] columns = { "partition_name" };
    int[] types = { Types.VARCHAR };
    Object[][] values = { { "p0" }, { "p1" } };
    ResultSet resultSet = mockResultSet(columns, types, values, new AtomicInteger(-1));
    final String expectedQuery = String.format(MySqlMetadataHandler.GET_PARTITIONS_QUERY, tableName.getTableName(), tableName.getSchemaName());
    Mockito.when(preparedStatement.executeQuery()).thenReturn(resultSet);
    Mockito.when(this.connection.getMetaData().getSearchStringEscape()).thenReturn(null);
    GetTableLayoutResponse getTableLayoutResponse = this.mySqlMetadataHandler.doGetTableLayout(blockAllocator, getTableLayoutRequest);
    BlockAllocator splitBlockAllocator = new BlockAllocatorImpl();
    GetSplitsRequest getSplitsRequest = new GetSplitsRequest(this.federatedIdentity, "testQueryId", "testCatalogName", tableName, getTableLayoutResponse.getPartitions(), new ArrayList<>(partitionCols), constraints, "1");
    GetSplitsResponse getSplitsResponse = this.mySqlMetadataHandler.doGetSplits(splitBlockAllocator, getSplitsRequest);
    Set<Map<String, String>> expectedSplits = new HashSet<>();
    expectedSplits.add(Collections.singletonMap("partition_name", "p1"));
    Assert.assertEquals(expectedSplits.size(), getSplitsResponse.getSplits().size());
    Set<Map<String, String>> actualSplits = getSplitsResponse.getSplits().stream().map(Split::getProperties).collect(Collectors.toSet());
    Assert.assertEquals(expectedSplits, actualSplits);
}
Also used : GetSplitsRequest(com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) Schema(org.apache.arrow.vector.types.pojo.Schema) PreparedStatement(java.sql.PreparedStatement) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) GetTableLayoutResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) BlockAllocatorImpl(com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) BlockAllocator(com.amazonaws.athena.connector.lambda.data.BlockAllocator) GetTableLayoutRequest(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) ResultSet(java.sql.ResultSet) Map(java.util.Map) HashSet(java.util.HashSet) Test(org.junit.Test)

Aggregations

GetSplitsRequest (com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest) 46
Test (org.junit.Test) 41
GetSplitsResponse (com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) 32
Constraints (com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) 29
TableName (com.amazonaws.athena.connector.lambda.domain.TableName) 24
Schema (org.apache.arrow.vector.types.pojo.Schema) 24
GetTableLayoutRequest (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) 17
BlockAllocator (com.amazonaws.athena.connector.lambda.data.BlockAllocator) 16
BlockAllocatorImpl (com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) 16
GetTableLayoutResponse (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) 16
HashMap (java.util.HashMap) 16
HashSet (java.util.HashSet) 15
MetadataResponse (com.amazonaws.athena.connector.lambda.metadata.MetadataResponse) 14
Map (java.util.Map) 14
Block (com.amazonaws.athena.connector.lambda.data.Block) 13
ResultSet (java.sql.ResultSet) 12
AtomicInteger (java.util.concurrent.atomic.AtomicInteger) 12
PreparedStatement (java.sql.PreparedStatement) 9
ArrayList (java.util.ArrayList) 9
Split (com.amazonaws.athena.connector.lambda.domain.Split) 8