Example 86 with Constraints

use of com.amazonaws.athena.connector.lambda.domain.predicate.Constraints in project aws-athena-query-federation by awslabs.

the class ExampleMetadataHandlerTest method getPartitions.

@Test
public void getPartitions() throws Exception {
    if (!enableTests) {
        // We do this because until you complete the tutorial these tests will fail. When you attempt to publish
        // using ../tools/publish.sh ...  it will set the publishing flag and force these tests. This is how we
        // avoid breaking the build but still have a useful tutorial. We are also duplicating this block
        // on purpose since this is a somewhat odd pattern.
        logger.info("getPartitions: Tests are disabled, to enable them set the 'publishing' environment variable " + "using maven clean install -Dpublishing=true");
        return;
    }
    logger.info("doGetTableLayout - enter");
    Schema tableSchema = SchemaBuilder.newBuilder().addIntField("day").addIntField("month").addIntField("year").build();
    Set<String> partitionCols = new HashSet<>();
    partitionCols.add("day");
    partitionCols.add("month");
    partitionCols.add("year");
    Map<String, ValueSet> constraintsMap = new HashMap<>();
    constraintsMap.put("day", SortedRangeSet.copyOf(Types.MinorType.INT.getType(), ImmutableList.of(Range.greaterThan(allocator, Types.MinorType.INT.getType(), 0)), false));
    constraintsMap.put("month", SortedRangeSet.copyOf(Types.MinorType.INT.getType(), ImmutableList.of(Range.greaterThan(allocator, Types.MinorType.INT.getType(), 0)), false));
    constraintsMap.put("year", SortedRangeSet.copyOf(Types.MinorType.INT.getType(), ImmutableList.of(Range.greaterThan(allocator, Types.MinorType.INT.getType(), 2000)), false));
    GetTableLayoutRequest req = null;
    GetTableLayoutResponse res = null;
    try {
        req = new GetTableLayoutRequest(fakeIdentity(), "queryId", "default", new TableName("schema1", "table1"), new Constraints(constraintsMap), tableSchema, partitionCols);
        res = handler.doGetTableLayout(allocator, req);
        logger.info("doGetTableLayout - {}", res);
        Block partitions = res.getPartitions();
        for (int row = 0; row < partitions.getRowCount() && row < 10; row++) {
            logger.info("doGetTableLayout:{} {}", row, BlockUtils.rowToString(partitions, row));
        }
        assertTrue(partitions.getRowCount() > 0);
        logger.info("doGetTableLayout: partitions[{}]", partitions.getRowCount());
    } finally {
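        try {
            // Either object may still be null if construction or the handler call threw.
            if (req != null) {
                req.close();
            }
            if (res != null) {
                res.close();
            }
        } catch (Exception ex) {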
            logger.error("doGetTableLayout: ", ex);
        }
    }
    logger.info("doGetTableLayout - exit");
}
Also used : HashMap(java.util.HashMap) Schema(org.apache.arrow.vector.types.pojo.Schema) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) GetTableLayoutResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) GetTableLayoutRequest(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) Block(com.amazonaws.athena.connector.lambda.data.Block) ValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet) HashSet(java.util.HashSet) Test(org.junit.Test)
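
The test above passes three "greater than" range constraints (day > 0, month > 0, year > 2000) and expects at least one partition back. The following is a minimal, self-contained sketch of that kind of partition pruning. It does not use the SDK: PartitionPruningSketch, prune, partition and exampleConstraints are hypothetical names, and plain IntPredicate lambdas stand in for SortedRangeSet and Range. It also assumes the trailing false passed to SortedRangeSet.copyOf means that null values are not allowed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical stand-in for constraint-based partition pruning; not the SDK's API.
final class PartitionPruningSketch {

    // Keeps only the partitions whose constrained columns all satisfy their predicate.
    static List<Map<String, Integer>> prune(List<Map<String, Integer>> partitions,
                                            Map<String, IntPredicate> constraints) {
        List<Map<String, Integer>> kept = new ArrayList<>();
        for (Map<String, Integer> partition : partitions) {
            boolean matches = true;
            for (Map.Entry<String, IntPredicate> constraint : constraints.entrySet()) {
                Integer value = partition.get(constraint.getKey());
                // A missing value fails the constraint (assumed "nulls not allowed").
                if (value == null || !constraint.getValue().test(value)) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                kept.add(partition);
            }
        }
        return kept;
    }

    // Builds one partition row with the three partition columns used in the test.
    static Map<String, Integer> partition(int year, int month, int day) {
        Map<String, Integer> row = new HashMap<>();
        row.put("year", year);
        row.put("month", month);
        row.put("day", day);
        return row;
    }

    // Same bounds as the test: day > 0, month > 0, year > 2000.
    static Map<String, IntPredicate> exampleConstraints() {
        Map<String, IntPredicate> constraints = new HashMap<>();
        constraints.put("day", v -> v > 0);
        constraints.put("month", v -> v > 0);
        constraints.put("year", v -> v > 2000);
        return constraints;
    }
}
```

With these constraints a partition such as (2001, 1, 1) is kept, while (1999, 5, 5) fails the year bound and (2020, 0, 3) fails the month bound.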

Example 87 with Constraints

use of com.amazonaws.athena.connector.lambda.domain.predicate.Constraints in project aws-athena-query-federation by awslabs.

the class ConnectorValidator method getSplits.

private static GetSplitsResponse getSplits(TestConfig testConfig, TableName table, Schema schema, GetTableLayoutResponse tableLayout, Set<String> partitionColumns, String continuationToken) {
    Constraints constraints = parseConstraints(schema, testConfig.getConstraints());
    GetSplitsResponse splitsResponse = LambdaMetadataProvider.getSplits(testConfig.getCatalogId(), table, constraints, tableLayout.getPartitions(), new ArrayList<>(partitionColumns), continuationToken, testConfig.getMetadataFunction(), testConfig.getIdentity());
    log.info("Found " + splitsResponse.getSplits().size() + " splits in batch.");
    if (continuationToken == null) {
        checkState(!splitsResponse.getSplits().isEmpty(), "Table " + toQualifiedTableName(table) + " did not return any splits. This can happen if the table" + " is empty but could also indicate an issue." + " Please populate the table or specify a different table.");
    } else {
        checkState(!splitsResponse.getSplits().isEmpty(), "Table " + toQualifiedTableName(table) + " did not return any splits in the second batch despite returning" + " a continuation token with the first batch.");
    }
    return splitsResponse;
}
Also used : Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) ConstraintParser.parseConstraints(com.amazonaws.athena.connector.validation.ConstraintParser.parseConstraints) GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse)
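
getSplits above handles a single batch and distinguishes the first call (continuationToken == null) from follow-up calls. A caller would typically loop until no token comes back. The sketch below shows that loop under stated assumptions: SplitPagingSketch, Batch and fetchAll are hypothetical names, splits are reduced to plain strings, and a null token is assumed to mean "no more batches". It is not the validator's actual driver code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of draining a continuation-token based API; not the SDK's API.
final class SplitPagingSketch {

    // One batch of results plus an optional token pointing at the next batch.
    static final class Batch {
        final List<String> splits;
        final String continuationToken; // null when there are no further batches

        Batch(List<String> splits, String continuationToken) {
            this.splits = splits;
            this.continuationToken = continuationToken;
        }
    }

    // Calls fetchBatch repeatedly, starting with a null token, until no token is returned.
    static List<String> fetchAll(Function<String, Batch> fetchBatch) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Batch batch = fetchBatch.apply(token);
            if (batch.splits.isEmpty()) {
                // Mirrors the two checkState messages: the first batch and later batches fail differently.
                throw new IllegalStateException(token == null
                        ? "No splits returned in the first batch"
                        : "Empty batch returned for continuation token " + token);
            }
            all.addAll(batch.splits);
            token = batch.continuationToken;
        } while (token != null);
        return all;
    }
}
```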

Example 88 with Constraints

use of com.amazonaws.athena.connector.lambda.domain.predicate.Constraints in project aws-athena-query-federation by awslabs.

the class ConnectorValidator method readRecords.

private static ReadRecordsResponse readRecords(TestConfig testConfig, TableName table, Schema schema, Collection<Split> splits) {
    Constraints constraints = parseConstraints(schema, testConfig.getConstraints());
    Split split = getRandomElement(splits);
    log.info("Executing randomly selected split with properties: {}", split.getProperties());
    ReadRecordsResponse records = LambdaRecordProvider.readRecords(testConfig.getCatalogId(), table, constraints, schema, split, testConfig.getRecordFunction(), testConfig.getIdentity());
    log.info("Received " + records.getRecordCount() + " records.");
    checkState(records.getRecordCount() > 0, "Table " + toQualifiedTableName(table) + " did not return any rows in the tested split, even though an empty constraint was used." + " This can happen if the table is empty but could also indicate an issue." + " Please populate the table or specify a different table.");
    log.info("Discovered columns: " + records.getSchema().getFields().stream().map(f -> f.getName() + ":" + f.getType().getTypeID()).collect(Collectors.toList()));
    if (records.getRecordCount() == 0) {
        return records;
    }
    log.info("First row of split: " + rowToString(records.getRecords(), 0));
    return records;
}
Also used : GetSplitsResponse(com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) Schema(org.apache.arrow.vector.types.pojo.Schema) Options(org.apache.commons.cli.Options) LoggerFactory(org.slf4j.LoggerFactory) Random(java.util.Random) BlockAllocator(com.amazonaws.athena.connector.lambda.data.BlockAllocator) HelpFormatter(org.apache.commons.cli.HelpFormatter) DefaultParser(org.apache.commons.cli.DefaultParser) ArrayList(java.util.ArrayList) Preconditions.checkArgument(com.google.common.base.Preconditions.checkArgument) Objects.requireNonNull(java.util.Objects.requireNonNull) CommandLine(org.apache.commons.cli.CommandLine) BlockAllocatorImpl(com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) FederatedIdentity(com.amazonaws.athena.connector.lambda.security.FederatedIdentity) ListTablesResponse(com.amazonaws.athena.connector.lambda.metadata.ListTablesResponse) ListSchemasResponse(com.amazonaws.athena.connector.lambda.metadata.ListSchemasResponse) Logger(org.slf4j.Logger) Iterator(java.util.Iterator) Collection(java.util.Collection) Split(com.amazonaws.athena.connector.lambda.domain.Split) ReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.ReadRecordsResponse) Set(java.util.Set) BlockUtils.rowToString(com.amazonaws.athena.connector.lambda.data.BlockUtils.rowToString) Field(org.apache.arrow.vector.types.pojo.Field) Collectors(java.util.stream.Collectors) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) Sets(com.google.common.collect.Sets) Preconditions.checkState(com.google.common.base.Preconditions.checkState) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) GetTableLayoutResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) List(java.util.List) ConstraintParser.parseConstraints(com.amazonaws.athena.connector.validation.ConstraintParser.parseConstraints) ParseException(org.apache.commons.cli.ParseException) Optional(java.util.Optional) 
Collections(java.util.Collections) GetTableResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableResponse)
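
readRecords calls a getRandomElement helper whose body is not shown in this listing. One plausible way to write such a helper for an arbitrary Collection is sketched below. The class name RandomElementSketch and the explicit Random parameter are assumptions made for illustration; the validator's real helper may differ.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.Random;

// Hypothetical sketch of a "pick one element at random" helper; not the validator's actual code.
final class RandomElementSketch {

    // Returns a uniformly chosen element by advancing an iterator a random number of steps.
    static <T> T getRandomElement(Collection<T> elements, Random random) {
        if (elements.isEmpty()) {
            throw new IllegalArgumentException("elements must not be empty");
        }
        int target = random.nextInt(elements.size());
        Iterator<T> iterator = elements.iterator();
        for (int i = 0; i < target; i++) {
            iterator.next(); // skip the elements before the chosen index
        }
        return iterator.next();
    }
}
```

Passing the Random in makes the choice reproducible in tests by seeding it.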

Example 89 with Constraints

use of com.amazonaws.athena.connector.lambda.domain.predicate.Constraints in project aws-athena-query-federation by awslabs.

the class ConnectorValidator method getTableLayout.

private static GetTableLayoutResponse getTableLayout(TestConfig testConfig, TableName table, Schema schema, Set<String> partitionColumns) {
    Constraints constraints = parseConstraints(schema, testConfig.getConstraints());
    GetTableLayoutResponse tableLayout = LambdaMetadataProvider.getTableLayout(testConfig.getCatalogId(), table, constraints, schema, partitionColumns, testConfig.getMetadataFunction(), testConfig.getIdentity());
    log.info("Found " + tableLayout.getPartitions().getRowCount() + " partitions.");
    checkState(tableLayout.getPartitions().getRowCount() > 0, "Table " + toQualifiedTableName(table) + " did not return any partitions. This can happen if the table" + " is empty but could also indicate an issue." + " Please populate the table or specify a different table.");
    return tableLayout;
}
Also used : Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) ConstraintParser.parseConstraints(com.amazonaws.athena.connector.validation.ConstraintParser.parseConstraints) GetTableLayoutResponse(com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse)

Example 90 with Constraints

use of com.amazonaws.athena.connector.lambda.domain.predicate.Constraints in project aws-athena-query-federation by awslabs.

the class ExampleRecordHandlerTest method doReadRecordsSpill.

@Test
public void doReadRecordsSpill() throws Exception {
    logger.info("doReadRecordsSpill: enter");
    for (int i = 0; i < 2; i++) {
        EncryptionKey encryptionKey = (i % 2 == 0) ? keyFactory.create() : null;
        logger.info("doReadRecordsSpill: Using encryptionKey[" + encryptionKey + "]");
        Map<String, ValueSet> constraintsMap = new HashMap<>();
        constraintsMap.put("col3", SortedRangeSet.copyOf(Types.MinorType.FLOAT8.getType(), ImmutableList.of(Range.greaterThan(allocator, Types.MinorType.FLOAT8.getType(), -10000D)), false));
        constraintsMap.put("unknown", EquatableValueSet.newBuilder(allocator, Types.MinorType.FLOAT8.getType(), false, true).add(1.1D).build());
        constraintsMap.put("unknown2", new AllOrNoneValueSet(Types.MinorType.FLOAT8.getType(), false, true));
        ReadRecordsRequest request = new ReadRecordsRequest(IdentityUtil.fakeIdentity(), "catalog", "queryId-" + System.currentTimeMillis(), new TableName("schema", "table"), schemaForRead, Split.newBuilder(makeSpillLocation(), encryptionKey).add("year", "10").add("month", "10").add("day", "10").build(), new Constraints(constraintsMap),
                // ~1.5MB so we should see some spill
                1_600_000L, 1000L);
        ObjectMapperUtil.assertSerialization(request);
        RecordResponse rawResponse = recordService.readRecords(request);
        ObjectMapperUtil.assertSerialization(rawResponse);
        assertTrue(rawResponse instanceof RemoteReadRecordsResponse);
        try (RemoteReadRecordsResponse response = (RemoteReadRecordsResponse) rawResponse) {
            logger.info("doReadRecordsSpill: remoteBlocks[{}]", response.getRemoteBlocks().size());
            assertTrue(response.getNumberBlocks() > 1);
            int blockNum = 0;
            for (SpillLocation next : response.getRemoteBlocks()) {
                S3SpillLocation spillLocation = (S3SpillLocation) next;
                try (Block block = spillReader.read(spillLocation, response.getEncryptionKey(), response.getSchema())) {
                    logger.info("doReadRecordsSpill: blockNum[{}] and recordCount[{}]", blockNum++, block.getRowCount());
                    // assertTrue(++blockNum < response.getRemoteBlocks().size() && block.getRowCount() > 10_000);
                    logger.info("doReadRecordsSpill: {}", BlockUtils.rowToString(block, 0));
                    assertNotNull(BlockUtils.rowToString(block, 0));
                }
            }
        }
    }
    logger.info("doReadRecordsSpill: exit");
}
Also used : RemoteReadRecordsResponse(com.amazonaws.athena.connector.lambda.records.RemoteReadRecordsResponse) SpillLocation(com.amazonaws.athena.connector.lambda.domain.spill.SpillLocation) S3SpillLocation(com.amazonaws.athena.connector.lambda.domain.spill.S3SpillLocation) HashMap(java.util.HashMap) AllOrNoneValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.AllOrNoneValueSet) EncryptionKey(com.amazonaws.athena.connector.lambda.security.EncryptionKey) Matchers.anyString(org.mockito.Matchers.anyString) RecordResponse(com.amazonaws.athena.connector.lambda.records.RecordResponse) TableName(com.amazonaws.athena.connector.lambda.domain.TableName) ReadRecordsRequest(com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) Constraints(com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) Block(com.amazonaws.athena.connector.lambda.data.Block) ValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet) EquatableValueSet(com.amazonaws.athena.connector.lambda.domain.predicate.EquatableValueSet) Test(org.junit.Test)

Aggregations

Constraints (com.amazonaws.athena.connector.lambda.domain.predicate.Constraints) 182
Test (org.junit.Test) 172
TableName (com.amazonaws.athena.connector.lambda.domain.TableName) 148
Schema (org.apache.arrow.vector.types.pojo.Schema) 136
ValueSet (com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet) 64
HashMap (java.util.HashMap) 63
Split (com.amazonaws.athena.connector.lambda.domain.Split) 59
ReadRecordsRequest (com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest) 55
AtomicInteger (java.util.concurrent.atomic.AtomicInteger) 54
BlockAllocator (com.amazonaws.athena.connector.lambda.data.BlockAllocator) 50
GetTableLayoutRequest (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest) 47
BlockAllocatorImpl (com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl) 44
ArrayList (java.util.ArrayList) 44
PreparedStatement (java.sql.PreparedStatement) 42
GetTableLayoutResponse (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse) 41
Block (com.amazonaws.athena.connector.lambda.data.Block) 36
SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder) 35
RecordResponse (com.amazonaws.athena.connector.lambda.records.RecordResponse) 33
ResultSet (java.sql.ResultSet) 32
GetSplitsResponse (com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse) 30