
Example 1 with SchemaBuilder

Use of com.amazonaws.athena.connector.lambda.data.SchemaBuilder in project aws-athena-query-federation by awslabs.

From class GlueMetadataHandlerTest, method populateSourceTableFromLocation:

@Test
public void populateSourceTableFromLocation() {
    Map<String, String> params = new HashMap<>();
    List<String> partitions = Arrays.asList("aws", "aws-cn", "aws-us-gov");
    for (String partition : partitions) {
        StorageDescriptor storageDescriptor = new StorageDescriptor().withLocation(String.format("arn:%s:dynamodb:us-east-1:012345678910:table/My-Table", partition));
        Table table = new Table().withParameters(params).withStorageDescriptor(storageDescriptor);
        SchemaBuilder schemaBuilder = new SchemaBuilder();
        populateSourceTableNameIfAvailable(table, schemaBuilder);
        Schema schema = schemaBuilder.build();
        assertEquals("My-Table", getSourceTableName(schema));
    }
}
Also used: Table (com.amazonaws.services.glue.model.Table), HashMap (java.util.HashMap), Schema (org.apache.arrow.vector.types.pojo.Schema), StorageDescriptor (com.amazonaws.services.glue.model.StorageDescriptor), SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder), Test (org.junit.Test)
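For context, the assertion expects the handler to surface the table name from the storage descriptor's ARN as schema-level metadata that getSourceTableName(schema) can read back. A minimal sketch of that pattern, written as a standalone helper; the "sourceTable" metadata key and the ARN parsing are illustrative assumptions, not a quote of GlueMetadataHandler's implementation:

// Illustrative helper: copies the last ARN path segment into schema metadata.
// The "sourceTable" key mirrors what getSourceTableName(schema) is expected to read back.
private static void addSourceTableFromLocation(Table table, SchemaBuilder schemaBuilder) {
    String location = table.getStorageDescriptor().getLocation(); // e.g. arn:aws:dynamodb:...:table/My-Table
    if (location != null && location.contains("/")) {
        schemaBuilder.addMetadata("sourceTable", location.substring(location.lastIndexOf('/') + 1));
    }
}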

Example 2 with SchemaBuilder

Use of com.amazonaws.athena.connector.lambda.data.SchemaBuilder in project aws-athena-query-federation by awslabs.

From class SchemaSerializationTest, method serializationTest:

@Test
public void serializationTest() throws IOException {
    logger.info("serializationTest - enter");
    SchemaBuilder schemaBuilder = new SchemaBuilder();
    schemaBuilder.addMetadata("meta1", "meta-value-1");
    schemaBuilder.addMetadata("meta2", "meta-value-2");
    schemaBuilder.addField("intfield1", new ArrowType.Int(32, true));
    schemaBuilder.addField("doublefield2", new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE));
    schemaBuilder.addField("varcharfield3", new ArrowType.Utf8());
    Schema expectedSchema = schemaBuilder.build();
    SchemaSerDe serDe = new SchemaSerDe();
    ByteArrayOutputStream schemaOut = new ByteArrayOutputStream();
    serDe.serialize(expectedSchema, schemaOut);
    TestPojo expected = new TestPojo(expectedSchema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    objectMapper.writeValue(out, expected);
    TestPojo actual = objectMapper.readValue(new ByteArrayInputStream(out.toByteArray()), TestPojo.class);
    Schema actualSchema = actual.getSchema();
    logger.info("serializationTest - fields[{}]", actualSchema.getFields());
    logger.info("serializationTest - meta[{}]", actualSchema.getCustomMetadata());
    assertEquals(expectedSchema.getFields(), actualSchema.getFields());
    assertEquals(expectedSchema.getCustomMetadata(), actualSchema.getCustomMetadata());
    logger.info("serializationTest - exit");
}
Also used: ByteArrayInputStream (java.io.ByteArrayInputStream), Schema (org.apache.arrow.vector.types.pojo.Schema), SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder), ArrowType (org.apache.arrow.vector.types.pojo.ArrowType), ByteArrayOutputStream (java.io.ByteArrayOutputStream), SchemaSerDe (com.amazonaws.athena.connector.lambda.data.SchemaSerDe), Test (org.junit.Test)
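The test writes the schema through SchemaSerDe but verifies the round trip via Jackson and TestPojo. For reference, a direct round trip might look like the sketch below, assuming SchemaSerDe also exposes a deserialize(InputStream) counterpart to the serialize call shown above:

// Sketch: direct round trip through SchemaSerDe (deserialize(InputStream) is assumed to exist).
SchemaSerDe serDe = new SchemaSerDe();
ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
serDe.serialize(expectedSchema, bytesOut);
Schema roundTripped = serDe.deserialize(new ByteArrayInputStream(bytesOut.toByteArray()));
assertEquals(expectedSchema.getFields(), roundTripped.getFields());
assertEquals(expectedSchema.getCustomMetadata(), roundTripped.getCustomMetadata());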

Example 3 with SchemaBuilder

Use of com.amazonaws.athena.connector.lambda.data.SchemaBuilder in project aws-athena-query-federation by awslabs.

From class BigQueryMetadataHandler, method getSchema:

/**
 * Gets the BigQuery table schema details.
 *
 * @param projectName the GCP project that contains the dataset
 * @param datasetName the BigQuery dataset that contains the table
 * @param tableName   the table whose schema is requested
 * @return the table schema translated into an Arrow Schema, with TIMESTAMP column names recorded under the "timeStampCols" metadata key
 */
private Schema getSchema(String projectName, String datasetName, String tableName) {
    Schema schema = null;
    datasetName = fixCaseForDatasetName(projectName, datasetName, bigQuery);
    tableName = fixCaseForTableName(projectName, datasetName, tableName, bigQuery);
    TableId tableId = TableId.of(projectName, datasetName, tableName);
    Table response = bigQuery.getTable(tableId);
    TableDefinition tableDefinition = response.getDefinition();
    SchemaBuilder schemaBuilder = SchemaBuilder.newBuilder();
    List<String> timeStampColsList = new ArrayList<>();
    for (Field field : tableDefinition.getSchema().getFields()) {
        if (field.getType().getStandardType().toString().equals("TIMESTAMP")) {
            timeStampColsList.add(field.getName());
        }
        schemaBuilder.addField(field.getName(), translateToArrowType(field.getType()));
    }
    schemaBuilder.addMetadata("timeStampCols", timeStampColsList.toString());
    logger.debug("BigQuery table schema {}", schemaBuilder.toString());
    return schemaBuilder.build();
}
Also used: TableId (com.google.cloud.bigquery.TableId), Field (com.google.cloud.bigquery.Field), Table (com.google.cloud.bigquery.Table), Schema (org.apache.arrow.vector.types.pojo.Schema), SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder), ArrayList (java.util.ArrayList), TableDefinition (com.google.cloud.bigquery.TableDefinition)
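Because the TIMESTAMP column names are stored as List.toString() output under the "timeStampCols" metadata key, a caller holding the built schema can recover them from the Arrow custom metadata. A minimal sketch of that read path; the string parsing is an illustrative assumption about the List.toString() format:

// Sketch: recover the TIMESTAMP column names that getSchema(...) recorded above.
Schema schema = getSchema(projectName, datasetName, tableName);
String timeStampCols = schema.getCustomMetadata().get("timeStampCols"); // e.g. "[created_at, updated_at]"
List<String> timestampColumns = (timeStampCols == null || timeStampCols.length() <= 2)
        ? Collections.emptyList()
        : Arrays.asList(timeStampCols.substring(1, timeStampCols.length() - 1).split(",\\s*"));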

Example 4 with SchemaBuilder

Use of com.amazonaws.athena.connector.lambda.data.SchemaBuilder in project aws-athena-query-federation by awslabs.

From class HbaseMetadataHandler, method doGetTable:

/**
 * If Glue is enabled as a source of supplemental metadata, we look up the requested Schema/Table in Glue and
 * filter out any results that don't have the HBASE_METADATA_FLAG set. If no matching results are found in Glue,
 * we fall back to inferring the schema of the HBase table using HbaseSchemaUtils.inferSchema(...). If there
 * is no such table in HBase, the operation will fail.
 *
 * @see GlueMetadataHandler
 */
@Override
public GetTableResponse doGetTable(BlockAllocator blockAllocator, GetTableRequest request) throws Exception {
    logger.info("doGetTable: enter", request.getTableName());
    Schema origSchema = null;
    try {
        if (awsGlue != null) {
            origSchema = super.doGetTable(blockAllocator, request, TABLE_FILTER).getSchema();
        }
    } catch (RuntimeException ex) {
        logger.warn("doGetTable: Unable to retrieve table[{}:{}] from AWSGlue.", request.getTableName().getSchemaName(), request.getTableName().getTableName(), ex);
    }
    if (origSchema == null) {
        origSchema = HbaseSchemaUtils.inferSchema(getOrCreateConn(request), request.getTableName(), NUM_ROWS_TO_SCAN);
    }
    SchemaBuilder schemaBuilder = SchemaBuilder.newBuilder();
    origSchema.getFields().forEach((Field field) -> schemaBuilder.addField(field.getName(), field.getType(), field.getChildren()));
    origSchema.getCustomMetadata().entrySet().forEach((Map.Entry<String, String> meta) -> schemaBuilder.addMetadata(meta.getKey(), meta.getValue()));
    schemaBuilder.addField(HbaseSchemaUtils.ROW_COLUMN_NAME, Types.MinorType.VARCHAR.getType());
    Schema schema = schemaBuilder.build();
    logger.info("doGetTable: return {}", schema);
    return new GetTableResponse(request.getCatalogName(), request.getTableName(), schema);
}
Also used: Field (org.apache.arrow.vector.types.pojo.Field), GetTableResponse (com.amazonaws.athena.connector.lambda.metadata.GetTableResponse), Schema (org.apache.arrow.vector.types.pojo.Schema), SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder)
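The body above follows a common pattern: rebuild the schema field by field (preserving children), copy the custom metadata, then append a connector-specific column. A reusable sketch of that pattern, using a hypothetical helper name that is not part of the SDK:

// Sketch: copy an existing Arrow Schema and append one extra VARCHAR column.
private static Schema copyWithExtraVarcharColumn(Schema source, String extraColumnName) {
    SchemaBuilder builder = SchemaBuilder.newBuilder();
    source.getFields().forEach(field -> builder.addField(field.getName(), field.getType(), field.getChildren()));
    source.getCustomMetadata().forEach(builder::addMetadata);
    builder.addField(extraColumnName, Types.MinorType.VARCHAR.getType());
    return builder.build();
}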

Example 5 with SchemaBuilder

Use of com.amazonaws.athena.connector.lambda.data.SchemaBuilder in project aws-athena-query-federation by awslabs.

From class HiveRecordHandlerTest, method buildSplitSql:

@Test
public void buildSplitSql() throws SQLException {
    TableName tableName = new TableName("testSchema", "testTable");
    SchemaBuilder schemaBuilder = SchemaBuilder.newBuilder();
    schemaBuilder.addField(FieldBuilder.newBuilder("testCol1", Types.MinorType.INT.getType()).build());
    schemaBuilder.addField(FieldBuilder.newBuilder("testCol2", Types.MinorType.DATEDAY.getType()).build());
    schemaBuilder.addField(FieldBuilder.newBuilder("testCol3", Types.MinorType.DATEMILLI.getType()).build());
    schemaBuilder.addField(FieldBuilder.newBuilder("testCol4", Types.MinorType.VARBINARY.getType()).build());
    schemaBuilder.addField(FieldBuilder.newBuilder("partition", Types.MinorType.VARCHAR.getType()).build());
    Schema schema = schemaBuilder.build();
    Split split = Mockito.mock(Split.class);
    Mockito.when(split.getProperties()).thenReturn(Collections.singletonMap("partition", "p0"));
    Mockito.when(split.getProperty(Mockito.eq("partition"))).thenReturn("p0");
    Range range1a = Mockito.mock(Range.class, Mockito.RETURNS_DEEP_STUBS);
    Mockito.when(range1a.isSingleValue()).thenReturn(true);
    Mockito.when(range1a.getLow().getValue()).thenReturn(1);
    Range range1b = Mockito.mock(Range.class, Mockito.RETURNS_DEEP_STUBS);
    Mockito.when(range1b.isSingleValue()).thenReturn(true);
    Mockito.when(range1b.getLow().getValue()).thenReturn(2);
    ValueSet valueSet1 = Mockito.mock(SortedRangeSet.class, Mockito.RETURNS_DEEP_STUBS);
    Mockito.when(valueSet1.getRanges().getOrderedRanges()).thenReturn(ImmutableList.of(range1a, range1b));
    final long dateDays = TimeUnit.MILLISECONDS.toDays(Date.valueOf("2020-01-05").getTime());
    ValueSet valueSet2 = getSingleValueSet(dateDays);
    Constraints constraints = Mockito.mock(Constraints.class);
    Mockito.when(constraints.getSummary()).thenReturn(new ImmutableMap.Builder<String, ValueSet>().put("testCol2", valueSet2).build());
    PreparedStatement expectedPreparedStatement = Mockito.mock(PreparedStatement.class);
    Mockito.when(this.connection.prepareStatement(Mockito.anyString())).thenReturn(expectedPreparedStatement);
    PreparedStatement preparedStatement = this.hiveRecordHandler.buildSplitSql(this.connection, "testCatalogName", tableName, schema, constraints, split);
    Assert.assertEquals(expectedPreparedStatement, preparedStatement);
}
Also used: TableName (com.amazonaws.athena.connector.lambda.domain.TableName), Constraints (com.amazonaws.athena.connector.lambda.domain.predicate.Constraints), Schema (org.apache.arrow.vector.types.pojo.Schema), SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder), PreparedStatement (java.sql.PreparedStatement), Split (com.amazonaws.athena.connector.lambda.domain.Split), Range (com.amazonaws.athena.connector.lambda.domain.predicate.Range), ValueSet (com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet), ImmutableMap (com.google.common.collect.ImmutableMap), Test (org.junit.Test)
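getSingleValueSet(...) is a private helper of the test class that is not shown in this excerpt. Following the same mocking pattern used for range1a and valueSet1 above, it could plausibly look like the sketch below (an assumption, not the project's exact helper):

// Sketch of a single-value ValueSet mock, mirroring the range1a/valueSet1 setup above.
private ValueSet getSingleValueSet(Object value) {
    Range range = Mockito.mock(Range.class, Mockito.RETURNS_DEEP_STUBS);
    Mockito.when(range.isSingleValue()).thenReturn(true);
    Mockito.when(range.getLow().getValue()).thenReturn(value);
    ValueSet valueSet = Mockito.mock(SortedRangeSet.class, Mockito.RETURNS_DEEP_STUBS);
    Mockito.when(valueSet.getRanges().getOrderedRanges()).thenReturn(Collections.singletonList(range));
    return valueSet;
}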

Aggregations

SchemaBuilder (com.amazonaws.athena.connector.lambda.data.SchemaBuilder): 68 usages
Schema (org.apache.arrow.vector.types.pojo.Schema): 48 usages
TableName (com.amazonaws.athena.connector.lambda.domain.TableName): 43 usages
Test (org.junit.Test): 43 usages
PreparedStatement (java.sql.PreparedStatement): 37 usages
ResultSet (java.sql.ResultSet): 35 usages
Constraints (com.amazonaws.athena.connector.lambda.domain.predicate.Constraints): 30 usages
BlockAllocatorImpl (com.amazonaws.athena.connector.lambda.data.BlockAllocatorImpl): 23 usages
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 23 usages
BlockAllocator (com.amazonaws.athena.connector.lambda.data.BlockAllocator): 20 usages
Split (com.amazonaws.athena.connector.lambda.domain.Split): 17 usages
ArrowType (org.apache.arrow.vector.types.pojo.ArrowType): 17 usages
ArrayList (java.util.ArrayList): 15 usages
ValueSet (com.amazonaws.athena.connector.lambda.domain.predicate.ValueSet): 12 usages
GetTableLayoutResponse (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutResponse): 12 usages
GetTableResponse (com.amazonaws.athena.connector.lambda.metadata.GetTableResponse): 12 usages
GetTableLayoutRequest (com.amazonaws.athena.connector.lambda.metadata.GetTableLayoutRequest): 11 usages
Connection (java.sql.Connection): 10 usages
HashMap (java.util.HashMap): 10 usages
ImmutableMap (com.google.common.collect.ImmutableMap): 8 usages