
Example 51 with TableFieldSchema

use of com.google.api.services.bigquery.model.TableFieldSchema in project beam by apache.

the class BigQueryAvroUtils method convertField.

private static Field convertField(TableFieldSchema bigQueryField) {
    // Look up the candidate Avro types for this BigQuery type; fail if there is no mapping.
    ImmutableCollection<Type> avroTypes = BIG_QUERY_TO_AVRO_TYPES.get(bigQueryField.getType());
    if (avroTypes.isEmpty()) {
        throw new IllegalArgumentException("Unable to map BigQuery field type " + bigQueryField.getType() + " to avro type.");
    }
    // Use the first mapped Avro type.
    Type avroType = avroTypes.iterator().next();
    Schema elementSchema;
    if (avroType == Type.RECORD) {
        elementSchema = toGenericAvroSchema(bigQueryField.getName(), bigQueryField.getFields());
    } else {
        elementSchema = Schema.create(avroType);
    }
    Schema fieldSchema;
    // Wrap the element schema according to the field mode; a missing mode is treated as NULLABLE.
    if (bigQueryField.getMode() == null || "NULLABLE".equals(bigQueryField.getMode())) {
        fieldSchema = Schema.createUnion(Schema.create(Type.NULL), elementSchema);
    } else if ("REQUIRED".equals(bigQueryField.getMode())) {
        fieldSchema = elementSchema;
    } else if ("REPEATED".equals(bigQueryField.getMode())) {
        fieldSchema = Schema.createArray(elementSchema);
    } else {
        throw new IllegalArgumentException(String.format("Unknown BigQuery Field Mode: %s", bigQueryField.getMode()));
    }
    return new Field(bigQueryField.getName(), fieldSchema, bigQueryField.getDescription(), (Object) null);
}
Also used : Field(org.apache.avro.Schema.Field) Type(org.apache.avro.Schema.Type) LogicalType(org.apache.avro.LogicalType) TableSchema(com.google.api.services.bigquery.model.TableSchema) TableFieldSchema(com.google.api.services.bigquery.model.TableFieldSchema) Schema(org.apache.avro.Schema)
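
The mode handling in convertField can be illustrated without the Avro dependency. The following is a minimal sketch, not Beam's code: it assumes schemas are represented as plain strings holding Avro schema JSON, and the class name ModeMappingSketch and the method wrapForMode are hypothetical names introduced only for illustration.

```java
// Hypothetical stand-in for the mode handling in convertField. It uses plain
// strings holding Avro schema JSON instead of org.apache.avro.Schema.
final class ModeMappingSketch {

    // Wraps an element schema (as Avro JSON) according to the BigQuery field mode.
    static String wrapForMode(String mode, String elementSchemaJson) {
        if (mode == null || "NULLABLE".equals(mode)) {
            // A missing mode is treated as NULLABLE: a union of null and the element type.
            return "[\"null\", " + elementSchemaJson + "]";
        } else if ("REQUIRED".equals(mode)) {
            // REQUIRED fields use the element schema unchanged.
            return elementSchemaJson;
        } else if ("REPEATED".equals(mode)) {
            // REPEATED fields become an array of the element type.
            return "{\"type\": \"array\", \"items\": " + elementSchemaJson + "}";
        }
        throw new IllegalArgumentException(
                String.format("Unknown BigQuery Field Mode: %s", mode));
    }

    public static void main(String[] args) {
        System.out.println(wrapForMode(null, "\"string\""));
        System.out.println(wrapForMode("REQUIRED", "\"long\""));
        System.out.println(wrapForMode("REPEATED", "\"double\""));
    }
}
```

As in the listing above, any mode other than NULLABLE, REQUIRED, or REPEATED is rejected with an IllegalArgumentException.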

Example 52 with TableFieldSchema

use of com.google.api.services.bigquery.model.TableFieldSchema in project beam by apache.

the class BigQueryAvroUtils method mapTableFieldSchema.

private static Stream<TableFieldSchema> mapTableFieldSchema(TableFieldSchema fieldSchema, Schema avroSchema) {
    Field avroFieldSchema = avroSchema.getField(fieldSchema.getName());
    if (avroFieldSchema == null) {
        return Stream.empty();
    } else if (avroFieldSchema.schema().getType() != Type.RECORD) {
        return Stream.of(fieldSchema);
    }
    List<TableFieldSchema> subSchemas = fieldSchema.getFields().stream()
            .flatMap(subSchema -> mapTableFieldSchema(subSchema, avroFieldSchema.schema()))
            .collect(Collectors.toList());
    TableFieldSchema output = new TableFieldSchema()
            .setCategories(fieldSchema.getCategories())
            .setDescription(fieldSchema.getDescription())
            .setFields(subSchemas)
            .setMode(fieldSchema.getMode())
            .setName(fieldSchema.getName())
            .setType(fieldSchema.getType());
    return Stream.of(output);
}
Also used : HOUR_OF_DAY(java.time.temporal.ChronoField.HOUR_OF_DAY) SECOND_OF_MINUTE(java.time.temporal.ChronoField.SECOND_OF_MINUTE) DateTimeFormatterBuilder(java.time.format.DateTimeFormatterBuilder) Preconditions.checkNotNull(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull) Verify.verify(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Verify.verify) ByteBuffer(java.nio.ByteBuffer) ArrayList(java.util.ArrayList) BigDecimal(java.math.BigDecimal) LogicalTypes(org.apache.avro.LogicalTypes) ImmutableCollection(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableCollection) TableRow(com.google.api.services.bigquery.model.TableRow) LocalTime(java.time.LocalTime) TableSchema(com.google.api.services.bigquery.model.TableSchema) Type(org.apache.avro.Schema.Type) Nullable(org.checkerframework.checker.nullness.qual.Nullable) Conversions(org.apache.avro.Conversions) DateTimeFormat(org.joda.time.format.DateTimeFormat) GenericRecord(org.apache.avro.generic.GenericRecord) TableFieldSchema(com.google.api.services.bigquery.model.TableFieldSchema) Schema(org.apache.avro.Schema) Field(org.apache.avro.Schema.Field) MINUTE_OF_HOUR(java.time.temporal.ChronoField.MINUTE_OF_HOUR) LogicalType(org.apache.avro.LogicalType) ImmutableMultimap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMultimap) DateTimeFormatter(org.joda.time.format.DateTimeFormatter) Collectors(java.util.stream.Collectors) List(java.util.List) NANO_OF_SECOND(java.time.temporal.ChronoField.NANO_OF_SECOND) Stream(java.util.stream.Stream) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting) LocalDate(java.time.LocalDate) Verify.verifyNotNull(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Verify.verifyNotNull) MoreObjects.firstNonNull(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull) 
BaseEncoding(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.BaseEncoding)

Example 53 with TableFieldSchema

use of com.google.api.services.bigquery.model.TableFieldSchema in project beam by apache.

the class BigQueryAvroUtils method convertNullableField.

@Nullable
private static Object convertNullableField(Schema avroSchema, TableFieldSchema fieldSchema, Object v) {
    // NULLABLE fields are represented as an Avro Union of the corresponding type and "null".
    verify(avroSchema.getType() == Type.UNION, "Expected Avro schema type UNION, not %s, for BigQuery NULLABLE field %s", avroSchema.getType(), fieldSchema.getName());
    List<Schema> unionTypes = avroSchema.getTypes();
    verify(unionTypes.size() == 2, "BigQuery NULLABLE field %s should be an Avro UNION of NULL and another type, not %s", fieldSchema.getName(), unionTypes);
    if (v == null) {
        return null;
    }
    Type firstType = unionTypes.get(0).getType();
    if (!firstType.equals(Type.NULL)) {
        return convertRequiredField(firstType, unionTypes.get(0).getLogicalType(), fieldSchema, v);
    }
    return convertRequiredField(unionTypes.get(1).getType(), unionTypes.get(1).getLogicalType(), fieldSchema, v);
}
Also used : Type(org.apache.avro.Schema.Type) LogicalType(org.apache.avro.LogicalType) TableSchema(com.google.api.services.bigquery.model.TableSchema) TableFieldSchema(com.google.api.services.bigquery.model.TableFieldSchema) Schema(org.apache.avro.Schema) Nullable(org.checkerframework.checker.nullness.qual.Nullable)
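
The branch selection in convertNullableField can be shown in isolation. This is a minimal sketch under stated assumptions: union members are represented by their type names as strings rather than by org.apache.avro.Schema objects, and UnionBranchSketch and nonNullBranch are hypothetical names that do not exist in Beam.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of picking the value-carrying branch of a
// two-element union, mirroring the order-independent check in convertNullableField.
final class UnionBranchSketch {

    // Returns the branch that is not "null"; the null branch may come first or second.
    static String nonNullBranch(List<String> unionTypes) {
        if (unionTypes.size() != 2) {
            throw new IllegalArgumentException(
                    "Expected a union of null and one other type, not " + unionTypes);
        }
        String first = unionTypes.get(0);
        // If the first branch is not null, it carries the value; otherwise use the second.
        return "null".equals(first) ? unionTypes.get(1) : first;
    }

    public static void main(String[] args) {
        System.out.println(nonNullBranch(Arrays.asList("null", "string")));
        System.out.println(nonNullBranch(Arrays.asList("long", "null")));
    }
}
```

Like the method above, the sketch only inspects the first branch, so it does not verify that the other branch is actually the null type.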

Example 54 with TableFieldSchema

use of com.google.api.services.bigquery.model.TableFieldSchema in project beam by apache.

the class BigQueryAvroUtils method convertGenericRecordToTableRow.

private static TableRow convertGenericRecordToTableRow(GenericRecord record, List<TableFieldSchema> fields) {
    TableRow row = new TableRow();
    for (TableFieldSchema subSchema : fields) {
        // Per https://cloud.google.com/bigquery/docs/reference/v2/tables#schema, the name field
        // is required, so it is never null.
        Field field = record.getSchema().getField(subSchema.getName());
        Object convertedValue = getTypedCellValue(field.schema(), subSchema, record.get(field.name()));
        if (convertedValue != null) {
            // To match the JSON files exported by BigQuery, do not include null values in the output.
            row.set(field.name(), convertedValue);
        }
    }
    return row;
}
Also used : Field(org.apache.avro.Schema.Field) TableRow(com.google.api.services.bigquery.model.TableRow) TableFieldSchema(com.google.api.services.bigquery.model.TableFieldSchema)
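
The null-skipping loop in convertGenericRecordToTableRow can be sketched with plain collections. This is an illustrative simplification, not Beam's implementation: a Map stands in for both GenericRecord and TableRow, no type conversion is performed, and OmitNullsSketch and toRow are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of copying only non-null values into an output row.
final class OmitNullsSketch {

    // Copies the listed fields from the source into a new row, leaving out null values.
    // Unlike the listing above, a field name absent from the source is simply skipped.
    static Map<String, Object> toRow(Map<String, Object> source, List<String> fieldNames) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String name : fieldNames) {
            Object value = source.get(name);
            if (value != null) {
                // Null values are omitted rather than stored as explicit nulls.
                row.put(name, value);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("name", "alice");
        source.put("age", null);
        System.out.println(toRow(source, Arrays.asList("name", "age")));
    }
}
```

A LinkedHashMap is used here only so that the output preserves the order of the requested field names.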

Example 55 with TableFieldSchema

use of com.google.api.services.bigquery.model.TableFieldSchema in project beam by apache.

the class BigQueryUtils method fromTableFieldSchemaType.

/**
 * Get the Beam {@link FieldType} from a BigQuery type name.
 *
 * <p>Supports both standard and legacy SQL types.
 *
 * @param typeName Name of the type
 * @param nestedFields Nested fields for the given type (e.g. RECORD type)
 * @param options Schema conversion options, such as whether to infer maps
 * @return Corresponding Beam {@link FieldType}
 */
@Experimental(Kind.SCHEMAS)
private static FieldType fromTableFieldSchemaType(String typeName, List<TableFieldSchema> nestedFields, SchemaConversionOptions options) {
    switch (typeName) {
        case "STRING":
            return FieldType.STRING;
        case "BYTES":
            return FieldType.BYTES;
        case "INT64":
        case "INTEGER":
            return FieldType.INT64;
        case "FLOAT64":
        case "FLOAT":
            return FieldType.DOUBLE;
        case "BOOL":
        case "BOOLEAN":
            return FieldType.BOOLEAN;
        case "NUMERIC":
            return FieldType.DECIMAL;
        case "TIMESTAMP":
            return FieldType.DATETIME;
        case "TIME":
            return FieldType.logicalType(SqlTypes.TIME);
        case "DATE":
            return FieldType.logicalType(SqlTypes.DATE);
        case "DATETIME":
            return FieldType.logicalType(SqlTypes.DATETIME);
        case "STRUCT":
        case "RECORD":
            if (options.getInferMaps() && nestedFields.size() == 2) {
                TableFieldSchema key = nestedFields.get(0);
                TableFieldSchema value = nestedFields.get(1);
                if (BIGQUERY_MAP_KEY_FIELD_NAME.equals(key.getName()) && BIGQUERY_MAP_VALUE_FIELD_NAME.equals(value.getName())) {
                    return FieldType.map(fromTableFieldSchemaType(key.getType(), key.getFields(), options), fromTableFieldSchemaType(value.getType(), value.getFields(), options));
                }
            }
            Schema rowSchema = fromTableFieldSchema(nestedFields, options);
            return FieldType.row(rowSchema);
        default:
            throw new UnsupportedOperationException("Converting BigQuery type " + typeName + " to Beam type is unsupported");
    }
}
Also used : TableSchema(com.google.api.services.bigquery.model.TableSchema) TableFieldSchema(com.google.api.services.bigquery.model.TableFieldSchema) Schema(org.apache.beam.sdk.schemas.Schema) Experimental(org.apache.beam.sdk.annotations.Experimental)
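
The way fromTableFieldSchemaType folds legacy and standard SQL type names into one result can be sketched without the Beam SDK. This is a minimal illustration covering only a few scalar types: the enum SimpleType is a hypothetical stand-in for Beam's FieldType, and TypeAliasSketch and fromTypeName are names introduced here.

```java
// Hypothetical illustration of mapping standard and legacy SQL type names
// to a single canonical type, using switch fall-through as in the listing above.
final class TypeAliasSketch {

    // Stand-in for Beam's FieldType; covers a scalar subset only.
    enum SimpleType { STRING, BYTES, INT64, DOUBLE, BOOLEAN }

    static SimpleType fromTypeName(String typeName) {
        switch (typeName) {
            case "STRING":
                return SimpleType.STRING;
            case "BYTES":
                return SimpleType.BYTES;
            case "INT64":   // standard SQL name
            case "INTEGER": // legacy SQL name
                return SimpleType.INT64;
            case "FLOAT64":
            case "FLOAT":
                return SimpleType.DOUBLE;
            case "BOOL":
            case "BOOLEAN":
                return SimpleType.BOOLEAN;
            default:
                throw new UnsupportedOperationException(
                        "Converting BigQuery type " + typeName + " is unsupported in this sketch");
        }
    }

    public static void main(String[] args) {
        System.out.println(fromTypeName("INTEGER"));
        System.out.println(fromTypeName("FLOAT64"));
    }
}
```

Both spellings of a type reach the same return statement, so callers do not need to know which SQL dialect produced the schema.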
