
Example 41 with FieldAccessDescriptor

Use of org.apache.beam.sdk.schemas.FieldAccessDescriptor in the Apache Beam project.

Class RenameFields, method renameSchema.

// Apply the user-specified renames to the input schema.
@VisibleForTesting
static void renameSchema(Schema inputSchema, Collection<RenamePair> renames, Map<UUID, Schema> renamedSchemasMap, Map<UUID, BitSet> nestedFieldRenamedMap) {
    // The mapping of renames to apply at this level of the schema.
    Map<Integer, String> topLevelRenames = Maps.newHashMap();
    // For nested schemas, collect all applicable renames here.
    Multimap<Integer, RenamePair> nestedRenames = ArrayListMultimap.create();
    for (RenamePair rename : renames) {
        FieldAccessDescriptor access = rename.getFieldAccessDescriptor();
        if (!access.fieldIdsAccessed().isEmpty()) {
            // This references a field at this level of the schema.
            Integer fieldId = Iterables.getOnlyElement(access.fieldIdsAccessed());
            topLevelRenames.put(fieldId, rename.getNewName());
        } else {
            // This references a nested field.
            Map.Entry<Integer, FieldAccessDescriptor> nestedAccess = Iterables.getOnlyElement(access.nestedFieldsById().entrySet());
            nestedFieldRenamedMap.computeIfAbsent(inputSchema.getUUID(), s -> new BitSet(inputSchema.getFieldCount())).set(nestedAccess.getKey());
            nestedRenames.put(nestedAccess.getKey(), RenamePair.of(nestedAccess.getValue(), rename.getNewName()));
        }
    }
    Schema.Builder builder = Schema.builder();
    for (int i = 0; i < inputSchema.getFieldCount(); ++i) {
        Field field = inputSchema.getField(i);
        FieldType fieldType = field.getType();
        String newName = topLevelRenames.getOrDefault(i, field.getName());
        Collection<RenamePair> nestedFieldRenames = nestedRenames.asMap().getOrDefault(i, Collections.emptyList());
        builder.addField(newName, renameFieldType(fieldType, nestedFieldRenames, renamedSchemasMap, nestedFieldRenamedMap));
    }
    renamedSchemasMap.put(inputSchema.getUUID(), builder.build());
}
Also used : Experimental(org.apache.beam.sdk.annotations.Experimental) PTransform(org.apache.beam.sdk.transforms.PTransform) Multimap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap) Kind(org.apache.beam.sdk.annotations.Experimental.Kind) Map(java.util.Map) FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) Iterables(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables) Maps(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps) ArrayListMultimap(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ArrayListMultimap) Row(org.apache.beam.sdk.values.Row) Nullable(javax.annotation.Nullable) Field(org.apache.beam.sdk.schemas.Schema.Field) DoFn(org.apache.beam.sdk.transforms.DoFn) Collection(java.util.Collection) Lists(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists) FieldType(org.apache.beam.sdk.schemas.Schema.FieldType) UUID(java.util.UUID) PCollection(org.apache.beam.sdk.values.PCollection) Collectors(java.util.stream.Collectors) Schema(org.apache.beam.sdk.schemas.Schema) Serializable(java.io.Serializable) List(java.util.List) ParDo(org.apache.beam.sdk.transforms.ParDo) VisibleForTesting(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting) AutoValue(com.google.auto.value.AutoValue) ImmutableList(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList) BitSet(java.util.BitSet) Collections(java.util.Collections)
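
The renameSchema method above first splits the renames into those that hit a field at the current level and those that point into a nested schema, then rebuilds the schema field by field and recurses into nested types. The following is a minimal sketch of that same split-then-recurse idea using only plain Java collections. It is not Beam code: the class RenameSketch, the map-based schema model, and the dotted-path rename keys are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not a Beam class. A "schema" here is an ordered map
// from field name to either a type name (String) or a nested schema (Map).
final class RenameSketch {
    private RenameSketch() {}

    // renames maps a dotted path (e.g. "address.zip") to the new leaf name.
    static Map<String, Object> rename(Map<String, Object> schema, Map<String, String> renames) {
        // Renames that apply to fields at this level of the schema.
        Map<String, String> topLevel = new LinkedHashMap<>();
        // Renames that apply inside a nested field, grouped by that field's name.
        Map<String, Map<String, String>> nested = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : renames.entrySet()) {
            String path = e.getKey();
            int dot = path.indexOf('.');
            if (dot < 0) {
                topLevel.put(path, e.getValue());
            } else {
                nested.computeIfAbsent(path.substring(0, dot), k -> new LinkedHashMap<>())
                      .put(path.substring(dot + 1), e.getValue());
            }
        }
        // Rebuild the schema in field order, renaming and recursing as needed.
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : schema.entrySet()) {
            String newName = topLevel.getOrDefault(field.getKey(), field.getKey());
            Object type = field.getValue();
            Map<String, String> childRenames = nested.get(field.getKey());
            if (childRenames != null && type instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) type;
                type = rename(child, childRenames);
            }
            out.put(newName, type);
        }
        return out;
    }
}
```

Unlike the Beam method, this sketch keys fields by name rather than by field id and does not record the renamed schemas by UUID; it only shows how one pass can separate top-level from nested renames before recursing.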

Example 42 with FieldAccessDescriptor

Use of org.apache.beam.sdk.schemas.FieldAccessDescriptor in the Apache Beam project.

Class SelectHelpersTest, method testSelectIterableOfRowPartial.

@Test
public void testSelectIterableOfRowPartial() {
    FieldAccessDescriptor fieldAccessDescriptor = FieldAccessDescriptor.withFieldNames("rowIter[].field1").resolve(ITERABLE_SCHEMA);
    Schema outputSchema = SelectHelpers.getOutputSchema(ITERABLE_SCHEMA, fieldAccessDescriptor);
    Schema expectedSchema = Schema.builder().addIterableField("field1", FieldType.STRING).build();
    assertEquals(expectedSchema, outputSchema);
    Row row = selectRow(ITERABLE_SCHEMA, fieldAccessDescriptor, ITERABLE_ROW);
    Row expectedRow = Row.withSchema(expectedSchema).addIterable(ImmutableList.of("first", "first")).build();
    assertEquals(expectedRow, row);
}
Also used : FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) Schema(org.apache.beam.sdk.schemas.Schema) Row(org.apache.beam.sdk.values.Row) Test(org.junit.Test)

Example 43 with FieldAccessDescriptor

Use of org.apache.beam.sdk.schemas.FieldAccessDescriptor in the Apache Beam project.

Class SelectHelpersTest, method testSelectArrayOfRowArray.

@Test
public void testSelectArrayOfRowArray() {
    FieldAccessDescriptor fieldAccessDescriptor = FieldAccessDescriptor.withFieldNames("arrayOfRowArray[][].field1").resolve(ARRAY_SCHEMA);
    Schema outputSchema = SelectHelpers.getOutputSchema(ARRAY_SCHEMA, fieldAccessDescriptor);
    Schema expectedSchema = Schema.builder().addArrayField("field1", FieldType.array(FieldType.STRING)).build();
    assertEquals(expectedSchema, outputSchema);
    Row row = selectRow(ARRAY_SCHEMA, fieldAccessDescriptor, ARRAY_ROW);
    Row expectedRow = Row.withSchema(expectedSchema).addArray(ImmutableList.of("first"), ImmutableList.of("first")).build();
    assertEquals(expectedRow, row);
}
Also used : FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) Schema(org.apache.beam.sdk.schemas.Schema) Row(org.apache.beam.sdk.values.Row) Test(org.junit.Test)
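
The two tests above select a field through one or more container levels using paths such as "rowIter[].field1" and "arrayOfRowArray[][].field1". The sketch below illustrates how such a path can be walked over plain Java maps and lists. It is an assumption-laden illustration, not Beam's implementation: PathSelectSketch and its methods are hypothetical, rows are modeled as java.util.Map, containers as java.util.List, and null values and wildcards are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not a Beam class.
final class PathSelectSketch {
    private PathSelectSketch() {}

    // Walks a path like "rowIter[].field1": a name selects a map entry,
    // and each "[]" descends into every element of the current container.
    static Object select(Object value, String path) {
        if (path.isEmpty()) {
            return value;
        }
        if (path.startsWith("[]")) {
            String rest = path.substring(2);
            List<Object> out = new ArrayList<>();
            for (Object element : (Iterable<?>) value) {
                out.add(select(element, rest));
            }
            return out;
        }
        if (path.startsWith(".")) {
            path = path.substring(1);
        }
        int end = nextDelimiter(path);
        Object child = ((Map<?, ?>) value).get(path.substring(0, end));
        return select(child, path.substring(end));
    }

    // Index of the next '.' or '[' in the path, or the path length if none.
    private static int nextDelimiter(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '.' || c == '[') {
                return i;
            }
        }
        return path.length();
    }
}
```

With a top-level map whose "rowIter" entry is a list of two rows that each hold field1 = "first", select(top, "rowIter[].field1") yields a two-element list of "first", which parallels the expected row in testSelectIterableOfRowPartial. Each extra "[]" adds one level of list nesting to the result, as in testSelectArrayOfRowArray.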

Example 44 with FieldAccessDescriptor

Use of org.apache.beam.sdk.schemas.FieldAccessDescriptor in the Apache Beam project.

Class SelectHelpersTest, method testSelectNullableNestedRowArray.

@Test
public void testSelectNullableNestedRowArray() {
    FieldAccessDescriptor fieldAccessDescriptor1 = FieldAccessDescriptor.withFieldNames("nestedArray.field1").resolve(NESTED_NULLABLE_SCHEMA);
    Row out1 = selectRow(NESTED_NULLABLE_SCHEMA, fieldAccessDescriptor1, Row.nullRow(NESTED_NULLABLE_SCHEMA));
    assertNull(out1.getValue(0));
    FieldAccessDescriptor fieldAccessDescriptor2 = FieldAccessDescriptor.withFieldNames("nestedArray.*").resolve(NESTED_NULLABLE_SCHEMA);
    Row out2 = selectRow(NESTED_NULLABLE_SCHEMA, fieldAccessDescriptor2, Row.nullRow(NESTED_NULLABLE_SCHEMA));
    assertEquals(Collections.nCopies(4, null), out2.getValues());
}
Also used : FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) Row(org.apache.beam.sdk.values.Row) Test(org.junit.Test)

Example 45 with FieldAccessDescriptor

Use of org.apache.beam.sdk.schemas.FieldAccessDescriptor in the Apache Beam project.

Class SelectHelpersTest, method testSelectArrayOfRow.

@Test
public void testSelectArrayOfRow() {
    FieldAccessDescriptor fieldAccessDescriptor = FieldAccessDescriptor.withFieldNames("rowArray").resolve(ARRAY_SCHEMA);
    Schema outputSchema = SelectHelpers.getOutputSchema(ARRAY_SCHEMA, fieldAccessDescriptor);
    Schema expectedSchema = Schema.builder().addArrayField("rowArray", FieldType.row(FLAT_SCHEMA)).build();
    assertEquals(expectedSchema, outputSchema);
    Row row = selectRow(ARRAY_SCHEMA, fieldAccessDescriptor, ARRAY_ROW);
    Row expectedRow = Row.withSchema(expectedSchema).addArray(FLAT_ROW, FLAT_ROW).build();
    assertEquals(expectedRow, row);
}
Also used : FieldAccessDescriptor(org.apache.beam.sdk.schemas.FieldAccessDescriptor) Schema(org.apache.beam.sdk.schemas.Schema) Row(org.apache.beam.sdk.values.Row) Test(org.junit.Test)

Aggregations

FieldAccessDescriptor (org.apache.beam.sdk.schemas.FieldAccessDescriptor): 65
Test (org.junit.Test): 49
Row (org.apache.beam.sdk.values.Row): 47
Schema (org.apache.beam.sdk.schemas.Schema): 42
PCollection (org.apache.beam.sdk.values.PCollection): 16
Map (java.util.Map): 12
Pipeline (org.apache.beam.sdk.Pipeline): 11
ProjectionProducer (org.apache.beam.sdk.schemas.ProjectionProducer): 9
ImmutableMap (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap): 8
ParDo (org.apache.beam.sdk.transforms.ParDo): 5
DoFnSchemaInformation (org.apache.beam.sdk.transforms.DoFnSchemaInformation): 4
PBegin (org.apache.beam.sdk.values.PBegin): 4
DefaultTableFilter (org.apache.beam.sdk.extensions.sql.meta.DefaultTableFilter): 3
FieldType (org.apache.beam.sdk.schemas.Schema.FieldType): 3
PTransform (org.apache.beam.sdk.transforms.PTransform): 3
List (java.util.List): 2
Collectors (java.util.stream.Collectors): 2
AutoValueSchema (org.apache.beam.sdk.schemas.AutoValueSchema): 2
FieldDescriptor (org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor): 2
Field (org.apache.beam.sdk.schemas.Schema.Field): 2