Example 1 with RowType

Use of org.apache.flink.table.types.logical.RowType in project flink by apache.

From class KinesisDynamicTableSinkFactoryTest, method testGoodTableSinkCopyForPartitionedTable.

@Test
public void testGoodTableSinkCopyForPartitionedTable() {
    ResolvedSchema sinkSchema = defaultSinkSchema();
    DataType physicalDataType = sinkSchema.toPhysicalRowDataType();
    Map<String, String> sinkOptions = defaultTableOptions().build();
    List<String> sinkPartitionKeys = Arrays.asList("name", "curr_id");
    // Construct actual DynamicTableSink using FactoryUtil
    KinesisDynamicSink actualSink = (KinesisDynamicSink) createTableSink(sinkSchema, sinkPartitionKeys, sinkOptions);
    // Construct expected DynamicTableSink using the sink builder directly
    KinesisDynamicSink expectedSink =
            (KinesisDynamicSink)
                    new KinesisDynamicSink.KinesisDynamicTableSinkBuilder()
                            .setConsumedDataType(physicalDataType)
                            .setStream(STREAM_NAME)
                            .setKinesisClientProperties(defaultProducerProperties())
                            .setEncodingFormat(new TestFormatFactory.EncodingFormatMock(","))
                            .setPartitioner(
                                    new RowDataFieldsKinesisPartitionKeyGenerator(
                                            (RowType) physicalDataType.getLogicalType(), sinkPartitionKeys))
                            .build();
    Assertions.assertThat(actualSink).isEqualTo(expectedSink.copy());
    Assertions.assertThat(expectedSink).isNotSameAs(expectedSink.copy());
}
Also used : DataType(org.apache.flink.table.types.DataType) RowType(org.apache.flink.table.types.logical.RowType) ResolvedSchema(org.apache.flink.table.catalog.ResolvedSchema) Test(org.junit.Test)
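
Note: the helpers defaultSinkSchema(), defaultTableOptions() and defaultProducerProperties() are not shown in this snippet. A minimal sketch of how such a sink schema could look, and why the (RowType) cast on the physical data type is safe, is given below; the field names and types are illustrative assumptions, not the ones used by the real test fixture.

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.catalog.Column;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.logical.RowType;

// Illustrative schema; the real defaultSinkSchema() may declare different fields.
ResolvedSchema sinkSchema = ResolvedSchema.of(
        Column.physical("name", DataTypes.STRING()),
        Column.physical("curr_id", DataTypes.BIGINT()),
        Column.physical("time", DataTypes.TIMESTAMP(3)));

// The physical row DataType of a ResolvedSchema always has a RowType as its logical type,
// which is what makes the cast in the test safe.
DataType physicalDataType = sinkSchema.toPhysicalRowDataType();
RowType rowType = (RowType) physicalDataType.getLogicalType();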

Example 2 with RowType

Use of org.apache.flink.table.types.logical.RowType in project flink by apache.

From class KinesisDynamicTableSinkFactoryTest, method testGoodTableSinkForPartitionedTable.

@Test
public void testGoodTableSinkForPartitionedTable() {
    ResolvedSchema sinkSchema = defaultSinkSchema();
    DataType physicalDataType = sinkSchema.toPhysicalRowDataType();
    Map<String, String> sinkOptions = defaultTableOptions().build();
    List<String> sinkPartitionKeys = Arrays.asList("name", "curr_id");
    // Construct actual DynamicTableSink using FactoryUtil
    KinesisDynamicSink actualSink = (KinesisDynamicSink) createTableSink(sinkSchema, sinkPartitionKeys, sinkOptions);
    // Construct expected DynamicTableSink using the sink builder directly
    KinesisDynamicSink expectedSink =
            (KinesisDynamicSink)
                    new KinesisDynamicSink.KinesisDynamicTableSinkBuilder()
                            .setConsumedDataType(physicalDataType)
                            .setStream(STREAM_NAME)
                            .setKinesisClientProperties(defaultProducerProperties())
                            .setEncodingFormat(new TestFormatFactory.EncodingFormatMock(","))
                            .setPartitioner(
                                    new RowDataFieldsKinesisPartitionKeyGenerator(
                                            (RowType) physicalDataType.getLogicalType(), sinkPartitionKeys))
                            .build();
    // verify that the constructed DynamicTableSink is as expected
    Assertions.assertThat(actualSink).isEqualTo(expectedSink);
    // verify the produced sink
    DynamicTableSink.SinkRuntimeProvider sinkFunctionProvider = actualSink.getSinkRuntimeProvider(new SinkRuntimeProviderContext(false));
    Sink<RowData> sinkFunction = ((SinkV2Provider) sinkFunctionProvider).createSink();
    Assertions.assertThat(sinkFunction).isInstanceOf(KinesisDataStreamsSink.class);
}
Also used : SinkRuntimeProviderContext(org.apache.flink.table.runtime.connector.sink.SinkRuntimeProviderContext) RowType(org.apache.flink.table.types.logical.RowType) DynamicTableSink(org.apache.flink.table.connector.sink.DynamicTableSink) RowData(org.apache.flink.table.data.RowData) DataType(org.apache.flink.table.types.DataType) SinkV2Provider(org.apache.flink.table.connector.sink.SinkV2Provider) ResolvedSchema(org.apache.flink.table.catalog.ResolvedSchema) Test(org.junit.Test)
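
Both tests build the expected partitioner from the RowType of the physical data type plus the partition key names. A short, hedged sketch of exercising such a partitioner directly is shown below; it assumes the generator can be used as a plain java.util.function.Function (apply(...) returning the partition key string), assumes the class lives in the same package as the factory test, and uses made-up field values.

import java.util.Arrays;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.types.logical.RowType;

// Illustrative row type matching the partition keys used in the tests.
RowType rowType = (RowType) DataTypes.ROW(
        DataTypes.FIELD("name", DataTypes.STRING()),
        DataTypes.FIELD("curr_id", DataTypes.BIGINT()))
        .getLogicalType();

RowDataFieldsKinesisPartitionKeyGenerator partitioner =
        new RowDataFieldsKinesisPartitionKeyGenerator(rowType, Arrays.asList("name", "curr_id"));

// Assumption: the generator acts as a Function<RowData, String>, so apply() derives
// the Kinesis partition key from the values of the listed fields.
GenericRowData row = GenericRowData.of(StringData.fromString("alice"), 42L);
String partitionKey = partitioner.apply(row);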

Example 3 with RowType

Use of org.apache.flink.table.types.logical.RowType in project flink by apache.

From class HiveLookupTableSource, method getLookupFunction.

private TableFunction<RowData> getLookupFunction(int[] keys) {
    final String defaultPartitionName = JobConfUtils.getDefaultPartitionName(jobConf);
    PartitionFetcher.Context<HiveTablePartition> fetcherContext =
            new HiveTablePartitionFetcherContext(
                    tablePath,
                    hiveShim,
                    new JobConfWrapper(jobConf),
                    catalogTable.getPartitionKeys(),
                    getProducedTableSchema().getFieldDataTypes(),
                    getProducedTableSchema().getFieldNames(),
                    configuration,
                    defaultPartitionName);
    final PartitionFetcher<HiveTablePartition> partitionFetcher;
    // avoid lambda capture
    final ObjectPath tableFullPath = tablePath;
    if (catalogTable.getPartitionKeys().isEmpty()) {
        // non-partitioned table, the fetcher fetches the partition which represents the given
        // table.
        partitionFetcher = context -> {
            List<HiveTablePartition> partValueList = new ArrayList<>();
            partValueList.add(
                    context.getPartition(new ArrayList<>())
                            .orElseThrow(
                                    () -> new IllegalArgumentException(
                                            String.format(
                                                    "Fetch partition fail for hive table %s.",
                                                    tableFullPath))));
            return partValueList;
        };
    } else if (isStreamingSource()) {
        // streaming-read partitioned table, the fetcher fetches the latest partition of the
        // given table.
        partitionFetcher = context -> {
            List<HiveTablePartition> partValueList = new ArrayList<>();
            List<PartitionFetcher.Context.ComparablePartitionValue> comparablePartitionValues = context.getComparablePartitionValueList();
            // fetch latest partitions for partitioned table
            if (comparablePartitionValues.size() > 0) {
                // sort in desc order
                comparablePartitionValues.sort((o1, o2) -> o2.getComparator().compareTo(o1.getComparator()));
                PartitionFetcher.Context.ComparablePartitionValue maxPartition = comparablePartitionValues.get(0);
                partValueList.add(
                        context.getPartition((List<String>) maxPartition.getPartitionValue())
                                .orElseThrow(
                                        () -> new IllegalArgumentException(
                                                String.format(
                                                        "Fetch partition fail for hive table %s.",
                                                        tableFullPath))));
            } else {
                throw new IllegalArgumentException(
                        String.format(
                                "At least one partition is required when set '%s' to 'latest' in temporal join,"
                                        + " but actual partition number is '%s' for hive table %s",
                                STREAMING_SOURCE_PARTITION_INCLUDE.key(),
                                comparablePartitionValues.size(),
                                tableFullPath));
            }
            return partValueList;
        };
    } else {
        // bounded-read partitioned table, the fetcher fetches all partitions of the given
        // filesystem table.
        partitionFetcher = context -> {
            List<HiveTablePartition> partValueList = new ArrayList<>();
            List<PartitionFetcher.Context.ComparablePartitionValue> comparablePartitionValues = context.getComparablePartitionValueList();
            for (PartitionFetcher.Context.ComparablePartitionValue comparablePartitionValue : comparablePartitionValues) {
                partValueList.add(
                        context.getPartition((List<String>) comparablePartitionValue.getPartitionValue())
                                .orElseThrow(
                                        () -> new IllegalArgumentException(
                                                String.format(
                                                        "Fetch partition fail for hive table %s.",
                                                        tableFullPath))));
            }
            return partValueList;
        };
    }
    PartitionReader<HiveTablePartition, RowData> partitionReader =
            new HiveInputFormatPartitionReader(
                    flinkConf,
                    jobConf,
                    hiveVersion,
                    tablePath,
                    getProducedTableSchema().getFieldDataTypes(),
                    getProducedTableSchema().getFieldNames(),
                    catalogTable.getPartitionKeys(),
                    projectedFields,
                    flinkConf.get(HiveOptions.TABLE_EXEC_HIVE_FALLBACK_MAPRED_READER));
    return new FileSystemLookupFunction<>(
            partitionFetcher,
            fetcherContext,
            partitionReader,
            (RowType) getProducedTableSchema().toRowDataType().getLogicalType(),
            keys,
            hiveTableReloadInterval);
}
Also used : HivePartitionUtils(org.apache.flink.connectors.hive.util.HivePartitionUtils) TableFunction(org.apache.flink.table.functions.TableFunction) PartitionReader(org.apache.flink.connector.file.table.PartitionReader) DataType(org.apache.flink.table.types.DataType) CatalogTable(org.apache.flink.table.catalog.CatalogTable) LoggerFactory(org.slf4j.LoggerFactory) STREAMING_SOURCE_PARTITION_INCLUDE(org.apache.flink.connectors.hive.HiveOptions.STREAMING_SOURCE_PARTITION_INCLUDE) HiveInputFormatPartitionReader(org.apache.flink.connectors.hive.read.HiveInputFormatPartitionReader) JobConfUtils(org.apache.flink.connectors.hive.util.JobConfUtils) RowType(org.apache.flink.table.types.logical.RowType) ObjectPath(org.apache.flink.table.catalog.ObjectPath) Partition(org.apache.hadoop.hive.metastore.api.Partition) HiveShim(org.apache.flink.table.catalog.hive.client.HiveShim) ArrayList(java.util.ArrayList) LookupTableSource(org.apache.flink.table.connector.source.LookupTableSource) ReadableConfig(org.apache.flink.configuration.ReadableConfig) Duration(java.time.Duration) HivePartitionFetcherContextBase(org.apache.flink.connectors.hive.read.HivePartitionFetcherContextBase) RowData(org.apache.flink.table.data.RowData) Logger(org.slf4j.Logger) STREAMING_SOURCE_CONSUME_START_OFFSET(org.apache.flink.connectors.hive.HiveOptions.STREAMING_SOURCE_CONSUME_START_OFFSET) PartitionFetcher(org.apache.flink.connector.file.table.PartitionFetcher) Configuration(org.apache.flink.configuration.Configuration) Preconditions(org.apache.flink.util.Preconditions) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting) JobConf(org.apache.hadoop.mapred.JobConf) LOOKUP_JOIN_CACHE_TTL(org.apache.flink.connectors.hive.HiveOptions.LOOKUP_JOIN_CACHE_TTL) List(java.util.List) Optional(java.util.Optional) TableFunctionProvider(org.apache.flink.table.connector.source.TableFunctionProvider) STREAMING_SOURCE_MONITOR_INTERVAL(org.apache.flink.connectors.hive.HiveOptions.STREAMING_SOURCE_MONITOR_INTERVAL) NoSuchObjectException(org.apache.hadoop.hive.metastore.api.NoSuchObjectException)
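
The three lambdas above are all implementations of PartitionFetcher's single abstract method, which (per the flink-connector-files interface) has roughly the shape List<P> fetch(Context<P> context) throws Exception. A hedged sketch of invoking the chosen fetcher directly is shown below; in the real code this call happens inside FileSystemLookupFunction, and the exception wrapping here is illustrative.

// The fetcher receives the fetcherContext built above and returns the partitions to read:
// one synthetic partition for a non-partitioned table, only the latest partition when
// streaming, or all partitions for a bounded read.
try {
    List<HiveTablePartition> partitions = partitionFetcher.fetch(fetcherContext);
} catch (Exception e) {
    throw new RuntimeException("Failed to fetch partitions for " + tableFullPath, e);
}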

Example 4 with RowType

Use of org.apache.flink.table.types.logical.RowType in project flink by apache.

From class HBaseTableSchema, method fromDataType.

/**
 * Construct a {@link HBaseTableSchema} from a {@link DataType}.
 */
public static HBaseTableSchema fromDataType(DataType physicalRowType) {
    HBaseTableSchema hbaseSchema = new HBaseTableSchema();
    RowType rowType = (RowType) physicalRowType.getLogicalType();
    for (RowType.RowField field : rowType.getFields()) {
        LogicalType fieldType = field.getType();
        if (fieldType.getTypeRoot() == LogicalTypeRoot.ROW) {
            RowType familyType = (RowType) fieldType;
            String familyName = field.getName();
            for (RowType.RowField qualifier : familyType.getFields()) {
                hbaseSchema.addColumn(familyName, qualifier.getName(), fromLogicalToDataType(qualifier.getType()));
            }
        } else if (fieldType.getChildren().size() == 0) {
            hbaseSchema.setRowKey(field.getName(), fromLogicalToDataType(fieldType));
        } else {
            throw new IllegalArgumentException("Unsupported field type '" + fieldType + "' for HBase.");
        }
    }
    return hbaseSchema;
}
Also used : RowType(org.apache.flink.table.types.logical.RowType) LogicalType(org.apache.flink.table.types.logical.LogicalType)
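
A short usage sketch: in the physical row DataType, atomic top-level fields map to the HBase row key and nested ROW fields map to column families whose sub-fields are the qualifiers. Field and family names below are illustrative.

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

// "rowkey" is atomic -> becomes the row key;
// "cf1" is a nested ROW -> becomes a column family with qualifiers "q1" and "q2".
DataType physicalRowType = DataTypes.ROW(
        DataTypes.FIELD("rowkey", DataTypes.STRING()),
        DataTypes.FIELD("cf1", DataTypes.ROW(
                DataTypes.FIELD("q1", DataTypes.INT()),
                DataTypes.FIELD("q2", DataTypes.STRING()))));

HBaseTableSchema hbaseSchema = HBaseTableSchema.fromDataType(physicalRowType);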

Example 5 with RowType

Use of org.apache.flink.table.types.logical.RowType in project flink by apache.

From class AvroSchemaConverter, method convertToSchema.

/**
 * Converts a Flink SQL {@link LogicalType} (possibly nested) into an Avro schema.
 *
 * <p>The "{rowName}_" prefix is used for nested row type names in order to generate the right
 * schema. A nested record type that differs only in its name is still compatible.
 *
 * @param logicalType logical type
 * @param rowName the record name
 * @return Avro's {@link Schema} matching this logical type.
 */
public static Schema convertToSchema(LogicalType logicalType, String rowName) {
    int precision;
    boolean nullable = logicalType.isNullable();
    switch(logicalType.getTypeRoot()) {
        case NULL:
            return SchemaBuilder.builder().nullType();
        case BOOLEAN:
            Schema bool = SchemaBuilder.builder().booleanType();
            return nullable ? nullableSchema(bool) : bool;
        case TINYINT:
        case SMALLINT:
        case INTEGER:
            Schema integer = SchemaBuilder.builder().intType();
            return nullable ? nullableSchema(integer) : integer;
        case BIGINT:
            Schema bigint = SchemaBuilder.builder().longType();
            return nullable ? nullableSchema(bigint) : bigint;
        case FLOAT:
            Schema f = SchemaBuilder.builder().floatType();
            return nullable ? nullableSchema(f) : f;
        case DOUBLE:
            Schema d = SchemaBuilder.builder().doubleType();
            return nullable ? nullableSchema(d) : d;
        case CHAR:
        case VARCHAR:
            Schema str = SchemaBuilder.builder().stringType();
            return nullable ? nullableSchema(str) : str;
        case BINARY:
        case VARBINARY:
            Schema binary = SchemaBuilder.builder().bytesType();
            return nullable ? nullableSchema(binary) : binary;
        case TIMESTAMP_WITHOUT_TIME_ZONE:
            // use long to represent Timestamp
            final TimestampType timestampType = (TimestampType) logicalType;
            precision = timestampType.getPrecision();
            org.apache.avro.LogicalType avroLogicalType;
            if (precision <= 3) {
                avroLogicalType = LogicalTypes.timestampMillis();
            } else {
                throw new IllegalArgumentException(
                        "Avro does not support TIMESTAMP type with precision: "
                                + precision
                                + ", it only supports precision less than or equal to 3.");
            }
            Schema timestamp = avroLogicalType.addToSchema(SchemaBuilder.builder().longType());
            return nullable ? nullableSchema(timestamp) : timestamp;
        case DATE:
            // use int to represent Date
            Schema date = LogicalTypes.date().addToSchema(SchemaBuilder.builder().intType());
            return nullable ? nullableSchema(date) : date;
        case TIME_WITHOUT_TIME_ZONE:
            precision = ((TimeType) logicalType).getPrecision();
            if (precision > 3) {
                throw new IllegalArgumentException(
                        "Avro does not support TIME type with precision: "
                                + precision
                                + ", it only supports precision less than or equal to 3.");
            }
            // use int to represent Time; only millisecond precision is supported during deserialization
            Schema time = LogicalTypes.timeMillis().addToSchema(SchemaBuilder.builder().intType());
            return nullable ? nullableSchema(time) : time;
        case DECIMAL:
            DecimalType decimalType = (DecimalType) logicalType;
            // store BigDecimal as byte[]
            Schema decimal = LogicalTypes.decimal(decimalType.getPrecision(), decimalType.getScale()).addToSchema(SchemaBuilder.builder().bytesType());
            return nullable ? nullableSchema(decimal) : decimal;
        case ROW:
            RowType rowType = (RowType) logicalType;
            List<String> fieldNames = rowType.getFieldNames();
            // we have to make sure each nested record name is unique within the Schema
            SchemaBuilder.FieldAssembler<Schema> builder = SchemaBuilder.builder().record(rowName).fields();
            for (int i = 0; i < rowType.getFieldCount(); i++) {
                String fieldName = fieldNames.get(i);
                LogicalType fieldType = rowType.getTypeAt(i);
                SchemaBuilder.GenericDefault<Schema> fieldBuilder = builder.name(fieldName).type(convertToSchema(fieldType, rowName + "_" + fieldName));
                if (fieldType.isNullable()) {
                    builder = fieldBuilder.withDefault(null);
                } else {
                    builder = fieldBuilder.noDefault();
                }
            }
            Schema record = builder.endRecord();
            return nullable ? nullableSchema(record) : record;
        case MULTISET:
        case MAP:
            Schema map = SchemaBuilder.builder().map().values(convertToSchema(extractValueTypeToAvroMap(logicalType), rowName));
            return nullable ? nullableSchema(map) : map;
        case ARRAY:
            ArrayType arrayType = (ArrayType) logicalType;
            Schema array = SchemaBuilder.builder().array().items(convertToSchema(arrayType.getElementType(), rowName));
            return nullable ? nullableSchema(array) : array;
        case RAW:
        case TIMESTAMP_WITH_LOCAL_TIME_ZONE:
        default:
            throw new UnsupportedOperationException("Unsupported to derive Schema for type: " + logicalType);
    }
}
Also used : Schema(org.apache.avro.Schema) AvroRowDeserializationSchema(org.apache.flink.formats.avro.AvroRowDeserializationSchema) AvroRowSerializationSchema(org.apache.flink.formats.avro.AvroRowSerializationSchema) RowType(org.apache.flink.table.types.logical.RowType) LogicalType(org.apache.flink.table.types.logical.LogicalType) ArrayType(org.apache.flink.table.types.logical.ArrayType) SchemaBuilder(org.apache.avro.SchemaBuilder) TimestampType(org.apache.flink.table.types.logical.TimestampType) DecimalType(org.apache.flink.table.types.logical.DecimalType)
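
The nullableSchema helper used throughout is not shown in this snippet; it presumably wraps a non-nullable Avro schema into a union with the null type. A hedged sketch of that helper and a small conversion example follow; the record name "record" and the field names are arbitrary choices.

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.logical.LogicalType;

// Sketch of the assumed helper: represent a nullable type as ["null", schema].
private static Schema nullableSchema(Schema schema) {
    return schema.isNullable()
            ? schema
            : Schema.createUnion(SchemaBuilder.builder().nullType(), schema);
}

// Example conversion: a non-nullable row with a required BIGINT and a nullable STRING.
LogicalType rowType = DataTypes.ROW(
        DataTypes.FIELD("id", DataTypes.BIGINT().notNull()),
        DataTypes.FIELD("name", DataTypes.STRING()))
        .notNull()
        .getLogicalType();
Schema avroSchema = AvroSchemaConverter.convertToSchema(rowType, "record");
// avroSchema is an Avro record named "record" with a long field "id" and a
// nullable (union with null, default null) string field "name".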

Aggregations

RowType (org.apache.flink.table.types.logical.RowType): 212 usages
RowData (org.apache.flink.table.data.RowData): 108 usages
LogicalType (org.apache.flink.table.types.logical.LogicalType): 59 usages
DataType (org.apache.flink.table.types.DataType): 57 usages
Transformation (org.apache.flink.api.dag.Transformation): 50 usages
ExecEdge (org.apache.flink.table.planner.plan.nodes.exec.ExecEdge): 46 usages
TableException (org.apache.flink.table.api.TableException): 37 usages
Test (org.junit.Test): 36 usages
GenericRowData (org.apache.flink.table.data.GenericRowData): 33 usages
ArrayList (java.util.ArrayList): 28 usages
List (java.util.List): 28 usages
OneInputTransformation (org.apache.flink.streaming.api.transformations.OneInputTransformation): 26 usages
RowDataKeySelector (org.apache.flink.table.runtime.keyselector.RowDataKeySelector): 25 usages
CodeGeneratorContext (org.apache.flink.table.planner.codegen.CodeGeneratorContext): 22 usages
TableConfig (org.apache.flink.table.api.TableConfig): 19 usages
ArrayType (org.apache.flink.table.types.logical.ArrayType): 19 usages
TimestampType (org.apache.flink.table.types.logical.TimestampType): 19 usages
DecimalType (org.apache.flink.table.types.logical.DecimalType): 17 usages
Collections (java.util.Collections): 16 usages
AggregateInfoList (org.apache.flink.table.planner.plan.utils.AggregateInfoList): 16 usages