
Example 11 with DeserializationSchema

use of org.apache.flink.api.common.serialization.DeserializationSchema in project flink by apache.

the class KafkaSource method createReader.

@VisibleForTesting
SourceReader<OUT, KafkaPartitionSplit> createReader(SourceReaderContext readerContext, Consumer<Collection<String>> splitFinishedHook) throws Exception {
    FutureCompletingBlockingQueue<RecordsWithSplitIds<ConsumerRecord<byte[], byte[]>>> elementsQueue = new FutureCompletingBlockingQueue<>();
    // Open the schema with an InitializationContext that exposes the reader's metric group
    // (under a "deserializer" sub-group) and the user code class loader.
    deserializationSchema.open(new DeserializationSchema.InitializationContext() {

        @Override
        public MetricGroup getMetricGroup() {
            return readerContext.metricGroup().addGroup("deserializer");
        }

        @Override
        public UserCodeClassLoader getUserCodeClassLoader() {
            return readerContext.getUserCodeClassLoader();
        }
    });
    final KafkaSourceReaderMetrics kafkaSourceReaderMetrics = new KafkaSourceReaderMetrics(readerContext.metricGroup());
    Supplier<KafkaPartitionSplitReader> splitReaderSupplier = () -> new KafkaPartitionSplitReader(props, readerContext, kafkaSourceReaderMetrics);
    KafkaRecordEmitter<OUT> recordEmitter = new KafkaRecordEmitter<>(deserializationSchema);
    return new KafkaSourceReader<>(elementsQueue, new KafkaSourceFetcherManager(elementsQueue, splitReaderSupplier::get, splitFinishedHook), recordEmitter, toConfiguration(props), readerContext, kafkaSourceReaderMetrics);
}
Also used : KafkaSourceFetcherManager(org.apache.flink.connector.kafka.source.reader.fetcher.KafkaSourceFetcherManager) MetricGroup(org.apache.flink.metrics.MetricGroup) KafkaSourceReaderMetrics(org.apache.flink.connector.kafka.source.metrics.KafkaSourceReaderMetrics) KafkaSourceReader(org.apache.flink.connector.kafka.source.reader.KafkaSourceReader) RecordsWithSplitIds(org.apache.flink.connector.base.source.reader.RecordsWithSplitIds) KafkaRecordDeserializationSchema(org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema) DeserializationSchema(org.apache.flink.api.common.serialization.DeserializationSchema) UserCodeClassLoader(org.apache.flink.util.UserCodeClassLoader) FutureCompletingBlockingQueue(org.apache.flink.connector.base.source.reader.synchronization.FutureCompletingBlockingQueue) KafkaRecordEmitter(org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter) KafkaPartitionSplitReader(org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting)
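
For context, the deserialization schema opened above is normally supplied when the source is built. The following is a minimal sketch with placeholder broker address, topic, and group id; setValueOnlyDeserializer wraps the plain DeserializationSchema so that createReader can later open it with the reader's metric group and class loader.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceUsageSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Placeholder connection settings; the value-only deserializer is a plain
        // DeserializationSchema<String> that the source opens once per reader.
        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("input-topic")
            .setGroupId("example-group")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();
        DataStream<String> stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");
        stream.print();
        env.execute("Kafka source sketch");
    }
}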

Example 12 with DeserializationSchema

use of org.apache.flink.api.common.serialization.DeserializationSchema in project flink by apache.

the class KinesisConsumerTest method testKinesisConsumerThrowsExceptionIfSchemaImplementsCollector.

@Test
public void testKinesisConsumerThrowsExceptionIfSchemaImplementsCollector() {
    DeserializationSchema<Object> schemaWithCollector = new DeserializationSchema<Object>() {

        @Override
        public Object deserialize(byte[] message) throws IOException {
            return null;
        }

        @Override
        public void deserialize(byte[] message, Collector<Object> out) throws IOException {
            // The implementation does not matter; the test only verifies that this
            // Collector-based overload is declared on the schema.
        }

        @Override
        public boolean isEndOfStream(Object nextElement) {
            return false;
        }

        @Override
        public TypeInformation<Object> getProducedType() {
            return null;
        }
    };
    thrown.expect(IllegalArgumentException.class);
    thrown.expectMessage("Kinesis consumer does not support DeserializationSchema that implements deserialization with a" + " Collector. Unsupported DeserializationSchema: " + "org.apache.flink.streaming.connectors.kinesis.KinesisConsumerTest");
    new FlinkKinesisConsumer<>("fakeStream", schemaWithCollector, new Properties());
}
Also used : Collector(org.apache.flink.util.Collector) Properties(java.util.Properties) DeserializationSchema(org.apache.flink.api.common.serialization.DeserializationSchema) Test(org.junit.Test)
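
As a counterpart to the test above, a schema that implements only deserialize(byte[]) is accepted by the consumer. A minimal sketch, assuming a placeholder stream name and region (credentials are omitted):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisConsumerUsageSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties consumerConfig = new Properties();
        // Placeholder region; a real job also needs AWS credentials configured here.
        consumerConfig.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        // SimpleStringSchema declares only deserialize(byte[]), so it passes the
        // Collector check exercised by the test above.
        DataStream<String> records = env.addSource(
            new FlinkKinesisConsumer<>("exampleStream", new SimpleStringSchema(), consumerConfig));
        records.print();
        env.execute("Kinesis consumer sketch");
    }
}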

Example 13 with DeserializationSchema

use of org.apache.flink.api.common.serialization.DeserializationSchema in project flink by apache.

the class OggJsonDecodingFormat method createRuntimeDecoder.

@Override
public DeserializationSchema<RowData> createRuntimeDecoder(DynamicTableSource.Context context, DataType physicalDataType) {
    final List<ReadableMetadata> readableMetadata = metadataKeys.stream().map(k -> Stream.of(ReadableMetadata.values()).filter(rm -> rm.key.equals(k)).findFirst().orElseThrow(IllegalStateException::new)).collect(Collectors.toList());
    final List<DataTypes.Field> metadataFields = readableMetadata.stream().map(m -> DataTypes.FIELD(m.key, m.dataType)).collect(Collectors.toList());
    final DataType producedDataType = DataTypeUtils.appendRowFields(physicalDataType, metadataFields);
    final TypeInformation<RowData> producedTypeInfo = context.createTypeInformation(producedDataType);
    return new OggJsonDeserializationSchema(physicalDataType, readableMetadata, producedTypeInfo, ignoreParseErrors, timestampFormat);
}
Also used : DataType(org.apache.flink.table.types.DataType) DynamicTableSource(org.apache.flink.table.connector.source.DynamicTableSource) RowData(org.apache.flink.table.data.RowData) DateTimeUtils(org.apache.flink.table.utils.DateTimeUtils) ChangelogMode(org.apache.flink.table.connector.ChangelogMode) DataTypes(org.apache.flink.table.api.DataTypes) TimestampFormat(org.apache.flink.formats.common.TimestampFormat) Collectors(java.util.stream.Collectors) DeserializationSchema(org.apache.flink.api.common.serialization.DeserializationSchema) LinkedHashMap(java.util.LinkedHashMap) DecodingFormat(org.apache.flink.table.connector.format.DecodingFormat) MetadataConverter(org.apache.flink.formats.json.ogg.OggJsonDeserializationSchema.MetadataConverter) List(java.util.List) GenericRowData(org.apache.flink.table.data.GenericRowData) Stream(java.util.stream.Stream) RowKind(org.apache.flink.types.RowKind) Map(java.util.Map) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) Collections(java.util.Collections) DataTypeUtils(org.apache.flink.table.types.utils.DataTypeUtils) RowData(org.apache.flink.table.data.RowData) GenericRowData(org.apache.flink.table.data.GenericRowData) DataType(org.apache.flink.table.types.DataType)
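
The produced data type handed to OggJsonDeserializationSchema is the physical row type with the requested metadata columns appended. A minimal sketch of that step in isolation, using a hypothetical metadata column name for illustration:

import java.util.Arrays;
import java.util.List;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.utils.DataTypeUtils;

public class ProducedDataTypeSketch {

    public static void main(String[] args) {
        // Physical schema of the table, as the planner passes it to createRuntimeDecoder.
        DataType physicalDataType = DataTypes.ROW(
            DataTypes.FIELD("id", DataTypes.BIGINT()),
            DataTypes.FIELD("name", DataTypes.STRING()));
        // Metadata columns requested by the query ("ingestion-timestamp" is illustrative).
        List<DataTypes.Field> metadataFields = Arrays.asList(
            DataTypes.FIELD("ingestion-timestamp", DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)));
        // Physical fields first, metadata fields appended at the end.
        DataType producedDataType = DataTypeUtils.appendRowFields(physicalDataType, metadataFields);
        System.out.println(producedDataType);
    }
}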

Example 14 with DeserializationSchema

use of org.apache.flink.api.common.serialization.DeserializationSchema in project flink by apache.

the class UpsertKafkaDynamicTableFactory method createDynamicTableSource.

@Override
public DynamicTableSource createDynamicTableSource(Context context) {
    FactoryUtil.TableFactoryHelper helper = FactoryUtil.createTableFactoryHelper(this, context);
    ReadableConfig tableOptions = helper.getOptions();
    DecodingFormat<DeserializationSchema<RowData>> keyDecodingFormat = helper.discoverDecodingFormat(DeserializationFormatFactory.class, KEY_FORMAT);
    DecodingFormat<DeserializationSchema<RowData>> valueDecodingFormat = helper.discoverDecodingFormat(DeserializationFormatFactory.class, VALUE_FORMAT);
    // Validate the option data type.
    helper.validateExcept(PROPERTIES_PREFIX);
    validateSource(tableOptions, keyDecodingFormat, valueDecodingFormat, context.getPrimaryKeyIndexes());
    Tuple2<int[], int[]> keyValueProjections = createKeyValueProjections(context.getCatalogTable());
    String keyPrefix = tableOptions.getOptional(KEY_FIELDS_PREFIX).orElse(null);
    Properties properties = getKafkaProperties(context.getCatalogTable().getOptions());
    // always use earliest to keep data integrity
    StartupMode earliest = StartupMode.EARLIEST;
    return new KafkaDynamicSource(context.getPhysicalRowDataType(), keyDecodingFormat, new DecodingFormatWrapper(valueDecodingFormat), keyValueProjections.f0, keyValueProjections.f1, keyPrefix, getSourceTopics(tableOptions), getSourceTopicPattern(tableOptions), properties, earliest, Collections.emptyMap(), 0, true, context.getObjectIdentifier().asSummaryString());
}
Also used : ReadableConfig(org.apache.flink.configuration.ReadableConfig) FactoryUtil(org.apache.flink.table.factories.FactoryUtil) StartupMode(org.apache.flink.streaming.connectors.kafka.config.StartupMode) KafkaConnectorOptionsUtil.getKafkaProperties(org.apache.flink.streaming.connectors.kafka.table.KafkaConnectorOptionsUtil.getKafkaProperties) Properties(java.util.Properties) DeserializationSchema(org.apache.flink.api.common.serialization.DeserializationSchema)
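
From the user side, the key and value decoding formats resolved above come from the 'key.format' and 'value.format' table options. A minimal DDL sketch, assuming a placeholder topic and broker address:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaTableSketch {

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // 'key.format' and 'value.format' are what discoverDecodingFormat(...) resolves;
        // the upsert-kafka source always reads from the earliest offsets, as shown above.
        tEnv.executeSql(
            "CREATE TABLE pageviews_per_region (\n"
                + "  region STRING,\n"
                + "  view_count BIGINT,\n"
                + "  PRIMARY KEY (region) NOT ENFORCED\n"
                + ") WITH (\n"
                + "  'connector' = 'upsert-kafka',\n"
                + "  'topic' = 'pageviews-per-region',\n"
                + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                + "  'key.format' = 'csv',\n"
                + "  'value.format' = 'json'\n"
                + ")");
    }
}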

Example 15 with DeserializationSchema

use of org.apache.flink.api.common.serialization.DeserializationSchema in project flink by apache.

the class FileSystemTableSource method getScanRuntimeProvider.

@Override
public ScanRuntimeProvider getScanRuntimeProvider(ScanContext scanContext) {
    // If the table is partitioned but no partitions exist yet, just return an empty source.
    if (!partitionKeys.isEmpty() && getOrFetchPartitions().isEmpty()) {
        return InputFormatProvider.of(new CollectionInputFormat<>(new ArrayList<>(), null));
    }
    // Resolve metadata and make sure to filter out metadata not in the producedDataType
    final List<String> metadataKeys = DataType.getFieldNames(producedDataType).stream().filter(((this.metadataKeys == null) ? Collections.emptyList() : this.metadataKeys)::contains).collect(Collectors.toList());
    final List<ReadableFileInfo> metadataToExtract = metadataKeys.stream().map(ReadableFileInfo::resolve).collect(Collectors.toList());
    // Filter out partition columns not in producedDataType
    final List<String> partitionKeysToExtract = DataType.getFieldNames(producedDataType).stream().filter(this.partitionKeys::contains).collect(Collectors.toList());
    // Compute the physical projection and the physical data type, that is
    // the type without partition columns and metadata in the same order of the schema
    DataType physicalDataType = physicalRowDataType;
    final Projection partitionKeysProjections = Projection.fromFieldNames(physicalDataType, partitionKeysToExtract);
    final Projection physicalProjections = (projectFields != null ? Projection.of(projectFields) : Projection.all(physicalDataType)).difference(partitionKeysProjections);
    physicalDataType = partitionKeysProjections.complement(physicalDataType).project(physicalDataType);
    if (bulkReaderFormat != null) {
        if (bulkReaderFormat instanceof BulkDecodingFormat && filters != null && filters.size() > 0) {
            ((BulkDecodingFormat<RowData>) bulkReaderFormat).applyFilters(filters);
        }
        BulkFormat<RowData, FileSourceSplit> format;
        if (bulkReaderFormat instanceof ProjectableDecodingFormat) {
            format = ((ProjectableDecodingFormat<BulkFormat<RowData, FileSourceSplit>>) bulkReaderFormat).createRuntimeDecoder(scanContext, physicalDataType, physicalProjections.toNestedIndexes());
        } else {
            format = new ProjectingBulkFormat(bulkReaderFormat.createRuntimeDecoder(scanContext, physicalDataType), physicalProjections.toTopLevelIndexes(), scanContext.createTypeInformation(physicalProjections.project(physicalDataType)));
        }
        format = wrapBulkFormat(scanContext, format, producedDataType, metadataToExtract, partitionKeysToExtract);
        return createSourceProvider(format);
    } else if (deserializationFormat != null) {
        BulkFormat<RowData, FileSourceSplit> format;
        if (deserializationFormat instanceof ProjectableDecodingFormat) {
            format = new DeserializationSchemaAdapter(((ProjectableDecodingFormat<DeserializationSchema<RowData>>) deserializationFormat).createRuntimeDecoder(scanContext, physicalDataType, physicalProjections.toNestedIndexes()));
        } else {
            format = new ProjectingBulkFormat(new DeserializationSchemaAdapter(deserializationFormat.createRuntimeDecoder(scanContext, physicalDataType)), physicalProjections.toTopLevelIndexes(), scanContext.createTypeInformation(physicalProjections.project(physicalDataType)));
        }
        format = wrapBulkFormat(scanContext, format, producedDataType, metadataToExtract, partitionKeysToExtract);
        return createSourceProvider(format);
    } else {
        throw new TableException("Can not find format factory.");
    }
}
Also used : TableException(org.apache.flink.table.api.TableException) ProjectableDecodingFormat(org.apache.flink.table.connector.format.ProjectableDecodingFormat) FileSourceSplit(org.apache.flink.connector.file.src.FileSourceSplit) ArrayList(java.util.ArrayList) Projection(org.apache.flink.table.connector.Projection) BulkDecodingFormat(org.apache.flink.connector.file.table.format.BulkDecodingFormat) DeserializationSchema(org.apache.flink.api.common.serialization.DeserializationSchema) RowData(org.apache.flink.table.data.RowData) DataType(org.apache.flink.table.types.DataType) BulkFormat(org.apache.flink.connector.file.src.reader.BulkFormat)
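
The partition-key handling above relies on the Projection utility: partition columns are projected out of the schema before the runtime decoder is created. A minimal sketch of that computation, assuming a hypothetical schema with one partition column "dt":

import java.util.Arrays;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.connector.Projection;
import org.apache.flink.table.types.DataType;

public class PhysicalProjectionSketch {

    public static void main(String[] args) {
        DataType schema = DataTypes.ROW(
            DataTypes.FIELD("id", DataTypes.BIGINT()),
            DataTypes.FIELD("name", DataTypes.STRING()),
            DataTypes.FIELD("dt", DataTypes.STRING()));
        Projection partitionKeys = Projection.fromFieldNames(schema, Arrays.asList("dt"));
        // Physical data type without the partition column, as used for the runtime decoder.
        DataType physicalDataType = partitionKeys.complement(schema).project(schema);
        System.out.println(physicalDataType);
        // Indexes of the remaining physical fields: [0, 1].
        Projection physicalProjection = Projection.all(schema).difference(partitionKeys);
        System.out.println(Arrays.toString(physicalProjection.toTopLevelIndexes()));
    }
}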

Aggregations

DeserializationSchema (org.apache.flink.api.common.serialization.DeserializationSchema): 17
RowData (org.apache.flink.table.data.RowData): 9
DataType (org.apache.flink.table.types.DataType): 9
DecodingFormat (org.apache.flink.table.connector.format.DecodingFormat): 7
DynamicTableSource (org.apache.flink.table.connector.source.DynamicTableSource): 7
Collections (java.util.Collections): 6
List (java.util.List): 6
Map (java.util.Map): 6
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation): 6
LinkedHashMap (java.util.LinkedHashMap): 5
Properties (java.util.Properties): 5
Collectors (java.util.stream.Collectors): 5
Stream (java.util.stream.Stream): 5
ReadableConfig (org.apache.flink.configuration.ReadableConfig): 5
DataTypes (org.apache.flink.table.api.DataTypes): 5
ChangelogMode (org.apache.flink.table.connector.ChangelogMode): 5
RowKind (org.apache.flink.types.RowKind): 5
FactoryUtil (org.apache.flink.table.factories.FactoryUtil): 4
IOException (java.io.IOException): 3
HashMap (java.util.HashMap): 3