
Example 1 with ScalaConversionUtils.asJavaOptional

Use of io.openlineage.spark.agent.util.ScalaConversionUtils.asJavaOptional in project OpenLineage by OpenLineage.

From the class KafkaRelationVisitor, method createDatasetsFromOptions.

private static <D extends OpenLineage.Dataset> List<D> createDatasetsFromOptions(DatasetFactory<D> datasetFactory, Map<String, String> sourceOptions, StructType schema) {
    // sourceOptions is a scala.collection.immutable.Map, so get() returns a scala.Option
    Optional<String> servers = asJavaOptional(sourceOptions.get("kafka.bootstrap.servers"));
    // don't support subscribePattern, as it will report dataset nodes that don't exist
    List<String> topics = Stream.concat(
        // handle "subscribe" and "topic" here to handle single topic reads/writes
        Stream.of("subscribe", "topic")
            .map(it -> sourceOptions.get(it))
            .filter(it -> it.nonEmpty())
            .map(it -> it.get())
            .map(String.class::cast),
        // "assign" is a JSON object keyed by topic name, see
        // https://spark.apache.org/docs/3.1.2/structured-streaming-kafka-integration.html
        ScalaConversionUtils.asJavaOptional(sourceOptions.get("assign")).map((String str) -> {
            try {
                JsonNode jsonNode = new ObjectMapper().readTree(str);
                long fieldCount = jsonNode.size();
                // combine characteristics with |; SIZED & IMMUTABLE evaluates to 0
                return StreamSupport.stream(Spliterators.spliterator(jsonNode.fieldNames(), fieldCount, Spliterator.SIZED | Spliterator.IMMUTABLE), false);
            } catch (IOException e) {
                log.warn("Unable to find topics from Kafka source configuration {}", str, e);
            }
            return Stream.<String>empty();
        }).orElse(Stream.empty())
    ).collect(Collectors.toList());
    String server = servers
        // URI.create needs a scheme in order to parse host and port
        .map(str -> str.matches("\\w+://.*") ? str : "PLAINTEXT://" + str)
        // only the first bootstrap server is used for the namespace
        .map(str -> URI.create(str.split(",")[0]))
        .map(uri -> uri.getHost() + ":" + uri.getPort())
        .orElse("");
    String namespace = "kafka://" + server;
    return topics.stream().map(topic -> datasetFactory.getDataset(topic, namespace, schema)).collect(Collectors.toList());
}
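
The namespace derivation at the end of the method above can be tried in isolation. The following is a minimal sketch using only the JDK; the class name KafkaNamespaceSketch and the helper namespaceFor are hypothetical and not part of OpenLineage, and the broker addresses are made-up sample values.

```java
import java.net.URI;
import java.util.Optional;

public class KafkaNamespaceSketch {

    // Hypothetical helper: mirrors the server-normalisation step of
    // createDatasetsFromOptions, without any Spark or OpenLineage types.
    static String namespaceFor(Optional<String> bootstrapServers) {
        String server = bootstrapServers
            // URI.create needs a scheme to parse host and port, so add one if missing.
            .map(s -> s.matches("\\w+://.*") ? s : "PLAINTEXT://" + s)
            // Only the first broker of the comma-separated list is kept.
            .map(s -> URI.create(s.split(",")[0]))
            .map(uri -> uri.getHost() + ":" + uri.getPort())
            // No bootstrap servers configured: fall back to an empty authority.
            .orElse("");
        return "kafka://" + server;
    }

    public static void main(String[] args) {
        // prints "kafka://broker1:9092"
        System.out.println(namespaceFor(Optional.of("broker1:9092,broker2:9092")));
        // prints "kafka://broker1:9093"
        System.out.println(namespaceFor(Optional.of("SSL://broker1:9093")));
        // prints "kafka://"
        System.out.println(namespaceFor(Optional.empty()));
    }
}
```

Note that URI.getPort() returns -1 when the address carries no port, so an input such as "broker1" would yield "kafka://broker1:-1" with this logic.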
Also used: Map$(scala.collection.immutable.Map$) Spliterators(java.util.Spliterators) LogicalRelation(org.apache.spark.sql.execution.datasources.LogicalRelation) ScalaConversionUtils.asJavaOptional(io.openlineage.spark.agent.util.ScalaConversionUtils.asJavaOptional) JsonNode(com.fasterxml.jackson.databind.JsonNode) StreamSupport(java.util.stream.StreamSupport) URI(java.net.URI) StructType(org.apache.spark.sql.types.StructType) SaveMode(org.apache.spark.sql.SaveMode) LogicalPlan(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan) QueryPlanVisitor(io.openlineage.spark.api.QueryPlanVisitor) DatasetFactory(io.openlineage.spark.api.DatasetFactory) OpenLineageContext(io.openlineage.spark.api.OpenLineageContext) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) IOException(java.io.IOException) ScalaConversionUtils(io.openlineage.spark.agent.util.ScalaConversionUtils) Field(java.lang.reflect.Field) Collectors(java.util.stream.Collectors) KafkaSourceProvider(org.apache.spark.sql.kafka010.KafkaSourceProvider) List(java.util.List) Slf4j(lombok.extern.slf4j.Slf4j) Stream(java.util.stream.Stream) Map(scala.collection.immutable.Map) Optional(java.util.Optional) CreatableRelationProvider(org.apache.spark.sql.sources.CreatableRelationProvider) KafkaRelation(org.apache.spark.sql.kafka010.KafkaRelation) OpenLineage(io.openlineage.client.OpenLineage) Spliterator(java.util.Spliterator)

Aggregations

JsonNode (com.fasterxml.jackson.databind.JsonNode): 1
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
OpenLineage (io.openlineage.client.OpenLineage): 1
ScalaConversionUtils (io.openlineage.spark.agent.util.ScalaConversionUtils): 1
ScalaConversionUtils.asJavaOptional (io.openlineage.spark.agent.util.ScalaConversionUtils.asJavaOptional): 1
DatasetFactory (io.openlineage.spark.api.DatasetFactory): 1
OpenLineageContext (io.openlineage.spark.api.OpenLineageContext): 1
QueryPlanVisitor (io.openlineage.spark.api.QueryPlanVisitor): 1
IOException (java.io.IOException): 1
Field (java.lang.reflect.Field): 1
URI (java.net.URI): 1
List (java.util.List): 1
Optional (java.util.Optional): 1
Spliterator (java.util.Spliterator): 1
Spliterators (java.util.Spliterators): 1
Collectors (java.util.stream.Collectors): 1
Stream (java.util.stream.Stream): 1
StreamSupport (java.util.stream.StreamSupport): 1
Slf4j (lombok.extern.slf4j.Slf4j): 1
SaveMode (org.apache.spark.sql.SaveMode): 1