Example 1 with TransformEvaluator

Use of com.cloudera.dataflow.spark.TransformEvaluator in the project spark-dataflow by cloudera.

From the class StreamingTransformTranslator, method kafka.

private static <K, V> TransformEvaluator<KafkaIO.Read.Unbound<K, V>> kafka() {
    return new TransformEvaluator<KafkaIO.Read.Unbound<K, V>>() {

        @Override
        public void evaluate(KafkaIO.Read.Unbound<K, V> transform, EvaluationContext context) {
            StreamingEvaluationContext sec = (StreamingEvaluationContext) context;
            JavaStreamingContext jssc = sec.getStreamingContext();
            Class<K> keyClazz = transform.getKeyClass();
            Class<V> valueClazz = transform.getValueClass();
            Class<? extends Decoder<K>> keyDecoderClazz = transform.getKeyDecoderClass();
            Class<? extends Decoder<V>> valueDecoderClazz = transform.getValueDecoderClass();
            Map<String, String> kafkaParams = transform.getKafkaParams();
            Set<String> topics = transform.getTopics();
            JavaPairInputDStream<K, V> inputPairStream = KafkaUtils.createDirectStream(jssc, keyClazz, valueClazz, keyDecoderClazz, valueDecoderClazz, kafkaParams, topics);
            JavaDStream<WindowedValue<KV<K, V>>> inputStream = inputPairStream.map(new Function<Tuple2<K, V>, KV<K, V>>() {

                @Override
                public KV<K, V> call(Tuple2<K, V> t2) throws Exception {
                    return KV.of(t2._1(), t2._2());
                }
            }).map(WindowingHelpers.<KV<K, V>>windowFunction());
            sec.setStream(transform, inputStream);
        }
    };
}
Also used: KafkaIO (com.cloudera.dataflow.io.KafkaIO), TransformEvaluator (com.cloudera.dataflow.spark.TransformEvaluator), JavaStreamingContext (org.apache.spark.streaming.api.java.JavaStreamingContext), DoFnFunction (com.cloudera.dataflow.spark.DoFnFunction), Function (org.apache.spark.api.java.function.Function), KV (com.google.cloud.dataflow.sdk.values.KV), Tuple2 (scala.Tuple2), WindowedValue (com.google.cloud.dataflow.sdk.util.WindowedValue), EvaluationContext (com.cloudera.dataflow.spark.EvaluationContext)

Example 2 with TransformEvaluator

Use of com.cloudera.dataflow.spark.TransformEvaluator in the project spark-dataflow by cloudera.

From the class StreamingTransformTranslator, method window.

private static <T, W extends BoundedWindow> TransformEvaluator<Window.Bound<T>> window() {
    return new TransformEvaluator<Window.Bound<T>>() {

        @Override
        public void evaluate(Window.Bound<T> transform, EvaluationContext context) {
            StreamingEvaluationContext sec = (StreamingEvaluationContext) context;
            // First, apply windowing to the stream itself.
            WindowFn<? super T, W> windowFn = WINDOW_FG.get("windowFn", transform);
            @SuppressWarnings("unchecked")
            JavaDStream<WindowedValue<T>> dStream = (JavaDStream<WindowedValue<T>>) sec.getStream(transform);
            if (windowFn instanceof FixedWindows) {
                Duration windowDuration = Durations.milliseconds(((FixedWindows) windowFn).getSize().getMillis());
                sec.setStream(transform, dStream.window(windowDuration));
            } else if (windowFn instanceof SlidingWindows) {
                Duration windowDuration = Durations.milliseconds(((SlidingWindows) windowFn).getSize().getMillis());
                Duration slideDuration = Durations.milliseconds(((SlidingWindows) windowFn).getPeriod().getMillis());
                sec.setStream(transform, dStream.window(windowDuration, slideDuration));
            }
            // Then, assign windows to the individual elements.
            DoFn<T, T> addWindowsDoFn = new AssignWindowsDoFn<>(windowFn);
            // Reuse the already-cast streaming context instead of casting `context` a second time.
            DoFnFunction<T, T> dofn = new DoFnFunction<>(addWindowsDoFn, sec.getRuntimeContext(), null);
            @SuppressWarnings("unchecked")
            JavaDStreamLike<WindowedValue<T>, ?, JavaRDD<WindowedValue<T>>> dstream = (JavaDStreamLike<WindowedValue<T>, ?, JavaRDD<WindowedValue<T>>>) sec.getStream(transform);
            sec.setStream(transform, dstream.mapPartitions(dofn));
        }
    };
}
Also used: BoundedWindow (com.google.cloud.dataflow.sdk.transforms.windowing.BoundedWindow), Window (com.google.cloud.dataflow.sdk.transforms.windowing.Window), FixedWindows (com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows), Duration (org.apache.spark.streaming.Duration), AssignWindowsDoFn (com.google.cloud.dataflow.sdk.util.AssignWindowsDoFn), JavaDStream (org.apache.spark.streaming.api.java.JavaDStream), TransformEvaluator (com.cloudera.dataflow.spark.TransformEvaluator), JavaRDD (org.apache.spark.api.java.JavaRDD), DoFnFunction (com.cloudera.dataflow.spark.DoFnFunction), WindowedValue (com.google.cloud.dataflow.sdk.util.WindowedValue), JavaDStreamLike (org.apache.spark.streaming.api.java.JavaDStreamLike), EvaluationContext (com.cloudera.dataflow.spark.EvaluationContext), SlidingWindows (com.google.cloud.dataflow.sdk.transforms.windowing.SlidingWindows)
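
The window method above distinguishes fixed windows (a size only) from sliding windows (a size and a period) before assigning windows to elements. As a rough illustration of the arithmetic behind that distinction, the following is a minimal, dependency-free sketch. It is not the code of AssignWindowsDoFn or of either SDK; the class WindowAssignSketch and its methods are hypothetical names, and it assumes windows aligned to the epoch with no offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of spark-dataflow or the Dataflow SDK.
public class WindowAssignSketch {

    /** Start of the fixed window [start, start + sizeMs) that contains timestampMs. */
    public static long fixedWindowStart(long timestampMs, long sizeMs) {
        // floorMod keeps the result correct for negative timestamps as well.
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /**
     * Starts of every sliding window [start, start + sizeMs) that contains
     * timestampMs, where a new window begins every periodMs. Latest start first.
     */
    public static List<Long> slidingWindowStarts(long timestampMs, long sizeMs, long periodMs) {
        List<Long> starts = new ArrayList<>();
        // The latest window that can contain the timestamp starts at the
        // most recent multiple of the period.
        long lastStart = timestampMs - Math.floorMod(timestampMs, periodMs);
        // Walk backwards one period at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= periodMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(fixedWindowStart(12_500L, 5_000L));
        System.out.println(slidingWindowStarts(12_500L, 10_000L, 5_000L));
    }
}
```

With a size of 10 s and a period of 5 s, each element falls into two overlapping windows, which is why the sliding branch passes both a window duration and a slide duration to dStream.window.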

Aggregations

DoFnFunction (com.cloudera.dataflow.spark.DoFnFunction): 2
EvaluationContext (com.cloudera.dataflow.spark.EvaluationContext): 2
TransformEvaluator (com.cloudera.dataflow.spark.TransformEvaluator): 2
WindowedValue (com.google.cloud.dataflow.sdk.util.WindowedValue): 2
KafkaIO (com.cloudera.dataflow.io.KafkaIO): 1
BoundedWindow (com.google.cloud.dataflow.sdk.transforms.windowing.BoundedWindow): 1
FixedWindows (com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows): 1
SlidingWindows (com.google.cloud.dataflow.sdk.transforms.windowing.SlidingWindows): 1
Window (com.google.cloud.dataflow.sdk.transforms.windowing.Window): 1
AssignWindowsDoFn (com.google.cloud.dataflow.sdk.util.AssignWindowsDoFn): 1
KV (com.google.cloud.dataflow.sdk.values.KV): 1
JavaRDD (org.apache.spark.api.java.JavaRDD): 1
Function (org.apache.spark.api.java.function.Function): 1
Duration (org.apache.spark.streaming.Duration): 1
JavaDStream (org.apache.spark.streaming.api.java.JavaDStream): 1
JavaDStreamLike (org.apache.spark.streaming.api.java.JavaDStreamLike): 1
JavaStreamingContext (org.apache.spark.streaming.api.java.JavaStreamingContext): 1
Tuple2 (scala.Tuple2): 1