Example 1 with MetricsContainer

Use of org.apache.beam.sdk.metrics.MetricsContainer in project beam by apache.

From the class BatchedStreamingWrite, method reportStreamingApiLogging.

private void reportStreamingApiLogging(BigQueryOptions options) {
    MetricsContainer processWideContainer = MetricsEnvironment.getProcessWideContainer();
    if (processWideContainer instanceof MetricsLogger) {
        MetricsLogger processWideMetricsLogger = (MetricsLogger) processWideContainer;
        processWideMetricsLogger.tryLoggingMetrics("API call Metrics: \n", this.allowedMetricUrns, options.getBqStreamingApiLoggingFrequencySec() * 1000L);
    }
}
Also used: MetricsLogger (org.apache.beam.runners.core.metrics.MetricsLogger), MetricsContainer (org.apache.beam.sdk.metrics.MetricsContainer)
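
For context, reportStreamingApiLogging only logs if a process-wide container has been installed beforehand. A minimal sketch of that wiring, assuming MetricsContainerImpl as the container implementation and hypothetical metric names; this illustrates the mechanism, it is not the wiring Beam itself performs:

import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricName;
import org.apache.beam.sdk.metrics.MetricsEnvironment;

private static void installProcessWideMetrics() {
    // Install a process-wide container so getProcessWideContainer() returns a non-null instance.
    MetricsContainerImpl container = new MetricsContainerImpl("process-wide");
    MetricsEnvironment.setProcessWideContainer(container);
    // Counters resolved against the container accumulate in it directly.
    // "bigquery" / "api_calls" are hypothetical names for illustration.
    Counter apiCalls = container.getCounter(MetricName.named("bigquery", "api_calls"));
    apiCalls.inc();
}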

Example 2 with MetricsContainer

Use of org.apache.beam.sdk.metrics.MetricsContainer in project beam by apache.

From the class StateSpecFunctions, method mapSourceFunction.

/**
 * A {@link org.apache.spark.streaming.StateSpec} function to support reading from an {@link
 * UnboundedSource}.
 *
 * <p>This StateSpec function expects the following:
 *
 * <ul>
 *   <li>Key: The (partitioned) Source to read from.
 *   <li>Value: An optional {@link UnboundedSource.CheckpointMark} to start from.
 *   <li>State: A byte representation of the (previously) persisted CheckpointMark.
 * </ul>
 *
 * And returns an iterator over all read values (for the micro-batch).
 *
 * <p>This stateful operation could be described as a flatMap over a single-element stream, which
 * outputs all the elements read from the {@link UnboundedSource} for this micro-batch. Since
 * micro-batches are bounded, the provided UnboundedSource is wrapped by a {@link
 * MicrobatchSource} that applies bounds in the form of duration and max records (per
 * micro-batch).
 *
 * <p>In order to avoid using Spark Guava's classes which pollute the classpath, we use the {@link
 * StateSpec#function(scala.Function3)} signature which employs scala's native {@link
 * scala.Option}, instead of the {@link
 * StateSpec#function(org.apache.spark.api.java.function.Function3)} signature, which employs
 * Guava's {@link Optional}.
 *
 * <p>See also <a href="https://issues.apache.org/jira/browse/SPARK-4819">SPARK-4819</a>.
 *
 * @param options A serializable {@link SerializablePipelineOptions}.
 * @param <T> The type of the input stream elements.
 * @param <CheckpointMarkT> The type of the {@link UnboundedSource.CheckpointMark}.
 * @return The appropriate {@link org.apache.spark.streaming.StateSpec} function.
 */
public static <T, CheckpointMarkT extends UnboundedSource.CheckpointMark> scala.Function3<Source<T>, Option<CheckpointMarkT>, State<Tuple2<byte[], Instant>>, Tuple2<Iterable<byte[]>, Metadata>> mapSourceFunction(final SerializablePipelineOptions options, final String stepName) {
    return new SerializableFunction3<Source<T>, Option<CheckpointMarkT>, State<Tuple2<byte[], Instant>>, Tuple2<Iterable<byte[]>, Metadata>>() {

        @Override
        public Tuple2<Iterable<byte[]>, Metadata> apply(Source<T> source, Option<CheckpointMarkT> startCheckpointMark, State<Tuple2<byte[], Instant>> state) {
            MetricsContainerStepMap metricsContainers = new MetricsContainerStepMap();
            MetricsContainer metricsContainer = metricsContainers.getContainer(stepName);
            // Scope the metrics container around the reader's methods, since they may report metrics.
            try (Closeable ignored = MetricsEnvironment.scopedMetricsContainer(metricsContainer)) {
                // The source arrives wrapped as a MicrobatchSource; cast accordingly.
                MicrobatchSource<T, CheckpointMarkT> microbatchSource = (MicrobatchSource<T, CheckpointMarkT>) source;
                // Initial high/low watermarks.
                Instant lowWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE;
                final Instant highWatermark;
                // if state exists, use it, otherwise it's first time so use the startCheckpointMark.
                // startCheckpointMark may be EmptyCheckpointMark (the Spark Java API tries to apply
                // Optional(null)), which is handled by the UnboundedSource implementation.
                Coder<CheckpointMarkT> checkpointCoder = microbatchSource.getCheckpointMarkCoder();
                CheckpointMarkT checkpointMark;
                if (state.exists()) {
                    // previous (output) watermark is now the low watermark.
                    lowWatermark = state.get()._2();
                    checkpointMark = CoderHelpers.fromByteArray(state.get()._1(), checkpointCoder);
                    LOG.info("Continue reading from an existing CheckpointMark.");
                } else if (startCheckpointMark.isDefined() && !startCheckpointMark.get().equals(EmptyCheckpointMark.get())) {
                    checkpointMark = startCheckpointMark.get();
                    LOG.info("Start reading from a provided CheckpointMark.");
                } else {
                    checkpointMark = null;
                    LOG.info("No CheckpointMark provided, start reading from default.");
                }
                // create reader.
                final MicrobatchSource.Reader /*<T>*/ microbatchReader;
                final Stopwatch stopwatch = Stopwatch.createStarted();
                long readDurationMillis = 0;
                try {
                    microbatchReader = (MicrobatchSource.Reader) microbatchSource.getOrCreateReader(options.get(), checkpointMark);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                // read microbatch as a serialized collection.
                final List<byte[]> readValues = new ArrayList<>();
                WindowedValue.FullWindowedValueCoder<T> coder = WindowedValue.FullWindowedValueCoder.of(source.getOutputCoder(), GlobalWindow.Coder.INSTANCE);
                try {
                    // measure how long a read takes per-partition.
                    boolean finished = !microbatchReader.start();
                    while (!finished) {
                        final WindowedValue<T> wv = WindowedValue.of((T) microbatchReader.getCurrent(), microbatchReader.getCurrentTimestamp(), GlobalWindow.INSTANCE, PaneInfo.NO_FIRING);
                        readValues.add(CoderHelpers.toByteArray(wv, coder));
                        finished = !microbatchReader.advance();
                    }
                    // end-of-read watermark is the high watermark, but don't allow decrease.
                    final Instant sourceWatermark = microbatchReader.getWatermark();
                    highWatermark = sourceWatermark.isAfter(lowWatermark) ? sourceWatermark : lowWatermark;
                    readDurationMillis = stopwatch.stop().elapsed(TimeUnit.MILLISECONDS);
                    LOG.info("Source id {} spent {} millis on reading.", microbatchSource.getId(), readDurationMillis);
                    // serialize the end-of-read CheckpointMark so it can be persisted in the state.
                    @SuppressWarnings("unchecked") final CheckpointMarkT finishedReadCheckpointMark = (CheckpointMarkT) microbatchReader.getCheckpointMark();
                    byte[] codedCheckpoint = CoderHelpers.toByteArray(finishedReadCheckpointMark, checkpointCoder);
                    // persist the end-of-read (high) watermark for following read, where it will become
                    // the next low watermark.
                    state.update(new Tuple2<>(codedCheckpoint, highWatermark));
                } catch (IOException e) {
                    throw new RuntimeException("Failed to read from reader.", e);
                }
                final ArrayList<byte[]> payload = Lists.newArrayList(Iterators.unmodifiableIterator(readValues.iterator()));
                return new Tuple2<>(payload, new Metadata(readValues.size(), lowWatermark, highWatermark, readDurationMillis, metricsContainers));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    };
}
Also used: MetricsContainerStepMap (org.apache.beam.runners.core.metrics.MetricsContainerStepMap), Closeable (java.io.Closeable), Metadata (org.apache.beam.runners.spark.io.SparkUnboundedSource.Metadata), Stopwatch (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Stopwatch), ArrayList (java.util.ArrayList), UnboundedSource (org.apache.beam.sdk.io.UnboundedSource), Source (org.apache.beam.sdk.io.Source), MicrobatchSource (org.apache.beam.runners.spark.io.MicrobatchSource), MetricsContainer (org.apache.beam.sdk.metrics.MetricsContainer), WindowedValue (org.apache.beam.sdk.util.WindowedValue), Instant (org.joda.time.Instant), IOException (java.io.IOException), Tuple2 (scala.Tuple2), State (org.apache.spark.streaming.State), Option (scala.Option)
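
The pattern to note above is the scoped container: any metric updated while the scope is open is attributed to that step's container. A minimal standalone sketch of the same mechanism, with a hypothetical step name ("read-step"):

import java.io.Closeable;
import java.io.IOException;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsContainer;
import org.apache.beam.sdk.metrics.MetricsEnvironment;

private static void scopedMetricsSketch() throws IOException {
    MetricsContainerStepMap containers = new MetricsContainerStepMap();
    MetricsContainer container = containers.getContainer("read-step");
    try (Closeable ignored = MetricsEnvironment.scopedMetricsContainer(container)) {
        // Any metric updated inside this scope lands in "read-step"'s container,
        // which is how reader-reported metrics end up in metricsContainers above.
        Metrics.counter("source", "records-read").inc();
    }
}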

Example 3 with MetricsContainer

Use of org.apache.beam.sdk.metrics.MetricsContainer in project beam by apache.

From the class FlinkMetricContainerTest, method testDistribution.

@Test
public void testDistribution() {
    FlinkMetricContainer.FlinkDistributionGauge flinkGauge = new FlinkMetricContainer.FlinkDistributionGauge(DistributionResult.IDENTITY_ELEMENT);
    when(metricGroup.gauge(eq("namespace.name"), anyObject())).thenReturn(flinkGauge);
    MetricsContainer step = container.getMetricsContainer("step");
    MetricName metricName = MetricName.named("namespace", "name");
    Distribution distribution = step.getDistribution(metricName);
    assertThat(flinkGauge.getValue(), is(DistributionResult.IDENTITY_ELEMENT));
    // the first updateMetrics call installs the mocked distribution gauge
    container.updateMetrics("step");
    distribution.update(42);
    distribution.update(-23);
    distribution.update(0);
    distribution.update(1);
    container.updateMetrics("step");
    assertThat(flinkGauge.getValue().getMax(), is(42L));
    assertThat(flinkGauge.getValue().getMin(), is(-23L));
    assertThat(flinkGauge.getValue().getCount(), is(4L));
    assertThat(flinkGauge.getValue().getSum(), is(20L));
    assertThat(flinkGauge.getValue().getMean(), is(5.0));
}
Also used: MetricName (org.apache.beam.sdk.metrics.MetricName), MonitoringInfoMetricName (org.apache.beam.runners.core.metrics.MonitoringInfoMetricName), MetricsContainer (org.apache.beam.sdk.metrics.MetricsContainer), Distribution (org.apache.beam.sdk.metrics.Distribution), FlinkDistributionGauge (org.apache.beam.runners.flink.metrics.FlinkMetricContainer.FlinkDistributionGauge), Test (org.junit.Test)
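
As the assertions show, a Beam Distribution tracks sum, count, min, and max, with the mean derived from sum and count. The same arithmetic in a standalone sketch, assuming MetricsContainerImpl as the container implementation:

import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.MetricName;

private static void distributionSemanticsSketch() {
    MetricsContainerImpl container = new MetricsContainerImpl("step");
    Distribution distribution = container.getDistribution(MetricName.named("namespace", "name"));
    distribution.update(42);
    distribution.update(-23);
    distribution.update(0);
    distribution.update(1);
    // sum = 20, count = 4, min = -23, max = 42, mean = 20 / 4 = 5.0,
    // matching the values asserted against the Flink gauge above.
}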

Example 4 with MetricsContainer

Use of org.apache.beam.sdk.metrics.MetricsContainer in project beam by apache.

From the class StateSpecFunctions, method mapSourceFunction (an earlier revision of the code in Example 2).

/**
   * A {@link org.apache.spark.streaming.StateSpec} function to support reading from
   * an {@link UnboundedSource}.
   *
   * <p>This StateSpec function expects the following:
   * <ul>
   * <li>Key: The (partitioned) Source to read from.</li>
   * <li>Value: An optional {@link UnboundedSource.CheckpointMark} to start from.</li>
   * <li>State: A byte representation of the (previously) persisted CheckpointMark.</li>
   * </ul>
   * And returns an iterator over all read values (for the micro-batch).
   *
   * <p>This stateful operation could be described as a flatMap over a single-element stream, which
   * outputs all the elements read from the {@link UnboundedSource} for this micro-batch.
   * Since micro-batches are bounded, the provided UnboundedSource is wrapped by a
   * {@link MicrobatchSource} that applies bounds in the form of duration and max records
   * (per micro-batch).
   *
   * <p>In order to avoid using Spark Guava's classes which pollute the
   * classpath, we use the {@link StateSpec#function(scala.Function3)} signature which employs
   * scala's native {@link scala.Option}, instead of the
   * {@link StateSpec#function(org.apache.spark.api.java.function.Function3)} signature,
   * which employs Guava's {@link com.google.common.base.Optional}.
   *
   * <p>See also <a href="https://issues.apache.org/jira/browse/SPARK-4819">SPARK-4819</a>.</p>
   *
   * @param runtimeContext    A serializable {@link SparkRuntimeContext}.
   * @param <T>               The type of the input stream elements.
   * @param <CheckpointMarkT> The type of the {@link UnboundedSource.CheckpointMark}.
   * @return The appropriate {@link org.apache.spark.streaming.StateSpec} function.
   */
public static <T, CheckpointMarkT extends UnboundedSource.CheckpointMark> scala.Function3<Source<T>, scala.Option<CheckpointMarkT>, State<Tuple2<byte[], Instant>>, Tuple2<Iterable<byte[]>, Metadata>> mapSourceFunction(final SparkRuntimeContext runtimeContext, final String stepName) {
    return new SerializableFunction3<Source<T>, Option<CheckpointMarkT>, State<Tuple2<byte[], Instant>>, Tuple2<Iterable<byte[]>, Metadata>>() {

        @Override
        public Tuple2<Iterable<byte[]>, Metadata> apply(Source<T> source, scala.Option<CheckpointMarkT> startCheckpointMark, State<Tuple2<byte[], Instant>> state) {
            MetricsContainerStepMap metricsContainers = new MetricsContainerStepMap();
            MetricsContainer metricsContainer = metricsContainers.getContainer(stepName);
            // Scope the metrics container around the reader's methods, since they may report metrics.
            try (Closeable ignored = MetricsEnvironment.scopedMetricsContainer(metricsContainer)) {
                // The source arrives wrapped as a MicrobatchSource; cast accordingly.
                MicrobatchSource<T, CheckpointMarkT> microbatchSource = (MicrobatchSource<T, CheckpointMarkT>) source;
                // Initial high/low watermarks.
                Instant lowWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE;
                final Instant highWatermark;
                // if state exists, use it, otherwise it's first time so use the startCheckpointMark.
                // startCheckpointMark may be EmptyCheckpointMark (the Spark Java API tries to apply
                // Optional(null)), which is handled by the UnboundedSource implementation.
                Coder<CheckpointMarkT> checkpointCoder = microbatchSource.getCheckpointMarkCoder();
                CheckpointMarkT checkpointMark;
                if (state.exists()) {
                    // previous (output) watermark is now the low watermark.
                    lowWatermark = state.get()._2();
                    checkpointMark = CoderHelpers.fromByteArray(state.get()._1(), checkpointCoder);
                    LOG.info("Continue reading from an existing CheckpointMark.");
                } else if (startCheckpointMark.isDefined() && !startCheckpointMark.get().equals(EmptyCheckpointMark.get())) {
                    checkpointMark = startCheckpointMark.get();
                    LOG.info("Start reading from a provided CheckpointMark.");
                } else {
                    checkpointMark = null;
                    LOG.info("No CheckpointMark provided, start reading from default.");
                }
                // create reader.
                final MicrobatchSource.Reader /*<T>*/ microbatchReader;
                final Stopwatch stopwatch = Stopwatch.createStarted();
                long readDurationMillis = 0;
                try {
                    microbatchReader = (MicrobatchSource.Reader) microbatchSource.getOrCreateReader(runtimeContext.getPipelineOptions(), checkpointMark);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                // read microbatch as a serialized collection.
                final List<byte[]> readValues = new ArrayList<>();
                WindowedValue.FullWindowedValueCoder<T> coder = WindowedValue.FullWindowedValueCoder.of(source.getDefaultOutputCoder(), GlobalWindow.Coder.INSTANCE);
                try {
                    // measure how long a read takes per-partition.
                    boolean finished = !microbatchReader.start();
                    while (!finished) {
                        final WindowedValue<T> wv = WindowedValue.of((T) microbatchReader.getCurrent(), microbatchReader.getCurrentTimestamp(), GlobalWindow.INSTANCE, PaneInfo.NO_FIRING);
                        readValues.add(CoderHelpers.toByteArray(wv, coder));
                        finished = !microbatchReader.advance();
                    }
                    // end-of-read watermark is the high watermark, but don't allow decrease.
                    final Instant sourceWatermark = microbatchReader.getWatermark();
                    highWatermark = sourceWatermark.isAfter(lowWatermark) ? sourceWatermark : lowWatermark;
                    readDurationMillis = stopwatch.stop().elapsed(TimeUnit.MILLISECONDS);
                    LOG.info("Source id {} spent {} millis on reading.", microbatchSource.getId(), readDurationMillis);
                    // if the Source does not supply a CheckpointMark skip updating the state.
                    @SuppressWarnings("unchecked") final CheckpointMarkT finishedReadCheckpointMark = (CheckpointMarkT) microbatchReader.getCheckpointMark();
                    byte[] codedCheckpoint = new byte[0];
                    if (finishedReadCheckpointMark != null) {
                        codedCheckpoint = CoderHelpers.toByteArray(finishedReadCheckpointMark, checkpointCoder);
                    } else {
                        LOG.info("Skipping checkpoint marking because the reader failed to supply one.");
                    }
                    // persist the end-of-read (high) watermark for following read, where it will become
                    // the next low watermark.
                    state.update(new Tuple2<>(codedCheckpoint, highWatermark));
                } catch (IOException e) {
                    throw new RuntimeException("Failed to read from reader.", e);
                }
                final ArrayList<byte[]> payload = Lists.newArrayList(Iterators.unmodifiableIterator(readValues.iterator()));
                return new Tuple2<>((Iterable<byte[]>) payload, new Metadata(readValues.size(), lowWatermark, highWatermark, readDurationMillis, metricsContainers));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    };
}
Also used: MetricsContainerStepMap (org.apache.beam.runners.core.metrics.MetricsContainerStepMap), Closeable (java.io.Closeable), Metadata (org.apache.beam.runners.spark.io.SparkUnboundedSource.Metadata), Stopwatch (com.google.common.base.Stopwatch), ArrayList (java.util.ArrayList), UnboundedSource (org.apache.beam.sdk.io.UnboundedSource), Source (org.apache.beam.sdk.io.Source), MicrobatchSource (org.apache.beam.runners.spark.io.MicrobatchSource), MetricsContainer (org.apache.beam.sdk.metrics.MetricsContainer), WindowedValue (org.apache.beam.sdk.util.WindowedValue), Instant (org.joda.time.Instant), IOException (java.io.IOException), Tuple2 (scala.Tuple2), State (org.apache.spark.streaming.State), Option (scala.Option)

Example 5 with MetricsContainer

Use of org.apache.beam.sdk.metrics.MetricsContainer in project beam by apache.

From the class LabeledMetricsTest, method testOperationsUpdateCounterFromContainerWhenContainerIsPresent.

@Test
public void testOperationsUpdateCounterFromContainerWhenContainerIsPresent() {
    HashMap<String, String> labels = new HashMap<String, String>();
    String urn = MonitoringInfoConstants.Urns.ELEMENT_COUNT;
    MonitoringInfoMetricName name = MonitoringInfoMetricName.named(urn, labels);
    MetricsContainer mockContainer = Mockito.mock(MetricsContainer.class);
    Counter mockCounter = Mockito.mock(Counter.class);
    when(mockContainer.getCounter(name)).thenReturn(mockCounter);
    Counter counter = LabeledMetrics.counter(name);
    MetricsEnvironment.setCurrentContainer(mockContainer);
    counter.inc();
    verify(mockCounter).inc(1);
    counter.inc(47L);
    verify(mockCounter).inc(47);
    counter.dec(5L);
    verify(mockCounter).inc(-5);
}
Also used: Counter (org.apache.beam.sdk.metrics.Counter), HashMap (java.util.HashMap), MetricsContainer (org.apache.beam.sdk.metrics.MetricsContainer), Test (org.junit.Test)
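
The same delegation can be seen without mocks: setting the current container routes every LabeledMetrics counter update into that container. A minimal sketch, assuming MetricsContainerImpl as a concrete container in place of the Mockito mock:

import java.util.HashMap;
import org.apache.beam.runners.core.metrics.LabeledMetrics;
import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.runners.core.metrics.MonitoringInfoConstants;
import org.apache.beam.runners.core.metrics.MonitoringInfoMetricName;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricsEnvironment;

private static void labeledCounterSketch() {
    MonitoringInfoMetricName name = MonitoringInfoMetricName.named(MonitoringInfoConstants.Urns.ELEMENT_COUNT, new HashMap<String, String>());
    MetricsContainerImpl container = new MetricsContainerImpl("step");
    MetricsEnvironment.setCurrentContainer(container);
    Counter counter = LabeledMetrics.counter(name);
    // Each call delegates to the current container's counter for this name;
    // dec(5) is recorded as inc(-5), exactly as the verifications above show.
    counter.inc();
    counter.inc(47L);
    counter.dec(5L);
}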

Aggregations

MetricsContainer (org.apache.beam.sdk.metrics.MetricsContainer): 10 usages
Test (org.junit.Test): 6 usages
MonitoringInfoMetricName (org.apache.beam.runners.core.metrics.MonitoringInfoMetricName): 3 usages
MetricName (org.apache.beam.sdk.metrics.MetricName): 3 usages
Closeable (java.io.Closeable): 2 usages
IOException (java.io.IOException): 2 usages
ArrayList (java.util.ArrayList): 2 usages
ExecutionState (org.apache.beam.runners.core.metrics.ExecutionStateTracker.ExecutionState): 2 usages
MetricsContainerStepMap (org.apache.beam.runners.core.metrics.MetricsContainerStepMap): 2 usages
NoopProfileScope (org.apache.beam.runners.dataflow.worker.profiler.ScopedProfiler.NoopProfileScope): 2 usages
ProfileScope (org.apache.beam.runners.dataflow.worker.profiler.ScopedProfiler.ProfileScope): 2 usages
FlinkDistributionGauge (org.apache.beam.runners.flink.metrics.FlinkMetricContainer.FlinkDistributionGauge): 2 usages
MicrobatchSource (org.apache.beam.runners.spark.io.MicrobatchSource): 2 usages
Metadata (org.apache.beam.runners.spark.io.SparkUnboundedSource.Metadata): 2 usages
Source (org.apache.beam.sdk.io.Source): 2 usages
UnboundedSource (org.apache.beam.sdk.io.UnboundedSource): 2 usages
Counter (org.apache.beam.sdk.metrics.Counter): 2 usages
WindowedValue (org.apache.beam.sdk.util.WindowedValue): 2 usages
State (org.apache.spark.streaming.State): 2 usages
Instant (org.joda.time.Instant): 2 usages