
Example 1 with CheckpointFailureReason

Use of org.apache.flink.runtime.checkpoint.CheckpointFailureReason in the Apache Flink project.

Class AsyncCheckpointRunnableTest, method testDeclineAsyncCheckpoint.

@Test
public void testDeclineAsyncCheckpoint() {
    CheckpointFailureReason originalReason = CheckpointFailureReason.CHECKPOINT_DECLINED_INPUT_END_OF_STREAM;
    final Map<OperatorID, OperatorSnapshotFutures> snapshotsInProgress = new HashMap<>();
    // All snapshot futures complete normally except the fifth, which fails with originalReason.
    snapshotsInProgress.put(
            new OperatorID(),
            new OperatorSnapshotFutures(
                    DoneFuture.of(SnapshotResult.empty()),
                    DoneFuture.of(SnapshotResult.empty()),
                    DoneFuture.of(SnapshotResult.empty()),
                    DoneFuture.of(SnapshotResult.empty()),
                    ExceptionallyDoneFuture.of(new CheckpointException(originalReason)),
                    DoneFuture.of(SnapshotResult.empty())));
    final TestEnvironment environment = new TestEnvironment();
    final AsyncCheckpointRunnable runnable = createAsyncRunnable(snapshotsInProgress, environment, false, true);
    runnable.run();
    Assert.assertSame(originalReason, environment.getCause().getCheckpointFailureReason());
}
Also used : OperatorSnapshotFutures(org.apache.flink.streaming.api.operators.OperatorSnapshotFutures) CheckpointFailureReason(org.apache.flink.runtime.checkpoint.CheckpointFailureReason) HashMap(java.util.HashMap) CheckpointException(org.apache.flink.runtime.checkpoint.CheckpointException) OperatorID(org.apache.flink.runtime.jobgraph.OperatorID) Test(org.junit.Test)
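
The test above checks that the original failure reason survives several layers of exception wrapping. A minimal, JDK-only sketch of that idea follows. Reason, ReasonedException, findReason and reportedReason are hypothetical stand-ins for CheckpointFailureReason, CheckpointException, ExceptionUtils.findThrowable and the runnable's error path; they are assumptions for illustration, not Flink's classes.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class ReasonPreservationSketch {

    // Hypothetical stand-in for CheckpointFailureReason.
    public enum Reason { INPUT_END_OF_STREAM, ASYNC_EXCEPTION }

    // Hypothetical stand-in for CheckpointException: an exception that carries a reason.
    public static class ReasonedException extends Exception {
        private final Reason reason;

        public ReasonedException(Reason reason) {
            super(reason.name());
            this.reason = reason;
        }

        public Reason getReason() {
            return reason;
        }
    }

    // Walks the cause chain and returns the reason of the first ReasonedException found.
    public static Optional<Reason> findReason(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ReasonedException) {
                return Optional.of(((ReasonedException) cur).getReason());
            }
        }
        return Optional.empty();
    }

    // Waits for the future. Returns empty on success; on failure returns the
    // embedded reason, or ASYNC_EXCEPTION when the chain carries none.
    public static Optional<Reason> reportedReason(CompletableFuture<?> future) {
        try {
            future.get();
            return Optional.empty();
        } catch (ExecutionException e) {
            // Add one more wrapping layer, as the runnable in Example 2 does.
            Exception wrapped = new Exception("Could not materialize checkpoint.", e);
            return Optional.of(findReason(wrapped).orElse(Reason.ASYNC_EXCEPTION));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> declined = new CompletableFuture<>();
        declined.completeExceptionally(new ReasonedException(Reason.INPUT_END_OF_STREAM));

        CompletableFuture<String> broken = new CompletableFuture<>();
        broken.completeExceptionally(new IllegalStateException("disk full"));

        System.out.println(reportedReason(declined)); // prints "Optional[INPUT_END_OF_STREAM]"
        System.out.println(reportedReason(broken));   // prints "Optional[ASYNC_EXCEPTION]"
    }
}
```

The fallback value is only used when no reasoned exception appears anywhere in the cause chain, which mirrors the map(...).orElse(...) call in Example 2.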

Example 2 with CheckpointFailureReason

Use of org.apache.flink.runtime.checkpoint.CheckpointFailureReason in the Apache Flink project.

Class AsyncCheckpointRunnable, method handleExecutionException.

private void handleExecutionException(Exception e) {
    boolean didCleanup = false;
    AsyncCheckpointState currentState = asyncCheckpointState.get();
    while (AsyncCheckpointState.DISCARDED != currentState) {
        if (asyncCheckpointState.compareAndSet(currentState, AsyncCheckpointState.DISCARDED)) {
            didCleanup = true;
            try {
                cleanup();
            } catch (Exception cleanupException) {
                e.addSuppressed(cleanupException);
            }
            Exception checkpointException =
                    new Exception(
                            "Could not materialize checkpoint "
                                    + checkpointMetaData.getCheckpointId()
                                    + " for operator "
                                    + taskName
                                    + '.',
                            e);
            if (isTaskRunning.get()) {
                // Decline the checkpoint; escalate to the async exception handler only if
                // declining itself fails.
                try {
                    Optional<CheckpointException> underlyingCheckpointException =
                            ExceptionUtils.findThrowable(checkpointException, CheckpointException.class);
                    // If this failure is already a CheckpointException, do not overwrite the
                    // original CheckpointFailureReason
                    CheckpointFailureReason reportedFailureReason =
                            underlyingCheckpointException
                                    .map(CheckpointException::getCheckpointFailureReason)
                                    .orElse(CheckpointFailureReason.CHECKPOINT_ASYNC_EXCEPTION);
                    taskEnvironment.declineCheckpoint(
                            checkpointMetaData.getCheckpointId(),
                            new CheckpointException(reportedFailureReason, checkpointException));
                } catch (Exception unhandled) {
                    AsynchronousException asyncException = new AsynchronousException(unhandled);
                    asyncExceptionHandler.handleAsyncException("Failure in asynchronous checkpoint materialization", asyncException);
                }
            } else {
                // Never decline a checkpoint once the task is no longer running. Doing so could
                // exceed the tolerable checkpoint failure threshold and cause an unexpected job failover.
                LOG.info("Ignore decline of checkpoint {} as task is not running anymore.", checkpointMetaData.getCheckpointId());
            }
            currentState = AsyncCheckpointState.DISCARDED;
        } else {
            currentState = asyncCheckpointState.get();
        }
    }
    if (!didCleanup) {
        LOG.trace("Caught followup exception from a failed checkpoint thread. This can be ignored.", e);
    }
}
Also used : CheckpointMetricsBuilder(org.apache.flink.runtime.checkpoint.CheckpointMetricsBuilder) Tuple2(org.apache.flink.api.java.tuple.Tuple2) CheckpointMetaData(org.apache.flink.runtime.checkpoint.CheckpointMetaData) OperatorSnapshotFinalizer(org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer) LoggerFactory(org.slf4j.LoggerFactory) ExceptionUtils(org.apache.flink.util.ExceptionUtils) CompletableFuture(java.util.concurrent.CompletableFuture) CheckpointFailureReason(org.apache.flink.runtime.checkpoint.CheckpointFailureReason) AtomicReference(java.util.concurrent.atomic.AtomicReference) Supplier(java.util.function.Supplier) OperatorSnapshotFutures(org.apache.flink.streaming.api.operators.OperatorSnapshotFutures) CheckpointException(org.apache.flink.runtime.checkpoint.CheckpointException) Map(java.util.Map) Preconditions.checkNotNull(org.apache.flink.util.Preconditions.checkNotNull) TaskStateSnapshot(org.apache.flink.runtime.checkpoint.TaskStateSnapshot) Preconditions.checkState(org.apache.flink.util.Preconditions.checkState) Logger(org.slf4j.Logger) FileSystemSafetyNet(org.apache.flink.core.fs.FileSystemSafetyNet) Consumer(java.util.function.Consumer) AsyncExceptionHandler(org.apache.flink.runtime.taskmanager.AsyncExceptionHandler) AsynchronousException(org.apache.flink.runtime.taskmanager.AsynchronousException) Closeable(java.io.Closeable) OperatorID(org.apache.flink.runtime.jobgraph.OperatorID) Optional(java.util.Optional) CheckpointMetrics(org.apache.flink.runtime.checkpoint.CheckpointMetrics) Environment(org.apache.flink.runtime.execution.Environment)
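
The compare-and-set loop in handleExecutionException ensures that cleanup runs at most once, even when several threads report failures concurrently. The sketch below isolates that pattern using only the JDK. State, discard and cleanupCount are hypothetical names chosen for illustration, and the counter stands in for the real cleanup work.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class DiscardOnceSketch {

    // Hypothetical stand-in for AsyncCheckpointState.
    public enum State { RUNNING, COMPLETED, DISCARDED }

    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicInteger cleanups = new AtomicInteger();

    // Moves the state to DISCARDED. Returns true only for the single caller
    // whose compare-and-set performed the transition.
    public boolean discard() {
        State current = state.get();
        while (current != State.DISCARDED) {
            if (state.compareAndSet(current, State.DISCARDED)) {
                cleanups.incrementAndGet(); // cleanup work would run here
                return true;
            }
            // Another thread changed the state in between; re-read and retry.
            current = state.get();
        }
        return false;
    }

    public int cleanupCount() {
        return cleanups.get();
    }

    public static void main(String[] args) throws InterruptedException {
        DiscardOnceSketch sketch = new DiscardOnceSketch();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(sketch::discard);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("cleanups = " + sketch.cleanupCount()); // prints "cleanups = 1"
    }
}
```

Because no transition leaves DISCARDED, exactly one compare-and-set into that state can succeed; every other caller falls out of the loop and corresponds to the "followup exception" branch logged at trace level.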

Aggregations

CheckpointException (org.apache.flink.runtime.checkpoint.CheckpointException) 2
CheckpointFailureReason (org.apache.flink.runtime.checkpoint.CheckpointFailureReason) 2
OperatorID (org.apache.flink.runtime.jobgraph.OperatorID) 2
OperatorSnapshotFutures (org.apache.flink.streaming.api.operators.OperatorSnapshotFutures) 2
Closeable (java.io.Closeable) 1
HashMap (java.util.HashMap) 1
Map (java.util.Map) 1
Optional (java.util.Optional) 1
CompletableFuture (java.util.concurrent.CompletableFuture) 1
AtomicReference (java.util.concurrent.atomic.AtomicReference) 1
Consumer (java.util.function.Consumer) 1
Supplier (java.util.function.Supplier) 1
Tuple2 (org.apache.flink.api.java.tuple.Tuple2) 1
FileSystemSafetyNet (org.apache.flink.core.fs.FileSystemSafetyNet) 1
CheckpointMetaData (org.apache.flink.runtime.checkpoint.CheckpointMetaData) 1
CheckpointMetrics (org.apache.flink.runtime.checkpoint.CheckpointMetrics) 1
CheckpointMetricsBuilder (org.apache.flink.runtime.checkpoint.CheckpointMetricsBuilder) 1
TaskStateSnapshot (org.apache.flink.runtime.checkpoint.TaskStateSnapshot) 1
Environment (org.apache.flink.runtime.execution.Environment) 1
AsyncExceptionHandler (org.apache.flink.runtime.taskmanager.AsyncExceptionHandler) 1