Example 1 with FailureHandlingResult

Use of org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult in the Apache Flink project.

From the class DefaultScheduler, method handleGlobalFailure.

@Override
public void handleGlobalFailure(final Throwable error) {
    final long timestamp = System.currentTimeMillis();
    setGlobalFailureCause(error, timestamp);
    log.info("Trying to recover from a global failure.", error);
    final FailureHandlingResult failureHandlingResult = executionFailureHandler.getGlobalFailureHandlingResult(error, timestamp);
    maybeRestartTasks(failureHandlingResult);
}
Also used : FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult)
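Roughly, handleGlobalFailure records the cause, asks the failure handler for a FailureHandlingResult, and passes it to maybeRestartTasks, which either restarts tasks or gives up. The following is a minimal, self-contained sketch of that decision under stated assumptions: GlobalFailureSketch, Result and FixedDelayPolicy are hypothetical stand-ins, not Flink APIs, and the fixed-delay policy (at most N restarts, constant delay) is only one possible restart strategy, chosen for illustration. A global failure is modelled as scheduling every vertex for restart.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical, simplified model of turning a global failure into a restart decision (not Flink code). */
public class GlobalFailureSketch {

    /** Hypothetical stand-in for FailureHandlingResult. */
    static final class Result {
        final boolean canRestart;
        final Set<String> verticesToRestart;
        final long restartDelayMs;
        final Throwable error;

        Result(boolean canRestart, Set<String> verticesToRestart, long restartDelayMs, Throwable error) {
            this.canRestart = canRestart;
            this.verticesToRestart = verticesToRestart;
            this.restartDelayMs = restartDelayMs;
            this.error = error;
        }
    }

    /** Hypothetical fixed-delay policy that allows at most maxRestarts restarts. */
    static final class FixedDelayPolicy {
        private final int maxRestarts;
        private final long delayMs;
        private int failures = 0;

        FixedDelayPolicy(int maxRestarts, long delayMs) {
            this.maxRestarts = maxRestarts;
            this.delayMs = delayMs;
        }

        /** Counts one more failure and reports whether another restart is still allowed. */
        boolean recordFailureAndCheck() {
            failures++;
            return failures <= maxRestarts;
        }
    }

    /** A global failure affects the whole job, so every vertex is scheduled for restart. */
    static Result handleGlobalFailure(Throwable error, Set<String> allVertices, FixedDelayPolicy policy) {
        if (policy.recordFailureAndCheck()) {
            return new Result(true, allVertices, policy.delayMs, error);
        }
        // Restarts exhausted: nothing is restarted, so the job would fail instead.
        return new Result(false, Collections.emptySet(), -1L, error);
    }

    public static void main(String[] args) {
        FixedDelayPolicy policy = new FixedDelayPolicy(1, 100L);
        Set<String> vertices = Set.of("source-0", "map-0", "sink-0");
        Result first = handleGlobalFailure(new RuntimeException("first"), vertices, policy);
        Result second = handleGlobalFailure(new RuntimeException("second"), vertices, policy);
        System.out.println(first.canRestart + " " + first.verticesToRestart.size()); // true 3
        System.out.println(second.canRestart); // false
    }
}
```

Keeping the policy separate from the handler mirrors the split in the snippet above, where the scheduler only consumes the result and does not decide on its own whether a restart is allowed.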

Example 2 with FailureHandlingResult

Use of org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult in the Apache Flink project.

From the class FailureHandlingResultSnapshot, method create.

/**
 * Creates a {@code FailureHandlingResultSnapshot} based on the passed {@link
 * FailureHandlingResult} and {@link ExecutionVertex ExecutionVertices}.
 *
 * @param failureHandlingResult The {@code FailureHandlingResult} that is used for extracting
 *     the failure information.
 * @param latestExecutionLookup The look-up function for retrieving the latest {@link Execution}
 *     instance for a given {@link ExecutionVertexID}.
 * @return The {@code FailureHandlingResultSnapshot}.
 */
public static FailureHandlingResultSnapshot create(FailureHandlingResult failureHandlingResult, Function<ExecutionVertexID, Execution> latestExecutionLookup) {
    final Execution rootCauseExecution =
            failureHandlingResult.getExecutionVertexIdOfFailedTask().map(latestExecutionLookup).orElse(null);
    Preconditions.checkArgument(
            rootCauseExecution == null || rootCauseExecution.getFailureInfo().isPresent(),
            String.format(
                    "The execution %s didn't provide a failure info even though the corresponding ExecutionVertex %s is marked as having handled the root cause of this failure.",
                    // the "(null)" fallbacks never end up in a thrown message because of the
                    // condition above; they are only added to make the compiler happy
                    rootCauseExecution != null ? rootCauseExecution.getAttemptId() : "(null)",
                    failureHandlingResult.getExecutionVertexIdOfFailedTask().map(Objects::toString).orElse("(null)")));
    final ExecutionVertexID rootCauseExecutionVertexId =
            failureHandlingResult.getExecutionVertexIdOfFailedTask().orElse(null);
    // concurrent failures: restarted vertices other than the root cause whose latest attempt failed
    final Set<Execution> concurrentlyFailedExecutions =
            failureHandlingResult.getVerticesToRestart().stream()
                    .filter(executionVertexId -> !executionVertexId.equals(rootCauseExecutionVertexId))
                    .map(latestExecutionLookup)
                    .filter(execution -> execution.getFailureInfo().isPresent())
                    .collect(Collectors.toSet());
    return new FailureHandlingResultSnapshot(rootCauseExecution, ErrorInfo.handleMissingThrowable(failureHandlingResult.getError()),
            failureHandlingResult.getTimestamp(), concurrentlyFailedExecutions);
}
Also used : ErrorInfo(org.apache.flink.runtime.executiongraph.ErrorInfo) Set(java.util.Set) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) Preconditions(org.apache.flink.util.Preconditions) Function(java.util.function.Function) Collectors(java.util.stream.Collectors) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting) Execution(org.apache.flink.runtime.executiongraph.Execution) FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult) Objects(java.util.Objects) Optional(java.util.Optional) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) Collections(java.util.Collections) Nullable(javax.annotation.Nullable)
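The selection logic in create can be reproduced without Flink on the classpath. This sketch assumes plain String vertex ids and a hypothetical Attempt class in place of Execution (SnapshotSketch and its methods are illustrative names, not Flink APIs). It keeps the two rules from the method above: the root-cause attempt, if there is one, must carry failure info, and the concurrently failed set contains every restarted vertex other than the root cause whose latest attempt reports a failure.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of the selection rules in FailureHandlingResultSnapshot.create (not Flink code). */
public class SnapshotSketch {

    /** Hypothetical stand-in for Execution: an attempt id plus optional failure info. */
    static final class Attempt {
        final String attemptId;
        final Optional<String> failureInfo;

        Attempt(String attemptId, String failureInfoOrNull) {
            this.attemptId = attemptId;
            this.failureInfo = Optional.ofNullable(failureInfoOrNull);
        }
    }

    /** Looks up the root-cause attempt (null for global failures) and checks it has failure info. */
    static Attempt rootCauseAttempt(String rootCauseVertexIdOrNull, Function<String, Attempt> lookup) {
        Attempt attempt = Optional.ofNullable(rootCauseVertexIdOrNull).map(lookup).orElse(null);
        if (attempt != null && !attempt.failureInfo.isPresent()) {
            throw new IllegalArgumentException(
                    "Attempt " + attempt.attemptId + " has no failure info but is marked as root cause.");
        }
        return attempt;
    }

    /** Restarted vertices other than the root cause whose latest attempt reports a failure. */
    static Set<Attempt> concurrentlyFailed(
            String rootCauseVertexIdOrNull, Set<String> verticesToRestart, Function<String, Attempt> lookup) {
        return verticesToRestart.stream()
                .filter(id -> !id.equals(rootCauseVertexIdOrNull))
                .map(lookup)
                .filter(attempt -> attempt.failureInfo.isPresent())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<String, Attempt> latest = Map.of(
                "v0", new Attempt("a0", "root failure"),
                "v1", new Attempt("a1", "concurrent failure"),
                "v2", new Attempt("a2", null));
        System.out.println(rootCauseAttempt("v0", latest::get).attemptId); // a0
        // v0 is the root cause and v2 has no failure info, so only a1 remains.
        concurrentlyFailed("v0", latest.keySet(), latest::get)
                .forEach(a -> System.out.println(a.attemptId)); // a1
    }
}
```

Passing null as the root-cause id corresponds to a global failure: no root-cause attempt is returned, and every failed attempt counts as concurrent.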

Example 3 with FailureHandlingResult

Use of org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult in the Apache Flink project.

From the class FailureHandlingResultSnapshotTest, method testGlobalFailureHandlingResultSnapshotCreation.

@Test
public void testGlobalFailureHandlingResultSnapshotCreation() {
    final Throwable rootCause = new FlinkException("Expected exception: root cause");
    final long timestamp = System.currentTimeMillis();
    final ExecutionVertex failedExecutionVertex0 = extractExecutionVertex(0);
    final Throwable failure0 = new RuntimeException("Expected exception: failure #0");
    final ExecutionVertex failedExecutionVertex1 = extractExecutionVertex(1);
    final Throwable failure1 = new IllegalStateException("Expected exception: failure #1");
    triggerFailure(failedExecutionVertex0, failure0);
    triggerFailure(failedExecutionVertex1, failure1);
    final FailureHandlingResult failureHandlingResult =
            FailureHandlingResult.restartable(
                    null, // no failing vertex: this is a global failure
                    rootCause,
                    timestamp,
                    StreamSupport.stream(executionGraph.getAllExecutionVertices().spliterator(), false)
                            .map(ExecutionVertex::getID)
                            .collect(Collectors.toSet()),
                    0L,
                    true);
    final FailureHandlingResultSnapshot testInstance = FailureHandlingResultSnapshot.create(failureHandlingResult, this::getLatestExecution);
    assertThat(testInstance.getRootCause(), is(rootCause));
    assertThat(testInstance.getTimestamp(), is(timestamp));
    assertThat(testInstance.getRootCauseExecution().isPresent(), is(false));
    assertThat(
            testInstance.getConcurrentlyFailedExecution(),
            IsIterableContainingInAnyOrder.containsInAnyOrder(
                    failedExecutionVertex0.getCurrentExecutionAttempt(),
                    failedExecutionVertex1.getCurrentExecutionAttempt()));
}
Also used : FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult) SerializedThrowable(org.apache.flink.util.SerializedThrowable) FlinkException(org.apache.flink.util.FlinkException) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) Test(org.junit.Test)
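The test collects the ids of all vertices with StreamSupport.stream(iterable.spliterator(), false), because getAllExecutionVertices() returns an Iterable, which has no stream() method of its own. Here is a stand-alone sketch of that idiom; the Vertex class is a hypothetical stand-in for ExecutionVertex.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class IterableToSetSketch {

    /** Hypothetical stand-in for ExecutionVertex with an id accessor. */
    static final class Vertex {
        private final String id;

        Vertex(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }
    }

    /** Collects the ids of all vertices from an Iterable, which cannot be streamed directly. */
    static Set<String> allVertexIds(Iterable<Vertex> vertices) {
        return StreamSupport.stream(vertices.spliterator(), false) // false = sequential stream
                .map(Vertex::getId)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Iterable<Vertex> vertices = Arrays.asList(new Vertex("v0"), new Vertex("v1"), new Vertex("v0"));
        System.out.println(allVertexIds(vertices).size()); // 2 (duplicate ids collapse in the set)
    }
}
```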

Example 4 with FailureHandlingResult

Use of org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult in the Apache Flink project.

From the class FailureHandlingResultSnapshotTest, method testMissingThrowableHandling.

// see FLINK-22060/FLINK-21376
@Test
public void testMissingThrowableHandling() {
    final ExecutionVertex rootCauseExecutionVertex = extractExecutionVertex(0);
    final long rootCauseTimestamp = triggerFailure(rootCauseExecutionVertex, null);
    final FailureHandlingResult failureHandlingResult =
            FailureHandlingResult.restartable(
                    rootCauseExecutionVertex.getID(),
                    null, // the failure carries no Throwable
                    rootCauseTimestamp,
                    StreamSupport.stream(executionGraph.getAllExecutionVertices().spliterator(), false)
                            .map(ExecutionVertex::getID)
                            .collect(Collectors.toSet()),
                    0L,
                    false);
    final FailureHandlingResultSnapshot testInstance = FailureHandlingResultSnapshot.create(failureHandlingResult, this::getLatestExecution);
    final Throwable actualException = new SerializedThrowable(testInstance.getRootCause()).deserializeError(ClassLoader.getSystemClassLoader());
    assertThat(actualException, IsInstanceOf.instanceOf(FlinkException.class));
    assertThat(actualException, FlinkMatchers.containsMessage(ErrorInfo.handleMissingThrowable(null).getMessage()));
    assertThat(testInstance.getTimestamp(), is(rootCauseTimestamp));
    assertThat(testInstance.getRootCauseExecution().isPresent(), is(true));
    assertThat(testInstance.getRootCauseExecution().get(), is(rootCauseExecutionVertex.getCurrentExecutionAttempt()));
}
Also used : FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult) SerializedThrowable(org.apache.flink.util.SerializedThrowable) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) FlinkException(org.apache.flink.util.FlinkException) Test(org.junit.Test)
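This test covers a failure that arrives without a Throwable: create passes the error through ErrorInfo.handleMissingThrowable, and the assertions expect a FlinkException whose message matches handleMissingThrowable(null). The sketch below shows the same null-substitution pattern in plain Java. The class name, the use of java.lang.Exception instead of FlinkException, and the message text are assumptions, not Flink's exact behavior or wording.

```java
public class MissingThrowableSketch {

    /** Hypothetical placeholder message; Flink's actual wording differs. */
    static final String UNKNOWN_CAUSE = "Unknown cause for the failure (no Throwable was provided).";

    /** Returns the given throwable unchanged, or a placeholder exception if it is null. */
    static Throwable handleMissingThrowable(Throwable throwable) {
        return throwable != null ? throwable : new Exception(UNKNOWN_CAUSE);
    }

    public static void main(String[] args) {
        IllegalStateException real = new IllegalStateException("real cause");
        System.out.println(handleMissingThrowable(real) == real); // true
        System.out.println(handleMissingThrowable(null).getMessage()); // the placeholder message
    }
}
```

Substituting a placeholder keeps downstream code (serialization, the exception history shown to users) free of null checks, at the cost of a less specific message.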

Example 5 with FailureHandlingResult

Use of org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult in the Apache Flink project.

From the class DefaultScheduler, method handleTaskFailure.

private void handleTaskFailure(final ExecutionVertexID executionVertexId, @Nullable final Throwable error) {
    final long timestamp = System.currentTimeMillis();
    setGlobalFailureCause(error, timestamp);
    notifyCoordinatorsAboutTaskFailure(executionVertexId, error);
    final FailureHandlingResult failureHandlingResult = executionFailureHandler.getFailureHandlingResult(executionVertexId, error, timestamp);
    maybeRestartTasks(failureHandlingResult);
}
Also used : FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult)
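Unlike the global case, handleTaskFailure passes the failing ExecutionVertexID, so the failover strategy behind executionFailureHandler can limit the restart to part of the graph. As a rough illustration of one such strategy, pipelined-region failover, the sketch below groups vertices connected by pipelined edges with union-find and restarts only the failed vertex's group. The class name, the String[] edge encoding, and the assumption that a region is exactly a connected component of pipelined edges are simplifications; Flink's own region failover strategy handles more cases.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: restart only the pipelined region of the failed vertex (not Flink code). */
public class RegionFailoverSketch {

    private final Map<String, String> parent = new HashMap<>();

    /** Union-find lookup with path compression; unknown vertices start as their own region. */
    String find(String v) {
        parent.putIfAbsent(v, v);
        String p = parent.get(v);
        if (!p.equals(v)) {
            p = find(p);
            parent.put(v, p); // path compression
        }
        return p;
    }

    /** Merges the regions of two vertices connected by a pipelined edge. */
    void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /** Builds regions from pipelined edges given as {upstream, downstream} pairs. */
    static RegionFailoverSketch fromPipelinedEdges(Set<String> vertices, List<String[]> edges) {
        RegionFailoverSketch regions = new RegionFailoverSketch();
        vertices.forEach(regions::find);
        for (String[] edge : edges) {
            regions.union(edge[0], edge[1]);
        }
        return regions;
    }

    /** Returns every vertex in the same region as the failed vertex. */
    Set<String> verticesToRestart(String failedVertex) {
        String root = find(failedVertex);
        Set<String> result = new HashSet<>();
        for (String v : new HashSet<>(parent.keySet())) {
            if (find(v).equals(root)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> vertices = Set.of("a", "b", "c", "d");
        // a -> b and c -> d are pipelined; b -> c is assumed to be blocking, so it is left out.
        List<String[]> pipelined = List.of(new String[] {"a", "b"}, new String[] {"c", "d"});
        RegionFailoverSketch regions = fromPipelinedEdges(vertices, pipelined);
        System.out.println(regions.verticesToRestart("b")); // a and b, in unspecified set order
    }
}
```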

Aggregations

FailureHandlingResult (org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult): 7
ExecutionVertex (org.apache.flink.runtime.executiongraph.ExecutionVertex): 5
Test (org.junit.Test): 4
SerializedThrowable (org.apache.flink.util.SerializedThrowable): 3
FlinkException (org.apache.flink.util.FlinkException): 2
Collections (java.util.Collections): 1
Objects (java.util.Objects): 1
Optional (java.util.Optional): 1
Set (java.util.Set): 1
Function (java.util.function.Function): 1
Collectors (java.util.stream.Collectors): 1
Nullable (javax.annotation.Nullable): 1
VisibleForTesting (org.apache.flink.annotation.VisibleForTesting): 1
ErrorInfo (org.apache.flink.runtime.executiongraph.ErrorInfo): 1
Execution (org.apache.flink.runtime.executiongraph.Execution): 1
ExecutionVertexID (org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID): 1
Preconditions (org.apache.flink.util.Preconditions): 1