Example 1 with ArchivedExecutionVertex

Use of org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex in the Apache Flink project.

Class DefaultSchedulerLocalRecoveryITCase, method assertNonLocalRecoveredTasksEquals.

private void assertNonLocalRecoveredTasksEquals(ArchivedExecutionGraph graph, int expected) {
    int nonLocalRecoveredTasks = 0;
    for (ArchivedExecutionVertex vertex : graph.getAllExecutionVertices()) {
        int currentAttemptNumber = vertex.getCurrentExecutionAttempt().getAttemptNumber();
        if (currentAttemptNumber == 0) {
            // the task has never restarted and does not need to recover
            continue;
        }
        AllocationID priorAllocation = vertex.getPriorExecutionAttempt(currentAttemptNumber - 1).getAssignedAllocationID();
        AllocationID currentAllocation = vertex.getCurrentExecutionAttempt().getAssignedAllocationID();
        assertNotNull(priorAllocation);
        assertNotNull(currentAllocation);
        if (!currentAllocation.equals(priorAllocation)) {
            nonLocalRecoveredTasks++;
        }
    }
    assertThat(nonLocalRecoveredTasks, is(expected));
}
Also used : ArchivedExecutionVertex(org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex) AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID)
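
The counting rule in Example 1 can be tried without a Flink cluster. The following is a minimal sketch, not Flink code: it assumes each vertex is modelled as a plain list of allocation IDs (one string per attempt, oldest first), and the class and method names are hypothetical. A restart counts as non-local when the current attempt's allocation differs from that of the attempt immediately before it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the Flink test helper; no Flink types are used.
class NonLocalRecoveryCount {

    // Each inner list holds the allocation IDs of one vertex's attempts, oldest first.
    public static int countNonLocalRecoveries(List<List<String>> vertices) {
        int nonLocal = 0;
        for (List<String> allocationIds : vertices) {
            int currentAttempt = allocationIds.size() - 1;
            if (currentAttempt <= 0) {
                // the task has never restarted, so there is nothing to compare
                continue;
            }
            String prior = Objects.requireNonNull(allocationIds.get(currentAttempt - 1));
            String current = Objects.requireNonNull(allocationIds.get(currentAttempt));
            if (!current.equals(prior)) {
                nonLocal++;
            }
        }
        return nonLocal;
    }

    public static void main(String[] args) {
        List<List<String>> vertices = Arrays.asList(
                Arrays.asList("slot-a"),            // never restarted
                Arrays.asList("slot-a", "slot-a"),  // restarted in the same allocation
                Arrays.asList("slot-a", "slot-b")); // restarted in a different allocation
        System.out.println(countNonLocalRecoveries(vertices)); // prints 1
    }
}
```

Only the last two attempts of each vertex are compared, which mirrors the `currentAttemptNumber - 1` lookup in the original helper.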

Example 2 with ArchivedExecutionVertex

Use of org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex in the Apache Flink project.

Class DefaultSchedulerTest, method failureInfoIsSetAfterTaskFailure.

@Test
public void failureInfoIsSetAfterTaskFailure() {
    final JobGraph jobGraph = singleNonParallelJobVertexJobGraph();
    final DefaultScheduler scheduler = createSchedulerAndStartScheduling(jobGraph);
    final ArchivedExecutionVertex onlyExecutionVertex = Iterables.getOnlyElement(scheduler.requestJob().getArchivedExecutionGraph().getAllExecutionVertices());
    final ExecutionAttemptID attemptId = onlyExecutionVertex.getCurrentExecutionAttempt().getAttemptId();
    final String exceptionMessage = "expected exception";
    scheduler.updateTaskExecutionState(new TaskExecutionState(attemptId, ExecutionState.FAILED, new RuntimeException(exceptionMessage)));
    final ErrorInfo failureInfo = scheduler.requestJob().getArchivedExecutionGraph().getFailureInfo();
    assertThat(failureInfo, is(notNullValue()));
    assertThat(failureInfo.getExceptionAsString(), containsString(exceptionMessage));
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) ArchivedExecutionVertex(org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex) ErrorInfo(org.apache.flink.runtime.executiongraph.ErrorInfo) Matchers.containsString(org.hamcrest.Matchers.containsString) TaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState) AdaptiveSchedulerTest(org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest) Test(org.junit.Test)

Example 3 with ArchivedExecutionVertex

Use of org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex in the Apache Flink project.

Class DefaultSchedulerTest, method restoreStateWhenRestartingTasks.

@Test
public void restoreStateWhenRestartingTasks() throws Exception {
    final JobGraph jobGraph = singleNonParallelJobVertexJobGraph();
    enableCheckpointing(jobGraph);
    final CountDownLatch checkpointTriggeredLatch = getCheckpointTriggeredLatch();
    final DefaultScheduler scheduler = createSchedulerAndStartScheduling(jobGraph);
    final ArchivedExecutionVertex onlyExecutionVertex = Iterables.getOnlyElement(scheduler.requestJob().getArchivedExecutionGraph().getAllExecutionVertices());
    final ExecutionAttemptID attemptId = onlyExecutionVertex.getCurrentExecutionAttempt().getAttemptId();
    transitionToRunning(scheduler, attemptId);
    final CheckpointCoordinator checkpointCoordinator = getCheckpointCoordinator(scheduler);
    // register a stateful master hook to help verify state restore
    final TestMasterHook masterHook = TestMasterHook.fromId("testHook");
    checkpointCoordinator.addMasterHook(masterHook);
    // complete one checkpoint for state restore
    checkpointCoordinator.triggerCheckpoint(false);
    checkpointTriggeredLatch.await();
    final long checkpointId = checkpointCoordinator.getPendingCheckpoints().keySet().iterator().next();
    acknowledgePendingCheckpoint(scheduler, checkpointId);
    scheduler.updateTaskExecutionState(createFailedTaskExecutionState(attemptId));
    taskRestartExecutor.triggerScheduledTasks();
    assertThat(masterHook.getRestoreCount(), is(equalTo(1)));
}
Also used : TestMasterHook(org.apache.flink.runtime.checkpoint.hooks.TestMasterHook) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) CheckpointCoordinator(org.apache.flink.runtime.checkpoint.CheckpointCoordinator) SchedulerTestingUtils.getCheckpointCoordinator(org.apache.flink.runtime.scheduler.SchedulerTestingUtils.getCheckpointCoordinator) ArchivedExecutionVertex(org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex) CountDownLatch(java.util.concurrent.CountDownLatch) AdaptiveSchedulerTest(org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest) Test(org.junit.Test)

Example 4 with ArchivedExecutionVertex

Use of org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex in the Apache Flink project.

Class DefaultSchedulerTest, method handleGlobalFailureWithLocalFailure.

/**
 * This test covers the use-case where a global fail-over is followed by a local task failure.
 * It verifies (besides checking the expected deployments) that the assert in the global
 * recovery handling of {@link SchedulerBase#restoreState} is not triggered due to version
 * updates.
 */
@Test
public void handleGlobalFailureWithLocalFailure() {
    final JobGraph jobGraph = singleJobVertexJobGraph(2);
    final JobVertex onlyJobVertex = getOnlyJobVertex(jobGraph);
    enableCheckpointing(jobGraph);
    final DefaultScheduler scheduler = createSchedulerAndStartScheduling(jobGraph);
    final List<ExecutionAttemptID> attemptIds =
            StreamSupport.stream(
                            scheduler.requestJob().getArchivedExecutionGraph().getAllExecutionVertices().spliterator(),
                            false)
                    .map(ArchivedExecutionVertex::getCurrentExecutionAttempt)
                    .map(ArchivedExecution::getAttemptId)
                    .collect(Collectors.toList());
    final ExecutionAttemptID localFailureAttemptId = attemptIds.get(0);
    scheduler.handleGlobalFailure(new Exception("global failure"));
    // the local failure shouldn't affect the global fail-over
    scheduler.updateTaskExecutionState(new TaskExecutionState(localFailureAttemptId, ExecutionState.FAILED, new Exception("local failure")));
    for (ExecutionAttemptID attemptId : attemptIds) {
        scheduler.updateTaskExecutionState(new TaskExecutionState(attemptId, ExecutionState.CANCELED));
    }
    taskRestartExecutor.triggerScheduledTasks();
    final ExecutionVertexID executionVertexId0 = new ExecutionVertexID(onlyJobVertex.getID(), 0);
    final ExecutionVertexID executionVertexId1 = new ExecutionVertexID(onlyJobVertex.getID(), 1);
    assertThat(
            "The execution vertices should be deployed in a specific order reflecting the scheduling start and the global fail-over afterwards.",
            testExecutionVertexOperations.getDeployedVertices(),
            contains(executionVertexId0, executionVertexId1, executionVertexId0, executionVertexId1));
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) ArchivedExecutionVertex(org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex) FlinkException(org.apache.flink.util.FlinkException) NoResourceAvailableException(org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException) TaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState) AdaptiveSchedulerTest(org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest) Test(org.junit.Test)
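
Example 4 collects attempt IDs by turning an `Iterable` into a `Stream` with `StreamSupport.stream(iterable.spliterator(), false)` and then mapping each element. The sketch below isolates that pattern using only the JDK. The class name `IterableMapping` and the helper `mapToList` are hypothetical and not part of Flink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical helper illustrating the Iterable-to-Stream idiom; not Flink code.
class IterableMapping {

    // Streams over any Iterable sequentially (parallel = false) and collects the mapped values.
    public static <T, R> List<R> mapToList(Iterable<T> source, Function<T, R> mapper) {
        return StreamSupport.stream(source.spliterator(), false)
                .map(mapper)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Iterable<String> names = Arrays.asList("a", "bb", "ccc");
        System.out.println(mapToList(names, String::length)); // prints [1, 2, 3]
    }
}
```

The detour through `StreamSupport` is needed because `Iterable` itself has no `stream()` method; only `Collection` does.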

Example 5 with ArchivedExecutionVertex

Use of org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex in the Apache Flink project.

Class DefaultSchedulerTest, method handleGlobalFailure.

@Test
public void handleGlobalFailure() {
    final JobGraph jobGraph = singleNonParallelJobVertexJobGraph();
    final JobVertex onlyJobVertex = getOnlyJobVertex(jobGraph);
    final DefaultScheduler scheduler = createSchedulerAndStartScheduling(jobGraph);
    scheduler.handleGlobalFailure(new Exception("forced failure"));
    final ArchivedExecutionVertex onlyExecutionVertex = Iterables.getOnlyElement(scheduler.requestJob().getArchivedExecutionGraph().getAllExecutionVertices());
    final ExecutionAttemptID attemptId = onlyExecutionVertex.getCurrentExecutionAttempt().getAttemptId();
    scheduler.updateTaskExecutionState(new TaskExecutionState(attemptId, ExecutionState.CANCELED));
    taskRestartExecutor.triggerScheduledTasks();
    final List<ExecutionVertexID> deployedExecutionVertices = testExecutionVertexOperations.getDeployedVertices();
    final ExecutionVertexID executionVertexId = new ExecutionVertexID(onlyJobVertex.getID(), 0);
    assertThat(deployedExecutionVertices, contains(executionVertexId, executionVertexId));
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) ArchivedExecutionVertex(org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex) FlinkException(org.apache.flink.util.FlinkException) NoResourceAvailableException(org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException) TaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState) AdaptiveSchedulerTest(org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest) Test(org.junit.Test)

Aggregations

ArchivedExecutionVertex (org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex) 24
Test (org.junit.Test) 21
ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID) 19
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph) 19
AdaptiveSchedulerTest (org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest) 18
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex) 7
ExecutionVertexID (org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) 7
TaskExecutionState (org.apache.flink.runtime.taskmanager.TaskExecutionState) 7
ExecutionState (org.apache.flink.runtime.execution.ExecutionState) 5
JobStatus (org.apache.flink.api.common.JobStatus) 4
ArchivedExecution (org.apache.flink.runtime.executiongraph.ArchivedExecution) 4
LocalTaskManagerLocation (org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation) 4
JobID (org.apache.flink.api.common.JobID) 3
Configuration (org.apache.flink.configuration.Configuration) 3
TestingCheckpointRecoveryFactory (org.apache.flink.runtime.checkpoint.TestingCheckpointRecoveryFactory) 3
AllocationID (org.apache.flink.runtime.clusterframework.types.AllocationID) 3
ArchivedExecutionJobVertex (org.apache.flink.runtime.executiongraph.ArchivedExecutionJobVertex) 3
ErrorInfo (org.apache.flink.runtime.executiongraph.ErrorInfo) 3
RootExceptionHistoryEntry (org.apache.flink.runtime.scheduler.exceptionhistory.RootExceptionHistoryEntry) 3
ArrayList (java.util.ArrayList) 2