Example 1 with TestMasterHook

Use of org.apache.flink.runtime.checkpoint.hooks.TestMasterHook in the Apache Flink project.

Class DefaultSchedulerTest, method restoreStateWhenRestartingTasks.

@Test
public void restoreStateWhenRestartingTasks() throws Exception {
    final JobGraph jobGraph = singleNonParallelJobVertexJobGraph();
    enableCheckpointing(jobGraph);
    final CountDownLatch checkpointTriggeredLatch = getCheckpointTriggeredLatch();
    final DefaultScheduler scheduler = createSchedulerAndStartScheduling(jobGraph);
    final ArchivedExecutionVertex onlyExecutionVertex =
            Iterables.getOnlyElement(
                    scheduler.requestJob().getArchivedExecutionGraph().getAllExecutionVertices());
    final ExecutionAttemptID attemptId = onlyExecutionVertex.getCurrentExecutionAttempt().getAttemptId();
    transitionToRunning(scheduler, attemptId);
    final CheckpointCoordinator checkpointCoordinator = getCheckpointCoordinator(scheduler);
    // register a stateful master hook to help verify state restore
    final TestMasterHook masterHook = TestMasterHook.fromId("testHook");
    checkpointCoordinator.addMasterHook(masterHook);
    // complete one checkpoint for state restore
    checkpointCoordinator.triggerCheckpoint(false);
    checkpointTriggeredLatch.await();
    final long checkpointId = checkpointCoordinator.getPendingCheckpoints().keySet().iterator().next();
    acknowledgePendingCheckpoint(scheduler, checkpointId);
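    // fail the only task, then run the scheduled restart
    scheduler.updateTaskExecutionState(createFailedTaskExecutionState(attemptId));
    taskRestartExecutor.triggerScheduledTasks();
    // the restart should have restored the master hook state exactly once
    assertThat(masterHook.getRestoreCount(), is(equalTo(1)));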
}
Also used: TestMasterHook (org.apache.flink.runtime.checkpoint.hooks.TestMasterHook), JobGraph (org.apache.flink.runtime.jobgraph.JobGraph), ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID), CheckpointCoordinator (org.apache.flink.runtime.checkpoint.CheckpointCoordinator), SchedulerTestingUtils.getCheckpointCoordinator (org.apache.flink.runtime.scheduler.SchedulerTestingUtils.getCheckpointCoordinator), ArchivedExecutionVertex (org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex), CountDownLatch (java.util.concurrent.CountDownLatch), AdaptiveSchedulerTest (org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest), Test (org.junit.Test)
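
The pattern in Example 1 (register a hook, wait on a latch until the checkpoint has been triggered, complete the checkpoint, then check that a restart restored the hook once) can be sketched without any Flink dependency. Everything below is hypothetical: CountingHook and MiniCoordinator are simplified stand-ins invented for illustration, not Flink's TestMasterHook or CheckpointCoordinator, and the real scheduler does considerably more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RestoreCountSketch {

    /** Hypothetical stand-in for a stateful master hook; counts restore calls. */
    static final class CountingHook {
        private final AtomicInteger restoreCount = new AtomicInteger();

        void restore(long checkpointId) {
            restoreCount.incrementAndGet();
        }

        int getRestoreCount() {
            return restoreCount.get();
        }
    }

    /** Hypothetical coordinator: remembers the latest completed checkpoint. */
    static final class MiniCoordinator {
        private final List<CountingHook> hooks = new ArrayList<>();
        private long latestCompletedCheckpointId = -1L;

        void addHook(CountingHook hook) {
            hooks.add(hook);
        }

        void completeCheckpoint(long checkpointId) {
            latestCompletedCheckpointId = checkpointId;
        }

        /** Restores every hook if a completed checkpoint exists; returns whether it did. */
        boolean restoreLatest() {
            if (latestCompletedCheckpointId < 0) {
                return false;
            }
            for (CountingHook hook : hooks) {
                hook.restore(latestCompletedCheckpointId);
            }
            return true;
        }
    }

    /** Runs the scenario and returns the hook's restore count. */
    static int runScenario() {
        MiniCoordinator coordinator = new MiniCoordinator();
        CountingHook hook = new CountingHook();
        coordinator.addHook(hook);

        // The latch lets the test thread wait until the asynchronous trigger has run.
        CountDownLatch checkpointTriggered = new CountDownLatch(1);
        Thread trigger = new Thread(checkpointTriggered::countDown);
        trigger.start();
        try {
            if (!checkpointTriggered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("checkpoint was never triggered");
            }
            trigger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        coordinator.completeCheckpoint(1L); // stands in for acknowledging the pending checkpoint
        coordinator.restoreLatest();        // stands in for the restart after a task failure
        return hook.getRestoreCount();
    }

    public static void main(String[] args) {
        System.out.println("restore count = " + runScenario());
    }
}
```

Without a completed checkpoint, restoreLatest() in this sketch returns false and the hook is never called, which is why the test above completes one checkpoint before failing the task.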

Example 2 with TestMasterHook

Use of org.apache.flink.runtime.checkpoint.hooks.TestMasterHook in the Apache Flink project.

Class DefaultSchedulerTest, method failGlobalWhenRestoringStateFails.

@Test
public void failGlobalWhenRestoringStateFails() throws Exception {
    final JobGraph jobGraph = singleNonParallelJobVertexJobGraph();
    final JobVertex onlyJobVertex = getOnlyJobVertex(jobGraph);
    enableCheckpointing(jobGraph);
    final CountDownLatch checkpointTriggeredLatch = getCheckpointTriggeredLatch();
    final DefaultScheduler scheduler = createSchedulerAndStartScheduling(jobGraph);
    final ArchivedExecutionVertex onlyExecutionVertex =
            Iterables.getOnlyElement(
                    scheduler.requestJob().getArchivedExecutionGraph().getAllExecutionVertices());
    final ExecutionAttemptID attemptId = onlyExecutionVertex.getCurrentExecutionAttempt().getAttemptId();
    transitionToRunning(scheduler, attemptId);
    final CheckpointCoordinator checkpointCoordinator = getCheckpointCoordinator(scheduler);
    // register a master hook to fail state restore
    final TestMasterHook masterHook = TestMasterHook.fromId("testHook");
    masterHook.enableFailOnRestore();
    checkpointCoordinator.addMasterHook(masterHook);
    // complete one checkpoint for state restore
    checkpointCoordinator.triggerCheckpoint(false);
    checkpointTriggeredLatch.await();
    final long checkpointId = checkpointCoordinator.getPendingCheckpoints().keySet().iterator().next();
    acknowledgePendingCheckpoint(scheduler, checkpointId);
    // fail the only task, then run the scheduled restart; the hook makes the state restore fail
    scheduler.updateTaskExecutionState(createFailedTaskExecutionState(attemptId));
    taskRestartExecutor.triggerScheduledTasks();
    final List<ExecutionVertexID> deployedExecutionVertices = testExecutionVertexOperations.getDeployedVertices();
    // the first task failover should be skipped on state restore failure
    final ExecutionVertexID executionVertexId = new ExecutionVertexID(onlyJobVertex.getID(), 0);
    assertThat(deployedExecutionVertices, contains(executionVertexId));
    // a global failure should be triggered on state restore failure
    masterHook.disableFailOnRestore();
    taskRestartExecutor.triggerScheduledTasks();
    assertThat(deployedExecutionVertices, contains(executionVertexId, executionVertexId));
}
Also used: TestMasterHook (org.apache.flink.runtime.checkpoint.hooks.TestMasterHook), JobGraph (org.apache.flink.runtime.jobgraph.JobGraph), JobVertex (org.apache.flink.runtime.jobgraph.JobVertex), ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID), CheckpointCoordinator (org.apache.flink.runtime.checkpoint.CheckpointCoordinator), SchedulerTestingUtils.getCheckpointCoordinator (org.apache.flink.runtime.scheduler.SchedulerTestingUtils.getCheckpointCoordinator), ExecutionVertexID (org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID), ArchivedExecutionVertex (org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex), CountDownLatch (java.util.concurrent.CountDownLatch), AdaptiveSchedulerTest (org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest), Test (org.junit.Test)
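
Example 2 relies on a hook whose restore can be switched between failing and succeeding: while it fails, the task is not redeployed, and once the failure is disabled the next restart deploys the vertex a second time. A minimal, dependency-free sketch of that idea follows. ToggleHook and MiniScheduler are hypothetical names invented here, and the second restart() call is only a stand-in for Flink's global failover, whose real mechanics are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class FailOnRestoreSketch {

    /** Hypothetical hook whose restore can be made to fail on demand. */
    static final class ToggleHook {
        private boolean failOnRestore;
        private int restoreCount;

        void enableFailOnRestore() {
            failOnRestore = true;
        }

        void disableFailOnRestore() {
            failOnRestore = false;
        }

        void restore(long checkpointId) {
            if (failOnRestore) {
                throw new IllegalStateException("restore failed on purpose");
            }
            restoreCount++;
        }

        int getRestoreCount() {
            return restoreCount;
        }
    }

    /** Hypothetical scheduler: redeploys a vertex only after a successful restore. */
    static final class MiniScheduler {
        private final ToggleHook hook;
        private final List<String> deployed = new ArrayList<>();

        MiniScheduler(ToggleHook hook) {
            this.hook = hook;
        }

        void deploy(String vertexId) {
            deployed.add(vertexId);
        }

        /** Returns true if the vertex was redeployed, false if the restore failed. */
        boolean restart(String vertexId, long checkpointId) {
            try {
                hook.restore(checkpointId);
            } catch (IllegalStateException e) {
                return false; // skip redeployment; the caller escalates and retries
            }
            deployed.add(vertexId);
            return true;
        }

        List<String> deployedVertices() {
            return deployed;
        }
    }

    /** Runs the scenario and returns the recorded deployments. */
    static List<String> runScenario() {
        ToggleHook hook = new ToggleHook();
        hook.enableFailOnRestore();
        MiniScheduler scheduler = new MiniScheduler(hook);

        scheduler.deploy("vertex-0"); // initial deployment
        boolean redeployed = scheduler.restart("vertex-0", 1L);
        if (redeployed || scheduler.deployedVertices().size() != 1) {
            throw new IllegalStateException("restart should be skipped while restore fails");
        }

        hook.disableFailOnRestore();
        scheduler.restart("vertex-0", 1L); // stand-in for the escalated (global) restart
        return scheduler.deployedVertices();
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

The two recorded deployments mirror the two assertions in the test above: one entry after the failed restore, two entries once the restore is allowed to succeed.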

Aggregations

CountDownLatch (java.util.concurrent.CountDownLatch): 2
CheckpointCoordinator (org.apache.flink.runtime.checkpoint.CheckpointCoordinator): 2
TestMasterHook (org.apache.flink.runtime.checkpoint.hooks.TestMasterHook): 2
ArchivedExecutionVertex (org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex): 2
ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID): 2
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph): 2
SchedulerTestingUtils.getCheckpointCoordinator (org.apache.flink.runtime.scheduler.SchedulerTestingUtils.getCheckpointCoordinator): 2
AdaptiveSchedulerTest (org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest): 2
Test (org.junit.Test): 2
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex): 1
ExecutionVertexID (org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID): 1