Example 1 with TestRestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy in the Apache Flink project.

In class DefaultSchedulerTest, method setUp.

@Before
public void setUp() throws Exception {
    executor = Executors.newSingleThreadExecutor();
    scheduledExecutorService = new DirectScheduledExecutorService();
    configuration = new Configuration();
    testRestartBackoffTimeStrategy = new TestRestartBackoffTimeStrategy(true, 0);
    testExecutionVertexOperations = new TestExecutionVertexOperationsDecorator(new DefaultExecutionVertexOperations());
    executionVertexVersioner = new ExecutionVertexVersioner();
    executionSlotAllocatorFactory = new TestExecutionSlotAllocatorFactory();
    testExecutionSlotAllocator = executionSlotAllocatorFactory.getTestExecutionSlotAllocator();
    shuffleMaster = new TestingShuffleMaster();
    partitionTracker = new TestingJobMasterPartitionTracker();
    timeout = Time.seconds(60);
}
Also used : Configuration(org.apache.flink.configuration.Configuration) TestRestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) DirectScheduledExecutorService(org.apache.flink.runtime.testutils.DirectScheduledExecutorService) TestingJobMasterPartitionTracker(org.apache.flink.runtime.io.network.partition.TestingJobMasterPartitionTracker) TestingShuffleMaster(org.apache.flink.runtime.shuffle.TestingShuffleMaster) Before(org.junit.Before)
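
In the examples on this page, TestRestartBackoffTimeStrategy is always constructed with a boolean and a long, which appear to be a "can restart" flag and a backoff delay (assumed here to be in milliseconds). The sketch below is a minimal, self-contained stand-in for that kind of test double. It is not Flink's class: the names BackoffStrategySketch and FixedBackoffStrategySketch, the failure counter, and the setter are hypothetical and exist only for illustration.

```java
// Hypothetical stand-in for a restart strategy contract; NOT Flink's actual interface.
interface BackoffStrategySketch {
    // Whether the scheduler is allowed to restart after a failure.
    boolean canRestart();

    // Delay before a restart is attempted (assumed to be milliseconds).
    long getBackoffTime();

    // Called by the scheduler whenever a failure is observed.
    void notifyFailure(Throwable cause);
}

// Test double that returns whatever the test configured, with no real policy.
class FixedBackoffStrategySketch implements BackoffStrategySketch {
    private boolean canRestart;
    private final long backoffTime;
    private int failureCount;

    FixedBackoffStrategySketch(boolean canRestart, long backoffTime) {
        this.canRestart = canRestart;
        this.backoffTime = backoffTime;
    }

    @Override
    public boolean canRestart() {
        return canRestart;
    }

    @Override
    public long getBackoffTime() {
        return backoffTime;
    }

    @Override
    public void notifyFailure(Throwable cause) {
        // Only records that a failure happened; the answers above never change on their own.
        failureCount++;
    }

    int getFailureCount() {
        return failureCount;
    }

    // Lets a test flip the restart decision in the middle of a scenario.
    void setCanRestart(boolean canRestart) {
        this.canRestart = canRestart;
    }
}
```

Under these assumptions, `new FixedBackoffStrategySketch(true, 0)` models "restart immediately", `(true, Long.MAX_VALUE)` models "restart allowed, but only when the test triggers the delayed task manually", and `(false, Long.MAX_VALUE)` models "never restart".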

Example 2 with TestRestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy in the Apache Flink project.

In class ExecutionGraphRestartTest, method testCancelWhileFailing.

@Test
public void testCancelWhileFailing() throws Exception {
    try (SlotPool slotPool = SlotPoolUtils.createDeclarativeSlotPoolBridge()) {
        SchedulerBase scheduler =
                SchedulerTestingUtils.newSchedulerBuilder(createJobGraph(), mainThreadExecutor)
                        .setExecutionSlotAllocatorFactory(createExecutionSlotAllocatorFactory(slotPool))
                        .setRestartBackoffTimeStrategy(new TestRestartBackoffTimeStrategy(false, Long.MAX_VALUE))
                        .build();
        ExecutionGraph graph = scheduler.getExecutionGraph();
        startScheduling(scheduler);
        offerSlots(slotPool, NUM_TASKS);
        assertEquals(JobStatus.RUNNING, graph.getState());
        switchAllTasksToRunning(graph);
        scheduler.handleGlobalFailure(new Exception("test"));
        assertEquals(JobStatus.FAILING, graph.getState());
        scheduler.cancel();
        assertEquals(JobStatus.CANCELLING, graph.getState());
        // let all tasks finish cancelling
        completeCanceling(graph);
        assertEquals(JobStatus.CANCELED, graph.getState());
    }
}
Also used : TestRestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) SchedulerBase(org.apache.flink.runtime.scheduler.SchedulerBase) IOException(java.io.IOException) SlotPool(org.apache.flink.runtime.jobmaster.slotpool.SlotPool) Test(org.junit.Test)

Example 3 with TestRestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy in the Apache Flink project.

In class ExecutionGraphRestartTest, method testCancelWhileRestarting.

@Test
public void testCancelWhileRestarting() throws Exception {
    // We want to manually control the restart and delay
    try (SlotPool slotPool = SlotPoolUtils.createDeclarativeSlotPoolBridge()) {
        SchedulerBase scheduler =
                SchedulerTestingUtils.newSchedulerBuilder(createJobGraph(), mainThreadExecutor)
                        .setExecutionSlotAllocatorFactory(createExecutionSlotAllocatorFactory(slotPool))
                        .setRestartBackoffTimeStrategy(new TestRestartBackoffTimeStrategy(true, Long.MAX_VALUE))
                        .setDelayExecutor(taskRestartExecutor)
                        .build();
        ExecutionGraph executionGraph = scheduler.getExecutionGraph();
        startScheduling(scheduler);
        final ResourceID taskManagerResourceId = offerSlots(slotPool, NUM_TASKS);
        // Release the TaskManager and wait for the job to restart
        slotPool.releaseTaskManager(taskManagerResourceId, new Exception("Test Exception"));
        assertEquals(JobStatus.RESTARTING, executionGraph.getState());
        // Canceling needs to abort the restart
        scheduler.cancel();
        assertEquals(JobStatus.CANCELED, executionGraph.getState());
        taskRestartExecutor.triggerScheduledTasks();
        assertEquals(JobStatus.CANCELED, executionGraph.getState());
        for (ExecutionVertex vertex : executionGraph.getAllExecutionVertices()) {
            assertEquals(ExecutionState.FAILED, vertex.getExecutionState());
        }
    }
}
Also used : TestRestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) SchedulerBase(org.apache.flink.runtime.scheduler.SchedulerBase) IOException(java.io.IOException) SlotPool(org.apache.flink.runtime.jobmaster.slotpool.SlotPool) Test(org.junit.Test)

Example 4 with TestRestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy in the Apache Flink project.

In class ExecutionGraphRestartTest, method testFailingExecutionAfterRestart.

/**
 * Tests that a failing execution does not affect a restarted job. This is important if a
 * callback handler fails an execution after it has already reached a final state and the job
 * has been restarted.
 */
@Test
public void testFailingExecutionAfterRestart() throws Exception {
    JobVertex sender = ExecutionGraphTestUtils.createJobVertex("Task1", 1, NoOpInvokable.class);
    JobVertex receiver = ExecutionGraphTestUtils.createJobVertex("Task2", 1, NoOpInvokable.class);
    JobGraph jobGraph = JobGraphTestUtils.streamingJobGraph(sender, receiver);
    try (SlotPool slotPool = SlotPoolUtils.createDeclarativeSlotPoolBridge()) {
        SchedulerBase scheduler =
                SchedulerTestingUtils.newSchedulerBuilder(jobGraph, mainThreadExecutor)
                        .setExecutionSlotAllocatorFactory(createExecutionSlotAllocatorFactory(slotPool))
                        .setRestartBackoffTimeStrategy(new TestRestartBackoffTimeStrategy(true, Long.MAX_VALUE))
                        .setDelayExecutor(taskRestartExecutor)
                        .build();
        ExecutionGraph eg = scheduler.getExecutionGraph();
        startScheduling(scheduler);
        offerSlots(slotPool, 2);
        Iterator<ExecutionVertex> executionVertices = eg.getAllExecutionVertices().iterator();
        Execution finishedExecution = executionVertices.next().getCurrentExecutionAttempt();
        Execution failedExecution = executionVertices.next().getCurrentExecutionAttempt();
        finishedExecution.markFinished();
        failedExecution.fail(new Exception("Test Exception"));
        failedExecution.completeCancelling();
        taskRestartExecutor.triggerScheduledTasks();
        assertEquals(JobStatus.RUNNING, eg.getState());
        // At this point all resources have been assigned
        for (ExecutionVertex vertex : eg.getAllExecutionVertices()) {
            assertNotNull("No assigned resource (test instability).", vertex.getCurrentAssignedResource());
            vertex.getCurrentExecutionAttempt().switchToRecovering();
            vertex.getCurrentExecutionAttempt().switchToRunning();
        }
        // fail old finished execution, this should not affect the execution
        finishedExecution.fail(new Exception("This should have no effect"));
        for (ExecutionVertex vertex : eg.getAllExecutionVertices()) {
            vertex.getCurrentExecutionAttempt().markFinished();
        }
        // the state of the finished execution should not have changed since it is terminal
        assertEquals(ExecutionState.FINISHED, finishedExecution.getState());
        assertEquals(JobStatus.FINISHED, eg.getState());
    }
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) TestRestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) SchedulerBase(org.apache.flink.runtime.scheduler.SchedulerBase) IOException(java.io.IOException) SlotPool(org.apache.flink.runtime.jobmaster.slotpool.SlotPool) Test(org.junit.Test)

Example 5 with TestRestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy in the Apache Flink project.

In class ExecutionGraphRestartTest, method testFailWhileCanceling.

@Test
public void testFailWhileCanceling() throws Exception {
    try (SlotPool slotPool = SlotPoolUtils.createDeclarativeSlotPoolBridge()) {
        SchedulerBase scheduler =
                SchedulerTestingUtils.newSchedulerBuilder(createJobGraph(), mainThreadExecutor)
                        .setExecutionSlotAllocatorFactory(createExecutionSlotAllocatorFactory(slotPool))
                        .setRestartBackoffTimeStrategy(new TestRestartBackoffTimeStrategy(false, Long.MAX_VALUE))
                        .build();
        ExecutionGraph graph = scheduler.getExecutionGraph();
        startScheduling(scheduler);
        offerSlots(slotPool, NUM_TASKS);
        assertEquals(JobStatus.RUNNING, graph.getState());
        switchAllTasksToRunning(graph);
        scheduler.cancel();
        assertEquals(JobStatus.CANCELLING, graph.getState());
        scheduler.handleGlobalFailure(new Exception("test"));
        assertEquals(JobStatus.FAILING, graph.getState());
        // let all tasks finish cancelling
        completeCanceling(graph);
        assertEquals(JobStatus.FAILED, graph.getState());
    }
}
Also used : TestRestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) SchedulerBase(org.apache.flink.runtime.scheduler.SchedulerBase) IOException(java.io.IOException) SlotPool(org.apache.flink.runtime.jobmaster.slotpool.SlotPool) Test(org.junit.Test)
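
Examples 2 and 5 assert two mirror-image sequences of job states when restarts are disabled: failure followed by cancel ends in CANCELED, while cancel followed by failure ends in FAILED. The toy state machine below reproduces only the transitions asserted in those two tests. It is a simplified model for illustration, not Flink's ExecutionGraph logic; StatusSketch and JobStatusSketch are hypothetical names.

```java
// Hypothetical subset of job states, limited to those asserted in Examples 2 and 5.
enum StatusSketch { RUNNING, FAILING, CANCELLING, FAILED, CANCELED }

// Toy model of the state changes observed in the two tests (restarts disabled).
class JobStatusSketch {
    private StatusSketch state = StatusSketch.RUNNING;

    StatusSketch getState() {
        return state;
    }

    // A global failure moves a running or cancelling job to FAILING.
    void handleGlobalFailure() {
        if (state == StatusSketch.RUNNING || state == StatusSketch.CANCELLING) {
            state = StatusSketch.FAILING;
        }
    }

    // A cancel request moves a running or failing job to CANCELLING.
    void cancel() {
        if (state == StatusSketch.RUNNING || state == StatusSketch.FAILING) {
            state = StatusSketch.CANCELLING;
        }
    }

    // Once all tasks have finished cancelling, the job reaches its terminal state.
    void completeCanceling() {
        if (state == StatusSketch.CANCELLING) {
            state = StatusSketch.CANCELED;
        } else if (state == StatusSketch.FAILING) {
            state = StatusSketch.FAILED;
        }
    }
}
```

In this model the terminal state is decided by whichever request arrived last before the tasks finished cancelling, which matches the assertions in testCancelWhileFailing and testFailWhileCanceling.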

Aggregations

TestRestartBackoffTimeStrategy (org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) 8
Test (org.junit.Test) 7
IOException (java.io.IOException) 6
SchedulerBase (org.apache.flink.runtime.scheduler.SchedulerBase) 6
SlotPool (org.apache.flink.runtime.jobmaster.slotpool.SlotPool) 5
ExecutionException (java.util.concurrent.ExecutionException) 1
Configuration (org.apache.flink.configuration.Configuration) 1
CheckpointException (org.apache.flink.runtime.checkpoint.CheckpointException) 1
ResourceID (org.apache.flink.runtime.clusterframework.types.ResourceID) 1
SuppressRestartsException (org.apache.flink.runtime.execution.SuppressRestartsException) 1
ArchivedExecutionGraphTest (org.apache.flink.runtime.executiongraph.ArchivedExecutionGraphTest) 1
TestingJobMasterPartitionTracker (org.apache.flink.runtime.io.network.partition.TestingJobMasterPartitionTracker) 1
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph) 1
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex) 1
PartitionProducerDisposedException (org.apache.flink.runtime.jobmanager.PartitionProducerDisposedException) 1
TaskNotRunningException (org.apache.flink.runtime.operators.coordination.TaskNotRunningException) 1
DefaultSchedulerTest (org.apache.flink.runtime.scheduler.DefaultSchedulerTest) 1
TestingShuffleMaster (org.apache.flink.runtime.shuffle.TestingShuffleMaster) 1
DirectScheduledExecutorService (org.apache.flink.runtime.testutils.DirectScheduledExecutorService) 1
FlinkException (org.apache.flink.util.FlinkException) 1