Example 51 with ExecutionAttemptID

Use of org.apache.flink.runtime.executiongraph.ExecutionAttemptID in project flink by apache.

Source: class CheckpointCoordinatorTest, method testMinDelayBetweenSavepoints.

/**
 * Tests that no minimum delay between savepoints is enforced.
 */
@Test
public void testMinDelayBetweenSavepoints() throws Exception {
    JobID jobId = new JobID();
    final ExecutionAttemptID attemptID1 = new ExecutionAttemptID();
    ExecutionVertex vertex1 = mockExecutionVertex(attemptID1);
    CheckpointCoordinator coord = new CheckpointCoordinator(
        jobId,
        100000,      // checkpoint interval
        200000,      // checkpoint timeout
        100000000L,  // very long min delay => should not affect savepoints
        1,           // max concurrent checkpoints
        ExternalizedCheckpointSettings.none(),
        new ExecutionVertex[] { vertex1 },  // tasks to trigger
        new ExecutionVertex[] { vertex1 },  // tasks to acknowledge
        new ExecutionVertex[] { vertex1 },  // tasks to commit to
        new StandaloneCheckpointIDCounter(),
        new StandaloneCompletedCheckpointStore(2),
        null,
        Executors.directExecutor());
    String savepointDir = tmpFolder.newFolder().getAbsolutePath();
    Future<CompletedCheckpoint> savepoint0 = coord.triggerSavepoint(0, savepointDir);
    assertFalse("Did not trigger savepoint", savepoint0.isDone());
    Future<CompletedCheckpoint> savepoint1 = coord.triggerSavepoint(1, savepointDir);
    assertFalse("Did not trigger savepoint", savepoint1.isDone());
}
Also used : ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) JobID(org.apache.flink.api.common.JobID) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) Test(org.junit.Test)
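The test relies on the rule that the minimum pause between checkpoints is not applied to savepoints. A minimal sketch of such a gate, assuming the pause is measured from the completion time of the previous checkpoint; `MinPauseSketch`, `mayTrigger` and `onCompleted` are hypothetical names, not Flink's API:

```java
/**
 * Hypothetical sketch (not Flink's API): regular checkpoints must wait
 * minPauseMillis after the previous completion; savepoints never wait.
 */
final class MinPauseSketch {

    private final long minPauseMillis;
    private boolean anyCompleted = false;
    private long lastCompletionMillis;

    MinPauseSketch(long minPauseMillis) {
        if (minPauseMillis < 0) {
            throw new IllegalArgumentException("minPauseMillis must be >= 0");
        }
        this.minPauseMillis = minPauseMillis;
    }

    /** Records that a checkpoint completed at nowMillis. */
    void onCompleted(long nowMillis) {
        anyCompleted = true;
        lastCompletionMillis = nowMillis;
    }

    /** Decides whether a trigger request at nowMillis may start right away. */
    boolean mayTrigger(long nowMillis, boolean isSavepoint) {
        if (isSavepoint || !anyCompleted) {
            return true; // savepoints bypass the pause; nothing to wait for yet
        }
        return nowMillis - lastCompletionMillis >= minPauseMillis;
    }

    public static void main(String[] args) {
        MinPauseSketch gate = new MinPauseSketch(100_000_000L); // very long min pause
        gate.onCompleted(0L);
        System.out.println(gate.mayTrigger(1L, false)); // false: regular checkpoint waits
        System.out.println(gate.mayTrigger(1L, true));  // true: savepoint is not delayed
    }
}
```

Checking `isSavepoint` before the time comparison means a user-requested savepoint is never held back, however long the configured pause is.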

Example 52 with ExecutionAttemptID

Use of org.apache.flink.runtime.executiongraph.ExecutionAttemptID in project flink by apache.

Source: class CheckpointCoordinatorTest, method testTriggerAndDeclineCheckpointSimple.

/**
 * This test triggers a checkpoint and then sends a decline checkpoint message from
 * one of the tasks. The expected behaviour is that said checkpoint is discarded and
 * that no new checkpoint is triggered automatically.
 */
@Test
public void testTriggerAndDeclineCheckpointSimple() {
    try {
        final JobID jid = new JobID();
        final long timestamp = System.currentTimeMillis();
        // create some mock Execution vertices that receive the checkpoint trigger messages
        final ExecutionAttemptID attemptID1 = new ExecutionAttemptID();
        final ExecutionAttemptID attemptID2 = new ExecutionAttemptID();
        ExecutionVertex vertex1 = mockExecutionVertex(attemptID1);
        ExecutionVertex vertex2 = mockExecutionVertex(attemptID2);
        // set up the coordinator and validate the initial state
        CheckpointCoordinator coord = new CheckpointCoordinator(
            jid,
            600000,             // checkpoint interval
            600000,             // checkpoint timeout
            0,                  // min pause between checkpoints
            Integer.MAX_VALUE,  // max concurrent checkpoints
            ExternalizedCheckpointSettings.none(),
            new ExecutionVertex[] { vertex1, vertex2 },  // tasks to trigger
            new ExecutionVertex[] { vertex1, vertex2 },  // tasks to acknowledge
            new ExecutionVertex[] { vertex1, vertex2 },  // tasks to commit to
            new StandaloneCheckpointIDCounter(),
            new StandaloneCompletedCheckpointStore(1),
            null,
            Executors.directExecutor());
        assertEquals(0, coord.getNumberOfPendingCheckpoints());
        assertEquals(0, coord.getNumberOfRetainedSuccessfulCheckpoints());
        // trigger the first checkpoint. this should succeed
        assertTrue(coord.triggerCheckpoint(timestamp, false));
        // validate that we have a pending checkpoint
        assertEquals(1, coord.getNumberOfPendingCheckpoints());
        assertEquals(0, coord.getNumberOfRetainedSuccessfulCheckpoints());
        // we have one task scheduled that will cancel after timeout
        assertEquals(1, coord.getNumScheduledTasks());
        long checkpointId = coord.getPendingCheckpoints().entrySet().iterator().next().getKey();
        PendingCheckpoint checkpoint = coord.getPendingCheckpoints().get(checkpointId);
        assertNotNull(checkpoint);
        assertEquals(checkpointId, checkpoint.getCheckpointId());
        assertEquals(timestamp, checkpoint.getCheckpointTimestamp());
        assertEquals(jid, checkpoint.getJobId());
        assertEquals(2, checkpoint.getNumberOfNonAcknowledgedTasks());
        assertEquals(0, checkpoint.getNumberOfAcknowledgedTasks());
        assertEquals(0, checkpoint.getTaskStates().size());
        assertFalse(checkpoint.isDiscarded());
        assertFalse(checkpoint.isFullyAcknowledged());
        // check that the vertices received the trigger checkpoint message
        verify(vertex1.getCurrentExecutionAttempt()).triggerCheckpoint(checkpointId, timestamp, CheckpointOptions.forFullCheckpoint());
        verify(vertex2.getCurrentExecutionAttempt()).triggerCheckpoint(checkpointId, timestamp, CheckpointOptions.forFullCheckpoint());
        // acknowledge from one of the tasks
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID2, checkpointId));
        assertEquals(1, checkpoint.getNumberOfAcknowledgedTasks());
        assertEquals(1, checkpoint.getNumberOfNonAcknowledgedTasks());
        assertFalse(checkpoint.isDiscarded());
        assertFalse(checkpoint.isFullyAcknowledged());
        // acknowledge the same task again (should not matter)
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID2, checkpointId));
        assertFalse(checkpoint.isDiscarded());
        assertFalse(checkpoint.isFullyAcknowledged());
        // decline checkpoint from the other task; this should cancel the checkpoint
        // without triggering a new one
        coord.receiveDeclineMessage(new DeclineCheckpoint(jid, attemptID1, checkpointId));
        assertTrue(checkpoint.isDiscarded());
        // the canceler is also removed
        assertEquals(0, coord.getNumScheduledTasks());
        // validate that we have no new pending checkpoint
        assertEquals(0, coord.getNumberOfPendingCheckpoints());
        assertEquals(0, coord.getNumberOfRetainedSuccessfulCheckpoints());
        // decline again, nothing should happen
        coord.receiveDeclineMessage(new DeclineCheckpoint(jid, attemptID1, checkpointId));
        // decline from the other task, nothing should happen
        coord.receiveDeclineMessage(new DeclineCheckpoint(jid, attemptID2, checkpointId));
        assertTrue(checkpoint.isDiscarded());
        coord.shutdown(JobStatus.FINISHED);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used : DeclineCheckpoint(org.apache.flink.runtime.messages.checkpoint.DeclineCheckpoint) AcknowledgeCheckpoint(org.apache.flink.runtime.messages.checkpoint.AcknowledgeCheckpoint) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) JobID(org.apache.flink.api.common.JobID) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) IOException(java.io.IOException) Test(org.junit.Test)
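The acknowledge/decline bookkeeping this test exercises can be modeled with two sets of attempt IDs. This is a hypothetical sketch (`PendingCheckpointSketch`, with `java.util.UUID` standing in for `ExecutionAttemptID`), not Flink's `PendingCheckpoint`: duplicate acknowledgements are ignored, the first decline discards the checkpoint, and later messages have no effect.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical model of a pending checkpoint (not Flink's PendingCheckpoint).
 * UUID stands in for ExecutionAttemptID.
 */
final class PendingCheckpointSketch {

    private final Set<UUID> notYetAcknowledged;
    private final Set<UUID> acknowledged = new HashSet<>();
    private boolean discarded = false;

    PendingCheckpointSketch(Set<UUID> tasksToWaitFor) {
        this.notYetAcknowledged = new HashSet<>(tasksToWaitFor);
    }

    /** Returns true only for the first acknowledgement of an expected task. */
    boolean acknowledge(UUID attemptId) {
        if (discarded || !notYetAcknowledged.remove(attemptId)) {
            return false; // late, duplicate or unknown acknowledgement: ignored
        }
        acknowledged.add(attemptId);
        return true;
    }

    /** A decline from any expected task discards the checkpoint; repeats are no-ops. */
    void decline(UUID attemptId) {
        boolean expected = notYetAcknowledged.contains(attemptId) || acknowledged.contains(attemptId);
        if (!discarded && expected) {
            discarded = true;
        }
    }

    boolean isDiscarded() { return discarded; }

    boolean isFullyAcknowledged() { return !discarded && notYetAcknowledged.isEmpty(); }

    int numAcknowledged() { return acknowledged.size(); }

    int numNotYetAcknowledged() { return notYetAcknowledged.size(); }

    public static void main(String[] args) {
        UUID attempt1 = UUID.randomUUID();
        UUID attempt2 = UUID.randomUUID();
        PendingCheckpointSketch checkpoint =
            new PendingCheckpointSketch(new HashSet<>(Arrays.asList(attempt1, attempt2)));

        checkpoint.acknowledge(attempt2);
        checkpoint.acknowledge(attempt2); // duplicate: no effect
        System.out.println(checkpoint.numAcknowledged() + " acked, discarded=" + checkpoint.isDiscarded());

        checkpoint.decline(attempt1);     // first decline discards the checkpoint
        System.out.println("discarded=" + checkpoint.isDiscarded());
    }
}
```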

Example 53 with ExecutionAttemptID

Use of org.apache.flink.runtime.executiongraph.ExecutionAttemptID in project flink by apache.

Source: class CheckpointCoordinatorTest, method testTriggerAndConfirmSimpleCheckpoint.

@Test
public void testTriggerAndConfirmSimpleCheckpoint() {
    try {
        final JobID jid = new JobID();
        final long timestamp = System.currentTimeMillis();
        // create some mock Execution vertices that receive the checkpoint trigger messages
        final ExecutionAttemptID attemptID1 = new ExecutionAttemptID();
        final ExecutionAttemptID attemptID2 = new ExecutionAttemptID();
        ExecutionVertex vertex1 = mockExecutionVertex(attemptID1);
        ExecutionVertex vertex2 = mockExecutionVertex(attemptID2);
        // set up the coordinator and validate the initial state
        CheckpointCoordinator coord = new CheckpointCoordinator(
            jid,
            600000,             // checkpoint interval
            600000,             // checkpoint timeout
            0,                  // min pause between checkpoints
            Integer.MAX_VALUE,  // max concurrent checkpoints
            ExternalizedCheckpointSettings.none(),
            new ExecutionVertex[] { vertex1, vertex2 },  // tasks to trigger
            new ExecutionVertex[] { vertex1, vertex2 },  // tasks to acknowledge
            new ExecutionVertex[] { vertex1, vertex2 },  // tasks to commit to
            new StandaloneCheckpointIDCounter(),
            new StandaloneCompletedCheckpointStore(1),
            null,
            Executors.directExecutor());
        assertEquals(0, coord.getNumberOfPendingCheckpoints());
        assertEquals(0, coord.getNumberOfRetainedSuccessfulCheckpoints());
        assertEquals(0, coord.getNumScheduledTasks());
        // trigger the first checkpoint. this should succeed
        assertTrue(coord.triggerCheckpoint(timestamp, false));
        // validate that we have a pending checkpoint
        assertEquals(1, coord.getNumberOfPendingCheckpoints());
        assertEquals(0, coord.getNumberOfRetainedSuccessfulCheckpoints());
        assertEquals(1, coord.getNumScheduledTasks());
        long checkpointId = coord.getPendingCheckpoints().entrySet().iterator().next().getKey();
        PendingCheckpoint checkpoint = coord.getPendingCheckpoints().get(checkpointId);
        assertNotNull(checkpoint);
        assertEquals(checkpointId, checkpoint.getCheckpointId());
        assertEquals(timestamp, checkpoint.getCheckpointTimestamp());
        assertEquals(jid, checkpoint.getJobId());
        assertEquals(2, checkpoint.getNumberOfNonAcknowledgedTasks());
        assertEquals(0, checkpoint.getNumberOfAcknowledgedTasks());
        assertEquals(0, checkpoint.getTaskStates().size());
        assertFalse(checkpoint.isDiscarded());
        assertFalse(checkpoint.isFullyAcknowledged());
        // check that the vertices received the trigger checkpoint message
        {
            verify(vertex1.getCurrentExecutionAttempt(), times(1)).triggerCheckpoint(eq(checkpointId), eq(timestamp), any(CheckpointOptions.class));
            verify(vertex2.getCurrentExecutionAttempt(), times(1)).triggerCheckpoint(eq(checkpointId), eq(timestamp), any(CheckpointOptions.class));
        }
        // acknowledge from one of the tasks
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID2, checkpointId));
        assertEquals(1, checkpoint.getNumberOfAcknowledgedTasks());
        assertEquals(1, checkpoint.getNumberOfNonAcknowledgedTasks());
        assertFalse(checkpoint.isDiscarded());
        assertFalse(checkpoint.isFullyAcknowledged());
        // acknowledge the same task again (should not matter)
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID2, checkpointId));
        assertFalse(checkpoint.isDiscarded());
        assertFalse(checkpoint.isFullyAcknowledged());
        // acknowledge the other task.
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID1, checkpointId));
        // the checkpoint is internally converted to a successful checkpoint and the
        // pending checkpoint object is disposed
        assertTrue(checkpoint.isDiscarded());
        // now we should have a completed checkpoint
        assertEquals(1, coord.getNumberOfRetainedSuccessfulCheckpoints());
        assertEquals(0, coord.getNumberOfPendingCheckpoints());
        // the canceler should be removed now
        assertEquals(0, coord.getNumScheduledTasks());
        // validate that the relevant tasks got a confirmation message
        {
            verify(vertex1.getCurrentExecutionAttempt(), times(1)).notifyCheckpointComplete(eq(checkpointId), eq(timestamp));
            verify(vertex2.getCurrentExecutionAttempt(), times(1)).notifyCheckpointComplete(eq(checkpointId), eq(timestamp));
        }
        CompletedCheckpoint success = coord.getSuccessfulCheckpoints().get(0);
        assertEquals(jid, success.getJobId());
        assertEquals(timestamp, success.getTimestamp());
        assertEquals(checkpoint.getCheckpointId(), success.getCheckpointID());
        assertTrue(success.getTaskStates().isEmpty());
        // ---------------
        // trigger another checkpoint and see that this one replaces the other checkpoint
        // ---------------
        final long timestampNew = timestamp + 7;
        assertTrue(coord.triggerCheckpoint(timestampNew, false));
        long checkpointIdNew = coord.getPendingCheckpoints().entrySet().iterator().next().getKey();
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID1, checkpointIdNew));
        coord.receiveAcknowledgeMessage(new AcknowledgeCheckpoint(jid, attemptID2, checkpointIdNew));
        assertEquals(0, coord.getNumberOfPendingCheckpoints());
        assertEquals(1, coord.getNumberOfRetainedSuccessfulCheckpoints());
        assertEquals(0, coord.getNumScheduledTasks());
        CompletedCheckpoint successNew = coord.getSuccessfulCheckpoints().get(0);
        assertEquals(jid, successNew.getJobId());
        assertEquals(timestampNew, successNew.getTimestamp());
        assertEquals(checkpointIdNew, successNew.getCheckpointID());
        assertTrue(successNew.getTaskStates().isEmpty());
        // validate that the relevant tasks got a confirmation message
        {
            verify(vertex1.getCurrentExecutionAttempt(), times(1)).triggerCheckpoint(eq(checkpointIdNew), eq(timestampNew), any(CheckpointOptions.class));
            verify(vertex2.getCurrentExecutionAttempt(), times(1)).triggerCheckpoint(eq(checkpointIdNew), eq(timestampNew), any(CheckpointOptions.class));
            verify(vertex1.getCurrentExecutionAttempt(), times(1)).notifyCheckpointComplete(eq(checkpointIdNew), eq(timestampNew));
            verify(vertex2.getCurrentExecutionAttempt(), times(1)).notifyCheckpointComplete(eq(checkpointIdNew), eq(timestampNew));
        }
        coord.shutdown(JobStatus.FINISHED);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used : AcknowledgeCheckpoint(org.apache.flink.runtime.messages.checkpoint.AcknowledgeCheckpoint) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) JobID(org.apache.flink.api.common.JobID) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) IOException(java.io.IOException) Test(org.junit.Test)
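With `StandaloneCompletedCheckpointStore(1)`, only the second completed checkpoint remains afterwards, as the assertions on `getNumberOfRetainedSuccessfulCheckpoints()` show. A hypothetical bounded store that keeps only the newest `maxRetained` entries illustrates that replacement policy; the class and its methods are illustrative and do not reproduce Flink's implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical bounded store (not Flink's StandaloneCompletedCheckpointStore):
 * keeps at most maxRetained completed checkpoints, oldest evicted first.
 */
final class BoundedCheckpointStoreSketch {

    static final class Completed {
        final long checkpointId;
        final long timestamp;

        Completed(long checkpointId, long timestamp) {
            this.checkpointId = checkpointId;
            this.timestamp = timestamp;
        }
    }

    private final int maxRetained;
    private final Deque<Completed> retained = new ArrayDeque<>();

    BoundedCheckpointStoreSketch(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be >= 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Adds a checkpoint and returns the entries evicted to respect the bound. */
    List<Completed> add(Completed checkpoint) {
        retained.addLast(checkpoint);
        List<Completed> evicted = new ArrayList<>();
        while (retained.size() > maxRetained) {
            evicted.add(retained.removeFirst()); // oldest first
        }
        return evicted;
    }

    int size() { return retained.size(); }

    /** Most recently added checkpoint, or null if the store is empty. */
    Completed latest() { return retained.peekLast(); }

    public static void main(String[] args) {
        BoundedCheckpointStoreSketch store = new BoundedCheckpointStoreSketch(1);
        store.add(new Completed(1L, 1000L));
        List<Completed> evicted = store.add(new Completed(2L, 1007L));
        // prints "evicted id 1, retained 1, latest id 2"
        System.out.println("evicted id " + evicted.get(0).checkpointId
            + ", retained " + store.size() + ", latest id " + store.latest().checkpointId);
    }
}
```

Returning the evicted entries, rather than dropping them silently, leaves the caller free to clean up whatever state they reference.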

Example 54 with ExecutionAttemptID

Use of org.apache.flink.runtime.executiongraph.ExecutionAttemptID in project flink by apache.

Source: class CheckpointCoordinatorTest, method testPeriodicSchedulingWithInactiveTasks.

@Test
public void testPeriodicSchedulingWithInactiveTasks() {
    try {
        final JobID jid = new JobID();
        // create some mock execution vertices and trigger some checkpoint
        final ExecutionAttemptID triggerAttemptID = new ExecutionAttemptID();
        final ExecutionAttemptID ackAttemptID = new ExecutionAttemptID();
        final ExecutionAttemptID commitAttemptID = new ExecutionAttemptID();
        ExecutionVertex triggerVertex = mockExecutionVertex(triggerAttemptID);
        ExecutionVertex ackVertex = mockExecutionVertex(ackAttemptID);
        ExecutionVertex commitVertex = mockExecutionVertex(commitAttemptID);
        final AtomicReference<ExecutionState> currentState = new AtomicReference<>(ExecutionState.CREATED);
        when(triggerVertex.getCurrentExecutionAttempt().getState()).thenAnswer(new Answer<ExecutionState>() {

            @Override
            public ExecutionState answer(InvocationOnMock invocation) {
                return currentState.get();
            }
        });
        CheckpointCoordinator coord = new CheckpointCoordinator(
            jid,
            10,      // periodic interval is 10 ms
            200000,  // timeout is very long (200 s)
            0L,      // no extra delay
            2,       // max two concurrent checkpoints
            ExternalizedCheckpointSettings.none(),
            new ExecutionVertex[] { triggerVertex },
            new ExecutionVertex[] { ackVertex },
            new ExecutionVertex[] { commitVertex },
            new StandaloneCheckpointIDCounter(),
            new StandaloneCompletedCheckpointStore(2),
            null,
            Executors.directExecutor());
        coord.startCheckpointScheduler();
        // wait a while; no checkpoint should start while the trigger task is not RUNNING
        Thread.sleep(200);
        assertEquals(0, coord.getNumberOfPendingCheckpoints());
        // now move the state to RUNNING
        currentState.set(ExecutionState.RUNNING);
        // the coordinator should start checkpointing now
        final long timeout = System.currentTimeMillis() + 10000;
        do {
            Thread.sleep(20);
        } while (System.currentTimeMillis() < timeout && coord.getNumberOfPendingCheckpoints() == 0);
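        assertTrue(coord.getNumberOfPendingCheckpoints() > 0);
        // stop the periodic scheduler so that it does not outlive the test
        coord.shutdown(JobStatus.FINISHED);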
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used : ExecutionState(org.apache.flink.runtime.execution.ExecutionState) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID) AtomicReference(java.util.concurrent.atomic.AtomicReference) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) IOException(java.io.IOException) InvocationOnMock(org.mockito.invocation.InvocationOnMock) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)
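The test checks that the periodic scheduler does not start checkpoints while the trigger task is still `CREATED`. The per-tick decision can be sketched deterministically, without the `Thread.sleep` polling the test needs. The `State` enum is a simplified, hypothetical stand-in for Flink's `ExecutionState`, and treating an empty task list as "not ready" is a choice of this sketch:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical per-tick check (not Flink's code): a periodic trigger fires only
 * when every task that receives the trigger message is RUNNING.
 */
final class TriggerGateSketch {

    /** Simplified stand-in for Flink's ExecutionState (only a few values). */
    enum State { CREATED, DEPLOYING, RUNNING, FINISHED }

    static boolean allRunning(List<State> triggerTaskStates) {
        if (triggerTaskStates.isEmpty()) {
            return false; // sketch's choice: no trigger tasks means "not ready"
        }
        for (State state : triggerTaskStates) {
            if (state != State.RUNNING) {
                return false; // skip this tick; try again on the next one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // State of the single trigger task as seen on four consecutive ticks.
        State[] observed = { State.CREATED, State.CREATED, State.RUNNING, State.RUNNING };
        int triggered = 0;
        for (State state : observed) {
            if (allRunning(Arrays.asList(state))) {
                triggered++;
            }
        }
        System.out.println("checkpoints triggered: " + triggered); // prints "checkpoints triggered: 2"
    }
}
```

Keeping the decision a pure function of the observed states makes it testable without timing assumptions; the sleep-and-poll loop is only needed when the real scheduler thread is involved.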

Example 55 with ExecutionAttemptID

Use of org.apache.flink.runtime.executiongraph.ExecutionAttemptID in project flink by apache.

Source: class CheckpointStateRestoreTest, method mockExecution.

private Execution mockExecution(ExecutionState state) {
    Execution mock = mock(Execution.class);
    when(mock.getAttemptId()).thenReturn(new ExecutionAttemptID());
    when(mock.getState()).thenReturn(state);
    return mock;
}
Also used : Execution(org.apache.flink.runtime.executiongraph.Execution) ExecutionAttemptID(org.apache.flink.runtime.executiongraph.ExecutionAttemptID)
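`mockExecution` gives every mocked `Execution` a fresh `ExecutionAttemptID`, so each mock is a distinct attempt, while a message carrying the same ID still matches by value. The sketch below is a hypothetical value type, assuming an attempt ID is a random 128-bit value held as two `long` halves and compared in `equals`/`hashCode`; `AttemptIdSketch` is illustrative and is not Flink's class:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical attempt-ID value type (not Flink's ExecutionAttemptID): a random
 * 128-bit value split into two longs, compared by value.
 */
final class AttemptIdSketch {

    private final long upperPart;
    private final long lowerPart;

    /** Creates a fresh random ID, analogous to new ExecutionAttemptID() in the tests. */
    AttemptIdSketch() {
        this(ThreadLocalRandom.current().nextLong(), ThreadLocalRandom.current().nextLong());
    }

    AttemptIdSketch(long upperPart, long lowerPart) {
        this.upperPart = upperPart;
        this.lowerPart = lowerPart;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AttemptIdSketch)) {
            return false;
        }
        AttemptIdSketch other = (AttemptIdSketch) o;
        return upperPart == other.upperPart && lowerPart == other.lowerPart;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(upperPart) + Long.hashCode(lowerPart);
    }

    @Override
    public String toString() {
        return String.format("%016x%016x", upperPart, lowerPart); // always 32 hex digits
    }

    public static void main(String[] args) {
        Set<AttemptIdSketch> ids = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            ids.add(new AttemptIdSketch()); // one fresh ID per mocked attempt
        }
        System.out.println("distinct IDs: " + ids.size());
        AttemptIdSketch id = new AttemptIdSketch(1L, 2L);
        System.out.println(id.equals(new AttemptIdSketch(1L, 2L))); // true: same value, different object
    }
}
```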

Aggregations

ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID) 81
Test (org.junit.Test) 66
JobID (org.apache.flink.api.common.JobID) 61
ExecutionVertex (org.apache.flink.runtime.executiongraph.ExecutionVertex) 41
IOException (java.io.IOException) 31
JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID) 30
Configuration (org.apache.flink.configuration.Configuration) 24
ExecutionConfig (org.apache.flink.api.common.ExecutionConfig) 21
AcknowledgeCheckpoint (org.apache.flink.runtime.messages.checkpoint.AcknowledgeCheckpoint) 19
ActorGateway (org.apache.flink.runtime.instance.ActorGateway) 17
AkkaActorGateway (org.apache.flink.runtime.instance.AkkaActorGateway) 16
TaskDeploymentDescriptor (org.apache.flink.runtime.deployment.TaskDeploymentDescriptor) 15
TaskManagerServicesConfiguration (org.apache.flink.runtime.taskexecutor.TaskManagerServicesConfiguration) 14
ActorRef (akka.actor.ActorRef) 13
SubmitTask (org.apache.flink.runtime.messages.TaskMessages.SubmitTask) 13
JavaTestKit (akka.testkit.JavaTestKit) 12
BlobKey (org.apache.flink.runtime.blob.BlobKey) 10
TriggerStackTraceSample (org.apache.flink.runtime.messages.StackTraceSampleMessages.TriggerStackTraceSample) 10
PartitionNotFoundException (org.apache.flink.runtime.io.network.partition.PartitionNotFoundException) 9
ResultPartitionID (org.apache.flink.runtime.io.network.partition.ResultPartitionID) 9