
Example 6 with CheckpointCoordinator

Use of org.apache.flink.runtime.checkpoint.CheckpointCoordinator in the Apache Flink project.

Class ExecutionGraph, method enableCheckpointing.

public void enableCheckpointing(long interval, long checkpointTimeout, long minPauseBetweenCheckpoints, int maxConcurrentCheckpoints, ExternalizedCheckpointSettings externalizeSettings, List<ExecutionJobVertex> verticesToTrigger, List<ExecutionJobVertex> verticesToWaitFor, List<ExecutionJobVertex> verticesToCommitTo, CheckpointIDCounter checkpointIDCounter, CompletedCheckpointStore checkpointStore, String checkpointDir, StateBackend metadataStore, CheckpointStatsTracker statsTracker) {
    // simple sanity checks
    if (interval < 10 || checkpointTimeout < 10) {
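        throw new IllegalArgumentException("Checkpoint interval and timeout must be at least 10, got interval=" + interval + ", checkpointTimeout=" + checkpointTimeout);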
    }
    if (state != JobStatus.CREATED) {
        throw new IllegalStateException("Job must be in CREATED state");
    }
    ExecutionVertex[] tasksToTrigger = collectExecutionVertices(verticesToTrigger);
    ExecutionVertex[] tasksToWaitFor = collectExecutionVertices(verticesToWaitFor);
    ExecutionVertex[] tasksToCommitTo = collectExecutionVertices(verticesToCommitTo);
    // disable to make sure existing checkpoint coordinators are cleared
    try {
        disableSnaphotCheckpointing();
    } catch (Throwable t) {
        // pass the throwable along so the cause is not lost
        LOG.error("Error while shutting down checkpointer.", t);
    }
    checkpointStatsTracker = checkNotNull(statsTracker, "CheckpointStatsTracker");
    // create the coordinator that triggers and commits checkpoints and holds the state
    checkpointCoordinator = new CheckpointCoordinator(jobInformation.getJobId(), interval, checkpointTimeout, minPauseBetweenCheckpoints, maxConcurrentCheckpoints, externalizeSettings, tasksToTrigger, tasksToWaitFor, tasksToCommitTo, checkpointIDCounter, checkpointStore, checkpointDir, ioExecutor);
    checkpointCoordinator.setCheckpointStatsTracker(checkpointStatsTracker);
    // the CheckpointActivatorDeactivator should be created only if the interval is not max value
    if (interval != Long.MAX_VALUE) {
        // the periodic checkpoint scheduler is activated and deactivated as a result of
        // job status changes (running -> on, all other states -> off)
        registerJobStatusListener(checkpointCoordinator.createActivatorDeactivator());
    }
}
Also used : CheckpointCoordinator(org.apache.flink.runtime.checkpoint.CheckpointCoordinator) SerializedThrowable(org.apache.flink.runtime.util.SerializedThrowable)
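
The comments in enableCheckpointing describe a periodic scheduler that is switched on when the job enters the running state and off in every other state. The following is a minimal, self-contained sketch of that idea using only the JDK. It is not Flink's implementation: the `Status` enum, the `StatusListener` interface, and the `PeriodicTrigger` class are hypothetical stand-ins invented for illustration, and only the method name `createActivatorDeactivator` is borrowed from the example above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PeriodicTriggerSketch {

    /** Hypothetical stand-in for a job status enum. */
    enum Status { CREATED, RUNNING, FAILED, FINISHED }

    /** Hypothetical listener that is notified on every status change. */
    interface StatusListener {
        void statusChanged(Status newStatus);
    }

    /** Runs an action at a fixed rate, but only while it has been started. */
    static final class PeriodicTrigger {
        private final long intervalMillis;
        private final Runnable action;
        private final ScheduledExecutorService timer;
        private ScheduledFuture<?> scheduled; // null while inactive

        PeriodicTrigger(long intervalMillis, Runnable action, ScheduledExecutorService timer) {
            // same kind of sanity check as in the example above
            if (intervalMillis < 10) {
                throw new IllegalArgumentException(
                        "interval must be at least 10 ms, got " + intervalMillis);
            }
            this.intervalMillis = intervalMillis;
            this.action = action;
            this.timer = timer;
        }

        synchronized void start() {
            if (scheduled == null) {
                scheduled = timer.scheduleAtFixedRate(
                        action, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
            }
        }

        synchronized void stop() {
            if (scheduled != null) {
                scheduled.cancel(false);
                scheduled = null;
            }
        }

        synchronized boolean isActive() {
            return scheduled != null;
        }

        /** RUNNING switches the trigger on; every other status switches it off. */
        StatusListener createActivatorDeactivator() {
            return newStatus -> {
                if (newStatus == Status.RUNNING) {
                    start();
                } else {
                    stop();
                }
            };
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        PeriodicTrigger trigger =
                new PeriodicTrigger(20, () -> System.out.println("trigger checkpoint"), timer);
        StatusListener listener = trigger.createActivatorDeactivator();

        listener.statusChanged(Status.RUNNING);  // scheduling starts
        Thread.sleep(100);
        listener.statusChanged(Status.FINISHED); // scheduling stops
        timer.shutdown();
    }
}
```

A caller that wants to disable periodic triggering entirely could simply skip registering the listener, which mirrors the `interval != Long.MAX_VALUE` guard in the example.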

Example 7 with CheckpointCoordinator

Use of org.apache.flink.runtime.checkpoint.CheckpointCoordinator in the Apache Flink project.

Class JobMaster, method declineCheckpoint.

// TODO: This method needs a leader session ID
@RpcMethod
public void declineCheckpoint(final JobID jobID, final ExecutionAttemptID executionAttemptID, final long checkpointID, final Throwable reason) {
    final DeclineCheckpoint decline = new DeclineCheckpoint(jobID, executionAttemptID, checkpointID, reason);
    final CheckpointCoordinator checkpointCoordinator = executionGraph.getCheckpointCoordinator();
    if (checkpointCoordinator != null) {
        getRpcService().execute(new Runnable() {

            @Override
            public void run() {
                try {
                    checkpointCoordinator.receiveDeclineMessage(decline);
                } catch (Exception e) {
                    log.error("Error in CheckpointCoordinator while processing {}", decline, e);
                }
            }
        });
    } else {
        log.error("Received DeclineCheckpoint message for job {} with no CheckpointCoordinator", jobGraph.getJobID());
    }
}
Also used : DeclineCheckpoint(org.apache.flink.runtime.messages.checkpoint.DeclineCheckpoint) CheckpointCoordinator(org.apache.flink.runtime.checkpoint.CheckpointCoordinator) TimeoutException(java.util.concurrent.TimeoutException) CheckpointException(org.apache.flink.runtime.checkpoint.CheckpointException) LeaderIdMismatchException(org.apache.flink.runtime.highavailability.LeaderIdMismatchException) PartitionProducerDisposedException(org.apache.flink.runtime.jobmanager.PartitionProducerDisposedException) JobExecutionException(org.apache.flink.runtime.client.JobExecutionException) IOException(java.io.IOException) RpcMethod(org.apache.flink.runtime.rpc.RpcMethod)

Example 8 with CheckpointCoordinator

Use of org.apache.flink.runtime.checkpoint.CheckpointCoordinator in the Apache Flink project.

Class JobMaster, method acknowledgeCheckpoint.

// TODO: This method needs a leader session ID
@RpcMethod
public void acknowledgeCheckpoint(final JobID jobID, final ExecutionAttemptID executionAttemptID, final long checkpointId, final CheckpointMetrics checkpointMetrics, final SubtaskState checkpointState) throws CheckpointException {
    final CheckpointCoordinator checkpointCoordinator = executionGraph.getCheckpointCoordinator();
    final AcknowledgeCheckpoint ackMessage = new AcknowledgeCheckpoint(jobID, executionAttemptID, checkpointId, checkpointMetrics, checkpointState);
    if (checkpointCoordinator != null) {
        getRpcService().execute(new Runnable() {

            @Override
            public void run() {
                try {
                    checkpointCoordinator.receiveAcknowledgeMessage(ackMessage);
                } catch (Throwable t) {
                    // include the throwable so the failure cause is visible in the log
                    log.warn("Error while processing checkpoint acknowledgement message", t);
                }
            }
        });
    } else {
        log.error("Received AcknowledgeCheckpoint message for job {} with no CheckpointCoordinator", jobGraph.getJobID());
    }
}
Also used : AcknowledgeCheckpoint(org.apache.flink.runtime.messages.checkpoint.AcknowledgeCheckpoint) CheckpointCoordinator(org.apache.flink.runtime.checkpoint.CheckpointCoordinator) SerializedThrowable(org.apache.flink.runtime.util.SerializedThrowable) RpcMethod(org.apache.flink.runtime.rpc.RpcMethod)
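
Both JobMaster methods above follow the same shape: check that a coordinator exists, hand the message to an executor so the calling thread is not blocked, and log any failure instead of propagating it. A rough JDK-only sketch of that pattern is shown below. The `Coordinator` interface, the `dispatch` helper, and the use of a plain `List<String>` as an error log are assumptions made for illustration and are not part of Flink's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

public class CoordinatorDispatchSketch {

    /** Hypothetical stand-in for a coordinator that consumes messages. */
    interface Coordinator {
        void receive(String message) throws Exception;
    }

    /**
     * Hands the message to the coordinator on the given executor.
     * Returns false (and records an error) when no coordinator is available.
     * With a multi-threaded executor, errorLog would need to be thread-safe.
     */
    static boolean dispatch(Coordinator coordinator, Executor executor,
                            String message, List<String> errorLog) {
        if (coordinator == null) {
            errorLog.add("Received " + message + " with no coordinator");
            return false;
        }
        executor.execute(() -> {
            try {
                coordinator.receive(message);
            } catch (Exception e) {
                // failures are recorded, not rethrown, as in the examples above
                errorLog.add("Error while processing " + message + ": " + e);
            }
        });
        return true;
    }

    public static void main(String[] args) {
        List<String> errorLog = new ArrayList<>();
        Executor direct = Runnable::run; // runs inline to keep the demo deterministic

        dispatch(msg -> System.out.println("handled " + msg), direct, "ack-1", errorLog);
        dispatch(msg -> { throw new IllegalStateException("boom"); }, direct, "ack-2", errorLog);
        dispatch(null, direct, "ack-3", errorLog);

        errorLog.forEach(System.out::println);
    }
}
```

Swapping the inline executor for a thread pool would give the asynchronous behaviour of the examples, at the cost of needing a thread-safe error sink.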

Aggregations

CheckpointCoordinator (org.apache.flink.runtime.checkpoint.CheckpointCoordinator): 8
HashMap (java.util.HashMap): 4
JobID (org.apache.flink.api.common.JobID): 4
ExecutionGraph (org.apache.flink.runtime.executiongraph.ExecutionGraph): 4
ActorGateway (org.apache.flink.runtime.instance.ActorGateway): 4
CancelJobWithSavepoint (org.apache.flink.runtime.messages.JobManagerMessages.CancelJobWithSavepoint): 4
ExecutionGraphHolder (org.apache.flink.runtime.webmonitor.ExecutionGraphHolder): 4
Test (org.junit.Test): 4
FiniteDuration (scala.concurrent.duration.FiniteDuration): 4
CancellationSuccess (org.apache.flink.runtime.messages.JobManagerMessages.CancellationSuccess): 3
FullHttpResponse (io.netty.handler.codec.http.FullHttpResponse): 2
IOException (java.io.IOException): 2
RpcMethod (org.apache.flink.runtime.rpc.RpcMethod): 2
SerializedThrowable (org.apache.flink.runtime.util.SerializedThrowable): 2
JsonNode (org.codehaus.jackson.JsonNode): 2
ObjectMapper (org.codehaus.jackson.map.ObjectMapper): 2
NoSuchElementException (java.util.NoSuchElementException): 1
ExecutionException (java.util.concurrent.ExecutionException): 1
TimeoutException (java.util.concurrent.TimeoutException): 1
JobException (org.apache.flink.runtime.JobException): 1