Example 1 with TriggerSavepointSuccess

use of org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess in project flink by apache.

the class CliFrontendSavepointTest method testTriggerSavepointSuccess.

// ------------------------------------------------------------------------
// Trigger savepoint
// ------------------------------------------------------------------------
@Test
public void testTriggerSavepointSuccess() throws Exception {
    replaceStdOutAndStdErr();
    try {
        JobID jobId = new JobID();
        ActorGateway jobManager = mock(ActorGateway.class);
        Promise<Object> triggerResponse = new scala.concurrent.impl.Promise.DefaultPromise<>();
        when(jobManager.ask(Mockito.eq(new TriggerSavepoint(jobId, Option.<String>empty())), any(FiniteDuration.class))).thenReturn(triggerResponse.future());
        String savepointPath = "expectedSavepointPath";
        triggerResponse.success(new TriggerSavepointSuccess(jobId, -1, savepointPath, -1));
        CliFrontend frontend = new MockCliFrontend(CliFrontendTestUtils.getConfigDir(), jobManager);
        String[] parameters = { jobId.toString() };
        int returnCode = frontend.savepoint(parameters);
        assertEquals(0, returnCode);
        verify(jobManager, times(1)).ask(Mockito.eq(new TriggerSavepoint(jobId, Option.<String>empty())), any(FiniteDuration.class));
        assertTrue(buffer.toString().contains("expectedSavepointPath"));
    } finally {
        restoreStdOutAndStdErr();
    }
}
Also used : FiniteDuration(scala.concurrent.duration.FiniteDuration) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) DisposeSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.DisposeSavepoint) TriggerSavepointSuccess(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess) ActorGateway(org.apache.flink.runtime.instance.ActorGateway) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)
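
The test above completes the promise before `frontend.savepoint(...)` is called, so the blocking wait inside the CLI returns at once. A minimal sketch of that pattern using only the JDK follows, with `CompletableFuture` standing in for the Scala `Promise`. The names `Gateway`, `SavepointClient`, and `PreCompletedFutureDemo` are hypothetical and are not Flink classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class PreCompletedFutureDemo {
    // Runs the client against a stub whose future is already completed.
    static int run() {
        CompletableFuture<Object> response = new CompletableFuture<>();
        response.complete("expectedSavepointPath"); // completed before the call
        Gateway stub = message -> response;          // always hands out the same future
        return new SavepointClient(stub).trigger("job-1");
    }

    public static void main(String[] args) {
        System.out.println("return code: " + run());
    }
}

// Hypothetical stand-in for a gateway that answers requests asynchronously.
interface Gateway {
    CompletableFuture<Object> ask(Object message);
}

// Hypothetical client that blocks on the gateway's answer, as a CLI would.
final class SavepointClient {
    private final Gateway gateway;

    SavepointClient(Gateway gateway) {
        this.gateway = gateway;
    }

    // Returns 0 if the answer is a String path, 1 otherwise (assumed exit codes).
    int trigger(String jobId) {
        try {
            Object result = gateway.ask("trigger:" + jobId).get(5, TimeUnit.SECONDS);
            if (result instanceof String) {
                System.out.println("Savepoint completed. Path: " + result);
                return 0;
            }
            return 1;
        } catch (Exception e) {
            return 1;
        }
    }
}
```

Because the future is already complete, the stubbed call cannot hang the test, which is presumably why the original completes the promise before invoking the frontend.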

Example 2 with TriggerSavepointSuccess

use of org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess in project flink by apache.

the class CliFrontend method triggerSavepoint.

/**
 * Sends a {@link org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint}
 * message to the job manager.
 */
private int triggerSavepoint(SavepointOptions options, JobID jobId, String savepointDirectory) {
    try {
        ActorGateway jobManager = getJobManagerGateway(options);
        logAndSysout("Triggering savepoint for job " + jobId + ".");
        Future<Object> response = jobManager.ask(new TriggerSavepoint(jobId, Option.apply(savepointDirectory)), new FiniteDuration(1, TimeUnit.HOURS));
        Object result;
        try {
            logAndSysout("Waiting for response...");
            result = Await.result(response, FiniteDuration.Inf());
        } catch (Exception e) {
            throw new Exception("Triggering a savepoint for the job " + jobId + " failed.", e);
        }
        if (result instanceof TriggerSavepointSuccess) {
            TriggerSavepointSuccess success = (TriggerSavepointSuccess) result;
            logAndSysout("Savepoint completed. Path: " + success.savepointPath());
            logAndSysout("You can resume your program from this savepoint with the run command.");
            return 0;
        } else if (result instanceof TriggerSavepointFailure) {
            TriggerSavepointFailure failure = (TriggerSavepointFailure) result;
            throw failure.cause();
        } else {
            throw new IllegalStateException("Unknown JobManager response of type " + result.getClass());
        }
    } catch (Throwable t) {
        return handleError(t);
    }
}
Also used : TriggerSavepointFailure(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointFailure) ActorGateway(org.apache.flink.runtime.instance.ActorGateway) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) FiniteDuration(scala.concurrent.duration.FiniteDuration) ProgramInvocationException(org.apache.flink.client.program.ProgramInvocationException) ProgramMissingJobException(org.apache.flink.client.program.ProgramMissingJobException) InvalidProgramException(org.apache.flink.api.common.InvalidProgramException) ProgramParametrizationException(org.apache.flink.client.program.ProgramParametrizationException) FileNotFoundException(java.io.FileNotFoundException) InvocationTargetException(java.lang.reflect.InvocationTargetException) IllegalConfigurationException(org.apache.flink.configuration.IllegalConfigurationException) CliArgsException(org.apache.flink.client.cli.CliArgsException) IOException(java.io.IOException) TriggerSavepointSuccess(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess)
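
The method above dispatches on the runtime type of the response: a success message yields exit code 0, a failure message has its cause rethrown, and anything else is treated as a protocol error. The sketch below isolates that dispatch with plain Java. `Success` and `Failure` are hypothetical stand-ins for the Flink messages, and the exit code 1 for errors is an assumption, since the snippet does not show what `handleError` returns.

```java
final class ResponseDispatchDemo {
    // Hypothetical stand-in for a success response carrying the savepoint path.
    static final class Success {
        final String savepointPath;
        Success(String savepointPath) { this.savepointPath = savepointPath; }
    }

    // Hypothetical stand-in for a failure response carrying the cause.
    static final class Failure {
        final Throwable cause;
        Failure(Throwable cause) { this.cause = cause; }
    }

    // Maps a response object to an exit code: 0 on success, 1 on any error.
    static int handle(Object result) {
        try {
            if (result instanceof Success) {
                System.out.println("Savepoint completed. Path: " + ((Success) result).savepointPath);
                return 0;
            } else if (result instanceof Failure) {
                throw ((Failure) result).cause; // rethrow the reported cause
            } else {
                throw new IllegalStateException("Unknown response of type " + result.getClass());
            }
        } catch (Throwable t) {
            System.err.println("Error: " + t.getMessage());
            return 1; // assumed error exit code
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new Success("/tmp/savepoint-1")));
        System.out.println(handle(new Failure(new RuntimeException("boom"))));
        System.out.println(handle("unexpected"));
    }
}
```

Funnelling every failure path through one `catch (Throwable t)` keeps the exit-code logic in a single place, mirroring the structure of the original method.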

Example 3 with TriggerSavepointSuccess

use of org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess in project flink by apache.

the class CliFrontendSavepointTest method testTriggerSavepointCustomTarget.

/**
 * Tests that a CLI call with a custom savepoint directory target is
 * forwarded correctly to the JM.
 */
@Test
public void testTriggerSavepointCustomTarget() throws Exception {
    replaceStdOutAndStdErr();
    try {
        JobID jobId = new JobID();
        Option<String> customTarget = Option.apply("customTargetDirectory");
        ActorGateway jobManager = mock(ActorGateway.class);
        Promise<Object> triggerResponse = new scala.concurrent.impl.Promise.DefaultPromise<>();
        when(jobManager.ask(Mockito.eq(new TriggerSavepoint(jobId, customTarget)), any(FiniteDuration.class))).thenReturn(triggerResponse.future());
        String savepointPath = "expectedSavepointPath";
        triggerResponse.success(new TriggerSavepointSuccess(jobId, -1, savepointPath, -1));
        CliFrontend frontend = new MockCliFrontend(CliFrontendTestUtils.getConfigDir(), jobManager);
        String[] parameters = { jobId.toString(), customTarget.get() };
        int returnCode = frontend.savepoint(parameters);
        assertEquals(0, returnCode);
        verify(jobManager, times(1)).ask(Mockito.eq(new TriggerSavepoint(jobId, customTarget)), any(FiniteDuration.class));
        assertTrue(buffer.toString().contains("expectedSavepointPath"));
    } finally {
        restoreStdOutAndStdErr();
    }
}
Also used : FiniteDuration(scala.concurrent.duration.FiniteDuration) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) DisposeSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.DisposeSavepoint) TriggerSavepointSuccess(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess) ActorGateway(org.apache.flink.runtime.instance.ActorGateway) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)
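
Taken together with Example 1, this test suggests that the first CLI parameter is the job ID and that an optional second parameter becomes the target directory carried in the message. A minimal sketch of that mapping, using `java.util.Optional` in place of the Scala `Option`, is shown below. `TargetArgDemo`, `Request`, and `parse` are hypothetical names, and the parsing rule is an assumption inferred from the two tests rather than the actual CLI code.

```java
import java.util.Optional;

final class TargetArgDemo {
    // Hypothetical request type: a job ID plus an optional target directory.
    static final class Request {
        final String jobId;
        final Optional<String> targetDirectory;

        Request(String jobId, Optional<String> targetDirectory) {
            this.jobId = jobId;
            this.targetDirectory = targetDirectory;
        }
    }

    // First parameter is the job ID; an optional second one is the target directory.
    static Request parse(String[] parameters) {
        if (parameters.length < 1) {
            throw new IllegalArgumentException("Missing job ID");
        }
        Optional<String> target =
            parameters.length >= 2 ? Optional.of(parameters[1]) : Optional.empty();
        return new Request(parameters[0], target);
    }

    public static void main(String[] args) {
        Request withTarget = parse(new String[] { "job-1", "customTargetDirectory" });
        Request withoutTarget = parse(new String[] { "job-1" });
        System.out.println(withTarget.targetDirectory.isPresent());
        System.out.println(withoutTarget.targetDirectory.isPresent());
    }
}
```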

Example 4 with TriggerSavepointSuccess

use of org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess in project flink by apache.

the class SavepointITCase method testTriggerSavepointAndResumeWithFileBasedCheckpoints.

/**
 * Triggers a savepoint for a job that uses the FsStateBackend. We expect
 * that all checkpoint files are written to a new savepoint directory.
 *
 * <ol>
 * <li>Submit job, wait for some progress</li>
 * <li>Trigger savepoint and verify that savepoint has been created</li>
 * <li>Shut down the cluster, re-submit the job from the savepoint,
 * verify that the initial state has been reset, and
 * all tasks are running again</li>
 * <li>Cancel job, dispose the savepoint, and verify that everything
 * has been cleaned up</li>
 * </ol>
 */
@Test
public void testTriggerSavepointAndResumeWithFileBasedCheckpoints() throws Exception {
    // Config
    final int numTaskManagers = 2;
    final int numSlotsPerTaskManager = 2;
    final int parallelism = numTaskManagers * numSlotsPerTaskManager;
    final Deadline deadline = new FiniteDuration(5, TimeUnit.MINUTES).fromNow();
    final File testRoot = folder.newFolder();
    TestingCluster flink = null;
    try {
        // Create a test actor system
        ActorSystem testActorSystem = AkkaUtils.createDefaultActorSystem();
        // Flink configuration
        final Configuration config = new Configuration();
        config.setInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, numTaskManagers);
        config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlotsPerTaskManager);
        final File checkpointDir = new File(testRoot, "checkpoints");
        final File savepointRootDir = new File(testRoot, "savepoints");
        if (!checkpointDir.mkdir() || !savepointRootDir.mkdirs()) {
            fail("Test setup failed: failed to create temporary directories.");
        }
        // Use file based checkpoints
        config.setString(CoreOptions.STATE_BACKEND, "filesystem");
        config.setString(FsStateBackendFactory.CHECKPOINT_DIRECTORY_URI_CONF_KEY, checkpointDir.toURI().toString());
        config.setString(FsStateBackendFactory.MEMORY_THRESHOLD_CONF_KEY, "0");
        config.setString(ConfigConstants.SAVEPOINT_DIRECTORY_KEY, savepointRootDir.toURI().toString());
        // Start Flink
        flink = new TestingCluster(config);
        flink.start(true);
        // Submit the job
        final JobGraph jobGraph = createJobGraph(parallelism, 0, 1000);
        final JobID jobId = jobGraph.getJobID();
        // Reset the static test job helpers
        StatefulCounter.resetForTest(parallelism);
        // Retrieve the job manager
        ActorGateway jobManager = Await.result(flink.leaderGateway().future(), deadline.timeLeft());
        LOG.info("Submitting job " + jobGraph.getJobID() + " in detached mode.");
        flink.submitJobDetached(jobGraph);
        LOG.info("Waiting for some progress.");
        // wait for the JobManager to be ready
        Future<Object> allRunning = jobManager.ask(new WaitForAllVerticesToBeRunning(jobId), deadline.timeLeft());
        Await.ready(allRunning, deadline.timeLeft());
        // wait for the Tasks to be ready
        StatefulCounter.getProgressLatch().await(deadline.timeLeft().toMillis(), TimeUnit.MILLISECONDS);
        LOG.info("Triggering a savepoint.");
        Future<Object> savepointPathFuture = jobManager.ask(new TriggerSavepoint(jobId, Option.<String>empty()), deadline.timeLeft());
        final String savepointPath = ((TriggerSavepointSuccess) Await.result(savepointPathFuture, deadline.timeLeft())).savepointPath();
        LOG.info("Retrieved savepoint path: " + savepointPath + ".");
        // Retrieve the savepoint from the testing job manager
        LOG.info("Requesting the savepoint.");
        Future<Object> savepointFuture = jobManager.ask(new RequestSavepoint(savepointPath), deadline.timeLeft());
        SavepointV1 savepoint = (SavepointV1) ((ResponseSavepoint) Await.result(savepointFuture, deadline.timeLeft())).savepoint();
        LOG.info("Retrieved savepoint: " + savepointPath + ".");
        // Shut down the Flink cluster (thereby canceling the job)
        LOG.info("Shutting down Flink cluster.");
        flink.shutdown();
        flink.awaitTermination();
        // - Verification START -------------------------------------------
        // Only one savepoint should exist
        File[] files = savepointRootDir.listFiles();
        if (files != null) {
            assertEquals("Savepoint not created in expected directory", 1, files.length);
            assertTrue("Savepoint did not create self-contained directory", files[0].isDirectory());
            File savepointDir = files[0];
            File[] savepointFiles = savepointDir.listFiles();
            assertNotNull(savepointFiles);
            // Expect one metadata file and one checkpoint file per stateful
            // parallel subtask
            String errMsg = "Did not write expected number of savepoint/checkpoint files to directory: " + Arrays.toString(savepointFiles);
            assertEquals(errMsg, 1 + parallelism, savepointFiles.length);
        } else {
            fail("Savepoint not created in expected directory");
        }
        // We currently have the following directory layout: checkpointDir/jobId/chk-ID
        File jobCheckpoints = new File(checkpointDir, jobId.toString());
        if (jobCheckpoints.exists()) {
            files = jobCheckpoints.listFiles();
            assertNotNull("Checkpoint directory empty", files);
            assertEquals("Checkpoints directory not clean: " + Arrays.toString(files), 0, files.length);
        }
        // - Verification END ---------------------------------------------
        // Restart the cluster
        LOG.info("Restarting Flink cluster.");
        flink.start();
        // Retrieve the job manager
        LOG.info("Retrieving JobManager.");
        jobManager = Await.result(flink.leaderGateway().future(), deadline.timeLeft());
        LOG.info("JobManager: " + jobManager + ".");
        // Reset static test helpers
        StatefulCounter.resetForTest(parallelism);
        // Gather all task deployment descriptors
        final Throwable[] error = new Throwable[1];
        final TestingCluster finalFlink = flink;
        final Multimap<JobVertexID, TaskDeploymentDescriptor> tdds = HashMultimap.create();
        new JavaTestKit(testActorSystem) {

            {
                new Within(deadline.timeLeft()) {

                    @Override
                    protected void run() {
                        try {
                            // Register to all submit task messages for job
                            for (ActorRef taskManager : finalFlink.getTaskManagersAsJava()) {
                                taskManager.tell(new TestingTaskManagerMessages.RegisterSubmitTaskListener(jobId), getTestActor());
                            }
                            // Set the savepoint path
                            jobGraph.setSavepointRestoreSettings(SavepointRestoreSettings.forPath(savepointPath));
                            LOG.info("Resubmitting job " + jobGraph.getJobID() + " with " + "savepoint path " + savepointPath + " in detached mode.");
                            // Submit the job
                            finalFlink.submitJobDetached(jobGraph);
                            int numTasks = 0;
                            for (JobVertex jobVertex : jobGraph.getVertices()) {
                                numTasks += jobVertex.getParallelism();
                            }
                            // Gather the task deployment descriptors
                            LOG.info("Gathering " + numTasks + " submitted " + "TaskDeploymentDescriptor instances.");
                            for (int i = 0; i < numTasks; i++) {
                                ResponseSubmitTaskListener resp = (ResponseSubmitTaskListener) expectMsgAnyClassOf(getRemainingTime(), ResponseSubmitTaskListener.class);
                                TaskDeploymentDescriptor tdd = resp.tdd();
                                LOG.info("Received: " + tdd.toString() + ".");
                                TaskInformation taskInformation = tdd.getSerializedTaskInformation().deserializeValue(getClass().getClassLoader());
                                tdds.put(taskInformation.getJobVertexId(), tdd);
                            }
                        } catch (Throwable t) {
                            error[0] = t;
                        }
                    }
                };
            }
        };
        // - Verification START -------------------------------------------
        String errMsg = "Error during gathering of TaskDeploymentDescriptors";
        assertNull(errMsg, error[0]);
        // Verify that every task state in the savepoint has a matching
        // task deployment descriptor.
        for (TaskState taskState : savepoint.getTaskStates()) {
            Collection<TaskDeploymentDescriptor> taskTdds = tdds.get(taskState.getJobVertexID());
            errMsg = "Missing task for savepoint state for operator " + taskState.getJobVertexID() + ".";
            assertTrue(errMsg, taskTdds.size() > 0);
            assertEquals(taskState.getNumberCollectedStates(), taskTdds.size());
            for (TaskDeploymentDescriptor tdd : taskTdds) {
                SubtaskState subtaskState = taskState.getState(tdd.getSubtaskIndex());
                assertNotNull(subtaskState);
                errMsg = "Initial operator state mismatch.";
                assertEquals(errMsg, subtaskState.getLegacyOperatorState(), tdd.getTaskStateHandles().getLegacyOperatorState());
            }
        }
        // Await state is restored
        StatefulCounter.getRestoreLatch().await(deadline.timeLeft().toMillis(), TimeUnit.MILLISECONDS);
        // Await some progress after restore
        StatefulCounter.getProgressLatch().await(deadline.timeLeft().toMillis(), TimeUnit.MILLISECONDS);
        // - Verification END ---------------------------------------------
        LOG.info("Cancelling job " + jobId + ".");
        jobManager.tell(new CancelJob(jobId));
        LOG.info("Disposing savepoint " + savepointPath + ".");
        Future<Object> disposeFuture = jobManager.ask(new DisposeSavepoint(savepointPath), deadline.timeLeft());
        errMsg = "Failed to dispose savepoint " + savepointPath + ".";
        Object resp = Await.result(disposeFuture, deadline.timeLeft());
        assertTrue(errMsg, resp.getClass() == getDisposeSavepointSuccess().getClass());
        // - Verification START -------------------------------------------
        // The checkpoint files
        List<File> checkpointFiles = new ArrayList<>();
        for (TaskState stateForTaskGroup : savepoint.getTaskStates()) {
            for (SubtaskState subtaskState : stateForTaskGroup.getStates()) {
                ChainedStateHandle<StreamStateHandle> streamTaskState = subtaskState.getLegacyOperatorState();
                for (int i = 0; i < streamTaskState.getLength(); i++) {
                    if (streamTaskState.get(i) != null) {
                        FileStateHandle fileStateHandle = (FileStateHandle) streamTaskState.get(i);
                        checkpointFiles.add(new File(fileStateHandle.getFilePath().toUri()));
                    }
                }
            }
        }
        // The checkpoint files of the savepoint should have been discarded
        for (File f : checkpointFiles) {
            errMsg = "Checkpoint file " + f + " not cleaned up properly.";
            assertFalse(errMsg, f.exists());
        }
        if (checkpointFiles.size() > 0) {
            File parent = checkpointFiles.get(0).getParentFile();
            errMsg = "Checkpoint parent directory " + parent + " not cleaned up properly.";
            assertFalse(errMsg, parent.exists());
        }
        // All savepoints should have been cleaned up
        errMsg = "Savepoints directory not cleaned up properly: " + Arrays.toString(savepointRootDir.listFiles()) + ".";
        assertEquals(errMsg, 0, savepointRootDir.listFiles().length);
        // - Verification END ---------------------------------------------
    } finally {
        if (flink != null) {
            flink.shutdown();
        }
    }
}
Also used : ActorSystem(akka.actor.ActorSystem) RequestSavepoint(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages.RequestSavepoint) Configuration(org.apache.flink.configuration.Configuration) ActorRef(akka.actor.ActorRef) JobVertexID(org.apache.flink.runtime.jobgraph.JobVertexID) ArrayList(java.util.ArrayList) ResponseSubmitTaskListener(org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages.ResponseSubmitTaskListener) TestingCluster(org.apache.flink.runtime.testingUtils.TestingCluster) StreamStateHandle(org.apache.flink.runtime.state.StreamStateHandle) SavepointV1(org.apache.flink.runtime.checkpoint.savepoint.SavepointV1) ActorGateway(org.apache.flink.runtime.instance.ActorGateway) TaskDeploymentDescriptor(org.apache.flink.runtime.deployment.TaskDeploymentDescriptor) CancelJob(org.apache.flink.runtime.messages.JobManagerMessages.CancelJob) TestingTaskManagerMessages(org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages) TaskInformation(org.apache.flink.runtime.executiongraph.TaskInformation) WaitForAllVerticesToBeRunning(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages.WaitForAllVerticesToBeRunning) Deadline(scala.concurrent.duration.Deadline) FiniteDuration(scala.concurrent.duration.FiniteDuration) FileStateHandle(org.apache.flink.runtime.state.filesystem.FileStateHandle) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) ResponseSavepoint(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages.ResponseSavepoint) RequestSavepoint(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages.RequestSavepoint) DisposeSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.DisposeSavepoint) TriggerSavepointSuccess(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) 
DisposeSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.DisposeSavepoint) SubtaskState(org.apache.flink.runtime.checkpoint.SubtaskState) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) File(java.io.File) TaskState(org.apache.flink.runtime.checkpoint.TaskState) JobID(org.apache.flink.api.common.JobID) JavaTestKit(akka.testkit.JavaTestKit) Test(org.junit.Test)
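
The first verification step in the test above checks the on-disk layout: the savepoint root must hold exactly one directory, and that directory must contain one metadata file plus one file per stateful parallel subtask. The sketch below reproduces that check against a fake layout built in a temporary directory. The class name and the file names `savepoint-1`, `_metadata`, and `state-N` are hypothetical and chosen only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SavepointLayoutDemo {
    // Returns true if root holds exactly one directory containing expectedFiles entries.
    static boolean hasSingleSavepointWith(File root, int expectedFiles) {
        File[] dirs = root.listFiles();
        if (dirs == null || dirs.length != 1 || !dirs[0].isDirectory()) {
            return false;
        }
        File[] files = dirs[0].listFiles();
        return files != null && files.length == expectedFiles;
    }

    // Builds a fake layout: one metadata file plus one file per parallel subtask.
    static File createFakeSavepoint(int parallelism) {
        try {
            Path root = Files.createTempDirectory("savepoints");
            Path dir = Files.createDirectory(root.resolve("savepoint-1"));
            Files.createFile(dir.resolve("_metadata"));
            for (int i = 0; i < parallelism; i++) {
                Files.createFile(dir.resolve("state-" + i));
            }
            return root.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int parallelism = 4;
        File root = createFakeSavepoint(parallelism);
        System.out.println(hasSingleSavepointWith(root, 1 + parallelism));
    }
}
```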

Example 5 with TriggerSavepointSuccess

use of org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess in project flink by apache.

the class JobManagerTest method testSavepointRestoreSettings.

/**
 * Tests that configured {@link SavepointRestoreSettings} are respected.
 */
@Test
public void testSavepointRestoreSettings() throws Exception {
    FiniteDuration timeout = new FiniteDuration(30, TimeUnit.SECONDS);
    ActorSystem actorSystem = null;
    ActorGateway jobManager = null;
    ActorGateway archiver = null;
    ActorGateway taskManager = null;
    try {
        actorSystem = AkkaUtils.createLocalActorSystem(new Configuration());
        Tuple2<ActorRef, ActorRef> master = JobManager.startJobManagerActors(new Configuration(), actorSystem, TestingUtils.defaultExecutor(), TestingUtils.defaultExecutor(), Option.apply("jm"), Option.apply("arch"), TestingJobManager.class, TestingMemoryArchivist.class);
        jobManager = new AkkaActorGateway(master._1(), null);
        archiver = new AkkaActorGateway(master._2(), null);
        Configuration tmConfig = new Configuration();
        tmConfig.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, 4);
        ActorRef taskManagerRef = TaskManager.startTaskManagerComponentsAndActor(tmConfig, ResourceID.generate(), actorSystem, "localhost", Option.apply("tm"), Option.<LeaderRetrievalService>apply(new StandaloneLeaderRetrievalService(jobManager.path())), true, TestingTaskManager.class);
        taskManager = new AkkaActorGateway(taskManagerRef, null);
        // Wait until connected
        Object msg = new TestingTaskManagerMessages.NotifyWhenRegisteredAtJobManager(jobManager.actor());
        Await.ready(taskManager.ask(msg, timeout), timeout);
        // Create job graph
        JobVertex sourceVertex = new JobVertex("Source");
        sourceVertex.setInvokableClass(BlockingStatefulInvokable.class);
        sourceVertex.setParallelism(1);
        JobGraph jobGraph = new JobGraph("TestingJob", sourceVertex);
        JobSnapshottingSettings snapshottingSettings = new JobSnapshottingSettings(
            Collections.singletonList(sourceVertex.getID()),
            Collections.singletonList(sourceVertex.getID()),
            Collections.singletonList(sourceVertex.getID()),
            // deactivated checkpointing
            Long.MAX_VALUE, 360000, 0, Integer.MAX_VALUE, ExternalizedCheckpointSettings.none(), null, true);
        jobGraph.setSnapshotSettings(snapshottingSettings);
        // Submit job graph
        msg = new JobManagerMessages.SubmitJob(jobGraph, ListeningBehaviour.DETACHED);
        Await.result(jobManager.ask(msg, timeout), timeout);
        // Wait for all tasks to be running
        msg = new TestingJobManagerMessages.WaitForAllVerticesToBeRunning(jobGraph.getJobID());
        Await.result(jobManager.ask(msg, timeout), timeout);
        // Trigger savepoint
        File targetDirectory = tmpFolder.newFolder();
        msg = new TriggerSavepoint(jobGraph.getJobID(), Option.apply(targetDirectory.getAbsolutePath()));
        Future<Object> future = jobManager.ask(msg, timeout);
        Object result = Await.result(future, timeout);
        String savepointPath = ((TriggerSavepointSuccess) result).savepointPath();
        // Cancel because of restarts
        msg = new TestingJobManagerMessages.NotifyWhenJobRemoved(jobGraph.getJobID());
        Future<?> removedFuture = jobManager.ask(msg, timeout);
        Future<?> cancelFuture = jobManager.ask(new CancelJob(jobGraph.getJobID()), timeout);
        Object response = Await.result(cancelFuture, timeout);
        assertTrue("Unexpected response: " + response, response instanceof CancellationSuccess);
        Await.ready(removedFuture, timeout);
        // Adjust the job (we need a new operator ID)
        JobVertex newSourceVertex = new JobVertex("NewSource");
        newSourceVertex.setInvokableClass(BlockingStatefulInvokable.class);
        newSourceVertex.setParallelism(1);
        JobGraph newJobGraph = new JobGraph("NewTestingJob", newSourceVertex);
        JobSnapshottingSettings newSnapshottingSettings = new JobSnapshottingSettings(
            Collections.singletonList(newSourceVertex.getID()),
            Collections.singletonList(newSourceVertex.getID()),
            Collections.singletonList(newSourceVertex.getID()),
            // deactivated checkpointing
            Long.MAX_VALUE, 360000, 0, Integer.MAX_VALUE, ExternalizedCheckpointSettings.none(), null, true);
        newJobGraph.setSnapshotSettings(newSnapshottingSettings);
        SavepointRestoreSettings restoreSettings = SavepointRestoreSettings.forPath(savepointPath, false);
        newJobGraph.setSavepointRestoreSettings(restoreSettings);
        msg = new JobManagerMessages.SubmitJob(newJobGraph, ListeningBehaviour.DETACHED);
        response = Await.result(jobManager.ask(msg, timeout), timeout);
        assertTrue("Unexpected response: " + response, response instanceof JobManagerMessages.JobResultFailure);
        JobManagerMessages.JobResultFailure failure = (JobManagerMessages.JobResultFailure) response;
        Throwable cause = failure.cause().deserializeError(ClassLoader.getSystemClassLoader());
        assertTrue(cause instanceof IllegalStateException);
        assertTrue(cause.getMessage().contains("allowNonRestoredState"));
        // Wait until removed
        msg = new TestingJobManagerMessages.NotifyWhenJobRemoved(newJobGraph.getJobID());
        Await.ready(jobManager.ask(msg, timeout), timeout);
        // Resubmit, but allow non restored state now
        restoreSettings = SavepointRestoreSettings.forPath(savepointPath, true);
        newJobGraph.setSavepointRestoreSettings(restoreSettings);
        msg = new JobManagerMessages.SubmitJob(newJobGraph, ListeningBehaviour.DETACHED);
        response = Await.result(jobManager.ask(msg, timeout), timeout);
        assertTrue("Unexpected response: " + response, response instanceof JobManagerMessages.JobSubmitSuccess);
    } finally {
        if (actorSystem != null) {
            actorSystem.shutdown();
        }
        if (archiver != null) {
            archiver.actor().tell(PoisonPill.getInstance(), ActorRef.noSender());
        }
        if (jobManager != null) {
            jobManager.actor().tell(PoisonPill.getInstance(), ActorRef.noSender());
        }
        if (taskManager != null) {
            taskManager.actor().tell(PoisonPill.getInstance(), ActorRef.noSender());
        }
    }
}
Also used : ActorSystem(akka.actor.ActorSystem) AkkaActorGateway(org.apache.flink.runtime.instance.AkkaActorGateway) JobSubmitSuccess(org.apache.flink.runtime.messages.JobManagerMessages.JobSubmitSuccess) Configuration(org.apache.flink.configuration.Configuration) ActorRef(akka.actor.ActorRef) TestingJobManagerMessages(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages) ActorGateway(org.apache.flink.runtime.instance.ActorGateway) AkkaActorGateway(org.apache.flink.runtime.instance.AkkaActorGateway) CancelJob(org.apache.flink.runtime.messages.JobManagerMessages.CancelJob) WaitForAllVerticesToBeRunning(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages.WaitForAllVerticesToBeRunning) JobSnapshottingSettings(org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings) JobManagerMessages(org.apache.flink.runtime.messages.JobManagerMessages) TestingJobManagerMessages(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages) FiniteDuration(scala.concurrent.duration.FiniteDuration) SubmitJob(org.apache.flink.runtime.messages.JobManagerMessages.SubmitJob) TriggerSavepointSuccess(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) StandaloneLeaderRetrievalService(org.apache.flink.runtime.leaderretrieval.StandaloneLeaderRetrievalService) CancellationSuccess(org.apache.flink.runtime.messages.JobManagerMessages.CancellationSuccess) TriggerSavepoint(org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint) File(java.io.File) SavepointRestoreSettings(org.apache.flink.runtime.jobgraph.SavepointRestoreSettings) Test(org.junit.Test)
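
The test above asserts two outcomes: restoring a savepoint into a job whose operator no longer matches fails with an `IllegalStateException` mentioning `allowNonRestoredState`, and the same submission succeeds once that flag is set. The sketch below models only that observable behaviour. It is not Flink's restore logic, and `RestoreCheckDemo`, `checkRestore`, the string operator IDs, and the exact message text are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

final class RestoreCheckDemo {
    // Throws if the savepoint holds state for an operator the new job lacks,
    // unless the caller explicitly allows that state to be skipped.
    static void checkRestore(Set<String> savepointOperators,
                             Set<String> jobOperators,
                             boolean allowNonRestoredState) {
        for (String operator : savepointOperators) {
            if (!jobOperators.contains(operator) && !allowNonRestoredState) {
                throw new IllegalStateException(
                    "Cannot map savepoint state for operator " + operator
                        + " to the new program; set allowNonRestoredState to skip it.");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> savepoint = Collections.singleton("Source");
        Set<String> newJob = Collections.singleton("NewSource");
        try {
            checkRestore(savepoint, newJob, false);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        checkRestore(savepoint, newJob, true); // accepted, unmatched state is skipped
        System.out.println("Accepted with allowNonRestoredState");
    }
}
```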

Aggregations

ActorGateway (org.apache.flink.runtime.instance.ActorGateway)6 TriggerSavepoint (org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepoint)6 TriggerSavepointSuccess (org.apache.flink.runtime.messages.JobManagerMessages.TriggerSavepointSuccess)6 FiniteDuration (scala.concurrent.duration.FiniteDuration)6 Test (org.junit.Test)5 ActorRef (akka.actor.ActorRef)3 ActorSystem (akka.actor.ActorSystem)3 File (java.io.File)3 JobID (org.apache.flink.api.common.JobID)3 Configuration (org.apache.flink.configuration.Configuration)3 JobGraph (org.apache.flink.runtime.jobgraph.JobGraph)3 JobVertex (org.apache.flink.runtime.jobgraph.JobVertex)3 DisposeSavepoint (org.apache.flink.runtime.messages.JobManagerMessages.DisposeSavepoint)3 WaitForAllVerticesToBeRunning (org.apache.flink.runtime.testingUtils.TestingJobManagerMessages.WaitForAllVerticesToBeRunning)3 AkkaActorGateway (org.apache.flink.runtime.instance.AkkaActorGateway)2 JobSnapshottingSettings (org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings)2 StandaloneLeaderRetrievalService (org.apache.flink.runtime.leaderretrieval.StandaloneLeaderRetrievalService)2 JobManagerMessages (org.apache.flink.runtime.messages.JobManagerMessages)2 CancelJob (org.apache.flink.runtime.messages.JobManagerMessages.CancelJob)2 SubmitJob (org.apache.flink.runtime.messages.JobManagerMessages.SubmitJob)2