Search in sources :

Example 1 with TestingLeaderRetrievalService

use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

the class JobClientActorTest method testConnectionTimeoutAfterJobSubmission.

/** Tests that a {@link org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException}
	 * is thrown after a successful job submission if the JobManager dies.
	 *
	 * @throws Exception
	 */
@Test(expected = JobClientActorConnectionTimeoutException.class)
public void testConnectionTimeoutAfterJobSubmission() throws Exception {
    FiniteDuration jobClientActorTimeout = new FiniteDuration(5, TimeUnit.SECONDS);
    FiniteDuration timeout = jobClientActorTimeout.$times(2);
    UUID leaderSessionID = UUID.randomUUID();
    ActorRef jobManager = system.actorOf(Props.create(JobAcceptingActor.class, leaderSessionID));
    TestingLeaderRetrievalService testingLeaderRetrievalService = new TestingLeaderRetrievalService(jobManager.path().toString(), leaderSessionID);
    Props jobClientActorProps = JobSubmissionClientActor.createActorProps(testingLeaderRetrievalService, jobClientActorTimeout, false, clientConfig);
    ActorRef jobClientActor = system.actorOf(jobClientActorProps);
    Future<Object> jobExecutionResult = Patterns.ask(jobClientActor, new JobClientMessages.SubmitJobAndWait(testJobGraph), new Timeout(timeout));
    Future<Object> waitFuture = Patterns.ask(jobManager, new RegisterTest(), new Timeout(timeout));
    Await.result(waitFuture, timeout);
    jobManager.tell(PoisonPill.getInstance(), ActorRef.noSender());
    Await.result(jobExecutionResult, timeout);
}
Also used : TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) ActorRef(akka.actor.ActorRef) JobClientMessages(org.apache.flink.runtime.messages.JobClientMessages) Timeout(akka.util.Timeout) FiniteDuration(scala.concurrent.duration.FiniteDuration) UUID(java.util.UUID) Props(akka.actor.Props) Test(org.junit.Test)

Example 2 with TestingLeaderRetrievalService

use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

the class JobClientActorTest method testSubmissionTimeout.

/** Tests that a {@link JobClientActorSubmissionTimeoutException} is thrown when the job cannot
	 * be submitted by the JobSubmissionClientActor. This is here the case, because the started JobManager
	 * never replies to a {@link SubmitJob} message.
	 *
	 * @throws Exception
	 */
@Test(expected = JobClientActorSubmissionTimeoutException.class)
public void testSubmissionTimeout() throws Exception {
    FiniteDuration jobClientActorTimeout = new FiniteDuration(5, TimeUnit.SECONDS);
    FiniteDuration timeout = jobClientActorTimeout.$times(2);
    UUID leaderSessionID = UUID.randomUUID();
    ActorRef jobManager = system.actorOf(Props.create(PlainActor.class, leaderSessionID));
    TestingLeaderRetrievalService testingLeaderRetrievalService = new TestingLeaderRetrievalService(jobManager.path().toString(), leaderSessionID);
    Props jobClientActorProps = JobSubmissionClientActor.createActorProps(testingLeaderRetrievalService, jobClientActorTimeout, false, clientConfig);
    ActorRef jobClientActor = system.actorOf(jobClientActorProps);
    Future<Object> jobExecutionResult = Patterns.ask(jobClientActor, new JobClientMessages.SubmitJobAndWait(testJobGraph), new Timeout(timeout));
    Await.result(jobExecutionResult, timeout);
}
Also used : TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) ActorRef(akka.actor.ActorRef) JobClientMessages(org.apache.flink.runtime.messages.JobClientMessages) Timeout(akka.util.Timeout) FiniteDuration(scala.concurrent.duration.FiniteDuration) UUID(java.util.UUID) Props(akka.actor.Props) Test(org.junit.Test)

Example 3 with TestingLeaderRetrievalService

use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

the class AkkaKvStateLocationLookupServiceTest method testLeaderSessionIdChange.

/**
	 * Tests that messages are properly decorated with the leader session ID.
	 */
@Test
public void testLeaderSessionIdChange() throws Exception {
    TestingLeaderRetrievalService leaderRetrievalService = new TestingLeaderRetrievalService();
    Queue<LookupKvStateLocation> received = new LinkedBlockingQueue<>();
    AkkaKvStateLocationLookupService lookupService = new AkkaKvStateLocationLookupService(leaderRetrievalService, testActorSystem, TIMEOUT, new AkkaKvStateLocationLookupService.DisabledLookupRetryStrategyFactory());
    lookupService.start();
    // Create test actors with random leader session IDs
    KvStateLocation expected1 = new KvStateLocation(new JobID(), new JobVertexID(), 8282, "salt");
    UUID leaderSessionId1 = UUID.randomUUID();
    ActorRef testActor1 = LookupResponseActor.create(received, leaderSessionId1, expected1);
    String testActorAddress1 = AkkaUtils.getAkkaURL(testActorSystem, testActor1);
    KvStateLocation expected2 = new KvStateLocation(new JobID(), new JobVertexID(), 22321, "pepper");
    UUID leaderSessionId2 = UUID.randomUUID();
    ActorRef testActor2 = LookupResponseActor.create(received, leaderSessionId1, expected2);
    String testActorAddress2 = AkkaUtils.getAkkaURL(testActorSystem, testActor2);
    JobID jobId = new JobID();
    //
    // Notify about first leader
    //
    leaderRetrievalService.notifyListener(testActorAddress1, leaderSessionId1);
    KvStateLocation location = Await.result(lookupService.getKvStateLookupInfo(jobId, "rock"), TIMEOUT);
    assertEquals(expected1, location);
    assertEquals(1, received.size());
    verifyLookupMsg(received.poll(), jobId, "rock");
    //
    // Notify about second leader
    //
    leaderRetrievalService.notifyListener(testActorAddress2, leaderSessionId2);
    location = Await.result(lookupService.getKvStateLookupInfo(jobId, "roll"), TIMEOUT);
    assertEquals(expected2, location);
    assertEquals(1, received.size());
    verifyLookupMsg(received.poll(), jobId, "roll");
}
Also used : TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) ActorRef(akka.actor.ActorRef) JobVertexID(org.apache.flink.runtime.jobgraph.JobVertexID) LookupKvStateLocation(org.apache.flink.runtime.query.KvStateMessage.LookupKvStateLocation) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) UUID(java.util.UUID) LookupKvStateLocation(org.apache.flink.runtime.query.KvStateMessage.LookupKvStateLocation) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)

Example 4 with TestingLeaderRetrievalService

use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

the class AkkaKvStateLocationLookupServiceTest method testUnexpectedResponseType.

@Test
public void testUnexpectedResponseType() throws Exception {
    TestingLeaderRetrievalService leaderRetrievalService = new TestingLeaderRetrievalService();
    Queue<LookupKvStateLocation> received = new LinkedBlockingQueue<>();
    AkkaKvStateLocationLookupService lookupService = new AkkaKvStateLocationLookupService(leaderRetrievalService, testActorSystem, TIMEOUT, new AkkaKvStateLocationLookupService.DisabledLookupRetryStrategyFactory());
    lookupService.start();
    // Create test actors with random leader session IDs
    String expected = "unexpected-response-type";
    ActorRef testActor = LookupResponseActor.create(received, null, expected);
    String testActorAddress = AkkaUtils.getAkkaURL(testActorSystem, testActor);
    leaderRetrievalService.notifyListener(testActorAddress, null);
    try {
        Await.result(lookupService.getKvStateLookupInfo(new JobID(), "spicy"), TIMEOUT);
        fail("Did not throw expected Exception");
    } catch (Throwable ignored) {
    // Expected
    }
}
Also used : TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) ActorRef(akka.actor.ActorRef) LookupKvStateLocation(org.apache.flink.runtime.query.KvStateMessage.LookupKvStateLocation) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)

Example 5 with TestingLeaderRetrievalService

use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

the class JobManagerHARecoveryTest method testJobRecoveryWhenLosingLeadership.

/**
	 * Tests that the persisted job is not removed from the SubmittedJobGraphStore if the JobManager
	 * loses its leadership. Furthermore, it tests that the job manager can recover the job from
	 * the SubmittedJobGraphStore and checkpoint state is recovered as well.
	 */
@Test
public void testJobRecoveryWhenLosingLeadership() throws Exception {
    FiniteDuration timeout = new FiniteDuration(30, TimeUnit.SECONDS);
    FiniteDuration jobRecoveryTimeout = new FiniteDuration(3, TimeUnit.SECONDS);
    Deadline deadline = new FiniteDuration(2, TimeUnit.MINUTES).fromNow();
    Configuration flinkConfiguration = new Configuration();
    UUID leaderSessionID = UUID.randomUUID();
    UUID newLeaderSessionID = UUID.randomUUID();
    int slots = 2;
    ActorRef archive = null;
    ActorRef jobManager = null;
    ActorRef taskManager = null;
    flinkConfiguration.setString(HighAvailabilityOptions.HA_MODE, "zookeeper");
    flinkConfiguration.setString(HighAvailabilityOptions.HA_STORAGE_PATH, temporaryFolder.newFolder().toString());
    flinkConfiguration.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, slots);
    try {
        Scheduler scheduler = new Scheduler(TestingUtils.defaultExecutionContext());
        MySubmittedJobGraphStore mySubmittedJobGraphStore = new MySubmittedJobGraphStore();
        MyCheckpointStore checkpointStore = new MyCheckpointStore();
        CheckpointIDCounter checkpointCounter = new StandaloneCheckpointIDCounter();
        CheckpointRecoveryFactory checkpointStateFactory = new MyCheckpointRecoveryFactory(checkpointStore, checkpointCounter);
        TestingLeaderElectionService myLeaderElectionService = new TestingLeaderElectionService();
        TestingLeaderRetrievalService myLeaderRetrievalService = new TestingLeaderRetrievalService();
        InstanceManager instanceManager = new InstanceManager();
        instanceManager.addInstanceListener(scheduler);
        archive = system.actorOf(Props.create(MemoryArchivist.class, 10));
        Props jobManagerProps = Props.create(TestingJobManager.class, flinkConfiguration, TestingUtils.defaultExecutor(), TestingUtils.defaultExecutor(), instanceManager, scheduler, new BlobLibraryCacheManager(new BlobServer(flinkConfiguration), 3600000), archive, new FixedDelayRestartStrategy.FixedDelayRestartStrategyFactory(Int.MaxValue(), 100), timeout, myLeaderElectionService, mySubmittedJobGraphStore, checkpointStateFactory, jobRecoveryTimeout, Option.apply(null));
        jobManager = system.actorOf(jobManagerProps);
        ActorGateway gateway = new AkkaActorGateway(jobManager, leaderSessionID);
        taskManager = TaskManager.startTaskManagerComponentsAndActor(flinkConfiguration, ResourceID.generate(), system, "localhost", Option.apply("taskmanager"), Option.apply((LeaderRetrievalService) myLeaderRetrievalService), true, TestingTaskManager.class);
        ActorGateway tmGateway = new AkkaActorGateway(taskManager, leaderSessionID);
        Future<Object> tmAlive = tmGateway.ask(TestingMessages.getAlive(), deadline.timeLeft());
        Await.ready(tmAlive, deadline.timeLeft());
        JobVertex sourceJobVertex = new JobVertex("Source");
        sourceJobVertex.setInvokableClass(BlockingStatefulInvokable.class);
        sourceJobVertex.setParallelism(slots);
        JobGraph jobGraph = new JobGraph("TestingJob", sourceJobVertex);
        List<JobVertexID> vertexId = Collections.singletonList(sourceJobVertex.getID());
        jobGraph.setSnapshotSettings(new JobSnapshottingSettings(vertexId, vertexId, vertexId, 100, 10 * 60 * 1000, 0, 1, ExternalizedCheckpointSettings.none(), null, true));
        BlockingStatefulInvokable.initializeStaticHelpers(slots);
        Future<Object> isLeader = gateway.ask(TestingJobManagerMessages.getNotifyWhenLeader(), deadline.timeLeft());
        Future<Object> isConnectedToJobManager = tmGateway.ask(new TestingTaskManagerMessages.NotifyWhenRegisteredAtJobManager(jobManager), deadline.timeLeft());
        // tell jobManager that he's the leader
        myLeaderElectionService.isLeader(leaderSessionID);
        // tell taskManager who's the leader
        myLeaderRetrievalService.notifyListener(gateway.path(), leaderSessionID);
        Await.ready(isLeader, deadline.timeLeft());
        Await.ready(isConnectedToJobManager, deadline.timeLeft());
        // submit blocking job
        Future<Object> jobSubmitted = gateway.ask(new JobManagerMessages.SubmitJob(jobGraph, ListeningBehaviour.DETACHED), deadline.timeLeft());
        Await.ready(jobSubmitted, deadline.timeLeft());
        // Wait for some checkpoints to complete
        BlockingStatefulInvokable.awaitCompletedCheckpoints();
        Future<Object> jobRemoved = gateway.ask(new TestingJobManagerMessages.NotifyWhenJobRemoved(jobGraph.getJobID()), deadline.timeLeft());
        // Revoke leadership
        myLeaderElectionService.notLeader();
        // check that the job gets removed from the JobManager
        Await.ready(jobRemoved, deadline.timeLeft());
        // but stays in the submitted job graph store
        assertTrue(mySubmittedJobGraphStore.contains(jobGraph.getJobID()));
        Future<Object> jobRunning = gateway.ask(new TestingJobManagerMessages.NotifyWhenJobStatus(jobGraph.getJobID(), JobStatus.RUNNING), deadline.timeLeft());
        // Make JobManager again a leader
        myLeaderElectionService.isLeader(newLeaderSessionID);
        // tell the TaskManager about it
        myLeaderRetrievalService.notifyListener(gateway.path(), newLeaderSessionID);
        // wait that the job is recovered and reaches state RUNNING
        Await.ready(jobRunning, deadline.timeLeft());
        Future<Object> jobFinished = gateway.ask(new TestingJobManagerMessages.NotifyWhenJobRemoved(jobGraph.getJobID()), deadline.timeLeft());
        BlockingInvokable.unblock();
        // wait til the job has finished
        Await.ready(jobFinished, deadline.timeLeft());
        // check that the job has been removed from the submitted job graph store
        assertFalse(mySubmittedJobGraphStore.contains(jobGraph.getJobID()));
        // Check that state has been recovered
        long[] recoveredStates = BlockingStatefulInvokable.getRecoveredStates();
        for (long state : recoveredStates) {
            boolean isExpected = state >= BlockingStatefulInvokable.NUM_CHECKPOINTS_TO_COMPLETE;
            assertTrue("Did not recover checkpoint state correctly, expecting >= " + BlockingStatefulInvokable.NUM_CHECKPOINTS_TO_COMPLETE + ", but state was " + state, isExpected);
        }
    } finally {
        if (archive != null) {
            archive.tell(PoisonPill.getInstance(), ActorRef.noSender());
        }
        if (jobManager != null) {
            jobManager.tell(PoisonPill.getInstance(), ActorRef.noSender());
        }
        if (taskManager != null) {
            taskManager.tell(PoisonPill.getInstance(), ActorRef.noSender());
        }
    }
}
Also used : AkkaActorGateway(org.apache.flink.runtime.instance.AkkaActorGateway) BlobLibraryCacheManager(org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager) Configuration(org.apache.flink.configuration.Configuration) TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) FixedDelayRestartStrategy(org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy) ActorRef(akka.actor.ActorRef) Scheduler(org.apache.flink.runtime.jobmanager.scheduler.Scheduler) InstanceManager(org.apache.flink.runtime.instance.InstanceManager) JobVertexID(org.apache.flink.runtime.jobgraph.JobVertexID) Props(akka.actor.Props) TestingJobManagerMessages(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages) ActorGateway(org.apache.flink.runtime.instance.ActorGateway) AkkaActorGateway(org.apache.flink.runtime.instance.AkkaActorGateway) TestingTaskManager(org.apache.flink.runtime.testingUtils.TestingTaskManager) BlobServer(org.apache.flink.runtime.blob.BlobServer) CheckpointIDCounter(org.apache.flink.runtime.checkpoint.CheckpointIDCounter) StandaloneCheckpointIDCounter(org.apache.flink.runtime.checkpoint.StandaloneCheckpointIDCounter) UUID(java.util.UUID) TestingTaskManagerMessages(org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages) TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) Deadline(scala.concurrent.duration.Deadline) JobSnapshottingSettings(org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings) JobManagerMessages(org.apache.flink.runtime.messages.JobManagerMessages) TestingJobManagerMessages(org.apache.flink.runtime.testingUtils.TestingJobManagerMessages) FiniteDuration(scala.concurrent.duration.FiniteDuration) CheckpointRecoveryFactory(org.apache.flink.runtime.checkpoint.CheckpointRecoveryFactory) CompletedCheckpoint(org.apache.flink.runtime.checkpoint.CompletedCheckpoint) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) StandaloneCheckpointIDCounter(org.apache.flink.runtime.checkpoint.StandaloneCheckpointIDCounter) Test(org.junit.Test)

Aggregations

TestingLeaderRetrievalService (org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService)22 Test (org.junit.Test)21 UUID (java.util.UUID)17 ActorRef (akka.actor.ActorRef)13 JobID (org.apache.flink.api.common.JobID)11 FiniteDuration (scala.concurrent.duration.FiniteDuration)10 Props (akka.actor.Props)7 Timeout (akka.util.Timeout)7 TestingHighAvailabilityServices (org.apache.flink.runtime.highavailability.TestingHighAvailabilityServices)7 Time (org.apache.flink.api.common.time.Time)6 ScheduledExecutor (org.apache.flink.runtime.concurrent.ScheduledExecutor)5 JobClientMessages (org.apache.flink.runtime.messages.JobClientMessages)5 LinkedBlockingQueue (java.util.concurrent.LinkedBlockingQueue)4 Configuration (org.apache.flink.configuration.Configuration)4 JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID)4 LookupKvStateLocation (org.apache.flink.runtime.query.KvStateMessage.LookupKvStateLocation)4 TestingFatalErrorHandler (org.apache.flink.runtime.util.TestingFatalErrorHandler)4 ResourceID (org.apache.flink.runtime.clusterframework.types.ResourceID)3 HeartbeatServices (org.apache.flink.runtime.heartbeat.HeartbeatServices)3 TestingLeaderElectionService (org.apache.flink.runtime.leaderelection.TestingLeaderElectionService)3