
Example 21 with TestingLeaderRetrievalService

Use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

From class JobMasterTest, method testHeartbeatTimeoutWithTaskManager.

@Test
public void testHeartbeatTimeoutWithTaskManager() throws Exception {
    final TestingHighAvailabilityServices haServices = new TestingHighAvailabilityServices();
    final TestingLeaderRetrievalService rmLeaderRetrievalService = new TestingLeaderRetrievalService();
    haServices.setResourceManagerLeaderRetriever(rmLeaderRetrievalService);
    haServices.setCheckpointRecoveryFactory(mock(CheckpointRecoveryFactory.class));
    final TestingFatalErrorHandler testingFatalErrorHandler = new TestingFatalErrorHandler();
    final String jobManagerAddress = "jm";
    final UUID jmLeaderId = UUID.randomUUID();
    final ResourceID jmResourceId = new ResourceID(jobManagerAddress);
    final String taskManagerAddress = "tm";
    final ResourceID tmResourceId = new ResourceID(taskManagerAddress);
    final TaskManagerLocation taskManagerLocation = new TaskManagerLocation(tmResourceId, InetAddress.getLoopbackAddress(), 1234);
    final TaskExecutorGateway taskExecutorGateway = mock(TaskExecutorGateway.class);
    final TestingSerialRpcService rpc = new TestingSerialRpcService();
    rpc.registerGateway(taskManagerAddress, taskExecutorGateway);
    final long heartbeatInterval = 1L;
    final long heartbeatTimeout = 5L;
    final ScheduledExecutor scheduledExecutor = mock(ScheduledExecutor.class);
    final HeartbeatServices heartbeatServices = new TestingHeartbeatServices(heartbeatInterval, heartbeatTimeout, scheduledExecutor);
    final JobGraph jobGraph = new JobGraph();
    try {
        final JobMaster jobMaster = new JobMaster(
            jmResourceId,
            jobGraph,
            new Configuration(),
            rpc,
            haServices,
            heartbeatServices,
            Executors.newScheduledThreadPool(1),
            mock(BlobLibraryCacheManager.class),
            mock(RestartStrategyFactory.class),
            Time.of(10, TimeUnit.SECONDS),
            null,
            mock(OnCompletionActions.class),
            testingFatalErrorHandler,
            new FlinkUserCodeClassLoader(new URL[0]));
        // starting the job master also starts its heartbeat manager
        jobMaster.start(jmLeaderId);
        // registering the task manager makes it a monitored heartbeat target: this schedules
        // the periodic heartbeat request (every heartbeatInterval ms) and the heartbeat timeout
        jobMaster.registerTaskManager(taskManagerAddress, taskManagerLocation, jmLeaderId);
        ArgumentCaptor<Runnable> heartbeatRunnableCaptor = ArgumentCaptor.forClass(Runnable.class);
        verify(scheduledExecutor, times(1)).scheduleAtFixedRate(heartbeatRunnableCaptor.capture(), eq(0L), eq(heartbeatInterval), eq(TimeUnit.MILLISECONDS));
        Runnable heartbeatRunnable = heartbeatRunnableCaptor.getValue();
        ArgumentCaptor<Runnable> timeoutRunnableCaptor = ArgumentCaptor.forClass(Runnable.class);
        verify(scheduledExecutor).schedule(timeoutRunnableCaptor.capture(), eq(heartbeatTimeout), eq(TimeUnit.MILLISECONDS));
        Runnable timeoutRunnable = timeoutRunnableCaptor.getValue();
        // run the first heartbeat request
        heartbeatRunnable.run();
        verify(taskExecutorGateway, times(1)).heartbeatFromJobManager(eq(jmResourceId));
        // run the timeout runnable to simulate a heartbeat timeout
        timeoutRunnable.run();
        verify(taskExecutorGateway).disconnectJobManager(eq(jobGraph.getJobID()), any(TimeoutException.class));
        // check if a concurrent error occurred
        testingFatalErrorHandler.rethrowError();
    } finally {
        rpc.stopService();
    }
}
Also used : BlobLibraryCacheManager(org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager) Configuration(org.apache.flink.configuration.Configuration) TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) FlinkUserCodeClassLoader(org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoader) URL(java.net.URL) ScheduledExecutor(org.apache.flink.runtime.concurrent.ScheduledExecutor) TestingHighAvailabilityServices(org.apache.flink.runtime.highavailability.TestingHighAvailabilityServices) ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) TestingSerialRpcService(org.apache.flink.runtime.rpc.TestingSerialRpcService) UUID(java.util.UUID) TimeoutException(java.util.concurrent.TimeoutException) TestingFatalErrorHandler(org.apache.flink.runtime.util.TestingFatalErrorHandler) HeartbeatServices(org.apache.flink.runtime.heartbeat.HeartbeatServices) TaskManagerLocation(org.apache.flink.runtime.taskmanager.TaskManagerLocation) TaskExecutorGateway(org.apache.flink.runtime.taskexecutor.TaskExecutorGateway) CheckpointRecoveryFactory(org.apache.flink.runtime.checkpoint.CheckpointRecoveryFactory) OnCompletionActions(org.apache.flink.runtime.jobmanager.OnCompletionActions) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) RestartStrategyFactory(org.apache.flink.runtime.executiongraph.restart.RestartStrategyFactory) PrepareForTest(org.powermock.core.classloader.annotations.PrepareForTest) Test(org.junit.Test)
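The test above never waits for real time. It hands a mocked `ScheduledExecutor` to `TestingHeartbeatServices`, captures the heartbeat and timeout `Runnable`s with Mockito's `ArgumentCaptor`, and then runs them by hand, which makes the timeout deterministic. The sketch below shows the same idea without Mockito or Flink. `ManualScheduler`, `HeartbeatTarget`, and `monitor` are hypothetical classes and methods written for illustration only, not Flink APIs; the sketch schedules the timeout once and does not model heartbeat responses.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualHeartbeatDemo {

    /** Hypothetical scheduler that records tasks instead of running them on a timer. */
    static final class ManualScheduler {
        final List<Runnable> periodicTasks = new ArrayList<>();
        final List<Runnable> delayedTasks = new ArrayList<>();

        void scheduleAtFixedRate(Runnable task) { periodicTasks.add(task); }
        void schedule(Runnable task) { delayedTasks.add(task); }
    }

    /** Hypothetical heartbeat target: counts heartbeat requests and remembers a disconnect. */
    static final class HeartbeatTarget {
        int heartbeatRequests = 0;
        String disconnectCause = null;

        void heartbeatFromJobManager() { heartbeatRequests++; }
        void disconnect(String cause) { disconnectCause = cause; }
    }

    /** Starts monitoring a target: one periodic heartbeat task and one timeout task. */
    static void monitor(ManualScheduler scheduler, HeartbeatTarget target) {
        scheduler.scheduleAtFixedRate(target::heartbeatFromJobManager);
        scheduler.schedule(() -> target.disconnect("heartbeat timeout"));
    }

    public static void main(String[] args) {
        ManualScheduler scheduler = new ManualScheduler();
        HeartbeatTarget target = new HeartbeatTarget();
        monitor(scheduler, target);

        // Plays the role of the ArgumentCaptor calls: pick up the recorded tasks.
        Runnable heartbeat = scheduler.periodicTasks.get(0);
        Runnable timeout = scheduler.delayedTasks.get(0);

        heartbeat.run(); // run the first heartbeat request
        timeout.run();   // simulate the heartbeat timeout firing

        System.out.println(target.heartbeatRequests + " " + target.disconnectCause);
    }
}
```

Running `main` prints `1 heartbeat timeout`. Recording tasks rather than executing them is what lets a test decide exactly when "time passes", the same reason the Flink test mocks the executor instead of using a real one.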

Example 22 with TestingLeaderRetrievalService

Use of org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService in project flink by apache.

From class AkkaKvStateLocationLookupServiceTest, method testRetryOnUnknownJobManager.

/**
 * Tests that lookups are retried when no leader notification is available.
 */
@Test
public void testRetryOnUnknownJobManager() throws Exception {
    final Queue<LookupRetryStrategy> retryStrategies = new LinkedBlockingQueue<>();
    LookupRetryStrategyFactory retryStrategy = new LookupRetryStrategyFactory() {

        @Override
        public LookupRetryStrategy createRetryStrategy() {
            return retryStrategies.poll();
        }
    };
    final TestingLeaderRetrievalService leaderRetrievalService = new TestingLeaderRetrievalService();
    AkkaKvStateLocationLookupService lookupService = new AkkaKvStateLocationLookupService(leaderRetrievalService, testActorSystem, TIMEOUT, retryStrategy);
    lookupService.start();
    //
    // Test call to retry
    //
    final AtomicBoolean hasRetried = new AtomicBoolean();
    retryStrategies.add(new LookupRetryStrategy() {

        @Override
        public FiniteDuration getRetryDelay() {
            return FiniteDuration.Zero();
        }

        @Override
        public boolean tryRetry() {
            // allow exactly one retry; every later call returns false
            return hasRetried.compareAndSet(false, true);
        }
    });
    Future<KvStateLocation> locationFuture = lookupService.getKvStateLookupInfo(new JobID(), "yessir");
    Await.ready(locationFuture, TIMEOUT);
    assertTrue("Did not retry ", hasRetried.get());
    //
    // Test leader notification after retry
    //
    Queue<LookupKvStateLocation> received = new LinkedBlockingQueue<>();
    KvStateLocation expected = new KvStateLocation(new JobID(), new JobVertexID(), 12122, "garlic");
    ActorRef testActor = LookupResponseActor.create(received, null, expected);
    final String testActorAddress = AkkaUtils.getAkkaURL(testActorSystem, testActor);
    retryStrategies.add(new LookupRetryStrategy() {

        @Override
        public FiniteDuration getRetryDelay() {
            return FiniteDuration.apply(100, TimeUnit.MILLISECONDS);
        }

        @Override
        public boolean tryRetry() {
            leaderRetrievalService.notifyListener(testActorAddress, null);
            return true;
        }
    });
    KvStateLocation location = Await.result(lookupService.getKvStateLookupInfo(new JobID(), "yessir"), TIMEOUT);
    assertEquals(expected, location);
}
Also used : LookupRetryStrategyFactory(org.apache.flink.runtime.query.AkkaKvStateLocationLookupService.LookupRetryStrategyFactory) TestingLeaderRetrievalService(org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService) ActorRef(akka.actor.ActorRef) JobVertexID(org.apache.flink.runtime.jobgraph.JobVertexID) FiniteDuration(scala.concurrent.duration.FiniteDuration) LookupRetryStrategy(org.apache.flink.runtime.query.AkkaKvStateLocationLookupService.LookupRetryStrategy) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) LookupKvStateLocation(org.apache.flink.runtime.query.KvStateMessage.LookupKvStateLocation) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)
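The second test combines two pieces: a `LookupRetryStrategy` that decides whether to retry (`tryRetry()`) and how long to wait first (`getRetryDelay()`), and `TestingLeaderRetrievalService.notifyListener(...)`, which announces a leader address. The plain-Java sketch below reproduces that control flow under simplifying assumptions: the lookup is synchronous and blocks with `Thread.sleep` (the real service returns a `Future`), and `RetryStrategy`, `LeaderHolder`, and `lookup` are hypothetical names, not the Flink classes.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class RetryUntilLeaderDemo {

    /** Hypothetical counterpart of LookupRetryStrategy: decides whether and when to retry. */
    interface RetryStrategy {
        Duration retryDelay();
        boolean tryRetry();
    }

    /** Hypothetical stand-in for a leader retrieval service: stores the last announced address. */
    static final class LeaderHolder {
        private final AtomicReference<String> leaderAddress = new AtomicReference<>();

        void notifyLeader(String address) { leaderAddress.set(address); }
        String currentLeader() { return leaderAddress.get(); }
    }

    /**
     * Returns the leader address, retrying while none is known.
     * Throws IllegalStateException once the strategy refuses another retry.
     */
    static String lookup(LeaderHolder leader, RetryStrategy strategy) {
        while (true) {
            String address = leader.currentLeader();
            if (address != null) {
                return address;
            }
            if (!strategy.tryRetry()) {
                throw new IllegalStateException("No leader known and no retries left");
            }
            try {
                Thread.sleep(strategy.retryDelay().toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting to retry", e);
            }
        }
    }

    public static void main(String[] args) {
        LeaderHolder leader = new LeaderHolder();

        // Like the first strategy in the test: allow exactly one retry, then give up.
        AtomicBoolean hasRetried = new AtomicBoolean();
        RetryStrategy retryOnce = new RetryStrategy() {
            @Override public Duration retryDelay() { return Duration.ZERO; }
            @Override public boolean tryRetry() { return hasRetried.compareAndSet(false, true); }
        };
        try {
            lookup(leader, retryOnce);
        } catch (IllegalStateException noLeader) {
            System.out.println("retried: " + hasRetried.get());
        }

        // Like the second strategy: announce a leader during the retry, so the next attempt succeeds.
        RetryStrategy announceLeader = new RetryStrategy() {
            @Override public Duration retryDelay() { return Duration.ofMillis(100); }
            @Override public boolean tryRetry() {
                leader.notifyLeader("hypothetical-leader-address");
                return true;
            }
        };
        System.out.println(lookup(leader, announceLeader));
    }
}
```

Running `main` prints `retried: true` followed by `hypothetical-leader-address`. Keeping the retry decision in a separate strategy object is what lets the test inject side effects, such as the leader notification, at exactly the moment a retry happens.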

Aggregations

TestingLeaderRetrievalService (org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService): 22
Test (org.junit.Test): 21
UUID (java.util.UUID): 17
ActorRef (akka.actor.ActorRef): 13
JobID (org.apache.flink.api.common.JobID): 11
FiniteDuration (scala.concurrent.duration.FiniteDuration): 10
Props (akka.actor.Props): 7
Timeout (akka.util.Timeout): 7
TestingHighAvailabilityServices (org.apache.flink.runtime.highavailability.TestingHighAvailabilityServices): 7
Time (org.apache.flink.api.common.time.Time): 6
ScheduledExecutor (org.apache.flink.runtime.concurrent.ScheduledExecutor): 5
JobClientMessages (org.apache.flink.runtime.messages.JobClientMessages): 5
LinkedBlockingQueue (java.util.concurrent.LinkedBlockingQueue): 4
Configuration (org.apache.flink.configuration.Configuration): 4
JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID): 4
LookupKvStateLocation (org.apache.flink.runtime.query.KvStateMessage.LookupKvStateLocation): 4
TestingFatalErrorHandler (org.apache.flink.runtime.util.TestingFatalErrorHandler): 4
ResourceID (org.apache.flink.runtime.clusterframework.types.ResourceID): 3
HeartbeatServices (org.apache.flink.runtime.heartbeat.HeartbeatServices): 3
TestingLeaderElectionService (org.apache.flink.runtime.leaderelection.TestingLeaderElectionService): 3