Example 1 with DispatcherResourceCleanerFactory

Use of org.apache.flink.runtime.dispatcher.cleanup.DispatcherResourceCleanerFactory in the project flink by apache.

From the class DispatcherCleanupITCase, method testCleanupThroughRetries.

@Test
public void testCleanupThroughRetries() throws Exception {
    final JobGraph jobGraph = createJobGraph();
    final JobID jobId = jobGraph.getJobID();
    // JobGraphStore whose global cleanup fails numberOfErrors times before it succeeds
    final AtomicInteger actualGlobalCleanupCallCount = new AtomicInteger();
    final OneShotLatch successfulCleanupLatch = new OneShotLatch();
    final int numberOfErrors = 5;
    final RuntimeException temporaryError =
            new RuntimeException("Expected RuntimeException: Unable to remove job graph.");
    final JobGraphStore jobGraphStore =
            createAndStartJobGraphStoreWithCleanupFailures(
                    numberOfErrors,
                    temporaryError,
                    actualGlobalCleanupCallCount,
                    successfulCleanupLatch);
    haServices.setJobGraphStore(jobGraphStore);
    // Construct leader election service.
    final TestingLeaderElectionService leaderElectionService = new TestingLeaderElectionService();
    haServices.setJobMasterLeaderElectionService(jobId, leaderElectionService);
    // start the dispatcher with enough retries on cleanup
    final JobManagerRunnerRegistry jobManagerRunnerRegistry = new DefaultJobManagerRunnerRegistry(2);
    final Dispatcher dispatcher =
            createTestingDispatcherBuilder()
                    .setResourceCleanerFactory(
                            new DispatcherResourceCleanerFactory(
                                    ForkJoinPool.commonPool(),
                                    TestingRetryStrategies.createWithNumberOfRetries(numberOfErrors),
                                    jobManagerRunnerRegistry,
                                    haServices.getJobGraphStore(),
                                    blobServer,
                                    haServices,
                                    UnregisteredMetricGroups.createUnregisteredJobManagerMetricGroup()))
                    .build();
    dispatcher.start();
    toTerminate.add(dispatcher);
    leaderElectionService.isLeader(UUID.randomUUID());
    final DispatcherGateway dispatcherGateway = dispatcher.getSelfGateway(DispatcherGateway.class);
    dispatcherGateway.submitJob(jobGraph, TIMEOUT).get();
    waitForJobToFinish(leaderElectionService, dispatcherGateway, jobId);
    successfulCleanupLatch.await();
    assertThat(actualGlobalCleanupCallCount.get(), equalTo(numberOfErrors + 1));
    assertThat(
            "The JobGraph should be removed from JobGraphStore.",
            haServices.getJobGraphStore().getJobIds(),
            IsEmptyCollection.empty());
    CommonTestUtils.waitUntilCondition(
            () -> haServices.getJobResultStore().hasJobResultEntry(jobId),
            Deadline.fromNow(Duration.ofMinutes(5)),
            "The JobResultStore should have this job marked as clean.");
}
Also used : TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore) RpcEndpoint(org.apache.flink.runtime.rpc.RpcEndpoint) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) DispatcherResourceCleanerFactory(org.apache.flink.runtime.dispatcher.cleanup.DispatcherResourceCleanerFactory) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)
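
The test above expects numberOfErrors + 1 cleanup calls: numberOfErrors failing attempts followed by one successful attempt. A minimal sketch of that counting logic, without any Flink dependency, might look like the following. The names RetryCleanupSketch, failingCleanup and runWithRetries are hypothetical and not part of Flink, and the plain synchronous loop is only a stand-in for the retry strategy passed to DispatcherResourceCleanerFactory, not its actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Flink.
final class RetryCleanupSketch {

    /** Returns a cleanup step that throws on its first {@code failures} calls and then succeeds. */
    static Runnable failingCleanup(int failures, AtomicInteger callCount) {
        return () -> {
            if (callCount.incrementAndGet() <= failures) {
                throw new RuntimeException(
                        "Expected RuntimeException: Unable to remove job graph.");
            }
        };
    }

    /** Runs the cleanup once and retries up to {@code maxRetries} more times on failure. */
    static void runWithRetries(Runnable cleanup, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        RuntimeException lastError = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                cleanup.run();
                return; // cleanup succeeded, stop retrying
            } catch (RuntimeException e) {
                lastError = e; // remember the failure and try again
            }
        }
        throw lastError; // retries exhausted
    }

    public static void main(String[] args) {
        final int numberOfErrors = 5;
        final AtomicInteger callCount = new AtomicInteger();

        runWithRetries(failingCleanup(numberOfErrors, callCount), numberOfErrors);

        // 5 failed attempts + 1 successful attempt
        System.out.println("cleanup calls: " + callCount.get()); // prints "cleanup calls: 6"
    }
}
```

With fewer retries than errors (for example 4 retries against 5 failures), runWithRetries in this sketch rethrows the last error, which is why the test configures the retry strategy with exactly numberOfErrors retries.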
