Example 6 with TestingMiniClusterConfiguration

Use of org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration in project flink by apache.

In the class ZooKeeperLeaderElectionITCase, method testJobExecutionOnClusterWithLeaderChange.

/**
 * Tests that a job can be executed after a new leader has been elected. While any leader other
 * than the last one is active, the job is kept blocked, and the leading JobManager is terminated
 * while the blocking job is running. Once only one JobManager is left, the job is unblocked and
 * it is checked that it completes successfully.
 */
@Test
@Ignore("FLINK-25235")
public void testJobExecutionOnClusterWithLeaderChange() throws Exception {
    final int numDispatchers = 3;
    final int numTMs = 2;
    final int numSlotsPerTM = 2;
    final Configuration configuration =
            ZooKeeperTestUtils.createZooKeeperHAConfig(
                    zkServer.getConnectString(), tempFolder.newFolder().getAbsolutePath());
    // speed up refused registration retries
    configuration.setLong(ClusterOptions.REFUSED_REGISTRATION_DELAY, 50L);
    final TestingMiniClusterConfiguration miniClusterConfiguration =
            TestingMiniClusterConfiguration.newBuilder()
                    .setConfiguration(configuration)
                    .setNumberDispatcherResourceManagerComponents(numDispatchers)
                    .setNumTaskManagers(numTMs)
                    .setNumSlotsPerTaskManager(numSlotsPerTM)
                    .build();
    final Deadline timeout = Deadline.fromNow(TEST_TIMEOUT);
    try (TestingMiniCluster miniCluster =
                    TestingMiniCluster.newBuilder(miniClusterConfiguration).build();
            final CuratorFrameworkWithUnhandledErrorListener curatorFramework =
                    ZooKeeperUtils.startCuratorFramework(
                            configuration,
                            exception -> fail("Fatal error in curator framework."))) {
        // We need to watch for resource manager leader changes to avoid race conditions.
        final DefaultLeaderRetrievalService resourceManagerLeaderRetrieval =
                ZooKeeperUtils.createLeaderRetrievalService(
                        curatorFramework.asCuratorFramework(),
                        ZooKeeperUtils.getLeaderPathForResourceManager(),
                        configuration);
        @SuppressWarnings("unchecked") final CompletableFuture<String>[] resourceManagerLeaderFutures = (CompletableFuture<String>[]) new CompletableFuture[numDispatchers];
        for (int i = 0; i < numDispatchers; i++) {
            resourceManagerLeaderFutures[i] = new CompletableFuture<>();
        }
        resourceManagerLeaderRetrieval.start(new TestLeaderRetrievalListener(resourceManagerLeaderFutures));
        miniCluster.start();
        final int parallelism = numTMs * numSlotsPerTM;
        JobGraph jobGraph = createJobGraph(parallelism);
        miniCluster.submitJob(jobGraph).get();
        String previousLeaderAddress = null;
        for (int i = 0; i < numDispatchers - 1; i++) {
            final DispatcherGateway leaderDispatcherGateway = getNextLeadingDispatcherGateway(miniCluster, previousLeaderAddress, timeout);
            // Make sure resource manager has also changed leadership.
            resourceManagerLeaderFutures[i].get();
            previousLeaderAddress = leaderDispatcherGateway.getAddress();
            awaitRunningStatus(leaderDispatcherGateway, jobGraph, timeout);
            leaderDispatcherGateway.shutDownCluster();
        }
        final DispatcherGateway leaderDispatcherGateway = getNextLeadingDispatcherGateway(miniCluster, previousLeaderAddress, timeout);
        // Make sure resource manager has also changed leadership.
        resourceManagerLeaderFutures[numDispatchers - 1].get();
        awaitRunningStatus(leaderDispatcherGateway, jobGraph, timeout);
        CompletableFuture<JobResult> jobResultFuture = leaderDispatcherGateway.requestJobResult(jobGraph.getJobID(), RPC_TIMEOUT);
        BlockingOperator.unblock();
        assertThat(jobResultFuture.get().isSuccess(), is(true));
        resourceManagerLeaderRetrieval.stop();
    }
}
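
The test records resource-manager leader changes in an array of futures, one per expected leader, so that iteration i can wait on resourceManagerLeaderFutures[i] before shutting the current leader down. The source of TestLeaderRetrievalListener is not part of this excerpt, so the following is only a minimal, Flink-free sketch assuming it completes the next future each time a new non-null leader address is reported. The LeaderListener interface and the OrderedLeaderListener class are hypothetical stand-ins, not Flink's leader-retrieval API.

```java
import java.util.concurrent.CompletableFuture;

public class OrderedLeaderListenerSketch {

    /** Hypothetical stand-in for a leader-retrieval callback (not Flink's interface). */
    interface LeaderListener {
        void notifyLeaderAddress(String leaderAddress);
    }

    /** Completes futures[0], futures[1], ... in order, one per new non-null leader address. */
    static final class OrderedLeaderListener implements LeaderListener {
        private final CompletableFuture<String>[] futures;
        private int next = 0;
        private String lastLeader;

        OrderedLeaderListener(CompletableFuture<String>[] futures) {
            this.futures = futures;
        }

        @Override
        public synchronized void notifyLeaderAddress(String leaderAddress) {
            // null means "no leader at the moment"; a repeated address is not a new leader.
            if (leaderAddress == null || leaderAddress.equals(lastLeader)) {
                return;
            }
            lastLeader = leaderAddress;
            // Notifications beyond the array length are ignored.
            if (next < futures.length) {
                futures[next++].complete(leaderAddress);
            }
        }
    }

    /** Same generic-array idiom as the test: create n incomplete futures. */
    @SuppressWarnings("unchecked")
    static CompletableFuture<String>[] newFutures(int n) {
        CompletableFuture<String>[] futures =
                (CompletableFuture<String>[]) new CompletableFuture[n];
        for (int i = 0; i < n; i++) {
            futures[i] = new CompletableFuture<>();
        }
        return futures;
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String>[] futures = newFutures(3);
        OrderedLeaderListener listener = new OrderedLeaderListener(futures);
        listener.notifyLeaderAddress("rm-a");
        listener.notifyLeaderAddress(null); // leadership revoked: ignored
        listener.notifyLeaderAddress("rm-b");
        // Prints "rm-a rm-b false": the third leader has not been observed yet.
        System.out.println(
                futures[0].get() + " " + futures[1].get() + " " + futures[2].isDone());
    }
}
```

Keying each future to its position in the leadership sequence lets the test block on "the i-th leader exists" without polling, which is how it avoids racing against the resource manager's own election.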
Also used: TestingMiniCluster (org.apache.flink.runtime.minicluster.TestingMiniCluster), TestingMiniClusterConfiguration (org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration), Configuration (org.apache.flink.configuration.Configuration), JobResult (org.apache.flink.runtime.jobmaster.JobResult), Deadline (org.apache.flink.api.common.time.Deadline), CuratorFrameworkWithUnhandledErrorListener (org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener), DefaultLeaderRetrievalService (org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService), DispatcherGateway (org.apache.flink.runtime.dispatcher.DispatcherGateway), CompletableFuture (java.util.concurrent.CompletableFuture), JobGraph (org.apache.flink.runtime.jobgraph.JobGraph), Ignore (org.junit.Ignore), Test (org.junit.Test)
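
The Javadoc states that the job stays blocked until only one JobManager remains, and the test ends by calling BlockingOperator.unblock() and asserting that the job result is a success. BlockingOperator's source is not shown in this excerpt. One common way to get this behaviour in tests is a static latch shared by all operator instances, which only works because an in-process MiniCluster runs the tasks in the same JVM as the test. The sketch below shows that pattern with plain java.util.concurrent, assuming (not confirming) that BlockingOperator works similarly; BlockingTask, run() and reset() are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingTaskSketch {

    /** Hypothetical stand-in for a test operator that parks until it is released. */
    static final class BlockingTask {
        // Static so that a single unblock() call releases every instance, on any thread.
        private static volatile CountDownLatch latch = new CountDownLatch(1);

        static void unblock() {
            latch.countDown();
        }

        /** Re-arms the latch so a later test starts from the blocked state. */
        static void reset() {
            latch = new CountDownLatch(1);
        }

        /** Task body: waits until unblock() has been called, then finishes. */
        static String run() {
            try {
                latch.await();
                return "finished";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> result =
                    CompletableFuture.supplyAsync(BlockingTask::run, pool);
            // Cannot be done yet: the task is parked on the latch (or not started).
            System.out.println("done before unblock: " + result.isDone()); // false
            BlockingTask.unblock();
            System.out.println(result.orTimeout(5, TimeUnit.SECONDS).join()); // finished
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because the latch is static, the test thread does not need a handle on the running operator: the job survives the leader changes in the blocked state, and a single unblock() call lets it finish on whichever leader is active at the end.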

Aggregations

TestingMiniCluster (org.apache.flink.runtime.minicluster.TestingMiniCluster): 6
TestingMiniClusterConfiguration (org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration): 6
Deadline (org.apache.flink.api.common.time.Deadline): 5
Configuration (org.apache.flink.configuration.Configuration): 5
EmbeddedHaServicesWithLeadershipControl (org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl): 4
JobResult (org.apache.flink.runtime.jobmaster.JobResult): 4
MiniCluster (org.apache.flink.runtime.minicluster.MiniCluster): 4
Test (org.junit.jupiter.api.Test): 4
CompletableFuture (java.util.concurrent.CompletableFuture): 2
JobID (org.apache.flink.api.common.JobID): 2
JobResultEntry (org.apache.flink.runtime.highavailability.JobResultEntry): 2
JobResultStore (org.apache.flink.runtime.highavailability.JobResultStore): 2
EmbeddedJobResultStore (org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedJobResultStore): 2
TestingJobResultStore (org.apache.flink.runtime.testutils.TestingJobResultStore): 2
Duration (java.time.Duration): 1
UUID (java.util.UUID): 1
ExecutionException (java.util.concurrent.ExecutionException): 1
Supplier (java.util.function.Supplier): 1
JobStatus (org.apache.flink.api.common.JobStatus): 1
ClientOptions (org.apache.flink.client.cli.ClientOptions): 1