
Example 16 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in project flink by apache.

From class ZooKeeperUtilsITCase, method runWriteAndReadLeaderInformationTest.

private void runWriteAndReadLeaderInformationTest(LeaderInformation leaderInformation) throws Exception {
    final String path = "/foobar";
    // startCuratorFramework() is defined outside this excerpt; try-with-resources closes the wrapper
    try (CuratorFrameworkWithUnhandledErrorListener curatorFramework = startCuratorFramework()) {
        // write unconditionally: the leadership check supplier always returns true
        ZooKeeperUtils.writeLeaderInformationToZooKeeper(leaderInformation, curatorFramework.asCuratorFramework(), () -> true, path);
        // read the raw bytes back from the same node and deserialize them
        final LeaderInformation readLeaderInformation = ZooKeeperUtils.readLeaderInformation(curatorFramework.asCuratorFramework().getData().forPath(path));
        assertThat(readLeaderInformation).isEqualTo(leaderInformation);
    }
}
Also used : CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) LeaderInformation(org.apache.flink.runtime.leaderelection.LeaderInformation)
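Example 16 checks that leader information survives a write/read round trip through a ZooKeeper node. The byte layout used by ZooKeeperUtils is not shown in this listing, so the sketch below assumes one plausible encoding (the leader address written with writeUTF, then the session UUID with writeObject) purely to illustrate the round-trip property the test asserts. LeaderInfo and LeaderInfoCodec are hypothetical names, not Flink classes; no ZooKeeper is involved, and empty leader information is not modeled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.UUID;

/** Hypothetical value type standing in for Flink's LeaderInformation. */
final class LeaderInfo {
    final UUID sessionId;
    final String address;

    LeaderInfo(UUID sessionId, String address) {
        this.sessionId = sessionId;
        this.address = address;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LeaderInfo)) {
            return false;
        }
        LeaderInfo other = (LeaderInfo) o;
        return sessionId.equals(other.sessionId) && address.equals(other.address);
    }

    @Override
    public int hashCode() {
        return 31 * sessionId.hashCode() + address.hashCode();
    }
}

/** Hypothetical codec (an assumed layout, not Flink's): address via writeUTF, then the UUID. */
final class LeaderInfoCodec {
    static byte[] encode(LeaderInfo info) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // closing the ObjectOutputStream flushes everything into the byte buffer
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(info.address);
            out.writeObject(info.sessionId); // java.util.UUID is Serializable
        }
        return bytes.toByteArray();
    }

    static LeaderInfo decode(byte[] data) throws IOException, ClassNotFoundException {
        // fields must be read back in exactly the order they were written
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            String address = in.readUTF();
            UUID sessionId = (UUID) in.readObject();
            return new LeaderInfo(sessionId, address);
        }
    }
}
```

The equality check at the end of the test plays the same role as `assertThat(readLeaderInformation).isEqualTo(leaderInformation)`: it only holds if encoding and decoding agree on field order.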

Example 17 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in project flink by apache.

From class ZooKeeperStateHandleStoreTest, method testLockCleanupWhenClientTimesOut.

/**
 * FLINK-6612
 *
 * <p>Tests that lock nodes will be released if the client dies.
 */
@Test
public void testLockCleanupWhenClientTimesOut() throws Exception {
    final TestingLongStateHandleHelper longStateStorage = new TestingLongStateHandleHelper();
    Configuration configuration = new Configuration();
    configuration.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, ZOOKEEPER.getConnectString());
    configuration.setInteger(HighAvailabilityOptions.ZOOKEEPER_SESSION_TIMEOUT, 100);
    configuration.setString(HighAvailabilityOptions.HA_ZOOKEEPER_ROOT, "timeout");
    try (CuratorFrameworkWithUnhandledErrorListener curatorFrameworkWrapper = ZooKeeperUtils.startCuratorFramework(configuration, NoOpFatalErrorHandler.INSTANCE);
        CuratorFrameworkWithUnhandledErrorListener curatorFrameworkWrapper2 = ZooKeeperUtils.startCuratorFramework(configuration, NoOpFatalErrorHandler.INSTANCE)) {
        CuratorFramework client = curatorFrameworkWrapper.asCuratorFramework();
        CuratorFramework client2 = curatorFrameworkWrapper2.asCuratorFramework();
        ZooKeeperStateHandleStore<TestingLongStateHandleHelper.LongStateHandle> zkStore = new ZooKeeperStateHandleStore<>(client, longStateStorage);
        final String path = "/state";
        zkStore.addAndLock(path, new TestingLongStateHandleHelper.LongStateHandle(42L));
        // closing the client ends its session, so ZooKeeper should delete the ephemeral nodes it created
        client.close();
        Stat stat = client2.checkExists().forPath(path);
        // check that our state node still exists
        assertNotNull(stat);
        Collection<String> children = client2.getChildren().forPath(path);
        // check that the lock node has been released
        assertEquals(0, children.size());
    }
}
Also used : CuratorFramework(org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFramework) Stat(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.data.Stat) Configuration(org.apache.flink.configuration.Configuration) TestingLongStateHandleHelper(org.apache.flink.runtime.persistence.TestingLongStateHandleHelper) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) Test(org.junit.Test)
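The FLINK-6612 test relies on ZooKeeper's ephemeral-node semantics: an ephemeral node belongs to the session that created it and is deleted when that session ends, while persistent nodes survive. The in-memory model below is a hypothetical sketch of just that rule; FakeZk and its methods are made-up names, not the ZooKeeper or Curator API, and it assumes the lock is a single ephemeral child of the state node, which is what the test's `getChildren(path)` assertion checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a ZooKeeper namespace (not a real client API). */
final class FakeZk {
    // path -> id of the owning session, or null for persistent nodes
    private final Map<String, Long> nodes = new TreeMap<>();

    void createPersistent(String path) {
        nodes.put(path, null);
    }

    void createEphemeral(String path, long sessionId) {
        nodes.put(path, sessionId);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    /** Returns the names of the direct children of a path. */
    List<String> getChildren(String path) {
        List<String> children = new ArrayList<>();
        String prefix = path + "/";
        for (String p : nodes.keySet()) {
            // a direct child has no further '/' after the parent prefix
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                children.add(p.substring(prefix.length()));
            }
        }
        return children;
    }

    /** Ending a session (close or expiry) removes every ephemeral node it owns. */
    void closeSession(long sessionId) {
        nodes.values().removeIf(owner -> owner != null && owner == sessionId);
    }
}
```

In this model, `addAndLock` corresponds to creating the persistent state node plus an ephemeral lock child owned by the first client's session; `client.close()` corresponds to `closeSession`, after which the state node still exists and has no children.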

Example 18 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in project flink by apache.

From class ZooKeeperLeaderElectionITCase, method testJobExecutionOnClusterWithLeaderChange.

/**
 * Tests that a job can be executed after a new leader has been elected. For every leader except
 * the last one, the job is blocking. The JobManager is terminated while it executes the blocking
 * job. Once only one JobManager is left, the test checks that a non-blocking job can be
 * successfully executed.
 */
@Test
@Ignore("FLINK-25235")
public void testJobExecutionOnClusterWithLeaderChange() throws Exception {
    final int numDispatchers = 3;
    final int numTMs = 2;
    final int numSlotsPerTM = 2;
    final Configuration configuration = ZooKeeperTestUtils.createZooKeeperHAConfig(zkServer.getConnectString(), tempFolder.newFolder().getAbsolutePath());
    // speed up refused registration retries
    configuration.setLong(ClusterOptions.REFUSED_REGISTRATION_DELAY, 50L);
    final TestingMiniClusterConfiguration miniClusterConfiguration = TestingMiniClusterConfiguration.newBuilder().setConfiguration(configuration).setNumberDispatcherResourceManagerComponents(numDispatchers).setNumTaskManagers(numTMs).setNumSlotsPerTaskManager(numSlotsPerTM).build();
    final Deadline timeout = Deadline.fromNow(TEST_TIMEOUT);
    try (TestingMiniCluster miniCluster = TestingMiniCluster.newBuilder(miniClusterConfiguration).build();
        final CuratorFrameworkWithUnhandledErrorListener curatorFramework = ZooKeeperUtils.startCuratorFramework(configuration, exception -> fail("Fatal error in curator framework."))) {
        // We need to watch for resource manager leader changes to avoid race conditions.
        final DefaultLeaderRetrievalService resourceManagerLeaderRetrieval = ZooKeeperUtils.createLeaderRetrievalService(curatorFramework.asCuratorFramework(), ZooKeeperUtils.getLeaderPathForResourceManager(), configuration);
        @SuppressWarnings("unchecked") final CompletableFuture<String>[] resourceManagerLeaderFutures = (CompletableFuture<String>[]) new CompletableFuture[numDispatchers];
        for (int i = 0; i < numDispatchers; i++) {
            resourceManagerLeaderFutures[i] = new CompletableFuture<>();
        }
        resourceManagerLeaderRetrieval.start(new TestLeaderRetrievalListener(resourceManagerLeaderFutures));
        miniCluster.start();
        final int parallelism = numTMs * numSlotsPerTM;
        JobGraph jobGraph = createJobGraph(parallelism);
        miniCluster.submitJob(jobGraph).get();
        String previousLeaderAddress = null;
        for (int i = 0; i < numDispatchers - 1; i++) {
            final DispatcherGateway leaderDispatcherGateway = getNextLeadingDispatcherGateway(miniCluster, previousLeaderAddress, timeout);
            // Make sure resource manager has also changed leadership.
            resourceManagerLeaderFutures[i].get();
            previousLeaderAddress = leaderDispatcherGateway.getAddress();
            awaitRunningStatus(leaderDispatcherGateway, jobGraph, timeout);
            leaderDispatcherGateway.shutDownCluster();
        }
        final DispatcherGateway leaderDispatcherGateway = getNextLeadingDispatcherGateway(miniCluster, previousLeaderAddress, timeout);
        // Make sure resource manager has also changed leadership.
        resourceManagerLeaderFutures[numDispatchers - 1].get();
        awaitRunningStatus(leaderDispatcherGateway, jobGraph, timeout);
        CompletableFuture<JobResult> jobResultFuture = leaderDispatcherGateway.requestJobResult(jobGraph.getJobID(), RPC_TIMEOUT);
        BlockingOperator.unblock();
        assertThat(jobResultFuture.get().isSuccess(), is(true));
        resourceManagerLeaderRetrieval.stop();
    }
}
Also used : TestingMiniCluster(org.apache.flink.runtime.minicluster.TestingMiniCluster) TestingMiniClusterConfiguration(org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration) Configuration(org.apache.flink.configuration.Configuration) JobResult(org.apache.flink.runtime.jobmaster.JobResult) Deadline(org.apache.flink.api.common.time.Deadline) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) DefaultLeaderRetrievalService(org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService) DispatcherGateway(org.apache.flink.runtime.dispatcher.DispatcherGateway) CompletableFuture(java.util.concurrent.CompletableFuture) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) Ignore(org.junit.Ignore) Test(org.junit.Test)
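In Example 18, iteration i of the failover loop waits on `resourceManagerLeaderFutures[i]`, so each ResourceManager leader change has to complete the next future in the array. The body of the `TestLeaderRetrievalListener` helper is not part of this listing; the sketch below is one hypothetical way such a listener could map successive leader notifications onto the array, ignoring "no leader" notifications and repeats of the current address. OrderedLeaderListener and its notifyLeaderAddress method are made-up names, and Flink's actual leader retrieval listener interface is not modeled here.

```java
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical listener: the i-th distinct leader address completes futures[i].
 * A sketch only, not the TestLeaderRetrievalListener used by the Flink test.
 */
final class OrderedLeaderListener {
    private final CompletableFuture<String>[] futures;
    private String lastAddress; // most recent non-empty leader address
    private int next;           // index of the next future to complete

    OrderedLeaderListener(CompletableFuture<String>[] futures) {
        this.futures = futures;
    }

    /** Called on every leader notification; null or empty means "no leader right now". */
    synchronized void notifyLeaderAddress(String address) {
        if (address == null || address.isEmpty() || address.equals(lastAddress)) {
            return; // ignore leader loss and repeated notifications for the same leader
        }
        lastAddress = address;
        if (next < futures.length) {
            futures[next++].complete(address);
        }
    }
}
```

Ignoring empty notifications matters in a failover test: when a leader shuts down, a retrieval service may briefly report that no leader is known, and counting that as a leader change would shift every later future by one.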

Aggregations

CuratorFrameworkWithUnhandledErrorListener (org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener): 18
Test (org.junit.Test): 11
Configuration (org.apache.flink.configuration.Configuration): 8
CuratorFramework (org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFramework): 6
JobID (org.apache.flink.api.common.JobID): 3
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph): 3
ZooKeeperLeaderRetrievalDriver (org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver): 3
KeeperException (org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException): 3
Mockito.anyString (org.mockito.Mockito.anyString): 3
ByteArrayOutputStream (java.io.ByteArrayOutputStream): 2
IOException (java.io.IOException): 2
ObjectOutputStream (java.io.ObjectOutputStream): 2
CompletableFuture (java.util.concurrent.CompletableFuture): 2
TimeoutException (java.util.concurrent.TimeoutException): 2
Nonnull (javax.annotation.Nonnull): 2
CuratorFrameworkFactory (org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFrameworkFactory): 2
ACLProvider (org.apache.flink.shaded.curator5.org.apache.curator.framework.api.ACLProvider): 2
ExponentialBackoffRetry (org.apache.flink.shaded.curator5.org.apache.curator.retry.ExponentialBackoffRetry): 2
ACL (org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.data.ACL): 2
ByteArrayInputStream (java.io.ByteArrayInputStream): 1