
Example 1 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in the Apache Flink project.

Class HighAvailabilityServicesUtils, method createZooKeeperHaServices.

private static HighAvailabilityServices createZooKeeperHaServices(Configuration configuration, Executor executor, FatalErrorHandler fatalErrorHandler) throws Exception {
    final boolean useOldHaServices = configuration.get(HighAvailabilityOptions.USE_OLD_HA_SERVICES);
    BlobStoreService blobStoreService = BlobUtils.createBlobStoreFromConfig(configuration);
    // The wrapper keeps the started Curator client together with its unhandled-error listener.
    final CuratorFrameworkWithUnhandledErrorListener curatorFrameworkWrapper = ZooKeeperUtils.startCuratorFramework(configuration, fatalErrorHandler);
    // Choose the HA services implementation based on the USE_OLD_HA_SERVICES option.
    if (useOldHaServices) {
        return new ZooKeeperHaServices(curatorFrameworkWrapper, executor, configuration, blobStoreService);
    } else {
        return new ZooKeeperMultipleComponentLeaderElectionHaServices(curatorFrameworkWrapper, configuration, executor, blobStoreService, fatalErrorHandler);
    }
}
Also used : ZooKeeperHaServices(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServices) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) ZooKeeperMultipleComponentLeaderElectionHaServices(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperMultipleComponentLeaderElectionHaServices) BlobStoreService(org.apache.flink.runtime.blob.BlobStoreService)

Example 2 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in the Apache Flink project.

Class ZKCheckpointIDCounterMultiServersTest, method testRecoveredAfterConnectionLoss.

/**
 * Tests that {@link ZooKeeperCheckpointIDCounter} can be recovered after a connection loss
 * exception from ZooKeeper ensemble.
 *
 * <p>See also FLINK-14091.
 */
@Test
public void testRecoveredAfterConnectionLoss() throws Exception {
    final Configuration configuration = new Configuration();
    configuration.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, zooKeeperResource.getConnectString());
    final CuratorFrameworkWithUnhandledErrorListener curatorFrameworkWrapper = ZooKeeperUtils.startCuratorFramework(configuration, NoOpFatalErrorHandler.INSTANCE);
    try {
        OneShotLatch connectionLossLatch = new OneShotLatch();
        OneShotLatch reconnectedLatch = new OneShotLatch();
        TestingLastStateConnectionStateListener listener = new TestingLastStateConnectionStateListener(connectionLossLatch, reconnectedLatch);
        ZooKeeperCheckpointIDCounter idCounter = new ZooKeeperCheckpointIDCounter(curatorFrameworkWrapper.asCuratorFramework(), listener);
        idCounter.start();
        AtomicLong localCounter = new AtomicLong(1L);
        assertThat("ZooKeeperCheckpointIDCounter doesn't properly work.", idCounter.getAndIncrement(), is(localCounter.getAndIncrement()));
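        // Restart the ZooKeeper server to force a connection loss.
        zooKeeperResource.restart();
        // Wait until the listener has observed both the loss and the subsequent reconnect.
        connectionLossLatch.await();
        reconnectedLatch.await();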
        assertThat("ZooKeeperCheckpointIDCounter doesn't properly work after reconnected.", idCounter.getAndIncrement(), is(localCounter.getAndIncrement()));
    } finally {
        curatorFrameworkWrapper.close();
    }
}
Also used : AtomicLong(java.util.concurrent.atomic.AtomicLong) Configuration(org.apache.flink.configuration.Configuration) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) Test(org.junit.Test)
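
The test above blocks on two latches until a connection loss and a reconnect have both been observed. The following is a minimal, standalone sketch of that waiting pattern using only the JDK. It is not Flink's TestingLastStateConnectionStateListener: the names ConnectionLatchSketch, ConnState and LatchingStateListener are hypothetical, CountDownLatch stands in for OneShotLatch, and which states count as a "loss" is an assumption made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectionLatchSketch {

    /** Hypothetical connection states, loosely modelled on a ZooKeeper client lifecycle. */
    enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    /** Hypothetical listener that trips one latch on connection loss and another on reconnect. */
    static final class LatchingStateListener {
        final CountDownLatch connectionLoss = new CountDownLatch(1);
        final CountDownLatch reconnected = new CountDownLatch(1);

        void stateChanged(ConnState newState) {
            // Assumption: both SUSPENDED and LOST are treated as a connection loss.
            if (newState == ConnState.SUSPENDED || newState == ConnState.LOST) {
                connectionLoss.countDown();
            } else if (newState == ConnState.RECONNECTED) {
                reconnected.countDown();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        LatchingStateListener listener = new LatchingStateListener();

        // Simulate the client reporting state changes from a background thread.
        Thread client = new Thread(() -> {
            listener.stateChanged(ConnState.SUSPENDED);
            listener.stateChanged(ConnState.RECONNECTED);
        });
        client.start();

        // The test thread waits for both events, with a timeout so it cannot hang forever.
        boolean sawLoss = listener.connectionLoss.await(5, TimeUnit.SECONDS);
        boolean sawReconnect = listener.reconnected.await(5, TimeUnit.SECONDS);
        client.join();

        System.out.println("loss observed: " + sawLoss + ", reconnect observed: " + sawReconnect);
    }
}
```

Waiting with a timeout, as in this sketch, is a design choice that keeps a test from blocking indefinitely if an expected state change never arrives.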

Example 3 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in the Apache Flink project.

Class ZooKeeperCompletedCheckpointStoreTest, method testDiscardingCheckpointsAtShutDown.

/**
 * Tests that checkpoints are discarded when the completed checkpoint store is shut down with a
 * globally terminal state.
 */
@Test
public void testDiscardingCheckpointsAtShutDown() throws Exception {
    final SharedStateRegistry sharedStateRegistry = new SharedStateRegistryImpl();
    final Configuration configuration = new Configuration();
    configuration.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, zooKeeperResource.getConnectString());
    final CuratorFrameworkWithUnhandledErrorListener curatorFrameworkWrapper = ZooKeeperUtils.startCuratorFramework(configuration, NoOpFatalErrorHandler.INSTANCE);
    final CompletedCheckpointStore checkpointStore = createZooKeeperCheckpointStore(curatorFrameworkWrapper.asCuratorFramework());
    try {
        final CompletedCheckpointStoreTest.TestCompletedCheckpoint checkpoint1 = CompletedCheckpointStoreTest.createCheckpoint(0, sharedStateRegistry);
        checkpointStore.addCheckpointAndSubsumeOldestOne(checkpoint1, new CheckpointsCleaner(), () -> {});
        assertThat(checkpointStore.getAllCheckpoints(), Matchers.contains(checkpoint1));
        checkpointStore.shutdown(JobStatus.FINISHED, new CheckpointsCleaner());
        // verify that the checkpoint is discarded
        CompletedCheckpointStoreTest.verifyCheckpointDiscarded(checkpoint1);
    } finally {
        curatorFrameworkWrapper.close();
    }
}
Also used : Configuration(org.apache.flink.configuration.Configuration) SharedStateRegistryImpl(org.apache.flink.runtime.state.SharedStateRegistryImpl) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) SharedStateRegistry(org.apache.flink.runtime.state.SharedStateRegistry) Test(org.junit.Test)

Example 4 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in the Apache Flink project.

Class ZooKeeperUtils, method startCuratorFramework.

/**
 * Builds a {@link CuratorFramework} instance from the given builder, registers an unhandled
 * error listener on it, and starts it so that it connects to the configured ZooKeeper quorum.
 *
 * @param builder {@link CuratorFrameworkFactory.Builder} A builder for curatorFramework.
 * @param fatalErrorHandler {@link FatalErrorHandler} fatalErrorHandler to handle unexpected
 *     errors of {@link CuratorFramework}
 * @return {@link CuratorFrameworkWithUnhandledErrorListener} instance
 */
@VisibleForTesting
public static CuratorFrameworkWithUnhandledErrorListener startCuratorFramework(CuratorFrameworkFactory.Builder builder, FatalErrorHandler fatalErrorHandler) {
    CuratorFramework cf = builder.build();
    UnhandledErrorListener unhandledErrorListener = (message, throwable) -> {
        LOG.error("Unhandled error in curator framework, error message: {}", message, throwable);
        // An exception thrown inside an UnhandledErrorListener would be caught by
        // CuratorFramework, so the failure is passed to the FatalErrorHandler instead, which
        // typically triggers the exit process or informs the main thread.
        fatalErrorHandler.onFatalError(throwable);
    };
    cf.getUnhandledErrorListenable().addListener(unhandledErrorListener);
    cf.start();
    return new CuratorFrameworkWithUnhandledErrorListener(cf, unhandledErrorListener);
}
Also used : SecurityOptions(org.apache.flink.configuration.SecurityOptions) Arrays(java.util.Arrays) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) LoggerFactory(org.slf4j.LoggerFactory) ObjectInputStream(java.io.ObjectInputStream) TreeCacheSelector(org.apache.flink.shaded.curator5.org.apache.curator.framework.recipes.cache.TreeCacheSelector) StringUtils(org.apache.commons.lang3.StringUtils) BooleanSupplier(java.util.function.BooleanSupplier) FileSystemStateStorageHelper(org.apache.flink.runtime.persistence.filesystem.FileSystemStateStorageHelper) LeaderRetrievalDriverFactory(org.apache.flink.runtime.leaderretrieval.LeaderRetrievalDriverFactory) ByteArrayInputStream(java.io.ByteArrayInputStream) ZooKeeperStateHandleStore(org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore) Preconditions.checkNotNull(org.apache.flink.util.Preconditions.checkNotNull) CompletedCheckpoint(org.apache.flink.runtime.checkpoint.CompletedCheckpoint) DefaultLeaderRetrievalService(org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService) LeaderElectionDriverFactory(org.apache.flink.runtime.leaderelection.LeaderElectionDriverFactory) ZooKeeperLeaderRetrievalDriver(org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver) PathChildrenCache(org.apache.flink.shaded.curator5.org.apache.curator.framework.recipes.cache.PathChildrenCache) Collection(java.util.Collection) HighAvailabilityServicesUtils(org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils) DefaultJobGraphStore(org.apache.flink.runtime.jobmanager.DefaultJobGraphStore) UUID(java.util.UUID) ZooKeeperLeaderElectionDriver(org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver) Collectors(java.util.stream.Collectors) Serializable(java.io.Serializable) ACLProvider(org.apache.flink.shaded.curator5.org.apache.curator.framework.api.ACLProvider) Stat(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.data.Stat) List(java.util.List) 
ZooKeeperCheckpointStoreUtil(org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointStoreUtil) CuratorFrameworkFactory(org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFrameworkFactory) CreateMode(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.CreateMode) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) ZooKeeperLeaderElectionDriverFactory(org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriverFactory) ZooKeeperLeaderRetrievalDriverFactory(org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriverFactory) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) IllegalConfigurationException(org.apache.flink.configuration.IllegalConfigurationException) ZooDefs(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ZooDefs) ByteArrayOutputStream(java.io.ByteArrayOutputStream) DefaultLeaderElectionService(org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService) CuratorFramework(org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFramework) ACL(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.data.ACL) KeeperException(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException) DefaultCompletedCheckpointStoreUtils(org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils) RunnableWithException(org.apache.flink.util.function.RunnableWithException) SessionConnectionStateErrorPolicy(org.apache.flink.shaded.curator5.org.apache.curator.framework.state.SessionConnectionStateErrorPolicy) SharedStateRegistryFactory(org.apache.flink.runtime.state.SharedStateRegistryFactory) ZooKeeperJobGraphStoreWatcher(org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphStoreWatcher) TreeCacheListener(org.apache.flink.shaded.curator5.org.apache.curator.framework.recipes.cache.TreeCacheListener) FatalErrorHandler(org.apache.flink.runtime.rpc.FatalErrorHandler) 
ObjectOutputStream(java.io.ObjectOutputStream) HighAvailabilityMode(org.apache.flink.runtime.jobmanager.HighAvailabilityMode) Nonnull(javax.annotation.Nonnull) DefaultLastStateConnectionStateListener(org.apache.flink.runtime.checkpoint.DefaultLastStateConnectionStateListener) Logger(org.slf4j.Logger) Executor(java.util.concurrent.Executor) ZooKeeperCheckpointIDCounter(org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter) Configuration(org.apache.flink.configuration.Configuration) CompletedCheckpointStore(org.apache.flink.runtime.checkpoint.CompletedCheckpointStore) IOException(java.io.IOException) DefaultCompletedCheckpointStore(org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore) DefaultACLProvider(org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.DefaultACLProvider) LeaderInformation(org.apache.flink.runtime.leaderelection.LeaderInformation) VisibleForTesting(org.apache.flink.annotation.VisibleForTesting) Executors(org.apache.flink.util.concurrent.Executors) UnhandledErrorListener(org.apache.flink.shaded.curator5.org.apache.curator.framework.api.UnhandledErrorListener) JobID(org.apache.flink.api.common.JobID) ExponentialBackoffRetry(org.apache.flink.shaded.curator5.org.apache.curator.retry.ExponentialBackoffRetry) TreeCache(org.apache.flink.shaded.curator5.org.apache.curator.framework.recipes.cache.TreeCache) RetrievableStateStorageHelper(org.apache.flink.runtime.persistence.RetrievableStateStorageHelper) HighAvailabilityOptions(org.apache.flink.configuration.HighAvailabilityOptions) ZooKeeperJobGraphStoreUtil(org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphStoreUtil) CuratorFramework(org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFramework) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) 
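
The method above returns the client together with the listener that was registered on it. One plausible reason for such a pairing is that the listener can then be removed when the client is closed. The following standalone sketch illustrates that idea with JDK types only. It is not Flink's CuratorFrameworkWithUnhandledErrorListener: FakeClient, ClientWithListener and start are hypothetical names, and removing the listener before closing the client is an assumption about how such a wrapper might behave.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

public class ListenerWrapperSketch {

    /** Hypothetical stand-in for a client that reports background errors to listeners. */
    static final class FakeClient implements AutoCloseable {
        final List<BiConsumer<String, Throwable>> listeners = new CopyOnWriteArrayList<>();
        boolean closed;

        void reportBackgroundError(String message, Throwable cause) {
            for (BiConsumer<String, Throwable> listener : listeners) {
                listener.accept(message, cause);
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical wrapper that releases the listener and the client together. */
    static final class ClientWithListener implements AutoCloseable {
        private final FakeClient client;
        private final BiConsumer<String, Throwable> listener;

        ClientWithListener(FakeClient client, BiConsumer<String, Throwable> listener) {
            this.client = client;
            this.listener = listener;
        }

        FakeClient asClient() {
            return client;
        }

        @Override
        public void close() {
            // Assumption: deregister the listener first, then close the client.
            client.listeners.remove(listener);
            client.close();
        }
    }

    /** Registers a listener that forwards errors to the handler, then wraps client and listener. */
    static ClientWithListener start(FakeClient client, Consumer<Throwable> fatalErrorHandler) {
        BiConsumer<String, Throwable> listener =
                (message, throwable) -> fatalErrorHandler.accept(throwable);
        client.listeners.add(listener);
        return new ClientWithListener(client, listener);
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        AtomicReference<Throwable> lastError = new AtomicReference<>();

        // try-with-resources guarantees that close() runs even if the body throws.
        try (ClientWithListener wrapper = start(client, lastError::set)) {
            wrapper.asClient().reportBackgroundError("simulated", new IllegalStateException("boom"));
        }

        System.out.println("forwarded: " + lastError.get());
        System.out.println("listeners left: " + client.listeners.size() + ", closed: " + client.closed);
    }
}
```

Because the wrapper in this sketch is AutoCloseable, callers can use try-with-resources instead of an explicit try/finally block.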

Example 5 with CuratorFrameworkWithUnhandledErrorListener

Use of org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener in the Apache Flink project.

Class ZooKeeperLeaderElectionTest, method testUnExpectedErrorForwarding.

/**
 * Tests that background errors in the {@link LeaderElectionDriver} are correctly forwarded to
 * the {@link FatalErrorHandler}.
 */
@Test
public void testUnExpectedErrorForwarding() throws Exception {
    LeaderElectionDriver leaderElectionDriver = null;
    final TestingLeaderElectionEventHandler electionEventHandler = new TestingLeaderElectionEventHandler(LEADER_ADDRESS);
    final TestingFatalErrorHandler fatalErrorHandler = new TestingFatalErrorHandler();
    final FlinkRuntimeException testException = new FlinkRuntimeException("testUnExpectedErrorForwarding");
    final CuratorFrameworkFactory.Builder curatorFrameworkBuilder = CuratorFrameworkFactory.builder().connectString(testingServer.getConnectString()).retryPolicy(new ExponentialBackoffRetry(1, 0)).aclProvider(new ACLProvider() {

        // trigger background exception
        @Override
        public List<ACL> getDefaultAcl() {
            throw testException;
        }

        @Override
        public List<ACL> getAclForPath(String s) {
            throw testException;
        }
    }).namespace("flink");
    try (CuratorFrameworkWithUnhandledErrorListener curatorFrameworkWrapper = ZooKeeperUtils.startCuratorFramework(curatorFrameworkBuilder, fatalErrorHandler)) {
        CuratorFramework clientWithErrorHandler = curatorFrameworkWrapper.asCuratorFramework();
        assertFalse(fatalErrorHandler.getErrorFuture().isDone());
        leaderElectionDriver = createAndInitLeaderElectionDriver(clientWithErrorHandler, electionEventHandler);
        assertThat(fatalErrorHandler.getErrorFuture().join(), FlinkMatchers.containsCause(testException));
    } finally {
        electionEventHandler.close();
        if (leaderElectionDriver != null) {
            leaderElectionDriver.close();
        }
    }
}
Also used : TestingFatalErrorHandler(org.apache.flink.runtime.util.TestingFatalErrorHandler) ACLProvider(org.apache.flink.shaded.curator5.org.apache.curator.framework.api.ACLProvider) CuratorFramework(org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFramework) CuratorFrameworkFactory(org.apache.flink.shaded.curator5.org.apache.curator.framework.CuratorFrameworkFactory) ExponentialBackoffRetry(org.apache.flink.shaded.curator5.org.apache.curator.retry.ExponentialBackoffRetry) FlinkRuntimeException(org.apache.flink.util.FlinkRuntimeException) CuratorFrameworkWithUnhandledErrorListener(org.apache.flink.runtime.highavailability.zookeeper.CuratorFrameworkWithUnhandledErrorListener) ACL(org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.data.ACL) Mockito.anyString(org.mockito.Mockito.anyString) Test(org.junit.Test)
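
The test above checks that an exception raised on a background thread reaches the fatal error handler and can be observed through a future. The following standalone sketch shows the same forwarding idea with JDK types only. It is not Flink's TestingFatalErrorHandler: RecordingFatalErrorHandler and runInBackground are hypothetical names introduced for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ErrorForwardingSketch {

    /** Hypothetical handler that records the first fatal error in a future. */
    static final class RecordingFatalErrorHandler {
        private final CompletableFuture<Throwable> errorFuture = new CompletableFuture<>();

        void onFatalError(Throwable error) {
            // complete() only has an effect the first time, so later errors are ignored.
            errorFuture.complete(error);
        }

        CompletableFuture<Throwable> getErrorFuture() {
            return errorFuture;
        }
    }

    /** Runs the task on the executor and forwards any exception to the handler. */
    static void runInBackground(
            ExecutorService executor, Runnable task, RecordingFatalErrorHandler handler) {
        executor.execute(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Without this catch the exception would stay on the worker thread.
                handler.onFatalError(t);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        RecordingFatalErrorHandler handler = new RecordingFatalErrorHandler();
        RuntimeException testException = new RuntimeException("simulated background failure");
        try {
            runInBackground(executor, () -> { throw testException; }, handler);
            // Block (with a timeout) until the background error has been forwarded.
            Throwable forwarded = handler.getErrorFuture().get(5, TimeUnit.SECONDS);
            System.out.println("same exception forwarded: " + (forwarded == testException));
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Exposing the error as a CompletableFuture lets the calling thread assert on a failure that happened elsewhere, which is the role the error future plays in the test above.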
