Example 1 with EmbeddedHaServicesWithLeadershipControl

Use of org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl in project flink by apache.

In the class JobDispatcherITCase, method testRecoverFromCheckpointAfterLosingAndRegainingLeadership.

@Test
public void testRecoverFromCheckpointAfterLosingAndRegainingLeadership(@TempDir Path tmpPath) throws Exception {
    final Deadline deadline = Deadline.fromNow(TIMEOUT);
    final Configuration configuration = new Configuration();
    configuration.set(HighAvailabilityOptions.HA_MODE, HighAvailabilityMode.ZOOKEEPER.name());
    final TestingMiniClusterConfiguration clusterConfiguration = TestingMiniClusterConfiguration.newBuilder().setConfiguration(configuration).build();
    final EmbeddedHaServicesWithLeadershipControl haServices = new EmbeddedHaServicesWithLeadershipControl(TestingUtils.defaultExecutor());
    final Configuration newConfiguration = new Configuration(clusterConfiguration.getConfiguration());
    final long checkpointInterval = 100;
    final JobID jobID = generateAndPersistJobGraph(newConfiguration, checkpointInterval, tmpPath);
    final TestingMiniCluster.Builder clusterBuilder =
            TestingMiniCluster.newBuilder(clusterConfiguration)
                    .setHighAvailabilityServicesSupplier(() -> haServices)
                    .setDispatcherResourceManagerComponentFactorySupplier(
                            createJobModeDispatcherResourceManagerComponentFactorySupplier(newConfiguration));
    AtLeastOneCheckpointInvokable.reset();
    try (final MiniCluster cluster = clusterBuilder.build()) {
        // start mini cluster and submit the job
        cluster.start();
        AtLeastOneCheckpointInvokable.atLeastOneCheckpointCompleted.await();
        final CompletableFuture<JobResult> firstJobResult = cluster.requestJobResult(jobID);
        haServices.revokeDispatcherLeadership();
        // make sure the leadership is revoked to avoid race conditions
        Assertions.assertEquals(ApplicationStatus.UNKNOWN, firstJobResult.get().getApplicationStatus());
        haServices.grantDispatcherLeadership();
        // job is suspended, wait until it's running
        awaitJobStatus(cluster, jobID, JobStatus.RUNNING, deadline);
        CommonTestUtils.waitUntilCondition(() -> cluster.getArchivedExecutionGraph(jobID).get().getCheckpointStatsSnapshot().getLatestRestoredCheckpoint() != null, deadline);
    }
}
Also used: TestingMiniCluster(org.apache.flink.runtime.minicluster.TestingMiniCluster) TestingMiniClusterConfiguration(org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration) CheckpointCoordinatorConfiguration(org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration) Configuration(org.apache.flink.configuration.Configuration) JobResult(org.apache.flink.runtime.jobmaster.JobResult) Deadline(org.apache.flink.api.common.time.Deadline) EmbeddedHaServicesWithLeadershipControl(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl) MiniCluster(org.apache.flink.runtime.minicluster.MiniCluster) JobID(org.apache.flink.api.common.JobID) Test(org.junit.jupiter.api.Test)

Example 2 with EmbeddedHaServicesWithLeadershipControl

Use of org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl in project flink by apache.

In the class ApplicationDispatcherBootstrapITCase, method testDispatcherRecoversAfterLosingAndRegainingLeadership.

@Test
public void testDispatcherRecoversAfterLosingAndRegainingLeadership() throws Exception {
    final String blockId = UUID.randomUUID().toString();
    final Deadline deadline = Deadline.fromNow(TIMEOUT);
    final Configuration configuration = new Configuration();
    configuration.set(HighAvailabilityOptions.HA_MODE, HighAvailabilityMode.ZOOKEEPER.name());
    configuration.set(DeploymentOptions.TARGET, EmbeddedExecutor.NAME);
    configuration.set(ClientOptions.CLIENT_RETRY_PERIOD, Duration.ofMillis(100));
    final TestingMiniClusterConfiguration clusterConfiguration = TestingMiniClusterConfiguration.newBuilder().setConfiguration(configuration).build();
    final EmbeddedHaServicesWithLeadershipControl haServices = new EmbeddedHaServicesWithLeadershipControl(TestingUtils.defaultExecutor());
    final TestingMiniCluster.Builder clusterBuilder =
            TestingMiniCluster.newBuilder(clusterConfiguration)
                    .setHighAvailabilityServicesSupplier(() -> haServices)
                    .setDispatcherResourceManagerComponentFactorySupplier(
                            createApplicationModeDispatcherResourceManagerComponentFactorySupplier(
                                    clusterConfiguration.getConfiguration(), BlockingJob.getProgram(blockId)));
    try (final MiniCluster cluster = clusterBuilder.build()) {
        // start mini cluster and submit the job
        cluster.start();
        // wait until job is running
        awaitJobStatus(cluster, ApplicationDispatcherBootstrap.ZERO_JOB_ID, JobStatus.RUNNING, deadline);
        // make sure the operator is actually running
        BlockingJob.awaitRunning(blockId);
        final CompletableFuture<JobResult> firstJobResult = cluster.requestJobResult(ApplicationDispatcherBootstrap.ZERO_JOB_ID);
        haServices.revokeDispatcherLeadership();
        // make sure the leadership is revoked to avoid race conditions
        assertThat(firstJobResult.get()).extracting(JobResult::getApplicationStatus).isEqualTo(ApplicationStatus.UNKNOWN);
        haServices.grantDispatcherLeadership();
        // job is suspended, wait until it's running
        awaitJobStatus(cluster, ApplicationDispatcherBootstrap.ZERO_JOB_ID, JobStatus.RUNNING, deadline);
        // unblock processing so the job can finish
        BlockingJob.unblock(blockId);
        // and wait for it to actually finish
        final JobResult secondJobResult = cluster.requestJobResult(ApplicationDispatcherBootstrap.ZERO_JOB_ID).get();
        assertThat(secondJobResult.isSuccess()).isTrue();
        assertThat(secondJobResult.getApplicationStatus()).isEqualTo(ApplicationStatus.SUCCEEDED);
        // the cluster should shut down automatically once the application completes
        awaitClusterStopped(cluster, deadline);
    } finally {
        BlockingJob.cleanUp(blockId);
    }
}
Also used: TestingMiniCluster(org.apache.flink.runtime.minicluster.TestingMiniCluster) TestingMiniClusterConfiguration(org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration) Configuration(org.apache.flink.configuration.Configuration) JobResult(org.apache.flink.runtime.jobmaster.JobResult) Deadline(org.apache.flink.api.common.time.Deadline) EmbeddedHaServicesWithLeadershipControl(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl) MiniCluster(org.apache.flink.runtime.minicluster.MiniCluster) Test(org.junit.jupiter.api.Test)

Example 3 with EmbeddedHaServicesWithLeadershipControl

Use of org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl in project flink by apache.

In the class ApplicationDispatcherBootstrapITCase, method testDirtyJobResultRecoveryInApplicationMode.

@Test
public void testDirtyJobResultRecoveryInApplicationMode() throws Exception {
    final Deadline deadline = Deadline.fromNow(TIMEOUT);
    final Configuration configuration = new Configuration();
    configuration.set(HighAvailabilityOptions.HA_MODE, HighAvailabilityMode.ZOOKEEPER.name());
    configuration.set(DeploymentOptions.TARGET, EmbeddedExecutor.NAME);
    configuration.set(ClientOptions.CLIENT_RETRY_PERIOD, Duration.ofMillis(100));
    final TestingMiniClusterConfiguration clusterConfiguration = TestingMiniClusterConfiguration.newBuilder().setConfiguration(configuration).build();
    // having a dirty entry in the JobResultStore should make the ApplicationDispatcherBootstrap
    // implementation fail to submit the job
    final JobResultStore jobResultStore = new EmbeddedJobResultStore();
    jobResultStore.createDirtyResult(new JobResultEntry(TestingJobResultStore.createSuccessfulJobResult(ApplicationDispatcherBootstrap.ZERO_JOB_ID)));
    final EmbeddedHaServicesWithLeadershipControl haServices = new EmbeddedHaServicesWithLeadershipControl(TestingUtils.defaultExecutor()) {

        @Override
        public JobResultStore getJobResultStore() {
            return jobResultStore;
        }
    };
    final TestingMiniCluster.Builder clusterBuilder =
            TestingMiniCluster.newBuilder(clusterConfiguration)
                    .setHighAvailabilityServicesSupplier(() -> haServices)
                    .setDispatcherResourceManagerComponentFactorySupplier(
                            createApplicationModeDispatcherResourceManagerComponentFactorySupplier(
                                    clusterConfiguration.getConfiguration(),
                                    ErrorHandlingSubmissionJob.createPackagedProgram()));
    try (final MiniCluster cluster = clusterBuilder.build()) {
        // start mini cluster and submit the job
        cluster.start();
        // the cluster should shut down automatically once the application completes
        awaitClusterStopped(cluster, deadline);
    }
    FlinkAssertions.assertThatChainOfCauses(ErrorHandlingSubmissionJob.getSubmissionException())
            .as("The job's main method shouldn't have succeeded due to a DuplicateJobSubmissionException.")
            .hasAtLeastOneElementOfType(DuplicateJobSubmissionException.class);
    assertThat(jobResultStore.hasDirtyJobResultEntry(ApplicationDispatcherBootstrap.ZERO_JOB_ID)).isFalse();
    assertThat(jobResultStore.hasCleanJobResultEntry(ApplicationDispatcherBootstrap.ZERO_JOB_ID)).isTrue();
}
Also used: TestingMiniCluster(org.apache.flink.runtime.minicluster.TestingMiniCluster) TestingMiniClusterConfiguration(org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration) Configuration(org.apache.flink.configuration.Configuration) Deadline(org.apache.flink.api.common.time.Deadline) JobResultEntry(org.apache.flink.runtime.highavailability.JobResultEntry) EmbeddedHaServicesWithLeadershipControl(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl) MiniCluster(org.apache.flink.runtime.minicluster.MiniCluster) EmbeddedJobResultStore(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedJobResultStore) TestingJobResultStore(org.apache.flink.runtime.testutils.TestingJobResultStore) JobResultStore(org.apache.flink.runtime.highavailability.JobResultStore) Test(org.junit.jupiter.api.Test)

Example 4 with EmbeddedHaServicesWithLeadershipControl

Use of org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl in project flink by apache.

In the class ApplicationDispatcherBootstrapITCase, method testSubmitFailedJobOnApplicationError.

@Test
public void testSubmitFailedJobOnApplicationError() throws Exception {
    final Deadline deadline = Deadline.fromNow(TIMEOUT);
    final JobID jobId = new JobID();
    final Configuration configuration = new Configuration();
    configuration.set(HighAvailabilityOptions.HA_MODE, HighAvailabilityMode.ZOOKEEPER.name());
    configuration.set(DeploymentOptions.TARGET, EmbeddedExecutor.NAME);
    configuration.set(ClientOptions.CLIENT_RETRY_PERIOD, Duration.ofMillis(100));
    configuration.set(DeploymentOptions.SHUTDOWN_ON_APPLICATION_FINISH, false);
    configuration.set(DeploymentOptions.SUBMIT_FAILED_JOB_ON_APPLICATION_ERROR, true);
    configuration.set(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID, jobId.toHexString());
    final TestingMiniClusterConfiguration clusterConfiguration = TestingMiniClusterConfiguration.newBuilder().setConfiguration(configuration).build();
    final EmbeddedHaServicesWithLeadershipControl haServices = new EmbeddedHaServicesWithLeadershipControl(TestingUtils.defaultExecutor());
    final TestingMiniCluster.Builder clusterBuilder =
            TestingMiniCluster.newBuilder(clusterConfiguration)
                    .setHighAvailabilityServicesSupplier(() -> haServices)
                    .setDispatcherResourceManagerComponentFactorySupplier(
                            createApplicationModeDispatcherResourceManagerComponentFactorySupplier(
                                    clusterConfiguration.getConfiguration(), FailingJob.getProgram()));
    try (final MiniCluster cluster = clusterBuilder.build()) {
        // start mini cluster and submit the job
        cluster.start();
        // wait until the failed job has been submitted
        awaitJobStatus(cluster, jobId, JobStatus.FAILED, deadline);
        final ArchivedExecutionGraph graph = cluster.getArchivedExecutionGraph(jobId).get();
        assertThat(graph.getJobID()).isEqualTo(jobId);
        assertThat(graph.getJobName()).isEqualTo(ApplicationDispatcherBootstrap.FAILED_JOB_NAME);
        assertThat(graph.getFailureInfo())
                .isNotNull()
                .extracting(ErrorInfo::getException)
                .extracting(e -> e.deserializeError(Thread.currentThread().getContextClassLoader()))
                .satisfies(
                        e ->
                                assertThat(e)
                                        .isInstanceOf(ProgramInvocationException.class)
                                        .hasRootCauseInstanceOf(RuntimeException.class)
                                        .hasRootCauseMessage(FailingJob.EXCEPTION_MESSAGE));
    }
}
Also used: TestingMiniCluster(org.apache.flink.runtime.minicluster.TestingMiniCluster) Deadline(org.apache.flink.api.common.time.Deadline) ProgramInvocationException(org.apache.flink.client.program.ProgramInvocationException) EmbeddedHaServicesWithLeadershipControl(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl) TestingMiniClusterConfiguration(org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration) Assertions.assertThat(org.assertj.core.api.Assertions.assertThat) EmbeddedJobResultStore(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedJobResultStore) ExceptionUtils(org.apache.flink.util.ExceptionUtils) CompletableFuture(java.util.concurrent.CompletableFuture) JobStatus(org.apache.flink.api.common.JobStatus) Supplier(java.util.function.Supplier) EmbeddedExecutor(org.apache.flink.client.deployment.application.executors.EmbeddedExecutor) PipelineOptionsInternal(org.apache.flink.configuration.PipelineOptionsInternal) JobResult(org.apache.flink.runtime.jobmaster.JobResult) TestLoggerExtension(org.apache.flink.util.TestLoggerExtension) ExtendWith(org.junit.jupiter.api.extension.ExtendWith) BlockingJob(org.apache.flink.client.testjar.BlockingJob) DefaultDispatcherRunnerFactory(org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunnerFactory) DefaultDispatcherResourceManagerComponentFactory(org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory) FlinkAssertions(org.apache.flink.core.testutils.FlinkAssertions) Duration(java.time.Duration) MiniCluster(org.apache.flink.runtime.minicluster.MiniCluster) ErrorHandlingSubmissionJob(org.apache.flink.client.testjar.ErrorHandlingSubmissionJob) DeploymentOptions(org.apache.flink.configuration.DeploymentOptions) HighAvailabilityMode(org.apache.flink.runtime.jobmanager.HighAvailabilityMode) ClientOptions(org.apache.flink.client.cli.ClientOptions) FailingJob(org.apache.flink.client.testjar.FailingJob) DispatcherResourceManagerComponentFactory(org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponentFactory) ArchivedExecutionGraph(org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph) ApplicationStatus(org.apache.flink.runtime.clusterframework.ApplicationStatus) Configuration(org.apache.flink.configuration.Configuration) ErrorInfo(org.apache.flink.runtime.executiongraph.ErrorInfo) JobRestEndpointFactory(org.apache.flink.runtime.rest.JobRestEndpointFactory) UUID(java.util.UUID) Test(org.junit.jupiter.api.Test) ExecutionException(java.util.concurrent.ExecutionException) TestingUtils(org.apache.flink.testutils.TestingUtils) JobResultEntry(org.apache.flink.runtime.highavailability.JobResultEntry) JobID(org.apache.flink.api.common.JobID) FlinkJobNotFoundException(org.apache.flink.runtime.messages.FlinkJobNotFoundException) TestingJobResultStore(org.apache.flink.runtime.testutils.TestingJobResultStore) PackagedProgram(org.apache.flink.client.program.PackagedProgram) JobResultStore(org.apache.flink.runtime.highavailability.JobResultStore) SessionDispatcherFactory(org.apache.flink.runtime.dispatcher.SessionDispatcherFactory) CommonTestUtils(org.apache.flink.runtime.testutils.CommonTestUtils) StandaloneResourceManagerFactory(org.apache.flink.runtime.resourcemanager.StandaloneResourceManagerFactory) HighAvailabilityOptions(org.apache.flink.configuration.HighAvailabilityOptions) DuplicateJobSubmissionException(org.apache.flink.runtime.client.DuplicateJobSubmissionException)

Example 5 with EmbeddedHaServicesWithLeadershipControl

Use of org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl in project flink by apache.

In the class LeaderChangeClusterComponentsTest, method setupClass.

@BeforeClass
public static void setupClass() throws Exception {
    highAvailabilityServices = new EmbeddedHaServicesWithLeadershipControl(TestingUtils.defaultExecutor());
    miniCluster =
            TestingMiniCluster.newBuilder(
                            TestingMiniClusterConfiguration.newBuilder()
                                    .setNumTaskManagers(NUM_TMS)
                                    .setNumSlotsPerTaskManager(SLOTS_PER_TM)
                                    .build())
                    .setHighAvailabilityServicesSupplier(() -> highAvailabilityServices)
                    .build();
    miniCluster.start();
}
Also used: EmbeddedHaServicesWithLeadershipControl(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl) BeforeClass(org.junit.BeforeClass)

Aggregations

EmbeddedHaServicesWithLeadershipControl (org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedHaServicesWithLeadershipControl) 5
Deadline (org.apache.flink.api.common.time.Deadline) 4
Configuration (org.apache.flink.configuration.Configuration) 4
MiniCluster (org.apache.flink.runtime.minicluster.MiniCluster) 4
TestingMiniCluster (org.apache.flink.runtime.minicluster.TestingMiniCluster) 4
TestingMiniClusterConfiguration (org.apache.flink.runtime.minicluster.TestingMiniClusterConfiguration) 4
Test (org.junit.jupiter.api.Test) 4
JobResult (org.apache.flink.runtime.jobmaster.JobResult) 3
JobID (org.apache.flink.api.common.JobID) 2
JobResultEntry (org.apache.flink.runtime.highavailability.JobResultEntry) 2
JobResultStore (org.apache.flink.runtime.highavailability.JobResultStore) 2
EmbeddedJobResultStore (org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedJobResultStore) 2
TestingJobResultStore (org.apache.flink.runtime.testutils.TestingJobResultStore) 2
Duration (java.time.Duration) 1
UUID (java.util.UUID) 1
CompletableFuture (java.util.concurrent.CompletableFuture) 1
ExecutionException (java.util.concurrent.ExecutionException) 1
Supplier (java.util.function.Supplier) 1
JobStatus (org.apache.flink.api.common.JobStatus) 1
ClientOptions (org.apache.flink.client.cli.ClientOptions) 1