
Example 26 with FlinkException

use of org.apache.flink.util.FlinkException in project flink by apache.

the class Dispatcher method jobReachedTerminalState.

protected CleanupJobState jobReachedTerminalState(ExecutionGraphInfo executionGraphInfo) {
    final ArchivedExecutionGraph archivedExecutionGraph = executionGraphInfo.getArchivedExecutionGraph();
    final JobStatus terminalJobStatus = archivedExecutionGraph.getState();
    Preconditions.checkArgument(terminalJobStatus.isTerminalState(), "Job %s is in state %s which is not terminal.", archivedExecutionGraph.getJobID(), terminalJobStatus);
    // The failureInfo contains the reason why the job failed or was suspended. For
    // finished/canceled jobs it may instead contain the cause of the last restart (if any),
    // so we do not log it for those states because it would be misleading.
    final boolean isFailureInfoRelatedToJobTermination = terminalJobStatus == JobStatus.SUSPENDED || terminalJobStatus == JobStatus.FAILED;
    if (archivedExecutionGraph.getFailureInfo() != null && isFailureInfoRelatedToJobTermination) {
        log.info("Job {} reached terminal state {}.\n{}", archivedExecutionGraph.getJobID(), terminalJobStatus, archivedExecutionGraph.getFailureInfo().getExceptionAsString().trim());
    } else {
        log.info("Job {} reached terminal state {}.", archivedExecutionGraph.getJobID(), terminalJobStatus);
    }
    archiveExecutionGraph(executionGraphInfo);
    if (terminalJobStatus.isGloballyTerminalState()) {
        final JobID jobId = executionGraphInfo.getJobId();
        try {
            if (jobResultStore.hasCleanJobResultEntry(jobId)) {
                log.warn("Job {} is already marked as clean but clean up was triggered again.", jobId);
            } else if (!jobResultStore.hasDirtyJobResultEntry(jobId)) {
                jobResultStore.createDirtyResult(new JobResultEntry(JobResult.createFrom(executionGraphInfo.getArchivedExecutionGraph())));
                log.info("Job {} has been registered for cleanup in the JobResultStore after reaching a terminal state.", jobId);
            }
        } catch (IOException e) {
            fatalErrorHandler.onFatalError(new FlinkException(String.format("The job %s couldn't be marked as pre-cleanup finished in JobResultStore.", jobId), e));
        }
    }
    return terminalJobStatus.isGloballyTerminalState() ? CleanupJobState.GLOBAL : CleanupJobState.LOCAL;
}
Also used : JobStatus(org.apache.flink.api.common.JobStatus) ArchivedExecutionGraph(org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph) JobResultEntry(org.apache.flink.runtime.highavailability.JobResultEntry) IOException(java.io.IOException) JobID(org.apache.flink.api.common.JobID) FlinkException(org.apache.flink.util.FlinkException)
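
The catch block in the example above wraps a low-level IOException in a FlinkException that names the affected job, then hands it to a fatal error handler instead of rethrowing. A minimal sketch of that pattern in plain Java follows, assuming nothing from Flink on the classpath: `SketchException`, `ErrorHandler`, `DirtyResultWriter`, and `markDirty` are hypothetical stand-ins invented for illustration, not Flink APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FatalErrorSketch {

    /** Hypothetical stand-in for a checked, domain-specific exception such as FlinkException. */
    static class SketchException extends Exception {
        SketchException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for a fatal error handler. */
    interface ErrorHandler {
        void onFatalError(Throwable error);
    }

    /** Hypothetical store operation that may fail with an IOException. */
    interface DirtyResultWriter {
        void createDirtyResult(String jobId) throws IOException;
    }

    /** Adds job context to an IOException and reports it to the handler instead of rethrowing. */
    static void markDirty(String jobId, DirtyResultWriter writer, ErrorHandler handler) {
        try {
            writer.createDirtyResult(jobId);
        } catch (IOException e) {
            handler.onFatalError(
                    new SketchException(
                            String.format("The job %s couldn't be marked as dirty.", jobId), e));
        }
    }

    public static void main(String[] args) {
        // Collect reported errors in a list so they can be inspected afterwards.
        List<Throwable> reported = new ArrayList<>();
        markDirty("job-1", id -> { throw new IOException("disk full"); }, reported::add);

        Throwable error = reported.get(0);
        System.out.println(error.getMessage());
        System.out.println(error.getCause().getMessage());
    }
}
```

Keeping the original exception as the cause preserves the low-level stack trace, while the wrapper message records which job was affected.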

Example 27 with FlinkException

use of org.apache.flink.util.FlinkException in project flink by apache.

the class SessionDispatcherLeaderProcessTest method onAddedJobGraph_failingRecovery_propagatesTheFailure.

@Test
public void onAddedJobGraph_failingRecovery_propagatesTheFailure() throws Exception {
    final FlinkException expectedFailure = new FlinkException("Expected failure");
    jobGraphStore = TestingJobGraphStore.newBuilder().setRecoverJobGraphFunction((ignoredA, ignoredB) -> {
        throw expectedFailure;
    }).build();
    try (final SessionDispatcherLeaderProcess dispatcherLeaderProcess = createDispatcherLeaderProcess()) {
        dispatcherLeaderProcess.start();
        // wait first for the dispatcher service to be created
        dispatcherLeaderProcess.getDispatcherGateway().get();
        jobGraphStore.putJobGraph(JOB_GRAPH);
        dispatcherLeaderProcess.onAddedJobGraph(JOB_GRAPH.getJobID());
        assertThat(fatalErrorHandler.getErrorFuture()).succeedsWithin(100, TimeUnit.MILLISECONDS).extracting(FlinkAssertions::chainOfCauses, STREAM_THROWABLE).contains(expectedFailure);
        assertThat(dispatcherLeaderProcess.getState()).isEqualTo(SessionDispatcherLeaderProcess.State.STOPPED);
        fatalErrorHandler.clearError();
    }
}
Also used : FlinkException(org.apache.flink.util.FlinkException) Test(org.junit.jupiter.api.Test)
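
The assertion in the test above checks that the expected exception appears somewhere in the cause chain of the reported error, not necessarily at the top. The sketch below shows one way such a chain could be collected in plain Java. The `chainOfCauses` helper here is a hypothetical illustration written for this page; it is not the implementation behind `FlinkAssertions::chainOfCauses`.

```java
import java.util.ArrayList;
import java.util.List;

public class CauseChainSketch {

    /**
     * Returns the throwable followed by each of its causes, outermost first.
     * Stops if a throwable repeats, which guards against cyclic cause chains.
     */
    static List<Throwable> chainOfCauses(Throwable throwable) {
        List<Throwable> chain = new ArrayList<>();
        Throwable current = throwable;
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = current.getCause();
        }
        return chain;
    }

    public static void main(String[] args) {
        Exception expectedFailure = new Exception("Expected failure");
        RuntimeException wrapper = new RuntimeException("Recovery failed", expectedFailure);

        // The expected failure is found even though it is wrapped.
        System.out.println(chainOfCauses(wrapper).contains(expectedFailure));
    }
}
```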

Example 28 with FlinkException

use of org.apache.flink.util.FlinkException in project flink by apache.

the class SessionDispatcherLeaderProcessTest method unexpectedDispatcherServiceTerminationWhileRunning_callsFatalErrorHandler.

@Test
public void unexpectedDispatcherServiceTerminationWhileRunning_callsFatalErrorHandler() {
    final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
    dispatcherServiceFactory = createFactoryBasedOnGenericSupplier(() -> TestingDispatcherGatewayService.newBuilder().setTerminationFuture(terminationFuture).build());
    final SessionDispatcherLeaderProcess dispatcherLeaderProcess = createDispatcherLeaderProcess();
    dispatcherLeaderProcess.start();
    final FlinkException expectedFailure = new FlinkException("Expected test failure.");
    terminationFuture.completeExceptionally(expectedFailure);
    final Throwable error = fatalErrorHandler.getErrorFuture().join();
    assertThat(error).getRootCause().isEqualTo(expectedFailure);
    fatalErrorHandler.clearError();
}
Also used : CompletableFuture(java.util.concurrent.CompletableFuture) FlinkException(org.apache.flink.util.FlinkException) Test(org.junit.jupiter.api.Test)
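
The test above compares the root cause rather than the error itself. One likely reason is that a failure travelling through dependent CompletableFuture stages arrives wrapped in a CompletionException. The following plain-Java sketch illustrates that wrapping; `rootCause` and `failureOfDependentStage` are hypothetical helpers invented for this illustration and are not part of Flink or AssertJ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class RootCauseSketch {

    /** Follows getCause() until the innermost throwable is reached. */
    static Throwable rootCause(Throwable throwable) {
        Throwable current = throwable;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Fails a future, then returns what join() throws on a stage that depends on it. */
    static Throwable failureOfDependentStage(Exception failure) {
        CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
        CompletableFuture<Void> dependent = terminationFuture.thenRun(() -> { });
        terminationFuture.completeExceptionally(failure);
        try {
            dependent.join();
            throw new IllegalStateException("The dependent stage was expected to fail.");
        } catch (CompletionException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Exception expectedFailure = new Exception("Expected test failure.");
        Throwable error = failureOfDependentStage(expectedFailure);

        // The observed error is a wrapper; the original failure sits at the root.
        System.out.println(error.getClass().getSimpleName());
        System.out.println(rootCause(error) == expectedFailure);
    }
}
```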

Example 29 with FlinkException

use of org.apache.flink.util.FlinkException in project flink by apache.

the class SessionDispatcherLeaderProcessTest method recoverJobs_withRecoveryFailure_failsFatally.

@Test
public void recoverJobs_withRecoveryFailure_failsFatally() throws Exception {
    final FlinkException testException = new FlinkException("Test exception");
    jobGraphStore = TestingJobGraphStore.newBuilder().setRecoverJobGraphFunction((ignoredA, ignoredB) -> {
        throw testException;
    }).setInitialJobGraphs(Collections.singleton(JOB_GRAPH)).build();
    runJobRecoveryFailureTest(testException);
}
Also used : OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) FlinkException(org.apache.flink.util.FlinkException) BeforeEach(org.junit.jupiter.api.BeforeEach) Arrays(java.util.Arrays) JobSubmissionException(org.apache.flink.runtime.client.JobSubmissionException) Assertions.assertThat(org.assertj.core.api.Assertions.assertThat) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore) TimeoutException(java.util.concurrent.TimeoutException) CompletableFuture(java.util.concurrent.CompletableFuture) Function(java.util.function.Function) Supplier(java.util.function.Supplier) AfterAll(org.junit.jupiter.api.AfterAll) JobResult(org.apache.flink.runtime.jobmaster.JobResult) TestLoggerExtension(org.apache.flink.util.TestLoggerExtension) TestingFatalErrorHandler(org.apache.flink.runtime.util.TestingFatalErrorHandler) FutureUtils(org.apache.flink.util.concurrent.FutureUtils) ExtendWith(org.junit.jupiter.api.extension.ExtendWith) Assertions.assertThatThrownBy(org.assertj.core.api.Assertions.assertThatThrownBy) BeforeAll(org.junit.jupiter.api.BeforeAll) FlinkAssertions(org.apache.flink.core.testutils.FlinkAssertions) JobGraphTestUtils(org.apache.flink.runtime.jobgraph.JobGraphTestUtils) ThrowingConsumer(org.apache.flink.util.function.ThrowingConsumer) FlinkAssertions.anyCauseMatches(org.apache.flink.core.testutils.FlinkAssertions.anyCauseMatches) ExecutorService(java.util.concurrent.ExecutorService) Collection(java.util.Collection) Set(java.util.Set) UUID(java.util.UUID) Acknowledge(org.apache.flink.runtime.messages.Acknowledge) Executors(java.util.concurrent.Executors) ExecutorUtils(org.apache.flink.util.ExecutorUtils) Test(org.junit.jupiter.api.Test) TimeUnit(java.util.concurrent.TimeUnit) Consumer(java.util.function.Consumer) AfterEach(org.junit.jupiter.api.AfterEach) JobID(org.apache.flink.api.common.JobID) TestingJobResultStore(org.apache.flink.runtime.testutils.TestingJobResultStore) TestingDispatcherGateway(org.apache.flink.runtime.webmonitor.TestingDispatcherGateway) JobResultStore(org.apache.flink.runtime.highavailability.JobResultStore) STREAM_THROWABLE(org.apache.flink.core.testutils.FlinkAssertions.STREAM_THROWABLE) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) Collections(java.util.Collections) DuplicateJobSubmissionException(org.apache.flink.runtime.client.DuplicateJobSubmissionException)

Example 30 with FlinkException

use of org.apache.flink.util.FlinkException in project flink by apache.

the class SessionDispatcherLeaderProcessTest method recoverJobs_withJobIdRecoveryFailure_failsFatally.

@Test
public void recoverJobs_withJobIdRecoveryFailure_failsFatally() throws Exception {
    final FlinkException testException = new FlinkException("Test exception");
    jobGraphStore = TestingJobGraphStore.newBuilder().setJobIdsFunction(ignored -> {
        throw testException;
    }).build();
    runJobRecoveryFailureTest(testException);
}
Also used : FlinkException(org.apache.flink.util.FlinkException) Test(org.junit.jupiter.api.Test)

Aggregations

FlinkException (org.apache.flink.util.FlinkException) 197
Test (org.junit.Test) 91
CompletableFuture (java.util.concurrent.CompletableFuture) 59
IOException (java.io.IOException) 38
ExecutionException (java.util.concurrent.ExecutionException) 26
ArrayList (java.util.ArrayList) 25
JobID (org.apache.flink.api.common.JobID) 24
Collection (java.util.Collection) 22
CompletionException (java.util.concurrent.CompletionException) 22
Configuration (org.apache.flink.configuration.Configuration) 21
TimeoutException (java.util.concurrent.TimeoutException) 19
FutureUtils (org.apache.flink.util.concurrent.FutureUtils) 19
Time (org.apache.flink.api.common.time.Time) 16
OneShotLatch (org.apache.flink.core.testutils.OneShotLatch) 16
ResourceID (org.apache.flink.runtime.clusterframework.types.ResourceID) 16
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph) 15
AllocationID (org.apache.flink.runtime.clusterframework.types.AllocationID) 14
Collections (java.util.Collections) 13
List (java.util.List) 13
ExecutorService (java.util.concurrent.ExecutorService) 13