Example 1 with JobExecutionException

use of org.apache.flink.runtime.client.JobExecutionException in project flink by apache.

the class JobSubmissionFailsITCase method testExceptionInInitializeOnMaster.

@Test
public void testExceptionInInitializeOnMaster() {
    try {
        final JobVertex failingJobVertex = new FailingJobVertex("Failing job vertex");
        failingJobVertex.setInvokableClass(NoOpInvokable.class);
        final JobGraph failingJobGraph = new JobGraph("Failing testing job", failingJobVertex);
        try {
            submitJob(failingJobGraph);
            fail("Expected JobExecutionException.");
        } catch (JobExecutionException e) {
            assertEquals("Test exception.", e.getCause().getMessage());
        } catch (Throwable t) {
            t.printStackTrace();
            fail("Caught wrong exception of type " + t.getClass() + ".");
        }
        cluster.submitJobAndWait(workingJobGraph, false);
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph), JobVertex(org.apache.flink.runtime.jobgraph.JobVertex), JobExecutionException(org.apache.flink.runtime.client.JobExecutionException), JobSubmissionException(org.apache.flink.runtime.client.JobSubmissionException), Test(org.junit.Test)
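
The test above calls fail(...) right after submitJob(...) so that it cannot pass when no exception is thrown. A minimal sketch of the same idea as a reusable helper follows. The class ExpectThrows and its method expectThrows are hypothetical names introduced here for illustration; they are not part of Flink. JUnit 4.13 and later ship a comparable Assert.assertThrows.

```java
// Hypothetical helper for illustration only; not part of Flink.
final class ExpectThrows {

    /** A block of code that is allowed to throw anything. */
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /**
     * Runs the action and returns the thrown exception if it has the expected type.
     * Throws AssertionError if nothing is thrown or if the type is wrong.
     */
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Caught wrong exception of type " + t.getClass().getName(), t);
        }
        // Reached only when the action completed normally.
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // Stand-in for submitJob(failingJobGraph): an exception with a nested cause.
        IllegalStateException e = expectThrows(IllegalStateException.class, () -> {
            throw new IllegalStateException("Job failed.", new RuntimeException("Test exception."));
        });
        System.out.println(e.getCause().getMessage()); // prints "Test exception."
    }
}
```

Returning the caught exception lets the caller assert on its cause afterwards, as the test does with e.getCause().getMessage().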

Example 2 with JobExecutionException

use of org.apache.flink.runtime.client.JobExecutionException in project flink by apache.

the class JobSubmitTest method testFailureWhenJarBlobsMissing.

@Test
public void testFailureWhenJarBlobsMissing() {
    try {
        // create a simple job graph
        JobVertex jobVertex = new JobVertex("Test Vertex");
        jobVertex.setInvokableClass(NoOpInvokable.class);
        JobGraph jg = new JobGraph("test job", jobVertex);
        // request the blob port from the job manager
        Future<Object> future = jmGateway.ask(JobManagerMessages.getRequestBlobManagerPort(), timeout);
        int blobPort = (Integer) Await.result(future, timeout);
        // upload two dummy byte arrays and add their keys to the job graph as dependencies
        BlobKey key1, key2;
        BlobClient bc = new BlobClient(new InetSocketAddress("localhost", blobPort), jmConfig);
        try {
            key1 = bc.put(new byte[10]);
            key2 = bc.put(new byte[10]);
            // delete one of the blobs to make sure that the startup fails
            bc.delete(key2);
        } finally {
            bc.close();
        }
        jg.addBlob(key1);
        jg.addBlob(key2);
        // submit the job
        Future<Object> submitFuture = jmGateway.ask(new JobManagerMessages.SubmitJob(jg, ListeningBehaviour.EXECUTION_RESULT), timeout);
        try {
            Await.result(submitFuture, timeout);
        } catch (JobExecutionException e) {
            // that is what we expect
            assertTrue(e.getCause() instanceof IOException);
        } catch (Exception e) {
            fail("Wrong exception type");
        }
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used : BlobClient(org.apache.flink.runtime.blob.BlobClient), InetSocketAddress(java.net.InetSocketAddress), JobManagerMessages(org.apache.flink.runtime.messages.JobManagerMessages), IOException(java.io.IOException), JobExecutionException(org.apache.flink.runtime.client.JobExecutionException), JobGraph(org.apache.flink.runtime.jobgraph.JobGraph), BlobKey(org.apache.flink.runtime.blob.BlobKey), JobVertex(org.apache.flink.runtime.jobgraph.JobVertex), Test(org.junit.Test)

Example 3 with JobExecutionException

use of org.apache.flink.runtime.client.JobExecutionException in project flink by apache.

the class JobSubmitTest method testFailureWhenInitializeOnMasterFails.

/**
	 * Verifies a correct error message when vertices with master initialization
	 * (input formats / output formats) fail.
	 */
@Test
public void testFailureWhenInitializeOnMasterFails() {
    try {
        // create a simple job graph
        JobVertex jobVertex = new JobVertex("Vertex that fails in initializeOnMaster") {

            private static final long serialVersionUID = -3540303593784587652L;

            @Override
            public void initializeOnMaster(ClassLoader loader) throws Exception {
                throw new RuntimeException("test exception");
            }
        };
        jobVertex.setInvokableClass(NoOpInvokable.class);
        JobGraph jg = new JobGraph("test job", jobVertex);
        // submit the job
        Future<Object> submitFuture = jmGateway.ask(new JobManagerMessages.SubmitJob(jg, ListeningBehaviour.EXECUTION_RESULT), timeout);
        try {
            Await.result(submitFuture, timeout);
        } catch (JobExecutionException e) {
            // that is what we expect
            // test that the exception nesting is not too deep
            assertTrue(e.getCause() instanceof RuntimeException);
        } catch (Exception e) {
            fail("Wrong exception type");
        }
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph), JobVertex(org.apache.flink.runtime.jobgraph.JobVertex), JobExecutionException(org.apache.flink.runtime.client.JobExecutionException), JobManagerMessages(org.apache.flink.runtime.messages.JobManagerMessages), IOException(java.io.IOException), Test(org.junit.Test)
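
The test above checks that "the exception nesting is not too deep" by asserting that the direct cause is a RuntimeException. A rough sketch of how nesting depth could be measured more generally follows. The class CauseChain and its methods are hypothetical helpers written for this illustration and do not exist in Flink.

```java
import java.io.IOException;

// Hypothetical helper for illustration only; not part of Flink.
final class CauseChain {

    // Upper bound on the walk, as a guard against cyclic cause chains.
    private static final int MAX_STEPS = 100;

    /** Number of causes beneath the given throwable (0 if it has no cause). */
    static int depth(Throwable t) {
        int depth = 0;
        Throwable current = t;
        while (current.getCause() != null && depth < MAX_STEPS) {
            current = current.getCause();
            depth++;
        }
        return depth;
    }

    /** The innermost throwable in the chain, or the argument itself if it has no cause. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        int steps = 0;
        while (current.getCause() != null && steps < MAX_STEPS) {
            current = current.getCause();
            steps++;
        }
        return current;
    }

    public static void main(String[] args) {
        Exception direct = new Exception("job failed", new RuntimeException("test exception"));
        Exception nested = new Exception("job failed",
                new Exception("wrapper", new IOException("blob missing")));
        System.out.println(depth(direct));                  // prints 1
        System.out.println(depth(nested));                  // prints 2
        System.out.println(rootCause(nested).getMessage()); // prints "blob missing"
    }
}
```

An assertion such as depth(e) == 1 would state the "not too deep" expectation directly, whereas the instanceof check only fails when the first cause has an unexpected type.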

Example 4 with JobExecutionException

use of org.apache.flink.runtime.client.JobExecutionException in project flink by apache.

the class ClusterClient method retrieveJob.

/**
	 * Reattaches to a running job from the supplied job id
	 * @param jobID The job id of the job to attach to
	 * @return The JobExecutionResult for the jobID
	 * @throws JobExecutionException if an error occurs during monitoring the job execution
	 */
public JobExecutionResult retrieveJob(JobID jobID) throws JobExecutionException {
    final LeaderRetrievalService leaderRetrievalService;
    try {
        leaderRetrievalService = LeaderRetrievalUtils.createLeaderRetrievalService(flinkConfig);
    } catch (Exception e) {
        throw new JobRetrievalException(jobID, "Could not create the leader retrieval service", e);
    }
    ActorGateway jobManagerGateway;
    try {
        jobManagerGateway = getJobManagerGateway();
    } catch (Exception e) {
        throw new JobRetrievalException(jobID, "Could not retrieve the JobManager Gateway", e);
    }
    final JobListeningContext listeningContext = JobClient.attachToRunningJob(jobID, jobManagerGateway, flinkConfig, actorSystemLoader.get(), leaderRetrievalService, timeout, printStatusDuringExecution);
    return JobClient.awaitJobResult(listeningContext);
}
Also used : JobListeningContext(org.apache.flink.runtime.client.JobListeningContext), JobRetrievalException(org.apache.flink.runtime.client.JobRetrievalException), LeaderRetrievalService(org.apache.flink.runtime.leaderretrieval.LeaderRetrievalService), ActorGateway(org.apache.flink.runtime.instance.ActorGateway), URISyntaxException(java.net.URISyntaxException), JobExecutionException(org.apache.flink.runtime.client.JobExecutionException), IOException(java.io.IOException), CompilerException(org.apache.flink.optimizer.CompilerException)
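
retrieveJob declares JobExecutionException but throws JobRetrievalException, which suggests the latter is a subclass of the former. The sketch below shows, under that assumption, how a caller can still tell the two apart. ExecutionFailure and RetrievalFailure are hypothetical stand-ins defined here, not the Flink classes.

```java
// Hypothetical stand-ins for illustration only; these are not the Flink exception classes.
final class RetrievalSketch {

    /** Plays the role of the general exception named in the throws clause. */
    static class ExecutionFailure extends Exception {
        ExecutionFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Plays the role of the more specific exception that is actually thrown. */
    static class RetrievalFailure extends ExecutionFailure {
        RetrievalFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Declares the supertype but may throw the subtype. */
    static String retrieve(boolean gatewayAvailable) throws ExecutionFailure {
        if (!gatewayAvailable) {
            throw new RetrievalFailure("Could not retrieve the gateway",
                    new IllegalStateException("no leader"));
        }
        return "result";
    }

    /** Catches the specific type first, then the general one. */
    static String describe(boolean gatewayAvailable) {
        try {
            return retrieve(gatewayAvailable);
        } catch (RetrievalFailure e) {
            return "retrieval failed: " + e.getMessage();
        } catch (ExecutionFailure e) {
            return "execution failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(true));  // prints "result"
        System.out.println(describe(false)); // prints "retrieval failed: Could not retrieve the gateway"
    }
}
```

Callers that only care about the general failure can catch the supertype alone; the more specific catch block is optional.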

Example 5 with JobExecutionException

use of org.apache.flink.runtime.client.JobExecutionException in project flink by apache.

the class ExecutionGraphBuilder method buildGraph.

/**
	 * Builds the ExecutionGraph from the JobGraph.
	 * If a prior execution graph exists, the JobGraph will be attached to it. If no prior execution
	 * graph exists, then the JobGraph will be attached to a new empty execution graph.
	 */
public static ExecutionGraph buildGraph(@Nullable ExecutionGraph prior, JobGraph jobGraph, Configuration jobManagerConfig, ScheduledExecutorService futureExecutor, Executor ioExecutor, SlotProvider slotProvider, ClassLoader classLoader, CheckpointRecoveryFactory recoveryFactory, Time timeout, RestartStrategy restartStrategy, MetricGroup metrics, int parallelismForAutoMax, Logger log) throws JobExecutionException, JobException {
    checkNotNull(jobGraph, "job graph cannot be null");
    final String jobName = jobGraph.getName();
    final JobID jobId = jobGraph.getJobID();
    // create a new execution graph, if none exists so far
    final ExecutionGraph executionGraph;
    try {
        executionGraph = (prior != null) ? prior : new ExecutionGraph(futureExecutor, ioExecutor, jobId, jobName, jobGraph.getJobConfiguration(), jobGraph.getSerializedExecutionConfig(), timeout, restartStrategy, jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths(), slotProvider, classLoader, metrics);
    } catch (IOException e) {
        throw new JobException("Could not create the execution graph.", e);
    }
    // set the basic properties
    executionGraph.setScheduleMode(jobGraph.getScheduleMode());
    executionGraph.setQueuedSchedulingAllowed(jobGraph.getAllowQueuedScheduling());
    try {
        executionGraph.setJsonPlan(JsonPlanGenerator.generatePlan(jobGraph));
    } catch (Throwable t) {
        log.warn("Cannot create JSON plan for job", t);
        // give the graph an empty plan
        executionGraph.setJsonPlan("{}");
    }
    // initialize the vertices that have a master initialization hook
    // file output formats create directories here, input formats create splits
    final long initMasterStart = System.nanoTime();
    log.info("Running initialization on master for job {} ({}).", jobName, jobId);
    for (JobVertex vertex : jobGraph.getVertices()) {
        String executableClass = vertex.getInvokableClassName();
        if (executableClass == null || executableClass.isEmpty()) {
            throw new JobSubmissionException(jobId, "The vertex " + vertex.getID() + " (" + vertex.getName() + ") has no invokable class.");
        }
        if (vertex.getParallelism() == ExecutionConfig.PARALLELISM_AUTO_MAX) {
            vertex.setParallelism(parallelismForAutoMax);
        }
        try {
            vertex.initializeOnMaster(classLoader);
        } catch (Throwable t) {
            throw new JobExecutionException(jobId, "Cannot initialize task '" + vertex.getName() + "': " + t.getMessage(), t);
        }
    }
    log.info("Successfully ran initialization on master in {} ms.", (System.nanoTime() - initMasterStart) / 1_000_000);
    // topologically sort the job vertices and attach the graph to the existing one
    List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
    if (log.isDebugEnabled()) {
        log.debug("Adding {} vertices from job graph {} ({}).", sortedTopology.size(), jobName, jobId);
    }
    executionGraph.attachJobGraph(sortedTopology);
    if (log.isDebugEnabled()) {
        log.debug("Successfully created execution graph from job graph {} ({}).", jobName, jobId);
    }
    // configure the state checkpointing
    JobSnapshottingSettings snapshotSettings = jobGraph.getSnapshotSettings();
    if (snapshotSettings != null) {
        List<ExecutionJobVertex> triggerVertices = idToVertex(snapshotSettings.getVerticesToTrigger(), executionGraph);
        List<ExecutionJobVertex> ackVertices = idToVertex(snapshotSettings.getVerticesToAcknowledge(), executionGraph);
        List<ExecutionJobVertex> confirmVertices = idToVertex(snapshotSettings.getVerticesToConfirm(), executionGraph);
        CompletedCheckpointStore completedCheckpoints;
        CheckpointIDCounter checkpointIdCounter;
        try {
            int maxNumberOfCheckpointsToRetain = jobManagerConfig.getInteger(CoreOptions.MAX_RETAINED_CHECKPOINTS);
            if (maxNumberOfCheckpointsToRetain <= 0) {
                // log a warning and use 1 as the default value if the setting
                // state.checkpoints.max-retained-checkpoints is not greater than 0.
                log.warn("The setting for '{} : {}' is invalid. Using default value of {}", CoreOptions.MAX_RETAINED_CHECKPOINTS.key(), maxNumberOfCheckpointsToRetain, CoreOptions.MAX_RETAINED_CHECKPOINTS.defaultValue());
                maxNumberOfCheckpointsToRetain = CoreOptions.MAX_RETAINED_CHECKPOINTS.defaultValue();
            }
            completedCheckpoints = recoveryFactory.createCheckpointStore(jobId, maxNumberOfCheckpointsToRetain, classLoader);
            checkpointIdCounter = recoveryFactory.createCheckpointIDCounter(jobId);
        } catch (Exception e) {
            throw new JobExecutionException(jobId, "Failed to initialize high-availability checkpoint handler", e);
        }
        // Maximum number of remembered checkpoints
        int historySize = jobManagerConfig.getInteger(ConfigConstants.JOB_MANAGER_WEB_CHECKPOINTS_HISTORY_SIZE, ConfigConstants.DEFAULT_JOB_MANAGER_WEB_CHECKPOINTS_HISTORY_SIZE);
        CheckpointStatsTracker checkpointStatsTracker = new CheckpointStatsTracker(historySize, ackVertices, snapshotSettings, metrics);
        // The default directory for externalized checkpoints
        String externalizedCheckpointsDir = jobManagerConfig.getString(ConfigConstants.CHECKPOINTS_DIRECTORY_KEY, null);
        // load the state backend for checkpoint metadata.
        // if specified in the application, use from there, otherwise load from configuration
        final StateBackend metadataBackend;
        final StateBackend applicationConfiguredBackend = snapshotSettings.getDefaultStateBackend();
        if (applicationConfiguredBackend != null) {
            metadataBackend = applicationConfiguredBackend;
            log.info("Using application-defined state backend for checkpoint/savepoint metadata: {}.", applicationConfiguredBackend);
        } else {
            try {
                metadataBackend = AbstractStateBackend.loadStateBackendFromConfigOrCreateDefault(jobManagerConfig, classLoader, log);
            } catch (IllegalConfigurationException | IOException | DynamicCodeLoadingException e) {
                throw new JobExecutionException(jobId, "Could not instantiate configured state backend", e);
            }
        }
        executionGraph.enableCheckpointing(snapshotSettings.getCheckpointInterval(), snapshotSettings.getCheckpointTimeout(), snapshotSettings.getMinPauseBetweenCheckpoints(), snapshotSettings.getMaxConcurrentCheckpoints(), snapshotSettings.getExternalizedCheckpointSettings(), triggerVertices, ackVertices, confirmVertices, checkpointIdCounter, completedCheckpoints, externalizedCheckpointsDir, metadataBackend, checkpointStatsTracker);
    }
    return executionGraph;
}
Also used : CheckpointStatsTracker(org.apache.flink.runtime.checkpoint.CheckpointStatsTracker), JobSnapshottingSettings(org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings), IllegalConfigurationException(org.apache.flink.configuration.IllegalConfigurationException), IOException(java.io.IOException), JobSubmissionException(org.apache.flink.runtime.client.JobSubmissionException), JobException(org.apache.flink.runtime.JobException), JobExecutionException(org.apache.flink.runtime.client.JobExecutionException), DynamicCodeLoadingException(org.apache.flink.util.DynamicCodeLoadingException), StateBackend(org.apache.flink.runtime.state.StateBackend), AbstractStateBackend(org.apache.flink.runtime.state.AbstractStateBackend), JobVertex(org.apache.flink.runtime.jobgraph.JobVertex), CheckpointIDCounter(org.apache.flink.runtime.checkpoint.CheckpointIDCounter), JobID(org.apache.flink.api.common.JobID), CompletedCheckpointStore(org.apache.flink.runtime.checkpoint.CompletedCheckpointStore)
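
buildGraph repeatedly catches a low-level failure and rethrows it as a JobExecutionException that carries the job id, a descriptive message, and the original cause. A minimal, dependency-free sketch of that wrapping pattern follows. JobFailure, Initializer, and initializeTask are hypothetical names made up for this illustration; the real Flink classes have different constructors and fields.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration only; these are not the Flink classes.
final class WrapWithContext {

    /** Plays the role of an exception that records which job failed. */
    static class JobFailure extends Exception {
        private final String jobId;

        JobFailure(String jobId, String message, Throwable cause) {
            super(message, cause);
            this.jobId = jobId;
        }

        String getJobId() {
            return jobId;
        }
    }

    /** Plays the role of a vertex's master-side initialization hook. */
    interface Initializer {
        void initialize() throws Exception;
    }

    /** Runs the hook and wraps any failure with the job id, the task name, and the cause. */
    static void initializeTask(String jobId, String taskName, Initializer init) throws JobFailure {
        try {
            init.initialize();
        } catch (Throwable t) {
            throw new JobFailure(jobId,
                    "Cannot initialize task '" + taskName + "': " + t.getMessage(), t);
        }
    }

    public static void main(String[] args) {
        try {
            initializeTask("job-1", "sink", () -> {
                throw new IOException("disk full");
            });
        } catch (JobFailure e) {
            System.out.println(e.getJobId());                          // prints "job-1"
            System.out.println(e.getMessage());                        // prints "Cannot initialize task 'sink': disk full"
            System.out.println(e.getCause() instanceof IOException);   // prints "true"
        }
    }
}
```

Keeping the original throwable as the cause is what allows tests like the ones in Examples 1 to 3 to assert on e.getCause().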

Aggregations

JobExecutionException (org.apache.flink.runtime.client.JobExecutionException): 43
Test (org.junit.Test): 27
IOException (java.io.IOException): 19
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph): 17
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex): 15
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 9
ExecutionException (java.util.concurrent.ExecutionException): 8
Optional (java.util.Optional): 6
ExecutionEnvironment (org.apache.flink.api.java.ExecutionEnvironment): 6
JobID (org.apache.flink.api.common.JobID): 5
ProgramInvocationException (org.apache.flink.client.program.ProgramInvocationException): 5
NoResourceAvailableException (org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException): 5
JobException (org.apache.flink.runtime.JobException): 4
URISyntaxException (java.net.URISyntaxException): 3
HashMap (java.util.HashMap): 3
List (java.util.List): 3
JobExecutionResult (org.apache.flink.api.common.JobExecutionResult): 3
JobSubmissionResult (org.apache.flink.api.common.JobSubmissionResult): 3
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 3
JobSubmissionException (org.apache.flink.runtime.client.JobSubmissionException): 3