
Example 1 with Timeout

Use of akka.util.Timeout in project flink by apache.

In class JobClient, method attachToRunningJob.

/**
 * Attaches to a running Job using the JobID.
 * Reconstructs the user class loader by downloading the jars from the JobManager.
 */
public static JobListeningContext attachToRunningJob(JobID jobID, ActorGateway jobManagerGateWay, Configuration configuration, ActorSystem actorSystem, LeaderRetrievalService leaderRetrievalService, FiniteDuration timeout, boolean sysoutLogUpdates) {
    checkNotNull(jobID, "The jobID must not be null.");
    checkNotNull(jobManagerGateWay, "The jobManagerGateWay must not be null.");
    checkNotNull(configuration, "The configuration must not be null.");
    checkNotNull(actorSystem, "The actorSystem must not be null.");
    checkNotNull(leaderRetrievalService, "The leaderRetrievalService must not be null.");
    checkNotNull(timeout, "The timeout must not be null.");
    // we create a proxy JobClientActor that deals with all communication with
    // the JobManager. It forwards the job attachments, checks the success/failure responses, logs
    // update messages, watches for disconnect between client and JobManager, ...
    Props jobClientActorProps = JobAttachmentClientActor.createActorProps(leaderRetrievalService, timeout, sysoutLogUpdates);
    ActorRef jobClientActor = actorSystem.actorOf(jobClientActorProps);
    Future<Object> attachmentFuture = Patterns.ask(jobClientActor, new JobClientMessages.AttachToJobAndWait(jobID), new Timeout(AkkaUtils.INF_TIMEOUT()));
    return new JobListeningContext(jobID, attachmentFuture, jobClientActor, timeout, actorSystem, configuration);
}
Also used : ActorRef(akka.actor.ActorRef) JobClientMessages(org.apache.flink.runtime.messages.JobClientMessages) Timeout(akka.util.Timeout) Props(akka.actor.Props)
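In Example 1 the ask itself uses AkkaUtils.INF_TIMEOUT(), so the returned future never expires on its own; the finite timeout is instead passed to the proxy actor, which (per the code comment) watches for a disconnect between client and JobManager. The stdlib-only sketch below illustrates that division of labor under stated assumptions: a simulated job result and a simulated "lost connection" signal race to complete a deadline-free future. ProxyAskSketch, attach() and outcome() are hypothetical names, not Flink or Akka APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not part of Flink or Akka.
class ProxyAskSketch {

    // The returned future has no deadline of its own (like an ask with an
    // infinite Timeout). Two simulated events race to complete it: the job
    // result and a "lost connection" signal fired after livenessTimeoutMillis.
    static CompletableFuture<String> attach(ScheduledExecutorService scheduler,
                                            long resultDelayMillis,
                                            long livenessTimeoutMillis) {
        CompletableFuture<String> result = new CompletableFuture<>();
        scheduler.schedule(() -> result.complete("job finished"),
                resultDelayMillis, TimeUnit.MILLISECONDS);
        scheduler.schedule(() -> result.completeExceptionally(
                        new IllegalStateException("lost connection to JobManager")),
                livenessTimeoutMillis, TimeUnit.MILLISECONDS);
        // Only the first completion takes effect; later ones are ignored.
        return result;
    }

    // Blocks without a deadline and reports which event won the race.
    static String outcome(long resultDelayMillis, long livenessTimeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            return attach(scheduler, resultDelayMillis, livenessTimeoutMillis).join();
        } catch (CompletionException e) {
            return "failed: " + e.getCause().getMessage();
        } finally {
            scheduler.shutdownNow(); // cancel the event that lost the race
        }
    }
}
```

For example, outcome(20, 2000) returns "job finished", while outcome(2000, 20) returns "failed: lost connection to JobManager". Keeping the waiting side deadline-free means a long-running job is never cut off by the ask; only the liveness check can fail it.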

Example 2 with Timeout

Use of akka.util.Timeout in project flink by apache.

In class JobClient, method submitJob.

/**
 * Submits a job to a Flink cluster (non-blocking) and returns a JobListeningContext which can be
 * passed to {@code awaitJobResult} to get the result of the submission.
 * @return JobListeningContext which may be used to retrieve the JobExecutionResult via
 *         {@code awaitJobResult(JobListeningContext context)}.
 */
public static JobListeningContext submitJob(ActorSystem actorSystem, Configuration config, LeaderRetrievalService leaderRetrievalService, JobGraph jobGraph, FiniteDuration timeout, boolean sysoutLogUpdates, ClassLoader classLoader) {
    checkNotNull(actorSystem, "The actorSystem must not be null.");
    checkNotNull(leaderRetrievalService, "The leaderRetrievalService must not be null.");
    checkNotNull(jobGraph, "The jobGraph must not be null.");
    checkNotNull(timeout, "The timeout must not be null.");
    // for this job, we create a proxy JobClientActor that deals with all communication with
    // the JobManager. It forwards the job submission, checks the success/failure responses, logs
    // update messages, watches for disconnect between client and JobManager, ...
    Props jobClientActorProps = JobSubmissionClientActor.createActorProps(leaderRetrievalService, timeout, sysoutLogUpdates, config);
    ActorRef jobClientActor = actorSystem.actorOf(jobClientActorProps);
    Future<Object> submissionFuture = Patterns.ask(jobClientActor, new JobClientMessages.SubmitJobAndWait(jobGraph), new Timeout(AkkaUtils.INF_TIMEOUT()));
    return new JobListeningContext(jobGraph.getJobID(), submissionFuture, jobClientActor, timeout, classLoader);
}
Also used : ActorRef(akka.actor.ActorRef) JobClientMessages(org.apache.flink.runtime.messages.JobClientMessages) Timeout(akka.util.Timeout) Props(akka.actor.Props)
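Example 2 is non-blocking: submitJob returns a JobListeningContext right away, and the caller later passes it to awaitJobResult to block for the outcome. A minimal stdlib sketch of that submit-now, await-later split follows, assuming a trivial summation stands in for the job; SubmitSketch, ListeningContext, submit() and awaitResult() are hypothetical names, not Flink classes.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not part of Flink.
class SubmitSketch {

    // Bundles the job id with the pending result, loosely mirroring JobListeningContext.
    static final class ListeningContext {
        final UUID jobId;
        final CompletableFuture<Long> result;

        ListeningContext(UUID jobId, CompletableFuture<Long> result) {
            this.jobId = jobId;
            this.result = result;
        }
    }

    // Non-blocking: starts the "job" asynchronously and returns immediately.
    static ListeningContext submit(ExecutorService executor, long[] input) {
        UUID jobId = UUID.randomUUID();
        CompletableFuture<Long> result = CompletableFuture.supplyAsync(() -> {
            long sum = 0;
            for (long v : input) {
                sum += v;
            }
            return sum;
        }, executor);
        return new ListeningContext(jobId, result);
    }

    // Blocking: waits for the result, like awaitJobResult(context).
    static long awaitResult(ListeningContext context) {
        return context.result.join();
    }

    static long run(long[] input) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            ListeningContext ctx = submit(executor, input);
            // The caller is free to do other work here before blocking.
            return awaitResult(ctx);
        } finally {
            executor.shutdown();
        }
    }
}
```

For example, SubmitSketch.run(new long[] {1, 2, 3}) returns 6.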

Example 3 with Timeout

Use of akka.util.Timeout in project flink by apache.

In class FlinkResourceManager, method triggerConnectingToJobManager.

/**
 * Causes the resource manager to announce itself at the new leader JobManager and
 * obtains its connection information and currently known TaskManagers.
 *
 * @param leaderAddress The akka actor URL of the new leader JobManager.
 */
protected void triggerConnectingToJobManager(String leaderAddress) {
    LOG.info("Trying to associate with JobManager leader {}", leaderAddress);
    final Object registerMessage = decorateMessage(new RegisterResourceManager(self()));
    final Object retryMessage = decorateMessage(new TriggerRegistrationAtJobManager(leaderAddress));
    // send the registration message to the JobManager
    ActorSelection jobManagerSel = context().actorSelection(leaderAddress);
    Future<Object> future = Patterns.ask(jobManagerSel, registerMessage, new Timeout(messageTimeout));
    future.onComplete(new OnComplete<Object>() {

        @Override
        public void onComplete(Throwable failure, Object msg) {
            // only process if we haven't been connected in the meantime
            if (jobManager == null) {
                if (msg != null) {
                    if (msg instanceof LeaderSessionMessage && ((LeaderSessionMessage) msg).message() instanceof RegisterResourceManagerSuccessful) {
                        self().tell(msg, ActorRef.noSender());
                    } else {
                        LOG.error("Invalid response type to registration at JobManager: {}", msg);
                        self().tell(retryMessage, ActorRef.noSender());
                    }
                } else {
                    // no success
                    LOG.error("Resource manager could not register at JobManager", failure);
                    self().tell(retryMessage, ActorRef.noSender());
                }
            }
        }
    }, context().dispatcher());
}
Also used : LeaderSessionMessage(org.apache.flink.runtime.messages.JobManagerMessages.LeaderSessionMessage) TriggerRegistrationAtJobManager(org.apache.flink.runtime.clusterframework.messages.TriggerRegistrationAtJobManager) ActorSelection(akka.actor.ActorSelection) Timeout(akka.util.Timeout) RegisterResourceManagerSuccessful(org.apache.flink.runtime.clusterframework.messages.RegisterResourceManagerSuccessful) RegisterResourceManager(org.apache.flink.runtime.clusterframework.messages.RegisterResourceManager)
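Example 3 asks with a finite Timeout (messageTimeout) and reacts in an onComplete callback: a RegisterResourceManagerSuccessful reply is forwarded to self, while an unexpected reply or a failure (including an ask timeout) triggers another registration attempt. The sketch below reproduces that control flow with CompletableFuture.orTimeout (Java 9+) in place of Akka's ask. The reply strings, the maxAttempts cap (the original retries without a limit) and all class and method names are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

// Hypothetical illustration of Example 3's ask/onComplete/retry flow.
class RegistrationRetrySketch {
    static final String SUCCESS = "RegisterResourceManagerSuccessful";

    // Asks once with a per-attempt timeout. A SUCCESS reply completes the
    // result with the attempt number; an unexpected reply or a failure
    // (e.g. TimeoutException from orTimeout) triggers the next attempt.
    static CompletableFuture<Integer> register(IntFunction<CompletableFuture<String>> ask,
                                               long timeoutMillis, int attempt, int maxAttempts) {
        CompletableFuture<Integer> done = new CompletableFuture<>();
        ask.apply(attempt)
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS) // like new Timeout(messageTimeout)
            .whenComplete((msg, failure) -> {
                if (SUCCESS.equals(msg)) {
                    done.complete(attempt);
                } else if (attempt >= maxAttempts) {
                    done.completeExceptionally(new IllegalStateException(
                            "registration failed after " + attempt + " attempts"));
                } else {
                    // Retry, analogous to sending TriggerRegistrationAtJobManager to self.
                    register(ask, timeoutMillis, attempt + 1, maxAttempts)
                        .whenComplete((n, e) -> {
                            if (e != null) {
                                done.completeExceptionally(e);
                            } else {
                                done.complete(n);
                            }
                        });
                }
            });
        return done;
    }

    // Attempt 1 never answers (times out), attempt 2 answers with an
    // unexpected message, attempt 3 succeeds.
    static int demo() {
        IntFunction<CompletableFuture<String>> ask = attempt -> {
            if (attempt == 1) return new CompletableFuture<>();
            if (attempt == 2) return CompletableFuture.completedFuture("unexpected");
            return CompletableFuture.completedFuture(SUCCESS);
        };
        return register(ask, 50, 1, 5).join();
    }

    // Every reply is unexpected, so the sketch gives up after maxAttempts.
    static boolean givesUp() {
        try {
            register(a -> CompletableFuture.completedFuture("unexpected"), 50, 1, 2).join();
            return false;
        } catch (CompletionException e) {
            return true;
        }
    }
}
```

Here demo() returns 3 and givesUp() returns true. As in the original, a timeout and a wrong reply are handled the same way: both lead to another attempt.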

Example 4 with Timeout

Use of akka.util.Timeout in project flink by apache.

In class AbstractTaskManagerProcessFailureRecoveryTest, method waitUntilNumTaskManagersAreRegistered.

protected void waitUntilNumTaskManagersAreRegistered(ActorRef jobManager, int numExpected, long maxDelayMillis) throws Exception {
    // 10 ms = 10,000,000 nanos
    final long pollInterval = 10_000_000;
    final long deadline = System.nanoTime() + maxDelayMillis * 1_000_000;
    long time;
    while ((time = System.nanoTime()) < deadline) {
        FiniteDuration timeout = new FiniteDuration(pollInterval, TimeUnit.NANOSECONDS);
        try {
            Future<?> result = Patterns.ask(jobManager, JobManagerMessages.getRequestNumberRegisteredTaskManager(), new Timeout(timeout));
            int numTMs = (Integer) Await.result(result, timeout);
            if (numTMs == numExpected) {
                return;
            }
        } catch (TimeoutException e) {
            // ignore and retry
        } catch (ClassCastException e) {
            fail("Wrong response: " + e.getMessage());
        }
        long timePassed = System.nanoTime() - time;
        long remainingMillis = (pollInterval - timePassed) / 1_000_000;
        if (remainingMillis > 0) {
            Thread.sleep(remainingMillis);
        }
    }
    fail("The TaskManagers did not register within the expected time (" + maxDelayMillis + " ms)");
}
Also used : Timeout(akka.util.Timeout) FiniteDuration(scala.concurrent.duration.FiniteDuration) TimeoutException(java.util.concurrent.TimeoutException)
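Example 4 polls with a per-request timeout equal to the poll interval (10 ms, i.e. 10,000,000 ns), gives up at an overall deadline computed from System.nanoTime(), and after each attempt sleeps for whatever is left of the interval. The generic stdlib sketch below follows the same loop, assuming the JobManager query is replaced by an IntSupplier; PollUntilSketch and waitUntil() are hypothetical names. One deliberate difference: it compares nanoTime values by subtraction, which the System.nanoTime documentation recommends because it stays correct across numerical overflow, whereas the original uses a direct < comparison.

```java
import java.util.function.IntSupplier;

// Hypothetical generic version of Example 4's polling loop.
class PollUntilSketch {

    // Polls query every pollIntervalMillis until it returns expected or
    // maxDelayMillis have elapsed. Returns true on success, false otherwise.
    static boolean waitUntil(IntSupplier query, int expected,
                             long maxDelayMillis, long pollIntervalMillis) {
        final long pollIntervalNanos = pollIntervalMillis * 1_000_000L; // 10 ms -> 10,000,000 ns
        final long deadline = System.nanoTime() + maxDelayMillis * 1_000_000L;
        long start;
        // Overflow-safe form of start < deadline.
        while ((start = System.nanoTime()) - deadline < 0) {
            if (query.getAsInt() == expected) {
                return true;
            }
            // Sleep for whatever is left of this poll interval (whole milliseconds).
            long remainingMillis = (pollIntervalNanos - (System.nanoTime() - start)) / 1_000_000L;
            if (remainingMillis > 0) {
                try {
                    Thread.sleep(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }
}
```

Returning a boolean instead of calling fail() keeps the helper usable outside JUnit; a test would simply assert on the result.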

Example 5 with Timeout

Use of akka.util.Timeout in project flink by apache.

In class AccumulatorLiveITCase, method notifyTaskManagerOfAccumulatorUpdate.

/**
 * Notifies the task manager of an accumulator update and waits until the heartbeat
 * containing the message has been reported.
 */
public static void notifyTaskManagerOfAccumulatorUpdate() {
    new JavaTestKit(system) {

        {
            Timeout timeout = new Timeout(TIMEOUT);
            Future<Object> ask = Patterns.ask(taskManager, new TestingTaskManagerMessages.AccumulatorsChanged(jobID), timeout);
            try {
                Await.result(ask, timeout.duration());
            } catch (Exception e) {
                fail("Failed to notify task manager of accumulator update.");
            }
        }
    };
}
Also used : Timeout(akka.util.Timeout) JavaTestKit(akka.testkit.JavaTestKit) TestingTaskManagerMessages(org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages) IOException(java.io.IOException)
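Example 5 blocks on the ask with Await.result(ask, timeout.duration()) and treats any exception, a timeout included, as failure. The JDK counterpart of that bounded wait is Future.get(long, TimeUnit). The sketch below assumes a CompletableFuture stands in for the Akka future; AwaitWithTimeoutSketch and acknowledged() are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical JDK analogue of Await.result(ask, timeout.duration()).
class AwaitWithTimeoutSketch {

    // True if reply completes normally within timeoutMillis; false if it
    // times out, fails, or the waiting thread is interrupted.
    static boolean acknowledged(CompletableFuture<?> reply, long timeoutMillis) {
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // no reply in time, or the reply itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

Unlike the original's single catch (Exception e), this separates interruption so the thread's interrupt status is not silently lost.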

Aggregations

Timeout (akka.util.Timeout): 18
ActorRef (akka.actor.ActorRef): 11
FiniteDuration (scala.concurrent.duration.FiniteDuration): 11
Props (akka.actor.Props): 8
TestingLeaderRetrievalService (org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService): 7
JobClientMessages (org.apache.flink.runtime.messages.JobClientMessages): 7
Test (org.junit.Test): 7
UUID (java.util.UUID): 5
IOException (java.io.IOException): 4
ProgramInvocationException (org.apache.flink.client.program.ProgramInvocationException): 4
TimeoutException (java.util.concurrent.TimeoutException): 3
AttachToJobAndWait (org.apache.flink.runtime.messages.JobClientMessages.AttachToJobAndWait): 3
ArrayList (java.util.ArrayList): 2
JobStatusMessage (org.apache.flink.runtime.client.JobStatusMessage): 2
YarnException (org.apache.hadoop.yarn.exceptions.YarnException): 2
ActorSelection (akka.actor.ActorSelection): 1
JavaTestKit (akka.testkit.JavaTestKit): 1
File (java.io.File): 1
JobID (org.apache.flink.api.common.JobID): 1
Configuration (org.apache.flink.configuration.Configuration): 1