
Example 56 with FiniteDuration

Use of scala.concurrent.duration.FiniteDuration in the Apache Flink project.

From class TaskManagerTest, method testLocalPartitionNotFound.

/**
	 * Tests that repeated local {@link PartitionNotFoundException}s ultimately fail the receiver.
	 */
@Test
public void testLocalPartitionNotFound() throws Exception {
    new JavaTestKit(system) {

        {
            ActorGateway jobManager = null;
            ActorGateway taskManager = null;
            final ActorGateway testActorGateway = new AkkaActorGateway(getTestActor(), leaderSessionID);
            try {
                final IntermediateDataSetID resultId = new IntermediateDataSetID();
                // Create the JM
                ActorRef jm = system.actorOf(Props.create(new SimplePartitionStateLookupJobManagerCreator(leaderSessionID, getTestActor())));
                jobManager = new AkkaActorGateway(jm, leaderSessionID);
                final Configuration config = new Configuration();
                config.setInteger(TaskManagerOptions.NETWORK_REQUEST_BACKOFF_INITIAL, 100);
                config.setInteger(TaskManagerOptions.NETWORK_REQUEST_BACKOFF_MAX, 200);
                taskManager = TestingUtils.createTaskManager(system, jobManager, config, true, true);
                // ---------------------------------------------------------------------------------
                final ActorGateway tm = taskManager;
                final JobID jid = new JobID();
                final JobVertexID vid = new JobVertexID();
                final ExecutionAttemptID eid = new ExecutionAttemptID();
                final ResultPartitionID partitionId = new ResultPartitionID();
                // Local location (on the same TM though) for the partition
                final ResultPartitionLocation loc = ResultPartitionLocation.createLocal();
                final InputChannelDeploymentDescriptor[] icdd = new InputChannelDeploymentDescriptor[] { new InputChannelDeploymentDescriptor(partitionId, loc) };
                final InputGateDeploymentDescriptor igdd = new InputGateDeploymentDescriptor(resultId, ResultPartitionType.PIPELINED, 0, icdd);
                final TaskDeploymentDescriptor tdd = createTaskDeploymentDescriptor(jid, "TestJob", vid, eid, new SerializedValue<>(new ExecutionConfig()), "Receiver", 1, 0, 1, 0, new Configuration(), new Configuration(), Tasks.AgnosticReceiver.class.getName(), Collections.<ResultPartitionDeploymentDescriptor>emptyList(), Collections.singletonList(igdd), Collections.<BlobKey>emptyList(), Collections.<URL>emptyList(), 0);
                new Within(new FiniteDuration(120, TimeUnit.SECONDS)) {

                    @Override
                    protected void run() {
                        // Submit the task
                        tm.tell(new SubmitTask(tdd), testActorGateway);
                        expectMsgClass(Acknowledge.get().getClass());
                        // Wait to be notified about the final execution state by the mock JM
                        TaskExecutionState msg = expectMsgClass(TaskExecutionState.class);
                        // The task should fail after repeated requests
                        assertEquals(ExecutionState.FAILED, msg.getExecutionState());
                        Throwable error = msg.getError(getClass().getClassLoader());
                        if (error.getClass() != PartitionNotFoundException.class) {
                            error.printStackTrace();
                            fail("Wrong exception: " + error.getMessage());
                        }
                    }
                };
            } catch (Exception e) {
                e.printStackTrace();
                fail(e.getMessage());
            } finally {
                TestingUtils.stopActor(taskManager);
                TestingUtils.stopActor(jobManager);
            }
        }
    };
}
Also used: AkkaActorGateway (org.apache.flink.runtime.instance.AkkaActorGateway), TaskManagerServicesConfiguration (org.apache.flink.runtime.taskexecutor.TaskManagerServicesConfiguration), Configuration (org.apache.flink.configuration.Configuration), ActorRef (akka.actor.ActorRef), JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID), ExecutionConfig (org.apache.flink.api.common.ExecutionConfig), ResultPartitionLocation (org.apache.flink.runtime.deployment.ResultPartitionLocation), ActorGateway (org.apache.flink.runtime.instance.ActorGateway), ResultPartitionID (org.apache.flink.runtime.io.network.partition.ResultPartitionID), IntermediateResultPartitionID (org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID), TaskDeploymentDescriptor (org.apache.flink.runtime.deployment.TaskDeploymentDescriptor), SubmitTask (org.apache.flink.runtime.messages.TaskMessages.SubmitTask), ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID), FiniteDuration (scala.concurrent.duration.FiniteDuration), InputGateDeploymentDescriptor (org.apache.flink.runtime.deployment.InputGateDeploymentDescriptor), PartitionNotFoundException (org.apache.flink.runtime.io.network.partition.PartitionNotFoundException), IOException (java.io.IOException), InputChannelDeploymentDescriptor (org.apache.flink.runtime.deployment.InputChannelDeploymentDescriptor), IntermediateDataSetID (org.apache.flink.runtime.jobgraph.IntermediateDataSetID), JavaTestKit (akka.testkit.JavaTestKit), JobID (org.apache.flink.api.common.JobID), Test (org.junit.Test)
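
The FiniteDuration here bounds the JavaTestKit Within block: every expectMsgClass call inside run() must complete before the 120-second window closes. A minimal, self-contained sketch of that pattern, with a hypothetical actor system name and a self-send instead of Flink actors:

import java.util.concurrent.TimeUnit;

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.testkit.JavaTestKit;
import scala.concurrent.duration.FiniteDuration;

public class WithinSketch {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("sketch");
        new JavaTestKit(system) {{
            // all expect* calls inside run() share this 5-second budget
            new Within(new FiniteDuration(5, TimeUnit.SECONDS)) {
                @Override
                protected void run() {
                    // send a message to the test actor and receive it back
                    getRef().tell("ping", ActorRef.noSender());
                    expectMsgEquals("ping");
                }
            };
        }};
        JavaTestKit.shutdownActorSystem(system);
    }
}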

Example 57 with FiniteDuration

Use of scala.concurrent.duration.FiniteDuration in the Apache Flink project.

From class TaskTest, method createTask.

private Task createTask(Class<? extends AbstractInvokable> invokable, LibraryCacheManager libCache, NetworkEnvironment networkEnvironment, ResultPartitionConsumableNotifier consumableNotifier, PartitionProducerStateChecker partitionProducerStateChecker, Executor executor, Configuration taskManagerConfig, ExecutionConfig execConfig) throws IOException {
    JobID jobId = new JobID();
    JobVertexID jobVertexId = new JobVertexID();
    ExecutionAttemptID executionAttemptId = new ExecutionAttemptID();
    InputSplitProvider inputSplitProvider = new TaskInputSplitProvider(jobManagerGateway, jobId, jobVertexId, executionAttemptId, new FiniteDuration(60, TimeUnit.SECONDS));
    CheckpointResponder checkpointResponder = new ActorGatewayCheckpointResponder(jobManagerGateway);
    SerializedValue<ExecutionConfig> serializedExecutionConfig = new SerializedValue<>(execConfig);
    JobInformation jobInformation = new JobInformation(jobId, "Test Job", serializedExecutionConfig, new Configuration(), Collections.<BlobKey>emptyList(), Collections.<URL>emptyList());
    TaskInformation taskInformation = new TaskInformation(jobVertexId, "Test Task", 1, 1, invokable.getName(), new Configuration());
    return new Task(jobInformation, taskInformation, executionAttemptId, new AllocationID(), 0, 0, Collections.<ResultPartitionDeploymentDescriptor>emptyList(), Collections.<InputGateDeploymentDescriptor>emptyList(), 0, null, mock(MemoryManager.class), mock(IOManager.class), networkEnvironment, mock(BroadcastVariableManager.class), taskManagerConnection, inputSplitProvider, checkpointResponder, libCache, mock(FileCache.class), new TestingTaskManagerRuntimeInfo(taskManagerConfig), mock(TaskMetricGroup.class), consumableNotifier, partitionProducerStateChecker, executor);
}
Also used: JobInformation (org.apache.flink.runtime.executiongraph.JobInformation), ExecutionAttemptID (org.apache.flink.runtime.executiongraph.ExecutionAttemptID), TaskInformation (org.apache.flink.runtime.executiongraph.TaskInformation), Configuration (org.apache.flink.configuration.Configuration), IOManager (org.apache.flink.runtime.io.disk.iomanager.IOManager), TaskMetricGroup (org.apache.flink.runtime.metrics.groups.TaskMetricGroup), JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID), AllocationID (org.apache.flink.runtime.clusterframework.types.AllocationID), FiniteDuration (scala.concurrent.duration.FiniteDuration), ExecutionConfig (org.apache.flink.api.common.ExecutionConfig), SerializedValue (org.apache.flink.util.SerializedValue), MemoryManager (org.apache.flink.runtime.memory.MemoryManager), FileCache (org.apache.flink.runtime.filecache.FileCache), TestingTaskManagerRuntimeInfo (org.apache.flink.runtime.util.TestingTaskManagerRuntimeInfo), BroadcastVariableManager (org.apache.flink.runtime.broadcast.BroadcastVariableManager), InputSplitProvider (org.apache.flink.runtime.jobgraph.tasks.InputSplitProvider), JobID (org.apache.flink.api.common.JobID)
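
The 60-second FiniteDuration above is the ask timeout the TaskInputSplitProvider uses when querying the JobManager gateway. A small sketch of constructing equivalent durations from Java; the string-parsing form is a Scala standard library feature, shown here as an assumption rather than something this test relies on:

import java.util.concurrent.TimeUnit;

import scala.concurrent.duration.Duration;
import scala.concurrent.duration.FiniteDuration;

public class DurationSketch {
    public static void main(String[] args) {
        FiniteDuration a = new FiniteDuration(60, TimeUnit.SECONDS);
        // Duration.create parses human-readable strings; "60 seconds" yields a FiniteDuration
        FiniteDuration b = (FiniteDuration) Duration.create("60 seconds");
        System.out.println(a.equals(b));  // true: equality compares the normalized length
        System.out.println(a.toMillis()); // 60000
    }
}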

Example 58 with FiniteDuration

Use of scala.concurrent.duration.FiniteDuration in the Apache Flink project.

From class StreamTaskTest, method testEarlyCanceling.

/**
	 * This test checks that cancel calls that are issued before the operator is
	 * instantiated still lead to proper canceling.
	 */
@Test
public void testEarlyCanceling() throws Exception {
    Deadline deadline = new FiniteDuration(2, TimeUnit.MINUTES).fromNow();
    StreamConfig cfg = new StreamConfig(new Configuration());
    cfg.setStreamOperator(new SlowlyDeserializingOperator());
    cfg.setTimeCharacteristic(TimeCharacteristic.ProcessingTime);
    Task task = createTask(SourceStreamTask.class, cfg, new Configuration());
    TestingExecutionStateListener testingExecutionStateListener = new TestingExecutionStateListener();
    task.registerExecutionListener(testingExecutionStateListener);
    task.startTaskThread();
    Future<ExecutionState> running = testingExecutionStateListener.notifyWhenExecutionState(ExecutionState.RUNNING);
    // wait until the task thread has reached state RUNNING
    ExecutionState executionState = Await.result(running, deadline.timeLeft());
    // make sure the task is really running
    if (executionState != ExecutionState.RUNNING) {
        fail("Task entered state " + task.getExecutionState() + " with error " + ExceptionUtils.stringifyException(task.getFailureCause()));
    }
    // send a cancel. because the operator takes a long time to deserialize, this should
    // hit the task before the operator is deserialized
    task.cancelExecution();
    Future<ExecutionState> canceling = testingExecutionStateListener.notifyWhenExecutionState(ExecutionState.CANCELING);
    executionState = Await.result(canceling, deadline.timeLeft());
    // the task should reach state canceled eventually
    assertTrue(executionState == ExecutionState.CANCELING || executionState == ExecutionState.CANCELED);
    task.getExecutingThread().join(deadline.timeLeft().toMillis());
    assertFalse("Task did not cancel", task.getExecutingThread().isAlive());
    assertEquals(ExecutionState.CANCELED, task.getExecutionState());
}
Also used: ExecutionState (org.apache.flink.runtime.execution.ExecutionState), TaskExecutionState (org.apache.flink.runtime.taskmanager.TaskExecutionState), Task (org.apache.flink.runtime.taskmanager.Task), Configuration (org.apache.flink.configuration.Configuration), Deadline (scala.concurrent.duration.Deadline), FiniteDuration (scala.concurrent.duration.FiniteDuration), StreamConfig (org.apache.flink.streaming.api.graph.StreamConfig), Test (org.junit.Test), PrepareForTest (org.powermock.core.classloader.annotations.PrepareForTest)
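
The Deadline created via FiniteDuration.fromNow() gives the test a single two-minute budget: each Await.result call consumes part of it through deadline.timeLeft(). A minimal sketch of that pattern in isolation; the polling loop is illustrative, not taken from the Flink test:

import java.util.concurrent.TimeUnit;

import scala.concurrent.duration.Deadline;
import scala.concurrent.duration.FiniteDuration;

public class DeadlineSketch {
    public static void main(String[] args) throws InterruptedException {
        // one overall budget; timeLeft() shrinks as time passes
        Deadline deadline = new FiniteDuration(2, TimeUnit.SECONDS).fromNow();
        while (deadline.hasTimeLeft()) {
            // never wait longer than the remaining budget
            Thread.sleep(Math.min(500, deadline.timeLeft().toMillis()));
        }
        System.out.println(deadline.isOverdue()); // true once the budget is spent
    }
}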

Example 59 with FiniteDuration

Use of scala.concurrent.duration.FiniteDuration in the Apache Flink project.

From class RescalingITCase, method testSavepointRescalingKeyedState.

/**
	 * Tests that a job with purely keyed state can be restarted from a savepoint
	 * with a different parallelism.
	 */
public void testSavepointRescalingKeyedState(boolean scaleOut, boolean deriveMaxParallelism) throws Exception {
    final int numberKeys = 42;
    final int numberElements = 1000;
    final int numberElements2 = 500;
    final int parallelism = scaleOut ? numSlots / 2 : numSlots;
    final int parallelism2 = scaleOut ? numSlots : numSlots / 2;
    final int maxParallelism = 13;
    FiniteDuration timeout = new FiniteDuration(3, TimeUnit.MINUTES);
    Deadline deadline = timeout.fromNow();
    ActorGateway jobManager = null;
    JobID jobID = null;
    try {
        jobManager = cluster.getLeaderGateway(deadline.timeLeft());
        JobGraph jobGraph = createJobGraphWithKeyedState(parallelism, maxParallelism, numberKeys, numberElements, false, 100);
        jobID = jobGraph.getJobID();
        cluster.submitJobDetached(jobGraph);
        // wait until the sources have emitted numberElements for each key and completed a checkpoint
        SubtaskIndexFlatMapper.workCompletedLatch.await(deadline.timeLeft().toMillis(), TimeUnit.MILLISECONDS);
        // verify the current state
        Set<Tuple2<Integer, Integer>> actualResult = CollectionSink.getElementsSet();
        Set<Tuple2<Integer, Integer>> expectedResult = new HashSet<>();
        for (int key = 0; key < numberKeys; key++) {
            int keyGroupIndex = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
            expectedResult.add(Tuple2.of(KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(maxParallelism, parallelism, keyGroupIndex), numberElements * key));
        }
        assertEquals(expectedResult, actualResult);
        // clear the CollectionSink set for the restarted job
        CollectionSink.clearElementsSet();
        Future<Object> savepointPathFuture = jobManager.ask(new JobManagerMessages.TriggerSavepoint(jobID, Option.<String>empty()), deadline.timeLeft());
        final String savepointPath = ((JobManagerMessages.TriggerSavepointSuccess) Await.result(savepointPathFuture, deadline.timeLeft())).savepointPath();
        Future<Object> jobRemovedFuture = jobManager.ask(new TestingJobManagerMessages.NotifyWhenJobRemoved(jobID), deadline.timeLeft());
        Future<Object> cancellationResponseFuture = jobManager.ask(new JobManagerMessages.CancelJob(jobID), deadline.timeLeft());
        Object cancellationResponse = Await.result(cancellationResponseFuture, deadline.timeLeft());
        assertTrue(cancellationResponse instanceof JobManagerMessages.CancellationSuccess);
        Await.ready(jobRemovedFuture, deadline.timeLeft());
        jobID = null;
        int restoreMaxParallelism = deriveMaxParallelism ? ExecutionJobVertex.VALUE_NOT_SET : maxParallelism;
        JobGraph scaledJobGraph = createJobGraphWithKeyedState(parallelism2, restoreMaxParallelism, numberKeys, numberElements2, true, 100);
        scaledJobGraph.setSavepointRestoreSettings(SavepointRestoreSettings.forPath(savepointPath));
        jobID = scaledJobGraph.getJobID();
        cluster.submitJobAndWait(scaledJobGraph, false);
        jobID = null;
        Set<Tuple2<Integer, Integer>> actualResult2 = CollectionSink.getElementsSet();
        Set<Tuple2<Integer, Integer>> expectedResult2 = new HashSet<>();
        for (int key = 0; key < numberKeys; key++) {
            int keyGroupIndex = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
            expectedResult2.add(Tuple2.of(KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(maxParallelism, parallelism2, keyGroupIndex), key * (numberElements + numberElements2)));
        }
        assertEquals(expectedResult2, actualResult2);
    } finally {
        // clear the CollectionSink set for the restarted job
        CollectionSink.clearElementsSet();
        // clear any left overs from a possibly failed job
        if (jobID != null && jobManager != null) {
            Future<Object> jobRemovedFuture = jobManager.ask(new TestingJobManagerMessages.NotifyWhenJobRemoved(jobID), timeout);
            try {
                Await.ready(jobRemovedFuture, timeout);
            } catch (TimeoutException | InterruptedException ie) {
                fail("Failed while cleaning up the cluster.");
            }
        }
    }
}
Also used: Deadline (scala.concurrent.duration.Deadline), JobManagerMessages (org.apache.flink.runtime.messages.JobManagerMessages), TestingJobManagerMessages (org.apache.flink.runtime.testingUtils.TestingJobManagerMessages), FiniteDuration (scala.concurrent.duration.FiniteDuration), JobGraph (org.apache.flink.runtime.jobgraph.JobGraph), Tuple2 (org.apache.flink.api.java.tuple.Tuple2), ActorGateway (org.apache.flink.runtime.instance.ActorGateway), JobID (org.apache.flink.api.common.JobID), HashSet (java.util.HashSet), TimeoutException (java.util.concurrent.TimeoutException)
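
Every blocking step above goes through Await.result or Await.ready with a FiniteDuration (or the deadline's remainder), and both throw TimeoutException when the duration elapses. A self-contained sketch of that call shape, using an already-completed future from akka.dispatch.Futures so it returns immediately; illustrative only:

import java.util.concurrent.TimeUnit;

import akka.dispatch.Futures;
import scala.concurrent.Await;
import scala.concurrent.Future;
import scala.concurrent.duration.FiniteDuration;

public class AwaitSketch {
    public static void main(String[] args) throws Exception {
        Future<String> future = Futures.successful("done");
        // throws java.util.concurrent.TimeoutException if the future
        // does not complete within the FiniteDuration
        String result = Await.result(future, new FiniteDuration(3, TimeUnit.SECONDS));
        System.out.println(result); // done
    }
}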

Example 60 with FiniteDuration

Use of scala.concurrent.duration.FiniteDuration in the Apache Flink project.

From class UtilsTest, method testYarnFlinkResourceManagerJobManagerLostLeadership.

@Test
public void testYarnFlinkResourceManagerJobManagerLostLeadership() throws Exception {
    new JavaTestKit(system) {

        {
            final Deadline deadline = new FiniteDuration(3, TimeUnit.MINUTES).fromNow();
            Configuration flinkConfig = new Configuration();
            YarnConfiguration yarnConfig = new YarnConfiguration();
            TestingLeaderRetrievalService leaderRetrievalService = new TestingLeaderRetrievalService();
            String applicationMasterHostName = "localhost";
            String webInterfaceURL = "foobar";
            ContaineredTaskManagerParameters taskManagerParameters = new ContaineredTaskManagerParameters(1L, 1L, 1L, 1, new HashMap<String, String>());
            ContainerLaunchContext taskManagerLaunchContext = mock(ContainerLaunchContext.class);
            int yarnHeartbeatIntervalMillis = 1000;
            int maxFailedContainers = 10;
            int numInitialTaskManagers = 5;
            final YarnResourceManagerCallbackHandler callbackHandler = new YarnResourceManagerCallbackHandler();
            AMRMClientAsync<AMRMClient.ContainerRequest> resourceManagerClient = mock(AMRMClientAsync.class);
            NMClient nodeManagerClient = mock(NMClient.class);
            UUID leaderSessionID = UUID.randomUUID();
            final List<Container> containerList = new ArrayList<>();
            for (int i = 0; i < numInitialTaskManagers; i++) {
                containerList.add(new TestingContainer("container_" + i, "localhost"));
            }
            doAnswer(new Answer() {

                int counter = 0;

                @Override
                public Object answer(InvocationOnMock invocation) throws Throwable {
                    if (counter < containerList.size()) {
                        callbackHandler.onContainersAllocated(Collections.singletonList(containerList.get(counter++)));
                    }
                    return null;
                }
            }).when(resourceManagerClient).addContainerRequest(Matchers.any(AMRMClient.ContainerRequest.class));
            ActorRef resourceManager = null;
            ActorRef leader1;
            try {
                leader1 = system.actorOf(Props.create(TestingUtils.ForwardingActor.class, getRef(), Option.apply(leaderSessionID)));
                resourceManager = system.actorOf(Props.create(TestingYarnFlinkResourceManager.class, flinkConfig, yarnConfig, leaderRetrievalService, applicationMasterHostName, webInterfaceURL, taskManagerParameters, taskManagerLaunchContext, yarnHeartbeatIntervalMillis, maxFailedContainers, numInitialTaskManagers, callbackHandler, resourceManagerClient, nodeManagerClient));
                leaderRetrievalService.notifyListener(leader1.path().toString(), leaderSessionID);
                final AkkaActorGateway leader1Gateway = new AkkaActorGateway(leader1, leaderSessionID);
                final AkkaActorGateway resourceManagerGateway = new AkkaActorGateway(resourceManager, leaderSessionID);
                doAnswer(new Answer() {

                    @Override
                    public Object answer(InvocationOnMock invocation) throws Throwable {
                        Container container = (Container) invocation.getArguments()[0];
                        resourceManagerGateway.tell(new NotifyResourceStarted(YarnFlinkResourceManager.extractResourceID(container)), leader1Gateway);
                        return null;
                    }
                }).when(nodeManagerClient).startContainer(Matchers.any(Container.class), Matchers.any(ContainerLaunchContext.class));
                expectMsgClass(deadline.timeLeft(), RegisterResourceManager.class);
                resourceManagerGateway.tell(new RegisterResourceManagerSuccessful(leader1, Collections.EMPTY_LIST));
                for (int i = 0; i < containerList.size(); i++) {
                    expectMsgClass(deadline.timeLeft(), Acknowledge.class);
                }
                Future<Object> taskManagerRegisteredFuture = resourceManagerGateway.ask(new NotifyWhenResourcesRegistered(numInitialTaskManagers), deadline.timeLeft());
                Await.ready(taskManagerRegisteredFuture, deadline.timeLeft());
                leaderRetrievalService.notifyListener(null, null);
                leaderRetrievalService.notifyListener(leader1.path().toString(), leaderSessionID);
                expectMsgClass(deadline.timeLeft(), RegisterResourceManager.class);
                resourceManagerGateway.tell(new RegisterResourceManagerSuccessful(leader1, Collections.EMPTY_LIST));
                for (Container container : containerList) {
                    resourceManagerGateway.tell(new NotifyResourceStarted(YarnFlinkResourceManager.extractResourceID(container)), leader1Gateway);
                }
                for (int i = 0; i < containerList.size(); i++) {
                    expectMsgClass(deadline.timeLeft(), Acknowledge.class);
                }
                Future<Object> numberOfRegisteredResourcesFuture = resourceManagerGateway.ask(RequestNumberOfRegisteredResources.Instance, deadline.timeLeft());
                int numberOfRegisteredResources = (Integer) Await.result(numberOfRegisteredResourcesFuture, deadline.timeLeft());
                assertEquals(numInitialTaskManagers, numberOfRegisteredResources);
            } finally {
                if (resourceManager != null) {
                    resourceManager.tell(PoisonPill.getInstance(), ActorRef.noSender());
                }
            }
        }
    };
}
Also used: AkkaActorGateway (org.apache.flink.runtime.instance.AkkaActorGateway), YarnConfiguration (org.apache.hadoop.yarn.conf.YarnConfiguration), Configuration (org.apache.flink.configuration.Configuration), TestingLeaderRetrievalService (org.apache.flink.runtime.leaderelection.TestingLeaderRetrievalService), ActorRef (akka.actor.ActorRef), ArrayList (java.util.ArrayList), ContaineredTaskManagerParameters (org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters), Container (org.apache.hadoop.yarn.api.records.Container), TestingUtils (org.apache.flink.runtime.testingUtils.TestingUtils), RegisterResourceManagerSuccessful (org.apache.flink.runtime.clusterframework.messages.RegisterResourceManagerSuccessful), UUID (java.util.UUID), Deadline (scala.concurrent.duration.Deadline), FiniteDuration (scala.concurrent.duration.FiniteDuration), ContainerLaunchContext (org.apache.hadoop.yarn.api.records.ContainerLaunchContext), NotifyResourceStarted (org.apache.flink.runtime.clusterframework.messages.NotifyResourceStarted), Mockito.doAnswer (org.mockito.Mockito.doAnswer), Answer (org.mockito.stubbing.Answer), InvocationOnMock (org.mockito.invocation.InvocationOnMock), NMClient (org.apache.hadoop.yarn.client.api.NMClient), JavaTestKit (akka.testkit.JavaTestKit), NotifyWhenResourcesRegistered (org.apache.flink.yarn.messages.NotifyWhenResourcesRegistered), Test (org.junit.Test)
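
The deadline.timeLeft() values above feed JavaTestKit's expectMsgClass and the gateway's ask calls. When a FiniteDuration has to cross into Akka's ask pattern it is typically wrapped in an akka.util.Timeout, which also appears in the aggregation list below. A minimal sketch of that conversion; assumed usage, not taken from this test:

import java.util.concurrent.TimeUnit;

import akka.util.Timeout;
import scala.concurrent.duration.FiniteDuration;

public class TimeoutSketch {
    public static void main(String[] args) {
        FiniteDuration d = new FiniteDuration(10, TimeUnit.SECONDS);
        Timeout t = new Timeout(d);        // wraps the duration for Patterns.ask
        System.out.println(t.duration());  // 10 seconds
    }
}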

Aggregations

FiniteDuration (scala.concurrent.duration.FiniteDuration): 77 usages
Test (org.junit.Test): 61
Configuration (org.apache.flink.configuration.Configuration): 37
ActorGateway (org.apache.flink.runtime.instance.ActorGateway): 30
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph): 27
ActorRef (akka.actor.ActorRef): 25
Deadline (scala.concurrent.duration.Deadline): 24
JobID (org.apache.flink.api.common.JobID): 19
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex): 19
JobManagerMessages (org.apache.flink.runtime.messages.JobManagerMessages): 17
TestingJobManagerMessages (org.apache.flink.runtime.testingUtils.TestingJobManagerMessages): 13
ActorSystem (akka.actor.ActorSystem): 12
JavaTestKit (akka.testkit.JavaTestKit): 11
Timeout (akka.util.Timeout): 11
File (java.io.File): 11
TimeoutException (java.util.concurrent.TimeoutException): 11
AkkaActorGateway (org.apache.flink.runtime.instance.AkkaActorGateway): 11
JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID): 11
Props (akka.actor.Props): 10
IOException (java.io.IOException): 10