
Example 66 with ResourceID

use of org.apache.flink.runtime.clusterframework.types.ResourceID in project flink by apache.

the class SlotProtocolTest method testSlotsUnavailableRequest.

/**
 * Tests whether
 * 1) SlotRequest is routed to the SlotManager
 * 2) SlotRequest is confirmed
 * 3) SlotRequest leads to a container allocation
 * 4) Slot becomes available and TaskExecutor gets a SlotRequest
 */
@Test
public void testSlotsUnavailableRequest() throws Exception {
    final String rmAddress = "/rm1";
    final String jmAddress = "/jm1";
    final JobID jobID = new JobID();
    testRpcService.registerGateway(jmAddress, mock(JobMasterGateway.class));
    final TestingHighAvailabilityServices testingHaServices = new TestingHighAvailabilityServices();
    final UUID rmLeaderID = UUID.randomUUID();
    final UUID jmLeaderID = UUID.randomUUID();
    TestingLeaderElectionService rmLeaderElectionService = configureHA(testingHaServices, jobID, rmAddress, rmLeaderID, jmAddress, jmLeaderID);
    ResourceManagerConfiguration resourceManagerConfiguration = new ResourceManagerConfiguration(Time.seconds(5L), Time.seconds(5L), Time.minutes(5L));
    JobLeaderIdService jobLeaderIdService = new JobLeaderIdService(testingHaServices, testRpcService.getScheduledExecutor(), resourceManagerConfiguration.getJobTimeout());
    final TestingSlotManagerFactory slotManagerFactory = new TestingSlotManagerFactory();
    SpiedResourceManager resourceManager = new SpiedResourceManager(testRpcService, resourceManagerConfiguration, testingHaServices, slotManagerFactory, mock(MetricRegistry.class), jobLeaderIdService, mock(FatalErrorHandler.class));
    resourceManager.start();
    rmLeaderElectionService.isLeader(rmLeaderID);
    Future<RegistrationResponse> registrationFuture = resourceManager.registerJobManager(rmLeaderID, jmLeaderID, jmAddress, jobID);
    try {
        registrationFuture.get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
        Assert.fail("JobManager registration Future didn't become ready.");
    }
    final SlotManager slotManager = slotManagerFactory.slotManager;
    final AllocationID allocationID = new AllocationID();
    final ResourceProfile resourceProfile = new ResourceProfile(1.0, 100);
    SlotRequest slotRequest = new SlotRequest(jobID, allocationID, resourceProfile);
    RMSlotRequestReply slotRequestReply = resourceManager.requestSlot(jmLeaderID, rmLeaderID, slotRequest);
    // 1) SlotRequest is routed to the SlotManager
    verify(slotManager).requestSlot(slotRequest);
    // 2) SlotRequest is confirmed
    // JUnit expects (expected, actual), so the known allocation ID goes first
    Assert.assertEquals(allocationID, slotRequestReply.getAllocationID());
    // 3) SlotRequest leads to a container allocation
    Assert.assertEquals(1, resourceManager.startNewWorkerCalled);
    Assert.assertFalse(slotManager.isAllocated(allocationID));
    // slot becomes available
    final String tmAddress = "/tm1";
    TaskExecutorGateway taskExecutorGateway = mock(TaskExecutorGateway.class);
    Mockito.when(taskExecutorGateway.requestSlot(any(SlotID.class), any(JobID.class), any(AllocationID.class), any(String.class), any(UUID.class), any(Time.class))).thenReturn(new FlinkCompletableFuture<TMSlotRequestReply>());
    testRpcService.registerGateway(tmAddress, taskExecutorGateway);
    final ResourceID resourceID = ResourceID.generate();
    final SlotID slotID = new SlotID(resourceID, 0);
    final SlotStatus slotStatus = new SlotStatus(slotID, resourceProfile);
    final SlotReport slotReport = new SlotReport(Collections.singletonList(slotStatus));
    // register slot at SlotManager
    slotManager.registerTaskExecutor(resourceID, new TaskExecutorRegistration(taskExecutorGateway), slotReport);
    // 4) Slot becomes available and TaskExecutor gets a SlotRequest
    verify(taskExecutorGateway, timeout(5000)).requestSlot(eq(slotID), eq(jobID), eq(allocationID), any(String.class), any(UUID.class), any(Time.class));
}
Also used : TMSlotRequestReply(org.apache.flink.runtime.resourcemanager.messages.taskexecutor.TMSlotRequestReply) TaskExecutorRegistration(org.apache.flink.runtime.resourcemanager.registration.TaskExecutorRegistration) JobLeaderIdService(org.apache.flink.runtime.resourcemanager.JobLeaderIdService) Time(org.apache.flink.api.common.time.Time) JobMasterGateway(org.apache.flink.runtime.jobmaster.JobMasterGateway) SlotRequest(org.apache.flink.runtime.resourcemanager.SlotRequest) TestingHighAvailabilityServices(org.apache.flink.runtime.highavailability.TestingHighAvailabilityServices) ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) UUID(java.util.UUID) RegistrationResponse(org.apache.flink.runtime.registration.RegistrationResponse) ResourceProfile(org.apache.flink.runtime.clusterframework.types.ResourceProfile) TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) SlotStatus(org.apache.flink.runtime.taskexecutor.SlotStatus) MetricRegistry(org.apache.flink.runtime.metrics.MetricRegistry) AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID) RMSlotRequestReply(org.apache.flink.runtime.resourcemanager.messages.jobmanager.RMSlotRequestReply) SlotReport(org.apache.flink.runtime.taskexecutor.SlotReport) ResourceManagerConfiguration(org.apache.flink.runtime.resourcemanager.ResourceManagerConfiguration) TaskExecutorGateway(org.apache.flink.runtime.taskexecutor.TaskExecutorGateway) FatalErrorHandler(org.apache.flink.runtime.rpc.FatalErrorHandler) SlotID(org.apache.flink.runtime.clusterframework.types.SlotID) TestingSlotManager(org.apache.flink.runtime.resourcemanager.TestingSlotManager) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)
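
The four numbered steps in the test above can be illustrated with a toy model that has no Flink dependency. This is only a sketch under stated assumptions: `ToySlotManager`, its string IDs, and its queueing policy are hypothetical and are not how Flink's SlotManager is implemented. It shows a request that stays pending (and asks for a new worker) until a slot is registered.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical toy model of the request/registration flow; not Flink code. */
final class ToySlotManager {
    private final Queue<String> pendingAllocations = new ArrayDeque<>();
    private final Queue<String> freeSlots = new ArrayDeque<>();
    private final Map<String, String> allocationToSlot = new HashMap<>();

    /** Counts how often a new worker (container) was requested. */
    int startNewWorkerCalled = 0;

    /** Serves the request from a free slot, or queues it and asks for a worker. */
    void requestSlot(String allocationId) {
        String slot = freeSlots.poll();
        if (slot != null) {
            allocationToSlot.put(allocationId, slot);
        } else {
            pendingAllocations.add(allocationId);
            startNewWorkerCalled++;
        }
    }

    /** Hands a newly registered slot to the oldest pending request, if any. */
    void registerSlot(String slotId) {
        String waiting = pendingAllocations.poll();
        if (waiting != null) {
            allocationToSlot.put(waiting, slotId);
        } else {
            freeSlots.add(slotId);
        }
    }

    boolean isAllocated(String allocationId) {
        return allocationToSlot.containsKey(allocationId);
    }
}
```

As in the test, the allocation is not fulfilled right after the request; it becomes fulfilled only once a slot is registered.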

Example 67 with ResourceID

use of org.apache.flink.runtime.clusterframework.types.ResourceID in project flink by apache.

the class MiniCluster method startTaskManagers.

protected TaskManagerRunner[] startTaskManagers(Configuration configuration, HighAvailabilityServices haServices, MetricRegistry metricRegistry, int numTaskManagers, RpcService[] taskManagerRpcServices) throws Exception {
    final TaskManagerRunner[] taskManagerRunners = new TaskManagerRunner[numTaskManagers];
    final boolean localCommunication = numTaskManagers == 1;
    for (int i = 0; i < numTaskManagers; i++) {
        taskManagerRunners[i] = new TaskManagerRunner(configuration, new ResourceID(UUID.randomUUID().toString()), taskManagerRpcServices[i], haServices, heartbeatServices, metricRegistry, localCommunication);
        taskManagerRunners[i].start();
    }
    return taskManagerRunners;
}
Also used : ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) TaskManagerRunner(org.apache.flink.runtime.taskexecutor.TaskManagerRunner)
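
The method above builds each ID from `UUID.randomUUID().toString()`. A minimal sketch of such an identifier type follows, assuming it wraps a string and is meant to be usable as a map key. `SimpleResourceId` is a hypothetical stand-in written for illustration and is not Flink's ResourceID class.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical string-backed identifier; a stand-in, not Flink's ResourceID. */
final class SimpleResourceId {
    private final String id;

    SimpleResourceId(String id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    /** Creates an ID backed by a random UUID string. */
    static SimpleResourceId generate() {
        return new SimpleResourceId(UUID.randomUUID().toString());
    }

    String getId() {
        return id;
    }

    /** Two IDs are equal when their underlying strings are equal. */
    @Override
    public boolean equals(Object other) {
        return other instanceof SimpleResourceId
                && ((SimpleResourceId) other).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    @Override
    public String toString() {
        return id;
    }
}
```

Value-based `equals` and `hashCode` are what let two separately constructed IDs with the same string address the same map entry.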

Example 68 with ResourceID

use of org.apache.flink.runtime.clusterframework.types.ResourceID in project flink by apache.

the class TaskExecutor method establishJobManagerConnection.

private void establishJobManagerConnection(JobID jobId, final JobMasterGateway jobMasterGateway, UUID jobManagerLeaderId, JMTMRegistrationSuccess registrationSuccess) {
    log.info("Establish JobManager connection for job {}.", jobId);
    if (jobManagerTable.contains(jobId)) {
        JobManagerConnection oldJobManagerConnection = jobManagerTable.get(jobId);
        if (!oldJobManagerConnection.getLeaderId().equals(jobManagerLeaderId)) {
            closeJobManagerConnection(jobId, new Exception("Found new job leader for job id " + jobId + '.'));
        }
    }
    ResourceID jobManagerResourceID = registrationSuccess.getResourceID();
    JobManagerConnection newJobManagerConnection = associateWithJobManager(jobId, jobManagerResourceID, jobMasterGateway, jobManagerLeaderId, registrationSuccess.getBlobPort());
    jobManagerConnections.put(jobManagerResourceID, newJobManagerConnection);
    jobManagerTable.put(jobId, newJobManagerConnection);
    // monitor the job manager as heartbeat target
    jobManagerHeartbeatManager.monitorTarget(jobManagerResourceID, new HeartbeatTarget<Void>() {

        @Override
        public void receiveHeartbeat(ResourceID resourceID, Void payload) {
            jobMasterGateway.heartbeatFromTaskManager(resourceID);
        }

        @Override
        public void requestHeartbeat(ResourceID resourceID, Void payload) {
            // request heartbeat will never be called on the task manager side
        }
    });
    offerSlotsToJobManager(jobId);
}
Also used : ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) TimeoutException(java.util.concurrent.TimeoutException) PartitionException(org.apache.flink.runtime.taskexecutor.exceptions.PartitionException) CheckpointException(org.apache.flink.runtime.taskexecutor.exceptions.CheckpointException) SlotAllocationException(org.apache.flink.runtime.taskexecutor.exceptions.SlotAllocationException) TaskSubmissionException(org.apache.flink.runtime.taskexecutor.exceptions.TaskSubmissionException) TaskException(org.apache.flink.runtime.taskexecutor.exceptions.TaskException) SlotNotActiveException(org.apache.flink.runtime.taskexecutor.slot.SlotNotActiveException) SlotNotFoundException(org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException) IOException(java.io.IOException)
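
The replace-on-leader-change logic at the top of the method above can be sketched in isolation. This is a simplified model under assumptions: `ConnectionTable`, `Connection`, and the `closed` flag are hypothetical names, and closing is reduced to setting a flag instead of calling `closeJobManagerConnection`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical table of per-job connections; not Flink code. */
final class ConnectionTable {

    /** Minimal connection record that remembers its leader ID. */
    static final class Connection {
        final UUID leaderId;
        boolean closed;

        Connection(UUID leaderId) {
            this.leaderId = leaderId;
        }
    }

    private final Map<String, Connection> byJob = new HashMap<>();

    /**
     * Registers a connection for the job. An existing connection is closed
     * only if it belongs to a different leader, mirroring the check above.
     */
    Connection establish(String jobId, UUID leaderId) {
        Connection old = byJob.get(jobId);
        if (old != null && !old.leaderId.equals(leaderId)) {
            old.closed = true;
        }
        Connection fresh = new Connection(leaderId);
        byJob.put(jobId, fresh);
        return fresh;
    }

    Connection get(String jobId) {
        return byJob.get(jobId);
    }
}
```

Note that, as in the original method, the table entry is overwritten in both cases; only the explicit close depends on the leader ID having changed.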

Example 69 with ResourceID

use of org.apache.flink.runtime.clusterframework.types.ResourceID in project flink by apache.

the class HeartbeatManagerTest method testRegularHeartbeat.

/**
 * Tests that regular heartbeat signal triggers the right callback functions in the
 * {@link HeartbeatListener}.
 */
@Test
public void testRegularHeartbeat() {
    long heartbeatTimeout = 1000L;
    ResourceID ownResourceID = new ResourceID("foobar");
    ResourceID targetResourceID = new ResourceID("barfoo");
    HeartbeatListener<Object, Object> heartbeatListener = mock(HeartbeatListener.class);
    ScheduledExecutor scheduledExecutor = mock(ScheduledExecutor.class);
    Object expectedObject = new Object();
    when(heartbeatListener.retrievePayload()).thenReturn(FlinkCompletableFuture.completed(expectedObject));
    HeartbeatManagerImpl<Object, Object> heartbeatManager = new HeartbeatManagerImpl<>(heartbeatTimeout, ownResourceID, heartbeatListener, new DirectExecutorService(), scheduledExecutor, LOG);
    HeartbeatTarget<Object> heartbeatTarget = mock(HeartbeatTarget.class);
    heartbeatManager.monitorTarget(targetResourceID, heartbeatTarget);
    heartbeatManager.requestHeartbeat(targetResourceID, expectedObject);
    verify(heartbeatListener, times(1)).reportPayload(targetResourceID, expectedObject);
    verify(heartbeatListener, times(1)).retrievePayload();
    verify(heartbeatTarget, times(1)).receiveHeartbeat(ownResourceID, expectedObject);
    heartbeatManager.receiveHeartbeat(targetResourceID, expectedObject);
    verify(heartbeatListener, times(2)).reportPayload(targetResourceID, expectedObject);
}
Also used : ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) DirectExecutorService(org.apache.flink.runtime.util.DirectExecutorService) ScheduledExecutor(org.apache.flink.runtime.concurrent.ScheduledExecutor) Test(org.junit.Test)
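
The call pattern that the test above verifies can be reproduced with a small stand-alone model. The sketch below is inferred only from the test's `verify` calls and is not taken from HeartbeatManagerImpl: `MiniHeartbeatManager`, its nested interfaces, and the use of plain strings as IDs are all hypothetical, and timeouts and asynchronous payload retrieval are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, synchronous heartbeat model; not Flink code. */
final class MiniHeartbeatManager<I, O> {

    /** Receives incoming payloads and supplies the outgoing one. */
    interface Listener<I, O> {
        void reportPayload(String origin, I payload);

        O retrievePayload();
    }

    /** The remote side that heartbeats are sent to. */
    interface Target<O> {
        void receiveHeartbeat(String origin, O payload);
    }

    private final String ownId;
    private final Listener<I, O> listener;
    private final Map<String, Target<O>> targets = new HashMap<>();

    MiniHeartbeatManager(String ownId, Listener<I, O> listener) {
        this.ownId = ownId;
        this.listener = listener;
    }

    void monitorTarget(String id, Target<O> target) {
        targets.put(id, target);
    }

    void unmonitorTarget(String id) {
        targets.remove(id);
    }

    /** An incoming heartbeat from a monitored target only reports its payload. */
    void receiveHeartbeat(String origin, I payload) {
        if (targets.containsKey(origin)) {
            listener.reportPayload(origin, payload);
        }
    }

    /** A heartbeat request reports the payload and answers with our own. */
    void requestHeartbeat(String origin, I payload) {
        Target<O> target = targets.get(origin);
        if (target != null) {
            listener.reportPayload(origin, payload);
            target.receiveHeartbeat(ownId, listener.retrievePayload());
        }
    }
}
```

In this model one `requestHeartbeat` followed by one `receiveHeartbeat` yields two `reportPayload` calls, one `retrievePayload` call, and one reply to the target, which matches the counts the test checks.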

Example 70 with ResourceID

use of org.apache.flink.runtime.clusterframework.types.ResourceID in project flink by apache.

the class HeartbeatManagerTest method testTargetUnmonitoring.

/**
 * Tests that after unmonitoring a target, there won't be a timeout triggered
 */
@Test
public void testTargetUnmonitoring() throws InterruptedException, ExecutionException {
    // this might be too aggressive for Travis, let's see...
    long heartbeatTimeout = 100L;
    ResourceID resourceID = new ResourceID("foobar");
    ResourceID targetID = new ResourceID("target");
    Object object = new Object();
    TestingHeartbeatListener heartbeatListener = new TestingHeartbeatListener(object);
    HeartbeatManager<Object, Object> heartbeatManager = new HeartbeatManagerImpl<>(heartbeatTimeout, resourceID, heartbeatListener, new DirectExecutorService(), new ScheduledExecutorServiceAdapter(new ScheduledThreadPoolExecutor(1)), LOG);
    heartbeatManager.monitorTarget(targetID, mock(HeartbeatTarget.class));
    heartbeatManager.unmonitorTarget(targetID);
    Future<ResourceID> timeout = heartbeatListener.getTimeoutFuture();
    try {
        timeout.get(2 * heartbeatTimeout, TimeUnit.MILLISECONDS);
        fail("The timeout future should not complete after the target was unmonitored.");
    } catch (TimeoutException e) {
        // the timeout should not be completed since we unmonitored the target
    }
}
Also used : ScheduledExecutorServiceAdapter(org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter) ResourceID(org.apache.flink.runtime.clusterframework.types.ResourceID) ScheduledThreadPoolExecutor(java.util.concurrent.ScheduledThreadPoolExecutor) DirectExecutorService(org.apache.flink.runtime.util.DirectExecutorService) TimeoutException(java.util.concurrent.TimeoutException) Test(org.junit.Test)
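
The test above proves a negative: it waits on a future and treats the `TimeoutException` as the success case. A minimal sketch of that pattern using only `java.util.concurrent` follows. The helper name `staysIncomplete` and the class `FutureChecks` are hypothetical, and the choice to treat an exceptionally completed future as "completed" is an assumption made for this example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for asserting that a future does not complete. */
final class FutureChecks {

    private FutureChecks() {
    }

    /** Returns true if the future is still incomplete after waiting the given time. */
    static boolean staysIncomplete(Future<?> future, long millis) {
        try {
            future.get(millis, TimeUnit.MILLISECONDS);
            return false; // completed normally within the window
        } catch (TimeoutException expected) {
            return true; // nothing arrived in time, which is the expected case
        } catch (ExecutionException e) {
            return false; // completed, although with a failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A short wait window keeps such a test fast but makes it sensitive to slow machines, which is the concern the original comment about the 100 ms timeout raises.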

Aggregations

ResourceID (org.apache.flink.runtime.clusterframework.types.ResourceID) 74
Test (org.junit.Test) 48
TaskManagerLocation (org.apache.flink.runtime.taskmanager.TaskManagerLocation) 25
Time (org.apache.flink.api.common.time.Time) 18
UUID (java.util.UUID) 16
JobID (org.apache.flink.api.common.JobID) 16
Configuration (org.apache.flink.configuration.Configuration) 14
AllocationID (org.apache.flink.runtime.clusterframework.types.AllocationID) 13
JavaTestKit (akka.testkit.JavaTestKit) 12
MetricRegistry (org.apache.flink.runtime.metrics.MetricRegistry) 12
InetAddress (java.net.InetAddress) 11
SlotID (org.apache.flink.runtime.clusterframework.types.SlotID) 10
HeartbeatServices (org.apache.flink.runtime.heartbeat.HeartbeatServices) 10
TestingHighAvailabilityServices (org.apache.flink.runtime.highavailability.TestingHighAvailabilityServices) 10
SlotRequest (org.apache.flink.runtime.resourcemanager.SlotRequest) 10
IOManager (org.apache.flink.runtime.io.disk.iomanager.IOManager) 9
NetworkEnvironment (org.apache.flink.runtime.io.network.NetworkEnvironment) 9
ActorTaskManagerGateway (org.apache.flink.runtime.jobmanager.slots.ActorTaskManagerGateway) 9
MemoryManager (org.apache.flink.runtime.memory.MemoryManager) 9
TestingSerialRpcService (org.apache.flink.runtime.rpc.TestingSerialRpcService) 9