Search in sources :

Example 1 with JobDiagnosticsUpdateEvent

use of org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent in project hadoop by apache.

the class TaskImpl method internalError.

protected void internalError(TaskEventType type) {
    LOG.error("Invalid event " + type + " on Task " + this.taskId);
    eventHandler.handle(new JobDiagnosticsUpdateEvent(this.taskId.getJobId(), "Invalid event " + type + " on Task " + this.taskId));
    eventHandler.handle(new JobEvent(this.taskId.getJobId(), JobEventType.INTERNAL_ERROR));
}
Also used : JobEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent) JobDiagnosticsUpdateEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent)

Example 2 with JobDiagnosticsUpdateEvent

use of org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent in project hadoop by apache.

the class RMContainerAllocator method handleMapContainerRequest.

@SuppressWarnings({ "unchecked" })
private void handleMapContainerRequest(ContainerRequestEvent reqEvent) {
    assert (reqEvent.getAttemptID().getTaskId().getTaskType().equals(TaskType.MAP));
    Resource supportedMaxContainerCapability = getMaxContainerCapability();
    JobId jobId = getJob().getID();
    if (mapResourceRequest.equals(Resources.none())) {
        mapResourceRequest = reqEvent.getCapability();
        eventHandler.handle(new JobHistoryEvent(jobId, new NormalizedResourceEvent(org.apache.hadoop.mapreduce.TaskType.MAP, mapResourceRequest.getMemorySize())));
        LOG.info("mapResourceRequest:" + mapResourceRequest);
    }
    boolean mapContainerRequestAccepted = true;
    if (mapResourceRequest.getMemorySize() > supportedMaxContainerCapability.getMemorySize() || mapResourceRequest.getVirtualCores() > supportedMaxContainerCapability.getVirtualCores()) {
        mapContainerRequestAccepted = false;
    }
    if (mapContainerRequestAccepted) {
        // set the resources
        reqEvent.getCapability().setMemorySize(mapResourceRequest.getMemorySize());
        reqEvent.getCapability().setVirtualCores(mapResourceRequest.getVirtualCores());
        //maps are immediately scheduled
        scheduledRequests.addMap(reqEvent);
    } else {
        String diagMsg = "The required MAP capability is more than the " + "supported max container capability in the cluster. Killing" + " the Job. mapResourceRequest: " + mapResourceRequest + " maxContainerCapability:" + supportedMaxContainerCapability;
        LOG.info(diagMsg);
        eventHandler.handle(new JobDiagnosticsUpdateEvent(jobId, diagMsg));
        eventHandler.handle(new JobEvent(jobId, JobEventType.JOB_KILL));
    }
}
Also used : NormalizedResourceEvent(org.apache.hadoop.mapreduce.jobhistory.NormalizedResourceEvent) JobEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent) Resource(org.apache.hadoop.yarn.api.records.Resource) JobHistoryEvent(org.apache.hadoop.mapreduce.jobhistory.JobHistoryEvent) JobDiagnosticsUpdateEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent) JobId(org.apache.hadoop.mapreduce.v2.api.records.JobId)

Example 3 with JobDiagnosticsUpdateEvent

use of org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent in project hadoop by apache.

the class RMContainerAllocator method getResources.

@SuppressWarnings("unchecked")
private List<Container> getResources() throws Exception {
    applyConcurrentTaskLimits();
    // will be null the first time
    Resource headRoom = Resources.clone(getAvailableResources());
    AllocateResponse response;
    /*
     * If contact with RM is lost, the AM will wait MR_AM_TO_RM_WAIT_INTERVAL_MS
     * milliseconds before aborting. During this interval, AM will still try
     * to contact the RM.
     */
    try {
        response = makeRemoteRequest();
        // Reset retry count if no exception occurred.
        retrystartTime = System.currentTimeMillis();
    } catch (ApplicationAttemptNotFoundException e) {
        // This can happen if the RM has been restarted. If it is in that state,
        // this application must clean itself up.
        eventHandler.handle(new JobEvent(this.getJob().getID(), JobEventType.JOB_AM_REBOOT));
        throw new RMContainerAllocationException("Resource Manager doesn't recognize AttemptId: " + this.getContext().getApplicationAttemptId(), e);
    } catch (ApplicationMasterNotRegisteredException e) {
        LOG.info("ApplicationMaster is out of sync with ResourceManager," + " hence resync and send outstanding requests.");
        // RM may have restarted, re-register with RM.
        lastResponseID = 0;
        register();
        addOutstandingRequestOnResync();
        return null;
    } catch (InvalidLabelResourceRequestException e) {
        // If Invalid label exception is received means the requested label doesnt
        // have access so killing job in this case.
        String diagMsg = "Requested node-label-expression is invalid: " + StringUtils.stringifyException(e);
        LOG.info(diagMsg);
        JobId jobId = this.getJob().getID();
        eventHandler.handle(new JobDiagnosticsUpdateEvent(jobId, diagMsg));
        eventHandler.handle(new JobEvent(jobId, JobEventType.JOB_KILL));
        throw e;
    } catch (Exception e) {
        // re-trying until the retryInterval has expired.
        if (System.currentTimeMillis() - retrystartTime >= retryInterval) {
            LOG.error("Could not contact RM after " + retryInterval + " milliseconds.");
            eventHandler.handle(new JobEvent(this.getJob().getID(), JobEventType.JOB_AM_REBOOT));
            throw new RMContainerAllocationException("Could not contact RM after " + retryInterval + " milliseconds.");
        }
        // continue to attempt to contact the RM.
        throw e;
    }
    Resource newHeadRoom = getAvailableResources();
    List<Container> newContainers = response.getAllocatedContainers();
    // Setting NMTokens
    if (response.getNMTokens() != null) {
        for (NMToken nmToken : response.getNMTokens()) {
            NMTokenCache.setNMToken(nmToken.getNodeId().toString(), nmToken.getToken());
        }
    }
    // Setting AMRMToken
    if (response.getAMRMToken() != null) {
        updateAMRMToken(response.getAMRMToken());
    }
    List<ContainerStatus> finishedContainers = response.getCompletedContainersStatuses();
    // propagate preemption requests
    final PreemptionMessage preemptReq = response.getPreemptionMessage();
    if (preemptReq != null) {
        preemptionPolicy.preempt(new PreemptionContext(assignedRequests), preemptReq);
    }
    if (newContainers.size() + finishedContainers.size() > 0 || !headRoom.equals(newHeadRoom)) {
        //something changed
        recalculateReduceSchedule = true;
        if (LOG.isDebugEnabled() && !headRoom.equals(newHeadRoom)) {
            LOG.debug("headroom=" + newHeadRoom);
        }
    }
    if (LOG.isDebugEnabled()) {
        for (Container cont : newContainers) {
            LOG.debug("Received new Container :" + cont);
        }
    }
    //Called on each allocation. Will know about newly blacklisted/added hosts.
    computeIgnoreBlacklisting();
    handleUpdatedNodes(response);
    handleJobPriorityChange(response);
    // handle receiving the timeline collector address for this app
    String collectorAddr = response.getCollectorAddr();
    MRAppMaster.RunningAppContext appContext = (MRAppMaster.RunningAppContext) this.getContext();
    if (collectorAddr != null && !collectorAddr.isEmpty() && appContext.getTimelineV2Client() != null) {
        appContext.getTimelineV2Client().setTimelineServiceAddress(response.getCollectorAddr());
    }
    for (ContainerStatus cont : finishedContainers) {
        processFinishedContainer(cont);
    }
    return newContainers;
}
Also used : PreemptionMessage(org.apache.hadoop.yarn.api.records.PreemptionMessage) MRAppMaster(org.apache.hadoop.mapreduce.v2.app.MRAppMaster) NMToken(org.apache.hadoop.yarn.api.records.NMToken) Resource(org.apache.hadoop.yarn.api.records.Resource) JobDiagnosticsUpdateEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent) ApplicationMasterNotRegisteredException(org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException) InvalidLabelResourceRequestException(org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException) IOException(java.io.IOException) ApplicationAttemptNotFoundException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException) YarnRuntimeException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException) ApplicationAttemptNotFoundException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException) AllocateResponse(org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse) ApplicationMasterNotRegisteredException(org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException) Container(org.apache.hadoop.yarn.api.records.Container) ContainerStatus(org.apache.hadoop.yarn.api.records.ContainerStatus) JobEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent) InvalidLabelResourceRequestException(org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException) JobId(org.apache.hadoop.mapreduce.v2.api.records.JobId)

Example 4 with JobDiagnosticsUpdateEvent

use of org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent in project hadoop by apache.

the class RMContainerAllocator method handleReduceContainerRequest.

@SuppressWarnings({ "unchecked" })
private void handleReduceContainerRequest(ContainerRequestEvent reqEvent) {
    assert (reqEvent.getAttemptID().getTaskId().getTaskType().equals(TaskType.REDUCE));
    Resource supportedMaxContainerCapability = getMaxContainerCapability();
    JobId jobId = getJob().getID();
    if (reduceResourceRequest.equals(Resources.none())) {
        reduceResourceRequest = reqEvent.getCapability();
        eventHandler.handle(new JobHistoryEvent(jobId, new NormalizedResourceEvent(org.apache.hadoop.mapreduce.TaskType.REDUCE, reduceResourceRequest.getMemorySize())));
        LOG.info("reduceResourceRequest:" + reduceResourceRequest);
    }
    boolean reduceContainerRequestAccepted = true;
    if (reduceResourceRequest.getMemorySize() > supportedMaxContainerCapability.getMemorySize() || reduceResourceRequest.getVirtualCores() > supportedMaxContainerCapability.getVirtualCores()) {
        reduceContainerRequestAccepted = false;
    }
    if (reduceContainerRequestAccepted) {
        // set the resources
        reqEvent.getCapability().setVirtualCores(reduceResourceRequest.getVirtualCores());
        reqEvent.getCapability().setMemorySize(reduceResourceRequest.getMemorySize());
        if (reqEvent.getEarlierAttemptFailed()) {
            //previously failed reducers are added to the front for fail fast
            pendingReduces.addFirst(new ContainerRequest(reqEvent, PRIORITY_REDUCE, reduceNodeLabelExpression));
        } else {
            //reduces are added to pending queue and are slowly ramped up
            pendingReduces.add(new ContainerRequest(reqEvent, PRIORITY_REDUCE, reduceNodeLabelExpression));
        }
    } else {
        String diagMsg = "REDUCE capability required is more than the " + "supported max container capability in the cluster. Killing" + " the Job. reduceResourceRequest: " + reduceResourceRequest + " maxContainerCapability:" + supportedMaxContainerCapability;
        LOG.info(diagMsg);
        eventHandler.handle(new JobDiagnosticsUpdateEvent(jobId, diagMsg));
        eventHandler.handle(new JobEvent(jobId, JobEventType.JOB_KILL));
    }
}
Also used : NormalizedResourceEvent(org.apache.hadoop.mapreduce.jobhistory.NormalizedResourceEvent) JobEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent) Resource(org.apache.hadoop.yarn.api.records.Resource) JobHistoryEvent(org.apache.hadoop.mapreduce.jobhistory.JobHistoryEvent) JobDiagnosticsUpdateEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent) JobId(org.apache.hadoop.mapreduce.v2.api.records.JobId)

Example 5 with JobDiagnosticsUpdateEvent

use of org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent in project hadoop by apache.

the class TestJobImpl method testReportDiagnostics.

@Test
public void testReportDiagnostics() throws Exception {
    JobID jobID = JobID.forName("job_1234567890000_0001");
    JobId jobId = TypeConverter.toYarn(jobID);
    final String diagMsg = "some diagnostic message";
    final JobDiagnosticsUpdateEvent diagUpdateEvent = new JobDiagnosticsUpdateEvent(jobId, diagMsg);
    MRAppMetrics mrAppMetrics = MRAppMetrics.create();
    AppContext mockContext = mock(AppContext.class);
    when(mockContext.hasSuccessfullyUnregistered()).thenReturn(true);
    JobImpl job = new JobImpl(jobId, Records.newRecord(ApplicationAttemptId.class), new Configuration(), mock(EventHandler.class), null, mock(JobTokenSecretManager.class), null, SystemClock.getInstance(), null, mrAppMetrics, null, true, null, 0, null, mockContext, null, null);
    job.handle(diagUpdateEvent);
    String diagnostics = job.getReport().getDiagnostics();
    Assert.assertNotNull(diagnostics);
    Assert.assertTrue(diagnostics.contains(diagMsg));
    job = new JobImpl(jobId, Records.newRecord(ApplicationAttemptId.class), new Configuration(), mock(EventHandler.class), null, mock(JobTokenSecretManager.class), null, SystemClock.getInstance(), null, mrAppMetrics, null, true, null, 0, null, mockContext, null, null);
    job.handle(new JobEvent(jobId, JobEventType.JOB_KILL));
    job.handle(diagUpdateEvent);
    diagnostics = job.getReport().getDiagnostics();
    Assert.assertNotNull(diagnostics);
    Assert.assertTrue(diagnostics.contains(diagMsg));
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) JobEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent) JobTokenSecretManager(org.apache.hadoop.mapreduce.security.token.JobTokenSecretManager) AppContext(org.apache.hadoop.mapreduce.v2.app.AppContext) CommitterEventHandler(org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler) EventHandler(org.apache.hadoop.yarn.event.EventHandler) JobDiagnosticsUpdateEvent(org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent) ApplicationAttemptId(org.apache.hadoop.yarn.api.records.ApplicationAttemptId) MRAppMetrics(org.apache.hadoop.mapreduce.v2.app.metrics.MRAppMetrics) JobID(org.apache.hadoop.mapreduce.JobID) JobId(org.apache.hadoop.mapreduce.v2.api.records.JobId) Test(org.junit.Test)

Aggregations

JobDiagnosticsUpdateEvent (org.apache.hadoop.mapreduce.v2.app.job.event.JobDiagnosticsUpdateEvent)6 JobEvent (org.apache.hadoop.mapreduce.v2.app.job.event.JobEvent)6 JobId (org.apache.hadoop.mapreduce.v2.api.records.JobId)4 Resource (org.apache.hadoop.yarn.api.records.Resource)3 JobHistoryEvent (org.apache.hadoop.mapreduce.jobhistory.JobHistoryEvent)2 NormalizedResourceEvent (org.apache.hadoop.mapreduce.jobhistory.NormalizedResourceEvent)2 IOException (java.io.IOException)1 Configuration (org.apache.hadoop.conf.Configuration)1 JobID (org.apache.hadoop.mapreduce.JobID)1 JobTokenSecretManager (org.apache.hadoop.mapreduce.security.token.JobTokenSecretManager)1 AppContext (org.apache.hadoop.mapreduce.v2.app.AppContext)1 MRAppMaster (org.apache.hadoop.mapreduce.v2.app.MRAppMaster)1 CommitterEventHandler (org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler)1 TaskAttemptStateInternal (org.apache.hadoop.mapreduce.v2.app.job.TaskAttemptStateInternal)1 MRAppMetrics (org.apache.hadoop.mapreduce.v2.app.metrics.MRAppMetrics)1 AllocateResponse (org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse)1 ApplicationAttemptId (org.apache.hadoop.yarn.api.records.ApplicationAttemptId)1 Container (org.apache.hadoop.yarn.api.records.Container)1 ContainerStatus (org.apache.hadoop.yarn.api.records.ContainerStatus)1 NMToken (org.apache.hadoop.yarn.api.records.NMToken)1