Example 21 with SlotNotFoundException

Use of org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException in project flink by splunk.

Class MetricUtils, method getUsedManagedMemory.

private static long getUsedManagedMemory(TaskSlotTable<?> taskSlotTable) {
    Set<AllocationID> activeTaskAllocationIds = taskSlotTable.getActiveTaskSlotAllocationIds();
    long usedMemory = 0L;
    for (AllocationID allocationID : activeTaskAllocationIds) {
        try {
            MemoryManager taskSlotMemoryManager = taskSlotTable.getTaskMemoryManager(allocationID);
            usedMemory += taskSlotMemoryManager.getMemorySize() - taskSlotMemoryManager.availableMemory();
        } catch (SlotNotFoundException e) {
            LOG.debug("The task slot {} is not present anymore and will be ignored in calculating the amount of used memory.", allocationID);
        }
    }
    return usedMemory;
}
Also used: SlotNotFoundException (org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException), AllocationID (org.apache.flink.runtime.clusterframework.types.AllocationID), MemoryManager (org.apache.flink.runtime.memory.MemoryManager)
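
The pattern in Example 21 (iterate over a snapshot of allocation ids and skip any slot that has disappeared in the meantime) can be sketched without Flink on the classpath. This is a minimal illustration, not Flink's implementation: `UsedMemorySketch`, `SlotTable`, `SlotMemory`, and `SlotMissingException` are hypothetical stand-ins for `MetricUtils`, `TaskSlotTable`, `MemoryManager`, and `SlotNotFoundException`, and plain strings stand in for `AllocationID`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types are Flink classes.
class UsedMemorySketch {

    // Stand-in for SlotNotFoundException.
    static class SlotMissingException extends Exception {
        SlotMissingException(String message) {
            super(message);
        }
    }

    // Stand-in for the two MemoryManager values the example reads.
    static class SlotMemory {
        final long totalBytes;
        final long availableBytes;

        SlotMemory(long totalBytes, long availableBytes) {
            this.totalBytes = totalBytes;
            this.availableBytes = availableBytes;
        }
    }

    // Stand-in for the slot table: lookups fail with a checked exception.
    static class SlotTable {
        private final Map<String, SlotMemory> slots = new HashMap<>();

        void put(String allocationId, SlotMemory memory) {
            slots.put(allocationId, memory);
        }

        SlotMemory memoryFor(String allocationId) throws SlotMissingException {
            SlotMemory memory = slots.get(allocationId);
            if (memory == null) {
                throw new SlotMissingException("No slot for allocation " + allocationId);
            }
            return memory;
        }
    }

    // Sums (total - available) over the snapshot, ignoring ids whose slot
    // is no longer present when it is looked up.
    static long usedMemory(SlotTable table, Set<String> snapshotOfIds) {
        long used = 0L;
        for (String id : snapshotOfIds) {
            try {
                SlotMemory memory = table.memoryFor(id);
                used += memory.totalBytes - memory.availableBytes;
            } catch (SlotMissingException e) {
                // The slot vanished after the snapshot was taken; skip it.
            }
        }
        return used;
    }

    public static void main(String[] args) {
        SlotTable table = new SlotTable();
        table.put("a", new SlotMemory(100, 40));
        table.put("b", new SlotMemory(200, 50));
        // "gone" is in the snapshot but not in the table, so it is skipped.
        Set<String> snapshot = new LinkedHashSet<>(Arrays.asList("a", "b", "gone"));
        System.out.println(usedMemory(table, snapshot)); // prints 210
    }
}
```

Swallowing the exception is reasonable here only because the result is a best-effort metric; a caller that needs an exact figure would have to handle the missing slot differently.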

Example 22 with SlotNotFoundException

Use of org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException in project flink by splunk.

Class TaskExecutor, method allocateSlotForJob.

private boolean allocateSlotForJob(JobID jobId, SlotID slotId, AllocationID allocationId, ResourceProfile resourceProfile, String targetAddress) throws SlotAllocationException {
    allocateSlot(slotId, jobId, allocationId, resourceProfile);
    final JobTable.Job job;
    try {
        job = jobTable.getOrCreateJob(jobId, () -> registerNewJobAndCreateServices(jobId, targetAddress));
    } catch (Exception e) {
        // free the allocated slot
        try {
            taskSlotTable.freeSlot(allocationId);
        } catch (SlotNotFoundException slotNotFoundException) {
            // slot no longer existent, this should actually never happen, because we've
            // just allocated the slot. So let's fail hard in this case!
            onFatalError(slotNotFoundException);
        }
        // release local state under the allocation id.
        localStateStoresManager.releaseLocalStateForAllocationId(allocationId);
        // sanity check
        if (!taskSlotTable.isSlotFree(slotId.getSlotNumber())) {
            onFatalError(new Exception("Could not free slot " + slotId));
        }
        throw new SlotAllocationException("Could not create new job.", e);
    }
    return job.isConnected();
}
Also used: SlotNotFoundException (org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException), SlotAllocationException (org.apache.flink.runtime.taskexecutor.exceptions.SlotAllocationException), TaskNotRunningException (org.apache.flink.runtime.operators.coordination.TaskNotRunningException), CheckpointException (org.apache.flink.runtime.checkpoint.CheckpointException), SlotOccupiedException (org.apache.flink.runtime.taskexecutor.exceptions.SlotOccupiedException), FlinkException (org.apache.flink.util.FlinkException), TaskSubmissionException (org.apache.flink.runtime.taskexecutor.exceptions.TaskSubmissionException), TaskException (org.apache.flink.runtime.taskexecutor.exceptions.TaskException), SlotNotActiveException (org.apache.flink.runtime.taskexecutor.slot.SlotNotActiveException), IOException (java.io.IOException), TimeoutException (java.util.concurrent.TimeoutException), RegistrationTimeoutException (org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException), CompletionException (java.util.concurrent.CompletionException), TaskManagerException (org.apache.flink.runtime.taskexecutor.exceptions.TaskManagerException)
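
Example 22 follows an allocate-then-roll-back shape: reserve the slot, attempt the follow-up step, and on failure release the slot, verify it is free again, and rethrow a wrapped exception. The sketch below shows only that control flow under simplifying assumptions. `AllocateWithRollbackSketch`, `allocateThen`, and `AllocationFailedException` are hypothetical names, slots are plain integers, and an `IllegalStateException` stands in for the `onFatalError` call.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of the rollback control flow; not Flink code.
class AllocateWithRollbackSketch {

    // Stand-in for SlotAllocationException.
    static class AllocationFailedException extends Exception {
        AllocationFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final Set<Integer> allocatedSlots = new HashSet<>();

    boolean isSlotFree(int slot) {
        return !allocatedSlots.contains(slot);
    }

    // Reserves the slot, then runs the follow-up step. If the follow-up
    // throws, the reservation is undone before the failure is reported.
    <T> T allocateThen(int slot, Supplier<T> followUp) throws AllocationFailedException {
        allocatedSlots.add(slot);
        try {
            return followUp.get();
        } catch (RuntimeException e) {
            // Roll back the reservation made above.
            allocatedSlots.remove(slot);
            // Sanity check: the slot must be free again after the rollback.
            if (!isSlotFree(slot)) {
                throw new IllegalStateException("Could not free slot " + slot);
            }
            throw new AllocationFailedException("Could not complete allocation of slot " + slot, e);
        }
    }

    public static void main(String[] args) {
        AllocateWithRollbackSketch slots = new AllocateWithRollbackSketch();
        try {
            slots.allocateThen(1, () -> {
                throw new IllegalStateException("job setup failed");
            });
        } catch (AllocationFailedException e) {
            System.out.println(e.getMessage() + "; slot free again: " + slots.isSlotFree(1));
        }
    }
}
```

Wrapping the original exception as the cause keeps the root failure visible to the caller while still signalling that the allocation as a whole did not succeed.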

Example 23 with SlotNotFoundException

Use of org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException in project flink by splunk.

Class TaskExecutor, method freeSlotInternal.

private void freeSlotInternal(AllocationID allocationId, Throwable cause) {
    checkNotNull(allocationId);
    // Only free slots while the TaskExecutor is running; during shutdown the request is ignored.
    if (isRunning()) {
        log.debug("Free slot with allocation id {} because: {}", allocationId, cause.getMessage());
        try {
            final JobID jobId = taskSlotTable.getOwningJob(allocationId);
            final int slotIndex = taskSlotTable.freeSlot(allocationId, cause);
            slotAllocationSnapshotPersistenceService.deleteAllocationSnapshot(slotIndex);
            if (slotIndex != -1) {
                if (isConnectedToResourceManager()) {
                    // the slot was freed. Tell the RM about it
                    ResourceManagerGateway resourceManagerGateway = establishedResourceManagerConnection.getResourceManagerGateway();
                    resourceManagerGateway.notifySlotAvailable(establishedResourceManagerConnection.getTaskExecutorRegistrationId(), new SlotID(getResourceID(), slotIndex), allocationId);
                }
                if (jobId != null) {
                    closeJobManagerConnectionIfNoAllocatedResources(jobId);
                }
            }
        } catch (SlotNotFoundException e) {
            log.debug("Could not free slot for allocation id {}.", allocationId, e);
        }
        localStateStoresManager.releaseLocalStateForAllocationId(allocationId);
    } else {
        log.debug("Ignoring the freeing of slot {} because the TaskExecutor is shutting down.", allocationId);
    }
}
Also used: SlotNotFoundException (org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException), SlotID (org.apache.flink.runtime.clusterframework.types.SlotID), JobID (org.apache.flink.api.common.JobID), RpcEndpoint (org.apache.flink.runtime.rpc.RpcEndpoint), ResourceManagerGateway (org.apache.flink.runtime.resourcemanager.ResourceManagerGateway)
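
Two aspects of Example 23 are worth isolating: the request is ignored entirely once the component is no longer running, and an unknown allocation id is treated as a non-event rather than an error. The following sketch models just those two behaviours. `FreeSlotSketch`, `UnknownAllocationException`, and the `notifiedSlots` list (a stand-in for notifying the resource manager) are hypothetical, and the `-1` return value handled in the original method is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Flink code.
class FreeSlotSketch {

    // Stand-in for SlotNotFoundException.
    static class UnknownAllocationException extends Exception {
        UnknownAllocationException(String message) {
            super(message);
        }
    }

    private final Map<String, Integer> slotIndexByAllocation = new HashMap<>();

    // Records which slot indices were reported as available again.
    final List<Integer> notifiedSlots = new ArrayList<>();

    boolean running = true;

    void allocate(String allocationId, int slotIndex) {
        slotIndexByAllocation.put(allocationId, slotIndex);
    }

    private int release(String allocationId) throws UnknownAllocationException {
        Integer index = slotIndexByAllocation.remove(allocationId);
        if (index == null) {
            throw new UnknownAllocationException("No slot for allocation " + allocationId);
        }
        return index;
    }

    // Returns true only if a slot was actually freed and reported.
    boolean free(String allocationId) {
        if (!running) {
            // Shutting down: leave the allocation bookkeeping untouched.
            return false;
        }
        try {
            int index = release(allocationId);
            notifiedSlots.add(index);
            return true;
        } catch (UnknownAllocationException e) {
            // Already freed or never allocated; nothing to do.
            return false;
        }
    }

    public static void main(String[] args) {
        FreeSlotSketch slots = new FreeSlotSketch();
        slots.allocate("a", 3);
        System.out.println(slots.free("a")); // prints true
        System.out.println(slots.free("a")); // prints false (already freed)
    }
}
```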
