
Example 16 with ResourceCounter

use of org.apache.flink.runtime.util.ResourceCounter in project flink by apache.

the class AdaptiveSchedulerTest method testHasEnoughResourcesUsesUnmatchedSlotsAsUnknown.

@Test
public void testHasEnoughResourcesUsesUnmatchedSlotsAsUnknown() throws Exception {
    final JobGraph jobGraph = createJobGraph();
    final DefaultDeclarativeSlotPool declarativeSlotPool = createDeclarativeSlotPool(jobGraph.getJobID());
    final AdaptiveScheduler scheduler = new AdaptiveSchedulerBuilder(jobGraph, mainThreadExecutor).setDeclarativeSlotPool(declarativeSlotPool).build();
    scheduler.startScheduling();
    final int numRequiredSlots = 1;
    final ResourceCounter requiredResources = ResourceCounter.withResource(ResourceProfile.UNKNOWN, numRequiredSlots);
    final ResourceCounter providedResources = ResourceCounter.withResource(ResourceProfile.newBuilder().setCpuCores(1).build(), numRequiredSlots);
    offerSlots(declarativeSlotPool, createSlotOffersForResourceRequirements(providedResources));
    assertThat(scheduler.hasDesiredResources(requiredResources)).isTrue();
}
Also used : JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) JobGraphTestUtils.streamingJobGraph(org.apache.flink.runtime.jobgraph.JobGraphTestUtils.streamingJobGraph) DefaultDeclarativeSlotPool(org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool) ResourceCounter(org.apache.flink.runtime.util.ResourceCounter) Test(org.junit.Test) ArchivedExecutionGraphTest(org.apache.flink.runtime.executiongraph.ArchivedExecutionGraphTest) DefaultSchedulerTest(org.apache.flink.runtime.scheduler.DefaultSchedulerTest)

Example 17 with ResourceCounter

use of org.apache.flink.runtime.util.ResourceCounter in project flink by apache.

the class AdaptiveSchedulerTest method testHasEnoughResourcesReturnsFalseIfUnsatisfied.

@Test
public void testHasEnoughResourcesReturnsFalseIfUnsatisfied() throws Exception {
    final AdaptiveScheduler scheduler = new AdaptiveSchedulerBuilder(createJobGraph(), mainThreadExecutor).build();
    scheduler.startScheduling();
    final ResourceCounter resourceRequirement = ResourceCounter.withResource(ResourceProfile.UNKNOWN, 1);
    assertThat(scheduler.hasDesiredResources(resourceRequirement)).isFalse();
}
Also used : ResourceCounter(org.apache.flink.runtime.util.ResourceCounter) Test(org.junit.Test) ArchivedExecutionGraphTest(org.apache.flink.runtime.executiongraph.ArchivedExecutionGraphTest) DefaultSchedulerTest(org.apache.flink.runtime.scheduler.DefaultSchedulerTest)
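The two tests above suggest that a requirement for `ResourceProfile.UNKNOWN` can be satisfied by any free slot whose profile is not explicitly requested, and that nothing is satisfied when no slots are available. The following is a minimal sketch of that check under those assumptions. It models profiles as plain strings, and `DesiredResourcesSketch` and its method are hypothetical names for illustration, not Flink's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model; profiles are plain strings, not Flink's ResourceProfile. */
final class DesiredResourcesSketch {
    static final String UNKNOWN = "UNKNOWN";

    /**
     * Returns true if every desired entry can be covered by the given free slots.
     * A slot whose profile is not explicitly desired is counted against UNKNOWN.
     */
    static boolean hasDesiredResources(Map<String, Integer> desired, List<String> freeSlotProfiles) {
        // Work on a copy so the caller's map is never modified.
        Map<String, Integer> outstanding = new HashMap<>(desired);
        for (String slotProfile : freeSlotProfiles) {
            if (outstanding.isEmpty()) {
                break;
            }
            String key = outstanding.containsKey(slotProfile) ? slotProfile : UNKNOWN;
            // Decrement the count; returning null removes the entry once it reaches zero.
            outstanding.computeIfPresent(key, (k, count) -> count > 1 ? count - 1 : null);
        }
        return outstanding.isEmpty();
    }
}
```

In this sketch, one `UNKNOWN` requirement plus one offered slot of any profile yields `true`, while the same requirement with no offered slots yields `false`, mirroring the two assertions in the tests.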

Example 18 with ResourceCounter

use of org.apache.flink.runtime.util.ResourceCounter in project flink by apache.

the class DeclarativeSlotPoolBridge method releaseSlot.

@Override
public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwable cause) {
    log.debug("Release slot with slot request id {}", slotRequestId);
    assertRunningInMainThread();
    final PendingRequest pendingRequest = pendingRequests.remove(slotRequestId);
    if (pendingRequest != null) {
        getDeclarativeSlotPool().decreaseResourceRequirementsBy(ResourceCounter.withResource(pendingRequest.getResourceProfile(), 1));
        pendingRequest.failRequest(new FlinkException(String.format("Pending slot request with %s has been released.", pendingRequest.getSlotRequestId()), cause));
    } else {
        final AllocationID allocationId = fulfilledRequests.remove(slotRequestId);
        if (allocationId != null) {
            ResourceCounter previouslyFulfilledRequirement = getDeclarativeSlotPool().freeReservedSlot(allocationId, cause, getRelativeTimeMillis());
            getDeclarativeSlotPool().decreaseResourceRequirementsBy(previouslyFulfilledRequirement);
        } else {
            log.debug("Could not find slot which has fulfilled slot request {}. Ignoring the release operation.", slotRequestId);
        }
    }
}
Also used : AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID) ResourceCounter(org.apache.flink.runtime.util.ResourceCounter) FlinkException(org.apache.flink.util.FlinkException)

Example 19 with ResourceCounter

use of org.apache.flink.runtime.util.ResourceCounter in project flink by apache.

the class DeclarativeSlotPoolBridge method cancelPendingRequests.

private void cancelPendingRequests(Predicate<PendingRequest> requestPredicate, FlinkException cancelCause) {
    ResourceCounter decreasedResourceRequirements = ResourceCounter.empty();
    // need a copy since failing a request could trigger another request to be issued
    final Iterable<PendingRequest> pendingRequestsToFail = new ArrayList<>(pendingRequests.values());
    pendingRequests.clear();
    for (PendingRequest pendingRequest : pendingRequestsToFail) {
        if (requestPredicate.test(pendingRequest)) {
            pendingRequest.failRequest(cancelCause);
            decreasedResourceRequirements = decreasedResourceRequirements.add(pendingRequest.getResourceProfile(), 1);
        } else {
            pendingRequests.put(pendingRequest.getSlotRequestId(), pendingRequest);
        }
    }
    getDeclarativeSlotPool().decreaseResourceRequirementsBy(decreasedResourceRequirements);
}
Also used : ArrayList(java.util.ArrayList) ResourceCounter(org.apache.flink.runtime.util.ResourceCounter)
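The reassignment `decreasedResourceRequirements = decreasedResourceRequirements.add(...)` in the loop above suggests that `add` returns a new counter rather than mutating the receiver. A minimal sketch of that accumulate-by-reassignment pattern follows. `SimpleResourceCounter` and `CancelSketch` are hypothetical stand-ins keyed by strings, assumed for illustration only, and are not Flink's `ResourceCounter`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified immutable per-profile counter; not Flink's class. */
final class SimpleResourceCounter {
    private final Map<String, Integer> counts;

    private SimpleResourceCounter(Map<String, Integer> counts) {
        this.counts = Collections.unmodifiableMap(counts);
    }

    static SimpleResourceCounter empty() {
        return new SimpleResourceCounter(new HashMap<>());
    }

    /** Returns a NEW counter with the increment applied; the receiver is left untouched. */
    SimpleResourceCounter add(String profile, int increment) {
        Map<String, Integer> copy = new HashMap<>(counts);
        copy.merge(profile, increment, Integer::sum);
        return new SimpleResourceCounter(copy);
    }

    int getCount(String profile) {
        return counts.getOrDefault(profile, 0);
    }

    boolean isEmpty() {
        return counts.isEmpty();
    }
}

final class CancelSketch {
    /** Adds one entry per cancelled request, reassigning the local variable on every step. */
    static SimpleResourceCounter countCancelled(Iterable<String> cancelledProfiles) {
        SimpleResourceCounter decreased = SimpleResourceCounter.empty();
        for (String profile : cancelledProfiles) {
            decreased = decreased.add(profile, 1);
        }
        return decreased;
    }
}
```

With an immutable counter, forgetting the reassignment silently drops the increment, which is why the loop stores the result of every `add` call before the total is handed to `decreaseResourceRequirementsBy` in a single call.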

Example 20 with ResourceCounter

use of org.apache.flink.runtime.util.ResourceCounter in project flink by apache.

the class DeclarativeSlotManager method checkResourceRequirements.

// ---------------------------------------------------------------------------------------------
// Requirement matching
// ---------------------------------------------------------------------------------------------
/**
 * Matches resource requirements against available resources. In a first round requirements are
 * matched against free slots, and any match results in a slot allocation. The remaining
 * unfulfilled requirements are matched against pending slots, allocating more workers if no
 * matching pending slot could be found. If the requirements for a job could not be fulfilled
 * then a notification is sent to the job master informing it as such.
 *
 * <p>Performance notes: At its core this method loops, for each job, over all free/pending
 * slots for each required slot, trying to find a matching slot. One should generally go in with
 * the assumption that this runs in numberOfJobsRequiringResources * numberOfRequiredSlots *
 * numberOfFreeOrPendingSlots. This is especially important when dealing with pending slots, as
 * matches between requirements and pending slots are not persisted and recomputed on each call.
 * This may require further refinements in the future; e.g., persisting the matches between
 * requirements and pending slots, or not matching against pending slots at all.
 *
 * <p>When dealing with unspecific resource profiles (i.e., {@link ResourceProfile#ANY}/{@link
 * ResourceProfile#UNKNOWN}), then the number of free/pending slots is not relevant because we
 * only need exactly 1 comparison to determine whether a slot can be fulfilled or not, since
 * they are all the same anyway.
 *
 * <p>When dealing with specific resource profiles things can be a lot worse, with the classical
 * cases where either no matches are found, or only at the very end of the iteration. In the
 * absolute worst case, with J jobs, requiring R slots each with a unique resource profile such
 * that each pair of these profiles is not matching, and S free/pending slots that don't fulfill any
 * requirement, then this method does a total of J*R*S resource profile comparisons.
 */
private void checkResourceRequirements() {
    final Map<JobID, Collection<ResourceRequirement>> missingResources = resourceTracker.getMissingResources();
    if (missingResources.isEmpty()) {
        return;
    }
    final Map<JobID, ResourceCounter> unfulfilledRequirements = new LinkedHashMap<>();
    for (Map.Entry<JobID, Collection<ResourceRequirement>> resourceRequirements : missingResources.entrySet()) {
        final JobID jobId = resourceRequirements.getKey();
        final ResourceCounter unfulfilledJobRequirements = tryAllocateSlotsForJob(jobId, resourceRequirements.getValue());
        if (!unfulfilledJobRequirements.isEmpty()) {
            unfulfilledRequirements.put(jobId, unfulfilledJobRequirements);
        }
    }
    if (unfulfilledRequirements.isEmpty()) {
        return;
    }
    ResourceCounter pendingSlots = ResourceCounter.withResources(taskExecutorManager.getPendingTaskManagerSlots().stream().collect(Collectors.groupingBy(PendingTaskManagerSlot::getResourceProfile, Collectors.summingInt(x -> 1))));
    for (Map.Entry<JobID, ResourceCounter> unfulfilledRequirement : unfulfilledRequirements.entrySet()) {
        pendingSlots = tryFulfillRequirementsWithPendingSlots(unfulfilledRequirement.getKey(), unfulfilledRequirement.getValue().getResourcesWithCount(), pendingSlots);
    }
}
Also used : WorkerResourceSpec(org.apache.flink.runtime.resourcemanager.WorkerResourceSpec) BiFunction(java.util.function.BiFunction) LoggerFactory(org.slf4j.LoggerFactory) ResourceRequirement(org.apache.flink.runtime.slots.ResourceRequirement) ResourceCounter(org.apache.flink.runtime.util.ResourceCounter) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) TaskExecutorGateway(org.apache.flink.runtime.taskexecutor.TaskExecutorGateway) LinkedHashMap(java.util.LinkedHashMap) FutureUtils(org.apache.flink.util.concurrent.FutureUtils) Map(java.util.Map) SlotID(org.apache.flink.runtime.clusterframework.types.SlotID) SlotInfo(org.apache.flink.runtime.rest.messages.taskmanager.SlotInfo) ResourceRequirements(org.apache.flink.runtime.slots.ResourceRequirements) SlotOccupiedException(org.apache.flink.runtime.taskexecutor.exceptions.SlotOccupiedException) Nullable(javax.annotation.Nullable) ScheduledExecutor(org.apache.flink.util.concurrent.ScheduledExecutor) Logger(org.slf4j.Logger) Executor(java.util.concurrent.Executor) Collection(java.util.Collection) ResourceManagerId(org.apache.flink.runtime.resourcemanager.ResourceManagerId) Set(java.util.Set) InstanceID(org.apache.flink.runtime.instance.InstanceID) SlotManagerMetricGroup(org.apache.flink.runtime.metrics.groups.SlotManagerMetricGroup) Preconditions(org.apache.flink.util.Preconditions) Collectors(java.util.stream.Collectors) Acknowledge(org.apache.flink.runtime.messages.Acknowledge) ResourceProfile(org.apache.flink.runtime.clusterframework.types.ResourceProfile) MetricNames(org.apache.flink.runtime.metrics.MetricNames) JobID(org.apache.flink.api.common.JobID) TaskExecutorConnection(org.apache.flink.runtime.resourcemanager.registration.TaskExecutorConnection) SlotStatus(org.apache.flink.runtime.taskexecutor.SlotStatus) Optional(java.util.Optional) SlotReport(org.apache.flink.runtime.taskexecutor.SlotReport) Collections(java.util.Collections) Time(org.apache.flink.api.common.time.Time) 
AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID)
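The Javadoc above describes two matching rounds (free slots first, then pending slots) and a worst-case cost proportional to the number of requirements times the number of slots. A minimal single-job sketch of that idea follows, assuming profiles are plain strings that match only on equality. `MatchingSketch`, `takeMatching`, and `match` are hypothetical names, and the sketch does not reproduce the per-job bookkeeping or worker allocation of the real method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified two-round matcher for a single job; not Flink's implementation. */
final class MatchingSketch {
    /** Number of profile comparisons performed so far, kept only to illustrate the cost. */
    static long comparisons = 0;

    /** Consumes the first slot equal to the required profile; returns false if none matches. */
    static boolean takeMatching(List<String> slots, String required) {
        for (int i = 0; i < slots.size(); i++) {
            comparisons++;
            if (slots.get(i).equals(required)) {
                slots.remove(i);
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the requirements that neither a free nor a pending slot could fulfill,
     * i.e. those for which new workers would have to be requested.
     * Both slot lists must be mutable, since matched slots are removed from them.
     */
    static List<String> match(List<String> required, List<String> free, List<String> pending) {
        // Round 1: match against free slots.
        List<String> unfulfilled = new ArrayList<>();
        for (String requirement : required) {
            if (!takeMatching(free, requirement)) {
                unfulfilled.add(requirement);
            }
        }
        // Round 2: match the remainder against pending slots.
        List<String> needNewWorkers = new ArrayList<>();
        for (String requirement : unfulfilled) {
            if (!takeMatching(pending, requirement)) {
                needNewWorkers.add(requirement);
            }
        }
        return needNewWorkers;
    }
}
```

With R = 3 mutually distinct requirements and S = 4 non-matching slots (two free, two pending), this sketch performs 3 * 4 = 12 comparisons for the one job, which is the J*R*S worst case from the Javadoc with J = 1.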

Aggregations

ResourceCounter (org.apache.flink.runtime.util.ResourceCounter) 39
Test (org.junit.Test) 24
ResourceProfile (org.apache.flink.runtime.clusterframework.types.ResourceProfile) 10
SlotOffer (org.apache.flink.runtime.taskexecutor.slot.SlotOffer) 8
FlinkException (org.apache.flink.util.FlinkException) 7
Map (java.util.Map) 5
JobID (org.apache.flink.api.common.JobID) 5
AllocationID (org.apache.flink.runtime.clusterframework.types.AllocationID) 5
LocalTaskManagerLocation (org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation) 5
ArrayList (java.util.ArrayList) 4
TestingTaskExecutorGateway (org.apache.flink.runtime.taskexecutor.TestingTaskExecutorGateway) 4
TestingTaskExecutorGatewayBuilder (org.apache.flink.runtime.taskexecutor.TestingTaskExecutorGatewayBuilder) 4
HashMap (java.util.HashMap) 3
Time (org.apache.flink.api.common.time.Time) 3
ArchivedExecutionGraphTest (org.apache.flink.runtime.executiongraph.ArchivedExecutionGraphTest) 3
DefaultSchedulerTest (org.apache.flink.runtime.scheduler.DefaultSchedulerTest) 3
ResourceRequirement (org.apache.flink.runtime.slots.ResourceRequirement) 3
CompletableFuture (java.util.concurrent.CompletableFuture) 2
InstanceID (org.apache.flink.runtime.instance.InstanceID) 2
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph) 2