
Example 86 with ExecutionVertexID

use of org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID in project flink by splunk.

the class DefaultPreferredLocationsRetriever method getInputLocationFutures.

private Collection<CompletableFuture<TaskManagerLocation>> getInputLocationFutures(final Set<ExecutionVertexID> producersToIgnore, final Collection<ExecutionVertexID> producers) {
    final Collection<CompletableFuture<TaskManagerLocation>> locationsFutures = new ArrayList<>();
    for (ExecutionVertexID producer : producers) {
        final Optional<CompletableFuture<TaskManagerLocation>> optionalLocationFuture;
        if (!producersToIgnore.contains(producer)) {
            optionalLocationFuture = inputsLocationsRetriever.getTaskManagerLocation(producer);
        } else {
            optionalLocationFuture = Optional.empty();
        }
        optionalLocationFuture.ifPresent(locationsFutures::add);
        // Inputs with too many distinct locations are not considered, because it could
        // be a long time to wait for all the location futures to complete.
        if (locationsFutures.size() > MAX_DISTINCT_LOCATIONS_TO_CONSIDER) {
            return Collections.emptyList();
        }
    }
    return locationsFutures;
}
Also used : CompletableFuture(java.util.concurrent.CompletableFuture) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) ArrayList(java.util.ArrayList)
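
The bail-out pattern in getInputLocationFutures can be tried outside Flink with a minimal sketch like the one below. It is an illustration under assumptions, not Flink code: the class BoundedFutureCollector, the generic lookup function, and the cap value of 2 are all hypothetical (the real value of MAX_DISTINCT_LOCATIONS_TO_CONSIDER is not shown in the snippet above).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class BoundedFutureCollector {
    // Hypothetical cap, chosen small so the behavior is easy to observe.
    static final int MAX_TO_CONSIDER = 2;

    /**
     * Collects the futures returned by lookup for every producer that is not ignored.
     * Gives up and returns an empty collection as soon as more than MAX_TO_CONSIDER
     * futures have been gathered.
     */
    static <K, V> Collection<CompletableFuture<V>> collect(
            Set<K> toIgnore,
            Collection<K> producers,
            Function<K, Optional<CompletableFuture<V>>> lookup) {
        final List<CompletableFuture<V>> futures = new ArrayList<>();
        for (K producer : producers) {
            if (toIgnore.contains(producer)) {
                continue; // ignored producers contribute nothing
            }
            lookup.apply(producer).ifPresent(futures::add);
            if (futures.size() > MAX_TO_CONSIDER) {
                return Collections.emptyList(); // too many inputs to wait for
            }
        }
        return futures;
    }
}
```

With the cap at 2, two resolvable producers yield two futures, while a third one makes the method return an empty collection instead of a partial result.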

Example 87 with ExecutionVertexID

use of org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID in project flink by splunk.

the class DefaultScheduler method assignResource.

private BiFunction<LogicalSlot, Throwable, LogicalSlot> assignResource(final DeploymentHandle deploymentHandle) {
    final ExecutionVertexVersion requiredVertexVersion = deploymentHandle.getRequiredVertexVersion();
    final ExecutionVertexID executionVertexId = deploymentHandle.getExecutionVertexId();
    return (logicalSlot, throwable) -> {
        if (executionVertexVersioner.isModified(requiredVertexVersion)) {
            if (throwable == null) {
                log.debug("Refusing to assign slot to execution vertex {} because this deployment was superseded by another deployment", executionVertexId);
                releaseSlotIfPresent(logicalSlot);
            }
            return null;
        }
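        // Throw only if the execution vertex version is not outdated, so that canceling a
        // pending slot request does not fail a task which is about to cancel in
        // #restartTasksWithDelay(...).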
        if (throwable != null) {
            throw new CompletionException(maybeWrapWithNoResourceAvailableException(throwable));
        }
        final ExecutionVertex executionVertex = getExecutionVertex(executionVertexId);
        executionVertex.tryAssignResource(logicalSlot);
        startReserveAllocation(executionVertexId, logicalSlot.getAllocationId());
        return logicalSlot;
    };
}
Also used : ShuffleMaster(org.apache.flink.runtime.shuffle.ShuffleMaster) TaskManagerLocation(org.apache.flink.runtime.taskmanager.TaskManagerLocation) BiFunction(java.util.function.BiFunction) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) TimeoutException(java.util.concurrent.TimeoutException) ExceptionUtils(org.apache.flink.util.ExceptionUtils) Vertex(org.apache.flink.runtime.topology.Vertex) Map(java.util.Map) SchedulingTopology(org.apache.flink.runtime.scheduler.strategy.SchedulingTopology) Preconditions.checkNotNull(org.apache.flink.util.Preconditions.checkNotNull) CoLocationGroup(org.apache.flink.runtime.jobmanager.scheduler.CoLocationGroup) SchedulingStrategyFactory(org.apache.flink.runtime.scheduler.strategy.SchedulingStrategyFactory) ScheduledExecutor(org.apache.flink.util.concurrent.ScheduledExecutor) JobManagerJobMetricGroup(org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup) Collection(java.util.Collection) Set(java.util.Set) CompletionException(java.util.concurrent.CompletionException) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) Collectors(java.util.stream.Collectors) ResourceProfile(org.apache.flink.runtime.clusterframework.types.ResourceProfile) List(java.util.List) FailoverStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.FailoverStrategy) Optional(java.util.Optional) ExecutionFailureHandler(org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler) Time(org.apache.flink.api.common.time.Time) AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID) NoResourceAvailableException(org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException) IntermediateResultPartitionID(org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID) ComponentMainThreadExecutor(org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor) SlotSharingGroup(org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup) 
HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) JobStatus(org.apache.flink.api.common.JobStatus) Function(java.util.function.Function) ArrayList(java.util.ArrayList) FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult) HashSet(java.util.HashSet) OperatorCoordinatorHolder(org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder) FutureUtils(org.apache.flink.util.concurrent.FutureUtils) SchedulingStrategy(org.apache.flink.runtime.scheduler.strategy.SchedulingStrategy) Nullable(javax.annotation.Nullable) Preconditions.checkState(org.apache.flink.util.Preconditions.checkState) ExecutionJobVertex(org.apache.flink.runtime.executiongraph.ExecutionJobVertex) Logger(org.slf4j.Logger) Executor(java.util.concurrent.Executor) Configuration(org.apache.flink.configuration.Configuration) ExecutionState(org.apache.flink.runtime.execution.ExecutionState) CheckpointsCleaner(org.apache.flink.runtime.checkpoint.CheckpointsCleaner) LogicalSlot(org.apache.flink.runtime.jobmaster.LogicalSlot) IterableUtils(org.apache.flink.util.IterableUtils) JobStatusListener(org.apache.flink.runtime.executiongraph.JobStatusListener) CheckpointRecoveryFactory(org.apache.flink.runtime.checkpoint.CheckpointRecoveryFactory) RestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy) TimeUnit(java.util.concurrent.TimeUnit) Consumer(java.util.function.Consumer) FailureHandlingResultSnapshot(org.apache.flink.runtime.scheduler.exceptionhistory.FailureHandlingResultSnapshot) TaskExecutionStateTransition(org.apache.flink.runtime.executiongraph.TaskExecutionStateTransition) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex)
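
The version check in assignResource can be illustrated without Flink by the following sketch. It assumes the returned BiFunction is attached with CompletableFuture#handle, which the snippet above does not show. The class VersionGuardedAssignment, the use of a plain counter as the "version", and String as a stand-in for LogicalSlot are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;

class VersionGuardedAssignment {
    private final AtomicLong currentVersion = new AtomicLong();
    private final List<String> releasedSlots = new ArrayList<>();

    long currentVersion() {
        return currentVersion.get();
    }

    // Simulates a newer deployment superseding all earlier ones.
    void bumpVersion() {
        currentVersion.incrementAndGet();
    }

    List<String> releasedSlots() {
        return releasedSlots;
    }

    BiFunction<String, Throwable, String> assign(long requiredVersion) {
        return (slot, throwable) -> {
            if (currentVersion.get() != requiredVersion) {
                // Superseded: hand the slot back if one arrived, and swallow any error.
                if (throwable == null) {
                    releasedSlots.add(slot);
                }
                return null;
            }
            if (throwable != null) {
                // Still current: propagate the failure to downstream stages.
                throw new CompletionException(throwable);
            }
            return slot;
        };
    }
}
```

A callback created for an outdated version completes normally with null even when the slot request failed, so a canceled request does not fail the newer deployment. A callback for the current version rethrows the error.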

Example 88 with ExecutionVertexID

use of org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID in project flink by splunk.

the class DefaultScheduler method restartTasksWithDelay.

private void restartTasksWithDelay(final FailureHandlingResult failureHandlingResult) {
    final Set<ExecutionVertexID> verticesToRestart = failureHandlingResult.getVerticesToRestart();
    final Set<ExecutionVertexVersion> executionVertexVersions = new HashSet<>(executionVertexVersioner.recordVertexModifications(verticesToRestart).values());
    final boolean globalRecovery = failureHandlingResult.isGlobalFailure();
    addVerticesToRestartPending(verticesToRestart);
    final CompletableFuture<?> cancelFuture = cancelTasksAsync(verticesToRestart);
    final FailureHandlingResultSnapshot failureHandlingResultSnapshot = FailureHandlingResultSnapshot.create(failureHandlingResult, id -> this.getExecutionVertex(id).getCurrentExecutionAttempt());
    delayExecutor.schedule(() -> FutureUtils.assertNoException(cancelFuture.thenRunAsync(() -> {
        archiveFromFailureHandlingResult(failureHandlingResultSnapshot);
        restartTasks(executionVertexVersions, globalRecovery);
    }, getMainThreadExecutor())), failureHandlingResult.getRestartDelayMS(), TimeUnit.MILLISECONDS);
}
Also used : ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) FailureHandlingResultSnapshot(org.apache.flink.runtime.scheduler.exceptionhistory.FailureHandlingResultSnapshot) HashSet(java.util.HashSet)
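
The scheduling shape of restartTasksWithDelay (wait for a delay, then run the restart once cancellation has finished, on a chosen executor) can be sketched with plain JDK classes. This is a simplified illustration: the class DelayedRestart and its parameters are hypothetical, java.util.concurrent.ScheduledExecutorService stands in for Flink's ScheduledExecutor, and a returned future replaces FutureUtils.assertNoException so that errors are visible to the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DelayedRestart {
    /**
     * After delayMs, chains restart onto cancelFuture so that it runs on
     * mainThreadExecutor once cancellation has completed.
     * The returned future completes when the restart action has run or failed.
     */
    static CompletableFuture<Void> restartAfter(
            CompletableFuture<?> cancelFuture,
            Runnable restart,
            long delayMs,
            ScheduledExecutorService delayExecutor,
            Executor mainThreadExecutor) {
        final CompletableFuture<Void> done = new CompletableFuture<>();
        delayExecutor.schedule(
                () -> {
                    cancelFuture
                            .thenRunAsync(restart, mainThreadExecutor)
                            .whenComplete((ignored, error) -> {
                                if (error != null) {
                                    done.completeExceptionally(error);
                                } else {
                                    done.complete(null);
                                }
                            });
                },
                delayMs,
                TimeUnit.MILLISECONDS);
        return done;
    }
}
```

The restart waits for both conditions: the delay has elapsed and the cancel future has completed. If cancellation takes longer than the delay, the restart simply runs later.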

Example 89 with ExecutionVertexID

use of org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID in project flink by splunk.

the class DefaultScheduler method deployOrHandleError.

private BiFunction<Object, Throwable, Void> deployOrHandleError(final DeploymentHandle deploymentHandle) {
    final ExecutionVertexVersion requiredVertexVersion = deploymentHandle.getRequiredVertexVersion();
    final ExecutionVertexID executionVertexId = requiredVertexVersion.getExecutionVertexId();
    return (ignored, throwable) -> {
        if (executionVertexVersioner.isModified(requiredVertexVersion)) {
            log.debug("Refusing to deploy execution vertex {} because this deployment was superseded by another deployment", executionVertexId);
            return null;
        }
        if (throwable == null) {
            deployTaskSafe(executionVertexId);
        } else {
            handleTaskDeploymentFailure(executionVertexId, throwable);
        }
        return null;
    };
}
Also used : ShuffleMaster(org.apache.flink.runtime.shuffle.ShuffleMaster) TaskManagerLocation(org.apache.flink.runtime.taskmanager.TaskManagerLocation) BiFunction(java.util.function.BiFunction) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) TimeoutException(java.util.concurrent.TimeoutException) ExceptionUtils(org.apache.flink.util.ExceptionUtils) Vertex(org.apache.flink.runtime.topology.Vertex) Map(java.util.Map) SchedulingTopology(org.apache.flink.runtime.scheduler.strategy.SchedulingTopology) Preconditions.checkNotNull(org.apache.flink.util.Preconditions.checkNotNull) CoLocationGroup(org.apache.flink.runtime.jobmanager.scheduler.CoLocationGroup) SchedulingStrategyFactory(org.apache.flink.runtime.scheduler.strategy.SchedulingStrategyFactory) ScheduledExecutor(org.apache.flink.util.concurrent.ScheduledExecutor) JobManagerJobMetricGroup(org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup) Collection(java.util.Collection) Set(java.util.Set) CompletionException(java.util.concurrent.CompletionException) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) Collectors(java.util.stream.Collectors) ResourceProfile(org.apache.flink.runtime.clusterframework.types.ResourceProfile) List(java.util.List) FailoverStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.FailoverStrategy) Optional(java.util.Optional) ExecutionFailureHandler(org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler) Time(org.apache.flink.api.common.time.Time) AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID) NoResourceAvailableException(org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException) IntermediateResultPartitionID(org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID) ComponentMainThreadExecutor(org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor) SlotSharingGroup(org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup) 
HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) JobStatus(org.apache.flink.api.common.JobStatus) Function(java.util.function.Function) ArrayList(java.util.ArrayList) FailureHandlingResult(org.apache.flink.runtime.executiongraph.failover.flip1.FailureHandlingResult) HashSet(java.util.HashSet) OperatorCoordinatorHolder(org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder) FutureUtils(org.apache.flink.util.concurrent.FutureUtils) SchedulingStrategy(org.apache.flink.runtime.scheduler.strategy.SchedulingStrategy) Nullable(javax.annotation.Nullable) Preconditions.checkState(org.apache.flink.util.Preconditions.checkState) ExecutionJobVertex(org.apache.flink.runtime.executiongraph.ExecutionJobVertex) Logger(org.slf4j.Logger) Executor(java.util.concurrent.Executor) Configuration(org.apache.flink.configuration.Configuration) ExecutionState(org.apache.flink.runtime.execution.ExecutionState) CheckpointsCleaner(org.apache.flink.runtime.checkpoint.CheckpointsCleaner) LogicalSlot(org.apache.flink.runtime.jobmaster.LogicalSlot) IterableUtils(org.apache.flink.util.IterableUtils) JobStatusListener(org.apache.flink.runtime.executiongraph.JobStatusListener) CheckpointRecoveryFactory(org.apache.flink.runtime.checkpoint.CheckpointRecoveryFactory) RestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy) TimeUnit(java.util.concurrent.TimeUnit) Consumer(java.util.function.Consumer) FailureHandlingResultSnapshot(org.apache.flink.runtime.scheduler.exceptionhistory.FailureHandlingResultSnapshot) TaskExecutionStateTransition(org.apache.flink.runtime.executiongraph.TaskExecutionStateTransition) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex)

Example 90 with ExecutionVertexID

use of org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID in project flink by splunk.

the class UpdatePartitionConsumersTest method testUpdatePartitionConsumers.

/**
 * Tests that BLOCKING partition information is properly updated to consumers when its producer
 * finishes.
 */
@Test
public void testUpdatePartitionConsumers() throws Exception {
    final SimpleAckingTaskManagerGateway taskManagerGateway = new SimpleAckingTaskManagerGateway();
    final SchedulerBase scheduler = SchedulerTestingUtils.newSchedulerBuilder(jobGraph, ComponentMainThreadExecutorServiceAdapter.forMainThread()).setExecutionSlotAllocatorFactory(new TestExecutionSlotAllocatorFactory(taskManagerGateway)).build();
    final ExecutionVertex ev1 = scheduler.getExecutionVertex(new ExecutionVertexID(v1.getID(), 0));
    final ExecutionVertex ev2 = scheduler.getExecutionVertex(new ExecutionVertexID(v2.getID(), 0));
    final ExecutionVertex ev3 = scheduler.getExecutionVertex(new ExecutionVertexID(v3.getID(), 0));
    final ExecutionVertex ev4 = scheduler.getExecutionVertex(new ExecutionVertexID(v4.getID(), 0));
    final CompletableFuture<TaskDeploymentDescriptor> ev4TddFuture = new CompletableFuture<>();
    taskManagerGateway.setSubmitConsumer(tdd -> {
        if (tdd.getExecutionAttemptId().equals(ev4.getCurrentExecutionAttempt().getAttemptId())) {
            ev4TddFuture.complete(tdd);
        }
    });
    scheduler.startScheduling();
    assertThat(ev1.getExecutionState(), is(ExecutionState.DEPLOYING));
    assertThat(ev2.getExecutionState(), is(ExecutionState.DEPLOYING));
    assertThat(ev3.getExecutionState(), is(ExecutionState.DEPLOYING));
    assertThat(ev4.getExecutionState(), is(ExecutionState.DEPLOYING));
    updateState(scheduler, ev1, ExecutionState.INITIALIZING);
    updateState(scheduler, ev1, ExecutionState.RUNNING);
    updateState(scheduler, ev2, ExecutionState.INITIALIZING);
    updateState(scheduler, ev2, ExecutionState.RUNNING);
    updateState(scheduler, ev3, ExecutionState.INITIALIZING);
    updateState(scheduler, ev3, ExecutionState.RUNNING);
    updateState(scheduler, ev4, ExecutionState.INITIALIZING);
    updateState(scheduler, ev4, ExecutionState.RUNNING);
    final InputGateDeploymentDescriptor ev4Igdd2 = ev4TddFuture.get(TIMEOUT, TimeUnit.MILLISECONDS).getInputGates().get(1);
    assertThat(ev4Igdd2.getShuffleDescriptors()[0], instanceOf(UnknownShuffleDescriptor.class));
    final CompletableFuture<Void> updatePartitionFuture = new CompletableFuture<>();
    taskManagerGateway.setUpdatePartitionsConsumer((attemptId, partitionInfos, time) -> {
        assertThat(attemptId, equalTo(ev4.getCurrentExecutionAttempt().getAttemptId()));
        final List<PartitionInfo> partitionInfoList = IterableUtils.toStream(partitionInfos).collect(Collectors.toList());
        assertThat(partitionInfoList, hasSize(1));
        final PartitionInfo partitionInfo = partitionInfoList.get(0);
        assertThat(partitionInfo.getIntermediateDataSetID(), equalTo(v3.getProducedDataSets().get(0).getId()));
        assertThat(partitionInfo.getShuffleDescriptor(), instanceOf(NettyShuffleDescriptor.class));
        updatePartitionFuture.complete(null);
    });
    updateState(scheduler, ev1, ExecutionState.FINISHED);
    updateState(scheduler, ev3, ExecutionState.FINISHED);
    updatePartitionFuture.get(TIMEOUT, TimeUnit.MILLISECONDS);
}
Also used : NettyShuffleDescriptor(org.apache.flink.runtime.shuffle.NettyShuffleDescriptor) TestExecutionSlotAllocatorFactory(org.apache.flink.runtime.scheduler.TestExecutionSlotAllocatorFactory) InputGateDeploymentDescriptor(org.apache.flink.runtime.deployment.InputGateDeploymentDescriptor) UnknownShuffleDescriptor(org.apache.flink.runtime.shuffle.UnknownShuffleDescriptor) ExecutionVertex(org.apache.flink.runtime.executiongraph.ExecutionVertex) SimpleAckingTaskManagerGateway(org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway) CompletableFuture(java.util.concurrent.CompletableFuture) ExecutionVertexID(org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) SchedulerBase(org.apache.flink.runtime.scheduler.SchedulerBase) TaskDeploymentDescriptor(org.apache.flink.runtime.deployment.TaskDeploymentDescriptor) PartitionInfo(org.apache.flink.runtime.executiongraph.PartitionInfo) Test(org.junit.Test)
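
The test above captures a value passed to a gateway callback by completing a CompletableFuture from inside the callback and then waiting on it with a timeout. A minimal standalone sketch of that technique follows. FakeGateway and CallbackCaptureExample are hypothetical stand-ins and do not reproduce the API of SimpleAckingTaskManagerGateway. Strings replace the deployment descriptors.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical gateway that forwards every submitted descriptor to a pluggable consumer.
class FakeGateway {
    private Consumer<String> submitConsumer = ignored -> {};

    void setSubmitConsumer(Consumer<String> consumer) {
        this.submitConsumer = consumer;
    }

    void submit(String descriptor) {
        submitConsumer.accept(descriptor);
    }
}

class CallbackCaptureExample {
    /**
     * Installs a consumer that records the first descriptor starting with wantedPrefix,
     * runs trigger, and waits up to one second for the recorded value.
     */
    static String captureFirstMatching(FakeGateway gateway, String wantedPrefix, Runnable trigger) {
        final CompletableFuture<String> captured = new CompletableFuture<>();
        gateway.setSubmitConsumer(descriptor -> {
            if (descriptor.startsWith(wantedPrefix)) {
                captured.complete(descriptor); // later matches are ignored by complete()
            }
        });
        trigger.run();
        try {
            return captured.get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("no matching descriptor was submitted", e);
        }
    }
}
```

Waiting with a timeout keeps a test from hanging forever when the expected callback never fires, which is the role TIMEOUT plays in the test above.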

Aggregations

ExecutionVertexID (org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID) 231
Test (org.junit.Test) 165
JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID) 63
JobGraph (org.apache.flink.runtime.jobgraph.JobGraph) 57
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex) 54
SchedulingExecutionVertex (org.apache.flink.runtime.scheduler.strategy.SchedulingExecutionVertex) 51
Set (java.util.Set) 48
IntermediateResultPartitionID (org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID) 45
AdaptiveSchedulerTest (org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerTest) 45
TestingSchedulingExecutionVertex (org.apache.flink.runtime.scheduler.strategy.TestingSchedulingExecutionVertex) 45
Collection (java.util.Collection) 33
TestingSchedulingTopology (org.apache.flink.runtime.scheduler.strategy.TestingSchedulingTopology) 33
HashSet (java.util.HashSet) 30
ExecutionVertex (org.apache.flink.runtime.executiongraph.ExecutionVertex) 30
ArrayList (java.util.ArrayList) 27
Map (java.util.Map) 27
HashMap (java.util.HashMap) 24
List (java.util.List) 24
CompletableFuture (java.util.concurrent.CompletableFuture) 24
TaskManagerLocation (org.apache.flink.runtime.taskmanager.TaskManagerLocation) 24