Example 1 with RestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy in the Apache Flink project.

Class AdaptiveBatchSchedulerFactory, method createInstance:

@Override
public SchedulerNG createInstance(Logger log, JobGraph jobGraph, Executor ioExecutor, Configuration jobMasterConfiguration, SlotPoolService slotPoolService, ScheduledExecutorService futureExecutor, ClassLoader userCodeLoader, CheckpointRecoveryFactory checkpointRecoveryFactory, Time rpcTimeout, BlobWriter blobWriter, JobManagerJobMetricGroup jobManagerJobMetricGroup, Time slotRequestTimeout, ShuffleMaster<?> shuffleMaster, JobMasterPartitionTracker partitionTracker, ExecutionDeploymentTracker executionDeploymentTracker, long initializationTimestamp, ComponentMainThreadExecutor mainThreadExecutor, FatalErrorHandler fatalErrorHandler, JobStatusListener jobStatusListener) throws Exception {
    checkState(jobGraph.getJobType() == JobType.BATCH, "Adaptive batch scheduler only supports batch jobs");
    checkAllExchangesBlocking(jobGraph);
    final SlotPool slotPool = slotPoolService.castInto(SlotPool.class).orElseThrow(() -> new IllegalStateException("The DefaultScheduler requires a SlotPool."));
    final SlotSelectionStrategy slotSelectionStrategy = SlotSelectionStrategyUtils.selectSlotSelectionStrategy(JobType.BATCH, jobMasterConfiguration);
    final PhysicalSlotRequestBulkChecker bulkChecker = PhysicalSlotRequestBulkCheckerImpl.createFromSlotPool(slotPool, SystemClock.getInstance());
    final PhysicalSlotProvider physicalSlotProvider = new PhysicalSlotProviderImpl(slotSelectionStrategy, slotPool);
    final ExecutionSlotAllocatorFactory allocatorFactory = new SlotSharingExecutionSlotAllocatorFactory(physicalSlotProvider, false, bulkChecker, slotRequestTimeout);
    final RestartBackoffTimeStrategy restartBackoffTimeStrategy = RestartBackoffTimeStrategyFactoryLoader.createRestartBackoffTimeStrategyFactory(jobGraph.getSerializedExecutionConfig().deserializeValue(userCodeLoader).getRestartStrategy(), jobMasterConfiguration, jobGraph.isCheckpointingEnabled()).create();
    log.info("Using restart back off time strategy {} for {} ({}).", restartBackoffTimeStrategy, jobGraph.getName(), jobGraph.getJobID());
    final ExecutionGraphFactory executionGraphFactory = new DefaultExecutionGraphFactory(jobMasterConfiguration, userCodeLoader, executionDeploymentTracker, futureExecutor, ioExecutor, rpcTimeout, jobManagerJobMetricGroup, blobWriter, shuffleMaster, partitionTracker, true);
    return new AdaptiveBatchScheduler(log, jobGraph, ioExecutor, jobMasterConfiguration, bulkChecker::start, new ScheduledExecutorServiceAdapter(futureExecutor), userCodeLoader, new CheckpointsCleaner(), checkpointRecoveryFactory, jobManagerJobMetricGroup, new VertexwiseSchedulingStrategy.Factory(), FailoverStrategyFactoryLoader.loadFailoverStrategyFactory(jobMasterConfiguration), restartBackoffTimeStrategy, new DefaultExecutionVertexOperations(), new ExecutionVertexVersioner(), allocatorFactory, initializationTimestamp, mainThreadExecutor, jobStatusListener, executionGraphFactory, shuffleMaster, rpcTimeout, DefaultVertexParallelismDecider.from(jobMasterConfiguration), jobMasterConfiguration.getInteger(JobManagerOptions.ADAPTIVE_BATCH_SCHEDULER_MAX_PARALLELISM));
}
Also used : DefaultExecutionVertexOperations(org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations) SlotSharingExecutionSlotAllocatorFactory(org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocatorFactory) SlotSelectionStrategy(org.apache.flink.runtime.jobmaster.slotpool.SlotSelectionStrategy) VertexwiseSchedulingStrategy(org.apache.flink.runtime.scheduler.strategy.VertexwiseSchedulingStrategy) SlotPool(org.apache.flink.runtime.jobmaster.slotpool.SlotPool) PhysicalSlotRequestBulkChecker(org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkChecker) PhysicalSlotProviderImpl(org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotProviderImpl) ExecutionSlotAllocatorFactory(org.apache.flink.runtime.scheduler.ExecutionSlotAllocatorFactory) ScheduledExecutorServiceAdapter(org.apache.flink.util.concurrent.ScheduledExecutorServiceAdapter) RestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy) CheckpointsCleaner(org.apache.flink.runtime.checkpoint.CheckpointsCleaner) DefaultExecutionGraphFactory(org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory) ExecutionVertexVersioner(org.apache.flink.runtime.scheduler.ExecutionVertexVersioner) ExecutionGraphFactory(org.apache.flink.runtime.scheduler.ExecutionGraphFactory) PhysicalSlotProvider(org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotProvider)

Example 2 with RestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy in the Apache Flink project.

Class AdaptiveSchedulerFactory, method createInstance:

@Override
public SchedulerNG createInstance(Logger log, JobGraph jobGraph, Executor ioExecutor, Configuration jobMasterConfiguration, SlotPoolService slotPoolService, ScheduledExecutorService futureExecutor, ClassLoader userCodeLoader, CheckpointRecoveryFactory checkpointRecoveryFactory, Time rpcTimeout, BlobWriter blobWriter, JobManagerJobMetricGroup jobManagerJobMetricGroup, Time slotRequestTimeout, ShuffleMaster<?> shuffleMaster, JobMasterPartitionTracker partitionTracker, ExecutionDeploymentTracker executionDeploymentTracker, long initializationTimestamp, ComponentMainThreadExecutor mainThreadExecutor, FatalErrorHandler fatalErrorHandler, JobStatusListener jobStatusListener) throws Exception {
    final DeclarativeSlotPool declarativeSlotPool = slotPoolService.castInto(DeclarativeSlotPool.class).orElseThrow(() -> new IllegalStateException("The AdaptiveScheduler requires a DeclarativeSlotPool."));
    final RestartBackoffTimeStrategy restartBackoffTimeStrategy = RestartBackoffTimeStrategyFactoryLoader.createRestartBackoffTimeStrategyFactory(jobGraph.getSerializedExecutionConfig().deserializeValue(userCodeLoader).getRestartStrategy(), jobMasterConfiguration, jobGraph.isCheckpointingEnabled()).create();
    log.info("Using restart back off time strategy {} for {} ({}).", restartBackoffTimeStrategy, jobGraph.getName(), jobGraph.getJobID());
    final SlotSharingSlotAllocator slotAllocator = createSlotSharingSlotAllocator(declarativeSlotPool);
    final ExecutionGraphFactory executionGraphFactory = new DefaultExecutionGraphFactory(jobMasterConfiguration, userCodeLoader, executionDeploymentTracker, futureExecutor, ioExecutor, rpcTimeout, jobManagerJobMetricGroup, blobWriter, shuffleMaster, partitionTracker);
    return new AdaptiveScheduler(jobGraph, jobMasterConfiguration, declarativeSlotPool, slotAllocator, ioExecutor, userCodeLoader, new CheckpointsCleaner(), checkpointRecoveryFactory, initialResourceAllocationTimeout, resourceStabilizationTimeout, jobManagerJobMetricGroup, restartBackoffTimeStrategy, initializationTimestamp, mainThreadExecutor, fatalErrorHandler, jobStatusListener, executionGraphFactory);
}
Also used : DeclarativeSlotPool(org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPool) RestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy) CheckpointsCleaner(org.apache.flink.runtime.checkpoint.CheckpointsCleaner) DefaultExecutionGraphFactory(org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory) SlotSharingSlotAllocator(org.apache.flink.runtime.scheduler.adaptive.allocator.SlotSharingSlotAllocator) ExecutionGraphFactory(org.apache.flink.runtime.scheduler.ExecutionGraphFactory)

Example 3 with RestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy in the Apache Flink project.

Class DefaultSchedulerFactory, method createInstance:

@Override
public SchedulerNG createInstance(final Logger log, final JobGraph jobGraph, final Executor ioExecutor, final Configuration jobMasterConfiguration, final SlotPoolService slotPoolService, final ScheduledExecutorService futureExecutor, final ClassLoader userCodeLoader, final CheckpointRecoveryFactory checkpointRecoveryFactory, final Time rpcTimeout, final BlobWriter blobWriter, final JobManagerJobMetricGroup jobManagerJobMetricGroup, final Time slotRequestTimeout, final ShuffleMaster<?> shuffleMaster, final JobMasterPartitionTracker partitionTracker, final ExecutionDeploymentTracker executionDeploymentTracker, long initializationTimestamp, final ComponentMainThreadExecutor mainThreadExecutor, final FatalErrorHandler fatalErrorHandler, final JobStatusListener jobStatusListener) throws Exception {
    final SlotPool slotPool = slotPoolService.castInto(SlotPool.class).orElseThrow(() -> new IllegalStateException("The DefaultScheduler requires a SlotPool."));
    final DefaultSchedulerComponents schedulerComponents = createSchedulerComponents(jobGraph.getJobType(), jobGraph.isApproximateLocalRecoveryEnabled(), jobMasterConfiguration, slotPool, slotRequestTimeout);
    final RestartBackoffTimeStrategy restartBackoffTimeStrategy = RestartBackoffTimeStrategyFactoryLoader.createRestartBackoffTimeStrategyFactory(jobGraph.getSerializedExecutionConfig().deserializeValue(userCodeLoader).getRestartStrategy(), jobMasterConfiguration, jobGraph.isCheckpointingEnabled()).create();
    log.info("Using restart back off time strategy {} for {} ({}).", restartBackoffTimeStrategy, jobGraph.getName(), jobGraph.getJobID());
    final ExecutionGraphFactory executionGraphFactory = new DefaultExecutionGraphFactory(jobMasterConfiguration, userCodeLoader, executionDeploymentTracker, futureExecutor, ioExecutor, rpcTimeout, jobManagerJobMetricGroup, blobWriter, shuffleMaster, partitionTracker);
    return new DefaultScheduler(log, jobGraph, ioExecutor, jobMasterConfiguration, schedulerComponents.getStartUpAction(), new ScheduledExecutorServiceAdapter(futureExecutor), userCodeLoader, new CheckpointsCleaner(), checkpointRecoveryFactory, jobManagerJobMetricGroup, schedulerComponents.getSchedulingStrategyFactory(), FailoverStrategyFactoryLoader.loadFailoverStrategyFactory(jobMasterConfiguration), restartBackoffTimeStrategy, new DefaultExecutionVertexOperations(), new ExecutionVertexVersioner(), schedulerComponents.getAllocatorFactory(), initializationTimestamp, mainThreadExecutor, (jobId, jobStatus, timestamp) -> {
        if (jobStatus == JobStatus.RESTARTING) {
            slotPool.setIsJobRestarting(true);
        } else {
            slotPool.setIsJobRestarting(false);
        }
        jobStatusListener.jobStatusChanges(jobId, jobStatus, timestamp);
    }, executionGraphFactory, shuffleMaster, rpcTimeout);
}
Also used : ScheduledExecutorServiceAdapter(org.apache.flink.util.concurrent.ScheduledExecutorServiceAdapter) RestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy) CheckpointsCleaner(org.apache.flink.runtime.checkpoint.CheckpointsCleaner) SlotPool(org.apache.flink.runtime.jobmaster.slotpool.SlotPool)

Example 4 with RestartBackoffTimeStrategy

Use of org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy in the Apache Flink project.

Class AdaptiveSchedulerTest, method testHowToHandleFailureAllowedByStrategy:

@Test
public void testHowToHandleFailureAllowedByStrategy() throws Exception {
    final TestRestartBackoffTimeStrategy restartBackoffTimeStrategy = new TestRestartBackoffTimeStrategy(true, 1234);
    final AdaptiveScheduler scheduler = new AdaptiveSchedulerBuilder(createJobGraph(), mainThreadExecutor).setRestartBackoffTimeStrategy(restartBackoffTimeStrategy).build();
    final FailureResult failureResult = scheduler.howToHandleFailure(new Exception("test"));
    assertThat(failureResult.canRestart()).isTrue();
    assertThat(failureResult.getBackoffTime().toMillis()).isEqualTo(restartBackoffTimeStrategy.getBackoffTime());
}
Also used : TestRestartBackoffTimeStrategy(org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy) TaskNotRunningException(org.apache.flink.runtime.operators.coordination.TaskNotRunningException) CheckpointException(org.apache.flink.runtime.checkpoint.CheckpointException) FlinkException(org.apache.flink.util.FlinkException) PartitionProducerDisposedException(org.apache.flink.runtime.jobmanager.PartitionProducerDisposedException) IOException(java.io.IOException) ExecutionException(java.util.concurrent.ExecutionException) SuppressRestartsException(org.apache.flink.runtime.execution.SuppressRestartsException) Test(org.junit.Test) ArchivedExecutionGraphTest(org.apache.flink.runtime.executiongraph.ArchivedExecutionGraphTest) DefaultSchedulerTest(org.apache.flink.runtime.scheduler.DefaultSchedulerTest)
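
The test above relies on a strategy that answers two questions after a failure: may the job restart, and how long should it wait first. The following is a minimal, self-contained sketch of that contract, not Flink's own code. The names SimpleBackoffStrategy, FixedDelaySketchStrategy and BackoffSketch are hypothetical stand-ins introduced here for illustration, and the real RestartBackoffTimeStrategy interface may differ in signature between Flink versions.

```java
// Entry point comes first so the file can be launched directly, e.g. `java BackoffSketch.java`.
class BackoffSketch {
    public static void main(String[] args) {
        // Allow at most 2 restarts, each after a 1234 ms delay (illustrative values).
        SimpleBackoffStrategy strategy = new FixedDelaySketchStrategy(2, 1234L);
        for (int i = 1; i <= 3; i++) {
            strategy.notifyFailure(new Exception("failure " + i));
            System.out.println("failure " + i
                    + ": canRestart=" + strategy.canRestart()
                    + ", backoffMillis=" + strategy.getBackoffTime());
        }
    }
}

/** Hypothetical stand-in for the strategy contract; NOT the Flink interface itself. */
interface SimpleBackoffStrategy {
    /** Whether another restart is still permitted. */
    boolean canRestart();

    /** Delay in milliseconds to wait before the next restart. */
    long getBackoffTime();

    /** Records that a failure happened so later decisions can take it into account. */
    void notifyFailure(Throwable cause);
}

/** Assumed fixed-delay policy: a bounded number of restarts with a constant delay. */
final class FixedDelaySketchStrategy implements SimpleBackoffStrategy {
    private final int maxRestarts;
    private final long delayMillis;
    private int failures;

    FixedDelaySketchStrategy(int maxRestarts, long delayMillis) {
        if (maxRestarts < 0 || delayMillis < 0) {
            throw new IllegalArgumentException("maxRestarts and delayMillis must be non-negative");
        }
        this.maxRestarts = maxRestarts;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean canRestart() {
        // Restart is allowed until the number of recorded failures exceeds the budget.
        return failures <= maxRestarts;
    }

    @Override
    public long getBackoffTime() {
        return delayMillis;
    }

    @Override
    public void notifyFailure(Throwable cause) {
        failures++;
    }
}
```

In this sketch the caller reports the failure first and only then asks whether a restart is allowed, which matches how the test reads canRestart and the backoff time from the result of howToHandleFailure.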

Aggregations

CheckpointsCleaner (org.apache.flink.runtime.checkpoint.CheckpointsCleaner): 3
RestartBackoffTimeStrategy (org.apache.flink.runtime.executiongraph.failover.flip1.RestartBackoffTimeStrategy): 3
SlotPool (org.apache.flink.runtime.jobmaster.slotpool.SlotPool): 2
DefaultExecutionGraphFactory (org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory): 2
ExecutionGraphFactory (org.apache.flink.runtime.scheduler.ExecutionGraphFactory): 2
ScheduledExecutorServiceAdapter (org.apache.flink.util.concurrent.ScheduledExecutorServiceAdapter): 2
IOException (java.io.IOException): 1
ExecutionException (java.util.concurrent.ExecutionException): 1
CheckpointException (org.apache.flink.runtime.checkpoint.CheckpointException): 1
SuppressRestartsException (org.apache.flink.runtime.execution.SuppressRestartsException): 1
ArchivedExecutionGraphTest (org.apache.flink.runtime.executiongraph.ArchivedExecutionGraphTest): 1
TestRestartBackoffTimeStrategy (org.apache.flink.runtime.executiongraph.failover.flip1.TestRestartBackoffTimeStrategy): 1
PartitionProducerDisposedException (org.apache.flink.runtime.jobmanager.PartitionProducerDisposedException): 1
DeclarativeSlotPool (org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPool): 1
PhysicalSlotProvider (org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotProvider): 1
PhysicalSlotProviderImpl (org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotProviderImpl): 1
PhysicalSlotRequestBulkChecker (org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkChecker): 1
SlotSelectionStrategy (org.apache.flink.runtime.jobmaster.slotpool.SlotSelectionStrategy): 1
TaskNotRunningException (org.apache.flink.runtime.operators.coordination.TaskNotRunningException): 1
DefaultExecutionVertexOperations (org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations): 1