Example 96 with OneShotLatch

use of org.apache.flink.core.testutils.OneShotLatch in project flink by apache.

the class DispatcherCleanupITCase method createAndStartJobGraphStoreWithCleanupFailures.

private JobGraphStore createAndStartJobGraphStoreWithCleanupFailures(int numberOfCleanupFailures, Throwable throwable, AtomicInteger actualCleanupCallCount, OneShotLatch successfulCleanupLatch) throws Exception {
    final AtomicInteger failureCount = new AtomicInteger(numberOfCleanupFailures);
    final JobGraphStore jobGraphStore = TestingJobGraphStore.newBuilder().setGlobalCleanupFunction((ignoredJobId, ignoredExecutor) -> {
        actualCleanupCallCount.incrementAndGet();
        if (failureCount.getAndDecrement() > 0) {
            return FutureUtils.completedExceptionally(throwable);
        }
        successfulCleanupLatch.trigger();
        return FutureUtils.completedVoidFuture();
    }).build();
    jobGraphStore.start(null);
    return jobGraphStore;
}
Also used : CoreMatchers(org.hamcrest.CoreMatchers) Deadline(org.apache.flink.api.common.time.Deadline) RpcEndpoint(org.apache.flink.runtime.rpc.RpcEndpoint) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) IsEqual.equalTo(org.hamcrest.core.IsEqual.equalTo) CheckpointCoordinatorConfiguration(org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration) ExceptionUtils(org.apache.flink.util.ExceptionUtils) IsEmptyCollection(org.hamcrest.collection.IsEmptyCollection) PerJobCheckpointRecoveryFactory(org.apache.flink.runtime.checkpoint.PerJobCheckpointRecoveryFactory) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) After(org.junit.After) Duration(java.time.Duration) JobCheckpointingSettings(org.apache.flink.runtime.jobgraph.tasks.JobCheckpointingSettings) DispatcherResourceCleanerFactory(org.apache.flink.runtime.dispatcher.cleanup.DispatcherResourceCleanerFactory) BlockingQueue(java.util.concurrent.BlockingQueue) UUID(java.util.UUID) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) Collectors(java.util.stream.Collectors) CountDownLatch(java.util.concurrent.CountDownLatch) List(java.util.List) TimeUtils(org.apache.flink.util.TimeUtils) TestingJobResultStore(org.apache.flink.runtime.testutils.TestingJobResultStore) Optional(java.util.Optional) JobResultStore(org.apache.flink.runtime.highavailability.JobResultStore) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) FlinkMatchers(org.apache.flink.core.testutils.FlinkMatchers) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore) EmbeddedJobResultStore(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedJobResultStore) CompletableFuture(java.util.concurrent.CompletableFuture) JobStatus(org.apache.flink.api.common.JobStatus) AtomicReference(java.util.concurrent.atomic.AtomicReference) 
TaskDeploymentDescriptor(org.apache.flink.runtime.deployment.TaskDeploymentDescriptor) JobMasterGateway(org.apache.flink.runtime.jobmaster.JobMasterGateway) JobResult(org.apache.flink.runtime.jobmaster.JobResult) FutureUtils(org.apache.flink.util.concurrent.FutureUtils) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Before(org.junit.Before) JobGraphBuilder(org.apache.flink.runtime.jobgraph.JobGraphBuilder) TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) ExecutionState(org.apache.flink.runtime.execution.ExecutionState) JobMasterId(org.apache.flink.runtime.jobmaster.JobMasterId) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) RpcUtils(org.apache.flink.runtime.rpc.RpcUtils) ExecutionException(java.util.concurrent.ExecutionException) JobResultEntry(org.apache.flink.runtime.highavailability.JobResultEntry) JobID(org.apache.flink.api.common.JobID) UnregisteredMetricGroups(org.apache.flink.runtime.metrics.groups.UnregisteredMetricGroups) TestingRetryStrategies(org.apache.flink.runtime.dispatcher.cleanup.TestingRetryStrategies) ForkJoinPool(java.util.concurrent.ForkJoinPool) EmbeddedCompletedCheckpointStore(org.apache.flink.runtime.checkpoint.EmbeddedCompletedCheckpointStore) JobManagerRunner(org.apache.flink.runtime.jobmaster.JobManagerRunner) CommonTestUtils(org.apache.flink.runtime.testutils.CommonTestUtils) Assert(org.junit.Assert) Collections(java.util.Collections) NoOpInvokable(org.apache.flink.runtime.testtasks.NoOpInvokable) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore)
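
The helper above builds a cleanup function that fails a fixed number of times and triggers a latch on the first success. A minimal, self-contained sketch of that idea using only the JDK follows. It rests on assumptions: `CountDownLatch(1)` stands in for Flink's `OneShotLatch`, and the class `CleanupRetrySketch`, its methods, and the simple retry loop are hypothetical illustrations, not Flink's `TestingJobGraphStore` or its retry strategies.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-alone sketch; not part of Flink.
final class CleanupRetrySketch {

    /** Builds a cleanup action that fails the first {@code failures} calls, then succeeds. */
    static Supplier<CompletableFuture<Void>> failingCleanup(
            int failures, Throwable error, AtomicInteger callCount, CountDownLatch successLatch) {
        final AtomicInteger remainingFailures = new AtomicInteger(failures);
        return () -> {
            callCount.incrementAndGet();
            if (remainingFailures.getAndDecrement() > 0) {
                final CompletableFuture<Void> failed = new CompletableFuture<>();
                failed.completeExceptionally(error);
                return failed;
            }
            // CountDownLatch(1) plays the role of OneShotLatch#trigger() here.
            successLatch.countDown();
            return CompletableFuture.completedFuture(null);
        };
    }

    /**
     * Assumed retry driver: calls the cleanup once plus up to {@code maxRetries} retries.
     * Returns the number of calls made, or -1 if the success latch was never triggered.
     */
    static int callsUntilSuccess(int failures, int maxRetries) {
        final AtomicInteger callCount = new AtomicInteger();
        final CountDownLatch successLatch = new CountDownLatch(1);
        final Supplier<CompletableFuture<Void>> cleanup =
                failingCleanup(failures, new RuntimeException("expected"), callCount, successLatch);
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (!cleanup.get().isCompletedExceptionally()) {
                break;
            }
        }
        return successLatch.getCount() == 0 ? callCount.get() : -1;
    }

    public static void main(String[] args) {
        System.out.println(callsUntilSuccess(5, 5)); // 5 failures + 1 success
        System.out.println(callsUntilSuccess(1, 0)); // no retries left, never succeeds
    }
}
```

With five failures and five retries the sketch makes six calls, which mirrors the `numberOfErrors + 1` assertion used by the tests that call this helper.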

Example 97 with OneShotLatch

use of org.apache.flink.core.testutils.OneShotLatch in project flink by apache.

the class DispatcherCleanupITCase method testCleanupThroughRetries.

@Test
public void testCleanupThroughRetries() throws Exception {
    final JobGraph jobGraph = createJobGraph();
    final JobID jobId = jobGraph.getJobID();
    // Construct job graph store.
    final AtomicInteger actualGlobalCleanupCallCount = new AtomicInteger();
    final OneShotLatch successfulCleanupLatch = new OneShotLatch();
    final int numberOfErrors = 5;
    final RuntimeException temporaryError = new RuntimeException("Expected RuntimeException: Unable to remove job graph.");
    final JobGraphStore jobGraphStore = createAndStartJobGraphStoreWithCleanupFailures(numberOfErrors, temporaryError, actualGlobalCleanupCallCount, successfulCleanupLatch);
    haServices.setJobGraphStore(jobGraphStore);
    // Construct leader election service.
    final TestingLeaderElectionService leaderElectionService = new TestingLeaderElectionService();
    haServices.setJobMasterLeaderElectionService(jobId, leaderElectionService);
    // Start the dispatcher with enough retries on cleanup to outlast all injected failures.
    final JobManagerRunnerRegistry jobManagerRunnerRegistry = new DefaultJobManagerRunnerRegistry(2);
    final Dispatcher dispatcher = createTestingDispatcherBuilder().setResourceCleanerFactory(new DispatcherResourceCleanerFactory(ForkJoinPool.commonPool(), TestingRetryStrategies.createWithNumberOfRetries(numberOfErrors), jobManagerRunnerRegistry, haServices.getJobGraphStore(), blobServer, haServices, UnregisteredMetricGroups.createUnregisteredJobManagerMetricGroup())).build();
    dispatcher.start();
    toTerminate.add(dispatcher);
    leaderElectionService.isLeader(UUID.randomUUID());
    final DispatcherGateway dispatcherGateway = dispatcher.getSelfGateway(DispatcherGateway.class);
    dispatcherGateway.submitJob(jobGraph, TIMEOUT).get();
    waitForJobToFinish(leaderElectionService, dispatcherGateway, jobId);
    successfulCleanupLatch.await();
    assertThat(actualGlobalCleanupCallCount.get(), equalTo(numberOfErrors + 1));
    assertThat("The JobGraph should be removed from JobGraphStore.", haServices.getJobGraphStore().getJobIds(), IsEmptyCollection.empty());
    CommonTestUtils.waitUntilCondition(() -> haServices.getJobResultStore().hasJobResultEntry(jobId), Deadline.fromNow(Duration.ofMinutes(5)), "The JobResultStore should have this job marked as clean.");
}
Also used : TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore) RpcEndpoint(org.apache.flink.runtime.rpc.RpcEndpoint) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) DispatcherResourceCleanerFactory(org.apache.flink.runtime.dispatcher.cleanup.DispatcherResourceCleanerFactory) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)

Example 98 with OneShotLatch

use of org.apache.flink.core.testutils.OneShotLatch in project flink by apache.

the class DispatcherCleanupITCase method testCleanupAfterLeadershipChange.

@Test
public void testCleanupAfterLeadershipChange() throws Exception {
    final JobGraph jobGraph = createJobGraph();
    final JobID jobId = jobGraph.getJobID();
    // Construct job graph store.
    final AtomicInteger actualGlobalCleanupCallCount = new AtomicInteger();
    final OneShotLatch successfulCleanupLatch = new OneShotLatch();
    final RuntimeException temporaryError = new RuntimeException("Unable to remove job graph.");
    final JobGraphStore jobGraphStore = createAndStartJobGraphStoreWithCleanupFailures(1, temporaryError, actualGlobalCleanupCallCount, successfulCleanupLatch);
    haServices.setJobGraphStore(jobGraphStore);
    // Construct leader election service.
    final TestingLeaderElectionService leaderElectionService = new TestingLeaderElectionService();
    haServices.setJobMasterLeaderElectionService(jobId, leaderElectionService);
    // start the dispatcher with no retries on cleanup
    final CountDownLatch jobGraphRemovalErrorReceived = new CountDownLatch(1);
    final Dispatcher dispatcher = createTestingDispatcherBuilder().setFatalErrorHandler(throwable -> {
        final Optional<Throwable> maybeError = ExceptionUtils.findThrowable(throwable, temporaryError::equals);
        if (maybeError.isPresent()) {
            jobGraphRemovalErrorReceived.countDown();
        } else {
            testingFatalErrorHandlerResource.getFatalErrorHandler().onFatalError(throwable);
        }
    }).build();
    dispatcher.start();
    toTerminate.add(dispatcher);
    leaderElectionService.isLeader(UUID.randomUUID());
    final DispatcherGateway dispatcherGateway = dispatcher.getSelfGateway(DispatcherGateway.class);
    dispatcherGateway.submitJob(jobGraph, TIMEOUT).get();
    waitForJobToFinish(leaderElectionService, dispatcherGateway, jobId);
    jobGraphRemovalErrorReceived.await();
    // Remove job master leadership.
    leaderElectionService.notLeader();
    // This will clear internal state of election service, so a new contender can register.
    leaderElectionService.stop();
    assertThat(successfulCleanupLatch.isTriggered(), CoreMatchers.is(false));
    assertThat("The JobGraph is still stored in the JobGraphStore.", haServices.getJobGraphStore().getJobIds(), CoreMatchers.is(Collections.singleton(jobId)));
    assertThat("The JobResultStore has this job marked as dirty.", haServices.getJobResultStore().getDirtyResults().stream().map(JobResult::getJobId).collect(Collectors.toSet()), CoreMatchers.is(Collections.singleton(jobId)));
    // Run a second dispatcher, that restores our finished job.
    final Dispatcher secondDispatcher = createTestingDispatcherBuilder().setRecoveredDirtyJobs(haServices.getJobResultStore().getDirtyResults()).build();
    secondDispatcher.start();
    toTerminate.add(secondDispatcher);
    leaderElectionService.isLeader(UUID.randomUUID());
    CommonTestUtils.waitUntilCondition(() -> haServices.getJobResultStore().getDirtyResults().isEmpty(), Deadline.fromNow(TimeUtils.toDuration(TIMEOUT)));
    assertThat("The JobGraph is not stored in the JobGraphStore.", haServices.getJobGraphStore().getJobIds(), IsEmptyCollection.empty());
    assertTrue("The JobResultStore has the job listed as clean.", haServices.getJobResultStore().hasJobResultEntry(jobId));
    // wait for the successful cleanup to be triggered
    successfulCleanupLatch.await();
    assertThat(actualGlobalCleanupCallCount.get(), equalTo(2));
}
Also used : CoreMatchers(org.hamcrest.CoreMatchers) Deadline(org.apache.flink.api.common.time.Deadline) RpcEndpoint(org.apache.flink.runtime.rpc.RpcEndpoint) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) IsEqual.equalTo(org.hamcrest.core.IsEqual.equalTo) CheckpointCoordinatorConfiguration(org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration) ExceptionUtils(org.apache.flink.util.ExceptionUtils) IsEmptyCollection(org.hamcrest.collection.IsEmptyCollection) PerJobCheckpointRecoveryFactory(org.apache.flink.runtime.checkpoint.PerJobCheckpointRecoveryFactory) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) After(org.junit.After) Duration(java.time.Duration) JobCheckpointingSettings(org.apache.flink.runtime.jobgraph.tasks.JobCheckpointingSettings) DispatcherResourceCleanerFactory(org.apache.flink.runtime.dispatcher.cleanup.DispatcherResourceCleanerFactory) BlockingQueue(java.util.concurrent.BlockingQueue) UUID(java.util.UUID) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) Collectors(java.util.stream.Collectors) CountDownLatch(java.util.concurrent.CountDownLatch) List(java.util.List) TimeUtils(org.apache.flink.util.TimeUtils) TestingJobResultStore(org.apache.flink.runtime.testutils.TestingJobResultStore) Optional(java.util.Optional) JobResultStore(org.apache.flink.runtime.highavailability.JobResultStore) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) FlinkMatchers(org.apache.flink.core.testutils.FlinkMatchers) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore) EmbeddedJobResultStore(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedJobResultStore) CompletableFuture(java.util.concurrent.CompletableFuture) JobStatus(org.apache.flink.api.common.JobStatus) AtomicReference(java.util.concurrent.atomic.AtomicReference) 
TaskDeploymentDescriptor(org.apache.flink.runtime.deployment.TaskDeploymentDescriptor) JobMasterGateway(org.apache.flink.runtime.jobmaster.JobMasterGateway) JobResult(org.apache.flink.runtime.jobmaster.JobResult) FutureUtils(org.apache.flink.util.concurrent.FutureUtils) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) Before(org.junit.Before) JobGraphBuilder(org.apache.flink.runtime.jobgraph.JobGraphBuilder) TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) ExecutionState(org.apache.flink.runtime.execution.ExecutionState) JobMasterId(org.apache.flink.runtime.jobmaster.JobMasterId) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) RpcUtils(org.apache.flink.runtime.rpc.RpcUtils) ExecutionException(java.util.concurrent.ExecutionException) JobResultEntry(org.apache.flink.runtime.highavailability.JobResultEntry) JobID(org.apache.flink.api.common.JobID) UnregisteredMetricGroups(org.apache.flink.runtime.metrics.groups.UnregisteredMetricGroups) TestingRetryStrategies(org.apache.flink.runtime.dispatcher.cleanup.TestingRetryStrategies) ForkJoinPool(java.util.concurrent.ForkJoinPool) EmbeddedCompletedCheckpointStore(org.apache.flink.runtime.checkpoint.EmbeddedCompletedCheckpointStore) JobManagerRunner(org.apache.flink.runtime.jobmaster.JobManagerRunner) CommonTestUtils(org.apache.flink.runtime.testutils.CommonTestUtils) Assert(org.junit.Assert) Collections(java.util.Collections) NoOpInvokable(org.apache.flink.runtime.testtasks.NoOpInvokable) TestingLeaderElectionService(org.apache.flink.runtime.leaderelection.TestingLeaderElectionService) Optional(java.util.Optional) JobResult(org.apache.flink.runtime.jobmaster.JobResult) JobGraphStore(org.apache.flink.runtime.jobmanager.JobGraphStore) TestingJobGraphStore(org.apache.flink.runtime.testutils.TestingJobGraphStore) CountDownLatch(java.util.concurrent.CountDownLatch) JobGraph(org.apache.flink.runtime.jobgraph.JobGraph) 
AtomicInteger(java.util.concurrent.atomic.AtomicInteger) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) JobID(org.apache.flink.api.common.JobID) Test(org.junit.Test)

Example 99 with OneShotLatch

use of org.apache.flink.core.testutils.OneShotLatch in project flink by apache.

the class RemoteInputChannelTest method testOnFailedPartitionRequestDoesNotBlockNetworkThreads.

/**
 * Test to guard against FLINK-13249.
 */
@Test
public void testOnFailedPartitionRequestDoesNotBlockNetworkThreads() throws Exception {
    final long testBlockedWaitTimeoutMillis = 30_000L;
    final PartitionProducerStateChecker partitionProducerStateChecker = (jobId, intermediateDataSetId, resultPartitionId) -> CompletableFuture.completedFuture(ExecutionState.RUNNING);
    final NettyShuffleEnvironment shuffleEnvironment = new NettyShuffleEnvironmentBuilder().build();
    final Task task = new TestTaskBuilder(shuffleEnvironment).setPartitionProducerStateChecker(partitionProducerStateChecker).build();
    final SingleInputGate inputGate = new SingleInputGateBuilder().setPartitionProducerStateProvider(task).build();
    TestTaskBuilder.setTaskState(task, ExecutionState.RUNNING);
    final OneShotLatch ready = new OneShotLatch();
    final OneShotLatch blocker = new OneShotLatch();
    final AtomicBoolean timedOutOrInterrupted = new AtomicBoolean(false);
    final ConnectionManager blockingConnectionManager = new TestingConnectionManager() {

        @Override
        public PartitionRequestClient createPartitionRequestClient(ConnectionID connectionId) {
            ready.trigger();
            try {
                // We block here, in a section that holds the
                // SingleInputGate#requestLock
                blocker.await(testBlockedWaitTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException | TimeoutException e) {
                timedOutOrInterrupted.set(true);
            }
            return new TestingPartitionRequestClient();
        }
    };
    final RemoteInputChannel remoteInputChannel = InputChannelBuilder.newBuilder().setConnectionManager(blockingConnectionManager).buildRemoteChannel(inputGate);
    inputGate.setInputChannels(remoteInputChannel);
    final Thread simulatedNetworkThread = new Thread(() -> {
        try {
            ready.await();
            // We want to make sure that our simulated network thread does not
            // block on SingleInputGate#requestLock through this call as well.
            remoteInputChannel.onFailedPartitionRequest();
            // We only release the blocker if we did not block ourselves.
            blocker.trigger();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
    simulatedNetworkThread.start();
    // The entry point that will lead us into
    // blockingConnectionManager#createPartitionRequestClient(...).
    inputGate.requestPartitions();
    simulatedNetworkThread.join();
    Assert.assertFalse("Test ended by timeout or interruption - this indicates that the network thread was blocked.", timedOutOrInterrupted.get());
}
Also used : TestTaskBuilder(org.apache.flink.runtime.taskmanager.TestTaskBuilder) Arrays(java.util.Arrays) Matchers.isA(org.hamcrest.Matchers.isA) AvailabilityUtil.assertAvailability(org.apache.flink.runtime.io.network.partition.AvailabilityUtil.assertAvailability) ProducerFailedException(org.apache.flink.runtime.io.network.partition.ProducerFailedException) TimeoutException(java.util.concurrent.TimeoutException) ExceptionUtils(org.apache.flink.util.ExceptionUtils) Random(java.util.Random) PartitionRequestClient(org.apache.flink.runtime.io.network.PartitionRequestClient) NetworkBuffer(org.apache.flink.runtime.io.network.buffer.NetworkBuffer) AvailabilityUtil.assertPriorityAvailability(org.apache.flink.runtime.io.network.partition.AvailabilityUtil.assertPriorityAvailability) Lists(org.apache.flink.shaded.guava30.com.google.common.collect.Lists) Future(java.util.concurrent.Future) CheckpointException(org.apache.flink.runtime.checkpoint.CheckpointException) ResultPartitionID(org.apache.flink.runtime.io.network.partition.ResultPartitionID) CheckpointStorageLocationReference.getDefault(org.apache.flink.runtime.state.CheckpointStorageLocationReference.getDefault) Assert.fail(org.junit.Assert.fail) Preconditions.checkNotNull(org.apache.flink.util.Preconditions.checkNotNull) CheckpointType(org.apache.flink.runtime.checkpoint.CheckpointType) CancelTaskException(org.apache.flink.runtime.execution.CancelTaskException) EventSerializer(org.apache.flink.runtime.io.network.api.serialization.EventSerializer) TestBufferFactory.createBuffer(org.apache.flink.runtime.io.network.util.TestBufferFactory.createBuffer) ExpectedTestException(org.apache.flink.runtime.operators.testutils.ExpectedTestException) InputChannelTestUtils(org.apache.flink.runtime.io.network.partition.InputChannelTestUtils) DataType(org.apache.flink.runtime.io.network.buffer.Buffer.DataType) ConnectionID(org.apache.flink.runtime.io.network.ConnectionID) 
FreeingBufferRecycler(org.apache.flink.runtime.io.network.buffer.FreeingBufferRecycler) CheckpointOptions(org.apache.flink.runtime.checkpoint.CheckpointOptions) NettyShuffleEnvironment(org.apache.flink.runtime.io.network.NettyShuffleEnvironment) IntermediateDataSetID(org.apache.flink.runtime.jobgraph.IntermediateDataSetID) PartitionNotFoundException(org.apache.flink.runtime.io.network.partition.PartitionNotFoundException) Collectors(java.util.stream.Collectors) Buffer(org.apache.flink.runtime.io.network.buffer.Buffer) Executors(java.util.concurrent.Executors) Matchers.any(org.mockito.Matchers.any) CloseableIterator(org.apache.flink.util.CloseableIterator) List(java.util.List) CheckpointBarrier(org.apache.flink.runtime.io.network.api.CheckpointBarrier) Matchers.contains(org.hamcrest.Matchers.contains) Assert.assertFalse(org.junit.Assert.assertFalse) Optional(java.util.Optional) TestingPartitionRequestClient(org.apache.flink.runtime.io.network.TestingPartitionRequestClient) Matchers.is(org.hamcrest.Matchers.is) Queue(java.util.Queue) Mockito.mock(org.mockito.Mockito.mock) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) TestingConnectionManager(org.apache.flink.runtime.io.network.TestingConnectionManager) TestBufferFactory(org.apache.flink.runtime.io.network.util.TestBufferFactory) CHECKPOINT(org.apache.flink.runtime.checkpoint.CheckpointType.CHECKPOINT) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) InputChannelTestUtils.createSingleInputGate(org.apache.flink.runtime.io.network.partition.InputChannelTestUtils.createSingleInputGate) Callable(java.util.concurrent.Callable) CompletableFuture(java.util.concurrent.CompletableFuture) Mockito.spy(org.mockito.Mockito.spy) NetworkBufferPool(org.apache.flink.runtime.io.network.buffer.NetworkBufferPool) ArrayList(java.util.ArrayList) Matchers.hasProperty(org.hamcrest.Matchers.hasProperty) BufferBuilder(org.apache.flink.runtime.io.network.buffer.BufferBuilder) 
EventSerializer.toBuffer(org.apache.flink.runtime.io.network.api.serialization.EventSerializer.toBuffer) ChannelStateWriter(org.apache.flink.runtime.checkpoint.channel.ChannelStateWriter) MatcherAssert.assertThat(org.hamcrest.MatcherAssert.assertThat) CheckpointOptions.alignedWithTimeout(org.apache.flink.runtime.checkpoint.CheckpointOptions.alignedWithTimeout) InputChannelInfo(org.apache.flink.runtime.checkpoint.channel.InputChannelInfo) Nullable(javax.annotation.Nullable) ExecutorService(java.util.concurrent.ExecutorService) MemorySegment(org.apache.flink.core.memory.MemorySegment) MemorySegmentFactory(org.apache.flink.core.memory.MemorySegmentFactory) NettyShuffleEnvironmentBuilder(org.apache.flink.runtime.io.network.NettyShuffleEnvironmentBuilder) ConnectionManager(org.apache.flink.runtime.io.network.ConnectionManager) Assert.assertNotNull(org.junit.Assert.assertNotNull) BufferPool(org.apache.flink.runtime.io.network.buffer.BufferPool) ExecutionState(org.apache.flink.runtime.execution.ExecutionState) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) IOException(java.io.IOException) Mockito.times(org.mockito.Mockito.times) Mockito.when(org.mockito.Mockito.when) PartitionProducerStateChecker(org.apache.flink.runtime.taskexecutor.PartitionProducerStateChecker) Mockito.verify(org.mockito.Mockito.verify) TimeUnit(java.util.concurrent.TimeUnit) Consumer(java.util.function.Consumer) BufferBuilderTestUtils.buildSingleBuffer(org.apache.flink.runtime.io.network.buffer.BufferBuilderTestUtils.buildSingleBuffer) BufferAndAvailability(org.apache.flink.runtime.io.network.partition.consumer.InputChannel.BufferAndAvailability) Task(org.apache.flink.runtime.taskmanager.Task) Assert.assertNull(org.junit.Assert.assertNull) NoOpBufferPool(org.apache.flink.runtime.io.network.buffer.NoOpBufferPool) Assert(org.junit.Assert) ArrayDeque(java.util.ArrayDeque) PartitionProducerStateProvider(org.apache.flink.runtime.io.network.partition.PartitionProducerStateProvider) 
Assert.assertEquals(org.junit.Assert.assertEquals) Task(org.apache.flink.runtime.taskmanager.Task) TestingConnectionManager(org.apache.flink.runtime.io.network.TestingConnectionManager) NettyShuffleEnvironmentBuilder(org.apache.flink.runtime.io.network.NettyShuffleEnvironmentBuilder) NettyShuffleEnvironment(org.apache.flink.runtime.io.network.NettyShuffleEnvironment) InputChannelTestUtils.createSingleInputGate(org.apache.flink.runtime.io.network.partition.InputChannelTestUtils.createSingleInputGate) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) ConnectionID(org.apache.flink.runtime.io.network.ConnectionID) TestingConnectionManager(org.apache.flink.runtime.io.network.TestingConnectionManager) ConnectionManager(org.apache.flink.runtime.io.network.ConnectionManager) TestTaskBuilder(org.apache.flink.runtime.taskmanager.TestTaskBuilder) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) PartitionProducerStateChecker(org.apache.flink.runtime.taskexecutor.PartitionProducerStateChecker) TestingPartitionRequestClient(org.apache.flink.runtime.io.network.TestingPartitionRequestClient) TimeoutException(java.util.concurrent.TimeoutException) Test(org.junit.Test)
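
The test above uses two latches as a handshake: `ready` tells the simulated network thread that the lock is held, and `blocker` is released only if that thread's callback finishes without needing the lock. A minimal JDK-only sketch of the same handshake follows, under stated assumptions: `CountDownLatch(1)` stands in for `OneShotLatch`, a plain monitor stands in for `SingleInputGate#requestLock`, and the class `LatchHandshakeSketch` with its `callbackBlocks` method is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-alone sketch; not part of Flink.
final class LatchHandshakeSketch {

    /**
     * Returns true if the lock holder timed out while waiting, which indicates
     * that the callback on the second thread was blocked on the lock.
     */
    static boolean callbackBlocks(boolean callbackTakesLock, long timeoutMillis) {
        final Object lock = new Object();
        final CountDownLatch ready = new CountDownLatch(1);
        final CountDownLatch blocker = new CountDownLatch(1);
        final AtomicBoolean timedOutOrInterrupted = new AtomicBoolean(false);

        final Thread simulatedNetworkThread = new Thread(() -> {
            try {
                ready.await();
                if (callbackTakesLock) {
                    // A callback that needs the lock stalls until the holder gives up.
                    synchronized (lock) {
                        // nothing to do; acquiring the lock is the point
                    }
                }
                // Reached promptly only if the callback did not block.
                blocker.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        simulatedNetworkThread.start();

        synchronized (lock) {
            // Signal that the lock is held, then wait for the other thread.
            ready.countDown();
            try {
                if (!blocker.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    timedOutOrInterrupted.set(true);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                timedOutOrInterrupted.set(true);
            }
        }
        try {
            simulatedNetworkThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timedOutOrInterrupted.get();
    }

    public static void main(String[] args) {
        System.out.println(callbackBlocks(false, 5_000L));
        System.out.println(callbackBlocks(true, 200L));
    }
}
```

The timeout on the holder's wait is what turns a potential deadlock into a test failure: if the callback needs the lock, the holder gives up after the timeout, records it, and releases the lock so both threads can finish.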

Example 100 with OneShotLatch

use of org.apache.flink.core.testutils.OneShotLatch in project flink by apache.

the class TaskExecutorTest method testSlotAcceptance.

/**
 * Tests that accepted slots go into state assigned and the others are returned to the resource
 * manager.
 */
@Test
public void testSlotAcceptance() throws Exception {
    final InstanceID registrationId = new InstanceID();
    final OneShotLatch taskExecutorIsRegistered = new OneShotLatch();
    final CompletableFuture<Tuple3<InstanceID, SlotID, AllocationID>> availableSlotFuture = new CompletableFuture<>();
    final TestingResourceManagerGateway resourceManagerGateway = createRmWithTmRegisterAndNotifySlotHooks(registrationId, taskExecutorIsRegistered, availableSlotFuture);
    final AllocationID allocationId1 = new AllocationID();
    final AllocationID allocationId2 = new AllocationID();
    final SlotOffer offer1 = new SlotOffer(allocationId1, 0, ResourceProfile.ANY);
    final OneShotLatch offerSlotsLatch = new OneShotLatch();
    final OneShotLatch taskInTerminalState = new OneShotLatch();
    final CompletableFuture<Collection<SlotOffer>> offerResultFuture = new CompletableFuture<>();
    final TestingJobMasterGateway jobMasterGateway = createJobMasterWithSlotOfferAndTaskTerminationHooks(offerSlotsLatch, taskInTerminalState, offerResultFuture);
    rpc.registerGateway(resourceManagerGateway.getAddress(), resourceManagerGateway);
    rpc.registerGateway(jobMasterGateway.getAddress(), jobMasterGateway);
    final TaskSlotTable<Task> taskSlotTable = TaskSlotUtils.createTaskSlotTable(2);
    final TaskManagerServices taskManagerServices = createTaskManagerServicesWithTaskSlotTable(taskSlotTable);
    final TestingTaskExecutor taskManager = createTestingTaskExecutor(taskManagerServices);
    try {
        taskManager.start();
        taskManager.waitUntilStarted();
        final TaskExecutorGateway tmGateway = taskManager.getSelfGateway(TaskExecutorGateway.class);
        // wait until registered at the RM
        taskExecutorIsRegistered.await();
        // request 2 slots for the given allocation ids
        AllocationID[] allocationIds = new AllocationID[] { allocationId1, allocationId2 };
        for (int i = 0; i < allocationIds.length; i++) {
            requestSlot(tmGateway, jobId, allocationIds[i], buildSlotID(i), ResourceProfile.UNKNOWN, jobMasterGateway.getAddress(), resourceManagerGateway.getFencingToken());
        }
        // notify job leader to start slot offering
        jobManagerLeaderRetriever.notifyListener(jobMasterGateway.getAddress(), jobMasterGateway.getFencingToken().toUUID());
        // wait until slots have been offered
        offerSlotsLatch.await();
        offerResultFuture.complete(Collections.singletonList(offer1));
        final Tuple3<InstanceID, SlotID, AllocationID> instanceIDSlotIDAllocationIDTuple3 = availableSlotFuture.get();
        final Tuple3<InstanceID, SlotID, AllocationID> expectedResult = Tuple3.of(registrationId, buildSlotID(1), allocationId2);
        assertThat(instanceIDSlotIDAllocationIDTuple3, equalTo(expectedResult));
        // slot 1 can be activated for task submission
        submit(allocationId1, jobMasterGateway, tmGateway, NoOpInvokable.class);
        // wait for the task completion
        taskInTerminalState.await();
        // slot 2 can NOT be activated for task submission
        try {
            submit(allocationId2, jobMasterGateway, tmGateway, NoOpInvokable.class);
            fail("It should not be possible to submit task to acquired by JM slot with index 1 (allocationId2)");
        } catch (CompletionException e) {
            assertThat(e.getCause(), instanceOf(TaskSubmissionException.class));
        }
        // slot 2 is free to be requested again
        requestSlot(tmGateway, jobId, allocationId2, buildSlotID(1), ResourceProfile.UNKNOWN, jobMasterGateway.getAddress(), resourceManagerGateway.getFencingToken());
    } finally {
        RpcUtils.terminateRpcEndpoint(taskManager, timeout);
    }
}
Also used : Task(org.apache.flink.runtime.taskmanager.Task) SlotOffer(org.apache.flink.runtime.taskexecutor.slot.SlotOffer) InstanceID(org.apache.flink.runtime.instance.InstanceID) AllocationID(org.apache.flink.runtime.clusterframework.types.AllocationID) SlotID(org.apache.flink.runtime.clusterframework.types.SlotID) CompletableFuture(java.util.concurrent.CompletableFuture) TestingJobMasterGateway(org.apache.flink.runtime.jobmaster.utils.TestingJobMasterGateway) Tuple3(org.apache.flink.api.java.tuple.Tuple3) CompletionException(java.util.concurrent.CompletionException) TestingResourceManagerGateway(org.apache.flink.runtime.resourcemanager.utils.TestingResourceManagerGateway) OneShotLatch(org.apache.flink.core.testutils.OneShotLatch) Collection(java.util.Collection) Test(org.junit.Test)
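
The slot-acceptance test combines a latch with a future: the offering side signals that the offer has arrived, and the test then decides which offers are accepted by completing a future. The following JDK-only sketch shows that coordination under stated assumptions: strings stand in for allocation IDs, `CountDownLatch(1)` stands in for `OneShotLatch`, and `OfferHandshakeSketch` with its `returnedSlots` method is a hypothetical illustration rather than any Flink gateway.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-alone sketch; not part of Flink.
final class OfferHandshakeSketch {

    /** Returns the offered slots that were not accepted and would therefore be handed back. */
    static List<String> returnedSlots(List<String> offered, List<String> accepted) {
        final CountDownLatch offerReceived = new CountDownLatch(1);
        final CompletableFuture<List<String>> offerResult = new CompletableFuture<>();

        // Offering side: announce the offer, then wait for the acceptance decision.
        final CompletableFuture<List<String>> returned = CompletableFuture.supplyAsync(() -> {
            offerReceived.countDown();
            final List<String> notAccepted = new ArrayList<>(offered);
            notAccepted.removeAll(offerResult.join());
            return notAccepted;
        });

        // Test side: wait until the offer has arrived, then answer it.
        try {
            offerReceived.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        offerResult.complete(accepted);
        return returned.join();
    }

    public static void main(String[] args) {
        System.out.println(returnedSlots(Arrays.asList("a1", "a2"), Arrays.asList("a1")));
    }
}
```

Waiting on the latch before completing the future keeps the answer from being supplied before the offer has been made, which is the ordering the test relies on.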

Aggregations

OneShotLatch (org.apache.flink.core.testutils.OneShotLatch): 138
Test (org.junit.Test): 118
JobID (org.apache.flink.api.common.JobID): 41
CompletableFuture (java.util.concurrent.CompletableFuture): 38
ExecutionException (java.util.concurrent.ExecutionException): 27
Configuration (org.apache.flink.configuration.Configuration): 26
IOException (java.io.IOException): 24
Before (org.junit.Before): 24
FlinkException (org.apache.flink.util.FlinkException): 23
TestLogger (org.apache.flink.util.TestLogger): 21
File (java.io.File): 20
UUID (java.util.UUID): 18
TimeoutException (java.util.concurrent.TimeoutException): 18
TestingResourceManagerGateway (org.apache.flink.runtime.resourcemanager.utils.TestingResourceManagerGateway): 18
Time (org.apache.flink.api.common.time.Time): 17
TestingJobMasterGateway (org.apache.flink.runtime.jobmaster.utils.TestingJobMasterGateway): 17
Rule (org.junit.Rule): 17
Collections (java.util.Collections): 16
ArrayBlockingQueue (java.util.concurrent.ArrayBlockingQueue): 16
RpcUtils (org.apache.flink.runtime.rpc.RpcUtils): 16