
Example 1 with AccumulatorSnapshot

Use of org.apache.flink.runtime.accumulators.AccumulatorSnapshot in project flink by apache.

Source: class TaskExecutor, method unregisterTaskAndNotifyFinalState.

private void unregisterTaskAndNotifyFinalState(
        final UUID jobMasterLeaderId,
        final JobMasterGateway jobMasterGateway,
        final ExecutionAttemptID executionAttemptID) {
    Task task = taskSlotTable.removeTask(executionAttemptID);
    if (task != null) {
        // A task that is still running must be failed before it is unregistered.
        if (!task.getExecutionState().isTerminal()) {
            try {
                task.failExternally(new IllegalStateException("Task is being removed from TaskManager."));
            } catch (Exception e) {
                log.error("Could not properly fail task.", e);
            }
        }
        log.info("Un-registering task and sending final execution state {} to JobManager for task {} {}.",
            task.getExecutionState(), task.getTaskInfo().getTaskName(), task.getExecutionId());
        // Ship the final accumulator values and IO metrics together with the final state.
        AccumulatorSnapshot accumulatorSnapshot = task.getAccumulatorRegistry().getSnapshot();
        updateTaskExecutionState(
            jobMasterLeaderId,
            jobMasterGateway,
            new TaskExecutionState(
                task.getJobID(),
                task.getExecutionId(),
                task.getExecutionState(),
                task.getFailureCause(),
                accumulatorSnapshot,
                task.getMetricGroup().getIOMetricGroup().createSnapshot()));
    } else {
        log.error("Cannot find task with ID {} to unregister.", executionAttemptID);
    }
}
Also used : Task(org.apache.flink.runtime.taskmanager.Task) AccumulatorSnapshot(org.apache.flink.runtime.accumulators.AccumulatorSnapshot) TimeoutException(java.util.concurrent.TimeoutException) PartitionException(org.apache.flink.runtime.taskexecutor.exceptions.PartitionException) CheckpointException(org.apache.flink.runtime.taskexecutor.exceptions.CheckpointException) SlotAllocationException(org.apache.flink.runtime.taskexecutor.exceptions.SlotAllocationException) TaskSubmissionException(org.apache.flink.runtime.taskexecutor.exceptions.TaskSubmissionException) TaskException(org.apache.flink.runtime.taskexecutor.exceptions.TaskException) SlotNotActiveException(org.apache.flink.runtime.taskexecutor.slot.SlotNotActiveException) SlotNotFoundException(org.apache.flink.runtime.taskexecutor.slot.SlotNotFoundException) IOException(java.io.IOException) TaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState)
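The unregister-and-report flow in Example 1 can be modeled without Flink as a minimal, self-contained Java sketch. Every type here (`SketchState`, `SketchTask`, `SketchFinalState`, `SketchTaskTable`) is hypothetical and only mirrors the steps visible in the excerpt: remove the task, fail it if it is not yet terminal, copy its accumulators into an immutable snapshot, and hand the final state to a reporter. It is not Flink's `TaskExecutor` API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for an execution state; not Flink's ExecutionState.
enum SketchState {
    RUNNING, FINISHED, FAILED, CANCELED;

    // Terminal states accept no further transitions.
    boolean isTerminal() {
        return this != RUNNING;
    }
}

// Hypothetical task holding a live, mutable accumulator map.
final class SketchTask {
    final String id;
    SketchState state = SketchState.RUNNING;
    Throwable failureCause;
    final Map<String, Long> accumulators = new HashMap<>();

    SketchTask(String id) {
        this.id = id;
    }

    // Fail the task unless it has already reached a terminal state.
    void failExternally(Throwable cause) {
        if (!state.isTerminal()) {
            state = SketchState.FAILED;
            failureCause = cause;
        }
    }
}

// Immutable report sent upstream: final state plus an accumulator snapshot.
final class SketchFinalState {
    final String taskId;
    final SketchState state;
    final Map<String, Long> accumulatorSnapshot;

    SketchFinalState(String taskId, SketchState state, Map<String, Long> accumulators) {
        this.taskId = taskId;
        this.state = state;
        // Copy so later changes to the live task cannot leak into the report.
        this.accumulatorSnapshot = Collections.unmodifiableMap(new HashMap<>(accumulators));
    }
}

// Hypothetical task table: remove, force a terminal state, snapshot, report.
final class SketchTaskTable {
    private final Map<String, SketchTask> tasks = new HashMap<>();

    void add(SketchTask task) {
        tasks.put(task.id, task);
    }

    boolean unregisterAndReport(String taskId, Consumer<SketchFinalState> reporter) {
        SketchTask task = tasks.remove(taskId);
        if (task == null) {
            System.err.println("Cannot find task with ID " + taskId + " to unregister.");
            return false;
        }
        if (!task.state.isTerminal()) {
            task.failExternally(new IllegalStateException("Task is being removed from TaskManager."));
        }
        reporter.accept(new SketchFinalState(task.id, task.state, task.accumulators));
        return true;
    }
}
```

The defensive copy in `SketchFinalState` is the design choice that matters here: the report reflects the accumulator values at unregistration time, even if the live map changes afterwards.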

Example 2 with AccumulatorSnapshot

Use of org.apache.flink.runtime.accumulators.AccumulatorSnapshot in project flink by apache.

Source: class ExecutionGraphDeploymentTest, method testAccumulatorsAndMetricsForwarding.

/**
 * Verifies that {@link ExecutionGraph#updateState(TaskExecutionState)} updates the accumulators and metrics for an
 * execution that failed or was canceled.
 */
@Test
public void testAccumulatorsAndMetricsForwarding() throws Exception {
    final JobVertexID jid1 = new JobVertexID();
    final JobVertexID jid2 = new JobVertexID();
    JobVertex v1 = new JobVertex("v1", jid1);
    JobVertex v2 = new JobVertex("v2", jid2);
    Tuple2<ExecutionGraph, Map<ExecutionAttemptID, Execution>> graphAndExecutions = setupExecution(v1, 1, v2, 1);
    ExecutionGraph graph = graphAndExecutions.f0;
    // Take both executions from ONE iterator; calling values().iterator().next()
    // twice would return the same execution both times.
    java.util.Iterator<Execution> executions = graphAndExecutions.f1.values().iterator();
    // verify behavior for canceled executions
    Execution execution1 = executions.next();
    IOMetrics ioMetrics = new IOMetrics(0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0.0);
    Map<String, Accumulator<?, ?>> accumulators = new HashMap<>();
    accumulators.put("acc", new IntCounter(4));
    AccumulatorSnapshot accumulatorSnapshot = new AccumulatorSnapshot(graph.getJobID(), execution1.getAttemptId(), accumulators);
    TaskExecutionState state = new TaskExecutionState(graph.getJobID(), execution1.getAttemptId(), ExecutionState.CANCELED, null, accumulatorSnapshot, ioMetrics);
    graph.updateState(state);
    assertEquals(ioMetrics, execution1.getIOMetrics());
    assertNotNull(execution1.getUserAccumulators());
    assertEquals(4, execution1.getUserAccumulators().get("acc").getLocalValue());
    // verify behavior for failed executions
    Execution execution2 = executions.next();
    IOMetrics ioMetrics2 = new IOMetrics(0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0.0);
    Map<String, Accumulator<?, ?>> accumulators2 = new HashMap<>();
    accumulators2.put("acc", new IntCounter(8));
    AccumulatorSnapshot accumulatorSnapshot2 = new AccumulatorSnapshot(graph.getJobID(), execution2.getAttemptId(), accumulators2);
    TaskExecutionState state2 = new TaskExecutionState(graph.getJobID(), execution2.getAttemptId(), ExecutionState.FAILED, null, accumulatorSnapshot2, ioMetrics2);
    graph.updateState(state2);
    assertEquals(ioMetrics2, execution2.getIOMetrics());
    assertNotNull(execution2.getUserAccumulators());
    assertEquals(8, execution2.getUserAccumulators().get("acc").getLocalValue());
}
Also used : Accumulator(org.apache.flink.api.common.accumulators.Accumulator) HashMap(java.util.HashMap) JobVertexID(org.apache.flink.runtime.jobgraph.JobVertexID) TaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState) JobVertex(org.apache.flink.runtime.jobgraph.JobVertex) AccumulatorSnapshot(org.apache.flink.runtime.accumulators.AccumulatorSnapshot) IntCounter(org.apache.flink.api.common.accumulators.IntCounter) Map(java.util.Map) Test(org.junit.Test)
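Example 2 only makes sense with two distinct executions, one reported as CANCELED and one as FAILED. The sketch below assumes a simplified state machine in which a terminal execution ignores later reports (an assumption loosely modeled on the test's intent, not Flink's `Execution` class). Under that model it shows why both executions should come from a single iterator: two fresh calls to `values().iterator().next()` return the same element, so the second report would land on an already-canceled execution and be dropped. `SketchExecution` and `DistinctExecutions` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for an execution attempt; not Flink's Execution class.
final class SketchExecution {
    enum State { DEPLOYING, RUNNING, FINISHED, CANCELED, FAILED }

    private State state = State.DEPLOYING;
    private Map<String, Integer> userAccumulators; // null until a final report is accepted

    State state() {
        return state;
    }

    Map<String, Integer> userAccumulators() {
        return userAccumulators;
    }

    // Apply a final-state report. Assumption: once terminal, later reports are
    // ignored, and so are the accumulators they carry.
    boolean updateState(State target, Map<String, Integer> accumulators) {
        boolean terminal = state == State.FINISHED || state == State.CANCELED || state == State.FAILED;
        if (terminal) {
            return false;
        }
        state = target;
        userAccumulators = Collections.unmodifiableMap(new HashMap<>(accumulators));
        return true;
    }
}

final class DistinctExecutions {
    // Take the first two values from ONE iterator so they are distinct elements.
    // Throws NoSuchElementException if the map holds fewer than two executions.
    static SketchExecution[] firstTwo(Map<String, SketchExecution> executions) {
        Iterator<SketchExecution> it = executions.values().iterator();
        return new SketchExecution[] { it.next(), it.next() };
    }
}
```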

Example 3 with AccumulatorSnapshot

Use of org.apache.flink.runtime.accumulators.AccumulatorSnapshot in project flink by apache.

Source: class ExecutionGraph, method deserializeAccumulators.

private Map<String, Accumulator<?, ?>> deserializeAccumulators(TaskExecutionState state) {
    AccumulatorSnapshot serializedAccumulators = state.getAccumulators();
    Map<String, Accumulator<?, ?>> accumulators = null;
    if (serializedAccumulators != null) {
        try {
            accumulators = serializedAccumulators.deserializeUserAccumulators(userClassLoader);
        } catch (Exception e) {
            LOG.error("Failed to deserialize final accumulator results.", e);
        }
    }
    return accumulators;
}
Also used : Accumulator(org.apache.flink.api.common.accumulators.Accumulator) AccumulatorSnapshot(org.apache.flink.runtime.accumulators.AccumulatorSnapshot) SuppressRestartsException(org.apache.flink.runtime.execution.SuppressRestartsException) StoppingException(org.apache.flink.runtime.StoppingException) NoResourceAvailableException(org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException) JobException(org.apache.flink.runtime.JobException) NoSuchElementException(java.util.NoSuchElementException) IOException(java.io.IOException) ExecutionException(java.util.concurrent.ExecutionException)
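`deserializeAccumulators` passes `userClassLoader`, presumably because user-defined accumulator classes may not be visible to the framework's own class loader. The sketch below illustrates that idea with plain `java.io` serialization and an `ObjectInputStream` that resolves classes through a caller-supplied loader. Flink's `AccumulatorSnapshot` uses its own serialized wrapper, so this is the technique under a simplification, not Flink's implementation; `AccumulatorSerdeSketch` is a hypothetical name. Like the method above, it returns `null` both for missing input and for a failed deserialization, logging the error instead of rethrowing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

final class AccumulatorSerdeSketch {

    // ObjectInputStream that resolves classes through a caller-supplied loader.
    private static final class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default lookup (which also handles primitive types).
                return super.resolveClass(desc);
            }
        }
    }

    static byte[] serialize(HashMap<String, ? extends Serializable> accumulators) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(accumulators);
        }
        return bytes.toByteArray();
    }

    // Null input yields null; any failure is logged and reported as null.
    @SuppressWarnings("unchecked")
    static Map<String, Object> deserialize(byte[] serialized, ClassLoader loader) {
        if (serialized == null) {
            return null;
        }
        try (ObjectInputStream in = new LoaderAwareInputStream(new ByteArrayInputStream(serialized), loader)) {
            return (Map<String, Object>) in.readObject();
        } catch (IOException | ClassNotFoundException | ClassCastException e) {
            System.err.println("Failed to deserialize final accumulator results: " + e);
            return null;
        }
    }
}
```

Returning `null` lets the caller continue with the state update even when only the accumulators are unreadable; the trade-off is that callers must treat `null` as "no accumulators available" rather than "no accumulators registered".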

Aggregations

AccumulatorSnapshot (org.apache.flink.runtime.accumulators.AccumulatorSnapshot): 3
IOException (java.io.IOException): 2
Accumulator (org.apache.flink.api.common.accumulators.Accumulator): 2
TaskExecutionState (org.apache.flink.runtime.taskmanager.TaskExecutionState): 2
HashMap (java.util.HashMap): 1
Map (java.util.Map): 1
NoSuchElementException (java.util.NoSuchElementException): 1
ExecutionException (java.util.concurrent.ExecutionException): 1
TimeoutException (java.util.concurrent.TimeoutException): 1
IntCounter (org.apache.flink.api.common.accumulators.IntCounter): 1
JobException (org.apache.flink.runtime.JobException): 1
StoppingException (org.apache.flink.runtime.StoppingException): 1
SuppressRestartsException (org.apache.flink.runtime.execution.SuppressRestartsException): 1
JobVertex (org.apache.flink.runtime.jobgraph.JobVertex): 1
JobVertexID (org.apache.flink.runtime.jobgraph.JobVertexID): 1
NoResourceAvailableException (org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException): 1
CheckpointException (org.apache.flink.runtime.taskexecutor.exceptions.CheckpointException): 1
PartitionException (org.apache.flink.runtime.taskexecutor.exceptions.PartitionException): 1
SlotAllocationException (org.apache.flink.runtime.taskexecutor.exceptions.SlotAllocationException): 1
TaskException (org.apache.flink.runtime.taskexecutor.exceptions.TaskException): 1