Search in sources :

Example 1 with FAILED

use of com.hazelcast.jet.core.JobStatus.FAILED in project hazelcast-jet by hazelcast.

the class MasterContext method invokeCompleteExecution.

private void invokeCompleteExecution(Throwable error) {
    JobStatus status = jobStatus();
    Throwable finalError;
    if (status == STARTING || status == RESTARTING || status == RUNNING) {
        logger.fine("Completing " + jobIdString());
        finalError = error;
    } else {
        if (error != null) {
            logger.severe("Cannot properly complete failed " + jobIdString() + ": status is " + status, error);
        } else {
            logger.severe("Cannot properly complete " + jobIdString() + ": status is " + status);
        }
        finalError = new IllegalStateException("Job coordination failed.");
    }
    Function<ExecutionPlan, Operation> operationCtor = plan -> new CompleteExecutionOperation(executionId, finalError);
    invoke(operationCtor, responses -> finalizeJob(error), null);
}
Also used : JobStatus(com.hazelcast.jet.core.JobStatus) NO_SNAPSHOT(com.hazelcast.jet.impl.execution.SnapshotContext.NO_SNAPSHOT) SnapshotRepository.snapshotDataMapName(com.hazelcast.jet.impl.SnapshotRepository.snapshotDataMapName) NonCompletableFuture(com.hazelcast.jet.impl.util.NonCompletableFuture) Address(com.hazelcast.nio.Address) Util.jobAndExecutionId(com.hazelcast.jet.impl.util.Util.jobAndExecutionId) Processors.mapP(com.hazelcast.jet.core.processor.Processors.mapP) SourceProcessors.readMapP(com.hazelcast.jet.core.processor.SourceProcessors.readMapP) CompletionToken(com.hazelcast.jet.impl.util.CompletionToken) Util.idToString(com.hazelcast.jet.impl.util.Util.idToString) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) MemberInfo(com.hazelcast.internal.cluster.MemberInfo) Map(java.util.Map) STARTING(com.hazelcast.jet.core.JobStatus.STARTING) DAG(com.hazelcast.jet.core.DAG) JobStatus(com.hazelcast.jet.core.JobStatus) ExceptionUtil(com.hazelcast.jet.impl.util.ExceptionUtil) ExecutionService(com.hazelcast.spi.ExecutionService) Operation(com.hazelcast.spi.Operation) CancellationException(java.util.concurrent.CancellationException) Collections.emptyList(java.util.Collections.emptyList) Collection(java.util.Collection) Util.getJetInstance(com.hazelcast.jet.impl.util.Util.getJetInstance) JobConfig(com.hazelcast.jet.config.JobConfig) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) Set(java.util.Set) Collectors(java.util.stream.Collectors) BroadcastKey(com.hazelcast.jet.core.BroadcastKey) List(java.util.List) ExecutionCallback(com.hazelcast.core.ExecutionCallback) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) Entry(java.util.Map.Entry) CancelExecutionOperation(com.hazelcast.jet.impl.operation.CancelExecutionOperation) TopologyChangedException(com.hazelcast.jet.core.TopologyChangedException) COMPLETED(com.hazelcast.jet.core.JobStatus.COMPLETED) InternalCompletableFuture(com.hazelcast.spi.InternalCompletableFuture) ExecutionPlanBuilder.createExecutionPlans(com.hazelcast.jet.impl.execution.init.ExecutionPlanBuilder.createExecutionPlans) Collectors.partitioningBy(java.util.stream.Collectors.partitioningBy) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) AtomicReference(java.util.concurrent.atomic.AtomicReference) Function(java.util.function.Function) HashSet(java.util.HashSet) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) ILogger(com.hazelcast.logging.ILogger) ExceptionUtil.withTryCatch(com.hazelcast.jet.impl.util.ExceptionUtil.withTryCatch) NOT_STARTED(com.hazelcast.jet.core.JobStatus.NOT_STARTED) MembersView(com.hazelcast.internal.cluster.impl.MembersView) DistributedFunction(com.hazelcast.jet.function.DistributedFunction) Edge(com.hazelcast.jet.core.Edge) ClusterServiceImpl(com.hazelcast.internal.cluster.impl.ClusterServiceImpl) Nullable(javax.annotation.Nullable) NodeEngineImpl(com.hazelcast.spi.impl.NodeEngineImpl) RESTARTING(com.hazelcast.jet.core.JobStatus.RESTARTING) BroadcastEntry(com.hazelcast.jet.impl.execution.BroadcastEntry) ExceptionUtil.isTopologicalFailure(com.hazelcast.jet.impl.util.ExceptionUtil.isTopologicalFailure) DistributedFunctions.entryKey(com.hazelcast.jet.function.DistributedFunctions.entryKey) Consumer(java.util.function.Consumer) Vertex(com.hazelcast.jet.core.Vertex) Collectors.toList(java.util.stream.Collectors.toList) CustomClassLoadedObject.deserializeWithCustomClassLoader(com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader) CompleteExecutionOperation(com.hazelcast.jet.impl.operation.CompleteExecutionOperation) ExceptionUtil.peel(com.hazelcast.jet.impl.util.ExceptionUtil.peel) FAILED(com.hazelcast.jet.core.JobStatus.FAILED) RUNNING(com.hazelcast.jet.core.JobStatus.RUNNING) ProcessingGuarantee(com.hazelcast.jet.config.ProcessingGuarantee) JobRestartRequestedException(com.hazelcast.jet.impl.exception.JobRestartRequestedException) SnapshotOperation(com.hazelcast.jet.impl.operation.SnapshotOperation) Edge.between(com.hazelcast.jet.core.Edge.between) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) Operation(com.hazelcast.spi.Operation) CancelExecutionOperation(com.hazelcast.jet.impl.operation.CancelExecutionOperation) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) CompleteExecutionOperation(com.hazelcast.jet.impl.operation.CompleteExecutionOperation) SnapshotOperation(com.hazelcast.jet.impl.operation.SnapshotOperation) CompleteExecutionOperation(com.hazelcast.jet.impl.operation.CompleteExecutionOperation)

Example 2 with FAILED

use of com.hazelcast.jet.core.JobStatus.FAILED in project hazelcast-jet by hazelcast.

the class MasterContext method tryStartJob.

/**
 * Starts execution of the job if it is not already completed, cancelled or failed.
 * If the job is already cancelled, the job completion procedure is triggered.
 * If the job quorum is not satisfied, job restart is rescheduled.
 * If there was a membership change and the partition table is not completely
 * fixed yet, job restart is rescheduled.
 */
void tryStartJob(Function<Long, Long> executionIdSupplier) {
    if (!setJobStatusToStarting()) {
        return;
    }
    if (scheduleRestartIfQuorumAbsent() || scheduleRestartIfClusterIsNotSafe()) {
        return;
    }
    DAG dag;
    try {
        dag = deserializeDAG();
    } catch (Exception e) {
        logger.warning("DAG deserialization failed", e);
        finalizeJob(e);
        return;
    }
    // save a copy of the vertex list, because it is going to change
    vertices = new HashSet<>();
    dag.iterator().forEachRemaining(vertices::add);
    executionId = executionIdSupplier.apply(jobId);
    // last started snapshot complete or not complete. The next started snapshot must be greater than this number
    long lastSnapshotId = NO_SNAPSHOT;
    if (isSnapshottingEnabled()) {
        Long snapshotIdToRestore = snapshotRepository.latestCompleteSnapshot(jobId);
        snapshotRepository.deleteAllSnapshotsExceptOne(jobId, snapshotIdToRestore);
        Long lastStartedSnapshot = snapshotRepository.latestStartedSnapshot(jobId);
        if (snapshotIdToRestore != null) {
            logger.info("State of " + jobIdString() + " will be restored from snapshot " + snapshotIdToRestore);
            rewriteDagWithSnapshotRestore(dag, snapshotIdToRestore);
        } else {
            logger.info("No previous snapshot for " + jobIdString() + " found.");
        }
        if (lastStartedSnapshot != null) {
            lastSnapshotId = lastStartedSnapshot;
        }
    }
    MembersView membersView = getMembersView();
    ClassLoader previousCL = swapContextClassLoader(coordinationService.getClassLoader(jobId));
    try {
        int defaultLocalParallelism = getJetInstance(nodeEngine).getConfig().getInstanceConfig().getCooperativeThreadCount();
        logger.info("Start executing " + jobIdString() + ", status " + jobStatus() + "\n" + dag.toString(defaultLocalParallelism));
        logger.fine("Building execution plan for " + jobIdString());
        executionPlanMap = createExecutionPlans(nodeEngine, membersView, dag, getJobConfig(), lastSnapshotId);
    } catch (Exception e) {
        logger.severe("Exception creating execution plan for " + jobIdString(), e);
        finalizeJob(e);
        return;
    } finally {
        Thread.currentThread().setContextClassLoader(previousCL);
    }
    logger.fine("Built execution plans for " + jobIdString());
    Set<MemberInfo> participants = executionPlanMap.keySet();
    Function<ExecutionPlan, Operation> operationCtor = plan -> new InitExecutionOperation(jobId, executionId, membersView.getVersion(), participants, nodeEngine.getSerializationService().toData(plan));
    invoke(operationCtor, this::onInitStepCompleted, null);
}
Also used : NO_SNAPSHOT(com.hazelcast.jet.impl.execution.SnapshotContext.NO_SNAPSHOT) SnapshotRepository.snapshotDataMapName(com.hazelcast.jet.impl.SnapshotRepository.snapshotDataMapName) NonCompletableFuture(com.hazelcast.jet.impl.util.NonCompletableFuture) Address(com.hazelcast.nio.Address) Util.jobAndExecutionId(com.hazelcast.jet.impl.util.Util.jobAndExecutionId) Processors.mapP(com.hazelcast.jet.core.processor.Processors.mapP) SourceProcessors.readMapP(com.hazelcast.jet.core.processor.SourceProcessors.readMapP) CompletionToken(com.hazelcast.jet.impl.util.CompletionToken) Util.idToString(com.hazelcast.jet.impl.util.Util.idToString) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) MemberInfo(com.hazelcast.internal.cluster.MemberInfo) Map(java.util.Map) STARTING(com.hazelcast.jet.core.JobStatus.STARTING) DAG(com.hazelcast.jet.core.DAG) JobStatus(com.hazelcast.jet.core.JobStatus) ExceptionUtil(com.hazelcast.jet.impl.util.ExceptionUtil) ExecutionService(com.hazelcast.spi.ExecutionService) Operation(com.hazelcast.spi.Operation) CancellationException(java.util.concurrent.CancellationException) Collections.emptyList(java.util.Collections.emptyList) Collection(java.util.Collection) Util.getJetInstance(com.hazelcast.jet.impl.util.Util.getJetInstance) JobConfig(com.hazelcast.jet.config.JobConfig) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) Set(java.util.Set) Collectors(java.util.stream.Collectors) BroadcastKey(com.hazelcast.jet.core.BroadcastKey) List(java.util.List) ExecutionCallback(com.hazelcast.core.ExecutionCallback) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) Entry(java.util.Map.Entry) CancelExecutionOperation(com.hazelcast.jet.impl.operation.CancelExecutionOperation) TopologyChangedException(com.hazelcast.jet.core.TopologyChangedException) COMPLETED(com.hazelcast.jet.core.JobStatus.COMPLETED) InternalCompletableFuture(com.hazelcast.spi.InternalCompletableFuture) ExecutionPlanBuilder.createExecutionPlans(com.hazelcast.jet.impl.execution.init.ExecutionPlanBuilder.createExecutionPlans) Collectors.partitioningBy(java.util.stream.Collectors.partitioningBy) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) AtomicReference(java.util.concurrent.atomic.AtomicReference) Function(java.util.function.Function) HashSet(java.util.HashSet) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) ILogger(com.hazelcast.logging.ILogger) ExceptionUtil.withTryCatch(com.hazelcast.jet.impl.util.ExceptionUtil.withTryCatch) NOT_STARTED(com.hazelcast.jet.core.JobStatus.NOT_STARTED) MembersView(com.hazelcast.internal.cluster.impl.MembersView) DistributedFunction(com.hazelcast.jet.function.DistributedFunction) Edge(com.hazelcast.jet.core.Edge) ClusterServiceImpl(com.hazelcast.internal.cluster.impl.ClusterServiceImpl) Nullable(javax.annotation.Nullable) NodeEngineImpl(com.hazelcast.spi.impl.NodeEngineImpl) RESTARTING(com.hazelcast.jet.core.JobStatus.RESTARTING) BroadcastEntry(com.hazelcast.jet.impl.execution.BroadcastEntry) ExceptionUtil.isTopologicalFailure(com.hazelcast.jet.impl.util.ExceptionUtil.isTopologicalFailure) DistributedFunctions.entryKey(com.hazelcast.jet.function.DistributedFunctions.entryKey) Consumer(java.util.function.Consumer) Vertex(com.hazelcast.jet.core.Vertex) Collectors.toList(java.util.stream.Collectors.toList) CustomClassLoadedObject.deserializeWithCustomClassLoader(com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader) CompleteExecutionOperation(com.hazelcast.jet.impl.operation.CompleteExecutionOperation) ExceptionUtil.peel(com.hazelcast.jet.impl.util.ExceptionUtil.peel) FAILED(com.hazelcast.jet.core.JobStatus.FAILED) RUNNING(com.hazelcast.jet.core.JobStatus.RUNNING) ProcessingGuarantee(com.hazelcast.jet.config.ProcessingGuarantee) JobRestartRequestedException(com.hazelcast.jet.impl.exception.JobRestartRequestedException) SnapshotOperation(com.hazelcast.jet.impl.operation.SnapshotOperation) Edge.between(com.hazelcast.jet.core.Edge.between) MembersView(com.hazelcast.internal.cluster.impl.MembersView) DAG(com.hazelcast.jet.core.DAG) Operation(com.hazelcast.spi.Operation) CancelExecutionOperation(com.hazelcast.jet.impl.operation.CancelExecutionOperation) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) CompleteExecutionOperation(com.hazelcast.jet.impl.operation.CompleteExecutionOperation) SnapshotOperation(com.hazelcast.jet.impl.operation.SnapshotOperation) CancellationException(java.util.concurrent.CancellationException) TopologyChangedException(com.hazelcast.jet.core.TopologyChangedException) JobRestartRequestedException(com.hazelcast.jet.impl.exception.JobRestartRequestedException) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) MemberInfo(com.hazelcast.internal.cluster.MemberInfo) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) CustomClassLoadedObject.deserializeWithCustomClassLoader(com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader)

Example 3 with FAILED

use of com.hazelcast.jet.core.JobStatus.FAILED in project hazelcast by hazelcast.

the class MasterJobContext method tryStartJob.

/**
 * Starts the execution of the job if it is not already completed,
 * cancelled or failed.
 * <p>
 * If the job is already cancelled, triggers the job completion procedure.
 * <p>
 * If the job quorum is not satisfied, reschedules the job restart.
 * <p>
 * If there was a membership change and the partition table is not completely
 * fixed yet, reschedules the job restart.
 */
void tryStartJob(Supplier<Long> executionIdSupplier) {
    mc.coordinationService().submitToCoordinatorThread(() -> {
        executionStartTime = System.currentTimeMillis();
        try {
            JobExecutionRecord jobExecRec = mc.jobExecutionRecord();
            jobExecRec.markExecuted();
            Tuple2<DAG, ClassLoader> dagAndClassloader = resolveDagAndCL(executionIdSupplier);
            if (dagAndClassloader == null) {
                return;
            }
            DAG dag = dagAndClassloader.f0();
            assert dag != null;
            ClassLoader classLoader = dagAndClassloader.f1();
            // must call this before rewriteDagWithSnapshotRestore()
            String dotRepresentation = dag.toDotString(defaultParallelism, defaultQueueSize);
            long snapshotId = jobExecRec.snapshotId();
            String snapshotName = mc.jobConfig().getInitialSnapshotName();
            String mapName = snapshotId >= 0 ? jobExecRec.successfulSnapshotDataMapName(mc.jobId()) : snapshotName != null ? EXPORTED_SNAPSHOTS_PREFIX + snapshotName : null;
            if (mapName != null) {
                rewriteDagWithSnapshotRestore(dag, snapshotId, mapName, snapshotName);
            } else {
                logger.info("Didn't find any snapshot to restore for " + mc.jobIdString());
            }
            MembersView membersView = Util.getMembersView(mc.nodeEngine());
            logger.info("Start executing " + mc.jobIdString() + ", execution graph in DOT format:\n" + dotRepresentation + "\nHINT: You can use graphviz or http://viz-js.com to visualize the printed graph.");
            logger.fine("Building execution plan for " + mc.jobIdString());
            Util.doWithClassLoader(classLoader, () -> mc.setExecutionPlanMap(createExecutionPlans(mc.nodeEngine(), membersView.getMembers(), dag, mc.jobId(), mc.executionId(), mc.jobConfig(), jobExecRec.ongoingSnapshotId(), false, mc.jobRecord().getSubject())));
            logger.fine("Built execution plans for " + mc.jobIdString());
            Set<MemberInfo> participants = mc.executionPlanMap().keySet();
            Version coordinatorVersion = mc.nodeEngine().getLocalMember().getVersion().asVersion();
            Function<ExecutionPlan, Operation> operationCtor = plan -> new InitExecutionOperation(mc.jobId(), mc.executionId(), membersView.getVersion(), coordinatorVersion, participants, mc.nodeEngine().getSerializationService().toData(plan), false);
            mc.invokeOnParticipants(operationCtor, this::onInitStepCompleted, null, false);
        } catch (Throwable e) {
            finalizeJob(e);
        }
    });
}
Also used : Address(com.hazelcast.cluster.Address) SUSPEND(com.hazelcast.jet.impl.TerminationMode.ActionAfterTerminate.SUSPEND) NOT_RUNNING(com.hazelcast.jet.core.JobStatus.NOT_RUNNING) GetLocalJobMetricsOperation(com.hazelcast.jet.impl.operation.GetLocalJobMetricsOperation) CompletableFuture.completedFuture(java.util.concurrent.CompletableFuture.completedFuture) NonCompletableFuture(com.hazelcast.jet.impl.util.NonCompletableFuture) ExceptionUtil.isTopologyException(com.hazelcast.jet.impl.util.ExceptionUtil.isTopologyException) JobTerminateRequestedException(com.hazelcast.jet.impl.exception.JobTerminateRequestedException) SourceProcessors.readMapP(com.hazelcast.jet.core.processor.SourceProcessors.readMapP) RESTART(com.hazelcast.jet.impl.TerminationMode.ActionAfterTerminate.RESTART) JetDelegatingClassLoader(com.hazelcast.jet.impl.deployment.JetDelegatingClassLoader) TerminatedWithSnapshotException(com.hazelcast.jet.impl.exception.TerminatedWithSnapshotException) Collectors.toMap(java.util.stream.Collectors.toMap) Functions.entryKey(com.hazelcast.function.Functions.entryKey) MemberInfo(com.hazelcast.internal.cluster.MemberInfo) Map(java.util.Map) STARTING(com.hazelcast.jet.core.JobStatus.STARTING) SUSPENDED(com.hazelcast.jet.core.JobStatus.SUSPENDED) DAG(com.hazelcast.jet.core.DAG) JobStatus(com.hazelcast.jet.core.JobStatus) ExceptionUtil(com.hazelcast.jet.impl.util.ExceptionUtil) JobMetrics(com.hazelcast.jet.core.metrics.JobMetrics) CancellationException(java.util.concurrent.CancellationException) CANCEL_GRACEFUL(com.hazelcast.jet.impl.TerminationMode.CANCEL_GRACEFUL) Collections.emptyList(java.util.Collections.emptyList) Collection(java.util.Collection) Set(java.util.Set) UUID(java.util.UUID) MILLISECONDS(java.util.concurrent.TimeUnit.MILLISECONDS) Collectors(java.util.stream.Collectors) CANCEL_FORCEFUL(com.hazelcast.jet.impl.TerminationMode.CANCEL_FORCEFUL) Objects(java.util.Objects) Util(com.hazelcast.jet.impl.util.Util) List(java.util.List) Util.idToString(com.hazelcast.jet.Util.idToString) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) MetricNames(com.hazelcast.jet.core.metrics.MetricNames) Entry(java.util.Map.Entry) TopologyChangedException(com.hazelcast.jet.core.TopologyChangedException) COMPLETED(com.hazelcast.jet.core.JobStatus.COMPLETED) JetDisabledException(com.hazelcast.jet.impl.exception.JetDisabledException) LoggingUtil(com.hazelcast.jet.impl.util.LoggingUtil) ExecutionPlanBuilder.createExecutionPlans(com.hazelcast.jet.impl.execution.init.ExecutionPlanBuilder.createExecutionPlans) Collectors.partitioningBy(java.util.stream.Collectors.partitioningBy) TerminateExecutionOperation(com.hazelcast.jet.impl.operation.TerminateExecutionOperation) ExceptionUtil.isRestartableException(com.hazelcast.jet.impl.util.ExceptionUtil.isRestartableException) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) LoggingUtil.logFinest(com.hazelcast.jet.impl.util.LoggingUtil.logFinest) Util.doWithClassLoader(com.hazelcast.jet.impl.util.Util.doWithClassLoader) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) ExecutionService(com.hazelcast.spi.impl.executionservice.ExecutionService) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) Function(java.util.function.Function) Supplier(java.util.function.Supplier) Util.formatJobDuration(com.hazelcast.jet.impl.util.Util.formatJobDuration) ActionAfterTerminate(com.hazelcast.jet.impl.TerminationMode.ActionAfterTerminate) ExecutionNotFoundException(com.hazelcast.jet.impl.exception.ExecutionNotFoundException) ArrayList(java.util.ArrayList) JetException(com.hazelcast.jet.JetException) HashSet(java.util.HashSet) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) COORDINATOR(com.hazelcast.jet.impl.JobClassLoaderService.JobPhase.COORDINATOR) ILogger(com.hazelcast.logging.ILogger) SnapshotValidator.validateSnapshot(com.hazelcast.jet.impl.SnapshotValidator.validateSnapshot) ExceptionUtil.rethrow(com.hazelcast.jet.impl.util.ExceptionUtil.rethrow) Operation(com.hazelcast.spi.impl.operationservice.Operation) Util.entry(com.hazelcast.jet.Util.entry) ExceptionUtil.withTryCatch(com.hazelcast.jet.impl.util.ExceptionUtil.withTryCatch) BiConsumer(java.util.function.BiConsumer) MembersView(com.hazelcast.internal.cluster.impl.MembersView) LocalMemberResetException(com.hazelcast.core.LocalMemberResetException) RESTART_GRACEFUL(com.hazelcast.jet.impl.TerminationMode.RESTART_GRACEFUL) Edge(com.hazelcast.jet.core.Edge) Version(com.hazelcast.version.Version) EXPORTED_SNAPSHOTS_PREFIX(com.hazelcast.jet.impl.JobRepository.EXPORTED_SNAPSHOTS_PREFIX) Nonnull(javax.annotation.Nonnull) Tuple2(com.hazelcast.jet.datamodel.Tuple2) Nullable(javax.annotation.Nullable) Job(com.hazelcast.jet.Job) Measurement(com.hazelcast.jet.core.metrics.Measurement) SUSPENDED_EXPORTING_SNAPSHOT(com.hazelcast.jet.core.JobStatus.SUSPENDED_EXPORTING_SNAPSHOT) Util.toList(com.hazelcast.jet.impl.util.Util.toList) RawJobMetrics(com.hazelcast.jet.impl.metrics.RawJobMetrics) MetricTags(com.hazelcast.jet.core.metrics.MetricTags) NONE(com.hazelcast.jet.config.ProcessingGuarantee.NONE) Consumer(java.util.function.Consumer) Vertex(com.hazelcast.jet.core.Vertex) Tuple2.tuple2(com.hazelcast.jet.datamodel.Tuple2.tuple2) CustomClassLoadedObject.deserializeWithCustomClassLoader(com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader) ExceptionUtil.peel(com.hazelcast.jet.impl.util.ExceptionUtil.peel) FAILED(com.hazelcast.jet.core.JobStatus.FAILED) RUNNING(com.hazelcast.jet.core.JobStatus.RUNNING) Collections(java.util.Collections) IMap(com.hazelcast.map.IMap) Edge.between(com.hazelcast.jet.core.Edge.between) MembersView(com.hazelcast.internal.cluster.impl.MembersView) DAG(com.hazelcast.jet.core.DAG) Util.idToString(com.hazelcast.jet.Util.idToString) GetLocalJobMetricsOperation(com.hazelcast.jet.impl.operation.GetLocalJobMetricsOperation) TerminateExecutionOperation(com.hazelcast.jet.impl.operation.TerminateExecutionOperation) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) Operation(com.hazelcast.spi.impl.operationservice.Operation) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) MemberInfo(com.hazelcast.internal.cluster.MemberInfo) Version(com.hazelcast.version.Version) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) JetDelegatingClassLoader(com.hazelcast.jet.impl.deployment.JetDelegatingClassLoader) Util.doWithClassLoader(com.hazelcast.jet.impl.util.Util.doWithClassLoader) CustomClassLoadedObject.deserializeWithCustomClassLoader(com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader)

Example 4 with FAILED

use of com.hazelcast.jet.core.JobStatus.FAILED in project hazelcast by hazelcast.

the class MasterJobContext method onStartExecutionComplete.

private void onStartExecutionComplete(Throwable error, Collection<Entry<MemberInfo, Object>> responses) {
    JobStatus status = mc.jobStatus();
    if (status != STARTING && status != RUNNING) {
        logCannotComplete(error);
        error = new IllegalStateException("Job coordination failed");
    }
    setJobMetrics(responses.stream().filter(en -> en.getValue() instanceof RawJobMetrics).map(e1 -> (RawJobMetrics) e1.getValue()).collect(Collectors.toList()));
    if (error instanceof JobTerminateRequestedException && ((JobTerminateRequestedException) error).mode().isWithTerminalSnapshot()) {
        Throwable finalError = error;
        // The terminal snapshot on members is always completed before replying to StartExecutionOp.
        // However, the response to snapshot operations can be processed after the response to
        // StartExecutionOp, so wait for that too.
        mc.snapshotContext().terminalSnapshotFuture().whenCompleteAsync(withTryCatch(logger, (r, e) -> finalizeJob(finalError)));
    } else {
        if (error instanceof ExecutionNotFoundException) {
            // If the StartExecutionOperation didn't find the execution, it means that it was cancelled.
            if (requestedTerminationMode != null) {
                // This cancellation can be because the master cancelled it. If that's the case, convert the exception
                // to JobTerminateRequestedException.
                error = new JobTerminateRequestedException(requestedTerminationMode).initCause(error);
            }
        // The cancellation can also happen if some participant left and
        // the target cancelled the execution locally in JobExecutionService.onMemberRemoved().
        // We keep this (and possibly other) exceptions as they are
        // and let the execution complete with failure.
        }
        finalizeJob(error);
    }
}
Also used : JobStatus(com.hazelcast.jet.core.JobStatus) Address(com.hazelcast.cluster.Address) SUSPEND(com.hazelcast.jet.impl.TerminationMode.ActionAfterTerminate.SUSPEND) NOT_RUNNING(com.hazelcast.jet.core.JobStatus.NOT_RUNNING) GetLocalJobMetricsOperation(com.hazelcast.jet.impl.operation.GetLocalJobMetricsOperation) CompletableFuture.completedFuture(java.util.concurrent.CompletableFuture.completedFuture) NonCompletableFuture(com.hazelcast.jet.impl.util.NonCompletableFuture) ExceptionUtil.isTopologyException(com.hazelcast.jet.impl.util.ExceptionUtil.isTopologyException) JobTerminateRequestedException(com.hazelcast.jet.impl.exception.JobTerminateRequestedException) SourceProcessors.readMapP(com.hazelcast.jet.core.processor.SourceProcessors.readMapP) RESTART(com.hazelcast.jet.impl.TerminationMode.ActionAfterTerminate.RESTART) JetDelegatingClassLoader(com.hazelcast.jet.impl.deployment.JetDelegatingClassLoader) TerminatedWithSnapshotException(com.hazelcast.jet.impl.exception.TerminatedWithSnapshotException) Collectors.toMap(java.util.stream.Collectors.toMap) Functions.entryKey(com.hazelcast.function.Functions.entryKey) MemberInfo(com.hazelcast.internal.cluster.MemberInfo) Map(java.util.Map) STARTING(com.hazelcast.jet.core.JobStatus.STARTING) SUSPENDED(com.hazelcast.jet.core.JobStatus.SUSPENDED) DAG(com.hazelcast.jet.core.DAG) JobStatus(com.hazelcast.jet.core.JobStatus) ExceptionUtil(com.hazelcast.jet.impl.util.ExceptionUtil) JobMetrics(com.hazelcast.jet.core.metrics.JobMetrics) CancellationException(java.util.concurrent.CancellationException) CANCEL_GRACEFUL(com.hazelcast.jet.impl.TerminationMode.CANCEL_GRACEFUL) Collections.emptyList(java.util.Collections.emptyList) Collection(java.util.Collection) Set(java.util.Set) UUID(java.util.UUID) MILLISECONDS(java.util.concurrent.TimeUnit.MILLISECONDS) Collectors(java.util.stream.Collectors) CANCEL_FORCEFUL(com.hazelcast.jet.impl.TerminationMode.CANCEL_FORCEFUL) Objects(java.util.Objects) Util(com.hazelcast.jet.impl.util.Util) List(java.util.List) Util.idToString(com.hazelcast.jet.Util.idToString) ExecutionPlan(com.hazelcast.jet.impl.execution.init.ExecutionPlan) MetricNames(com.hazelcast.jet.core.metrics.MetricNames) Entry(java.util.Map.Entry) TopologyChangedException(com.hazelcast.jet.core.TopologyChangedException) COMPLETED(com.hazelcast.jet.core.JobStatus.COMPLETED) JetDisabledException(com.hazelcast.jet.impl.exception.JetDisabledException) LoggingUtil(com.hazelcast.jet.impl.util.LoggingUtil) ExecutionPlanBuilder.createExecutionPlans(com.hazelcast.jet.impl.execution.init.ExecutionPlanBuilder.createExecutionPlans) Collectors.partitioningBy(java.util.stream.Collectors.partitioningBy) TerminateExecutionOperation(com.hazelcast.jet.impl.operation.TerminateExecutionOperation) ExceptionUtil.isRestartableException(com.hazelcast.jet.impl.util.ExceptionUtil.isRestartableException) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) LoggingUtil.logFinest(com.hazelcast.jet.impl.util.LoggingUtil.logFinest) Util.doWithClassLoader(com.hazelcast.jet.impl.util.Util.doWithClassLoader) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) ExecutionService(com.hazelcast.spi.impl.executionservice.ExecutionService) StartExecutionOperation(com.hazelcast.jet.impl.operation.StartExecutionOperation) Function(java.util.function.Function) Supplier(java.util.function.Supplier) Util.formatJobDuration(com.hazelcast.jet.impl.util.Util.formatJobDuration) ActionAfterTerminate(com.hazelcast.jet.impl.TerminationMode.ActionAfterTerminate) ExecutionNotFoundException(com.hazelcast.jet.impl.exception.ExecutionNotFoundException) ArrayList(java.util.ArrayList) JetException(com.hazelcast.jet.JetException) HashSet(java.util.HashSet) InitExecutionOperation(com.hazelcast.jet.impl.operation.InitExecutionOperation) COORDINATOR(com.hazelcast.jet.impl.JobClassLoaderService.JobPhase.COORDINATOR) ILogger(com.hazelcast.logging.ILogger) SnapshotValidator.validateSnapshot(com.hazelcast.jet.impl.SnapshotValidator.validateSnapshot) ExceptionUtil.rethrow(com.hazelcast.jet.impl.util.ExceptionUtil.rethrow) Operation(com.hazelcast.spi.impl.operationservice.Operation) Util.entry(com.hazelcast.jet.Util.entry) ExceptionUtil.withTryCatch(com.hazelcast.jet.impl.util.ExceptionUtil.withTryCatch) BiConsumer(java.util.function.BiConsumer) MembersView(com.hazelcast.internal.cluster.impl.MembersView) LocalMemberResetException(com.hazelcast.core.LocalMemberResetException) RESTART_GRACEFUL(com.hazelcast.jet.impl.TerminationMode.RESTART_GRACEFUL) Edge(com.hazelcast.jet.core.Edge) Version(com.hazelcast.version.Version) EXPORTED_SNAPSHOTS_PREFIX(com.hazelcast.jet.impl.JobRepository.EXPORTED_SNAPSHOTS_PREFIX) Nonnull(javax.annotation.Nonnull) Tuple2(com.hazelcast.jet.datamodel.Tuple2) Nullable(javax.annotation.Nullable) Job(com.hazelcast.jet.Job) Measurement(com.hazelcast.jet.core.metrics.Measurement) SUSPENDED_EXPORTING_SNAPSHOT(com.hazelcast.jet.core.JobStatus.SUSPENDED_EXPORTING_SNAPSHOT) Util.toList(com.hazelcast.jet.impl.util.Util.toList) RawJobMetrics(com.hazelcast.jet.impl.metrics.RawJobMetrics) MetricTags(com.hazelcast.jet.core.metrics.MetricTags) NONE(com.hazelcast.jet.config.ProcessingGuarantee.NONE) Consumer(java.util.function.Consumer) Vertex(com.hazelcast.jet.core.Vertex) Tuple2.tuple2(com.hazelcast.jet.datamodel.Tuple2.tuple2) CustomClassLoadedObject.deserializeWithCustomClassLoader(com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader) ExceptionUtil.peel(com.hazelcast.jet.impl.util.ExceptionUtil.peel) FAILED(com.hazelcast.jet.core.JobStatus.FAILED) RUNNING(com.hazelcast.jet.core.JobStatus.RUNNING) Collections(java.util.Collections) IMap(com.hazelcast.map.IMap) Edge.between(com.hazelcast.jet.core.Edge.between) ExecutionNotFoundException(com.hazelcast.jet.impl.exception.ExecutionNotFoundException) RawJobMetrics(com.hazelcast.jet.impl.metrics.RawJobMetrics) JobTerminateRequestedException(com.hazelcast.jet.impl.exception.JobTerminateRequestedException)

Example 5 with FAILED

use of com.hazelcast.jet.core.JobStatus.FAILED in project hazelcast by hazelcast.

the class WatermarkCoalescer_TerminalSnapshotTest method test.

@Test
public void test() throws Exception {
    /*
        This test tests the issue that after a terminal barrier is processed, no other work should
        be done by the ProcessorTasklet or CIES after that (except for emitting the DONE_ITEM).
        Also, if at-least-once guarantee is used, the tasklet should not continue to drain
        the queue that had the barrier while waiting for other barriers.

        Specifically, the issue was that in at-least-once mode the DONE_ITEM was processed
        after the terminal barrier while waiting for the barrier on other queues/edges. The
        DONE_ITEM could have caused a WM being emitted after the barrier, which is ok
        for the at-least-once mode, but the terminal snapshot should behave as if exactly-once
        mode was used.

        This test ensures that we're waiting for a WM in coalescer (by having a stream skew)
        and then does a graceful restart in at-least-once mode and checks that the results are
        correct.
         */
    String key0 = generateKeyForPartition(instance, 0);
    String key1 = generateKeyForPartition(instance, 1);
    Pipeline p = Pipeline.create();
    p.readFrom(Sources.mapJournal(sourceMap, JournalInitialPosition.START_FROM_OLDEST)).withTimestamps(Map.Entry::getValue, 0).setLocalParallelism(PARTITION_COUNT).groupingKey(Map.Entry::getKey).window(WindowDefinition.sliding(1, 1)).aggregate(AggregateOperations.counting()).setLocalParallelism(PARTITION_COUNT).writeTo(SinkBuilder.sinkBuilder("throwing", ctx -> "").<KeyedWindowResult<String, Long>>receiveFn((w, kwr) -> {
        if (kwr.result() != COUNT) {
            throw new RuntimeException("Received unexpected item " + kwr + ", expected count is " + COUNT);
        }
    }).build());
    Job job = instance.getJet().newJob(p, new JobConfig().setProcessingGuarantee(ProcessingGuarantee.AT_LEAST_ONCE));
    List<Future> futures = new ArrayList<>();
    futures.add(spawn(() -> {
        for (; ; ) {
            assertJobStatusEventually(job, JobStatus.RUNNING);
            System.out.println("============RESTARTING JOB=========");
            job.restart();
            Thread.sleep(2000);
        }
    }));
    // one producer is twice as fast as the other, to cause waiting for WM while doing snapshot
    futures.add(spawn(() -> producer(key0, 1)));
    futures.add(spawn(() -> producer(key1, 2)));
    sleepSeconds(20);
    for (Future f : futures) {
        f.cancel(true);
        // check that the future was cancelled and didn't fail with another error
        try {
            f.get();
            fail("Exception was expected");
        } catch (CancellationException expected) {
        }
    }
    // check that the job is running
    JobStatus status = job.getStatus();
    assertTrue("job should not be completed, status=" + status, status != FAILED && status != COMPLETED && status != SUSPENDED);
}
Also used : ParallelJVMTest(com.hazelcast.test.annotation.ParallelJVMTest) KeyedWindowResult(com.hazelcast.jet.datamodel.KeyedWindowResult) QuickTest(com.hazelcast.test.annotation.QuickTest) RunWith(org.junit.runner.RunWith) EventJournalConfig(com.hazelcast.config.EventJournalConfig) ArrayList(java.util.ArrayList) Future(java.util.concurrent.Future) Map(java.util.Map) SUSPENDED(com.hazelcast.jet.core.JobStatus.SUSPENDED) Assert.fail(org.junit.Assert.fail) JobStatus(com.hazelcast.jet.core.JobStatus) Job(com.hazelcast.jet.Job) Before(org.junit.Before) Config(com.hazelcast.config.Config) HazelcastInstance(com.hazelcast.core.HazelcastInstance) WindowDefinition(com.hazelcast.jet.pipeline.WindowDefinition) Pipeline(com.hazelcast.jet.pipeline.Pipeline) CancellationException(java.util.concurrent.CancellationException) JetTestSupport(com.hazelcast.jet.core.JetTestSupport) JobConfig(com.hazelcast.jet.config.JobConfig) AggregateOperations(com.hazelcast.jet.aggregate.AggregateOperations) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) Category(org.junit.experimental.categories.Category) ClusterProperty(com.hazelcast.spi.properties.ClusterProperty) Sources(com.hazelcast.jet.pipeline.Sources) TimeUnit(java.util.concurrent.TimeUnit) LockSupport(java.util.concurrent.locks.LockSupport) List(java.util.List) JournalInitialPosition(com.hazelcast.jet.pipeline.JournalInitialPosition) HazelcastParallelClassRunner(com.hazelcast.test.HazelcastParallelClassRunner) FAILED(com.hazelcast.jet.core.JobStatus.FAILED) SinkBuilder(com.hazelcast.jet.pipeline.SinkBuilder) ProcessingGuarantee(com.hazelcast.jet.config.ProcessingGuarantee) COMPLETED(com.hazelcast.jet.core.JobStatus.COMPLETED) IMap(com.hazelcast.map.IMap) ArrayList(java.util.ArrayList) KeyedWindowResult(com.hazelcast.jet.datamodel.KeyedWindowResult) JobConfig(com.hazelcast.jet.config.JobConfig) Pipeline(com.hazelcast.jet.pipeline.Pipeline) JobStatus(com.hazelcast.jet.core.JobStatus) CancellationException(java.util.concurrent.CancellationException) Future(java.util.concurrent.Future) Job(com.hazelcast.jet.Job) ParallelJVMTest(com.hazelcast.test.annotation.ParallelJVMTest) QuickTest(com.hazelcast.test.annotation.QuickTest) Test(org.junit.Test)

Aggregations

JobStatus (com.hazelcast.jet.core.JobStatus)6 COMPLETED (com.hazelcast.jet.core.JobStatus.COMPLETED)6 FAILED (com.hazelcast.jet.core.JobStatus.FAILED)6 List (java.util.List)6 Map (java.util.Map)6 CancellationException (java.util.concurrent.CancellationException)6 MemberInfo (com.hazelcast.internal.cluster.MemberInfo)5 MembersView (com.hazelcast.internal.cluster.impl.MembersView)5 DAG (com.hazelcast.jet.core.DAG)5 Edge (com.hazelcast.jet.core.Edge)5 Edge.between (com.hazelcast.jet.core.Edge.between)5 RUNNING (com.hazelcast.jet.core.JobStatus.RUNNING)5 STARTING (com.hazelcast.jet.core.JobStatus.STARTING)5 TopologyChangedException (com.hazelcast.jet.core.TopologyChangedException)5 Vertex (com.hazelcast.jet.core.Vertex)5 SourceProcessors.readMapP (com.hazelcast.jet.core.processor.SourceProcessors.readMapP)5 CustomClassLoadedObject.deserializeWithCustomClassLoader (com.hazelcast.jet.impl.execution.init.CustomClassLoadedObject.deserializeWithCustomClassLoader)5 ExecutionPlan (com.hazelcast.jet.impl.execution.init.ExecutionPlan)5 ExecutionPlanBuilder.createExecutionPlans (com.hazelcast.jet.impl.execution.init.ExecutionPlanBuilder.createExecutionPlans)5 InitExecutionOperation (com.hazelcast.jet.impl.operation.InitExecutionOperation)5