Example 1 with JobKeepAliveEvent

Use of com.netflix.titus.api.jobmanager.model.job.event.JobKeepAliveEvent in project titus-control-plane by Netflix.

From the class ObserveJobsCommand, method executeOnce.

private void executeOnce(Flux<JobManagerEvent<?>> events, JobEventPropagationMetrics metrics, boolean printLatency, boolean printEvents, boolean snapshotOnly) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);
    AtomicBoolean snapshotRead = new AtomicBoolean();
    Stopwatch stopwatch = Stopwatch.createStarted();
    Disposable disposable = events.subscribe(next -> {
        if (next == JobManagerEvent.snapshotMarker()) {
            logger.info("Emitted: snapshot marker in {}ms", stopwatch.elapsed(TimeUnit.MILLISECONDS));
            snapshotRead.set(true);
            if (snapshotOnly) {
                latch.countDown();
            }
        } else if (next instanceof JobUpdateEvent) {
            Job<?> job = ((JobUpdateEvent) next).getCurrent();
            if (printEvents) {
                logger.info("Emitted job update: jobId={}({}), jobState={}, version={}", job.getId(), next.isArchived() ? "archived" : job.getStatus().getState(), job.getStatus(), job.getVersion());
            }
            Optional<EventPropagationTrace> trace = metrics.recordJob(((JobUpdateEvent) next).getCurrent(), !snapshotRead.get());
            if (printLatency) {
                trace.ifPresent(t -> {
                    logger.info("Event propagation data: stages={}", t);
                });
            }
        } else if (next instanceof TaskUpdateEvent) {
            Task task = ((TaskUpdateEvent) next).getCurrent();
            if (printEvents) {
                logger.info("Emitted task update: jobId={}({}), taskId={}, taskState={}, version={}", task.getJobId(), next.isArchived() ? "archived" : task.getStatus().getState(), task.getId(), task.getStatus(), task.getVersion());
            }
            Optional<EventPropagationTrace> trace = metrics.recordTask(((TaskUpdateEvent) next).getCurrent(), !snapshotRead.get());
            if (printLatency) {
                trace.ifPresent(t -> logger.info("Event propagation data: {}", t));
            }
        } else if (next instanceof JobKeepAliveEvent) {
            if (printEvents) {
                logger.info("Keep alive response: {}", next);
            }
        } else {
            logger.info("Unrecognized event type: {}", next);
        }
    }, e -> {
        ErrorReports.handleReplyError("Error in the event stream", e);
        latch.countDown();
    }, () -> {
        logger.info("Event stream closed");
        latch.countDown();
    });
    latch.await();
    disposable.dispose();
}
Also used : Disposable(reactor.core.Disposable) CommandContext(com.netflix.titus.cli.CommandContext) Stopwatch(com.google.common.base.Stopwatch) ObserveJobsQuery(com.netflix.titus.grpc.protogen.ObserveJobsQuery) Task(com.netflix.titus.api.jobmanager.model.job.Task) Options(org.apache.commons.cli.Options) LoggerFactory(org.slf4j.LoggerFactory) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) StringExt(com.netflix.titus.common.util.StringExt) CliCommand(com.netflix.titus.cli.CliCommand) JobEventPropagationMetrics(com.netflix.titus.runtime.connector.jobmanager.JobEventPropagationMetrics) Option(org.apache.commons.cli.Option) EventPropagationTrace(com.netflix.titus.common.util.event.EventPropagationTrace) Job(com.netflix.titus.api.jobmanager.model.job.Job) Logger(org.slf4j.Logger) Iterator(java.util.Iterator) JobUpdateEvent(com.netflix.titus.api.jobmanager.model.job.event.JobUpdateEvent) Set(java.util.Set) JobManagerEvent(com.netflix.titus.api.jobmanager.model.job.event.JobManagerEvent) JobKeepAliveEvent(com.netflix.titus.api.jobmanager.model.job.event.JobKeepAliveEvent) TimeUnit(java.util.concurrent.TimeUnit) CountDownLatch(java.util.concurrent.CountDownLatch) Flux(reactor.core.publisher.Flux) TaskUpdateEvent(com.netflix.titus.api.jobmanager.model.job.event.TaskUpdateEvent) JobManagementServiceBlockingStub(com.netflix.titus.grpc.protogen.JobManagementServiceGrpc.JobManagementServiceBlockingStub) Optional(java.util.Optional) ErrorReports(com.netflix.titus.cli.command.ErrorReports) Collections(java.util.Collections) JobChangeNotification(com.netflix.titus.grpc.protogen.JobChangeNotification) RemoteJobManagementClient(com.netflix.titus.runtime.connector.jobmanager.RemoteJobManagementClient)
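
The control flow of executeOnce (consume events asynchronously, dispatch on the event type, and block the caller on a CountDownLatch until the snapshot marker, an error, or the end of the stream) can be tried out without Reactor or Titus on the classpath. The following is a minimal JDK-only sketch of that pattern, not the Titus implementation: Event, UpdateEvent, KeepAliveEvent, SNAPSHOT_MARKER, and observe are hypothetical stand-ins, and a plain list read on a background thread takes the place of the Flux.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchObserverSketch {

    // Hypothetical stand-ins for JobManagerEvent and its subtypes.
    interface Event {}

    static final class UpdateEvent implements Event {
        final String id;

        UpdateEvent(String id) {
            this.id = id;
        }
    }

    static final class KeepAliveEvent implements Event {}

    // Singleton marker, compared by identity as in the original snippet.
    static final Event SNAPSHOT_MARKER = new Event() {};

    /**
     * Consumes the events on a background thread and blocks until the stream
     * ends, or until the snapshot marker is seen when snapshotOnly is set.
     * Returns {updates seen before the marker, updates seen after the marker}.
     */
    static int[] observe(List<Event> events, boolean snapshotOnly) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicBoolean snapshotRead = new AtomicBoolean();
        AtomicInteger snapshotUpdates = new AtomicInteger();
        AtomicInteger liveUpdates = new AtomicInteger();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(() -> {
            try {
                for (Event next : events) {
                    if (next == SNAPSHOT_MARKER) {
                        snapshotRead.set(true);
                        if (snapshotOnly) {
                            return; // stop consuming, similar to disposing the subscription
                        }
                    } else if (next instanceof UpdateEvent) {
                        if (snapshotRead.get()) {
                            liveUpdates.incrementAndGet();
                        } else {
                            snapshotUpdates.incrementAndGet();
                        }
                    } else if (next instanceof KeepAliveEvent) {
                        // Keep-alives carry no state change; nothing to count.
                    }
                }
            } finally {
                // Released on early return, normal completion, or failure.
                latch.countDown();
            }
        });

        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for events", e);
        } finally {
            executor.shutdown();
        }
        return new int[] {snapshotUpdates.get(), liveUpdates.get()};
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new UpdateEvent("job-1"), SNAPSHOT_MARKER, new KeepAliveEvent(), new UpdateEvent("job-2"));
        int[] counts = observe(events, false);
        System.out.println("snapshot updates=" + counts[0] + ", live updates=" + counts[1]);
    }
}
```

Counting down in a finally block means the caller is released on every exit path, which mirrors how the original counts down in the error and completion callbacks as well as on the snapshot marker.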

Example 2 with JobKeepAliveEvent

Use of com.netflix.titus.api.jobmanager.model.job.event.JobKeepAliveEvent in project titus-control-plane by Netflix.

From the class LocalCacheQueryProcessor, method observeJobs.

public Observable<JobChangeNotification> observeJobs(ObserveJobsQuery query) {
    JobQueryCriteria<TaskStatus.TaskState, JobDescriptor.JobSpecCase> criteria = toJobQueryCriteria(query);
    V3JobQueryCriteriaEvaluator jobsPredicate = new V3JobQueryCriteriaEvaluator(criteria, titusRuntime);
    V3TaskQueryCriteriaEvaluator tasksPredicate = new V3TaskQueryCriteriaEvaluator(criteria, titusRuntime);
    Set<String> jobFields = newFieldsFilter(query.getJobFieldsList(), JOB_MINIMUM_FIELD_SET);
    Set<String> taskFields = newFieldsFilter(query.getTaskFieldsList(), TASK_MINIMUM_FIELD_SET);
    Flux<JobChangeNotification> eventStream = Flux.defer(() -> {
        AtomicBoolean first = new AtomicBoolean(true);
        return jobDataReplicator.events().subscribeOn(scheduler).publishOn(scheduler).flatMap(event -> {
            JobManagerEvent<?> jobManagerEvent = event.getRight();
            long now = titusRuntime.getClock().wallTime();
            JobSnapshot snapshot = event.getLeft();
            Optional<JobChangeNotification> grpcEvent = toObserveJobsEvent(snapshot, jobManagerEvent, now, jobsPredicate, tasksPredicate, jobFields, taskFields);
            // On the first event, emit the full snapshot before the event itself.
            if (first.getAndSet(false)) {
                List<JobChangeNotification> snapshotEvents = buildSnapshot(snapshot, now, jobsPredicate, tasksPredicate, jobFields, taskFields);
                grpcEvent.ifPresent(snapshotEvents::add);
                return Flux.fromIterable(snapshotEvents);
            }
            // A snapshot marker after the first event indicates that the underlying GRPC stream was disconnected, so the client must subscribe again.
            if (jobManagerEvent == JobManagerEvent.snapshotMarker()) {
                return Mono.error(new StatusRuntimeException(Status.ABORTED.augmentDescription("Downstream event stream reconnected.")));
            }
            // Keep-alive events are not forwarded to the client, so filter them out here.
            if (jobManagerEvent instanceof JobKeepAliveEvent) {
                // Check if staleness is not too high.
                if (jobDataReplicator.getStalenessMs() > configuration.getObserveJobsStalenessDisconnectMs()) {
                    rejectedByStalenessTooHighMetric.increment();
                    return Mono.error(new StatusRuntimeException(Status.ABORTED.augmentDescription("Data staleness in the event stream is too high. Most likely caused by connectivity issue to the downstream server.")));
                }
                return Mono.empty();
            }
            return grpcEvent.map(Flux::just).orElseGet(Flux::empty);
        });
    });
    return ReactorExt.toObservable(eventStream);
}
Also used : V3TaskQueryCriteriaEvaluator(com.netflix.titus.runtime.endpoint.v3.grpc.query.V3TaskQueryCriteriaEvaluator) Flux(reactor.core.publisher.Flux) JobKeepAliveEvent(com.netflix.titus.api.jobmanager.model.job.event.JobKeepAliveEvent) V3JobQueryCriteriaEvaluator(com.netflix.titus.runtime.endpoint.v3.grpc.query.V3JobQueryCriteriaEvaluator) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) JobChangeNotification(com.netflix.titus.grpc.protogen.JobChangeNotification) StatusRuntimeException(io.grpc.StatusRuntimeException) JobSnapshot(com.netflix.titus.runtime.connector.jobmanager.snapshot.JobSnapshot)
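
In observeJobs, a keep-alive event is never forwarded to the client; it is only used as an occasion to check how stale the replicated data is, and the stream is aborted when the staleness exceeds a configured limit. That filter-and-check step can be sketched with the JDK alone. The sketch below is an illustration under assumptions, not the Titus code: Event, Update, KeepAlive, and forward are hypothetical names, the staleness reading is passed in as a LongSupplier, and an IllegalStateException stands in for the StatusRuntimeException with Status.ABORTED used in the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class KeepAliveFilterSketch {

    // Hypothetical stand-ins for the event types in the original snippet.
    interface Event {}

    static final class Update implements Event {
        final String payload;

        Update(String payload) {
            this.payload = payload;
        }
    }

    static final class KeepAlive implements Event {}

    /**
     * Returns the payloads that would be forwarded to the client.
     * Keep-alive events are dropped, but each one triggers a staleness check;
     * the whole stream fails if the staleness exceeds maxStalenessMs.
     */
    static List<String> forward(List<Event> events, LongSupplier stalenessMs, long maxStalenessMs) {
        List<String> forwarded = new ArrayList<>();
        for (Event event : events) {
            if (event instanceof KeepAlive) {
                if (stalenessMs.getAsLong() > maxStalenessMs) {
                    // The original returns Mono.error(...) with Status.ABORTED here.
                    throw new IllegalStateException("Data staleness in the event stream is too high");
                }
                continue; // equivalent of Mono.empty(): nothing is emitted
            }
            if (event instanceof Update) {
                forwarded.add(((Update) event).payload);
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Update("a"), new KeepAlive(), new Update("b"));
        System.out.println(forward(events, () -> 10L, 100L));
    }
}
```

Because the check runs only when a keep-alive arrives, the sketch detects staleness at keep-alive granularity, which matches the structure of the original branch.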

Aggregations

JobKeepAliveEvent (com.netflix.titus.api.jobmanager.model.job.event.JobKeepAliveEvent): 2
JobChangeNotification (com.netflix.titus.grpc.protogen.JobChangeNotification): 2
AtomicBoolean (java.util.concurrent.atomic.AtomicBoolean): 2
Flux (reactor.core.publisher.Flux): 2
Stopwatch (com.google.common.base.Stopwatch): 1
Job (com.netflix.titus.api.jobmanager.model.job.Job): 1
Task (com.netflix.titus.api.jobmanager.model.job.Task): 1
JobManagerEvent (com.netflix.titus.api.jobmanager.model.job.event.JobManagerEvent): 1
JobUpdateEvent (com.netflix.titus.api.jobmanager.model.job.event.JobUpdateEvent): 1
TaskUpdateEvent (com.netflix.titus.api.jobmanager.model.job.event.TaskUpdateEvent): 1
CliCommand (com.netflix.titus.cli.CliCommand): 1
CommandContext (com.netflix.titus.cli.CommandContext): 1
ErrorReports (com.netflix.titus.cli.command.ErrorReports): 1
StringExt (com.netflix.titus.common.util.StringExt): 1
EventPropagationTrace (com.netflix.titus.common.util.event.EventPropagationTrace): 1
JobManagementServiceBlockingStub (com.netflix.titus.grpc.protogen.JobManagementServiceGrpc.JobManagementServiceBlockingStub): 1
ObserveJobsQuery (com.netflix.titus.grpc.protogen.ObserveJobsQuery): 1
JobEventPropagationMetrics (com.netflix.titus.runtime.connector.jobmanager.JobEventPropagationMetrics): 1
RemoteJobManagementClient (com.netflix.titus.runtime.connector.jobmanager.RemoteJobManagementClient): 1
JobSnapshot (com.netflix.titus.runtime.connector.jobmanager.snapshot.JobSnapshot): 1