Example 96 with TimeValue

Use of org.opensearch.common.unit.TimeValue in project OpenSearch by opensearch-project.

From class RecoverySourceHandler, method recoverToTarget:

/**
 * Performs the recovery from the local engine to the target.
 */
public void recoverToTarget(ActionListener<RecoveryResponse> listener) {
    addListener(listener);
    final Closeable releaseResources = () -> IOUtils.close(resources);
    try {
        cancellableThreads.setOnCancel((reason, beforeCancelEx) -> {
            final RuntimeException e;
            if (shard.state() == IndexShardState.CLOSED) {
                // check if the shard got closed on us
                e = new IndexShardClosedException(shard.shardId(), "shard is closed and recovery was canceled reason [" + reason + "]");
            } else {
                e = new CancellableThreads.ExecutionCancelledException("recovery was canceled reason [" + reason + "]");
            }
            if (beforeCancelEx != null) {
                e.addSuppressed(beforeCancelEx);
            }
            IOUtils.closeWhileHandlingException(releaseResources, () -> future.onFailure(e));
            throw e;
        });
        final Consumer<Exception> onFailure = e -> {
            assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[onFailure]");
            IOUtils.closeWhileHandlingException(releaseResources, () -> future.onFailure(e));
        };
        final SetOnce<RetentionLease> retentionLeaseRef = new SetOnce<>();
        runUnderPrimaryPermit(() -> {
            final IndexShardRoutingTable routingTable = shard.getReplicationGroup().getRoutingTable();
            ShardRouting targetShardRouting = routingTable.getByAllocationId(request.targetAllocationId());
            if (targetShardRouting == null) {
                logger.debug("delaying recovery of {} as it is not listed as assigned to target node {}", request.shardId(), request.targetNode());
                throw new DelayRecoveryException("source node does not have the shard listed in its state as allocated on the node");
            }
            assert targetShardRouting.initializing() : "expected recovery target to be initializing but was " + targetShardRouting;
            retentionLeaseRef.set(shard.getRetentionLeases().get(ReplicationTracker.getPeerRecoveryRetentionLeaseId(targetShardRouting)));
        }, shardId + " validating recovery target [" + request.targetAllocationId() + "] registered ", shard, cancellableThreads, logger);
        final Closeable retentionLock = shard.acquireHistoryRetentionLock();
        resources.add(retentionLock);
        final long startingSeqNo;
        final boolean isSequenceNumberBasedRecovery = request.startingSeqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO
            && isTargetSameHistory()
            && shard.hasCompleteHistoryOperations(PEER_RECOVERY_NAME, request.startingSeqNo())
            && ((retentionLeaseRef.get() == null && shard.useRetentionLeasesInPeerRecovery() == false)
                || (retentionLeaseRef.get() != null && retentionLeaseRef.get().retainingSequenceNumber() <= request.startingSeqNo()));
        if (isSequenceNumberBasedRecovery && retentionLeaseRef.get() != null) {
            // all the history we need is retained by an existing retention lease, so we do not need a separate retention lock
            retentionLock.close();
            logger.trace("history is retained by {}", retentionLeaseRef.get());
        } else {
            // all the history we need is retained by the retention lock, obtained before calling shard.hasCompleteHistoryOperations()
            // and before acquiring the safe commit we'll be using, so we can be certain that all operations after the safe commit's
            // local checkpoint will be retained for the duration of this recovery.
            logger.trace("history is retained by retention lock");
        }
        final StepListener<SendFileResult> sendFileStep = new StepListener<>();
        final StepListener<TimeValue> prepareEngineStep = new StepListener<>();
        final StepListener<SendSnapshotResult> sendSnapshotStep = new StepListener<>();
        final StepListener<Void> finalizeStep = new StepListener<>();
        if (isSequenceNumberBasedRecovery) {
            logger.trace("performing sequence numbers based recovery. starting at [{}]", request.startingSeqNo());
            startingSeqNo = request.startingSeqNo();
            if (retentionLeaseRef.get() == null) {
                createRetentionLease(startingSeqNo, ActionListener.map(sendFileStep, ignored -> SendFileResult.EMPTY));
            } else {
                sendFileStep.onResponse(SendFileResult.EMPTY);
            }
        } else {
            final Engine.IndexCommitRef safeCommitRef;
            try {
                safeCommitRef = acquireSafeCommit(shard);
                resources.add(safeCommitRef);
            } catch (final Exception e) {
                throw new RecoveryEngineException(shard.shardId(), 1, "snapshot failed", e);
            }
            // Try and copy enough operations to the recovering peer so that if it is promoted to primary then it has a chance of being
            // able to recover other replicas using operations-based recoveries. If we are not using retention leases then we
            // conservatively copy all available operations. If we are using retention leases then "enough operations" is just the
            // operations from the local checkpoint of the safe commit onwards, because when using soft deletes the safe commit retains
            // at least as much history as anything else. The safe commit will often contain all the history retained by the current set
            // of retention leases, but this is not guaranteed: an earlier peer recovery from a different primary might have created a
            // retention lease for some history that this primary already discarded, since we discard history when the global checkpoint
            // advances and not when creating a new safe commit. In any case this is a best-effort thing since future recoveries can
            // always fall back to file-based ones, and only really presents a problem if this primary fails before things have settled
            // down.
            startingSeqNo = Long.parseLong(safeCommitRef.getIndexCommit().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)) + 1L;
            logger.trace("performing file-based recovery followed by history replay starting at [{}]", startingSeqNo);
            try {
                final int estimateNumOps = estimateNumberOfHistoryOperations(startingSeqNo);
                final Releasable releaseStore = acquireStore(shard.store());
                resources.add(releaseStore);
                sendFileStep.whenComplete(r -> IOUtils.close(safeCommitRef, releaseStore), e -> {
                    try {
                        IOUtils.close(safeCommitRef, releaseStore);
                    } catch (final IOException ex) {
                        logger.warn("releasing snapshot caused exception", ex);
                    }
                });
                final StepListener<ReplicationResponse> deleteRetentionLeaseStep = new StepListener<>();
                runUnderPrimaryPermit(() -> {
                    try {
                        // If the target previously had a copy of this shard then a file-based recovery might move its global
                        // checkpoint backwards. We must therefore remove any existing retention lease so that we can create a
                        // new one later on in the recovery.
                        shard.removePeerRecoveryRetentionLease(request.targetNode().getId(), new ThreadedActionListener<>(logger, shard.getThreadPool(), ThreadPool.Names.GENERIC, deleteRetentionLeaseStep, false));
                    } catch (RetentionLeaseNotFoundException e) {
                        logger.debug("no peer-recovery retention lease for " + request.targetAllocationId());
                        deleteRetentionLeaseStep.onResponse(null);
                    }
                }, shardId + " removing retention lease for [" + request.targetAllocationId() + "]", shard, cancellableThreads, logger);
                deleteRetentionLeaseStep.whenComplete(ignored -> {
                    assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[phase1]");
                    phase1(safeCommitRef.getIndexCommit(), startingSeqNo, () -> estimateNumOps, sendFileStep);
                }, onFailure);
            } catch (final Exception e) {
                throw new RecoveryEngineException(shard.shardId(), 1, "sendFileStep failed", e);
            }
        }
        assert startingSeqNo >= 0 : "startingSeqNo must be non-negative. got: " + startingSeqNo;
        sendFileStep.whenComplete(r -> {
            assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[prepareTargetForTranslog]");
            // For a sequence based recovery, the target can keep its local translog
            prepareTargetForTranslog(estimateNumberOfHistoryOperations(startingSeqNo), prepareEngineStep);
        }, onFailure);
        prepareEngineStep.whenComplete(prepareEngineTime -> {
            assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[phase2]");
            /*
             * Add the shard to the replication group (the shard will receive replication requests from this point on)
             * now that the engine is open. This means that any document indexed into the primary after this point will
             * be replicated to this replica as well. Make sure to do this before sampling the max sequence number in
             * the next step, to ensure that we send all documents up to maxSeqNo in phase2.
             */
            runUnderPrimaryPermit(() -> shard.initiateTracking(request.targetAllocationId()), shardId + " initiating tracking of " + request.targetAllocationId(), shard, cancellableThreads, logger);
            final long endingSeqNo = shard.seqNoStats().getMaxSeqNo();
            if (logger.isTraceEnabled()) {
                logger.trace("snapshot translog for recovery; current size is [{}]", estimateNumberOfHistoryOperations(startingSeqNo));
            }
            final Translog.Snapshot phase2Snapshot = shard.newChangesSnapshot(PEER_RECOVERY_NAME, startingSeqNo, Long.MAX_VALUE, false);
            resources.add(phase2Snapshot);
            retentionLock.close();
            // we have to capture the max_seen_auto_id_timestamp and the max_seq_no_of_updates to make sure that these values
            // are at least as high as the corresponding values on the primary when any of these operations were executed on it.
            final long maxSeenAutoIdTimestamp = shard.getMaxSeenAutoIdTimestamp();
            final long maxSeqNoOfUpdatesOrDeletes = shard.getMaxSeqNoOfUpdatesOrDeletes();
            final RetentionLeases retentionLeases = shard.getRetentionLeases();
            final long mappingVersionOnPrimary = shard.indexSettings().getIndexMetadata().getMappingVersion();
            phase2(startingSeqNo, endingSeqNo, phase2Snapshot, maxSeenAutoIdTimestamp, maxSeqNoOfUpdatesOrDeletes, retentionLeases, mappingVersionOnPrimary, sendSnapshotStep);
        }, onFailure);
        // Recovery target can trim all operations >= startingSeqNo, as we have sent all these operations in phase 2
        final long trimAboveSeqNo = startingSeqNo - 1;
        sendSnapshotStep.whenComplete(r -> finalizeRecovery(r.targetLocalCheckpoint, trimAboveSeqNo, finalizeStep), onFailure);
        finalizeStep.whenComplete(r -> {
            // TODO: return the actual throttle time
            final long phase1ThrottlingWaitTime = 0L;
            final SendSnapshotResult sendSnapshotResult = sendSnapshotStep.result();
            final SendFileResult sendFileResult = sendFileStep.result();
            final RecoveryResponse response = new RecoveryResponse(
                sendFileResult.phase1FileNames,
                sendFileResult.phase1FileSizes,
                sendFileResult.phase1ExistingFileNames,
                sendFileResult.phase1ExistingFileSizes,
                sendFileResult.totalSize,
                sendFileResult.existingTotalSize,
                sendFileResult.took.millis(),
                phase1ThrottlingWaitTime,
                prepareEngineStep.result().millis(),
                sendSnapshotResult.sentOperations,
                sendSnapshotResult.tookTime.millis()
            );
            try {
                future.onResponse(response);
            } finally {
                IOUtils.close(resources);
            }
        }, onFailure);
    } catch (Exception e) {
        IOUtils.closeWhileHandlingException(releaseResources, () -> future.onFailure(e));
    }
}
Also used : SequenceNumbers(org.opensearch.index.seqno.SequenceNumbers) Arrays(java.util.Arrays) IndexFormatTooNewException(org.apache.lucene.index.IndexFormatTooNewException) RecoveryEngineException(org.opensearch.index.engine.RecoveryEngineException) FutureUtils(org.opensearch.common.util.concurrent.FutureUtils) Releasables(org.opensearch.common.lease.Releasables) CorruptIndexException(org.apache.lucene.index.CorruptIndexException) PlainActionFuture(org.opensearch.action.support.PlainActionFuture) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) Locale(java.util.Locale) ActionListener(org.opensearch.action.ActionListener) IOContext(org.apache.lucene.store.IOContext) IndexShardRoutingTable(org.opensearch.cluster.routing.IndexShardRoutingTable) TimeValue(org.opensearch.common.unit.TimeValue) RemoteTransportException(org.opensearch.transport.RemoteTransportException) ExceptionsHelper(org.opensearch.ExceptionsHelper) ReplicationTracker(org.opensearch.index.seqno.ReplicationTracker) IndexShardClosedException(org.opensearch.index.shard.IndexShardClosedException) Store(org.opensearch.index.store.Store) Engine(org.opensearch.index.engine.Engine) List(java.util.List) Logger(org.apache.logging.log4j.Logger) BytesArray(org.opensearch.common.bytes.BytesArray) CheckedRunnable(org.opensearch.common.CheckedRunnable) ReplicationResponse(org.opensearch.action.support.replication.ReplicationResponse) StepListener(org.opensearch.action.StepListener) ListenableFuture(org.opensearch.common.util.concurrent.ListenableFuture) CopyOnWriteArrayList(java.util.concurrent.CopyOnWriteArrayList) IndexCommit(org.apache.lucene.index.IndexCommit) CancellableThreads(org.opensearch.common.util.CancellableThreads) IndexShardState(org.opensearch.index.shard.IndexShardState) BytesReference(org.opensearch.common.bytes.BytesReference) ActionRunnable(org.opensearch.action.ActionRunnable) ThreadPool(org.opensearch.threadpool.ThreadPool) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) ByteSizeValue(org.opensearch.common.unit.ByteSizeValue) CompletableFuture(java.util.concurrent.CompletableFuture) Releasable(org.opensearch.common.lease.Releasable) ThreadedActionListener(org.opensearch.action.support.ThreadedActionListener) Deque(java.util.Deque) ParameterizedMessage(org.apache.logging.log4j.message.ParameterizedMessage) OpenSearchExecutors(org.opensearch.common.util.concurrent.OpenSearchExecutors) ArrayList(java.util.ArrayList) IndexShard(org.opensearch.index.shard.IndexShard) Loggers(org.opensearch.common.logging.Loggers) LegacyESVersion(org.opensearch.LegacyESVersion) Translog(org.opensearch.index.translog.Translog) StreamSupport(java.util.stream.StreamSupport) StoreFileMetadata(org.opensearch.index.store.StoreFileMetadata) IntSupplier(java.util.function.IntSupplier) RetentionLease(org.opensearch.index.seqno.RetentionLease) ArrayUtil(org.apache.lucene.util.ArrayUtil) StopWatch(org.opensearch.common.StopWatch) InputStreamIndexInput(org.opensearch.common.lucene.store.InputStreamIndexInput) IndexInput(org.apache.lucene.store.IndexInput) SetOnce(org.apache.lucene.util.SetOnce) IOException(java.io.IOException) IndexFormatTooOldException(org.apache.lucene.index.IndexFormatTooOldException) IndexShardRelocatedException(org.opensearch.index.shard.IndexShardRelocatedException) ConcurrentLinkedDeque(java.util.concurrent.ConcurrentLinkedDeque) ShardRouting(org.opensearch.cluster.routing.ShardRouting) IOUtils(org.opensearch.core.internal.io.IOUtils) Consumer(java.util.function.Consumer) AtomicLong(java.util.concurrent.atomic.AtomicLong) Transports(org.opensearch.transport.Transports) RetentionLeases(org.opensearch.index.seqno.RetentionLeases) Closeable(java.io.Closeable) Comparator(java.util.Comparator) Collections(java.util.Collections) RateLimiter(org.apache.lucene.store.RateLimiter) RetentionLeaseNotFoundException(org.opensearch.index.seqno.RetentionLeaseNotFoundException)
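
The TimeValue threaded through this method is a phase timing: prepareEngineStep is a StepListener<TimeValue>, and the final RecoveryResponse reports prepareEngineStep.result().millis() alongside the other phase durations. Below is a minimal sketch of that timing pattern, built only from the StopWatch, CheckedRunnable, and StepListener types in the import list above; timePhase itself is a hypothetical helper, not a method of RecoverySourceHandler.

    // Hypothetical helper: run one recovery phase, then complete a
    // StepListener<TimeValue> with the elapsed time, or fail it.
    private static void timePhase(CheckedRunnable<Exception> phase, StepListener<TimeValue> step) {
        final StopWatch stopWatch = new StopWatch().start();
        try {
            phase.run();
            stopWatch.stop();
            // totalTime() yields a TimeValue; downstream code reads .millis()
            step.onResponse(stopWatch.totalTime());
        } catch (Exception e) {
            step.onFailure(e);
        }
    }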

Example 97 with TimeValue

Use of org.opensearch.common.unit.TimeValue in project OpenSearch by opensearch-project.

From class JvmStats, method toXContent:

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
    builder.startObject(Fields.JVM);
    builder.field(Fields.TIMESTAMP, timestamp);
    builder.humanReadableField(Fields.UPTIME_IN_MILLIS, Fields.UPTIME, new TimeValue(uptime));
    builder.startObject(Fields.MEM);
    builder.humanReadableField(Fields.HEAP_USED_IN_BYTES, Fields.HEAP_USED, new ByteSizeValue(mem.heapUsed));
    if (mem.getHeapUsedPercent() >= 0) {
        builder.field(Fields.HEAP_USED_PERCENT, mem.getHeapUsedPercent());
    }
    builder.humanReadableField(Fields.HEAP_COMMITTED_IN_BYTES, Fields.HEAP_COMMITTED, new ByteSizeValue(mem.heapCommitted));
    builder.humanReadableField(Fields.HEAP_MAX_IN_BYTES, Fields.HEAP_MAX, new ByteSizeValue(mem.heapMax));
    builder.humanReadableField(Fields.NON_HEAP_USED_IN_BYTES, Fields.NON_HEAP_USED, new ByteSizeValue(mem.nonHeapUsed));
    builder.humanReadableField(Fields.NON_HEAP_COMMITTED_IN_BYTES, Fields.NON_HEAP_COMMITTED, new ByteSizeValue(mem.nonHeapCommitted));
    builder.startObject(Fields.POOLS);
    for (MemoryPool pool : mem) {
        builder.startObject(pool.getName());
        builder.humanReadableField(Fields.USED_IN_BYTES, Fields.USED, new ByteSizeValue(pool.used));
        builder.humanReadableField(Fields.MAX_IN_BYTES, Fields.MAX, new ByteSizeValue(pool.max));
        builder.humanReadableField(Fields.PEAK_USED_IN_BYTES, Fields.PEAK_USED, new ByteSizeValue(pool.peakUsed));
        builder.humanReadableField(Fields.PEAK_MAX_IN_BYTES, Fields.PEAK_MAX, new ByteSizeValue(pool.peakMax));
        builder.startObject(Fields.LAST_GC_STATS);
        builder.humanReadableField(Fields.USED_IN_BYTES, Fields.USED, new ByteSizeValue(pool.getLastGcStats().used));
        builder.humanReadableField(Fields.MAX_IN_BYTES, Fields.MAX, new ByteSizeValue(pool.getLastGcStats().max));
        builder.field(Fields.USAGE_PERCENT, pool.getLastGcStats().getUsagePercent());
        builder.endObject();
        builder.endObject();
    }
    builder.endObject();
    builder.endObject();
    builder.startObject(Fields.THREADS);
    builder.field(Fields.COUNT, threads.getCount());
    builder.field(Fields.PEAK_COUNT, threads.getPeakCount());
    builder.endObject();
    builder.startObject(Fields.GC);
    builder.startObject(Fields.COLLECTORS);
    for (GarbageCollector collector : gc) {
        builder.startObject(collector.getName());
        builder.field(Fields.COLLECTION_COUNT, collector.getCollectionCount());
        builder.humanReadableField(Fields.COLLECTION_TIME_IN_MILLIS, Fields.COLLECTION_TIME, new TimeValue(collector.collectionTime));
        builder.endObject();
    }
    builder.endObject();
    builder.endObject();
    if (bufferPools != null) {
        builder.startObject(Fields.BUFFER_POOLS);
        for (BufferPool bufferPool : bufferPools) {
            builder.startObject(bufferPool.getName());
            builder.field(Fields.COUNT, bufferPool.getCount());
            builder.humanReadableField(Fields.USED_IN_BYTES, Fields.USED, new ByteSizeValue(bufferPool.used));
            builder.humanReadableField(Fields.TOTAL_CAPACITY_IN_BYTES, Fields.TOTAL_CAPACITY, new ByteSizeValue(bufferPool.totalCapacity));
            builder.endObject();
        }
        builder.endObject();
    }
    builder.startObject(Fields.CLASSES);
    builder.field(Fields.CURRENT_LOADED_COUNT, classes.getLoadedClassCount());
    builder.field(Fields.TOTAL_LOADED_COUNT, classes.getTotalLoadedClassCount());
    builder.field(Fields.TOTAL_UNLOADED_COUNT, classes.getUnloadedClassCount());
    builder.endObject();
    builder.endObject();
    return builder;
}
Also used : ByteSizeValue(org.opensearch.common.unit.ByteSizeValue) TimeValue(org.opensearch.common.unit.TimeValue)
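
Each humanReadableField call above emits the duration twice: the raw millisecond count is always written under the *_IN_MILLIS name, and the formatted string is added under the plain name when the builder has human-readable output enabled. A short hedged sketch of the two values a TimeValue contributes (the numbers are illustrative):

    // Sketch: the two forms a TimeValue contributes to humanReadableField output.
    TimeValue collectionTime = TimeValue.timeValueMillis(1500);
    long rawMillis = collectionTime.millis();     // 1500, written as "collection_time_in_millis"
    String readable = collectionTime.toString();  // "1.5s", written as "collection_time"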

Example 98 with TimeValue

Use of org.opensearch.common.unit.TimeValue in project OpenSearch by opensearch-project.

From class PersistentTasksService, method waitForPersistentTaskCondition:

/**
 * Waits for a given persistent task to comply with a given predicate, then calls back the listener accordingly.
 *
 * @param taskId the persistent task id
 * @param predicate the persistent task predicate to evaluate
 * @param timeout a timeout for waiting
 * @param listener the callback listener
 */
public void waitForPersistentTaskCondition(final String taskId, final Predicate<PersistentTask<?>> predicate, @Nullable final TimeValue timeout, final WaitForPersistentTaskListener<?> listener) {
    final Predicate<ClusterState> clusterStatePredicate = clusterState -> predicate.test(PersistentTasksCustomMetadata.getTaskWithId(clusterState, taskId));
    final ClusterStateObserver observer = new ClusterStateObserver(clusterService, timeout, logger, threadPool.getThreadContext());
    final ClusterState clusterState = observer.setAndGetObservedState();
    if (clusterStatePredicate.test(clusterState)) {
        listener.onResponse(PersistentTasksCustomMetadata.getTaskWithId(clusterState, taskId));
    } else {
        observer.waitForNextChange(new ClusterStateObserver.Listener() {

            @Override
            public void onNewClusterState(ClusterState state) {
                listener.onResponse(PersistentTasksCustomMetadata.getTaskWithId(state, taskId));
            }

            @Override
            public void onClusterServiceClose() {
                listener.onFailure(new NodeClosedException(clusterService.localNode()));
            }

            @Override
            public void onTimeout(TimeValue timeout) {
                listener.onTimeout(timeout);
            }
        }, clusterStatePredicate);
    }
}
Also used : Client(org.opensearch.client.Client) TimeValue(org.opensearch.common.unit.TimeValue) Predicate(java.util.function.Predicate) ThreadPool(org.opensearch.threadpool.ThreadPool) TaskId(org.opensearch.tasks.TaskId) ActionRequest(org.opensearch.action.ActionRequest) Nullable(org.opensearch.common.Nullable) ClusterState(org.opensearch.cluster.ClusterState) Logger(org.apache.logging.log4j.Logger) CancelTasksRequest(org.opensearch.action.admin.cluster.node.tasks.cancel.CancelTasksRequest) ClusterService(org.opensearch.cluster.service.ClusterService) NodeClosedException(org.opensearch.node.NodeClosedException) ActionType(org.opensearch.action.ActionType) ActionListener(org.opensearch.action.ActionListener) ClusterStateObserver(org.opensearch.cluster.ClusterStateObserver) LogManager(org.apache.logging.log4j.LogManager) CancelTasksResponse(org.opensearch.action.admin.cluster.node.tasks.cancel.CancelTasksResponse) OriginSettingClient(org.opensearch.client.OriginSettingClient) PersistentTask(org.opensearch.persistent.PersistentTasksCustomMetadata.PersistentTask)
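
From the caller's side, the timeout is just a TimeValue, or null to wait indefinitely; the ClusterStateObserver enforces it and expiry is delivered through onTimeout. A hedged usage sketch follows: persistentTasksService and taskId are assumed to be in scope, and the assignment predicate is illustrative.

    // Sketch: wait up to 30 seconds for the persistent task to be assigned to a node.
    persistentTasksService.waitForPersistentTaskCondition(
        taskId,
        task -> task != null && task.isAssigned(),  // condition to satisfy
        TimeValue.timeValueSeconds(30),             // @Nullable: null waits without limit
        new PersistentTasksService.WaitForPersistentTaskListener<PersistentTaskParams>() {
            @Override
            public void onResponse(PersistentTask<PersistentTaskParams> task) {
                // the condition was met within the timeout
            }

            @Override
            public void onFailure(Exception e) {
                // cluster service closed or other failure; override onTimeout(TimeValue) to customize expiry handling
            }
        });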

Example 99 with TimeValue

Use of org.opensearch.common.unit.TimeValue in project OpenSearch by opensearch-project.

From class Node, method start:

/**
 * Start the node. If the node is already started, this method is a no-op.
 */
public Node start() throws NodeValidationException {
    if (!lifecycle.moveToStarted()) {
        return this;
    }
    logger.info("starting ...");
    pluginLifecycleComponents.forEach(LifecycleComponent::start);
    injector.getInstance(MappingUpdatedAction.class).setClient(client);
    injector.getInstance(IndicesService.class).start();
    injector.getInstance(IndicesClusterStateService.class).start();
    injector.getInstance(SnapshotsService.class).start();
    injector.getInstance(SnapshotShardsService.class).start();
    injector.getInstance(RepositoriesService.class).start();
    injector.getInstance(SearchService.class).start();
    injector.getInstance(FsHealthService.class).start();
    nodeService.getMonitorService().start();
    final ClusterService clusterService = injector.getInstance(ClusterService.class);
    final NodeConnectionsService nodeConnectionsService = injector.getInstance(NodeConnectionsService.class);
    nodeConnectionsService.start();
    clusterService.setNodeConnectionsService(nodeConnectionsService);
    injector.getInstance(GatewayService.class).start();
    Discovery discovery = injector.getInstance(Discovery.class);
    clusterService.getMasterService().setClusterStatePublisher(discovery::publish);
    // Start the transport service now so the publish address will be added to the local disco node in ClusterService
    TransportService transportService = injector.getInstance(TransportService.class);
    transportService.getTaskManager().setTaskResultsService(injector.getInstance(TaskResultsService.class));
    transportService.getTaskManager().setTaskCancellationService(new TaskCancellationService(transportService));
    transportService.start();
    assert localNodeFactory.getNode() != null;
    assert transportService.getLocalNode().equals(localNodeFactory.getNode()) : "transportService has a different local node than the factory provided";
    injector.getInstance(PeerRecoverySourceService.class).start();
    // Load (and maybe upgrade) the metadata stored on disk
    final GatewayMetaState gatewayMetaState = injector.getInstance(GatewayMetaState.class);
    gatewayMetaState.start(settings(), transportService, clusterService, injector.getInstance(MetaStateService.class), injector.getInstance(MetadataIndexUpgradeService.class), injector.getInstance(MetadataUpgrader.class), injector.getInstance(PersistedClusterStateService.class));
    if (Assertions.ENABLED) {
        try {
            assert injector.getInstance(MetaStateService.class).loadFullState().v1().isEmpty();
            final NodeMetadata nodeMetadata = NodeMetadata.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY, nodeEnvironment.nodeDataPaths());
            assert nodeMetadata != null;
            assert nodeMetadata.nodeVersion().equals(Version.CURRENT);
            assert nodeMetadata.nodeId().equals(localNodeFactory.getNode().getId());
        } catch (IOException e) {
            assert false : e;
        }
    }
    // we load the global state here (the persistent part of the cluster state stored on disk) to
    // pass it to the bootstrap checks to allow plugins to enforce certain preconditions based on the recovered state.
    final Metadata onDiskMetadata = gatewayMetaState.getPersistedState().getLastAcceptedState().metadata();
    // this is never null
    assert onDiskMetadata != null : "metadata is null but shouldn't";
    validateNodeBeforeAcceptingRequests(new BootstrapContext(environment, onDiskMetadata), transportService.boundAddress(), pluginsService.filterPlugins(Plugin.class).stream().flatMap(p -> p.getBootstrapChecks().stream()).collect(Collectors.toList()));
    clusterService.addStateApplier(transportService.getTaskManager());
    // start after transport service so the local disco is known
    // start before cluster service so that it can set initial state on ClusterApplierService
    discovery.start();
    clusterService.start();
    assert clusterService.localNode().equals(localNodeFactory.getNode()) : "clusterService has a different local node than the factory provided";
    transportService.acceptIncomingRequests();
    discovery.startInitialJoin();
    final TimeValue initialStateTimeout = DiscoverySettings.INITIAL_STATE_TIMEOUT_SETTING.get(settings());
    configureNodeAndClusterIdStateListener(clusterService);
    if (initialStateTimeout.millis() > 0) {
        final ThreadPool thread = injector.getInstance(ThreadPool.class);
        ClusterState clusterState = clusterService.state();
        ClusterStateObserver observer = new ClusterStateObserver(clusterState, clusterService, null, logger, thread.getThreadContext());
        if (clusterState.nodes().getMasterNodeId() == null) {
            logger.debug("waiting to join the cluster. timeout [{}]", initialStateTimeout);
            final CountDownLatch latch = new CountDownLatch(1);
            observer.waitForNextChange(new ClusterStateObserver.Listener() {

                @Override
                public void onNewClusterState(ClusterState state) {
                    latch.countDown();
                }

                @Override
                public void onClusterServiceClose() {
                    latch.countDown();
                }

                @Override
                public void onTimeout(TimeValue timeout) {
                    logger.warn("timed out while waiting for initial discovery state - timeout: {}", initialStateTimeout);
                    latch.countDown();
                }
            }, state -> state.nodes().getMasterNodeId() != null, initialStateTimeout);
            try {
                latch.await();
            } catch (InterruptedException e) {
                throw new OpenSearchTimeoutException("Interrupted while waiting for initial discovery state");
            }
        }
    }
    injector.getInstance(HttpServerTransport.class).start();
    if (WRITE_PORTS_FILE_SETTING.get(settings())) {
        TransportService transport = injector.getInstance(TransportService.class);
        writePortsFile("transport", transport.boundAddress());
        HttpServerTransport http = injector.getInstance(HttpServerTransport.class);
        writePortsFile("http", http.boundAddress());
    }
    logger.info("started");
    pluginsService.filterPlugins(ClusterPlugin.class).forEach(ClusterPlugin::onNodeStarted);
    return this;
}
Also used : OpenSearchTimeoutException(org.opensearch.OpenSearchTimeoutException) SnapshotsService(org.opensearch.snapshots.SnapshotsService) SnapshotShardsService(org.opensearch.snapshots.SnapshotShardsService) NodeConnectionsService(org.opensearch.cluster.NodeConnectionsService) Metadata(org.opensearch.cluster.metadata.Metadata) NodeMetadata(org.opensearch.env.NodeMetadata) IndexTemplateMetadata(org.opensearch.cluster.metadata.IndexTemplateMetadata) ThreadPool(org.opensearch.threadpool.ThreadPool) MetadataUpgrader(org.opensearch.plugins.MetadataUpgrader) BootstrapContext(org.opensearch.bootstrap.BootstrapContext) TaskResultsService(org.opensearch.tasks.TaskResultsService) HttpServerTransport(org.opensearch.http.HttpServerTransport) GatewayMetaState(org.opensearch.gateway.GatewayMetaState) MetaStateService(org.opensearch.gateway.MetaStateService) IndicesClusterStateService(org.opensearch.indices.cluster.IndicesClusterStateService) LifecycleComponent(org.opensearch.common.component.LifecycleComponent) SearchService(org.opensearch.search.SearchService) PeerRecoverySourceService(org.opensearch.indices.recovery.PeerRecoverySourceService) TimeValue(org.opensearch.common.unit.TimeValue) FsHealthService(org.opensearch.monitor.fs.FsHealthService) ClusterState(org.opensearch.cluster.ClusterState) ClusterStateObserver(org.opensearch.cluster.ClusterStateObserver) ClusterPlugin(org.opensearch.plugins.ClusterPlugin) Discovery(org.opensearch.discovery.Discovery) IndicesService(org.opensearch.indices.IndicesService) MetadataIndexUpgradeService(org.opensearch.cluster.metadata.MetadataIndexUpgradeService) IOException(java.io.IOException) CountDownLatch(java.util.concurrent.CountDownLatch) GatewayService(org.opensearch.gateway.GatewayService) ClusterService(org.opensearch.cluster.service.ClusterService) PersistentTasksClusterService(org.opensearch.persistent.PersistentTasksClusterService) RemoteClusterService(org.opensearch.transport.RemoteClusterService) TransportService(org.opensearch.transport.TransportService) SearchTransportService(org.opensearch.action.search.SearchTransportService) RepositoriesService(org.opensearch.repositories.RepositoriesService) MappingUpdatedAction(org.opensearch.cluster.action.index.MappingUpdatedAction) PersistedClusterStateService(org.opensearch.gateway.PersistedClusterStateService) TaskCancellationService(org.opensearch.tasks.TaskCancellationService)
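
The wait-for-discovery block above is driven entirely by a TimeValue-typed setting: DiscoverySettings.INITIAL_STATE_TIMEOUT_SETTING parses a duration string out of the node settings, and millis() > 0 decides whether to block at all. A hedged sketch of that setting pattern; the key and default mirror the setting used above, while the "45s" override is illustrative:

    // Sketch: declaring and reading a TimeValue-typed node setting.
    Setting<TimeValue> initialStateTimeoutSetting = Setting.positiveTimeSetting(
        "discovery.initial_state_timeout", TimeValue.timeValueSeconds(30), Setting.Property.NodeScope);
    Settings nodeSettings = Settings.builder()
        .put("discovery.initial_state_timeout", "45s")  // duration string, parsed into a TimeValue
        .build();
    TimeValue initialStateTimeout = initialStateTimeoutSetting.get(nodeSettings);
    if (initialStateTimeout.millis() > 0) {
        // wait for a cluster state with an elected cluster-manager, as Node.start() does
    }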

Example 100 with TimeValue

Use of org.opensearch.common.unit.TimeValue in project OpenSearch by opensearch-project.

From class OsInfo, method toXContent:

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
    builder.startObject(Fields.OS);
    builder.humanReadableField(Fields.REFRESH_INTERVAL_IN_MILLIS, Fields.REFRESH_INTERVAL, new TimeValue(refreshInterval));
    if (name != null) {
        builder.field(Fields.NAME, name);
    }
    if (prettyName != null) {
        builder.field(Fields.PRETTY_NAME, prettyName);
    }
    if (arch != null) {
        builder.field(Fields.ARCH, arch);
    }
    if (version != null) {
        builder.field(Fields.VERSION, version);
    }
    builder.field(Fields.AVAILABLE_PROCESSORS, availableProcessors);
    builder.field(Fields.ALLOCATED_PROCESSORS, allocatedProcessors);
    builder.endObject();
    return builder;
}
Also used : TimeValue(org.opensearch.common.unit.TimeValue)
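
Here the TimeValue exists purely for rendering: OsInfo keeps the refresh interval as a raw long of milliseconds and wraps it only at serialization time. A short hedged sketch of the equivalent constructions and the reverse, parse direction (values are illustrative):

    // Sketch: equivalent ways to produce the TimeValue wrapped in toXContent above.
    TimeValue refresh = new TimeValue(1000);                // raw millisecond constructor
    TimeValue sameRefresh = TimeValue.timeValueSeconds(1);  // unit-based factory, equal duration
    TimeValue parsed = TimeValue.parseTimeValue("1s", "os.refresh_interval");  // from a duration string
    String rendered = refresh.toString();                   // "1s", the human-readable field value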

Aggregations

TimeValue (org.opensearch.common.unit.TimeValue): 260
IOException (java.io.IOException): 47
ClusterState (org.opensearch.cluster.ClusterState): 43
Settings (org.opensearch.common.settings.Settings): 38
ArrayList (java.util.ArrayList): 35
List (java.util.List): 31
Matchers.containsString (org.hamcrest.Matchers.containsString): 31
ThreadPool (org.opensearch.threadpool.ThreadPool): 31
CountDownLatch (java.util.concurrent.CountDownLatch): 30
ParameterizedMessage (org.apache.logging.log4j.message.ParameterizedMessage): 30
ActionListener (org.opensearch.action.ActionListener): 29
Map (java.util.Map): 28
Logger (org.apache.logging.log4j.Logger): 25
AtomicBoolean (java.util.concurrent.atomic.AtomicBoolean): 23
SearchResponse (org.opensearch.action.search.SearchResponse): 22
HashMap (java.util.HashMap): 20
LogManager (org.apache.logging.log4j.LogManager): 20
TimeUnit (java.util.concurrent.TimeUnit): 19
ClusterStateUpdateTask (org.opensearch.cluster.ClusterStateUpdateTask): 19
DiscoveryNode (org.opensearch.cluster.node.DiscoveryNode): 19