Example 21 with TimeOut

Use of org.apache.solr.util.TimeOut in project lucene-solr by apache.

The class TestPullReplicaErrorHandling, method waitForDeletion:

private void waitForDeletion(String collection) throws InterruptedException, KeeperException {
    TimeOut t = new TimeOut(10, TimeUnit.SECONDS);
    while (cluster.getSolrClient().getZkStateReader().getClusterState().hasCollection(collection)) {
        LOG.info("Collection not yet deleted");
        try {
            Thread.sleep(100);
            if (t.hasTimedOut()) {
                fail("Timed out waiting for collection " + collection + " to be deleted.");
            }
            cluster.getSolrClient().getZkStateReader().forceUpdateCollection(collection);
        } catch (SolrException e) {
            // forceUpdateCollection throws SolrException once the collection's state is gone
            return;
        }
    }
}
Also used : TimeOut(org.apache.solr.util.TimeOut) SolrException(org.apache.solr.common.SolrException)
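The method above is the canonical TimeOut idiom: construct a deadline, poll a condition with a short sleep, and fail once the deadline passes. The sketch below boils that loop down to a self-contained form; `SimpleTimeOut` is a hypothetical stand-in written here for illustration, not the actual org.apache.solr.util.TimeOut class (which the real code uses), though it mirrors its `(interval, TimeUnit)` constructor and `hasTimedOut()` method.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for org.apache.solr.util.TimeOut:
// records an absolute deadline and answers hasTimedOut().
class SimpleTimeOut {
    private final long deadlineNanos;

    SimpleTimeOut(long interval, TimeUnit unit) {
        this.deadlineNanos = System.nanoTime() + unit.toNanos(interval);
    }

    boolean hasTimedOut() {
        return System.nanoTime() >= deadlineNanos;
    }
}

public class TimeOutDemo {
    // Polls the condition every pollMs until it holds or the deadline expires;
    // returns true if the condition became true in time, false on timeout.
    static boolean waitFor(SimpleTimeOut t, long pollMs, BooleanSupplier condition)
            throws InterruptedException {
        while (!condition.getAsBoolean()) {
            if (t.hasTimedOut()) {
                return false;
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // A condition that becomes true after roughly 200 ms, well inside the 2 s budget.
        boolean ok = waitFor(new SimpleTimeOut(2, TimeUnit.SECONDS), 20,
                () -> System.currentTimeMillis() - start > 200);
        System.out.println(ok ? "condition met" : "timed out");
    }
}
```

In waitForDeletion the polled condition is `hasCollection(collection)` going false, and timeout is reported with `fail(...)` instead of a return value.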

Example 22 with TimeOut

Use of org.apache.solr.util.TimeOut in project lucene-solr by apache.

The class ZkTestServer, method waitForServerDown:

public static boolean waitForServerDown(String hp, long timeoutMs) {
    final TimeOut timeout = new TimeOut(timeoutMs, TimeUnit.MILLISECONDS);
    while (true) {
        try {
            HostPort hpobj = parseHostPortList(hp).get(0);
            send4LetterWord(hpobj.host, hpobj.port, "stat");
        } catch (IOException e) {
            return true;
        }
        if (timeout.hasTimedOut()) {
            break;
        }
        try {
            Thread.sleep(250);
        } catch (InterruptedException e) {
            // ignore
        }
    }
    return false;
}
Also used : TimeOut(org.apache.solr.util.TimeOut) IOException(java.io.IOException)
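waitForServerDown inverts the usual polling loop: an IOException from the probe is the success case (the server stopped answering), while an expiring deadline means failure. The same shape can be sketched generically; `Probe` and `waitForDown` here are hypothetical names introduced for illustration, with the probe standing in for `send4LetterWord(host, port, "stat")`.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Generic version of the waitForServerDown loop: keep probing until the probe
// throws (server gone -> true) or the deadline expires (still up -> false).
public class WaitForDownDemo {

    interface Probe {
        // Stand-in for a liveness check such as send4LetterWord(host, port, "stat").
        void probe() throws IOException;
    }

    static boolean waitForDown(Probe p, long timeoutMs, long pollMs)
            throws InterruptedException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try {
                p.probe();                  // still answering, keep waiting
            } catch (IOException e) {
                return true;                // connection refused: server is down
            }
            if (System.nanoTime() >= deadline) {
                return false;               // gave up: server never went down
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A probe that always throws simulates an already-stopped server.
        boolean down = waitForDown(() -> { throw new IOException("refused"); }, 1000, 50);
        System.out.println("down=" + down);
    }
}
```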

Example 23 with TimeOut

Use of org.apache.solr.util.TimeOut in project lucene-solr by apache.

The class ChaosMonkeyNothingIsSafeWithPullReplicasTest, method test:

@Test
public void test() throws Exception {
    cloudClient.setSoTimeout(clientSoTimeout);
    DocCollection docCollection = cloudClient.getZkStateReader().getClusterState().getCollection(DEFAULT_COLLECTION);
    assertEquals(this.sliceCount, docCollection.getSlices().size());
    Slice s = docCollection.getSlice("shard1");
    assertNotNull(s);
    assertEquals("Unexpected number of replicas. Collection: " + docCollection, numRealtimeOrTlogReplicas + numPullReplicas, s.getReplicas().size());
    assertEquals("Unexpected number of pull replicas. Collection: " + docCollection, numPullReplicas, s.getReplicas(EnumSet.of(Replica.Type.PULL)).size());
    assertEquals(useTlogReplicas() ? 0 : numRealtimeOrTlogReplicas, s.getReplicas(EnumSet.of(Replica.Type.NRT)).size());
    assertEquals(useTlogReplicas() ? numRealtimeOrTlogReplicas : 0, s.getReplicas(EnumSet.of(Replica.Type.TLOG)).size());
    boolean testSuccessful = false;
    try {
        handle.clear();
        handle.put("timestamp", SKIPVAL);
        ZkStateReader zkStateReader = cloudClient.getZkStateReader();
        // make sure we have leaders for each shard
        for (int j = 1; j < sliceCount; j++) {
            zkStateReader.getLeaderRetry(DEFAULT_COLLECTION, "shard" + j, 10000);
        }
        // make sure we again have leaders for each shard
        waitForRecoveriesToFinish(false);
        // we cannot do delete by query
        // as it's not supported for recovery
        del("*:*");
        List<StoppableThread> threads = new ArrayList<>();
        List<StoppableIndexingThread> indexTreads = new ArrayList<>();
        int threadCount = TEST_NIGHTLY ? 3 : 1;
        int i = 0;
        for (i = 0; i < threadCount; i++) {
            StoppableIndexingThread indexThread = new StoppableIndexingThread(controlClient, cloudClient, Integer.toString(i), true);
            threads.add(indexThread);
            indexTreads.add(indexThread);
            indexThread.start();
        }
        threadCount = 1;
        i = 0;
        for (i = 0; i < threadCount; i++) {
            StoppableSearchThread searchThread = new StoppableSearchThread(cloudClient);
            threads.add(searchThread);
            searchThread.start();
        }
        if (usually()) {
            StoppableCommitThread commitThread = new StoppableCommitThread(cloudClient, 1000, false);
            threads.add(commitThread);
            commitThread.start();
        }
        // TODO: we only do this sometimes so that we can sometimes compare against control,
        // it's currently hard to know what requests failed when using ConcurrentSolrUpdateServer
        boolean runFullThrottle = random().nextBoolean();
        if (runFullThrottle) {
            FullThrottleStoppableIndexingThread ftIndexThread = new FullThrottleStoppableIndexingThread(controlClient, cloudClient, clients, "ft1", true, this.clientSoTimeout);
            threads.add(ftIndexThread);
            ftIndexThread.start();
        }
        chaosMonkey.startTheMonkey(true, 10000);
        try {
            long runLength;
            if (RUN_LENGTH != -1) {
                runLength = RUN_LENGTH;
            } else {
                int[] runTimes;
                if (TEST_NIGHTLY) {
                    runTimes = new int[] { 5000, 6000, 10000, 15000, 25000, 30000, 30000, 45000, 90000, 120000 };
                } else {
                    runTimes = new int[] { 5000, 7000, 15000 };
                }
                runLength = runTimes[random().nextInt(runTimes.length - 1)];
            }
            ChaosMonkey.wait(runLength, DEFAULT_COLLECTION, zkStateReader);
        } finally {
            chaosMonkey.stopTheMonkey();
        }
        // ideally this should go into chaosMonkey
        restartZk(1000 * (5 + random().nextInt(4)));
        for (StoppableThread indexThread : threads) {
            indexThread.safeStop();
        }
        // wait for stop...
        for (StoppableThread indexThread : threads) {
            indexThread.join();
        }
        // try and wait for any replications and what not to finish...
        ChaosMonkey.wait(2000, DEFAULT_COLLECTION, zkStateReader);
        // wait until there are no recoveries...
        //Math.round((runLength / 1000.0f / 3.0f)));
        waitForThingsToLevelOut(Integer.MAX_VALUE);
        // make sure we again have leaders for each shard
        for (int j = 1; j < sliceCount; j++) {
            zkStateReader.getLeaderRetry(DEFAULT_COLLECTION, "shard" + j, 30000);
        }
        commit();
        // TODO: assert we didn't kill everyone
        zkStateReader.updateLiveNodes();
        assertTrue(zkStateReader.getClusterState().getLiveNodes().size() > 0);
        // we expect full throttle fails, but cloud client should not easily fail
        for (StoppableThread indexThread : threads) {
            if (indexThread instanceof StoppableIndexingThread && !(indexThread instanceof FullThrottleStoppableIndexingThread)) {
                int failCount = ((StoppableIndexingThread) indexThread).getFailCount();
                assertFalse("There were too many update fails (" + failCount + " > " + FAIL_TOLERANCE + ") - we expect it can happen, but shouldn't easily", failCount > FAIL_TOLERANCE);
            }
        }
        waitForReplicationFromReplicas(DEFAULT_COLLECTION, zkStateReader, new TimeOut(30, TimeUnit.SECONDS));
        //      waitForAllWarmingSearchers();
        Set<String> addFails = getAddFails(indexTreads);
        Set<String> deleteFails = getDeleteFails(indexTreads);
        // full throttle thread can
        // have request fails
        checkShardConsistency(!runFullThrottle, true, addFails, deleteFails);
        long ctrlDocs = controlClient.query(new SolrQuery("*:*")).getResults().getNumFound();
        // ensure we have added more than 0 docs
        long cloudClientDocs = cloudClient.query(new SolrQuery("*:*")).getResults().getNumFound();
        assertTrue("Found " + ctrlDocs + " control docs", cloudClientDocs > 0);
        if (VERBOSE)
            System.out.println("control docs:" + controlClient.query(new SolrQuery("*:*")).getResults().getNumFound() + "\n\n");
        // sometimes we restart zookeeper as well
        if (random().nextBoolean()) {
            restartZk(1000 * (5 + random().nextInt(4)));
        }
        try (CloudSolrClient client = createCloudClient("collection1")) {
            // We don't really know how many live nodes we have at this point, so "maxShardsPerNode" needs to be > 1
            createCollection(null, "testcollection", 1, 1, 10, client, null, "conf1");
        }
        List<Integer> numShardsNumReplicas = new ArrayList<>(2);
        numShardsNumReplicas.add(1);
        numShardsNumReplicas.add(1 + getPullReplicaCount());
        checkForCollection("testcollection", numShardsNumReplicas, null);
        testSuccessful = true;
    } finally {
        if (!testSuccessful) {
            logReplicaTypesReplicationInfo(DEFAULT_COLLECTION, cloudClient.getZkStateReader());
            printLayout();
        }
    }
}
Also used : TimeOut(org.apache.solr.util.TimeOut) ArrayList(java.util.ArrayList) SolrQuery(org.apache.solr.client.solrj.SolrQuery) CloudSolrClient(org.apache.solr.client.solrj.impl.CloudSolrClient) ZkStateReader(org.apache.solr.common.cloud.ZkStateReader) Slice(org.apache.solr.common.cloud.Slice) DocCollection(org.apache.solr.common.cloud.DocCollection) Test(org.junit.Test)
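The chaos tests lean on a common thread-lifecycle pattern: start a set of worker threads, run the monkey, then call safeStop() on each worker and join() them all before asserting. A minimal sketch of that pattern, assuming a hypothetical `StoppableWorker` (not the Solr StoppableThread/StoppableIndexingThread classes), looks like this: safeStop() flips a volatile flag that the run loop checks, so the worker exits cleanly at its next iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StoppableDemo {

    // Hypothetical worker: loops until safeStop() flips a volatile flag.
    static class StoppableWorker extends Thread {
        private volatile boolean stop = false;
        final AtomicInteger iterations = new AtomicInteger();

        @Override
        public void run() {
            while (!stop) {
                iterations.incrementAndGet(); // stand-in for indexing/search work
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        void safeStop() {
            stop = true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<StoppableWorker> threads = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            StoppableWorker w = new StoppableWorker();
            threads.add(w);
            w.start();
        }
        Thread.sleep(50);                                   // let the workers run
        for (StoppableWorker w : threads) w.safeStop();     // signal stop...
        for (StoppableWorker w : threads) w.join();         // ...then wait for exit
        for (StoppableWorker w : threads) {
            System.out.println("iterations=" + w.iterations.get());
        }
    }
}
```

Signaling all workers before joining any of them, as the test does, lets the threads wind down in parallel instead of serially.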

Example 24 with TimeOut

Use of org.apache.solr.util.TimeOut in project lucene-solr by apache.

The class ChaosMonkeySafeLeaderWithPullReplicasTest, method test:

@Test
public void test() throws Exception {
    DocCollection docCollection = cloudClient.getZkStateReader().getClusterState().getCollection(DEFAULT_COLLECTION);
    assertEquals(this.sliceCount, docCollection.getSlices().size());
    Slice s = docCollection.getSlice("shard1");
    assertNotNull(s);
    assertEquals("Unexpected number of replicas. Collection: " + docCollection, numRealtimeOrTlogReplicas + numPullReplicas, s.getReplicas().size());
    assertEquals("Unexpected number of pull replicas. Collection: " + docCollection, numPullReplicas, s.getReplicas(EnumSet.of(Replica.Type.PULL)).size());
    assertEquals(useTlogReplicas() ? 0 : numRealtimeOrTlogReplicas, s.getReplicas(EnumSet.of(Replica.Type.NRT)).size());
    assertEquals(useTlogReplicas() ? numRealtimeOrTlogReplicas : 0, s.getReplicas(EnumSet.of(Replica.Type.TLOG)).size());
    handle.clear();
    handle.put("timestamp", SKIPVAL);
    // randomly turn on 1 seconds 'soft' commit
    randomlyEnableAutoSoftCommit();
    tryDelete();
    List<StoppableThread> threads = new ArrayList<>();
    int threadCount = 2;
    int batchSize = 1;
    if (random().nextBoolean()) {
        batchSize = random().nextInt(98) + 2;
    }
    boolean pauseBetweenUpdates = TEST_NIGHTLY ? random().nextBoolean() : true;
    int maxUpdates = -1;
    if (!pauseBetweenUpdates) {
        maxUpdates = 1000 + random().nextInt(1000);
    } else {
        maxUpdates = 15000;
    }
    for (int i = 0; i < threadCount; i++) {
        // random().nextInt(999) + 1
        StoppableIndexingThread indexThread = new StoppableIndexingThread(controlClient, cloudClient, Integer.toString(i), true, maxUpdates, batchSize, pauseBetweenUpdates);
        threads.add(indexThread);
        indexThread.start();
    }
    StoppableCommitThread commitThread = new StoppableCommitThread(cloudClient, 1000, false);
    threads.add(commitThread);
    commitThread.start();
    chaosMonkey.startTheMonkey(false, 500);
    try {
        long runLength;
        if (RUN_LENGTH != -1) {
            runLength = RUN_LENGTH;
        } else {
            int[] runTimes;
            if (TEST_NIGHTLY) {
                runTimes = new int[] { 5000, 6000, 10000, 15000, 25000, 30000, 30000, 45000, 90000, 120000 };
            } else {
                runTimes = new int[] { 5000, 7000, 15000 };
            }
            runLength = runTimes[random().nextInt(runTimes.length - 1)];
        }
        ChaosMonkey.wait(runLength, DEFAULT_COLLECTION, cloudClient.getZkStateReader());
    } finally {
        chaosMonkey.stopTheMonkey();
    }
    for (StoppableThread thread : threads) {
        thread.safeStop();
    }
    // wait for stop...
    for (StoppableThread thread : threads) {
        thread.join();
    }
    for (StoppableThread thread : threads) {
        if (thread instanceof StoppableIndexingThread) {
            assertEquals(0, ((StoppableIndexingThread) thread).getFailCount());
        }
    }
    // try and wait for any replications and what not to finish...
    Thread.sleep(2000);
    waitForThingsToLevelOut(180000);
    // even if things were leveled out, a jetty may have just been stopped or something
    // we wait again and wait to level out again to make sure the system is not still in flux
    Thread.sleep(3000);
    waitForThingsToLevelOut(180000);
    log.info("control docs:" + controlClient.query(new SolrQuery("*:*")).getResults().getNumFound() + "\n\n");
    waitForReplicationFromReplicas(DEFAULT_COLLECTION, cloudClient.getZkStateReader(), new TimeOut(30, TimeUnit.SECONDS));
    //    waitForAllWarmingSearchers();
    checkShardConsistency(batchSize == 1, true);
    // sometimes we restart zookeeper as well
    if (random().nextBoolean()) {
        zkServer.shutdown();
        zkServer = new ZkTestServer(zkServer.getZkDir(), zkServer.getPort());
        zkServer.run();
    }
    try (CloudSolrClient client = createCloudClient("collection1")) {
        createCollection(null, "testcollection", 1, 1, 100, client, null, "conf1");
    }
    List<Integer> numShardsNumReplicas = new ArrayList<>(2);
    numShardsNumReplicas.add(1);
    numShardsNumReplicas.add(1 + getPullReplicaCount());
    checkForCollection("testcollection", numShardsNumReplicas, null);
}
Also used : TimeOut(org.apache.solr.util.TimeOut) ArrayList(java.util.ArrayList) SolrQuery(org.apache.solr.client.solrj.SolrQuery) CloudSolrClient(org.apache.solr.client.solrj.impl.CloudSolrClient) Slice(org.apache.solr.common.cloud.Slice) DocCollection(org.apache.solr.common.cloud.DocCollection) Test(org.junit.Test)

Example 25 with TimeOut

Use of org.apache.solr.util.TimeOut in project lucene-solr by apache.

The class TestTlogReplica, method waitForDeletion:

private void waitForDeletion(String collection) throws InterruptedException, KeeperException {
    TimeOut t = new TimeOut(10, TimeUnit.SECONDS);
    while (cluster.getSolrClient().getZkStateReader().getClusterState().hasCollection(collection)) {
        try {
            Thread.sleep(100);
            if (t.hasTimedOut()) {
                fail("Timed out waiting for collection " + collection + " to be deleted.");
            }
            cluster.getSolrClient().getZkStateReader().forceUpdateCollection(collection);
        } catch (SolrException e) {
            return;
        }
    }
}
Also used : TimeOut(org.apache.solr.util.TimeOut) SolrException(org.apache.solr.common.SolrException)

Aggregations

TimeOut (org.apache.solr.util.TimeOut): 48
SolrException (org.apache.solr.common.SolrException): 15
Slice (org.apache.solr.common.cloud.Slice): 15
DocCollection (org.apache.solr.common.cloud.DocCollection): 14
Replica (org.apache.solr.common.cloud.Replica): 13
SolrQuery (org.apache.solr.client.solrj.SolrQuery): 11
ZkStateReader (org.apache.solr.common.cloud.ZkStateReader): 8
ModifiableSolrParams (org.apache.solr.common.params.ModifiableSolrParams): 8
HashMap (java.util.HashMap): 7
Test (org.junit.Test): 7
IOException (java.io.IOException): 6
ArrayList (java.util.ArrayList): 6
SolrInputDocument (org.apache.solr.common.SolrInputDocument): 6
ZkNodeProps (org.apache.solr.common.cloud.ZkNodeProps): 6
NamedList (org.apache.solr.common.util.NamedList): 6
HttpSolrClient (org.apache.solr.client.solrj.impl.HttpSolrClient): 5
Map (java.util.Map): 4
SolrServerException (org.apache.solr.client.solrj.SolrServerException): 4
Collections.singletonList (java.util.Collections.singletonList): 3
HashSet (java.util.HashSet): 3