Example 6 with PartitionOfflineException

use of org.apache.geode.cache.persistence.PartitionOfflineException in project geode by apache.

the class PersistentColocatedPartitionedRegionDUnitTest method replaceOfflineMemberAndRestart.

/**
   * Test for support issue 7870.
   * 1. Run three members with redundancy 1 and recovery delay 0.
   * 2. Kill one of the members, to trigger replacement of buckets.
   * 3. Shut down all members and restart.
   * 
   * What was happening is that in the parent PR, we discarded our offline data in one member, but
   * in the child PR the other members ended up waiting for the child bucket to be created in the
   * member that discarded its offline data.
   * 
   * @throws Throwable
   */
public void replaceOfflineMemberAndRestart(SerializableRunnable createPRs) throws Throwable {
    disconnectAllFromDS();
    Host host = Host.getHost(0);
    VM vm0 = host.getVM(0);
    VM vm1 = host.getVM(1);
    VM vm2 = host.getVM(2);
    // Create the PR on three members
    vm0.invoke(createPRs);
    vm1.invoke(createPRs);
    vm2.invoke(createPRs);
    // Create some buckets.
    createData(vm0, 0, NUM_BUCKETS, "a");
    createData(vm0, 0, NUM_BUCKETS, "a", "region2");
    // Close one of the members to trigger redundancy recovery.
    closeCache(vm2);
    // Wait until redundancy is recovered.
    waitForRedundancyRecovery(vm0, 1, PR_REGION_NAME);
    waitForRedundancyRecovery(vm0, 1, "region2");
    createData(vm0, 0, NUM_BUCKETS, "b");
    createData(vm0, 0, NUM_BUCKETS, "b", "region2");
    IgnoredException expected = IgnoredException.addIgnoredException("PartitionOfflineException");
    try {
        // Close the remaining members.
        vm0.invoke(new SerializableCallable() {

            public Object call() throws Exception {
                InternalDistributedSystem ds = (InternalDistributedSystem) getCache().getDistributedSystem();
                AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 600000);
                return null;
            }
        });
        // Make sure that vm1 is completely disconnected.
        // Shutdown-all finishes the disconnect asynchronously, after
        // replying to the admin member.
        vm1.invoke(new SerializableRunnable() {

            public void run() {
                basicGetSystem().disconnect();
            }
        });
        // Recreate the members. Try to make sure that
        // the member with the latest copy of the buckets
        // is the one that decides to throw away its copy
        // by starting it last.
        AsyncInvocation async0 = vm0.invokeAsync(createPRs);
        AsyncInvocation async1 = vm1.invokeAsync(createPRs);
        Wait.pause(2000);
        AsyncInvocation async2 = vm2.invokeAsync(createPRs);
        async0.getResult(MAX_WAIT);
        async1.getResult(MAX_WAIT);
        async2.getResult(MAX_WAIT);
        checkData(vm0, 0, NUM_BUCKETS, "b");
        checkData(vm0, 0, NUM_BUCKETS, "b", "region2");
        waitForRedundancyRecovery(vm0, 1, PR_REGION_NAME);
        waitForRedundancyRecovery(vm0, 1, "region2");
        waitForRedundancyRecovery(vm1, 1, PR_REGION_NAME);
        waitForRedundancyRecovery(vm1, 1, "region2");
        waitForRedundancyRecovery(vm2, 1, PR_REGION_NAME);
        waitForRedundancyRecovery(vm2, 1, "region2");
        // Make sure we don't have any extra buckets after the restart
        int totalBucketCount = getBucketList(vm0).size();
        totalBucketCount += getBucketList(vm1).size();
        totalBucketCount += getBucketList(vm2).size();
        assertEquals(2 * NUM_BUCKETS, totalBucketCount);
        totalBucketCount = getBucketList(vm0, "region2").size();
        totalBucketCount += getBucketList(vm1, "region2").size();
        totalBucketCount += getBucketList(vm2, "region2").size();
        assertEquals(2 * NUM_BUCKETS, totalBucketCount);
    } finally {
        expected.remove();
    }
}
Also used : VM(org.apache.geode.test.dunit.VM) SerializableCallable(org.apache.geode.test.dunit.SerializableCallable) SerializableRunnable(org.apache.geode.test.dunit.SerializableRunnable) IgnoredException(org.apache.geode.test.dunit.IgnoredException) Host(org.apache.geode.test.dunit.Host) InternalDistributedSystem(org.apache.geode.distributed.internal.InternalDistributedSystem) AsyncInvocation(org.apache.geode.test.dunit.AsyncInvocation) PartitionedRegionStorageException(org.apache.geode.cache.PartitionedRegionStorageException) RMIException(org.apache.geode.test.dunit.RMIException) CacheClosedException(org.apache.geode.cache.CacheClosedException) PartitionOfflineException(org.apache.geode.cache.persistence.PartitionOfflineException) IOException(java.io.IOException)
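
The final assertions in this test expect `2 * NUM_BUCKETS` bucket copies per region across the three members. One way to read that number, assuming each bucket is hosted once as a primary plus once per redundant copy, is sketched below. The class name, the helper, and the bucket count of 15 are illustrative assumptions and are not taken from the Geode test.

```java
final class BucketCopyCount {
    // Assumption: every bucket has one primary copy plus `redundantCopies` secondaries.
    static int expectedBucketCopies(int numBuckets, int redundantCopies) {
        return (redundantCopies + 1) * numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 15; // hypothetical; the value of NUM_BUCKETS is not shown in the listing
        // With redundancy 1 this is 2 * numBuckets, matching the shape of the assertion.
        System.out.println(expectedBucketCopies(numBuckets, 1)); // prints 30
    }
}
```

A count above this value after the restart would suggest that a member kept buckets it should have discarded.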

Example 7 with PartitionOfflineException

use of org.apache.geode.cache.persistence.PartitionOfflineException in project geode by apache.

the class PersistentColocatedPartitionedRegionDUnitTest method testCrashDuringRedundancySatisfaction.

/**
   * Test what happens when we crash in the middle of satisfying redundancy for a colocated bucket.
   * 
   * @throws Throwable
   */
// This test method is disabled because it is failing
// periodically and causing cruise control failures
// See bug #46748
@Test
public void testCrashDuringRedundancySatisfaction() throws Throwable {
    Host host = Host.getHost(0);
    VM vm0 = host.getVM(0);
    VM vm1 = host.getVM(1);
    SerializableRunnable createPRs = new SerializableRunnable("region1") {

        public void run() {
            Cache cache = getCache();
            DiskStore ds = cache.findDiskStore("disk");
            if (ds == null) {
                ds = cache.createDiskStoreFactory().setDiskDirs(getDiskDirs()).create("disk");
            }
            AttributesFactory af = new AttributesFactory();
            PartitionAttributesFactory paf = new PartitionAttributesFactory();
            paf.setRedundantCopies(1);
            // Workaround for 44414 - disable recovery delay so we shutdown
            // vm1 at a predictable point.
            paf.setRecoveryDelay(-1);
            paf.setStartupRecoveryDelay(-1);
            af.setPartitionAttributes(paf.create());
            af.setDataPolicy(DataPolicy.PERSISTENT_PARTITION);
            af.setDiskStoreName("disk");
            cache.createRegion(PR_REGION_NAME, af.create());
            paf.setColocatedWith(PR_REGION_NAME);
            af.setPartitionAttributes(paf.create());
            cache.createRegion("region2", af.create());
        }
    };
    // Create the PR on vm0
    vm0.invoke(createPRs);
    // Create some buckets.
    createData(vm0, 0, NUM_BUCKETS, "a");
    createData(vm0, 0, NUM_BUCKETS, "a", "region2");
    vm1.invoke(createPRs);
    // We shouldn't have created any buckets in vm1 yet.
    assertEquals(Collections.emptySet(), getBucketList(vm1));
    // Add an observer that will disconnect before allowing the peer to
    // GII a colocated bucket. This should leave the peer with only the parent
    // bucket
    vm0.invoke(new SerializableRunnable() {

        public void run() {
            DistributionMessageObserver.setInstance(new DistributionMessageObserver() {

                @Override
                public void beforeProcessMessage(DistributionManager dm, DistributionMessage message) {
                    if (message instanceof RequestImageMessage) {
                        if (((RequestImageMessage) message).regionPath.contains("region2")) {
                            DistributionMessageObserver.setInstance(null);
                            disconnectFromDS();
                        }
                    }
                }
            });
        }
    });
    IgnoredException ex = IgnoredException.addIgnoredException("PartitionOfflineException", vm1);
    try {
        // Rebalance to satisfy redundancy with vm1. The observer installed above
        // disconnects vm0 partway through.
        try {
            RebalanceResults rr = rebalance(vm1);
        } catch (Exception expected) {
            // A PartitionOfflineException cause is expected here because of the disconnect
            if (!(expected.getCause() instanceof PartitionOfflineException)) {
                throw expected;
            }
        }
        // Wait for vm0 to be closed by the callback
        vm0.invoke(new SerializableCallable() {

            public Object call() throws Exception {
                Wait.waitForCriterion(new WaitCriterion() {

                    public boolean done() {
                        InternalDistributedSystem ds = basicGetSystem();
                        return ds == null || !ds.isConnected();
                    }

                    public String description() {
                        return "DS did not disconnect";
                    }
                }, MAX_WAIT, 100, true);
                return null;
            }
        });
        // close the cache in vm1
        SerializableCallable disconnectFromDS = new SerializableCallable() {

            public Object call() throws Exception {
                disconnectFromDS();
                return null;
            }
        };
        vm1.invoke(disconnectFromDS);
        // Make sure vm0 is disconnected. This avoids a race where we
        // may still be in the process of disconnecting even though our async listener
        // found the system was disconnected
        vm0.invoke(disconnectFromDS);
    } finally {
        ex.remove();
    }
    // Create the cache and PRs on both members
    AsyncInvocation async0 = vm0.invokeAsync(createPRs);
    AsyncInvocation async1 = vm1.invokeAsync(createPRs);
    async0.getResult(MAX_WAIT);
    async1.getResult(MAX_WAIT);
    // Make sure the data was recovered correctly
    checkData(vm0, 0, NUM_BUCKETS, "a");
    // Workaround for bug 46748.
    checkData(vm0, 0, NUM_BUCKETS, "a", "region2");
}
Also used : SerializableRunnable(org.apache.geode.test.dunit.SerializableRunnable) Host(org.apache.geode.test.dunit.Host) RequestImageMessage(org.apache.geode.internal.cache.InitialImageOperation.RequestImageMessage) AsyncInvocation(org.apache.geode.test.dunit.AsyncInvocation) IgnoredException(org.apache.geode.test.dunit.IgnoredException) PartitionedRegionStorageException(org.apache.geode.cache.PartitionedRegionStorageException) RMIException(org.apache.geode.test.dunit.RMIException) CacheClosedException(org.apache.geode.cache.CacheClosedException) PartitionOfflineException(org.apache.geode.cache.persistence.PartitionOfflineException) IOException(java.io.IOException) DiskStore(org.apache.geode.cache.DiskStore) PartitionAttributesFactory(org.apache.geode.cache.PartitionAttributesFactory) AttributesFactory(org.apache.geode.cache.AttributesFactory) WaitCriterion(org.apache.geode.test.dunit.WaitCriterion) DistributionMessage(org.apache.geode.distributed.internal.DistributionMessage) VM(org.apache.geode.test.dunit.VM) SerializableCallable(org.apache.geode.test.dunit.SerializableCallable) InternalDistributedSystem(org.apache.geode.distributed.internal.InternalDistributedSystem) DistributionMessageObserver(org.apache.geode.distributed.internal.DistributionMessageObserver) DistributionManager(org.apache.geode.distributed.internal.DistributionManager) RebalanceResults(org.apache.geode.cache.control.RebalanceResults) Cache(org.apache.geode.cache.Cache) DistributedTest(org.apache.geode.test.junit.categories.DistributedTest) FlakyTest(org.apache.geode.test.junit.categories.FlakyTest) Test(org.junit.Test)
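
The test waits for vm0 to disconnect by polling a criterion every 100 ms until `MAX_WAIT` elapses. The general poll-until-true pattern can be sketched in plain Java as follows. This is not the dunit `Wait.waitForCriterion` implementation: `PollUntil` and `waitFor` are hypothetical names, and the sketch returns `false` on timeout where the dunit call in the listing passes `true` as its last argument, which presumably makes it fail the test instead.

```java
import java.util.function.BooleanSupplier;

final class PollUntil {
    /**
     * Polls {@code done} every {@code intervalMs} milliseconds until it returns true
     * or {@code timeoutMs} milliseconds have elapsed. Returns whether it became true.
     */
    static boolean waitFor(BooleanSupplier done, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition holding
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition for "the distributed system is no longer connected".
        boolean disconnected = waitFor(() -> System.currentTimeMillis() - start >= 50, 2_000, 10);
        System.out.println(disconnected);
    }
}
```

Checking the condition before checking the deadline means a condition that is already true succeeds even with a zero timeout.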

Example 8 with PartitionOfflineException

use of org.apache.geode.cache.persistence.PartitionOfflineException in project geode by apache.

the class PutAllCSDUnitTest method testPartialKeyInPR.

/**
   * Tests a partial-key putAll against two PR servers. The server-side put path differs
   * between PR and LR: PR applies the entries in postPutAll. This test does not use single-hop putAll.
   */
@Test
public void testPartialKeyInPR() throws CacheException, InterruptedException {
    final String title = "testPartialKeyInPR:";
    final Host host = Host.getHost(0);
    VM server1 = host.getVM(0);
    final VM server2 = host.getVM(1);
    VM client1 = host.getVM(2);
    VM client2 = host.getVM(3);
    final String regionName = getUniqueName();
    final String serverHost = NetworkUtils.getServerHostName(server1.getHost());
    // set <true, false> means <PR=true, notifyBySubscription=false> to test local-invalidates
    final int serverPort1 = createBridgeServer(server1, regionName, 0, true, 0, "ds1");
    final int serverPort2 = createBridgeServer(server2, regionName, 0, true, 0, "ds1");
    createClient(client1, regionName, serverHost, new int[] { serverPort1, serverPort2 }, -1, -1, false, false, true);
    createClient(client2, regionName, serverHost, new int[] { serverPort1, serverPort2 }, -1, -1, false, false, true);
    server1.invoke(addExceptionTag1(expectedExceptions));
    server2.invoke(addExceptionTag1(expectedExceptions));
    client1.invoke(addExceptionTag1(expectedExceptions));
    client2.invoke(addExceptionTag1(expectedExceptions));
    server1.invoke(new CacheSerializableRunnable(title + "server1 add slow listener") {

        @Override
        public void run2() throws CacheException {
            Region region = getRootRegion().getSubregion(regionName);
            region.getAttributesMutator().addCacheListener(new MyListener(true));
        }
    });
    final SharedCounter sc_server2 = new SharedCounter("server2");
    server2.invoke(new CacheSerializableRunnable(title + "server2 add slow listener") {

        @Override
        public void run2() throws CacheException {
            Region region = getRootRegion().getSubregion(regionName);
            region.getAttributesMutator().addCacheListener(new MyListener(server2, true, sc_server2, 10));
        }
    });
    client2.invoke(new CacheSerializableRunnable(title + "client2 add listener") {

        @Override
        public void run2() throws CacheException {
            Region region = getRootRegion().getSubregion(regionName);
            region.getAttributesMutator().addCacheListener(new MyListener(false));
            region.registerInterest("ALL_KEYS");
            LogWriterUtils.getLogWriter().info("client2 registerInterest ALL_KEYS at " + region.getFullPath());
        }
    });
    AsyncInvocation async1 = client1.invokeAsync(new CacheSerializableRunnable(title + "client1 add listener and putAll") {

        @Override
        public void run2() throws CacheException {
            Region region = getRootRegion().getSubregion(regionName);
            region.getAttributesMutator().addCacheListener(new MyListener(false));
            region.registerInterest("ALL_KEYS");
            // create keys
            try {
                doPutAll(regionName, title, numberOfEntries);
                fail("Expect ServerOperationException caused by PutAllPartialResultException");
            } catch (ServerOperationException soe) {
                if (!(soe.getCause() instanceof PartitionOfflineException)) {
                    throw soe;
                }
                if (!soe.getMessage().contains(LocalizedStrings.Region_PutAll_Applied_PartialKeys_At_Server_0.toLocalizedString(region.getFullPath()))) {
                    throw soe;
                }
            }
        }
    });
    // server2 will close its cache after creating 10 keys
    ThreadUtils.join(async1, 30 * 1000);
    if (async1.exceptionOccurred()) {
        Assert.fail("Async1 got exceptions:", async1.getException());
    }
    int client1Size = getRegionSize(client1, regionName);
    // client2Size may be more than client1Size
    int client2Size = getRegionSize(client2, regionName);
    int server1Size = getRegionSize(server1, regionName);
    LogWriterUtils.getLogWriter().info("region sizes: " + client1Size + "," + client2Size + "," + server1Size);
    // restart server2
    createBridgeServer(server2, regionName, serverPort2, true, 0, "ds1");
    server1Size = getRegionSize(server1, regionName);
    int server2Size = getRegionSize(server2, regionName);
    LogWriterUtils.getLogWriter().info("region sizes after server2 restarted: " + client1Size + "," + client2Size + "," + server1Size + ":" + server2Size);
    assertEquals(client2Size, server1Size);
    assertEquals(client2Size, server2Size);
    // close a server to re-run the test
    closeCache(server2);
    server1Size = getRegionSize(server1, regionName);
    client1.invoke(new CacheSerializableRunnable(title + "client1 does putAll again") {

        @Override
        public void run2() throws CacheException {
            Region region = getRootRegion().getSubregion(regionName);
            // create keys
            try {
                doPutAll(regionName, title + "again:", numberOfEntries);
                fail("Expect ServerOperationException caused by PutAllPartialResultException");
            } catch (ServerOperationException soe) {
                assertTrue(soe.getMessage().contains(LocalizedStrings.Region_PutAll_Applied_PartialKeys_At_Server_0.toLocalizedString(region.getFullPath())));
                assertTrue(soe.getCause() instanceof PartitionOfflineException);
            }
        }
    });
    int new_server1Size = getRegionSize(server1, regionName);
    int new_client1Size = getRegionSize(client1, regionName);
    int new_client2Size = getRegionSize(client2, regionName);
    LogWriterUtils.getLogWriter().info("region sizes after re-run the putAll: " + new_client1Size + "," + new_client2Size + "," + new_server1Size);
    assertEquals(server1Size + numberOfEntries / 2, new_server1Size);
    assertEquals(client1Size + numberOfEntries / 2, new_client1Size);
    assertEquals(client2Size + numberOfEntries / 2, new_client2Size);
    // restart server2
    createBridgeServer(server2, regionName, serverPort2, true, 0, "ds1");
    server1Size = getRegionSize(server1, regionName);
    server2Size = getRegionSize(server2, regionName);
    LogWriterUtils.getLogWriter().info("region sizes after restart server2: " + server1Size + "," + server2Size);
    assertEquals(server1Size, server2Size);
    // add a CacheWriter so that server1 stops after creating 15 keys
    server1.invoke(new CacheSerializableRunnable(title + "server1 execute P2P putAll") {

        @Override
        public void run2() throws CacheException {
            Region region = getRootRegion().getSubregion(regionName);
            // make the server trigger an exception after creating 15 keys
            region.getAttributesMutator().setCacheWriter(new MyWriter(15));
        }
    });
    // p2p putAll on PR and expect exception
    server2.invoke(new CacheSerializableRunnable(title + "server2 add listener and putAll") {

        @Override
        public void run2() throws CacheException {
            // create keys
            try {
                doPutAll(regionName, title + "once again:", numberOfEntries);
                fail("Expected a CacheWriterException to be thrown by test");
            } catch (CacheWriterException rte) {
                assertTrue(rte.getMessage().contains("Triggered exception as planned, created 15 keys"));
            }
        }
    });
    new_server1Size = getRegionSize(server1, regionName);
    int new_server2Size = getRegionSize(server2, regionName);
    LogWriterUtils.getLogWriter().info("region sizes after p2p putAll: " + new_server1Size + "," + new_server2Size);
    assertEquals(server1Size + 15, new_server1Size);
    assertEquals(server2Size + 15, new_server2Size);
    server1.invoke(removeExceptionTag1(expectedExceptions));
    server2.invoke(removeExceptionTag1(expectedExceptions));
    client1.invoke(removeExceptionTag1(expectedExceptions));
    client2.invoke(removeExceptionTag1(expectedExceptions));
    // Stop server
    stopBridgeServers(getCache());
}
Also used : CacheException(org.apache.geode.cache.CacheException) Host(org.apache.geode.test.dunit.Host) AsyncInvocation(org.apache.geode.test.dunit.AsyncInvocation) CacheSerializableRunnable(org.apache.geode.cache30.CacheSerializableRunnable) PartitionOfflineException(org.apache.geode.cache.persistence.PartitionOfflineException) VM(org.apache.geode.test.dunit.VM) Region(org.apache.geode.cache.Region) ServerOperationException(org.apache.geode.cache.client.ServerOperationException) CacheWriterException(org.apache.geode.cache.CacheWriterException) ClientSubscriptionTest(org.apache.geode.test.junit.categories.ClientSubscriptionTest) DistributedTest(org.apache.geode.test.junit.categories.DistributedTest) FlakyTest(org.apache.geode.test.junit.categories.FlakyTest) ClientServerTest(org.apache.geode.test.junit.categories.ClientServerTest) Test(org.junit.Test)
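
Both client-side putAll attempts are validated the same way: the caught exception must carry a `PartitionOfflineException` as its cause and a partial-keys message. The check can be sketched in isolation as below. `WrapperException` and `OfflineException` are hypothetical stand-ins for `ServerOperationException` and `PartitionOfflineException` so that the snippet runs without Geode on the classpath, and the message text is invented for the demonstration.

```java
final class CauseCheck {
    // Hypothetical stand-in for ServerOperationException.
    static class WrapperException extends RuntimeException {
        WrapperException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical stand-in for PartitionOfflineException.
    static class OfflineException extends RuntimeException {
        OfflineException(String message) {
            super(message);
        }
    }

    /** True if the direct cause has the given type and the message contains the fragment. */
    static boolean isExpectedFailure(Throwable t, Class<? extends Throwable> causeType,
            String messageFragment) {
        String message = t.getMessage();
        return causeType.isInstance(t.getCause())
                && message != null
                && message.contains(messageFragment);
    }

    public static void main(String[] args) {
        Throwable failure = new WrapperException(
                "putAll applied partial keys at server for /region", // invented message
                new OfflineException("partition offline"));
        System.out.println(isExpectedFailure(failure, OfflineException.class, "partial keys"));
    }
}
```

Note that the first attempt in the test rethrows the exception when either check fails, while the second uses `assertTrue`; both treat any other cause or message as a test failure.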

Example 9 with PartitionOfflineException

use of org.apache.geode.cache.persistence.PartitionOfflineException in project geode by apache.

the class PRHARedundancyProvider method createBucketAtomically.

/**
   * Creates bucket atomically by creating all the copies to satisfy redundancy. In case all copies
   * can not be created, a PartitionedRegionStorageException is thrown to the user and
   * BucketBackupMessage is sent to the nodes to make copies of a bucket that was only partially
   * created. Other VMs are informed of bucket creation through updates through their
   * {@link BucketAdvisor.BucketProfile}s.
   * 
   * <p>
   * This method is synchronized to enforce a single threaded ordering, allowing for a more accurate
   * picture of bucket distribution in the face of concurrency. See bug 37275.
   * </p>
   * 
   * This method is now slightly misnamed. Another member could be in the process of creating this
   * same bucket at the same time.
   * 
   * @param bucketId Id of the bucket to be created.
   * @param newBucketSize size of the first entry.
   * @param startTime a time stamp prior to calling the method, used to update bucket creation stats
   * @return the primary member for the newly created bucket
   * @throws PartitionedRegionStorageException if required # of buckets can not be created to
   *         satisfy redundancy.
   * @throws PartitionedRegionException if d-lock can not be acquired to create bucket.
   * @throws PartitionOfflineException if persistent data recovery is not complete for a partitioned
   *         region referred to in the query.
   */
public InternalDistributedMember createBucketAtomically(final int bucketId, final int newBucketSize, final long startTime, final boolean finishIncompleteCreation, String partitionName) throws PartitionedRegionStorageException, PartitionedRegionException, PartitionOfflineException {
    final boolean isDebugEnabled = logger.isDebugEnabled();
    prRegion.checkPROffline();
    // If there are insufficient stores throw *before* we try acquiring the
    // (very expensive) bucket lock or the (somewhat expensive) monitor on this
    earlySufficientStoresCheck(partitionName);
    synchronized (this) {
        if (this.prRegion.getCache().isCacheAtShutdownAll()) {
            throw new CacheClosedException("Cache is shutting down");
        }
        if (isDebugEnabled) {
            logger.debug("Starting atomic creation of bucketId={}", this.prRegion.bucketStringForLogs(bucketId));
        }
        Collection<InternalDistributedMember> acceptedMembers = new ArrayList<InternalDistributedMember>(); // ArrayList<DataBucketStores>
        Set<InternalDistributedMember> excludedMembers = new HashSet<InternalDistributedMember>();
        ArrayListWithClearState<InternalDistributedMember> failedMembers = new ArrayListWithClearState<InternalDistributedMember>();
        final long timeOut = System.currentTimeMillis() + computeTimeout();
        BucketMembershipObserver observer = null;
        boolean needToElectPrimary = true;
        InternalDistributedMember bucketPrimary = null;
        try {
            this.prRegion.checkReadiness();
            Bucket toCreate = this.prRegion.getRegionAdvisor().getBucket(bucketId);
            if (!finishIncompleteCreation) {
                bucketPrimary = this.prRegion.getBucketPrimary(bucketId);
                if (bucketPrimary != null) {
                    if (isDebugEnabled) {
                        logger.debug("during atomic creation, discovered that the primary already exists {} returning early", bucketPrimary);
                    }
                    needToElectPrimary = false;
                    return bucketPrimary;
                }
            }
            observer = new BucketMembershipObserver(toCreate).beginMonitoring();
            // track if insufficient data stores have been detected
            boolean loggedInsufficentStores = false;
            for (; ; ) {
                this.prRegion.checkReadiness();
                if (this.prRegion.getCache().isCacheAtShutdownAll()) {
                    if (isDebugEnabled) {
                        logger.debug("Aborted createBucketAtomically due to ShutdownAll");
                    }
                    throw new CacheClosedException("Cache is shutting down");
                }
                // this.prRegion.getCache().getLogger().config(
                // "DEBUG createBucketAtomically: "
                // + " bucketId=" + this.prRegion.getBucketName(bucketId) +
                // " accepted: " + acceptedMembers +
                // " failed: " + failedMembers);
                long timeLeft = timeOut - System.currentTimeMillis();
                if (timeLeft < 0) {
                    // It took too long.
                    timedOut(this.prRegion, getAllStores(partitionName), acceptedMembers, ALLOCATE_ENOUGH_MEMBERS_TO_HOST_BUCKET.toLocalizedString(), computeTimeout());
                // NOTREACHED
                }
                if (isDebugEnabled) {
                    logger.debug("createBucketAtomically: have {} ms left to finish this", timeLeft);
                }
                // Always go back to the advisor, see if any fresh data stores are
                // present.
                Set<InternalDistributedMember> allStores = getAllStores(partitionName);
                loggedInsufficentStores = checkSufficientStores(allStores, loggedInsufficentStores);
                InternalDistributedMember candidate = createBucketInstance(bucketId, newBucketSize, excludedMembers, acceptedMembers, failedMembers, timeOut, allStores);
                if (candidate != null) {
                    if (this.prRegion.getDistributionManager().enforceUniqueZone()) {
                        // enforceUniqueZone property has no effect for a loner. Fix for defect #47181
                        if (!(this.prRegion.getDistributionManager() instanceof LonerDistributionManager)) {
                            Set<InternalDistributedMember> exm = getBuddyMembersInZone(candidate, allStores);
                            exm.remove(candidate);
                            exm.removeAll(acceptedMembers);
                            excludedMembers.addAll(exm);
                        } else {
                            // log a warning if Loner
                            logger.warn(LocalizedMessage.create(LocalizedStrings.GemFireCache_ENFORCE_UNIQUE_HOST_NOT_APPLICABLE_FOR_LONER));
                        }
                    }
                }
                // Get an updated list of bucket owners, which should include
                // buckets created concurrently with this createBucketAtomically call
                acceptedMembers = prRegion.getRegionAdvisor().getBucketOwners(bucketId);
                if (isDebugEnabled) {
                    logger.debug("Accepted members: {}", acceptedMembers);
                }
                // the candidate has accepted
                if (bucketPrimary == null && acceptedMembers.contains(candidate)) {
                    bucketPrimary = candidate;
                }
                // prune out the stores that have left
                verifyBucketNodes(excludedMembers, partitionName);
                // Note - we used to wait for the created bucket to become primary here
                // if this is a colocated region. We no longer need to do that, because
                // the EndBucketMessage is sent out after bucket creation completes to
                // select the primary.
                // Have we exhausted all candidates?
                final int potentialCandidateCount = (allStores.size() - (excludedMembers.size() + acceptedMembers.size() + failedMembers.size()));
                // Determining exhausted members competes with bucket balancing; it's
                // important to re-visit all failed members since "failed" set may
                // contain datastores which at the moment are imbalanced, but yet could
                // be candidates. If the failed members list is empty, it's expected
                // that the next iteration clears the (already empty) list.
                final boolean exhaustedPotentialCandidates = failedMembers.wasCleared() && potentialCandidateCount <= 0;
                final boolean redundancySatisfied = acceptedMembers.size() > this.prRegion.getRedundantCopies();
                final boolean bucketNotCreated = acceptedMembers.size() == 0;
                if (isDebugEnabled) {
                    logger.debug("potentialCandidateCount={}, exhaustedPotentialCandidates={}, redundancySatisfied={}, bucketNotCreated={}", potentialCandidateCount, exhaustedPotentialCandidates, redundancySatisfied, bucketNotCreated);
                }
                if (bucketNotCreated) {
                    // if we haven't managed to create the bucket on any nodes, retry.
                    continue;
                }
                if (exhaustedPotentialCandidates && !redundancySatisfied) {
                    insufficientStores(allStores, acceptedMembers, true);
                }
                // Fix for bug 39283
                if (redundancySatisfied || exhaustedPotentialCandidates) {
                    // Tell one of the members to become primary.
                    // The rest of the members will be allowed to
                    // volunteer for primary.
                    endBucketCreation(bucketId, acceptedMembers, bucketPrimary, partitionName);
                    final int expectedRemoteHosts = acceptedMembers.size() - (acceptedMembers.contains(this.prRegion.getMyId()) ? 1 : 0);
                    boolean interrupted = Thread.interrupted();
                    try {
                        BucketMembershipObserverResults results = observer.waitForOwnersGetPrimary(expectedRemoteHosts, acceptedMembers, partitionName);
                        if (results.problematicDeparture) {
                            // BZZZT! Member left. Start over.
                            continue;
                        }
                        bucketPrimary = results.primary;
                    } catch (InterruptedException e) {
                        interrupted = true;
                        this.prRegion.getCancelCriterion().checkCancelInProgress(e);
                    } finally {
                        if (interrupted) {
                            Thread.currentThread().interrupt();
                        }
                    }
                    needToElectPrimary = false;
                    return bucketPrimary;
                }
            // almost done
            }
        // for
        } catch (CancelException e) {
            // Fix for 43544 - We don't need to elect a primary
            // if the cache was closed. The other members will
            // take care of it. This ensures we don't compromise
            // redundancy.
            needToElectPrimary = false;
            throw e;
        } catch (RegionDestroyedException e) {
            // Fix for 43544 - We don't need to elect a primary
            // if the region was destroyed. The other members will
            // take care of it. This ensures we don't compromise
            // redundancy.
            needToElectPrimary = false;
            throw e;
        } catch (PartitionOfflineException e) {
            throw e;
        } catch (RuntimeException e) {
            if (isDebugEnabled) {
                logger.debug("Unable to create new bucket {}: {}", bucketId, e.getMessage(), e);
            }
            // Clean up the partially created bucket, unless this call is
            // finishing an incomplete bucket creation.
            if (!finishIncompleteCreation) {
                cleanUpBucket(bucketId);
            }
            throw e;
        } finally {
            if (observer != null) {
                observer.stopMonitoring();
            }
            // Try to make sure everyone that created the bucket can volunteer for primary
            if (needToElectPrimary) {
                try {
                    endBucketCreation(bucketId, prRegion.getRegionAdvisor().getBucketOwners(bucketId), bucketPrimary, partitionName);
                } catch (Exception e) {
                    // if region is going down, then no warning level logs
                    if (e instanceof CancelException || e instanceof CacheClosedException || (prRegion.getCancelCriterion().isCancelInProgress())) {
                        logger.debug("Exception trying choose a primary after bucket creation failure", e);
                    } else {
                        logger.warn("Exception trying choose a primary after bucket creation failure", e);
                    }
                }
            }
        }
    }
// synchronized(this)
}
Also used : RegionDestroyedException(org.apache.geode.cache.RegionDestroyedException) CacheClosedException(org.apache.geode.cache.CacheClosedException) PartitionedRegionStorageException(org.apache.geode.cache.PartitionedRegionStorageException) RejectedExecutionException(java.util.concurrent.RejectedExecutionException) CancelException(org.apache.geode.CancelException) PartitionOfflineException(org.apache.geode.cache.persistence.PartitionOfflineException) InternalDistributedMember(org.apache.geode.distributed.internal.membership.InternalDistributedMember) LonerDistributionManager(org.apache.geode.distributed.internal.LonerDistributionManager)
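
The wait for a primary in the method above is wrapped in an interrupt-preserving idiom: the pending interrupt flag is cleared and remembered before the blocking call, then restored in the finally block. Below is a minimal, standalone sketch of that idiom. It is not Geode code; the class name InterruptPreservingWait, the method awaitPreservingInterrupt, and the use of a CountDownLatch in place of the bucket membership observer are assumptions made purely for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Apache Geode.
final class InterruptPreservingWait {

    /**
     * Waits for the latch to reach zero, up to the given timeout.
     * Returns true if the latch was released, false on timeout or interrupt.
     * The caller's interrupt status is the same (or set) after the call.
     */
    static boolean awaitPreservingInterrupt(CountDownLatch latch, long timeoutMillis) {
        // Clear and remember any pending interrupt so the wait can proceed.
        boolean interrupted = Thread.interrupted();
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Interrupted while waiting: remember it and report failure.
            interrupted = true;
            return false;
        } finally {
            // Restore the interrupt status for code further up the stack.
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);
        latch.countDown();
        // Simulate a caller that was interrupted before the wait began.
        Thread.currentThread().interrupt();
        boolean done = awaitPreservingInterrupt(latch, 100);
        System.out.println("done=" + done + ", interrupted=" + Thread.interrupted());
    }
}
```

The original method additionally calls checkCancelInProgress on the region's cancel criterion inside the catch block, which this sketch leaves out.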

Example 10 with PartitionOfflineException

use of org.apache.geode.cache.persistence.PartitionOfflineException in project geode by apache.

the class ProxyBucketRegion method checkBucketRedundancyBeforeGrab.

public boolean checkBucketRedundancyBeforeGrab(InternalDistributedMember moveSource, boolean replaceOfflineData) {
    int redundancy = getBucketAdvisor().getBucketRedundancy();
    // Skip any checks if this is a colocated bucket. We need to create
    // the colocated bucket if we managed to create the parent bucket. There are
    // race conditions where the parent region may know that a member is no longer
    // hosting the bucket, but the child region doesn't know that yet.
    PartitionedRegion colocatedRegion = ColocationHelper.getColocatedRegion(this.partitionedRegion);
    if (colocatedRegion != null) {
        return true;
    }
    // Make sure the bucket isn't completely offline.
    if (!replaceOfflineData || redundancy == -1) {
        BucketPersistenceAdvisor persistAdvisor = getPersistenceAdvisor();
        if (persistAdvisor != null) {
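            // Check offline members only if we did not host this bucket before
            // and it has had a primary. If the bucket never had a primary,
            // any offline buckets should be empty.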
            if (!persistAdvisor.wasHosting() && advisor.getHadPrimary()) {
                final PersistentMembershipView membershipView = persistAdvisor.getMembershipView();
                if (membershipView == null) {
                    // Persistence is not initialized yet.
                    // Refuse to create the bucket if that is the case.
                    if (logger.isDebugEnabled()) {
                        logger.debug("grabFreeBucket: Can't create bucket because persistence is not yet initialized {}{}{}", this.partitionedRegion.getPRId(), PartitionedRegion.BUCKET_ID_SEPARATOR, bid);
                    }
                    return false;
                }
                Set<PersistentMemberID> offlineMembers = membershipView.getOfflineMembers();
                if (logger.isDebugEnabled()) {
                    logger.debug("We didn't host the bucket. Checking redundancy level before creating the bucket. Redundancy={} offline members={}", redundancy, offlineMembers);
                }
                if (offlineMembers != null && !offlineMembers.isEmpty() && redundancy == -1) {
                    // If there are offline members, and no online members, throw
                    // an exception indicating that we can't create the bucket.
                    String message = LocalizedStrings.PartitionedRegionDataStore_DATA_OFFLINE_MESSAGE.toLocalizedString(partitionedRegion.getFullPath(), bid, offlineMembers);
                    throw new PartitionOfflineException((Set) offlineMembers, message);
                } else {
                    // Count the offline members toward redundancy so that
                    // we don't create an extra copy of the bucket.
                    if (offlineMembers != null) {
                        redundancy += offlineMembers.size();
                    }
                }
            }
        }
    }
    if (moveSource == null) {
        if (redundancy >= this.partitionedRegion.getRedundantCopies()) {
            if (logger.isDebugEnabled()) {
                logger.debug("grabFreeBucket: Bucket already meets redundancy level bucketId={}{}{}", this.partitionedRegion.getPRId(), PartitionedRegion.BUCKET_ID_SEPARATOR, bid);
            }
            return false;
        }
    }
    // Check that the bucket is allowed on this host. If this
    // is a bucket move, we allow the source to be on the same host.
    if (!PartitionedRegionBucketMgmtHelper.bucketIsAllowedOnThisHost(this, moveSource)) {
        if (logger.isDebugEnabled()) {
            logger.debug("grabFreeBucket: Bucket can't be recovered because we're enforcing that the bucket host must be unique {}{}{}", this.partitionedRegion.getPRId(), PartitionedRegion.BUCKET_ID_SEPARATOR, bid);
        }
        return false;
    }
    return true;
}
Also used : PartitionOfflineException(org.apache.geode.cache.persistence.PartitionOfflineException) PersistentMembershipView(org.apache.geode.internal.cache.persistence.PersistentMembershipView) PersistentMemberID(org.apache.geode.internal.cache.persistence.PersistentMemberID)
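
The redundancy decision in the method above can be reduced to a small rule: if no online member hosts the bucket (redundancy of -1) and some members are offline, refuse with an exception; otherwise count offline members toward redundancy and create a copy only if the total is still below the configured number of redundant copies. The sketch below models only that rule, assuming the non-move case (moveSource == null) and ignoring the colocation, persistence-initialization, and unique-host checks. The class RedundancyCheckSketch, the method shouldCreateCopy, and the exception AllCopiesOfflineException are hypothetical names, not Geode APIs.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration class; not part of Apache Geode.
final class RedundancyCheckSketch {

    /** Hypothetical stand-in for PartitionOfflineException. */
    static final class AllCopiesOfflineException extends RuntimeException {
        AllCopiesOfflineException(String message) {
            super(message);
        }
    }

    /**
     * @param redundancy current redundancy; -1 means no online member hosts the bucket
     * @param requiredRedundantCopies configured number of redundant copies
     * @param offlineMembers members that hold the bucket on disk but are offline (may be null)
     * @return true if this member should create a copy of the bucket
     */
    static boolean shouldCreateCopy(int redundancy, int requiredRedundantCopies,
                                    Set<String> offlineMembers) {
        boolean hasOffline = offlineMembers != null && !offlineMembers.isEmpty();
        if (hasOffline && redundancy == -1) {
            // Offline members and no online members: the data is unavailable.
            throw new AllCopiesOfflineException(
                "Bucket data exists only on offline members: " + offlineMembers);
        }
        // Offline members count toward redundancy to avoid an extra copy.
        int effective = redundancy + (hasOffline ? offlineMembers.size() : 0);
        return effective < requiredRedundantCopies;
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> oneOffline = Collections.singleton("member-2");
        System.out.println(shouldCreateCopy(0, 1, none));       // true
        System.out.println(shouldCreateCopy(0, 1, oneOffline)); // false
        try {
            shouldCreateCopy(-1, 1, oneOffline);
        } catch (AllCopiesOfflineException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

In the second call, one online copy plus one offline member already satisfies a redundancy of 1, so no new copy is made; this mirrors the `redundancy += offlineMembers.size()` step in the original.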

Aggregations

PartitionOfflineException (org.apache.geode.cache.persistence.PartitionOfflineException) 19
DistributedTest (org.apache.geode.test.junit.categories.DistributedTest) 12
Test (org.junit.Test) 12
Host (org.apache.geode.test.dunit.Host) 10
IgnoredException (org.apache.geode.test.dunit.IgnoredException) 10
RMIException (org.apache.geode.test.dunit.RMIException) 10
VM (org.apache.geode.test.dunit.VM) 10
FlakyTest (org.apache.geode.test.junit.categories.FlakyTest) 10
IOException (java.io.IOException) 9
CacheClosedException (org.apache.geode.cache.CacheClosedException) 9
PartitionedRegionStorageException (org.apache.geode.cache.PartitionedRegionStorageException) 9
SerializableRunnable (org.apache.geode.test.dunit.SerializableRunnable) 9
AsyncInvocation (org.apache.geode.test.dunit.AsyncInvocation) 7
AttributesFactory (org.apache.geode.cache.AttributesFactory) 6
Cache (org.apache.geode.cache.Cache) 6
PartitionAttributesFactory (org.apache.geode.cache.PartitionAttributesFactory) 6
DiskStore (org.apache.geode.cache.DiskStore) 5
Region (org.apache.geode.cache.Region) 4
InternalDistributedSystem (org.apache.geode.distributed.internal.InternalDistributedSystem) 3
CancelException (org.apache.geode.CancelException) 2