Example 1 with BitArray

use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by hceylan.

the class BloomFilter method create.

/**
   * Creates a {@link BloomFilter BloomFilter<T>} with the expected number
   * of insertions and expected false positive probability.
   *
   * <p>Note that overflowing a {@code BloomFilter} with significantly more elements
   * than specified will result in its saturation and a sharp deterioration of its
   * false positive probability.
   *
   * <p>The constructed {@code BloomFilter<T>} will be serializable if the provided
   * {@code Funnel<T>} is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring
   * proper serialization and deserialization, which is important since {@link #equals} also relies
   * on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter<T>} will use
   * @param expectedInsertions the number of expected insertions to the constructed
   *        {@code BloomFilter<T>}; must be positive
   * @param falsePositiveProbability the desired false positive probability (must be positive and
   *        less than 1.0)
   * @return a {@code BloomFilter}
   */
public static <T> BloomFilter<T> create(Funnel<T> funnel, int expectedInsertions /* n */, double falsePositiveProbability) {
    checkNotNull(funnel);
    checkArgument(expectedInsertions > 0, "Expected insertions must be positive");
    checkArgument(falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0, "False positive probability in (0.0, 1.0)");
    /*
     * andreou: I wanted to put a warning in the javadoc about tiny fpp values,
     * since the resulting size is proportional to -log(p), but there is not
     * much of a point after all, e.g. optimalM(1000, 0.0000000000000001) = 76680
     * which is less than 10kb. Who cares!
     */
    int numBits = optimalNumOfBits(expectedInsertions, falsePositiveProbability);
    int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    return new BloomFilter<T>(new BitArray(numBits), numHashFunctions, funnel, BloomFilterStrategies.MURMUR128_MITZ_32);
}
Also used : BitArray(com.google.common.hash.BloomFilterStrategies.BitArray)
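
The comment in the snippet above says the filter size grows with -log(p) and quotes optimalM(1000, 0.0000000000000001) = 76680. The sketch below reproduces that figure using the textbook Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The class name BloomSizing is hypothetical, and the method bodies are an assumption based on those formulas rather than a copy of the project's code.

```java
// Hypothetical helper class. The method names mirror the calls in the snippet
// above, but the bodies are the textbook formulas, assumed for illustration.
final class BloomSizing {
    private BloomSizing() {}

    /** m = -n * ln(p) / (ln 2)^2, truncated to a whole number of bits. */
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** k = (m / n) * ln 2, rounded, and never fewer than one hash function. */
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1000;     // expected insertions
        double p = 1e-16;  // desired false positive probability
        long m = optimalNumOfBits(n, p);
        int k = optimalNumOfHashFunctions(n, m);
        // Round the bit count up to whole bytes for a rough memory estimate.
        System.out.println("bits = " + m + ", bytes = " + (m + 7) / 8 + ", hash functions = " + k);
    }
}
```

For n = 1000 and p = 1e-16 this gives 76680 bits, which is under 10 KB, in line with the comment.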

Example 2 with BitArray

use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by google.

the class BloomFilter method create.

@VisibleForTesting
static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) {
    checkNotNull(funnel);
    checkArgument(expectedInsertions >= 0, "Expected insertions (%s) must be >= 0", expectedInsertions);
    checkArgument(fpp > 0.0, "False positive probability (%s) must be > 0.0", fpp);
    checkArgument(fpp < 1.0, "False positive probability (%s) must be < 1.0", fpp);
    checkNotNull(strategy);
    if (expectedInsertions == 0) {
        expectedInsertions = 1;
    }
    /*
     * TODO(user): Put a warning in the javadoc about tiny fpp values, since the resulting size
     * is proportional to -log(p), but there is not much of a point after all, e.g.
     * optimalM(1000, 0.0000000000000001) = 76680 which is less than 10kb. Who cares!
     */
    long numBits = optimalNumOfBits(expectedInsertions, fpp);
    int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    try {
        return new BloomFilter<T>(new BitArray(numBits), numHashFunctions, funnel, strategy);
    } catch (IllegalArgumentException e) {
        throw new IllegalArgumentException("Could not create BloomFilter of " + numBits + " bits", e);
    }
}
Also used : BitArray(com.google.common.hash.BloomFilterStrategies.BitArray), VisibleForTesting(com.google.common.annotations.VisibleForTesting)

Example 3 with BitArray

use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by google.

the class BloomFilter method readFrom.

/**
   * Reads a byte stream, which was written by {@linkplain #writeTo(OutputStream)}, into a
   * {@code BloomFilter<T>}.
   *
   * The {@code Funnel} to be used is not encoded in the stream, so it must be provided here.
   * <b>Warning:</b> the funnel provided <b>must</b> behave identically to the one used to populate
   * the original Bloom filter!
   *
   * @throws IOException if the InputStream throws an {@code IOException}, or if its data does not
   *     appear to be a BloomFilter serialized using the {@linkplain #writeTo(OutputStream)} method.
   */
public static <T> BloomFilter<T> readFrom(InputStream in, Funnel<? super T> funnel) throws IOException {
    checkNotNull(in, "InputStream");
    checkNotNull(funnel, "Funnel");
    int strategyOrdinal = -1;
    int numHashFunctions = -1;
    int dataLength = -1;
    try {
        DataInputStream din = new DataInputStream(in);
        // currently this assumes there is no negative ordinal; will have to be updated if we
        // add non-stateless strategies (for which we've reserved negative ordinals; see
        // Strategy.ordinal()).
        strategyOrdinal = din.readByte();
        numHashFunctions = UnsignedBytes.toInt(din.readByte());
        dataLength = din.readInt();
        Strategy strategy = BloomFilterStrategies.values()[strategyOrdinal];
        long[] data = new long[dataLength];
        for (int i = 0; i < data.length; i++) {
            data[i] = din.readLong();
        }
        return new BloomFilter<T>(new BitArray(data), numHashFunctions, funnel, strategy);
    } catch (RuntimeException e) {
        String message = "Unable to deserialize BloomFilter from InputStream." + " strategyOrdinal: " + strategyOrdinal + " numHashFunctions: " + numHashFunctions + " dataLength: " + dataLength;
        throw new IOException(message, e);
    }
}
Also used : BitArray(com.google.common.hash.BloomFilterStrategies.BitArray), IOException(java.io.IOException), DataInputStream(java.io.DataInputStream)
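
The read order in readFrom implies a simple byte layout: one signed byte for the strategy ordinal, one byte read as unsigned for the number of hash functions, a 4-byte length, then that many 8-byte words. The sketch below round-trips that layout with only java.io classes. The class BloomWireFormat and its write method are hypothetical: the writer is assumed to mirror the reader shown above, and it is not Guava's writeTo implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Guava.
final class BloomWireFormat {
    private BloomWireFormat() {}

    /** Holder for the decoded header fields and the bit-array words. */
    static final class Parsed {
        final int strategyOrdinal;
        final int numHashFunctions;
        final long[] data;

        Parsed(int strategyOrdinal, int numHashFunctions, long[] data) {
            this.strategyOrdinal = strategyOrdinal;
            this.numHashFunctions = numHashFunctions;
            this.data = data;
        }
    }

    /** Assumed writer: emits fields in the same order readFrom consumes them. */
    static byte[] write(int strategyOrdinal, int numHashFunctions, long[] data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(strategyOrdinal);   // 1 byte, read back as signed
            out.writeByte(numHashFunctions);  // 1 byte, read back as unsigned
            out.writeInt(data.length);        // 4 bytes, big-endian
            for (long word : data) {
                out.writeLong(word);          // 8 bytes per word
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the header and words back in the order used by readFrom. */
    static Parsed read(byte[] serialized) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized));
            int strategyOrdinal = in.readByte();
            int numHashFunctions = in.readByte() & 0xFF; // unsigned conversion of the byte
            int dataLength = in.readInt();
            long[] data = new long[dataLength];
            for (int i = 0; i < data.length; i++) {
                data[i] = in.readLong();
            }
            return new Parsed(strategyOrdinal, numHashFunctions, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] serialized = write(0, 200, new long[] {1L, -1L});
        Parsed parsed = read(serialized);
        System.out.println(serialized.length + " bytes, k = " + parsed.numHashFunctions);
    }
}
```

The unsigned conversion matters for values above 127: a hash-function count of 200 is stored as a single byte and would come back negative if read as a signed byte.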

Example 4 with BitArray

use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by google.

the class BloomFilterTest method testLargeBloomFilterDoesntOverflow.

@AndroidIncompatible // OutOfMemoryError
public void testLargeBloomFilterDoesntOverflow() {
    long numBits = Integer.MAX_VALUE;
    numBits++;
    BitArray bitArray = new BitArray(numBits);
    assertTrue("BitArray.bitSize() must return a positive number, but was " + bitArray.bitSize(), bitArray.bitSize() > 0);
    // Ideally we would also test the bitSize() overflow of this BF, but it runs out of heap space
    // BloomFilter.create(Funnels.unencodedCharsFunnel(), 244412641, 1e-11);
}
Also used : BitArray(com.google.common.hash.BloomFilterStrategies.BitArray)
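
The test above uses Integer.MAX_VALUE + 1 bits because that is the smallest size at which an int-based bit count wraps around to a negative number. The sketch below shows the arithmetic only, without allocating the array. It assumes a bit array backed by 64-bit words; the class BitSizeOverflow and its methods are hypothetical and are not taken from Guava's BitArray.

```java
// Hypothetical illustration of the overflow the test guards against.
final class BitSizeOverflow {
    private BitSizeOverflow() {}

    /** Number of 64-bit words needed to hold the given number of bits (rounded up). */
    static int wordsFor(long bits) {
        // Fine for this sketch; bits + 63 would itself overflow near Long.MAX_VALUE.
        return Math.toIntExact((bits + 63) / 64);
    }

    /** Bit size computed in long arithmetic: stays positive. */
    static long bitSize(int words) {
        return (long) words * Long.SIZE;
    }

    /** Bit size computed in int arithmetic: wraps around for large arrays. */
    static int bitSizeAsInt(int words) {
        return words * Long.SIZE;
    }

    public static void main(String[] args) {
        long numBits = Integer.MAX_VALUE;
        numBits++; // 2^31, one more than an int can represent
        int words = wordsFor(numBits);
        System.out.println("words = " + words);
        System.out.println("long bitSize = " + bitSize(words));
        System.out.println("int bitSize = " + bitSizeAsInt(words));
    }
}
```

With 2^31 bits the array needs 2^25 words; multiplying that by 64 in int arithmetic yields a negative value, while the long computation returns 2147483648.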

Aggregations

BitArray (com.google.common.hash.BloomFilterStrategies.BitArray): 4
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 1
DataInputStream (java.io.DataInputStream): 1
IOException (java.io.IOException): 1