use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by hceylan.
the class BloomFilter method create.
/**
* Creates a {@code Builder} of a {@link BloomFilter BloomFilter<T>}, with the expected number
* of insertions and expected false positive probability.
*
* <p>Note that overflowing a {@code BloomFilter} with significantly more elements
* than specified will result in its saturation and a sharp deterioration of its
* false positive probability.
*
* <p>The constructed {@code BloomFilter<T>} will be serializable if the provided
* {@code Funnel<T>} is.
*
* <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring
* proper serialization and deserialization, which is important since {@link #equals} also relies
* on object identity of funnels.
*
* @param funnel the funnel of T's that the constructed {@code BloomFilter<T>} will use
* @param expectedInsertions the number of expected insertions to the constructed
* {@code BloomFilter<T>}; must be positive
* @param falsePositiveProbability the desired false positive probability (must be positive and
* less than 1.0)
* @return a {@code BloomFilter}
*/
public static <T> BloomFilter<T> create(Funnel<T> funnel, int expectedInsertions /* n */,
double falsePositiveProbability) {
checkNotNull(funnel);
checkArgument(expectedInsertions > 0, "Expected insertions must be positive");
checkArgument(falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0, "False positive probability must be in (0.0, 1.0)");
/*
* andreou: I wanted to put a warning in the javadoc about tiny fpp values,
* since the resulting size is proportional to -log(p), but there is not
* much of a point after all, e.g. optimalM(1000, 0.0000000000000001) = 76680
* which is less than 10kb. Who cares!
*/
int numBits = optimalNumOfBits(expectedInsertions, falsePositiveProbability);
int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
return new BloomFilter<T>(new BitArray(numBits), numHashFunctions, funnel, BloomFilterStrategies.MURMUR128_MITZ_32);
}
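The sizing step in create() can be checked by hand. The sketch below assumes the two helpers follow the textbook Bloom filter formulas, m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2; the class name BloomSizingSketch is hypothetical, and the method bodies are an assumption rather than code taken from this listing.

```java
// Hypothetical helper class. The method names mirror the calls in create(),
// but the bodies are the textbook formulas, assumed rather than copied.
final class BloomSizingSketch {
  /** m = -n * ln(p) / (ln 2)^2, truncated to a whole number of bits. */
  static long optimalNumOfBits(long n, double p) {
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }

  /** k = (m / n) * ln 2, rounded, and never fewer than one hash function. */
  static int optimalNumOfHashFunctions(long n, long m) {
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
  }

  public static void main(String[] args) {
    long m = optimalNumOfBits(1000, 1e-16);
    int k = optimalNumOfHashFunctions(1000, m);
    // Prints "76680 bits, 53 hash functions"
    System.out.println(m + " bits, " + k + " hash functions");
  }
}
```

Under these assumptions, the value quoted in the code comment (76680 bits for n = 1000 and p = 1e-16) is reproduced: about 9.4 KB, which is why a tiny false positive probability is cheap for small n.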
use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by google.
the class BloomFilter method create.
@VisibleForTesting
static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) {
checkNotNull(funnel);
checkArgument(expectedInsertions >= 0, "Expected insertions (%s) must be >= 0", expectedInsertions);
checkArgument(fpp > 0.0, "False positive probability (%s) must be > 0.0", fpp);
checkArgument(fpp < 1.0, "False positive probability (%s) must be < 1.0", fpp);
checkNotNull(strategy);
if (expectedInsertions == 0) {
expectedInsertions = 1;
}
/*
* TODO(user): Put a warning in the javadoc about tiny fpp values, since the resulting size
* is proportional to -log(p), but there is not much of a point after all, e.g.
* optimalM(1000, 0.0000000000000001) = 76680 which is less than 10kb. Who cares!
*/
long numBits = optimalNumOfBits(expectedInsertions, fpp);
int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
try {
return new BloomFilter<T>(new BitArray(numBits), numHashFunctions, funnel, strategy);
} catch (IllegalArgumentException e) {
throw new IllegalArgumentException("Could not create BloomFilter of " + numBits + " bits", e);
}
}
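This version takes numBits as a long and converts an IllegalArgumentException from the BitArray constructor into a more descriptive one. A minimal sketch of why such a constructor might fail follows, assuming a long[]-backed bit array that rejects sizes whose word count does not fit in an int. LongBitArraySketch is a hypothetical stand-in, not the library's BitArray.

```java
// Hypothetical stand-in for a long[]-backed bit array; not the library's BitArray.
final class LongBitArraySketch {
  private final long[] data;

  LongBitArraySketch(long numBits) {
    if (numBits <= 0) {
      throw new IllegalArgumentException("numBits must be positive: " + numBits);
    }
    // Ceiling division, written so it cannot overflow near Long.MAX_VALUE.
    long words = numBits / 64 + (numBits % 64 == 0 ? 0 : 1);
    if (words > Integer.MAX_VALUE) {
      // Java arrays are indexed by int, so this many words cannot be allocated.
      throw new IllegalArgumentException("Too many bits for one array: " + numBits);
    }
    data = new long[(int) words];
  }

  /** Sets the bit at index and returns true if it was previously clear. */
  boolean set(long index) {
    int word = (int) (index >>> 6); // index / 64
    long mask = 1L << index;        // a long shift uses only the low 6 bits of index
    boolean changed = (data[word] & mask) == 0;
    data[word] |= mask;
    return changed;
  }

  boolean get(long index) {
    return (data[(int) (index >>> 6)] & (1L << index)) != 0;
  }

  /** Capacity in bits, computed in long arithmetic. */
  long bitSize() {
    return (long) data.length * 64;
  }

  public static void main(String[] args) {
    LongBitArraySketch bits = new LongBitArraySketch(100);
    System.out.println(bits.bitSize()); // capacity is rounded up to whole 64-bit words
    System.out.println(bits.set(70));   // first set reports a change
    System.out.println(bits.set(70));   // second set does not
  }
}
```

The capacity is rounded up to a multiple of 64, so a request for 100 bits yields 128 in this sketch.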
use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by google.
the class BloomFilter method readFrom.
/**
* Reads a byte stream, which was written by {@linkplain #writeTo(OutputStream)}, into a
* {@code BloomFilter<T>}.
*
* <p>The {@code Funnel} to be used is not encoded in the stream, so it must be provided here.
* <b>Warning:</b> the funnel provided <b>must</b> behave identically to the one used to populate
* the original Bloom filter!
*
* @throws IOException if the InputStream throws an {@code IOException}, or if its data does not
* appear to be a BloomFilter serialized using the {@linkplain #writeTo(OutputStream)} method.
*/
public static <T> BloomFilter<T> readFrom(InputStream in, Funnel<? super T> funnel) throws IOException {
checkNotNull(in, "InputStream");
checkNotNull(funnel, "Funnel");
int strategyOrdinal = -1;
int numHashFunctions = -1;
int dataLength = -1;
try {
DataInputStream din = new DataInputStream(in);
// currently this assumes there is no negative ordinal; will have to be updated if we
// add non-stateless strategies (for which we've reserved negative ordinals; see
// Strategy.ordinal()).
strategyOrdinal = din.readByte();
numHashFunctions = UnsignedBytes.toInt(din.readByte());
dataLength = din.readInt();
Strategy strategy = BloomFilterStrategies.values()[strategyOrdinal];
long[] data = new long[dataLength];
for (int i = 0; i < data.length; i++) {
data[i] = din.readLong();
}
return new BloomFilter<T>(new BitArray(data), numHashFunctions, funnel, strategy);
} catch (RuntimeException e) {
String message = "Unable to deserialize BloomFilter from InputStream." + " strategyOrdinal: " + strategyOrdinal + " numHashFunctions: " + numHashFunctions + " dataLength: " + dataLength;
throw new IOException(message, e);
}
}
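The read order in readFrom implies a simple layout: one byte for the strategy ordinal, one byte for the number of hash functions (read as unsigned), a four-byte word count, then that many longs. The following sketch round-trips that layout using only java.io. BloomWireFormatSketch and its methods are hypothetical names, and the writer side is an assumption inferred from the reader shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration of the layout that readFrom consumes.
final class BloomWireFormatSketch {
  /** Writes: 1 byte ordinal, 1 byte hash count, 4 byte word count, then the words. */
  static byte[] write(int strategyOrdinal, int numHashFunctions, long[] data)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream dout = new DataOutputStream(bytes);
    dout.writeByte(strategyOrdinal);
    dout.writeByte(numHashFunctions); // only the low 8 bits are written
    dout.writeInt(data.length);
    for (long word : data) {
      dout.writeLong(word);
    }
    dout.flush();
    return bytes.toByteArray();
  }

  /** Reads the hash count back, widening the byte as an unsigned value. */
  static int readNumHashFunctions(byte[] serialized) throws IOException {
    DataInputStream din = new DataInputStream(new ByteArrayInputStream(serialized));
    din.readByte();               // skip the strategy ordinal
    return din.readByte() & 0xFF; // same effect as an unsigned byte-to-int conversion
  }

  /** Reads the bit data back as a long[]. */
  static long[] readData(byte[] serialized) throws IOException {
    DataInputStream din = new DataInputStream(new ByteArrayInputStream(serialized));
    din.readByte(); // strategy ordinal
    din.readByte(); // number of hash functions
    long[] data = new long[din.readInt()];
    for (int i = 0; i < data.length; i++) {
      data[i] = din.readLong();
    }
    return data;
  }

  public static void main(String[] args) throws IOException {
    byte[] wire = write(0, 200, new long[] {1L, -1L});
    System.out.println(wire.length);                // 1 + 1 + 4 + 2 * 8 bytes
    System.out.println(readNumHashFunctions(wire)); // 200, not -56
  }
}
```

The unsigned widening matters because a hash count above 127 would otherwise come back negative when read as a signed byte.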
use of com.google.common.hash.BloomFilterStrategies.BitArray in project guava by google.
the class BloomFilterTest method testLargeBloomFilterDoesntOverflow.
@AndroidIncompatible // OutOfMemoryError
public void testLargeBloomFilterDoesntOverflow() {
long numBits = Integer.MAX_VALUE;
numBits++;
BitArray bitArray = new BitArray(numBits);
assertTrue("BitArray.bitSize() must return a positive number, but was " + bitArray.bitSize(), bitArray.bitSize() > 0);
// Ideally we would also test the bitSize() overflow of this BF, but it runs out of heap space
// BloomFilter.create(Funnels.unencodedCharsFunnel(), 244412641, 1e-11);
}
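The test requests one bit more than Integer.MAX_VALUE and checks that bitSize() stays positive. The arithmetic behind that check can be shown without allocating anything, assuming the bit count is derived from the length of a long[] (as the long[] passed to BitArray in readFrom suggests). BitSizeOverflowSketch is a hypothetical name, and the int variant is shown only to illustrate the failure mode.

```java
// Hypothetical illustration of int versus long arithmetic for a bit count.
final class BitSizeOverflowSketch {
  /** Multiplies in int arithmetic, so the result wraps past Integer.MAX_VALUE. */
  static int bitSizeAsInt(int words) {
    return words * 64;
  }

  /** Widens to long before multiplying, so the result stays positive. */
  static long bitSizeAsLong(int words) {
    return (long) words * 64;
  }

  public static void main(String[] args) {
    long numBits = Integer.MAX_VALUE + 1L;   // 2^31 bits, as in the test
    int words = (int) ((numBits + 63) / 64); // 2^25 longs
    System.out.println(bitSizeAsInt(words));  // -2147483648
    System.out.println(bitSizeAsLong(words)); // 2147483648
  }
}
```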