
Example 6 with SplitStoreFromJavaRDDOfElements

Use of uk.gov.gchq.gaffer.spark.operation.javardd.SplitStoreFromJavaRDDOfElements in project Gaffer by gchq.

The method generateSplitPoints of the class SplitStoreFromJavaRDDOfElementsHandler.

private void generateSplitPoints(final SplitStoreFromJavaRDDOfElements operation, final Context context, final AccumuloStore store) throws OperationException {
    // The schema (as JSON) and the key converter class name are passed to the per-partition function below.
    final byte[] schemaAsJson = store.getSchema().toCompactJson();
    final String keyConverterClassName = store.getKeyPackage().getKeyConverter().getClass().getName();
    // Convert each element into a pair of keys, flatten the pairs
    // (the second key may be null) and keep only the row of each key.
    final JavaRDD<Text> rows = operation.getInput()
            .mapPartitions(new ElementIteratorToPairIteratorFunction(keyConverterClassName, schemaAsJson))
            .flatMap(pair -> {
                if (null == pair.getSecond()) {
                    return asList(pair.getFirst()).iterator();
                } else {
                    return asList(pair.getFirst(), pair.getSecond()).iterator();
                }
            })
            .map(key -> key.getRow());
    // Adjust the requested sampling fraction using the maximum sample size and the total row count.
    final double fractionToSample = super.adjustFractionToSampleForSize(operation.getFractionToSample(), operation.getMaxSampleSize(), rows.count());
    final Random seed = new Random(System.currentTimeMillis());
    // Sample the rows without replacement and collect them to the driver as strings.
    final List<String> sample = rows.sample(WITHOUT_REPLACEMENT, fractionToSample, seed.nextLong()).map(Text::toString).collect();
    // Create the split points from the sampled rows.
    super.createSplitPoints(store, context, sample);
}
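
The steps in generateSplitPoints (flatten the key pairs, adjust the sampling fraction, sample without replacement) can be illustrated without Spark or Accumulo. The following is only a minimal sketch in plain Java. The class SplitPointSamplingSketch and its methods adjustFraction, flattenPairs and sample are hypothetical names introduced here, not part of Gaffer. In particular, the capping rule in adjustFraction is an assumption about what adjustFractionToSampleForSize might do, since its implementation is not shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical, Spark-free illustration of the sampling steps; not Gaffer code.
class SplitPointSamplingSketch {

    // ASSUMPTION: cap the fraction so that the expected sample size
    // (fraction * totalRows) does not exceed maxSampleSize.
    static double adjustFraction(final double fraction, final long maxSampleSize, final long totalRows) {
        if (totalRows <= 0) {
            return fraction;
        }
        final double cap = (double) maxSampleSize / totalRows;
        return Math.min(fraction, cap);
    }

    // Flatten {first, second} pairs into one list, skipping a null second entry.
    static List<String> flattenPairs(final List<String[]> pairs) {
        final List<String> rows = new ArrayList<>();
        for (final String[] pair : pairs) {
            rows.add(pair[0]);
            if (pair[1] != null) {
                rows.add(pair[1]);
            }
        }
        return rows;
    }

    // Sample without replacement: each row is kept independently with probability 'fraction'.
    static List<String> sample(final List<String> rows, final double fraction, final long seed) {
        final Random random = new Random(seed);
        return rows.stream()
                .filter(row -> random.nextDouble() < fraction)
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<String[]> pairs = Arrays.asList(
                new String[] {"a", "b"},
                new String[] {"c", null},
                new String[] {"d", "e"});
        final List<String> rows = flattenPairs(pairs);
        final double fraction = adjustFraction(0.9, 2, rows.size());
        final List<String> sampled = sample(rows, fraction, System.currentTimeMillis());
        System.out.println("rows=" + rows + " fraction=" + fraction + " sample=" + sampled);
    }
}
```

Because each row is kept independently, the sample size in this sketch is only approximately fraction * totalRows. The original code also passes a fresh random seed on every call, so the sampled rows differ from run to run.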
Also used: AccumuloElementConverter (uk.gov.gchq.gaffer.accumulostore.key.AccumuloElementConverter), Iterator (java.util.Iterator), Pair (uk.gov.gchq.gaffer.commonutil.pair.Pair), Text (org.apache.hadoop.io.Text), Random (java.util.Random), Element (uk.gov.gchq.gaffer.data.element.Element), Store (uk.gov.gchq.gaffer.store.Store), List (java.util.List), Context (uk.gov.gchq.gaffer.store.Context), Schema (uk.gov.gchq.gaffer.store.schema.Schema), AccumuloStore (uk.gov.gchq.gaffer.accumulostore.AccumuloStore), Arrays.asList (java.util.Arrays.asList), Key (org.apache.accumulo.core.data.Key), OperationException (uk.gov.gchq.gaffer.operation.OperationException), AbstractSplitStoreFromRDDOfElementsHandler (uk.gov.gchq.gaffer.sparkaccumulo.operation.handler.AbstractSplitStoreFromRDDOfElementsHandler), JavaRDD (org.apache.spark.api.java.JavaRDD), FlatMapFunction (org.apache.spark.api.java.function.FlatMapFunction), SplitStoreFromJavaRDDOfElements (uk.gov.gchq.gaffer.spark.operation.javardd.SplitStoreFromJavaRDDOfElements)

Aggregations

SplitStoreFromJavaRDDOfElements (uk.gov.gchq.gaffer.spark.operation.javardd.SplitStoreFromJavaRDDOfElements): 6
Test (org.junit.jupiter.api.Test): 5
Arrays.asList (java.util.Arrays.asList): 1
Iterator (java.util.Iterator): 1
List (java.util.List): 1
Random (java.util.Random): 1
Key (org.apache.accumulo.core.data.Key): 1
Text (org.apache.hadoop.io.Text): 1
JavaRDD (org.apache.spark.api.java.JavaRDD): 1
FlatMapFunction (org.apache.spark.api.java.function.FlatMapFunction): 1
AccumuloStore (uk.gov.gchq.gaffer.accumulostore.AccumuloStore): 1
AccumuloElementConverter (uk.gov.gchq.gaffer.accumulostore.key.AccumuloElementConverter): 1
Pair (uk.gov.gchq.gaffer.commonutil.pair.Pair): 1
Element (uk.gov.gchq.gaffer.data.element.Element): 1
OperationException (uk.gov.gchq.gaffer.operation.OperationException): 1
AbstractSplitStoreFromRDDOfElementsHandler (uk.gov.gchq.gaffer.sparkaccumulo.operation.handler.AbstractSplitStoreFromRDDOfElementsHandler): 1
Context (uk.gov.gchq.gaffer.store.Context): 1
Store (uk.gov.gchq.gaffer.store.Store): 1
Schema (uk.gov.gchq.gaffer.store.schema.Schema): 1