
Example 1 with WriteData

Use of uk.gov.gchq.gaffer.parquetstore.operation.handler.spark.utilities.WriteData in the project Gaffer by gchq.

From the class AddElementsFromRDD, method writeInputData:

/**
 * Writes the provided {@link JavaRDD} of {@link Element}s to files split by group and input partition (i.e. the
 * partition of the input {@link JavaRDD}, not the partition of the existing graph). The data is
 * written in the order the {@link JavaRDD} provides it, with no sorting or aggregation.
 *
 * @param input the JavaRDD of Elements
 */
private void writeInputData(final JavaRDD<Element> input) {
    LOGGER.info("Writing data for input RDD");
    final Function<String, String> groupToUnsortedUnaggregatedNewData = group -> getDirectory(group, false, false, false);
    input.foreachPartition(new WriteData(groupToUnsortedUnaggregatedNewData, schema, store.getProperties().getCompressionCodecName()));
}
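
Because WriteData is applied with foreachPartition, each Spark task writes the elements of its own input partition independently, with no shuffle, sort or aggregation. The sketch below illustrates that per-partition write pattern only; WritePartitionSketch and its write helper are hypothetical names, and it omits the schema and compression-codec handling that the real WriteData needs in order to create Parquet writers.

import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

import uk.gov.gchq.gaffer.data.element.Element;

// Hypothetical, simplified stand-in for Gaffer's WriteData: writes each input
// partition's elements to a per-group directory, preserving the order the RDD
// provides them (no sorting, no aggregation).
public class WritePartitionSketch implements VoidFunction<Iterator<Element>> {

    // Maps a group name to the directory its unsorted, unaggregated data goes to
    // (the same role as groupToUnsortedUnaggregatedNewData above). Spark's own
    // Function interface is used so the object is Serializable and can be shipped
    // to executors.
    private final Function<String, String> groupToDirectory;

    public WritePartitionSketch(final Function<String, String> groupToDirectory) {
        this.groupToDirectory = groupToDirectory;
    }

    @Override
    public void call(final Iterator<Element> elements) throws Exception {
        // Each Spark task receives one input partition and writes it independently.
        while (elements.hasNext()) {
            final Element element = elements.next();
            final String directory = groupToDirectory.call(element.getGroup());
            // The real WriteData would append the element to a Parquet file under
            // 'directory'; this sketch only reports what would happen.
            System.out.println("Would write " + element + " to " + directory);
        }
    }

    // Usage mirroring writeInputData: one write task per input partition.
    public static void write(final JavaRDD<Element> input, final Function<String, String> groupToDirectory) {
        input.foreachPartition(new WritePartitionSketch(groupToDirectory));
    }
}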
Also used:
Arrays (java.util.Arrays)
StoreException (uk.gov.gchq.gaffer.store.StoreException)
FileSystem (org.apache.hadoop.fs.FileSystem)
LoggerFactory (org.slf4j.LoggerFactory)
SerialisationException (uk.gov.gchq.gaffer.exception.SerialisationException)
FileStatus (org.apache.hadoop.fs.FileStatus)
ParquetStore (uk.gov.gchq.gaffer.parquetstore.ParquetStore)
Function (java.util.function.Function)
Element (uk.gov.gchq.gaffer.data.element.Element)
ArrayList (java.util.ArrayList)
CalculatePartitioner (uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.CalculatePartitioner)
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream)
SparkParquetUtils (uk.gov.gchq.gaffer.parquetstore.utils.SparkParquetUtils)
SortFullGroup (uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.SortFullGroup)
Path (org.apache.hadoop.fs.Path)
JavaRDD (org.apache.spark.api.java.JavaRDD)
SparkSession (org.apache.spark.sql.SparkSession)
AggregateDataForGroup (uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.AggregateDataForGroup)
Logger (org.slf4j.Logger)
WriteData (uk.gov.gchq.gaffer.parquetstore.operation.handler.spark.utilities.WriteData)
SparkContextUtil (uk.gov.gchq.gaffer.spark.SparkContextUtil)
SchemaUtils (uk.gov.gchq.gaffer.parquetstore.utils.SchemaUtils)
IOException (java.io.IOException)
List (java.util.List)
GraphPartitionerSerialiser (uk.gov.gchq.gaffer.parquetstore.partitioner.serialisation.GraphPartitionerSerialiser)
Context (uk.gov.gchq.gaffer.store.Context)
Schema (uk.gov.gchq.gaffer.store.schema.Schema)
GraphPartitioner (uk.gov.gchq.gaffer.parquetstore.partitioner.GraphPartitioner)
OperationException (uk.gov.gchq.gaffer.operation.OperationException)
RDD (org.apache.spark.rdd.RDD)

Aggregations

IOException (java.io.IOException): 1
ArrayList (java.util.ArrayList): 1
Arrays (java.util.Arrays): 1
List (java.util.List): 1
Function (java.util.function.Function): 1
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream): 1
FileStatus (org.apache.hadoop.fs.FileStatus): 1
FileSystem (org.apache.hadoop.fs.FileSystem): 1
Path (org.apache.hadoop.fs.Path): 1
JavaRDD (org.apache.spark.api.java.JavaRDD): 1
RDD (org.apache.spark.rdd.RDD): 1
SparkSession (org.apache.spark.sql.SparkSession): 1
Logger (org.slf4j.Logger): 1
LoggerFactory (org.slf4j.LoggerFactory): 1
Element (uk.gov.gchq.gaffer.data.element.Element): 1
SerialisationException (uk.gov.gchq.gaffer.exception.SerialisationException): 1
OperationException (uk.gov.gchq.gaffer.operation.OperationException): 1
ParquetStore (uk.gov.gchq.gaffer.parquetstore.ParquetStore): 1
WriteData (uk.gov.gchq.gaffer.parquetstore.operation.handler.spark.utilities.WriteData): 1
AggregateDataForGroup (uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.AggregateDataForGroup): 1