Example 6 with ConvertElementToRow

Use of uk.gov.gchq.gaffer.spark.operation.dataframe.ConvertElementToRow in project Gaffer by gchq.

From the class AccumuloStoreRelation, method buildScan:

/**
 * Creates a <code>DataFrame</code> of all {@link Element}s from the specified groups, with columns that are not
 * required filtered out and with (some of) the supplied {@link Filter}s applied.
 * <p>
 * Note that Spark also applies the provided {@link Filter}s - applying them here is an optimisation to reduce
 * the amount of data transferred from the store to Spark's executors (this is known as "predicate pushdown").
 * <p>
 * Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator,
 * not in the transform). Issue 320 refers to this.
 *
 * @param requiredColumns The columns to return.
 * @param filters         The {@link Filter}s to apply (these are applied before aggregation).
 * @return An {@link RDD} of {@link Row}s containing the requested columns.
 */
@Override
public RDD<Row> buildScan(final String[] requiredColumns, final Filter[] filters) {
    LOGGER.info("Building scan with required columns {} and {} filters ({})", StringUtils.join(requiredColumns, ','), filters.length, StringUtils.join(filters, ','));
    AbstractGetRDD<?> operation = new FiltersToOperationConverter(sqlContext, view, store.getSchema(), filters).getOperation();
    if (operation == null) {
        // A null operation means the filters cannot match any data
        // (e.g. a filter of group = X and there is no group X in the schema).
        return sqlContext.emptyDataFrame().rdd();
    }
    try {
        final RDD<Element> rdd = store.execute(operation, user);
        return rdd.map(new ConvertElementToRow(new LinkedHashSet<>(Arrays.asList(requiredColumns)), propertyNeedsConversion, converterByProperty), ClassTagConstants.ROW_CLASS_TAG);
    } catch (final OperationException e) {
        LOGGER.error("OperationException while executing operation", e);
        return null;
    }
}
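The second half of buildScan maps each Element to a Row, keeping only the columns Spark asked for. As a rough, self-contained sketch of that projection step (simplified and hypothetical - Gaffer's actual ConvertElementToRow also converts property values via the supplied converterByProperty map, which is omitted here):

```java
import java.util.LinkedHashSet;
import java.util.Map;

// Simplified sketch of the column projection performed when mapping an
// element to a row. A LinkedHashSet preserves the column order requested
// by Spark, which is why buildScan wraps requiredColumns in one.
public class ElementToRowSketch {

    // Project the properties named in requiredColumns, in order, out of an
    // element (represented here as a plain Map) into a flat row.
    static Object[] toRow(final Map<String, Object> element,
                          final LinkedHashSet<String> requiredColumns) {
        final Object[] row = new Object[requiredColumns.size()];
        int i = 0;
        for (final String column : requiredColumns) {
            row[i++] = element.get(column); // null if the element lacks the column
        }
        return row;
    }
}
```

Columns not listed in requiredColumns are simply never copied, which is the "columns that are not required filtered out" behaviour described in the Javadoc above.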
Also used:
- LinkedHashSet (java.util.LinkedHashSet)
- Element (uk.gov.gchq.gaffer.data.element.Element)
- FiltersToOperationConverter (uk.gov.gchq.gaffer.spark.operation.dataframe.FiltersToOperationConverter)
- OperationException (uk.gov.gchq.gaffer.operation.OperationException)
- ConvertElementToRow (uk.gov.gchq.gaffer.spark.operation.dataframe.ConvertElementToRow)

Aggregations

- ConvertElementToRow (uk.gov.gchq.gaffer.spark.operation.dataframe.ConvertElementToRow): 6
- LinkedHashSet (java.util.LinkedHashSet): 5
- HashSet (java.util.HashSet): 3
- Row (org.apache.spark.sql.Row): 3
- SQLContext (org.apache.spark.sql.SQLContext): 3
- AccumuloProperties (uk.gov.gchq.gaffer.accumulostore.AccumuloProperties): 3
- SingleUseMockAccumuloStore (uk.gov.gchq.gaffer.accumulostore.SingleUseMockAccumuloStore): 3
- Element (uk.gov.gchq.gaffer.data.element.Element): 3
- OperationException (uk.gov.gchq.gaffer.operation.OperationException): 3
- SchemaToStructTypeConverter (uk.gov.gchq.gaffer.spark.operation.dataframe.converter.schema.SchemaToStructTypeConverter): 3
- Schema (uk.gov.gchq.gaffer.store.schema.Schema): 3
- User (uk.gov.gchq.gaffer.user.User): 3
- GetRDDOfAllElements (uk.gov.gchq.gaffer.spark.operation.scalardd.GetRDDOfAllElements): 2
- FiltersToOperationConverter (uk.gov.gchq.gaffer.spark.operation.dataframe.FiltersToOperationConverter): 1