
Example 1 with FiltersToOperationConverter

Use of uk.gov.gchq.gaffer.spark.operation.dataframe.FiltersToOperationConverter in project Gaffer by gchq.

From the class AccumuloStoreRelation, method buildScan.

/**
 * Creates a <code>DataFrame</code> of all {@link Element}s from the specified groups with columns that are not
 * required filtered out and with (some of) the supplied {@link Filter}s applied.
 * <p>
 * Note that Spark also applies the provided {@link Filter}s - applying them here is an optimisation to reduce
 * the amount of data transferred from the store to Spark's executors (this is known as "predicate pushdown").
 * <p>
 * Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator,
 * not in the transform). Issue 320 refers to this.
 *
 * @param requiredColumns The columns to return.
 * @param filters         The {@link Filter}s to apply (these are applied before aggregation).
 * @return An {@link RDD} of {@link Row}s containing the requested columns.
 */
@Override
public RDD<Row> buildScan(final String[] requiredColumns, final Filter[] filters) {
    LOGGER.info("Building scan with required columns {} and {} filters ({})", StringUtils.join(requiredColumns, ','), filters.length, StringUtils.join(filters, ','));
    AbstractGetRDD<?> operation = new FiltersToOperationConverter(sqlContext, view, store.getSchema(), filters).getOperation();
    if (operation == null) {
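        // A null operation means the filters cannot match any data (for example, a filter requires group X
        // and there is no group X in the schema).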
        return sqlContext.emptyDataFrame().rdd();
    }
    try {
        final RDD<Element> rdd = store.execute(operation, user);
        return rdd.map(new ConvertElementToRow(new LinkedHashSet<>(Arrays.asList(requiredColumns)), propertyNeedsConversion, converterByProperty), ClassTagConstants.ROW_CLASS_TAG);
    } catch (final OperationException e) {
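        // Pass the operation for the {} placeholder; the trailing exception is logged with its stack trace.
        LOGGER.error("OperationException while executing operation {}", operation, e);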
        return null;
    }
}
Also used: LinkedHashSet (java.util.LinkedHashSet), Element (uk.gov.gchq.gaffer.data.element.Element), FiltersToOperationConverter (uk.gov.gchq.gaffer.spark.operation.dataframe.FiltersToOperationConverter), OperationException (uk.gov.gchq.gaffer.operation.OperationException), ConvertElementToRow (uk.gov.gchq.gaffer.spark.operation.dataframe.ConvertElementToRow)
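
The predicate pushdown described in the Javadoc can be illustrated without Spark or Gaffer. The following is a minimal sketch in plain Java, not the Gaffer implementation: Item, SimpleFilter, GroupEquals, CountGreaterThan, groupsToScan, storeScan and applyAll are all hypothetical names invented for this illustration. It assumes that only group-equality filters are pushed to the store, and it mirrors the null check above by returning null when the filters cannot match any group in the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PushdownSketch {

    /** Hypothetical stand-in for an element: a group name plus one numeric property. */
    static final class Item {
        final String group;
        final int count;
        Item(String group, int count) { this.group = group; this.count = count; }
    }

    /** Hypothetical filter model; Spark's Filter classes are deliberately not used. */
    interface SimpleFilter {
        boolean matches(Item item);
    }

    static final class GroupEquals implements SimpleFilter {
        final String group;
        GroupEquals(String group) { this.group = group; }
        @Override public boolean matches(Item item) { return item.group.equals(group); }
    }

    static final class CountGreaterThan implements SimpleFilter {
        final int bound;
        CountGreaterThan(int bound) { this.bound = bound; }
        @Override public boolean matches(Item item) { return item.count > bound; }
    }

    /**
     * Returns the groups the store must scan, or null when the group filters cannot
     * match anything (contradictory groups, or a group missing from the schema).
     */
    static Set<String> groupsToScan(Set<String> schemaGroups, List<SimpleFilter> filters) {
        Set<String> groups = new HashSet<>(schemaGroups);
        for (SimpleFilter filter : filters) {
            if (filter instanceof GroupEquals) {
                groups.retainAll(Collections.singleton(((GroupEquals) filter).group));
            }
        }
        return groups.isEmpty() ? null : groups;
    }

    /** Store side: only the group restriction is pushed down, reducing what is transferred. */
    static List<Item> storeScan(List<Item> stored, Set<String> schemaGroups, List<SimpleFilter> filters) {
        Set<String> groups = groupsToScan(schemaGroups, filters);
        if (groups == null) {
            return Collections.emptyList(); // nothing can match, so skip the scan
        }
        List<Item> transferred = new ArrayList<>();
        for (Item item : stored) {
            if (groups.contains(item.group)) {
                transferred.add(item);
            }
        }
        return transferred;
    }

    /** Engine side: every filter is applied again, so the pushdown need not be complete. */
    static List<Item> applyAll(List<Item> rows, List<SimpleFilter> filters) {
        List<Item> result = new ArrayList<>();
        for (Item row : rows) {
            boolean keep = true;
            for (SimpleFilter filter : filters) {
                keep &= filter.matches(row);
            }
            if (keep) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> schema = new HashSet<>(Arrays.asList("A", "B"));
        List<Item> stored = Arrays.asList(new Item("A", 1), new Item("A", 5), new Item("B", 9));
        List<SimpleFilter> filters = Arrays.asList(new GroupEquals("A"), new CountGreaterThan(2));
        List<Item> transferred = storeScan(stored, schema, filters);
        List<Item> result = applyAll(transferred, filters);
        System.out.println(transferred.size() + " transferred, " + result.size() + " returned");
    }
}
```

Running main prints "2 transferred, 1 returned": the group filter removes the B item at the store, and the count filter is applied afterwards on the engine side. Because the engine re-applies every filter, the store-side step only has to avoid dropping matching rows; it is free to ignore filters it cannot translate.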
