Use of uk.gov.gchq.gaffer.spark.operation.dataframe.ConvertElementToRow in project Gaffer by gchq.
From the class AccumuloStoreRelation, method buildScan.
/**
* Creates a <code>DataFrame</code> of all {@link Element}s from the specified groups with columns that are not
* required filtered out and with (some of) the supplied {@link Filter}s applied.
* <p>
* Note that Spark also applies the provided {@link Filter}s - applying them here is an optimisation to reduce
* the amount of data transferred from the store to Spark's executors (this is known as "predicate pushdown").
* <p>
* Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator,
* not in the transform). Issue 320 refers to this.
*
* @param requiredColumns The columns to return.
* @param filters The {@link Filter}s to apply (these are applied before aggregation).
* @return An {@link RDD} of {@link Row}s containing the requested columns.
*/
@Override
public RDD<Row> buildScan(final String[] requiredColumns, final Filter[] filters) {
    LOGGER.info("Building scan with required columns {} and {} filters ({})",
            StringUtils.join(requiredColumns, ','),
            filters.length,
            StringUtils.join(filters, ','));
    final AbstractGetRDD<?> operation = new FiltersToOperationConverter(sqlContext, view, store.getSchema(), filters)
            .getOperation();
    if (operation == null) {
        // A null operation means that no data can match the filters (e.g. a filter requires group X
        // and there is no group X in the schema).
        return sqlContext.emptyDataFrame().rdd();
    }
    try {
        final RDD<Element> rdd = store.execute(operation, user);
        return rdd.map(new ConvertElementToRow(new LinkedHashSet<>(Arrays.asList(requiredColumns)),
                propertyNeedsConversion, converterByProperty), ClassTagConstants.ROW_CLASS_TAG);
    } catch (final OperationException e) {
        // Pass the exception as the last argument (with no placeholder) so the stack trace is logged.
        LOGGER.error("OperationException while executing operation", e);
        return null;
    }
}
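For context, buildScan is not called directly: Spark's Catalyst planner invokes it when a query runs against a DataFrame backed by this relation, passing in the pruned columns and the pushed-down filters. A minimal sketch, assuming such a DataFrame already exists (the df parameter and the column names vertex, count and group are illustrative, not taken from the snippet above):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// Sketch: a pruned, filtered query over a DataFrame backed by AccumuloStoreRelation.
public final class PushdownSketch {
    public static Row[] basicEdges(final DataFrame df) {
        // Spark translates this query (approximately) into
        //   buildScan(new String[]{"vertex", "count"},
        //             new Filter[]{EqualTo("group", "BasicEdge")})
        // so matching happens store-side rather than after all elements reach the executors.
        return df.select("vertex", "count")
                .filter("group = 'BasicEdge'")
                .collect();
    }
}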
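The ConvertElementToRow function applied in the map above turns each Element into a Row whose fields follow the order of requiredColumns. A simplified, hypothetical version, which omits the per-property conversion that the real class handles via propertyNeedsConversion and converterByProperty, might look like:

import java.io.Serializable;
import java.util.LinkedHashSet;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

import uk.gov.gchq.gaffer.data.element.Element;

// Hypothetical simplification: emit the required columns, in order, for each Element.
// A plain method is used for clarity; the real class is a Scala function usable in rdd.map.
public final class SimpleElementToRow implements Serializable {
    private final LinkedHashSet<String> requiredColumns;

    public SimpleElementToRow(final LinkedHashSet<String> requiredColumns) {
        this.requiredColumns = requiredColumns;
    }

    public Row apply(final Element element) {
        final Object[] fields = new Object[requiredColumns.size()];
        int i = 0;
        for (final String column : requiredColumns) {
            // "group" comes from the element itself; everything else is read as a property.
            fields[i++] = "group".equals(column) ? element.getGroup() : element.getProperty(column);
        }
        return RowFactory.create(fields);
    }
}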