
Example 1 with ParquetStoreProperties

Use of uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties in project Gaffer by gchq.

From the class QueryGeneratorTest, method testQueryGeneratorForGetAllElements:

@Test
public void testQueryGeneratorForGetAllElements(@TempDir java.nio.file.Path tempDir) throws IOException, OperationException {
    // Given
    // - Create snapshot folder
    final String folder = String.format("file:///%s", tempDir.toString());
    final String snapshotFolder = folder + "/" + ParquetStore.getSnapshotPath(1000L);
    // - Write out Parquet files so we know the partitioning
    CalculatePartitionerTest.writeData(snapshotFolder, new SchemaUtils(schema));
    // - Initialise store
    final ParquetStoreProperties storeProperties = new ParquetStoreProperties();
    storeProperties.setDataDir(folder);
    storeProperties.setTempFilesDir(folder + "/tmpdata");
    final ParquetStore store = (ParquetStore) ParquetStore.createStore("graphId", schema, storeProperties);
    // When 1 - no view
    GetAllElements getAllElements = new GetAllElements.Builder().build();
    ParquetQuery query = new QueryGenerator(store).getParquetQuery(getAllElements);
    // Then 1
    final List<ParquetFileQuery> expected = new ArrayList<>();
    for (final String group : Arrays.asList(TestGroups.ENTITY, TestGroups.ENTITY_2, TestGroups.EDGE, TestGroups.EDGE_2)) {
        final Path groupFolderPath = store.getGroupPath(group);
        for (int partition = 0; partition < 10; partition++) {
            final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(partition));
            expected.add(new ParquetFileQuery(pathForPartitionFile, null, true));
        }
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 2 - simple view that restricts to one group
    getAllElements = new GetAllElements.Builder().view(new View.Builder().edge(TestGroups.EDGE).build()).build();
    query = new QueryGenerator(store).getParquetQuery(getAllElements);
    // Then 2
    expected.clear();
    Path groupFolderPath = store.getGroupPath(TestGroups.EDGE);
    for (int partition = 0; partition < 10; partition++) {
        final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(partition));
        expected.add(new ParquetFileQuery(pathForPartitionFile, null, true));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 3 - view with filter that can be pushed down to Parquet
    getAllElements = new GetAllElements.Builder()
            .view(new View.Builder()
                    .edge(TestGroups.EDGE, new ViewElementDefinition.Builder()
                            .preAggregationFilter(new ElementFilter.Builder()
                                    .select("count")
                                    .execute(new IsMoreThan(10))
                                    .build())
                            .build())
                    .build())
            .build();
    query = new QueryGenerator(store).getParquetQuery(getAllElements);
    // Then 3
    expected.clear();
    for (int partition = 0; partition < 10; partition++) {
        final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(partition));
        expected.add(new ParquetFileQuery(pathForPartitionFile, gt(FilterApi.intColumn("count"), 10), true));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 4 - view with filter that can't be pushed down to Parquet
    getAllElements = new GetAllElements.Builder()
            .view(new View.Builder()
                    .edge(TestGroups.EDGE, new ViewElementDefinition.Builder()
                            .preAggregationFilter(new ElementFilter.Builder()
                                    .select("count")
                                    .execute(new IsEvenFilter())
                                    .build())
                            .build())
                    .build())
            .build();
    query = new QueryGenerator(store).getParquetQuery(getAllElements);
    // Then 4
    expected.clear();
    for (int partition = 0; partition < 10; partition++) {
        final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(partition));
        expected.add(new ParquetFileQuery(pathForPartitionFile, null, false));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 5 - view with one filter that can be pushed down and one that can't
    getAllElements = new GetAllElements.Builder()
            .view(new View.Builder()
                    .edge(TestGroups.EDGE, new ViewElementDefinition.Builder()
                            .preAggregationFilter(new ElementFilter.Builder()
                                    .select("count")
                                    .execute(new IsEvenFilter())
                                    .select("count")
                                    .execute(new IsMoreThan(10))
                                    .build())
                            .build())
                    .build())
            .build();
    query = new QueryGenerator(store).getParquetQuery(getAllElements);
    // Then 5
    expected.clear();
    for (int partition = 0; partition < 10; partition++) {
        final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(partition));
        expected.add(new ParquetFileQuery(pathForPartitionFile, gt(FilterApi.intColumn("count"), 10), false));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
}
Also used : ParquetStore(uk.gov.gchq.gaffer.parquetstore.ParquetStore) Path(org.apache.hadoop.fs.Path) ArrayList(java.util.ArrayList) ViewElementDefinition(uk.gov.gchq.gaffer.data.elementdefinition.view.ViewElementDefinition) View(uk.gov.gchq.gaffer.data.elementdefinition.view.View) SchemaUtils(uk.gov.gchq.gaffer.parquetstore.utils.SchemaUtils) ParquetStoreProperties(uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties) GetAllElements(uk.gov.gchq.gaffer.operation.impl.get.GetAllElements) ArrayList(java.util.ArrayList) List(java.util.List) IsMoreThan(uk.gov.gchq.koryphe.impl.predicate.IsMoreThan) CalculatePartitionerTest(uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.CalculatePartitionerTest) LongVertexOperationsTest(uk.gov.gchq.gaffer.parquetstore.operation.handler.LongVertexOperationsTest) Test(org.junit.jupiter.api.Test)
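
To make the configuration pattern explicit, here is a minimal standalone sketch of the store setup used at the top of this test. The directory paths are hypothetical; the setters and ParquetStore.createStore are exactly those used above, and schema is assumed to be in scope:

final ParquetStoreProperties storeProperties = new ParquetStoreProperties();
// Root directory under which snapshot directories of Parquet files are written
storeProperties.setDataDir("file:///var/gaffer/data");
// Scratch directory for temporary files produced while adding data
storeProperties.setTempFilesDir("file:///var/gaffer/tmpdata");
// createStore returns a Store, hence the cast (mirroring the test above)
final ParquetStore store = (ParquetStore) ParquetStore.createStore("graphId", schema, storeProperties);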

Example 2 with ParquetStoreProperties

Use of uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties in project Gaffer by gchq.

From the class QueryGeneratorTest, method testQueryGeneratorForGetElementsWithEntitySeeds:

@Test
public void testQueryGeneratorForGetElementsWithEntitySeeds(@TempDir java.nio.file.Path tempDir) throws IOException, OperationException {
    // Given
    // - Create snapshot folder
    final String folder = String.format("file:///%s", tempDir.toString());
    final String snapshotFolder = folder + "/" + ParquetStore.getSnapshotPath(1000L);
    // - Write out Parquet files so we know the partitioning
    CalculatePartitionerTest.writeData(snapshotFolder, new SchemaUtils(schema));
    // - Initialise store
    final ParquetStoreProperties storeProperties = new ParquetStoreProperties();
    storeProperties.setDataDir(folder);
    storeProperties.setTempFilesDir(folder + "/tmpdata");
    final ParquetStore store = (ParquetStore) ParquetStore.createStore("graphId", schema, storeProperties);
    // When 1 - no view, query for vertex 0
    GetElements getElements = new GetElements.Builder().input(new EntitySeed(0L)).seedMatching(SeedMatching.SeedMatchingType.RELATED).build();
    ParquetQuery query = new QueryGenerator(store).getParquetQuery(getElements);
    // Then 1
    final List<ParquetFileQuery> expected = new ArrayList<>();
    final FilterPredicate vertex0 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 0L);
    final FilterPredicate source0 = eq(FilterApi.longColumn(ParquetStore.SOURCE), 0L);
    final FilterPredicate destination0 = eq(FilterApi.longColumn(ParquetStore.DESTINATION), 0L);
    for (final String group : Arrays.asList(TestGroups.ENTITY, TestGroups.ENTITY_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForPartitionFile, vertex0, true));
    }
    for (final String group : Arrays.asList(TestGroups.EDGE, TestGroups.EDGE_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        final Path pathForPartitionFile = new Path(groupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForPartitionFile, source0, true));
        final Path reversedGroupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, true));
        final Path pathForReversedPartitionFile = new Path(reversedGroupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForReversedPartitionFile, destination0, true));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 2 - no view, query for vertices 0 and 1000000
    getElements = new GetElements.Builder().input(new EntitySeed(0L), new EntitySeed(1000000L)).seedMatching(SeedMatching.SeedMatchingType.RELATED).build();
    query = new QueryGenerator(store).getParquetQuery(getElements);
    // Then 2
    expected.clear();
    final FilterPredicate vertex1000000 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 1000000L);
    final FilterPredicate source1000000 = eq(FilterApi.longColumn(ParquetStore.SOURCE), 1000000L);
    final FilterPredicate destination1000000 = eq(FilterApi.longColumn(ParquetStore.DESTINATION), 1000000L);
    for (final String group : Arrays.asList(TestGroups.ENTITY, TestGroups.ENTITY_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        final Path pathForPartitionFile1 = new Path(groupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForPartitionFile1, vertex0, true));
        final Path pathForPartitionFile2 = new Path(groupFolderPath, ParquetStore.getFile(9));
        expected.add(new ParquetFileQuery(pathForPartitionFile2, vertex1000000, true));
    }
    for (final String group : Arrays.asList(TestGroups.EDGE, TestGroups.EDGE_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        final Path reversedGroupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, true));
        // Partition 0, vertex 0L
        final Path pathForPartitionFile1 = new Path(groupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForPartitionFile1, source0, true));
        // Partition 9, vertex 1000000L
        final Path pathForPartitionFile2 = new Path(groupFolderPath, ParquetStore.getFile(9));
        expected.add(new ParquetFileQuery(pathForPartitionFile2, source1000000, true));
        // Partition 0 of reversed, vertex 0L
        final Path pathForPartitionFile3 = new Path(reversedGroupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForPartitionFile3, destination0, true));
        // Partition 9 of reversed, vertex 1000000L
        final Path pathForPartitionFile4 = new Path(reversedGroupFolderPath, ParquetStore.getFile(9));
        expected.add(new ParquetFileQuery(pathForPartitionFile4, destination1000000, true));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 3 - view with filter that can be pushed down to Parquet, query for vertices 0 and 1000000
    getElements = new GetElements.Builder()
            .input(new EntitySeed(0L), new EntitySeed(1000000L))
            .seedMatching(SeedMatching.SeedMatchingType.RELATED)
            .view(new View.Builder()
                    .edge(TestGroups.EDGE, new ViewElementDefinition.Builder()
                            .preAggregationFilter(new ElementFilter.Builder()
                                    .select("count")
                                    .execute(new IsMoreThan(10))
                                    .build())
                            .build())
                    .build())
            .build();
    query = new QueryGenerator(store).getParquetQuery(getElements);
    // Then 3
    expected.clear();
    final FilterPredicate source0AndCount = and(gt(FilterApi.intColumn("count"), 10), eq(FilterApi.longColumn(ParquetStore.SOURCE), 0L));
    final FilterPredicate source1000000AndCount = and(gt(FilterApi.intColumn("count"), 10), eq(FilterApi.longColumn(ParquetStore.SOURCE), 1000000L));
    final FilterPredicate destination0AndCount = and(gt(FilterApi.intColumn("count"), 10), eq(FilterApi.longColumn(ParquetStore.DESTINATION), 0L));
    final FilterPredicate destination1000000AndCount = and(gt(FilterApi.intColumn("count"), 10), eq(FilterApi.longColumn(ParquetStore.DESTINATION), 1000000L));
    final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(TestGroups.EDGE, false));
    final Path reversedGroupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(TestGroups.EDGE, true));
    // Partition 0, vertex 0L
    final Path pathForPartitionFile1 = new Path(groupFolderPath, ParquetStore.getFile(0));
    expected.add(new ParquetFileQuery(pathForPartitionFile1, source0AndCount, true));
    // Partition 9, vertex 1000000L
    final Path pathForPartitionFile2 = new Path(groupFolderPath, ParquetStore.getFile(9));
    expected.add(new ParquetFileQuery(pathForPartitionFile2, source1000000AndCount, true));
    // Partition 0 of reversed, vertex 0L
    final Path pathForPartitionFile3 = new Path(reversedGroupFolderPath, ParquetStore.getFile(0));
    expected.add(new ParquetFileQuery(pathForPartitionFile3, destination0AndCount, true));
    // Partition 9 of reversed, vertex 1000000L
    final Path pathForPartitionFile4 = new Path(reversedGroupFolderPath, ParquetStore.getFile(9));
    expected.add(new ParquetFileQuery(pathForPartitionFile4, destination1000000AndCount, true));
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 4 - view with filter that can't be pushed down to Parquet, query for vertices 0 and 1000000
    getElements = new GetElements.Builder()
            .input(new EntitySeed(0L), new EntitySeed(1000000L))
            .seedMatching(SeedMatching.SeedMatchingType.RELATED)
            .view(new View.Builder()
                    .edge(TestGroups.EDGE, new ViewElementDefinition.Builder()
                            .preAggregationFilter(new ElementFilter.Builder()
                                    .select("count")
                                    .execute(new IsEvenFilter())
                                    .build())
                            .build())
                    .build())
            .build();
    query = new QueryGenerator(store).getParquetQuery(getElements);
    // Then 4
    expected.clear();
    // Partition 0, vertex 0L
    expected.add(new ParquetFileQuery(pathForPartitionFile1, source0, false));
    // Partition 9, vertex 1000000L
    expected.add(new ParquetFileQuery(pathForPartitionFile2, source1000000, false));
    // Partition 0 of reversed, vertex 0L
    expected.add(new ParquetFileQuery(pathForPartitionFile3, destination0, false));
    // Partition 9 of reversed, vertex 1000000L
    expected.add(new ParquetFileQuery(pathForPartitionFile4, destination1000000, false));
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
}
Also used : ParquetStore(uk.gov.gchq.gaffer.parquetstore.ParquetStore) Path(org.apache.hadoop.fs.Path) ArrayList(java.util.ArrayList) GetElements(uk.gov.gchq.gaffer.operation.impl.get.GetElements) ViewElementDefinition(uk.gov.gchq.gaffer.data.elementdefinition.view.ViewElementDefinition) View(uk.gov.gchq.gaffer.data.elementdefinition.view.View) SchemaUtils(uk.gov.gchq.gaffer.parquetstore.utils.SchemaUtils) ParquetStoreProperties(uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties) EntitySeed(uk.gov.gchq.gaffer.operation.data.EntitySeed) ArrayList(java.util.ArrayList) List(java.util.List) FilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate) IsMoreThan(uk.gov.gchq.koryphe.impl.predicate.IsMoreThan) CalculatePartitionerTest(uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.CalculatePartitionerTest) LongVertexOperationsTest(uk.gov.gchq.gaffer.parquetstore.operation.handler.LongVertexOperationsTest) Test(org.junit.jupiter.api.Test)
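
The push-down predicates above are ordinary Parquet FilterApi compositions. A standalone sketch, using the same column names and the static imports (eq, gt, and) from org.apache.parquet.filter2.predicate.FilterApi that this test relies on:

// Seed constraint: source == 0L
final FilterPredicate source0 = eq(FilterApi.longColumn(ParquetStore.SOURCE), 0L);
// View filter that can be pushed down: count > 10
final FilterPredicate countGt10 = gt(FilterApi.intColumn("count"), 10);
// The QueryGenerator ANDs the two, letting the Parquet reader skip non-matching data
final FilterPredicate combined = and(countGt10, source0);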

Example 3 with ParquetStoreProperties

Use of uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties in project Gaffer by gchq.

From the class QueryGeneratorTest, method testQueryGeneratorForGetElementsWithEdgeSeeds:

@Test
public void testQueryGeneratorForGetElementsWithEdgeSeeds(@TempDir java.nio.file.Path tempDir) throws IOException, OperationException {
    // Given
    // - Create snapshot folder
    final String folder = String.format("file:///%s", tempDir.toString());
    final String snapshotFolder = folder + "/" + ParquetStore.getSnapshotPath(1000L);
    // - Write out Parquet files so we know the partitioning
    CalculatePartitionerTest.writeData(snapshotFolder, new SchemaUtils(schema));
    // - Initialise store
    final ParquetStoreProperties storeProperties = new ParquetStoreProperties();
    storeProperties.setDataDir(folder);
    storeProperties.setTempFilesDir(folder + "/tmpdata");
    final ParquetStore store = (ParquetStore) ParquetStore.createStore("graphId", schema, storeProperties);
    // When 1 - no view, query for edges 0->1, 10--1000, 10000--10 with seed matching type set to EQUAL
    GetElements getElements = new GetElements.Builder()
            .input(new EdgeSeed(0L, 1L, DirectedType.DIRECTED),
                    new EdgeSeed(10L, 1000L, DirectedType.UNDIRECTED),
                    new EdgeSeed(10000L, 10L, DirectedType.EITHER))
            .seedMatching(SeedMatching.SeedMatchingType.EQUAL)
            .build();
    ParquetQuery query = new QueryGenerator(store).getParquetQuery(getElements);
    // Then 1
    final List<ParquetFileQuery> expected = new ArrayList<>();
    final FilterPredicate source0 = eq(FilterApi.longColumn(ParquetStore.SOURCE), 0L);
    final FilterPredicate source10 = eq(FilterApi.longColumn(ParquetStore.SOURCE), 10L);
    final FilterPredicate source10000 = eq(FilterApi.longColumn(ParquetStore.SOURCE), 10000L);
    final FilterPredicate destination1 = eq(FilterApi.longColumn(ParquetStore.DESTINATION), 1L);
    final FilterPredicate destination10 = eq(FilterApi.longColumn(ParquetStore.DESTINATION), 10L);
    final FilterPredicate destination1000 = eq(FilterApi.longColumn(ParquetStore.DESTINATION), 1000L);
    final FilterPredicate directedTrue = eq(FilterApi.booleanColumn(ParquetStore.DIRECTED), true);
    final FilterPredicate directedFalse = eq(FilterApi.booleanColumn(ParquetStore.DIRECTED), false);
    final FilterPredicate source0Destination1DirectedTrue = and(and(source0, destination1), directedTrue);
    final FilterPredicate source10Destination1000DirectedFalse = and(and(source10, destination1000), directedFalse);
    final FilterPredicate source10000Destination10DirectedEither = and(source10000, destination10);
    for (final String group : Arrays.asList(TestGroups.EDGE, TestGroups.EDGE_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        // 0->1 partition 0 of forward
        final Path pathForPartition0File = new Path(groupFolderPath, ParquetStore.getFile(0));
        // Note that there is no need to look in the reversed directory
        expected.add(new ParquetFileQuery(pathForPartition0File, source0Destination1DirectedTrue, true));
        // 10--1000 partition 1 of forward
        final Path pathForPartition1File = new Path(groupFolderPath, ParquetStore.getFile(1));
        // Note that there is no need to look in the reversed directory
        expected.add(new ParquetFileQuery(pathForPartition1File, source10Destination1000DirectedFalse, true));
        // 10000--10 partition 9 of forward
        final Path pathForPartition9File = new Path(groupFolderPath, ParquetStore.getFile(9));
        // Note that there is no need to look in the reversed directory
        expected.add(new ParquetFileQuery(pathForPartition9File, source10000Destination10DirectedEither, true));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
    // When 2 - no view, query for edges 0->1, 10--1000, 10000--10 with seed matching type set to RELATED
    getElements = new GetElements.Builder()
            .input(new EdgeSeed(0L, 1L, DirectedType.DIRECTED),
                    new EdgeSeed(10L, 1000L, DirectedType.UNDIRECTED),
                    new EdgeSeed(10000L, 10L, DirectedType.EITHER))
            .seedMatching(SeedMatching.SeedMatchingType.RELATED)
            .build();
    query = new QueryGenerator(store).getParquetQuery(getElements);
    // Then 2
    expected.clear();
    final FilterPredicate vertex0 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 0L);
    final FilterPredicate vertex1 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 1L);
    final FilterPredicate vertex10 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 10L);
    final FilterPredicate vertex1000 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 1000L);
    final FilterPredicate vertex10000 = eq(FilterApi.longColumn(ParquetStore.VERTEX), 10000L);
    final FilterPredicate vertex0or1 = or(vertex0, vertex1);
    final FilterPredicate vertex10or1000 = or(vertex10, vertex1000);
    final FilterPredicate vertex10000or10 = or(vertex10000, vertex10);
    for (final String group : Arrays.asList(TestGroups.ENTITY, TestGroups.ENTITY_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        // 0 and 1 in partition 0
        final Path pathForPartition0File = new Path(groupFolderPath, ParquetStore.getFile(0));
        expected.add(new ParquetFileQuery(pathForPartition0File, vertex0or1, true));
        // 10 or 1000 and 10000 or 10 in partition 1 (NB 1000 and 10000 don't appear in partition 1, but this doesn't cause any incorrect results and will be fixed in later versions)
        final Path pathForPartition1File = new Path(groupFolderPath, ParquetStore.getFile(1));
        expected.add(new ParquetFileQuery(pathForPartition1File, or(vertex10or1000, vertex10000or10), true));
        // 10 or 1000 and 10000 or 10 in partition 9
        final Path pathForPartition9File = new Path(groupFolderPath, ParquetStore.getFile(9));
        expected.add(new ParquetFileQuery(pathForPartition9File, or(vertex10or1000, vertex10000or10), true));
    }
    for (final String group : Arrays.asList(TestGroups.EDGE, TestGroups.EDGE_2)) {
        final Path groupFolderPath = new Path(snapshotFolder, ParquetStore.getGroupSubDir(group, false));
        // 0->1 partition 0 of forward
        final Path pathForPartition0File = new Path(groupFolderPath, ParquetStore.getFile(0));
        // Note that there is no need to look in the reversed directory
        expected.add(new ParquetFileQuery(pathForPartition0File, source0Destination1DirectedTrue, true));
        // 10--1000 partition 1 of forward
        final Path pathForPartition1File = new Path(groupFolderPath, ParquetStore.getFile(1));
        // Note that there is no need to look in the reversed directory
        expected.add(new ParquetFileQuery(pathForPartition1File, source10Destination1000DirectedFalse, true));
        // 10000--10 partition 9 of forward
        final Path pathForPartition9File = new Path(groupFolderPath, ParquetStore.getFile(9));
        // Note that there is no need to look in the reversed directory
        expected.add(new ParquetFileQuery(pathForPartition9File, source10000Destination10DirectedEither, true));
    }
    assertThat(expected).containsOnly(query.getAllParquetFileQueries().toArray(new ParquetFileQuery[0]));
}
Also used : ParquetStore(uk.gov.gchq.gaffer.parquetstore.ParquetStore) Path(org.apache.hadoop.fs.Path) ArrayList(java.util.ArrayList) GetElements(uk.gov.gchq.gaffer.operation.impl.get.GetElements) SchemaUtils(uk.gov.gchq.gaffer.parquetstore.utils.SchemaUtils) ParquetStoreProperties(uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties) EdgeSeed(uk.gov.gchq.gaffer.operation.data.EdgeSeed) ArrayList(java.util.ArrayList) List(java.util.List) FilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate) CalculatePartitionerTest(uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.CalculatePartitionerTest) LongVertexOperationsTest(uk.gov.gchq.gaffer.parquetstore.operation.handler.LongVertexOperationsTest) Test(org.junit.jupiter.api.Test)
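
The difference between the two runs above is purely the seed matching mode. A minimal sketch with the same builders (seed values illustrative):

// EQUAL: only elements whose id matches the seed are returned, so the generator
// queries just the forward partition that holds the seed's source vertex
final GetElements equalQuery = new GetElements.Builder()
        .input(new EdgeSeed(0L, 1L, DirectedType.DIRECTED))
        .seedMatching(SeedMatching.SeedMatchingType.EQUAL)
        .build();
// RELATED: entities attached to either end of the edge seed are also returned,
// which is why the second run above adds ParquetFileQuery entries for the entity groups
final GetElements relatedQuery = new GetElements.Builder()
        .input(new EdgeSeed(0L, 1L, DirectedType.DIRECTED))
        .seedMatching(SeedMatching.SeedMatchingType.RELATED)
        .build();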

Example 4 with ParquetStoreProperties

Use of uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties in project Gaffer by gchq.

From the class AddElementsHandlerTest, method testOnePartitionAllGroups:

@Test
public void testOnePartitionAllGroups(@TempDir java.nio.file.Path tempDir) throws IOException, OperationException, StoreException {
    // Given
    final List<Element> elementsToAdd = new ArrayList<>();
    // - Data for TestGroups.ENTITY
    elementsToAdd.addAll(AggregateAndSortDataTest.generateData());
    elementsToAdd.addAll(AggregateAndSortDataTest.generateData());
    // - Data for TestGroups.ENTITY_2
    elementsToAdd.add(WriteUnsortedDataTest.createEntityForEntityGroup_2(10000L));
    elementsToAdd.add(WriteUnsortedDataTest.createEntityForEntityGroup_2(100L));
    elementsToAdd.add(WriteUnsortedDataTest.createEntityForEntityGroup_2(10L));
    elementsToAdd.add(WriteUnsortedDataTest.createEntityForEntityGroup_2(1L));
    // - Data for TestGroups.EDGE
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup(10000L, 1000L, true, new Date(100L)));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup(100L, 100000L, false, new Date(200L)));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, true, new Date(300L)));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, true, new Date(400L)));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, false, new Date(400L)));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 2L, false, new Date(400L)));
    // - Data for TestGroups.EDGE_2
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(10000L, 20L, true));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(100L, 200L, false));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(10L, 50L, true));
    elementsToAdd.add(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(1L, 2000L, false));
    // - Shuffle the list so that the order is random
    Collections.shuffle(elementsToAdd);
    final AddElements add = new AddElements.Builder().input(elementsToAdd).build();
    final Context context = new Context();
    final Schema schema = TestUtils.gafferSchema("schemaUsingLongVertexType");
    final ParquetStoreProperties storeProperties = new ParquetStoreProperties();
    final String testDir = tempDir.toString();
    storeProperties.setDataDir(testDir + "/data");
    storeProperties.setTempFilesDir(testDir + "/tmpdata");
    final ParquetStore store = (ParquetStore) ParquetStore.createStore("graphId", schema, storeProperties);
    final FileSystem fs = FileSystem.get(new Configuration());
    final SparkSession sparkSession = SparkSessionProvider.getSparkSession();
    // When
    new AddElementsHandler().doOperation(add, context, store);
    // Then
    // - New snapshot directory should have been created.
    final long snapshotId = store.getLatestSnapshot();
    final Path snapshotPath = new Path(testDir + "/data", ParquetStore.getSnapshotPath(snapshotId));
    assertTrue(fs.exists(snapshotPath));
    // - There should be 1 file named partition-0.parquet (and an associated .crc file) in the "group=BasicEntity"
    // directory.
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.ENTITY, false) + "/" + ParquetStore.getFile(0))));
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.ENTITY, false) + "/." + ParquetStore.getFile(0) + ".crc")));
    // - The files should contain the data sorted by vertex and date.
    Row[] results = (Row[]) sparkSession.read().parquet(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.ENTITY, false) + "/" + ParquetStore.getFile(0)).toString()).collect();
    assertThat(results).hasSize(40);
    for (int i = 0; i < 40; i++) {
        assertEquals((long) i / 2, (long) results[i].getAs(ParquetStore.VERTEX));
        assertEquals(i % 2 == 0 ? 'b' : 'a', ((byte[]) results[i].getAs("byte"))[0]);
        assertEquals(i % 2 == 0 ? 8f : 6f, results[i].getAs("float"), 0.01f);
        assertEquals(11L * 2 * (i / 2), (long) results[i].getAs("long"));
        assertEquals(i % 2 == 0 ? 14 : 12, (int) results[i].getAs("short"));
        assertEquals(i % 2 == 0 ? 100000L : 200000L, (long) results[i].getAs("date"));
        assertEquals(2, (int) results[i].getAs("count"));
        assertArrayEquals(i % 2 == 0 ? new String[] { "A", "C" } : new String[] { "A", "B" }, (String[]) ((WrappedArray<String>) results[i].getAs("treeSet")).array());
        final FreqMap mergedFreqMap1 = new FreqMap();
        mergedFreqMap1.put("A", 2L);
        mergedFreqMap1.put("B", 2L);
        final FreqMap mergedFreqMap2 = new FreqMap();
        mergedFreqMap2.put("A", 2L);
        mergedFreqMap2.put("C", 2L);
        assertEquals(JavaConversions$.MODULE$.mapAsScalaMap(i % 2 == 0 ? mergedFreqMap2 : mergedFreqMap1), results[i].getAs("freqMap"));
    }
    // - There should be 1 file named partition-0.parquet (and an associated .crc file) in the "group=BasicEntity2"
    // directory.
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.ENTITY_2, false) + "/" + ParquetStore.getFile(0))));
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.ENTITY_2, false) + "/." + ParquetStore.getFile(0) + ".crc")));
    // - The files should contain the data sorted by vertex.
    results = (Row[]) sparkSession.read().parquet(new Path(snapshotPath, "graph/group=BasicEntity2/" + ParquetStore.getFile(0)).toString()).collect();
    assertThat(results).hasSize(4);
    checkEntityGroup2(WriteUnsortedDataTest.createEntityForEntityGroup_2(1L), results[0]);
    checkEntityGroup2(WriteUnsortedDataTest.createEntityForEntityGroup_2(10L), results[1]);
    checkEntityGroup2(WriteUnsortedDataTest.createEntityForEntityGroup_2(100L), results[2]);
    checkEntityGroup2(WriteUnsortedDataTest.createEntityForEntityGroup_2(10000L), results[3]);
    // - There should be 1 file named partition-0.parquet (and an associated .crc file) in the "group=BasicEdge"
    // directory and in the "reversed-group=BasicEdge" directory.
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE, false) + "/" + ParquetStore.getFile(0))));
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE, false) + "/." + ParquetStore.getFile(0) + ".crc")));
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE, true) + "/" + ParquetStore.getFile(0))));
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE, true) + "/." + ParquetStore.getFile(0) + ".crc")));
    // - The files should contain the data sorted by source, destination, directed, date
    results = (Row[]) sparkSession.read().parquet(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE, false) + "/" + ParquetStore.getFile(0)).toString()).collect();
    assertThat(results).hasSize(6);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 2L, false, new Date(400L)), results[0]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, false, new Date(400L)), results[1]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, true, new Date(300L)), results[2]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, true, new Date(400L)), results[3]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(100L, 100000L, false, new Date(200L)), results[4]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(10000L, 1000L, true, new Date(100L)), results[5]);
    results = (Row[]) sparkSession.read().parquet(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE, true) + "/" + ParquetStore.getFile(0)).toString()).collect();
    assertThat(results).hasSize(6);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 2L, false, new Date(400L)), results[0]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, false, new Date(400L)), results[1]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, true, new Date(300L)), results[2]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(1L, 10L, true, new Date(400L)), results[3]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(10000L, 1000L, true, new Date(100L)), results[4]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup(100L, 100000L, false, new Date(200L)), results[5]);
    // - There should be 1 file named partition-0.parquet (and an associated .crc file) in the "group=BasicEdge2"
    // directory and in the "reversed-group=BasicEdge2" directory.
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE_2, false) + "/" + ParquetStore.getFile(0))));
    assertTrue(fs.exists(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE_2, false) + "/." + ParquetStore.getFile(0) + ".crc")));
    // - The files should contain the data sorted by source, destination, directed
    results = (Row[]) sparkSession.read().parquet(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE_2, false) + "/" + ParquetStore.getFile(0)).toString()).collect();
    assertThat(results).hasSize(4);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(1L, 2000L, false), results[0]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(10L, 50L, true), results[1]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(100L, 200L, false), results[2]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(10000L, 20L, true), results[3]);
    results = (Row[]) sparkSession.read().parquet(new Path(snapshotPath, ParquetStore.getGroupSubDir(TestGroups.EDGE_2, true) + "/" + ParquetStore.getFile(0)).toString()).collect();
    assertThat(results).hasSize(4);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(10000L, 20L, true), results[0]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(10L, 50L, true), results[1]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(100L, 200L, false), results[2]);
    checkEdge(WriteUnsortedDataTest.createEdgeForEdgeGroup_2(1L, 2000L, false), results[3]);
}
Also used : AddElements(uk.gov.gchq.gaffer.operation.impl.add.AddElements) Context(uk.gov.gchq.gaffer.store.Context) ParquetStore(uk.gov.gchq.gaffer.parquetstore.ParquetStore) Path(org.apache.hadoop.fs.Path) SparkSession(org.apache.spark.sql.SparkSession) Configuration(org.apache.hadoop.conf.Configuration) FreqMap(uk.gov.gchq.gaffer.types.FreqMap) Element(uk.gov.gchq.gaffer.data.element.Element) Schema(uk.gov.gchq.gaffer.store.schema.Schema) ArrayList(java.util.ArrayList) Date(java.util.Date) WrappedArray(scala.collection.mutable.WrappedArray) ParquetStoreProperties(uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties) FileSystem(org.apache.hadoop.fs.FileSystem) Row(org.apache.spark.sql.Row) WriteUnsortedDataTest(uk.gov.gchq.gaffer.parquetstore.utils.WriteUnsortedDataTest) Test(org.junit.jupiter.api.Test) AggregateAndSortDataTest(uk.gov.gchq.gaffer.parquetstore.utils.AggregateAndSortDataTest)
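
The existence assertions above pin down the on-disk layout that AddElementsHandler produces. Sketching it with the literal names this test checks (the snapshot directory name comes from ParquetStore.getSnapshotPath(snapshotId) and is shown here as a placeholder):

data/<snapshot-dir>/graph/group=BasicEntity/partition-0.parquet
data/<snapshot-dir>/graph/group=BasicEntity/.partition-0.parquet.crc
data/<snapshot-dir>/graph/group=BasicEntity2/partition-0.parquet
data/<snapshot-dir>/graph/group=BasicEdge/partition-0.parquet
data/<snapshot-dir>/graph/reversed-group=BasicEdge/partition-0.parquet
data/<snapshot-dir>/graph/group=BasicEdge2/partition-0.parquet
data/<snapshot-dir>/graph/reversed-group=BasicEdge2/partition-0.parquet

Each forward edge directory is sorted by source, destination, directed (and date where applicable), while each reversed-group directory holds the same edges sorted by destination first, which is what the reversed-file read-backs at the end of the test verify.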

Example 5 with ParquetStoreProperties

Use of uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties in project Gaffer by gchq.

From the class AbstractSparkOperationsTest, method createGraph:

protected Graph createGraph(final Path tempDir, final int numOutputFiles) throws IOException {
    final ParquetStoreProperties storeProperties = TestUtils.getParquetStoreProperties(tempDir);
    storeProperties.setAddElementsOutputFilesPerGroup(numOutputFiles);
    return createGraph(storeProperties);
}
Also used : ParquetStoreProperties(uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties)
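
A hypothetical call site for this helper. The @TempDir injection, the Graph type, and the abstract createGraph(ParquetStoreProperties) implemented by the concrete test class are all assumptions here (as is the Path parameter being java.nio.file.Path, suggested by the @TempDir usage elsewhere on this page):

@Test
public void shouldWriteConfiguredNumberOfFilesPerGroup(@TempDir java.nio.file.Path tempDir) throws IOException {
    // Request 5 output files per group when elements are added (hypothetical value)
    final Graph graph = createGraph(tempDir, 5);
    // ... add elements and assert on the resulting partition files ...
}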

Aggregations

ParquetStoreProperties (uk.gov.gchq.gaffer.parquetstore.ParquetStoreProperties): 10 usages
Test (org.junit.jupiter.api.Test): 7 usages
ArrayList (java.util.ArrayList): 6 usages
Path (org.apache.hadoop.fs.Path): 6 usages
ParquetStore (uk.gov.gchq.gaffer.parquetstore.ParquetStore): 6 usages
Element (uk.gov.gchq.gaffer.data.element.Element): 4 usages
AddElements (uk.gov.gchq.gaffer.operation.impl.add.AddElements): 4 usages
List (java.util.List): 3 usages
Configuration (org.apache.hadoop.conf.Configuration): 3 usages
FileSystem (org.apache.hadoop.fs.FileSystem): 3 usages
Row (org.apache.spark.sql.Row): 3 usages
SparkSession (org.apache.spark.sql.SparkSession): 3 usages
WrappedArray (scala.collection.mutable.WrappedArray): 3 usages
LongVertexOperationsTest (uk.gov.gchq.gaffer.parquetstore.operation.handler.LongVertexOperationsTest): 3 usages
CalculatePartitionerTest (uk.gov.gchq.gaffer.parquetstore.operation.handler.utilities.CalculatePartitionerTest): 3 usages
AggregateAndSortDataTest (uk.gov.gchq.gaffer.parquetstore.utils.AggregateAndSortDataTest): 3 usages
SchemaUtils (uk.gov.gchq.gaffer.parquetstore.utils.SchemaUtils): 3 usages
WriteUnsortedDataTest (uk.gov.gchq.gaffer.parquetstore.utils.WriteUnsortedDataTest): 3 usages
Context (uk.gov.gchq.gaffer.store.Context): 3 usages
Schema (uk.gov.gchq.gaffer.store.schema.Schema): 3 usages