
Example 1 with PartitionKey

Use of uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey in project Gaffer by gchq.

In class CalculatePartitionerTest, method calculatePartitionerTest.

@Test
public void calculatePartitionerTest(@TempDir java.nio.file.Path tempDir) throws IOException {
    // Given
    final FileSystem fs = FileSystem.get(new Configuration());
    final Schema schema = getSchema();
    final SchemaUtils schemaUtils = new SchemaUtils(schema);
    final String topLevelFolder = tempDir.toString();
    writeData(topLevelFolder, schemaUtils);
    // When
    // - Calculate partitioner from files
    final GraphPartitioner actual = new CalculatePartitioner(new Path(topLevelFolder), schema, fs).call();
    // - Manually create the correct partitioner
    final GraphPartitioner expected = new GraphPartitioner();
    final List<PartitionKey> splitPointsEntity = new ArrayList<>();
    for (int i = 1; i < 10; i++) {
        splitPointsEntity.add(new PartitionKey(new Object[] { 10L * i }));
    }
    final GroupPartitioner groupPartitionerEntity = new GroupPartitioner(TestGroups.ENTITY, splitPointsEntity);
    expected.addGroupPartitioner(TestGroups.ENTITY, groupPartitionerEntity);
    final GroupPartitioner groupPartitionerEntity2 = new GroupPartitioner(TestGroups.ENTITY_2, splitPointsEntity);
    expected.addGroupPartitioner(TestGroups.ENTITY_2, groupPartitionerEntity2);
    final List<PartitionKey> splitPointsEdge = new ArrayList<>();
    for (int i = 1; i < 10; i++) {
        splitPointsEdge.add(new PartitionKey(new Object[] { 10L * i, 10L * i + 1, true }));
    }
    final GroupPartitioner groupPartitionerEdge = new GroupPartitioner(TestGroups.EDGE, splitPointsEdge);
    expected.addGroupPartitioner(TestGroups.EDGE, groupPartitionerEdge);
    final GroupPartitioner groupPartitionerEdge2 = new GroupPartitioner(TestGroups.EDGE_2, splitPointsEdge);
    expected.addGroupPartitioner(TestGroups.EDGE_2, groupPartitionerEdge2);
    final List<PartitionKey> splitPointsReversedEdge = new ArrayList<>();
    for (int i = 1; i < 10; i++) {
        splitPointsReversedEdge.add(new PartitionKey(new Object[] { 10L * i + 1, 10L * i, true }));
    }
    final GroupPartitioner reversedGroupPartitionerEdge = new GroupPartitioner(TestGroups.EDGE, splitPointsReversedEdge);
    expected.addGroupPartitionerForReversedEdges(TestGroups.EDGE, reversedGroupPartitionerEdge);
    final GroupPartitioner reversedGroupPartitionerEdge2 = new GroupPartitioner(TestGroups.EDGE_2, splitPointsReversedEdge);
    expected.addGroupPartitionerForReversedEdges(TestGroups.EDGE_2, reversedGroupPartitionerEdge2);
    // Then
    assertEquals(expected, actual);
}
Also used: Path (org.apache.hadoop.fs.Path), GroupPartitioner (uk.gov.gchq.gaffer.parquetstore.partitioner.GroupPartitioner), Configuration (org.apache.hadoop.conf.Configuration), Schema (uk.gov.gchq.gaffer.store.schema.Schema), ArrayList (java.util.ArrayList), SchemaUtils (uk.gov.gchq.gaffer.parquetstore.utils.SchemaUtils), GraphPartitioner (uk.gov.gchq.gaffer.parquetstore.partitioner.GraphPartitioner), FileSystem (org.apache.hadoop.fs.FileSystem), PartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey), Test (org.junit.jupiter.api.Test)
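
The test above builds nine split points (10, 20, ..., 90) per entity group. As a rough illustration of how a sorted list of split points can divide a key space into partitions, here is a minimal sketch using plain Long keys. The class SplitPointSketch and its methods are hypothetical and are not part of Gaffer; the rule that a key equal to a split point falls into the partition to its right is an assumption made for this example, not something the test confirms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not Gaffer's GroupPartitioner.
class SplitPointSketch {

    // Builds the same split point values as the entity loop in the test: 10, 20, ..., 90.
    static List<Long> splitPoints() {
        final List<Long> splitPoints = new ArrayList<>();
        for (int i = 1; i < 10; i++) {
            splitPoints.add(10L * i);
        }
        return splitPoints;
    }

    // Returns the index of the partition that would hold the key.
    // n split points give n + 1 partitions, numbered 0..n.
    // Assumption: a key equal to a split point belongs to the partition on its right.
    static int partitionFor(final List<Long> sortedSplitPoints, final long key) {
        final int pos = Collections.binarySearch(sortedSplitPoints, key);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(final String[] args) {
        final List<Long> splitPoints = splitPoints();
        System.out.println(partitionFor(splitPoints, 5L));  // prints 0
        System.out.println(partitionFor(splitPoints, 10L)); // prints 1
        System.out.println(partitionFor(splitPoints, 95L)); // prints 9
    }
}
```

With nine split points the sketch yields ten partitions, indexed 0 to 9.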

Example 2 with PartitionKey

Use of uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey in project Gaffer by gchq.

In class PartitionKeySerialiserTest, method testEmptyPartitionKey.

@Test
public void testEmptyPartitionKey(@TempDir java.nio.file.Path tempDir) throws IOException {
    // Given
    final Object[] key = new Object[] {};
    final PartitionKey partitionKey = new PartitionKey(key);
    final PartitionKeySerialiser serialiser = new PartitionKeySerialiser();
    // When
    final String filename = tempDir.resolve("testEmptyPartitionKey").toString();
    final DataOutputStream dos = new DataOutputStream(new FileOutputStream(filename));
    serialiser.write(partitionKey, dos);
    dos.close();
    final DataInputStream dis = new DataInputStream(new FileInputStream(filename));
    final PartitionKey readPartitionKey = serialiser.read(dis);
    dis.close();
    // Then
    assertArrayEquals(key, readPartitionKey.getPartitionKey());
}
Also used: DataOutputStream (java.io.DataOutputStream), FileOutputStream (java.io.FileOutputStream), PartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey), PositiveInfinityPartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PositiveInfinityPartitionKey), NegativeInfinityPartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.NegativeInfinityPartitionKey), DataInputStream (java.io.DataInputStream), FileInputStream (java.io.FileInputStream), Test (org.junit.jupiter.api.Test)

Example 3 with PartitionKey

Use of uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey in project Gaffer by gchq.

In class PartitionKeySerialiserTest, method testWithInfinitePartitionKey.

@Test
public void testWithInfinitePartitionKey(@TempDir java.nio.file.Path tempDir) throws IOException {
    // Given
    final PartitionKey negativeInfinity = new NegativeInfinityPartitionKey();
    final PartitionKey positiveInfinity = new PositiveInfinityPartitionKey();
    final PartitionKeySerialiser serialiser = new PartitionKeySerialiser();
    // When
    final String filename = tempDir.resolve("test").toString();
    final DataOutputStream dos = new DataOutputStream(new FileOutputStream(filename));
    serialiser.write(negativeInfinity, dos);
    serialiser.write(positiveInfinity, dos);
    dos.close();
    final DataInputStream dis = new DataInputStream(new FileInputStream(filename));
    final PartitionKey readPartitionKey1 = serialiser.read(dis);
    final PartitionKey readPartitionKey2 = serialiser.read(dis);
    dis.close();
    // Then
    assertEquals(negativeInfinity, readPartitionKey1);
    assertEquals(positiveInfinity, readPartitionKey2);
}
Also used: NegativeInfinityPartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.NegativeInfinityPartitionKey), PositiveInfinityPartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PositiveInfinityPartitionKey), DataOutputStream (java.io.DataOutputStream), FileOutputStream (java.io.FileOutputStream), PartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey), DataInputStream (java.io.DataInputStream), FileInputStream (java.io.FileInputStream), Test (org.junit.jupiter.api.Test)

Example 4 with PartitionKey

Use of uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey in project Gaffer by gchq.

In class GroupPartitionerSerialiserTest, method shouldSerialiseKeysToFileAndReadCorrectly.

@Test
public void shouldSerialiseKeysToFileAndReadCorrectly(@TempDir Path tempDir) throws IOException {
    // Given
    final Object[] key1 = new Object[] { 1L, 5, "ABC", 10F, (short) 1, (byte) 64, new byte[] { (byte) 1, (byte) 2, (byte) 3 } };
    final PartitionKey partitionKey1 = new PartitionKey(key1);
    final Object[] key2 = new Object[] { 100L, 500, "XYZ", 1000F, (short) 3, (byte) 55, new byte[] { (byte) 10, (byte) 9, (byte) 8, (byte) 7 } };
    final PartitionKey partitionKey2 = new PartitionKey(key2);
    final List<PartitionKey> splitPoints = new ArrayList<>();
    splitPoints.add(partitionKey1);
    splitPoints.add(partitionKey2);
    final GroupPartitioner groupPartitioner = new GroupPartitioner("GROUP", splitPoints);
    final GroupPartitionerSerialiser serialiser = new GroupPartitionerSerialiser();
    // When
    final String filename = tempDir.resolve("test").toString();
    final DataOutputStream dos = new DataOutputStream(new FileOutputStream(filename));
    serialiser.write(groupPartitioner, dos);
    dos.close();
    final DataInputStream dis = new DataInputStream(new FileInputStream(filename));
    final GroupPartitioner readGroupPartitioner = serialiser.read(dis);
    dis.close();
    // Then
    assertEquals(groupPartitioner, readGroupPartitioner);
}
Also used: GroupPartitioner (uk.gov.gchq.gaffer.parquetstore.partitioner.GroupPartitioner), DataOutputStream (java.io.DataOutputStream), FileOutputStream (java.io.FileOutputStream), ArrayList (java.util.ArrayList), PartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey), DataInputStream (java.io.DataInputStream), FileInputStream (java.io.FileInputStream), Test (org.junit.jupiter.api.Test)
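
The keys in the test above mix longs, ints, strings, floats, shorts, bytes and byte arrays in a single Object[]. One common way to make such a heterogeneous array readable again is to write a type tag before each element. The sketch below shows that idea for three of the types only; TaggedKeySketch and its tag values are hypothetical, and nothing here is taken from Gaffer's PartitionKeySerialiser, whose actual on-disk format is not shown in this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only; the tags and layout are assumptions, not Gaffer's format.
class TaggedKeySketch {

    // Writes the element count, then a one-byte type tag followed by each element's value.
    static void write(final Object[] key, final DataOutputStream out) throws IOException {
        out.writeInt(key.length);
        for (final Object element : key) {
            if (element instanceof Long) {
                out.writeByte(0);
                out.writeLong((Long) element);
            } else if (element instanceof String) {
                out.writeByte(1);
                out.writeUTF((String) element);
            } else if (element instanceof byte[]) {
                final byte[] bytes = (byte[]) element;
                out.writeByte(2);
                out.writeInt(bytes.length);
                out.write(bytes);
            } else {
                throw new IOException("Unsupported element: " + element);
            }
        }
    }

    // Reads the count, then uses each tag to decide how to decode the next element.
    static Object[] read(final DataInputStream in) throws IOException {
        final Object[] key = new Object[in.readInt()];
        for (int i = 0; i < key.length; i++) {
            final byte tag = in.readByte();
            switch (tag) {
                case 0: {
                    key[i] = in.readLong();
                    break;
                }
                case 1: {
                    key[i] = in.readUTF();
                    break;
                }
                case 2: {
                    final byte[] bytes = new byte[in.readInt()];
                    in.readFully(bytes);
                    key[i] = bytes;
                    break;
                }
                default:
                    throw new IOException("Unknown type tag: " + tag);
            }
        }
        return key;
    }

    // Serialises to an in-memory buffer and reads the key straight back.
    static Object[] roundTrip(final Object[] key) {
        try {
            final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                write(key, out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return read(in);
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final Object[] key = new Object[] { 1L, "ABC", new byte[] { 1, 2, 3 } };
        // deepEquals compares the nested byte[] by content rather than by reference.
        System.out.println(Arrays.deepEquals(key, roundTrip(key)));
    }
}
```

The sketch uses in-memory streams and try-with-resources so that the streams are closed even if a write or read fails.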

Example 5 with PartitionKey

Use of uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey in project Gaffer by gchq.

In class GroupPartitionerSerialiser, method write.

public void write(final GroupPartitioner groupPartitioner, final DataOutputStream stream) throws IOException {
    stream.writeUTF(groupPartitioner.getGroup());
    stream.writeInt(groupPartitioner.getSplitPoints().size());
    for (final PartitionKey partitionKey : groupPartitioner.getSplitPoints()) {
        partitionKeySerialiser.write(partitionKey, stream);
    }
}
Also used: PartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey)
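
The write method above uses a length-prefixed layout: the group name, then the number of split points, then each split point in order. A reader for such a layout has to consume the fields in exactly the same order. The following self-contained sketch shows the pattern with plain Long split points; LengthPrefixedSketch and its methods are hypothetical stand-ins and do not reproduce GroupPartitionerSerialiser's read method, which is not shown in this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; Long split points stand in for PartitionKey.
class LengthPrefixedSketch {

    // Same field order as the write method above: name, count, then each item.
    static void write(final String group, final List<Long> splitPoints, final DataOutputStream stream) throws IOException {
        stream.writeUTF(group);
        stream.writeInt(splitPoints.size());
        for (final long splitPoint : splitPoints) {
            stream.writeLong(splitPoint);
        }
    }

    // Reads the fields back in the order they were written; the count tells the loop when to stop.
    static Map.Entry<String, List<Long>> read(final DataInputStream stream) throws IOException {
        final String group = stream.readUTF();
        final int count = stream.readInt();
        final List<Long> splitPoints = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            splitPoints.add(stream.readLong());
        }
        return new SimpleEntry<>(group, splitPoints);
    }

    // Writes to an in-memory buffer and reads the result straight back.
    static Map.Entry<String, List<Long>> roundTrip(final String group, final List<Long> splitPoints) {
        try {
            final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                write(group, splitPoints, out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return read(in);
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final Map.Entry<String, List<Long>> result = roundTrip("GROUP", Arrays.asList(10L, 20L, 30L));
        System.out.println(result.getKey() + " " + result.getValue()); // prints GROUP [10, 20, 30]
    }
}
```

Writing the count before the items lets the reader allocate the list up front and avoids needing an end-of-list marker in the stream.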

Aggregations

PartitionKey (uk.gov.gchq.gaffer.parquetstore.partitioner.PartitionKey): 12
Test (org.junit.jupiter.api.Test): 8
ArrayList (java.util.ArrayList): 6
GroupPartitioner (uk.gov.gchq.gaffer.parquetstore.partitioner.GroupPartitioner): 6
DataInputStream (java.io.DataInputStream): 5
DataOutputStream (java.io.DataOutputStream): 5
FileInputStream (java.io.FileInputStream): 5
FileOutputStream (java.io.FileOutputStream): 5
Element (uk.gov.gchq.gaffer.data.element.Element): 5
GraphPartitioner (uk.gov.gchq.gaffer.parquetstore.partitioner.GraphPartitioner): 5
FileSystem (org.apache.hadoop.fs.FileSystem): 4
Path (org.apache.hadoop.fs.Path): 4
IOException (java.io.IOException): 3
Arrays (java.util.Arrays): 3
List (java.util.List): 3
Configuration (org.apache.hadoop.conf.Configuration): 3
FileStatus (org.apache.hadoop.fs.FileStatus): 3
ParquetReader (org.apache.parquet.hadoop.ParquetReader): 3
Edge (uk.gov.gchq.gaffer.data.element.Edge): 3
ParquetElementReader (uk.gov.gchq.gaffer.parquetstore.io.reader.ParquetElementReader): 3