Use of uk.gov.gchq.gaffer.hdfs.operation.AddElementsFromHdfs in project Gaffer by gchq.
The class AccumuloAddElementsFromHdfsJobFactoryTest, method shouldSetNoMoreThanMaxNumberOfReducersSpecified.
@Test
public void shouldSetNoMoreThanMaxNumberOfReducersSpecified() throws IOException, StoreException, OperationException {
    // Given
    store.initialise("graphId", SCHEMA, PROPERTIES);
    final JobConf localConf = createLocalConf();
    final FileSystem fs = FileSystem.getLocal(localConf);
    fs.mkdirs(new Path(outputDir));
    fs.mkdirs(new Path(splitsDir));
    // Write 100 split points (100..199) to the splits file, then load them into the store.
    try (final BufferedWriter writer = new BufferedWriter(new FileWriter(splitsFile.toString()))) {
        for (int i = 100; i < 200; i++) {
            writer.write(i + "\n");
        }
    }
    final SplitStoreFromFile splitTable = new SplitStoreFromFile.Builder()
            .inputPath(splitsFile.toString())
            .build();
    store.execute(splitTable, new Context(new User()));
    final AccumuloAddElementsFromHdfsJobFactory factory = getJobFactory();
    final Job job = Job.getInstance(localConf);

    // When
    AddElementsFromHdfs operation = new AddElementsFromHdfs.Builder()
            .outputPath(outputDir.toString())
            .addInputMapperPair(inputDir.toString(), TextMapperGeneratorImpl.class.getName())
            .maxReducers(10)
            .splitsFilePath("target/data/splits.txt")
            .build();
    factory.setupJob(job, operation, TextMapperGeneratorImpl.class.getName(), store);

    // Then
    assertTrue(job.getNumReduceTasks() <= 10);

    // When
    operation = new AddElementsFromHdfs.Builder()
            .outputPath(outputDir.toString())
            .addInputMapperPair(inputDir.toString(), TextMapperGeneratorImpl.class.getName())
            .maxReducers(100)
            .splitsFilePath("target/data/splits.txt")
            .build();
    factory.setupJob(job, operation, TextMapperGeneratorImpl.class.getName(), store);

    // Then
    assertTrue(job.getNumReduceTasks() <= 100);

    // When
    operation = new AddElementsFromHdfs.Builder()
            .outputPath(outputDir.toString())
            .addInputMapperPair(inputDir.toString(), TextMapperGeneratorImpl.class.getName())
            .maxReducers(1000)
            .splitsFilePath("target/data/splits.txt")
            .build();
    factory.setupJob(job, operation, TextMapperGeneratorImpl.class.getName(), store);

    // Then
    assertTrue(job.getNumReduceTasks() <= 1000);
}
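For completeness, the upper-bound check above has a natural lower-bound counterpart. The following is a minimal sketch, assuming the same fixture as above and that the builder exposes a minReducers option alongside maxReducers; it is illustrative only, not the project's actual test code:

// Sketch only: assumes the fixture above and a minReducers builder option.
AddElementsFromHdfs minOperation = new AddElementsFromHdfs.Builder()
        .outputPath(outputDir.toString())
        .addInputMapperPair(inputDir.toString(), TextMapperGeneratorImpl.class.getName())
        .minReducers(10)
        .splitsFilePath("target/data/splits.txt")
        .build();
factory.setupJob(job, minOperation, TextMapperGeneratorImpl.class.getName(), store);
// The factory should configure at least the requested number of reducers.
assertTrue(job.getNumReduceTasks() >= 10);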
Use of uk.gov.gchq.gaffer.hdfs.operation.AddElementsFromHdfs in project Gaffer by gchq.
The class AccumuloAddElementsFromHdfsJobFactoryTest, method setupAccumuloPartitionerWithGivenPartitioner.
private void setupAccumuloPartitionerWithGivenPartitioner(final Class<? extends Partitioner> partitioner) throws IOException {
    // Given
    final JobConf localConf = createLocalConf();
    final FileSystem fs = FileSystem.getLocal(localConf);
    fs.mkdirs(new Path(outputDir));
    fs.mkdirs(new Path(splitsDir));
    try (final BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(fs.create(new Path(splitsFile))))) {
        writer.write("1");
    }
    final AccumuloAddElementsFromHdfsJobFactory factory = getJobFactory();
    final Job job = mock(Job.class);
    final AddElementsFromHdfs operation = new AddElementsFromHdfs.Builder()
            .outputPath(outputDir)
            .partitioner(partitioner)
            .useProvidedSplits(true)
            .splitsFilePath(splitsFile)
            .build();
    final AccumuloStore store = mock(AccumuloStore.class);
    given(job.getConfiguration()).willReturn(localConf);

    // When
    factory.setupJob(job, operation, TextMapperGeneratorImpl.class.getName(), store);

    // Then
    if (NoPartitioner.class.equals(partitioner)) {
        verify(job, never()).setNumReduceTasks(Mockito.anyInt());
        verify(job, never()).setPartitionerClass(Mockito.any(Class.class));
        assertNull(job.getConfiguration().get(GafferRangePartitioner.class.getName() + ".cutFile"));
    } else {
        verify(job).setNumReduceTasks(2);
        verify(job).setPartitionerClass(GafferKeyRangePartitioner.class);
        assertEquals(splitsFile, job.getConfiguration().get(GafferRangePartitioner.class.getName() + ".cutFile"));
    }
}
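The helper above is private and parameterised over the partitioner class. A minimal sketch of how tests might invoke it for each branch follows; the test method names here are assumptions for illustration, not the project's actual names:

@Test
public void shouldSetupPartitionerWhenGafferKeyRangePartitionerGiven() throws IOException {
    // Exercises the else-branch: reducer count and partitioner class are set on the job.
    setupAccumuloPartitionerWithGivenPartitioner(GafferKeyRangePartitioner.class);
}

@Test
public void shouldNotSetupPartitionerWhenNoPartitionerGiven() throws IOException {
    // Exercises the NoPartitioner branch: the job's reducer and partitioner settings are left untouched.
    setupAccumuloPartitionerWithGivenPartitioner(NoPartitioner.class);
}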