Example 6 with SamplerConfiguration

use of org.apache.accumulo.core.client.sample.SamplerConfiguration in project accumulo-examples by apache.

the class SampleExample method main.

public static void main(String[] args) throws Exception {
    Opts opts = new Opts();
    BatchWriterOpts bwOpts = new BatchWriterOpts();
    opts.parseArgs(RandomBatchWriter.class.getName(), args, bwOpts);
    Connector conn = opts.getConnector();
    if (!conn.tableOperations().exists(opts.getTableName())) {
        conn.tableOperations().create(opts.getTableName());
    } else {
        System.out.println("Table exists, not doing anything.");
        return;
    }
    // write some data
    BatchWriter bw = conn.createBatchWriter(opts.getTableName(), bwOpts.getBatchWriterConfig());
    bw.addMutation(createMutation("9225", "abcde", "file://foo.txt"));
    bw.addMutation(createMutation("8934", "accumulo scales", "file://accumulo_notes.txt"));
    bw.addMutation(createMutation("2317", "milk, eggs, bread, parmigiano-reggiano", "file://groceries/9/txt"));
    bw.addMutation(createMutation("3900", "EC2 ate my homework", "file://final_project.txt"));
    bw.flush();
    SamplerConfiguration sc1 = new SamplerConfiguration(RowSampler.class.getName());
    sc1.setOptions(ImmutableMap.of("hasher", "murmur3_32", "modulus", "3"));
    conn.tableOperations().setSamplerConfiguration(opts.getTableName(), sc1);
    Scanner scanner = conn.createScanner(opts.getTableName(), Authorizations.EMPTY);
    System.out.println("Scanning all data :");
    print(scanner);
    System.out.println();
    System.out.println("Scanning with sampler configuration.  Data was written before sampler was set on table, scan should fail.");
    scanner.setSamplerConfiguration(sc1);
    try {
        print(scanner);
    } catch (SampleNotPresentException e) {
        System.out.println("  Saw sample not present exception as expected.");
    }
    System.out.println();
    // compact table to recreate sample data
    conn.tableOperations().compact(opts.getTableName(), new CompactionConfig().setCompactionStrategy(NO_SAMPLE_STRATEGY));
    System.out.println("Scanning after compaction (compaction should have created sample data) : ");
    print(scanner);
    System.out.println();
    // update a document in the sample data
    bw.addMutation(createMutation("2317", "milk, eggs, bread, parmigiano-reggiano, butter", "file://groceries/9/txt"));
    bw.close();
    System.out.println("Scanning sample after updating content for docId 2317 (should see content change in sample data) : ");
    print(scanner);
    System.out.println();
    // change the table's sampling configuration
    SamplerConfiguration sc2 = new SamplerConfiguration(RowSampler.class.getName());
    sc2.setOptions(ImmutableMap.of("hasher", "murmur3_32", "modulus", "2"));
    conn.tableOperations().setSamplerConfiguration(opts.getTableName(), sc2);
    // compact table to recreate sample data using new configuration
    conn.tableOperations().compact(opts.getTableName(), new CompactionConfig().setCompactionStrategy(NO_SAMPLE_STRATEGY));
    System.out.println("Scanning with old sampler configuration.  Sample data was created using new configuration with a compaction.  Scan should fail.");
    try {
        // try scanning with old sampler configuration
        print(scanner);
    } catch (SampleNotPresentException e) {
        System.out.println("  Saw sample not present exception as expected ");
    }
    System.out.println();
    // update expected sampler configuration on scanner
    scanner.setSamplerConfiguration(sc2);
    System.out.println("Scanning with new sampler configuration : ");
    print(scanner);
    System.out.println();
}
Also used : RowSampler(org.apache.accumulo.core.client.sample.RowSampler) Connector(org.apache.accumulo.core.client.Connector) Scanner(org.apache.accumulo.core.client.Scanner) SampleNotPresentException(org.apache.accumulo.core.client.SampleNotPresentException) BatchWriterOpts(org.apache.accumulo.examples.cli.BatchWriterOpts) RandomBatchWriter(org.apache.accumulo.examples.client.RandomBatchWriter) SamplerConfiguration(org.apache.accumulo.core.client.sample.SamplerConfiguration) CompactionConfig(org.apache.accumulo.core.client.admin.CompactionConfig) BatchWriter(org.apache.accumulo.core.client.BatchWriter)
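
The example above configures RowSampler with a "hasher" and a "modulus" and shows that a scan fails once the table's sample was built with a different modulus. A minimal sketch of why the modulus matters, assuming a row belongs to the sample when its hash is divisible by the modulus. The class RowSampleSketch and its methods are hypothetical, and the JDK's CRC32 stands in for murmur3_32, so the rows selected here will not match what Accumulo selects.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

public class RowSampleSketch {

    // Stand-in hash: CRC32 from the JDK, NOT the murmur3_32 hasher used in the example.
    public static long hash(String row) {
        CRC32 crc = new CRC32();
        crc.update(row.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // CRC32 values are always non-negative
    }

    // Assumed membership rule: a row is sampled when its hash is divisible by the modulus.
    public static boolean inSample(String row, int modulus) {
        return hash(row) % modulus == 0;
    }

    // Returns the subset of rows that fall into the sample, in input order.
    public static List<String> sample(List<String> rows, int modulus) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (inSample(row, modulus)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same document ids as the example above.
        List<String> rows = Arrays.asList("9225", "8934", "2317", "3900");
        System.out.println("modulus 3: " + sample(rows, 3));
        System.out.println("modulus 2: " + sample(rows, 2));
    }
}
```

Under this rule a different modulus generally selects a different set of rows, which is consistent with the example treating a sample built with modulus 2 as unusable by a scanner that still expects modulus 3.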

Example 7 with SamplerConfiguration

use of org.apache.accumulo.core.client.sample.SamplerConfiguration in project accumulo-examples by apache.

the class CutoffIntersectingIterator method setMax.

private void setMax(IteratorEnvironment sampleEnv, Map<String, String> options) {
    String cutoffValue = options.get("cutoff");
    SamplerConfiguration sampleConfig = sampleEnv.getSamplerConfiguration();
    // Ensure the sample was constructed in the expected way. If it was not built as expected, no conclusions can be drawn from the sample.
    requireNonNull(cutoffValue, "Expected cutoff option is missing");
    validateSamplerConfig(sampleConfig);
    int modulus = Integer.parseInt(sampleConfig.getOptions().get("modulus"));
    sampleMax = Math.round(Float.parseFloat(cutoffValue) / modulus);
}
Also used : SamplerConfiguration(org.apache.accumulo.core.client.sample.SamplerConfiguration)
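
The arithmetic in setMax can be tried in isolation. A minimal sketch, assuming the intent is that a sample built with modulus m holds roughly 1/m of the rows, so a cutoff stated for the full table is divided by m before being applied to the sample. The class CutoffScalingSketch is hypothetical; only the division and rounding are taken from the method above.

```java
import java.util.Objects;

public class CutoffScalingSketch {

    // Same arithmetic as setMax: divide the full-table cutoff by the sampling modulus
    // and round to the nearest int.
    public static int sampleMax(String cutoffValue, int modulus) {
        Objects.requireNonNull(cutoffValue, "Expected cutoff option is missing");
        return Math.round(Float.parseFloat(cutoffValue) / modulus);
    }

    public static void main(String[] args) {
        System.out.println(sampleMax("1000", 3)); // 333.33 rounds down to 333
        System.out.println(sampleMax("10", 4));   // 2.5 rounds up to 3
    }
}
```

Math.round rounds ties upward, so a cutoff of 10 with modulus 4 becomes 3 rather than 2.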

Example 8 with SamplerConfiguration

use of org.apache.accumulo.core.client.sample.SamplerConfiguration in project accumulo by apache.

the class InputConfigurator method getDefaultInputTableConfig.

/**
 * Returns the {@link org.apache.accumulo.core.client.mapreduce.InputTableConfig} for the configuration based on the properties set using the single-table
 * input methods.
 *
 * @param implementingClass
 *          the class whose name will be used as a prefix for the property configuration key
 * @param conf
 *          the Hadoop instance for which to retrieve the configuration
 * @return the config object built from the single input table properties set on the job
 * @since 1.6.0
 */
protected static Map.Entry<String, InputTableConfig> getDefaultInputTableConfig(Class<?> implementingClass, Configuration conf) {
    String tableName = getInputTableName(implementingClass, conf);
    if (tableName != null) {
        InputTableConfig queryConfig = new InputTableConfig();
        List<IteratorSetting> itrs = getIterators(implementingClass, conf);
        if (itrs != null)
            queryConfig.setIterators(itrs);
        Set<Pair<Text, Text>> columns = getFetchedColumns(implementingClass, conf);
        if (columns != null)
            queryConfig.fetchColumns(columns);
        List<Range> ranges = null;
        try {
            ranges = getRanges(implementingClass, conf);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        if (ranges != null)
            queryConfig.setRanges(ranges);
        SamplerConfiguration samplerConfig = getSamplerConfiguration(implementingClass, conf);
        if (samplerConfig != null) {
            queryConfig.setSamplerConfiguration(samplerConfig);
        }
        queryConfig.setAutoAdjustRanges(getAutoAdjustRanges(implementingClass, conf))
                .setUseIsolatedScanners(isIsolated(implementingClass, conf))
                .setUseLocalIterators(usesLocalIterators(implementingClass, conf))
                .setOfflineScan(isOfflineScan(implementingClass, conf));
        return Maps.immutableEntry(tableName, queryConfig);
    }
    return null;
}
Also used : InputTableConfig(org.apache.accumulo.core.client.mapreduce.InputTableConfig) IteratorSetting(org.apache.accumulo.core.client.IteratorSetting) SamplerConfiguration(org.apache.accumulo.core.client.sample.SamplerConfiguration) IOException(java.io.IOException) Range(org.apache.accumulo.core.data.Range) Pair(org.apache.accumulo.core.util.Pair)

Example 9 with SamplerConfiguration

use of org.apache.accumulo.core.client.sample.SamplerConfiguration in project accumulo by apache.

the class AccumuloFileOutputFormatTest method validateConfiguration.

@Test
public void validateConfiguration() throws IOException, InterruptedException {
    int a = 7;
    long b = 300L;
    long c = 50L;
    long d = 10L;
    String e = "snappy";
    SamplerConfiguration samplerConfig = new SamplerConfiguration(RowSampler.class.getName());
    samplerConfig.addOption("hasher", "murmur3_32");
    samplerConfig.addOption("modulus", "109");
    SummarizerConfiguration sc1 = SummarizerConfiguration.builder(VisibilitySummarizer.class).addOption(CountingSummarizer.MAX_COUNTERS_OPT, 2048).build();
    SummarizerConfiguration sc2 = SummarizerConfiguration.builder(FamilySummarizer.class).addOption(CountingSummarizer.MAX_COUNTERS_OPT, 256).build();
    Job job1 = Job.getInstance();
    AccumuloFileOutputFormat.setReplication(job1, a);
    AccumuloFileOutputFormat.setFileBlockSize(job1, b);
    AccumuloFileOutputFormat.setDataBlockSize(job1, c);
    AccumuloFileOutputFormat.setIndexBlockSize(job1, d);
    AccumuloFileOutputFormat.setCompressionType(job1, e);
    AccumuloFileOutputFormat.setSampler(job1, samplerConfig);
    AccumuloFileOutputFormat.setSummarizers(job1, sc1, sc2);
    AccumuloConfiguration acuconf = FileOutputConfigurator.getAccumuloConfiguration(AccumuloFileOutputFormat.class, job1.getConfiguration());
    assertEquals(7, acuconf.getCount(Property.TABLE_FILE_REPLICATION));
    assertEquals(300L, acuconf.getAsBytes(Property.TABLE_FILE_BLOCK_SIZE));
    assertEquals(50L, acuconf.getAsBytes(Property.TABLE_FILE_COMPRESSED_BLOCK_SIZE));
    assertEquals(10L, acuconf.getAsBytes(Property.TABLE_FILE_COMPRESSED_BLOCK_SIZE_INDEX));
    assertEquals("snappy", acuconf.get(Property.TABLE_FILE_COMPRESSION_TYPE));
    assertEquals(new SamplerConfigurationImpl(samplerConfig), SamplerConfigurationImpl.newSamplerConfig(acuconf));
    Collection<SummarizerConfiguration> summarizerConfigs = SummarizerConfiguration.fromTableProperties(acuconf);
    assertEquals(2, summarizerConfigs.size());
    assertTrue(summarizerConfigs.contains(sc1));
    assertTrue(summarizerConfigs.contains(sc2));
    a = 17;
    b = 1300L;
    c = 150L;
    d = 110L;
    e = "lzo";
    samplerConfig = new SamplerConfiguration(RowSampler.class.getName());
    samplerConfig.addOption("hasher", "md5");
    samplerConfig.addOption("modulus", "100003");
    Job job2 = Job.getInstance();
    AccumuloFileOutputFormat.setReplication(job2, a);
    AccumuloFileOutputFormat.setFileBlockSize(job2, b);
    AccumuloFileOutputFormat.setDataBlockSize(job2, c);
    AccumuloFileOutputFormat.setIndexBlockSize(job2, d);
    AccumuloFileOutputFormat.setCompressionType(job2, e);
    AccumuloFileOutputFormat.setSampler(job2, samplerConfig);
    acuconf = FileOutputConfigurator.getAccumuloConfiguration(AccumuloFileOutputFormat.class, job2.getConfiguration());
    assertEquals(17, acuconf.getCount(Property.TABLE_FILE_REPLICATION));
    assertEquals(1300L, acuconf.getAsBytes(Property.TABLE_FILE_BLOCK_SIZE));
    assertEquals(150L, acuconf.getAsBytes(Property.TABLE_FILE_COMPRESSED_BLOCK_SIZE));
    assertEquals(110L, acuconf.getAsBytes(Property.TABLE_FILE_COMPRESSED_BLOCK_SIZE_INDEX));
    assertEquals("lzo", acuconf.get(Property.TABLE_FILE_COMPRESSION_TYPE));
    assertEquals(new SamplerConfigurationImpl(samplerConfig), SamplerConfigurationImpl.newSamplerConfig(acuconf));
    summarizerConfigs = SummarizerConfiguration.fromTableProperties(acuconf);
    assertEquals(0, summarizerConfigs.size());
}
Also used : RowSampler(org.apache.accumulo.core.client.sample.RowSampler) SamplerConfigurationImpl(org.apache.accumulo.core.sample.impl.SamplerConfigurationImpl) SamplerConfiguration(org.apache.accumulo.core.client.sample.SamplerConfiguration) Job(org.apache.hadoop.mapreduce.Job) SummarizerConfiguration(org.apache.accumulo.core.client.summary.SummarizerConfiguration) AccumuloConfiguration(org.apache.accumulo.core.conf.AccumuloConfiguration) Test(org.junit.Test)
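
The test above checks that a sampler configuration set on the job can be read back from the resulting configuration. A minimal sketch of that round trip using plain maps, assuming the sampler class and its options are flattened into string properties. The class SamplerPropsSketch is hypothetical, and the key names "table.sampler" and "table.sampler.opt." are assumptions for illustration, not taken from the test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SamplerPropsSketch {

    // Assumed property names; treat these as placeholders.
    public static final String SAMPLER_KEY = "table.sampler";
    public static final String OPT_PREFIX = "table.sampler.opt.";

    // Flattens a sampler class name and its options into string properties.
    public static Map<String, String> toProperties(String samplerClass, Map<String, String> options) {
        Map<String, String> props = new TreeMap<>();
        props.put(SAMPLER_KEY, samplerClass);
        for (Map.Entry<String, String> e : options.entrySet()) {
            props.put(OPT_PREFIX + e.getKey(), e.getValue());
        }
        return props;
    }

    // Reads the sampler class name back from the flattened properties.
    public static String samplerClassFrom(Map<String, String> props) {
        return props.get(SAMPLER_KEY);
    }

    // Rebuilds the option map by stripping the prefix from matching keys.
    public static Map<String, String> optionsFrom(Map<String, String> props) {
        Map<String, String> options = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(OPT_PREFIX)) {
                options.put(e.getKey().substring(OPT_PREFIX.length()), e.getValue());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("hasher", "murmur3_32");
        options.put("modulus", "109");
        Map<String, String> props = toProperties("RowSampler", options);
        System.out.println(props);
        System.out.println(optionsFrom(props).equals(options));
    }
}
```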

Example 10 with SamplerConfiguration

use of org.apache.accumulo.core.client.sample.SamplerConfiguration in project accumulo by apache.

the class SortedMapIteratorTest method testSampleNotPresent.

@Test(expected = SampleNotPresentException.class)
public void testSampleNotPresent() {
    SortedMapIterator smi = new SortedMapIterator(new TreeMap<>());
    smi.deepCopy(new BaseIteratorEnvironment() {

        @Override
        public boolean isSamplingEnabled() {
            return true;
        }

        @Override
        public SamplerConfiguration getSamplerConfiguration() {
            return new SamplerConfiguration(RowSampler.class.getName());
        }
    });
}
Also used : BaseIteratorEnvironment(org.apache.accumulo.core.client.impl.BaseIteratorEnvironment) SamplerConfiguration(org.apache.accumulo.core.client.sample.SamplerConfiguration) Test(org.junit.Test)
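
The test above expects deepCopy to throw when the environment reports that sampling is enabled. A minimal sketch of that behaviour, assuming the reason is that a plain in-memory map carries no sample data and therefore refuses sampled reads. Every type here (NoSampleSketch, Env, MapSource, SampleMissingException) is hypothetical and only mimics the shape of the test.

```java
import java.util.TreeMap;

public class NoSampleSketch {

    // Hypothetical stand-in for the iterator environment used in the test.
    public interface Env {
        boolean isSamplingEnabled();
    }

    // Hypothetical stand-in for SampleNotPresentException.
    public static class SampleMissingException extends RuntimeException {
        public SampleMissingException() {
            super("no sample data is available for this source");
        }
    }

    // An in-memory source that holds full data only, never a sample.
    public static class MapSource {
        private final TreeMap<String, String> data;

        public MapSource(TreeMap<String, String> data) {
            this.data = data;
        }

        // Copies the source for a scan, refusing if the scan asks for sample data.
        public MapSource copyFor(Env env) {
            if (env != null && env.isSamplingEnabled()) {
                throw new SampleMissingException();
            }
            return new MapSource(data);
        }
    }

    public static void main(String[] args) {
        MapSource source = new MapSource(new TreeMap<>());
        source.copyFor(() -> false); // a full-data scan is fine
        try {
            source.copyFor(() -> true); // a sampled scan is rejected
        } catch (SampleMissingException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```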

Aggregations

SamplerConfiguration (org.apache.accumulo.core.client.sample.SamplerConfiguration) 12
SampleNotPresentException (org.apache.accumulo.core.client.SampleNotPresentException) 4
RowSampler (org.apache.accumulo.core.client.sample.RowSampler) 4
Test (org.junit.Test) 4
SummarizerConfiguration (org.apache.accumulo.core.client.summary.SummarizerConfiguration) 3
AccumuloConfiguration (org.apache.accumulo.core.conf.AccumuloConfiguration) 3
Key (org.apache.accumulo.core.data.Key) 3
Value (org.apache.accumulo.core.data.Value) 3
SamplerConfigurationImpl (org.apache.accumulo.core.sample.impl.SamplerConfigurationImpl) 3
Connector (org.apache.accumulo.core.client.Connector) 2
Scanner (org.apache.accumulo.core.client.Scanner) 2
Configuration (org.apache.hadoop.conf.Configuration) 2
IOException (java.io.IOException) 1
TreeMap (java.util.TreeMap) 1
BatchScanner (org.apache.accumulo.core.client.BatchScanner) 1
BatchWriter (org.apache.accumulo.core.client.BatchWriter) 1
IteratorSetting (org.apache.accumulo.core.client.IteratorSetting) 1
ScannerBase (org.apache.accumulo.core.client.ScannerBase) 1
CompactionConfig (org.apache.accumulo.core.client.admin.CompactionConfig) 1
NewTableConfiguration (org.apache.accumulo.core.client.admin.NewTableConfiguration) 1