
Example 1 with Progressable

Use of org.apache.hadoop.util.Progressable in project hadoop by apache.

From the class FileSystemTestWrapper, method create:

@Override
public FSDataOutputStream create(Path f, EnumSet<CreateFlag> createFlag, CreateOpts... opts) throws AccessControlException, FileAlreadyExistsException, FileNotFoundException, ParentNotDirectoryException, UnsupportedFileSystemException, IOException {
    // Need to translate the FileContext-style options into FileSystem-style
    // Permissions with umask
    CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
    FsPermission umask = FsPermission.getUMask(fs.getConf());
    FsPermission permission = (permOpt != null) ? permOpt.getValue() : FsPermission.getFileDefault().applyUMask(umask);
    permission = permission.applyUMask(umask);
    // Overwrite
    boolean overwrite = createFlag.contains(CreateFlag.OVERWRITE);
    // bufferSize
    int bufferSize = fs.getConf().getInt(CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_KEY, CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_DEFAULT);
    CreateOpts.BufferSize bufOpt = CreateOpts.getOpt(CreateOpts.BufferSize.class, opts);
    bufferSize = (bufOpt != null) ? bufOpt.getValue() : bufferSize;
    // replication
    short replication = fs.getDefaultReplication(f);
    CreateOpts.ReplicationFactor repOpt = CreateOpts.getOpt(CreateOpts.ReplicationFactor.class, opts);
    replication = (repOpt != null) ? repOpt.getValue() : replication;
    // blockSize
    long blockSize = fs.getDefaultBlockSize(f);
    CreateOpts.BlockSize blockOpt = CreateOpts.getOpt(CreateOpts.BlockSize.class, opts);
    blockSize = (blockOpt != null) ? blockOpt.getValue() : blockSize;
    // Progressable
    Progressable progress = null;
    CreateOpts.Progress progressOpt = CreateOpts.getOpt(CreateOpts.Progress.class, opts);
    progress = (progressOpt != null) ? progressOpt.getValue() : progress;
    return fs.create(f, permission, overwrite, bufferSize, replication, blockSize, progress);
}
Also used: CreateOpts(org.apache.hadoop.fs.Options.CreateOpts) Progressable(org.apache.hadoop.util.Progressable) FsPermission(org.apache.hadoop.fs.permission.FsPermission) BlockSize(org.apache.hadoop.fs.Options.CreateOpts.BlockSize)
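
The method above simply translates FileContext-style CreateOpts into the positional arguments of FileSystem.create. Below is a hedged caller sketch, not part of the Hadoop sources: the class name CreateWithProgressSketch, the wrapper parameter, and the /tmp/example path are made up for illustration, and the import of FileSystemTestWrapper assumes the hadoop-common test jar is on the classpath. It shows a Progressable being handed in through CreateOpts.progress so that the translation above forwards it to fs.create.

import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystemTestWrapper;
import org.apache.hadoop.fs.Options.CreateOpts;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Progressable;

public class CreateWithProgressSketch {

    static FSDataOutputStream openWithProgress(FileSystemTestWrapper wrapper) throws Exception {
        // The anonymous Progressable ends up as the last argument of fs.create()
        // via the CreateOpts.Progress branch in the method above.
        Progressable progress = new Progressable() {
            @Override
            public void progress() {
                // invoked periodically while the stream is being written
            }
        };
        return wrapper.create(
                new Path("/tmp/example"),
                EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
                CreateOpts.bufferSize(8192),
                CreateOpts.progress(progress));
    }
}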

Example 2 with Progressable

Use of org.apache.hadoop.util.Progressable in project camel by apache.

From the class HdfsConsumerTest, method testReadStringArrayFile:

@Test
public void testReadStringArrayFile() throws Exception {
    if (!canTest()) {
        return;
    }
    final Path file = new Path(new File("target/test/test-camel-string").getAbsolutePath());
    Configuration conf = new Configuration();
    FileSystem fs1 = FileSystem.get(file.toUri(), conf);
    ArrayFile.Writer writer = new ArrayFile.Writer(conf, fs1, "target/test/test-camel-string1", Text.class, CompressionType.NONE, new Progressable() {

        @Override
        public void progress() {
        }
    });
    Text valueWritable = new Text();
    String value = "CIAO!";
    valueWritable.set(value);
    writer.append(valueWritable);
    writer.close();
    MockEndpoint resultEndpoint = context.getEndpoint("mock:result", MockEndpoint.class);
    resultEndpoint.expectedMessageCount(1);
    context.addRoutes(new RouteBuilder() {

        public void configure() {
            from("hdfs2:localhost/" + file.getParent().toUri() + "?fileSystemType=LOCAL&fileType=ARRAY_FILE&initialDelay=0").to("mock:result");
        }
    });
    context.start();
    resultEndpoint.assertIsSatisfied();
}
Also used: Path(org.apache.hadoop.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) RouteBuilder(org.apache.camel.builder.RouteBuilder) MockEndpoint(org.apache.camel.component.mock.MockEndpoint) ArrayFile(org.apache.hadoop.io.ArrayFile) Text(org.apache.hadoop.io.Text) Progressable(org.apache.hadoop.util.Progressable) FileSystem(org.apache.hadoop.fs.FileSystem) SequenceFile(org.apache.hadoop.io.SequenceFile) File(java.io.File) Writer(org.apache.hadoop.io.SequenceFile.Writer) Test(org.junit.Test)
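
A hedged read-back companion to the writer above (not part of the Camel test): it reopens target/test/test-camel-string1 with ArrayFile.Reader and relies on the documented behaviour that next(Writable) returns null once the file is exhausted; the class name ArrayFileReadBackSketch is made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.io.ArrayFile;
import org.apache.hadoop.io.Text;

public class ArrayFileReadBackSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The writer above targeted the local file system, so read it back the same way.
        LocalFileSystem fs = FileSystem.getLocal(conf);
        ArrayFile.Reader reader = new ArrayFile.Reader(fs, "target/test/test-camel-string1", conf);
        try {
            Text value = new Text();
            while (reader.next(value) != null) {
                System.out.println(value); // prints "CIAO!" for the single entry written above
            }
        } finally {
            reader.close();
        }
    }
}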

Example 3 with Progressable

Use of org.apache.hadoop.util.Progressable in project elephant-bird by twitter.

From the class TestLuceneIndexRecordReader, method testLuceneIndexRecordReader:

private void testLuceneIndexRecordReader(ArrayList<String> queryStrings, ArrayList<Path> indexPaths, ArrayList<ArrayList<ArrayList<Integer>>> indexesQueriesDocIds) throws Exception {
    LuceneIndexInputSplit split = createStrictMock(LuceneIndexInputSplit.class);
    expect(split.getIndexDirs()).andReturn(indexPaths);
    replay(split);
    Configuration conf = new Configuration();
    TaskAttemptContext context = createStrictMock(TaskAttemptContext.class);
    expect(HadoopCompat.getConfiguration(context)).andStubReturn(conf);
    // casting to avoid Hadoop 2 incompatibility
    ((Progressable) context).progress();
    expectLastCall().atLeastOnce();
    replay(context);
    LuceneIndexInputFormat.setQueries(queryStrings, conf);
    LuceneIndexRecordReader<IntWritable> rr = createMockBuilder(MockRecordReader.class).addMockedMethod("openIndex").addMockedMethod("createSearcher").createMock();
    Query[] queries = new Query[queryStrings.size()];
    for (int i = 0; i < queries.length; i++) {
        Query query = createStrictMock(Query.class);
        replay(query);
        queries[i] = query;
        expect(rr.deserializeQuery(queryStrings.get(i))).andReturn(query);
    }
    for (int index = 0; index < indexPaths.size(); index++) {
        IndexReader reader = createStrictMock(IndexReader.class);
        expect(reader.maxDoc()).andStubReturn(4);
        replay(reader);
        expect(rr.openIndex(indexPaths.get(index), conf)).andReturn(reader);
        IndexSearcher searcher = createStrictMock(IndexSearcher.class);
        expect(rr.createSearcher(reader)).andReturn(searcher);
        for (int query = 0; query < queries.length; query++) {
            final ArrayList<Integer> ids = indexesQueriesDocIds.get(index).get(query);
            final Capture<Collector> collectorCapture = new Capture<Collector>();
            expect(searcher.getIndexReader()).andReturn(reader);
            searcher.search(eq(queries[query]), capture(collectorCapture));
            expectLastCall().andAnswer(new IAnswer<Void>() {

                @Override
                public Void answer() throws Throwable {
                    for (int id : ids) {
                        collectorCapture.getValue().collect(id);
                    }
                    return null;
                }
            });
            for (int docId : ids) {
                expect(searcher.doc(docId)).andReturn(docs[docId]);
            }
        }
        replay(searcher);
    }
    replay(rr);
    rr.initialize(split, context);
    float prevProgress = -1;
    for (int index = 0; index < indexesQueriesDocIds.size(); index++) {
        for (int query = 0; query < indexesQueriesDocIds.get(index).size(); query++) {
            for (int docId : indexesQueriesDocIds.get(index).get(query)) {
                assertTrue(rr.nextKeyValue());
                assertEquals(query, rr.getCurrentKey().get());
                assertEquals(docsAndValues.get(docs[docId]), (Integer) rr.getCurrentValue().get());
                float newProgress = rr.getProgress();
                assertTrue(newProgress > prevProgress);
                assertTrue(newProgress <= 1.0);
            }
        }
    }
    assertFalse(rr.nextKeyValue());
    assertFalse(rr.nextKeyValue());
    verifyAll();
}
Also used: LuceneIndexInputSplit(com.twitter.elephantbird.mapreduce.input.LuceneIndexInputFormat.LuceneIndexInputSplit) IndexSearcher(org.apache.lucene.search.IndexSearcher) Configuration(org.apache.hadoop.conf.Configuration) Query(org.apache.lucene.search.Query) TaskAttemptContext(org.apache.hadoop.mapreduce.TaskAttemptContext) Capture(org.easymock.Capture) Progressable(org.apache.hadoop.util.Progressable) IndexReader(org.apache.lucene.index.IndexReader) Collector(org.apache.lucene.search.Collector) IntWritable(org.apache.hadoop.io.IntWritable)
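
The ((Progressable) context).progress() line above is elephant-bird's portability trick: TaskAttemptContext is a class in Hadoop 1 and an interface in Hadoop 2, while Progressable is an interface in both, so calling progress() through the Progressable cast keeps the compiled call site valid against either version and avoids a runtime linkage error. A minimal sketch of the same pattern (the helper class and method names are made up):

import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.util.Progressable;

public final class ProgressCompatSketch {

    private ProgressCompatSketch() {
    }

    // Reports progress through the Progressable interface rather than through
    // TaskAttemptContext directly, so the compiled call site works whether
    // TaskAttemptContext is a class (Hadoop 1) or an interface (Hadoop 2).
    static void reportProgress(TaskAttemptContext context) {
        ((Progressable) context).progress();
    }
}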

Example 4 with Progressable

Use of org.apache.hadoop.util.Progressable in project elephant-bird by twitter.

From the class LuceneIndexCollectAllRecordReader, method search:

/**
 * Applies {@link #docToValue(Document)} to every document
 * found by executing query over searcher
 *
 * @param searcher the index searcher to query
 * @param query the query to run
 * @return a list of values to be emitted as records (one by one) by this record reader
 * @throws IOException
 */
@Override
protected Iterator<T> search(final IndexSearcher searcher, final Query query) throws IOException {
    // grow the bit set if needed
    docIds.set(searcher.getIndexReader().maxDoc());
    // clear it
    docIds.clear();
    searcher.search(query, new Collector() {

        private int docBase;

        @Override
        public void setScorer(Scorer scorer) {
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }

        @Override
        public void collect(int doc) {
            docIds.set(doc + docBase);
        }

        public void setNextReader(AtomicReaderContext context) {
            this.docBase = context.docBase;
        }
    });
    return new AbstractIterator<T>() {

        private int doc = docIds.nextSetBit(0);

        @Override
        protected T computeNext() {
            // casting to avoid Hadoop 2 incompatibility
            ((Progressable) context).progress();
            if (doc < 0) {
                return endOfData();
            }
            try {
                T ret = docToValue(searcher.doc(doc));
                doc = docIds.nextSetBit(doc + 1);
                return ret;
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    };
}
Also used: Progressable(org.apache.hadoop.util.Progressable) Collector(org.apache.lucene.search.Collector) Scorer(org.apache.lucene.search.Scorer) AbstractIterator(com.google.common.collect.AbstractIterator) IOException(java.io.IOException) AtomicReaderContext(org.apache.lucene.index.AtomicReaderContext)
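
The returned iterator builds on Guava's AbstractIterator, whose contract is that computeNext() either produces the next element or calls endOfData() to end iteration; in the code above, progress is reported once per element computed. A small, self-contained illustration of that contract, unrelated to Lucene (the class name is made up):

import com.google.common.collect.AbstractIterator;
import java.util.Iterator;

public class CountDownIteratorSketch {

    // Counts down from start to 1, then signals end-of-data.
    public static Iterator<Integer> countDown(final int start) {
        return new AbstractIterator<Integer>() {
            private int next = start;

            @Override
            protected Integer computeNext() {
                if (next <= 0) {
                    return endOfData(); // tells AbstractIterator the sequence is exhausted
                }
                return next--;
            }
        };
    }

    public static void main(String[] args) {
        for (Iterator<Integer> it = countDown(3); it.hasNext(); ) {
            System.out.println(it.next()); // prints 3, 2, 1
        }
    }
}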

Example 5 with Progressable

Use of org.apache.hadoop.util.Progressable in project parquet-mr by apache.

From the class TestMapredParquetOutputFormat, method testGetHiveRecordWriter:

@SuppressWarnings("unchecked")
@Test
public void testGetHiveRecordWriter() throws IOException {
    Properties tableProps = new Properties();
    tableProps.setProperty("columns", "foo,bar");
    tableProps.setProperty("columns.types", "int:int");
    final Progressable mockProgress = mock(Progressable.class);
    final ParquetOutputFormat<ArrayWritable> outputFormat = (ParquetOutputFormat<ArrayWritable>) mock(ParquetOutputFormat.class);
    JobConf jobConf = new JobConf();
    try {
        new MapredParquetOutputFormat(outputFormat) {

            @Override
            protected ParquetRecordWriterWrapper getParquerRecordWriterWrapper(ParquetOutputFormat<ArrayWritable> realOutputFormat, JobConf jobConf, String finalOutPath, Progressable progress) throws IOException {
                assertEquals(outputFormat, realOutputFormat);
                assertNotNull(jobConf.get(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA));
                assertEquals("/foo", finalOutPath.toString());
                assertEquals(mockProgress, progress);
                throw new RuntimeException("passed tests");
            }
        }.getHiveRecordWriter(jobConf, new Path("/foo"), null, false, tableProps, mockProgress);
        fail("should throw runtime exception.");
    } catch (RuntimeException e) {
        assertEquals("passed tests", e.getMessage());
    }
}
Also used: Path(org.apache.hadoop.fs.Path) ParquetOutputFormat(org.apache.parquet.hadoop.ParquetOutputFormat) ParquetRecordWriterWrapper(org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper) IOException(java.io.IOException) Properties(java.util.Properties) Progressable(org.apache.hadoop.util.Progressable) ArrayWritable(org.apache.hadoop.io.ArrayWritable) JobConf(org.apache.hadoop.mapred.JobConf) Test(org.junit.Test)
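
Since Progressable declares a single void progress() method, a no-op stand-in for tests like the one above can also be written as a lambda rather than a Mockito mock, assuming a Java 8+ source level (the class and constant names below are made up):

import org.apache.hadoop.util.Progressable;

public class NoOpProgressSketch {

    // A do-nothing Progressable; handy when a test only needs to pass the
    // callback through and does not verify interactions on it.
    static final Progressable NO_OP = () -> {
        // intentionally empty
    };
}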

Aggregations

Progressable (org.apache.hadoop.util.Progressable): 12
Path (org.apache.hadoop.fs.Path): 7
Configuration (org.apache.hadoop.conf.Configuration): 6
IOException (java.io.IOException): 5
Test (org.junit.Test): 5
FileSystem (org.apache.hadoop.fs.FileSystem): 4
File (java.io.File): 3
SequenceFile (org.apache.hadoop.io.SequenceFile): 3
Text (org.apache.hadoop.io.Text): 3
JobConf (org.apache.hadoop.mapred.JobConf): 3
Properties (java.util.Properties): 2
RouteBuilder (org.apache.camel.builder.RouteBuilder): 2
MockEndpoint (org.apache.camel.component.mock.MockEndpoint): 2
CreateOpts (org.apache.hadoop.fs.Options.CreateOpts): 2
FsPermission (org.apache.hadoop.fs.permission.FsPermission): 2
ParquetRecordWriterWrapper (org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper): 2
ArrayFile (org.apache.hadoop.io.ArrayFile): 2
TaskAttemptContext (org.apache.hadoop.mapreduce.TaskAttemptContext): 2
Collector (org.apache.lucene.search.Collector): 2
AbstractIterator (com.google.common.collect.AbstractIterator): 1