Example 1 with RecordVectorizer

Use of org.apache.flink.orc.vector.RecordVectorizer in the Apache Flink project.

Class OrcBulkWriterITCase, method testOrcBulkWriter.

@Test
public void testOrcBulkWriter() throws Exception {
    // schema, testData and TEMPORARY_FOLDER are defined elsewhere in the test class.
    final File outDir = TEMPORARY_FOLDER.newFolder();
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    final Properties writerProps = new Properties();
    writerProps.setProperty("orc.compress", "LZ4");
    final OrcBulkWriterFactory<Record> factory =
            new OrcBulkWriterFactory<>(new RecordVectorizer(schema), writerProps, new Configuration());
    env.setParallelism(1);
    // Bulk formats roll part files on checkpoint, so checkpointing must be enabled.
    env.enableCheckpointing(100);
    DataStream<Record> stream =
            env.addSource(new FiniteTestSource<>(testData), TypeInformation.of(Record.class));
    stream.map(str -> str)
            .addSink(
                    StreamingFileSink.forBulkFormat(new Path(outDir.toURI()), factory)
                            .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                            .build());
    env.execute();
    // Check the written ORC output against the input.
    OrcBulkWriterTestUtil.validate(outDir, testData);
}
Also used : Arrays(java.util.Arrays) Properties(java.util.Properties) FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) Test(org.junit.Test) File(java.io.File) DataStream(org.apache.flink.streaming.api.datastream.DataStream) List(java.util.List) UniqueBucketAssigner(org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) Path(org.apache.flink.core.fs.Path) OrcBulkWriterTestUtil(org.apache.flink.orc.util.OrcBulkWriterTestUtil) Configuration(org.apache.hadoop.conf.Configuration) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) TestLogger(org.apache.flink.util.TestLogger) Record(org.apache.flink.orc.data.Record) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) ClassRule(org.junit.ClassRule) RecordVectorizer(org.apache.flink.orc.vector.RecordVectorizer) TemporaryFolder(org.junit.rules.TemporaryFolder) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)

Example 2 with RecordVectorizer

Use of org.apache.flink.orc.vector.RecordVectorizer in the Apache Flink project.

Class OrcBulkWriterFactoryTest, method testNotOverrideInMemoryManager.

@Test
public void testNotOverrideInMemoryManager() throws IOException {
    TestMemoryManager memoryManager = new TestMemoryManager();
    OrcBulkWriterFactory<Record> factory =
            new TestOrcBulkWriterFactory<>(
                    new RecordVectorizer("struct<_col0:string,_col1:int>"), memoryManager);
    // Create two writers from the same factory, each backed by its own temporary file.
    factory.create(new LocalDataOutputStream(temporaryFolder.newFile()));
    factory.create(new LocalDataOutputStream(temporaryFolder.newFile()));
    // Each writer must be registered under a distinct path, so the second does not replace the first.
    List<Path> addedWriterPath = memoryManager.getAddedWriterPath();
    assertEquals(2, addedWriterPath.size());
    assertNotEquals(addedWriterPath.get(0), addedWriterPath.get(1));
}
Also used : LocalDataOutputStream(org.apache.flink.core.fs.local.LocalDataOutputStream) Path(org.apache.hadoop.fs.Path) RecordVectorizer(org.apache.flink.orc.vector.RecordVectorizer) Record(org.apache.flink.orc.data.Record) Test(org.junit.Test)
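
The schema string "struct<_col0:string,_col1:int>" in Example 2 describes one string column and one int column. As a rough illustration of what a vectorizer for that schema has to do, the plain-Java sketch below copies each record into per-column arrays and advances a row counter. It does not use the Flink or ORC classes: SketchRecord, SketchBatch and VectorizeSketch are hypothetical stand-ins, and the name/age fields are an assumption about the shape of the test Record. ORC's own batch type holds strings as bytes and int values in a long-typed column, which the stand-in mirrors.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the test POJO; the real Record class may differ.
final class SketchRecord {
    final String name;
    final int age;

    SketchRecord(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Hypothetical stand-in for a columnar row batch: one array per column plus a row count.
final class SketchBatch {
    final byte[][] col0; // string column, stored as UTF-8 bytes
    final long[] col1;   // int column, widened to long
    int size;            // number of rows filled so far

    SketchBatch(int capacity) {
        this.col0 = new byte[capacity][];
        this.col1 = new long[capacity];
    }
}

final class VectorizeSketch {
    // Appends one record to the batch by writing each field into its column.
    static void vectorize(SketchRecord element, SketchBatch batch) {
        if (batch.size == batch.col0.length) {
            throw new IllegalStateException("batch is full");
        }
        int row = batch.size++;
        batch.col0[row] = element.name.getBytes(StandardCharsets.UTF_8);
        batch.col1[row] = element.age;
    }

    public static void main(String[] args) {
        SketchBatch batch = new SketchBatch(4);
        vectorize(new SketchRecord("Sam", 30), batch);
        vectorize(new SketchRecord("Ana", 41), batch);
        String firstName = new String(batch.col0[0], StandardCharsets.UTF_8);
        System.out.println(batch.size + " rows, first name: " + firstName);
    }
}
```

The point of the column-per-array layout is that a writer can flush a whole batch at once instead of serializing records one by one; the real vectorizer passed to OrcBulkWriterFactory plays that role for the ORC writer.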

Example 3 with RecordVectorizer

Use of org.apache.flink.orc.vector.RecordVectorizer in the Apache Flink project.

Class OrcBulkWriterTest, method testOrcBulkWriter.

@Test
public void testOrcBulkWriter() throws Exception {
    // schema, input and TEMPORARY_FOLDER are defined elsewhere in the test class.
    final File outDir = TEMPORARY_FOLDER.newFolder();
    final Properties writerProps = new Properties();
    writerProps.setProperty("orc.compress", "LZ4");
    final OrcBulkWriterFactory<Record> writer =
            new OrcBulkWriterFactory<>(new RecordVectorizer(schema), writerProps, new Configuration());
    StreamingFileSink<Record> sink =
            StreamingFileSink.forBulkFormat(new Path(outDir.toURI()), writer)
                    .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                    .withBucketCheckInterval(10000)
                    .build();
    // Drive the sink through an operator test harness instead of running a full job.
    try (OneInputStreamOperatorTestHarness<Record, Object> testHarness =
            new OneInputStreamOperatorTestHarness<>(new StreamSink<>(sink), 1, 1, 0)) {
        testHarness.setup();
        testHarness.open();
        int time = 0;
        for (final Record record : input) {
            testHarness.processElement(record, ++time);
        }
        // Take checkpoint 1 and confirm it so the in-progress part files are finalized.
        testHarness.snapshot(1, ++time);
        testHarness.notifyOfCompletedCheckpoint(1);
        OrcBulkWriterTestUtil.validate(outDir, input);
    }
}
Also used : Path(org.apache.flink.core.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) OneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness) Properties(java.util.Properties) RecordVectorizer(org.apache.flink.orc.vector.RecordVectorizer) Record(org.apache.flink.orc.data.Record) File(java.io.File) Test(org.junit.Test)

Aggregations

Record (org.apache.flink.orc.data.Record): 3
RecordVectorizer (org.apache.flink.orc.vector.RecordVectorizer): 3
Test (org.junit.Test): 3
File (java.io.File): 2
Properties (java.util.Properties): 2
Path (org.apache.flink.core.fs.Path): 2
Configuration (org.apache.hadoop.conf.Configuration): 2
Arrays (java.util.Arrays): 1
List (java.util.List): 1
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation): 1
LocalDataOutputStream (org.apache.flink.core.fs.local.LocalDataOutputStream): 1
OrcBulkWriterTestUtil (org.apache.flink.orc.util.OrcBulkWriterTestUtil): 1
DataStream (org.apache.flink.streaming.api.datastream.DataStream): 1
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 1
StreamingFileSink (org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink): 1
UniqueBucketAssigner (org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner): 1
FiniteTestSource (org.apache.flink.streaming.util.FiniteTestSource): 1
OneInputStreamOperatorTestHarness (org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness): 1
TestLogger (org.apache.flink.util.TestLogger): 1
Path (org.apache.hadoop.fs.Path): 1