
Example 1 with UniqueBucketAssigner

Use of org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner in project flink by apache.

From the class CompressWriterFactoryTest, method prepareCompressedFile:

private File prepareCompressedFile(CompressWriterFactory<String> writer, List<String> lines) throws Exception {
    final File outDir = TEMPORARY_FOLDER.newFolder();
    // Build a bulk-format StreamingFileSink that writes every record into the "test" bucket.
    StreamingFileSink<String> sink =
            StreamingFileSink.forBulkFormat(new Path(outDir.toURI()), writer)
                    .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                    .build();
    // Drive the sink through a one-input operator test harness
    // (max parallelism 1, parallelism 1, subtask index 0).
    try (OneInputStreamOperatorTestHarness<String, Object> testHarness =
            new OneInputStreamOperatorTestHarness<>(new StreamSink<>(sink), 1, 1, 0)) {
        testHarness.setup();
        testHarness.open();
        int time = 0;
        for (String line : lines) {
            testHarness.processElement(new StreamRecord<>(line, ++time));
        }
        // Snapshot checkpoint 1 and confirm it, so in-progress part files are finalized.
        testHarness.snapshot(1, ++time);
        testHarness.notifyOfCompletedCheckpoint(1);
    }
    return outDir;
}
Also used : Path(org.apache.flink.core.fs.Path) UniqueBucketAssigner(org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) OneInputStreamOperatorTestHarness(org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness) File(java.io.File)
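All three examples on this page pass UniqueBucketAssigner to withBucketAssigner(...) so that output files land in a single, predictably named bucket instead of the default date-time buckets. As a rough sketch of the contract a bucket assigner fulfills (this illustrates the BucketAssigner interface, not the actual UniqueBucketAssigner source; FixedBucketAssigner is a hypothetical name):

import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

// Hypothetical assigner: routes every record into one fixed bucket directory.
public class FixedBucketAssigner<IN> implements BucketAssigner<IN, String> {

    private final String bucketId;

    public FixedBucketAssigner(String bucketId) {
        this.bucketId = bucketId;
    }

    @Override
    public String getBucketId(IN element, Context context) {
        // Same bucket for every element, regardless of content or timestamp.
        return bucketId;
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        // Bucket ids are plain strings, so Flink's built-in string serializer works.
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}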

Example 2 with UniqueBucketAssigner

Use of org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner in project flink by apache.

From the class OrcBulkWriterITCase, method testOrcBulkWriter:

@Test
public void testOrcBulkWriter() throws Exception {
    final File outDir = TEMPORARY_FOLDER.newFolder();
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Compress ORC stripes with LZ4 via the writer properties.
    final Properties writerProps = new Properties();
    writerProps.setProperty("orc.compress", "LZ4");
    final OrcBulkWriterFactory<Record> factory =
            new OrcBulkWriterFactory<>(new RecordVectorizer(schema), writerProps, new Configuration());
    env.setParallelism(1);
    // FiniteTestSource interleaves emission with completed checkpoints,
    // so checkpointing must be enabled for the job to finish and commit files.
    env.enableCheckpointing(100);
    DataStream<Record> stream =
            env.addSource(new FiniteTestSource<>(testData), TypeInformation.of(Record.class));
    stream.map(str -> str)
            .addSink(
                    StreamingFileSink.forBulkFormat(new Path(outDir.toURI()), factory)
                            .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                            .build());
    env.execute();
    OrcBulkWriterTestUtil.validate(outDir, testData);
}
Also used : Arrays(java.util.Arrays) Properties(java.util.Properties) FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) Test(org.junit.Test) File(java.io.File) DataStream(org.apache.flink.streaming.api.datastream.DataStream) List(java.util.List) UniqueBucketAssigner(org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) Path(org.apache.flink.core.fs.Path) OrcBulkWriterTestUtil(org.apache.flink.orc.util.OrcBulkWriterTestUtil) Configuration(org.apache.hadoop.conf.Configuration) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) TestLogger(org.apache.flink.util.TestLogger) Record(org.apache.flink.orc.data.Record) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) ClassRule(org.junit.ClassRule) RecordVectorizer(org.apache.flink.orc.vector.RecordVectorizer) TemporaryFolder(org.junit.rules.TemporaryFolder) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
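OrcBulkWriterFactory needs a Vectorizer<T> that copies each incoming element into ORC's VectorizedRowBatch; RecordVectorizer above is a test helper that does this for Record. A minimal sketch of such a vectorizer, following the pattern in Flink's ORC documentation (Person and its getters are hypothetical stand-ins for your own type):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

// Hypothetical vectorizer for a schema like "struct<name:string,age:int>".
public class PersonVectorizer extends Vectorizer<Person> {

    public PersonVectorizer(String schema) {
        super(schema);
    }

    @Override
    public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
        BytesColumnVector nameColVector = (BytesColumnVector) batch.cols[0];
        LongColumnVector ageColVector = (LongColumnVector) batch.cols[1];
        // Append one row to the current batch.
        int row = batch.size++;
        nameColVector.setVal(row, element.getName().getBytes(StandardCharsets.UTF_8));
        ageColVector.vector[row] = element.getAge();
    }
}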

Example 3 with UniqueBucketAssigner

Use of org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner in project flink by apache.

From the class CompressionFactoryITCase, method testWriteCompressedFile:

@Test
public void testWriteCompressedFile() throws Exception {
    final File folder = TEMPORARY_FOLDER.newFolder();
    final Path testPath = Path.fromLocalFile(folder);
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    env.enableCheckpointing(100);
    DataStream<String> stream =
            env.addSource(new FiniteTestSource<>(testData), TypeInformation.of(String.class));
    // Compress each record with the configured Hadoop codec before it hits disk.
    stream.map(str -> str)
            .addSink(
                    StreamingFileSink.forBulkFormat(
                                    testPath,
                                    CompressWriters.forExtractor(new DefaultExtractor<String>())
                                            .withHadoopCompression(TEST_CODEC_NAME))
                            .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                            .build());
    env.execute();
    validateResults(folder, testData,
            new CompressionCodecFactory(configuration).getCodecByName(TEST_CODEC_NAME));
}
Also used : Path(org.apache.flink.core.fs.Path) DefaultExtractor(org.apache.flink.formats.compress.extractor.DefaultExtractor) Arrays(java.util.Arrays) FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec) UniqueBucketAssigner(org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) Configuration(org.apache.hadoop.conf.Configuration) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) Timeout(org.junit.rules.Timeout) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) AbstractTestBase(org.apache.flink.test.util.AbstractTestBase) CompressionCodecFactory(org.apache.hadoop.io.compress.CompressionCodecFactory) Assert.assertNotNull(org.junit.Assert.assertNotNull) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) FileInputStream(java.io.FileInputStream) InputStreamReader(java.io.InputStreamReader) Collectors(java.util.stream.Collectors) File(java.io.File) DataStream(org.apache.flink.streaming.api.datastream.DataStream) List(java.util.List) Rule(org.junit.Rule) BufferedReader(java.io.BufferedReader) Assert.assertEquals(org.junit.Assert.assertEquals) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
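CompressWriters.forExtractor(...) takes an Extractor that turns each element into the raw bytes handed to the compression codec; DefaultExtractor writes the element's string form followed by a line separator. Where that framing does not fit, a custom extractor is small to write. A minimal sketch, assuming the Extractor interface exposes a single extract(T) method returning byte[]:

import java.nio.charset.StandardCharsets;
import org.apache.flink.formats.compress.extractor.Extractor;

// Hypothetical extractor: one newline-terminated UTF-8 line per element.
public class LineExtractor<T> implements Extractor<T> {

    @Override
    public byte[] extract(T element) {
        // These bytes are fed to the Hadoop codec exactly as returned.
        return (element + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
    }
}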

Aggregations

File (java.io.File): 3
Path (org.apache.flink.core.fs.Path): 3
UniqueBucketAssigner (org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner): 3
Arrays (java.util.Arrays): 2
List (java.util.List): 2
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation): 2
DataStream (org.apache.flink.streaming.api.datastream.DataStream): 2
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 2
StreamingFileSink (org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink): 2
FiniteTestSource (org.apache.flink.streaming.util.FiniteTestSource): 2
Configuration (org.apache.hadoop.conf.Configuration): 2
Test (org.junit.Test): 2
BufferedReader (java.io.BufferedReader): 1
FileInputStream (java.io.FileInputStream): 1
InputStreamReader (java.io.InputStreamReader): 1
Properties (java.util.Properties): 1
Collectors (java.util.stream.Collectors): 1
DefaultExtractor (org.apache.flink.formats.compress.extractor.DefaultExtractor): 1
Record (org.apache.flink.orc.data.Record): 1
OrcBulkWriterTestUtil (org.apache.flink.orc.util.OrcBulkWriterTestUtil): 1