
Example 1 with DefaultExtractor

Use of org.apache.flink.formats.compress.extractor.DefaultExtractor in the project flink by apache.

From the class CompressionFactoryITCase, method testWriteCompressedFile:

@Test
public void testWriteCompressedFile() throws Exception {
    final File folder = TEMPORARY_FOLDER.newFolder();
    final Path testPath = Path.fromLocalFile(folder);
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    env.enableCheckpointing(100);
    DataStream<String> stream =
            env.addSource(new FiniteTestSource<>(testData), TypeInformation.of(String.class));

    // Write through a StreamingFileSink whose bulk format applies the configured Hadoop codec.
    stream.map(str -> str)
            .addSink(
                    StreamingFileSink.forBulkFormat(
                                    testPath,
                                    CompressWriters.forExtractor(new DefaultExtractor<String>())
                                            .withHadoopCompression(TEST_CODEC_NAME))
                            .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                            .build());
    env.execute();

    validateResults(folder, testData,
            new CompressionCodecFactory(configuration).getCodecByName(TEST_CODEC_NAME));
}
Also used: Path (org.apache.flink.core.fs.Path), DefaultExtractor (org.apache.flink.formats.compress.extractor.DefaultExtractor), Arrays (java.util.Arrays), FiniteTestSource (org.apache.flink.streaming.util.FiniteTestSource), CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec), UniqueBucketAssigner (org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner), Configuration (org.apache.hadoop.conf.Configuration), StreamingFileSink (org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink), Timeout (org.junit.rules.Timeout), TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation), AbstractTestBase (org.apache.flink.test.util.AbstractTestBase), CompressionCodecFactory (org.apache.hadoop.io.compress.CompressionCodecFactory), Assert.assertNotNull (org.junit.Assert.assertNotNull), Assert.assertTrue (org.junit.Assert.assertTrue), Test (org.junit.Test), FileInputStream (java.io.FileInputStream), InputStreamReader (java.io.InputStreamReader), Collectors (java.util.stream.Collectors), File (java.io.File), DataStream (org.apache.flink.streaming.api.datastream.DataStream), List (java.util.List), Rule (org.junit.Rule), BufferedReader (java.io.BufferedReader), Assert.assertEquals (org.junit.Assert.assertEquals), StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
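
The helper validateResults, called at the end of the test, is not part of this excerpt. Below is a minimal sketch of what such a check might look like, assuming the sink writes one subdirectory per bucket under the target folder and that the JUnit asserts (assertNotNull, assertEquals) are statically imported as in the class above; the method name and arguments come from the call site, the body is illustrative only.

private static void validateResults(File folder, List<String> expected, CompressionCodec codec)
        throws Exception {
    // One subdirectory per bucket is expected (the UniqueBucketAssigner above used "test").
    File[] buckets = folder.listFiles();
    assertNotNull(buckets);
    for (File bucket : buckets) {
        File[] partFiles = bucket.listFiles();
        assertNotNull(partFiles);
        for (File partFile : partFiles) {
            // Decompress with the same Hadoop codec used for writing and read the lines back.
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    codec.createInputStream(new FileInputStream(partFile))))) {
                assertEquals(expected, reader.lines().collect(Collectors.toList()));
            }
        }
    }
}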

Example 2 with DefaultExtractor

Use of org.apache.flink.formats.compress.extractor.DefaultExtractor in the project flink by apache.

From the class CompressWriterFactoryTest, method testCompressByName:

private void testCompressByName(String codec, Configuration conf) throws Exception {
    CompressWriterFactory<String> writer = CompressWriters.forExtractor(new DefaultExtractor<String>()).withHadoopCompression(codec, conf);
    List<String> lines = Arrays.asList("line1", "line2", "line3");
    File directory = prepareCompressedFile(writer, lines);
    validateResults(directory, lines, new CompressionCodecFactory(conf).getCodecByName(codec));
}
Also used: CompressionCodecFactory (org.apache.hadoop.io.compress.CompressionCodecFactory), DefaultExtractor (org.apache.flink.formats.compress.extractor.DefaultExtractor), File (java.io.File)
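
The helper prepareCompressedFile is likewise not shown in this excerpt. Here is a minimal sketch of how such a helper might exercise the factory, assuming CompressWriterFactory is used through Flink's BulkWriter.Factory contract (create(FSDataOutputStream) returning a BulkWriter<String>) and assuming imports of org.apache.flink.api.common.serialization.BulkWriter, org.apache.flink.core.fs.FSDataOutputStream, org.apache.flink.core.fs.FileSystem, and org.apache.flink.core.fs.Path; the helper name comes from the call site, while the directory layout and file name are illustrative only.

private static File prepareCompressedFile(CompressWriterFactory<String> writer, List<String> lines)
        throws Exception {
    // Write a single compressed part file into a fresh temporary directory.
    File directory = java.nio.file.Files.createTempDirectory("compress-writer-test").toFile();
    Path outputPath = Path.fromLocalFile(new File(directory, "part-0"));
    try (FSDataOutputStream out =
            FileSystem.getLocalFileSystem().create(outputPath, FileSystem.WriteMode.OVERWRITE)) {
        // CompressWriterFactory acts as a BulkWriter.Factory: write all elements, then finish().
        BulkWriter<String> bulkWriter = writer.create(out);
        for (String line : lines) {
            bulkWriter.addElement(line);
        }
        bulkWriter.finish();
    }
    return directory;
}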

Aggregations

File (java.io.File): 2
DefaultExtractor (org.apache.flink.formats.compress.extractor.DefaultExtractor): 2
CompressionCodecFactory (org.apache.hadoop.io.compress.CompressionCodecFactory): 2
BufferedReader (java.io.BufferedReader): 1
FileInputStream (java.io.FileInputStream): 1
InputStreamReader (java.io.InputStreamReader): 1
Arrays (java.util.Arrays): 1
List (java.util.List): 1
Collectors (java.util.stream.Collectors): 1
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation): 1
Path (org.apache.flink.core.fs.Path): 1
DataStream (org.apache.flink.streaming.api.datastream.DataStream): 1
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 1
StreamingFileSink (org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink): 1
UniqueBucketAssigner (org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner): 1
FiniteTestSource (org.apache.flink.streaming.util.FiniteTestSource): 1
AbstractTestBase (org.apache.flink.test.util.AbstractTestBase): 1
Configuration (org.apache.hadoop.conf.Configuration): 1
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec): 1
Assert.assertEquals (org.junit.Assert.assertEquals): 1