
Example 1 with FiniteTestSource

Use of org.apache.flink.streaming.util.FiniteTestSource in project flink by apache.

From class OrcBulkWriterITCase, method testOrcBulkWriter:

@Test
public void testOrcBulkWriter() throws Exception {
    final File outDir = TEMPORARY_FOLDER.newFolder();
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    final Properties writerProps = new Properties();
    writerProps.setProperty("orc.compress", "LZ4");
    final OrcBulkWriterFactory<Record> factory = new OrcBulkWriterFactory<>(new RecordVectorizer(schema), writerProps, new Configuration());
    env.setParallelism(1);
    env.enableCheckpointing(100);
    DataStream<Record> stream = env.addSource(new FiniteTestSource<>(testData), TypeInformation.of(Record.class));
    stream.map(str -> str)
            .addSink(
                    StreamingFileSink.forBulkFormat(new Path(outDir.toURI()), factory)
                            .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                            .build());
    env.execute();
    OrcBulkWriterTestUtil.validate(outDir, testData);
}
Also used : Arrays(java.util.Arrays) Properties(java.util.Properties) FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) Test(org.junit.Test) File(java.io.File) DataStream(org.apache.flink.streaming.api.datastream.DataStream) List(java.util.List) UniqueBucketAssigner(org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) Path(org.apache.flink.core.fs.Path) OrcBulkWriterTestUtil(org.apache.flink.orc.util.OrcBulkWriterTestUtil) Configuration(org.apache.hadoop.conf.Configuration) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) TestLogger(org.apache.flink.util.TestLogger) Record(org.apache.flink.orc.data.Record) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) ClassRule(org.junit.ClassRule) RecordVectorizer(org.apache.flink.orc.vector.RecordVectorizer) TemporaryFolder(org.junit.rules.TemporaryFolder) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)
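The tests in this listing all enable checkpointing (`env.enableCheckpointing(100)`) because FiniteTestSource is checkpoint-driven: it emits its data set, waits for checkpoints to complete, re-emits the same data set, and only finishes after further checkpoints (the exact checkpoint counts differ between Flink versions). The sketch below is a hypothetical plain-Java model class, `FiniteSourceModel`, standing in for the Flink internals; its only purpose is to illustrate the "every record arrives twice" property that downstream sinks observe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (NOT the real Flink source) of FiniteTestSource's
// emission pattern: emit the data, wait for a checkpoint, emit the data
// again, then finish after more checkpoints complete.
public class FiniteSourceModel {

    public static <T> List<T> emittedRecords(List<T> data) {
        List<T> out = new ArrayList<>();
        out.addAll(data); // first pass; the real source then blocks until a checkpoint completes
        out.addAll(data); // second pass after the checkpoint; then the source finishes
        return out;
    }

    public static void main(String[] args) {
        // downstream sinks in these tests see each test record twice
        System.out.println(emittedRecords(List.of(1, 2, 3)));
    }
}
```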

Example 2 with FiniteTestSource

Use of org.apache.flink.streaming.util.FiniteTestSource in project flink by apache.

From class SinkITCase, method writerAndGlobalCommitterExecuteInStreamingMode:

@Ignore("FLINK-25726")
@Test
public void writerAndGlobalCommitterExecuteInStreamingMode() throws Exception {
    final StreamExecutionEnvironment env = buildStreamEnv();
    final FiniteTestSource<Integer> source = new FiniteTestSource<>(GLOBAL_COMMIT_QUEUE_RECEIVE_ALL_DATA, SOURCE_DATA);
    env.addSource(source, IntegerTypeInfo.INT_TYPE_INFO)
            .sinkTo(
                    TestSink.newBuilder()
                            .setCommittableSerializer(TestSink.StringCommittableSerializer.INSTANCE)
                            .setGlobalCommitter(
                                    (Supplier<Queue<String>> & Serializable) () -> GLOBAL_COMMIT_QUEUE)
                            .build());
    env.execute();
    // TODO: At present, for a bounded scenario, the occurrence of the final
    // checkpoint is not a deterministic event, so we do not verify it here.
    // Once the final checkpoint becomes reliable, the "end of input"
    // verification should be restored.
    GLOBAL_COMMIT_QUEUE.remove(END_OF_INPUT_STR);
    assertThat(getSplittedGlobalCommittedData(), containsInAnyOrder(EXPECTED_GLOBAL_COMMITTED_DATA_IN_STREAMING_MODE.toArray()));
}
Also used : FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) Serializable(java.io.Serializable) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) Supplier(java.util.function.Supplier) BooleanSupplier(java.util.function.BooleanSupplier) Ignore(org.junit.Ignore) Test(org.junit.Test)

Example 3 with FiniteTestSource

Use of org.apache.flink.streaming.util.FiniteTestSource in project flink by apache.

From class SinkITCase, method writerAndCommitterExecuteInStreamingMode:

@Test
public void writerAndCommitterExecuteInStreamingMode() throws Exception {
    final StreamExecutionEnvironment env = buildStreamEnv();
    final FiniteTestSource<Integer> source = new FiniteTestSource<>(COMMIT_QUEUE_RECEIVE_ALL_DATA, SOURCE_DATA);
    env.addSource(source, IntegerTypeInfo.INT_TYPE_INFO)
            .sinkTo(
                    TestSink.newBuilder()
                            .setDefaultCommitter((Supplier<Queue<String>> & Serializable) () -> COMMIT_QUEUE)
                            .build());
    env.execute();
    assertThat(COMMIT_QUEUE, containsInAnyOrder(EXPECTED_COMMITTED_DATA_IN_STREAMING_MODE.toArray()));
}
Also used : FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) Serializable(java.io.Serializable) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) Supplier(java.util.function.Supplier) BooleanSupplier(java.util.function.BooleanSupplier) Test(org.junit.Test)
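The SinkITCase examples cast their lambdas to `(Supplier<Queue<String>> & Serializable)`. This intersection cast asks the compiler to generate a lambda that also implements `Serializable`, which Flink needs in order to ship the committer factory to task managers; a plain `Supplier` lambda is not serializable. The following self-contained sketch (plain JDK, no Flink; the class and method names are illustrative, not from the source) demonstrates the difference:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;

public class SerializableLambdaDemo {

    // The intersection cast makes the compiler emit a serializable lambda;
    // without it, Java serialization rejects the lambda instance.
    static Supplier<Queue<String>> serializableSupplier() {
        return (Supplier<Queue<String>> & Serializable) ArrayDeque::new;
    }

    // Attempts to write the object with standard Java serialization; a plain
    // (non-cast) lambda fails with NotSerializableException (an IOException).
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Supplier<Queue<String>> plain = ArrayDeque::new;
        System.out.println("plain lambda serializable? " + canSerialize(plain));
        System.out.println("cast lambda serializable?  " + canSerialize(serializableSupplier()));
    }
}
```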

Example 4 with FiniteTestSource

Use of org.apache.flink.streaming.util.FiniteTestSource in project flink by apache.

From class CompressionFactoryITCase, method testWriteCompressedFile:

@Test
public void testWriteCompressedFile() throws Exception {
    final File folder = TEMPORARY_FOLDER.newFolder();
    final Path testPath = Path.fromLocalFile(folder);
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    env.enableCheckpointing(100);
    DataStream<String> stream = env.addSource(new FiniteTestSource<>(testData), TypeInformation.of(String.class));
    stream.map(str -> str)
            .addSink(
                    StreamingFileSink.forBulkFormat(
                                    testPath,
                                    CompressWriters.forExtractor(new DefaultExtractor<String>())
                                            .withHadoopCompression(TEST_CODEC_NAME))
                            .withBucketAssigner(new UniqueBucketAssigner<>("test"))
                            .build());
    env.execute();
    validateResults(folder, testData, new CompressionCodecFactory(configuration).getCodecByName(TEST_CODEC_NAME));
}
Also used : Path(org.apache.flink.core.fs.Path) DefaultExtractor(org.apache.flink.formats.compress.extractor.DefaultExtractor) Arrays(java.util.Arrays) FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) CompressionCodec(org.apache.hadoop.io.compress.CompressionCodec) UniqueBucketAssigner(org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) Configuration(org.apache.hadoop.conf.Configuration) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) Timeout(org.junit.rules.Timeout) TypeInformation(org.apache.flink.api.common.typeinfo.TypeInformation) AbstractTestBase(org.apache.flink.test.util.AbstractTestBase) CompressionCodecFactory(org.apache.hadoop.io.compress.CompressionCodecFactory) Assert.assertNotNull(org.junit.Assert.assertNotNull) Assert.assertTrue(org.junit.Assert.assertTrue) Test(org.junit.Test) FileInputStream(java.io.FileInputStream) InputStreamReader(java.io.InputStreamReader) Collectors(java.util.stream.Collectors) File(java.io.File) DataStream(org.apache.flink.streaming.api.datastream.DataStream) List(java.util.List) Rule(org.junit.Rule) BufferedReader(java.io.BufferedReader) Assert.assertEquals(org.junit.Assert.assertEquals) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment)

Example 5 with FiniteTestSource

Use of org.apache.flink.streaming.util.FiniteTestSource in project flink by apache.

From class SinkITCase, method writerAndCommitterAndGlobalCommitterExecuteInStreamingMode:

@Ignore("FLINK-25726")
@Test
public void writerAndCommitterAndGlobalCommitterExecuteInStreamingMode() throws Exception {
    final StreamExecutionEnvironment env = buildStreamEnv();
    final FiniteTestSource<Integer> source = new FiniteTestSource<>(BOTH_QUEUE_RECEIVE_ALL_DATA, SOURCE_DATA);
    env.addSource(source, IntegerTypeInfo.INT_TYPE_INFO)
            .sinkTo(
                    TestSink.newBuilder()
                            .setDefaultCommitter((Supplier<Queue<String>> & Serializable) () -> COMMIT_QUEUE)
                            .setGlobalCommitter((Supplier<Queue<String>> & Serializable) () -> GLOBAL_COMMIT_QUEUE)
                            .build());
    env.execute();
    // TODO: At present, for a bounded scenario, the occurrence of the final
    // checkpoint is not a deterministic event, so we do not verify it here.
    // Once the final checkpoint becomes reliable, the "end of input"
    // verification should be restored.
    GLOBAL_COMMIT_QUEUE.remove(END_OF_INPUT_STR);
    assertThat(COMMIT_QUEUE, containsInAnyOrder(EXPECTED_COMMITTED_DATA_IN_STREAMING_MODE.toArray()));
    assertThat(getSplittedGlobalCommittedData(), containsInAnyOrder(EXPECTED_GLOBAL_COMMITTED_DATA_IN_STREAMING_MODE.toArray()));
}
Also used : FiniteTestSource(org.apache.flink.streaming.util.FiniteTestSource) Serializable(java.io.Serializable) StreamExecutionEnvironment(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) Supplier(java.util.function.Supplier) BooleanSupplier(java.util.function.BooleanSupplier) Queue(java.util.Queue) ConcurrentLinkedQueue(java.util.concurrent.ConcurrentLinkedQueue) Ignore(org.junit.Ignore) Test(org.junit.Test)

Aggregations

StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment) 5
FiniteTestSource (org.apache.flink.streaming.util.FiniteTestSource) 5
Test (org.junit.Test) 5
Serializable (java.io.Serializable) 3
BooleanSupplier (java.util.function.BooleanSupplier) 3
Supplier (java.util.function.Supplier) 3
File (java.io.File) 2
Arrays (java.util.Arrays) 2
List (java.util.List) 2
TypeInformation (org.apache.flink.api.common.typeinfo.TypeInformation) 2
Path (org.apache.flink.core.fs.Path) 2
DataStream (org.apache.flink.streaming.api.datastream.DataStream) 2
StreamingFileSink (org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) 2
UniqueBucketAssigner (org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.UniqueBucketAssigner) 2
Configuration (org.apache.hadoop.conf.Configuration) 2
Ignore (org.junit.Ignore) 2
BufferedReader (java.io.BufferedReader) 1
FileInputStream (java.io.FileInputStream) 1
InputStreamReader (java.io.InputStreamReader) 1
Properties (java.util.Properties) 1