
Example 6 with TextLineInputFormat

Use of org.apache.flink.connector.file.src.reader.TextLineInputFormat in the Apache Flink project.

In class FileSourceTextLinesITCase, method testContinuousTextFileSource:

private void testContinuousTextFileSource(FailoverType type) throws Exception {
    final File testDir = TMP_FOLDER.newFolder();
    final FileSource<String> source =
            FileSource.forRecordStreamFormat(new TextLineInputFormat(), Path.fromLocalFile(testDir))
                    .monitorContinuously(Duration.ofMillis(5))
                    .build();
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(PARALLELISM);
    env.enableCheckpointing(10L);
    final DataStream<String> stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
    final ClientAndIterator<String> client = DataStreamUtils.collectWithClient(stream, "Continuous TextFiles Monitoring Test");
    final JobID jobId = client.client.getJobID();
    // write one file, execute, and wait for its result
    // that way we know that the application was running and the source has
    // done its first chunk of work already
    final int numLinesFirst = LINES_PER_FILE[0].length;
    final int numLinesAfter = LINES.length - numLinesFirst;
    writeFile(testDir, 0);
    final List<String> result1 = DataStreamUtils.collectRecordsFromUnboundedStream(client, numLinesFirst);
    // write the remaining files over time, after that collect the final result
    for (int i = 1; i < LINES_PER_FILE.length; i++) {
        Thread.sleep(10);
        writeFile(testDir, i);
        final boolean failAfterHalfOfInput = i == LINES_PER_FILE.length / 2;
        if (failAfterHalfOfInput) {
            triggerFailover(type, jobId, () -> {}, miniClusterResource.getMiniCluster());
        }
    }
    final List<String> result2 = DataStreamUtils.collectRecordsFromUnboundedStream(client, numLinesAfter);
    // shut down the job, now that we have all the results we expected.
    client.client.cancel().get();
    result1.addAll(result2);
    verifyResult(result1);
}
Also used: TextLineInputFormat (org.apache.flink.connector.file.src.reader.TextLineInputFormat), StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment), File (java.io.File), JobID (org.apache.flink.api.common.JobID)

Example 7 with TextLineInputFormat

Use of org.apache.flink.connector.file.src.reader.TextLineInputFormat in the Apache Flink project.

In class LimitableBulkFormatTest, method test:

@Test
public void test() throws IOException {
    // Wrap the text-line format so that reading stops after 22 records.
    BulkFormat<String, FileSourceSplit> format =
            LimitableBulkFormat.create(new StreamFormatAdapter<>(new TextLineInputFormat()), 22L);
    // Create a reader over a single split that covers the whole test file.
    BulkFormat.Reader<String> reader =
            format.createReader(
                    new Configuration(),
                    new FileSourceSplit("id", new Path(file.toURI()), 0, file.length(), file.lastModified(), file.length()));
    AtomicInteger i = new AtomicInteger(0);
    Utils.forEachRemaining(reader, s -> i.incrementAndGet());
    Assert.assertEquals(22, i.get());
}
Also used: Path (org.apache.flink.core.fs.Path), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), TextLineInputFormat (org.apache.flink.connector.file.src.reader.TextLineInputFormat), Configuration (org.apache.flink.configuration.Configuration), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Test (org.junit.Test)
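
The test above checks that a limit of 22 caps the number of records read from a larger file. As a rough illustration of that idea only, here is a minimal sketch using nothing but the Java standard library. It is not Flink's implementation: the class LimitedLineReaderSketch and the helper readLimited are hypothetical names invented for this example, and the input is generated in memory instead of read from a file split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names exist in Flink.
final class LimitedLineReaderSketch {

    /** Reads lines until either the input is exhausted or `limit` lines were collected. */
    public static List<String> readLimited(BufferedReader reader, long limit) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        // The limit is checked first, so no line is consumed once the limit is reached.
        while (lines.size() < limit && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build 100 lines of input in memory (assumed sample data).
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            input.append("line-").append(i).append('\n');
        }
        try (BufferedReader reader = new BufferedReader(new StringReader(input.toString()))) {
            List<String> lines = readLimited(reader, 22L);
            System.out.println(lines.size()); // prints 22
        }
    }
}
```

If the input holds fewer lines than the limit, the helper simply returns everything it could read.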

Aggregations

TextLineInputFormat (org.apache.flink.connector.file.src.reader.TextLineInputFormat): 7
StreamExecutionEnvironment (org.apache.flink.streaming.api.environment.StreamExecutionEnvironment): 5
FileSource (org.apache.flink.connector.file.src.FileSource): 3
CLI (org.apache.flink.streaming.examples.wordcount.util.CLI): 3
File (java.io.File): 2
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 2
JobID (org.apache.flink.api.common.JobID): 2
Tuple2 (org.apache.flink.api.java.tuple.Tuple2): 2
Configuration (org.apache.flink.configuration.Configuration): 2
FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit): 2
BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat): 2
Path (org.apache.flink.core.fs.Path): 2
Test (org.junit.Test): 2
Duration (java.time.Duration): 1
ArrayList (java.util.ArrayList): 1
WatermarkStrategy (org.apache.flink.api.common.eventtime.WatermarkStrategy): 1
SimpleStringEncoder (org.apache.flink.api.common.serialization.SimpleStringEncoder): 1
Tuple4 (org.apache.flink.api.java.tuple.Tuple4): 1
MemorySize (org.apache.flink.configuration.MemorySize): 1
FileSink (org.apache.flink.connector.file.sink.FileSink): 1