Example 1 with BulkFormat

Use of org.apache.flink.connector.file.src.reader.BulkFormat in project flink by apache.

From class AvroBulkFormatTest, method testRestoreReader.

@Test
public void testRestoreReader() throws IOException {
    AvroBulkFormatTestUtils.TestingAvroBulkFormat bulkFormat =
            new AvroBulkFormatTestUtils.TestingAvroBulkFormat();
    long splitLength = tmpFile.length() / 3;
    String splitId = UUID.randomUUID().toString();

    // read the tail of the file from scratch and remember the offset of the
    // first batch
    FileSourceSplit split = new FileSourceSplit(
            splitId, new Path(tmpFile.toString()), splitLength * 2, tmpFile.length());
    BulkFormat.Reader<RowData> reader = bulkFormat.createReader(new Configuration(), split);
    long offset1 = assertBatch(reader, new BatchInfo(3, 5));
    assertBatch(reader, new BatchInfo(5, 6));
    assertThat(reader.readBatch()).isNull();
    reader.close();

    // restore a reader at the remembered offset, skipping the one record that
    // was already emitted before the checkpoint
    split = new FileSourceSplit(
            splitId, new Path(tmpFile.toString()), splitLength * 2, tmpFile.length(),
            StringUtils.EMPTY_STRING_ARRAY, new CheckpointedPosition(offset1, 1));
    reader = bulkFormat.restoreReader(new Configuration(), split);
    long offset2 = assertBatch(reader, new BatchInfo(3, 5), 1);
    assertBatch(reader, new BatchInfo(5, 6));
    assertThat(reader.readBatch()).isNull();
    reader.close();

    // the restored reader must report the same batch offset as the first run
    assertThat(offset2).isEqualTo(offset1);
}
Also used: Path (org.apache.flink.core.fs.Path), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), Configuration (org.apache.flink.configuration.Configuration), GenericRowData (org.apache.flink.table.data.GenericRowData), RowData (org.apache.flink.table.data.RowData), CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Test (org.junit.jupiter.api.Test)
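
BatchInfo and assertBatch are helpers local to AvroBulkFormatTest and are not shown on this page. A minimal sketch of what they might look like, inferred from the calls above; the names and shapes are assumptions, not the actual Flink test code (RecordAndPosition is org.apache.flink.connector.file.src.util.RecordAndPosition, assertThat is AssertJ's):

// Hypothetical reconstruction of the test-local helpers; the real ones may differ.
private static class BatchInfo {
    final int start; // index of the first record expected in the batch
    final int end;   // index one past the last expected record

    BatchInfo(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

private long assertBatch(BulkFormat.Reader<RowData> reader, BatchInfo batchInfo)
        throws IOException {
    return assertBatch(reader, batchInfo, 0);
}

// Reads one batch, checks that it covers the expected records, and returns the
// batch's restorable offset. initialSkipCount accounts for records a restored
// reader has already skipped.
private long assertBatch(
        BulkFormat.Reader<RowData> reader, BatchInfo batchInfo, int initialSkipCount)
        throws IOException {
    BulkFormat.RecordIterator<RowData> batch = reader.readBatch();
    assertThat(batch).isNotNull();
    long offset = -1;
    int next = batchInfo.start + initialSkipCount;
    for (RecordAndPosition<RowData> record = batch.next();
            record != null;
            record = batch.next()) {
        offset = record.getOffset(); // records in one Avro block share its start offset
        // the real test would compare record.getRecord() with the fixture row `next`
        next++;
    }
    batch.releaseBatch();
    assertThat(next).isEqualTo(batchInfo.end);
    return offset;
}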

Example 2 with BulkFormat

Use of org.apache.flink.connector.file.src.reader.BulkFormat in project flink by apache.

From class AvroBulkFormatTest, method assertSplit.

private void assertSplit(
        AvroBulkFormatTestUtils.TestingAvroBulkFormat bulkFormat, List<SplitInfo> splitInfos)
        throws IOException {
    for (SplitInfo splitInfo : splitInfos) {
        // read one split from scratch and collect the offset of every batch
        FileSourceSplit split = new FileSourceSplit(
                UUID.randomUUID().toString(), new Path(tmpFile.toString()),
                splitInfo.start, splitInfo.end - splitInfo.start);
        BulkFormat.Reader<RowData> reader = bulkFormat.createReader(new Configuration(), split);
        List<Long> offsets = new ArrayList<>();
        for (BatchInfo batch : splitInfo.batches) {
            offsets.add(assertBatch(reader, batch));
        }
        // the split must be exhausted after the expected batches
        assertThat(reader.readBatch()).isNull();
        // batch offsets must be strictly increasing within a split
        for (int j = 1; j < offsets.size(); j++) {
            assertThat(offsets.get(j - 1) < offsets.get(j)).isTrue();
        }
        reader.close();
    }
}
Also used: Path (org.apache.flink.core.fs.Path), GenericRowData (org.apache.flink.table.data.GenericRowData), RowData (org.apache.flink.table.data.RowData), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), Configuration (org.apache.flink.configuration.Configuration), ArrayList (java.util.ArrayList), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat)
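
Like BatchInfo above, SplitInfo is a test-local holder that the page does not show. A plausible shape, again an assumption rather than the exact Flink code:

// Hypothetical holder pairing a byte range of the file with the batches
// expected when reading that range.
private static class SplitInfo {
    final long start;              // first byte of the split
    final long end;                // byte after the last byte of the split
    final List<BatchInfo> batches; // expected record ranges, in read order

    SplitInfo(long start, long end, List<BatchInfo> batches) {
        this.start = start;
        this.end = end;
        this.batches = batches;
    }
}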

Example 3 with BulkFormat

Use of org.apache.flink.connector.file.src.reader.BulkFormat in project flink by apache.

From class AdapterTestBase, method testReading.

private void testReading(FormatT format, int numSplits, int... recoverAfterRecords) throws IOException {
    // add the end boundary for recovery
    final int[] boundaries = Arrays.copyOf(recoverAfterRecords, recoverAfterRecords.length + 1);
    boundaries[boundaries.length - 1] = NUM_NUMBERS;
    // set a fetch size so that we get three records per fetch
    final Configuration config = new Configuration();
    config.set(StreamFormat.FETCH_IO_SIZE, new MemorySize(10));
    final BulkFormat<Integer, FileSourceSplit> adapter = wrapWithAdapter(format);
    final Queue<FileSourceSplit> splits = buildSplits(numSplits);
    final List<Integer> result = new ArrayList<>();
    FileSourceSplit currentSplit = null;
    BulkFormat.Reader<Integer> currentReader = null;
    // repeatedly read up to the next boundary, then simulate a failure and
    // restore the reader from the checkpointed split returned by readNumbers()
    for (int nextRecordToRecover : boundaries) {
        final FileSourceSplit toRecoverFrom = readNumbers(
                currentReader, currentSplit, adapter, splits, config, result,
                nextRecordToRecover - result.size());
        currentSplit = toRecoverFrom;
        currentReader =
                toRecoverFrom == null ? null : adapter.restoreReader(config, toRecoverFrom);
    }
    verifyIntListResult(result);
}
Also used: MemorySize (org.apache.flink.configuration.MemorySize), Configuration (org.apache.flink.configuration.Configuration), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), ArrayList (java.util.ArrayList), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat)
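
wrapWithAdapter is the abstract hook of AdapterTestBase and is not shown here. For a StreamFormat under test, a concrete subclass could plausibly implement it with StreamFormatAdapter (org.apache.flink.connector.file.src.impl.StreamFormatAdapter); a sketch assuming FormatT is StreamFormat<Integer>:

// Sketch of the abstract hook for a subclass where FormatT = StreamFormat<Integer>.
@Override
protected BulkFormat<Integer, FileSourceSplit> wrapWithAdapter(StreamFormat<Integer> format) {
    // StreamFormatAdapter adapts a per-record StreamFormat to the
    // batch-oriented BulkFormat interface used by the file source.
    return new StreamFormatAdapter<>(format);
}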

Example 4 with BulkFormat

Use of org.apache.flink.connector.file.src.reader.BulkFormat in project flink by apache.

From class LimitableBulkFormatTest, method testLimitOverBatches.

@Test
public void testLimitOverBatches() throws IOException {
    // set the record limit
    Long limit = 2048L;
    // configure small fetches so that the limit spans several batches
    Configuration conf = new Configuration();
    conf.set(StreamFormat.FETCH_IO_SIZE, MemorySize.parse("4k"));
    // read
    BulkFormat<String, FileSourceSplit> format = LimitableBulkFormat.create(
            new StreamFormatAdapter<>(new TextLineInputFormat()), limit);
    BulkFormat.Reader<String> reader = format.createReader(
            conf,
            new FileSourceSplit(
                    "id", new Path(file.toURI()), 0, file.length(),
                    file.lastModified(), file.length()));
    // check that the limit, not the file size, bounds the record count
    AtomicInteger i = new AtomicInteger(0);
    Utils.forEachRemaining(reader, s -> i.incrementAndGet());
    Assert.assertEquals(limit.intValue(), i.get());
}
Also used: Path (org.apache.flink.core.fs.Path), Configuration (org.apache.flink.configuration.Configuration), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), TextLineInputFormat (org.apache.flink.connector.file.src.reader.TextLineInputFormat), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Test (org.junit.Test)
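
Utils.forEachRemaining drains the reader batch by batch, releasing each batch as it goes (and, in the Flink utility, also closing the reader when done). A hedged variant of the check above that collects the limited rows instead of counting them, reusing the format and conf built in the test:

// Sketch: collect the rows instead of counting them; `format` and `conf`
// are the objects constructed in the test above.
BulkFormat.Reader<String> reader = format.createReader(
        conf,
        new FileSourceSplit(
                "id", new Path(file.toURI()), 0, file.length(),
                file.lastModified(), file.length()));
List<String> lines = new ArrayList<>();
Utils.forEachRemaining(reader, lines::add);
// LimitableBulkFormat caps the total record count globally, so even across
// several 4k fetches the reader stops at exactly `limit` records.
Assert.assertEquals(limit.intValue(), lines.size());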

Example 5 with BulkFormat

Use of org.apache.flink.connector.file.src.reader.BulkFormat in project flink by apache.

From class LimitableBulkFormatTest, method test.

@Test
public void test() throws IOException {
    // read with a record limit of 22 and the default fetch size
    BulkFormat<String, FileSourceSplit> format = LimitableBulkFormat.create(
            new StreamFormatAdapter<>(new TextLineInputFormat()), 22L);
    BulkFormat.Reader<String> reader = format.createReader(
            new Configuration(),
            new FileSourceSplit(
                    "id", new Path(file.toURI()), 0, file.length(),
                    file.lastModified(), file.length()));
    AtomicInteger i = new AtomicInteger(0);
    Utils.forEachRemaining(reader, s -> i.incrementAndGet());
    // exactly the limited number of records is produced
    Assert.assertEquals(22, i.get());
}
Also used: Path (org.apache.flink.core.fs.Path), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), TextLineInputFormat (org.apache.flink.connector.file.src.reader.TextLineInputFormat), Configuration (org.apache.flink.configuration.Configuration), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Test (org.junit.Test)

Aggregations

FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit): 7
BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat): 7
Configuration (org.apache.flink.configuration.Configuration): 6
Path (org.apache.flink.core.fs.Path): 5
RowData (org.apache.flink.table.data.RowData): 4
ArrayList (java.util.ArrayList): 3
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 2
DeserializationSchema (org.apache.flink.api.common.serialization.DeserializationSchema): 2
TextLineInputFormat (org.apache.flink.connector.file.src.reader.TextLineInputFormat): 2
TableException (org.apache.flink.table.api.TableException): 2
GenericRowData (org.apache.flink.table.data.GenericRowData): 2
DataType (org.apache.flink.table.types.DataType): 2
IOException (java.io.IOException): 1
OutputStream (java.io.OutputStream): 1
Collections (java.util.Collections): 1
LinkedHashMap (java.util.LinkedHashMap): 1
List (java.util.List): 1
Map (java.util.Map): 1
Objects (java.util.Objects): 1
Optional (java.util.Optional): 1