Examples with FileSourceSplit - org.apache.flink.connector.file.src.FileSourceSplit

Example 1 with FileSourceSplit

Use of org.apache.flink.connector.file.src.FileSourceSplit in the Apache Flink project.

From the class ContinuousHiveSplitEnumerator, method assignSplits.

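private void assignSplits() {
    // Walk the waiting readers: key = subtask index, value = the reader's hostname.
    final Iterator<Map.Entry<Integer, String>> awaitingReader = readersAwaitingSplit.entrySet().iterator();
    while (awaitingReader.hasNext()) {
        final Map.Entry<Integer, String> nextAwaiting = awaitingReader.next();
        final String hostname = nextAwaiting.getValue();
        final int awaitingSubtask = nextAwaiting.getKey();
        // The hostname is passed so the assigner can prefer splits local to that reader.
        final Optional<FileSourceSplit> nextSplit = splitAssigner.getNext(hostname);
        if (nextSplit.isPresent()) {
            // The assigner is typed on FileSourceSplit, so the split is cast back to HiveSourceSplit.
            enumeratorContext.assignSplit((HiveSourceSplit) nextSplit.get(), awaitingSubtask);
            // Remove through the iterator; the reader is no longer waiting.
            awaitingReader.remove();
        } else {
            // No splits left; the remaining readers keep waiting.
            break;
        }
    }
}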

Also used: FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), LinkedHashMap (java.util.LinkedHashMap), Map (java.util.Map), PendingSplitsCheckpoint (org.apache.flink.connector.file.src.PendingSplitsCheckpoint)
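
The loop in Example 1 can be tried without Flink. The following is a minimal sketch of the same pattern, assuming a plain FIFO queue of split ids in place of the real split assigner; the class AssignLoopSketch, its methods, and the "subtask->split" strings are hypothetical stand-ins, not Flink API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class AssignLoopSketch {

    /** Hypothetical assigner: hands out pending split ids in FIFO order. */
    static Optional<String> getNext(Deque<String> pendingSplits) {
        return Optional.ofNullable(pendingSplits.poll());
    }

    /**
     * Gives one split to each waiting reader, in registration order, until the
     * splits run out. Readers that received a split are removed from the map.
     */
    static List<String> assignSplits(Map<Integer, String> readersAwaitingSplit,
                                     Deque<String> pendingSplits) {
        List<String> assignments = new ArrayList<>();
        Iterator<Map.Entry<Integer, String>> awaiting =
                readersAwaitingSplit.entrySet().iterator();
        while (awaiting.hasNext()) {
            Map.Entry<Integer, String> next = awaiting.next();
            Optional<String> split = getNext(pendingSplits);
            if (!split.isPresent()) {
                break; // nothing left to hand out; remaining readers keep waiting
            }
            assignments.add(next.getKey() + "->" + split.get());
            awaiting.remove(); // removal through the iterator is safe while iterating
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<Integer, String> readers = new LinkedHashMap<>();
        readers.put(0, "host-a");
        readers.put(1, "host-b");
        readers.put(2, "host-c");
        Deque<String> splits = new ArrayDeque<>(Arrays.asList("split-1", "split-2"));

        System.out.println(assignSplits(readers, splits)); // [0->split-1, 1->split-2]
        System.out.println(readers.keySet());              // [2]
    }
}
```

A LinkedHashMap keeps the readers in the order they asked for work, so the earliest requester is served first, and the early break leaves unserved readers in the map for the next call.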

Example 2 with FileSourceSplit

From the class AvroBulkFormatTest, method testRestoreReader.

@Test
public void testRestoreReader() throws IOException {
    AvroBulkFormatTestUtils.TestingAvroBulkFormat bulkFormat = new AvroBulkFormatTestUtils.TestingAvroBulkFormat();
    long splitLength = tmpFile.length() / 3;
    String splitId = UUID.randomUUID().toString();
    // Split that starts at two thirds of the file (offset = splitLength * 2).
    FileSourceSplit split = new FileSourceSplit(splitId, new Path(tmpFile.toString()), splitLength * 2, tmpFile.length());
    BulkFormat.Reader<RowData> reader = bulkFormat.createReader(new Configuration(), split);
    // First pass: read the split from its start and keep the offset returned for the first batch.
    long offset1 = assertBatch(reader, new BatchInfo(3, 5));
    assertBatch(reader, new BatchInfo(5, 6));
    assertThat(reader.readBatch()).isNull();
    reader.close();
    // Second pass: same split, but carrying a checkpointed position (offset1, one record to skip after it).
    split = new FileSourceSplit(splitId, new Path(tmpFile.toString()), splitLength * 2, tmpFile.length(), StringUtils.EMPTY_STRING_ARRAY, new CheckpointedPosition(offset1, 1));
    reader = bulkFormat.restoreReader(new Configuration(), split);
    long offset2 = assertBatch(reader, new BatchInfo(3, 5), 1);
    assertBatch(reader, new BatchInfo(5, 6));
    assertThat(reader.readBatch()).isNull();
    reader.close();
    // The restored reader must report the same offset as the original one.
    assertThat(offset2).isEqualTo(offset1);
}

Also used: Path (org.apache.flink.core.fs.Path), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), Configuration (org.apache.flink.configuration.Configuration), GenericRowData (org.apache.flink.table.data.GenericRowData), RowData (org.apache.flink.table.data.RowData), CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Test (org.junit.jupiter.api.Test)
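
Example 2 restores a reader from a position made of an offset plus a number of records to skip after that offset. The idea can be illustrated with a toy reader over in-memory blocks. This is only a sketch under stated assumptions: the offset is simply a block index, and RestorePositionSketch, Position, and Reader are hypothetical names, not the Flink classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RestorePositionSketch {

    /** Hypothetical position: a block offset plus the records already read after it. */
    static final class Position {
        final int offset;             // in this sketch, the index of a block
        final int recordsAfterOffset; // records to skip after seeking to the offset

        Position(int offset, int recordsAfterOffset) {
            this.offset = offset;
            this.recordsAfterOffset = recordsAfterOffset;
        }
    }

    /** Toy reader over blocks of records; it can start from any Position. */
    static final class Reader {
        private final List<List<String>> blocks;
        private int block;       // current block index
        private int readInBlock; // records already returned from the current block

        Reader(List<List<String>> blocks, Position start) {
            this.blocks = blocks;
            this.block = start.offset; // "seek" to the block
            for (int i = 0; i < start.recordsAfterOffset; i++) {
                next(); // skip records that were read before the checkpoint
            }
        }

        /** Returns the next record, or null when all blocks are exhausted. */
        String next() {
            while (block < blocks.size() && readInBlock >= blocks.get(block).size()) {
                block++;
                readInBlock = 0;
            }
            if (block >= blocks.size()) {
                return null;
            }
            return blocks.get(block).get(readInBlock++);
        }

        /** Position of the next unread record. */
        Position position() {
            return new Position(block, readInBlock);
        }
    }

    /** Reads all remaining records from the reader. */
    static List<String> drain(Reader reader) {
        List<String> out = new ArrayList<>();
        for (String r = reader.next(); r != null; r = reader.next()) {
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c", "d", "e"));
        Reader first = new Reader(blocks, new Position(0, 0));
        first.next(); // a
        first.next(); // b
        first.next(); // c
        Position checkpoint = first.position(); // block 1, one record read after it

        Reader restored = new Reader(blocks, checkpoint);
        System.out.println(drain(restored)); // [d, e]
    }
}
```

Storing a block offset plus a skip count is useful when a reader can only seek to block boundaries: the restored reader seeks to the block and discards the records that were already emitted.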

Example 3 with FileSourceSplit

From the class AvroBulkFormatTest, method assertSplit.

private void assertSplit(AvroBulkFormatTestUtils.TestingAvroBulkFormat bulkFormat, List<SplitInfo> splitInfos) throws IOException {
    for (SplitInfo splitInfo : splitInfos) {
        // FileSourceSplit takes (id, path, offset, length), so the length is end - start.
        FileSourceSplit split = new FileSourceSplit(UUID.randomUUID().toString(), new Path(tmpFile.toString()), splitInfo.start, splitInfo.end - splitInfo.start);
        BulkFormat.Reader<RowData> reader = bulkFormat.createReader(new Configuration(), split);
        List<Long> offsets = new ArrayList<>();
        for (BatchInfo batch : splitInfo.batches) {
            offsets.add(assertBatch(reader, batch));
        }
        // After the expected batches, the reader must be exhausted.
        assertThat(reader.readBatch()).isNull();
        // Batch offsets must be strictly increasing.
        for (int j = 1; j < offsets.size(); j++) {
            assertThat(offsets.get(j - 1)).isLessThan(offsets.get(j));
        }
        reader.close();
    }
}

Also used: Path (org.apache.flink.core.fs.Path), GenericRowData (org.apache.flink.table.data.GenericRowData), RowData (org.apache.flink.table.data.RowData), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), Configuration (org.apache.flink.configuration.Configuration), ArrayList (java.util.ArrayList), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat)

Example 4 with FileSourceSplit

From the class HiveSourceSplitSerializer, method serialize.

private void serialize(ObjectOutputStream outputStream, HiveSourceSplit split) throws IOException {
    // Copy the file-level fields into a plain FileSourceSplit so the parent serializer can handle them.
    FileSourceSplit superSplit = new FileSourceSplit(
            split.splitId(), split.path(), split.offset(), split.length(),
            split.fileModificationTime(), split.fileSize(), split.hostnames(),
            split.getReaderPosition().orElse(null));
    byte[] superBytes = FileSourceSplitSerializer.INSTANCE.serialize(superSplit);
    // Length prefix first, then the parent bytes, then the Hive partition via Java serialization.
    outputStream.writeInt(superBytes.length);
    outputStream.write(superBytes);
    outputStream.writeObject(split.getHiveTablePartition());
}

Also used: FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit)

Example 5 with FileSourceSplit

From the class HiveSourceSplitSerializer, method deserializeV1.

private HiveSourceSplit deserializeV1(ObjectInputStream inputStream) throws IOException {
    try {
        // Read the length prefix, then exactly that many bytes of the parent FileSourceSplit.
        int superLen = inputStream.readInt();
        byte[] superBytes = new byte[superLen];
        inputStream.readFully(superBytes);
        // Note: the parent serializer's current version is passed here, not a version stored with the bytes.
        FileSourceSplit superSplit = FileSourceSplitSerializer.INSTANCE.deserialize(FileSourceSplitSerializer.INSTANCE.getVersion(), superBytes);
        // The Hive partition was written with Java serialization.
        HiveTablePartition hiveTablePartition = (HiveTablePartition) inputStream.readObject();
        // Rebuild the HiveSourceSplit from the parent fields plus the partition.
        return new HiveSourceSplit(
                superSplit.splitId(), superSplit.path(), superSplit.offset(), superSplit.length(),
                superSplit.fileModificationTime(), superSplit.fileSize(), superSplit.hostnames(),
                superSplit.getReaderPosition().orElse(null), hiveTablePartition);
    } catch (ClassNotFoundException e) {
        throw new IOException("Failed to deserialize HiveSourceSplit", e);
    }
}

Also used: HiveSourceSplit (org.apache.flink.connectors.hive.read.HiveSourceSplit), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), IOException (java.io.IOException)
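
Examples 4 and 5 share one wire layout: a length prefix, the parent split's bytes, then an extra object written with Java serialization. The round trip can be sketched with the JDK alone. In the sketch below a UTF-8 string stands in for the parent split bytes and a String stands in for the Hive partition; LengthPrefixedFramingSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedFramingSketch {

    /** Writes [int length][base bytes][extra object], mirroring the layout in Example 4. */
    static byte[] write(byte[] baseBytes, String extra) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeInt(baseBytes.length); // length prefix
            out.write(baseBytes);           // opaque bytes from the "parent" serializer
            out.writeObject(extra);         // additional field via Java serialization
        }
        return buffer.toByteArray();
    }

    /** Reads the layout back; returns {base bytes decoded as UTF-8, extra}. */
    static String[] read(byte[] serialized) throws IOException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            int baseLen = in.readInt();
            byte[] baseBytes = new byte[baseLen];
            in.readFully(baseBytes); // blocks until exactly baseLen bytes are read
            String extra = (String) in.readObject();
            return new String[] {new String(baseBytes, StandardCharsets.UTF_8), extra};
        } catch (ClassNotFoundException e) {
            throw new IOException("Failed to deserialize payload", e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] base = "split-42".getBytes(StandardCharsets.UTF_8);
        String[] result = read(write(base, "partition=2024-01-01"));
        System.out.println(result[0]); // split-42
        System.out.println(result[1]); // partition=2024-01-01
    }
}
```

The length prefix lets the reader hand exactly the parent's bytes to the parent deserializer without knowing their internal format, and readFully is used because a plain read may return fewer bytes than requested.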

Aggregations

FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit): 50
Test (org.junit.Test): 32
Path (org.apache.flink.core.fs.Path): 20
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 11
BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat): 11
Configuration (org.apache.flink.configuration.Configuration): 10
ArrayList (java.util.ArrayList): 9
TestingSplitEnumeratorContext (org.apache.flink.connector.testutils.source.reader.TestingSplitEnumeratorContext): 7
IOException (java.io.IOException): 6
RowData (org.apache.flink.table.data.RowData): 6
LogicalType (org.apache.flink.table.types.logical.LogicalType): 6
LinkedHashMap (java.util.LinkedHashMap): 5
TestingFileSystem (org.apache.flink.connector.file.src.testutils.TestingFileSystem): 5
FileStatus (org.apache.flink.core.fs.FileStatus): 5
AtomicLong (java.util.concurrent.atomic.AtomicLong): 4
BigIntType (org.apache.flink.table.types.logical.BigIntType): 4
DoubleType (org.apache.flink.table.types.logical.DoubleType): 4
IntType (org.apache.flink.table.types.logical.IntType): 4
SmallIntType (org.apache.flink.table.types.logical.SmallIntType): 4
TinyIntType (org.apache.flink.table.types.logical.TinyIntType): 4