
Example 1 with CheckpointedPosition

Use of org.apache.flink.connector.file.src.util.CheckpointedPosition in the Apache Flink project.

From class AvroBulkFormatTest, method testRestoreReader.

@Test
public void testRestoreReader() throws IOException {
    AvroBulkFormatTestUtils.TestingAvroBulkFormat bulkFormat = new AvroBulkFormatTestUtils.TestingAvroBulkFormat();
    long splitLength = tmpFile.length() / 3;
    String splitId = UUID.randomUUID().toString();
    FileSourceSplit split =
            new FileSourceSplit(
                    splitId, new Path(tmpFile.toString()), splitLength * 2, tmpFile.length());
    BulkFormat.Reader<RowData> reader = bulkFormat.createReader(new Configuration(), split);
    long offset1 = assertBatch(reader, new BatchInfo(3, 5));
    assertBatch(reader, new BatchInfo(5, 6));
    assertThat(reader.readBatch()).isNull();
    reader.close();
    // restore from a checkpointed position: the first batch's offset, with one record
    // already emitted before the checkpoint
    split =
            new FileSourceSplit(
                    splitId,
                    new Path(tmpFile.toString()),
                    splitLength * 2,
                    tmpFile.length(),
                    StringUtils.EMPTY_STRING_ARRAY,
                    new CheckpointedPosition(offset1, 1));
    reader = bulkFormat.restoreReader(new Configuration(), split);
    long offset2 = assertBatch(reader, new BatchInfo(3, 5), 1);
    assertBatch(reader, new BatchInfo(5, 6));
    assertThat(reader.readBatch()).isNull();
    reader.close();
    assertThat(offset2).isEqualTo(offset1);
}
Also used: Path (org.apache.flink.core.fs.Path), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), Configuration (org.apache.flink.configuration.Configuration), GenericRowData (org.apache.flink.table.data.GenericRowData), RowData (org.apache.flink.table.data.RowData), CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Test (org.junit.jupiter.api.Test)
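The test above restores with `new CheckpointedPosition(offset1, 1)`: a byte offset paired with a count of records already emitted after that offset. The sketch below is a hypothetical, self-contained model of that pairing, not Flink's actual class; the field names and the `NO_OFFSET` sentinel value are assumptions inferred from the usage in these examples.

```java
// Hypothetical minimal model of a checkpointed reader position (not Flink's class).
class PositionSketch {
    // Assumed sentinel meaning "no byte offset was checkpointed; the position is
    // simply 'recordsAfterOffset records from the start of the split'".
    static final long NO_OFFSET = -1L;

    private final long offset;             // byte offset of the checkpointed batch
    private final long recordsAfterOffset; // records already emitted after that offset

    PositionSketch(long offset, long recordsAfterOffset) {
        this.offset = offset;
        this.recordsAfterOffset = recordsAfterOffset;
    }

    long getOffset() { return offset; }
    long getRecordsAfterOffset() { return recordsAfterOffset; }

    /** True if a restored reader must first seek to a byte offset, then skip records. */
    boolean hasOffset() { return offset != NO_OFFSET; }
}
```

With this model, the restore in the test reads as "seek to offset1, then re-read and discard 1 record" before handing rows back to the caller.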

Example 2 with CheckpointedPosition

Use of org.apache.flink.connector.file.src.util.CheckpointedPosition in the Apache Flink project.

From class FileSourceSplitState, method toFileSourceSplit.

/**
 * Creates a new FileSourceSplit that carries the current reader position
 * (offset and records to skip after it) as its checkpointed position.
 */
@SuppressWarnings("unchecked")
public SplitT toFileSourceSplit() {
    final CheckpointedPosition position =
            (offset == CheckpointedPosition.NO_OFFSET && recordsToSkipAfterOffset == 0)
                    ? null
                    : new CheckpointedPosition(offset, recordsToSkipAfterOffset);
    final FileSourceSplit updatedSplit = split.updateWithCheckpointedPosition(position);
    // some sanity checks to avoid surprises and not accidentally lose split information
    if (updatedSplit == null) {
        throw new FlinkRuntimeException("Split returned 'null' in updateWithCheckpointedPosition(): " + split);
    }
    if (updatedSplit.getClass() != split.getClass()) {
        throw new FlinkRuntimeException(
                String.format(
                        "Split returned different type in updateWithCheckpointedPosition(). "
                                + "Split type is %s, returned type is %s",
                        split.getClass().getName(), updatedSplit.getClass().getName()));
    }
    return (SplitT) updatedSplit;
}
Also used: CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition), FlinkRuntimeException (org.apache.flink.util.FlinkRuntimeException)
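The ternary at the top of toFileSourceSplit encodes a small invariant: a split whose reader is still "fresh" (no checkpointed offset, nothing to skip) carries no CheckpointedPosition at all (null), rather than a zero-valued one. Here is a hedged, self-contained restatement of that rule; the method name and the sentinel constant are illustrative assumptions, not Flink API.

```java
// Illustrative restatement of the null-position rule from toFileSourceSplit().
class SplitStateRule {
    // Assumed sentinel, mirroring CheckpointedPosition.NO_OFFSET in the example above.
    static final long NO_OFFSET = -1L;

    /**
     * Returns true when the reader state is fresh and the split should carry
     * no CheckpointedPosition (null) instead of a zero-valued position.
     */
    static boolean shouldCarryNoPosition(long offset, long recordsToSkipAfterOffset) {
        return offset == NO_OFFSET && recordsToSkipAfterOffset == 0;
    }
}
```

The distinction matters downstream: the restore paths in Examples 3 and 4 branch on whether a position is present at all, so a spurious zero-valued position would force the checkpoint-restore path for a split that was never read.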

Example 3 with CheckpointedPosition

Use of org.apache.flink.connector.file.src.util.CheckpointedPosition in the Apache Flink project.

From class FileRecordFormatAdapter, method restoreReader.

@Override
public BulkFormat.Reader<T> restoreReader(final Configuration config, final FileSourceSplit split) throws IOException {
    assert split.getReaderPosition().isPresent();
    final CheckpointedPosition checkpointedPosition = split.getReaderPosition().get();
    final Path filePath = split.path();
    final long splitOffset = split.offset();
    final long splitLength = split.length();
    final FileRecordFormat.Reader<T> reader =
            checkpointedPosition.getOffset() == CheckpointedPosition.NO_OFFSET
                    ? fileFormat.createReader(config, filePath, splitOffset, splitLength)
                    : fileFormat.restoreReader(
                            config,
                            filePath,
                            checkpointedPosition.getOffset(),
                            splitOffset,
                            splitLength);
    return doWithCleanupOnException(reader, () -> {
        long remaining = checkpointedPosition.getRecordsAfterOffset();
        while (remaining > 0 && reader.read() != null) {
            remaining--;
        }
        return wrapReader(reader, config, checkpointedPosition.getOffset(), checkpointedPosition.getRecordsAfterOffset());
    });
}
Also used: Path (org.apache.flink.core.fs.Path), FileRecordFormat (org.apache.flink.connector.file.src.reader.FileRecordFormat), CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition)
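doWithCleanupOnException is the safety net here: if the skip loop (or anything else in the lambda) throws, the just-opened reader is closed before the exception propagates, so a failed restore does not leak a file handle. Below is a self-contained sketch of that pattern under assumed signatures; Flink's real utility lives elsewhere and may differ in shape.

```java
import java.util.function.Supplier;

// Sketch of the cleanup-on-exception pattern, assuming a Supplier-based signature.
class CleanupUtil {
    /**
     * Runs the action. If it throws, closes the resource first (recording any close
     * failure as a suppressed exception) and rethrows. On success the resource is
     * left open, owned by whatever the action returned.
     */
    static <R> R doWithCleanupOnException(AutoCloseable resource, Supplier<R> action) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            try {
                resource.close();
            } catch (Exception suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }
}
```

The design choice is ownership transfer: the caller owns the resource only until the action succeeds, after which the returned wrapper (here, the wrapped reader) takes over closing it.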

Example 4 with CheckpointedPosition

Use of org.apache.flink.connector.file.src.util.CheckpointedPosition in the Apache Flink project.

From class StreamFormatAdapter, method restoreReader.

@Override
public BulkFormat.Reader<T> restoreReader(final Configuration config, final FileSourceSplit split) throws IOException {
    assert split.getReaderPosition().isPresent();
    final CheckpointedPosition checkpointedPosition = split.getReaderPosition().get();
    final TrackingFsDataInputStream trackingStream = openStream(split.path(), config, split.offset());
    final long splitEnd = split.offset() + split.length();
    return doWithCleanupOnException(trackingStream, () -> {
        // If there never was a checkpointed offset yet, initialize the reader like a
        // fresh reader; see the JavaDocs on StreamFormat.restoreReader() for details.
        final StreamFormat.Reader<T> streamReader =
                checkpointedPosition.getOffset() == CheckpointedPosition.NO_OFFSET
                        ? streamFormat.createReader(
                                config, trackingStream, trackingStream.getFileLength(), splitEnd)
                        : streamFormat.restoreReader(
                                config,
                                trackingStream,
                                checkpointedPosition.getOffset(),
                                trackingStream.getFileLength(),
                                splitEnd);
        // skip the already-emitted records, but make sure we close the reader if
        // something goes wrong
        doWithCleanupOnException(streamReader, () -> {
            long toSkip = checkpointedPosition.getRecordsAfterOffset();
            while (toSkip > 0 && streamReader.read() != null) {
                toSkip--;
            }
            if (LOG.isDebugEnabled()) {
                LOG.debug("{} records have been skipped.", checkpointedPosition.getRecordsAfterOffset());
            }
        });
        return new Reader<>(streamReader, trackingStream, checkpointedPosition.getOffset(), checkpointedPosition.getRecordsAfterOffset());
    });
}
Also used: CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition), StreamFormat (org.apache.flink.connector.file.src.reader.StreamFormat)
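Examples 3 and 4 finish the restore with the same skip loop: read and discard getRecordsAfterOffset() records so the reader resumes exactly where the checkpoint left off, stopping early if the input ends first. A self-contained sketch of that loop over a hypothetical reader interface (the RecordReader type below is an assumption standing in for the format readers above):

```java
import java.util.Iterator;
import java.util.List;

class SkipSketch {
    /** Minimal stand-in for a format reader whose read() returns null at end of input. */
    interface RecordReader<T> {
        T read();
    }

    static <T> RecordReader<T> fromList(List<T> records) {
        Iterator<T> it = records.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /**
     * Reads and discards up to recordsAfterOffset records, stopping early at end of
     * input. Returns how many records were actually skipped.
     */
    static <T> long skipAlreadyEmitted(RecordReader<T> reader, long recordsAfterOffset) {
        long remaining = recordsAfterOffset;
        while (remaining > 0 && reader.read() != null) {
            remaining--;
        }
        return recordsAfterOffset - remaining;
    }
}
```

After the loop, the next read() returns the first record that was not yet emitted before the checkpoint, which is exactly what Example 1's assertBatch with a skip count of 1 verifies.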

Example 5 with CheckpointedPosition

Use of org.apache.flink.connector.file.src.util.CheckpointedPosition in the Apache Flink project.

From class ParquetColumnarRowInputFormatTest, method testReadingSplit.

private int testReadingSplit(List<Integer> expected, Path path, long splitStart, long splitLength, long seekToRow) throws IOException {
    LogicalType[] fieldTypes =
            new LogicalType[] {
                new VarCharType(VarCharType.MAX_LENGTH),
                new BooleanType(),
                new TinyIntType(),
                new SmallIntType(),
                new IntType(),
                new BigIntType(),
                new FloatType(),
                new DoubleType(),
                new TimestampType(9),
                new DecimalType(5, 0),
                new DecimalType(15, 0),
                new DecimalType(20, 0),
                new DecimalType(5, 0),
                new DecimalType(15, 0),
                new DecimalType(20, 0)
            };
    ParquetColumnarRowInputFormat format =
            new ParquetColumnarRowInputFormat(
                    new Configuration(),
                    RowType.of(
                            fieldTypes,
                            new String[] {
                                "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9",
                                "f10", "f11", "f12", "f13", "f14"
                            }),
                    null,
                    500,
                    false,
                    true);
    // validate java serialization
    try {
        InstantiationUtil.clone(format);
    } catch (ClassNotFoundException e) {
        throw new IOException(e);
    }
    FileStatus fileStatus = path.getFileSystem().getFileStatus(path);
    BulkFormat.Reader<RowData> reader =
            format.restoreReader(
                    EMPTY_CONF,
                    new FileSourceSplit(
                            "id",
                            path,
                            splitStart,
                            splitLength,
                            fileStatus.getModificationTime(),
                            fileStatus.getLen(),
                            new String[0],
                            new CheckpointedPosition(CheckpointedPosition.NO_OFFSET, seekToRow)));
    AtomicInteger cnt = new AtomicInteger(0);
    final AtomicReference<RowData> previousRow = new AtomicReference<>();
    forEachRemaining(reader, row -> {
        if (previousRow.get() == null) {
            previousRow.set(row);
        } else {
            // ParquetColumnarRowInputFormat should only have one row instance.
            assertSame(previousRow.get(), row);
        }
        Integer v = expected.get(cnt.get());
        if (v == null) {
            assertTrue(row.isNullAt(0));
            assertTrue(row.isNullAt(1));
            assertTrue(row.isNullAt(2));
            assertTrue(row.isNullAt(3));
            assertTrue(row.isNullAt(4));
            assertTrue(row.isNullAt(5));
            assertTrue(row.isNullAt(6));
            assertTrue(row.isNullAt(7));
            assertTrue(row.isNullAt(8));
            assertTrue(row.isNullAt(9));
            assertTrue(row.isNullAt(10));
            assertTrue(row.isNullAt(11));
            assertTrue(row.isNullAt(12));
            assertTrue(row.isNullAt(13));
            assertTrue(row.isNullAt(14));
        } else {
            assertEquals("" + v, row.getString(0).toString());
            assertEquals(v % 2 == 0, row.getBoolean(1));
            assertEquals(v.byteValue(), row.getByte(2));
            assertEquals(v.shortValue(), row.getShort(3));
            assertEquals(v.intValue(), row.getInt(4));
            assertEquals(v.longValue(), row.getLong(5));
            assertEquals(v.floatValue(), row.getFloat(6), 0);
            assertEquals(v.doubleValue(), row.getDouble(7), 0);
            assertEquals(toDateTime(v), row.getTimestamp(8, 9).toLocalDateTime());
            assertEquals(BigDecimal.valueOf(v), row.getDecimal(9, 5, 0).toBigDecimal());
            assertEquals(BigDecimal.valueOf(v), row.getDecimal(10, 15, 0).toBigDecimal());
            assertEquals(BigDecimal.valueOf(v), row.getDecimal(11, 20, 0).toBigDecimal());
            assertEquals(BigDecimal.valueOf(v), row.getDecimal(12, 5, 0).toBigDecimal());
            assertEquals(BigDecimal.valueOf(v), row.getDecimal(13, 15, 0).toBigDecimal());
            assertEquals(BigDecimal.valueOf(v), row.getDecimal(14, 20, 0).toBigDecimal());
        }
        cnt.incrementAndGet();
    });
    return cnt.get();
}
Also used: FileStatus (org.apache.flink.core.fs.FileStatus), Configuration (org.apache.hadoop.conf.Configuration), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), LogicalType (org.apache.flink.table.types.logical.LogicalType), BigIntType (org.apache.flink.table.types.logical.BigIntType), TinyIntType (org.apache.flink.table.types.logical.TinyIntType), IntType (org.apache.flink.table.types.logical.IntType), SmallIntType (org.apache.flink.table.types.logical.SmallIntType), FloatType (org.apache.flink.table.types.logical.FloatType), RowData (org.apache.flink.table.data.RowData), CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition), TimestampType (org.apache.flink.table.types.logical.TimestampType), VarCharType (org.apache.flink.table.types.logical.VarCharType), BooleanType (org.apache.flink.table.types.logical.BooleanType), AtomicReference (java.util.concurrent.atomic.AtomicReference), IOException (java.io.IOException), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), DoubleType (org.apache.flink.table.types.logical.DoubleType), DecimalType (org.apache.flink.table.types.logical.DecimalType), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat)
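The assertSame check in this test documents a property worth calling out: the columnar format reuses a single mutable RowData instance across rows, so consumers that buffer rows must copy them before reading the next one. A self-contained illustration of that object-reuse pattern (the MutableRow and ReusingReader types here are hypothetical stand-ins, not Flink classes):

```java
class ReuseSketch {
    /** Hypothetical mutable row that a reader reuses across read() calls. */
    static class MutableRow {
        int value;
    }

    /** Reader that fills and returns the same row instance for every record. */
    static class ReusingReader {
        private final MutableRow row = new MutableRow();
        private final int[] data;
        private int next;

        ReusingReader(int... data) {
            this.data = data;
        }

        MutableRow read() {
            if (next >= data.length) {
                return null;
            }
            row.value = data[next++]; // overwrite in place: the previous result is invalidated
            return row;
        }
    }
}
```

This is the same trade-off the test's assertSame(previousRow.get(), row) pins down: reuse avoids per-row allocation, at the cost of forbidding references to earlier rows.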

Aggregations

CheckpointedPosition (org.apache.flink.connector.file.src.util.CheckpointedPosition): 12
Path (org.apache.flink.core.fs.Path): 5
Test (org.junit.Test): 4
FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit): 3
IOException (java.io.IOException): 2
Configuration (org.apache.flink.configuration.Configuration): 2
BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat): 2
RowData (org.apache.flink.table.data.RowData): 2
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 1
AtomicReference (java.util.concurrent.atomic.AtomicReference): 1
FileRecordFormat (org.apache.flink.connector.file.src.reader.FileRecordFormat): 1
StreamFormat (org.apache.flink.connector.file.src.reader.StreamFormat): 1
TestingFileSystem (org.apache.flink.connector.file.src.testutils.TestingFileSystem): 1
FileStatus (org.apache.flink.core.fs.FileStatus): 1
DataInputDeserializer (org.apache.flink.core.memory.DataInputDeserializer): 1
DataOutputSerializer (org.apache.flink.core.memory.DataOutputSerializer): 1
GenericRowData (org.apache.flink.table.data.GenericRowData): 1
BigIntType (org.apache.flink.table.types.logical.BigIntType): 1
BooleanType (org.apache.flink.table.types.logical.BooleanType): 1
DecimalType (org.apache.flink.table.types.logical.DecimalType): 1