Example 1 with BucketStatePathResolver

use of org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver in project flink by apache.

the class FileWriterBucketStateSerializerMigrationTest method testSerializationOnlyInProgress.

@Test
public void testSerializationOnlyInProgress() throws IOException {
    final String scenarioName = "only-in-progress";
    final BucketStatePathResolver pathResolver = new BucketStatePathResolver(BASE_PATH, previousVersion);
    final java.nio.file.Path outputPath = pathResolver.getOutputPath(scenarioName);
    final Path testBucketPath = new Path(outputPath.resolve(BUCKET_ID).toString());
    final FileWriterBucketState recoveredState = readBucketState(scenarioName, previousVersion);
    final FileWriterBucket<String> bucket = restoreBucket(recoveredState);
    Assert.assertEquals(testBucketPath, bucket.getBucketPath());
    // check that the correct in-progress file writer was restored
    Assert.assertEquals(8, bucket.getInProgressPart().getSize());
    long numFiles = Files.list(Paths.get(testBucketPath.toString())).map(file -> {
        assertThat(file.getFileName().toString(), startsWith(".part-0-0.inprogress"));
        return 1;
    }).count();
    assertThat(numFiles, is(1L));
}
Also used : Path(org.apache.flink.core.fs.Path) RowWiseBucketWriter(org.apache.flink.streaming.api.functions.sink.filesystem.RowWiseBucketWriter) CoreMatchers.is(org.hamcrest.CoreMatchers.is) Arrays(java.util.Arrays) CoreMatchers.hasItem(org.hamcrest.CoreMatchers.hasItem) FileUtils(org.apache.flink.util.FileUtils) RunWith(org.junit.runner.RunWith) CoreMatchers.startsWith(org.hamcrest.CoreMatchers.startsWith) MemorySize(org.apache.flink.configuration.MemorySize) Assert.assertThat(org.junit.Assert.assertThat) BucketStatePathResolver(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver) SimpleVersionedSerialization(org.apache.flink.core.io.SimpleVersionedSerialization) Map(java.util.Map) Matchers.iterableWithSize(org.hamcrest.Matchers.iterableWithSize) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) BucketStateGenerator(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStateGenerator) ClassRule(org.junit.ClassRule) Parameterized(org.junit.runners.Parameterized) CommitRequest(org.apache.flink.api.connector.sink2.Committer.CommitRequest) Matchers.empty(org.hamcrest.Matchers.empty) Files(java.nio.file.Files) FileSinkCommittable(org.apache.flink.connector.file.sink.FileSinkCommittable) Collection(java.util.Collection) Set(java.util.Set) Test(org.junit.Test) IOException(java.io.IOException) MockCommitRequest(org.apache.flink.api.connector.sink2.mocks.MockCommitRequest) Collectors(java.util.stream.Collectors) List(java.util.List) FileCommitter(org.apache.flink.connector.file.sink.committer.FileCommitter) FileSystem(org.apache.flink.core.fs.FileSystem) Ignore(org.junit.Ignore) Paths(java.nio.file.Paths) SimpleStringEncoder(org.apache.flink.api.common.serialization.SimpleStringEncoder) SimpleVersionedSerializer(org.apache.flink.core.io.SimpleVersionedSerializer) OutputFileConfig(org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig) DefaultRollingPolicy(org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy) Assert(org.junit.Assert) InProgressFileWriter(org.apache.flink.streaming.api.functions.sink.filesystem.InProgressFileWriter) TemporaryFolder(org.junit.rules.TemporaryFolder)
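
The directory check in testSerializationOnlyInProgress asserts inside map(...) and then calls count(), and it never closes the stream returned by Files.list. Since Java 9 the Stream.count() documentation allows an implementation to skip the pipeline when the count can be computed directly from the source, so side effects in map(...) are not guaranteed to run in general. The following is a minimal JDK-only sketch of an alternative, not the Flink test's actual code: the class InProgressFileCheck and its two methods are hypothetical names, and Path here means java.nio.file.Path rather than Flink's Path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink.
public class InProgressFileCheck {

    /** Returns the names of all entries directly inside dir, sorted. */
    public static List<String> listFileNames(Path dir) throws IOException {
        // Files.list keeps a directory handle open, so close it with try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** True if dir holds exactly one entry and its name starts with prefix. */
    public static boolean hasSingleFileWithPrefix(Path dir, String prefix) throws IOException {
        List<String> names = listFileNames(dir);
        return names.size() == 1 && names.get(0).startsWith(prefix);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bucket");
        Files.createFile(dir.resolve(".part-0-0.inprogress.abc"));
        System.out.println(hasSingleFileWithPrefix(dir, ".part-0-0.inprogress"));
    }
}
```

Collecting the names first keeps the assertions outside the stream pipeline, so a test could assert on the returned list with whatever matcher library it already uses.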

Example 2 with BucketStatePathResolver

use of org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver in project flink by apache.

the class FileWriterBucketStateSerializerMigrationTest method readBucketStateFromTemplate.

private static FileWriterBucketState readBucketStateFromTemplate(final String scenarioName, final int version) throws IOException {
    final BucketStatePathResolver pathResolver = new BucketStatePathResolver(BASE_PATH, version);
    final java.nio.file.Path scenarioPath = pathResolver.getResourcePath(scenarioName);
    // clear the scenario files first
    FileUtils.deleteDirectory(scenarioPath.toFile());
    // prepare the scenario files
    FileUtils.copy(new Path(scenarioPath.toString() + "-template"), new Path(scenarioPath.toString()), false);
    return readBucketState(scenarioName, version);
}
Also used : Path(org.apache.flink.core.fs.Path) BucketStatePathResolver(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver)
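
readBucketStateFromTemplate resets a scenario directory by deleting it and copying a sibling directory whose name ends in "-template". The same reset pattern can be sketched with the JDK alone, as below. This is an illustration under assumptions, not what Flink's FileUtils does internally: ScenarioTemplate and its methods are hypothetical names, and Path is java.nio.file.Path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink.
public class ScenarioTemplate {

    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Copies the directory tree rooted at source to target. */
    public static void copyRecursively(Path source, Path target) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(source)) {
            // Files.walk visits a directory before its contents.
            paths = walk.collect(Collectors.toList());
        }
        for (Path p : paths) {
            Path dest = target.resolve(source.relativize(p));
            if (Files.isDirectory(p)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(p, dest);
            }
        }
    }

    /** Replaces scenarioPath with a fresh copy of "<scenarioPath>-template". */
    public static void resetFromTemplate(Path scenarioPath) throws IOException {
        Path template =
                scenarioPath.resolveSibling(scenarioPath.getFileName().toString() + "-template");
        deleteRecursively(scenarioPath);
        copyRecursively(template, scenarioPath);
    }
}
```

Working on a copy leaves the template untouched, which is what lets a test such as testDeserializationFull commit files and then delete the scenario directory in its finally block.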

Example 3 with BucketStatePathResolver

use of org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver in project flink by apache.

the class FileWriterBucketStateSerializerMigrationTest method testDeserializationFull.

private void testDeserializationFull(final boolean withInProgress, final String scenarioName) throws IOException, InterruptedException {
    final BucketStatePathResolver pathResolver = new BucketStatePathResolver(BASE_PATH, previousVersion);
    try {
        final java.nio.file.Path outputPath = pathResolver.getOutputPath(scenarioName);
        final Path testBucketPath = new Path(outputPath.resolve(BUCKET_ID).toString());
        // restore the state
        final FileWriterBucketState recoveredState = readBucketStateFromTemplate(scenarioName, previousVersion);
        final int noOfPendingCheckpoints = 5;
        // there are 5 checkpoints that did not complete
        final Map<Long, List<InProgressFileWriter.PendingFileRecoverable>> pendingFileRecoverables = recoveredState.getPendingFileRecoverablesPerCheckpoint();
        Assert.assertEquals(5L, pendingFileRecoverables.size());
        final Set<String> beforeRestorePaths = Files.list(outputPath.resolve(BUCKET_ID)).map(file -> file.getFileName().toString()).collect(Collectors.toSet());
        // before restoring, all files have "inprogress" in their names
        for (int i = 0; i < noOfPendingCheckpoints; i++) {
            final String part = ".part-0-" + i + ".inprogress";
            assertThat(beforeRestorePaths, hasItem(startsWith(part)));
        }
        final FileWriterBucket<String> bucket = restoreBucket(recoveredState);
        Assert.assertEquals(testBucketPath, bucket.getBucketPath());
        Assert.assertEquals(noOfPendingCheckpoints, bucket.getPendingFiles().size());
        // simulate committing the recovered pending files on the first checkpoint
        bucket.snapshotState();
        Collection<CommitRequest<FileSinkCommittable>> committables = bucket.prepareCommit(false).stream().map(MockCommitRequest::new).collect(Collectors.toList());
        FileCommitter committer = new FileCommitter(createBucketWriter());
        committer.commit(committables);
        final Set<String> afterRestorePaths = Files.list(outputPath.resolve(BUCKET_ID)).map(file -> file.getFileName().toString()).collect(Collectors.toSet());
        // there is no "inprogress" in the file names of the committed files
        for (int i = 0; i < noOfPendingCheckpoints; i++) {
            final String part = "part-0-" + i;
            assertThat(afterRestorePaths, hasItem(part));
            afterRestorePaths.remove(part);
        }
        if (withInProgress) {
            // only the in-progress file must be left
            assertThat(afterRestorePaths, iterableWithSize(1));
            // verify that the in-progress file is still there
            assertThat(afterRestorePaths, hasItem(startsWith(".part-0-" + noOfPendingCheckpoints + ".inprogress")));
        } else {
            assertThat(afterRestorePaths, empty());
        }
    } finally {
        FileUtils.deleteDirectory(pathResolver.getResourcePath(scenarioName).toFile());
    }
}
Also used : Path(org.apache.flink.core.fs.Path) RowWiseBucketWriter(org.apache.flink.streaming.api.functions.sink.filesystem.RowWiseBucketWriter) CoreMatchers.is(org.hamcrest.CoreMatchers.is) Arrays(java.util.Arrays) CoreMatchers.hasItem(org.hamcrest.CoreMatchers.hasItem) FileUtils(org.apache.flink.util.FileUtils) RunWith(org.junit.runner.RunWith) CoreMatchers.startsWith(org.hamcrest.CoreMatchers.startsWith) MemorySize(org.apache.flink.configuration.MemorySize) Assert.assertThat(org.junit.Assert.assertThat) BucketStatePathResolver(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver) SimpleVersionedSerialization(org.apache.flink.core.io.SimpleVersionedSerialization) Map(java.util.Map) Matchers.iterableWithSize(org.hamcrest.Matchers.iterableWithSize) StreamingFileSink(org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink) BucketStateGenerator(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStateGenerator) ClassRule(org.junit.ClassRule) Parameterized(org.junit.runners.Parameterized) CommitRequest(org.apache.flink.api.connector.sink2.Committer.CommitRequest) Matchers.empty(org.hamcrest.Matchers.empty) Files(java.nio.file.Files) FileSinkCommittable(org.apache.flink.connector.file.sink.FileSinkCommittable) Collection(java.util.Collection) Set(java.util.Set) Test(org.junit.Test) IOException(java.io.IOException) MockCommitRequest(org.apache.flink.api.connector.sink2.mocks.MockCommitRequest) Collectors(java.util.stream.Collectors) List(java.util.List) FileCommitter(org.apache.flink.connector.file.sink.committer.FileCommitter) FileSystem(org.apache.flink.core.fs.FileSystem) Ignore(org.junit.Ignore) Paths(java.nio.file.Paths) SimpleStringEncoder(org.apache.flink.api.common.serialization.SimpleStringEncoder) SimpleVersionedSerializer(org.apache.flink.core.io.SimpleVersionedSerializer) OutputFileConfig(org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig) DefaultRollingPolicy(org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy) Assert(org.junit.Assert) InProgressFileWriter(org.apache.flink.streaming.api.functions.sink.filesystem.InProgressFileWriter) TemporaryFolder(org.junit.rules.TemporaryFolder)

Example 4 with BucketStatePathResolver

use of org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver in project flink by apache.

the class FileWriterBucketStateSerializerMigrationTest method testSerializationEmpty.

@Test
public void testSerializationEmpty() throws IOException {
    final String scenarioName = "empty";
    final BucketStatePathResolver pathResolver = new BucketStatePathResolver(BASE_PATH, previousVersion);
    final java.nio.file.Path outputPath = pathResolver.getOutputPath(scenarioName);
    final Path testBucketPath = new Path(outputPath.resolve(BUCKET_ID).toString());
    final FileWriterBucketState recoveredState = readBucketState(scenarioName, previousVersion);
    final FileWriterBucket<String> bucket = restoreBucket(recoveredState);
    Assert.assertEquals(testBucketPath, bucket.getBucketPath());
    Assert.assertNull(bucket.getInProgressPart());
    Assert.assertTrue(bucket.getPendingFiles().isEmpty());
}
Also used : Path(org.apache.flink.core.fs.Path) BucketStatePathResolver(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver) Test(org.junit.Test)

Example 5 with BucketStatePathResolver

use of org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver in project flink by apache.

the class FileWriterBucketStateSerializerMigrationTest method readBucketState.

private static FileWriterBucketState readBucketState(final String scenarioName, final int version) throws IOException {
    final BucketStatePathResolver pathResolver = new BucketStatePathResolver(BASE_PATH, version);
    byte[] bytes = Files.readAllBytes(pathResolver.getSnapshotPath(scenarioName));
    return SimpleVersionedSerialization.readVersionAndDeSerialize(bucketStateSerializer(), bytes);
}
Also used : BucketStatePathResolver(org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver)
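
readBucketState reads a snapshot file and passes the bytes to SimpleVersionedSerialization.readVersionAndDeSerialize, which reads a version before decoding the payload. The general idea of version-prefixed bytes can be sketched with the JDK alone, as below. This is a simplified stand-in under stated assumptions and does not reproduce Flink's actual wire format: VersionedBytes, its method names, the version numbers, and the per-version encodings are all hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Flink's SimpleVersionedSerialization.
public class VersionedBytes {

    /** Hypothetical current version of the payload format. */
    public static final int CURRENT_VERSION = 2;

    /** Prefixes the UTF-8 payload with a 4-byte big-endian version number. */
    public static byte[] serializeWithVersion(String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(CURRENT_VERSION)
                .put(payload)
                .array();
    }

    /** Reads the version prefix and decodes the payload according to that version. */
    public static String deserializeWithVersion(byte[] bytes) throws IOException {
        if (bytes.length < 4) {
            throw new IOException("Missing version prefix");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int version = buffer.getInt();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        switch (version) {
            case 1:
                // Assumed older format for this sketch: ISO-8859-1 text.
                return new String(payload, StandardCharsets.ISO_8859_1);
            case 2:
                return new String(payload, StandardCharsets.UTF_8);
            default:
                throw new IOException("Unknown version: " + version);
        }
    }
}
```

Dispatching on the stored version is what allows a newer reader to keep decoding snapshots written by an older version, which is the property a migration test like this one exercises.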

Aggregations

BucketStatePathResolver (org.apache.flink.streaming.api.functions.sink.filesystem.BucketStatePathResolver) 5
Path (org.apache.flink.core.fs.Path) 4
Test (org.junit.Test) 3
IOException (java.io.IOException) 2
Files (java.nio.file.Files) 2
Paths (java.nio.file.Paths) 2
Arrays (java.util.Arrays) 2
Collection (java.util.Collection) 2
List (java.util.List) 2
Map (java.util.Map) 2
Set (java.util.Set) 2
Collectors (java.util.stream.Collectors) 2
SimpleStringEncoder (org.apache.flink.api.common.serialization.SimpleStringEncoder) 2
CommitRequest (org.apache.flink.api.connector.sink2.Committer.CommitRequest) 2
MockCommitRequest (org.apache.flink.api.connector.sink2.mocks.MockCommitRequest) 2
MemorySize (org.apache.flink.configuration.MemorySize) 2
FileSinkCommittable (org.apache.flink.connector.file.sink.FileSinkCommittable) 2
FileCommitter (org.apache.flink.connector.file.sink.committer.FileCommitter) 2
FileSystem (org.apache.flink.core.fs.FileSystem) 2
SimpleVersionedSerialization (org.apache.flink.core.io.SimpleVersionedSerialization) 2