Example 26 with Metadata

use of org.apache.beam.sdk.io.fs.MatchResult.Metadata in project beam by apache.

the class FileBasedSourceTest method testToStringFile.

@Test
public void testToStringFile() throws Exception {
    File f = createFileWithData("foo", Collections.<String>emptyList());
    Metadata metadata = FileSystems.matchSingleFileSpec(f.getPath());
    TestFileBasedSource source = new TestFileBasedSource(metadata, 1, 0, 10, null);
    assertEquals(String.format("%s range [0, 10)", f.getAbsolutePath()), source.toString());
}
Also used : Metadata(org.apache.beam.sdk.io.fs.MatchResult.Metadata) File(java.io.File) Test(org.junit.Test)

Example 27 with Metadata

use of org.apache.beam.sdk.io.fs.MatchResult.Metadata in project beam by apache.

the class FileBasedSourceTest method testSplitAtFraction.

@Test
public void testSplitAtFraction() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    File file = createFileWithData("file", createStringDataset(3, 100));
    Metadata metadata = FileSystems.matchSingleFileSpec(file.getPath());
    TestFileBasedSource source = new TestFileBasedSource(metadata, 1, 0, file.length(), null);
    // Shouldn't be able to split while unstarted.
    assertSplitAtFractionFails(source, 0, 0.7, options);
    assertSplitAtFractionSucceedsAndConsistent(source, 1, 0.7, options);
    assertSplitAtFractionSucceedsAndConsistent(source, 30, 0.7, options);
    assertSplitAtFractionFails(source, 0, 0.0, options);
    assertSplitAtFractionFails(source, 70, 0.3, options);
    assertSplitAtFractionFails(source, 100, 1.0, options);
    assertSplitAtFractionFails(source, 100, 0.99, options);
    assertSplitAtFractionSucceedsAndConsistent(source, 100, 0.995, options);
}
Also used : PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Metadata(org.apache.beam.sdk.io.fs.MatchResult.Metadata) File(java.io.File) Test(org.junit.Test)
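
The assertions in testSplitAtFraction are consistent with a reader that rejects a split request when reading has not started, or when the proposed split point falls at or behind the current position, and otherwise shrinks its range. The following is a minimal sketch of that idea only. The class SplitAtFractionSketch and its methods are hypothetical names invented for illustration; they are not Beam's API and do not reproduce the exact record-boundary behavior exercised by the test.

```java
/** Hypothetical, simplified byte-offset range tracker (illustration only, not Beam code). */
final class SplitAtFractionSketch {
    private final long start;
    private long end;                // exclusive; shrinks after a successful split
    private long lastReturned = -1;  // -1 means no record has been returned yet

    SplitAtFractionSketch(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Records that a record starting at {@code offset} was returned; false if out of range. */
    boolean tryReturnRecordAt(long offset) {
        if (offset >= end) {
            return false;
        }
        lastReturned = offset;
        return true;
    }

    /** Returns the new exclusive end offset on success, or -1 if the split is rejected. */
    long trySplitAtFraction(double fraction) {
        if (lastReturned < 0) {
            return -1; // unstarted: nothing has been claimed, so refuse to split
        }
        long splitOffset = start + (long) Math.ceil(fraction * (end - start));
        if (splitOffset <= lastReturned || splitOffset >= end) {
            return -1; // split point already consumed, or it would not shrink the range
        }
        end = splitOffset;
        return splitOffset;
    }

    public static void main(String[] args) {
        SplitAtFractionSketch tracker = new SplitAtFractionSketch(0, 400);
        System.out.println(tracker.trySplitAtFraction(0.5));  // -1 (unstarted)
        tracker.tryReturnRecordAt(120);
        System.out.println(tracker.trySplitAtFraction(0.25)); // -1 (offset 100 is behind 120)
        System.out.println(tracker.trySplitAtFraction(0.5));  // 200
    }
}
```

After a successful split the tracker owns only [start, splitOffset); the remainder would be handed to a new source in a real runner.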

Example 28 with Metadata

use of org.apache.beam.sdk.io.fs.MatchResult.Metadata in project beam by apache.

the class FileBasedSourceTest method testSplitAtFractionExhaustive.

@Test
public void testSplitAtFractionExhaustive() throws Exception {
    PipelineOptions options = PipelineOptionsFactory.create();
    // Smaller file for exhaustive testing.
    File file = createFileWithData("file", createStringDataset(3, 20));
    Metadata metadata = FileSystems.matchSingleFileSpec(file.getPath());
    TestFileBasedSource source = new TestFileBasedSource(metadata, 1, 0, file.length(), null);
    assertSplitAtFractionExhaustive(source, options);
}
Also used : PipelineOptions(org.apache.beam.sdk.options.PipelineOptions) Metadata(org.apache.beam.sdk.io.fs.MatchResult.Metadata) File(java.io.File) Test(org.junit.Test)

Example 29 with Metadata

use of org.apache.beam.sdk.io.fs.MatchResult.Metadata in project beam by apache.

the class NumberedShardedFile method readFilesWithRetries.

/**
 * Discovers all shards of this file using the provided {@link Sleeper} and {@link BackOff}.
 *
 * <p>Because of eventual consistency, reads may discover no files or fewer files than
 * the shard template implies. In this case, the read is considered to have failed.
 */
@Override
public List<String> readFilesWithRetries(Sleeper sleeper, BackOff backOff) throws IOException, InterruptedException {
    IOException lastException = null;
    do {
        try {
            // Match the file pattern, which may contain a glob
            Collection<Metadata> files = Iterables.getOnlyElement(FileSystems.match(Collections.singletonList(filePattern))).metadata();
            LOG.debug("Found {} file(s) by matching the path: {}", files.size(), filePattern);
            if (files.isEmpty() || !checkTotalNumOfFiles(files)) {
                continue;
            }
            // Read data from file paths
            return readLines(files);
        } catch (IOException e) {
            // Ignore and retry
            lastException = e;
            LOG.warn("Error in file reading. Ignore and retry.");
        }
    } while (BackOffUtils.next(sleeper, backOff));
    // Failed after max retries
    throw new IOException(String.format("Unable to read file(s) after retrying %d times", MAX_READ_RETRIES), lastException);
}
Also used : Metadata(org.apache.beam.sdk.io.fs.MatchResult.Metadata) IOException(java.io.IOException)
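
The retry pattern in readFilesWithRetries (treat an empty or incomplete listing as a failed attempt, remember the last exception, back off, try again) can be sketched without any Beam dependency. In this sketch the class RetryReadSketch, the Listing interface, and the parameters expectedCount, maxRetries, and initialBackoffMillis are all hypothetical stand-ins; a plain doubling delay replaces Beam's Sleeper and BackOff.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical retry helper for eventually consistent listings (illustration only). */
final class RetryReadSketch {

    /** Stand-in for a file-matching call that may fail or return a partial listing. */
    interface Listing {
        List<String> list() throws IOException;
    }

    static List<String> readWithRetries(
            Listing listing, int expectedCount, int maxRetries, long initialBackoffMillis)
            throws IOException, InterruptedException {
        IOException lastException = null;
        long delayMillis = initialBackoffMillis;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                Thread.sleep(delayMillis); // back off before every retry
                delayMillis *= 2;          // simple exponential growth
            }
            try {
                List<String> files = listing.list();
                // An empty or short listing counts as a failed attempt, not as a result.
                if (!files.isEmpty() && files.size() == expectedCount) {
                    return files;
                }
            } catch (IOException e) {
                lastException = e; // remember the cause and retry
            }
        }
        throw new IOException(
            String.format("Unable to read file(s) after retrying %d times", maxRetries),
            lastException);
    }
}
```

Keeping the last exception as the cause of the final IOException preserves the most recent failure for debugging, as the original method does.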

Aggregations

Metadata (org.apache.beam.sdk.io.fs.MatchResult.Metadata): 29
Test (org.junit.Test): 15
File (java.io.File): 14
ArrayList (java.util.ArrayList): 12
PipelineOptions (org.apache.beam.sdk.options.PipelineOptions): 9
VisibleForTesting (com.google.common.annotations.VisibleForTesting): 4
AvroMetadata (org.apache.beam.sdk.io.AvroSource.AvroMetadata): 4
ImmutableList (com.google.common.collect.ImmutableList): 3
FileReader (java.io.FileReader): 3
Reader (java.io.Reader): 3
MatchResult (org.apache.beam.sdk.io.fs.MatchResult): 3
BufferedReader (java.io.BufferedReader): 2
IOException (java.io.IOException): 2
List (java.util.List): 2
GenericRecord (org.apache.avro.generic.GenericRecord): 2
Objects (com.google.api.services.storage.model.Objects): 1
StorageObject (com.google.api.services.storage.model.StorageObject): 1
Predicate (com.google.common.base.Predicate): 1
FileNotFoundException (java.io.FileNotFoundException): 1
BigInteger (java.math.BigInteger): 1