Example 6 with FsInput

use of org.apache.avro.mapred.FsInput in project incubator-gobblin by apache.

the class FsSpecConsumer method changedSpecs.

/**
 * List of newly changed {@link Spec}s for execution on {@link SpecExecutor}.
 * The {@link Spec}s are returned in the increasing order of their modification times.
 */
@Override
public Future<? extends List<Pair<SpecExecutor.Verb, Spec>>> changedSpecs() {
    List<Pair<SpecExecutor.Verb, Spec>> specList = new ArrayList<>();
    FileStatus[] fileStatuses;
    try {
        fileStatuses = this.fs.listStatus(this.specDirPath, new AndPathFilter(new HiddenFilter(), new AvroUtils.AvroPathFilter()));
    } catch (IOException e) {
        log.error("Error when listing files at path: {}", this.specDirPath.toString(), e);
        // Note: this returns null rather than a failed Future, so callers must null-check the result.
        return null;
    }
    log.info("Found {} files at path {}", fileStatuses.length, this.specDirPath.toString());
    // Sort the JobSpecs in increasing order of their modification times
    // so that they can be handled in FIFO order by the JobConfigurationManager
    // and, eventually, the GobblinHelixJobScheduler.
    Arrays.sort(fileStatuses, Comparator.comparingLong(FileStatus::getModificationTime));
    for (FileStatus fileStatus : fileStatuses) {
        AvroJobSpec avroJobSpec = null;
        // Only the first record of each file is used; try-with-resources closes the reader afterwards.
        try (DataFileReader<AvroJobSpec> dataFileReader = new DataFileReader<>(new FsInput(fileStatus.getPath(), this.fs.getConf()), new SpecificDatumReader<>())) {
            if (dataFileReader.hasNext()) {
                avroJobSpec = dataFileReader.next();
            }
        } catch (IOException e) {
            log.error("Error creating or closing DataFileReader for: {}", fileStatus.getPath().toString(), e);
            continue;
        }
        if (avroJobSpec != null) {
            JobSpec.Builder jobSpecBuilder = new JobSpec.Builder(avroJobSpec.getUri());
            Properties props = new Properties();
            props.putAll(avroJobSpec.getProperties());
            jobSpecBuilder.withJobCatalogURI(avroJobSpec.getUri()).withVersion(avroJobSpec.getVersion()).withDescription(avroJobSpec.getDescription()).withConfigAsProperties(props).withConfig(ConfigUtils.propertiesToConfig(props));
            try {
                if (!avroJobSpec.getTemplateUri().isEmpty()) {
                    jobSpecBuilder.withTemplate(new URI(avroJobSpec.getTemplateUri()));
                }
            } catch (URISyntaxException u) {
                log.error("Error building a job spec: ", u);
                continue;
            }
            String verbName = avroJobSpec.getMetadata().get(SpecExecutor.VERB_KEY);
            SpecExecutor.Verb verb = SpecExecutor.Verb.valueOf(verbName);
            JobSpec jobSpec = jobSpecBuilder.build();
            log.debug("Successfully built jobspec: {}", jobSpec.getUri().toString());
            specList.add(new ImmutablePair<SpecExecutor.Verb, Spec>(verb, jobSpec));
            this.specToPathMap.put(jobSpec.getUri(), fileStatus.getPath());
        }
    }
    return new CompletedFuture<>(specList, null);
}
Also used : AvroUtils(org.apache.gobblin.util.AvroUtils) FileStatus(org.apache.hadoop.fs.FileStatus) HiddenFilter(org.apache.gobblin.util.filters.HiddenFilter) ArrayList(java.util.ArrayList) URISyntaxException(java.net.URISyntaxException) Properties(java.util.Properties) URI(java.net.URI) AndPathFilter(org.apache.gobblin.util.filters.AndPathFilter) SpecificDatumReader(org.apache.avro.specific.SpecificDatumReader) Pair(org.apache.commons.lang3.tuple.Pair) ImmutablePair(org.apache.commons.lang3.tuple.ImmutablePair) IOException(java.io.IOException) AvroJobSpec(org.apache.gobblin.runtime.job_spec.AvroJobSpec) FsInput(org.apache.avro.mapred.FsInput) CompletedFuture(org.apache.gobblin.util.CompletedFuture)
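
The FIFO handling described in the comments of changedSpecs rests on a single sort by modification time. The following is a minimal JDK-only sketch of that step, not Gobblin code: the SpecFile class is a hypothetical stand-in for Hadoop's FileStatus, and the paths and timestamps are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FifoSpecOrdering {

    // Hypothetical stand-in for org.apache.hadoop.fs.FileStatus: a path plus its modification time.
    static final class SpecFile {
        final String path;
        final long modificationTime;

        SpecFile(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }

        long getModificationTime() {
            return modificationTime;
        }
    }

    // Returns the paths ordered oldest-first, using the same comparator shape as changedSpecs().
    static List<String> pathsInFifoOrder(SpecFile[] files) {
        SpecFile[] sorted = files.clone(); // leave the caller's array untouched
        Arrays.sort(sorted, Comparator.comparingLong(SpecFile::getModificationTime));
        List<String> paths = new ArrayList<>();
        for (SpecFile file : sorted) {
            paths.add(file.path);
        }
        return paths;
    }

    public static void main(String[] args) {
        SpecFile[] files = {
            new SpecFile("/specs/c.avro", 300L),
            new SpecFile("/specs/a.avro", 100L),
            new SpecFile("/specs/b.avro", 200L)
        };
        System.out.println(pathsInFifoOrder(files));
    }
}
```

Because Arrays.sort on object arrays is stable, files with identical modification times keep the order in which they were listed.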

Example 7 with FsInput

use of org.apache.avro.mapred.FsInput in project crunch by cloudera.

the class AvroRecordReader method initialize.

@Override
public void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException, InterruptedException {
    FileSplit split = (FileSplit) genericSplit;
    Configuration conf = context.getConfiguration();
    SeekableInput in = new FsInput(split.getPath(), conf);
    DatumReader<T> datumReader = null;
    if (conf.getBoolean(AvroJob.INPUT_IS_REFLECT, true)) {
        ReflectDataFactory factory = Avros.getReflectDataFactory(conf);
        datumReader = factory.getReader(schema);
    } else {
        datumReader = new SpecificDatumReader<T>(schema);
    }
    this.reader = DataFileReader.openReader(in, datumReader);
    // sync to start
    reader.sync(split.getStart());
    this.start = reader.tell();
    this.end = split.getStart() + split.getLength();
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) FsInput(org.apache.avro.mapred.FsInput) SeekableInput(org.apache.avro.file.SeekableInput) FileSplit(org.apache.hadoop.mapreduce.lib.input.FileSplit)
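
The "sync to start" step in initialize is what lets several readers share one Avro file without reading any block twice. The sketch below is a simplified JDK-only model of that idea and does not call the Avro API: it assumes each block is identified by a single sync offset (the syncOffsets array is hypothetical) and ignores the width of the sync marker itself. Under those assumptions, a split owns every block whose sync offset falls inside [start, start + length).

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSyncModel {

    // Returns the sync offsets owned by the split [splitStart, splitStart + splitLength)
    // in this simplified model: offsets at or after the split start and before the split end.
    static List<Long> blocksForSplit(long[] syncOffsets, long splitStart, long splitLength) {
        long end = splitStart + splitLength; // same arithmetic as this.end in initialize()
        List<Long> owned = new ArrayList<>();
        for (long offset : syncOffsets) {
            if (offset >= splitStart && offset < end) {
                owned.add(offset);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        long[] syncOffsets = {16L, 120L, 260L, 410L}; // hypothetical block boundaries
        System.out.println(blocksForSplit(syncOffsets, 0L, 200L));
        System.out.println(blocksForSplit(syncOffsets, 200L, 300L));
    }
}
```

Since adjacent splits partition the byte range of the file, every sync offset lands in exactly one split, even when a split boundary falls in the middle of a block.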

Example 8 with FsInput

use of org.apache.avro.mapred.FsInput in project gora by apache.

the class DataFileAvroStore method executePartial.

@Override
protected Result<K, T> executePartial(FileSplitPartitionQuery<K, T> query) throws IOException {
    FsInput fsInput = createFsInput();
    DataFileReader<T> reader = createReader(fsInput);
    return new DataFileAvroResult<>(this, query, reader, fsInput, query.getStart(), query.getLength());
}
Also used : DataFileAvroResult(org.apache.gora.avro.query.DataFileAvroResult) FsInput(org.apache.avro.mapred.FsInput)

Example 9 with FsInput

use of org.apache.avro.mapred.FsInput in project incubator-gobblin by apache.

the class TestAvroExtractor method getRecordFromFile.

public static List<GenericRecord> getRecordFromFile(String path) throws IOException {
    Configuration config = new Configuration();
    SeekableInput input = new FsInput(new Path(path), config);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    List<GenericRecord> records = new ArrayList<>();
    // try-with-resources closes the reader even if iteration throws.
    try (FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, datumReader)) {
        for (GenericRecord datum : fileReader) {
            records.add(datum);
        }
    }
    return records;
}
Also used : Path(org.apache.hadoop.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) FsInput(org.apache.avro.mapred.FsInput) GenericDatumReader(org.apache.avro.generic.GenericDatumReader) ArrayList(java.util.ArrayList) SeekableInput(org.apache.avro.file.SeekableInput) GenericRecord(org.apache.avro.generic.GenericRecord)

Example 10 with FsInput

use of org.apache.avro.mapred.FsInput in project incubator-gobblin by apache.

the class HdfsReader method getFsInput.

public FsInput getFsInput() throws IOException {
    Path path = new Path(this.filePathInHdfs);
    Configuration conf = getConfiguration();
    return new FsInput(path, conf);
}
Also used : Path(org.apache.hadoop.fs.Path) Configuration(org.apache.hadoop.conf.Configuration) FsInput(org.apache.avro.mapred.FsInput)

Aggregations

FsInput (org.apache.avro.mapred.FsInput): 11
IOException (java.io.IOException): 6
GenericRecord (org.apache.avro.generic.GenericRecord): 6
Configuration (org.apache.hadoop.conf.Configuration): 5
DataFileReader (org.apache.avro.file.DataFileReader): 4
SeekableInput (org.apache.avro.file.SeekableInput): 4
GenericDatumReader (org.apache.avro.generic.GenericDatumReader): 4
ArrayList (java.util.ArrayList): 3
FileStatus (org.apache.hadoop.fs.FileStatus): 3
Path (org.apache.hadoop.fs.Path): 3
Schema (org.apache.avro.Schema): 2
HiddenFilter (org.apache.gobblin.util.filters.HiddenFilter): 2
AbstractIterator (com.google.common.collect.AbstractIterator): 1
UnmodifiableIterator (com.google.common.collect.UnmodifiableIterator): 1
Closer (com.google.common.io.Closer): 1
URI (java.net.URI): 1
URISyntaxException (java.net.URISyntaxException): 1
Properties (java.util.Properties): 1
TreeMap (java.util.TreeMap): 1
SpecificDatumReader (org.apache.avro.specific.SpecificDatumReader): 1