Example 6 with FileBasedHelperException

use of org.apache.gobblin.source.extractor.filebased.FileBasedHelperException in project incubator-gobblin by apache.

the class GoogleDriveSource method getExtractor.

/**
 * The Google Drive extractor needs a file system helper, so this method initializes the helper first.
 * {@inheritDoc}
 * @see org.apache.gobblin.source.Source#getExtractor(org.apache.gobblin.configuration.WorkUnitState)
 */
@Override
public Extractor<S, D> getExtractor(WorkUnitState state) throws IOException {
    Preconditions.checkNotNull(state, "WorkUnitState should not be null");
    LOG.info("WorkUnitState from getExtractor: " + state);
    try {
        // GoogleDriveExtractor needs GoogleDriveFsHelper
        initFileSystemHelper(state);
    } catch (FileBasedHelperException e) {
        throw new IOException(e);
    }
    Preconditions.checkNotNull(fsHelper, "File system helper should not be null");
    return new GoogleDriveExtractor<>(state, fsHelper);
}
Also used: FileBasedHelperException (org.apache.gobblin.source.extractor.filebased.FileBasedHelperException), IOException (java.io.IOException)
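
The example above catches a helper-specific checked exception and rethrows it as an IOException with the original attached as the cause. A minimal, self-contained sketch of that translation follows. HelperException, initHelper and the String return value are hypothetical stand-ins for illustration only; they are not Gobblin classes or methods.

```java
import java.io.IOException;

// Hypothetical stand-in for FileBasedHelperException; not the Gobblin class.
class HelperException extends Exception {
    HelperException(String message) {
        super(message);
    }
}

class ExceptionTranslationSketch {
    // Hypothetical initializer that fails on request, to exercise both paths.
    static void initHelper(boolean fail) throws HelperException {
        if (fail) {
            throw new HelperException("helper could not connect");
        }
    }

    // Translates the helper-specific exception into an IOException.
    // Passing e to the constructor keeps it reachable through getCause().
    static String getExtractor(boolean fail) throws IOException {
        try {
            initHelper(fail);
        } catch (HelperException e) {
            throw new IOException(e);
        }
        return "extractor";
    }

    public static void main(String[] args) {
        try {
            getExtractor(true);
        } catch (IOException e) {
            // The original helper failure is still available to callers and logs.
            System.out.println("cause: " + e.getCause().getMessage());
        }
    }
}
```

Wrapping rather than swallowing the exception lets the method keep a narrow `throws IOException` signature while the stack trace still shows where the helper failed.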

Example 7 with FileBasedHelperException

use of org.apache.gobblin.source.extractor.filebased.FileBasedHelperException in project incubator-gobblin by apache.

the class DatePartitionedNestedRetriever method getFilesToProcess.

@Override
public List<FileInfo> getFilesToProcess(long minWatermark, int maxFilesToReturn) throws IOException {
    DateTime currentDay = new DateTime().minus(leadTimeDuration);
    DateTime lowWaterMarkDate = new DateTime(minWatermark);
    List<FileInfo> filesToProcess = new ArrayList<>();
    try {
        helper.connect();
        this.fs = helper.getFileSystem();
    } catch (FileBasedHelperException e) {
        throw new IOException("Error initializing FileSystem", e);
    }
    for (DateTime date = lowWaterMarkDate; !date.isAfter(currentDay) && filesToProcess.size() < maxFilesToReturn; date = date.withFieldAdded(incrementalUnit, 1)) {
        // Constructs the path folder - e.g. /my/data/prefix/2015/01/01/suffix
        Path sourcePath = constructSourcePath(date);
        if (this.fs.exists(sourcePath)) {
            for (FileStatus fileStatus : this.fs.listStatus(sourcePath, getFileFilter())) {
                LOG.info("Will process file " + fileStatus.getPath());
                filesToProcess.add(new FileInfo(fileStatus.getPath().toString(), fileStatus.getLen(), date.getMillis()));
            }
        }
    }
    return filesToProcess;
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), FileBasedHelperException (org.apache.gobblin.source.extractor.filebased.FileBasedHelperException), ArrayList (java.util.ArrayList), IOException (java.io.IOException), DateTime (org.joda.time.DateTime)
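
The loop in getFilesToProcess walks date partitions from the low watermark up to the current day and stops once enough entries have been collected. The sketch below isolates that walk under several assumptions: it uses java.time instead of Joda-Time, fixes the increment at one day, and builds path strings rather than listing a real file system. The class and method names are hypothetical. Note that in the original the size limit is checked only between dates, so listing a single directory can push the result past maxFilesToReturn; the sketch adds one entry per date, so it never overshoots.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class DatePartitionWalkSketch {
    // Folder layout assumed for illustration, e.g. prefix/2015/01/01.
    private static final DateTimeFormatter PARTITION = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Returns one path per day from low to current (inclusive), capped at max entries.
    static List<String> partitionPaths(String prefix, LocalDate low, LocalDate current, int max) {
        List<String> paths = new ArrayList<>();
        for (LocalDate d = low; !d.isAfter(current) && paths.size() < max; d = d.plusDays(1)) {
            paths.add(prefix + "/" + d.format(PARTITION));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = partitionPaths(
            "/my/data/prefix", LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 5), 3);
        for (String p : paths) {
            System.out.println(p);
        }
    }
}
```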

Example 8 with FileBasedHelperException

use of org.apache.gobblin.source.extractor.filebased.FileBasedHelperException in project incubator-gobblin by apache.

the class PartitionedFileSourceBase method init.

/**
 * Gobblin calls the {@link Source#getWorkunits(SourceState)} method after creating a {@link Source} object with a
 * blank constructor, so any custom initialization of the object needs to be done here.
 */
protected void init(SourceState state) {
    retriever.init(state);
    try {
        initFileSystemHelper(state);
    } catch (FileBasedHelperException e) {
        Throwables.propagate(e);
    }
    AvroFsHelper fsHelper = (AvroFsHelper) this.fsHelper;
    this.fs = fsHelper.getFileSystem();
    this.sourceState = state;
    this.lowWaterMark = getLowWaterMark(state.getPreviousWorkUnitStates(), state.getProp(DATE_PARTITIONED_SOURCE_MIN_WATERMARK_VALUE, String.valueOf(DEFAULT_DATE_PARTITIONED_SOURCE_MIN_WATERMARK_VALUE)));
    this.maxFilesPerJob = state.getPropAsInt(DATE_PARTITIONED_SOURCE_MAX_FILES_PER_JOB, DEFAULT_DATE_PARTITIONED_SOURCE_MAX_FILES_PER_JOB);
    this.maxWorkUnitsPerJob = state.getPropAsInt(DATE_PARTITIONED_SOURCE_MAX_WORKUNITS_PER_JOB, DEFAULT_DATE_PARTITIONED_SOURCE_MAX_WORKUNITS_PER_JOB);
    this.tableType = TableType.valueOf(state.getProp(ConfigurationKeys.EXTRACT_TABLE_TYPE_KEY).toUpperCase());
    this.fileCount = 0;
    this.sourceDir = new Path(state.getProp(ConfigurationKeys.SOURCE_FILEBASED_DATA_DIRECTORY));
}
Also used: Path (org.apache.hadoop.fs.Path), FileBasedHelperException (org.apache.gobblin.source.extractor.filebased.FileBasedHelperException), AvroFsHelper (org.apache.gobblin.source.extractor.hadoop.AvroFsHelper)

Example 9 with FileBasedHelperException

use of org.apache.gobblin.source.extractor.filebased.FileBasedHelperException in project incubator-gobblin by apache.

the class GoogleDriveFsHelper method getFileMTime.

@Override
public long getFileMTime(String fileId) throws FileBasedHelperException {
    Preconditions.checkNotNull(fileId, "fileId is required");
    Path p = new Path(fileId);
    try {
        FileStatus status = fileSystem.getFileStatus(p);
        return status.getModificationTime();
    } catch (IOException e) {
        throw new FileBasedHelperException("Failed to retrieve getModificationTime on path: " + p + " , fileId: " + fileId, e);
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), FileBasedHelperException (org.apache.gobblin.source.extractor.filebased.FileBasedHelperException), IOException (java.io.IOException)

Example 10 with FileBasedHelperException

use of org.apache.gobblin.source.extractor.filebased.FileBasedHelperException in project incubator-gobblin by apache.

the class GoogleDriveFsHelper method getFileStream.

@Override
public InputStream getFileStream(String fileId) throws FileBasedHelperException {
    Preconditions.checkNotNull(fileId, "fileId is required");
    Path p = new Path(fileId);
    try {
        if (bufferSizeByte.isPresent()) {
            return fileSystem.open(p, bufferSizeByte.get());
        }
        return fileSystem.open(p);
    } catch (IOException e) {
        throw new FileBasedHelperException("Failed to open files stream on path: " + p + " , fileId: " + fileId, e);
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileBasedHelperException (org.apache.gobblin.source.extractor.filebased.FileBasedHelperException), IOException (java.io.IOException)

Aggregations

FileBasedHelperException (org.apache.gobblin.source.extractor.filebased.FileBasedHelperException): 15
IOException (java.io.IOException): 10
Path (org.apache.hadoop.fs.Path): 9
FileStatus (org.apache.hadoop.fs.FileStatus): 6
ArrayList (java.util.ArrayList): 4
ChannelSftp (com.jcraft.jsch.ChannelSftp): 3
SftpException (com.jcraft.jsch.SftpException): 3
DateTime (org.joda.time.DateTime): 2
LsEntry (com.jcraft.jsch.ChannelSftp.LsEntry): 1
JSch (com.jcraft.jsch.JSch): 1
JSchException (com.jcraft.jsch.JSchException): 1
ProxyHTTP (com.jcraft.jsch.ProxyHTTP): 1
UserInfo (com.jcraft.jsch.UserInfo): 1
FileNotFoundException (java.io.FileNotFoundException): 1
InputStream (java.io.InputStream): 1
State (org.apache.gobblin.configuration.State): 1
AvroFsHelper (org.apache.gobblin.source.extractor.hadoop.AvroFsHelper): 1
FileSystem (org.apache.hadoop.fs.FileSystem): 1
CompressionCodec (org.apache.hadoop.io.compress.CompressionCodec): 1
CompressionCodecFactory (org.apache.hadoop.io.compress.CompressionCodecFactory): 1