Example 1 with CorruptMetadataException

use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.

the class DatasetDao method retrieveWorker.

private Dataset retrieveWorker(DatasetSummary summary) {
    Dataset dataset = null;
    try {
        if (summary != null) {
            dataset = new Dataset(summary);
            dataset.tables(tableDao.retrieveTables(dataset.getId()));
            relationshipDao.retrieve(dataset);
            assetDao.retrieve(dataset);
        }
        return dataset;
    } catch (EmptyResultDataAccessException ex) {
        throw new CorruptMetadataException("Inconsistent data", ex);
    }
}
Also used : CorruptMetadataException(bio.terra.service.snapshot.exception.CorruptMetadataException)

Example 2 with CorruptMetadataException

use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.

the class LoadDao method lockLoad.

// -- load tags public methods --
// This must be serializable so that conflicting updates of the locked state and flightid
// are detected. We lock the table so that we avoid serialization errors.
/**
 * We implement a rule that one load job can use one load tag at a time. That rule is needed to control
 * concurrent operations. For example, a delete-by-load-tag cannot compete with a load; two loads cannot
 * run in parallel with the same load tag - it confuses the algorithm for re-running a load with a load tag
 * and skipping already-loaded files.
 *
 * This call and the unlock call use a load table in the database to record that a load tag is in use.
 * The load tag is associated with a load id (a guid); that guid is a foreign key to the load_file table
 * that maintains the state of files being loaded.
 *
 * We expect conflicts on load tags to be rare. The typical case will be: a load starts, runs, and ends
 * without conflict and without a re-run.
 *
 * We learned from the first implementation of this code that when there were conflicts, we would get
 * serialization errors from Postgres. Those require building retry logic. Instead, we chose to use
 * table locks to serialize access to the load table during the time we are setting and freeing
 * our load lock state.
 *
 * A lock is taken by creating the load tag row and storing the flight id holding the lock.
 * The lock is freed by deleting the load tag row. Code can safely re-lock a load tag lock it holds and
 * unlock a load tag lock it has freed.
 *
 * There is never a case where a lock row is updated. They are only ever inserted or deleted.
 *
 * @param loadTag tag identifying this load
 * @param flightId flight id taking the lock
 * @return Load object including the load id
 */
@Transactional(propagation = Propagation.REQUIRED, isolation = Isolation.SERIALIZABLE)
public Load lockLoad(String loadTag, String flightId) throws InterruptedException {
    jdbcTemplate.getJdbcTemplate().execute("LOCK TABLE load IN EXCLUSIVE MODE");
    String upsert = "INSERT INTO load (load_tag, locked, locking_flight_id)"
        + " VALUES (:load_tag, true, :flight_id)"
        + " ON CONFLICT ON CONSTRAINT load_load_tag_key DO NOTHING";
    MapSqlParameterSource params = new MapSqlParameterSource()
        .addValue("load_tag", loadTag)
        .addValue("flight_id", flightId);
    DaoKeyHolder keyHolder = new DaoKeyHolder();
    int rows = jdbcTemplate.update(upsert, params, keyHolder);
    Load load;
    if (rows == 0) {
        // We did not insert. Therefore, someone has the load tag locked.
        // Retrieve it, in case it is us re-locking
        load = lookupLoadByTag(loadTag);
        if (load == null) {
            throw new CorruptMetadataException("Load row should exist! Load tag: " + loadTag);
        }
        // It is locked by someone else
        if (!StringUtils.equals(load.getLockingFlightId(), flightId)) {
            throw new LoadLockedException("Load '" + loadTag + "' is locked by flight '" + load.getLockingFlightId() + "'");
        }
    } else {
        load = new Load()
            .id(keyHolder.getId())
            .loadTag(keyHolder.getString("load_tag"))
            .locked(keyHolder.getField("locked", Boolean.class))
            .lockingFlightId(keyHolder.getString("locking_flight_id"));
    }
    return load;
}
Also used : MapSqlParameterSource(org.springframework.jdbc.core.namedparam.MapSqlParameterSource) DaoKeyHolder(bio.terra.common.DaoKeyHolder) LoadLockedException(bio.terra.service.load.exception.LoadLockedException) CorruptMetadataException(bio.terra.service.snapshot.exception.CorruptMetadataException) Transactional(org.springframework.transaction.annotation.Transactional)
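
The locking rule described in the Javadoc above (a lock is an inserted row, an unlock is a deleted row, and both are safe to repeat) can be sketched without a database. This is a minimal in-memory illustration, not the project's DAO: the class LoadTagLockSketch, its methods, and TagLockedException are hypothetical names, the map stands in for the load table, and synchronized stands in for the table lock. How the real unlock call treats a tag held by a different flight is not shown in the source; the sketch assumes it leaves the row alone.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of the load tag lock rule; all names here are hypothetical. */
final class LoadTagLockSketch {

    /** Hypothetical stand-in for LoadLockedException. */
    static final class TagLockedException extends RuntimeException {
        TagLockedException(String message) {
            super(message);
        }
    }

    // load tag -> flight id holding the lock; a present entry plays the role of the lock row
    private final Map<String, String> lockRows = new HashMap<>();

    /** Insert the lock row, or accept a re-lock by the flight that already holds it. */
    synchronized void lock(String loadTag, String flightId) {
        String holder = lockRows.putIfAbsent(loadTag, flightId);
        if (holder != null && !holder.equals(flightId)) {
            throw new TagLockedException(
                "Load '" + loadTag + "' is locked by flight '" + holder + "'");
        }
    }

    /**
     * Delete the lock row if this flight holds it. Unlocking an already-freed tag is a no-op.
     * Assumption: a row held by another flight is left untouched.
     */
    synchronized void unlock(String loadTag, String flightId) {
        lockRows.remove(loadTag, flightId);
    }

    synchronized boolean isLocked(String loadTag) {
        return lockRows.containsKey(loadTag);
    }

    public static void main(String[] args) {
        LoadTagLockSketch locks = new LoadTagLockSketch();
        locks.lock("tag-a", "flight-1");
        locks.lock("tag-a", "flight-1"); // re-locking a held lock is allowed
        try {
            locks.lock("tag-a", "flight-2"); // a second flight is rejected
        } catch (TagLockedException e) {
            System.out.println(e.getMessage());
        }
        locks.unlock("tag-a", "flight-1");
        locks.unlock("tag-a", "flight-1"); // unlocking a freed lock is allowed
        System.out.println("locked after unlock: " + locks.isLocked("tag-a"));
    }
}
```

Because rows are only ever inserted or deleted, there is no update path to reason about, which is what makes the repeated lock and unlock calls safe.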

Example 3 with CorruptMetadataException

use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.

the class GoogleResourceDao method getBucket.

/**
 * Fetch an existing bucket_resource metadata row using the name and project id.
 * This method expects that there is exactly one row matching the provided name and project id.
 * @param bucketRequest request carrying the bucket name and the Google project resource
 * @return a reference to the bucket as a POJO GoogleBucketResource or null if not found
 * @throws GoogleResourceException if the bucket matches, but is in the wrong project
 * @throws CorruptMetadataException if multiple buckets have the same name
 */
public GoogleBucketResource getBucket(GoogleBucketRequest bucketRequest) {
    String bucketName = bucketRequest.getBucketName();
    List<GoogleBucketResource> bucketResourcesByName = retrieveBucketsBy("name", bucketName, String.class);
    if (bucketResourcesByName == null || bucketResourcesByName.size() == 0) {
        return null;
    }
    if (bucketResourcesByName.size() > 1) {
        // this should never happen because Google bucket names are unique
        throw new CorruptMetadataException("Multiple buckets found with same name: " + bucketName);
    }
    GoogleBucketResource bucketResource = bucketResourcesByName.get(0);
    UUID foundProjectId = bucketResource.getProjectResource().getRepositoryId();
    UUID requestedProjectId = bucketRequest.getGoogleProjectResource().getRepositoryId();
    if (!foundProjectId.equals(requestedProjectId)) {
        // there is a bucket with this name in our metadata, but it's for a different project
        throw new GoogleResourceException(String.format("A bucket with this name already exists for a different project: %s, %s", bucketName, requestedProjectId));
    }
    return bucketResource;
}
Also used : CorruptMetadataException(bio.terra.service.snapshot.exception.CorruptMetadataException) UUID(java.util.UUID) GoogleResourceException(bio.terra.service.resourcemanagement.exception.GoogleResourceException)
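
The "zero rows is fine, one row is the answer, more than one row is corruption" check in getBucket can be isolated into a small helper. The following is a sketch under assumptions: SingleRowLookupSketch, atMostOne, and CorruptRowsException are hypothetical names, and the helper returns an Optional where the original method returns null.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Sketch of an "at most one row" lookup check; all names here are hypothetical. */
final class SingleRowLookupSketch {

    /** Hypothetical stand-in for CorruptMetadataException. */
    static final class CorruptRowsException extends RuntimeException {
        CorruptRowsException(String message) {
            super(message);
        }
    }

    /**
     * Returns the single row if there is exactly one, empty if there are none,
     * and throws if the uniqueness expectation is violated.
     */
    static <T> Optional<T> atMostOne(List<T> rows, String description) {
        if (rows == null || rows.isEmpty()) {
            return Optional.empty();
        }
        if (rows.size() > 1) {
            throw new CorruptRowsException("Multiple rows found for " + description);
        }
        return Optional.of(rows.get(0));
    }

    public static void main(String[] args) {
        System.out.println(atMostOne(Collections.<String>emptyList(), "bucket my-bucket").isPresent());
        System.out.println(atMostOne(Arrays.asList("row-1"), "bucket my-bucket").get());
        try {
            atMostOne(Arrays.asList("row-1", "row-2"), "bucket my-bucket");
        } catch (CorruptRowsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```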

Example 4 with CorruptMetadataException

use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.

the class GoogleResourceService method getOrCreateBucket.

/**
 * Fetch/create a bucket cloud resource and the associated metadata in the bucket_resource table.
 *
 * On entry to this method, there are 9 states along 3 main dimensions:
 * Google Bucket - exists or not
 * DR Metadata record - exists or not
 * DR Metadata lock state (only if record exists):
 *  - not locked
 *  - locked by this flight
 *  - locked by another flight
 * In addition, there is one case where it matters if we are reusing buckets or not.
 *
 * Itemizing the 9 cases:
 * CASE 1: bucket exists, record exists, record is unlocked
 *   The predominant case. We return the bucket resource
 *
 * CASE 2: bucket exists, record exists, locked by another flight
 *   We have to wait until the other flight finishes creating the bucket. Throw BucketLockFailureException.
 *   We expect the calling Step to retry on that exception.
 *
 * CASE 3: bucket exists, record exists, locked by us
 *   This flight created the bucket, but failed before we could unlock it. So, we unlock and
 *   return the bucket resource.
 *
 * CASE 4: bucket exists, no record exists, we are allowed to reuse buckets
 *   This is a common case in development where we re-use the same cloud resources over and over during
 *   testing rather than continually create and destroy them. In this case, we proceed with the
 *   try-to-create-bucket-metadata algorithm.
 *
 * CASE 5: bucket exists, no record exists, we are not reusing buckets
 *   This is the production mode and should not happen. It means our metadata does not reflect the
 *   actual cloud resources. Throw CorruptMetadataException
 *
 * CASE 6: no bucket exists, record exists, not locked
 *   This should not happen. Throw CorruptMetadataException
 *
 * CASE 7: no bucket exists, record exists, locked by another flight
 *   We have to wait until the other flight finishes creating the bucket. Throw BucketLockFailureException.
 *   We expect the calling Step to retry on that exception.
 *
 * CASE 8: no bucket exists, record exists, locked by this flight
 *   We must have failed after creating and locking the record, but before creating the bucket.
 *   Proceed with the finish-trying-to-create-bucket algorithm
 *
 * CASE 9: no bucket exists, no record exists
 *   Proceed with try-to-create-bucket algorithm
 *
 * The algorithm to create a bucket is like a miniature flight and we implement it as a set
 * of methods that chain to make the whole algorithm:
 *  1. createMetadataRecord: create and lock the metadata record; then
 *  2. createCloudBucket: if the bucket does not exist, create it; then
 *  3. createFinish: unlock the metadata record
 * The algorithm may fail between any of those steps, so we may arrive in this method needing to
 * do some or all of those steps.
 *
 * @param bucketRequest request for a new or existing bucket
 * @param flightId flight making the request
 * @return a reference to the bucket as a POJO GoogleBucketResource
 * @throws CorruptMetadataException in CASE 5 and CASE 6
 * @throws BucketLockFailureException in CASE 2 and CASE 7, and sometimes CASE 9
 */
public GoogleBucketResource getOrCreateBucket(GoogleBucketRequest bucketRequest, String flightId) throws InterruptedException {
    logger.info("application property allowReuseExistingBuckets = " + allowReuseExistingBuckets);
    String bucketName = bucketRequest.getBucketName();
    // Try to get the bucket record and the bucket object
    GoogleBucketResource googleBucketResource = resourceDao.getBucket(bucketRequest);
    Bucket bucket = getBucket(bucketRequest.getBucketName());
    // Test all of the cases
    if (bucket != null) {
        if (googleBucketResource != null) {
            String lockingFlightId = googleBucketResource.getFlightId();
            if (lockingFlightId == null) {
                // CASE 1: everything exists and is unlocked
                return googleBucketResource;
            }
            if (!StringUtils.equals(lockingFlightId, flightId)) {
                // CASE 2: another flight is creating the bucket
                throw bucketLockException(flightId);
            }
            // CASE 3: this flight holds the record lock and the bucket already exists; finish by unlocking
            return createFinish(bucket, flightId, googleBucketResource);
        } else {
            // bucket exists, but metadata record does not exist.
            if (allowReuseExistingBuckets) {
                // CASE 4: go ahead and reuse the bucket
                return createMetadataRecord(bucketRequest, flightId);
            } else {
                // CASE 5:
                throw new CorruptMetadataException("Bucket already exists, metadata out of sync with cloud state: " + bucketName);
            }
        }
    } else {
        // bucket does not exist
        if (googleBucketResource != null) {
            String lockingFlightId = googleBucketResource.getFlightId();
            if (lockingFlightId == null) {
                // CASE 6: no bucket, but the metadata record exists unlocked
                throw new CorruptMetadataException("Bucket does not exist, metadata out of sync with cloud state: " + bucketName);
            }
            if (!StringUtils.equals(lockingFlightId, flightId)) {
                // CASE 7: another flight is creating the bucket
                throw bucketLockException(flightId);
            }
            // CASE 8: this flight has the metadata locked, but didn't finish creating the bucket
            return createCloudBucket(bucketRequest, flightId, googleBucketResource);
        } else {
            // CASE 9: no bucket and no record
            return createMetadataRecord(bucketRequest, flightId);
        }
    }
}
Also used : Bucket(com.google.cloud.storage.Bucket) CorruptMetadataException(bio.terra.service.snapshot.exception.CorruptMetadataException)
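
The nine cases itemized in the Javadoc above reduce to a pure decision over three inputs plus the reuse flag, which makes the table easy to test apart from Google Cloud and the database. The following is a sketch only: BucketCaseSketch, its Action enum, and decide are hypothetical names, and each action merely labels the branch that getOrCreateBucket takes.

```java
/** Sketch of the nine-case decision as a pure function; all names here are hypothetical. */
final class BucketCaseSketch {

    enum Action {
        RETURN_EXISTING,   // CASE 1
        LOCK_CONFLICT,     // CASE 2 and CASE 7
        FINISH_UNLOCK,     // CASE 3
        CREATE_METADATA,   // CASE 4 and CASE 9
        CORRUPT_METADATA,  // CASE 5 and CASE 6
        CREATE_BUCKET      // CASE 8
    }

    /**
     * @param bucketExists whether the cloud bucket exists
     * @param recordExists whether the metadata record exists
     * @param lockingFlightId flight holding the record lock, or null if unlocked or no record
     * @param flightId the flight making the request
     * @param allowReuse whether existing buckets may be adopted
     */
    static Action decide(boolean bucketExists, boolean recordExists,
                         String lockingFlightId, String flightId, boolean allowReuse) {
        if (!recordExists) {
            if (bucketExists && !allowReuse) {
                return Action.CORRUPT_METADATA; // CASE 5
            }
            return Action.CREATE_METADATA; // CASE 4 and CASE 9
        }
        if (lockingFlightId == null) {
            // CASE 1 if the bucket is there, CASE 6 if it is missing
            return bucketExists ? Action.RETURN_EXISTING : Action.CORRUPT_METADATA;
        }
        if (!lockingFlightId.equals(flightId)) {
            return Action.LOCK_CONFLICT; // CASE 2 and CASE 7
        }
        // Locked by this flight: CASE 3 if the bucket exists, CASE 8 otherwise
        return bucketExists ? Action.FINISH_UNLOCK : Action.CREATE_BUCKET;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, null, "flight-1", false));
        System.out.println(decide(false, true, "flight-1", "flight-1", false));
    }
}
```

Keeping the decision separate from the side effects means every case can be checked with a plain unit test, while the create, lock, and unlock steps stay in the methods that own them.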

Example 5 with CorruptMetadataException

use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.

the class SnapshotMapTableDao method retrieveMapColumns.

public List<SnapshotMapColumn> retrieveMapColumns(UUID mapTableId, Table fromTable, Table toTable) {
    String sql = "SELECT id, from_column_id, to_column_id" + " FROM snapshot_map_column WHERE map_table_id = :map_table_id";
    List<SnapshotMapColumn> mapColumns = jdbcTemplate.query(sql, new MapSqlParameterSource().addValue("map_table_id", mapTableId), (rs, rowNum) -> {
        UUID fromId = rs.getObject("from_column_id", UUID.class);
        Optional<Column> datasetColumn = fromTable.getColumnById(fromId);
        if (!datasetColumn.isPresent()) {
            throw new CorruptMetadataException("Dataset column referenced by snapshot source map column was not found");
        }
        UUID toId = rs.getObject("to_column_id", UUID.class);
        Optional<Column> snapshotColumn = toTable.getColumnById(toId);
        if (!snapshotColumn.isPresent()) {
            throw new CorruptMetadataException("Snapshot column referenced by snapshot source map column was not found");
        }
        // The map column's own id comes from the selected "id" column, not from from_column_id.
        return new SnapshotMapColumn()
            .id(rs.getObject("id", UUID.class))
            .fromColumn(datasetColumn.get())
            .toColumn(snapshotColumn.get());
    });
    return mapColumns;
}
Also used : MapSqlParameterSource(org.springframework.jdbc.core.namedparam.MapSqlParameterSource) Column(bio.terra.common.Column) UUID(java.util.UUID) CorruptMetadataException(bio.terra.service.snapshot.exception.CorruptMetadataException)

Aggregations

CorruptMetadataException (bio.terra.service.snapshot.exception.CorruptMetadataException) 13
MapSqlParameterSource (org.springframework.jdbc.core.namedparam.MapSqlParameterSource) 7
UUID (java.util.UUID) 5
SQLException (java.sql.SQLException) 3
Column (bio.terra.common.Column) 2
DaoKeyHolder (bio.terra.common.DaoKeyHolder) 2
DatasetTable (bio.terra.service.dataset.DatasetTable) 2
LoadLockedException (bio.terra.service.load.exception.LoadLockedException) 2
GoogleBucketRequest (bio.terra.service.resourcemanagement.google.GoogleBucketRequest) 2
GoogleBucketResource (bio.terra.service.resourcemanagement.google.GoogleBucketResource) 2
Bucket (com.google.cloud.storage.Bucket) 2
ArrayList (java.util.ArrayList) 2
Test (org.junit.Test) 2
Table (bio.terra.common.Table) 1
PdaoException (bio.terra.common.exception.PdaoException) 1
BulkLoadFileModel (bio.terra.model.BulkLoadFileModel) 1
BulkLoadFileResultModel (bio.terra.model.BulkLoadFileResultModel) 1
BulkLoadFileState (bio.terra.model.BulkLoadFileState) 1
BulkLoadHistoryModel (bio.terra.model.BulkLoadHistoryModel) 1
BulkLoadResultModel (bio.terra.model.BulkLoadResultModel) 1