use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.
the class DatasetDao method retrieveWorker.
private Dataset retrieveWorker(DatasetSummary summary) {
    Dataset dataset = null;
    try {
        if (summary != null) {
            dataset = new Dataset(summary);
            dataset.tables(tableDao.retrieveTables(dataset.getId()));
            relationshipDao.retrieve(dataset);
            assetDao.retrieve(dataset);
        }
        return dataset;
    } catch (EmptyResultDataAccessException ex) {
        throw new CorruptMetadataException("Inconsistent data", ex);
    }
}
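All of the usages on this page construct CorruptMetadataException with either a message alone or a message plus a cause. For reference, a minimal sketch of such an unchecked exception class, assuming it extends RuntimeException directly (the real class may extend a project-specific base exception), looks like:

package bio.terra.service.snapshot.exception;

// Minimal sketch; the actual class may extend a project base exception type instead of RuntimeException.
public class CorruptMetadataException extends RuntimeException {
    public CorruptMetadataException(String message) {
        super(message);
    }

    public CorruptMetadataException(String message, Throwable cause) {
        super(message, cause);
    }
}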
use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.
the class LoadDao method lockLoad.
// -- load tags public methods --
// This must be serializable so that conflicting updates of the locked state and flightid
// are detected. We lock the table so that we avoid serialization errors.
/**
* We implement a rule that one load job can use one load tag at a time. That rule is needed to control
* concurrent operations. For example, a delete-by-load-tag cannot compete with a load; two loads cannot
* run in parallel with the same load tag - it confuses the algorithm for re-running a load with a load tag
* and skipping already-loaded files.
*
* This call and the unlock call use a load table in the database to record that a load tag is in use.
* The load tag is associated with a load id (a guid); that guid is a foreign key to the load_file table
* that maintains the state of files being loaded.
*
* We expect conflicts on load tags to be rare. The typical case will be: a load starts, runs, and ends
* without conflict, with or without a re-run.
*
* We learned from the first implementation of this code that when there were conflicts, we would get
* serialization errors from Postgres. Those require building retry logic. Instead, we chose to use
* table locks to serialize access to the load table while we are setting and freeing
* our load lock state.
*
* A lock is taken by creating the load tag row and storing the flight id holding the lock.
* The lock is freed by deleting the load tag row. Code can safely re-lock a load tag it already holds and
* unlock a load tag it has already freed.
*
* There is never a case where a lock row is updated. They are only ever inserted or deleted.
*
* @param loadTag tag identifying this load
* @param flightId flight id taking the lock
* @return Load object including the load id
*/
@Transactional(propagation = Propagation.REQUIRED, isolation = Isolation.SERIALIZABLE)
public Load lockLoad(String loadTag, String flightId) throws InterruptedException {
    jdbcTemplate.getJdbcTemplate().execute("LOCK TABLE load IN EXCLUSIVE MODE");
    String upsert = "INSERT INTO load (load_tag, locked, locking_flight_id)"
        + " VALUES (:load_tag, true, :flight_id)"
        + " ON CONFLICT ON CONSTRAINT load_load_tag_key DO NOTHING";
    MapSqlParameterSource params = new MapSqlParameterSource()
        .addValue("load_tag", loadTag)
        .addValue("flight_id", flightId);
    DaoKeyHolder keyHolder = new DaoKeyHolder();
    int rows = jdbcTemplate.update(upsert, params, keyHolder);
    Load load;
    if (rows == 0) {
        // We did not insert. Therefore, someone has the load tag locked.
        // Retrieve it, in case it is us re-locking
        load = lookupLoadByTag(loadTag);
        if (load == null) {
            throw new CorruptMetadataException("Load row should exist! Load tag: " + loadTag);
        }
        // It is locked by someone else
        if (!StringUtils.equals(load.getLockingFlightId(), flightId)) {
            throw new LoadLockedException("Load '" + loadTag + "' is locked by flight '" + load.getLockingFlightId() + "'");
        }
    } else {
        load = new Load()
            .id(keyHolder.getId())
            .loadTag(keyHolder.getString("load_tag"))
            .locked(keyHolder.getField("locked", Boolean.class))
            .lockingFlightId(keyHolder.getString("locking_flight_id"));
    }
    return load;
}
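The javadoc above states that the lock is freed by deleting the load tag row and that unlocking an already-freed lock is safe. The matching unlock method is not part of this excerpt; a minimal sketch of what it might look like, reusing the same jdbcTemplate and load table (the method name and SQL are assumptions, not taken from the project), is:

// Sketch only: frees a load-tag lock by deleting its row, as described in the javadoc above.
// Assumes the same NamedParameterJdbcTemplate and load table; the method name and SQL are illustrative.
@Transactional(propagation = Propagation.REQUIRED, isolation = Isolation.SERIALIZABLE)
public void unlockLoad(String loadTag, String flightId) {
    // Take the same table lock used by lockLoad so lock and unlock are serialized.
    jdbcTemplate.getJdbcTemplate().execute("LOCK TABLE load IN EXCLUSIVE MODE");
    String delete = "DELETE FROM load WHERE load_tag = :load_tag AND locking_flight_id = :flight_id";
    MapSqlParameterSource params = new MapSqlParameterSource()
        .addValue("load_tag", loadTag)
        .addValue("flight_id", flightId);
    // If the row is already gone, nothing is deleted; unlocking a freed lock is safe by design.
    jdbcTemplate.update(delete, params);
}

Because the delete is keyed on both load_tag and locking_flight_id, a flight can only free a lock it holds, and deleting a row that no longer exists is a no-op.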
use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.
the class GoogleResourceDao method getBucket.
/**
* Fetch an existing bucket_resource metadata row using the name and project id.
* This method expects that there is exactly one row matching the provided name and project id.
* @param bucketRequest
* @return a reference to the bucket as a POJO GoogleBucketResource or null if not found
* @throws GoogleResourceException if the bucket matches, but is in the wrong project
* @throws CorruptMetadataException if multiple buckets have the same name
*/
public GoogleBucketResource getBucket(GoogleBucketRequest bucketRequest) {
    String bucketName = bucketRequest.getBucketName();
    List<GoogleBucketResource> bucketResourcesByName = retrieveBucketsBy("name", bucketName, String.class);
    if (bucketResourcesByName == null || bucketResourcesByName.size() == 0) {
        return null;
    }
    if (bucketResourcesByName.size() > 1) {
        // this should never happen because Google bucket names are globally unique
        throw new CorruptMetadataException("Multiple buckets found with same name: " + bucketName);
    }
    GoogleBucketResource bucketResource = bucketResourcesByName.get(0);
    UUID foundProjectId = bucketResource.getProjectResource().getRepositoryId();
    UUID requestedProjectId = bucketRequest.getGoogleProjectResource().getRepositoryId();
    if (!foundProjectId.equals(requestedProjectId)) {
        // there is a bucket with this name in our metadata, but it's for a different project
        throw new GoogleResourceException(String.format(
            "A bucket with this name already exists for a different project: %s, %s",
            bucketName, requestedProjectId));
    }
    return bucketResource;
}
use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.
the class GoogleResourceService method getOrCreateBucket.
/**
* Fetch/create a bucket cloud resource and the associated metadata in the bucket_resource table.
*
* On entry to this method, there are 8 possible states along 3 main dimensions:
* Google Bucket - exists or not
* DR Metadata record - exists or not
* DR Metadata lock state (only if record exists):
* - not locked
* - locked by this flight
* - locked by another flight
* In addition, in one of those states it matters whether or not we are allowed to reuse buckets,
* which splits that state into two cases.
*
* Itemizing the resulting 9 cases:
* CASE 1: bucket exists, record exists, record is unlocked
* The predominant case. We return the bucket resource
*
* CASE 2: bucket exists, record exists, locked by another flight
* We have to wait until the other flight finishes creating the bucket. Throw BucketLockFailureException.
* We expect the calling Step to retry on that exception.
*
* CASE 3: bucket exists, record exists, locked by us
* This flight created the bucket, but failed before we could unlock it. So, we unlock and
* return the bucket resource.
*
* CASE 4: bucket exists, no record exists, we are allowed to reuse buckets
* This is a common case in development where we re-use the same cloud resources over and over during
* testing rather than continually create and destroy them. In this case, we proceed with the
* try-to-create-bucket-metadata algorithm.
*
* CASE 5: bucket exists, no record exists, we are not reusing buckets
* This is the production mode and should not happen. It means our metadata does not reflect the
* actual cloud resources. Throw CorruptMetadataException
*
* CASE 6: no bucket exists, record exists, not locked
* This should not happen. Throw CorruptMetadataException
*
* CASE 7: no bucket exists, record exists, locked by another flight
* We have to wait until the other flight finishes creating the bucket. Throw BucketLockFailureException.
* We expect the calling Step to retry on that exception.
*
* CASE 8: no bucket exists, record exists, locked by this flight
* We must have failed after creating and locking the record, but before creating the bucket.
* Proceed with the finish-trying-to-create-bucket algorithm
*
* CASE 9: no bucket exists, no record exists
* Proceed with try-to-create-bucket algorithm
*
* The algorithm to create a bucket is like a miniature flight and we implement it as a set
* of methods that chain to make the whole algorithm:
* 1. createMetadataRecord: create and lock the metadata record; then
* 2. createCloudBucket: if the bucket does not exist, create it; then
* 3. createFinish: unlock the metadata record
* The algorithm may fail between any of those steps, so we may arrive in this method needing to
* do some or all of those steps.
*
* @param bucketRequest request for a new or existing bucket
* @param flightId flight making the request
* @return a reference to the bucket as a POJO GoogleBucketResource
* @throws CorruptMetadataException in CASE 5 and CASE 6
* @throws BucketLockFailureException in CASE 2 and CASE 7, and sometimes CASE 9
*/
public GoogleBucketResource getOrCreateBucket(GoogleBucketRequest bucketRequest, String flightId)
        throws InterruptedException {
    logger.info("application property allowReuseExistingBuckets = " + allowReuseExistingBuckets);
    String bucketName = bucketRequest.getBucketName();
    // Try to get the bucket record and the bucket object
    GoogleBucketResource googleBucketResource = resourceDao.getBucket(bucketRequest);
    Bucket bucket = getBucket(bucketRequest.getBucketName());
    // Test all of the cases
    if (bucket != null) {
        if (googleBucketResource != null) {
            String lockingFlightId = googleBucketResource.getFlightId();
            if (lockingFlightId == null) {
                // CASE 1: everything exists and is unlocked
                return googleBucketResource;
            }
            if (!StringUtils.equals(lockingFlightId, flightId)) {
                // CASE 2: another flight is creating the bucket
                throw bucketLockException(flightId);
            }
            // CASE 3: this flight holds the lock and the bucket was already created, so just finish (unlock)
            return createFinish(bucket, flightId, googleBucketResource);
        } else {
            // bucket exists, but metadata record does not exist.
            if (allowReuseExistingBuckets) {
                // CASE 4: go ahead and reuse the bucket
                return createMetadataRecord(bucketRequest, flightId);
            } else {
                // CASE 5:
                throw new CorruptMetadataException(
                    "Bucket already exists, metadata out of sync with cloud state: " + bucketName);
            }
        }
    } else {
        // bucket does not exist
        if (googleBucketResource != null) {
            String lockingFlightId = googleBucketResource.getFlightId();
            if (lockingFlightId == null) {
                // CASE 6: no bucket, but the metadata record exists unlocked
                throw new CorruptMetadataException(
                    "Bucket does not exist, metadata out of sync with cloud state: " + bucketName);
            }
            if (!StringUtils.equals(lockingFlightId, flightId)) {
                // CASE 7: another flight is creating the bucket
                throw bucketLockException(flightId);
            }
            // CASE 8: this flight has the metadata locked, but didn't finish creating the bucket
            return createCloudBucket(bucketRequest, flightId, googleBucketResource);
        } else {
            // CASE 9: no bucket and no record
            return createMetadataRecord(bucketRequest, flightId);
        }
    }
}
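The javadoc notes that in CASE 2 and CASE 7 the calling Step is expected to retry on the bucket lock exception. A minimal sketch of such a caller, assuming the Stairway Step interface used by this project's flights (the class name, working-map key, and constructor wiring are illustrative assumptions, not taken from the project), is:

// Sketch only: a calling Step that retries when another flight holds the bucket lock.
// Step, StepResult, StepStatus, and FlightContext are the standard Stairway types;
// the class name, working-map key, and injected service are illustrative assumptions.
public class GetOrCreateBucketStep implements Step {
    private final GoogleResourceService resourceService;
    private final GoogleBucketRequest bucketRequest;

    public GetOrCreateBucketStep(GoogleResourceService resourceService, GoogleBucketRequest bucketRequest) {
        this.resourceService = resourceService;
        this.bucketRequest = bucketRequest;
    }

    @Override
    public StepResult doStep(FlightContext context) throws InterruptedException {
        try {
            GoogleBucketResource bucketResource =
                resourceService.getOrCreateBucket(bucketRequest, context.getFlightId());
            context.getWorkingMap().put("bucketResource", bucketResource);
            return StepResult.getStepResultSuccess();
        } catch (BucketLockFailureException ex) {
            // CASE 2 or CASE 7: another flight holds the lock, so ask Stairway to retry this step.
            return new StepResult(StepStatus.STEP_RESULT_FAILURE_RETRY, ex);
        }
    }

    @Override
    public StepResult undoStep(FlightContext context) {
        return StepResult.getStepResultSuccess();
    }
}

Retrying the whole step simply re-enters getOrCreateBucket, which re-evaluates the nine cases, so the step eventually succeeds once the other flight unlocks the record.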
use of bio.terra.service.snapshot.exception.CorruptMetadataException in project jade-data-repo by DataBiosphere.
the class SnapshotMapTableDao method retrieveMapColumns.
public List<SnapshotMapColumn> retrieveMapColumns(UUID mapTableId, Table fromTable, Table toTable) {
    String sql = "SELECT id, from_column_id, to_column_id"
        + " FROM snapshot_map_column WHERE map_table_id = :map_table_id";
    List<SnapshotMapColumn> mapColumns = jdbcTemplate.query(
        sql,
        new MapSqlParameterSource().addValue("map_table_id", mapTableId),
        (rs, rowNum) -> {
            UUID fromId = rs.getObject("from_column_id", UUID.class);
            Optional<Column> datasetColumn = fromTable.getColumnById(fromId);
            if (!datasetColumn.isPresent()) {
                throw new CorruptMetadataException("Dataset column referenced by snapshot source map column was not found");
            }
            UUID toId = rs.getObject("to_column_id", UUID.class);
            Optional<Column> snapshotColumn = toTable.getColumnById(toId);
            if (!snapshotColumn.isPresent()) {
                throw new CorruptMetadataException("Snapshot column referenced by snapshot source map column was not found");
            }
            // use the map column's own id from the SELECT, not the dataset column id
            return new SnapshotMapColumn()
                .id(rs.getObject("id", UUID.class))
                .fromColumn(datasetColumn.get())
                .toColumn(snapshotColumn.get());
        });
    return mapColumns;
}
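The row mapper above depends on Table.getColumnById returning an Optional, which is what lets a missing column surface as CorruptMetadataException. That lookup is not shown in this excerpt; a minimal sketch of what it might look like, assuming the table exposes its columns as a List<Column> (accessor names are illustrative), is:

// Sketch only: an Optional-returning column lookup like the getColumnById calls above.
// Assumes the table exposes its columns as a List<Column>; accessor names are illustrative.
public Optional<Column> getColumnById(UUID columnId) {
    return getColumns().stream()
        .filter(column -> columnId.equals(column.getId()))
        .findFirst();
}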