Example 6 with CloudObjectLocation

use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io.

the class OssUtils method deleteObjectsInPath.

/**
 * Delete the files from aliyun OSS in a specified bucket, matching a specified prefix and filter
 *
 * @param client aliyun OSS client
 * @param config specifies the configuration to use when finding matching files in aliyun OSS to delete
 * @param bucket aliyun OSS bucket
 * @param prefix the file prefix
 * @param filter function that returns true if an object found under the prefix should be deleted, and false otherwise
 * @throws Exception if listing or deleting the objects fails
 */
public static void deleteObjectsInPath(OSS client, OssInputDataConfig config, String bucket, String prefix, Predicate<OSSObjectSummary> filter) throws Exception {
    final List<String> keysToDelete = new ArrayList<>(config.getMaxListingLength());
    final OssObjectSummaryIterator iterator = new OssObjectSummaryIterator(client, ImmutableList.of(new CloudObjectLocation(bucket, prefix).toUri("http")), config.getMaxListingLength());
    while (iterator.hasNext()) {
        final OSSObjectSummary nextObject = iterator.next();
        if (filter.apply(nextObject)) {
            keysToDelete.add(nextObject.getKey());
            if (keysToDelete.size() == config.getMaxListingLength()) {
                deleteBucketKeys(client, bucket, keysToDelete);
                log.info("Deleted %d files", keysToDelete.size());
                keysToDelete.clear();
            }
        }
    }
    if (keysToDelete.size() > 0) {
        deleteBucketKeys(client, bucket, keysToDelete);
        log.info("Deleted %d files", keysToDelete.size());
    }
}
Also used : OSSObjectSummary(com.aliyun.oss.model.OSSObjectSummary) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) ArrayList(java.util.ArrayList)
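
The method above accumulates matching keys and flushes them whenever the list reaches the listing limit, then flushes the remainder once the iterator is exhausted. That pattern can be sketched without an OSS client. This is a minimal sketch under stated assumptions, not Druid code: `BatchSketch` and `processInBatches` are hypothetical names, keys are plain strings, and it uses `java.util.function.Predicate` (method `test`) in place of the predicate type used above (method `apply`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not part of Druid.
final class BatchSketch {
    // Hands keys accepted by `filter` to `sink` in batches of at most
    // `batchSize`, and returns the total number of keys handed over.
    static int processInBatches(
            Iterable<String> keys,
            Predicate<String> filter,
            int batchSize,
            Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<String> batch = new ArrayList<>(batchSize);
        int total = 0;
        for (String key : keys) {
            if (!filter.test(key)) {
                continue;
            }
            batch.add(key);
            if (batch.size() == batchSize) {
                // Pass a copy, because `batch` is cleared and reused.
                sink.accept(new ArrayList<>(batch));
                total += batch.size();
                batch.clear();
            }
        }
        // Flush whatever is left after the last full batch.
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch));
            total += batch.size();
        }
        return total;
    }
}
```

Flushing inside the loop keeps memory bounded by the batch size; the final flush handles a trailing partial batch, mirroring the `keysToDelete.size() > 0` check above.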

Example 7 with CloudObjectLocation

use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io.

the class OssTimestampVersionedDataFinder method getLatestVersion.

/**
 * Gets the key with the most recently modified timestamp.
 * `pattern` is evaluated against the entire key AFTER the path given in `uri`.
 * The substring `pattern` is matched against will have a leading `/` removed.
 * For example `oss://some_bucket/some_prefix/some_key` with a URI of `oss://some_bucket/some_prefix` will match against `some_key`.
 * `oss://some_bucket/some_prefixsome_key` with a URI of `oss://some_bucket/some_prefix` will match against `some_key`
 * `oss://some_bucket/some_prefix//some_key` with a URI of `oss://some_bucket/some_prefix` will match against `/some_key`
 *
 * @param uri     The URI in the form of `oss://some_bucket/some_key`
 * @param pattern The pattern matcher to determine if a *key* is of interest, or `null` to match everything.
 * @return A URI to the most recently modified object which matched the pattern.
 */
@Override
public URI getLatestVersion(final URI uri, @Nullable final Pattern pattern) {
    try {
        final CloudObjectLocation coords = new CloudObjectLocation(OssUtils.checkURI(uri));
        long mostRecent = Long.MIN_VALUE;
        URI latest = null;
        final Iterator<OSSObjectSummary> objectSummaryIterator = OssUtils.objectSummaryIterator(client, Collections.singletonList(uri), OssUtils.MAX_LISTING_LENGTH);
        while (objectSummaryIterator.hasNext()) {
            final OSSObjectSummary objectSummary = objectSummaryIterator.next();
            final CloudObjectLocation objectLocation = OssUtils.summaryToCloudObjectLocation(objectSummary);
            // remove coords path prefix from object path
            String keyString = StringUtils.maybeRemoveLeadingSlash(objectLocation.getPath().substring(coords.getPath().length()));
            if (pattern != null && !pattern.matcher(keyString).matches()) {
                continue;
            }
            final long latestModified = objectSummary.getLastModified().getTime();
            if (latestModified >= mostRecent) {
                mostRecent = latestModified;
                latest = objectLocation.toUri(OssStorageDruidModule.SCHEME);
            }
        }
        return latest;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Also used : OSSObjectSummary(com.aliyun.oss.model.OSSObjectSummary) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) URI(java.net.URI)
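
The Javadoc's three matching cases can be reproduced with plain strings. The following is a minimal sketch, not the Druid implementation: `KeyMatchSketch`, `relativeKey`, and `matches` are hypothetical names, and the single-leading-slash removal is an assumed stand-in for `StringUtils.maybeRemoveLeadingSlash`, based on the behaviour the Javadoc describes.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Druid.
final class KeyMatchSketch {
    // Strips the prefix path from the object path, then removes at most
    // one leading '/' from what remains.
    static String relativeKey(String objectPath, String prefixPath) {
        final String rest = objectPath.substring(prefixPath.length());
        return rest.startsWith("/") ? rest.substring(1) : rest;
    }

    // A null pattern matches everything, as in getLatestVersion.
    static boolean matches(String objectPath, String prefixPath, Pattern pattern) {
        return pattern == null
            || pattern.matcher(relativeKey(objectPath, prefixPath)).matches();
    }
}
```

With the prefix path `some_prefix`, the object paths `some_prefix/some_key` and `some_prefixsome_key` both reduce to `some_key`, while `some_prefix//some_key` reduces to `/some_key` because only one slash is removed.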

Example 8 with CloudObjectLocation

use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io.

the class OssInputSourceTest method testInputSourceUseDefaultPasswordWhenCloudConfigPropertiesWithoutCrediential.

@Test
public void testInputSourceUseDefaultPasswordWhenCloudConfigPropertiesWithoutCrediential() {
    OssClientConfig mockConfigPropertiesWithoutKeyAndSecret = EasyMock.createMock(OssClientConfig.class);
    EasyMock.reset(mockConfigPropertiesWithoutKeyAndSecret);
    EasyMock.expect(mockConfigPropertiesWithoutKeyAndSecret.isCredentialsConfigured()).andStubReturn(false);
    EasyMock.expect(mockConfigPropertiesWithoutKeyAndSecret.buildClient()).andReturn(OSSCLIENT);
    EasyMock.replay(mockConfigPropertiesWithoutKeyAndSecret);
    final OssInputSource withPrefixes = new OssInputSource(OSSCLIENT, INPUT_DATA_CONFIG, null, null, EXPECTED_LOCATION, mockConfigPropertiesWithoutKeyAndSecret);
    Assert.assertNotNull(withPrefixes);
    withPrefixes.createEntity(new CloudObjectLocation("bucket", "path"));
    EasyMock.verify(mockConfigPropertiesWithoutKeyAndSecret);
}
Also used : CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)

Example 9 with CloudObjectLocation

use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io.

the class OssInputSourceTest method testCreateSplitsWithSplitHintSpecRespectingHint.

@Test
public void testCreateSplitsWithSplitHintSpecRespectingHint() {
    EasyMock.reset(OSSCLIENT);
    expectListObjects(PREFIXES.get(0), ImmutableList.of(EXPECTED_URIS.get(0)), CONTENT);
    expectListObjects(PREFIXES.get(1), ImmutableList.of(EXPECTED_URIS.get(1)), CONTENT);
    EasyMock.replay(OSSCLIENT);
    OssInputSource inputSource = new OssInputSource(OSSCLIENT, INPUT_DATA_CONFIG, null, PREFIXES, null, null);
    Stream<InputSplit<List<CloudObjectLocation>>> splits = inputSource.createSplits(new JsonInputFormat(JSONPathSpec.DEFAULT, null, null), new MaxSizeSplitHintSpec(new HumanReadableBytes(CONTENT.length * 3L), null));
    Assert.assertEquals(ImmutableList.of(EXPECTED_URIS.stream().map(CloudObjectLocation::new).collect(Collectors.toList())), splits.map(InputSplit::get).collect(Collectors.toList()));
    EasyMock.verify(OSSCLIENT);
}
Also used : JsonInputFormat(org.apache.druid.data.input.impl.JsonInputFormat) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) HumanReadableBytes(org.apache.druid.java.util.common.HumanReadableBytes) InputSplit(org.apache.druid.data.input.InputSplit) MaxSizeSplitHintSpec(org.apache.druid.data.input.MaxSizeSplitHintSpec) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)

Example 10 with CloudObjectLocation

use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io.

the class OssInputSourceTest method testWithUrisSplit.

@Test
public void testWithUrisSplit() {
    OssInputSource inputSource = new OssInputSource(OSSCLIENT, INPUT_DATA_CONFIG, EXPECTED_URIS, null, null, null);
    Stream<InputSplit<List<CloudObjectLocation>>> splits = inputSource.createSplits(new JsonInputFormat(JSONPathSpec.DEFAULT, null, null), null);
    Assert.assertEquals(EXPECTED_COORDS, splits.map(InputSplit::get).collect(Collectors.toList()));
}
Also used : JsonInputFormat(org.apache.druid.data.input.impl.JsonInputFormat) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) InputSplit(org.apache.druid.data.input.InputSplit) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)

Aggregations

CloudObjectLocation (org.apache.druid.data.input.impl.CloudObjectLocation) 34
Test (org.junit.Test) 21
InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest) 15
InputSplit (org.apache.druid.data.input.InputSplit) 13
JsonInputFormat (org.apache.druid.data.input.impl.JsonInputFormat) 11
OSSObjectSummary (com.aliyun.oss.model.OSSObjectSummary) 6
S3ObjectSummary (com.amazonaws.services.s3.model.S3ObjectSummary) 6
MaxSizeSplitHintSpec (org.apache.druid.data.input.MaxSizeSplitHintSpec) 6
URI (java.net.URI) 5
File (java.io.File) 4
FileInputStream (java.io.FileInputStream) 4
FileOutputStream (java.io.FileOutputStream) 4
OutputStream (java.io.OutputStream) 4
Date (java.util.Date) 4
GZIPOutputStream (java.util.zip.GZIPOutputStream) 4
FileUtils (org.apache.druid.java.util.common.FileUtils) 4
IOE (org.apache.druid.java.util.common.IOE) 4
OSSException (com.aliyun.oss.OSSException) 3
OSSObject (com.aliyun.oss.model.OSSObject) 3
S3Object (com.amazonaws.services.s3.model.S3Object) 3