Example 26 with CloudObjectLocation

Use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io, in the class GoogleCloudStorageInputSourceTest, method addExpectedGetCompressedObjectMock.

private static void addExpectedGetCompressedObjectMock(URI uri) throws IOException {
    CloudObjectLocation location = new CloudObjectLocation(uri);
    ByteArrayOutputStream gzipped = new ByteArrayOutputStream();
    CompressionUtils.gzip(new ByteArrayInputStream(CONTENT), gzipped);
    EasyMock.expect(STORAGE.get(EasyMock.eq(location.getBucket()), EasyMock.eq(location.getPath()), EasyMock.eq(0L)))
            .andReturn(new ByteArrayInputStream(gzipped.toByteArray()))
            .once();
}
Also used: ByteArrayInputStream(java.io.ByteArrayInputStream) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) ByteArrayOutputStream(java.io.ByteArrayOutputStream)
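
The mock above derives the bucket and path from a URI via CloudObjectLocation. A minimal standalone sketch of that parsing step (the gs://demo-bucket/dir/file.gz URI and the printLocation name are illustrative, not from the test; needs java.net.URI):

private static void printLocation() {
    // CloudObjectLocation splits a cloud storage URI into its bucket
    // (the URI authority) and path (the key, without the leading slash).
    CloudObjectLocation location = new CloudObjectLocation(URI.create("gs://demo-bucket/dir/file.gz"));
    // Expected output: demo-bucket dir/file.gz
    System.out.println(location.getBucket() + " " + location.getPath());
}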

Example 27 with CloudObjectLocation

Use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io, in the class GoogleCloudStorageInputSourceTest, method testSerdeObjects.

@Test
public void testSerdeObjects() throws Exception {
    final ObjectMapper mapper = createGoogleObjectMapper();
    final GoogleCloudStorageInputSource withObjects = new GoogleCloudStorageInputSource(
        STORAGE,
        INPUT_DATA_CONFIG,
        null,
        null,
        ImmutableList.of(new CloudObjectLocation("foo", "bar/file.gz"))
    );
    final GoogleCloudStorageInputSource serdeWithObjects =
        mapper.readValue(mapper.writeValueAsString(withObjects), GoogleCloudStorageInputSource.class);
    Assert.assertEquals(withObjects, serdeWithObjects);
}
Also used: CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) DefaultObjectMapper(org.apache.druid.jackson.DefaultObjectMapper) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)
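
For orientation, the round trip above exercises Jackson's binding of the objects list. A hedged sketch of what the serialized form is expected to resemble (printSerializedForm is an illustrative name; the JSON in the comment is an assumption inferred from the constructor arguments, and the exact wire format is whatever createGoogleObjectMapper emits):

private static void printSerializedForm() throws Exception {
    final ObjectMapper mapper = createGoogleObjectMapper();
    final GoogleCloudStorageInputSource source = new GoogleCloudStorageInputSource(
        STORAGE,
        INPUT_DATA_CONFIG,
        null,
        null,
        ImmutableList.of(new CloudObjectLocation("foo", "bar/file.gz"))
    );
    // Expected to resemble (assumption, not verified against the mapper):
    // {"type":"google","objects":[{"bucket":"foo","path":"bar/file.gz"}]}
    System.out.println(mapper.writeValueAsString(source));
}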

Example 28 with CloudObjectLocation

Use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io, in the class GoogleCloudStorageInputSource, method getPrefixesSplitStream.

@Override
protected Stream<InputSplit<List<CloudObjectLocation>>> getPrefixesSplitStream(@Nonnull SplitHintSpec splitHintSpec) {
    final Iterator<List<StorageObject>> splitIterator = splitHintSpec.split(storageObjectIterable().iterator(), storageObject -> {
        final BigInteger sizeInBigInteger = storageObject.getSize();
        long sizeInLong;
        if (sizeInBigInteger == null) {
            sizeInLong = Long.MAX_VALUE;
        } else {
            try {
                sizeInLong = sizeInBigInteger.longValueExact();
            } catch (ArithmeticException e) {
                LOG.warn(
                    e,
                    "The object [%s, %s] has a size [%s] out of the range of the long type. "
                    + "The max long value will be used for its size instead.",
                    storageObject.getBucket(),
                    storageObject.getName(),
                    sizeInBigInteger
                );
                sizeInLong = Long.MAX_VALUE;
            }
        }
        return new InputFileAttribute(sizeInLong);
    });
    return Streams.sequentialStreamFrom(splitIterator)
                  .map(objects -> objects.stream().map(this::byteSourceFromStorageObject).collect(Collectors.toList()))
                  .map(InputSplit::new);
}
Also used: Logger(org.apache.druid.java.util.common.logger.Logger) Streams(org.apache.druid.utils.Streams) JsonProperty(com.fasterxml.jackson.annotation.JsonProperty) GoogleStorageDruidModule(org.apache.druid.storage.google.GoogleStorageDruidModule) GoogleUtils(org.apache.druid.storage.google.GoogleUtils) InputSplit(org.apache.druid.data.input.InputSplit) CloudObjectInputSource(org.apache.druid.data.input.impl.CloudObjectInputSource) InputFileAttribute(org.apache.druid.data.input.InputFileAttribute) GoogleInputDataConfig(org.apache.druid.storage.google.GoogleInputDataConfig) BigInteger(java.math.BigInteger) URI(java.net.URI) Nonnull(javax.annotation.Nonnull) Nullable(javax.annotation.Nullable) StorageObject(com.google.api.services.storage.model.StorageObject) JacksonInject(com.fasterxml.jackson.annotation.JacksonInject) GoogleStorage(org.apache.druid.storage.google.GoogleStorage) Iterator(java.util.Iterator) SplitHintSpec(org.apache.druid.data.input.SplitHintSpec) SplittableInputSource(org.apache.druid.data.input.impl.SplittableInputSource) Collectors(java.util.stream.Collectors) List(java.util.List) Stream(java.util.stream.Stream) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) JsonCreator(com.fasterxml.jackson.annotation.JsonCreator) InputEntity(org.apache.druid.data.input.InputEntity)
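
The size-clamping branch above is plain JDK arithmetic: BigInteger.longValueExact throws ArithmeticException when the value does not fit in a signed 64-bit long. A minimal sketch of the same fallback in isolation (clampToLong is an illustrative name, not a Druid method):

private static long clampToLong(BigInteger size) {
    if (size == null) {
        return Long.MAX_VALUE;
    }
    try {
        // Exact conversion; throws ArithmeticException on overflow.
        return size.longValueExact();
    } catch (ArithmeticException e) {
        // Fall back to the largest representable size, as the input source does.
        return Long.MAX_VALUE;
    }
}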

Example 29 with CloudObjectLocation

Use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io, in the class S3DataSegmentPullerTest, method testGZUncompress.

@Test
public void testGZUncompress() throws IOException, SegmentLoadingException {
    final String bucket = "bucket";
    final String keyPrefix = "prefix/dir/0";
    final ServerSideEncryptingAmazonS3 s3Client = EasyMock.createStrictMock(ServerSideEncryptingAmazonS3.class);
    final byte[] value = bucket.getBytes(StandardCharsets.UTF_8);
    final File tmpFile = temporaryFolder.newFile("gzTest.gz");
    try (OutputStream outputStream = new GZIPOutputStream(new FileOutputStream(tmpFile))) {
        outputStream.write(value);
    }
    final S3Object object0 = new S3Object();
    object0.setBucketName(bucket);
    object0.setKey(keyPrefix + "/renames-0.gz");
    object0.getObjectMetadata().setLastModified(new Date(0));
    object0.setObjectContent(new FileInputStream(tmpFile));
    final S3ObjectSummary objectSummary = new S3ObjectSummary();
    objectSummary.setBucketName(bucket);
    objectSummary.setKey(keyPrefix + "/renames-0.gz");
    objectSummary.setLastModified(new Date(0));
    final File tmpDir = temporaryFolder.newFolder("gzTestDir");
    EasyMock.expect(s3Client.doesObjectExist(EasyMock.eq(object0.getBucketName()), EasyMock.eq(object0.getKey())))
            .andReturn(true)
            .once();
    EasyMock.expect(s3Client.getObject(EasyMock.eq(object0.getBucketName()), EasyMock.eq(object0.getKey())))
            .andReturn(object0)
            .once();
    S3DataSegmentPuller puller = new S3DataSegmentPuller(s3Client);
    EasyMock.replay(s3Client);
    FileUtils.FileCopyResult result = puller.getSegmentFiles(new CloudObjectLocation(bucket, object0.getKey()), tmpDir);
    EasyMock.verify(s3Client);
    Assert.assertEquals(value.length, result.size());
    File expected = new File(tmpDir, "renames-0");
    Assert.assertTrue(expected.exists());
    Assert.assertEquals(value.length, expected.length());
}
Also used: FileUtils(org.apache.druid.java.util.common.FileUtils) OutputStream(java.io.OutputStream) FileOutputStream(java.io.FileOutputStream) GZIPOutputStream(java.util.zip.GZIPOutputStream) S3ObjectSummary(com.amazonaws.services.s3.model.S3ObjectSummary) Date(java.util.Date) FileInputStream(java.io.FileInputStream) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) S3Object(com.amazonaws.services.s3.model.S3Object) File(java.io.File) Test(org.junit.Test)
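
The gzip fixture above uses only JDK stream classes. A self-contained sketch of the same write-then-read round trip (gzipRoundTrip and the file name are illustrative; needs java.util.zip.GZIPInputStream, java.io.InputStream, and java.io.ByteArrayOutputStream in addition to the imports listed):

private static byte[] gzipRoundTrip(File dir, byte[] value) throws IOException {
    final File gz = new File(dir, "roundTrip.gz");
    // Write the payload through a gzip stream, as the test does for its fixture.
    try (OutputStream out = new GZIPOutputStream(new FileOutputStream(gz))) {
        out.write(value);
    }
    // Read it back; the result should be byte-for-byte equal to value.
    final ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(new FileInputStream(gz))) {
        final byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            decompressed.write(buf, 0, n);
        }
    }
    return decompressed.toByteArray();
}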

Example 30 with CloudObjectLocation

Use of org.apache.druid.data.input.impl.CloudObjectLocation in project druid by druid-io, in the class S3DataSegmentPullerTest, method testGZUncompressRetries.

@Test
public void testGZUncompressRetries() throws IOException, SegmentLoadingException {
    final String bucket = "bucket";
    final String keyPrefix = "prefix/dir/0";
    final ServerSideEncryptingAmazonS3 s3Client = EasyMock.createStrictMock(ServerSideEncryptingAmazonS3.class);
    final byte[] value = bucket.getBytes(StandardCharsets.UTF_8);
    final File tmpFile = temporaryFolder.newFile("gzTest.gz");
    try (OutputStream outputStream = new GZIPOutputStream(new FileOutputStream(tmpFile))) {
        outputStream.write(value);
    }
    S3Object object0 = new S3Object();
    object0.setBucketName(bucket);
    object0.setKey(keyPrefix + "/renames-0.gz");
    object0.getObjectMetadata().setLastModified(new Date(0));
    object0.setObjectContent(new FileInputStream(tmpFile));
    final S3ObjectSummary objectSummary = new S3ObjectSummary();
    objectSummary.setBucketName(bucket);
    objectSummary.setKey(keyPrefix + "/renames-0.gz");
    objectSummary.setLastModified(new Date(0));
    final ListObjectsV2Result listObjectsResult = new ListObjectsV2Result();
    listObjectsResult.setKeyCount(1);
    listObjectsResult.getObjectSummaries().add(objectSummary);
    File tmpDir = temporaryFolder.newFolder("gzTestDir");
    AmazonS3Exception exception = new AmazonS3Exception("S3DataSegmentPullerTest");
    exception.setErrorCode("NoSuchKey");
    exception.setStatusCode(404);
    EasyMock.expect(s3Client.doesObjectExist(EasyMock.eq(object0.getBucketName()), EasyMock.eq(object0.getKey())))
            .andReturn(true)
            .once();
    // The first getObject call fails with the staged transient 404; the second succeeds.
    EasyMock.expect(s3Client.getObject(EasyMock.eq(bucket), EasyMock.eq(object0.getKey())))
            .andThrow(exception)
            .once();
    EasyMock.expect(s3Client.getObject(EasyMock.eq(bucket), EasyMock.eq(object0.getKey())))
            .andReturn(object0)
            .once();
    S3DataSegmentPuller puller = new S3DataSegmentPuller(s3Client);
    EasyMock.replay(s3Client);
    FileUtils.FileCopyResult result = puller.getSegmentFiles(new CloudObjectLocation(bucket, object0.getKey()), tmpDir);
    EasyMock.verify(s3Client);
    Assert.assertEquals(value.length, result.size());
    File expected = new File(tmpDir, "renames-0");
    Assert.assertTrue(expected.exists());
    Assert.assertEquals(value.length, expected.length());
}
Also used: ListObjectsV2Result(com.amazonaws.services.s3.model.ListObjectsV2Result) FileUtils(org.apache.druid.java.util.common.FileUtils) OutputStream(java.io.OutputStream) FileOutputStream(java.io.FileOutputStream) GZIPOutputStream(java.util.zip.GZIPOutputStream) S3ObjectSummary(com.amazonaws.services.s3.model.S3ObjectSummary) AmazonS3Exception(com.amazonaws.services.s3.model.AmazonS3Exception) Date(java.util.Date) FileInputStream(java.io.FileInputStream) CloudObjectLocation(org.apache.druid.data.input.impl.CloudObjectLocation) S3Object(com.amazonaws.services.s3.model.S3Object) File(java.io.File) Test(org.junit.Test)
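
The staged expectations above encode the behavior under test: a transient NoSuchKey/404 on the first fetch, success on the retry. Druid's retry plumbing is internal to the puller; the general shape of the pattern looks roughly like the following generic sketch (fetchWithRetry is a hypothetical helper, not Druid code; needs java.util.concurrent.Callable):

private static <T> T fetchWithRetry(Callable<T> fetch, int maxAttempts) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return fetch.call();
        } catch (Exception e) {
            // A real implementation should first check that the failure is
            // transient (e.g. the NoSuchKey/404 staged in this test) before
            // retrying, and back off between attempts.
            last = e;
        }
    }
    throw last;
}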

Aggregations

CloudObjectLocation (org.apache.druid.data.input.impl.CloudObjectLocation): 34
Test (org.junit.Test): 21
InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest): 15
InputSplit (org.apache.druid.data.input.InputSplit): 13
JsonInputFormat (org.apache.druid.data.input.impl.JsonInputFormat): 11
OSSObjectSummary (com.aliyun.oss.model.OSSObjectSummary): 6
S3ObjectSummary (com.amazonaws.services.s3.model.S3ObjectSummary): 6
MaxSizeSplitHintSpec (org.apache.druid.data.input.MaxSizeSplitHintSpec): 6
URI (java.net.URI): 5
File (java.io.File): 4
FileInputStream (java.io.FileInputStream): 4
FileOutputStream (java.io.FileOutputStream): 4
OutputStream (java.io.OutputStream): 4
Date (java.util.Date): 4
GZIPOutputStream (java.util.zip.GZIPOutputStream): 4
FileUtils (org.apache.druid.java.util.common.FileUtils): 4
IOE (org.apache.druid.java.util.common.IOE): 4
OSSException (com.aliyun.oss.OSSException): 3
OSSObject (com.aliyun.oss.model.OSSObject): 3
S3Object (com.amazonaws.services.s3.model.S3Object): 3