
Example 1 with VolumeMetered

use of org.talend.dataprep.metrics.VolumeMetered in project data-prep by Talend.

the class TransformationService method execute.

@RequestMapping(value = "/apply", method = POST)
@ApiOperation(value = "Run the transformation given the provided export parameters", notes = "This operation transforms the dataset or preparation using parameters in export parameters.")
@VolumeMetered
@AsyncOperation(conditionalClass = GetPrepContentAsyncCondition.class, //
        resultUrlGenerator = PreparationGetContentUrlGenerator.class, //
        executionIdGeneratorClass = ExportParametersExecutionIdGenerator.class)
public StreamingResponseBody execute(@ApiParam(value = "Preparation id to apply.") @RequestBody @Valid @AsyncParameter @AsyncExecutionId final ExportParameters parameters) throws IOException {
    ExportParameters completeParameters = parameters;
    if (StringUtils.isNotEmpty(completeParameters.getPreparationId())) {
        // we deal with preparation transformation (not dataset)
        completeParameters = exportParametersUtil.populateFromPreparationExportParameter(parameters);
        ContentCacheKey cacheKey = cacheKeyGenerator.generateContentKey(completeParameters);
        if (!contentCache.has(cacheKey)) {
            preparationExportStrategy.performPreparation(completeParameters, new NullOutputStream());
        }
    }
    return executeSampleExportStrategy(completeParameters);
}
Also used : ExportParameters(org.talend.dataprep.api.export.ExportParameters) ContentCacheKey(org.talend.dataprep.cache.ContentCacheKey) NullOutputStream(org.apache.commons.io.output.NullOutputStream) VolumeMetered(org.talend.dataprep.metrics.VolumeMetered) ApiOperation(io.swagger.annotations.ApiOperation)
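A caller reaches this method by POSTing an ExportParameters payload to /apply. Below is a minimal client sketch using java.net.http.HttpClient (Java 11+); the base URL, the JSON field names (mirroring the ExportParameters setters used elsewhere in this page), and the content type are assumptions for illustration, not part of the Talend source above.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ApplyEndpointClient {

    public static void main(String[] args) throws Exception {
        // Hypothetical base URL of the transformation service; adjust to your deployment.
        String baseUrl = "http://localhost:8080";

        // Assumed JSON shape based on the ExportParameters setters shown in the other examples
        // (preparationId, datasetId, exportType, stepId); the actual property names may differ.
        String exportParametersJson = "{"
                + "\"preparationId\":\"my-preparation-id\","
                + "\"datasetId\":\"my-dataset-id\","
                + "\"exportType\":\"JSON\","
                + "\"stepId\":\"head\""
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/apply"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(exportParametersJson))
                .build();

        // The endpoint streams the transformed content back (StreamingResponseBody).
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("status=" + response.statusCode());
        System.out.println(response.body());
    }
}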

Example 2 with VolumeMetered

use of org.talend.dataprep.metrics.VolumeMetered in project data-prep by Talend.

the class TransformationService method applyOnDataset.

/**
 * Apply the preparation to the dataset identified by the given IDs.
 *
 * @param preparationId the preparation id to apply on the dataset.
 * @param datasetId the dataset id to transform.
 * @param formatName The output {@link ExportFormat format}. This format also sets the MIME type of the response.
 * @param stepId the preparation step id to use (default is 'head').
 * @param name the transformation name.
 * @param exportParams additional (optional) export parameters.
 */
// @formatter:off
@RequestMapping(value = "/apply/preparation/{preparationId}/dataset/{datasetId}/{format}", method = GET)
@ApiOperation(value = "Transform the given preparation to the given format on the given dataset id", notes = "This operation transforms the dataset using preparation id in the provided format.")
@VolumeMetered
public StreamingResponseBody applyOnDataset(@ApiParam(value = "Preparation id to apply.") @PathVariable(value = "preparationId") final String preparationId, @ApiParam(value = "DataSet id to transform.") @PathVariable(value = "datasetId") final String datasetId, @ApiParam(value = "Output format") @PathVariable("format") final String formatName, @ApiParam(value = "Step id", defaultValue = "head") @RequestParam(value = "stepId", required = false, defaultValue = "head") final String stepId, @ApiParam(value = "Name of the transformation", defaultValue = "untitled") @RequestParam(value = "name", required = false, defaultValue = "untitled") final String name, @RequestParam final Map<String, String> exportParams) {
    // @formatter:on
    final ExportParameters exportParameters = new ExportParameters();
    exportParameters.setPreparationId(preparationId);
    exportParameters.setDatasetId(datasetId);
    exportParameters.setExportType(formatName);
    exportParameters.setStepId(stepId);
    exportParameters.setExportName(name);
    exportParameters.getArguments().putAll(exportParams);
    return executeSampleExportStrategy(exportParameters);
}
Also used : ExportParameters(org.talend.dataprep.api.export.ExportParameters) VolumeMetered(org.talend.dataprep.metrics.VolumeMetered) ApiOperation(io.swagger.annotations.ApiOperation)
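This variant is a plain GET with the preparation id, dataset id, and format in the path, plus optional stepId and name query parameters. A minimal client sketch follows; the base URL, the ids, and the "JSON" format value are placeholder assumptions.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ApplyOnDataSetClient {

    public static void main(String[] args) throws Exception {
        // Hypothetical base URL and ids; replace with values from your own instance.
        String baseUrl = "http://localhost:8080";
        String preparationId = "my-preparation-id";
        String datasetId = "my-dataset-id";
        String format = "JSON";

        // Optional query parameters match the @RequestParam names in the signature above.
        String query = "stepId=head&name=" + URLEncoder.encode("untitled", StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "/apply/preparation/" + preparationId
                + "/dataset/" + datasetId + "/" + format + "?" + query);

        HttpRequest request = HttpRequest.newBuilder().uri(uri).GET().build();

        // Stream the exported content straight to a file, since the body can be large.
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("export.json")));
        System.out.println("status=" + response.statusCode() + ", saved to " + response.body());
    }
}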

Example 3 with VolumeMetered

use of org.talend.dataprep.metrics.VolumeMetered in project data-prep by Talend.

the class TransformationService method aggregate.

/**
 * Compute the given aggregation.
 *
 * @param rawParams the aggregation parameters, passed as the raw request body.
 */
// @formatter:off
@RequestMapping(value = "/aggregate", method = POST, consumes = APPLICATION_JSON_VALUE)
@ApiOperation(value = "Compute the aggregation according to the request body rawParams", consumes = APPLICATION_JSON_VALUE)
@VolumeMetered
public AggregationResult aggregate(@ApiParam(value = "The aggregation rawParams in json") @RequestBody final String rawParams) {
    // @formatter:on
    // parse the aggregation parameters
    final AggregationParameters parameters;
    try {
        parameters = mapper.readerFor(AggregationParameters.class).readValue(rawParams);
        LOG.debug("Aggregation requested {}", parameters);
    } catch (IOException e) {
        throw new TDPException(CommonErrorCodes.BAD_AGGREGATION_PARAMETERS, e);
    }
    InputStream contentToAggregate;
    // get the content of the preparation (internal call with piped streams)
    if (StringUtils.isNotBlank(parameters.getPreparationId())) {
        try {
            PipedOutputStream temp = new PipedOutputStream();
            contentToAggregate = new PipedInputStream(temp);
            // because of piped streams, processing must be asynchronous
            Runnable r = () -> {
                try {
                    final ExportParameters exportParameters = new ExportParameters();
                    exportParameters.setPreparationId(parameters.getPreparationId());
                    exportParameters.setDatasetId(parameters.getDatasetId());
                    final String filter = parameters.getFilter();
                    if (filter != null) {
                        if (filter.isEmpty()) {
                            throw new TDPException(CommonErrorCodes.UNABLE_TO_AGGREGATE, new IllegalArgumentException("Source should not be empty"));
                        }
                        exportParameters.setFilter(mapper.readTree(filter));
                    }
                    exportParameters.setExportType(JSON);
                    exportParameters.setStepId(parameters.getStepId());
                    final StreamingResponseBody body = executeSampleExportStrategy(exportParameters);
                    body.writeTo(temp);
                } catch (IOException e) {
                    throw new TDPException(CommonErrorCodes.UNABLE_TO_AGGREGATE, e);
                }
            };
            executor.execute(r);
        } catch (IOException e) {
            throw new TDPException(CommonErrorCodes.UNABLE_TO_AGGREGATE, e);
        }
    } else {
        final DataSetGet dataSetGet = context.getBean(DataSetGet.class, parameters.getDatasetId(), false, true);
        contentToAggregate = dataSetGet.execute();
    }
    // apply the aggregation
    try (JsonParser parser = mapper.getFactory().createParser(new InputStreamReader(contentToAggregate, UTF_8))) {
        final DataSet dataSet = mapper.readerFor(DataSet.class).readValue(parser);
        return aggregationService.aggregate(parameters, dataSet);
    } catch (IOException e) {
        throw new TDPException(CommonErrorCodes.UNABLE_TO_PARSE_JSON, e);
    } finally {
        // don't forget to release the connection
        if (contentToAggregate != null) {
            try {
                contentToAggregate.close();
            } catch (IOException e) {
                LOG.warn("Could not close dataset input stream while aggregating", e);
            }
        }
    }
}
Also used : DataSetGet(org.talend.dataprep.command.dataset.DataSetGet) StreamingResponseBody(org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody) DataSet(org.talend.dataprep.api.dataset.DataSet) AggregationParameters(org.talend.dataprep.transformation.aggregation.api.AggregationParameters) TDPException(org.talend.dataprep.exception.TDPException) ExportParameters(org.talend.dataprep.api.export.ExportParameters) JsonParser(com.fasterxml.jackson.core.JsonParser) VolumeMetered(org.talend.dataprep.metrics.VolumeMetered) ApiOperation(io.swagger.annotations.ApiOperation)
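The piped-stream trick above is worth isolating: a PipedOutputStream blocks once its internal buffer fills, so the producer (here, the export strategy writing to temp) must run on a different thread from the consumer that reads the PipedInputStream. The following self-contained sketch illustrates that pattern with plain JDK classes; it does not use any Talend types.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipedStreamPattern {

    public static void main(String[] args) throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        PipedOutputStream producerSide = new PipedOutputStream();
        PipedInputStream consumerSide = new PipedInputStream(producerSide);

        // Producer: writes on a separate thread, like the Runnable writing to 'temp' above.
        executor.execute(() -> {
            try (producerSide) {
                for (int i = 0; i < 5; i++) {
                    producerSide.write(("{\"row\":" + i + "}\n").getBytes(StandardCharsets.UTF_8));
                }
            } catch (IOException e) {
                throw new RuntimeException("Producer failed", e);
            }
        });

        // Consumer: reads on the current thread. For payloads larger than the pipe buffer,
        // running the producer on this same thread would deadlock.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(consumerSide, StandardCharsets.UTF_8))) {
            reader.lines().forEach(System.out::println);
        }

        executor.shutdown();
    }
}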

Example 4 with VolumeMetered

use of org.talend.dataprep.metrics.VolumeMetered in project data-prep by Talend.

the class DataSetService method updateRawDataSet.

/**
 * Updates a data set's content and metadata. If no data set exists for the given id, the data set is silently created.
 *
 * @param dataSetId The id of data set to be updated.
 * @param name The new name for the data set. An empty (or <code>null</code>) name does not update the dataset name.
 * @param dataSetContent The new content for the data set. If empty, existing content will <b>not</b> be replaced.
 * For delete operation, look at {@link #delete(String)}.
 */
@RequestMapping(value = "/datasets/{id}/raw", method = PUT)
@ApiOperation(value = "Update a data set by id", notes = "Update a data set content based on provided id and PUT body. Id should be a UUID returned by the list operation. Not valid or non existing data set id returns empty content. For documentation purposes, body is typed as 'text/plain' but operation accepts binary content too.")
@Timed
@VolumeMetered
public String updateRawDataSet( //
        @PathVariable(value = "id") @ApiParam(name = "id", value = "Id of the data set to update") String dataSetId, //
        @RequestParam(value = "name", required = false) @ApiParam(name = "name", value = "New value for the data set name") String name, //
        @RequestParam(value = "size", required = false) @ApiParam(name = "size", value = "The size of the dataSet") Long size, //
        @ApiParam(value = "content") InputStream dataSetContent) {
    LOG.debug("updating dataset content #{}", dataSetId);
    if (name != null) {
        checkDataSetName(name);
    }
    DataSetMetadata currentDataSetMetadata = dataSetMetadataRepository.get(dataSetId);
    if (currentDataSetMetadata == null) {
        return create(name, null, size, TEXT_PLAIN_VALUE, dataSetContent);
    } else {
        // just like creation, make sure an invalid size forbids the dataset update
        if (size != null && size < 0) {
            LOG.warn("invalid size provided {}", size);
            throw new TDPException(UNSUPPORTED_CONTENT);
        }
        final UpdateDataSetCacheKey cacheKey = new UpdateDataSetCacheKey(currentDataSetMetadata.getId());
        final DistributedLock lock = dataSetMetadataRepository.createDatasetMetadataLock(currentDataSetMetadata.getId());
        try {
            lock.lock();
            // check the size if it's available (quick win)
            if (size != null && size > 0) {
                quotaService.checkIfAddingSizeExceedsAvailableStorage(Math.abs(size - currentDataSetMetadata.getDataSetSize()));
            }
            final DataSetMetadataBuilder datasetBuilder = metadataBuilder.metadata().id(currentDataSetMetadata.getId());
            datasetBuilder.copyNonContentRelated(currentDataSetMetadata);
            datasetBuilder.modified(System.currentTimeMillis());
            if (!StringUtils.isEmpty(name)) {
                datasetBuilder.name(name);
            }
            final DataSetMetadata updatedDataSetMetadata = datasetBuilder.build();
            // Save data set content into cache to make sure there's enough space in the content store
            final long maxDataSetSizeAllowed = getMaxDataSetSizeAllowed();
            final StrictlyBoundedInputStream sizeCalculator = new StrictlyBoundedInputStream(dataSetContent, maxDataSetSizeAllowed);
            try (OutputStream cacheEntry = cacheManager.put(cacheKey, TimeToLive.DEFAULT)) {
                IOUtils.copy(sizeCalculator, cacheEntry);
            }
            // once fully copied to the cache, we know for sure that the content store has enough space, so let's copy
            // from the cache to the content store
            PipedInputStream toContentStore = new PipedInputStream();
            PipedOutputStream fromCache = new PipedOutputStream(toContentStore);
            Runnable r = () -> {
                try (final InputStream input = cacheManager.get(cacheKey)) {
                    IOUtils.copy(input, fromCache);
                    // it's important to close this stream, otherwise the piped stream will never close
                    fromCache.close();
                } catch (IOException e) {
                    throw new TDPException(UNABLE_TO_CREATE_OR_UPDATE_DATASET, e);
                }
            };
            executor.execute(r);
            contentStore.storeAsRaw(updatedDataSetMetadata, toContentStore);
            // update the dataset metadata with its new size
            updatedDataSetMetadata.setDataSetSize(sizeCalculator.getTotal());
            dataSetMetadataRepository.save(updatedDataSetMetadata);
            // publishing update event
            publisher.publishEvent(new DatasetUpdatedEvent(updatedDataSetMetadata));
        } catch (StrictlyBoundedInputStream.InputStreamTooLargeException e) {
            LOG.warn("Dataset update {} cannot be done, new content is too big", currentDataSetMetadata.getId());
            throw new TDPException(MAX_STORAGE_MAY_BE_EXCEEDED, e, build().put("limit", e.getMaxSize()));
        } catch (IOException e) {
            LOG.error("Error updating the dataset", e);
            throw new TDPException(UNABLE_TO_CREATE_OR_UPDATE_DATASET, e);
        } finally {
            dataSetContentToNull(dataSetContent);
            // whatever the outcome the cache needs to be cleaned
            if (cacheManager.has(cacheKey)) {
                cacheManager.evict(cacheKey);
            }
            lock.unlock();
        }
        // Content was changed, so queue events (format analysis, content indexing for search...)
        analyzeDataSet(currentDataSetMetadata.getId(), true, emptyList());
        return currentDataSetMetadata.getId();
    }
}
Also used : DataSetMetadataBuilder(org.talend.dataprep.dataset.DataSetMetadataBuilder) PipedInputStream(java.io.PipedInputStream) StrictlyBoundedInputStream(org.talend.dataprep.dataset.store.content.StrictlyBoundedInputStream) InputStream(java.io.InputStream) PipedOutputStream(java.io.PipedOutputStream) NullOutputStream(org.apache.commons.io.output.NullOutputStream) OutputStream(java.io.OutputStream) PipedOutputStream(java.io.PipedOutputStream) PipedInputStream(java.io.PipedInputStream) IOException(java.io.IOException) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) TDPException(org.talend.dataprep.exception.TDPException) DistributedLock(org.talend.dataprep.lock.DistributedLock) StrictlyBoundedInputStream(org.talend.dataprep.dataset.store.content.StrictlyBoundedInputStream) DatasetUpdatedEvent(org.talend.dataprep.dataset.event.DatasetUpdatedEvent) UpdateDataSetCacheKey(org.talend.dataprep.dataset.service.cache.UpdateDataSetCacheKey) VolumeMetered(org.talend.dataprep.metrics.VolumeMetered) Timed(org.talend.dataprep.metrics.Timed) ApiOperation(io.swagger.annotations.ApiOperation) RequestMapping(org.springframework.web.bind.annotation.RequestMapping)
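The key idea in this method is the bounded, byte-counting stream: the upload is copied through a StrictlyBoundedInputStream so the service both learns the real content size and fails fast if the limit is exceeded. The sketch below is a simplified stand-in for that idea using only JDK classes; it is not the Talend implementation, and the exception type and message are placeholders (the real class throws its own InputStreamTooLargeException carrying the limit).

import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Counts bytes as they are read and fails once a configured limit is exceeded.
public class BoundedCountingInputStream extends FilterInputStream {

    private final long maxBytes;
    private long total;

    public BoundedCountingInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    public long getTotal() {
        return total;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buffer, int off, int len) throws IOException {
        int n = super.read(buffer, off, len);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    private void count(int n) throws IOException {
        total += n;
        if (total > maxBytes) {
            // Placeholder for a dedicated "too large" exception carrying the limit.
            throw new IOException("Content exceeds the allowed " + maxBytes + " bytes");
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8);
        try (BoundedCountingInputStream in =
                new BoundedCountingInputStream(new ByteArrayInputStream(payload), 1024)) {
            while (in.read() >= 0) {
                // drain, similar to IOUtils.copy(sizeCalculator, cacheEntry) above
            }
            System.out.println("total bytes read: " + in.getTotal());
        }
    }
}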

Example 5 with VolumeMetered

use of org.talend.dataprep.metrics.VolumeMetered in project data-prep by Talend.

the class DataSetService method create.

/**
 * Creates a new data set and returns the new data set id as text in the response.
 *
 * @param name An optional name for the new data set (might be <code>null</code>).
 * @param size An optional size for the newly created data set.
 * @param contentType the request content type.
 * @param content The raw content of the data set (might be a CSV, XLS...) or the connection parameters in case of a
 * remote CSV.
 * @return The new data id.
 * @see DataSetService#get(boolean, boolean, String, String)
 */
// @formatter:off
@RequestMapping(value = "/datasets", method = POST, produces = TEXT_PLAIN_VALUE)
@ApiOperation(value = "Create a data set", produces = TEXT_PLAIN_VALUE, notes = "Create a new data set based on content provided in POST body. For documentation purposes, body is typed as 'text/plain' but operation accepts binary content too. Returns the id of the newly created data set.")
@Timed
@VolumeMetered
public String create(@ApiParam(value = "User readable name of the data set (e.g. 'Finance Report 2015', 'Test Data Set').") @RequestParam(defaultValue = "") String name, @ApiParam(value = "An optional tag to be added in data set metadata once created.") @RequestParam(defaultValue = "") String tag, @ApiParam(value = "Size of the data set, in bytes.") @RequestParam(required = false) Long size, @RequestHeader(CONTENT_TYPE) String contentType, @ApiParam(value = "content") InputStream content) {
    // @formatter:on
    checkDataSetName(name);
    final String id = UUID.randomUUID().toString();
    final Marker marker = Markers.dataset(id);
    LOG.debug(marker, "Creating...");
    // sanity check
    if (size != null && size < 0) {
        LOG.warn("invalid size provided {}", size);
        throw new TDPException(UNEXPECTED_CONTENT, build().put("size", size));
    }
    // check that the name is not already taken
    checkIfNameIsAvailable(name);
    // get the location out of the content type and the request body
    final DataSetLocation location;
    try {
        location = datasetLocator.getDataSetLocation(contentType, content);
    } catch (IOException e) {
        throw new TDPException(DataSetErrorCodes.UNABLE_TO_READ_DATASET_LOCATION, e);
    }
    DataSetMetadata dataSetMetadata = null;
    final TDPException hypotheticalException;
    try {
        // if the size is provided, let's check if the quota will not be exceeded
        if (size != null && size > 0) {
            quotaService.checkIfAddingSizeExceedsAvailableStorage(size);
        }
        dataSetMetadata = metadataBuilder.metadata() //
                .id(id) //
                .name(name) //
                .author(security.getUserId()) //
                .location(location) //
                .created(System.currentTimeMillis()) //
                .tag(tag) //
                .build();
        // Indicate data set is being imported
        dataSetMetadata.getLifecycle().setImporting(true);
        // Save data set content
        LOG.debug(marker, "Storing content...");
        final long maxDataSetSizeAllowed = getMaxDataSetSizeAllowed();
        final StrictlyBoundedInputStream sizeCalculator = new StrictlyBoundedInputStream(content, maxDataSetSizeAllowed);
        contentStore.storeAsRaw(dataSetMetadata, sizeCalculator);
        dataSetMetadata.setDataSetSize(sizeCalculator.getTotal());
        LOG.debug(marker, "Content stored.");
        // Create the new data set
        dataSetMetadataRepository.save(dataSetMetadata);
        LOG.debug(marker, "dataset metadata stored {}", dataSetMetadata);
        // Queue events (format analysis, content indexing for search...)
        analyzeDataSet(id, true, emptyList());
        LOG.debug(marker, "Created!");
        return id;
    } catch (StrictlyBoundedInputStream.InputStreamTooLargeException e) {
        hypotheticalException = new TDPException(MAX_STORAGE_MAY_BE_EXCEEDED, e, build().put("limit", e.getMaxSize()));
    } catch (TDPException e) {
        hypotheticalException = e;
    } catch (Exception e) {
        hypotheticalException = new TDPException(UNABLE_CREATE_DATASET, e);
    } finally {
        // because the client might still be writing the request content, closing the connection right now
        // might end up in a 'connection reset' or a 'broken pipe' error in API.
        // 
        // So, let's read fully the request content before closing the connection.
        dataSetContentToNull(content);
    }
    dataSetMetadataRepository.remove(id);
    if (dataSetMetadata != null) {
        try {
            contentStore.delete(dataSetMetadata);
        } catch (Exception e) {
            LOG.error("Unable to delete uploaded data.", e);
        }
    }
    throw hypotheticalException;
}
Also used : TDPException(org.talend.dataprep.exception.TDPException) DataSetLocation(org.talend.dataprep.api.dataset.DataSetLocation) StrictlyBoundedInputStream(org.talend.dataprep.dataset.store.content.StrictlyBoundedInputStream) Marker(org.slf4j.Marker) IOException(java.io.IOException) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) IOException(java.io.IOException) TDPException(org.talend.dataprep.exception.TDPException) VolumeMetered(org.talend.dataprep.metrics.VolumeMetered) Timed(org.talend.dataprep.metrics.Timed) ApiOperation(io.swagger.annotations.ApiOperation) RequestMapping(org.springframework.web.bind.annotation.RequestMapping)
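According to the operation's documentation above, the body of POST /datasets is the raw content (typed as text/plain for documentation purposes, but binary is accepted) and the response is the new dataset id as plain text. A minimal client sketch follows; the base URL, the text/csv content type, and the sample CSV are assumptions for illustration only.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CreateDataSetClient {

    public static void main(String[] args) throws Exception {
        // Hypothetical base URL; point this at your own data-prep dataset service.
        String baseUrl = "http://localhost:8080";

        // Raw CSV content sent as the request body.
        String csv = "firstname,lastname\nJohn,Doe\nJane,Doe\n";

        // Optional 'name' and 'size' query parameters match the @RequestParam names above.
        String name = URLEncoder.encode("Test Data Set", StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "/datasets?name=" + name
                + "&size=" + csv.getBytes(StandardCharsets.UTF_8).length);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(uri)
                .header("Content-Type", "text/csv")
                .POST(HttpRequest.BodyPublishers.ofString(csv))
                .build();

        // The service answers with the new dataset id as plain text.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("created dataset id: " + response.body());
    }
}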

Aggregations

ApiOperation (io.swagger.annotations.ApiOperation)6 VolumeMetered (org.talend.dataprep.metrics.VolumeMetered)6 ExportParameters (org.talend.dataprep.api.export.ExportParameters)4 TDPException (org.talend.dataprep.exception.TDPException)4 NullOutputStream (org.apache.commons.io.output.NullOutputStream)3 DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata)3 IOException (java.io.IOException)2 RequestMapping (org.springframework.web.bind.annotation.RequestMapping)2 StrictlyBoundedInputStream (org.talend.dataprep.dataset.store.content.StrictlyBoundedInputStream)2 Timed (org.talend.dataprep.metrics.Timed)2 JsonParser (com.fasterxml.jackson.core.JsonParser)1 InputStream (java.io.InputStream)1 OutputStream (java.io.OutputStream)1 PipedInputStream (java.io.PipedInputStream)1 PipedOutputStream (java.io.PipedOutputStream)1 Marker (org.slf4j.Marker)1 StreamingResponseBody (org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody)1 DataSet (org.talend.dataprep.api.dataset.DataSet)1 DataSetLocation (org.talend.dataprep.api.dataset.DataSetLocation)1 Preparation (org.talend.dataprep.api.preparation.Preparation)1