Example 26 with ContentCacheKey

Use of org.talend.dataprep.cache.ContentCacheKey in project data-prep by Talend.

From the class TransformationService, method shouldApplyDiffToSampleSource.

private boolean shouldApplyDiffToSampleSource(final PreviewParameters previewParameters) {
    if (previewParameters.getSourceType() != HEAD && previewParameters.getPreparationId() != null) {
        final TransformationMetadataCacheKey metadataKey = cacheKeyGenerator.generateMetadataKey(
                previewParameters.getPreparationId(),
                Step.ROOT_STEP.id(),
                previewParameters.getSourceType());
        final ContentCacheKey contentKey = cacheKeyGenerator.generateContentKey(
                previewParameters.getDataSetId(),
                previewParameters.getPreparationId(),
                Step.ROOT_STEP.id(),
                JSON,
                previewParameters.getSourceType(),
                // no filter for preview parameters
                "");
        return contentCache.has(metadataKey) && contentCache.has(contentKey);
    }
    return false;
}
Also used: ContentCacheKey (org.talend.dataprep.cache.ContentCacheKey), TransformationMetadataCacheKey (org.talend.dataprep.cache.TransformationMetadataCacheKey)
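
The check above only returns true when both the metadata entry and the content entry for the root step are already cached. A minimal, self-contained sketch of that idea follows. Everything in it is a hypothetical stand-in: the InMemoryCache class, the string key layout, the "root" step id and the "FILTER" source type are assumptions for illustration, not the data-prep types or their real key format.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins: none of these types come from data-prep.
class CacheCheckSketch {

    /** Minimal in-memory cache keyed by plain strings. */
    static final class InMemoryCache {
        private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

        void put(String key, byte[] value) {
            entries.put(key, value);
        }

        boolean has(String key) {
            return entries.containsKey(key);
        }
    }

    /** Assumed key layout for metadata entries. */
    static String metadataKey(String preparationId, String stepId, String sourceType) {
        return String.join("_", "metadata", preparationId, stepId, sourceType);
    }

    /** Assumed key layout for content entries; an empty filter means "no filter". */
    static String contentKey(String dataSetId, String preparationId, String stepId,
            String format, String sourceType, String filter) {
        return String.join("_", "content", dataSetId, preparationId, stepId, format, sourceType, filter);
    }

    /** True only when the source is not HEAD and both root-step entries are cached. */
    static boolean shouldApplyDiffToSample(InMemoryCache cache, String sourceType,
            String dataSetId, String preparationId) {
        if ("HEAD".equals(sourceType) || preparationId == null) {
            return false;
        }
        String metadata = metadataKey(preparationId, "root", sourceType);
        String content = contentKey(dataSetId, preparationId, "root", "JSON", sourceType, "");
        return cache.has(metadata) && cache.has(content);
    }

    public static void main(String[] args) {
        InMemoryCache cache = new InMemoryCache();
        cache.put(metadataKey("prep1", "root", "FILTER"), new byte[0]);
        // Content entry is still missing at this point.
        System.out.println(shouldApplyDiffToSample(cache, "FILTER", "ds1", "prep1"));
        cache.put(contentKey("ds1", "prep1", "root", "JSON", "FILTER", ""), new byte[0]);
        // Both entries are now present.
        System.out.println(shouldApplyDiffToSample(cache, "FILTER", "ds1", "prep1"));
    }
}
```

Requiring both entries avoids starting a diff with metadata that has no matching content, or the reverse.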

Example 27 with ContentCacheKey

Use of org.talend.dataprep.cache.ContentCacheKey in project data-prep by Talend.

From the class TransformationService, method executeDiffOnSample.

private void executeDiffOnSample(final PreviewParameters previewParameters, final OutputStream output) {
    final TransformationMetadataCacheKey metadataKey = cacheKeyGenerator.generateMetadataKey(
            previewParameters.getPreparationId(),
            Step.ROOT_STEP.id(),
            previewParameters.getSourceType());
    final ContentCacheKey contentKey = cacheKeyGenerator.generateContentKey(
            previewParameters.getDataSetId(),
            previewParameters.getPreparationId(),
            Step.ROOT_STEP.id(),
            JSON,
            previewParameters.getSourceType(),
            // no filters for preview
            "");
    try (final InputStream metadata = contentCache.get(metadataKey);
            final InputStream content = contentCache.get(contentKey);
            final JsonParser contentParser = mapper.getFactory().createParser(new InputStreamReader(content, UTF_8))) {
        // build metadata
        final RowMetadata rowMetadata = mapper.readerFor(RowMetadata.class).readValue(metadata);
        final DataSetMetadata dataSetMetadata = new DataSetMetadata();
        dataSetMetadata.setRowMetadata(rowMetadata);
        // build dataset
        final DataSet dataSet = mapper.readerFor(DataSet.class).readValue(contentParser);
        dataSet.setMetadata(dataSetMetadata);
        // trigger diff
        executePreview(
                previewParameters.getNewActions(),
                previewParameters.getBaseActions(),
                previewParameters.getTdpIds(),
                dataSet,
                output);
    } catch (final IOException e) {
        throw new TDPException(TransformationErrorCodes.UNABLE_TO_PERFORM_PREVIEW, e);
    }
}
Also used: TDPException (org.talend.dataprep.exception.TDPException), DataSet (org.talend.dataprep.api.dataset.DataSet), ContentCacheKey (org.talend.dataprep.cache.ContentCacheKey), TransformationMetadataCacheKey (org.talend.dataprep.cache.TransformationMetadataCacheKey), RowMetadata (org.talend.dataprep.api.dataset.RowMetadata), DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata), JsonParser (com.fasterxml.jackson.core.JsonParser)

Example 28 with ContentCacheKey

Use of org.talend.dataprep.cache.ContentCacheKey in project data-prep by Talend.

From the class TransformationService, method getPreparationColumnSemanticCategories.

/**
 * Return the semantic types for a given preparation / column.
 *
 * @param preparationId the preparation id.
 * @param columnId the column id.
 * @param stepId the step id (optional, if not specified, it's 'head')
 * @return the semantic types for a given preparation / column.
 */
@RequestMapping(value = "/preparations/{preparationId}/columns/{columnId}/types", method = GET)
@ApiOperation(value = "list the types of the wanted column", notes = "This list can be used by user to change the column type.")
@Timed
@PublicAPI
public List<SemanticDomain> getPreparationColumnSemanticCategories(
        @ApiParam(value = "The preparation id") @PathVariable String preparationId,
        @ApiParam(value = "The column id") @PathVariable String columnId,
        @ApiParam(value = "The preparation version") @RequestParam(defaultValue = "head") String stepId) {
    LOG.debug("listing preparation semantic categories for preparation #{} column #{}@{}", preparationId, columnId, stepId);
    // get the preparation
    final Preparation preparation = getPreparation(preparationId);
    // get the step (in case of 'head', the real step id must be found)
    final String version = StringUtils.equals("head", stepId)
            ? preparation.getSteps().get(preparation.getSteps().size() - 1).getId()
            : stepId;
    /*
     * OK, this one is a bit tricky so pay attention.
     *
     * To be able to get the semantic types, the analyzer service needs to run on the result of the preparation.
     *
     * The result must be found in the cache, so if the preparation is not cached, the preparation is run so that
     * it gets cached.
     *
     * Then, the analyzer service just gets the data from the cache. That's it.
     */
    // generate the cache keys for both metadata & content
    final ContentCacheKey metadataKey = cacheKeyGenerator.metadataBuilder()
            .preparationId(preparationId)
            .stepId(version)
            .sourceType(HEAD)
            .build();
    final ContentCacheKey contentKey = cacheKeyGenerator.contentBuilder()
            .datasetId(preparation.getDataSetId())
            .preparationId(preparationId)
            .stepId(version)
            .format(JSON)
            .sourceType(HEAD)
            .build();
    // if the preparation is not cached, let's compute it to have some cache
    if (!contentCache.has(metadataKey) || !contentCache.has(contentKey)) {
        addPreparationInCache(preparation, stepId);
    }
    // run the analyzer service on the cached content
    try (final InputStream metadataCache = contentCache.get(metadataKey);
        final InputStream contentCache = this.contentCache.get(contentKey)) {
        final DataSetMetadata metadata = mapper.readerFor(DataSetMetadata.class).readValue(metadataCache);
        final List<SemanticDomain> semanticDomains = getSemanticDomains(metadata, columnId, contentCache);
        LOG.debug("found {} for preparation #{}, column #{}", semanticDomains, preparationId, columnId);
        return semanticDomains;
    } catch (IOException e) {
        throw new TDPException(UNEXPECTED_EXCEPTION, e);
    }
}
Also used: TDPException (org.talend.dataprep.exception.TDPException), Preparation (org.talend.dataprep.api.preparation.Preparation), ContentCacheKey (org.talend.dataprep.cache.ContentCacheKey), SemanticDomain (org.talend.dataprep.api.dataset.statistics.SemanticDomain), DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata), Timed (org.talend.dataprep.metrics.Timed), ApiOperation (io.swagger.annotations.ApiOperation), PublicAPI (org.talend.dataprep.security.PublicAPI)
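
The block comment in the method above describes a populate-on-miss flow: resolve 'head' to a concrete step id, run the preparation if either cache entry is missing, then read both entries back from the cache. The sketch below illustrates that flow using only the JDK. The class name, the string keys, the computeAndCache helper and the placeholder payloads are all hypothetical, and the real service reads through its content cache and Jackson rather than a HashMap.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "compute on miss, then read from the cache".
class PopulateOnMissSketch {

    private final Map<String, byte[]> cache = new HashMap<>();

    /** Counts how often the (expensive) computation actually ran. */
    int computations = 0;

    /** 'head' resolves to the id of the last step; any other id is used as given. */
    static String resolveVersion(List<String> stepIds, String stepId) {
        return "head".equals(stepId) ? stepIds.get(stepIds.size() - 1) : stepId;
    }

    /** Stand-in for running the preparation; stores placeholder payloads. */
    private void computeAndCache(String metadataKey, String contentKey, String version) {
        computations++;
        cache.put(metadataKey, ("columns@" + version).getBytes(StandardCharsets.UTF_8));
        cache.put(contentKey, ("rows@" + version).getBytes(StandardCharsets.UTF_8));
    }

    String read(String preparationId, List<String> stepIds, String stepId) {
        String version = resolveVersion(stepIds, stepId);
        String metadataKey = "metadata_" + preparationId + "_" + version;
        String contentKey = "content_" + preparationId + "_" + version;
        // Populate the cache only if one of the two entries is missing.
        if (!cache.containsKey(metadataKey) || !cache.containsKey(contentKey)) {
            computeAndCache(metadataKey, contentKey, version);
        }
        // Both entries exist now; read them back as streams.
        try (InputStream metadata = new ByteArrayInputStream(cache.get(metadataKey));
                InputStream content = new ByteArrayInputStream(cache.get(contentKey))) {
            return new String(metadata.readAllBytes(), StandardCharsets.UTF_8)
                    + "|" + new String(content.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PopulateOnMissSketch sketch = new PopulateOnMissSketch();
        List<String> steps = List.of("s1", "s2");
        System.out.println(sketch.read("prep1", steps, "head"));
        System.out.println(sketch.read("prep1", steps, "s2"));
        System.out.println("computations: " + sketch.computations);
    }
}
```

In this sketch the keys are built from the resolved step id, so a request for 'head' and a later request for the same concrete step id share one cache entry.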

Aggregations

ContentCacheKey (org.talend.dataprep.cache.ContentCacheKey): 28
Test (org.junit.Test): 20
ServiceBaseTest (org.talend.ServiceBaseTest): 14
TransformationMetadataCacheKey (org.talend.dataprep.cache.TransformationMetadataCacheKey): 4
OutputStream (java.io.OutputStream): 3
ArrayList (java.util.ArrayList): 3
DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata): 3
TDPException (org.talend.dataprep.exception.TDPException): 3
ApiOperation (io.swagger.annotations.ApiOperation): 2
InputStream (java.io.InputStream): 2
Predicate (java.util.function.Predicate): 2
Logger (org.slf4j.Logger): 2
LoggerFactory (org.slf4j.LoggerFactory): 2
Autowired (org.springframework.beans.factory.annotation.Autowired): 2
Component (org.springframework.stereotype.Component): 2
TransformationCacheKey (org.talend.dataprep.cache.TransformationCacheKey): 2
Timed (org.talend.dataprep.metrics.Timed): 2
JsonParser (com.fasterxml.jackson.core.JsonParser): 1
IOException (java.io.IOException): 1
Long.parseLong (java.lang.Long.parseLong): 1