
Example 1 with ExecutableTransformer

Use of org.talend.dataprep.transformation.api.transformer.ExecutableTransformer in project data-prep by Talend.

Class PipelineDiffTransformer, method buildExecutable.

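/**
 * Builds an executable transformer that runs the transformation in preview mode.
 *
 * @param input the dataset content.
 * @param configuration The {@link Configuration configuration} for this transformation.
 * @return an {@link ExecutableTransformer} that runs the diff pipeline when executed.
 */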
@Override
public ExecutableTransformer buildExecutable(DataSet input, Configuration configuration) {
    Validate.notNull(input, "Input cannot be null.");
    final PreviewConfiguration previewConfiguration = (PreviewConfiguration) configuration;
    final RowMetadata rowMetadata = input.getMetadata().getRowMetadata();
    final TransformerWriter writer = writerRegistrationService.getWriter(configuration.formatId(), configuration.output(), configuration.getArguments());
    // Build the diff writer node and the reference/preview pipelines
    final Node diffWriterNode = new DiffWriterNode(writer);
    final String referenceActions = previewConfiguration.getReferenceActions();
    final String previewActions = previewConfiguration.getPreviewActions();
    final Pipeline referencePipeline = buildPipeline(rowMetadata, referenceActions);
    final Pipeline previewPipeline = buildPipeline(rowMetadata, previewActions);
    // Filter source records (extract TDP ids information)
    final List<Long> indexes = previewConfiguration.getIndexes();
    final boolean isIndexLimited = indexes != null && !indexes.isEmpty();
    final Long minIndex = isIndexLimited ? indexes.stream().mapToLong(Long::longValue).min().getAsLong() : 0L;
    final Long maxIndex = isIndexLimited ? indexes.stream().mapToLong(Long::longValue).max().getAsLong() : Long.MAX_VALUE;
    final Predicate<DataSetRow> filter = isWithinWantedIndexes(minIndex, maxIndex);
    // Build diff pipeline
    final Node diffPipeline = NodeBuilder.filteredSource(filter)
            .dispatchTo(referencePipeline, previewPipeline)
            .zipTo(diffWriterNode)
            .build();
    // wrap this transformer into an ExecutableTransformer
    return new ExecutableTransformer() {

        @Override
        public void execute() {
            // Run diff
            try {
                // Print pipeline before execution (for debug purposes).
                diffPipeline.logStatus(LOGGER, "Before execution: {}");
                input.getRecords().forEach(r -> diffPipeline.exec().receive(r, rowMetadata));
                diffPipeline.exec().signal(Signal.END_OF_STREAM);
            } finally {
                // Print pipeline after execution (for debug purposes).
                diffPipeline.logStatus(LOGGER, "After execution: {}");
            }
        }

        @Override
        public void signal(Signal signal) {
            diffPipeline.exec().signal(signal);
        }
    };
}
Also used : DiffWriterNode(org.talend.dataprep.transformation.pipeline.model.DiffWriterNode) BasicNode(org.talend.dataprep.transformation.pipeline.node.BasicNode) Node(org.talend.dataprep.transformation.pipeline.Node) Pipeline(org.talend.dataprep.transformation.pipeline.Pipeline) Signal(org.talend.dataprep.transformation.pipeline.Signal) PreviewConfiguration(org.talend.dataprep.transformation.api.transformer.configuration.PreviewConfiguration) ExecutableTransformer(org.talend.dataprep.transformation.api.transformer.ExecutableTransformer) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata) TransformerWriter(org.talend.dataprep.transformation.api.transformer.TransformerWriter) DataSetRow(org.talend.dataprep.api.dataset.row.DataSetRow)
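
The index filtering in Example 1 reduces the requested indexes to a [min, max] range before building the predicate. The sketch below illustrates that step in isolation. The body of isWithinWantedIndexes is not shown in the snippet, so the version here is a hypothetical stand-in that works on plain Long ids rather than DataSetRow, and the class and method names are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class IndexRangeFilterSketch {

    /** Hypothetical stand-in for isWithinWantedIndexes: accepts ids in [min, max]. */
    public static Predicate<Long> isWithinWantedIndexes(long min, long max) {
        return id -> id != null && id >= min && id <= max;
    }

    /** Derives min and max the same way the example does, then builds the predicate. */
    public static Predicate<Long> rangeFilter(List<Long> indexes) {
        final boolean isIndexLimited = indexes != null && !indexes.isEmpty();
        // Without any requested index, the range is left open: [0, Long.MAX_VALUE].
        final long min = isIndexLimited ? indexes.stream().mapToLong(Long::longValue).min().getAsLong() : 0L;
        final long max = isIndexLimited ? indexes.stream().mapToLong(Long::longValue).max().getAsLong() : Long.MAX_VALUE;
        return isWithinWantedIndexes(min, max);
    }

    public static void main(String[] args) {
        final Predicate<Long> filter = rangeFilter(Arrays.asList(3L, 7L, 5L));
        System.out.println(filter.test(4L)); // inside [3, 7]
        System.out.println(filter.test(8L)); // outside [3, 7]
        System.out.println(rangeFilter(null).test(42L)); // no limit requested
    }
}
```

Under this assumption the predicate accepts every id between the smallest and largest requested index, including ids that were not in the list (4 in the usage above). Whether the real isWithinWantedIndexes narrows the selection further cannot be told from the snippet.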

Example 2 with ExecutableTransformer

Use of org.talend.dataprep.transformation.api.transformer.ExecutableTransformer in project data-prep by Talend.

Class PreparationExportStrategyTest, method setUp.

@Before
public void setUp() throws Exception {
    // Given
    mapper.registerModule(new Jdk8Module());
    strategy.setMapper(new ObjectMapper());
    when(formatRegistrationService.getByName(eq("JSON"))).thenReturn(new JsonFormat());
    final DataSetGetMetadata dataSetGetMetadata = mock(DataSetGetMetadata.class);
    when(applicationContext.getBean(eq(DataSetGetMetadata.class), anyVararg())).thenReturn(dataSetGetMetadata);
    DataSetGet dataSetGet = mock(DataSetGet.class);
    final StringWriter dataSetAsString = new StringWriter();
    DataSet dataSet = new DataSet();
    final DataSetMetadata dataSetMetadata = new DataSetMetadata("ds-1234", "", "", 0L, 0L, new RowMetadata(), "");
    final DataSetContent content = new DataSetContent();
    dataSetMetadata.setContent(content);
    dataSet.setMetadata(dataSetMetadata);
    dataSet.setRecords(Stream.empty());
    mapper.writerFor(DataSet.class).writeValue(dataSetAsString, dataSet);
    when(dataSetGet.execute()).thenReturn(new ByteArrayInputStream(dataSetAsString.toString().getBytes()));
    when(applicationContext.getBean(eq(DataSetGet.class), anyVararg())).thenReturn(dataSetGet);
    final PreparationGetActions preparationGetActions = mock(PreparationGetActions.class);
    when(preparationGetActions.execute()).thenReturn(new ByteArrayInputStream("{}".getBytes()));
    when(applicationContext.getBean(eq(PreparationGetActions.class), eq("prep-1234"), anyString())).thenReturn(preparationGetActions);
    final TransformationCacheKey cacheKey = mock(TransformationCacheKey.class);
    when(cacheKey.getKey()).thenReturn("cache-1234");
    when(cacheKeyGenerator.generateContentKey(anyString(), anyString(), anyString(), anyString(), any(), any(), anyString())).thenReturn(cacheKey);
    final ExecutableTransformer executableTransformer = mock(ExecutableTransformer.class);
    reset(transformer);
    when(transformer.buildExecutable(any(), any())).thenReturn(executableTransformer);
    when(factory.get(any())).thenReturn(transformer);
    when(contentCache.put(any(), any())).thenReturn(new NullOutputStream());
}
Also used : DataSetGet(org.talend.dataprep.command.dataset.DataSetGet) DataSet(org.talend.dataprep.api.dataset.DataSet) DataSetGetMetadata(org.talend.dataprep.command.dataset.DataSetGetMetadata) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) TransformationCacheKey(org.talend.dataprep.cache.TransformationCacheKey) Jdk8Module(com.fasterxml.jackson.datatype.jdk8.Jdk8Module) JsonFormat(org.talend.dataprep.transformation.format.JsonFormat) StringWriter(java.io.StringWriter) ByteArrayInputStream(java.io.ByteArrayInputStream) PreparationGetActions(org.talend.dataprep.command.preparation.PreparationGetActions) ExecutableTransformer(org.talend.dataprep.transformation.api.transformer.ExecutableTransformer) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata) DataSetContent(org.talend.dataprep.api.dataset.DataSetContent) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) NullOutputStream(org.apache.commons.io.output.NullOutputStream) Before(org.junit.Before)

Example 3 with ExecutableTransformer

Use of org.talend.dataprep.transformation.api.transformer.ExecutableTransformer in project data-prep by Talend.

Class PipelineTransformer, method buildExecutable.

@Override
public ExecutableTransformer buildExecutable(DataSet input, Configuration configuration) {
    final RowMetadata rowMetadata = input.getMetadata().getRowMetadata();
    // prepare the fallback row metadata
    RowMetadata fallBackRowMetadata = transformationRowMetadataUtils.getMatchingEmptyRowMetadata(rowMetadata);
    final TransformerWriter writer = writerRegistrationService.getWriter(configuration.formatId(), configuration.output(), configuration.getArguments());
    final ConfiguredCacheWriter metadataWriter = new ConfiguredCacheWriter(contentCache, DEFAULT);
    final TransformationMetadataCacheKey metadataKey = cacheKeyGenerator.generateMetadataKey(configuration.getPreparationId(), configuration.stepId(), configuration.getSourceType());
    final PreparationMessage preparation = configuration.getPreparation();
    // function that from a step gives the rowMetadata associated to the previous/parent step
    final Function<Step, RowMetadata> previousStepRowMetadataSupplier = s -> Optional.ofNullable(s.getParent())
            .map(id -> preparationUpdater.get(id))
            .orElse(null);
    final Pipeline pipeline = Pipeline.Builder.builder()
            .withAnalyzerService(analyzerService)
            .withActionRegistry(actionRegistry)
            .withPreparation(preparation)
            .withActions(actionParser.parse(configuration.getActions()))
            .withInitialMetadata(rowMetadata, configuration.volume() == SMALL)
            .withMonitor(configuration.getMonitor())
            .withFilter(configuration.getFilter())
            .withLimit(configuration.getLimit())
            .withFilterOut(configuration.getOutFilter())
            .withOutput(() -> new WriterNode(writer, metadataWriter, metadataKey, fallBackRowMetadata))
            .withStatisticsAdapter(adapter)
            .withStepMetadataSupplier(previousStepRowMetadataSupplier)
            .withGlobalStatistics(configuration.isGlobalStatistics())
            .allowMetadataChange(configuration.isAllowMetadataChange())
            .build();
    // wrap this transformer into an executable transformer
    return new ExecutableTransformer() {

        @Override
        public void execute() {
            try {
                LOGGER.debug("Before transformation: {}", pipeline);
                pipeline.execute(input);
            } finally {
                LOGGER.debug("After transformation: {}", pipeline);
            }
            if (preparation != null) {
                final UpdatedStepVisitor visitor = new UpdatedStepVisitor(preparationUpdater);
                pipeline.accept(visitor);
            }
        }

        @Override
        public void signal(Signal signal) {
            pipeline.signal(signal);
        }
    };
}
Also used : WriterNode(org.talend.dataprep.transformation.pipeline.model.WriterNode) WriterRegistrationService(org.talend.dataprep.transformation.format.WriterRegistrationService) SMALL(org.talend.dataprep.transformation.api.transformer.configuration.Configuration.Volume.SMALL) StepMetadataRepository(org.talend.dataprep.transformation.service.StepMetadataRepository) TransformerWriter(org.talend.dataprep.transformation.api.transformer.TransformerWriter) LoggerFactory(org.slf4j.LoggerFactory) Autowired(org.springframework.beans.factory.annotation.Autowired) Configuration(org.talend.dataprep.transformation.api.transformer.configuration.Configuration) Signal(org.talend.dataprep.transformation.pipeline.Signal) DEFAULT(org.talend.dataprep.cache.ContentCache.TimeToLive.DEFAULT) PreparationMessage(org.talend.dataprep.api.preparation.PreparationMessage) Function(java.util.function.Function) AnalyzerService(org.talend.dataprep.quality.AnalyzerService) ActionParser(org.talend.dataprep.transformation.api.action.ActionParser) CacheKeyGenerator(org.talend.dataprep.cache.CacheKeyGenerator) TransformationMetadataCacheKey(org.talend.dataprep.cache.TransformationMetadataCacheKey) DataSet(org.talend.dataprep.api.dataset.DataSet) Logger(org.slf4j.Logger) ActionRegistry(org.talend.dataprep.transformation.pipeline.ActionRegistry) TransformationRowMetadataUtils(org.talend.dataprep.transformation.service.TransformationRowMetadataUtils) Step(org.talend.dataprep.api.preparation.Step) ConfiguredCacheWriter(org.talend.dataprep.transformation.api.transformer.ConfiguredCacheWriter) ContentCache(org.talend.dataprep.cache.ContentCache) ExecutableTransformer(org.talend.dataprep.transformation.api.transformer.ExecutableTransformer) Component(org.springframework.stereotype.Component) StatisticsAdapter(org.talend.dataprep.dataset.StatisticsAdapter) Optional(java.util.Optional) Pipeline(org.talend.dataprep.transformation.pipeline.Pipeline) Transformer(org.talend.dataprep.transformation.api.transformer.Transformer) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata)
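
Examples 1 and 3 both wrap a pipeline in an anonymous ExecutableTransformer whose execute() logs in a try/finally and whose signal() delegates to the pipeline. The following minimal sketch isolates that pattern. Every type in it (Signal, Pipeline, ExecutableTransformer, and the log list used instead of LOGGER) is a simplified, hypothetical stand-in declared locally; none of them is the data-prep API itself.

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutableWrapperSketch {

    /** Hypothetical signal type; the snippets above only show END_OF_STREAM. */
    public enum Signal { END_OF_STREAM }

    /** Simplified stand-in for the interface used in the examples. */
    public interface ExecutableTransformer {
        void execute();
        void signal(Signal signal);
    }

    /** Hypothetical pipeline abstraction, reduced to what the wrapper needs. */
    public interface Pipeline {
        void execute(List<String> input);
        void signal(Signal signal);
    }

    /** Wraps a pipeline so that the "after" log entry is written even if execution fails. */
    public static ExecutableTransformer wrap(Pipeline pipeline, List<String> input, List<String> log) {
        return new ExecutableTransformer() {

            @Override
            public void execute() {
                try {
                    log.add("before");
                    pipeline.execute(input);
                } finally {
                    log.add("after");
                }
            }

            @Override
            public void signal(Signal signal) {
                // Signals are forwarded untouched to the wrapped pipeline.
                pipeline.signal(signal);
            }
        };
    }

    /** Runs the wrapper around a pipeline that always throws and returns what was logged. */
    public static List<String> runWithFailingPipeline() {
        final List<String> log = new ArrayList<>();
        final Pipeline failing = new Pipeline() {
            @Override
            public void execute(List<String> input) {
                throw new IllegalStateException("boom");
            }

            @Override
            public void signal(Signal signal) {
                log.add("signal:" + signal);
            }
        };
        final ExecutableTransformer transformer = wrap(failing, new ArrayList<>(), log);
        try {
            transformer.execute();
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runWithFailingPipeline());
    }
}
```

The finally block is what keeps the "after" entry in the log when the pipeline throws, mirroring the after-execution logging in both buildExecutable methods.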

Aggregations

RowMetadata (org.talend.dataprep.api.dataset.RowMetadata): 3
ExecutableTransformer (org.talend.dataprep.transformation.api.transformer.ExecutableTransformer): 3
DataSet (org.talend.dataprep.api.dataset.DataSet): 2
TransformerWriter (org.talend.dataprep.transformation.api.transformer.TransformerWriter): 2
Pipeline (org.talend.dataprep.transformation.pipeline.Pipeline): 2
Signal (org.talend.dataprep.transformation.pipeline.Signal): 2
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
Jdk8Module (com.fasterxml.jackson.datatype.jdk8.Jdk8Module): 1
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
StringWriter (java.io.StringWriter): 1
Optional (java.util.Optional): 1
Function (java.util.function.Function): 1
NullOutputStream (org.apache.commons.io.output.NullOutputStream): 1
Before (org.junit.Before): 1
Logger (org.slf4j.Logger): 1
LoggerFactory (org.slf4j.LoggerFactory): 1
Autowired (org.springframework.beans.factory.annotation.Autowired): 1
Component (org.springframework.stereotype.Component): 1
DataSetContent (org.talend.dataprep.api.dataset.DataSetContent): 1
DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata): 1