Example 1 with AnalyzerService

Use of org.talend.dataprep.quality.AnalyzerService in project data-prep by Talend.

Class Analyzers, method analyzerService.

@Bean
public AnalyzerService analyzerService() {
    LOGGER.info("Data Quality strategy is {} and located in {}", luceneIndexStrategy, dataqualityIndexesLocation);
    LOGGER.info("DataQuality indexes location : '{}'", this.dataqualityIndexesLocation);
    CategoryRegistryManager.setLocalRegistryPath(this.dataqualityIndexesLocation);
    // Configure DQ index creation strategy (one copy per use or one copy shared by all calls).
    LOGGER.info("Analyzer service lucene index strategy set to '{}'", luceneIndexStrategy);
    if ("basic".equalsIgnoreCase(luceneIndexStrategy)) {
        ClassPathDirectory.setProvider(new ClassPathDirectory.BasicProvider());
    } else if ("singleton".equalsIgnoreCase(luceneIndexStrategy)) {
        ClassPathDirectory.setProvider(new ClassPathDirectory.SingletonProvider());
    } else {
        // Default
        LOGGER.warn("Not a supported strategy for lucene indexes: '{}'", luceneIndexStrategy);
        ClassPathDirectory.setProvider(new ClassPathDirectory.SingletonProvider());
    }
    LOGGER.info("DataQuality indexes location : '{}'", this.dataqualityIndexesLocation);
    return new AnalyzerService(new StandardDictionarySnapshotProvider());
}
Also used : ClassPathDirectory(org.talend.dataquality.semantic.index.ClassPathDirectory) AnalyzerService(org.talend.dataprep.quality.AnalyzerService) StandardDictionarySnapshotProvider(org.talend.dataquality.semantic.snapshot.StandardDictionarySnapshotProvider) DisposableBean(org.springframework.beans.factory.DisposableBean) Bean(org.springframework.context.annotation.Bean)
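
The branch logic in Example 1 (case-insensitive match on a configured strategy name, with a fallback for anything unrecognized) can be sketched in isolation. This is a minimal illustration, not Talend code: the class IndexStrategyResolver, the enum IndexStrategy and the method resolve are hypothetical names, standing in for the two ClassPathDirectory providers used above.

```java
class IndexStrategyResolver {

    /** Hypothetical stand-in for the two index providers selected in Example 1. */
    enum IndexStrategy { BASIC, SINGLETON }

    /**
     * Maps a configured value to a strategy, ignoring case.
     * Unknown or null values fall back to SINGLETON, like the else branch in Example 1.
     */
    static IndexStrategy resolve(String configured) {
        if ("basic".equalsIgnoreCase(configured)) {
            return IndexStrategy.BASIC;
        }
        if ("singleton".equalsIgnoreCase(configured)) {
            return IndexStrategy.SINGLETON;
        }
        // Fallback: equalsIgnoreCase(null) is false, so null also ends up here.
        return IndexStrategy.SINGLETON;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Basic"));   // prints BASIC
        System.out.println(resolve("unknown")); // prints SINGLETON
    }
}
```

Keeping the fallback in one place makes the default explicit; the original method reaches the same result by repeating the SingletonProvider call in its else branch.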

Example 2 with AnalyzerService

Use of org.talend.dataprep.quality.AnalyzerService in project data-prep by Talend.

Class ReplaceCellValueTest, method should_tag_invalid_value.

@Test
public void should_tag_invalid_value() {
    // given
    final DataSetRow row = getRow("True");
    row.setTdpId(1L);
    final ColumnMetadata columnMetadata = row.getRowMetadata().getColumns().get(0);
    // Column is a boolean
    columnMetadata.setType(Type.BOOLEAN.getName());
    columnMetadata.setTypeForced(true);
    final Map<String, String> parameters = getParameters(1L, "True", "NotABoolean");
    // when
    final AnalyzerService analyzerService = new AnalyzerService();
    ActionTestWorkbench.test(Collections.singleton(row), analyzerService, actionRegistry, factory.create(action, parameters));
    // then
    assertThat(row.get("0000"), is("NotABoolean"));
    assertThat(row.getInternalValues().get(FlagNames.TDP_INVALID), is(",0000"));
}
Also used : ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) AnalyzerService(org.talend.dataprep.quality.AnalyzerService) DataSetRow(org.talend.dataprep.api.dataset.row.DataSetRow) AbstractMetadataBaseTest(org.talend.dataprep.transformation.actions.AbstractMetadataBaseTest) Test(org.junit.Test)
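
The test in Example 2 expects the invalid-values flag to hold ",0000". A minimal sketch of reading such a value back into column ids follows. It assumes the flag is a comma-separated list of column ids, which is inferred from the asserted string and not confirmed by the source; InvalidFlagParser and invalidColumnIds are hypothetical names and not part of data-prep.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

class InvalidFlagParser {

    /**
     * Splits a flag value such as ",0000" into column ids, skipping empty tokens.
     * Assumption: the flag is a comma-separated list of column ids.
     */
    static Set<String> invalidColumnIds(String flagValue) {
        if (flagValue == null || flagValue.isEmpty()) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(flagValue.split(","))
                .filter(token -> !token.isEmpty()) // drops the empty token before the leading comma
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(invalidColumnIds(",0000"));      // prints [0000]
        System.out.println(invalidColumnIds(",0000,0002")); // prints [0000, 0002]
    }
}
```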

Example 3 with AnalyzerService

Use of org.talend.dataprep.quality.AnalyzerService in project data-prep by Talend.

Class PipelineTransformer, method buildExecutable.

@Override
public ExecutableTransformer buildExecutable(DataSet input, Configuration configuration) {
    final RowMetadata rowMetadata = input.getMetadata().getRowMetadata();
    // prepare the fallback row metadata
    RowMetadata fallBackRowMetadata = transformationRowMetadataUtils.getMatchingEmptyRowMetadata(rowMetadata);
    final TransformerWriter writer = writerRegistrationService.getWriter(configuration.formatId(), configuration.output(), configuration.getArguments());
    final ConfiguredCacheWriter metadataWriter = new ConfiguredCacheWriter(contentCache, DEFAULT);
    final TransformationMetadataCacheKey metadataKey = cacheKeyGenerator.generateMetadataKey(configuration.getPreparationId(), configuration.stepId(), configuration.getSourceType());
    final PreparationMessage preparation = configuration.getPreparation();
    // function that from a step gives the rowMetadata associated to the previous/parent step
    final Function<Step, RowMetadata> previousStepRowMetadataSupplier = s -> Optional.ofNullable(s.getParent())
            .map(id -> preparationUpdater.get(id))
            .orElse(null);
    final Pipeline pipeline = Pipeline.Builder.builder()
            .withAnalyzerService(analyzerService)
            .withActionRegistry(actionRegistry)
            .withPreparation(preparation)
            .withActions(actionParser.parse(configuration.getActions()))
            .withInitialMetadata(rowMetadata, configuration.volume() == SMALL)
            .withMonitor(configuration.getMonitor())
            .withFilter(configuration.getFilter())
            .withLimit(configuration.getLimit())
            .withFilterOut(configuration.getOutFilter())
            .withOutput(() -> new WriterNode(writer, metadataWriter, metadataKey, fallBackRowMetadata))
            .withStatisticsAdapter(adapter)
            .withStepMetadataSupplier(previousStepRowMetadataSupplier)
            .withGlobalStatistics(configuration.isGlobalStatistics())
            .allowMetadataChange(configuration.isAllowMetadataChange())
            .build();
    // wrap this transformer into an executable transformer
    return new ExecutableTransformer() {

        @Override
        public void execute() {
            try {
                LOGGER.debug("Before transformation: {}", pipeline);
                pipeline.execute(input);
            } finally {
                LOGGER.debug("After transformation: {}", pipeline);
            }
            if (preparation != null) {
                final UpdatedStepVisitor visitor = new UpdatedStepVisitor(preparationUpdater);
                pipeline.accept(visitor);
            }
        }

        @Override
        public void signal(Signal signal) {
            pipeline.signal(signal);
        }
    };
}
Also used : WriterNode(org.talend.dataprep.transformation.pipeline.model.WriterNode) WriterRegistrationService(org.talend.dataprep.transformation.format.WriterRegistrationService) SMALL(org.talend.dataprep.transformation.api.transformer.configuration.Configuration.Volume.SMALL) StepMetadataRepository(org.talend.dataprep.transformation.service.StepMetadataRepository) TransformerWriter(org.talend.dataprep.transformation.api.transformer.TransformerWriter) LoggerFactory(org.slf4j.LoggerFactory) Autowired(org.springframework.beans.factory.annotation.Autowired) Configuration(org.talend.dataprep.transformation.api.transformer.configuration.Configuration) Signal(org.talend.dataprep.transformation.pipeline.Signal) DEFAULT(org.talend.dataprep.cache.ContentCache.TimeToLive.DEFAULT) PreparationMessage(org.talend.dataprep.api.preparation.PreparationMessage) Function(java.util.function.Function) AnalyzerService(org.talend.dataprep.quality.AnalyzerService) ActionParser(org.talend.dataprep.transformation.api.action.ActionParser) CacheKeyGenerator(org.talend.dataprep.cache.CacheKeyGenerator) TransformationMetadataCacheKey(org.talend.dataprep.cache.TransformationMetadataCacheKey) DataSet(org.talend.dataprep.api.dataset.DataSet) Logger(org.slf4j.Logger) ActionRegistry(org.talend.dataprep.transformation.pipeline.ActionRegistry) TransformationRowMetadataUtils(org.talend.dataprep.transformation.service.TransformationRowMetadataUtils) Step(org.talend.dataprep.api.preparation.Step) ConfiguredCacheWriter(org.talend.dataprep.transformation.api.transformer.ConfiguredCacheWriter) ContentCache(org.talend.dataprep.cache.ContentCache) ExecutableTransformer(org.talend.dataprep.transformation.api.transformer.ExecutableTransformer) Component(org.springframework.stereotype.Component) StatisticsAdapter(org.talend.dataprep.dataset.StatisticsAdapter) Optional(java.util.Optional) Pipeline(org.talend.dataprep.transformation.pipeline.Pipeline) Transformer(org.talend.dataprep.transformation.api.transformer.Transformer) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata)
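
Example 3 builds a Function that returns the metadata of a step's parent, or null when there is no parent or nothing is stored for it. The null-safe Optional chain behind it can be sketched on its own. The Step class, the String metadata and the Map-backed store below are simplified, hypothetical stand-ins for the data-prep types and for preparationUpdater; only the java.util API calls are real.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class ParentMetadataLookup {

    /** Hypothetical, simplified step: it only carries the id of its parent (may be null). */
    static final class Step {
        private final String parent;

        Step(String parent) {
            this.parent = parent;
        }

        String getParent() {
            return parent;
        }
    }

    /** Builds a function returning the metadata stored for a step's parent, or null if none. */
    static Function<Step, String> parentMetadataSupplier(Map<String, String> metadataByStepId) {
        return step -> Optional.ofNullable(step.getParent()) // no parent: empty Optional
                .map(metadataByStepId::get)                   // unknown id: map() yields empty
                .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("step-1", "metadata of step-1");
        Function<Step, String> supplier = parentMetadataSupplier(store);
        System.out.println(supplier.apply(new Step("step-1"))); // prints metadata of step-1
        System.out.println(supplier.apply(new Step(null)));     // prints null
    }
}
```

Because Optional.map turns a null result into an empty Optional, both a missing parent and a missing store entry collapse into the same orElse(null) outcome, so callers of the function still need a null check.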

Aggregations

AnalyzerService (org.talend.dataprep.quality.AnalyzerService) 3
Optional (java.util.Optional) 1
Function (java.util.function.Function) 1
Test (org.junit.Test) 1
Logger (org.slf4j.Logger) 1
LoggerFactory (org.slf4j.LoggerFactory) 1
DisposableBean (org.springframework.beans.factory.DisposableBean) 1
Autowired (org.springframework.beans.factory.annotation.Autowired) 1
Bean (org.springframework.context.annotation.Bean) 1
Component (org.springframework.stereotype.Component) 1
ColumnMetadata (org.talend.dataprep.api.dataset.ColumnMetadata) 1
DataSet (org.talend.dataprep.api.dataset.DataSet) 1
RowMetadata (org.talend.dataprep.api.dataset.RowMetadata) 1
DataSetRow (org.talend.dataprep.api.dataset.row.DataSetRow) 1
PreparationMessage (org.talend.dataprep.api.preparation.PreparationMessage) 1
Step (org.talend.dataprep.api.preparation.Step) 1
CacheKeyGenerator (org.talend.dataprep.cache.CacheKeyGenerator) 1
ContentCache (org.talend.dataprep.cache.ContentCache) 1
DEFAULT (org.talend.dataprep.cache.ContentCache.TimeToLive.DEFAULT) 1
TransformationMetadataCacheKey (org.talend.dataprep.cache.TransformationMetadataCacheKey) 1