Example 1 with AbstractPatternRecognizer

Use of org.talend.dataquality.statistics.frequency.recognition.AbstractPatternRecognizer in project data-prep by Talend.

In class AnalyzerService, method buildPatternAnalyzer.

private static AbstractFrequencyAnalyzer buildPatternAnalyzer(List<ColumnMetadata> columns) {
    // Register each column's most used date pattern (including custom ones) with the date-time recognizer
    final DateTimePatternRecognizer dateTimePatternFrequencyAnalyzer = new DateTimePatternRecognizer();
    List<String> patterns = new ArrayList<>(columns.size());
    for (ColumnMetadata column : columns) {
        final String pattern = RowMetadataUtils.getMostUsedDatePattern(column);
        if (StringUtils.isNotBlank(pattern)) {
            patterns.add(pattern);
        }
    }
    dateTimePatternFrequencyAnalyzer.addCustomDateTimePatterns(patterns);
    // Warning: the order in which the recognizers are added matters
    List<AbstractPatternRecognizer> patternFrequencyAnalyzers = new ArrayList<>();
    patternFrequencyAnalyzers.add(new EmptyPatternRecognizer());
    patternFrequencyAnalyzers.add(dateTimePatternFrequencyAnalyzer);
    patternFrequencyAnalyzers.add(new LatinExtendedCharPatternRecognizer());
    return new CompositePatternFrequencyAnalyzer(patternFrequencyAnalyzers, TypeUtils.convert(columns));
}
Also used:
CompositePatternFrequencyAnalyzer (org.talend.dataquality.statistics.frequency.pattern.CompositePatternFrequencyAnalyzer)
AbstractPatternRecognizer (org.talend.dataquality.statistics.frequency.recognition.AbstractPatternRecognizer)
DateTimePatternRecognizer (org.talend.dataquality.statistics.frequency.recognition.DateTimePatternRecognizer)
ColumnMetadata (org.talend.dataprep.api.dataset.ColumnMetadata)
LatinExtendedCharPatternRecognizer (org.talend.dataquality.statistics.frequency.recognition.LatinExtendedCharPatternRecognizer)
EmptyPatternRecognizer (org.talend.dataquality.statistics.frequency.recognition.EmptyPatternRecognizer)
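
The comment in buildPatternAnalyzer warns that the order of the recognizers is important. The sketch below illustrates one plausible reason, assuming the composite analyzer tries recognizers in list order and stops at the first one that fully handles a value. It does not use the Talend API: Recognizer, firstMatch, EMPTY, ISO_DATE and CHAR_CLASS are hypothetical stand-ins written only for this illustration, and the real recognizers may behave differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RecognizerChainSketch {

    /** Hypothetical stand-in for a pattern recognizer; NOT the Talend interface. */
    interface Recognizer {
        /** Returns a pattern if this recognizer handles the value, otherwise empty. */
        Optional<String> recognize(String value);
    }

    /** Tries each recognizer in list order and returns the first pattern found. */
    static String firstMatch(List<Recognizer> chain, String value) {
        for (Recognizer recognizer : chain) {
            Optional<String> pattern = recognizer.recognize(value);
            if (pattern.isPresent()) {
                return pattern.get();
            }
        }
        return value; // no recognizer matched: return the value unchanged
    }

    // Handles blank values only (assumed behavior, for illustration).
    static final Recognizer EMPTY =
            v -> v.trim().isEmpty() ? Optional.of("") : Optional.empty();

    // Handles a single fixed date shape; a real date recognizer covers many patterns.
    static final Recognizer ISO_DATE =
            v -> v.matches("\\d{4}-\\d{2}-\\d{2}") ? Optional.of("yyyy-MM-dd") : Optional.empty();

    // Generic fallback that always matches: uppercase -> A, lowercase -> a, digit -> 9.
    static final Recognizer CHAR_CLASS = v -> {
        StringBuilder sb = new StringBuilder();
        for (char c : v.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('A');
            } else if (Character.isLowerCase(c)) {
                sb.append('a');
            } else if (Character.isDigit(c)) {
                sb.append('9');
            } else {
                sb.append(c);
            }
        }
        return Optional.of(sb.toString());
    };

    public static void main(String[] args) {
        List<Recognizer> specificFirst = Arrays.asList(EMPTY, ISO_DATE, CHAR_CLASS);
        List<Recognizer> genericFirst = Arrays.asList(CHAR_CLASS, ISO_DATE, EMPTY);

        // The date recognizer gets its chance before the generic fallback.
        System.out.println(firstMatch(specificFirst, "2016-03-01")); // prints yyyy-MM-dd
        // The generic fallback matches first and hides the date pattern.
        System.out.println(firstMatch(genericFirst, "2016-03-01")); // prints 9999-99-99
    }
}
```

Under that assumption, placing the most specific recognizers first (empty values, then dates) and the generic character-class recognizer last keeps the fallback from masking the more informative patterns.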
