
Example 1 with Quality

Use of org.talend.dataprep.api.dataset.Quality in the data-prep project by Talend.

Class ReorderColumn, method swapColumnMetadata.

protected void swapColumnMetadata(ColumnMetadata originColumn, ColumnMetadata targetColumn) throws Exception {
    ColumnMetadata targetColumnCopy = ColumnMetadata.Builder.column().copy(targetColumn).build();
    ColumnMetadata originColumnCopy = ColumnMetadata.Builder.column().copy(originColumn).build();
    BeanUtils.copyProperties(targetColumn, originColumn);
    BeanUtils.copyProperties(originColumn, targetColumnCopy);
    Statistics originalStatistics = originColumnCopy.getStatistics();
    Statistics targetStatistics = targetColumnCopy.getStatistics();
    BeanUtils.copyProperties(targetColumn.getStatistics(), originalStatistics);
    BeanUtils.copyProperties(originColumn.getStatistics(), targetStatistics);
    Quality originalQuality = originColumnCopy.getQuality();
    Quality targetQuality = targetColumnCopy.getQuality();
    BeanUtils.copyProperties(targetColumn.getQuality(), originalQuality);
    BeanUtils.copyProperties(originColumn.getQuality(), targetQuality);
}
Also used : ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) Quality(org.talend.dataprep.api.dataset.Quality) Statistics(org.talend.dataprep.api.dataset.statistics.Statistics)
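
The method above snapshots both columns before overwriting either one; without the snapshot, the second copy would read values that the first copy had already replaced. A minimal sketch of that copy-then-overwrite swap follows. It is an illustration only: QualityBean and copyFrom are hypothetical stand-ins, and plain field assignments replace BeanUtils.copyProperties so the example runs without extra libraries.

```java
public class SwapSketch {

    /** Hypothetical stand-in for a mutable bean such as Quality. */
    static final class QualityBean {
        int valid;
        int invalid;
        int empty;

        QualityBean(int valid, int invalid, int empty) {
            this.valid = valid;
            this.invalid = invalid;
            this.empty = empty;
        }

        /** Copies every field of source into this instance (assumed helper). */
        void copyFrom(QualityBean source) {
            this.valid = source.valid;
            this.invalid = source.invalid;
            this.empty = source.empty;
        }
    }

    /** Swaps the contents of two beans in place; both object identities are kept. */
    static void swap(QualityBean origin, QualityBean target) {
        // Snapshot the target first, because the next line overwrites it.
        QualityBean targetCopy = new QualityBean(target.valid, target.invalid, target.empty);
        target.copyFrom(origin);
        origin.copyFrom(targetCopy);
    }

    public static void main(String[] args) {
        QualityBean a = new QualityBean(5, 0, 0);
        QualityBean b = new QualityBean(6, 2, 1);
        swap(a, b);
        System.out.println(a.valid + " " + a.invalid + " " + a.empty); // prints 6 2 1
        System.out.println(b.valid + " " + b.invalid + " " + b.empty); // prints 5 0 0
    }
}
```

Swapping contents rather than references keeps any other holder of the two objects pointing at the same instances, which appears to be the reason the original copies properties instead of reassigning.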

Example 2 with Quality

Use of org.talend.dataprep.api.dataset.Quality in the data-prep project by Talend.

Class StatisticsAdapter, method injectValueQuality.

private void injectValueQuality(final ColumnMetadata column, final Analyzers.Result result) {
    if (result.exist(ValueQualityStatistics.class)) {
        final Statistics statistics = column.getStatistics();
        final Quality quality = column.getQuality();
        final ValueQualityStatistics valueQualityStatistics = result.get(ValueQualityStatistics.class);
        final long allCount = valueQualityStatistics.getCount();
        final long emptyCount = valueQualityStatistics.getEmptyCount();
        final long validCount = valueQualityStatistics.getValidCount();
        final long invalidCount = allCount - emptyCount - validCount;
        // Set in column quality...
        quality.setEmpty((int) emptyCount);
        quality.setValid((int) validCount);
        quality.setInvalid((int) invalidCount);
        // ... and statistics
        statistics.setCount(allCount);
        statistics.setEmpty((int) emptyCount);
        statistics.setInvalid((int) invalidCount);
        statistics.setValid(validCount);
    }
}
Also used : Quality(org.talend.dataprep.api.dataset.Quality) ValueQualityStatistics(org.talend.dataquality.common.inference.ValueQualityStatistics) CardinalityStatistics(org.talend.dataquality.statistics.cardinality.CardinalityStatistics) DataTypeFrequencyStatistics(org.talend.dataquality.statistics.frequency.DataTypeFrequencyStatistics) StreamNumberHistogramStatistics(org.talend.dataprep.api.dataset.statistics.number.StreamNumberHistogramStatistics) SummaryStatistics(org.talend.dataquality.statistics.numeric.summary.SummaryStatistics) StreamDateHistogramStatistics(org.talend.dataprep.api.dataset.statistics.date.StreamDateHistogramStatistics) TextLengthStatistics(org.talend.dataquality.statistics.text.TextLengthStatistics) PatternFrequencyStatistics(org.talend.dataquality.statistics.frequency.pattern.PatternFrequencyStatistics) QuantileStatistics(org.talend.dataquality.statistics.numeric.quantile.QuantileStatistics)
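
The method above derives the invalid count as whatever is neither empty nor valid, then narrows long counts to int with plain casts, which would silently wrap for counts above Integer.MAX_VALUE. The sketch below shows the same arithmetic with a checked narrowing via the standard Math.toIntExact. The class and method names are hypothetical and not part of data-prep; the figures 9, 1 and 6 are the record, empty and valid counts asserted in testAnalysisWithInvalidValues.

```java
public class QualityCountsSketch {

    /** Invalid values are those that are neither empty nor valid. */
    static long invalidCount(long allCount, long emptyCount, long validCount) {
        return allCount - emptyCount - validCount;
    }

    /** Narrows to int, throwing ArithmeticException instead of silently overflowing. */
    static int toIntChecked(long count) {
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        long invalid = invalidCount(9L, 1L, 6L);
        System.out.println(toIntChecked(invalid)); // prints 2
    }
}
```

Whether a checked conversion is preferable to the plain cast depends on how large the analyzed data sets can be; this is a design option, not a description of the project's behavior.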

Example 3 with Quality

Use of org.talend.dataprep.api.dataset.Quality in the data-prep project by Talend.

Class QualityAnalysisTest, method testAnalysis.

@Test
public void testAnalysis() {
    String id = UUID.randomUUID().toString();
    final DataSetMetadata metadata = metadataBuilder.metadata().id(id).build();
    dataSetMetadataRepository.save(metadata);
    contentStore.storeAsRaw(metadata, DataSetServiceTest.class.getResourceAsStream("../avengers.csv"));
    formatAnalysis.analyze(id);
    contentAnalysis.analyze(id);
    schemaAnalysis.analyze(id);
    // Analyze quality
    qualityAnalysis.analyze(id);
    final DataSetMetadata actual = dataSetMetadataRepository.get(id);
    assertThat(actual.getLifecycle().qualityAnalyzed(), is(true));
    assertThat(actual.getContent().getNbRecords(), is(5L));
    for (ColumnMetadata column : actual.getRowMetadata().getColumns()) {
        final Quality quality = column.getQuality();
        assertThat(quality.getValid(), is(5));
        assertThat(quality.getInvalid(), is(0));
        assertThat(quality.getEmpty(), is(0));
    }
}
Also used : ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) Quality(org.talend.dataprep.api.dataset.Quality) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)

Example 4 with Quality

Use of org.talend.dataprep.api.dataset.Quality in the data-prep project by Talend.

Class QualityAnalysisTest, method testAnalysisWithInvalidValues.

@Test
public void testAnalysisWithInvalidValues() {
    String dsId = UUID.randomUUID().toString();
    final DataSetMetadata metadata = metadataBuilder.metadata().id(dsId).build();
    dataSetMetadataRepository.save(metadata);
    contentStore.storeAsRaw(metadata, DataSetServiceTest.class.getResourceAsStream("../dataset_with_invalid_records.csv"));
    formatAnalysis.analyze(dsId);
    contentAnalysis.analyze(dsId);
    schemaAnalysis.analyze(dsId);
    // Analyze quality
    qualityAnalysis.analyze(dsId);
    final DataSetMetadata actual = dataSetMetadataRepository.get(dsId);
    assertThat(actual.getLifecycle().qualityAnalyzed(), is(true));
    assertThat(actual.getContent().getNbRecords(), is(9L));
    assertThat(actual.getRowMetadata().getColumns().size(), is(2));
    ColumnMetadata secondColumn = actual.getRowMetadata().getColumns().get(1);
    Quality quality = secondColumn.getQuality();
    assertThat(quality.getValid(), is(6));
    assertThat(quality.getInvalid(), is(2));
    assertThat(quality.getEmpty(), is(1));
}
Also used : ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) Quality(org.talend.dataprep.api.dataset.Quality) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)

Aggregations

Quality (org.talend.dataprep.api.dataset.Quality): 4
ColumnMetadata (org.talend.dataprep.api.dataset.ColumnMetadata): 3
Test (org.junit.Test): 2
DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata): 2
DataSetBaseTest (org.talend.dataprep.dataset.DataSetBaseTest): 2
DataSetServiceTest (org.talend.dataprep.dataset.service.DataSetServiceTest): 2
Statistics (org.talend.dataprep.api.dataset.statistics.Statistics): 1
StreamDateHistogramStatistics (org.talend.dataprep.api.dataset.statistics.date.StreamDateHistogramStatistics): 1
StreamNumberHistogramStatistics (org.talend.dataprep.api.dataset.statistics.number.StreamNumberHistogramStatistics): 1
ValueQualityStatistics (org.talend.dataquality.common.inference.ValueQualityStatistics): 1
CardinalityStatistics (org.talend.dataquality.statistics.cardinality.CardinalityStatistics): 1
DataTypeFrequencyStatistics (org.talend.dataquality.statistics.frequency.DataTypeFrequencyStatistics): 1
PatternFrequencyStatistics (org.talend.dataquality.statistics.frequency.pattern.PatternFrequencyStatistics): 1
QuantileStatistics (org.talend.dataquality.statistics.numeric.quantile.QuantileStatistics): 1
SummaryStatistics (org.talend.dataquality.statistics.numeric.summary.SummaryStatistics): 1
TextLengthStatistics (org.talend.dataquality.statistics.text.TextLengthStatistics): 1