Example 46 with DataSetMetadata

Use of org.talend.dataprep.api.dataset.DataSetMetadata in project data-prep by Talend.

Class QualityAnalysisTest, method testAnalysis.

@Test
public void testAnalysis() {
    String id = UUID.randomUUID().toString();
    final DataSetMetadata metadata = metadataBuilder.metadata().id(id).build();
    dataSetMetadataRepository.save(metadata);
    contentStore.storeAsRaw(metadata, DataSetServiceTest.class.getResourceAsStream("../avengers.csv"));
    formatAnalysis.analyze(id);
    contentAnalysis.analyze(id);
    schemaAnalysis.analyze(id);
    // Analyze quality
    qualityAnalysis.analyze(id);
    final DataSetMetadata actual = dataSetMetadataRepository.get(id);
    assertThat(actual.getLifecycle().qualityAnalyzed(), is(true));
    assertThat(actual.getContent().getNbRecords(), is(5L));
    for (ColumnMetadata column : actual.getRowMetadata().getColumns()) {
        final Quality quality = column.getQuality();
        assertThat(quality.getValid(), is(5));
        assertThat(quality.getInvalid(), is(0));
        assertThat(quality.getEmpty(), is(0));
    }
}
Also used: ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) Quality(org.talend.dataprep.api.dataset.Quality) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)
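
The assertions in testAnalysis read three per-column counters: valid, invalid and empty. As a rough illustration of what such counters mean, here is a minimal, self-contained sketch. It is not the data-prep implementation: the Tally class, the tally method and the integer-only validity rule are all hypothetical, and the sample values are made up because the content of avengers.csv is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class QualitySketch {

    /** Hypothetical stand-in for per-column quality counters (not the data-prep Quality class). */
    public static final class Tally {
        public final int valid;
        public final int invalid;
        public final int empty;

        Tally(int valid, int invalid, int empty) {
            this.valid = valid;
            this.invalid = invalid;
            this.empty = empty;
        }
    }

    /** Classifies every value of one column as empty, valid or invalid. */
    public static Tally tally(List<String> values, Predicate<String> isValid) {
        int valid = 0;
        int invalid = 0;
        int empty = 0;
        for (String value : values) {
            if (value == null || value.trim().isEmpty()) {
                empty++; // blank cells are counted separately from invalid ones
            } else if (isValid.test(value)) {
                valid++;
            } else {
                invalid++;
            }
        }
        return new Tally(valid, invalid, empty);
    }

    public static void main(String[] args) {
        // Made-up column values; the validity rule (optional sign plus digits) is an assumption.
        List<String> ages = Arrays.asList("35", "41", "", "n/a", "29");
        Tally t = tally(ages, v -> v.matches("-?\\d+"));
        // prints "valid=3 invalid=1 empty=1"
        System.out.println("valid=" + t.valid + " invalid=" + t.invalid + " empty=" + t.empty);
    }
}
```

In the test above, every column of the five-record data set is expected to yield valid=5, invalid=0 and empty=0, which is what a tally of this kind would report when no cell is blank or malformed.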

Example 47 with DataSetMetadata

Use of org.talend.dataprep.api.dataset.DataSetMetadata in project data-prep by Talend.

Class SchemaAnalysisTest, method testTDP_1674.

/**
 * See <a href="https://jira.talendforge.org/browse/TDP-1674">TDP-1674_error_with_ipv6_addresses</a>.
 */
@Test
public void testTDP_1674() {
    final DataSetMetadata actual = initializeDataSetMetadata(DataSetServiceTest.class.getResourceAsStream("../ipv6.csv"));
    assertThat(actual.getLifecycle().schemaAnalyzed(), is(true));
    String[] expectedNames = { "number", "description", "address" };
    Type[] expectedTypes = { Type.INTEGER, Type.STRING, Type.STRING };
    int i = 0;
    for (ColumnMetadata column : actual.getRowMetadata().getColumns()) {
        assertThat(column.getName(), is(expectedNames[i]));
        assertThat(column.getType(), is(expectedTypes[i].getName()));
        i++;
    }
    // Actual value first, expected matcher second, so failure messages read correctly.
    assertThat(actual.getRowMetadata().getColumns().get(2).getDomainLabel(), is("IPv6 Address"));
}
Also used: Type(org.talend.dataprep.api.type.Type) ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)

Example 48 with DataSetMetadata

Use of org.talend.dataprep.api.dataset.DataSetMetadata in project data-prep by Talend.

Class SchemaAnalysisTest, method testTDP_224.

/**
 * See <a href="https://jira.talendforge.org/browse/TDP-224">https://jira.talendforge.org/browse/TDP-224</a>.
 */
@Test
public void testTDP_224() {
    final DataSetMetadata actual = initializeDataSetMetadata(DataSetServiceTest.class.getResourceAsStream("../whatever.xls"));
    assertThat(actual.getLifecycle().schemaAnalyzed(), is(true));
    // Not a typo: this is what QA provided as column name.
    String[] expectedNames = { "whaterver" };
    Type[] expectedTypes = { Type.STRING };
    int i = 0;
    for (ColumnMetadata column : actual.getRowMetadata().getColumns()) {
        assertThat(column.getName(), is(expectedNames[i]));
        assertThat(column.getType(), is(expectedTypes[i].getName()));
        i++;
    }
}
Also used: Type(org.talend.dataprep.api.type.Type) ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)

Example 49 with DataSetMetadata

Use of org.talend.dataprep.api.dataset.DataSetMetadata in project data-prep by Talend.

Class SchemaAnalysisTest, method testTDP_226.

/**
 * See <a href="https://jira.talendforge.org/browse/TDP-226">https://jira.talendforge.org/browse/TDP-226</a>.
 */
@Test
public void testTDP_226() {
    final DataSetMetadata actual = initializeDataSetMetadata(DataSetServiceTest.class.getResourceAsStream("../empty_lines.csv"));
    assertThat(actual.getLifecycle().schemaAnalyzed(), is(true));
    String[] expectedNames = { "id", "firstname", "lastname", "age", "date-of-birth", "alive" };
    Type[] expectedTypes = { Type.INTEGER, Type.STRING, Type.STRING, Type.INTEGER, Type.DATE, Type.BOOLEAN };
    int i = 0;
    for (ColumnMetadata column : actual.getRowMetadata().getColumns()) {
        assertThat(column.getName(), is(expectedNames[i]));
        assertThat(column.getType(), is(expectedTypes[i].getName()));
        i++;
    }
}
Also used: Type(org.talend.dataprep.api.type.Type) ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)
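
The SchemaAnalysisTest methods above walk the columns with a for-each loop and a manual counter into index-aligned expected arrays. One property of that pattern is worth noting: if the analysis returns fewer columns than expected, the loop simply ends early and the test still passes. The sketch below shows one possible way to make the column count part of the check. It is only an illustration under stated assumptions: Column and assertColumns are hypothetical stand-ins (not ColumnMetadata or any data-prep helper), and the type names are placeholder strings.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnCheckSketch {

    /** Hypothetical stand-in for a column: only a name and a type name. */
    public static final class Column {
        final String name;
        final String type;

        public Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /**
     * Compares columns with expected names and types position by position.
     * The size check runs first so that missing or extra columns fail the check.
     */
    public static void assertColumns(List<Column> columns, String[] expectedNames, String[] expectedTypes) {
        if (expectedNames.length != expectedTypes.length) {
            throw new IllegalArgumentException("expected names and types must have the same length");
        }
        if (columns.size() != expectedNames.length) {
            throw new AssertionError("expected " + expectedNames.length + " columns but found " + columns.size());
        }
        for (int i = 0; i < columns.size(); i++) {
            Column column = columns.get(i);
            if (!expectedNames[i].equals(column.name)) {
                throw new AssertionError("column " + i + ": expected name " + expectedNames[i] + " but was " + column.name);
            }
            if (!expectedTypes[i].equals(column.type)) {
                throw new AssertionError("column " + i + ": expected type " + expectedTypes[i] + " but was " + column.type);
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder data loosely modeled on the ipv6.csv expectations above.
        List<Column> columns = Arrays.asList(
                new Column("number", "integer"),
                new Column("description", "string"),
                new Column("address", "string"));
        assertColumns(columns,
                new String[] { "number", "description", "address" },
                new String[] { "integer", "string", "string" });
        System.out.println("columns match");
    }
}
```

With Hamcrest already in use, a similar effect could likely be had by asserting on the size of the column list before the loop; the helper above just keeps the idea in one place.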

Example 50 with DataSetMetadata

Use of org.talend.dataprep.api.dataset.DataSetMetadata in project data-prep by Talend.

Class SchemaAnalysisTest, method testTDP_279.

/**
 * See <a href="https://jira.talendforge.org/browse/TDP-279">https://jira.talendforge.org/browse/TDP-279</a>.
 */
@Test
public void testTDP_279() {
    final DataSetMetadata actual = initializeDataSetMetadata(DataSetServiceTest.class.getResourceAsStream("../post_code.xls"));
    assertThat(actual.getLifecycle().schemaAnalyzed(), is(true));
    String[] expectedNames = { "zip" };
    Type[] expectedTypes = { Type.INTEGER };
    String[] expectedDomains = { "FR_POSTAL_CODE" };
    int i = 0;
    for (ColumnMetadata column : actual.getRowMetadata().getColumns()) {
        assertThat(column.getName(), is(expectedNames[i]));
        assertThat(column.getType(), is(expectedTypes[i].getName()));
        assertThat(column.getDomain(), is(expectedDomains[i]));
        assertThat(column.getSemanticDomains()).isNotNull().isNotEmpty().hasSize(4).contains(
                new SemanticDomain("FR_POSTAL_CODE", "FR Postal Code", (float) 58.33),
                new SemanticDomain("FR_CODE_COMMUNE_INSEE", "FR Insee Code", (float) 58.33),
                new SemanticDomain("DE_POSTAL_CODE", "DE Postal Code", (float) 58.33),
                new SemanticDomain("US_POSTAL_CODE", "US Postal Code", (float) 58.33));
        i++;
    }
}
Also used: Type(org.talend.dataprep.api.type.Type) ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSetServiceTest(org.talend.dataprep.dataset.service.DataSetServiceTest) SemanticDomain(org.talend.dataprep.api.dataset.statistics.SemanticDomain) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test) DataSetBaseTest(org.talend.dataprep.dataset.DataSetBaseTest)

Aggregations

DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata) 192
Test (org.junit.Test) 126
DataSetBaseTest (org.talend.dataprep.dataset.DataSetBaseTest) 63
ColumnMetadata (org.talend.dataprep.api.dataset.ColumnMetadata) 48
InputStream (java.io.InputStream) 45
Matchers.containsString (org.hamcrest.Matchers.containsString) 28
Matchers.isEmptyString (org.hamcrest.Matchers.isEmptyString) 28
TDPException (org.talend.dataprep.exception.TDPException) 26
RowMetadata (org.talend.dataprep.api.dataset.RowMetadata) 20
DataSetServiceTest (org.talend.dataprep.dataset.service.DataSetServiceTest) 20
ApiOperation (io.swagger.annotations.ApiOperation) 18
DataSet (org.talend.dataprep.api.dataset.DataSet) 18
Type (org.talend.dataprep.api.type.Type) 17
Timed (org.talend.dataprep.metrics.Timed) 17
DistributedLock (org.talend.dataprep.lock.DistributedLock) 16
Autowired (org.springframework.beans.factory.annotation.Autowired) 14
DataSetRow (org.talend.dataprep.api.dataset.row.DataSetRow) 14
IOException (java.io.IOException) 13
RequestMapping (org.springframework.web.bind.annotation.RequestMapping) 13
ArrayList (java.util.ArrayList) 12