Example 1 with DataSet

use of org.talend.dataprep.api.dataset.DataSet in project data-prep by Talend.

the class ActionTestWorkbench method test.

public static void test(Collection<DataSetRow> input, AnalyzerService analyzerService, ActionRegistry actionRegistry, RunnableAction... actions) {
    final List<RunnableAction> allActions = new ArrayList<>();
    Collections.addAll(allActions, actions);
    final DataSet dataSet = new DataSet();
    final RowMetadata rowMetadata = input.iterator().next().getRowMetadata();
    final DataSetMetadata dataSetMetadata = new DataSetMetadata();
    dataSetMetadata.setRowMetadata(rowMetadata);
    dataSet.setMetadata(dataSetMetadata);
    dataSet.setRecords(input.stream());
    final TestOutputNode outputNode = new TestOutputNode(input);
    Pipeline pipeline = Pipeline.Builder.builder()
            .withActionRegistry(actionRegistry)
            .withInitialMetadata(rowMetadata, true)
            .withActions(allActions)
            .withAnalyzerService(analyzerService)
            .withStatisticsAdapter(new StatisticsAdapter(40))
            .withOutput(() -> outputNode)
            .build();
    pipeline.execute(dataSet);
    // Some tests rely on the metadata changes in the provided metadata so set back modified columns in row metadata
    // (although this should be avoided in tests).
    // TODO Make this method return the modified metadata instead of setting modified columns.
    rowMetadata.setColumns(outputNode.getMetadata().getColumns());
    for (DataSetRow dataSetRow : input) {
        dataSetRow.setRowMetadata(rowMetadata);
    }
}
Also used : StatisticsAdapter(org.talend.dataprep.dataset.StatisticsAdapter) DataSet(org.talend.dataprep.api.dataset.DataSet) RunnableAction(org.talend.dataprep.transformation.actions.common.RunnableAction) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) DataSetRow(org.talend.dataprep.api.dataset.row.DataSetRow) Pipeline(org.talend.dataprep.transformation.pipeline.Pipeline)

Example 2 with DataSet

use of org.talend.dataprep.api.dataset.DataSet in project data-prep by Talend.

the class PipelineTest method testPipeline.

@Test
public void testPipeline() throws Exception {
    // given
    final Pipeline pipeline = new Pipeline(NodeBuilder.source().to(output).build());
    final RowMetadata rowMetadata = new RowMetadata();
    final DataSetRow row1 = new DataSetRow(rowMetadata);
    final DataSetRow row2 = new DataSetRow(rowMetadata);
    final List<DataSetRow> records = new ArrayList<>();
    records.add(row1);
    records.add(row2);
    final DataSet dataSet = new DataSet();
    final DataSetMetadata metadata = new DataSetMetadata();
    metadata.setRowMetadata(rowMetadata);
    dataSet.setMetadata(metadata);
    dataSet.setRecords(records.stream());
    // when
    pipeline.execute(dataSet);
    // then
    assertThat(output.getCount(), is(2));
    assertThat(output.getRow(), is(row2));
    assertThat(output.getMetadata(), is(rowMetadata));
    assertThat(output.getSignal(), is(END_OF_STREAM));
}
Also used : DataSet(org.talend.dataprep.api.dataset.DataSet) ArrayList(java.util.ArrayList) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata) DataSetRow(org.talend.dataprep.api.dataset.row.DataSetRow) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) Test(org.junit.Test)
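
The examples above hand the records to the DataSet as a java.util.stream.Stream (dataSet.setRecords(records.stream())). A Java stream can only be traversed once, so a second read of the same stream instance fails. The sketch below illustrates this with a hypothetical RecordHolder class; it is a stand-in written for this illustration, not the Talend DataSet class, and it makes no claim about how DataSet behaves internally.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SingleUseRecordsDemo {

    // Hypothetical stand-in for a container that stores its records as a Stream.
    static final class RecordHolder {
        private Stream<String> records;

        void setRecords(Stream<String> records) {
            this.records = records;
        }

        Stream<String> getRecords() {
            return records;
        }
    }

    // Returns true if a second traversal of the same stream instance is rejected.
    static boolean secondReadFails(List<String> rows) {
        final RecordHolder holder = new RecordHolder();
        holder.setRecords(rows.stream());
        // The first terminal operation consumes the stream.
        holder.getRecords().collect(Collectors.toList());
        try {
            // A second terminal operation on the same instance is not allowed.
            holder.getRecords().collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondReadFails(List.of("row1", "row2"))); // prints "true"
    }
}
```

In a test that needs to inspect the rows after running them through a pipeline, keeping the backing collection around (as Example 2 does with its records list) avoids reading the stream twice.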

Example 3 with DataSet

use of org.talend.dataprep.api.dataset.DataSet in project data-prep by Talend.

the class Pipeline method execute.

public void execute(DataSet dataSet) {
    final RowMetadata rowMetadata = dataSet.getMetadata().getRowMetadata().clone();
    try (Stream<DataSetRow> records = dataSet.getRecords()) {
        // get the lock on isFinished to make the signal(STOP) method wait for the whole pipeline to finish
        synchronized (isFinished) {
            AtomicLong counter = new AtomicLong();
            // peek/allMatch is used to stop consuming the stream once isStopped is true.
            // With only forEach(row -> { if (!isStopped) ... }) the processing code would be skipped,
            // but all remaining rows of the stream would still be iterated.
            // To be replaced once Java offers more suitable stream functions (e.g. takeWhile).
            records.peek(row -> {
                node.exec().receive(row, rowMetadata);
                counter.addAndGet(1L);
            }).allMatch(row -> !isStopped.get());
            LOG.debug("{} rows sent in the pipeline", counter.get());
            node.exec().signal(Signal.END_OF_STREAM);
        }
    }
}
Also used : ImplicitParameters(org.talend.dataprep.transformation.actions.common.ImplicitParameters) CompileDataSetRowAction(org.talend.dataprep.transformation.actions.common.CompileDataSetRowAction) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) HashMap(java.util.HashMap) PreparationMessage(org.talend.dataprep.api.preparation.PreparationMessage) Function(java.util.function.Function) Supplier(java.util.function.Supplier) ScopeCategory(org.talend.dataprep.transformation.actions.category.ScopeCategory) ArrayList(java.util.ArrayList) AnalyzerService(org.talend.dataprep.quality.AnalyzerService) Map(java.util.Map) DataSetRow(org.talend.dataprep.api.dataset.row.DataSetRow) DataSet(org.talend.dataprep.api.dataset.DataSet) BasicNode(org.talend.dataprep.transformation.pipeline.node.BasicNode) LimitNode(org.talend.dataprep.transformation.pipeline.node.LimitNode) Logger(org.slf4j.Logger) Predicate(java.util.function.Predicate) FilteredNode(org.talend.dataprep.transformation.pipeline.node.FilteredNode) Step(org.talend.dataprep.api.preparation.Step) Collectors(java.util.stream.Collectors) Serializable(java.io.Serializable) AtomicLong(java.util.concurrent.atomic.AtomicLong) List(java.util.List) Stream(java.util.stream.Stream) ApplyDataSetRowAction(org.talend.dataprep.transformation.actions.common.ApplyDataSetRowAction) StatisticsAdapter(org.talend.dataprep.dataset.StatisticsAdapter) LoggerFactory.getLogger(org.slf4j.LoggerFactory.getLogger) ActionNodesBuilder(org.talend.dataprep.transformation.pipeline.builder.ActionNodesBuilder) ActionDefinition(org.talend.dataprep.api.action.ActionDefinition) NodeBuilder(org.talend.dataprep.transformation.pipeline.builder.NodeBuilder) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata) RunnableAction(org.talend.dataprep.transformation.actions.common.RunnableAction)
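
The comments in execute describe why peek/allMatch is used: allMatch short-circuits, so the stream stops being consumed as soon as the stop flag is set, and they mention takeWhile as a possible replacement. Stream.takeWhile has been available since Java 9. The following is a minimal sketch comparing the two idioms on plain integers; the class name, method names, and the stopAt parameter are assumptions made for this illustration and are not part of the data-prep code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class StreamStopDemo {

    // peek/allMatch idiom: each row is processed in peek, then allMatch checks the flag
    // and short-circuits the stream once it has been set.
    static List<Integer> withAllMatch(List<Integer> rows, int stopAt) {
        final AtomicBoolean isStopped = new AtomicBoolean(false);
        final List<Integer> processed = new ArrayList<>();
        rows.stream().peek(row -> {
            processed.add(row);
            if (row == stopAt) {
                isStopped.set(true); // simulates an external stop signal
            }
        }).allMatch(row -> !isStopped.get());
        return processed;
    }

    // takeWhile variant (Java 9+): the flag is checked before each row is processed.
    static List<Integer> withTakeWhile(List<Integer> rows, int stopAt) {
        final AtomicBoolean isStopped = new AtomicBoolean(false);
        final List<Integer> processed = new ArrayList<>();
        rows.stream().takeWhile(row -> !isStopped.get()).forEach(row -> {
            processed.add(row);
            if (row == stopAt) {
                isStopped.set(true); // simulates an external stop signal
            }
        });
        return processed;
    }

    public static void main(String[] args) {
        final List<Integer> rows = List.of(1, 2, 3, 4, 5);
        System.out.println(withAllMatch(rows, 3));  // prints [1, 2, 3]
        System.out.println(withTakeWhile(rows, 3)); // prints [1, 2, 3]
    }
}
```

On a sequential stream both variants stop after the row during which the flag was set; rows 4 and 5 are never pulled from the source. The takeWhile form states the intent directly and does not rely on peek for its side effects.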

Example 4 with DataSet

use of org.talend.dataprep.api.dataset.DataSet in project data-prep by Talend.

the class DataSetJSONTest method shouldDealWithNoRecords.

@Test
public void shouldDealWithNoRecords() throws Exception {
    // given
    final InputStream input = this.getClass().getResourceAsStream("no_records.json");
    // when
    DataSet dataSet = from(input);
    // then
    final List<DataSetRow> records = dataSet.getRecords().collect(Collectors.toList());
    assertTrue(records.isEmpty());
}
Also used : DataSet(org.talend.dataprep.api.dataset.DataSet) DataSetRow(org.talend.dataprep.api.dataset.row.DataSetRow) ServiceBaseTest(org.talend.ServiceBaseTest) Test(org.junit.Test)

Example 5 with DataSet

use of org.talend.dataprep.api.dataset.DataSet in project data-prep by Talend.

the class DataSetJSONTest method testRoundTrip.

@Test
public void testRoundTrip() throws Exception {
    DataSet dataSet = from(DataSetJSONTest.class.getResourceAsStream("test3.json"));
    final DataSetMetadata metadata = dataSet.getMetadata();
    // check the metadata before it is dereferenced
    assertNotNull(metadata);
    metadata.getContent().addParameter(CSVFormatFamily.SEPARATOR_PARAMETER, ",");
    metadata.getContent().setFormatFamilyId(new CSVFormatFamily().getBeanId());
    StringWriter writer = new StringWriter();
    to(dataSet, writer);
    assertThat(writer.toString(), sameJSONAsFile(DataSetJSONTest.class.getResourceAsStream("test3.json")));
}
Also used : DataSet(org.talend.dataprep.api.dataset.DataSet) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) CSVFormatFamily(org.talend.dataprep.schema.csv.CSVFormatFamily) ServiceBaseTest(org.talend.ServiceBaseTest) Test(org.junit.Test)

Aggregations

DataSet (org.talend.dataprep.api.dataset.DataSet): 39
DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata): 18
Test (org.junit.Test): 16
TDPException (org.talend.dataprep.exception.TDPException): 15
JsonParser (com.fasterxml.jackson.core.JsonParser): 13
InputStream (java.io.InputStream): 13
RowMetadata (org.talend.dataprep.api.dataset.RowMetadata): 11
DataSetRow (org.talend.dataprep.api.dataset.row.DataSetRow): 10
OutputStream (java.io.OutputStream): 8
Logger (org.slf4j.Logger): 8
DataSetGet (org.talend.dataprep.command.dataset.DataSetGet): 8
Configuration (org.talend.dataprep.transformation.api.transformer.configuration.Configuration): 8
ApiOperation (io.swagger.annotations.ApiOperation): 7
IOException (java.io.IOException): 7
ArrayList (java.util.ArrayList): 7
LoggerFactory (org.slf4j.LoggerFactory): 7
Autowired (org.springframework.beans.factory.annotation.Autowired): 7
ServiceBaseTest (org.talend.ServiceBaseTest): 7
ColumnMetadata (org.talend.dataprep.api.dataset.ColumnMetadata): 7
ContentCache (org.talend.dataprep.cache.ContentCache): 7