
Example 1 with DataSetContent

use of org.talend.dataprep.api.dataset.DataSetContent in project data-prep by Talend.

the class DataSetContentStore method get.

/**
 * Returns the {@link DataSetMetadata data set} content as <b>JSON</b> format. Whether data set content was JSON or
 * not, method is expected to provide a JSON output. It's up to the implementation to:
 * <ul>
 * <li>Convert data content to JSON.</li>
 * <li>Throw an exception if data set is not ready for read (content type missing).</li>
 * </ul>
 * Implementations are also encouraged to implement method with no blocking code.
 *
 * @param dataSetMetadata The {@link DataSetMetadata data set} to read content from.
 * @param limit A limit to pass to content supplier (use -1 for "no limit"). Used as parameter for both raw content supplier
 * and JSON serializer.
 * @return A valid <b>JSON</b> stream. It is a JSON array where each element in the array contains a single data set
 * row (it does not mean there's a line in input stream per data set row, a data set row might be split on multiple
 * rows in stream).
 */
protected InputStream get(DataSetMetadata dataSetMetadata, long limit) {
    DataSetContent content = dataSetMetadata.getContent();
    Serializer serializer = factory.getFormatFamily(content.getFormatFamilyId()).getSerializer();
    return serializer.serialize(getAsRaw(dataSetMetadata, limit), dataSetMetadata, limit);
}
Also used : DataSetContent(org.talend.dataprep.api.dataset.DataSetContent) Serializer(org.talend.dataprep.schema.Serializer)
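
The limit contract described in the Javadoc above (a JSON array with one element per row, and -1 meaning "no limit") can be illustrated with a minimal, self-contained sketch. This is not the Talend implementation: the class `LimitedJsonRows` and the method `toJsonArray` are hypothetical names, rows are simplified to plain strings, and treating every negative limit as "no limit" is an assumption that goes beyond the documented -1.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of data-prep.
class LimitedJsonRows {

    /**
     * Serializes rows as a JSON array of strings, keeping at most {@code limit} rows.
     * Assumption: any negative limit (the Javadoc only names -1) means "no limit".
     */
    static String toJsonArray(List<String> rows, long limit) {
        Stream<String> stream = rows.stream();
        if (limit >= 0) {
            stream = stream.limit(limit);
        }
        return stream
                // Minimal escaping (backslash and double quote only), enough for this sketch.
                .map(row -> "\"" + row.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c");
        System.out.println(toJsonArray(rows, -1)); // ["a","b","c"]
        System.out.println(toJsonArray(rows, 2));  // ["a","b"]
    }
}
```

In the original method the same limit is passed to both the raw content supplier and the serializer, so a real implementation would apply it at both stages rather than only once as here.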

Example 2 with DataSetContent

use of org.talend.dataprep.api.dataset.DataSetContent in project data-prep by Talend.

the class PreparationExportStrategyTest method setUp.

@Before
public void setUp() throws Exception {
    // Given
    mapper.registerModule(new Jdk8Module());
    strategy.setMapper(new ObjectMapper());
    when(formatRegistrationService.getByName(eq("JSON"))).thenReturn(new JsonFormat());
    final DataSetGetMetadata dataSetGetMetadata = mock(DataSetGetMetadata.class);
    when(applicationContext.getBean(eq(DataSetGetMetadata.class), anyVararg())).thenReturn(dataSetGetMetadata);
    DataSetGet dataSetGet = mock(DataSetGet.class);
    final StringWriter dataSetAsString = new StringWriter();
    DataSet dataSet = new DataSet();
    final DataSetMetadata dataSetMetadata = new DataSetMetadata("ds-1234", "", "", 0L, 0L, new RowMetadata(), "");
    final DataSetContent content = new DataSetContent();
    dataSetMetadata.setContent(content);
    dataSet.setMetadata(dataSetMetadata);
    dataSet.setRecords(Stream.empty());
    mapper.writerFor(DataSet.class).writeValue(dataSetAsString, dataSet);
    when(dataSetGet.execute()).thenReturn(new ByteArrayInputStream(dataSetAsString.toString().getBytes()));
    when(applicationContext.getBean(eq(DataSetGet.class), anyVararg())).thenReturn(dataSetGet);
    final PreparationGetActions preparationGetActions = mock(PreparationGetActions.class);
    when(preparationGetActions.execute()).thenReturn(new ByteArrayInputStream("{}".getBytes()));
    when(applicationContext.getBean(eq(PreparationGetActions.class), eq("prep-1234"), anyString())).thenReturn(preparationGetActions);
    final TransformationCacheKey cacheKey = mock(TransformationCacheKey.class);
    when(cacheKey.getKey()).thenReturn("cache-1234");
    when(cacheKeyGenerator.generateContentKey(anyString(), anyString(), anyString(), anyString(), any(), any(), anyString())).thenReturn(cacheKey);
    final ExecutableTransformer executableTransformer = mock(ExecutableTransformer.class);
    reset(transformer);
    when(transformer.buildExecutable(any(), any())).thenReturn(executableTransformer);
    when(factory.get(any())).thenReturn(transformer);
    when(contentCache.put(any(), any())).thenReturn(new NullOutputStream());
}
Also used : DataSetGet(org.talend.dataprep.command.dataset.DataSetGet) DataSet(org.talend.dataprep.api.dataset.DataSet) DataSetGetMetadata(org.talend.dataprep.command.dataset.DataSetGetMetadata) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) TransformationCacheKey(org.talend.dataprep.cache.TransformationCacheKey) Jdk8Module(com.fasterxml.jackson.datatype.jdk8.Jdk8Module) JsonFormat(org.talend.dataprep.transformation.format.JsonFormat) StringWriter(java.io.StringWriter) ByteArrayInputStream(java.io.ByteArrayInputStream) PreparationGetActions(org.talend.dataprep.command.preparation.PreparationGetActions) ExecutableTransformer(org.talend.dataprep.transformation.api.transformer.ExecutableTransformer) RowMetadata(org.talend.dataprep.api.dataset.RowMetadata) DataSetContent(org.talend.dataprep.api.dataset.DataSetContent) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) NullOutputStream(org.apache.commons.io.output.NullOutputStream) Before(org.junit.Before)

Example 3 with DataSetContent

use of org.talend.dataprep.api.dataset.DataSetContent in project data-prep by Talend.

the class ContentAnalysis method updateHeaderAndFooter.

/**
 * Update the header and footer information in the dataset metadata.
 *
 * @param metadata the dataset metadata to update.
 */
private void updateHeaderAndFooter(DataSetMetadata metadata) {
    DataSetContent datasetContent = metadata.getContent();
    // Read the header line count from the content parameters; the default of 1 is kept if it is missing or not a number.
    final Map<String, String> parameters = datasetContent.getParameters();
    int headerNBLines = 1;
    try {
        headerNBLines = Integer.parseInt(parameters.get(CSVFormatFamily.HEADER_NB_LINES_PARAMETER));
    } catch (NumberFormatException e) {
        LOG.info("No header information for {}, let's use the first line as header.", metadata.getId());
    }
    datasetContent.setNbLinesInHeader(headerNBLines);
    datasetContent.setNbLinesInFooter(0);
}
Also used : DataSetContent(org.talend.dataprep.api.dataset.DataSetContent)
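
The fallback in Example 3 relies on `Integer.parseInt` throwing `NumberFormatException` both for a non-numeric string and for `null`, which is what a `Map` returns for an absent key. The sketch below isolates that behavior. The class `HeaderLineDefault`, the method `headerLines`, and the key value `"headerNbLines"` are hypothetical; the actual value of `CSVFormatFamily.HEADER_NB_LINES_PARAMETER` is not shown in the source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of data-prep.
class HeaderLineDefault {

    // Assumed key name; stands in for CSVFormatFamily.HEADER_NB_LINES_PARAMETER.
    static final String HEADER_NB_LINES = "headerNbLines";

    /** Returns the configured header line count, or 1 if the parameter is missing or not an integer. */
    static int headerLines(Map<String, String> parameters) {
        try {
            // parseInt(null) throws NumberFormatException, so a missing key takes the fallback too.
            return Integer.parseInt(parameters.get(HEADER_NB_LINES));
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        System.out.println(headerLines(parameters)); // 1 (key missing)
        parameters.put(HEADER_NB_LINES, "3");
        System.out.println(headerLines(parameters)); // 3
        parameters.put(HEADER_NB_LINES, "abc");
        System.out.println(headerLines(parameters)); // 1 (not a number)
    }
}
```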

Example 4 with DataSetContent

use of org.talend.dataprep.api.dataset.DataSetContent in project data-prep by Talend.

the class FormatAnalysis method internalUpdateMetadata.

/**
 * Update the given dataset metadata with the specified format.
 *
 * @param metadata the dataset metadata to update.
 * @param format the specified format used to update the dataset metadata
 */
private void internalUpdateMetadata(DataSetMetadata metadata, Format format) {
    FormatFamily formatFamily = format.getFormatFamily();
    DataSetContent dataSetContent = metadata.getContent();
    final String mediaType = metadata.getLocation().toMediaType(formatFamily);
    dataSetContent.setFormatFamilyId(formatFamily.getBeanId());
    dataSetContent.setMediaType(mediaType);
    metadata.setEncoding(format.getEncoding());
    parseColumnNameInformation(metadata.getId(), metadata, format);
    repository.save(metadata);
}
Also used : DataSetContent(org.talend.dataprep.api.dataset.DataSetContent)

Example 5 with DataSetContent

use of org.talend.dataprep.api.dataset.DataSetContent in project data-prep by Talend.

the class DataSetJSONTest method testWrite1.

@Test
public void testWrite1() throws Exception {
    final ColumnMetadata.Builder columnBuilder = ColumnMetadata.Builder.column()
            .id(5)
            .name("column1")
            .type(Type.STRING)
            .empty(0)
            .invalid(10)
            .valid(50);
    DataSetMetadata metadata = metadataBuilder.metadata().id("1234").name("name").author("author").created(0).row(columnBuilder).build();
    final DataSetContent content = metadata.getContent();
    content.addParameter(CSVFormatFamily.SEPARATOR_PARAMETER, ",");
    content.setFormatFamilyId(new CSVFormatFamily().getBeanId());
    content.setMediaType("text/csv");
    metadata.getLifecycle().qualityAnalyzed(true);
    metadata.getLifecycle().schemaAnalyzed(true);
    LocalStoreLocation location = new LocalStoreLocation();
    metadata.setLocation(location);
    StringWriter writer = new StringWriter();
    DataSet dataSet = new DataSet();
    dataSet.setMetadata(metadata);
    to(dataSet, writer);
    assertThat(writer.toString(), sameJSONAsFile(DataSetJSONTest.class.getResourceAsStream("test2.json")));
}
Also used : ColumnMetadata(org.talend.dataprep.api.dataset.ColumnMetadata) DataSet(org.talend.dataprep.api.dataset.DataSet) DataSetContent(org.talend.dataprep.api.dataset.DataSetContent) DataSetMetadata(org.talend.dataprep.api.dataset.DataSetMetadata) CSVFormatFamily(org.talend.dataprep.schema.csv.CSVFormatFamily) LocalStoreLocation(org.talend.dataprep.api.dataset.location.LocalStoreLocation) ServiceBaseTest(org.talend.ServiceBaseTest) Test(org.junit.Test)

Aggregations

DataSetContent (org.talend.dataprep.api.dataset.DataSetContent): 5
DataSet (org.talend.dataprep.api.dataset.DataSet): 2
DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata): 2
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 1
Jdk8Module (com.fasterxml.jackson.datatype.jdk8.Jdk8Module): 1
ByteArrayInputStream (java.io.ByteArrayInputStream): 1
StringWriter (java.io.StringWriter): 1
NullOutputStream (org.apache.commons.io.output.NullOutputStream): 1
Before (org.junit.Before): 1
Test (org.junit.Test): 1
ServiceBaseTest (org.talend.ServiceBaseTest): 1
ColumnMetadata (org.talend.dataprep.api.dataset.ColumnMetadata): 1
RowMetadata (org.talend.dataprep.api.dataset.RowMetadata): 1
LocalStoreLocation (org.talend.dataprep.api.dataset.location.LocalStoreLocation): 1
TransformationCacheKey (org.talend.dataprep.cache.TransformationCacheKey): 1
DataSetGet (org.talend.dataprep.command.dataset.DataSetGet): 1
DataSetGetMetadata (org.talend.dataprep.command.dataset.DataSetGetMetadata): 1
PreparationGetActions (org.talend.dataprep.command.preparation.PreparationGetActions): 1
Serializer (org.talend.dataprep.schema.Serializer): 1
CSVFormatFamily (org.talend.dataprep.schema.csv.CSVFormatFamily): 1