
Example 26 with CsvMapper

Use of com.fasterxml.jackson.dataformat.csv.CsvMapper in project synthea by synthetichealth.

The class NHANESSample, method loadSamples.

/**
 * Load the NHANES samples from resources.
 * @return A list of samples.
 */
public static List<NHANESSample> loadSamples() {
    CsvMapper mapper = new CsvMapper();
    List<NHANESSample> samples = new LinkedList<NHANESSample>();
    CsvSchema schema = CsvSchema.emptySchema().withHeader();
    String filename = "nhanes_two_year_olds_bmi.csv";
    try {
        String rawCSV = Utilities.readResource(filename);
        MappingIterator<NHANESSample> it = mapper.readerFor(NHANESSample.class).with(schema).readValues(rawCSV);
        while (it.hasNextValue()) {
            samples.add(it.nextValue());
        }
    } catch (Exception e) {
        System.err.println("ERROR: unable to load CSV: " + filename);
        e.printStackTrace();
        throw new RuntimeException(e);
    }
    return samples;
}
Also used: CsvSchema (com.fasterxml.jackson.dataformat.csv.CsvSchema), CsvMapper (com.fasterxml.jackson.dataformat.csv.CsvMapper), LinkedList (java.util.LinkedList)
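The pattern in loadSamples (a header-driven schema plus a MappingIterator) can be reduced to a minimal, self-contained sketch. The Person POJO and the inline CSV string here are illustrative, not part of synthea, and the sketch assumes the plain jackson-dataformat-csv artifact is on the classpath:

```java
import java.util.List;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class HeaderCsvReadSketch {

    // Hypothetical POJO whose public fields match the CSV header names.
    public static class Person {
        public String name;
        public int age;
    }

    public static void main(String[] args) throws Exception {
        String rawCsv = "name,age\nAda,36\nAlan,41\n";
        CsvMapper mapper = new CsvMapper();
        // emptySchema().withHeader() tells Jackson to take the column
        // names from the first line instead of a predeclared schema.
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        MappingIterator<Person> it =
                mapper.readerFor(Person.class).with(schema).readValues(rawCsv);
        List<Person> people = it.readAll();
        System.out.println(people.size());      // 2
        System.out.println(people.get(0).name); // Ada
    }
}
```

readAll() drains the iterator in one call; loadSamples does the same work with an explicit hasNextValue()/nextValue() loop, which gives finer control when individual rows may fail to parse.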

Example 27 with CsvMapper

Use of com.fasterxml.jackson.dataformat.csv.CsvMapper in project flink by splunk.

The class CsvBulkWriter, method forPojo.

/**
 * Builds a writer based on a POJO class definition.
 *
 * @param pojoClass The class of the POJO.
 * @param stream The output stream.
 * @param <T> The type of the elements accepted by this writer.
 */
static <T> CsvBulkWriter<T, T, Void> forPojo(Class<T> pojoClass, FSDataOutputStream stream) {
    final Converter<T, T, Void> converter = (value, context) -> value;
    final CsvMapper csvMapper = new CsvMapper();
    final CsvSchema schema = csvMapper.schemaFor(pojoClass).withoutQuoteChar();
    return new CsvBulkWriter<>(csvMapper, schema, converter, null, stream);
}
Also used: Converter (org.apache.flink.formats.common.Converter), FSDataOutputStream (org.apache.flink.core.fs.FSDataOutputStream), ObjectWriter (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectWriter), BulkWriter (org.apache.flink.api.common.serialization.BulkWriter), JsonGenerator (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonGenerator), CsvSchema (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema), IOException (java.io.IOException), Preconditions.checkNotNull (org.apache.flink.util.Preconditions.checkNotNull), Nullable (javax.annotation.Nullable), CsvMapper (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper)
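The complementary write path can be sketched without Flink: schemaFor() derives the columns from the POJO, and withoutQuoteChar() disables quoting, as in forPojo above. The City POJO is illustrative, and the sketch uses the unshaded jackson-dataformat-csv classes rather than Flink's shaded copy:

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class PojoSchemaWriteSketch {

    // Column order in the derived schema follows @JsonPropertyOrder.
    @JsonPropertyOrder({"city", "population"})
    public static class City {
        public String city;
        public int population;

        public City(String city, int population) {
            this.city = city;
            this.population = population;
        }
    }

    public static void main(String[] args) throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(City.class).withoutQuoteChar();
        // writeValueAsString appends the schema's line separator.
        String line = mapper.writer(schema).writeValueAsString(new City("Oslo", 700000));
        System.out.print(line); // Oslo,700000
    }
}
```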

Example 28 with CsvMapper

Use of com.fasterxml.jackson.dataformat.csv.CsvMapper in project flink by splunk.

The class RowDataToCsvConverters, method createArrayRowFieldConverter.

private static RowFieldConverter createArrayRowFieldConverter(ArrayType type) {
    LogicalType elementType = type.getElementType();
    final ArrayElementConverter elementConverter = createNullableArrayElementConverter(elementType);
    return (csvMapper, container, row, pos) -> {
        ArrayNode arrayNode = csvMapper.createArrayNode();
        ArrayData arrayData = row.getArray(pos);
        int numElements = arrayData.size();
        for (int i = 0; i < numElements; i++) {
            arrayNode.add(elementConverter.convert(csvMapper, arrayNode, arrayData, i));
        }
        return arrayNode;
    };
}
Also used: Arrays (java.util.Arrays), SQL_TIMESTAMP_FORMAT (org.apache.flink.formats.common.TimeFormats.SQL_TIMESTAMP_FORMAT), JsonNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode), RowType (org.apache.flink.table.types.logical.RowType), ArrayNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ArrayNode), TimestampType (org.apache.flink.table.types.logical.TimestampType), SQL_TIMESTAMP_WITH_LOCAL_TIMEZONE_FORMAT (org.apache.flink.formats.common.TimeFormats.SQL_TIMESTAMP_WITH_LOCAL_TIMEZONE_FORMAT), DecimalType (org.apache.flink.table.types.logical.DecimalType), LocalTime (java.time.LocalTime), ISO_LOCAL_DATE (java.time.format.DateTimeFormatter.ISO_LOCAL_DATE), LocalZonedTimestampType (org.apache.flink.table.types.logical.LocalZonedTimestampType), RowData (org.apache.flink.table.data.RowData), TimestampData (org.apache.flink.table.data.TimestampData), ObjectNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode), DecimalData (org.apache.flink.table.data.DecimalData), ArrayType (org.apache.flink.table.types.logical.ArrayType), Serializable (java.io.Serializable), Converter (org.apache.flink.formats.common.Converter), ArrayData (org.apache.flink.table.data.ArrayData), LogicalType (org.apache.flink.table.types.logical.LogicalType), LocalDate (java.time.LocalDate), DateTimeFormatter (java.time.format.DateTimeFormatter), Internal (org.apache.flink.annotation.Internal), ISO_LOCAL_TIME (java.time.format.DateTimeFormatter.ISO_LOCAL_TIME), ContainerNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ContainerNode), CsvMapper (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper)
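Stripped of the Flink row and array accessors, the converter's core is an element-by-element loop over a Jackson ArrayNode. This sketch substitutes plain ints for ArrayData and uses the unshaded jackson classes; CsvMapper inherits createArrayNode() from ObjectMapper:

```java
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;

public class ArrayNodeSketch {
    public static void main(String[] args) {
        CsvMapper mapper = new CsvMapper();
        // The Flink converter fills an ArrayNode the same way, one
        // converted element per iteration.
        ArrayNode arrayNode = mapper.createArrayNode();
        int[] data = {1, 2, 3};
        for (int v : data) {
            arrayNode.add(v);
        }
        System.out.println(arrayNode); // [1,2,3]
    }
}
```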

Example 29 with CsvMapper

Use of com.fasterxml.jackson.dataformat.csv.CsvMapper in project flink by splunk.

The class CsvFileFormatFactory, method createEncodingFormat.

@Override
public EncodingFormat<Factory<RowData>> createEncodingFormat(DynamicTableFactory.Context context, ReadableConfig formatOptions) {
    return new EncodingFormat<BulkWriter.Factory<RowData>>() {

        @Override
        public BulkWriter.Factory<RowData> createRuntimeEncoder(DynamicTableSink.Context context, DataType physicalDataType) {
            final RowType rowType = (RowType) physicalDataType.getLogicalType();
            final CsvSchema schema = buildCsvSchema(rowType, formatOptions);
            final RowDataToCsvConverter converter = RowDataToCsvConverters.createRowConverter(rowType);
            final CsvMapper mapper = new CsvMapper();
            final ObjectNode container = mapper.createObjectNode();
            final RowDataToCsvConverter.RowDataToCsvFormatConverterContext converterContext = new RowDataToCsvConverter.RowDataToCsvFormatConverterContext(mapper, container);
            return out -> CsvBulkWriter.forSchema(mapper, schema, converter, converterContext, out);
        }

        @Override
        public ChangelogMode getChangelogMode() {
            return ChangelogMode.insertOnly();
        }
    };
}
Also used: Context (org.apache.flink.table.connector.source.DynamicTableSource.Context), DynamicTableFactory (org.apache.flink.table.factories.DynamicTableFactory), DataType (org.apache.flink.table.types.DataType), EncodingFormat (org.apache.flink.table.connector.format.EncodingFormat), ChangelogMode (org.apache.flink.table.connector.ChangelogMode), FIELD_DELIMITER (org.apache.flink.formats.csv.CsvFormatOptions.FIELD_DELIMITER), BulkWriterFormatFactory (org.apache.flink.connector.file.table.factories.BulkWriterFormatFactory), CsvSchema (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema), JsonNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode), RowType (org.apache.flink.table.types.logical.RowType), ALLOW_COMMENTS (org.apache.flink.formats.csv.CsvFormatOptions.ALLOW_COMMENTS), Factory (org.apache.flink.api.common.serialization.BulkWriter.Factory), ReadableConfig (org.apache.flink.configuration.ReadableConfig), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), IGNORE_PARSE_ERRORS (org.apache.flink.formats.csv.CsvFormatOptions.IGNORE_PARSE_ERRORS), QUOTE_CHARACTER (org.apache.flink.formats.csv.CsvFormatOptions.QUOTE_CHARACTER), RowDataToCsvConverter (org.apache.flink.formats.csv.RowDataToCsvConverters.RowDataToCsvConverter), ESCAPE_CHARACTER (org.apache.flink.formats.csv.CsvFormatOptions.ESCAPE_CHARACTER), StreamFormatAdapter (org.apache.flink.connector.file.src.impl.StreamFormatAdapter), ConfigOption (org.apache.flink.configuration.ConfigOption), StringEscapeUtils (org.apache.commons.lang3.StringEscapeUtils), Preconditions.checkNotNull (org.apache.flink.util.Preconditions.checkNotNull), BulkDecodingFormat (org.apache.flink.connector.file.table.format.BulkDecodingFormat), Projection (org.apache.flink.table.connector.Projection), BulkReaderFormatFactory (org.apache.flink.connector.file.table.factories.BulkReaderFormatFactory), RowData (org.apache.flink.table.data.RowData), DynamicTableSink (org.apache.flink.table.connector.sink.DynamicTableSink), BulkWriter (org.apache.flink.api.common.serialization.BulkWriter), ObjectNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode), Set (java.util.Set), ProjectableDecodingFormat (org.apache.flink.table.connector.format.ProjectableDecodingFormat), DISABLE_QUOTE_CHARACTER (org.apache.flink.formats.csv.CsvFormatOptions.DISABLE_QUOTE_CHARACTER), ARRAY_ELEMENT_DELIMITER (org.apache.flink.formats.csv.CsvFormatOptions.ARRAY_ELEMENT_DELIMITER), Converter (org.apache.flink.formats.common.Converter), NULL_LITERAL (org.apache.flink.formats.csv.CsvFormatOptions.NULL_LITERAL), Internal (org.apache.flink.annotation.Internal), BulkFormat (org.apache.flink.connector.file.src.reader.BulkFormat), Collections (java.util.Collections), CsvMapper (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper)

Example 30 with CsvMapper

Use of com.fasterxml.jackson.dataformat.csv.CsvMapper in project flink by splunk.

The class DataStreamCsvITCase, method testCsvReaderFormatFromSchema.

@Test
public void testCsvReaderFormatFromSchema() throws Exception {
    writeFile(outDir, "data.csv", CSV_LINES_PIPE_SEPARATED);
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = mapper.schemaFor(CityPojo.class).withoutQuoteChar().withColumnSeparator('|');
    final CsvReaderFormat<CityPojo> csvFormat = CsvReaderFormat.forSchema(mapper, schema, TypeInformation.of(CityPojo.class));
    final List<CityPojo> result = initializeSourceAndReadData(outDir, csvFormat);
    assertThat(Arrays.asList(POJOS)).isEqualTo(result);
}
Also used: CsvSchema (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema), CsvMapper (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper), Test (org.junit.jupiter.api.Test)
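The same schema customization works for reading outside Flink. This sketch mirrors the test's withColumnSeparator('|') call with a hypothetical City POJO and inline pipe-separated data, using the unshaded jackson artifact; with no header row in the schema, the first input line is already data:

```java
import java.util.List;

import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class PipeSeparatedReadSketch {

    @JsonPropertyOrder({"city", "population"})
    public static class City {
        public String city;
        public int population;
    }

    public static void main(String[] args) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // POJO-derived schema, but with '|' instead of ',' as separator.
        CsvSchema schema = mapper.schemaFor(City.class)
                .withoutQuoteChar()
                .withColumnSeparator('|');
        MappingIterator<City> it = mapper.readerFor(City.class)
                .with(schema)
                .readValues("Berlin|3600000\nParis|2100000\n");
        List<City> cities = it.readAll();
        System.out.println(cities.size());      // 2
        System.out.println(cities.get(1).city); // Paris
    }
}
```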

Aggregations

CsvMapper (com.fasterxml.jackson.dataformat.csv.CsvMapper): 68
CsvSchema (com.fasterxml.jackson.dataformat.csv.CsvSchema): 55
IOException (java.io.IOException): 17
CsvMapper (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper): 15
Map (java.util.Map): 14
Test (org.junit.jupiter.api.Test): 13
ObjectWriter (com.fasterxml.jackson.databind.ObjectWriter): 12
Converter (org.apache.flink.formats.common.Converter): 12
CsvSchema (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema): 12
ObjectReader (com.fasterxml.jackson.databind.ObjectReader): 10
HashMap (java.util.HashMap): 10
List (java.util.List): 10
ArrayList (java.util.ArrayList): 9
BulkWriter (org.apache.flink.api.common.serialization.BulkWriter): 9
File (java.io.File): 7
InputStream (java.io.InputStream): 7
Serializable (java.io.Serializable): 6
Arrays (java.util.Arrays): 6
Internal (org.apache.flink.annotation.Internal): 5
JsonNode (org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode): 5