Example 6 with FieldSpec

Use of com.linkedin.pinot.common.data.FieldSpec in project pinot by linkedin.

The class CSVRecordReader, method next:

@Override
public GenericRow next(GenericRow row) {
    CSVRecord record = _iterator.next();
    for (final FieldSpec fieldSpec : _schema.getAllFieldSpecs()) {
        String column = fieldSpec.getName();
        String token = getValueForColumn(record, column);
        Object value = null;
        if (token == null || token.isEmpty()) {
            incrementNullCountFor(fieldSpec.getName());
        }
        if (fieldSpec.isSingleValueField()) {
            value = RecordReaderUtils.convertToDataType(token, fieldSpec.getDataType());
        } else {
            String[] tokens = (token != null) ? StringUtils.split(token, _delimiterString) : null;
            value = RecordReaderUtils.convertToDataTypeArray(tokens, fieldSpec.getDataType());
        }
        row.putField(column, value);
    }
    return row;
}
Also used : CSVRecord(org.apache.commons.csv.CSVRecord) FieldSpec(com.linkedin.pinot.common.data.FieldSpec)
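
For reference, the conversion step delegated to RecordReaderUtils.convertToDataType can be sketched as below. This is a minimal illustration, not the project's implementation: the class and method names are made up, the case list covers only the common primitive DataType constants, and null or empty tokens simply map to null here (the real readers substitute schema defaults elsewhere).

import com.linkedin.pinot.common.data.FieldSpec.DataType;

public final class TokenConversionSketch {

    // Hypothetical helper: convert a raw CSV token into the Java value implied by
    // the column's declared DataType. Null or empty tokens yield null in this sketch.
    public static Object convertToken(String token, DataType dataType) {
        if (token == null || token.isEmpty()) {
            return null;
        }
        switch (dataType) {
            case INT:
                return Integer.parseInt(token);
            case LONG:
                return Long.parseLong(token);
            case FLOAT:
                return Float.parseFloat(token);
            case DOUBLE:
                return Double.parseDouble(token);
            case STRING:
                return token;
            default:
                // Any other declared type is passed through unchanged in this sketch.
                return token;
        }
    }
}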

Example 7 with FieldSpec

Use of com.linkedin.pinot.common.data.FieldSpec in project pinot by linkedin.

The class JSONRecordReader, method next:

@Override
public GenericRow next(GenericRow row) {
    Map<String, Object> record = _iterator.next();
    for (final FieldSpec fieldSpec : _schema.getAllFieldSpecs()) {
        String column = fieldSpec.getName();
        Object data = record.get(column);
        Object value = null;
        if (fieldSpec.isSingleValueField()) {
            String token = (data != null) ? data.toString() : null;
            if (token == null || token.isEmpty()) {
                incrementNullCountFor(fieldSpec.getName());
            }
            value = RecordReaderUtils.convertToDataType(token, fieldSpec.getDataType());
        } else {
            value = convertToDataTypeArray(data, fieldSpec.getDataType());
        }
        row.putField(column, value);
    }
    return row;
}
Also used : FieldSpec(com.linkedin.pinot.common.data.FieldSpec)
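
The multi-value branch calls a private convertToDataTypeArray helper that is not shown in this excerpt. Assuming the JSON library hands multi-value columns back as java.util.List, such a helper could look roughly like the sketch below; the class and method names are illustrative, and it reuses the convertToken sketch from Example 6.

import java.util.List;

import com.linkedin.pinot.common.data.FieldSpec.DataType;

public final class JsonMultiValueSketch {

    // Hypothetical helper: turn the raw JSON value of a multi-value column (usually a
    // List, occasionally a single scalar) into an Object[] of type-converted elements.
    public static Object[] toTypedArray(Object data, DataType dataType) {
        if (data == null) {
            // The real reader falls back to a schema default; this sketch returns an empty array.
            return new Object[0];
        }
        if (data instanceof List) {
            List<?> values = (List<?>) data;
            Object[] result = new Object[values.size()];
            for (int i = 0; i < result.length; i++) {
                Object element = values.get(i);
                result[i] = TokenConversionSketch.convertToken(
                        element == null ? null : element.toString(), dataType);
            }
            return result;
        }
        // A single scalar supplied for a multi-value column is wrapped in a one-element array.
        return new Object[] { TokenConversionSketch.convertToken(data.toString(), dataType) };
    }
}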

Example 8 with FieldSpec

Use of com.linkedin.pinot.common.data.FieldSpec in project pinot by linkedin.

The class PlainFieldExtractor, method initColumnTypes:

private void initColumnTypes() {
    // Get the map from column name to pinot data type.
    for (String column : _schema.getColumnNames()) {
        FieldSpec fieldSpec = _schema.getFieldSpecFor(column);
        Preconditions.checkNotNull(fieldSpec, "Bad schema: " + _schema.getSchemaName() + ", field: " + column);
        _columnType.put(column, PinotDataType.getPinotDataType(fieldSpec));
    }
}
Also used : MetricFieldSpec(com.linkedin.pinot.common.data.MetricFieldSpec) TimeFieldSpec(com.linkedin.pinot.common.data.TimeFieldSpec) FieldSpec(com.linkedin.pinot.common.data.FieldSpec)
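
Ignoring the PinotDataType wrapper, the same validate-and-map pattern can be written against the raw FieldSpec DataType. The sketch below is illustrative only (class and method names assumed) and uses a plain null check in place of Guava's Preconditions.

import java.util.HashMap;
import java.util.Map;

import com.linkedin.pinot.common.data.FieldSpec;
import com.linkedin.pinot.common.data.FieldSpec.DataType;
import com.linkedin.pinot.common.data.Schema;

public final class ColumnTypeMapSketch {

    // Hypothetical variant of initColumnTypes: every column named in the schema must
    // have a FieldSpec, and its declared DataType is recorded for later lookups.
    public static Map<String, DataType> buildColumnTypeMap(Schema schema) {
        Map<String, DataType> columnTypes = new HashMap<>();
        for (String column : schema.getColumnNames()) {
            FieldSpec fieldSpec = schema.getFieldSpecFor(column);
            if (fieldSpec == null) {
                throw new IllegalStateException(
                        "Bad schema: " + schema.getSchemaName() + ", field: " + column);
            }
            columnTypes.put(column, fieldSpec.getDataType());
        }
        return columnTypes;
    }
}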

Example 9 with FieldSpec

Use of com.linkedin.pinot.common.data.FieldSpec in project pinot by linkedin.

The class AvroRecordToPinotRowGenerator, method transform:

public GenericRow transform(GenericData.Record record, org.apache.avro.Schema schema, GenericRow destination) {
    for (String column : indexingSchema.getColumnNames()) {
        Object entry = record.get(column);
        FieldSpec fieldSpec = indexingSchema.getFieldSpecFor(column);
        if (entry != null) {
            if (entry instanceof Array) {
                entry = AvroRecordReader.transformAvroArrayToObjectArray((Array) entry, fieldSpec);
                if (fieldSpec.getDataType() == DataType.STRING || fieldSpec.getDataType() == DataType.STRING_ARRAY) {
                    for (int i = 0; i < ((Object[]) entry).length; ++i) {
                        if (((Object[]) entry)[i] != null) {
                            ((Object[]) entry)[i] = ((Object[]) entry)[i].toString();
                        }
                    }
                }
            } else {
                if (entry instanceof Utf8) {
                    entry = ((Utf8) entry).toString();
                }
                if (fieldSpec.getDataType() == DataType.STRING) {
                    entry = entry.toString();
                }
            }
        } else {
            // entry was null.
            if (fieldSpec.isSingleValueField()) {
                entry = AvroRecordReader.getDefaultNullValue(fieldSpec);
            } else {
                // A multi-value field, and null. None of the instanceof checks above will match,
                // so some of that logic is repeated here.
                entry = AvroRecordReader.transformAvroArrayToObjectArray((Array) entry, fieldSpec);
                if (fieldSpec.getDataType() == DataType.STRING || fieldSpec.getDataType() == DataType.STRING_ARRAY) {
                    for (int i = 0; i < ((Object[]) entry).length; ++i) {
                        if (((Object[]) entry)[i] != null) {
                            ((Object[]) entry)[i] = ((Object[]) entry)[i].toString();
                        }
                    }
                }
            }
        }
        destination.putField(column, entry);
    }
    return destination;
}
Also used : Array(org.apache.avro.generic.GenericData.Array) Utf8(org.apache.avro.util.Utf8) FieldSpec(com.linkedin.pinot.common.data.FieldSpec)
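
Much of the branching above exists to normalize Avro's Utf8 values (and arrays containing them) into plain Java Strings before the row is indexed. That step, taken on its own, can be sketched as follows; the class and method names are assumptions for illustration.

import org.apache.avro.util.Utf8;

public final class AvroStringNormalizationSketch {

    // Hypothetical helper: Avro deserializes string data as org.apache.avro.util.Utf8,
    // so STRING columns are converted to java.lang.String (element by element for arrays).
    public static Object normalize(Object value) {
        if (value instanceof Utf8) {
            return value.toString();
        }
        if (value instanceof Object[]) {
            Object[] array = (Object[]) value;
            for (int i = 0; i < array.length; i++) {
                if (array[i] != null) {
                    array[i] = array[i].toString();
                }
            }
            return array;
        }
        return value;
    }
}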

Example 10 with FieldSpec

Use of com.linkedin.pinot.common.data.FieldSpec in project pinot by linkedin.

The class KafkaJSONMessageDecoder, method decode:

@Override
public GenericRow decode(byte[] payload, GenericRow destination) {
    try {
        String text = new String(payload, "UTF-8");
        JSONObject message = new JSONObject(text);
        for (FieldSpec dimensionSpec : schema.getDimensionFieldSpecs()) {
            if (message.has(dimensionSpec.getName())) {
                Object entry;
                if (dimensionSpec.isSingleValueField()) {
                    entry = stringToDataType(dimensionSpec, message.getString(dimensionSpec.getName()));
                } else {
                    JSONArray jsonArray = message.getJSONArray(dimensionSpec.getName());
                    Object[] array = new Object[jsonArray.length()];
                    for (int i = 0; i < array.length; i++) {
                        array[i] = stringToDataType(dimensionSpec, jsonArray.getString(i));
                    }
                    if (array.length == 0) {
                        entry = new Object[] { AvroRecordReader.getDefaultNullValue(dimensionSpec) };
                    } else {
                        entry = array;
                    }
                }
                destination.putField(dimensionSpec.getName(), entry);
            } else {
                Object entry = AvroRecordReader.getDefaultNullValue(dimensionSpec);
                destination.putField(dimensionSpec.getName(), entry);
            }
        }
        for (FieldSpec metricSpec : schema.getMetricFieldSpecs()) {
            if (message.has(metricSpec.getName())) {
                Object entry = stringToDataType(metricSpec, message.getString(metricSpec.getName()));
                destination.putField(metricSpec.getName(), entry);
            } else {
                Object entry = AvroRecordReader.getDefaultNullValue(metricSpec);
                destination.putField(metricSpec.getName(), entry);
            }
        }
        TimeFieldSpec timeSpec = schema.getTimeFieldSpec();
        if (message.has(timeSpec.getName())) {
            Object entry = stringToDataType(timeSpec, message.getString(timeSpec.getName()));
            destination.putField(timeSpec.getName(), entry);
        } else {
            Object entry = AvroRecordReader.getDefaultNullValue(timeSpec);
            destination.putField(timeSpec.getName(), entry);
        }
        return destination;
    } catch (Exception e) {
        LOGGER.error("error decoding , ", e);
    }
    return null;
}
Also used : JSONObject(org.json.JSONObject) TimeFieldSpec(com.linkedin.pinot.common.data.TimeFieldSpec) JSONArray(org.json.JSONArray) FieldSpec(com.linkedin.pinot.common.data.FieldSpec)
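
The "present in the message? convert it : use the default" branch above is repeated for dimension, metric, and time columns. For single-value fields that repetition could be folded into one helper, roughly as sketched below. The class and method names are hypothetical, the conversion reuses the convertToken sketch from Example 6 rather than the decoder's private stringToDataType, and the real decoder falls back to AvroRecordReader.getDefaultNullValue(fieldSpec) instead of converting a null token.

import org.json.JSONException;
import org.json.JSONObject;

import com.linkedin.pinot.common.data.FieldSpec;
import com.linkedin.pinot.core.data.GenericRow;

public final class JsonDecodeSketch {

    // Hypothetical helper: read a single-value column from the JSON message if present,
    // convert it to the column's data type, and write it into the destination row.
    public static void putSingleValueField(JSONObject message, FieldSpec fieldSpec, GenericRow destination)
            throws JSONException {
        String name = fieldSpec.getName();
        String raw = message.has(name) ? message.getString(name) : null;
        destination.putField(name, TokenConversionSketch.convertToken(raw, fieldSpec.getDataType()));
    }
}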

Aggregations

FieldSpec (com.linkedin.pinot.common.data.FieldSpec): 52 usages
DimensionFieldSpec (com.linkedin.pinot.common.data.DimensionFieldSpec): 28 usages
Test (org.testng.annotations.Test): 15 usages
TimeFieldSpec (com.linkedin.pinot.common.data.TimeFieldSpec): 14 usages
MetricFieldSpec (com.linkedin.pinot.common.data.MetricFieldSpec): 13 usages
File (java.io.File): 11 usages
Schema (com.linkedin.pinot.common.data.Schema): 10 usages
SegmentDictionaryCreator (com.linkedin.pinot.core.segment.creator.impl.SegmentDictionaryCreator): 7 usages
HashMap (java.util.HashMap): 7 usages
TimeGranularitySpec (com.linkedin.pinot.common.data.TimeGranularitySpec): 6 usages
AbstractColumnStatisticsCollector (com.linkedin.pinot.core.segment.creator.AbstractColumnStatisticsCollector): 6 usages
Random (java.util.Random): 5 usages
Block (com.linkedin.pinot.core.common.Block): 4 usages
BlockMetadata (com.linkedin.pinot.core.common.BlockMetadata): 4 usages
DataSource (com.linkedin.pinot.core.common.DataSource): 4 usages
GenericRow (com.linkedin.pinot.core.data.GenericRow): 4 usages
SegmentGeneratorConfig (com.linkedin.pinot.core.indexsegment.generator.SegmentGeneratorConfig): 4 usages
SegmentIndexCreationDriverImpl (com.linkedin.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl): 4 usages
ArrayList (java.util.ArrayList): 4 usages
DataType (com.linkedin.pinot.common.data.FieldSpec.DataType): 3 usages