Example 1 with ConverterConsumer

Use of org.apache.parquet.io.ConverterConsumer in the apache/parquet-mr project.

From class TestTupleRecordConsumer, method newTupleWriter:

private <T> TupleWriteSupport newTupleWriter(String pigSchemaString, RecordMaterializer<T> recordConsumer) throws ParserException {
    TupleWriteSupport tupleWriter = TupleWriteSupport.fromPigSchema(pigSchemaString);
    tupleWriter.init(null);
    tupleWriter.prepareForWrite(new ConverterConsumer(recordConsumer.getRootConverter(), tupleWriter.getParquetSchema()));
    return tupleWriter;
}
Also used:
ConverterConsumer (org.apache.parquet.io.ConverterConsumer)

Example 2 with ConverterConsumer

From class TestThriftToPigCompatibility, method validateSameTupleAsEB:

/**
 * Steps:
 * <ul>
 * <li>writes using the Thrift mapping</li>
 * <li>reads using the Pig mapping</li>
 * <li>uses Elephant Bird to convert from Thrift to Pig</li>
 * <li>checks that both transformations give the same result</li>
 * </ul>
 * @param o the object to convert
 * @throws TException if writing the Thrift object fails
 */
public static <T extends TBase<?, ?>> void validateSameTupleAsEB(T o) throws TException {
    final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
    @SuppressWarnings("unchecked") final Class<T> class1 = (Class<T>) o.getClass();
    final MessageType schema = thriftSchemaConverter.convert(class1);
    final StructType structType = ThriftSchemaConverter.toStructType(class1);
    final ThriftToPig<T> thriftToPig = new ThriftToPig<T>(class1);
    final Schema pigSchema = thriftToPig.toSchema();
    final TupleRecordMaterializer tupleRecordConverter = new TupleRecordMaterializer(schema, pigSchema, true);
    RecordConsumer recordConsumer = new ConverterConsumer(tupleRecordConverter.getRootConverter(), schema);
    final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
    ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
    o.write(p);
    final Tuple t = tupleRecordConverter.getCurrentRecord();
    final Tuple expected = thriftToPig.getPigTuple(o);
    assertEquals(expected.toString(), t.toString());
    final MessageType filtered = new PigSchemaConverter().filter(schema, pigSchema);
    assertEquals(schema.toString(), filtered.toString());
}
Also used:
StructType (org.apache.parquet.thrift.struct.ThriftType.StructType)
RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper)
Schema (org.apache.pig.impl.logicalLayer.schema.Schema)
PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter)
ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig)
RecordConsumer (org.apache.parquet.io.api.RecordConsumer)
ConverterConsumer (org.apache.parquet.io.ConverterConsumer)
MessageColumnIO (org.apache.parquet.io.MessageColumnIO)
ColumnIOFactory (org.apache.parquet.io.ColumnIOFactory)
TupleRecordMaterializer (org.apache.parquet.pig.convert.TupleRecordMaterializer)
MessageType (org.apache.parquet.schema.MessageType)
Tuple (org.apache.pig.data.Tuple)

Example 3 with ConverterConsumer

From class TestTupleRecordConsumer, method testFromGroups:

private void testFromGroups(String pigSchemaString, List<Group> input) throws ParserException {
    List<Tuple> tuples = new ArrayList<Tuple>();
    MessageType schema = getMessageType(pigSchemaString);
    RecordMaterializer<Tuple> pigRecordConsumer = newPigRecordConsumer(pigSchemaString);
    GroupWriter groupWriter = new GroupWriter(new RecordConsumerLoggingWrapper(new ConverterConsumer(pigRecordConsumer.getRootConverter(), schema)), schema);
    for (Group group : input) {
        groupWriter.write(group);
        final Tuple tuple = pigRecordConsumer.getCurrentRecord();
        tuples.add(tuple);
        LOG.debug("in: {}\nout:{}", group, tuple);
    }
    List<Group> groups = new ArrayList<Group>();
    GroupRecordConverter recordConsumer = new GroupRecordConverter(schema);
    TupleWriteSupport tupleWriter = newTupleWriter(pigSchemaString, recordConsumer);
    for (Tuple t : tuples) {
        LOG.debug("{}", t);
        tupleWriter.write(t);
        groups.add(recordConsumer.getCurrentRecord());
    }
    assertEquals(input.size(), groups.size());
    for (int i = 0; i < input.size(); i++) {
        Group in = input.get(i);
        LOG.debug("{}", in);
        Group out = groups.get(i);
        assertEquals(in.toString(), out.toString());
    }
}
Also used:
Group (org.apache.parquet.example.data.Group)
SimpleGroup (org.apache.parquet.example.data.simple.SimpleGroup)
GroupRecordConverter (org.apache.parquet.example.data.simple.convert.GroupRecordConverter)
RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper)
ArrayList (java.util.ArrayList)
GroupWriter (org.apache.parquet.example.data.GroupWriter)
ConverterConsumer (org.apache.parquet.io.ConverterConsumer)
Tuple (org.apache.pig.data.Tuple)
MessageType (org.apache.parquet.schema.MessageType)
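The three examples above share one pattern: ConverterConsumer presents the write-side RecordConsumer API but forwards every event to a read-side GroupConverter, so a record can be "written" directly into a materializer without ever serializing bytes. The sketch below is not from the project's sources; the class name, schema, and values are invented for illustration, using only the parquet-mr APIs the examples already exercise (MessageTypeParser, GroupRecordConverter, ConverterConsumer).

```java
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.io.ConverterConsumer;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.io.api.RecordConsumer;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ConverterConsumerSketch {

    public static void main(String[] args) {
        // Hypothetical two-field schema for illustration.
        MessageType schema = MessageTypeParser.parseMessageType(
            "message example { required int32 id; required binary name; }");

        // Read-side converter that materializes Group records.
        GroupRecordConverter materializer = new GroupRecordConverter(schema);

        // Write-side consumer that forwards each event to the converter,
        // the same wiring the tests above use.
        RecordConsumer consumer =
            new ConverterConsumer(materializer.getRootConverter(), schema);

        // Emit one record through the standard RecordConsumer event sequence.
        consumer.startMessage();
        consumer.startField("id", 0);
        consumer.addInteger(42);
        consumer.endField("id", 0);
        consumer.startField("name", 1);
        consumer.addBinary(Binary.fromString("alice"));
        consumer.endField("name", 1);
        consumer.endMessage();

        // The converter has now materialized the record on the read side.
        Group record = materializer.getCurrentRecord();
        System.out.println(record);
    }
}
```

This round-trip through converters, rather than through an actual Parquet file, is what lets the tests above compare Thrift, Pig, and Group representations of the same record cheaply.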

Aggregations

ConverterConsumer (org.apache.parquet.io.ConverterConsumer): 3
RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper): 2
MessageType (org.apache.parquet.schema.MessageType): 2
Tuple (org.apache.pig.data.Tuple): 2
ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig): 1
ArrayList (java.util.ArrayList): 1
Group (org.apache.parquet.example.data.Group): 1
GroupWriter (org.apache.parquet.example.data.GroupWriter): 1
SimpleGroup (org.apache.parquet.example.data.simple.SimpleGroup): 1
GroupRecordConverter (org.apache.parquet.example.data.simple.convert.GroupRecordConverter): 1
ColumnIOFactory (org.apache.parquet.io.ColumnIOFactory): 1
MessageColumnIO (org.apache.parquet.io.MessageColumnIO): 1
RecordConsumer (org.apache.parquet.io.api.RecordConsumer): 1
PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter): 1
TupleRecordMaterializer (org.apache.parquet.pig.convert.TupleRecordMaterializer): 1
StructType (org.apache.parquet.thrift.struct.ThriftType.StructType): 1
Schema (org.apache.pig.impl.logicalLayer.schema.Schema): 1