Example 1 with PigSchemaConverter

Use of org.apache.parquet.pig.PigSchemaConverter in project parquet-mr by apache.

In class TestThriftToPigCompatibility, method validateSameTupleAsEB.

/**
 * Steps:
 * <ul>
 * <li>Writes using the Thrift mapping</li>
 * <li>Reads using the Pig mapping</li>
 * <li>Uses Elephant Bird to convert from Thrift to Pig</li>
 * <li>Checks that both transformations give the same result</li>
 * </ul>
 *
 * @param o the object to convert
 * @throws TException if writing the Thrift object fails
 */
public static <T extends TBase<?, ?>> void validateSameTupleAsEB(T o) throws TException {
    final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
    @SuppressWarnings("unchecked") final Class<T> class1 = (Class<T>) o.getClass();
    final MessageType schema = thriftSchemaConverter.convert(class1);
    final StructType structType = ThriftSchemaConverter.toStructType(class1);
    final ThriftToPig<T> thriftToPig = new ThriftToPig<T>(class1);
    final Schema pigSchema = thriftToPig.toSchema();
    final TupleRecordMaterializer tupleRecordConverter = new TupleRecordMaterializer(schema, pigSchema, true);
    RecordConsumer recordConsumer = new ConverterConsumer(tupleRecordConverter.getRootConverter(), schema);
    final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
    ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
    o.write(p);
    final Tuple t = tupleRecordConverter.getCurrentRecord();
    final Tuple expected = thriftToPig.getPigTuple(o);
    assertEquals(expected.toString(), t.toString());
    final MessageType filtered = new PigSchemaConverter().filter(schema, pigSchema);
    assertEquals(schema.toString(), filtered.toString());
}
Also used: StructType (org.apache.parquet.thrift.struct.ThriftType.StructType), RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper), Schema (org.apache.pig.impl.logicalLayer.schema.Schema), PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig), RecordConsumer (org.apache.parquet.io.api.RecordConsumer), ConverterConsumer (org.apache.parquet.io.ConverterConsumer), MessageColumnIO (org.apache.parquet.io.MessageColumnIO), ColumnIOFactory (org.apache.parquet.io.ColumnIOFactory), TupleRecordMaterializer (org.apache.parquet.pig.convert.TupleRecordMaterializer), MessageType (org.apache.parquet.schema.MessageType), Tuple (org.apache.pig.data.Tuple)
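The check in Example 1 follows a general pattern: produce the same record through two independent conversion paths and compare the results as strings. The sketch below illustrates that pattern with standard-library types only. Every name in it (RoundTripCheckSketch, FieldSink, TupleBuilder, directTuple, sameTuple) is hypothetical and stands in for the parquet-mr and Elephant Bird classes; it is not their API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a two-path conversion check; not parquet-mr code. */
public class RoundTripCheckSketch {

    /** Minimal event sink, standing in for a record consumer. */
    interface FieldSink {
        void field(String name, Object value);
    }

    /** Path 1: rebuilds a tuple from the field events it receives. */
    static final class TupleBuilder implements FieldSink {
        private final List<Object> values = new ArrayList<>();

        @Override
        public void field(String name, Object value) {
            values.add(value);
        }

        List<Object> currentRecord() {
            return new ArrayList<>(values);
        }
    }

    /** Emits one event per field, in the map's iteration order. */
    static void write(Map<String, Object> record, FieldSink sink) {
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            sink.field(entry.getKey(), entry.getValue());
        }
    }

    /** Path 2: converts the record directly, acting as the reference converter. */
    static List<Object> directTuple(Map<String, Object> record) {
        return new ArrayList<>(record.values());
    }

    /** Returns true when both paths yield tuples with the same string form. */
    static boolean sameTuple(Map<String, Object> record) {
        TupleBuilder builder = new TupleBuilder();
        write(record, builder);
        return directTuple(record).toString().equals(builder.currentRecord().toString());
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("name", "a");
        System.out.println(sameTuple(record));
    }
}
```

Comparing string forms, as Example 1 does with expected.toString() and t.toString(), keeps the check independent of whether the two tuple implementations define equals() compatibly.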

Example 2 with PigSchemaConverter

Use of org.apache.parquet.pig.PigSchemaConverter in project parquet-mr by apache.

In class TestParquetWriteProtocol, method validatePig.

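/**
 * Converts the Thrift object to a Pig tuple, derives the Parquet schema from
 * the Pig schema, and writes the tuple through TupleWriteSupport into a
 * consumer built from the expected strings.
 *
 * @return the Parquet schema converted from the Pig schema
 */
private MessageType validatePig(String[] expectations, TBase<?, ?> a) {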
    ThriftToPig<TBase<?, ?>> thriftToPig = new ThriftToPig(a.getClass());
    ExpectationValidatingRecordConsumer recordConsumer = new ExpectationValidatingRecordConsumer(new ArrayDeque<String>(Arrays.asList(expectations)));
    Schema pigSchema = thriftToPig.toSchema();
    LOG.info("{}", pigSchema);
    MessageType schema = new PigSchemaConverter().convert(pigSchema);
    LOG.info("{}", schema);
    TupleWriteSupport tupleWriteSupport = new TupleWriteSupport(pigSchema);
    tupleWriteSupport.init(null);
    tupleWriteSupport.prepareForWrite(recordConsumer);
    final Tuple pigTuple = thriftToPig.getPigTuple(a);
    LOG.info("{}", pigTuple);
    tupleWriteSupport.write(pigTuple);
    return schema;
}
Also used: Schema (org.apache.pig.impl.logicalLayer.schema.Schema), PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter), TBase (org.apache.thrift.TBase), TupleWriteSupport (org.apache.parquet.pig.TupleWriteSupport), ExpectationValidatingRecordConsumer (org.apache.parquet.io.ExpectationValidatingRecordConsumer), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig), MessageType (org.apache.parquet.schema.MessageType), Tuple (org.apache.pig.data.Tuple)
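Example 2 hands the writer a consumer built from a queue of expected strings. One way such a consumer could work is sketched below with standard-library types only: each reported event is compared with the head of the queue, and a mismatch fails immediately. ExpectationQueueSketch and ExpectingConsumer are hypothetical stand-ins, not the parquet-mr ExpectationValidatingRecordConsumer, and the event strings are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical stand-in for an expectation-validating consumer; not the parquet-mr class. */
public class ExpectationQueueSketch {

    /** Checks each reported event against the next expected string, in order. */
    static final class ExpectingConsumer {
        private final Deque<String> expectations;

        ExpectingConsumer(Deque<String> expectations) {
            this.expectations = expectations;
        }

        /** Consumes one expectation; throws if the event does not match it. */
        void accept(String event) {
            String expected = expectations.poll(); // null once the queue is exhausted
            if (!event.equals(expected)) {
                throw new IllegalStateException(
                        "expected <" + expected + "> but got <" + event + ">");
            }
        }

        /** Number of expectations that have not been matched yet. */
        int remaining() {
            return expectations.size();
        }
    }

    public static void main(String[] args) {
        // Illustrative event strings; a real writer would produce these itself.
        String[] expectations = {
            "startMessage()", "startField(a, 0)", "addInt(1)", "endField(a, 0)", "endMessage()"
        };
        ExpectingConsumer consumer =
                new ExpectingConsumer(new ArrayDeque<String>(Arrays.asList(expectations)));
        for (String event : expectations) {
            consumer.accept(event);
        }
        System.out.println("remaining=" + consumer.remaining());
    }
}
```

A test built this way would typically also check that remaining() is zero at the end, so that a writer emitting too few events is caught as well as one emitting the wrong ones.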

Aggregations

ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig): 2
PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter): 2
MessageType (org.apache.parquet.schema.MessageType): 2
Tuple (org.apache.pig.data.Tuple): 2
Schema (org.apache.pig.impl.logicalLayer.schema.Schema): 2
ColumnIOFactory (org.apache.parquet.io.ColumnIOFactory): 1
ConverterConsumer (org.apache.parquet.io.ConverterConsumer): 1
ExpectationValidatingRecordConsumer (org.apache.parquet.io.ExpectationValidatingRecordConsumer): 1
MessageColumnIO (org.apache.parquet.io.MessageColumnIO): 1
RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper): 1
RecordConsumer (org.apache.parquet.io.api.RecordConsumer): 1
TupleWriteSupport (org.apache.parquet.pig.TupleWriteSupport): 1
TupleRecordMaterializer (org.apache.parquet.pig.convert.TupleRecordMaterializer): 1
StructType (org.apache.parquet.thrift.struct.ThriftType.StructType): 1
TBase (org.apache.thrift.TBase): 1