Example 1 with ThriftToPig

Use of com.twitter.elephantbird.pig.util.ThriftToPig in project elephant-bird by twitter.

Class TestThriftToPig, method testSetConversionProperties.

@SuppressWarnings({ "rawtypes", "unchecked" })
@Test
public void testSetConversionProperties() throws ExecException {
    // Build a Thrift struct with a string field and an enum field.
    PhoneNumber pn = new PhoneNumber();
    pn.setNumber("1234");
    pn.setType(PhoneType.HOME);
    ThriftToPig ttp = ThriftToPig.newInstance(PhoneNumber.class);
    // By default, the enum at tuple position 1 converts to its name (chararray).
    Tuple tuple = ttp.getPigTuple(pn);
    assertEquals(DataType.CHARARRAY, tuple.getType(1));
    assertEquals(PhoneType.HOME.toString(), tuple.get(1));
    // With USE_ENUM_ID_CONF_KEY enabled, the same field converts to the enum's integer value.
    Configuration conf = new Configuration();
    conf.setBoolean(ThriftToPig.USE_ENUM_ID_CONF_KEY, true);
    ThriftToPig.setConversionProperties(conf);
    tuple = ttp.getPigTuple(pn);
    assertEquals(DataType.INTEGER, tuple.getType(1));
    assertEquals(PhoneType.HOME.getValue(), tuple.get(1));
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), PhoneNumber (com.twitter.elephantbird.thrift.test.PhoneNumber), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig), ThriftBytesToTuple (com.twitter.elephantbird.pig.piggybank.ThriftBytesToTuple), Tuple (org.apache.pig.data.Tuple), Test (org.junit.Test)

Example 2 with ThriftToPig

Use of com.twitter.elephantbird.pig.util.ThriftToPig in project parquet-mr by apache.

Class AbstractThriftWriteSupport, method init.

protected void init(Class<T> thriftClass) {
    this.thriftClass = thriftClass;
    this.thriftStruct = getThriftStruct();
    this.schema = ThriftSchemaConverter.convertWithoutProjection(thriftStruct);
    final Map<String, String> extraMetaData = new ThriftMetaData(thriftClass.getName(), thriftStruct).toExtraMetaData();
    // TODO: make this work for non-tbase types
    // If Pig is loaded and the class is a TBase, also add the Pig schema derived by ThriftToPig to the metadata.
    if (isPigLoaded() && TBase.class.isAssignableFrom(thriftClass)) {
        new PigMetaData(new ThriftToPig((Class<? extends TBase<?, ?>>) thriftClass).toSchema()).addToMetaData(extraMetaData);
    }
    this.writeContext = new WriteContext(schema, extraMetaData);
}
Also used: ThriftMetaData (org.apache.parquet.thrift.ThriftMetaData), PigMetaData (org.apache.parquet.pig.PigMetaData), TBase (org.apache.thrift.TBase), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig)

Example 3 with ThriftToPig

Use of com.twitter.elephantbird.pig.util.ThriftToPig in project parquet-mr by apache.

Class TestThriftToPigCompatibility, method validateSameTupleAsEB.

/**
 * Steps:
 * <ul>
 * <li>Writes using the thrift mapping
 * <li>Reads using the pig mapping
 * <li>Uses Elephant Bird to convert from thrift to pig
 * <li>Checks that both transformations give the same result
 * </ul>
 * @param o the object to convert
 * @throws TException
 */
public static <T extends TBase<?, ?>> void validateSameTupleAsEB(T o) throws TException {
    final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
    @SuppressWarnings("unchecked") final Class<T> class1 = (Class<T>) o.getClass();
    final MessageType schema = thriftSchemaConverter.convert(class1);
    final StructType structType = ThriftSchemaConverter.toStructType(class1);
    final ThriftToPig<T> thriftToPig = new ThriftToPig<T>(class1);
    final Schema pigSchema = thriftToPig.toSchema();
    final TupleRecordMaterializer tupleRecordConverter = new TupleRecordMaterializer(schema, pigSchema, true);
    RecordConsumer recordConsumer = new ConverterConsumer(tupleRecordConverter.getRootConverter(), schema);
    final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
    ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
    o.write(p);
    final Tuple t = tupleRecordConverter.getCurrentRecord();
    final Tuple expected = thriftToPig.getPigTuple(o);
    assertEquals(expected.toString(), t.toString());
    final MessageType filtered = new PigSchemaConverter().filter(schema, pigSchema);
    assertEquals(schema.toString(), filtered.toString());
}
Also used: StructType (org.apache.parquet.thrift.struct.ThriftType.StructType), RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper), Schema (org.apache.pig.impl.logicalLayer.schema.Schema), PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig), RecordConsumer (org.apache.parquet.io.api.RecordConsumer), ConverterConsumer (org.apache.parquet.io.ConverterConsumer), MessageColumnIO (org.apache.parquet.io.MessageColumnIO), ColumnIOFactory (org.apache.parquet.io.ColumnIOFactory), TupleRecordMaterializer (org.apache.parquet.pig.convert.TupleRecordMaterializer), MessageType (org.apache.parquet.schema.MessageType), Tuple (org.apache.pig.data.Tuple)

Example 4 with ThriftToPig

Use of com.twitter.elephantbird.pig.util.ThriftToPig in project parquet-mr by apache.

Class TestParquetWriteProtocol, method validatePig.

private MessageType validatePig(String[] expectations, TBase<?, ?> a) {
    ThriftToPig<TBase<?, ?>> thriftToPig = new ThriftToPig(a.getClass());
    ExpectationValidatingRecordConsumer recordConsumer = new ExpectationValidatingRecordConsumer(new ArrayDeque<String>(Arrays.asList(expectations)));
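    // Derive the Pig schema from the Thrift class, then the Parquet schema from the Pig schema.
    Schema pigSchema = thriftToPig.toSchema();
    LOG.info("{}", pigSchema);
    MessageType schema = new PigSchemaConverter().convert(pigSchema);
    LOG.info("{}", schema);
    TupleWriteSupport tupleWriteSupport = new TupleWriteSupport(pigSchema);
    tupleWriteSupport.init(null);
    tupleWriteSupport.prepareForWrite(recordConsumer);
    // Convert the Thrift object to a Pig tuple, which is then written through the record consumer.
    final Tuple pigTuple = thriftToPig.getPigTuple(a);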
    LOG.info("{}", pigTuple);
    tupleWriteSupport.write(pigTuple);
    return schema;
}
Also used: Schema (org.apache.pig.impl.logicalLayer.schema.Schema), PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter), TBase (org.apache.thrift.TBase), TupleWriteSupport (org.apache.parquet.pig.TupleWriteSupport), ExpectationValidatingRecordConsumer (org.apache.parquet.io.ExpectationValidatingRecordConsumer), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig), MessageType (org.apache.parquet.schema.MessageType), Tuple (org.apache.pig.data.Tuple)

Example 5 with ThriftToPig

Use of com.twitter.elephantbird.pig.util.ThriftToPig in project elephant-bird by twitter.

Class TestThriftToPig, method testMapValueFieldAlias.

/**
 * Tests that thrift map field value has no field schema alias.
 * @throws FrontendException
 */
@Test
public void testMapValueFieldAlias() throws FrontendException {
    ThriftToPig<TestMap> thriftToPig = new ThriftToPig<TestMap>(TestMap.class);
    Schema schema = thriftToPig.toSchema();
    Assert.assertEquals("{name: chararray,names: map[chararray]}", schema.toString());
    Assert.assertNull(schema.getField(1).schema.getField(0).alias);
    schema = ThriftToPig.toSchema(TestMap.class);
    Assert.assertEquals("{name: chararray,names: map[chararray]}", schema.toString());
    Assert.assertNull(schema.getField(1).schema.getField(0).alias);
}
Also used: TestMap (com.twitter.elephantbird.thrift.test.TestMap), Schema (org.apache.pig.impl.logicalLayer.schema.Schema), ResourceSchema (org.apache.pig.ResourceSchema), ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig), Test (org.junit.Test)

Aggregations

ThriftToPig (com.twitter.elephantbird.pig.util.ThriftToPig): 5
Tuple (org.apache.pig.data.Tuple): 3
Schema (org.apache.pig.impl.logicalLayer.schema.Schema): 3
PigSchemaConverter (org.apache.parquet.pig.PigSchemaConverter): 2
MessageType (org.apache.parquet.schema.MessageType): 2
TBase (org.apache.thrift.TBase): 2
Test (org.junit.Test): 2
ThriftBytesToTuple (com.twitter.elephantbird.pig.piggybank.ThriftBytesToTuple): 1
PhoneNumber (com.twitter.elephantbird.thrift.test.PhoneNumber): 1
TestMap (com.twitter.elephantbird.thrift.test.TestMap): 1
Configuration (org.apache.hadoop.conf.Configuration): 1
ColumnIOFactory (org.apache.parquet.io.ColumnIOFactory): 1
ConverterConsumer (org.apache.parquet.io.ConverterConsumer): 1
ExpectationValidatingRecordConsumer (org.apache.parquet.io.ExpectationValidatingRecordConsumer): 1
MessageColumnIO (org.apache.parquet.io.MessageColumnIO): 1
RecordConsumerLoggingWrapper (org.apache.parquet.io.RecordConsumerLoggingWrapper): 1
RecordConsumer (org.apache.parquet.io.api.RecordConsumer): 1
PigMetaData (org.apache.parquet.pig.PigMetaData): 1
TupleWriteSupport (org.apache.parquet.pig.TupleWriteSupport): 1
TupleRecordMaterializer (org.apache.parquet.pig.convert.TupleRecordMaterializer): 1