Example 11 with LazySimpleSerDe

Use of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in project hive by apache.

From the class TestLazySimpleSerDe, method testSerDeParameters.

/**
 * Tests the deprecated usage of SerDeParameters.
 */
@Test
@SuppressWarnings("deprecation")
public void testSerDeParameters() throws SerDeException, IOException {
    // Setup
    LazySimpleSerDe serDe = new LazySimpleSerDe();
    Configuration conf = new Configuration();
    MyTestClass row = new MyTestClass();
    ExtraTypeInfo extraTypeInfo = new ExtraTypeInfo();
    row.randomFill(new Random(1234), extraTypeInfo);
    StructObjectInspector rowOI = (StructObjectInspector) ObjectInspectorFactory.getReflectionObjectInspector(MyTestClass.class, ObjectInspectorOptions.JAVA);
    String fieldNames = ObjectInspectorUtils.getFieldNames(rowOI);
    String fieldTypes = ObjectInspectorUtils.getFieldTypes(rowOI);
    Properties schema = new Properties();
    schema.setProperty(serdeConstants.LIST_COLUMNS, fieldNames);
    schema.setProperty(serdeConstants.LIST_COLUMN_TYPES, fieldTypes);
    SerDeUtils.initializeSerDe(serDe, conf, schema, null);
    SerDeParameters serdeParams = LazySimpleSerDe.initSerdeParams(conf, schema, "testSerdeName");
    // Test
    LazyStruct data = (LazyStruct) serializeAndDeserialize(row, rowOI, serDe, serdeParams);
    assertEquals((boolean) row.myBool, ((LazyBoolean) data.getField(0)).getWritableObject().get());
    assertEquals((int) row.myInt, ((LazyInteger) data.getField(3)).getWritableObject().get());
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) Random(java.util.Random) MyTestClass(org.apache.hadoop.hive.serde2.binarysortable.MyTestClass) ExtraTypeInfo(org.apache.hadoop.hive.serde2.binarysortable.MyTestPrimitiveClass.ExtraTypeInfo) SerDeParameters(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.SerDeParameters) Properties(java.util.Properties) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector) Test(org.junit.Test)
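Note: SerDeParameters is the deprecated predecessor of org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters, which the later examples on this page use. As a minimal sketch, and assuming the same conf and schema objects from the test above, the deprecated initSerdeParams call could be replaced like this:

// Non-deprecated equivalent of LazySimpleSerDe.initSerdeParams(conf, schema, name):
LazySerDeParameters lazyParams =
        new LazySerDeParameters(conf, schema, LazySimpleSerDe.class.getName());
byte[] separators = lazyParams.getSeparators(); // same accessors as SerDeParameters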

Example 12 with LazySimpleSerDe

Use of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in project hive by apache.

From the class TestLazySimpleSerDe, method testLazySimpleSerDeExtraColumns.

/**
 * Test the LazySimpleSerDe class with extra columns.
 */
@Test
public void testLazySimpleSerDeExtraColumns() throws Throwable {
    try {
        // Create the SerDe
        LazySimpleSerDe serDe = new LazySimpleSerDe();
        Configuration conf = new Configuration();
        Properties tbl = createProperties();
        SerDeUtils.initializeSerDe(serDe, conf, tbl, null);
        // Data
        Text t = new Text("123\t456\t789\t1000\t5.3\thive and hadoop\t1.\ta\tb\t");
        String s = "123\t456\t789\t1000\t5.3\thive and hadoop\t1\ta";
        Object[] expectedFieldsData = { new ByteWritable((byte) 123), new ShortWritable((short) 456), new IntWritable(789), new LongWritable(1000), new DoubleWritable(5.3), new Text("hive and hadoop"), new IntWritable(1), new Text("a") };
        // Test
        deserializeAndSerialize(serDe, t, s, expectedFieldsData);
    } catch (Throwable e) {
        e.printStackTrace();
        throw e;
    }
}
Also used : Configuration(org.apache.hadoop.conf.Configuration) Text(org.apache.hadoop.io.Text) DoubleWritable(org.apache.hadoop.hive.serde2.io.DoubleWritable) Properties(java.util.Properties) ShortWritable(org.apache.hadoop.hive.serde2.io.ShortWritable) LongWritable(org.apache.hadoop.io.LongWritable) ByteWritable(org.apache.hadoop.hive.serde2.io.ByteWritable) IntWritable(org.apache.hadoop.io.IntWritable)
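The createProperties() helper referenced above is not shown on this page; a hedged reconstruction of what it must provide (the column names are assumptions chosen to match the eight expected fields, not copied from the test):

private Properties createProperties() {
    Properties tbl = new Properties();
    // "9" is the ASCII code of '\t', the field separator used in the data above.
    tbl.setProperty(serdeConstants.SERIALIZATION_FORMAT, "9");
    tbl.setProperty(serdeConstants.LIST_COLUMNS,
            "abyte,ashort,aint,along,adouble,astring,anullint,anullstring");
    tbl.setProperty(serdeConstants.LIST_COLUMN_TYPES,
            "tinyint:smallint:int:bigint:double:string:int:string");
    return tbl;
}

Because only eight columns are declared, the input fields past the eighth ("b" and the empty trailing field) are dropped on deserialization, which is exactly the extra-columns behavior this test verifies.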

Example 13 with LazySimpleSerDe

Use of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in project hive by apache.

From the class TestCrossMapEqualComparer, method serializeAndDeserialize.

Object serializeAndDeserialize(TextStringMapHolder o1, StructObjectInspector oi1, LazySimpleSerDe serde, LazySerDeParameters serdeParams) throws IOException, SerDeException {
    ByteStream.Output serializeStream = new ByteStream.Output();
    LazySimpleSerDe.serialize(serializeStream, o1, oi1, serdeParams.getSeparators(), 0, serdeParams.getNullSequence(), serdeParams.isEscaped(), serdeParams.getEscapeChar(), serdeParams.getNeedsEscape());
    Text t = new Text(serializeStream.toByteArray());
    return serde.deserialize(t);
}
Also used : ByteStream(org.apache.hadoop.hive.serde2.ByteStream) Text(org.apache.hadoop.io.Text)
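A caller of this helper needs a LazySerDeParameters built from the same table properties the SerDe was initialized with. A minimal sketch (the column name and type are assumptions for illustration, not taken from the test class):

Properties tbl = new Properties();
tbl.setProperty(serdeConstants.LIST_COLUMNS, "mp");
tbl.setProperty(serdeConstants.LIST_COLUMN_TYPES, "map<string,string>");
Configuration conf = new Configuration();
LazySimpleSerDe serde = new LazySimpleSerDe();
SerDeUtils.initializeSerDe(serde, conf, tbl, null);
LazySerDeParameters serdeParams =
        new LazySerDeParameters(conf, tbl, LazySimpleSerDe.class.getName());

The identical helper in TestSimpleMapEqualComparer below would be driven the same way.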

Example 14 with LazySimpleSerDe

Use of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in project hive by apache.

From the class TestSimpleMapEqualComparer, method serializeAndDeserialize.

Object serializeAndDeserialize(TextStringMapHolder o1, StructObjectInspector oi1, LazySimpleSerDe serde, LazySerDeParameters serdeParams) throws IOException, SerDeException {
    ByteStream.Output serializeStream = new ByteStream.Output();
    LazySimpleSerDe.serialize(serializeStream, o1, oi1, serdeParams.getSeparators(), 0, serdeParams.getNullSequence(), serdeParams.isEscaped(), serdeParams.getEscapeChar(), serdeParams.getNeedsEscape());
    Text t = new Text(serializeStream.toByteArray());
    return serde.deserialize(t);
}
Also used : ByteStream(org.apache.hadoop.hive.serde2.ByteStream) Text(org.apache.hadoop.io.Text)

Example 15 with LazySimpleSerDe

Use of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in project hive by apache.

From the class HCatBaseStorer, method getJavaObj.

/**
 * Converts a Pig value object to the corresponding Hive value object.
 * This method assumes that {@link #validateSchema(org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema, org.apache.hive.hcatalog.data.schema.HCatFieldSchema, org.apache.pig.impl.logicalLayer.schema.Schema, org.apache.hive.hcatalog.data.schema.HCatSchema, int)},
 * which checks that the types in the Pig schema are compatible with the target Hive table, has already been called.
 */
private Object getJavaObj(Object pigObj, HCatFieldSchema hcatFS) throws HCatException, BackendException {
    try {
        if (pigObj == null)
            return null;
        // The real work-horse. Spend time and energy in this method if there is
        // need to keep HCatStorer lean and go fast.
        Type type = hcatFS.getType();
        switch(type) {
            case BINARY:
                return ((DataByteArray) pigObj).get();
            case STRUCT:
                HCatSchema structSubSchema = hcatFS.getStructSubSchema();
                // Unwrap the tuple.
                List<Object> all = ((Tuple) pigObj).getAll();
                ArrayList<Object> converted = new ArrayList<Object>(all.size());
                for (int i = 0; i < all.size(); i++) {
                    converted.add(getJavaObj(all.get(i), structSubSchema.get(i)));
                }
                return converted;
            case ARRAY:
                // Unwrap the bag.
                DataBag pigBag = (DataBag) pigObj;
                HCatFieldSchema tupFS = hcatFS.getArrayElementSchema().get(0);
                boolean needTuple = tupFS.getType() == Type.STRUCT;
                List<Object> bagContents = new ArrayList<Object>((int) pigBag.size());
                Iterator<Tuple> bagItr = pigBag.iterator();
                while (bagItr.hasNext()) {
                    // If there is only one element in tuple contained in bag, we throw away the tuple.
                    bagContents.add(getJavaObj(needTuple ? bagItr.next() : bagItr.next().get(0), tupFS));
                }
                return bagContents;
            case MAP:
                Map<?, ?> pigMap = (Map<?, ?>) pigObj;
                Map<Object, Object> typeMap = new HashMap<Object, Object>();
                for (Entry<?, ?> entry : pigMap.entrySet()) {
                    // the value has a schema and not a FieldSchema
                    // Schema validation enforces that the key is a String.
                    typeMap.put((String) entry.getKey(),
                            getJavaObj(entry.getValue(), hcatFS.getMapValueSchema().get(0)));
                }
                return typeMap;
            case STRING:
            case INT:
            case BIGINT:
            case FLOAT:
            case DOUBLE:
                return pigObj;
            case SMALLINT:
                if ((Integer) pigObj < Short.MIN_VALUE || (Integer) pigObj > Short.MAX_VALUE) {
                    handleOutOfRangeValue(pigObj, hcatFS);
                    return null;
                }
                return ((Integer) pigObj).shortValue();
            case TINYINT:
                if ((Integer) pigObj < Byte.MIN_VALUE || (Integer) pigObj > Byte.MAX_VALUE) {
                    handleOutOfRangeValue(pigObj, hcatFS);
                    return null;
                }
                return ((Integer) pigObj).byteValue();
            case BOOLEAN:
                if (pigObj instanceof String) {
                    if (((String) pigObj).trim().compareTo("0") == 0) {
                        return Boolean.FALSE;
                    }
                    if (((String) pigObj).trim().compareTo("1") == 0) {
                        return Boolean.TRUE;
                    }
                    throw new BackendException("Unexpected type " + type + " for value " + pigObj + " of class " + pigObj.getClass().getName(), PigHCatUtil.PIG_EXCEPTION_CODE);
                }
                return Boolean.parseBoolean(pigObj.toString());
            case DECIMAL:
                BigDecimal bd = (BigDecimal) pigObj;
                DecimalTypeInfo dti = (DecimalTypeInfo) hcatFS.getTypeInfo();
                if (bd.precision() > dti.precision() || bd.scale() > dti.scale()) {
                    handleOutOfRangeValue(pigObj, hcatFS);
                    return null;
                }
                return HiveDecimal.create(bd);
            case CHAR:
                String charVal = (String) pigObj;
                CharTypeInfo cti = (CharTypeInfo) hcatFS.getTypeInfo();
                if (charVal.length() > cti.getLength()) {
                    handleOutOfRangeValue(pigObj, hcatFS);
                    return null;
                }
                return new HiveChar(charVal, cti.getLength());
            case VARCHAR:
                String varcharVal = (String) pigObj;
                VarcharTypeInfo vti = (VarcharTypeInfo) hcatFS.getTypeInfo();
                if (varcharVal.length() > vti.getLength()) {
                    handleOutOfRangeValue(pigObj, hcatFS);
                    return null;
                }
                return new HiveVarchar(varcharVal, vti.getLength());
            case TIMESTAMP:
                DateTime dt = (DateTime) pigObj;
                // getMillis() returns UTC time regardless of TZ
                return new Timestamp(dt.getMillis());
            case DATE:
                /*
                 * We ignore any TZ setting on the Pig value since java.sql.Date doesn't carry one in any
                 * meaningful way. If the Pig value has a 0 time component (midnight), we assume it
                 * reasonably 'fits' into a Hive DATE; if the time part is not 0, it's considered out of
                 * range for the target type.
                 */
                DateTime dateTime = ((DateTime) pigObj);
                if (dateTime.getMillisOfDay() != 0) {
                    handleOutOfRangeValue(pigObj, hcatFS, "Time component must be 0 (midnight) in local timezone; Local TZ val='" + pigObj + "'");
                    return null;
                }
                /*
                 * java.sql.Date is a poorly defined API. Some (all?) SerDes call toString() on it
                 * (e.g. LazySimpleSerDe, via LazyUtils.writePrimitiveUTF8()), which automatically
                 * adjusts for the local timezone. Date.valueOf() also uses the local timezone, as does
                 * Date(int, int, int). See also PigHCatUtil#extractPigObject() for the corresponding
                 * read op. This way a DATETIME from Pig, when stored into Hive and read back, comes
                 * back with the same value.
                 */
                return new Date(dateTime.getYear() - 1900, dateTime.getMonthOfYear() - 1, dateTime.getDayOfMonth());
            default:
                throw new BackendException("Unexpected HCat type " + type + " for value " + pigObj + " of class " + pigObj.getClass().getName(), PigHCatUtil.PIG_EXCEPTION_CODE);
        }
    } catch (BackendException e) {
        // provide the path to the field in the error message
        throw new BackendException((hcatFS.getName() == null ? " " : hcatFS.getName() + ".") + e.getMessage(), e);
    }
}
Also used : VarcharTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.VarcharTypeInfo) HashMap(java.util.HashMap) ArrayList(java.util.ArrayList) HiveChar(org.apache.hadoop.hive.common.type.HiveChar) Timestamp(java.sql.Timestamp) DateTime(org.joda.time.DateTime) HCatSchema(org.apache.hive.hcatalog.data.schema.HCatSchema) DataByteArray(org.apache.pig.data.DataByteArray) DataBag(org.apache.pig.data.DataBag) CharTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.CharTypeInfo) HiveVarchar(org.apache.hadoop.hive.common.type.HiveVarchar) BigDecimal(java.math.BigDecimal) Date(java.sql.Date) HCatFieldSchema(org.apache.hive.hcatalog.data.schema.HCatFieldSchema) BackendException(org.apache.pig.backend.BackendException) DecimalTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo) DataType(org.apache.pig.data.DataType) Type(org.apache.hive.hcatalog.data.schema.HCatFieldSchema.Type) Map(java.util.Map) HashMap(java.util.HashMap) Tuple(org.apache.pig.data.Tuple)
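Two of the narrowing conversions above are easy to get wrong, so here is a small standalone illustration of the same checks (the class and method names are hypothetical, not from HCatBaseStorer; only the range and midnight rules are taken from the method above):

import org.joda.time.DateTime;

public class ConversionChecks {

    // Mirrors the SMALLINT case: a Pig int must fit in 16 bits.
    static short toSmallint(int pigValue) {
        if (pigValue < Short.MIN_VALUE || pigValue > Short.MAX_VALUE) {
            throw new IllegalArgumentException("out of range for SMALLINT: " + pigValue);
        }
        return (short) pigValue;
    }

    // Mirrors the DATE case: the Pig DATETIME must sit exactly at local midnight.
    static java.sql.Date toHiveDate(DateTime dt) {
        if (dt.getMillisOfDay() != 0) {
            throw new IllegalArgumentException("time component must be 0 (midnight): " + dt);
        }
        // Same deprecated constructor getJavaObj uses: year offset from 1900, 0-based month.
        return new java.sql.Date(dt.getYear() - 1900, dt.getMonthOfYear() - 1, dt.getDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(toSmallint(456));                              // 456
        System.out.println(toHiveDate(new DateTime(2024, 1, 15, 0, 0)));  // 2024-01-15
    }
}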

Aggregations

Text (org.apache.hadoop.io.Text): 24
Properties (java.util.Properties): 17
Configuration (org.apache.hadoop.conf.Configuration): 14
LazySimpleSerDe (org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe): 14
StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector): 8
ByteStream (org.apache.hadoop.hive.serde2.ByteStream): 7
LazySerDeParameters (org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters): 6
ByteWritable (org.apache.hadoop.hive.serde2.io.ByteWritable): 5
IntWritable (org.apache.hadoop.io.IntWritable): 5
DoubleWritable (org.apache.hadoop.hive.serde2.io.DoubleWritable): 4
ShortWritable (org.apache.hadoop.hive.serde2.io.ShortWritable): 4
LongWritable (org.apache.hadoop.io.LongWritable): 4
Path (org.apache.hadoop.fs.Path): 3
ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector): 3
Test (org.junit.Test): 3
ArrayList (java.util.ArrayList): 2
Map (java.util.Map): 2
Entry (java.util.Map.Entry): 2
FieldSchema (org.apache.hadoop.hive.metastore.api.FieldSchema): 2
SerDeException (org.apache.hadoop.hive.serde2.SerDeException): 2