Example 1 with InspectableObject

Use of org.apache.hadoop.hive.serde2.objectinspector.InspectableObject in project SQLWindowing by hbutani.

From class QueryOutputPrinter, the method printQueryOutput:

@SuppressWarnings({ "unchecked", "rawtypes" })
public void printQueryOutput(QueryDef qry, HiveConf cfg) throws WindowingException {
    try {
        JobConf jCfg = new JobConf(cfg);
        SerDe outSerDe = setupOutputSerDe(qry, jCfg);
        RowSchema rSchema = getQueryOutputRowSchema(qry, jCfg);
        TableDesc tDesc = setupTableDesc(rSchema);
        tDesc.setDeserializerClass(qry.getOutput().getSerDe().getClass());
        String outputFormatClassName = qry.getOutput().getSpec().getOutputFormatClass();
        Class<? extends OutputFormat> outputFormatClass = (outputFormatClassName != null) ? (Class<? extends OutputFormat>) Class.forName(outputFormatClassName) : SequenceFileOutputFormat.class;
        // TODO: this is a hack; check how this is done in Hive
        tDesc.setInputFileFormatClass(mapToInputFormat(outputFormatClass));
        tDesc.setProperties(qry.getOutput().getSpec().getSerDeProps());
        FetchOperator ftOp = setupFetchOperator(qry, tDesc, jCfg);
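        // Drain the fetch operator: each InspectableObject pairs a row (io.o) with the ObjectInspector that describes it (io.oi)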
        while (true) {
            InspectableObject io = ftOp.getNextRow();
            if (io == null) {
                return;
            }
            String s = ((Text) outSerDe.serialize(io.o, io.oi)).toString();
            printOutput(s);
        }
    } catch (WindowingException we) {
        throw we;
    } catch (Exception e) {
        throw new WindowingException(e);
    }
}
Also used : SerDe(org.apache.hadoop.hive.serde2.SerDe) DelimitedJSONSerDe(org.apache.hadoop.hive.serde2.DelimitedJSONSerDe) InspectableObject(org.apache.hadoop.hive.serde2.objectinspector.InspectableObject) RowSchema(org.apache.hadoop.hive.ql.exec.RowSchema) WindowingException(com.sap.hadoop.windowing.WindowingException) Text(org.apache.hadoop.io.Text) TableDesc(org.apache.hadoop.hive.ql.plan.TableDesc) JobConf(org.apache.hadoop.mapred.JobConf) FetchOperator(org.apache.hadoop.hive.ql.exec.FetchOperator)
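
The while loop above is the whole pattern: InspectableObject carries both the row and the inspector needed to serialize it. A minimal sketch of the fetch-and-serialize step in isolation, wrapped in a hypothetical helper; it assumes ftOp and outSerDe are configured as in the example, and that the SerDe serializes to Text (as DelimitedJSONSerDe does):

// Sketch: drain a FetchOperator and serialize each row with the output SerDe.
// ftOp and outSerDe are assumed to be set up exactly as in the example above.
static void drainAndPrint(FetchOperator ftOp, SerDe outSerDe) throws Exception {
    InspectableObject io;
    while ((io = ftOp.getNextRow()) != null) {
        // io.o holds the raw row; io.oi is the ObjectInspector describing it
        Text line = (Text) outSerDe.serialize(io.o, io.oi);
        System.out.println(line.toString());
    }
}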

Example 2 with InspectableObject

Use of org.apache.hadoop.hive.serde2.objectinspector.InspectableObject in project hive by apache.

From class MapredLocalTask, the method startForward:

private void startForward(boolean inputFileChangeSensitive, String bigTableBucket) throws Exception {
    for (Operator<?> source : work.getAliasToWork().values()) {
        source.reset();
    }
    if (inputFileChangeSensitive) {
        execContext.setCurrentBigBucketFile(bigTableBucket);
    }
    for (Map.Entry<String, FetchOperator> entry : fetchOperators.entrySet()) {
        String alias = entry.getKey();
        FetchOperator fetchOp = entry.getValue();
        if (inputFileChangeSensitive) {
            fetchOp.clearFetchContext();
            setUpFetchOpContext(fetchOp, alias, bigTableBucket);
        }
        // get the root operator
        Operator<? extends OperatorDesc> forwardOp = work.getAliasToWork().get(alias);
        // walk through the operator tree
        while (!forwardOp.getDone()) {
            InspectableObject row = fetchOp.getNextRow();
            if (row == null) {
                break;
            }
            forwardOp.process(row.o, 0);
        }
        forwardOp.flush();
    }
    for (Operator<?> source : work.getAliasToWork().values()) {
        source.close(false);
    }
}
Also used : InspectableObject(org.apache.hadoop.hive.serde2.objectinspector.InspectableObject) Map(java.util.Map) HashMap(java.util.HashMap) FetchOperator(org.apache.hadoop.hive.ql.exec.FetchOperator)
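
The inner while loop is the standard way to pump a local fetch into an operator tree: pull an InspectableObject, push only its raw object into the root operator. Pulled out on its own as a hypothetical helper (both operators assumed initialized as in the method above):

// Sketch: feed rows from a FetchOperator into the root of an operator tree.
static void forwardAll(FetchOperator fetchOp, Operator<? extends OperatorDesc> forwardOp) throws Exception {
    InspectableObject row;
    while (!forwardOp.getDone() && (row = fetchOp.getNextRow()) != null) {
        forwardOp.process(row.o, 0); // tag 0: rows arrive from the operator's only parent
    }
    forwardOp.flush();
}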

Example 3 with InspectableObject

Use of org.apache.hadoop.hive.serde2.objectinspector.InspectableObject in project hive by apache.

From class ColStatsProcessor, the method constructColumnStatsFromPackedRows:

private boolean constructColumnStatsFromPackedRows(Table tbl, List<ColumnStatistics> stats, long maxNumStats) throws HiveException, MetaException, IOException {
    String partName = null;
    List<String> colName = colStatDesc.getColName();
    List<String> colType = colStatDesc.getColType();
    boolean isTblLevel = colStatDesc.isTblLevel();
    InspectableObject packedRow;
    long numStats = 0;
    while ((packedRow = ftOp.getNextRow()) != null) {
        if (packedRow.oi.getCategory() != ObjectInspector.Category.STRUCT) {
            throw new HiveException("Unexpected object type encountered while unpacking row");
        }
        final List<ColumnStatisticsObj> statsObjs = new ArrayList<>();
        final StructObjectInspector soi = (StructObjectInspector) packedRow.oi;
        final List<? extends StructField> fields = soi.getAllStructFieldRefs();
        final List<Object> values = soi.getStructFieldsDataAsList(packedRow.o);
        // Partition columns are appended at end, we only care about stats column
        int pos = 0;
        for (int i = 0; i < colName.size(); i++) {
            String columnName = colName.get(i);
            String columnType = colType.get(i);
            PrimitiveTypeInfo typeInfo = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromTypeString(columnType);
            List<ColumnStatsField> columnStatsFields = ColumnStatsType.getColumnStats(typeInfo);
            try {
                ColumnStatisticsObj statObj = ColumnStatisticsObjTranslator.readHiveColumnStatistics(columnName, columnType, columnStatsFields, pos, fields, values);
                statsObjs.add(statObj);
                numStats++;
            } catch (Exception e) {
                if (isStatsReliable) {
                    throw new HiveException("Statistics collection failed while (hive.stats.reliable)", e);
                } else {
                    LOG.debug("Because {} is infinite or NaN, we skip stats.", columnName, e);
                }
            }
            pos += columnStatsFields.size();
        }
        if (!statsObjs.isEmpty()) {
            if (!isTblLevel) {
                List<FieldSchema> partColSchema = tbl.getPartCols();
                List<String> partVals = new ArrayList<>();
                // Iterate over partition columns to figure out partition name
                for (int i = pos; i < pos + partColSchema.size(); i++) {
                    Object partVal = ((PrimitiveObjectInspector) fields.get(i).getFieldObjectInspector()).getPrimitiveJavaObject(values.get(i));
                    // could be null for the default partition
                    partVals.add(partVal == null ? this.conf.getVar(ConfVars.DEFAULTPARTITIONNAME) : partVal.toString());
                }
                partName = Warehouse.makePartName(partColSchema, partVals);
            }
            ColumnStatisticsDesc statsDesc = buildColumnStatsDesc(tbl, partName, isTblLevel);
            ColumnStatistics colStats = new ColumnStatistics();
            colStats.setStatsDesc(statsDesc);
            colStats.setStatsObj(statsObjs);
            colStats.setEngine(Constants.HIVE_ENGINE);
            stats.add(colStats);
            if (numStats >= maxNumStats) {
                return false;
            }
        }
    }
    ftOp.clearFetchContext();
    return true;
}
Also used : ColumnStatistics(org.apache.hadoop.hive.metastore.api.ColumnStatistics) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) ArrayList(java.util.ArrayList) PrimitiveTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo) MetaException(org.apache.hadoop.hive.metastore.api.MetaException) SemanticException(org.apache.hadoop.hive.ql.parse.SemanticException) IOException(java.io.IOException) InspectableObject(org.apache.hadoop.hive.serde2.objectinspector.InspectableObject) ColumnStatisticsObj(org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj) ColumnStatisticsDesc(org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc) PrimitiveObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector)
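
The first few lines of the loop body are the generic recipe for unpacking any struct-typed InspectableObject: cast the inspector, list the field refs, and read the field data. A standalone sketch of just that step, as a hypothetical helper:

// Sketch: generic unpacking of a struct-typed InspectableObject.
// Valid for any row whose ObjectInspector category is STRUCT.
static void dumpStructRow(InspectableObject packedRow) {
    StructObjectInspector soi = (StructObjectInspector) packedRow.oi;
    List<? extends StructField> fields = soi.getAllStructFieldRefs();
    List<Object> values = soi.getStructFieldsDataAsList(packedRow.o);
    for (int i = 0; i < fields.size(); i++) {
        StructField f = fields.get(i);
        System.out.println(f.getFieldName() + " (" + f.getFieldObjectInspector().getTypeName() + ") = " + values.get(i));
    }
}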

Example 4 with InspectableObject

Use of org.apache.hadoop.hive.serde2.objectinspector.InspectableObject in project hive by apache.

From class TestOperators, the method testMapOperator:

@Test
public void testMapOperator() throws Throwable {
    try {
        System.out.println("Testing Map Operator");
        // initialize configuration
        JobConf hconf = new JobConf(TestOperators.class);
        hconf.set(MRJobConfig.MAP_INPUT_FILE, "hdfs:///testDir/testFile");
        IOContextMap.get(hconf).setInputPath(new Path("hdfs:///testDir/testFile"));
        // initialize pathToAliases
        List<String> aliases = new ArrayList<String>();
        aliases.add("a");
        aliases.add("b");
        Map<Path, List<String>> pathToAliases = new LinkedHashMap<>();
        pathToAliases.put(new Path("hdfs:///testDir"), aliases);
        // initialize pathToTableInfo
        // Default: treat the table as a single column "col"
        TableDesc td = Utilities.defaultTd;
        PartitionDesc pd = new PartitionDesc(td, null);
        LinkedHashMap<Path, org.apache.hadoop.hive.ql.plan.PartitionDesc> pathToPartitionInfo = new LinkedHashMap<>();
        pathToPartitionInfo.put(new Path("hdfs:///testDir"), pd);
        // initialize aliasToWork
        CompilationOpContext ctx = new CompilationOpContext();
        CollectDesc cd = new CollectDesc(Integer.valueOf(1));
        CollectOperator cdop1 = (CollectOperator) OperatorFactory.get(ctx, CollectDesc.class);
        cdop1.setConf(cd);
        CollectOperator cdop2 = (CollectOperator) OperatorFactory.get(ctx, CollectDesc.class);
        cdop2.setConf(cd);
        LinkedHashMap<String, Operator<? extends OperatorDesc>> aliasToWork = new LinkedHashMap<String, Operator<? extends OperatorDesc>>();
        aliasToWork.put("a", cdop1);
        aliasToWork.put("b", cdop2);
        // initialize mapredWork
        MapredWork mrwork = new MapredWork();
        mrwork.getMapWork().setPathToAliases(pathToAliases);
        mrwork.getMapWork().setPathToPartitionInfo(pathToPartitionInfo);
        mrwork.getMapWork().setAliasToWork(aliasToWork);
        // get map operator and initialize it
        MapOperator mo = new MapOperator(new CompilationOpContext());
        mo.initializeAsRoot(hconf, mrwork.getMapWork());
        Text tw = new Text();
        InspectableObject io1 = new InspectableObject();
        InspectableObject io2 = new InspectableObject();
        for (int i = 0; i < 5; i++) {
            String answer = "[[" + i + ", " + (i + 1) + ", " + (i + 2) + "]]";
            tw.set("" + i + "\u0001" + (i + 1) + "\u0001" + (i + 2));
            mo.process(tw);
            cdop1.retrieve(io1);
            cdop2.retrieve(io2);
            System.out.println("io1.o.toString() = " + io1.o.toString());
            System.out.println("io2.o.toString() = " + io2.o.toString());
            System.out.println("answer.toString() = " + answer.toString());
            assertEquals(answer.toString(), io1.o.toString());
            assertEquals(answer.toString(), io2.o.toString());
        }
        System.out.println("Map Operator ok");
    } catch (Throwable e) {
        e.printStackTrace();
        throw (e);
    }
}
Also used : VectorSelectOperator(org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator) ArrayList(java.util.ArrayList) LinkedHashMap(java.util.LinkedHashMap) InspectableObject(org.apache.hadoop.hive.serde2.objectinspector.InspectableObject) MapredWork(org.apache.hadoop.hive.ql.plan.MapredWork) List(java.util.List) ArrayList(java.util.ArrayList) JobConf(org.apache.hadoop.mapred.JobConf) Path(org.apache.hadoop.fs.Path) CollectDesc(org.apache.hadoop.hive.ql.plan.CollectDesc) Text(org.apache.hadoop.io.Text) CompilationOpContext(org.apache.hadoop.hive.ql.CompilationOpContext) PartitionDesc(org.apache.hadoop.hive.ql.plan.PartitionDesc) TableDesc(org.apache.hadoop.hive.ql.plan.TableDesc) OperatorDesc(org.apache.hadoop.hive.ql.plan.OperatorDesc) Test(org.junit.Test)
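
Here InspectableObject appears on the consumer side: the test hands each CollectOperator an empty instance, and retrieve fills in both fields in place. A minimal sketch of that handshake (cdop1 as initialized above):

// Sketch: CollectOperator.retrieve(...) mutates a caller-supplied InspectableObject.
InspectableObject io = new InspectableObject(); // o and oi start out null
cdop1.retrieve(io); // after this call, io.o is the buffered row and io.oi describes it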

Example 5 with InspectableObject

Use of org.apache.hadoop.hive.serde2.objectinspector.InspectableObject in project hive by apache.

From class TestOperators, the method testScriptOperator:

@Test
public void testScriptOperator() throws Throwable {
    try {
        System.out.println("Testing Script Operator");
        // col1
        ExprNodeDesc exprDesc1 = new ExprNodeColumnDesc(TypeInfoFactory.stringTypeInfo, "col1", "", false);
        // col2: concat(col0, "1")
        ExprNodeDesc expr1 = new ExprNodeColumnDesc(TypeInfoFactory.stringTypeInfo, "col0", "", false);
        ExprNodeDesc expr2 = new ExprNodeConstantDesc("1");
        ExprNodeDesc exprDesc2 = ExprNodeTypeCheck.getExprNodeDefaultExprProcessor().getFuncExprNodeDesc("concat", expr1, expr2);
        // select operator to project these two columns
        ArrayList<ExprNodeDesc> earr = new ArrayList<ExprNodeDesc>();
        earr.add(exprDesc1);
        earr.add(exprDesc2);
        ArrayList<String> outputCols = new ArrayList<String>();
        for (int i = 0; i < earr.size(); i++) {
            outputCols.add("_col" + i);
        }
        SelectDesc selectCtx = new SelectDesc(earr, outputCols);
        Operator<SelectDesc> op = OperatorFactory.get(new CompilationOpContext(), SelectDesc.class);
        op.setConf(selectCtx);
        // scriptOperator to echo the output of the select
        TableDesc scriptOutput = PlanUtils.getDefaultTableDesc("" + Utilities.tabCode, "a,b");
        TableDesc scriptInput = PlanUtils.getDefaultTableDesc("" + Utilities.tabCode, "a,b");
        ScriptDesc sd = new ScriptDesc("cat", scriptOutput, TextRecordWriter.class, scriptInput, TextRecordReader.class, TextRecordReader.class, PlanUtils.getDefaultTableDesc("" + Utilities.tabCode, "key"));
        Operator<ScriptDesc> sop = OperatorFactory.getAndMakeChild(sd, op);
        // Collect operator to observe the output of the script
        CollectDesc cd = new CollectDesc(Integer.valueOf(10));
        CollectOperator cdop = (CollectOperator) OperatorFactory.getAndMakeChild(cd, sop);
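        // r: InspectableObject[] of test input rows, prepared elsewhere in this test class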
        op.initialize(new JobConf(TestOperators.class), new ObjectInspector[] { r[0].oi });
        // evaluate on row
        for (int i = 0; i < 5; i++) {
            op.process(r[i].o, 0);
        }
        op.close(false);
        InspectableObject io = new InspectableObject();
        for (int i = 0; i < 5; i++) {
            cdop.retrieve(io);
            System.out.println("[" + i + "] io.o=" + io.o);
            System.out.println("[" + i + "] io.oi=" + io.oi);
            StructObjectInspector soi = (StructObjectInspector) io.oi;
            assert (soi != null);
            StructField a = soi.getStructFieldRef("a");
            StructField b = soi.getStructFieldRef("b");
            assertEquals("" + (i + 1), ((PrimitiveObjectInspector) a.getFieldObjectInspector()).getPrimitiveJavaObject(soi.getStructFieldData(io.o, a)));
            assertEquals((i) + "1", ((PrimitiveObjectInspector) b.getFieldObjectInspector()).getPrimitiveJavaObject(soi.getStructFieldData(io.o, b)));
        }
        System.out.println("Script Operator ok");
    } catch (Throwable e) {
        e.printStackTrace();
        throw e;
    }
}
Also used : ScriptDesc(org.apache.hadoop.hive.ql.plan.ScriptDesc) ExprNodeConstantDesc(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc) CollectDesc(org.apache.hadoop.hive.ql.plan.CollectDesc) ArrayList(java.util.ArrayList) InspectableObject(org.apache.hadoop.hive.serde2.objectinspector.InspectableObject) StructField(org.apache.hadoop.hive.serde2.objectinspector.StructField) CompilationOpContext(org.apache.hadoop.hive.ql.CompilationOpContext) ExprNodeColumnDesc(org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc) ExprNodeDesc(org.apache.hadoop.hive.ql.plan.ExprNodeDesc) SelectDesc(org.apache.hadoop.hive.ql.plan.SelectDesc) VectorSelectDesc(org.apache.hadoop.hive.ql.plan.VectorSelectDesc) TableDesc(org.apache.hadoop.hive.ql.plan.TableDesc) JobConf(org.apache.hadoop.mapred.JobConf) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector) Test(org.junit.Test)
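
The rows in r come from the test fixture, but a row/inspector pair like this can be hand-built by matching a Java object against a standard ObjectInspector. A hypothetical sketch using Hive's ObjectInspectorFactory and PrimitiveObjectInspectorFactory (the column names and values are illustrative, not taken from the test):

// Sketch: hand-building an InspectableObject by pairing a Java List with a
// standard struct inspector whose field inspectors match the element types.
List<String> names = Arrays.asList("col0", "col1", "col2");
List<ObjectInspector> fieldOIs = Arrays.asList(
        PrimitiveObjectInspectorFactory.javaStringObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector);
StructObjectInspector rowOI = ObjectInspectorFactory.getStandardStructObjectInspector(names, fieldOIs);
Object row = Arrays.asList("0", "1", "2"); // data laid out as the inspector expects
InspectableObject io = new InspectableObject(row, rowOI);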

Aggregations

InspectableObject (org.apache.hadoop.hive.serde2.objectinspector.InspectableObject): 19
ArrayList (java.util.ArrayList): 9
StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector): 7
ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector): 5
CompilationOpContext (org.apache.hadoop.hive.ql.CompilationOpContext): 4
HiveException (org.apache.hadoop.hive.ql.metadata.HiveException): 4
CollectDesc (org.apache.hadoop.hive.ql.plan.CollectDesc): 4
ExprNodeDesc (org.apache.hadoop.hive.ql.plan.ExprNodeDesc): 4
PrimitiveObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector): 4
JobConf (org.apache.hadoop.mapred.JobConf): 4
Test (org.junit.Test): 4
IOException (java.io.IOException): 3
ColumnStatistics (org.apache.hadoop.hive.metastore.api.ColumnStatistics): 3
ColumnStatisticsDesc (org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc): 3
ColumnStatisticsObj (org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj): 3
FieldSchema (org.apache.hadoop.hive.metastore.api.FieldSchema): 3
ExprNodeColumnDesc (org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc): 3
TableDesc (org.apache.hadoop.hive.ql.plan.TableDesc): 3
StructField (org.apache.hadoop.hive.serde2.objectinspector.StructField): 3
Text (org.apache.hadoop.io.Text): 3
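
All five examples lean on the same minimal contract: InspectableObject is nothing more than a mutable pair of a row object and the inspector that knows how to decode it. Simplified, the class looks roughly like this (see the Hive serde2 source for the authoritative definition):

// Simplified shape of org.apache.hadoop.hive.serde2.objectinspector.InspectableObject
public class InspectableObject {
    public Object o;            // the row data, in whatever representation the producer chose
    public ObjectInspector oi;  // knows how to read fields out of o
    public InspectableObject() { }
    public InspectableObject(Object o, ObjectInspector oi) { this.o = o; this.oi = oi; }
}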