
Example 11 with WriteEntity

Use of org.apache.hadoop.hive.ql.hooks.WriteEntity in project incubator-atlas by apache.

From the class HiveHook, method renameTable.

private void renameTable(HiveMetaStoreBridge dgiBridge, HiveEventContext event) throws AtlasHookException {
    try {
        //crappy, but there is no easy way of getting the new name
        assert event.getInputs() != null && event.getInputs().size() == 1;
        assert event.getOutputs() != null && event.getOutputs().size() > 0;
        //Create the entity if it does not exist; otherwise update it
        ReadEntity oldEntity = event.getInputs().iterator().next();
        Table oldTable = oldEntity.getTable();
        for (WriteEntity writeEntity : event.getOutputs()) {
            if (writeEntity.getType() == Entity.Type.TABLE) {
                Table newTable = writeEntity.getTable();
                //Hive sends both the old and the new table names in the outputs, which is odd, so skip the old one with the check below
                if (!newTable.getDbName().equals(oldTable.getDbName()) || !newTable.getTableName().equals(oldTable.getTableName())) {
                    final String oldQualifiedName = HiveMetaStoreBridge.getTableQualifiedName(dgiBridge.getClusterName(), oldTable);
                    final String newQualifiedName = HiveMetaStoreBridge.getTableQualifiedName(dgiBridge.getClusterName(), newTable);
                    //Create/update the old table entity - create an entity with oldQualifiedName and the old table name if it doesn't exist; if it exists, it will be updated
                    //We always use the new entity while creating the table, since some flags and attributes of the table are not set in inputEntity, and Hive.getTable(oldTableName) also fails because the table no longer exists in Hive
                    final LinkedHashMap<Type, Referenceable> tables = createOrUpdateEntities(dgiBridge, event, writeEntity, true);
                    Referenceable tableEntity = tables.get(Type.TABLE);
                    //Reset the regular column QF names to the old name and create a partial notification request that replaces the old column QFName with the new one, retaining any existing traits
                    replaceColumnQFName(event, (List<Referenceable>) tableEntity.get(HiveMetaStoreBridge.COLUMNS), oldQualifiedName, newQualifiedName);
                    //Reset the partition key column QF names to the old name and create a partial notification request that replaces the old column QFName with the new one, retaining any existing traits
                    replaceColumnQFName(event, (List<Referenceable>) tableEntity.get(HiveMetaStoreBridge.PART_COLS), oldQualifiedName, newQualifiedName);
                    //Reset the SD QF name to the old name and create a partial notification request that replaces the old SD QFName with the new one, retaining any existing traits
                    replaceSDQFName(event, tableEntity, oldQualifiedName, newQualifiedName);
                    //Reset the table QF name to the old name and create a partial notification request that replaces the old table QFName with the new one
                    replaceTableQFName(event, oldTable, newTable, tableEntity, oldQualifiedName, newQualifiedName);
                }
            }
        }
    } catch (Exception e) {
        throw new AtlasHookException("HiveHook.renameTable() failed.", e);
    }
}
Also used: ReadEntity(org.apache.hadoop.hive.ql.hooks.ReadEntity) Type(org.apache.hadoop.hive.ql.hooks.Entity.Type) TableType(org.apache.hadoop.hive.metastore.TableType) Table(org.apache.hadoop.hive.ql.metadata.Table) Referenceable(org.apache.atlas.typesystem.Referenceable) WriteEntity(org.apache.hadoop.hive.ql.hooks.WriteEntity) AtlasHookException(org.apache.atlas.hook.AtlasHookException) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) MalformedURLException(java.net.MalformedURLException)
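The core of the example is the filter over the hook outputs: find the table entity whose name differs from the single input table. A minimal standalone sketch of just that step, using only the WriteEntity, Entity.Type and Table accessors shown above (the helper class, method name and Set parameter are illustrative assumptions, not part of the Atlas hook):

import java.util.Set;

import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.metadata.Table;

public class RenameOutputFilter {
    // Returns the output table whose name differs from the input table, i.e. the renamed
    // table; Hive reports both the old and the new name in the outputs, so skip the old one.
    static Table findRenamedTable(Table oldTable, Set<WriteEntity> outputs) {
        for (WriteEntity writeEntity : outputs) {
            if (writeEntity.getType() != Entity.Type.TABLE) {
                continue;
            }
            Table candidate = writeEntity.getTable();
            if (!candidate.getDbName().equals(oldTable.getDbName())
                    || !candidate.getTableName().equals(oldTable.getTableName())) {
                return candidate;
            }
        }
        return null;
    }
}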

Example 12 with WriteEntity

Use of org.apache.hadoop.hive.ql.hooks.WriteEntity in project incubator-atlas by apache.

From the class HiveHook, method getProcessQualifiedName.

@VisibleForTesting
static String getProcessQualifiedName(HiveMetaStoreBridge dgiBridge, HiveEventContext eventContext, final SortedSet<ReadEntity> sortedHiveInputs, final SortedSet<WriteEntity> sortedHiveOutputs, SortedMap<ReadEntity, Referenceable> hiveInputsMap, SortedMap<WriteEntity, Referenceable> hiveOutputsMap) throws HiveException {
    HiveOperation op = eventContext.getOperation();
    if (isCreateOp(eventContext)) {
        Entity entity = getEntityByType(sortedHiveOutputs, Type.TABLE);
        if (entity != null) {
            Table outTable = entity.getTable();
            //refresh table
            outTable = dgiBridge.hiveClient.getTable(outTable.getDbName(), outTable.getTableName());
            return HiveMetaStoreBridge.getTableProcessQualifiedName(dgiBridge.getClusterName(), outTable);
        }
    }
    StringBuilder buffer = new StringBuilder(op.getOperationName());
    boolean ignoreHDFSPathsinQFName = ignoreHDFSPathsinQFName(op, sortedHiveInputs, sortedHiveOutputs);
    if (ignoreHDFSPathsinQFName && LOG.isDebugEnabled()) {
        LOG.debug("Ignoring HDFS paths in qualifiedName for {} {} ", op, eventContext.getQueryStr());
    }
    addInputs(dgiBridge, op, sortedHiveInputs, buffer, hiveInputsMap, ignoreHDFSPathsinQFName);
    buffer.append(IO_SEP);
    addOutputs(dgiBridge, op, sortedHiveOutputs, buffer, hiveOutputsMap, ignoreHDFSPathsinQFName);
    LOG.info("Setting process qualified name to {}", buffer);
    return buffer.toString();
}
Also used: HiveOperation(org.apache.hadoop.hive.ql.plan.HiveOperation) WriteEntity(org.apache.hadoop.hive.ql.hooks.WriteEntity) ReadEntity(org.apache.hadoop.hive.ql.hooks.ReadEntity) Entity(org.apache.hadoop.hive.ql.hooks.Entity) Table(org.apache.hadoop.hive.ql.metadata.Table) VisibleForTesting(com.google.common.annotations.VisibleForTesting)
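The qualified name assembled above is essentially the operation name, followed by the sorted inputs, an input/output separator, and the sorted outputs. A simplified sketch of that string assembly with plain strings in place of the Referenceable maps; the separator constants below are assumptions standing in for HiveHook's real IO_SEP and entity separator, which may differ:

import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class ProcessNameSketch {
    // Hypothetical separators; the actual HiveHook constants may differ.
    private static final String ENTITY_SEP = ":";
    private static final String IO_SEP = "->";

    static String processQualifiedName(String operationName, SortedSet<String> inputNames, SortedSet<String> outputNames) {
        StringBuilder buffer = new StringBuilder(operationName);
        for (String in : inputNames) {
            buffer.append(ENTITY_SEP).append(in);
        }
        buffer.append(IO_SEP);
        for (String out : outputNames) {
            buffer.append(ENTITY_SEP).append(out);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        SortedSet<String> inputs = new TreeSet<>(Arrays.asList("default.src@cluster"));
        SortedSet<String> outputs = new TreeSet<>(Arrays.asList("default.dest@cluster"));
        // prints QUERY:default.src@cluster->:default.dest@cluster (with the assumed separators)
        System.out.println(processQualifiedName("QUERY", inputs, outputs));
    }
}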

Example 13 with WriteEntity

Use of org.apache.hadoop.hive.ql.hooks.WriteEntity in project incubator-atlas by apache.

From the class HiveHook, method renameColumn.

private void renameColumn(HiveMetaStoreBridge dgiBridge, HiveEventContext event) throws AtlasHookException {
    try {
        assert event.getInputs() != null && event.getInputs().size() == 1;
        assert event.getOutputs() != null && event.getOutputs().size() > 0;
        Table oldTable = event.getInputs().iterator().next().getTable();
        List<FieldSchema> oldColList = oldTable.getAllCols();
        Table outputTbl = event.getOutputs().iterator().next().getTable();
        outputTbl = dgiBridge.hiveClient.getTable(outputTbl.getDbName(), outputTbl.getTableName());
        List<FieldSchema> newColList = outputTbl.getAllCols();
        assert oldColList.size() == newColList.size();
        Pair<String, String> changedColNamePair = findChangedColNames(oldColList, newColList);
        String oldColName = changedColNamePair.getLeft();
        String newColName = changedColNamePair.getRight();
        for (WriteEntity writeEntity : event.getOutputs()) {
            if (writeEntity.getType() == Type.TABLE) {
                Table newTable = writeEntity.getTable();
                createOrUpdateEntities(dgiBridge, event, writeEntity, true, oldTable);
                final String newQualifiedTableName = HiveMetaStoreBridge.getTableQualifiedName(dgiBridge.getClusterName(), newTable);
                String oldColumnQFName = HiveMetaStoreBridge.getColumnQualifiedName(newQualifiedTableName, oldColName);
                String newColumnQFName = HiveMetaStoreBridge.getColumnQualifiedName(newQualifiedTableName, newColName);
                Referenceable newColEntity = new Referenceable(HiveDataTypes.HIVE_COLUMN.getName());
                newColEntity.set(AtlasClient.REFERENCEABLE_ATTRIBUTE_NAME, newColumnQFName);
                event.addMessage(new HookNotification.EntityPartialUpdateRequest(event.getUser(), HiveDataTypes.HIVE_COLUMN.getName(), AtlasClient.REFERENCEABLE_ATTRIBUTE_NAME, oldColumnQFName, newColEntity));
            }
        }
        handleEventOutputs(dgiBridge, event, Type.TABLE);
    } catch (Exception e) {
        throw new AtlasHookException("HiveHook.renameColumn() failed.", e);
    }
}
Also used: Table(org.apache.hadoop.hive.ql.metadata.Table) HookNotification(org.apache.atlas.notification.hook.HookNotification) Referenceable(org.apache.atlas.typesystem.Referenceable) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) WriteEntity(org.apache.hadoop.hive.ql.hooks.WriteEntity) AtlasHookException(org.apache.atlas.hook.AtlasHookException) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) MalformedURLException(java.net.MalformedURLException)
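Isolated from the table iteration, the column rename is a single partial-update notification keyed by the old qualified name, which is what lets Atlas carry existing traits over to the new name. A sketch of just that message construction, using the Atlas types imported above; the HiveDataTypes import path is assumed from the Atlas hive bridge, and the method and its String parameters are illustrative placeholders:

import org.apache.atlas.AtlasClient;
import org.apache.atlas.hive.model.HiveDataTypes;
import org.apache.atlas.notification.hook.HookNotification;
import org.apache.atlas.typesystem.Referenceable;

public class ColumnRenameMessageSketch {
    // Builds the partial-update request that renames a hive_column entity by qualified name.
    static HookNotification.EntityPartialUpdateRequest columnRenameMessage(String user, String oldColumnQFName, String newColumnQFName) {
        Referenceable newColEntity = new Referenceable(HiveDataTypes.HIVE_COLUMN.getName());
        newColEntity.set(AtlasClient.REFERENCEABLE_ATTRIBUTE_NAME, newColumnQFName);
        // Keyed by the old qualified name, so traits already attached to the column follow it.
        return new HookNotification.EntityPartialUpdateRequest(user, HiveDataTypes.HIVE_COLUMN.getName(), AtlasClient.REFERENCEABLE_ATTRIBUTE_NAME, oldColumnQFName, newColEntity);
    }
}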

Example 14 with WriteEntity

Use of org.apache.hadoop.hive.ql.hooks.WriteEntity in project hive by apache.

From the class SemanticAnalyzer, method genFileSinkPlan.

@SuppressWarnings("nls")
protected Operator genFileSinkPlan(String dest, QB qb, Operator input) throws SemanticException {
    RowResolver inputRR = opParseCtx.get(input).getRowResolver();
    QBMetaData qbm = qb.getMetaData();
    Integer dest_type = qbm.getDestTypeForAlias(dest);
    // destination table if any
    Table dest_tab = null;
    // true for full ACID table and MM table
    boolean destTableIsTransactional;
    // should the destination table be written to using ACID
    boolean destTableIsFullAcid;
    boolean destTableIsTemporary = false;
    boolean destTableIsMaterialization = false;
    // destination partition if any
    Partition dest_part = null;
    // the intermediate destination directory
    Path queryTmpdir = null;
    // the final destination directory
    Path dest_path = null;
    TableDesc table_desc = null;
    int currentTableId = 0;
    boolean isLocal = false;
    SortBucketRSCtx rsCtx = new SortBucketRSCtx();
    DynamicPartitionCtx dpCtx = null;
    LoadTableDesc ltd = null;
    ListBucketingCtx lbCtx = null;
    Map<String, String> partSpec = null;
    boolean isMmTable = false, isMmCtas = false;
    Long writeId = null;
    HiveTxnManager txnMgr = SessionState.get().getTxnMgr();
    switch(dest_type.intValue()) {
        case QBMetaData.DEST_TABLE:
            {
                dest_tab = qbm.getDestTableForAlias(dest);
                destTableIsTransactional = AcidUtils.isTransactionalTable(dest_tab);
                destTableIsFullAcid = AcidUtils.isFullAcidTable(dest_tab);
                destTableIsTemporary = dest_tab.isTemporary();
                // Is the user trying to insert into an external table?
                checkExternalTable(dest_tab);
                partSpec = qbm.getPartSpecForAlias(dest);
                dest_path = dest_tab.getPath();
                checkImmutableTable(qb, dest_tab, dest_path, false);
                // check for partition
                List<FieldSchema> parts = dest_tab.getPartitionKeys();
                if (parts != null && parts.size() > 0) {
                    // table is partitioned
                    if (partSpec == null || partSpec.size() == 0) {
                        // user did NOT specify partition
                        throw new SemanticException(generateErrorMessage(qb.getParseInfo().getDestForClause(dest), ErrorMsg.NEED_PARTITION_ERROR.getMsg()));
                    }
                    dpCtx = qbm.getDPCtx(dest);
                    if (dpCtx == null) {
                        dest_tab.validatePartColumnNames(partSpec, false);
                        dpCtx = new DynamicPartitionCtx(dest_tab, partSpec, conf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME), conf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
                        qbm.setDPCtx(dest, dpCtx);
                    }
                }
                // Check for dynamic partitions.
                dpCtx = checkDynPart(qb, qbm, dest_tab, partSpec, dest);
                if (dpCtx != null && dpCtx.getSPPath() != null) {
                    dest_path = new Path(dest_tab.getPath(), dpCtx.getSPPath());
                }
                boolean isNonNativeTable = dest_tab.isNonNative();
                isMmTable = AcidUtils.isInsertOnlyTable(dest_tab.getParameters());
                if (isNonNativeTable || isMmTable) {
                    queryTmpdir = dest_path;
                } else {
                    queryTmpdir = ctx.getTempDirForFinalJobPath(dest_path);
                }
                if (Utilities.FILE_OP_LOGGER.isTraceEnabled()) {
                    Utilities.FILE_OP_LOGGER.trace("create filesink w/DEST_TABLE specifying " + queryTmpdir + " from " + dest_path);
                }
                if (dpCtx != null) {
                    // set the root of the temporary path where dynamic partition columns will be populated
                    dpCtx.setRootPath(queryTmpdir);
                }
                // this table_desc does not contain the partitioning columns
                table_desc = Utilities.getTableDesc(dest_tab);
                // Add NOT NULL constraint check
                input = genConstraintsPlan(dest, qb, input);
                // Add sorting/bucketing if needed
                input = genBucketingSortingDest(dest, input, qb, table_desc, dest_tab, rsCtx);
                idToTableNameMap.put(String.valueOf(destTableId), dest_tab.getTableName());
                currentTableId = destTableId;
                destTableId++;
                lbCtx = constructListBucketingCtx(dest_tab.getSkewedColNames(), dest_tab.getSkewedColValues(), dest_tab.getSkewedColValueLocationMaps(), dest_tab.isStoredAsSubDirectories(), conf);
                // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
                if (!isNonNativeTable) {
                    AcidUtils.Operation acidOp = AcidUtils.Operation.NOT_ACID;
                    if (destTableIsFullAcid) {
                        acidOp = getAcidType(table_desc.getOutputFileFormatClass(), dest);
                        // todo: should this be done for MM? Is it OK to use CombineHiveInputFormat with MM?
                        checkAcidConstraints(qb, table_desc, dest_tab);
                    }
                    try {
                        if (ctx.getExplainConfig() != null) {
                            // For an explain plan, the txn won't be opened, so it doesn't make sense to allocate a write id
                            writeId = 0L;
                        } else {
                            if (isMmTable) {
                                writeId = txnMgr.getTableWriteId(dest_tab.getDbName(), dest_tab.getTableName());
                            } else {
                                writeId = acidOp == Operation.NOT_ACID ? null : txnMgr.getTableWriteId(dest_tab.getDbName(), dest_tab.getTableName());
                            }
                        }
                    } catch (LockException ex) {
                        throw new SemanticException("Failed to allocate write Id", ex);
                    }
                    boolean isReplace = !qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(), dest_tab.getTableName());
                    ltd = new LoadTableDesc(queryTmpdir, table_desc, dpCtx, acidOp, isReplace, writeId);
                    // For Acid table, Insert Overwrite shouldn't replace the table content. We keep the old
                    // deltas and base and leave them up to the cleaner to clean up
                    boolean isInsertInto = qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(), dest_tab.getTableName());
                    LoadFileType loadType = (!isInsertInto && !destTableIsTransactional) ? LoadFileType.REPLACE_ALL : LoadFileType.KEEP_EXISTING;
                    ltd.setLoadFileType(loadType);
                    ltd.setInsertOverwrite(!isInsertInto);
                    ltd.setLbCtx(lbCtx);
                    loadTableWork.add(ltd);
                } else {
                    // This is a non-native table.
                    // We need to set stats as inaccurate.
                    setStatsForNonNativeTable(dest_tab);
                    // true if it is insert overwrite.
                    boolean overwrite = !qb.getParseInfo().isInsertIntoTable(String.format("%s.%s", dest_tab.getDbName(), dest_tab.getTableName()));
                    createInsertDesc(dest_tab, overwrite);
                }
                if (dest_tab.isMaterializedView()) {
                    materializedViewUpdateDesc = new MaterializedViewDesc(dest_tab.getFullyQualifiedName(), false, false, true);
                }
                WriteEntity output = generateTableWriteEntity(dest, dest_tab, partSpec, ltd, dpCtx, isNonNativeTable);
                ctx.getLoadTableOutputMap().put(ltd, output);
                break;
            }
        case QBMetaData.DEST_PARTITION:
            {
                dest_part = qbm.getDestPartitionForAlias(dest);
                dest_tab = dest_part.getTable();
                destTableIsTransactional = AcidUtils.isTransactionalTable(dest_tab);
                destTableIsFullAcid = AcidUtils.isFullAcidTable(dest_tab);
                checkExternalTable(dest_tab);
                Path tabPath = dest_tab.getPath();
                Path partPath = dest_part.getDataLocation();
                checkImmutableTable(qb, dest_tab, partPath, true);
                // if the table is in a different dfs than the partition,
                // replace the partition's dfs with the table's dfs.
                dest_path = new Path(tabPath.toUri().getScheme(), tabPath.toUri().getAuthority(), partPath.toUri().getPath());
                isMmTable = AcidUtils.isInsertOnlyTable(dest_tab.getParameters());
                queryTmpdir = isMmTable ? dest_path : ctx.getTempDirForFinalJobPath(dest_path);
                if (Utilities.FILE_OP_LOGGER.isTraceEnabled()) {
                    Utilities.FILE_OP_LOGGER.trace("create filesink w/DEST_PARTITION specifying " + queryTmpdir + " from " + dest_path);
                }
                table_desc = Utilities.getTableDesc(dest_tab);
                // Add NOT NULL constraint check
                input = genConstraintsPlan(dest, qb, input);
                // Add sorting/bucketing if needed
                input = genBucketingSortingDest(dest, input, qb, table_desc, dest_tab, rsCtx);
                idToTableNameMap.put(String.valueOf(destTableId), dest_tab.getTableName());
                currentTableId = destTableId;
                destTableId++;
                lbCtx = constructListBucketingCtx(dest_part.getSkewedColNames(), dest_part.getSkewedColValues(), dest_part.getSkewedColValueLocationMaps(), dest_part.isStoredAsSubDirectories(), conf);
                AcidUtils.Operation acidOp = AcidUtils.Operation.NOT_ACID;
                if (destTableIsFullAcid) {
                    acidOp = getAcidType(table_desc.getOutputFileFormatClass(), dest);
                    // todo: should this be done for MM?  is it ok to use CombineHiveInputFormat with MM?
                    checkAcidConstraints(qb, table_desc, dest_tab);
                }
                try {
                    if (ctx.getExplainConfig() != null) {
                        // For an explain plan, the txn won't be opened, so it doesn't make sense to allocate a write id
                        writeId = 0L;
                    } else {
                        if (isMmTable) {
                            writeId = txnMgr.getTableWriteId(dest_tab.getDbName(), dest_tab.getTableName());
                        } else {
                            writeId = (acidOp == Operation.NOT_ACID) ? null : txnMgr.getTableWriteId(dest_tab.getDbName(), dest_tab.getTableName());
                        }
                    }
                } catch (LockException ex) {
                    throw new SemanticException("Failed to allocate write Id", ex);
                }
                ltd = new LoadTableDesc(queryTmpdir, table_desc, dest_part.getSpec(), acidOp, writeId);
                // For Acid table, Insert Overwrite shouldn't replace the table content. We keep the old
                // deltas and base and leave them up to the cleaner to clean up
                boolean isInsertInto = qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(), dest_tab.getTableName());
                LoadFileType loadType = (!isInsertInto && !destTableIsTransactional) ? LoadFileType.REPLACE_ALL : LoadFileType.KEEP_EXISTING;
                ltd.setLoadFileType(loadType);
                ltd.setInsertOverwrite(!isInsertInto);
                ltd.setLbCtx(lbCtx);
                loadTableWork.add(ltd);
                if (!outputs.add(new WriteEntity(dest_part, determineWriteType(ltd, dest_tab.isNonNative(), dest)))) {
                    throw new SemanticException(ErrorMsg.OUTPUT_SPECIFIED_MULTIPLE_TIMES.getMsg(dest_tab.getTableName() + "@" + dest_part.getName()));
                }
                break;
            }
        case QBMetaData.DEST_LOCAL_FILE:
            isLocal = true;
        // fall through
        case QBMetaData.DEST_DFS_FILE:
            {
                dest_path = new Path(qbm.getDestFileForAlias(dest));
                ArrayList<ColumnInfo> colInfos = inputRR.getColumnInfos();
                // CTAS case: the file output format and serde are defined by the create
                // table command rather than taking the default value
                List<FieldSchema> field_schemas = null;
                CreateTableDesc tblDesc = qb.getTableDesc();
                CreateViewDesc viewDesc = qb.getViewDesc();
                boolean isCtas = false;
                if (tblDesc != null) {
                    field_schemas = new ArrayList<FieldSchema>();
                    destTableIsTemporary = tblDesc.isTemporary();
                    destTableIsMaterialization = tblDesc.isMaterialization();
                    if (AcidUtils.isInsertOnlyTable(tblDesc.getTblProps(), true)) {
                        isMmTable = isMmCtas = true;
                        try {
                            if (ctx.getExplainConfig() != null) {
                                // For an explain plan, the txn won't be opened, so it doesn't make sense to allocate a write id
                                writeId = 0L;
                            } else {
                                writeId = txnMgr.getTableWriteId(tblDesc.getDatabaseName(), tblDesc.getTableName());
                            }
                        } catch (LockException ex) {
                            throw new SemanticException("Failed to allocate write Id", ex);
                        }
                        tblDesc.setInitialMmWriteId(writeId);
                    }
                } else if (viewDesc != null) {
                    field_schemas = new ArrayList<FieldSchema>();
                    destTableIsTemporary = false;
                }
                if (isLocal) {
                    assert !isMmTable;
                    // for local directory - we always write to map-red intermediate
                    // store and then copy to local fs
                    queryTmpdir = ctx.getMRTmpPath();
                } else {
                    // no copy is required. we may want to revisit this policy in future
                    try {
                        Path qPath = FileUtils.makeQualified(dest_path, conf);
                        queryTmpdir = isMmTable ? qPath : ctx.getTempDirForFinalJobPath(qPath);
                        if (Utilities.FILE_OP_LOGGER.isTraceEnabled()) {
                            Utilities.FILE_OP_LOGGER.trace("Setting query directory " + queryTmpdir + " from " + dest_path + " (" + isMmTable + ")");
                        }
                    } catch (Exception e) {
                        throw new SemanticException("Error creating temporary folder on: " + dest_path, e);
                    }
                }
                ColsAndTypes ct = deriveFileSinkColTypes(inputRR, field_schemas);
                String cols = ct.cols, colTypes = ct.colTypes;
                // update the create table descriptor with the resulting schema.
                if (tblDesc != null) {
                    tblDesc.setCols(new ArrayList<FieldSchema>(field_schemas));
                } else if (viewDesc != null) {
                    viewDesc.setSchema(new ArrayList<FieldSchema>(field_schemas));
                }
                destTableIsTransactional = tblDesc != null && AcidUtils.isTransactionalTable(tblDesc);
                destTableIsFullAcid = tblDesc != null && AcidUtils.isFullAcidTable(tblDesc);
                boolean isDestTempFile = true;
                if (!ctx.isMRTmpFileURI(dest_path.toUri().toString())) {
                    idToTableNameMap.put(String.valueOf(destTableId), dest_path.toUri().toString());
                    currentTableId = destTableId;
                    destTableId++;
                    isDestTempFile = false;
                }
                boolean isDfsDir = (dest_type.intValue() == QBMetaData.DEST_DFS_FILE);
                // Create LFD even for MM CTAS - it's a no-op move, but it still seems to be used for stats.
                loadFileWork.add(new LoadFileDesc(tblDesc, viewDesc, queryTmpdir, dest_path, isDfsDir, cols, colTypes, // there is a change here - the previous version had 'transactional', the one before 'acid'
                destTableIsFullAcid ? Operation.INSERT : Operation.NOT_ACID, isMmCtas));
                if (tblDesc == null) {
                    if (viewDesc != null) {
                        table_desc = PlanUtils.getTableDesc(viewDesc, cols, colTypes);
                    } else if (qb.getIsQuery()) {
                        String fileFormat;
                        if (SessionState.get().getIsUsingThriftJDBCBinarySerDe()) {
                            fileFormat = "SequenceFile";
                            HiveConf.setVar(conf, HiveConf.ConfVars.HIVEQUERYRESULTFILEFORMAT, fileFormat);
                            table_desc = PlanUtils.getDefaultQueryOutputTableDesc(cols, colTypes, fileFormat, ThriftJDBCBinarySerDe.class);
                            // Set the fetch formatter to be a no-op for the ListSinkOperator, since we'll
                            // write out formatted thrift objects to SequenceFile
                            conf.set(SerDeUtils.LIST_SINK_OUTPUT_FORMATTER, NoOpFetchFormatter.class.getName());
                        } else {
                            fileFormat = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYRESULTFILEFORMAT);
                            Class<? extends Deserializer> serdeClass = LazySimpleSerDe.class;
                            if (fileFormat.equals(PlanUtils.LLAP_OUTPUT_FORMAT_KEY)) {
                                serdeClass = LazyBinarySerDe2.class;
                            }
                            table_desc = PlanUtils.getDefaultQueryOutputTableDesc(cols, colTypes, fileFormat, serdeClass);
                        }
                    } else {
                        table_desc = PlanUtils.getDefaultTableDesc(qb.getDirectoryDesc(), cols, colTypes);
                    }
                } else {
                    table_desc = PlanUtils.getTableDesc(tblDesc, cols, colTypes);
                }
                if (!outputs.add(new WriteEntity(dest_path, !isDfsDir, isDestTempFile))) {
                    throw new SemanticException(ErrorMsg.OUTPUT_SPECIFIED_MULTIPLE_TIMES.getMsg(dest_path.toUri().toString()));
                }
                break;
            }
        default:
            throw new SemanticException("Unknown destination type: " + dest_type);
    }
    if (!(dest_type.intValue() == QBMetaData.DEST_DFS_FILE && qb.getIsQuery())) {
        input = genConversionSelectOperator(dest, qb, input, table_desc, dpCtx);
    }
    inputRR = opParseCtx.get(input).getRowResolver();
    ArrayList<ColumnInfo> vecCol = new ArrayList<ColumnInfo>();
    if (updating(dest) || deleting(dest)) {
        vecCol.add(new ColumnInfo(VirtualColumn.ROWID.getName(), VirtualColumn.ROWID.getTypeInfo(), "", true));
    } else {
        try {
            StructObjectInspector rowObjectInspector = (StructObjectInspector) table_desc.getDeserializer(conf).getObjectInspector();
            List<? extends StructField> fields = rowObjectInspector.getAllStructFieldRefs();
            for (int i = 0; i < fields.size(); i++) {
                vecCol.add(new ColumnInfo(fields.get(i).getFieldName(), TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i).getFieldObjectInspector()), "", false));
            }
        } catch (Exception e) {
            throw new SemanticException(e.getMessage(), e);
        }
    }
    RowSchema fsRS = new RowSchema(vecCol);
    // The output files of a FileSink can be merged if they are either not being written to a table
    // or are being written to a table which is not bucketed
    // and the table is not sorted
    boolean canBeMerged = (dest_tab == null || !((dest_tab.getNumBuckets() > 0) || (dest_tab.getSortCols() != null && dest_tab.getSortCols().size() > 0)));
    // If this table is working with ACID semantics, turn off merging
    canBeMerged &= !destTableIsFullAcid;
    // Generate the partition columns from the parent input
    if (dest_type.intValue() == QBMetaData.DEST_TABLE || dest_type.intValue() == QBMetaData.DEST_PARTITION) {
        genPartnCols(dest, input, qb, table_desc, dest_tab, rsCtx);
    }
    FileSinkDesc fileSinkDesc = createFileSinkDesc(dest, table_desc, dest_part, // this was 1/4 acid
    dest_path, // this was 1/4 acid
    currentTableId, // this was 1/4 acid
    destTableIsFullAcid, // this was 1/4 acid
    destTableIsTemporary, destTableIsMaterialization, queryTmpdir, rsCtx, dpCtx, lbCtx, fsRS, canBeMerged, dest_tab, writeId, isMmCtas, dest_type, qb);
    if (isMmCtas) {
        // Add FSD so that the LoadTask compilation could fix up its path to avoid the move.
        tableDesc.setWriter(fileSinkDesc);
    }
    if (fileSinkDesc.getInsertOverwrite()) {
        if (ltd != null) {
            ltd.setInsertOverwrite(true);
        }
    }
    if (SessionState.get().isHiveServerQuery() && null != table_desc && table_desc.getSerdeClassName().equalsIgnoreCase(ThriftJDBCBinarySerDe.class.getName()) && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_THRIFT_RESULTSET_SERIALIZE_IN_TASKS)) {
        fileSinkDesc.setIsUsingThriftJDBCBinarySerDe(true);
    } else {
        fileSinkDesc.setIsUsingThriftJDBCBinarySerDe(false);
    }
    Operator output = putOpInsertMap(OperatorFactory.getAndMakeChild(fileSinkDesc, fsRS, input), inputRR);
    handleLineage(ltd, output);
    if (LOG.isDebugEnabled()) {
        LOG.debug("Created FileSink Plan for clause: " + dest + "dest_path: " + dest_path + " row schema: " + inputRR.toString());
    }
    FileSinkOperator fso = (FileSinkOperator) output;
    fso.getConf().setTable(dest_tab);
    // Generate the auto column stats gathering pipeline if stats autogather is enabled and the destination is a native table (insert overwrite or insert into)
    if (dest_tab != null && !dest_tab.isNonNative() && conf.getBoolVar(ConfVars.HIVESTATSAUTOGATHER) && conf.getBoolVar(ConfVars.HIVESTATSCOLAUTOGATHER) && ColumnStatsAutoGatherContext.canRunAutogatherStats(fso)) {
        if (dest_type.intValue() == QBMetaData.DEST_TABLE) {
            genAutoColumnStatsGatheringPipeline(qb, table_desc, partSpec, input, qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(), dest_tab.getTableName()));
        } else if (dest_type.intValue() == QBMetaData.DEST_PARTITION) {
            genAutoColumnStatsGatheringPipeline(qb, table_desc, dest_part.getSpec(), input, qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(), dest_tab.getTableName()));
        }
    }
    return output;
}
Also used: AbstractMapJoinOperator(org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator) SelectOperator(org.apache.hadoop.hive.ql.exec.SelectOperator) JoinOperator(org.apache.hadoop.hive.ql.exec.JoinOperator) Operator(org.apache.hadoop.hive.ql.exec.Operator) GroupByOperator(org.apache.hadoop.hive.ql.exec.GroupByOperator) FileSinkOperator(org.apache.hadoop.hive.ql.exec.FileSinkOperator) FilterOperator(org.apache.hadoop.hive.ql.exec.FilterOperator) LimitOperator(org.apache.hadoop.hive.ql.exec.LimitOperator) ReduceSinkOperator(org.apache.hadoop.hive.ql.exec.ReduceSinkOperator) TableScanOperator(org.apache.hadoop.hive.ql.exec.TableScanOperator) UnionOperator(org.apache.hadoop.hive.ql.exec.UnionOperator) SMBMapJoinOperator(org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator) LoadFileType(org.apache.hadoop.hive.ql.plan.LoadTableDesc.LoadFileType) MaterializedViewDesc(org.apache.hadoop.hive.ql.exec.MaterializedViewDesc) FileSinkDesc(org.apache.hadoop.hive.ql.plan.FileSinkDesc) LazySimpleSerDe(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) DynamicPartitionCtx(org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx) ArrayList(java.util.ArrayList) ColumnInfo(org.apache.hadoop.hive.ql.exec.ColumnInfo) Operation(org.apache.hadoop.hive.ql.io.AcidUtils.Operation) HiveOperation(org.apache.hadoop.hive.ql.plan.HiveOperation) CreateViewDesc(org.apache.hadoop.hive.ql.plan.CreateViewDesc) LockException(org.apache.hadoop.hive.ql.lockmgr.LockException) ThriftJDBCBinarySerDe(org.apache.hadoop.hive.serde2.thrift.ThriftJDBCBinarySerDe) NoOpFetchFormatter(org.apache.hadoop.hive.serde2.NoOpFetchFormatter) ListBucketingCtx(org.apache.hadoop.hive.ql.plan.ListBucketingCtx) LinkedList(java.util.LinkedList) List(java.util.List) WriteEntity(org.apache.hadoop.hive.ql.hooks.WriteEntity) CalciteSemanticException(org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException) Path(org.apache.hadoop.fs.Path) Partition(org.apache.hadoop.hive.ql.metadata.Partition) DummyPartition(org.apache.hadoop.hive.ql.metadata.DummyPartition) LoadFileDesc(org.apache.hadoop.hive.ql.plan.LoadFileDesc) RowSchema(org.apache.hadoop.hive.ql.exec.RowSchema) Table(org.apache.hadoop.hive.ql.metadata.Table) SQLUniqueConstraint(org.apache.hadoop.hive.metastore.api.SQLUniqueConstraint) CheckConstraint(org.apache.hadoop.hive.ql.metadata.CheckConstraint) NotNullConstraint(org.apache.hadoop.hive.ql.metadata.NotNullConstraint) SQLCheckConstraint(org.apache.hadoop.hive.metastore.api.SQLCheckConstraint) SQLDefaultConstraint(org.apache.hadoop.hive.metastore.api.SQLDefaultConstraint) DefaultConstraint(org.apache.hadoop.hive.ql.metadata.DefaultConstraint) SQLNotNullConstraint(org.apache.hadoop.hive.metastore.api.SQLNotNullConstraint) IOException(java.io.IOException) MetaException(org.apache.hadoop.hive.metastore.api.MetaException) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) SerDeException(org.apache.hadoop.hive.serde2.SerDeException) PatternSyntaxException(java.util.regex.PatternSyntaxException) FileNotFoundException(java.io.FileNotFoundException) AccessControlException(java.security.AccessControlException) InvalidTableException(org.apache.hadoop.hive.ql.metadata.InvalidTableException) LoadTableDesc(org.apache.hadoop.hive.ql.plan.LoadTableDesc) CreateTableDesc(org.apache.hadoop.hive.ql.plan.CreateTableDesc) HiveTxnManager(org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager) InsertTableDesc(org.apache.hadoop.hive.ql.plan.InsertTableDesc) AlterTableDesc(org.apache.hadoop.hive.ql.plan.AlterTableDesc) TableDesc(org.apache.hadoop.hive.ql.plan.TableDesc) StandardStructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector)
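Underneath the branch-specific planning, the example records every destination in the analyzer's outputs set as a WriteEntity and rejects duplicates. A reduced sketch of that bookkeeping, using only the two constructors that appear above; the class, methods, set and error message are illustrative, and the meaning of the two boolean flags (local, temporary) is inferred from the (!isDfsDir, isDestTempFile) arguments in the example:

import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.parse.SemanticException;

public class OutputRegistrationSketch {
    private final Set<WriteEntity> outputs = new HashSet<>();

    // Table/partition destination, as in the DEST_PARTITION branch above.
    void addPartitionOutput(Partition destPart, WriteEntity.WriteType writeType) throws SemanticException {
        if (!outputs.add(new WriteEntity(destPart, writeType))) {
            throw new SemanticException("Output specified multiple times: " + destPart.getName());
        }
    }

    // Directory destination, as in the DEST_DFS_FILE / DEST_LOCAL_FILE branch above.
    void addDirectoryOutput(Path destPath, boolean isLocal, boolean isTemp) throws SemanticException {
        if (!outputs.add(new WriteEntity(destPath, isLocal, isTemp))) {
            throw new SemanticException("Output specified multiple times: " + destPath);
        }
    }
}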

Example 15 with WriteEntity

Use of org.apache.hadoop.hive.ql.hooks.WriteEntity in project hive by apache.

From the class SemanticAnalyzer, method validate.

@Override
public void validate() throws SemanticException {
    LOG.debug("validation start");
    boolean wasAcidChecked = false;
    // Validate that the inputs and outputs have the right protectmode to execute the query
    for (ReadEntity readEntity : getInputs()) {
        ReadEntity.Type type = readEntity.getType();
        if (type != ReadEntity.Type.TABLE && type != ReadEntity.Type.PARTITION) {
            // not a table or partition - nothing to validate; the branch is kept here to make the logic complete.
            continue;
        }
        Table tbl = readEntity.getTable();
        Partition p = readEntity.getPartition();
        if (p != null) {
            tbl = p.getTable();
        }
        if (tbl != null && AcidUtils.isTransactionalTable(tbl)) {
            transactionalInQuery = true;
            if (!wasAcidChecked) {
                checkAcidTxnManager(tbl);
            }
            wasAcidChecked = true;
        }
    }
    for (WriteEntity writeEntity : getOutputs()) {
        WriteEntity.Type type = writeEntity.getType();
        if (type == WriteEntity.Type.PARTITION || type == WriteEntity.Type.DUMMYPARTITION) {
            String conflictingArchive = null;
            try {
                Partition usedp = writeEntity.getPartition();
                Table tbl = usedp.getTable();
                if (AcidUtils.isTransactionalTable(tbl)) {
                    transactionalInQuery = true;
                    if (!wasAcidChecked) {
                        checkAcidTxnManager(tbl);
                    }
                    wasAcidChecked = true;
                }
                LOG.debug("validated " + usedp.getName());
                LOG.debug(usedp.getTable().getTableName());
                WriteEntity.WriteType writeType = writeEntity.getWriteType();
                if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) {
                    // Do not check for ACID; it does not create new parts and this is expensive as hell.
                    // TODO: add an API to get table name list for archived parts with a single call;
                    // nobody uses this so we could skip the whole thing.
                    conflictingArchive = ArchiveUtils.conflictingArchiveNameOrNull(db, tbl, usedp.getSpec());
                }
            } catch (HiveException e) {
                throw new SemanticException(e);
            }
            if (conflictingArchive != null) {
                String message = String.format("Insert conflict with existing archive: %s", conflictingArchive);
                throw new SemanticException(message);
            }
        } else if (type == WriteEntity.Type.TABLE) {
            Table tbl = writeEntity.getTable();
            if (AcidUtils.isTransactionalTable(tbl)) {
                transactionalInQuery = true;
                if (!wasAcidChecked) {
                    checkAcidTxnManager(tbl);
                }
                wasAcidChecked = true;
            }
        }
        if (type != WriteEntity.Type.TABLE && type != WriteEntity.Type.PARTITION) {
            LOG.debug("not validating writeEntity, because entity is neither table nor partition");
            continue;
        }
    }
    boolean reworkMapredWork = HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_REWORK_MAPREDWORK);
    // validate all tasks
    for (Task<? extends Serializable> rootTask : rootTasks) {
        validate(rootTask, reworkMapredWork);
    }
}
Also used: ReadEntity(org.apache.hadoop.hive.ql.hooks.ReadEntity) Partition(org.apache.hadoop.hive.ql.metadata.Partition) DummyPartition(org.apache.hadoop.hive.ql.metadata.DummyPartition) WriteType(org.apache.hadoop.hive.ql.hooks.WriteEntity.WriteType) Table(org.apache.hadoop.hive.ql.metadata.Table) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) WriteEntity(org.apache.hadoop.hive.ql.hooks.WriteEntity) CalciteSemanticException(org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException)
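Reduced to its shape, the output validation above is a type dispatch over the WriteEntity set, with per-type checks inside. A skeletal sketch of that dispatch with the ACID and archive checks replaced by comments (the class and method are hypothetical; the accessors are the ones used in the example):

import java.util.Set;

import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;

public class OutputValidationSketch {
    static void validateOutputs(Set<WriteEntity> outputs) {
        for (WriteEntity writeEntity : outputs) {
            WriteEntity.Type type = writeEntity.getType();
            if (type == WriteEntity.Type.PARTITION || type == WriteEntity.Type.DUMMYPARTITION) {
                Partition usedp = writeEntity.getPartition();
                Table tbl = usedp.getTable();
                // transactional-table check and conflicting-archive check go here,
                // with the archive check skipped for UPDATE/DELETE write types as above
            } else if (type == WriteEntity.Type.TABLE) {
                Table tbl = writeEntity.getTable();
                // transactional-table check goes here
            } else {
                // neither table nor partition: nothing to validate
            }
        }
    }
}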

Aggregations

WriteEntity (org.apache.hadoop.hive.ql.hooks.WriteEntity): 88
Table (org.apache.hadoop.hive.ql.metadata.Table): 39
ReadEntity (org.apache.hadoop.hive.ql.hooks.ReadEntity): 35
HiveException (org.apache.hadoop.hive.ql.metadata.HiveException): 24
Partition (org.apache.hadoop.hive.ql.metadata.Partition): 24
ArrayList (java.util.ArrayList): 18
DDLWork (org.apache.hadoop.hive.ql.plan.DDLWork): 14
Path (org.apache.hadoop.fs.Path): 13
AlterTableExchangePartition (org.apache.hadoop.hive.ql.plan.AlterTableExchangePartition): 13
Referenceable (org.apache.atlas.typesystem.Referenceable): 11
Database (org.apache.hadoop.hive.metastore.api.Database): 11
Test (org.junit.Test): 11
QueryPlan (org.apache.hadoop.hive.ql.QueryPlan): 10
HashMap (java.util.HashMap): 9
LinkedHashMap (java.util.LinkedHashMap): 9
Test (org.testng.annotations.Test): 9
SQLCheckConstraint (org.apache.hadoop.hive.metastore.api.SQLCheckConstraint): 8
SQLDefaultConstraint (org.apache.hadoop.hive.metastore.api.SQLDefaultConstraint): 8
SQLNotNullConstraint (org.apache.hadoop.hive.metastore.api.SQLNotNullConstraint): 8
SQLUniqueConstraint (org.apache.hadoop.hive.metastore.api.SQLUniqueConstraint): 8