Example 11 with HiveKey

Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.

The class TopNHash, method getVectorizedKeyToForward.

/**
   * After a vectorized batch is processed, returns the key that caused a particular row
   * to be forwarded. A row can only be marked for forwarding because it has the same key
   * as some row already in the heap (the GBY case), so we can reuse that key from the
   * heap to emit the forwarded row.
   * @param batchIndex index of the key in the batch.
   * @return The key corresponding to the index.
   */
public HiveKey getVectorizedKeyToForward(int batchIndex) {
    // The per-row result code encodes the heap slot as (MAY_FORWARD - slot),
    // so invert it to recover the index of the stored key.
    int index = MAY_FORWARD - batchIndexToResult[batchIndex];
    HiveKey hk = new HiveKey();
    hk.set(keys[index], 0, keys[index].length);
    hk.setHashCode(hashes[index]);
    hk.setDistKeyLength(distKeyLengths[index]);
    return hk;
}
Also used : HiveKey(org.apache.hadoop.hive.ql.io.HiveKey)
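For context, here is a minimal sketch of how a caller might consume this accessor after running a batch through TopNHash. Everything except getVectorizedKeyToForward is an assumption made for illustration: the getVectorizedBatchResult accessor, the FORWARD/EXCLUDE constants, and the Collector callback should all be checked against the TopNHash in your Hive version.

import java.io.IOException;
import org.apache.hadoop.hive.ql.exec.TopNHash;
import org.apache.hadoop.hive.ql.io.HiveKey;
import org.apache.hadoop.io.BytesWritable;

// Sketch only: forward the surviving rows of a processed batch.
interface Collector {
    void emit(HiveKey key, BytesWritable value) throws IOException;
}

static void forwardBatch(TopNHash topNHash, HiveKey[] ownKeys,
        BytesWritable[] values, int batchSize, Collector out) throws IOException {
    for (int batchIndex = 0; batchIndex < batchSize; batchIndex++) {
        int result = topNHash.getVectorizedBatchResult(batchIndex); // assumed accessor
        if (result == TopNHash.EXCLUDE) {
            continue; // row lost the top-N race; nothing to emit
        }
        HiveKey key = (result == TopNHash.FORWARD)
                ? ownKeys[batchIndex]                             // row keeps its own key
                : topNHash.getVectorizedKeyToForward(batchIndex); // reuse the heap key
        out.emit(key, values[batchIndex]);
    }
}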

Example 12 with HiveKey

Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.

The class SparkUtilities, method copyHiveKey.

public static HiveKey copyHiveKey(HiveKey key) {
    HiveKey copy = new HiveKey();
    // The distribution-key length and the cached hash live outside the
    // byte payload, so they must be copied explicitly; set(key) then
    // copies the serialized bytes themselves.
    copy.setDistKeyLength(key.getDistKeyLength());
    copy.setHashCode(key.hashCode());
    copy.set(key);
    return copy;
}
Also used : HiveKey(org.apache.hadoop.hive.ql.io.HiveKey)
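The copy matters because HiveKey extends BytesWritable, and Hadoop-style iterators typically reuse a single Writable instance across next() calls. A usage sketch, where the iterator is a hypothetical stand-in for whatever Spark shuffle iterator the caller holds:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.spark.SparkUtilities;
import org.apache.hadoop.hive.ql.io.HiveKey;

// Sketch only: cache keys from an iterator that may reuse its HiveKey.
static List<HiveKey> cacheKeys(Iterator<HiveKey> it) {
    List<HiveKey> cached = new ArrayList<>();
    while (it.hasNext()) {
        // Copy rather than alias: the next call to it.next() may
        // overwrite the instance we were just handed.
        cached.add(SparkUtilities.copyHiveKey(it.next()));
    }
    return cached;
}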

Example 13 with HiveKey

Use of org.apache.hadoop.hive.ql.io.HiveKey in project presto by prestodb.

The class HiveBucketing, method getHiveBucket.

public static Optional<HiveBucket> getHiveBucket(List<Entry<ObjectInspector, Object>> columnBindings, int bucketCount) {
    try {
        // Use Hive's own hash UDF so the bucket computed here matches the
        // bucket Hive itself would assign to the same column values.
        @SuppressWarnings("resource")
        GenericUDFHash udf = new GenericUDFHash();
        ObjectInspector[] objectInspectors = new ObjectInspector[columnBindings.size()];
        DeferredObject[] deferredObjects = new DeferredObject[columnBindings.size()];
        int i = 0;
        for (Entry<ObjectInspector, Object> entry : columnBindings) {
            objectInspectors[i] = getJavaObjectInspector(entry.getKey());
            deferredObjects[i] = getJavaDeferredObject(entry.getValue(), entry.getKey());
            i++;
        }
        ObjectInspector udfInspector = udf.initialize(objectInspectors);
        IntObjectInspector inspector = (IntObjectInspector) udfInspector;
        Object result = udf.evaluate(deferredObjects);
        // Seed a HiveKey with the hash and let Hive's default partitioner
        // map it onto one of bucketCount buckets.
        HiveKey hiveKey = new HiveKey();
        hiveKey.setHashCode(inspector.get(result));
        int bucketNumber = new DefaultHivePartitioner<>().getBucket(hiveKey, null, bucketCount);
        return Optional.of(new HiveBucket(bucketNumber, bucketCount));
    } catch (HiveException e) {
        log.debug(e, "Error evaluating bucket number");
        return Optional.empty();
    }
}
Also used :
HiveKey(org.apache.hadoop.hive.ql.io.HiveKey)
GenericUDFHash(org.apache.hadoop.hive.ql.udf.generic.GenericUDFHash)
HiveException(org.apache.hadoop.hive.ql.metadata.HiveException)
DeferredObject(org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject)
DeferredJavaObject(org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject)
ObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector)
PrimitiveObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector)
IntObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.IntObjectInspector)
PrimitiveObjectInspectorFactory.javaBooleanObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaBooleanObjectInspector)
PrimitiveObjectInspectorFactory.javaByteObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaByteObjectInspector)
PrimitiveObjectInspectorFactory.javaShortObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaShortObjectInspector)
PrimitiveObjectInspectorFactory.javaIntObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaIntObjectInspector)
PrimitiveObjectInspectorFactory.javaLongObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaLongObjectInspector)
PrimitiveObjectInspectorFactory.javaStringObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaStringObjectInspector)
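A hedged example of calling this from a test or a debugging session. The inspectors and values are illustrative, and whether a plain Java object inspector is the right entry key depends on what getJavaObjectInspector expects in this Presto version:

import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map.Entry;
import java.util.Optional;
import com.google.common.collect.ImmutableList;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaIntObjectInspector;
import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaStringObjectInspector;

// Sketch only: which of 32 buckets would the row ("us", 2017) land in?
List<Entry<ObjectInspector, Object>> bindings = ImmutableList.of(
        new SimpleImmutableEntry<ObjectInspector, Object>(javaStringObjectInspector, "us"),
        new SimpleImmutableEntry<ObjectInspector, Object>(javaIntObjectInspector, 2017));
Optional<HiveBucket> bucket = HiveBucketing.getHiveBucket(bindings, 32);
// An empty Optional means bucket evaluation failed (see the catch block above).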

Example 14 with HiveKey

Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.

The class MapJoinOperator, method reloadHashTable.

/**
   * Reloads the hash table from a spilled hash partition.
   * This involves two steps:
   * 1) deserializing the serialized hash table, and
   * 2) merging every key/value pair from the small-table sidefile container into the hash table.
   * @param pos position of the small table
   * @param partitionId id of the hash partition of the small table to reload
   * @throws IOException
   * @throws HiveException
   * @throws SerDeException
   */
protected void reloadHashTable(byte pos, int partitionId) throws IOException, HiveException, SerDeException, ClassNotFoundException {
    HybridHashTableContainer container = (HybridHashTableContainer) mapJoinTables[pos];
    HashPartition partition = container.getHashPartitions()[partitionId];
    // Merge the sidefile into the newly created hash table
    // This is where the spilling may happen again
    LOG.info("Going to restore sidefile...");
    KeyValueContainer kvContainer = partition.getSidefileKVContainer();
    int rowCount = kvContainer.size();
    LOG.info("Hybrid Grace Hash Join: Number of rows restored from KeyValueContainer: " + kvContainer.size());
    // We're sure this partition is smaller than the memory limit.
    if (rowCount <= 0) {
        // rowCount is used later as the initialCapacity when instantiating a
        // BytesBytesMultiHashMap; since the capacity cannot be 0, we provide
        // a reasonable positive number here.
        rowCount = 1024 * 1024;
    }
    LOG.info("Going to restore hashmap...");
    BytesBytesMultiHashMap restoredHashMap = partition.getHashMapFromDisk(rowCount);
    rowCount += restoredHashMap.getNumValues();
    LOG.info("Hybrid Grace Hash Join: Deserializing spilled hash partition...");
    LOG.info("Hybrid Grace Hash Join: Number of rows in hashmap: " + rowCount);
    // The size of deserialized partition shouldn't exceed half of memory limit
    if (rowCount * container.getTableRowSize() >= container.getMemoryThreshold() / 2) {
        LOG.warn("Hybrid Grace Hash Join: Hash table cannot be reloaded since it" + " will be greater than memory limit. Recursive spilling is currently not supported");
    }
    KeyValueHelper writeHelper = container.getWriteHelper();
    while (kvContainer.hasNext()) {
        ObjectPair<HiveKey, BytesWritable> pair = kvContainer.next();
        Writable key = pair.getFirst();
        Writable val = pair.getSecond();
        writeHelper.setKeyValue(key, val);
        restoredHashMap.put(writeHelper, -1);
    }
    container.setTotalInMemRowCount(container.getTotalInMemRowCount() + restoredHashMap.getNumValues());
    kvContainer.clear();
    spilledMapJoinTables[pos] = new MapJoinBytesTableContainer(restoredHashMap);
    spilledMapJoinTables[pos].setInternalValueOi(container.getInternalValueOi());
    spilledMapJoinTables[pos].setSortableSortOrders(container.getSortableSortOrders());
    spilledMapJoinTables[pos].setNullMarkers(container.getNullMarkers());
    spilledMapJoinTables[pos].setNotNullMarkers(container.getNotNullMarkers());
}
Also used :
HiveKey(org.apache.hadoop.hive.ql.io.HiveKey)
KeyValueHelper(org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.KeyValueHelper)
MapJoinBytesTableContainer(org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer)
HybridHashTableContainer(org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer)
HashPartition(org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer.HashPartition)
KeyValueContainer(org.apache.hadoop.hive.ql.exec.persistence.KeyValueContainer)
BytesBytesMultiHashMap(org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap)
Writable(org.apache.hadoop.io.Writable)
BytesWritable(org.apache.hadoop.io.BytesWritable)
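Two details in this merge are easy to miss. First, the sidefile rows are re-inserted through the container's own KeyValueHelper, so they end up in exactly the same serialized layout as rows that were never spilled. Second, the restored map is wrapped in a fresh MapJoinBytesTableContainer whose internal value object inspector, sort orders, and null markers are copied from the original container, so subsequent probes behave identically. Note also that the memory check above only logs a warning: since recursive spilling is unsupported, an oversized reload proceeds anyway.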

Example 15 with HiveKey

Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.

The class ReduceSinkOperator, method collect.

@Override
public void collect(byte[] key, byte[] value, int hash) throws IOException {
    // The hash was already computed by the caller; storing it on the
    // HiveKey lets the partitioner use it without re-hashing the bytes.
    HiveKey keyWritable = new HiveKey(key, hash);
    BytesWritable valueWritable = new BytesWritable(value);
    collect(keyWritable, valueWritable);
}
Also used : HiveKey(org.apache.hadoop.hive.ql.io.HiveKey) BytesWritable(org.apache.hadoop.io.BytesWritable)
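A small self-contained sketch of the same construction. The hash here is computed with Arrays.hashCode purely for illustration; in Hive it comes from the key evaluation upstream:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.hadoop.hive.ql.io.HiveKey;
import org.apache.hadoop.io.BytesWritable;

byte[] keyBytes = "region=us".getBytes(StandardCharsets.UTF_8);
byte[] valueBytes = "payload".getBytes(StandardCharsets.UTF_8);
// HiveKey stores the precomputed hash, so key.hashCode() returns it and
// the shuffle partitioner never has to re-hash the serialized bytes.
HiveKey key = new HiveKey(keyBytes, Arrays.hashCode(keyBytes));
BytesWritable value = new BytesWritable(valueBytes);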

Aggregations

HiveKey (org.apache.hadoop.hive.ql.io.HiveKey): 21
BytesWritable (org.apache.hadoop.io.BytesWritable): 12
HiveException (org.apache.hadoop.hive.ql.metadata.HiveException): 6
IOException (java.io.IOException): 5
ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector): 3
Input (com.esotericsoftware.kryo.io.Input): 2
FileInputStream (java.io.FileInputStream): 2
Path (org.apache.hadoop.fs.Path): 2
GenericUDFHash (org.apache.hadoop.hive.ql.udf.generic.GenericUDFHash): 2
SerDeException (org.apache.hadoop.hive.serde2.SerDeException): 2
StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector): 2
IntObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.IntObjectInspector): 2
PrimitiveObjectInspectorFactory.javaBooleanObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaBooleanObjectInspector): 2
PrimitiveObjectInspectorFactory.javaLongObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaLongObjectInspector): 2
PrimitiveObjectInspectorFactory.javaStringObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.javaStringObjectInspector): 2
JobConf (org.apache.hadoop.mapred.JobConf): 2
FileNotFoundException (java.io.FileNotFoundException): 1
ArrayList (java.util.ArrayList): 1
Iterator (java.util.Iterator): 1
List (java.util.List): 1