Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.
The class TopNHash, method getVectorizedKeyToForward.
/**
 * After a vectorized batch is processed, returns the key that caused a particular row
 * to be forwarded. A row can only be marked for forwarding because it shares a key with
 * some row already in the heap (for GBY), so we can use that key from the heap to emit
 * the forwarded row.
* @param batchIndex index of the key in the batch.
* @return The key corresponding to the index.
*/
public HiveKey getVectorizedKeyToForward(int batchIndex) {
  int index = MAY_FORWARD - batchIndexToResult[batchIndex];
  HiveKey hk = new HiveKey();
  hk.set(keys[index], 0, keys[index].length);
  hk.setHashCode(hashes[index]);
  hk.setDistKeyLength(distKeyLengths[index]);
  return hk;
}
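For reference, the fields set above are the three pieces a HiveKey carries: the serialized key bytes, a precomputed hash code, and the distribution-key length. A minimal standalone sketch of assembling one (the byte values, hash, and length below are made-up illustrations, not values from TopNHash):

import org.apache.hadoop.hive.ql.io.HiveKey;

public class HiveKeySketch {
  public static void main(String[] args) {
    byte[] serializedKey = new byte[] { 0x01, 0x02, 0x03 };  // hypothetical serialized key bytes
    HiveKey hk = new HiveKey();
    hk.set(serializedKey, 0, serializedKey.length);  // copy the key bytes into the writable
    hk.setHashCode(12345);                           // hash later used for partitioning
    hk.setDistKeyLength(serializedKey.length);       // length of the distribution-key prefix
    System.out.println(hk.getDistKeyLength() + " / " + hk.hashCode());  // 3 / 12345
  }
}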
Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.
The class SparkUtilities, method copyHiveKey.
public static HiveKey copyHiveKey(HiveKey key) {
  HiveKey copy = new HiveKey();
  copy.setDistKeyLength(key.getDistKeyLength());
  copy.setHashCode(key.hashCode());
  copy.set(key);
  return copy;
}
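A deep copy like this matters whenever keys are buffered, because the shuffle machinery may reuse the same HiveKey instance for every record it hands out. A sketch of that buffering pattern, assuming SparkUtilities lives in its usual org.apache.hadoop.hive.ql.exec.spark package and that some hypothetical iterator of reused keys is available:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.spark.SparkUtilities;
import org.apache.hadoop.hive.ql.io.HiveKey;

public class CopyHiveKeySketch {
  // Buffers keys from an iterator whose elements the framework may reuse between calls.
  static List<HiveKey> bufferKeys(Iterator<HiveKey> shuffledKeys) {
    List<HiveKey> buffered = new ArrayList<>();
    while (shuffledKeys.hasNext()) {
      // Copy before storing; otherwise every list entry could alias one reused HiveKey.
      buffered.add(SparkUtilities.copyHiveKey(shuffledKeys.next()));
    }
    return buffered;
  }
}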
Use of org.apache.hadoop.hive.ql.io.HiveKey in project presto by prestodb.
The class HiveBucketing, method getHiveBucket.
public static Optional<HiveBucket> getHiveBucket(List<Entry<ObjectInspector, Object>> columnBindings, int bucketCount) {
    try {
        @SuppressWarnings("resource")
        GenericUDFHash udf = new GenericUDFHash();
        ObjectInspector[] objectInspectors = new ObjectInspector[columnBindings.size()];
        DeferredObject[] deferredObjects = new DeferredObject[columnBindings.size()];
        int i = 0;
        for (Entry<ObjectInspector, Object> entry : columnBindings) {
            objectInspectors[i] = getJavaObjectInspector(entry.getKey());
            deferredObjects[i] = getJavaDeferredObject(entry.getValue(), entry.getKey());
            i++;
        }
        ObjectInspector udfInspector = udf.initialize(objectInspectors);
        IntObjectInspector inspector = (IntObjectInspector) udfInspector;
        Object result = udf.evaluate(deferredObjects);
        HiveKey hiveKey = new HiveKey();
        hiveKey.setHashCode(inspector.get(result));
        int bucketNumber = new DefaultHivePartitioner<>().getBucket(hiveKey, null, bucketCount);
        return Optional.of(new HiveBucket(bucketNumber, bucketCount));
    } catch (HiveException e) {
        log.debug(e, "Error evaluating bucket number");
        return Optional.empty();
    }
}
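The two lines before the return are the part that actually involves HiveKey: the UDF's integer hash is stashed on a HiveKey and DefaultHivePartitioner reduces it modulo the bucket count. A small sketch of just that step, with an invented hash value and bucket count (the GenericUDFHash evaluation is skipped here):

import org.apache.hadoop.hive.ql.io.DefaultHivePartitioner;
import org.apache.hadoop.hive.ql.io.HiveKey;
import org.apache.hadoop.io.BytesWritable;

public class BucketSketch {
  public static void main(String[] args) {
    HiveKey hiveKey = new HiveKey();
    hiveKey.setHashCode(42);  // in the snippet above, this hash comes from GenericUDFHash
    int bucket = new DefaultHivePartitioner<HiveKey, BytesWritable>().getBucket(hiveKey, null, 16);
    // HashPartitioner semantics: (hashCode & Integer.MAX_VALUE) % numBuckets, so 42 % 16 = 10
    System.out.println(bucket);
  }
}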
Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.
The class MapJoinOperator, method reloadHashTable.
/**
* Reload hashtable from the hash partition.
* It can have two steps:
* 1) Deserialize a serialized hash table, and
* 2) Merge every key/value pair from small table container into the hash table
* @param pos position of small table
* @param partitionId the partition of the small table to be reloaded from
* @throws IOException
* @throws HiveException
* @throws SerDeException
*/
protected void reloadHashTable(byte pos, int partitionId) throws IOException, HiveException, SerDeException, ClassNotFoundException {
  HybridHashTableContainer container = (HybridHashTableContainer) mapJoinTables[pos];
  HashPartition partition = container.getHashPartitions()[partitionId];
  // Merge the sidefile into the newly created hash table
  // This is where the spilling may happen again
  LOG.info("Going to restore sidefile...");
  KeyValueContainer kvContainer = partition.getSidefileKVContainer();
  int rowCount = kvContainer.size();
  LOG.info("Hybrid Grace Hash Join: Number of rows restored from KeyValueContainer: " + kvContainer.size());
  // We're sure this part is smaller than memory limit
  if (rowCount <= 0) {
    // Since rowCount is used later to instantiate a BytesBytesMultiHashMap
    // as the initialCapacity, which cannot be 0, we provide a reasonable
    // positive number here
    rowCount = 1024 * 1024;
  }
  LOG.info("Going to restore hashmap...");
  BytesBytesMultiHashMap restoredHashMap = partition.getHashMapFromDisk(rowCount);
  rowCount += restoredHashMap.getNumValues();
  LOG.info("Hybrid Grace Hash Join: Deserializing spilled hash partition...");
  LOG.info("Hybrid Grace Hash Join: Number of rows in hashmap: " + rowCount);
  // The size of deserialized partition shouldn't exceed half of memory limit
  if (rowCount * container.getTableRowSize() >= container.getMemoryThreshold() / 2) {
    LOG.warn("Hybrid Grace Hash Join: Hash table cannot be reloaded since it" + " will be greater than memory limit. Recursive spilling is currently not supported");
  }
  KeyValueHelper writeHelper = container.getWriteHelper();
  while (kvContainer.hasNext()) {
    ObjectPair<HiveKey, BytesWritable> pair = kvContainer.next();
    Writable key = pair.getFirst();
    Writable val = pair.getSecond();
    writeHelper.setKeyValue(key, val);
    restoredHashMap.put(writeHelper, -1);
  }
  container.setTotalInMemRowCount(container.getTotalInMemRowCount() + restoredHashMap.getNumValues());
  kvContainer.clear();
  spilledMapJoinTables[pos] = new MapJoinBytesTableContainer(restoredHashMap);
  spilledMapJoinTables[pos].setInternalValueOi(container.getInternalValueOi());
  spilledMapJoinTables[pos].setSortableSortOrders(container.getSortableSortOrders());
  spilledMapJoinTables[pos].setNullMarkers(container.getNullMarkers());
  spilledMapJoinTables[pos].setNotNullMarkers(container.getNotNullMarkers());
}
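The guard in the middle of the method compares the estimated reload footprint against half of the memory budget. A tiny self-contained sketch of that arithmetic, with invented numbers (row count, per-row size, and threshold are not taken from any real configuration):

public class ReloadSizeCheckSketch {
  public static void main(String[] args) {
    long rowCount = 2_000_000L;           // hypothetical number of rows to reload
    long tableRowSize = 64L;              // hypothetical average bytes per row
    long memoryThreshold = 256_000_000L;  // hypothetical memory budget in bytes
    // Same shape as the check in reloadHashTable: estimated size vs. half the budget.
    boolean tooLarge = rowCount * tableRowSize >= memoryThreshold / 2;
    System.out.println(tooLarge);  // 128,000,000 >= 128,000,000, so the warning path would be taken
  }
}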
Use of org.apache.hadoop.hive.ql.io.HiveKey in project hive by apache.
The class ReduceSinkOperator, method collect.
@Override
public void collect(byte[] key, byte[] value, int hash) throws IOException {
  HiveKey keyWritable = new HiveKey(key, hash);
  BytesWritable valueWritable = new BytesWritable(value);
  collect(keyWritable, valueWritable);
}
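The two-argument HiveKey constructor used here wraps already-serialized key bytes together with a precomputed hash, and that hash is what hashCode() reports to the partitioner downstream. A minimal sketch (key bytes, value bytes, and the hash 1234 are invented):

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hive.ql.io.HiveKey;
import org.apache.hadoop.io.BytesWritable;

public class CollectSketch {
  public static void main(String[] args) {
    byte[] keyBytes = "row-key".getBytes(StandardCharsets.UTF_8);
    byte[] valueBytes = "row-value".getBytes(StandardCharsets.UTF_8);
    HiveKey keyWritable = new HiveKey(keyBytes, 1234);  // hash is supplied up front
    BytesWritable valueWritable = new BytesWritable(valueBytes);
    System.out.println(keyWritable.hashCode());  // prints 1234, the precomputed hash
  }
}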