
Example 1 with KeyValueContainer

Use of org.apache.hadoop.hive.ql.exec.persistence.KeyValueContainer in project hive by apache.

Source: class MapJoinOperator, method reloadHashTable.

/**
   * Reload hashtable from the hash partition.
   * It can have two steps:
   * 1) Deserialize a serialized hash table, and
   * 2) Merge every key/value pair from small table container into the hash table
   * @param pos position of small table
   * @param partitionId the partition of the small table to be reloaded from
   * @throws IOException
   * @throws HiveException
   * @throws SerDeException
   * @throws ClassNotFoundException
   */
protected void reloadHashTable(byte pos, int partitionId) throws IOException, HiveException, SerDeException, ClassNotFoundException {
    HybridHashTableContainer container = (HybridHashTableContainer) mapJoinTables[pos];
    HashPartition partition = container.getHashPartitions()[partitionId];
    // Merge the sidefile into the newly created hash table
    // This is where the spilling may happen again
    LOG.info("Going to restore sidefile...");
    KeyValueContainer kvContainer = partition.getSidefileKVContainer();
    int rowCount = kvContainer.size();
    LOG.info("Hybrid Grace Hash Join: Number of rows restored from KeyValueContainer: " + rowCount);
    // We're sure this part is smaller than memory limit
    if (rowCount <= 0) {
        // Since rowCount is used later to instantiate a BytesBytesMultiHashMap
        // as the initialCapacity which cannot be 0, we provide a reasonable
        // positive number here
        rowCount = 1024 * 1024;
    }
    LOG.info("Going to restore hashmap...");
    BytesBytesMultiHashMap restoredHashMap = partition.getHashMapFromDisk(rowCount);
    rowCount += restoredHashMap.getNumValues();
    LOG.info("Hybrid Grace Hash Join: Deserializing spilled hash partition...");
    LOG.info("Hybrid Grace Hash Join: Number of rows in hashmap: " + rowCount);
    // The size of deserialized partition shouldn't exceed half of memory limit
    if (rowCount * container.getTableRowSize() >= container.getMemoryThreshold() / 2) {
        LOG.warn("Hybrid Grace Hash Join: Hash table cannot be reloaded since it will be greater than memory limit. Recursive spilling is currently not supported");
    }
    KeyValueHelper writeHelper = container.getWriteHelper();
    while (kvContainer.hasNext()) {
        ObjectPair<HiveKey, BytesWritable> pair = kvContainer.next();
        Writable key = pair.getFirst();
        Writable val = pair.getSecond();
        writeHelper.setKeyValue(key, val);
        restoredHashMap.put(writeHelper, -1);
    }
    container.setTotalInMemRowCount(container.getTotalInMemRowCount() + restoredHashMap.getNumValues());
    kvContainer.clear();
    spilledMapJoinTables[pos] = new MapJoinBytesTableContainer(restoredHashMap);
    spilledMapJoinTables[pos].setInternalValueOi(container.getInternalValueOi());
    spilledMapJoinTables[pos].setSortableSortOrders(container.getSortableSortOrders());
    spilledMapJoinTables[pos].setNullMarkers(container.getNullMarkers());
    spilledMapJoinTables[pos].setNotNullMarkers(container.getNotNullMarkers());
}
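The sizing logic in reloadHashTable comes down to two pieces of arithmetic: a positive fallback capacity when the sidefile is empty, and a warning when the estimated footprint reaches half the memory threshold. The class below is a minimal standalone sketch of just that arithmetic. `ReloadSizing`, `initialCapacity` and `exceedsHalfThreshold` are hypothetical names, not Hive APIs, and the row size and threshold are plain numbers here rather than values read from a HybridHashTableContainer.

```java
// Hypothetical helper class: isolates the sizing arithmetic from reloadHashTable.
public class ReloadSizing {
    // Fallback capacity used when the sidefile is empty (1024 * 1024 in the method above).
    static final int DEFAULT_CAPACITY = 1024 * 1024;

    // Initial capacity for the restored hash map; must never be 0.
    static int initialCapacity(int sidefileRows) {
        return sidefileRows <= 0 ? DEFAULT_CAPACITY : sidefileRows;
    }

    // True when rowCount * rowSize reaches half of the memory threshold,
    // the condition under which the method above only logs a warning.
    static boolean exceedsHalfThreshold(long rowCount, long rowSize, long memoryThreshold) {
        return rowCount * rowSize >= memoryThreshold / 2;
    }

    public static void main(String[] args) {
        System.out.println(initialCapacity(0));                        // 1048576
        System.out.println(initialCapacity(500));                      // 500
        System.out.println(exceedsHalfThreshold(1000, 100, 200_000));  // true  (100000 >= 100000)
        System.out.println(exceedsHalfThreshold(999, 100, 200_000));   // false (99900 < 100000)
    }
}
```

Note that in the method above the fallback value is assigned to the same `rowCount` variable that later has `restoredHashMap.getNumValues()` added to it, so when the sidefile is empty both the logged "Number of rows in hashmap" and the half-threshold estimate include the 1,048,576 placeholder. Exceeding the threshold only logs a warning; the reload still proceeds.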
Also used: HiveKey(org.apache.hadoop.hive.ql.io.HiveKey) KeyValueHelper(org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.KeyValueHelper) MapJoinBytesTableContainer(org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer) Writable(org.apache.hadoop.io.Writable) BytesWritable(org.apache.hadoop.io.BytesWritable) HashPartition(org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer.HashPartition) HybridHashTableContainer(org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer) KeyValueContainer(org.apache.hadoop.hive.ql.exec.persistence.KeyValueContainer) BytesBytesMultiHashMap(org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap)
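The merge step of reloadHashTable (draining every key/value pair from the sidefile into the hash map restored from disk, then clearing the sidefile) can be illustrated with plain Java collections. This is a minimal sketch under stated assumptions: `SpillQueue` is a hypothetical in-memory stand-in for KeyValueContainer, and a `Map<K, List<V>>` stands in for BytesBytesMultiHashMap, which in Hive stores serialized keys and values and does not expose this API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SidefileMerge {
    // Hypothetical stand-in for KeyValueContainer: a queue of spilled key/value pairs.
    static final class SpillQueue<K, V> {
        private final Deque<Map.Entry<K, V>> rows = new ArrayDeque<>();
        void add(K key, V value) { rows.addLast(new SimpleEntry<>(key, value)); }
        int size() { return rows.size(); }
        boolean hasNext() { return !rows.isEmpty(); }
        Map.Entry<K, V> next() { return rows.pollFirst(); }
        void clear() { rows.clear(); }
    }

    // Appends every sidefile pair to the multi-valued map (stand-in for
    // BytesBytesMultiHashMap.put) and returns the total number of stored values,
    // analogous to getNumValues() feeding setTotalInMemRowCount.
    static <K, V> int merge(SpillQueue<K, V> sidefile, Map<K, List<V>> restored) {
        while (sidefile.hasNext()) {
            Map.Entry<K, V> pair = sidefile.next();
            restored.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        sidefile.clear(); // mirrors kvContainer.clear(); already empty after draining here
        int numValues = 0;
        for (List<V> values : restored.values()) {
            numValues += values.size();
        }
        return numValues;
    }

    public static void main(String[] args) {
        Map<String, List<String>> restored = new HashMap<>();
        // Pretend this entry was deserialized from the on-disk hash partition.
        restored.computeIfAbsent("k1", k -> new ArrayList<>()).add("v0");
        SpillQueue<String, String> sidefile = new SpillQueue<>();
        sidefile.add("k1", "v1");
        sidefile.add("k2", "v2");
        System.out.println(merge(sidefile, restored)); // 3
        System.out.println(restored.get("k1"));        // [v0, v1]
    }
}
```

As in the original loop, duplicate keys are not overwritten: a multi-valued map keeps every small-table row for a join key, which is what a map join needs when one key matches several rows.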
