Example 1 with ObjectContainer

use of org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer in project hive by apache.

the class MapJoinOperator method spillBigTableRow.

/**
   * Postpone processing the big table row temporarily by spilling it to a row container
   * @param hybridHtContainer Hybrid hashtable container
   * @param row big table row
   */
protected void spillBigTableRow(MapJoinTableContainer hybridHtContainer, Object row) throws HiveException {
    HybridHashTableContainer ht = (HybridHashTableContainer) hybridHtContainer;
    int partitionId = ht.getToSpillPartitionId();
    HashPartition hp = ht.getHashPartitions()[partitionId];
    ObjectContainer bigTable = hp.getMatchfileObjContainer();
    bigTable.add(row);
}
Also used: HashPartition (org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer.HashPartition), ObjectContainer (org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer), HybridHashTableContainer (org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer)

Example 2 with ObjectContainer

use of org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer in project hive by apache.

the class MapJoinOperator method reProcessBigTable.

/**
   * Iterate over the big table row container and feed process() with leftover rows
   * @param partitionId the partition from which to take out spilled big table rows
   * @throws HiveException
   */
protected void reProcessBigTable(int partitionId) throws HiveException {
    // For a binary join, firstSmallTable is the only small table, so it holds the reference to
    // the spilled big table rows.
    // For an n-way join, rows are spilled only once, while processing the first small table, so
    // again only firstSmallTable holds the reference to the spilled big table rows.
    HashPartition partition = firstSmallTable.getHashPartitions()[partitionId];
    ObjectContainer bigTable = partition.getMatchfileObjContainer();
    LOG.info("Hybrid Grace Hash Join: Going to process spilled big table rows in partition " + partitionId + ". Number of rows: " + bigTable.size());
    while (bigTable.hasNext()) {
        Object row = bigTable.next();
        process(row, conf.getPosBigTable());
    }
    bigTable.clear();
}
Also used: HashPartition (org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer.HashPartition), ObjectContainer (org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer)
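
The spill-then-replay pattern in Examples 1 and 2 can be sketched in isolation. The following is a minimal illustration, not Hive code: RowBuffer is a hypothetical in-memory stand-in for ObjectContainer (which may behave differently, for instance by writing rows to disk), and SpillReplaySketch, spill, replay, and pending are names invented for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for ObjectContainer: an in-memory FIFO of deferred rows.
// Assumption: only the add/hasNext/next/size/clear calls seen in the examples are modelled.
class RowBuffer<T> {
    private final Queue<T> rows = new ArrayDeque<>();

    void add(T row) { rows.add(row); }
    boolean hasNext() { return !rows.isEmpty(); }
    T next() { return rows.remove(); }
    int size() { return rows.size(); }
    void clear() { rows.clear(); }
}

class SpillReplaySketch {
    // One deferred-row buffer per partition (hypothetical layout).
    private final List<RowBuffer<String>> partitions = new ArrayList<>();
    // Rows that have been processed, in processing order.
    final List<String> processed = new ArrayList<>();

    SpillReplaySketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new RowBuffer<>());
        }
    }

    // Defer a row whose partition cannot be processed yet.
    void spill(int partitionId, String row) {
        partitions.get(partitionId).add(row);
    }

    // Drain one partition's deferred rows in arrival order, then release the buffer.
    int replay(int partitionId) {
        RowBuffer<String> buffer = partitions.get(partitionId);
        int count = 0;
        while (buffer.hasNext()) {
            processed.add(buffer.next());
            count++;
        }
        buffer.clear();
        return count;
    }

    // Number of rows still waiting in a partition.
    int pending(int partitionId) {
        return partitions.get(partitionId).size();
    }

    public static void main(String[] args) {
        SpillReplaySketch join = new SpillReplaySketch(2);
        join.spill(1, "row-a");
        join.spill(0, "row-b");
        join.spill(1, "row-c");
        System.out.println(join.replay(1) + " rows replayed: " + join.processed);
    }
}
```

Replaying a partition leaves the other partitions untouched, which mirrors how reProcessBigTable works on a single partitionId at a time.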

Example 3 with ObjectContainer

use of org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer in project hive by apache.

the class VectorMapJoinBaseOperator method reProcessBigTable.

/**
   * Form a vectorized row batch from the rows fed from the super MapJoinOperator.
   */
@Override
protected void reProcessBigTable(int partitionId) throws HiveException {
    if (scratchBatch == null) {
        // The process method was not called -- no big table rows.
        return;
    }
    HybridHashTableContainer.HashPartition partition = firstSmallTable.getHashPartitions()[partitionId];
    ObjectContainer bigTable = partition.getMatchfileObjContainer();
    DataOutputBuffer dataOutputBuffer = new DataOutputBuffer();
    while (bigTable.hasNext()) {
        Object row = bigTable.next();
        VectorizedBatchUtil.addProjectedRowToBatchFrom(row, (StructObjectInspector) inputObjInspectors[posBigTable], scratchBatch.size, scratchBatch, dataOutputBuffer);
        scratchBatch.size++;
        if (scratchBatch.size == VectorizedRowBatch.DEFAULT_SIZE) {
            // call process once we have a full batch
            process(scratchBatch, tag);
            scratchBatch.reset();
            dataOutputBuffer.reset();
        }
    }
    // Process the final row batch, which has fewer than DEFAULT_SIZE rows
    if (scratchBatch.size > 0) {
        process(scratchBatch, tag);
        scratchBatch.reset();
        dataOutputBuffer.reset();
    }
    bigTable.clear();
}
Also used: DataOutputBuffer (org.apache.hadoop.io.DataOutputBuffer), ObjectContainer (org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer), HybridHashTableContainer (org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer)
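
The batching logic in Example 3 (fill a batch, hand it off when full, then flush the partial remainder) can be sketched without any Hive types. This is an illustration under stated assumptions: BatchFlushSketch, toBatches, and BATCH_SIZE are hypothetical names, the batch capacity of 3 is arbitrary, and a fresh list is allocated per batch where the original reuses and resets a single scratchBatch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BatchFlushSketch {
    // Hypothetical capacity; the original compares against VectorizedRowBatch.DEFAULT_SIZE.
    static final int BATCH_SIZE = 3;

    // Groups rows into batches of at most BATCH_SIZE and returns them in emit order.
    static <T> List<List<T>> toBatches(Iterator<T> rows) {
        List<List<T>> emitted = new ArrayList<>();
        List<T> batch = new ArrayList<>(BATCH_SIZE);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == BATCH_SIZE) {
                // Full batch: hand it off and start a fresh one.
                emitted.add(batch);
                batch = new ArrayList<>(BATCH_SIZE);
            }
        }
        // Flush the final, partially filled batch if any rows remain.
        if (!batch.isEmpty()) {
            emitted.add(batch);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(toBatches(rows.iterator()));
    }
}
```

The trailing flush matters because the row count is rarely an exact multiple of the batch size; without it, the last few rows would never be processed.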

Aggregations

ObjectContainer (org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer): 3
HybridHashTableContainer (org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer): 2
HashPartition (org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer.HashPartition): 2
DataOutputBuffer (org.apache.hadoop.io.DataOutputBuffer): 1