Example 1 with VectorMapJoinFastTableContainer

Use of org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer in the Apache Hive project.

Class MapJoinTableContainerSerDe, method loadFastContainer.

/**
 * Loads the small table into a VectorMapJoinFastTableContainer. Only used on the Spark path.
 * @param mapJoinDesc The descriptor for the map join.
 * @param fs FileSystem of the folder.
 * @param folder The folder to load the table container from.
 * @param hconf The Hive configuration.
 * @return The loaded table container.
 */
@SuppressWarnings("unchecked")
public MapJoinTableContainer loadFastContainer(MapJoinDesc mapJoinDesc, FileSystem fs, Path folder, Configuration hconf) throws HiveException {
    try {
        VectorMapJoinFastTableContainer tableContainer = new VectorMapJoinFastTableContainer(mapJoinDesc, hconf, -1);
        tableContainer.setSerde(keyContext, valueContext);
        if (fs.exists(folder)) {
            if (!fs.isDirectory(folder)) {
                throw new HiveException("Error, not a directory: " + folder);
            }
            FileStatus[] fileStatuses = fs.listStatus(folder);
            if (fileStatuses != null && fileStatuses.length > 0) {
                AbstractSerDe keySerDe = keyContext.getSerDe();
                AbstractSerDe valueSerDe = valueContext.getSerDe();
                Writable key = keySerDe.getSerializedClass().newInstance();
                Writable value = valueSerDe.getSerializedClass().newInstance();
                for (FileStatus fileStatus : fileStatuses) {
                    Path filePath = fileStatus.getPath();
                    if (ShimLoader.getHadoopShims().isDirectory(fileStatus)) {
                        throw new HiveException("Error, not a file: " + filePath);
                    }
                    InputStream is = null;
                    ObjectInputStream in = null;
                    try {
                        is = fs.open(filePath, 4096);
                        in = new ObjectInputStream(is);
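                        // Skip the name and metadata header.
                        in.readUTF();
                        in.readObject();
                        // The payload is a key count, then for each key: the key,
                        // a row count, and that many values.
                        int numKeys = in.readInt();
                        for (int keyIndex = 0; keyIndex < numKeys; keyIndex++) {
                            key.readFields(in);
                            long numRows = in.readLong();
                            for (long rowIndex = 0L; rowIndex < numRows; rowIndex++) {
                                value.readFields(in);
                                // One row is added per value, paired with the current key.
                                tableContainer.putRow(key, value);
                            }
                        }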
                    } finally {
                        if (in != null) {
                            in.close();
                        } else if (is != null) {
                            is.close();
                        }
                    }
                }
            }
        }
        tableContainer.seal();
        return tableContainer;
    } catch (IOException e) {
        throw new HiveException("IO error while trying to create table container", e);
    } catch (Exception e) {
        throw new HiveException("Error while trying to create table container", e);
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), HiveException (org.apache.hadoop.hive.ql.metadata.HiveException), VectorMapJoinFastTableContainer (org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer), AbstractSerDe (org.apache.hadoop.hive.serde2.AbstractSerDe), SerDeException (org.apache.hadoop.hive.serde2.SerDeException), Writable (org.apache.hadoop.io.Writable), InputStream (java.io.InputStream), ObjectInputStream (java.io.ObjectInputStream), IOException (java.io.IOException), ConcurrentModificationException (java.util.ConcurrentModificationException)
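
The read loop in loadFastContainer implies a simple stream layout: a name, a metadata object, a key count, and then for each key the key itself, a row count, and that many values. The following is a minimal sketch of that layout using only java.io, not Hive code. The class name SmallTableLayoutSketch, the write method, and the placeholder name and metadata strings are hypothetical, and plain strings stand in for the Hive Writable keys and values that the real method deserializes with readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the stream layout; not part of Hive.
class SmallTableLayoutSketch {

    // Writes: name (UTF), metadata (object), key count (int), then per key:
    // the key, a row count (long), and that many values.
    public static byte[] write(Map<String, List<String>> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF("hypothetical-container-name");
            out.writeObject("hypothetical-metadata");
            out.writeInt(table.size());
            for (Map.Entry<String, List<String>> entry : table.entrySet()) {
                out.writeUTF(entry.getKey());           // stands in for the key Writable
                out.writeLong(entry.getValue().size());
                for (String value : entry.getValue()) {
                    out.writeUTF(value);                // stands in for the value Writable
                }
            }
        }
        return bytes.toByteArray();
    }

    // Mirrors the read loop: skip the header, then emit one (key, value) row per value.
    public static List<String[]> read(byte[] data) throws IOException, ClassNotFoundException {
        List<String[]> rows = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();      // skip the name
            in.readObject();   // skip the metadata
            int numKeys = in.readInt();
            for (int keyIndex = 0; keyIndex < numKeys; keyIndex++) {
                String key = in.readUTF();
                long numRows = in.readLong();
                for (long rowIndex = 0L; rowIndex < numRows; rowIndex++) {
                    rows.add(new String[] {key, in.readUTF()});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("k1", Arrays.asList("a", "b"));
        table.put("k2", Arrays.asList("c"));
        for (String[] row : read(write(table))) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

The sketch uses try-with-resources rather than the explicit finally block in the method above; closing the ObjectInputStream also closes the stream it wraps, which is why the original only closes `is` directly when `in` was never created.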
