Example 11 with ExecutionSetupException

Use of org.apache.drill.common.exceptions.ExecutionSetupException in project drill by axbaretto.

From class HBaseRecordReader, method setup().

@Override
public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException {
    this.operatorContext = context;
    this.outputMutator = output;
    familyVectorMap = new HashMap<>();
    try {
        hTable = connection.getTable(hbaseTableName);
        // Add top-level column-family map vectors to output in the order specified
        // when creating reader (order of first appearance in query).
        for (SchemaPath column : getColumns()) {
            if (column.equals(ROW_KEY_PATH)) {
                MaterializedField field = MaterializedField.create(column.getAsNamePart().getName(), ROW_KEY_TYPE);
                rowKeyVector = outputMutator.addField(field, VarBinaryVector.class);
            } else {
                getOrCreateFamilyVector(column.getRootSegment().getPath(), false);
            }
        }
        // Add map and child vectors for any HBase columns that are requested (in
        // order to avoid later creation of dummy NullableIntVectors for them).
        final Set<Map.Entry<byte[], NavigableSet<byte[]>>> familiesEntries = hbaseScanColumnsOnly.getFamilyMap().entrySet();
        for (Map.Entry<byte[], NavigableSet<byte[]>> familyEntry : familiesEntries) {
            final String familyName = new String(familyEntry.getKey(), StandardCharsets.UTF_8);
            final MapVector familyVector = getOrCreateFamilyVector(familyName, false);
            final Set<byte[]> children = familyEntry.getValue();
            if (null != children) {
                for (byte[] childNameBytes : children) {
                    final String childName = new String(childNameBytes, StandardCharsets.UTF_8);
                    getOrCreateColumnVector(familyVector, childName);
                }
            }
        }
        // Add map vectors for any HBase column families that are requested.
        for (String familyName : completeFamilies) {
            getOrCreateFamilyVector(familyName, false);
        }
        resultScanner = hTable.getScanner(hbaseScan);
    } catch (SchemaChangeException | IOException e) {
        throw new ExecutionSetupException(e);
    }
}
Also used: ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException), NavigableSet (java.util.NavigableSet), MaterializedField (org.apache.drill.exec.record.MaterializedField), IOException (java.io.IOException), VarBinaryVector (org.apache.drill.exec.vector.VarBinaryVector), NullableVarBinaryVector (org.apache.drill.exec.vector.NullableVarBinaryVector), SchemaChangeException (org.apache.drill.exec.exception.SchemaChangeException), SchemaPath (org.apache.drill.common.expression.SchemaPath), HashMap (java.util.HashMap), Map (java.util.Map), MapVector (org.apache.drill.exec.vector.complex.MapVector)
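
A note on the caching idiom above: getOrCreateFamilyVector keeps one map vector per HBase column family in familyVectorMap, so repeated references to the same family reuse a single vector. The following minimal, self-contained sketch shows only that get-or-create pattern, with plain strings standing in for Drill's MapVector; the class and names are illustrative, not Drill APIs.

import java.util.HashMap;
import java.util.Map;

public class FamilyVectorCacheSketch {

    // Stand-in for familyVectorMap: one cached "vector" per column family name.
    private final Map<String, String> familyVectorMap = new HashMap<>();

    // Return the cached entry for a family, creating it on first use.
    String getOrCreateFamilyVector(String familyName) {
        return familyVectorMap.computeIfAbsent(familyName, name -> "MapVector(" + name + ")");
    }

    public static void main(String[] args) {
        FamilyVectorCacheSketch cache = new FamilyVectorCacheSketch();
        System.out.println(cache.getOrCreateFamilyVector("f"));  // created on first request
        System.out.println(cache.getOrCreateFamilyVector("f"));  // same cached entry reused
    }
}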

Example 12 with ExecutionSetupException

Use of org.apache.drill.common.exceptions.ExecutionSetupException in project drill by axbaretto.

From class HBaseScanBatchCreator, method getBatch().

@Override
public ScanBatch getBatch(ExecutorFragmentContext context, HBaseSubScan subScan, List<RecordBatch> children) throws ExecutionSetupException {
    Preconditions.checkArgument(children.isEmpty());
    List<RecordReader> readers = new LinkedList<>();
    List<SchemaPath> columns = null;
    for (HBaseSubScan.HBaseSubScanSpec scanSpec : subScan.getRegionScanSpecList()) {
        try {
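            // A null projection from the sub-scan means the query selected all columns.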
            if ((columns = subScan.getColumns()) == null) {
                columns = GroupScan.ALL_COLUMNS;
            }
            readers.add(new HBaseRecordReader(subScan.getStorageEngine().getConnection(), scanSpec, columns));
        } catch (Exception e1) {
            throw new ExecutionSetupException(e1);
        }
    }
    return new ScanBatch(subScan, context, readers);
}
Also used: ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException), SchemaPath (org.apache.drill.common.expression.SchemaPath), RecordReader (org.apache.drill.exec.store.RecordReader), ScanBatch (org.apache.drill.exec.physical.impl.ScanBatch), LinkedList (java.util.LinkedList)
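
The shape of getBatch above is worth isolating: one reader is built per region sub-scan, a null projection is treated as select-all, and any failure is rethrown as ExecutionSetupException. Below is a minimal, self-contained sketch of that shape using plain strings and an illustrative SetupException in place of Drill's classes.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in for ExecutionSetupException; not a Drill class.
class SetupException extends Exception {
    SetupException(Throwable cause) { super(cause); }
}

public class BatchCreatorSketch {

    static final List<String> ALL_COLUMNS = Collections.singletonList("*");

    // Build one "reader" per region scan spec; a null projection defaults to all columns,
    // and any runtime failure is wrapped in a single setup-time exception type.
    static List<String> buildReaders(List<String> regionSpecs, List<String> columns) throws SetupException {
        List<String> readers = new ArrayList<>();
        for (String spec : regionSpecs) {
            try {
                List<String> projected = (columns == null) ? ALL_COLUMNS : columns;
                readers.add("reader(" + spec + ", " + projected + ")");
            } catch (RuntimeException e) {
                throw new SetupException(e);
            }
        }
        return readers;
    }

    public static void main(String[] args) throws SetupException {
        // Passing null for columns selects everything, mirroring GroupScan.ALL_COLUMNS.
        System.out.println(buildReaders(Arrays.asList("region-1", "region-2"), null));
    }
}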

Example 13 with ExecutionSetupException

Use of org.apache.drill.common.exceptions.ExecutionSetupException in project drill by axbaretto.

From class MaprDBJsonRecordReader, method setup().

@Override
public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException {
    this.vectorWriter = new VectorContainerWriter(output, unionEnabled);
    this.operatorContext = context;
    try {
        table = MapRDB.getTable(tableName);
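        // Exclude the document _id field from results unless the query asked for it (includeId).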
        table.setOption(TableOption.EXCLUDEID, !includeId);
        documentStream = table.find(condition, projectedFields);
        documentReaderIterators = documentStream.documentReaders().iterator();
    } catch (DBException e) {
        throw new ExecutionSetupException(e);
    }
}
Also used: ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException), DBException (com.mapr.db.exceptions.DBException), VectorContainerWriter (org.apache.drill.exec.vector.complex.impl.VectorContainerWriter)
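
The only failure handling in this setup method is the wrap-and-rethrow of the vendor's DBException. The generic, self-contained sketch below uses placeholder exception types (not the MapR-DB or Drill classes) to show the same idiom, including that the original failure stays reachable through getCause().

// Illustrative stand-ins; not the MapR-DB or Drill exception classes.
class StoreException extends RuntimeException {
    StoreException(String message) { super(message); }
}

class ReaderSetupException extends Exception {
    ReaderSetupException(Throwable cause) { super(cause); }
}

public class WrapAndRethrowSketch {

    // Pretend to open a table handle; translate the store's exception into the
    // single checked exception type the caller expects from setup().
    static void setup(boolean failOpen) throws ReaderSetupException {
        try {
            if (failOpen) {
                throw new StoreException("cannot open table");
            }
        } catch (StoreException e) {
            throw new ReaderSetupException(e);  // original exception preserved as the cause
        }
    }

    public static void main(String[] args) {
        try {
            setup(true);
        } catch (ReaderSetupException e) {
            System.out.println("setup failed: " + e.getCause().getMessage());
        }
    }
}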

Example 14 with ExecutionSetupException

Use of org.apache.drill.common.exceptions.ExecutionSetupException in project drill by axbaretto.

From class HiveDrillNativeScanBatchCreator, method getBatch().

@Override
public ScanBatch getBatch(ExecutorFragmentContext context, HiveDrillNativeParquetSubScan config, List<RecordBatch> children) throws ExecutionSetupException {
    final HiveTableWithColumnCache table = config.getTable();
    final List<List<InputSplit>> splits = config.getInputSplits();
    final List<HivePartition> partitions = config.getPartitions();
    final List<SchemaPath> columns = config.getColumns();
    final String partitionDesignator = context.getOptions().getOption(ExecConstants.FILESYSTEM_PARTITION_COLUMN_LABEL).string_val;
    List<Map<String, String>> implicitColumns = Lists.newLinkedList();
    boolean selectAllQuery = Utilities.isStarQuery(columns);
    final boolean hasPartitions = (partitions != null && partitions.size() > 0);
    final List<String[]> partitionColumns = Lists.newArrayList();
    final List<Integer> selectedPartitionColumns = Lists.newArrayList();
    List<SchemaPath> tableColumns = columns;
    if (!selectAllQuery) {
        // Separate out the partition and non-partition columns. Non-partition columns are passed directly to the
        // ParquetRecordReader. Partition columns are passed to ScanBatch.
        tableColumns = Lists.newArrayList();
        Pattern pattern = Pattern.compile(String.format("%s[0-9]+", partitionDesignator));
        for (SchemaPath column : columns) {
            Matcher m = pattern.matcher(column.getRootSegmentPath());
            if (m.matches()) {
                selectedPartitionColumns.add(Integer.parseInt(column.getRootSegmentPath().substring(partitionDesignator.length())));
            } else {
                tableColumns.add(column);
            }
        }
    }
    final OperatorContext oContext = context.newOperatorContext(config);
    int currentPartitionIndex = 0;
    final List<RecordReader> readers = new LinkedList<>();
    final HiveConf conf = config.getHiveConf();
    // TODO: In future we can get this cache from Metadata cached on filesystem.
    final Map<String, ParquetMetadata> footerCache = Maps.newHashMap();
    Map<String, String> mapWithMaxColumns = Maps.newLinkedHashMap();
    try {
        for (List<InputSplit> splitGroups : splits) {
            for (InputSplit split : splitGroups) {
                final FileSplit fileSplit = (FileSplit) split;
                final Path finalPath = fileSplit.getPath();
                final JobConf cloneJob = new ProjectionPusher().pushProjectionsAndFilters(new JobConf(conf), finalPath.getParent());
                final FileSystem fs = finalPath.getFileSystem(cloneJob);
                ParquetMetadata parquetMetadata = footerCache.get(finalPath.toString());
                if (parquetMetadata == null) {
                    parquetMetadata = ParquetFileReader.readFooter(cloneJob, finalPath);
                    footerCache.put(finalPath.toString(), parquetMetadata);
                }
                final List<Integer> rowGroupNums = getRowGroupNumbersFromFileSplit(fileSplit, parquetMetadata);
                for (int rowGroupNum : rowGroupNums) {
                    // DRILL-5009 : Skip the row group if the row count is zero
                    if (parquetMetadata.getBlocks().get(rowGroupNum).getRowCount() == 0) {
                        continue;
                    }
                    // Drill has only ever written a single row group per file, only detect corruption
                    // in the first row group
                    ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(parquetMetadata, config.getColumns(), true);
                    if (logger.isDebugEnabled()) {
                        logger.debug(containsCorruptDates.toString());
                    }
                    readers.add(new ParquetRecordReader(context, Path.getPathWithoutSchemeAndAuthority(finalPath).toString(), rowGroupNum, fs, CodecFactory.createDirectCodecFactory(fs.getConf(), new ParquetDirectByteBufferAllocator(oContext.getAllocator()), 0), parquetMetadata, tableColumns, containsCorruptDates));
                    Map<String, String> implicitValues = Maps.newLinkedHashMap();
                    if (hasPartitions) {
                        List<String> values = partitions.get(currentPartitionIndex).getValues();
                        for (int i = 0; i < values.size(); i++) {
                            if (selectAllQuery || selectedPartitionColumns.contains(i)) {
                                implicitValues.put(partitionDesignator + i, values.get(i));
                            }
                        }
                    }
                    implicitColumns.add(implicitValues);
                    if (implicitValues.size() > mapWithMaxColumns.size()) {
                        mapWithMaxColumns = implicitValues;
                    }
                }
                currentPartitionIndex++;
            }
        }
    } catch (final IOException | RuntimeException e) {
        AutoCloseables.close(e, readers);
        throw new ExecutionSetupException("Failed to create RecordReaders. " + e.getMessage(), e);
    }
    // all readers should have the same number of implicit columns, add missing ones with value null
    mapWithMaxColumns = Maps.transformValues(mapWithMaxColumns, Functions.constant((String) null));
    for (Map<String, String> map : implicitColumns) {
        map.putAll(Maps.difference(map, mapWithMaxColumns).entriesOnlyOnRight());
    }
    // create an empty RecordReader to output the schema
    if (readers.size() == 0) {
        readers.add(new HiveDefaultReader(table, null, null, tableColumns, context, conf, ImpersonationUtil.createProxyUgi(config.getUserName(), context.getQueryUserName())));
    }
    return new ScanBatch(context, oContext, readers, implicitColumns);
}
Also used: ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException), Matcher (java.util.regex.Matcher), ParquetMetadata (org.apache.parquet.hadoop.metadata.ParquetMetadata), ProjectionPusher (org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher), ParquetRecordReader (org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader), RecordReader (org.apache.drill.exec.store.RecordReader), FileSplit (org.apache.hadoop.mapred.FileSplit), SchemaPath (org.apache.drill.common.expression.SchemaPath), OperatorContext (org.apache.drill.exec.ops.OperatorContext), FileSystem (org.apache.hadoop.fs.FileSystem), ScanBatch (org.apache.drill.exec.physical.impl.ScanBatch), LinkedList (java.util.LinkedList), List (java.util.List), HiveConf (org.apache.hadoop.hive.conf.HiveConf), InputSplit (org.apache.hadoop.mapred.InputSplit), JobConf (org.apache.hadoop.mapred.JobConf), Path (org.apache.hadoop.fs.Path), Pattern (java.util.regex.Pattern), ParquetDirectByteBufferAllocator (org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator), HiveDefaultReader (org.apache.drill.exec.store.hive.readers.HiveDefaultReader), IOException (java.io.IOException), ParquetReaderUtility (org.apache.drill.exec.store.parquet.ParquetReaderUtility), Map (java.util.Map)
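
The column-splitting step near the top of getBatch builds its regular expression from the configured partition designator (by default the label is dir, giving names such as dir0, dir1, ...). The self-contained sketch below reproduces just that classification with plain strings instead of SchemaPath; it assumes the default designator and is illustrative rather than Drill's own code.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionColumnSplitSketch {
    public static void main(String[] args) {
        String partitionDesignator = "dir";  // assumed default partition column label
        Pattern pattern = Pattern.compile(String.format("%s[0-9]+", partitionDesignator));

        List<String> requested = Arrays.asList("dir0", "name", "dir1", "amount");
        List<Integer> selectedPartitionColumns = new ArrayList<>();
        List<String> tableColumns = new ArrayList<>();

        for (String column : requested) {
            Matcher m = pattern.matcher(column);
            if (m.matches()) {
                // "dir1" -> partition index 1; partition columns go to ScanBatch,
                // non-partition columns to the Parquet reader.
                selectedPartitionColumns.add(Integer.parseInt(column.substring(partitionDesignator.length())));
            } else {
                tableColumns.add(column);
            }
        }
        System.out.println("partition indexes: " + selectedPartitionColumns);  // [0, 1]
        System.out.println("table columns: " + tableColumns);                  // [name, amount]
    }
}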

Example 15 with ExecutionSetupException

Use of org.apache.drill.common.exceptions.ExecutionSetupException in project drill by axbaretto.

From class HiveRecordReader, method init().

private void init() throws ExecutionSetupException {
    final JobConf job = new JobConf(hiveConf);
    // Get the configured default val
    defaultPartitionValue = hiveConf.get(ConfVars.DEFAULTPARTITIONNAME.varname);
    Properties tableProperties;
    try {
        tableProperties = MetaStoreUtils.getTableMetadata(table);
        final Properties partitionProperties = (partition == null) ? tableProperties : HiveUtilities.getPartitionMetadata(partition, table);
        HiveUtilities.addConfToJob(job, partitionProperties);
        final SerDe tableSerDe = createSerDe(job, table.getSd().getSerdeInfo().getSerializationLib(), tableProperties);
        final StructObjectInspector tableOI = getStructOI(tableSerDe);
        if (partition != null) {
            partitionSerDe = createSerDe(job, partition.getSd().getSerdeInfo().getSerializationLib(), partitionProperties);
            partitionOI = getStructOI(partitionSerDe);
            finalOI = (StructObjectInspector) ObjectInspectorConverters.getConvertedOI(partitionOI, tableOI);
            partTblObjectInspectorConverter = ObjectInspectorConverters.getConverter(partitionOI, finalOI);
            job.setInputFormat(HiveUtilities.getInputFormatClass(job, partition.getSd(), table));
        } else {
            // For non-partitioned tables, there is no need to create converter as there are no schema changes expected.
            partitionSerDe = tableSerDe;
            partitionOI = tableOI;
            partTblObjectInspectorConverter = null;
            finalOI = tableOI;
            job.setInputFormat(HiveUtilities.getInputFormatClass(job, table.getSd(), table));
        }
        // Get list of partition column names
        final List<String> partitionNames = Lists.newArrayList();
        for (FieldSchema field : table.getPartitionKeys()) {
            partitionNames.add(field.getName());
        }
        // We should always get the columns names from ObjectInspector. For some of the tables (ex. avro) metastore
        // may not contain the schema, instead it is derived from other sources such as table properties or external file.
        // SerDe object knows how to get the schema with all the config and table properties passed in initialization.
        // ObjectInspector created from the SerDe object has the schema.
        final StructTypeInfo sTypeInfo = (StructTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(finalOI);
        final List<String> tableColumnNames = sTypeInfo.getAllStructFieldNames();
        // Select list of columns for project pushdown into Hive SerDe readers.
        final List<Integer> columnIds = Lists.newArrayList();
        if (isStarQuery()) {
            selectedColumnNames = tableColumnNames;
            for (int i = 0; i < selectedColumnNames.size(); i++) {
                columnIds.add(i);
            }
            selectedPartitionNames = partitionNames;
        } else {
            selectedColumnNames = Lists.newArrayList();
            for (SchemaPath field : getColumns()) {
                String columnName = field.getRootSegment().getPath();
                if (partitionNames.contains(columnName)) {
                    selectedPartitionNames.add(columnName);
                } else {
                    columnIds.add(tableColumnNames.indexOf(columnName));
                    selectedColumnNames.add(columnName);
                }
            }
        }
        ColumnProjectionUtils.appendReadColumns(job, columnIds, selectedColumnNames);
        for (String columnName : selectedColumnNames) {
            ObjectInspector fieldOI = finalOI.getStructFieldRef(columnName).getFieldObjectInspector();
            TypeInfo typeInfo = TypeInfoUtils.getTypeInfoFromTypeString(fieldOI.getTypeName());
            selectedColumnObjInspectors.add(fieldOI);
            selectedColumnTypes.add(typeInfo);
            selectedColumnFieldConverters.add(HiveFieldConverter.create(typeInfo, fragmentContext));
        }
        for (int i = 0; i < table.getPartitionKeys().size(); i++) {
            FieldSchema field = table.getPartitionKeys().get(i);
            if (selectedPartitionNames.contains(field.getName())) {
                TypeInfo pType = TypeInfoUtils.getTypeInfoFromTypeString(field.getType());
                selectedPartitionTypes.add(pType);
                if (partition != null) {
                    selectedPartitionValues.add(HiveUtilities.convertPartitionType(pType, partition.getValues().get(i), defaultPartitionValue));
                }
            }
        }
    } catch (Exception e) {
        throw new ExecutionSetupException("Failure while initializing HiveRecordReader: " + e.getMessage(), e);
    }
    if (!empty) {
        try {
            reader = (org.apache.hadoop.mapred.RecordReader<Object, Object>) job.getInputFormat().getRecordReader(inputSplit, job, Reporter.NULL);
        } catch (Exception e) {
            throw new ExecutionSetupException("Failed to get o.a.hadoop.mapred.RecordReader from Hive InputFormat", e);
        }
        key = reader.createKey();
        skipRecordsInspector = new SkipRecordsInspector(tableProperties, reader);
    }
}
Also used: SerDe (org.apache.hadoop.hive.serde2.SerDe), ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException), ObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector), StructObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector), FieldSchema (org.apache.hadoop.hive.metastore.api.FieldSchema), StructTypeInfo (org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo), Properties (java.util.Properties), TypeInfo (org.apache.hadoop.hive.serde2.typeinfo.TypeInfo), DrillRuntimeException (org.apache.drill.common.exceptions.DrillRuntimeException), IOException (java.io.IOException), ExecutionException (java.util.concurrent.ExecutionException), SchemaChangeException (org.apache.drill.exec.exception.SchemaChangeException), SerDeException (org.apache.hadoop.hive.serde2.SerDeException), SchemaPath (org.apache.drill.common.expression.SchemaPath), JobConf (org.apache.hadoop.mapred.JobConf)
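
For the projection pushdown above, only non-partition columns are handed to the Hive SerDe reader; partition keys are filled in from partition metadata instead. The self-contained sketch below shows that split and how the column ids for pushdown are derived, using plain strings instead of SchemaPath (illustrative, not the Drill or Hive API itself).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HiveProjectionSketch {
    public static void main(String[] args) {
        // Column names as the SerDe's ObjectInspector would report them.
        List<String> tableColumnNames = Arrays.asList("id", "name", "amount");
        // Partition keys from the table definition.
        List<String> partitionNames = Arrays.asList("ds", "region");
        // Columns requested by the query.
        List<String> requested = Arrays.asList("name", "ds", "amount");

        List<Integer> columnIds = new ArrayList<>();
        List<String> selectedColumnNames = new ArrayList<>();
        List<String> selectedPartitionNames = new ArrayList<>();

        for (String columnName : requested) {
            if (partitionNames.contains(columnName)) {
                selectedPartitionNames.add(columnName);            // served from partition metadata
            } else {
                columnIds.add(tableColumnNames.indexOf(columnName));
                selectedColumnNames.add(columnName);               // pushed down to the SerDe reader
            }
        }
        System.out.println("pushdown ids: " + columnIds);                // [1, 2]
        System.out.println("pushdown names: " + selectedColumnNames);    // [name, amount]
        System.out.println("partition keys: " + selectedPartitionNames); // [ds]
    }
}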

Aggregations (usage counts)

ExecutionSetupException (org.apache.drill.common.exceptions.ExecutionSetupException): 94
IOException (java.io.IOException): 43
ScanBatch (org.apache.drill.exec.physical.impl.ScanBatch): 26
SchemaPath (org.apache.drill.common.expression.SchemaPath): 25
RecordReader (org.apache.drill.exec.store.RecordReader): 24
SchemaChangeException (org.apache.drill.exec.exception.SchemaChangeException): 22
LinkedList (java.util.LinkedList): 16
Map (java.util.Map): 14
MaterializedField (org.apache.drill.exec.record.MaterializedField): 13
ExecutionException (java.util.concurrent.ExecutionException): 10
DrillRuntimeException (org.apache.drill.common.exceptions.DrillRuntimeException): 10
OperatorContext (org.apache.drill.exec.ops.OperatorContext): 8
UserException (org.apache.drill.common.exceptions.UserException): 7
MajorType (org.apache.drill.common.types.TypeProtos.MajorType): 7
JobConf (org.apache.hadoop.mapred.JobConf): 7
HashMap (java.util.HashMap): 6
List (java.util.List): 6
OutOfMemoryException (org.apache.drill.exec.exception.OutOfMemoryException): 6
VectorContainerWriter (org.apache.drill.exec.vector.complex.impl.VectorContainerWriter): 6
Path (org.apache.hadoop.fs.Path): 6