
Example 1 with AcidRecordReader

Use of org.apache.hadoop.hive.ql.io.AcidInputFormat.AcidRecordReader in the Apache Hive project.

From the class StreamingAssert, method readRecords.

List<Record> readRecords() throws Exception {
    if (currentDeltas.isEmpty()) {
        throw new AssertionError("No data");
    }
    // Configure an ORC read of the partition as a transactional (ACID) table scan.
    InputFormat<NullWritable, OrcStruct> inputFormat = new OrcInputFormat();
    JobConf job = new JobConf();
    job.set("mapred.input.dir", partitionLocation.toString());
    job.set("bucket_count", Integer.toString(table.getSd().getNumBuckets()));
    job.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, "id,msg");
    job.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, "bigint:string");
    job.set(ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.varname, "true");
    job.set(ValidTxnList.VALID_TXNS_KEY, txns.toString());
    // Exactly one split is expected for the partition under test.
    InputSplit[] splits = inputFormat.getSplits(job, 1);
    assertEquals(1, splits.length);
    final AcidRecordReader<NullWritable, OrcStruct> recordReader = (AcidRecordReader<NullWritable, OrcStruct>) inputFormat.getRecordReader(splits[0], job, Reporter.NULL);
    List<Record> records = new ArrayList<>();
    try {
        NullWritable key = recordReader.createKey();
        OrcStruct value = recordReader.createValue();
        while (recordReader.next(key, value)) {
            // Snapshot the identifier and the row's string form into a new Record.
            RecordIdentifier recordIdentifier = recordReader.getRecordIdentifier();
            Record record = new Record(new RecordIdentifier(recordIdentifier.getTransactionId(), recordIdentifier.getBucketId(), recordIdentifier.getRowId()), value.toString());
            System.out.println(record);
            records.add(record);
        }
    } finally {
        // Release the reader even if a read fails part-way through.
        recordReader.close();
    }
    return records;
}
Also used: ArrayList (java.util.ArrayList), AcidRecordReader (org.apache.hadoop.hive.ql.io.AcidInputFormat.AcidRecordReader), NullWritable (org.apache.hadoop.io.NullWritable), RecordIdentifier (org.apache.hadoop.hive.ql.io.RecordIdentifier), OrcStruct (org.apache.hadoop.hive.ql.io.orc.OrcStruct), OrcInputFormat (org.apache.hadoop.hive.ql.io.orc.OrcInputFormat), JobConf (org.apache.hadoop.mapred.JobConf), InputSplit (org.apache.hadoop.mapred.InputSplit)
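
The read loop in readRecords follows the old mapred reader contract: the caller creates the key and value objects once, and each call to next fills the same objects in place. That is presumably why the example stores value.toString() and a fresh RecordIdentifier rather than the objects themselves. The sketch below illustrates the same pattern without any Hadoop or Hive dependency. SimpleReader, Row, readerOver and readAll are hypothetical names introduced only for this illustration; they are not part of the Hive or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReusedValueReadLoop {

    /** Hypothetical stand-in for a mapred-style reader; NOT the Hadoop interface. */
    interface SimpleReader<V> extends AutoCloseable {
        V createValue();

        /** Fills {@code value} in place; returns false at end of input. */
        boolean next(V value);

        @Override
        void close();
    }

    /** Mutable row holder, playing the role of a reusable value object. */
    static final class Row {
        long id;
        String msg;

        @Override
        public String toString() {
            return "{" + id + ", " + msg + "}";
        }
    }

    /** In-memory reader over fixed messages; rows receive ids 0, 1, 2, ... */
    static SimpleReader<Row> readerOver(List<String> msgs) {
        final Iterator<String> it = msgs.iterator();
        return new SimpleReader<Row>() {
            private long nextId = 0;

            @Override
            public Row createValue() {
                return new Row();
            }

            @Override
            public boolean next(Row value) {
                if (!it.hasNext()) {
                    return false;
                }
                value.id = nextId++;
                value.msg = it.next();
                return true;
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory source.
            }
        };
    }

    /** Drains the reader, snapshotting each row as a String, and always closes it. */
    static List<String> readAll(SimpleReader<Row> reader) {
        List<String> out = new ArrayList<>();
        try (SimpleReader<Row> r = reader) {
            Row value = r.createValue();
            while (r.next(value)) {
                // Store a copy: 'value' is overwritten by the next call to next().
                out.add(value.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(readerOver(Arrays.asList("a", "b"))));
    }
}
```

Adding the Row object itself to the list instead of its string form would leave every list element pointing at the same instance, holding only the last row read.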
