Example 1 with Container

Use of org.apache.iceberg.mr.mapred.Container in project hive by apache.

In the class TestHiveIcebergSerDe, method testDeserialize.

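@Test
public void testDeserialize() {
    HiveIcebergSerDe serDe = new HiveIcebergSerDe();
    // Generate one random record; schema is defined elsewhere in the test class.
    Record record = RandomGenericData.generate(schema, 1, 0).get(0);
    // Wrap the record in a Container before handing it to the SerDe.
    Container<Record> container = new Container<>();
    container.set(record);
    // Deserializing should unwrap and return the same record.
    Assert.assertEquals(record, serDe.deserialize(container));
}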
Also used : Container(org.apache.iceberg.mr.mapred.Container) Record(org.apache.iceberg.data.Record) Test(org.junit.Test)

Example 2 with Container

Use of org.apache.iceberg.mr.mapred.Container in project hive by apache.

In the class HiveIcebergRecordWriter, method write.

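@Override
public void write(Writable row) throws IOException {
    // The incoming Writable is expected to be a Container wrapping an Iceberg Record (unchecked cast).
    Record record = ((Container<Record>) row).get();
    // Delegate to the parent writer with the spec and the partition computed for this record.
    super.write(record, spec, partition(record));
}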
Also used : Container(org.apache.iceberg.mr.mapred.Container) Record(org.apache.iceberg.data.Record)

Example 3 with Container

Use of org.apache.iceberg.mr.mapred.Container in project hive by apache.

In the class TestHiveIcebergOutputCommitter, method writeRecords.

/**
 * Write random records to the given table using separate {@link HiveIcebergOutputCommitter} and
 * a separate {@link HiveIcebergRecordWriter} for every task.
 * @param name The name of the table to get the table object from the conf
 * @param taskNum The number of tasks in the job handled by the committer
 * @param attemptNum The id used for attempt number generation
 * @param commitTasks If <code>true</code> the tasks will be committed
 * @param abortTasks If <code>true</code> the tasks will be aborted - needed so we can simulate no commit/no abort
 *                   situation
 * @param conf The job configuration
 * @param committer The output committer that should be used for committing/aborting the tasks
 * @return The random generated records which were appended to the table
 * @throws IOException Propagating {@link HiveIcebergRecordWriter} exceptions
 */
private List<Record> writeRecords(String name, int taskNum, int attemptNum, boolean commitTasks,
        boolean abortTasks, JobConf conf, OutputCommitter committer) throws IOException {
    List<Record> expected = new ArrayList<>(RECORD_NUM * taskNum);
    Table table = HiveIcebergStorageHandler.table(conf, name);
    FileIO io = table.io();
    Schema schema = HiveIcebergStorageHandler.schema(conf);
    PartitionSpec spec = table.spec();
    for (int i = 0; i < taskNum; ++i) {
        List<Record> records = TestHelper.generateRandomRecords(schema, RECORD_NUM, i + attemptNum);
        TaskAttemptID taskId = new TaskAttemptID(JOB_ID.getJtIdentifier(), JOB_ID.getId(), TaskType.MAP, i, attemptNum);
        int partitionId = taskId.getTaskID().getId();
        String operationId = QUERY_ID + "-" + JOB_ID;
        FileFormat fileFormat = FileFormat.PARQUET;
        OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, attemptNum)
                .format(fileFormat)
                .operationId(operationId)
                .build();
        HiveFileWriterFactory hfwf = new HiveFileWriterFactory(table, fileFormat, schema, null, fileFormat,
                null, null, null, null);
        HiveIcebergRecordWriter testWriter = new HiveIcebergRecordWriter(schema, spec, fileFormat, hfwf,
                outputFileFactory, io, TARGET_FILE_SIZE, TezUtil.taskAttemptWrapper(taskId),
                conf.get(Catalogs.NAME));
        Container<Record> container = new Container<>();
        for (Record record : records) {
            container.set(record);
            testWriter.write(container);
        }
        testWriter.close(false);
        if (commitTasks) {
            committer.commitTask(new TaskAttemptContextImpl(conf, taskId));
            expected.addAll(records);
        } else if (abortTasks) {
            committer.abortTask(new TaskAttemptContextImpl(conf, taskId));
        }
    }
    return expected;
}
Also used : OutputFileFactory(org.apache.iceberg.io.OutputFileFactory) Table(org.apache.iceberg.Table) TaskAttemptID(org.apache.hadoop.mapred.TaskAttemptID) Schema(org.apache.iceberg.Schema) ArrayList(java.util.ArrayList) FileFormat(org.apache.iceberg.FileFormat) PartitionSpec(org.apache.iceberg.PartitionSpec) FileIO(org.apache.iceberg.io.FileIO) Container(org.apache.iceberg.mr.mapred.Container) TaskAttemptContextImpl(org.apache.hadoop.mapred.TaskAttemptContextImpl) Record(org.apache.iceberg.data.Record)
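
The inner loop of writeRecords creates one Container and calls set on it for every record before passing it to the writer, which unwraps it with get. The following is a minimal, dependency-free sketch of that reuse pattern. SimpleContainer and ContainerReuseSketch are hypothetical names introduced only for illustration; the real org.apache.iceberg.mr.mapred.Container is also a Hadoop Writable, which is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Container used above: a mutable holder for one value.
// The Hadoop Writable side of the real class is intentionally omitted.
class SimpleContainer<T> {
    private T value;

    T get() {
        return value;
    }

    void set(T newValue) {
        this.value = newValue;
    }
}

public class ContainerReuseSketch {

    // Mimics the write loop: one container instance is reused for every row,
    // and the "writer" (here just a list) unwraps the payload immediately.
    static List<String> writeAll(List<String> rows) {
        List<String> written = new ArrayList<>();
        SimpleContainer<String> container = new SimpleContainer<>();
        for (String row : rows) {
            container.set(row);            // overwrite the previous payload
            written.add(container.get());  // consumer reads it before the next set
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(writeAll(Arrays.asList("a", "b", "c")));
    }
}
```

Because the holder is overwritten on each iteration, a consumer that kept a reference to the container itself, rather than to the unwrapped value, would only ever see the last row. That is why the sketch reads the value with get right after each set.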

Aggregations

Record (org.apache.iceberg.data.Record): 3
Container (org.apache.iceberg.mr.mapred.Container): 3
ArrayList (java.util.ArrayList): 1
TaskAttemptContextImpl (org.apache.hadoop.mapred.TaskAttemptContextImpl): 1
TaskAttemptID (org.apache.hadoop.mapred.TaskAttemptID): 1
FileFormat (org.apache.iceberg.FileFormat): 1
PartitionSpec (org.apache.iceberg.PartitionSpec): 1
Schema (org.apache.iceberg.Schema): 1
Table (org.apache.iceberg.Table): 1
FileIO (org.apache.iceberg.io.FileIO): 1
OutputFileFactory (org.apache.iceberg.io.OutputFileFactory): 1
Test (org.junit.Test): 1