
Example 11 with JobContextImpl

Use of org.apache.hadoop.mapred.JobContextImpl in project hive by apache.

From the class TestHiveIcebergOutputCommitter, method testSuccessfulMultipleTasksPartitionedWrite.

@Test
public void testSuccessfulMultipleTasksPartitionedWrite() throws IOException {
    HiveIcebergOutputCommitter committer = new HiveIcebergOutputCommitter();
    Table table = table(temp.getRoot().getPath(), true);
    JobConf conf = jobConf(table, 2);
    List<Record> expected = writeRecords(table.name(), 2, 0, true, false, conf);
    committer.commitJob(new JobContextImpl(conf, JOB_ID));
    // Expecting 6 files with the fanout writer, 8 with ClusteredWriter, where writing to already completed partitions is allowed.
    HiveIcebergTestUtils.validateFiles(table, conf, JOB_ID, 8);
    HiveIcebergTestUtils.validateData(table, expected, 0);
}
Also used: JobContextImpl (org.apache.hadoop.mapred.JobContextImpl), Table (org.apache.iceberg.Table), Record (org.apache.iceberg.data.Record), JobConf (org.apache.hadoop.mapred.JobConf), Test (org.junit.Test)

Example 12 with JobContextImpl

Use of org.apache.hadoop.mapred.JobContextImpl in project hive by apache.

From the class TestHiveIcebergOutputCommitter, method testRetryTask.

@Test
public void testRetryTask() throws IOException {
    HiveIcebergOutputCommitter committer = new HiveIcebergOutputCommitter();
    Table table = table(temp.getRoot().getPath(), false);
    JobConf conf = jobConf(table, 2);
    // Write records and abort the tasks
    writeRecords(table.name(), 2, 0, false, true, conf);
    HiveIcebergTestUtils.validateFiles(table, conf, JOB_ID, 0);
    HiveIcebergTestUtils.validateData(table, Collections.emptyList(), 0);
    // Write records but do not abort the tasks
    // The data files remain since we cannot identify them, but they should not be read
    writeRecords(table.name(), 2, 1, false, false, conf);
    HiveIcebergTestUtils.validateFiles(table, conf, JOB_ID, 2);
    HiveIcebergTestUtils.validateData(table, Collections.emptyList(), 0);
    // Write and commit the records
    List<Record> expected = writeRecords(table.name(), 2, 2, true, false, conf);
    committer.commitJob(new JobContextImpl(conf, JOB_ID));
    HiveIcebergTestUtils.validateFiles(table, conf, JOB_ID, 4);
    HiveIcebergTestUtils.validateData(table, expected, 0);
}
Also used: JobContextImpl (org.apache.hadoop.mapred.JobContextImpl), Table (org.apache.iceberg.Table), Record (org.apache.iceberg.data.Record), JobConf (org.apache.hadoop.mapred.JobConf), Test (org.junit.Test)

Example 13 with JobContextImpl

Use of org.apache.hadoop.mapred.JobContextImpl in project flink by apache.

From the class HadoopOutputFormatBase, method finalizeGlobal.

@Override
public void finalizeGlobal(int parallelism) throws IOException {
    try {
        JobContext jobContext = new JobContextImpl(this.jobConf, new JobID());
        OutputCommitter outputCommitter = this.jobConf.getOutputCommitter();
        // finalize HDFS output format
        outputCommitter.commitJob(jobContext);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Also used: OutputCommitter (org.apache.hadoop.mapred.OutputCommitter), JobContextImpl (org.apache.hadoop.mapred.JobContextImpl), JobContext (org.apache.hadoop.mapred.JobContext), JobID (org.apache.hadoop.mapred.JobID), IOException (java.io.IOException)
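
For the failure path, the same JobContextImpl construction can be handed to OutputCommitter.abortJob. The following is a minimal sketch under that assumption, using only the stock org.apache.hadoop.mapred API; it is not part of the HadoopOutputFormatBase code shown above, and the class name is illustrative.

import java.io.IOException;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobContext;
import org.apache.hadoop.mapred.JobContextImpl;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.OutputCommitter;

public final class AbortJobSketch {

    private AbortJobSketch() {
    }

    /** Roll back whatever the configured committer staged for a failed job. */
    public static void abortGlobal(JobConf jobConf) throws IOException {
        // Same construction pattern as finalizeGlobal(): wrap the JobConf in a JobContextImpl.
        JobContext jobContext = new JobContextImpl(jobConf, new JobID());
        OutputCommitter outputCommitter = jobConf.getOutputCommitter();
        // JobStatus.FAILED tells the committer which terminal state triggered the abort.
        outputCommitter.abortJob(jobContext, JobStatus.FAILED);
    }
}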

Example 14 with JobContextImpl

Use of org.apache.hadoop.mapred.JobContextImpl in project flink by apache.

From the class HadoopOutputFormatBase, method open.

/**
 * Creates the temporary output file for the Hadoop RecordWriter.
 *
 * @param taskNumber The number of the parallel instance.
 * @param numTasks The number of parallel tasks.
 * @throws java.io.IOException if the task id is too large or the record writer cannot be created.
 */
@Override
public void open(int taskNumber, int numTasks) throws IOException {
    // enforce sequential open() calls
    synchronized (OPEN_MUTEX) {
        if (Integer.toString(taskNumber + 1).length() > 6) {
            throw new IOException("Task id too large.");
        }
        // Pad the 1-based task number to six digits to mimic Hadoop's attempt id layout.
        TaskAttemptID taskAttemptID = TaskAttemptID.forName(
                "attempt__0000_r_"
                        + String.format("%" + (6 - Integer.toString(taskNumber + 1).length()) + "s", " ").replace(" ", "0")
                        + Integer.toString(taskNumber + 1)
                        + "_0");
        this.jobConf.set("mapred.task.id", taskAttemptID.toString());
        this.jobConf.setInt("mapred.task.partition", taskNumber + 1);
        // for hadoop 2.2
        this.jobConf.set("mapreduce.task.attempt.id", taskAttemptID.toString());
        this.jobConf.setInt("mapreduce.task.partition", taskNumber + 1);
        this.context = new TaskAttemptContextImpl(this.jobConf, taskAttemptID);
        this.outputCommitter = this.jobConf.getOutputCommitter();
        JobContext jobContext = new JobContextImpl(this.jobConf, new JobID());
        this.outputCommitter.setupJob(jobContext);
        this.recordWriter = this.mapredOutputFormat.getRecordWriter(null, this.jobConf, Integer.toString(taskNumber + 1), new HadoopDummyProgressable());
    }
}
Also used: JobContextImpl (org.apache.hadoop.mapred.JobContextImpl), TaskAttemptID (org.apache.hadoop.mapred.TaskAttemptID), TaskAttemptContextImpl (org.apache.hadoop.mapred.TaskAttemptContextImpl), IOException (java.io.IOException), JobContext (org.apache.hadoop.mapred.JobContext), HadoopDummyProgressable (org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopDummyProgressable), JobID (org.apache.hadoop.mapred.JobID)
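
The snippet above only sets up the job and the task attempt; the matching teardown is not shown here. Below is a minimal sketch of how the task-level commit could be driven with the objects open() creates. The helper class and method are hypothetical, and the use of Reporter.NULL is an assumption; Flink's actual close() logic may differ.

import java.io.IOException;

import org.apache.hadoop.mapred.OutputCommitter;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TaskAttemptContext;

public final class TaskCommitSketch {

    private TaskCommitSketch() {
    }

    /** Close the writer, then commit the task output only if the committer requires it. */
    public static <K, V> void closeAndCommit(RecordWriter<K, V> recordWriter,
            OutputCommitter outputCommitter, TaskAttemptContext context) throws IOException {
        // Flush and close the Hadoop RecordWriter obtained in open().
        recordWriter.close(Reporter.NULL);
        // Promote the task's temporary output only when the committer asks for it.
        if (outputCommitter.needsTaskCommit(context)) {
            outputCommitter.commitTask(context);
        }
    }
}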

Aggregations

JobContextImpl (org.apache.hadoop.mapred.JobContextImpl): 14
JobConf (org.apache.hadoop.mapred.JobConf): 8
Test (org.junit.Test): 7
Table (org.apache.iceberg.Table): 6
Record (org.apache.iceberg.data.Record): 5
IOException (java.io.IOException): 4
JobID (org.apache.hadoop.mapred.JobID): 4
OutputFormat (org.apache.hadoop.mapreduce.OutputFormat): 4
IgniteCheckedException (org.apache.ignite.IgniteCheckedException): 4
IgniteInterruptedCheckedException (org.apache.ignite.internal.IgniteInterruptedCheckedException): 4
JobContext (org.apache.hadoop.mapred.JobContext): 2
TaskAttemptID (org.apache.hadoop.mapred.TaskAttemptID): 2
InputSplit (org.apache.hadoop.mapreduce.InputSplit): 2
OutputCommitter (org.apache.hadoop.mapreduce.OutputCommitter): 2
StreamEvent (co.cask.cdap.api.flow.flowlet.StreamEvent): 1
IdentityStreamEventDecoder (co.cask.cdap.data.stream.decoder.IdentityStreamEventDecoder): 1
AuthenticationTestContext (co.cask.cdap.security.auth.context.AuthenticationTestContext): 1
NoOpAuthorizer (co.cask.cdap.security.spi.authorization.NoOpAuthorizer): 1
File (java.io.File): 1
HadoopDummyProgressable (org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopDummyProgressable): 1
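
Taken together, the examples share one pattern: build a JobConf, wrap it in a JobContextImpl together with a JobID, and drive the OutputCommitter lifecycle through that context. The sketch below condenses that pattern; it assumes the committer configured on the JobConf is the one you intend to run, and the class and method names are illustrative rather than taken from any of the projects above.

import java.io.IOException;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobContext;
import org.apache.hadoop.mapred.JobContextImpl;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.OutputCommitter;

public final class JobContextLifecycleSketch {

    private JobContextLifecycleSketch() {
    }

    /** Set up and commit a job through the committer configured on the JobConf. */
    public static void setupAndCommit(JobConf jobConf) throws IOException {
        // JobContextImpl binds the configuration to a job identity for the committer.
        JobContext jobContext = new JobContextImpl(jobConf, new JobID());
        OutputCommitter committer = jobConf.getOutputCommitter();
        // Mirrors the calls in HadoopOutputFormatBase.open() and finalizeGlobal() above.
        committer.setupJob(jobContext);
        committer.commitJob(jobContext);
    }
}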