Example 6 with IOInterruptedException

use of org.apache.tez.runtime.library.api.IOInterruptedException in project tez by apache.

the class DefaultSorter method interruptSpillThread.

void interruptSpillThread() throws IOException {
    assert !spillLock.isHeldByCurrentThread();
    // sufficient motivation for the latter approach.
    try {
        spillThread.interrupt();
        spillThread.join();
    } catch (InterruptedException e) {
        LOG.info(outputContext.getDestinationVertexName() + ": " + "Spill thread interrupted");
        // Reset status
        Thread.currentThread().interrupt();
        throw new IOInterruptedException("Spill failed", e);
    }
}
Also used : IOInterruptedException(org.apache.tez.runtime.library.api.IOInterruptedException)
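
The interrupt-then-join handling above can be tried without Tez on the classpath. The following is a minimal sketch, not the DefaultSorter code: InterruptedIoException is a hypothetical stand-in for IOInterruptedException, and StopWorkerSketch, stopWorker and demoInterruptedCaller are names invented for illustration.

```java
import java.io.IOException;

// Hypothetical stand-in for Tez's IOInterruptedException so the sketch compiles on its own.
class InterruptedIoException extends IOException {
    InterruptedIoException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class StopWorkerSketch {
    // Interrupts the worker and waits for it to exit. If the calling thread is
    // interrupted while waiting, its interrupt flag is restored and the condition
    // is reported as an IOException subtype.
    static void stopWorker(Thread worker) throws IOException {
        try {
            worker.interrupt();
            worker.join();
        } catch (InterruptedException e) {
            // join() cleared the flag when it threw; set it again for callers.
            Thread.currentThread().interrupt();
            throw new InterruptedIoException("Worker shutdown interrupted", e);
        }
    }

    // Returns true if stopWorker reported the interruption and left the flag set.
    static boolean demoInterruptedCaller() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException ignored) {
                    // Keep running so that join() in the caller has to block.
                }
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive
        worker.start();
        Thread.currentThread().interrupt(); // simulate an already-interrupted caller
        try {
            stopWorker(worker);
            return false;
        } catch (IOException e) {
            // Thread.interrupted() reads and clears the restored flag.
            return e instanceof InterruptedIoException && Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println(demoInterruptedCaller());
    }
}
```

Restoring the flag before throwing lets code further up the stack still observe the interruption even though it only sees an IOException.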

Example 7 with IOInterruptedException

use of org.apache.tez.runtime.library.api.IOInterruptedException in project tez by apache.

the class PipelinedSorter method flush.

@Override
public void flush() throws IOException {
    final String uniqueIdentifier = outputContext.getUniqueIdentifier();
    outputContext.notifyProgress();
    /**
     * The thread may have been interrupted while flush was in progress, or before flush was
     * ever invoked. As part of its cleanup, TezTaskRunner invokes close() on all I/O, and
     * cleanup is safe at that point.
     */
    if (isThreadInterrupted()) {
        return;
    }
    try {
        LOG.info(outputContext.getDestinationVertexName() + ": Starting flush of map output");
        span.end();
        merger.add(span.sort(sorter));
        // Force a spill in flush(), for two reasons:
        // case 1: no keys were written and flush() got called;
        // we want at least one spill (even an empty one).
        // case 2: with pipelined shuffle, we have no way of
        // knowing the last key being written until flush() is called,
        // so for flush()->spill() we want to force a spill so that
        // we can send the pipelined shuffle event with last event = true.
        spill(false);
        sortmaster.shutdown();
        // safe to clean up
        buffers.clear();
        if (indexCacheList.isEmpty()) {
            /*
             * Without this check, a task that gets killed midway can throw an NPE here,
             * which is a distraction when debugging.
             */
            if (LOG.isDebugEnabled()) {
                LOG.debug(outputContext.getDestinationVertexName() + ": Index list is empty... returning");
            }
            return;
        }
        if (!isFinalMergeEnabled()) {
            // Generate events for all spills
            List<Event> events = Lists.newLinkedList();
            // For pipelined shuffle, previous events are already sent. Just generate the last event alone
            int startIndex = (pipelinedShuffle) ? (numSpills - 1) : 0;
            int endIndex = numSpills;
            for (int i = startIndex; i < endIndex; i++) {
                boolean isLastEvent = (i == numSpills - 1);
                String pathComponent = (outputContext.getUniqueIdentifier() + "_" + i);
                ShuffleUtils.generateEventOnSpill(events, isFinalMergeEnabled(), isLastEvent, outputContext, i, indexCacheList.get(i), partitions, sendEmptyPartitionDetails, pathComponent, partitionStats, reportDetailedPartitionStats(), auxiliaryService, deflater);
                LOG.info(outputContext.getDestinationVertexName() + ": Adding spill event for spill (final update=" + isLastEvent + "), spillId=" + i);
            }
            outputContext.sendEvents(events);
            return;
        }
        numAdditionalSpills.increment(numSpills - 1);
        // In case final merge is required, the following code path is executed.
        if (numSpills == 1) {
            // someday be able to pass this directly to shuffle
            // without writing to disk
            final Path filename = spillFilePaths.get(0);
            final Path indexFilename = spillFileIndexPaths.get(0);
            finalOutputFile = mapOutputFile.getOutputFileForWriteInVolume(filename);
            finalIndexFile = mapOutputFile.getOutputIndexFileForWriteInVolume(indexFilename);
            sameVolRename(filename, finalOutputFile);
            sameVolRename(indexFilename, finalIndexFile);
            if (LOG.isDebugEnabled()) {
                LOG.debug(outputContext.getDestinationVertexName() + ": numSpills=" + numSpills + ", finalOutputFile=" + finalOutputFile + ", " + "finalIndexFile=" + finalIndexFile + ", filename=" + filename + ", indexFilename=" + indexFilename);
            }
            TezSpillRecord spillRecord = new TezSpillRecord(finalIndexFile, conf);
            if (reportPartitionStats()) {
                for (int i = 0; i < spillRecord.size(); i++) {
                    partitionStats[i] += spillRecord.getIndex(i).getPartLength();
                }
            }
            numShuffleChunks.setValue(numSpills);
            fileOutputByteCounter.increment(rfs.getFileStatus(finalOutputFile).getLen());
            // ??? why are events not being sent here?
            return;
        }
        finalOutputFile = mapOutputFile.getOutputFileForWrite(0); // TODO
        finalIndexFile = mapOutputFile.getOutputIndexFileForWrite(0); // TODO
        if (LOG.isDebugEnabled()) {
            LOG.debug(outputContext.getDestinationVertexName() + ": " + "numSpills: " + numSpills + ", finalOutputFile:" + finalOutputFile + ", finalIndexFile:" + finalIndexFile);
        }
        // The output stream for the final single output file
        FSDataOutputStream finalOut = rfs.create(finalOutputFile, true, 4096);
        if (!SPILL_FILE_PERMS.equals(SPILL_FILE_PERMS.applyUMask(FsPermission.getUMask(conf)))) {
            rfs.setPermission(finalOutputFile, SPILL_FILE_PERMS);
        }
        final TezSpillRecord spillRec = new TezSpillRecord(partitions);
        for (int parts = 0; parts < partitions; parts++) {
            boolean shouldWrite = false;
            // create the segments to be merged
            List<Segment> segmentList = new ArrayList<Segment>(numSpills);
            for (int i = 0; i < numSpills; i++) {
                Path spillFilename = spillFilePaths.get(i);
                TezIndexRecord indexRecord = indexCacheList.get(i).getIndex(parts);
                if (indexRecord.hasData() || !sendEmptyPartitionDetails) {
                    shouldWrite = true;
                    DiskSegment s = new DiskSegment(rfs, spillFilename, indexRecord.getStartOffset(), indexRecord.getPartLength(), codec, ifileReadAhead, ifileReadAheadLength, ifileBufferSize, true);
                    segmentList.add(s);
                }
            }
            int mergeFactor = this.conf.getInt(TezRuntimeConfiguration.TEZ_RUNTIME_IO_SORT_FACTOR, TezRuntimeConfiguration.TEZ_RUNTIME_IO_SORT_FACTOR_DEFAULT);
            // sort the segments only if there are intermediate merges
            boolean sortSegments = segmentList.size() > mergeFactor;
            // merge
            // Not using any Progress in TezMerger (the null passed just before merger.needsRLE()). Should just work.
            TezRawKeyValueIterator kvIter = TezMerger.merge(conf, rfs, keyClass, valClass, codec, segmentList, mergeFactor, new Path(uniqueIdentifier), (RawComparator) ConfigUtils.getIntermediateOutputKeyComparator(conf), progressable, sortSegments, true, null, spilledRecordsCounter, additionalSpillBytesRead, null, merger.needsRLE());
            // write merged output to disk
            long segmentStart = finalOut.getPos();
            long rawLength = 0;
            long partLength = 0;
            if (shouldWrite) {
                Writer writer = new Writer(conf, finalOut, keyClass, valClass, codec, spilledRecordsCounter, null, merger.needsRLE());
                if (combiner == null || numSpills < minSpillsForCombine) {
                    TezMerger.writeFile(kvIter, writer, progressable, TezRuntimeConfiguration.TEZ_RUNTIME_RECORDS_BEFORE_PROGRESS_DEFAULT);
                } else {
                    runCombineProcessor(kvIter, writer);
                }
                // close
                writer.close();
                rawLength = writer.getRawLength();
                partLength = writer.getCompressedLength();
            }
            outputBytesWithOverheadCounter.increment(rawLength);
            // record offsets
            final TezIndexRecord rec = new TezIndexRecord(segmentStart, rawLength, partLength);
            spillRec.putIndex(rec, parts);
            if (reportPartitionStats()) {
                partitionStats[parts] += partLength;
            }
        }
        // final merge has happened.
        numShuffleChunks.setValue(1);
        fileOutputByteCounter.increment(rfs.getFileStatus(finalOutputFile).getLen());
        spillRec.writeToFile(finalIndexFile, conf);
        finalOut.close();
        for (int i = 0; i < numSpills; i++) {
            Path indexFilename = spillFileIndexPaths.get(i);
            Path spillFilename = spillFilePaths.get(i);
            rfs.delete(indexFilename, true);
            rfs.delete(spillFilename, true);
        }
        spillFileIndexPaths.clear();
        spillFilePaths.clear();
    } catch (InterruptedException ie) {
        if (cleanup) {
            cleanup();
        }
        Thread.currentThread().interrupt();
        throw new IOInterruptedException("Interrupted while closing Output", ie);
    }
}
Also used : Path(org.apache.hadoop.fs.Path) DiskSegment(org.apache.tez.runtime.library.common.sort.impl.TezMerger.DiskSegment) IOInterruptedException(org.apache.tez.runtime.library.api.IOInterruptedException) ArrayList(java.util.ArrayList) Segment(org.apache.tez.runtime.library.common.sort.impl.TezMerger.Segment) Event(org.apache.tez.runtime.api.Event) FSDataOutputStream(org.apache.hadoop.fs.FSDataOutputStream) Writer(org.apache.tez.runtime.library.common.sort.impl.IFile.Writer)
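
When the final merge is disabled, the flush method above picks which spills still need an event: all of them in the normal case, only the last one for pipelined shuffle. A minimal sketch of just that index arithmetic follows; SpillEventRangeSketch and its method names are hypothetical and no Tez types are involved.

```java
import java.util.ArrayList;
import java.util.List;

final class SpillEventRangeSketch {
    // Spill ids for which an event would still be generated at flush time.
    // With pipelined shuffle, events for earlier spills were already sent,
    // so only the last spill is reported.
    static List<Integer> spillIdsToReport(boolean pipelinedShuffle, int numSpills) {
        int startIndex = pipelinedShuffle ? numSpills - 1 : 0;
        List<Integer> ids = new ArrayList<>();
        for (int i = startIndex; i < numSpills; i++) {
            ids.add(i);
        }
        return ids;
    }

    // The event for the highest spill id is flagged as the final update.
    static boolean isLastEvent(int spillId, int numSpills) {
        return spillId == numSpills - 1;
    }

    public static void main(String[] args) {
        System.out.println(spillIdsToReport(false, 3)); // [0, 1, 2]
        System.out.println(spillIdsToReport(true, 3));  // [2]
        System.out.println(isLastEvent(2, 3));          // true
    }
}
```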

Example 8 with IOInterruptedException

use of org.apache.tez.runtime.library.api.IOInterruptedException in project tez by apache.

the class UnorderedPartitionedKVWriter method getNextAvailableBuffer.

private WrappedBuffer getNextAvailableBuffer() throws IOException {
    if (availableBuffers.peek() == null) {
        if (numInitializedBuffers < numBuffers) {
            buffers[numInitializedBuffers] = new WrappedBuffer(numPartitions, numInitializedBuffers == numBuffers - 1 ? lastBufferSize : sizePerBuffer);
            numInitializedBuffers++;
            return buffers[numInitializedBuffers - 1];
        } else {
            // All buffers initialized, and none available right now. Wait
            try {
                // Ensure that spills are triggered so that buffers can be released.
                mayBeSpill(true);
                return availableBuffers.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOInterruptedException("Interrupted while waiting for next buffer", e);
            }
        }
    } else {
        return availableBuffers.poll();
    }
}
Also used : IOInterruptedException(org.apache.tez.runtime.library.api.IOInterruptedException)
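
The buffer handling above combines lazy allocation with a blocking wait once every buffer exists. A minimal sketch of that idea, under simplifying assumptions: plain byte arrays replace WrappedBuffer, a plain IOException replaces IOInterruptedException, the spill trigger is left out, and a single thread calls next(). LazyBufferPoolSketch and its members are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class LazyBufferPoolSketch {
    private final byte[][] buffers;
    private final int sizePerBuffer;
    private final BlockingQueue<byte[]> available = new LinkedBlockingQueue<>();
    private int numInitialized = 0; // touched only by the single thread calling next()

    LazyBufferPoolSketch(int numBuffers, int sizePerBuffer) {
        this.buffers = new byte[numBuffers][];
        this.sizePerBuffer = sizePerBuffer;
    }

    // Returns a released buffer if one is free, else allocates a new one while
    // under the limit, else blocks until some buffer is released.
    byte[] next() throws IOException {
        byte[] free = available.poll();
        if (free != null) {
            return free;
        }
        if (numInitialized < buffers.length) {
            buffers[numInitialized] = new byte[sizePerBuffer];
            return buffers[numInitialized++];
        }
        try {
            return available.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag for callers
            throw new IOException("Interrupted while waiting for next buffer", e);
        }
    }

    void release(byte[] buffer) {
        available.add(buffer);
    }

    // Exercises reuse and the interrupted wait; returns true if both behave as described.
    static boolean demo() {
        try {
            LazyBufferPoolSketch pool = new LazyBufferPoolSketch(2, 16);
            byte[] a = pool.next();
            byte[] b = pool.next();
            pool.release(a);
            boolean reused = pool.next() == a && a != b && pool.numInitialized == 2;
            Thread.currentThread().interrupt(); // no buffer is free, so next() must wait
            try {
                pool.next();
                return false;
            } catch (IOException expected) {
                return reused && Thread.interrupted(); // flag was restored; clear it
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```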

Example 9 with IOInterruptedException

use of org.apache.tez.runtime.library.api.IOInterruptedException in project tez by apache.

the class TestUnorderedKVReader method testInterruptOnNext.

@Test(timeout = 5000)
public void testInterruptOnNext() throws IOException, InterruptedException {
    ShuffleManager shuffleManager = mock(ShuffleManager.class);
    // Simulate an interrupt while waiting for the next fetched input.
    doThrow(new InterruptedException()).when(shuffleManager).getNextInput();
    TezCounters counters = new TezCounters();
    TezCounter inputRecords = counters.findCounter(TaskCounter.INPUT_RECORDS_PROCESSED);
    UnorderedKVReader<Text, Text> reader = new UnorderedKVReader<Text, Text>(shuffleManager, defaultConf, null, false, -1, -1, inputRecords, mock(InputContext.class));
    try {
        reader.next();
        fail("No data available to reader. Should not be able to access any record");
    } catch (IOInterruptedException e) {
        // Expected exception. Any other should fail the test.
    }
}
Also used : IOInterruptedException(org.apache.tez.runtime.library.api.IOInterruptedException) InputContext(org.apache.tez.runtime.api.InputContext) ShuffleManager(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager) Text(org.apache.hadoop.io.Text) TezCounter(org.apache.tez.common.counters.TezCounter) TezCounters(org.apache.tez.common.counters.TezCounters) Test(org.junit.Test)

Example 10 with IOInterruptedException

use of org.apache.tez.runtime.library.api.IOInterruptedException in project tez by apache.

the class OrderedGroupedKVInput method getReader.

/**
 * Get a KVReader for the Input. This method will block until the input is
 * ready - i.e. the copy and merge stages are complete. Users can use the
 * isInputReady method to check if the input is ready, which gives an
 * indication of whether this method will block or not.
 *
 * NOTE: All values for the current K-V pair must be read prior to invoking
 * moveToNext. Once moveToNext() is called, the valueIterator from the
 * previous K-V pair will throw an Exception.
 *
 * @return a KVReader over the sorted input.
 * @throws IOInterruptedException if a blocking IO operation was interrupted
 */
@Override
public KeyValuesReader getReader() throws IOException, TezException {
    // Cannot synchronize the entire method since this is called from user code and can block.
    TezRawKeyValueIterator rawIterLocal;
    synchronized (this) {
        rawIterLocal = rawIter;
        if (getNumPhysicalInputs() == 0) {
            return new KeyValuesReader() {

                @Override
                public boolean next() throws IOException {
                    getContext().notifyProgress();
                    hasCompletedProcessing();
                    completedProcessing = true;
                    return false;
                }

                @Override
                public Object getCurrentKey() throws IOException {
                    throw new RuntimeException("No data available in Input");
                }

                @Override
                public Iterable<Object> getCurrentValues() throws IOException {
                    throw new RuntimeException("No data available in Input");
                }
            };
        }
    }
    if (rawIterLocal == null) {
        try {
            waitForInputReady();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOInterruptedException("Interrupted while waiting for input ready", e);
        }
    }
    @SuppressWarnings("rawtypes") ValuesIterator valuesIter = null;
    synchronized (this) {
        valuesIter = vIter;
    }
    return new OrderedGroupedKeyValuesReader(valuesIter, getContext());
}
Also used : IOInterruptedException(org.apache.tez.runtime.library.api.IOInterruptedException) KeyValuesReader(org.apache.tez.runtime.library.api.KeyValuesReader) ValuesIterator(org.apache.tez.runtime.library.common.ValuesIterator) TezRawKeyValueIterator(org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator)
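
The getReader method above reads shared state inside a short synchronized block and does its blocking wait outside the lock, so other callers are not stalled while it waits. A minimal sketch of that shape, assuming a CountDownLatch as the readiness signal (how waitForInputReady is implemented is not shown in the listing); ReadyGateSketch, publish and get are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

final class ReadyGateSketch {
    private final CountDownLatch ready = new CountDownLatch(1);
    private String source; // guarded by "this"

    // Called by the producing side once the data is ready.
    synchronized void publish(String value) {
        source = value;
        ready.countDown();
    }

    // Takes a snapshot under the lock, then waits outside it if nothing is there yet.
    String get() throws IOException {
        String local;
        synchronized (this) {
            local = source;
        }
        if (local == null) {
            try {
                ready.await(); // blocking call, deliberately not under the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for input ready", e);
            }
            synchronized (this) {
                local = source;
            }
        }
        return local;
    }

    // Starts a delayed producer and checks that get() blocks until it publishes.
    static boolean demo() {
        ReadyGateSketch gate = new ReadyGateSketch();
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.publish("data"); // publish even if the sleep was cut short
        });
        producer.start();
        try {
            return "data".equals(gate.get());
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```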

Aggregations

IOInterruptedException (org.apache.tez.runtime.library.api.IOInterruptedException) 12
FSDataOutputStream (org.apache.hadoop.fs.FSDataOutputStream) 2
Path (org.apache.hadoop.fs.Path) 2
InputContext (org.apache.tez.runtime.api.InputContext) 2
Writer (org.apache.tez.runtime.library.common.sort.impl.IFile.Writer) 2
Test (org.junit.Test) 2
IOException (java.io.IOException) 1
ArrayList (java.util.ArrayList) 1
Text (org.apache.hadoop.io.Text) 1
TezCounter (org.apache.tez.common.counters.TezCounter) 1
TezCounters (org.apache.tez.common.counters.TezCounters) 1
Event (org.apache.tez.runtime.api.Event) 1
KeyValueWriter (org.apache.tez.runtime.library.api.KeyValueWriter) 1
KeyValueWriterWithBasePath (org.apache.tez.runtime.library.api.KeyValueWriterWithBasePath) 1
KeyValuesReader (org.apache.tez.runtime.library.api.KeyValuesReader) 1
ValuesIterator (org.apache.tez.runtime.library.common.ValuesIterator) 1
ShuffleManager (org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager) 1
DiskSegment (org.apache.tez.runtime.library.common.sort.impl.TezMerger.DiskSegment) 1
Segment (org.apache.tez.runtime.library.common.sort.impl.TezMerger.Segment) 1
TezRawKeyValueIterator (org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator) 1