Examples with IFrameWriter - org.apache.hyracks.api.comm.IFrameWriter

Example 6 with IFrameWriter

Use of org.apache.hyracks.api.comm.IFrameWriter in project asterixdb by Apache.

Class TestTypedAdapterFactory, method createAdapter.

@Override
public IDataSourceAdapter createAdapter(IHyracksTaskContext ctx, int partition) throws HyracksDataException {
    final String nodeId = ctx.getJobletContext().getServiceContext().getNodeId();
    final ITupleParserFactory tupleParserFactory = new ITupleParserFactory() {

        private static final long serialVersionUID = 1L;

        @Override
        public ITupleParser createTupleParser(IHyracksTaskContext ctx) throws HyracksDataException {
            ADMDataParser parser;
            ITupleForwarder forwarder;
            ArrayTupleBuilder tb;
            IApplicationContext appCtx = (IApplicationContext) ctx.getJobletContext().getServiceContext().getApplicationContext();
            ClusterPartition nodePartition = appCtx.getMetadataProperties().getNodePartitions().get(nodeId)[0];
            parser = new ADMDataParser(outputType, true);
            forwarder = DataflowUtils.getTupleForwarder(configuration, FeedUtils.getFeedLogManager(ctx, FeedUtils.splitsForAdapter(ExternalDataUtils.getDataverse(configuration), ExternalDataUtils.getFeedName(configuration), nodeId, nodePartition)));
            tb = new ArrayTupleBuilder(1);
            return new ITupleParser() {

                @Override
                public void parse(InputStream in, IFrameWriter writer) throws HyracksDataException {
                    try {
                        parser.setInputStream(in);
                        forwarder.initialize(ctx, writer);
                        while (true) {
                            tb.reset();
                            if (!parser.parse(tb.getDataOutput())) {
                                break;
                            }
                            tb.addFieldEndOffset();
                            forwarder.addTuple(tb);
                        }
                        forwarder.close();
                    } catch (Exception e) {
                        throw new HyracksDataException(e);
                    }
                }
            };
        }
    };
    try {
        return new TestTypedAdapter(tupleParserFactory, outputType, ctx, configuration, partition);
    } catch (IOException e) {
        throw new HyracksDataException(e);
    }
}

Also used: IFrameWriter (org.apache.hyracks.api.comm.IFrameWriter), ITupleParser (org.apache.hyracks.dataflow.std.file.ITupleParser), InputStream (java.io.InputStream), ArrayTupleBuilder (org.apache.hyracks.dataflow.common.comm.io.ArrayTupleBuilder), IApplicationContext (org.apache.asterix.common.api.IApplicationContext), IOException (java.io.IOException), AlgebricksException (org.apache.hyracks.algebricks.common.exceptions.AlgebricksException), HyracksDataException (org.apache.hyracks.api.exceptions.HyracksDataException), ADMDataParser (org.apache.asterix.external.parser.ADMDataParser), ITupleForwarder (org.apache.asterix.external.api.ITupleForwarder), ITupleParserFactory (org.apache.hyracks.dataflow.std.file.ITupleParserFactory), IHyracksTaskContext (org.apache.hyracks.api.context.IHyracksTaskContext), ClusterPartition (org.apache.asterix.common.cluster.ClusterPartition)

Example 7 with IFrameWriter

Use of org.apache.hyracks.api.comm.IFrameWriter in project asterixdb by Apache.

Class NestedPlansRunningAggregatorFactory, method assemblePipeline.

private IFrameWriter assemblePipeline(AlgebricksPipeline subplan, IFrameWriter writer, IHyracksTaskContext ctx) throws HyracksDataException {
    // plug the operators
    IFrameWriter start = writer;
    IPushRuntimeFactory[] runtimeFactories = subplan.getRuntimeFactories();
    RecordDescriptor[] recordDescriptors = subplan.getRecordDescriptors();
    for (int i = runtimeFactories.length - 1; i >= 0; i--) {
        IPushRuntime newRuntime = runtimeFactories[i].createPushRuntime(ctx);
        newRuntime.setFrameWriter(0, start, recordDescriptors[i]);
        if (i > 0) {
            newRuntime.setInputRecordDescriptor(0, recordDescriptors[i - 1]);
        } else {
            // the nts has the same input and output rec. desc.
            newRuntime.setInputRecordDescriptor(0, recordDescriptors[0]);
        }
        start = newRuntime;
    }
    return start;
}

Also used: IFrameWriter (org.apache.hyracks.api.comm.IFrameWriter), RecordDescriptor (org.apache.hyracks.api.dataflow.value.RecordDescriptor), IPushRuntime (org.apache.hyracks.algebricks.runtime.base.IPushRuntime), IPushRuntimeFactory (org.apache.hyracks.algebricks.runtime.base.IPushRuntimeFactory)
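
The back-to-front wiring used by assemblePipeline can be illustrated without any Hyracks dependency. The sketch below is only an approximation: SimpleWriter, Stage and PipelineSketch are hypothetical names invented for illustration, and frames are plain strings rather than ByteBuffers. It shows why the loop runs from the last stage to the first: each stage needs its downstream writer to exist before it can be wired, and the writer returned at the end is the head of the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a frame writer (not the Hyracks interface).
interface SimpleWriter {
    void nextFrame(String frame);
}

// Hypothetical pipeline stage: tags the frame with its name and forwards it.
final class Stage implements SimpleWriter {
    private final String name;
    private SimpleWriter downstream;

    Stage(String name) {
        this.name = name;
    }

    void setDownstream(SimpleWriter downstream) {
        this.downstream = downstream;
    }

    @Override
    public void nextFrame(String frame) {
        downstream.nextFrame(frame + ">" + name);
    }
}

final class PipelineSketch {
    // Wires the stages from last to first and returns the head of the chain.
    static SimpleWriter assemble(List<Stage> stages, SimpleWriter sink) {
        SimpleWriter start = sink;
        for (int i = stages.size() - 1; i >= 0; i--) {
            Stage stage = stages.get(i);
            stage.setDownstream(start); // the writer built so far becomes this stage's output
            start = stage;              // this stage is now the entry point
        }
        return start;
    }

    // Pushes one frame through a three-stage pipeline and returns what the sink saw.
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        List<Stage> stages = Arrays.asList(new Stage("a"), new Stage("b"), new Stage("c"));
        SimpleWriter head = assemble(stages, received::add);
        head.nextFrame("f");
        return received;
    }
}
```

Calling PipelineSketch.demo() pushes the frame through stages a, b and c in that order, even though they were wired in reverse.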

Example 8 with IFrameWriter

Use of org.apache.hyracks.api.comm.IFrameWriter in project asterixdb by Apache.

Class PipelineAssembler, method assemblePipeline.

public IFrameWriter assemblePipeline(IFrameWriter writer, IHyracksTaskContext ctx) throws HyracksDataException {
    // plug the operators
    // this.writer;
    IFrameWriter start = writer;
    for (int i = pipeline.getRuntimeFactories().length - 1; i >= 0; i--) {
        IPushRuntime newRuntime = pipeline.getRuntimeFactories()[i].createPushRuntime(ctx);
        if (i == pipeline.getRuntimeFactories().length - 1) {
            if (outputArity == 1) {
                newRuntime.setFrameWriter(0, start, pipelineOutputRecordDescriptor);
            }
        } else {
            newRuntime.setFrameWriter(0, start, pipeline.getRecordDescriptors()[i]);
        }
        if (i > 0) {
            newRuntime.setInputRecordDescriptor(0, pipeline.getRecordDescriptors()[i - 1]);
        } else if (inputArity > 0) {
            newRuntime.setInputRecordDescriptor(0, pipelineInputRecordDescriptor);
        }
        start = newRuntime;
    }
    return start;
}

Also used: IFrameWriter (org.apache.hyracks.api.comm.IFrameWriter), IPushRuntime (org.apache.hyracks.algebricks.runtime.base.IPushRuntime)

Example 9 with IFrameWriter

Use of org.apache.hyracks.api.comm.IFrameWriter in project asterixdb by Apache.

Class HashSpillableTableFactory, method buildSpillableTable.

@Override
public ISpillableTable buildSpillableTable(final IHyracksTaskContext ctx, int suggestTableSize, long inputDataBytesSize, final int[] keyFields, final IBinaryComparator[] comparators, final INormalizedKeyComputer firstKeyNormalizerFactory, IAggregatorDescriptorFactory aggregateFactory, RecordDescriptor inRecordDescriptor, RecordDescriptor outRecordDescriptor, final int framesLimit, final int seed) throws HyracksDataException {
    final int tableSize = suggestTableSize;
    // For the output, we need to have at least one frame.
    if (framesLimit < MIN_FRAME_LIMT) {
        throw new HyracksDataException("The given frame limit is too small to partition the data.");
    }
    final int[] intermediateResultKeys = new int[keyFields.length];
    for (int i = 0; i < keyFields.length; i++) {
        intermediateResultKeys[i] = i;
    }
    final FrameTuplePairComparator ftpcInputCompareToAggregate = new FrameTuplePairComparator(keyFields, intermediateResultKeys, comparators);
    final ITuplePartitionComputer tpc = new FieldHashPartitionComputerFamily(keyFields, hashFunctionFamilies).createPartitioner(seed);
    // For calculating the hash value of already aggregated tuples (not incoming tuples).
    // This computer is required to calculate the hash value of an aggregated tuple
    // while doing garbage collection on the hash table.
    final ITuplePartitionComputer tpcIntermediate = new FieldHashPartitionComputerFamily(intermediateResultKeys, hashFunctionFamilies).createPartitioner(seed);
    final IAggregatorDescriptor aggregator = aggregateFactory.createAggregator(ctx, inRecordDescriptor, outRecordDescriptor, keyFields, intermediateResultKeys, null);
    final AggregateState aggregateState = aggregator.createAggregateStates();
    final ArrayTupleBuilder stateTupleBuilder = new ArrayTupleBuilder(outRecordDescriptor.getFields().length);
    //TODO(jf) research on the optimized partition size
    long memoryBudget = Math.max(MIN_DATA_TABLE_FRAME_LIMT + MIN_HASH_TABLE_FRAME_LIMT, framesLimit - OUTPUT_FRAME_LIMT - MIN_HASH_TABLE_FRAME_LIMT);
    final int numPartitions = getNumOfPartitions(inputDataBytesSize / ctx.getInitialFrameSize(), memoryBudget);
    final int entriesPerPartition = (int) Math.ceil(1.0 * tableSize / numPartitions);
    if (LOGGER.isLoggable(Level.FINE)) {
        LOGGER.fine("created hashtable, table size:" + tableSize + " file size:" + inputDataBytesSize + "  #partitions:" + numPartitions);
    }
    final ArrayTupleBuilder outputTupleBuilder = new ArrayTupleBuilder(outRecordDescriptor.getFields().length);
    return new ISpillableTable() {

        private final TuplePointer pointer = new TuplePointer();

        private final BitSet spilledSet = new BitSet(numPartitions);

        // This frame pool will be shared by both data table and hash table.
        private final IDeallocatableFramePool framePool = new DeallocatableFramePool(ctx, framesLimit * ctx.getInitialFrameSize());

        // buffer manager for hash table
        private final ISimpleFrameBufferManager bufferManagerForHashTable = new FramePoolBackedFrameBufferManager(framePool);

        private final ISerializableTable hashTableForTuplePointer = new SerializableHashTable(tableSize, ctx, bufferManagerForHashTable);

        // buffer manager for data table
        final IPartitionedTupleBufferManager bufferManager = new VPartitionTupleBufferManager(PreferToSpillFullyOccupiedFramePolicy.createAtMostOneFrameForSpilledPartitionConstrain(spilledSet), numPartitions, framePool);

        final ITuplePointerAccessor bufferAccessor = bufferManager.getTuplePointerAccessor(outRecordDescriptor);

        private final PreferToSpillFullyOccupiedFramePolicy spillPolicy = new PreferToSpillFullyOccupiedFramePolicy(bufferManager, spilledSet);

        private final FrameTupleAppender outputAppender = new FrameTupleAppender(new VSizeFrame(ctx));

        @Override
        public void close() throws HyracksDataException {
            hashTableForTuplePointer.close();
            aggregator.close();
        }

        @Override
        public void clear(int partition) throws HyracksDataException {
            for (int p = getFirstEntryInHashTable(partition); p < getLastEntryInHashTable(partition); p++) {
                hashTableForTuplePointer.delete(p);
            }
            // Checks whether the garbage collection is required and conducts a garbage collection if so.
            if (hashTableForTuplePointer.isGarbageCollectionNeeded()) {
                int numberOfFramesReclaimed = hashTableForTuplePointer.collectGarbage(bufferAccessor, tpcIntermediate);
                if (LOGGER.isLoggable(Level.FINE)) {
                    LOGGER.fine("Garbage Collection on Hash table is done. Deallocated frames:" + numberOfFramesReclaimed);
                }
            }
            bufferManager.clearPartition(partition);
        }

        private int getPartition(int entryInHashTable) {
            return entryInHashTable / entriesPerPartition;
        }

        private int getFirstEntryInHashTable(int partition) {
            return partition * entriesPerPartition;
        }

        private int getLastEntryInHashTable(int partition) {
            return Math.min(tableSize, (partition + 1) * entriesPerPartition);
        }

        @Override
        public boolean insert(IFrameTupleAccessor accessor, int tIndex) throws HyracksDataException {
            int entryInHashTable = tpc.partition(accessor, tIndex, tableSize);
            for (int i = 0; i < hashTableForTuplePointer.getTupleCount(entryInHashTable); i++) {
                hashTableForTuplePointer.getTuplePointer(entryInHashTable, i, pointer);
                bufferAccessor.reset(pointer);
                int c = ftpcInputCompareToAggregate.compare(accessor, tIndex, bufferAccessor);
                if (c == 0) {
                    aggregateExistingTuple(accessor, tIndex, bufferAccessor, pointer.getTupleIndex());
                    return true;
                }
            }
            return insertNewAggregateEntry(entryInHashTable, accessor, tIndex);
        }

        /**
         * Inserts a new aggregate entry into the data table and hash table.
         * This insertion must be an atomic operation. We cannot have a partial success or failure.
         * So, if an insertion succeeds on the data table and the same insertion on the hash table fails, then
         * we need to revert the effect of data table insertion.
         */
        private boolean insertNewAggregateEntry(int entryInHashTable, IFrameTupleAccessor accessor, int tIndex) throws HyracksDataException {
            initStateTupleBuilder(accessor, tIndex);
            int pid = getPartition(entryInHashTable);
            // Insertion to the data table
            if (!bufferManager.insertTuple(pid, stateTupleBuilder.getByteArray(), stateTupleBuilder.getFieldEndOffsets(), 0, stateTupleBuilder.getSize(), pointer)) {
                return false;
            }
            // Insertion to the hash table
            if (!hashTableForTuplePointer.insert(entryInHashTable, pointer)) {
                // To preserve the atomicity of this method, we need to undo the effect
                // of the above bufferManager.insertTuple() call since the given insertion has failed.
                bufferManager.cancelInsertTuple(pid);
                return false;
            }
            return true;
        }

        private void initStateTupleBuilder(IFrameTupleAccessor accessor, int tIndex) throws HyracksDataException {
            stateTupleBuilder.reset();
            for (int k = 0; k < keyFields.length; k++) {
                stateTupleBuilder.addField(accessor, tIndex, keyFields[k]);
            }
            aggregator.init(stateTupleBuilder, accessor, tIndex, aggregateState);
        }

        private void aggregateExistingTuple(IFrameTupleAccessor accessor, int tIndex, ITuplePointerAccessor bufferAccessor, int tupleIndex) throws HyracksDataException {
            aggregator.aggregate(accessor, tIndex, bufferAccessor, tupleIndex, aggregateState);
        }

        @Override
        public int flushFrames(int partition, IFrameWriter writer, AggregateType type) throws HyracksDataException {
            int count = 0;
            for (int hashEntryPid = getFirstEntryInHashTable(partition); hashEntryPid < getLastEntryInHashTable(partition); hashEntryPid++) {
                count += hashTableForTuplePointer.getTupleCount(hashEntryPid);
                for (int tid = 0; tid < hashTableForTuplePointer.getTupleCount(hashEntryPid); tid++) {
                    hashTableForTuplePointer.getTuplePointer(hashEntryPid, tid, pointer);
                    bufferAccessor.reset(pointer);
                    outputTupleBuilder.reset();
                    for (int k = 0; k < intermediateResultKeys.length; k++) {
                        outputTupleBuilder.addField(bufferAccessor.getBuffer().array(), bufferAccessor.getAbsFieldStartOffset(intermediateResultKeys[k]), bufferAccessor.getFieldLength(intermediateResultKeys[k]));
                    }
                    boolean hasOutput = false;
                    switch(type) {
                        case PARTIAL:
                            hasOutput = aggregator.outputPartialResult(outputTupleBuilder, bufferAccessor, pointer.getTupleIndex(), aggregateState);
                            break;
                        case FINAL:
                            hasOutput = aggregator.outputFinalResult(outputTupleBuilder, bufferAccessor, pointer.getTupleIndex(), aggregateState);
                            break;
                    }
                    if (hasOutput && !outputAppender.appendSkipEmptyField(outputTupleBuilder.getFieldEndOffsets(), outputTupleBuilder.getByteArray(), 0, outputTupleBuilder.getSize())) {
                        outputAppender.write(writer, true);
                        if (!outputAppender.appendSkipEmptyField(outputTupleBuilder.getFieldEndOffsets(), outputTupleBuilder.getByteArray(), 0, outputTupleBuilder.getSize())) {
                            throw new HyracksDataException("The output item is too large to be fit into a frame.");
                        }
                    }
                }
            }
            outputAppender.write(writer, true);
            spilledSet.set(partition);
            return count;
        }

        @Override
        public int getNumPartitions() {
            return bufferManager.getNumPartitions();
        }

        @Override
        public int findVictimPartition(IFrameTupleAccessor accessor, int tIndex) throws HyracksDataException {
            int entryInHashTable = tpc.partition(accessor, tIndex, tableSize);
            int partition = getPartition(entryInHashTable);
            return spillPolicy.selectVictimPartition(partition);
        }
    };
}

Also used: IFrameWriter (org.apache.hyracks.api.comm.IFrameWriter), FramePoolBackedFrameBufferManager (org.apache.hyracks.dataflow.std.buffermanager.FramePoolBackedFrameBufferManager), VPartitionTupleBufferManager (org.apache.hyracks.dataflow.std.buffermanager.VPartitionTupleBufferManager), ITuplePointerAccessor (org.apache.hyracks.dataflow.std.buffermanager.ITuplePointerAccessor), TuplePointer (org.apache.hyracks.dataflow.std.structures.TuplePointer), ISimpleFrameBufferManager (org.apache.hyracks.dataflow.std.buffermanager.ISimpleFrameBufferManager), ITuplePartitionComputer (org.apache.hyracks.api.dataflow.value.ITuplePartitionComputer), IDeallocatableFramePool (org.apache.hyracks.dataflow.std.buffermanager.IDeallocatableFramePool), FrameTupleAppender (org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppender), IFrameTupleAccessor (org.apache.hyracks.api.comm.IFrameTupleAccessor), SerializableHashTable (org.apache.hyracks.dataflow.std.structures.SerializableHashTable), FrameTuplePairComparator (org.apache.hyracks.dataflow.std.util.FrameTuplePairComparator), BitSet (java.util.BitSet), FieldHashPartitionComputerFamily (org.apache.hyracks.dataflow.common.data.partition.FieldHashPartitionComputerFamily), ArrayTupleBuilder (org.apache.hyracks.dataflow.common.comm.io.ArrayTupleBuilder), HyracksDataException (org.apache.hyracks.api.exceptions.HyracksDataException), VSizeFrame (org.apache.hyracks.api.comm.VSizeFrame), DeallocatableFramePool (org.apache.hyracks.dataflow.std.buffermanager.DeallocatableFramePool), PreferToSpillFullyOccupiedFramePolicy (org.apache.hyracks.dataflow.std.buffermanager.PreferToSpillFullyOccupiedFramePolicy), ISerializableTable (org.apache.hyracks.dataflow.std.structures.ISerializableTable), IPartitionedTupleBufferManager (org.apache.hyracks.dataflow.std.buffermanager.IPartitionedTupleBufferManager)
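
The mapping between hash-table entries and partitions in buildSpillableTable is plain integer arithmetic, and a small worked example may make the boundary cases easier to see. The class below is a standalone sketch, not part of Hyracks: PartitionMath and its method names are hypothetical, and only the formulas are copied from getPartition, getFirstEntryInHashTable and getLastEntryInHashTable above.

```java
// Hypothetical helper that reproduces the entry-to-partition arithmetic of the example.
final class PartitionMath {
    final int tableSize;
    final int entriesPerPartition;

    PartitionMath(int tableSize, int numPartitions) {
        this.tableSize = tableSize;
        // Round up so that numPartitions ranges of this width cover every table entry.
        this.entriesPerPartition = (int) Math.ceil(1.0 * tableSize / numPartitions);
    }

    // Partition that owns the given hash-table entry.
    int partitionOf(int entryInHashTable) {
        return entryInHashTable / entriesPerPartition;
    }

    // First hash-table entry (inclusive) that belongs to the partition.
    int firstEntry(int partition) {
        return partition * entriesPerPartition;
    }

    // End of the partition's entry range (exclusive), clamped to the table size.
    int lastEntryExclusive(int partition) {
        return Math.min(tableSize, (partition + 1) * entriesPerPartition);
    }
}
```

With tableSize = 10 and numPartitions = 4, entriesPerPartition is 3, so the partitions cover the entry ranges [0, 3), [3, 6), [6, 9) and [9, 10). The Math.min clamp is what keeps the last partition from running past the end of the table.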

Example 10 with IFrameWriter

Use of org.apache.hyracks.api.comm.IFrameWriter in project asterixdb by Apache.

Class ResultWriterOperatorDescriptor, method createPushRuntime.

@Override
public IOperatorNodePushable createPushRuntime(final IHyracksTaskContext ctx, IRecordDescriptorProvider recordDescProvider, final int partition, final int nPartitions) throws HyracksDataException {
    final IDatasetPartitionManager dpm = ctx.getDatasetPartitionManager();
    final IFrame frame = new VSizeFrame(ctx);
    final FrameOutputStream frameOutputStream = new FrameOutputStream(ctx.getInitialFrameSize());
    frameOutputStream.reset(frame, true);
    PrintStream printStream = new PrintStream(frameOutputStream);
    final RecordDescriptor outRecordDesc = recordDescProvider.getInputRecordDescriptor(getActivityId(), 0);
    final IResultSerializer resultSerializer = resultSerializerFactory.createResultSerializer(outRecordDesc, printStream);
    final FrameTupleAccessor frameTupleAccessor = new FrameTupleAccessor(outRecordDesc);
    return new AbstractUnaryInputSinkOperatorNodePushable() {

        private IFrameWriter datasetPartitionWriter;

        private boolean failed = false;

        @Override
        public void open() throws HyracksDataException {
            try {
                datasetPartitionWriter = dpm.createDatasetPartitionWriter(ctx, rsId, ordered, asyncMode, partition, nPartitions);
                datasetPartitionWriter.open();
                resultSerializer.init();
            } catch (HyracksException e) {
                throw HyracksDataException.create(e);
            }
        }

        @Override
        public void nextFrame(ByteBuffer buffer) throws HyracksDataException {
            frameTupleAccessor.reset(buffer);
            for (int tIndex = 0; tIndex < frameTupleAccessor.getTupleCount(); tIndex++) {
                resultSerializer.appendTuple(frameTupleAccessor, tIndex);
                if (!frameOutputStream.appendTuple()) {
                    frameOutputStream.flush(datasetPartitionWriter);
                    resultSerializer.appendTuple(frameTupleAccessor, tIndex);
                    frameOutputStream.appendTuple();
                }
            }
        }

        @Override
        public void fail() throws HyracksDataException {
            failed = true;
            datasetPartitionWriter.fail();
        }

        @Override
        public void close() throws HyracksDataException {
            try {
                if (!failed && frameOutputStream.getTupleCount() > 0) {
                    frameOutputStream.flush(datasetPartitionWriter);
                }
            } catch (Exception e) {
                datasetPartitionWriter.fail();
                throw e;
            } finally {
                datasetPartitionWriter.close();
            }
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            sb.append("{ ");
            sb.append("\"rsId\": \"").append(rsId).append("\", ");
            sb.append("\"ordered\": ").append(ordered).append(", ");
            sb.append("\"asyncMode\": ").append(asyncMode).append(" }");
            return sb.toString();
        }
    };
}

Also used: PrintStream (java.io.PrintStream), IFrameWriter (org.apache.hyracks.api.comm.IFrameWriter), IFrame (org.apache.hyracks.api.comm.IFrame), IResultSerializer (org.apache.hyracks.api.dataflow.value.IResultSerializer), RecordDescriptor (org.apache.hyracks.api.dataflow.value.RecordDescriptor), HyracksException (org.apache.hyracks.api.exceptions.HyracksException), ByteBuffer (java.nio.ByteBuffer), VSizeFrame (org.apache.hyracks.api.comm.VSizeFrame), HyracksDataException (org.apache.hyracks.api.exceptions.HyracksDataException), IOException (java.io.IOException), IDatasetPartitionManager (org.apache.hyracks.api.dataset.IDatasetPartitionManager), AbstractUnaryInputSinkOperatorNodePushable (org.apache.hyracks.dataflow.std.base.AbstractUnaryInputSinkOperatorNodePushable), FrameOutputStream (org.apache.hyracks.dataflow.common.comm.io.FrameOutputStream), FrameTupleAccessor (org.apache.hyracks.dataflow.common.comm.io.FrameTupleAccessor)
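
The open / nextFrame / fail / close sequence in the operator above follows a pattern that can be shown in isolation: buffer input, flush what is left on close unless a failure was signalled, and always close the downstream writer. The following is a minimal sketch under stated assumptions. MiniWriter, RecordingWriter and BufferingSink are hypothetical types made up for this illustration, frames are strings, and exceptions are unchecked. It is not the Hyracks IFrameWriter contract itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified writer lifecycle (string frames, unchecked exceptions).
interface MiniWriter {
    void open();
    void nextFrame(String frame);
    void fail();
    void close();
}

// Hypothetical downstream writer that records every call it receives.
final class RecordingWriter implements MiniWriter {
    final List<String> events = new ArrayList<>();

    @Override public void open() { events.add("open"); }
    @Override public void nextFrame(String frame) { events.add("frame:" + frame); }
    @Override public void fail() { events.add("fail"); }
    @Override public void close() { events.add("close"); }
}

// Hypothetical sink that buffers input and flushes it when closed.
final class BufferingSink implements MiniWriter {
    private final MiniWriter downstream;
    private final StringBuilder pending = new StringBuilder();
    private boolean failed = false;

    BufferingSink(MiniWriter downstream) {
        this.downstream = downstream;
    }

    @Override
    public void open() {
        downstream.open();
    }

    @Override
    public void nextFrame(String frame) {
        pending.append(frame); // buffer only; nothing is forwarded yet
    }

    @Override
    public void fail() {
        failed = true;         // remember the failure so close() skips the flush
        downstream.fail();
    }

    @Override
    public void close() {
        try {
            if (!failed && pending.length() > 0) {
                downstream.nextFrame(pending.toString());
            }
        } catch (RuntimeException e) {
            downstream.fail(); // a failed flush is reported before closing
            throw e;
        } finally {
            downstream.close(); // the downstream writer is closed on every path
        }
    }
}
```

On the normal path the recording writer sees open, one flushed frame, then close. After fail() it sees open, fail, close, and the buffered data is dropped.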

Aggregations

IFrameWriter (org.apache.hyracks.api.comm.IFrameWriter): 32
HyracksDataException (org.apache.hyracks.api.exceptions.HyracksDataException): 15
RecordDescriptor (org.apache.hyracks.api.dataflow.value.RecordDescriptor): 13
VSizeFrame (org.apache.hyracks.api.comm.VSizeFrame): 8
ByteBuffer (java.nio.ByteBuffer): 6
ArrayTupleBuilder (org.apache.hyracks.dataflow.common.comm.io.ArrayTupleBuilder): 6
FrameTupleAppender (org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppender): 6
IOException (java.io.IOException): 5
ArrayList (java.util.ArrayList): 5
FrameTupleAccessor (org.apache.hyracks.dataflow.common.comm.io.FrameTupleAccessor): 5
IHyracksTaskContext (org.apache.hyracks.api.context.IHyracksTaskContext): 4
HyracksException (org.apache.hyracks.api.exceptions.HyracksException): 4
DataOutput (java.io.DataOutput): 3
InputStream (java.io.InputStream): 3
IPushRuntime (org.apache.hyracks.algebricks.runtime.base.IPushRuntime): 3
IFrame (org.apache.hyracks.api.comm.IFrame): 3
AbstractOperatorNodePushable (org.apache.hyracks.dataflow.std.base.AbstractOperatorNodePushable): 3
InputStreamReader (java.io.InputStreamReader): 2
Semaphore (java.util.concurrent.Semaphore): 2
ExternalFile (org.apache.asterix.external.indexing.ExternalFile): 2