
Example 1 with ISpillableTable

Use of org.apache.hyracks.dataflow.std.group.ISpillableTable in project asterixdb by Apache.

From the class ExternalGroupWriteOperatorNodePushable, method initialize.

@Override
public void initialize() throws HyracksDataException {
    ExternalGroupState aggState = (ExternalGroupState) ctx.getStateObject(stateId);
    ISpillableTable table = aggState.getSpillableTable();
    RunFileWriter[] partitionRuns = aggState.getRuns();
    int[] numberOfTuples = aggState.getSpilledNumTuples();
    try {
        writer.open();
        // Level 0 is used at the build stage, so the write stage starts at level 1.
        doPass(table, partitionRuns, numberOfTuples, writer, 1);
    } catch (Exception e) {
        try {
            for (RunFileWriter run : generatedRuns) {
                run.erase();
            }
        } finally {
            writer.fail();
        }
        throw e;
    } finally {
        writer.close();
    }
}
Also used: ISpillableTable (org.apache.hyracks.dataflow.std.group.ISpillableTable), HyracksDataException (org.apache.hyracks.api.exceptions.HyracksDataException), RunFileWriter (org.apache.hyracks.dataflow.common.io.RunFileWriter)
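
The initialize method in Example 1 follows an open / fail-on-error / always-close discipline around the downstream writer, with temporary runs erased before the failure is signalled. The sketch below isolates that control flow so the call order can be observed. It is an illustration only: SketchWriter, RecordingWriter, Body and runWithLifecycle are hypothetical names invented here, not Hyracks types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a downstream writer; NOT the Hyracks IFrameWriter interface.
interface SketchWriter {
    void open() throws Exception;
    void fail();
    void close();
}

class WriterLifecycleSketch {
    // Records every call so the resulting order can be inspected.
    static class RecordingWriter implements SketchWriter {
        final List<String> calls = new ArrayList<>();
        public void open() { calls.add("open"); }
        public void fail() { calls.add("fail"); }
        public void close() { calls.add("close"); }
    }

    // Hypothetical callback type for the work done between open() and close().
    interface Body {
        void run() throws Exception;
    }

    static void runWithLifecycle(SketchWriter writer, Runnable cleanup, Body body) throws Exception {
        try {
            writer.open();
            body.run();
        } catch (Exception e) {
            try {
                cleanup.run();      // e.g. erase temporary run files
            } finally {
                writer.fail();      // signalled even if the cleanup itself throws
            }
            throw e;                // the original failure still propagates
        } finally {
            writer.close();         // runs on both the success and the failure path
        }
    }

    public static void main(String[] args) {
        RecordingWriter w = new RecordingWriter();
        try {
            runWithLifecycle(w, () -> w.calls.add("cleanup"),
                    () -> { throw new IllegalStateException("boom"); });
        } catch (Exception e) {
            w.calls.add("rethrown");
        }
        System.out.println(w.calls); // prints [open, cleanup, fail, close, rethrown]
    }
}
```

The nested try/finally inside the catch block is what guarantees that fail() is reached even when the cleanup step throws.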

Example 2 with ISpillableTable

Use of org.apache.hyracks.dataflow.std.group.ISpillableTable in project asterixdb by Apache.

From the class ExternalGroupWriteOperatorNodePushable, method doPass.

private void doPass(ISpillableTable table, RunFileWriter[] runs, int[] numOfTuples, IFrameWriter writer, int level) throws HyracksDataException {
    assert table.getNumPartitions() == runs.length;
    for (int i = 0; i < runs.length; i++) {
        if (runs[i] == null) {
            table.flushFrames(i, writer, AggregateType.FINAL);
        }
    }
    table.close();
    for (int i = 0; i < runs.length; i++) {
        if (runs[i] != null) {
            // Calculates the hash table size (# of unique hash values) based on the budget and a tuple size.
            int memoryBudgetInBytes = ctx.getInitialFrameSize() * frameLimit;
            int groupByColumnsCount = mergeGroupFields.length;
            int hashTableCardinality = ExternalGroupOperatorDescriptor.calculateGroupByTableCardinality(memoryBudgetInBytes, groupByColumnsCount, ctx.getInitialFrameSize());
            hashTableCardinality = (int) Math.min(hashTableCardinality, numOfTuples[i]);
            ISpillableTable partitionTable = spillableTableFactory.buildSpillableTable(ctx, hashTableCardinality, runs[i].getFileSize(), mergeGroupFields, groupByComparators, nmkComputer, mergeAggregatorFactory, partialAggRecordDesc, outRecordDesc, frameLimit, level);
            RunFileWriter[] runFileWriters = new RunFileWriter[partitionTable.getNumPartitions()];
            int[] sizeInTuplesNextLevel = buildGroup(runs[i].createDeleteOnCloseReader(), partitionTable, runFileWriters);
            for (int idFile = 0; idFile < runFileWriters.length; idFile++) {
                if (runFileWriters[idFile] != null) {
                    generatedRuns.add(runFileWriters[idFile]);
                }
            }
            if (LOGGER.isLoggable(Level.FINE)) {
                int numOfSpilledPart = 0;
                for (int x = 0; x < numOfTuples.length; x++) {
                    if (numOfTuples[x] > 0) {
                        numOfSpilledPart++;
                    }
                }
                LOGGER.fine("level " + level + ":" + "build with " + numOfTuples.length + " partitions" + ", spilled " + numOfSpilledPart + " partitions");
            }
            doPass(partitionTable, runFileWriters, sizeInTuplesNextLevel, writer, level + 1);
        }
    }
}
Also used: ISpillableTable (org.apache.hyracks.dataflow.std.group.ISpillableTable), RunFileWriter (org.apache.hyracks.dataflow.common.io.RunFileWriter)
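
The doPass method in Example 2 flushes the partitions that stayed in memory, then rebuilds a table for each spilled run and recurses one level deeper. The toy below sketches that recursion shape for a simple count-per-key aggregate, entirely in memory. Everything in it is an assumption made for illustration: the names RecursiveGroupSketch, Partial, partitionOf and countAll, the fixed fan-out, the level-dependent hash, the choice of the largest partition as spill victim, and the MAX_LEVEL guard are all hypothetical and say nothing about how the Hyracks ISpillableTable actually works.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RecursiveGroupSketch {
    static final int PARTITIONS = 4; // hypothetical fan-out per level
    static final int MAX_LEVEL = 8;  // hypothetical guard: stop spilling past this depth

    // A partially aggregated tuple: a key plus the count accumulated so far.
    static final class Partial {
        final String key;
        final long count;
        Partial(String key, long count) { this.key = key; this.count = count; }
    }

    // Mixes the level into the hash so a spilled run is split differently on each pass.
    static int partitionOf(String key, int level) {
        int h = key.hashCode() + level * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, PARTITIONS);
    }

    static void doPass(List<Partial> input, int maxGroups, int level, Map<String, Long> out) {
        List<Map<String, Long>> resident = new ArrayList<>();
        List<List<Partial>> runs = new ArrayList<>();
        for (int p = 0; p < PARTITIONS; p++) {
            resident.add(new HashMap<>());
            runs.add(null); // null means "this partition has not spilled"
        }
        int groupsInMemory = 0;
        for (Partial t : input) {
            int p = partitionOf(t.key, level);
            if (runs.get(p) != null) {
                runs.get(p).add(t); // partition already spilled: append to its run
                continue;
            }
            Map<String, Long> part = resident.get(p);
            if (!part.containsKey(t.key)) {
                groupsInMemory++;
            }
            part.merge(t.key, t.count, Long::sum);
            if (groupsInMemory > maxGroups && level < MAX_LEVEL) {
                // Over budget: spill the partition holding the most groups.
                int victim = 0;
                for (int q = 1; q < PARTITIONS; q++) {
                    if (resident.get(q).size() > resident.get(victim).size()) {
                        victim = q;
                    }
                }
                List<Partial> run = new ArrayList<>();
                for (Map.Entry<String, Long> e : resident.get(victim).entrySet()) {
                    run.add(new Partial(e.getKey(), e.getValue()));
                }
                groupsInMemory -= resident.get(victim).size();
                resident.get(victim).clear();
                runs.set(victim, run);
            }
        }
        // First emit the partitions that stayed in memory.
        for (int p = 0; p < PARTITIONS; p++) {
            if (runs.get(p) == null) {
                resident.get(p).forEach((k, v) -> out.merge(k, v, Long::sum));
            }
        }
        // Then re-aggregate every spilled run one level deeper.
        for (int p = 0; p < PARTITIONS; p++) {
            if (runs.get(p) != null) {
                doPass(runs.get(p), maxGroups, level + 1, out);
            }
        }
    }

    static Map<String, Long> countAll(List<String> keys, int maxGroups) {
        List<Partial> input = new ArrayList<>();
        for (String k : keys) {
            input.add(new Partial(k, 1L));
        }
        Map<String, Long> out = new TreeMap<>();
        doPass(input, maxGroups, 0, out);
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "b", "a", "d");
        System.out.println(countAll(keys, 2)); // prints {a=3, b=2, c=1, d=1}
    }
}
```

The result does not depend on which partitions happen to spill, because every tuple for a given key lands in the same partition at a given level and is either flushed from memory or carried whole into the next pass. The MAX_LEVEL guard is there because keys with identical hash codes can never be separated by re-hashing; past that depth the sketch simply aggregates in memory regardless of the budget.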

Example 3 with ISpillableTable

Use of org.apache.hyracks.dataflow.std.group.ISpillableTable in project asterixdb by Apache.

From the class ExternalGroupBuildOperatorNodePushable, method open.

@Override
public void open() throws HyracksDataException {
    state = new ExternalGroupState(ctx.getJobletContext().getJobId(), stateId);
    ISpillableTable table = spillableTableFactory.buildSpillableTable(ctx, tableSize, fileSize, keyFields, comparators, firstNormalizerComputer, aggregatorFactory, inRecordDescriptor, outRecordDescriptor, framesLimit, 0);
    RunFileWriter[] runFileWriters = new RunFileWriter[table.getNumPartitions()];
    this.externalGroupBy = new ExternalHashGroupBy(this, table, runFileWriters, inRecordDescriptor);
    state.setSpillableTable(table);
    state.setRuns(runFileWriters);
    state.setSpilledNumTuples(externalGroupBy.getSpilledNumTuples());
}
Also used: ISpillableTable (org.apache.hyracks.dataflow.std.group.ISpillableTable), RunFileWriter (org.apache.hyracks.dataflow.common.io.RunFileWriter)

Aggregations

RunFileWriter (org.apache.hyracks.dataflow.common.io.RunFileWriter): 3
ISpillableTable (org.apache.hyracks.dataflow.std.group.ISpillableTable): 3
HyracksDataException (org.apache.hyracks.api.exceptions.HyracksDataException): 1