Search in sources :

Example 1 with FSKeyedSortedMerger2

use of edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2 in project twister2 by DSC-SPIDAL.

the class DPartitionBatchFinalReceiver method initMergers.

/**
 * Initialize the mergers, this happens after each refresh
 */
private void initMergers(long maxBytesInMemory, long maxRecordsInMemory, long maxFileSize, int parallelIOAllowance) {
    for (Integer target : expIds.keySet()) {
        String shuffleDirectory = this.shuffleDirectories.get(partition.getLogicalPlan().getIndexOfTaskInNode(target) % this.shuffleDirectories.size());
        Shuffle sortedMerger;
        if (partition.getKeyType() == null) {
            sortedMerger = new FSMerger(maxBytesInMemory, maxRecordsInMemory, shuffleDirectory, DFWIOUtils.getOperationName(target, partition, refresh), partition.getDataType());
        } else {
            if (comparator != null) {
                sortedMerger = new FSKeyedSortedMerger2(maxBytesInMemory, maxFileSize, shuffleDirectory, DFWIOUtils.getOperationName(target, partition, refresh), partition.getKeyType(), partition.getDataType(), comparator, target, groupByKey, parallelIOAllowance);
            } else {
                sortedMerger = new FSKeyedMerger(maxBytesInMemory, maxRecordsInMemory, shuffleDirectory, DFWIOUtils.getOperationName(target, partition, refresh), partition.getKeyType(), partition.getDataType());
            }
        }
        sortedMergers.put(target, sortedMerger);
        finishedSources.put(target, new HashSet<>());
    }
}
Also used : FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) Shuffle(edu.iu.dsc.tws.comms.shuffle.Shuffle) FSKeyedMerger(edu.iu.dsc.tws.comms.shuffle.FSKeyedMerger) FSMerger(edu.iu.dsc.tws.comms.shuffle.FSMerger)

Example 2 with FSKeyedSortedMerger2

use of edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2 in project twister2 by DSC-SPIDAL.

the class SortJoinUtilsTest method innerJoinWithDiskBasedListComparision.

/**
 * This test compares the results of in memory and disk based inner joins.
 * Purpose is to verify the accuracy of disk based inner join
 */
@Test
public void innerJoinWithDiskBasedListComparision() {
    List<Tuple> left = new ArrayList<>();
    List<Tuple> right = new ArrayList<>();
    Random random = new Random();
    for (int i = 0; i < 100; i++) {
        left.add(Tuple.of(random.nextInt(10), random.nextInt()));
        right.add(Tuple.of(random.nextInt(10), random.nextInt()));
    }
    FSKeyedSortedMerger2 fsk1 = new FSKeyedSortedMerger2(10, 100, "/tmp", "op-1-" + UUID.randomUUID().toString(), MessageTypes.INTEGER, MessageTypes.INTEGER, (Comparator<Integer>) Integer::compare, 0, false, 1);
    for (Tuple tuple : left) {
        byte[] data = MessageTypes.INTEGER.getDataPacker().packToByteArray((Integer) tuple.getValue());
        fsk1.add(tuple.getKey(), data, data.length);
        fsk1.run();
    }
    FSKeyedSortedMerger2 fsk2 = new FSKeyedSortedMerger2(10, 100, "/tmp", "op-2-" + UUID.randomUUID().toString(), MessageTypes.INTEGER, MessageTypes.INTEGER, (Comparator<Integer>) Integer::compare, 0, false, 1);
    for (Tuple tuple : right) {
        byte[] data = MessageTypes.INTEGER.getDataPacker().packToByteArray((Integer) tuple.getValue());
        fsk2.add(tuple.getKey(), data, data.length);
        fsk2.run();
    }
    CommonThreadPool.init(Config.newBuilder().build());
    fsk1.switchToReading();
    fsk2.switchToReading();
    Iterator iterator = SortJoinUtils.joinWithCache((RestorableIterator) fsk1.readIterator(), (RestorableIterator) fsk2.readIterator(), new KeyComparatorWrapper((Comparator<Integer>) Integer::compare), CommunicationContext.JoinType.INNER, Config.newBuilder().build());
    List<Object> objects = SortJoinUtils.innerJoin(left, right, new KeyComparatorWrapper(Comparator.naturalOrder()));
    objects.sort(Comparator.comparingInt(o -> (Integer) ((JoinedTuple) o).getKey()));
    int i = 0;
    while (iterator.hasNext()) {
        JoinedTuple nextFromIt = (JoinedTuple) iterator.next();
        JoinedTuple nextFromList = (JoinedTuple) objects.get(i++);
        Assert.assertEquals(nextFromIt.getKey(), nextFromList.getKey());
    }
    Assert.assertEquals(i, objects.size());
}
Also used : CommunicationContext(edu.iu.dsc.tws.api.comms.CommunicationContext) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) RestorableIterator(edu.iu.dsc.tws.comms.shuffle.RestorableIterator) Iterator(java.util.Iterator) CommonThreadPool(edu.iu.dsc.tws.api.util.CommonThreadPool) Random(java.util.Random) Test(org.junit.Test) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Config(edu.iu.dsc.tws.api.config.Config) UUID(java.util.UUID) MessageTypes(edu.iu.dsc.tws.api.comms.messaging.types.MessageTypes) Logger(java.util.logging.Logger) ArrayList(java.util.ArrayList) List(java.util.List) Comparator(java.util.Comparator) Assert(org.junit.Assert) FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) ArrayList(java.util.ArrayList) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Comparator(java.util.Comparator) Random(java.util.Random) RestorableIterator(edu.iu.dsc.tws.comms.shuffle.RestorableIterator) Iterator(java.util.Iterator) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Test(org.junit.Test)

Example 3 with FSKeyedSortedMerger2

use of edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2 in project twister2 by DSC-SPIDAL.

the class SortJoinUtilsTest method leftOuterJoinComparision.

/**
 * This test compares the results of in memory and disk based left outer joins.
 * Purpose is to verify the accuracy of disk based left outer join
 */
@Test
public void leftOuterJoinComparision() {
    List<Tuple> left = new ArrayList<>();
    List<Tuple> right = new ArrayList<>();
    Random random = new Random();
    for (int i = 0; i < 100; i++) {
        left.add(Tuple.of(random.nextInt(10), random.nextInt()));
        right.add(Tuple.of(random.nextInt(10), random.nextInt()));
    }
    FSKeyedSortedMerger2 fsk1 = new FSKeyedSortedMerger2(10, 100, "/tmp", "op-1-" + UUID.randomUUID().toString(), MessageTypes.INTEGER, MessageTypes.INTEGER, (Comparator<Integer>) Integer::compare, 0, false, 1);
    for (Tuple tuple : left) {
        byte[] data = MessageTypes.INTEGER.getDataPacker().packToByteArray((Integer) tuple.getValue());
        fsk1.add(tuple.getKey(), data, data.length);
        fsk1.run();
    }
    FSKeyedSortedMerger2 fsk2 = new FSKeyedSortedMerger2(10, 100, "/tmp", "op-2-" + UUID.randomUUID().toString(), MessageTypes.INTEGER, MessageTypes.INTEGER, (Comparator<Integer>) Integer::compare, 0, false, 1);
    for (Tuple tuple : right) {
        byte[] data = MessageTypes.INTEGER.getDataPacker().packToByteArray((Integer) tuple.getValue());
        fsk2.add(tuple.getKey(), data, data.length);
        fsk2.run();
    }
    CommonThreadPool.init(Config.newBuilder().build());
    fsk1.switchToReading();
    fsk2.switchToReading();
    Iterator iterator = SortJoinUtils.leftOuterJoin((RestorableIterator) fsk1.readIterator(), (RestorableIterator) fsk2.readIterator(), new KeyComparatorWrapper((Comparator<Integer>) Integer::compare));
    List<Object> objects = SortJoinUtils.leftOuterJoin(left, right, new KeyComparatorWrapper(Comparator.naturalOrder()));
    objects.sort(Comparator.comparingInt(o -> (Integer) ((JoinedTuple) o).getKey()));
    int i = 0;
    while (iterator.hasNext()) {
        JoinedTuple nextFromIt = (JoinedTuple) iterator.next();
        JoinedTuple nextFromList = (JoinedTuple) objects.get(i++);
        Assert.assertEquals(nextFromIt.getKey(), nextFromList.getKey());
    }
    Assert.assertEquals(i, objects.size());
}
Also used : CommunicationContext(edu.iu.dsc.tws.api.comms.CommunicationContext) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) RestorableIterator(edu.iu.dsc.tws.comms.shuffle.RestorableIterator) Iterator(java.util.Iterator) CommonThreadPool(edu.iu.dsc.tws.api.util.CommonThreadPool) Random(java.util.Random) Test(org.junit.Test) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Config(edu.iu.dsc.tws.api.config.Config) UUID(java.util.UUID) MessageTypes(edu.iu.dsc.tws.api.comms.messaging.types.MessageTypes) Logger(java.util.logging.Logger) ArrayList(java.util.ArrayList) List(java.util.List) Comparator(java.util.Comparator) Assert(org.junit.Assert) FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) ArrayList(java.util.ArrayList) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Comparator(java.util.Comparator) Random(java.util.Random) RestorableIterator(edu.iu.dsc.tws.comms.shuffle.RestorableIterator) Iterator(java.util.Iterator) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Test(org.junit.Test)

Example 4 with FSKeyedSortedMerger2

use of edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2 in project twister2 by DSC-SPIDAL.

the class DKGatherBatchFinalReceiver method init.

@Override
public void init(Config cfg, DataFlowOperation op, Map<Integer, List<Integer>> expectedIds) {
    super.init(cfg, op, expectedIds);
    long maxBytesInMemory = CommunicationContext.getShuffleMaxBytesInMemory(cfg);
    long maxRecordsInMemory = CommunicationContext.getShuffleMaxRecordsInMemory(cfg);
    long maxBytesToFile = CommunicationContext.getShuffleFileSize(cfg);
    int parallelIOAllowance = CommunicationContext.getParallelIOAllowance(cfg);
    for (Integer target : expectedIds.keySet()) {
        Shuffle sortedMerger;
        if (sorted) {
            sortedMerger = new FSKeyedSortedMerger2(maxBytesInMemory, maxBytesToFile, shuffleDirectory, getOperationName(target), dataFlowOperation.getKeyType(), dataFlowOperation.getDataType(), comparator, target, this.groupByKey, parallelIOAllowance);
        } else {
            sortedMerger = new FSKeyedMerger(maxBytesInMemory, maxRecordsInMemory, shuffleDirectory, getOperationName(target), dataFlowOperation.getKeyType(), dataFlowOperation.getDataType());
        }
        sortedMergers.put(target, sortedMerger);
    }
    this.bulkReceiver.init(cfg, expectedIds.keySet());
}
Also used : FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) Shuffle(edu.iu.dsc.tws.comms.shuffle.Shuffle) FSKeyedMerger(edu.iu.dsc.tws.comms.shuffle.FSKeyedMerger)

Example 5 with FSKeyedSortedMerger2

use of edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2 in project twister2 by DSC-SPIDAL.

the class SortJoinUtilsTest method fullOuterJoinComparision.

/**
 * This test compares the results of in memory and disk based full outer joins.
 * Purpose is to verify the accuracy of disk based full outer join
 */
@Test
public void fullOuterJoinComparision() {
    List<Tuple> left = new ArrayList<>();
    List<Tuple> right = new ArrayList<>();
    Random random = new Random();
    for (int i = 0; i < 100; i++) {
        left.add(Tuple.of(random.nextInt(10), random.nextInt()));
        right.add(Tuple.of(random.nextInt(10), random.nextInt()));
    }
    FSKeyedSortedMerger2 fsk1 = new FSKeyedSortedMerger2(10, 100, "/tmp", "op-1-" + UUID.randomUUID().toString(), MessageTypes.INTEGER, MessageTypes.INTEGER, (Comparator<Integer>) Integer::compare, 0, false, 1);
    for (Tuple tuple : left) {
        byte[] data = MessageTypes.INTEGER.getDataPacker().packToByteArray((Integer) tuple.getValue());
        fsk1.add(tuple.getKey(), data, data.length);
        fsk1.run();
    }
    FSKeyedSortedMerger2 fsk2 = new FSKeyedSortedMerger2(10, 100, "/tmp", "op-2-" + UUID.randomUUID().toString(), MessageTypes.INTEGER, MessageTypes.INTEGER, (Comparator<Integer>) Integer::compare, 0, false, 1);
    for (Tuple tuple : right) {
        byte[] data = MessageTypes.INTEGER.getDataPacker().packToByteArray((Integer) tuple.getValue());
        fsk2.add(tuple.getKey(), data, data.length);
        fsk2.run();
    }
    CommonThreadPool.init(Config.newBuilder().build());
    fsk1.switchToReading();
    fsk2.switchToReading();
    Iterator iterator = SortJoinUtils.fullOuterJoin((RestorableIterator) fsk1.readIterator(), (RestorableIterator) fsk2.readIterator(), new KeyComparatorWrapper((Comparator<Integer>) Integer::compare));
    List<Object> objects = SortJoinUtils.fullOuterJoin(left, right, new KeyComparatorWrapper(Comparator.naturalOrder()));
    objects.sort(Comparator.comparingInt(o -> (Integer) ((JoinedTuple) o).getKey()));
    int i = 0;
    while (iterator.hasNext()) {
        JoinedTuple nextFromIt = (JoinedTuple) iterator.next();
        JoinedTuple nextFromList = (JoinedTuple) objects.get(i++);
        Assert.assertEquals(nextFromIt.getKey(), nextFromList.getKey());
    }
    Assert.assertEquals(i, objects.size());
}
Also used : CommunicationContext(edu.iu.dsc.tws.api.comms.CommunicationContext) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) RestorableIterator(edu.iu.dsc.tws.comms.shuffle.RestorableIterator) Iterator(java.util.Iterator) CommonThreadPool(edu.iu.dsc.tws.api.util.CommonThreadPool) Random(java.util.Random) Test(org.junit.Test) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Config(edu.iu.dsc.tws.api.config.Config) UUID(java.util.UUID) MessageTypes(edu.iu.dsc.tws.api.comms.messaging.types.MessageTypes) Logger(java.util.logging.Logger) ArrayList(java.util.ArrayList) List(java.util.List) Comparator(java.util.Comparator) Assert(org.junit.Assert) FSKeyedSortedMerger2(edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2) ArrayList(java.util.ArrayList) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Comparator(java.util.Comparator) Random(java.util.Random) RestorableIterator(edu.iu.dsc.tws.comms.shuffle.RestorableIterator) Iterator(java.util.Iterator) Tuple(edu.iu.dsc.tws.api.comms.structs.Tuple) JoinedTuple(edu.iu.dsc.tws.api.comms.structs.JoinedTuple) Test(org.junit.Test)

Aggregations

FSKeyedSortedMerger2 (edu.iu.dsc.tws.comms.shuffle.FSKeyedSortedMerger2)7 CommunicationContext (edu.iu.dsc.tws.api.comms.CommunicationContext)5 MessageTypes (edu.iu.dsc.tws.api.comms.messaging.types.MessageTypes)5 JoinedTuple (edu.iu.dsc.tws.api.comms.structs.JoinedTuple)5 Tuple (edu.iu.dsc.tws.api.comms.structs.Tuple)5 Config (edu.iu.dsc.tws.api.config.Config)5 CommonThreadPool (edu.iu.dsc.tws.api.util.CommonThreadPool)5 RestorableIterator (edu.iu.dsc.tws.comms.shuffle.RestorableIterator)5 ArrayList (java.util.ArrayList)5 Comparator (java.util.Comparator)5 Iterator (java.util.Iterator)5 List (java.util.List)5 Random (java.util.Random)5 UUID (java.util.UUID)5 Logger (java.util.logging.Logger)5 Assert (org.junit.Assert)5 Test (org.junit.Test)5 FSKeyedMerger (edu.iu.dsc.tws.comms.shuffle.FSKeyedMerger)2 Shuffle (edu.iu.dsc.tws.comms.shuffle.Shuffle)2 FSMerger (edu.iu.dsc.tws.comms.shuffle.FSMerger)1