use of co.cask.cdap.api.dataset.lib.PartitionKey in project cdap by caskdata.
the class CoreSchedulerServiceTest method publishNotification.
private void publishNotification(TopicId topicId, ProgramId programId, String dataset)
  throws TopicNotFoundException, IOException, TransactionFailureException, AlreadyExistsException, BadRequestException {
  DatasetId datasetId = programId.getNamespaceId().dataset(dataset);
  PartitionKey partitionKey = PartitionKey.builder().addIntField("part1", 1).build();
  Notification notification = Notification.forPartitions(datasetId, ImmutableList.of(partitionKey));
  messagingService.publish(StoreRequestBuilder.of(topicId).addPayloads(GSON.toJson(notification)).build());
}
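For reference, a PartitionKey can combine fields of several types via its builder; a minimal sketch using only the builder and accessor methods that appear in the snippets on this page (the class name and field names "year", "month", "category" are hypothetical):

import co.cask.cdap.api.dataset.lib.PartitionKey;

public class PartitionKeyExample {
  public static void main(String[] args) {
    // illustrative only: the field names here are hypothetical
    PartitionKey key = PartitionKey.builder()
      .addIntField("year", 2017)
      .addLongField("month", 6L)
      .addStringField("category", "events")
      .build();
    // fields can be read back by name, as the consumer tests below do with getField("i")
    System.out.println(key.getField("year"));
  }
}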
use of co.cask.cdap.api.dataset.lib.PartitionKey in project cdap by caskdata.
the class PartitionConsumerTest method testPartitionConsumingWithFilterAndLimit.
@Test
public void testPartitionConsumingWithFilterAndLimit() throws Exception {
  final PartitionedFileSet dataset = dsFrameworkUtil.getInstance(pfsInstance);
  final TransactionAware txAwareDataset = (TransactionAware) dataset;
  final Set<PartitionKey> partitionKeys1 = new HashSet<>();
  for (int i = 0; i < 10; i++) {
    partitionKeys1.add(generateUniqueKey());
  }
  final Set<PartitionKey> partitionKeys2 = new HashSet<>();
  for (int i = 0; i < 15; i++) {
    partitionKeys2.add(generateUniqueKey());
  }
  final PartitionConsumer partitionConsumer = new ConcurrentPartitionConsumer(dataset, new InMemoryStatePersistor());
  // add each partition in its own transaction (consumption only happens at transaction borders)
  for (final PartitionKey partitionKey : partitionKeys1) {
    dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
      @Override
      public void apply() throws Exception {
        dataset.getPartitionOutput(partitionKey).addPartition();
      }
    });
  }
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // initial consumption results in the partitions corresponding to partitionKeys1 being consumed, because only
      // those partitions have been added to the dataset at this point
      List<Partition> consumedPartitions = new ArrayList<>();
      // with limit = 1, the returned iterator has size 1, even though there are more unconsumed partitions
      Iterables.addAll(consumedPartitions, partitionConsumer.consumePartitions(1).getPartitions());
      Assert.assertEquals(1, consumedPartitions.size());
      // ask for 5 more
      Iterables.addAll(consumedPartitions, partitionConsumer.consumePartitions(5).getPartitions());
      Assert.assertEquals(6, consumedPartitions.size());
      // ask for 5 more, but there are only 4 more unconsumed partitions (the size of partitionKeys1 is 10)
      Iterables.addAll(consumedPartitions, partitionConsumer.consumePartitions(5).getPartitions());
      Assert.assertEquals(10, consumedPartitions.size());
      Assert.assertEquals(partitionKeys1, toKeys(consumedPartitions));
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      for (PartitionKey partitionKey : partitionKeys2) {
        dataset.getPartitionOutput(partitionKey).addPartition();
      }
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // using the same PartitionConsumer (which remembers the PartitionConsumerState) to consume additional
      // partitions results in only the newly added partitions (corresponding to partitionKeys2) being returned
      Assert.assertEquals(partitionKeys2, toKeys(partitionConsumer.consumePartitions().getPartitions()));
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // consuming the partitions again, without adding any new partitions, returns an empty iterator
      Assert.assertTrue(partitionConsumer.consumePartitions().getPartitions().isEmpty());
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // creating a new PartitionConsumer resets the consumption state;
      // test the combination of filter and limit
      // the partitionFilter will match partitionKeys with 'i' in the range [1, 7), of which there are 6
      final PartitionFilter partitionFilter = PartitionFilter.builder().addRangeCondition("i", 1, 7).build();
      final Predicate<PartitionDetail> predicate = new Predicate<PartitionDetail>() {
        @Override
        public boolean apply(PartitionDetail partitionDetail) {
          return partitionFilter.match(partitionDetail.getPartitionKey());
        }
      };
      ConsumerConfiguration configuration =
        ConsumerConfiguration.builder().setPartitionPredicate(predicate).build();
      PartitionConsumer newPartitionConsumer =
        new ConcurrentPartitionConsumer(dataset, new InMemoryStatePersistor(), configuration);
      List<Partition> consumedPartitions = new ArrayList<>();
      // applying the filter (which narrows the set down to 6 elements) with a limit of 4 results in
      // 4 consumed partitions
      Iterables.addAll(consumedPartitions, newPartitionConsumer.consumePartitions(4).getPartitions());
      Assert.assertEquals(4, consumedPartitions.size());
      // asking for 3 more with the same filter returns only the remaining 2 elements that match the filter
      Iterables.addAll(consumedPartitions, newPartitionConsumer.consumePartitions(3).getPartitions());
      Assert.assertEquals(6, consumedPartitions.size());
      // assert that the returned partitions have keys whose 'i' values cover the range [1, 7)
      Set<Integer> expectedIFields = new HashSet<>();
      for (int i = 1; i < 7; i++) {
        expectedIFields.add(i);
      }
      Set<Integer> actualIFields = new HashSet<>();
      for (Partition consumedPartition : consumedPartitions) {
        actualIFields.add((Integer) consumedPartition.getPartitionKey().getField("i"));
      }
      Assert.assertEquals(expectedIFields, actualIFields);
    }
  });
}
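The tests on this page rely on two helpers, generateUniqueKey() and toKeys(...), whose bodies are not shown here. A hypothetical sketch of what they might look like, assuming the partitioning declares an int field "i", a long field "l", and a string field "s" (as the next test suggests); the real implementations in PartitionConsumerTest may differ:

import java.util.HashSet;
import java.util.Set;
import co.cask.cdap.api.dataset.lib.Partition;
import co.cask.cdap.api.dataset.lib.PartitionKey;

// hypothetical reconstructions; these would live inside PartitionConsumerTest
public class PartitionConsumerTestHelpers {
  private static int counter;

  // each call produces a distinct key; starting the 'i' values at 1 keeps the [1, 7)
  // filter in the test above matching exactly 6 of the first 10 keys
  static PartitionKey generateUniqueKey() {
    counter++;
    return PartitionKey.builder()
      .addIntField("i", counter)
      .addLongField("l", 17L)
      .addStringField("s", Integer.toString(counter))
      .build();
  }

  // collects the key of every consumed partition into a set for order-insensitive comparison
  static Set<PartitionKey> toKeys(Iterable<? extends Partition> partitions) {
    Set<PartitionKey> keys = new HashSet<>();
    for (Partition partition : partitions) {
      keys.add(partition.getPartitionKey());
    }
    return keys;
  }
}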
use of co.cask.cdap.api.dataset.lib.PartitionKey in project cdap by caskdata.
the class PartitionConsumerTest method testPartitionConsumingWithPartitionAcceptor.
@Test
public void testPartitionConsumingWithPartitionAcceptor() throws Exception {
  final PartitionedFileSet dataset = dsFrameworkUtil.getInstance(pfsInstance);
  final TransactionAware txAwareDataset = (TransactionAware) dataset;
  // 'i' will range over [0, 10); 's' will always be 'partitionKeys1'
  final Set<PartitionKey> partitionKeys1 = new HashSet<>();
  for (int i = 0; i < 10; i++) {
    PartitionKey key = PartitionKey.builder()
      .addIntField("i", i)
      .addLongField("l", 17L)
      .addStringField("s", "partitionKeys1")
      .build();
    partitionKeys1.add(key);
  }
  // 'i' will range over [0, 15); 's' will always be 'partitionKeys2'
  final Set<PartitionKey> partitionKeys2 = new HashSet<>();
  for (int i = 0; i < 15; i++) {
    PartitionKey key = PartitionKey.builder()
      .addIntField("i", i)
      .addLongField("l", 17L)
      .addStringField("s", "partitionKeys2")
      .build();
    partitionKeys2.add(key);
  }
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      for (final PartitionKey partitionKey : partitionKeys1) {
        dataset.getPartitionOutput(partitionKey).addPartition();
      }
      for (final PartitionKey partitionKey : partitionKeys2) {
        dataset.getPartitionOutput(partitionKey).addPartition();
      }
    }
  });
  final PartitionConsumer partitionConsumer = new ConcurrentPartitionConsumer(dataset, new InMemoryStatePersistor());
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      List<Partition> consumedPartitions = new ArrayList<>();
      // specify a PartitionAcceptor that accepts only partitions whose 's' field is equal to 'partitionKeys1',
      // so it will get all of the partitions in partitionKeys1
      Iterables.addAll(consumedPartitions,
                       partitionConsumer.consumePartitions(new CustomAcceptor("partitionKeys1")).getPartitions());
      // assert that we consumed all of the partitions represented by partitionKeys1
      Assert.assertEquals(partitionKeys1, toKeys(consumedPartitions));
      consumedPartitions.clear();
      // ask for partitions whose 's' field is equal to 'partitionKeys2', but stop iterating once the 'i' field
      // reaches 8; this will give us 8 of the partitions in partitionKeys2
      Iterables.addAll(consumedPartitions,
                       partitionConsumer.consumePartitions(new CustomAcceptor("partitionKeys2", 8)).getPartitions());
      Assert.assertEquals(8, consumedPartitions.size());
      // ask for the remainder of the partitions ('i' ranging over [8, 15)); then we will have all of partitionKeys2
      Iterables.addAll(consumedPartitions, partitionConsumer.consumePartitions().getPartitions());
      Assert.assertEquals(partitionKeys2, toKeys(consumedPartitions));
    }
  });
}
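The CustomAcceptor class used above is not shown on this page. A hypothetical reconstruction, assuming CDAP's PartitionAcceptor interface with its ACCEPT / SKIP / STOP return values; the actual class in PartitionConsumerTest may differ:

import co.cask.cdap.api.dataset.lib.PartitionDetail;
import co.cask.cdap.api.dataset.lib.PartitionKey;
import co.cask.cdap.api.dataset.lib.partitioned.PartitionAcceptor;

// accepts only partitions whose 's' field equals the given value, and stops
// iteration once the 'i' field reaches the optional limit
public class CustomAcceptor implements PartitionAcceptor {
  private final String sValue;
  private final Integer stopOnI;

  public CustomAcceptor(String sValue) {
    this(sValue, null);
  }

  public CustomAcceptor(String sValue, Integer stopOnI) {
    this.sValue = sValue;
    this.stopOnI = stopOnI;
  }

  @Override
  public Return accept(PartitionDetail partitionDetail) {
    PartitionKey key = partitionDetail.getPartitionKey();
    if (stopOnI != null && (Integer) key.getField("i") >= stopOnI) {
      // end this consumePartitions() call; remaining partitions stay in the working set
      return Return.STOP;
    }
    if (!sValue.equals(key.getField("s"))) {
      // leave non-matching partitions for a later consumer call
      return Return.SKIP;
    }
    return Return.ACCEPT;
  }
}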
use of co.cask.cdap.api.dataset.lib.PartitionKey in project cdap by caskdata.
the class PartitionConsumerTest method testConsumeAfterDelete.
@Test
public void testConsumeAfterDelete() throws Exception {
  final PartitionedFileSet dataset = dsFrameworkUtil.getInstance(pfsInstance);
  final TransactionAware txAwareDataset = (TransactionAware) dataset;
  final Set<PartitionKey> partitionKeys1 = new HashSet<>();
  for (int i = 0; i < 3; i++) {
    partitionKeys1.add(generateUniqueKey());
  }
  // we need to ensure that the consumer's max working set size is larger than the number of partitions we consume
  // initially, so that the additional partitions (which will be deleted afterwards) are brought into the working set
  ConsumerConfiguration consumerConfiguration = ConsumerConfiguration.builder().setMaxWorkingSetSize(100).build();
  final PartitionConsumer partitionConsumer =
    new ConcurrentPartitionConsumer(dataset, new InMemoryStatePersistor(), consumerConfiguration);
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      for (PartitionKey partitionKey : partitionKeys1) {
        dataset.getPartitionOutput(partitionKey).addPartition();
      }
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // add two more partitions; these will later be dropped without having been consumed
      for (int i = 0; i < 2; i++) {
        dataset.getPartitionOutput(generateUniqueKey()).addPartition();
      }
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // consume 3 of the 5 initial partitions
      Assert.assertEquals(partitionKeys1, toKeys(partitionConsumer.consumePartitions(3).getPartitions()));
    }
  });
  final Set<PartitionKey> partitionKeys2 = new HashSet<>();
  for (int i = 0; i < 5; i++) {
    partitionKeys2.add(generateUniqueKey());
  }
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // drop all existing partitions (2 of which have not been consumed)
      for (PartitionDetail partitionDetail : dataset.getPartitions(PartitionFilter.ALWAYS_MATCH)) {
        dataset.dropPartition(partitionDetail.getPartitionKey());
      }
      // add 5 new ones
      for (PartitionKey partitionKey : partitionKeys2) {
        dataset.getPartitionOutput(partitionKey).addPartition();
      }
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // the consumed partition keys should correspond to partitionKeys2, and should not include the dropped,
      // but unconsumed, partitions that were added before them
      Assert.assertEquals(partitionKeys2, toKeys(partitionConsumer.consumePartitions().getPartitions()));
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // consuming the partitions again, without adding any new partitions, returns an empty iterator
      Assert.assertTrue(partitionConsumer.consumePartitions().getPartitions().isEmpty());
    }
  });
  dsFrameworkUtil.newInMemoryTransactionExecutor(txAwareDataset).execute(new TransactionExecutor.Subroutine() {
    @Override
    public void apply() throws Exception {
      // creating a new PartitionConsumer resets the consumption state; consuming from it then returns an iterator
      // with all of the partition keys added after the deletions
      ConcurrentPartitionConsumer partitionConsumer2 =
        new ConcurrentPartitionConsumer(dataset, new InMemoryStatePersistor());
      Assert.assertEquals(partitionKeys2, toKeys(partitionConsumer2.consumePartitions().getPartitions()));
    }
  });
}
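The two ConsumerConfiguration options used on this page can also be combined on a single builder. A minimal sketch, reusing the dataset and predicate names from the tests above:

// illustrative only: combines the two builder options shown in the tests above
ConsumerConfiguration configuration = ConsumerConfiguration.builder()
  .setPartitionPredicate(predicate)   // only consume partitions matching the predicate
  .setMaxWorkingSetSize(100)          // cap how many partitions are pulled into the working set
  .build();
PartitionConsumer consumer =
  new ConcurrentPartitionConsumer(dataset, new InMemoryStatePersistor(), configuration);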
use of co.cask.cdap.api.dataset.lib.PartitionKey in project cdap by caskdata.
the class PartitionKeyCodec method deserialize.
@Override
public PartitionKey deserialize(JsonElement jsonElement, Type type,
                                JsonDeserializationContext jsonDeserializationContext) throws JsonParseException {
  JsonObject jsonObject = jsonElement.getAsJsonObject();
  PartitionKey.Builder builder = PartitionKey.builder();
  for (Map.Entry<String, JsonElement> entry : jsonObject.entrySet()) {
    JsonArray jsonArray = entry.getValue().getAsJsonArray();
    builder.addField(entry.getKey(), deserializeComparable(jsonArray, jsonDeserializationContext));
  }
  return builder.build();
}
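The deserializeComparable helper referenced here is not shown on this page. A hypothetical sketch, assuming each field value is serialized as a two-element array of [class name, value]; the actual helper may differ:

@SuppressWarnings("unchecked")
private Comparable deserializeComparable(JsonArray jsonArray, JsonDeserializationContext context)
  throws JsonParseException {
  try {
    // element 0 records the concrete Comparable class; element 1 holds its JSON value
    Class<? extends Comparable> clazz =
      (Class<? extends Comparable>) Class.forName(jsonArray.get(0).getAsString());
    return context.deserialize(jsonArray.get(1), clazz);
  } catch (ClassNotFoundException e) {
    throw new JsonParseException("Unable to deserialize field value", e);
  }
}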