Example 11 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class KafkaEightFirehoseFactory method connect.

@Override
public Firehose connect(final ByteBufferInputRowParser firehoseParser) throws IOException {
    Set<String> newDimExclus = Sets.union(firehoseParser.getParseSpec().getDimensionsSpec().getDimensionExclusions(), Sets.newHashSet("feed"));
    final ByteBufferInputRowParser theParser = firehoseParser.withParseSpec(firehoseParser.getParseSpec().withDimensionsSpec(firehoseParser.getParseSpec().getDimensionsSpec().withDimensionExclusions(newDimExclus)));
    final ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(consumerProps));
    final Map<String, List<KafkaStream<byte[], byte[]>>> streams = connector.createMessageStreams(ImmutableMap.of(feed, 1));
    final List<KafkaStream<byte[], byte[]>> streamList = streams.get(feed);
    if (streamList == null || streamList.size() != 1) {
        return null;
    }
    final KafkaStream<byte[], byte[]> stream = streamList.get(0);
    final ConsumerIterator<byte[], byte[]> iter = stream.iterator();
    return new Firehose() {

        @Override
        public boolean hasMore() {
            return iter.hasNext();
        }

        @Override
        public InputRow nextRow() {
            try {
                final byte[] message = iter.next().message();
                if (message == null) {
                    return null;
                }
                return theParser.parse(ByteBuffer.wrap(message));
            } catch (InvalidMessageException e) {
                /*
                 If the CRC error was introduced during wire transfer, skipping the message is not
                 the best way to handle it. It would probably be better to shut down the firehose
                 without committing and start it again.
                 */
                log.error(e, "Message failed its checksum and it is corrupt, will skip it");
                return null;
            }
        }

        @Override
        public Runnable commit() {
            return new Runnable() {

                @Override
                public void run() {
                    /*
                     This is actually not going to do exactly what we want, because it will be called
                     asynchronously after the persist is complete. So it's going to commit that it has
                     processed more than was actually persisted. This is unfortunate, but good enough
                     for now. Should revisit along with an upgrade of our Kafka version.
                     */
                    log.info("committing offsets");
                    connector.commitOffsets();
                }
            };
        }

        @Override
        public void close() throws IOException {
            connector.shutdown();
        }
    };
}
Also used : InvalidMessageException(kafka.message.InvalidMessageException) Firehose(io.druid.data.input.Firehose) ConsumerConnector(kafka.javaapi.consumer.ConsumerConnector) KafkaStream(kafka.consumer.KafkaStream) ByteBufferInputRowParser(io.druid.data.input.ByteBufferInputRowParser) ConsumerConfig(kafka.consumer.ConsumerConfig) List(java.util.List)
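The anonymous class above implements the full Firehose contract: hasMore() and nextRow() pull rows off the Kafka stream, commit() checkpoints consumer offsets, and close() shuts the consumer down. Below is a minimal sketch of how a caller might drive such a firehose; factory, index() and persistThenRun() are hypothetical names, and real indexing tasks batch rows and run the commit runnable only after a persist completes.

try (Firehose firehose = factory.connect(parser)) {
    while (firehose.hasMore()) {
        InputRow row = firehose.nextRow();
        if (row == null) {
            continue; // a corrupt message was skipped by the factory
        }
        index(row); // hypothetical: hand the row to the indexer
        // Grab the commit runnable now; run it only once the rows are durably persisted.
        Runnable commitRunnable = firehose.commit();
        persistThenRun(commitRunnable); // hypothetical persist-then-commit step
    }
}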

Example 12 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class ReplayableFirehoseFactoryTest method testReplayableFirehoseWithEvents.

@Test
public void testReplayableFirehoseWithEvents() throws Exception {
    final boolean[] hasMore = { true };
    expect(delegateFactory.connect(parser)).andReturn(delegateFirehose);
    expect(delegateFirehose.hasMore()).andAnswer(new IAnswer<Boolean>() {

        @Override
        public Boolean answer() throws Throwable {
            return hasMore[0];
        }
    }).anyTimes();
    expect(delegateFirehose.nextRow()).andReturn(testRows.get(0)).andReturn(testRows.get(1)).andAnswer(new IAnswer<InputRow>() {

        @Override
        public InputRow answer() throws Throwable {
            hasMore[0] = false;
            return testRows.get(2);
        }
    });
    delegateFirehose.close();
    replayAll();
    List<InputRow> rows = Lists.newArrayList();
    try (Firehose firehose = replayableFirehoseFactory.connect(parser)) {
        while (firehose.hasMore()) {
            rows.add(firehose.nextRow());
        }
    }
    Assert.assertEquals(testRows, rows);
    // now replay!
    rows.clear();
    try (Firehose firehose = replayableFirehoseFactory.connect(parser)) {
        while (firehose.hasMore()) {
            rows.add(firehose.nextRow());
        }
    }
    Assert.assertEquals(testRows, rows);
    verifyAll();
}
Also used : IAnswer(org.easymock.IAnswer) Firehose(io.druid.data.input.Firehose) MapBasedInputRow(io.druid.data.input.MapBasedInputRow) InputRow(io.druid.data.input.InputRow) Test(org.junit.Test)
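Since IAnswer declares a single answer() method, it is a functional interface, and on Java 8+ the same stubbing can be written more compactly with lambdas. A sketch equivalent to the setup above:

    expect(delegateFirehose.hasMore()).andAnswer(() -> hasMore[0]).anyTimes();
    expect(delegateFirehose.nextRow())
        .andReturn(testRows.get(0))
        .andReturn(testRows.get(1))
        .andAnswer(() -> {
            hasMore[0] = false; // flip the flag so the read loop terminates
            return testRows.get(2);
        });

The mutable one-element hasMore array lets the nextRow() stub flip the mock's own hasMore() answer, ending the read loop after the last row is returned.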

Example 13 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class ReplayableFirehoseFactoryTest method testReplayableFirehoseWithMultipleFiles.

@Test
public void testReplayableFirehoseWithMultipleFiles() throws Exception {
    replayableFirehoseFactory = new ReplayableFirehoseFactory(delegateFactory, false, 1, 3, mapper);
    final boolean[] hasMore = { true };
    final int multiplicationFactor = 500;
    final InputRow finalRow = new MapBasedInputRow(DateTime.now(), Lists.newArrayList("dim4", "dim5"), ImmutableMap.<String, Object>of("dim4", "val12", "dim5", "val20", "met1", 30));
    expect(delegateFactory.connect(parser)).andReturn(delegateFirehose);
    expect(delegateFirehose.hasMore()).andAnswer(new IAnswer<Boolean>() {

        @Override
        public Boolean answer() throws Throwable {
            return hasMore[0];
        }
    }).anyTimes();
    expect(delegateFirehose.nextRow()).andReturn(testRows.get(0)).times(multiplicationFactor).andReturn(testRows.get(1)).times(multiplicationFactor).andReturn(testRows.get(2)).times(multiplicationFactor).andAnswer(new IAnswer<InputRow>() {

        @Override
        public InputRow answer() throws Throwable {
            hasMore[0] = false;
            return finalRow;
        }
    });
    delegateFirehose.close();
    replayAll();
    List<InputRow> testRowsMultiplied = Lists.newArrayList();
    for (InputRow row : testRows) {
        for (int i = 0; i < multiplicationFactor; i++) {
            testRowsMultiplied.add(row);
        }
    }
    testRowsMultiplied.add(finalRow);
    List<InputRow> rows = Lists.newArrayList();
    try (Firehose firehose = replayableFirehoseFactory.connect(parser)) {
        while (firehose.hasMore()) {
            rows.add(firehose.nextRow());
        }
    }
    Assert.assertEquals(testRowsMultiplied, rows);
    // now replay!
    rows.clear();
    try (Firehose firehose = replayableFirehoseFactory.connect(parser)) {
        while (firehose.hasMore()) {
            rows.add(firehose.nextRow());
        }
    }
    Assert.assertEquals(testRowsMultiplied, rows);
    verifyAll();
}
Also used : IAnswer(org.easymock.IAnswer) Firehose(io.druid.data.input.Firehose) MapBasedInputRow(io.druid.data.input.MapBasedInputRow) InputRow(io.druid.data.input.InputRow) ReplayableFirehoseFactory(io.druid.segment.realtime.firehose.ReplayableFirehoseFactory) MapBasedInputRow(io.druid.data.input.MapBasedInputRow) Test(org.junit.Test)
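The constructor arguments in this test are worth spelling out. The labels below are an assumption inferred from the test names in this file (a max file size of 1 combined with 1,500 rows forces the cached rows to spill across multiple temp files, matching this test's name), not from documented API:

    replayableFirehoseFactory = new ReplayableFirehoseFactory(
        delegateFactory, // the real firehose, read once and cached locally
        false,           // presumed reportParseExceptions: swallow ParseExceptions instead of failing
        1,               // presumed max temp file size, tiny here to force multiple files
        3,               // presumed retry count when reading the delegate firehose
        mapper           // ObjectMapper used to serialize the cached rows
    );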

Example 14 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class ReplayableFirehoseFactoryTest method testReplayableFirehoseWithoutReportParseExceptions.

@Test
public void testReplayableFirehoseWithoutReportParseExceptions() throws Exception {
    final boolean[] hasMore = { true };
    replayableFirehoseFactory = new ReplayableFirehoseFactory(delegateFactory, false, 10000, 3, mapper);
    expect(delegateFactory.connect(parser)).andReturn(delegateFirehose);
    expect(delegateFirehose.hasMore()).andAnswer(new IAnswer<Boolean>() {

        @Override
        public Boolean answer() throws Throwable {
            return hasMore[0];
        }
    }).anyTimes();
    expect(delegateFirehose.nextRow()).andReturn(testRows.get(0)).andReturn(testRows.get(1)).andThrow(new ParseException("unparseable!")).andAnswer(new IAnswer<InputRow>() {

        @Override
        public InputRow answer() throws Throwable {
            hasMore[0] = false;
            return testRows.get(2);
        }
    });
    delegateFirehose.close();
    replayAll();
    List<InputRow> rows = Lists.newArrayList();
    try (Firehose firehose = replayableFirehoseFactory.connect(parser)) {
        while (firehose.hasMore()) {
            rows.add(firehose.nextRow());
        }
    }
    Assert.assertEquals(testRows, rows);
    verifyAll();
}
Also used : IAnswer(org.easymock.IAnswer) Firehose(io.druid.data.input.Firehose) MapBasedInputRow(io.druid.data.input.MapBasedInputRow) InputRow(io.druid.data.input.InputRow) ReplayableFirehoseFactory(io.druid.segment.realtime.firehose.ReplayableFirehoseFactory) ParseException(io.druid.java.util.common.parsers.ParseException) Test(org.junit.Test)
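This test passes reportParseExceptions = false, so the ParseException thrown by the delegate on the third read is swallowed and that row is simply skipped, which is why the collected rows still equal testRows. A hypothetical sketch of the presumed read loop inside the factory (the real implementation differs in detail):

    while (delegate.hasMore()) {
        try {
            cache(delegate.nextRow()); // hypothetical: persist the row to a temp file
        } catch (ParseException e) {
            // swallowed when reportParseExceptions is false: drop the row, keep reading
        }
    }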

Example 15 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class RocketMQFirehoseFactory method connect.

@Override
public Firehose connect(ByteBufferInputRowParser byteBufferInputRowParser) throws IOException, ParseException {
    Set<String> newDimExclus = Sets.union(byteBufferInputRowParser.getParseSpec().getDimensionsSpec().getDimensionExclusions(), Sets.newHashSet("feed"));
    final ByteBufferInputRowParser theParser = byteBufferInputRowParser.withParseSpec(byteBufferInputRowParser.getParseSpec().withDimensionsSpec(byteBufferInputRowParser.getParseSpec().getDimensionsSpec().withDimensionExclusions(newDimExclus)));
    /**
     * Topic-Queue mapping.
     */
    final ConcurrentHashMap<String, Set<MessageQueue>> topicQueueMap;
    /**
     * Default Pull-style client for RocketMQ.
     */
    final DefaultMQPullConsumer defaultMQPullConsumer;
    final DruidPullMessageService pullMessageService;
    messageQueueTreeSetMap.clear();
    windows.clear();
    try {
        defaultMQPullConsumer = new DefaultMQPullConsumer(this.consumerGroup);
        defaultMQPullConsumer.setMessageModel(MessageModel.CLUSTERING);
        topicQueueMap = new ConcurrentHashMap<>();
        pullMessageService = new DruidPullMessageService(defaultMQPullConsumer);
        for (String topic : feed) {
            Validators.checkTopic(topic);
            topicQueueMap.put(topic, defaultMQPullConsumer.fetchSubscribeMessageQueues(topic));
        }
        DruidMessageQueueListener druidMessageQueueListener = new DruidMessageQueueListener(Sets.newHashSet(feed), topicQueueMap, defaultMQPullConsumer);
        defaultMQPullConsumer.setMessageQueueListener(druidMessageQueueListener);
        defaultMQPullConsumer.start();
        pullMessageService.start();
    } catch (MQClientException e) {
        LOGGER.error("Failed to start DefaultMQPullConsumer", e);
        throw new IOException("Failed to start RocketMQ client", e);
    }
    return new Firehose() {

        @Override
        public boolean hasMore() {
            boolean hasMore = false;
            DruidPullRequest earliestPullRequest = null;
            for (Map.Entry<String, Set<MessageQueue>> entry : topicQueueMap.entrySet()) {
                for (MessageQueue messageQueue : entry.getValue()) {
                    if (JavaCompatUtils.keySet(messageQueueTreeSetMap).contains(messageQueue) && !messageQueueTreeSetMap.get(messageQueue).isEmpty()) {
                        hasMore = true;
                    } else {
                        try {
                            long offset = defaultMQPullConsumer.fetchConsumeOffset(messageQueue, false);
                            int batchSize = (null == pullBatchSize || pullBatchSize.isEmpty()) ? DEFAULT_PULL_BATCH_SIZE : Integer.parseInt(pullBatchSize);
                            DruidPullRequest newPullRequest = new DruidPullRequest(messageQueue, null, offset, batchSize, !hasMessagesPending());
                            // notify pull message service to pull messages from brokers.
                            pullMessageService.putRequest(newPullRequest);
                            // set the earliest pull in case we need to block.
                            if (null == earliestPullRequest) {
                                earliestPullRequest = newPullRequest;
                            }
                        } catch (MQClientException e) {
                            LOGGER.error("Failed to fetch consume offset for queue: {}", entry.getKey());
                        }
                    }
                }
            }
            // Block only when there are no locally pending messages.
            if (!hasMore && null != earliestPullRequest) {
                try {
                    earliestPullRequest.getCountDownLatch().await();
                    hasMore = true;
                } catch (InterruptedException e) {
                    LOGGER.error("CountDownLatch await got interrupted", e);
                }
            }
            return hasMore;
        }

        @Override
        public InputRow nextRow() {
            for (Map.Entry<MessageQueue, ConcurrentSkipListSet<MessageExt>> entry : messageQueueTreeSetMap.entrySet()) {
                if (!entry.getValue().isEmpty()) {
                    MessageExt message = entry.getValue().pollFirst();
                    InputRow inputRow = theParser.parse(ByteBuffer.wrap(message.getBody()));
                    if (!JavaCompatUtils.keySet(windows).contains(entry.getKey())) {
                        windows.put(entry.getKey(), new ConcurrentSkipListSet<Long>());
                    }
                    windows.get(entry.getKey()).add(message.getQueueOffset());
                    return inputRow;
                }
            }
            // should never happen.
            throw new RuntimeException("Unexpected Fatal Error! There should have been one row available.");
        }

        @Override
        public Runnable commit() {
            return new Runnable() {

                @Override
                public void run() {
                    OffsetStore offsetStore = defaultMQPullConsumer.getOffsetStore();
                    Set<MessageQueue> updated = new HashSet<>();
                    // calculate offsets according to consuming windows.
                    for (ConcurrentHashMap.Entry<MessageQueue, ConcurrentSkipListSet<Long>> entry : windows.entrySet()) {
                        while (!entry.getValue().isEmpty()) {
                            long offset = offsetStore.readOffset(entry.getKey(), ReadOffsetType.MEMORY_FIRST_THEN_STORE);
                            if (offset + 1 > entry.getValue().first()) {
                                entry.getValue().pollFirst();
                            } else if (offset + 1 == entry.getValue().first()) {
                                entry.getValue().pollFirst();
                                offsetStore.updateOffset(entry.getKey(), offset + 1, true);
                                updated.add(entry.getKey());
                            } else {
                                break;
                            }
                        }
                    }
                    offsetStore.persistAll(updated);
                }
            };
        }

        @Override
        public void close() throws IOException {
            defaultMQPullConsumer.shutdown();
            pullMessageService.shutdown(false);
        }
    };
}
Also used : HashSet(java.util.HashSet) Set(java.util.Set) ConcurrentSkipListSet(java.util.concurrent.ConcurrentSkipListSet) DefaultMQPullConsumer(com.alibaba.rocketmq.client.consumer.DefaultMQPullConsumer) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) MQClientException(com.alibaba.rocketmq.client.exception.MQClientException) HashSet(java.util.HashSet) ConcurrentSkipListSet(java.util.concurrent.ConcurrentSkipListSet) Firehose(io.druid.data.input.Firehose) IOException(java.io.IOException) ByteBufferInputRowParser(io.druid.data.input.ByteBufferInputRowParser) MessageExt(com.alibaba.rocketmq.common.message.MessageExt) MessageQueue(com.alibaba.rocketmq.common.message.MessageQueue) InputRow(io.druid.data.input.InputRow) Map(java.util.Map) ConcurrentHashMap(java.util.concurrent.ConcurrentHashMap) OffsetStore(com.alibaba.rocketmq.client.consumer.store.OffsetStore)
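The commit() implementation above advances each queue's stored offset only across a contiguous run of processed offsets; a gap means an earlier message is still in flight, so the commit point must not jump past it. A standalone sketch of the same rule, with illustrative names that are not part of the factory:

import java.util.concurrent.ConcurrentSkipListSet;

public class OffsetWindowDemo {

    // Mirrors the contiguous-offset rule from commit() above: the commit point
    // moves forward only while the processed offsets form an unbroken run.
    static long advance(ConcurrentSkipListSet<Long> window, long storedOffset) {
        while (!window.isEmpty()) {
            long first = window.first();
            if (storedOffset + 1 > first) {
                window.pollFirst(); // already at or behind the commit point: discard
            } else if (storedOffset + 1 == first) {
                window.pollFirst();
                storedOffset++; // contiguous: advance the commit point
            } else {
                break; // gap: an earlier offset has not been processed yet
            }
        }
        return storedOffset;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Long> window = new ConcurrentSkipListSet<>();
        window.add(5L);
        window.add(6L);
        window.add(8L);
        // Advances 4 -> 6; offset 8 stays pending until 7 is processed.
        System.out.println(advance(window, 4L)); // prints 6
    }
}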

Aggregations

Firehose (io.druid.data.input.Firehose): 18 usages
InputRow (io.druid.data.input.InputRow): 12
Test (org.junit.Test): 8
MapBasedInputRow (io.druid.data.input.MapBasedInputRow): 7
IOException (java.io.IOException): 7
IAnswer (org.easymock.IAnswer): 5
Map (java.util.Map): 4
ParseException (io.druid.java.util.common.parsers.ParseException): 3
HashBasedNumberedShardSpec (io.druid.timeline.partition.HashBasedNumberedShardSpec): 3
List (java.util.List): 3
DateTime (org.joda.time.DateTime): 3
Interval (org.joda.time.Interval): 3
ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper): 2
Optional (com.google.common.base.Optional): 2
ImmutableMap (com.google.common.collect.ImmutableMap): 2
ByteBufferInputRowParser (io.druid.data.input.ByteBufferInputRowParser): 2
FirehoseFactory (io.druid.data.input.FirehoseFactory): 2
RealtimeIOConfig (io.druid.segment.indexing.RealtimeIOConfig): 2
GranularitySpec (io.druid.segment.indexing.granularity.GranularitySpec): 2
ReplayableFirehoseFactory (io.druid.segment.realtime.firehose.ReplayableFirehoseFactory): 2