Examples with Firehose - io.druid.data.input.Firehose

Example 6 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class ReplayableFirehoseFactoryTest method testReplayableFirehoseNoEvents.

@Test
public void testReplayableFirehoseNoEvents() throws Exception {
    expect(delegateFactory.connect(parser)).andReturn(delegateFirehose);
    expect(delegateFirehose.hasMore()).andReturn(false);
    delegateFirehose.close();
    replayAll();
    try (Firehose firehose = replayableFirehoseFactory.connect(parser)) {
        Assert.assertFalse(firehose.hasMore());
    }
    verifyAll();
}

Also used : Firehose(io.druid.data.input.Firehose) Test(org.junit.Test)

Example 7 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class IrcFirehoseFactory method connect.

@Override
public Firehose connect(final IrcInputRowParser firehoseParser) throws IOException {
    final IRCApi irc = new IRCApiImpl(false);
    final LinkedBlockingQueue<Pair<DateTime, ChannelPrivMsg>> queue = new LinkedBlockingQueue<Pair<DateTime, ChannelPrivMsg>>();
    irc.addListener(new VariousMessageListenerAdapter() {

        @Override
        public void onChannelMessage(ChannelPrivMsg aMsg) {
            try {
                queue.put(Pair.of(DateTime.now(), aMsg));
            } catch (InterruptedException e) {
                throw new RuntimeException("interrupted adding message to queue", e);
            }
        }
    });
    log.info("connecting to irc server [%s]", host);
    irc.connect(new IServerParameters() {

        @Override
        public String getNickname() {
            return nick;
        }

        @Override
        public List<String> getAlternativeNicknames() {
            return Lists.newArrayList(nick + UUID.randomUUID(), nick + UUID.randomUUID(), nick + UUID.randomUUID());
        }

        @Override
        public String getIdent() {
            return "druid";
        }

        @Override
        public String getRealname() {
            return nick;
        }

        @Override
        public IRCServer getServer() {
            return new IRCServer(host, false);
        }
    }, new Callback<IIRCState>() {

        @Override
        public void onSuccess(IIRCState aObject) {
            log.info("irc connection to server [%s] established", host);
            for (String chan : channels) {
                log.info("Joining channel %s", chan);
                irc.joinChannel(chan);
            }
        }

        @Override
        public void onFailure(Exception e) {
            log.error(e, "Unable to connect to irc server [%s]", host);
            throw new RuntimeException("Unable to connect to server", e);
        }
    });
    closed = false;
    return new Firehose() {

        InputRow nextRow = null;

        @Override
        public boolean hasMore() {
            try {
                while (true) {
                    Pair<DateTime, ChannelPrivMsg> nextMsg = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (closed) {
                        return false;
                    }
                    if (nextMsg == null) {
                        continue;
                    }
                    try {
                        nextRow = firehoseParser.parse(nextMsg);
                        if (nextRow != null) {
                            return true;
                        }
                    } catch (IllegalArgumentException iae) {
                        log.debug("ignoring invalid message in channel [%s]", nextMsg.rhs.getChannelName());
                    }
                }
            } catch (InterruptedException e) {
                Thread.interrupted();
                throw new RuntimeException("interrupted retrieving elements from queue", e);
            }
        }

        @Override
        public InputRow nextRow() {
            return nextRow;
        }

        @Override
        public Runnable commit() {
            return new Runnable() {

                @Override
                public void run() {
                // nothing to see here
                }
            };
        }

        @Override
        public void close() throws IOException {
            try {
                log.info("disconnecting from irc server [%s]", host);
                irc.disconnect("");
            } finally {
                closed = true;
            }
        }
    };
}

Also used : VariousMessageListenerAdapter(com.ircclouds.irc.api.listeners.VariousMessageListenerAdapter) Firehose(io.druid.data.input.Firehose) IRCApiImpl(com.ircclouds.irc.api.IRCApiImpl) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) DateTime(org.joda.time.DateTime) ChannelPrivMsg(com.ircclouds.irc.api.domain.messages.ChannelPrivMsg) IOException(java.io.IOException) IRCServer(com.ircclouds.irc.api.domain.IRCServer) IRCApi(com.ircclouds.irc.api.IRCApi) IIRCState(com.ircclouds.irc.api.state.IIRCState) InputRow(io.druid.data.input.InputRow) IServerParameters(com.ircclouds.irc.api.IServerParameters) List(java.util.List) Pair(io.druid.java.util.common.Pair)

Example 8 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class IndexTask method generateAndPublishSegments.

private boolean generateAndPublishSegments(final TaskToolbox toolbox, final DataSchema dataSchema, final Map<Interval, List<ShardSpec>> shardSpecs, final String version, final FirehoseFactory firehoseFactory) throws IOException, InterruptedException {
    final GranularitySpec granularitySpec = dataSchema.getGranularitySpec();
    final FireDepartment fireDepartmentForMetrics = new FireDepartment(dataSchema, new RealtimeIOConfig(null, null, null), null);
    final FireDepartmentMetrics fireDepartmentMetrics = fireDepartmentForMetrics.getMetrics();
    final Map<String, ShardSpec> sequenceNameToShardSpecMap = Maps.newHashMap();
    if (toolbox.getMonitorScheduler() != null) {
        toolbox.getMonitorScheduler().addMonitor(new RealtimeMetricsMonitor(ImmutableList.of(fireDepartmentForMetrics), ImmutableMap.of(DruidMetrics.TASK_ID, new String[] { getId() })));
    }
    final SegmentAllocator segmentAllocator;
    if (ingestionSchema.getIOConfig().isAppendToExisting()) {
        segmentAllocator = new ActionBasedSegmentAllocator(toolbox.getTaskActionClient(), dataSchema);
    } else {
        segmentAllocator = new SegmentAllocator() {

            @Override
            public SegmentIdentifier allocate(DateTime timestamp, String sequenceName, String previousSegmentId) throws IOException {
                Optional<Interval> interval = granularitySpec.bucketInterval(timestamp);
                if (!interval.isPresent()) {
                    throw new ISE("Could not find interval for timestamp [%s]", timestamp);
                }
                ShardSpec shardSpec = sequenceNameToShardSpecMap.get(sequenceName);
                if (shardSpec == null) {
                    throw new ISE("Could not find ShardSpec for sequenceName [%s]", sequenceName);
                }
                return new SegmentIdentifier(getDataSource(), interval.get(), version, shardSpec);
            }
        };
    }
    try (final Appenderator appenderator = newAppenderator(fireDepartmentMetrics, toolbox, dataSchema);
        final FiniteAppenderatorDriver driver = newDriver(appenderator, toolbox, segmentAllocator, fireDepartmentMetrics);
        final Firehose firehose = firehoseFactory.connect(dataSchema.getParser())) {
        final Supplier<Committer> committerSupplier = Committers.supplierFromFirehose(firehose);
        final Map<Interval, ShardSpecLookup> shardSpecLookups = Maps.newHashMap();
        if (driver.startJob() != null) {
            driver.clear();
        }
        try {
            while (firehose.hasMore()) {
                try {
                    final InputRow inputRow = firehose.nextRow();
                    final Optional<Interval> optInterval = granularitySpec.bucketInterval(inputRow.getTimestamp());
                    if (!optInterval.isPresent()) {
                        fireDepartmentMetrics.incrementThrownAway();
                        continue;
                    }
                    final Interval interval = optInterval.get();
                    if (!shardSpecLookups.containsKey(interval)) {
                        final List<ShardSpec> intervalShardSpecs = shardSpecs.get(interval);
                        if (intervalShardSpecs == null || intervalShardSpecs.isEmpty()) {
                            throw new ISE("Failed to get shardSpec for interval[%s]", interval);
                        }
                        shardSpecLookups.put(interval, intervalShardSpecs.get(0).getLookup(intervalShardSpecs));
                    }
                    final ShardSpec shardSpec = shardSpecLookups.get(interval).getShardSpec(inputRow.getTimestampFromEpoch(), inputRow);
                    final String sequenceName = String.format("index_%s_%s_%d", interval, version, shardSpec.getPartitionNum());
                    if (!sequenceNameToShardSpecMap.containsKey(sequenceName)) {
                        final ShardSpec shardSpecForPublishing = ingestionSchema.getTuningConfig().isForceExtendableShardSpecs() || ingestionSchema.getIOConfig().isAppendToExisting() ? new NumberedShardSpec(shardSpec.getPartitionNum(), shardSpecs.get(interval).size()) : shardSpec;
                        sequenceNameToShardSpecMap.put(sequenceName, shardSpecForPublishing);
                    }
                    final SegmentIdentifier identifier = driver.add(inputRow, sequenceName, committerSupplier);
                    if (identifier == null) {
                        throw new ISE("Could not allocate segment for row with timestamp[%s]", inputRow.getTimestamp());
                    }
                    fireDepartmentMetrics.incrementProcessed();
                } catch (ParseException e) {
                    if (ingestionSchema.getTuningConfig().isReportParseExceptions()) {
                        throw e;
                    } else {
                        fireDepartmentMetrics.incrementUnparseable();
                    }
                }
            }
        } finally {
            driver.persist(committerSupplier.get());
        }
        final TransactionalSegmentPublisher publisher = new TransactionalSegmentPublisher() {

            @Override
            public boolean publishSegments(Set<DataSegment> segments, Object commitMetadata) throws IOException {
                final SegmentTransactionalInsertAction action = new SegmentTransactionalInsertAction(segments, null, null);
                return toolbox.getTaskActionClient().submit(action).isSuccess();
            }
        };
        final SegmentsAndMetadata published = driver.finish(publisher, committerSupplier.get());
        if (published == null) {
            log.error("Failed to publish segments, aborting!");
            return false;
        } else {
            log.info("Published segments[%s]", Joiner.on(", ").join(Iterables.transform(published.getSegments(), new Function<DataSegment, String>() {

                @Override
                public String apply(DataSegment input) {
                    return input.getIdentifier();
                }
            })));
            return true;
        }
    }
}

Also used : RealtimeIOConfig(io.druid.segment.indexing.RealtimeIOConfig) SortedSet(java.util.SortedSet) Set(java.util.Set) SegmentIdentifier(io.druid.segment.realtime.appenderator.SegmentIdentifier) ShardSpecLookup(io.druid.timeline.partition.ShardSpecLookup) SegmentTransactionalInsertAction(io.druid.indexing.common.actions.SegmentTransactionalInsertAction) DataSegment(io.druid.timeline.DataSegment) NoneShardSpec(io.druid.timeline.partition.NoneShardSpec) ShardSpec(io.druid.timeline.partition.ShardSpec) NumberedShardSpec(io.druid.timeline.partition.NumberedShardSpec) HashBasedNumberedShardSpec(io.druid.timeline.partition.HashBasedNumberedShardSpec) DateTime(org.joda.time.DateTime) FireDepartment(io.druid.segment.realtime.FireDepartment) TransactionalSegmentPublisher(io.druid.segment.realtime.appenderator.TransactionalSegmentPublisher) ActionBasedSegmentAllocator(io.druid.indexing.appenderator.ActionBasedSegmentAllocator) ISE(io.druid.java.util.common.ISE) NumberedShardSpec(io.druid.timeline.partition.NumberedShardSpec) HashBasedNumberedShardSpec(io.druid.timeline.partition.HashBasedNumberedShardSpec) Optional(com.google.common.base.Optional) Firehose(io.druid.data.input.Firehose) SegmentsAndMetadata(io.druid.segment.realtime.appenderator.SegmentsAndMetadata) IOException(java.io.IOException) FireDepartmentMetrics(io.druid.segment.realtime.FireDepartmentMetrics) Appenderator(io.druid.segment.realtime.appenderator.Appenderator) GranularitySpec(io.druid.segment.indexing.granularity.GranularitySpec) ActionBasedSegmentAllocator(io.druid.indexing.appenderator.ActionBasedSegmentAllocator) SegmentAllocator(io.druid.segment.realtime.appenderator.SegmentAllocator) FiniteAppenderatorDriver(io.druid.segment.realtime.appenderator.FiniteAppenderatorDriver) InputRow(io.druid.data.input.InputRow) RealtimeMetricsMonitor(io.druid.segment.realtime.RealtimeMetricsMonitor) Committer(io.druid.data.input.Committer) ParseException(io.druid.java.util.common.parsers.ParseException) Interval(org.joda.time.Interval)

Example 9 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class IndexTask method determineShardSpecs.

/**
   * Determines the number of shards for each interval using a hash of queryGranularity timestamp + all dimensions (i.e
   * hash-based partitioning). In the future we may want to also support single-dimension partitioning.
   */
private Map<Interval, List<ShardSpec>> determineShardSpecs(final TaskToolbox toolbox, final FirehoseFactory firehoseFactory) throws IOException {
    final ObjectMapper jsonMapper = toolbox.getObjectMapper();
    final GranularitySpec granularitySpec = ingestionSchema.getDataSchema().getGranularitySpec();
    final Granularity queryGranularity = granularitySpec.getQueryGranularity();
    final boolean determineNumPartitions = ingestionSchema.getTuningConfig().getNumShards() == null;
    final boolean determineIntervals = !ingestionSchema.getDataSchema().getGranularitySpec().bucketIntervals().isPresent();
    final Map<Interval, List<ShardSpec>> shardSpecs = Maps.newHashMap();
    // if we were given number of shards per interval and the intervals, we don't need to scan the data
    if (!determineNumPartitions && !determineIntervals) {
        log.info("numShards and intervals provided, skipping determine partition scan");
        final SortedSet<Interval> intervals = ingestionSchema.getDataSchema().getGranularitySpec().bucketIntervals().get();
        final int numShards = ingestionSchema.getTuningConfig().getNumShards();
        for (Interval interval : intervals) {
            final List<ShardSpec> intervalShardSpecs = Lists.newArrayListWithCapacity(numShards);
            if (numShards > 1) {
                for (int i = 0; i < numShards; i++) {
                    intervalShardSpecs.add(new HashBasedNumberedShardSpec(i, numShards, null, jsonMapper));
                }
            } else {
                intervalShardSpecs.add(NoneShardSpec.instance());
            }
            shardSpecs.put(interval, intervalShardSpecs);
        }
        return shardSpecs;
    }
    // determine intervals containing data and prime HLL collectors
    final Map<Interval, Optional<HyperLogLogCollector>> hllCollectors = Maps.newHashMap();
    int thrownAway = 0;
    log.info("Determining intervals and shardSpecs");
    long determineShardSpecsStartMillis = System.currentTimeMillis();
    try (final Firehose firehose = firehoseFactory.connect(ingestionSchema.getDataSchema().getParser())) {
        while (firehose.hasMore()) {
            final InputRow inputRow = firehose.nextRow();
            final Interval interval;
            if (determineIntervals) {
                interval = granularitySpec.getSegmentGranularity().bucket(inputRow.getTimestamp());
            } else {
                final Optional<Interval> optInterval = granularitySpec.bucketInterval(inputRow.getTimestamp());
                if (!optInterval.isPresent()) {
                    thrownAway++;
                    continue;
                }
                interval = optInterval.get();
            }
            if (!determineNumPartitions) {
                // for the interval and don't instantiate a HLL collector
                if (!hllCollectors.containsKey(interval)) {
                    hllCollectors.put(interval, Optional.<HyperLogLogCollector>absent());
                }
                continue;
            }
            if (!hllCollectors.containsKey(interval)) {
                hllCollectors.put(interval, Optional.of(HyperLogLogCollector.makeLatestCollector()));
            }
            List<Object> groupKey = Rows.toGroupKey(queryGranularity.bucketStart(inputRow.getTimestamp()).getMillis(), inputRow);
            hllCollectors.get(interval).get().add(hashFunction.hashBytes(jsonMapper.writeValueAsBytes(groupKey)).asBytes());
        }
    }
    if (thrownAway > 0) {
        log.warn("Unable to to find a matching interval for [%,d] events", thrownAway);
    }
    final ImmutableSortedMap<Interval, Optional<HyperLogLogCollector>> sortedMap = ImmutableSortedMap.copyOf(hllCollectors, Comparators.intervalsByStartThenEnd());
    for (final Map.Entry<Interval, Optional<HyperLogLogCollector>> entry : sortedMap.entrySet()) {
        final Interval interval = entry.getKey();
        final Optional<HyperLogLogCollector> collector = entry.getValue();
        final int numShards;
        if (determineNumPartitions) {
            final long numRows = new Double(collector.get().estimateCardinality()).longValue();
            numShards = (int) Math.ceil((double) numRows / ingestionSchema.getTuningConfig().getTargetPartitionSize());
            log.info("Estimated [%,d] rows of data for interval [%s], creating [%,d] shards", numRows, interval, numShards);
        } else {
            numShards = ingestionSchema.getTuningConfig().getNumShards();
            log.info("Creating [%,d] shards for interval [%s]", numShards, interval);
        }
        final List<ShardSpec> intervalShardSpecs = Lists.newArrayListWithCapacity(numShards);
        if (numShards > 1) {
            for (int i = 0; i < numShards; i++) {
                intervalShardSpecs.add(new HashBasedNumberedShardSpec(i, numShards, null, jsonMapper));
            }
        } else {
            intervalShardSpecs.add(NoneShardSpec.instance());
        }
        shardSpecs.put(interval, intervalShardSpecs);
    }
    log.info("Found intervals and shardSpecs in %,dms", System.currentTimeMillis() - determineShardSpecsStartMillis);
    return shardSpecs;
}

Also used : Granularity(io.druid.java.util.common.granularity.Granularity) NoneShardSpec(io.druid.timeline.partition.NoneShardSpec) ShardSpec(io.druid.timeline.partition.ShardSpec) NumberedShardSpec(io.druid.timeline.partition.NumberedShardSpec) HashBasedNumberedShardSpec(io.druid.timeline.partition.HashBasedNumberedShardSpec) List(java.util.List) ImmutableList(com.google.common.collect.ImmutableList) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) HashBasedNumberedShardSpec(io.druid.timeline.partition.HashBasedNumberedShardSpec) Optional(com.google.common.base.Optional) Firehose(io.druid.data.input.Firehose) HyperLogLogCollector(io.druid.hll.HyperLogLogCollector) GranularitySpec(io.druid.segment.indexing.granularity.GranularitySpec) InputRow(io.druid.data.input.InputRow) Map(java.util.Map) ImmutableMap(com.google.common.collect.ImmutableMap) ImmutableSortedMap(com.google.common.collect.ImmutableSortedMap) Interval(org.joda.time.Interval)

Example 10 with Firehose

use of io.druid.data.input.Firehose in project druid by druid-io.

the class NoopTask method run.

@Override
public TaskStatus run(TaskToolbox toolbox) throws Exception {
    if (firehoseFactory != null) {
        log.info("Connecting firehose");
    }
    try (Firehose firehose = firehoseFactory != null ? firehoseFactory.connect(null) : null) {
        log.info("Running noop task[%s]", getId());
        log.info("Sleeping for %,d millis.", runTime);
        Thread.sleep(runTime);
        log.info("Woke up!");
        return TaskStatus.success(getId());
    }
}

Also used : Firehose(io.druid.data.input.Firehose)

Aggregations

Firehose (io.druid.data.input.Firehose)18 InputRow (io.druid.data.input.InputRow)12 Test (org.junit.Test)8 MapBasedInputRow (io.druid.data.input.MapBasedInputRow)7 IOException (java.io.IOException)7 IAnswer (org.easymock.IAnswer)5 Map (java.util.Map)4 ParseException (io.druid.java.util.common.parsers.ParseException)3 HashBasedNumberedShardSpec (io.druid.timeline.partition.HashBasedNumberedShardSpec)3 List (java.util.List)3 DateTime (org.joda.time.DateTime)3 Interval (org.joda.time.Interval)3 ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper)2 Optional (com.google.common.base.Optional)2 ImmutableMap (com.google.common.collect.ImmutableMap)2 ByteBufferInputRowParser (io.druid.data.input.ByteBufferInputRowParser)2 FirehoseFactory (io.druid.data.input.FirehoseFactory)2 RealtimeIOConfig (io.druid.segment.indexing.RealtimeIOConfig)2 GranularitySpec (io.druid.segment.indexing.granularity.GranularitySpec)2 ReplayableFirehoseFactory (io.druid.segment.realtime.firehose.ReplayableFirehoseFactory)2