
Example 26 with InputRow

Use of io.druid.data.input.InputRow in project druid by druid-io.

The class IrcFirehoseFactory, method connect:

@Override
public Firehose connect(final IrcInputRowParser firehoseParser) throws IOException {
    final IRCApi irc = new IRCApiImpl(false);
    final LinkedBlockingQueue<Pair<DateTime, ChannelPrivMsg>> queue = new LinkedBlockingQueue<Pair<DateTime, ChannelPrivMsg>>();
    irc.addListener(new VariousMessageListenerAdapter() {

        @Override
        public void onChannelMessage(ChannelPrivMsg aMsg) {
            try {
                queue.put(Pair.of(DateTime.now(), aMsg));
            } catch (InterruptedException e) {
                throw new RuntimeException("interrupted adding message to queue", e);
            }
        }
    });
    log.info("connecting to irc server [%s]", host);
    irc.connect(new IServerParameters() {

        @Override
        public String getNickname() {
            return nick;
        }

        @Override
        public List<String> getAlternativeNicknames() {
            return Lists.newArrayList(nick + UUID.randomUUID(), nick + UUID.randomUUID(), nick + UUID.randomUUID());
        }

        @Override
        public String getIdent() {
            return "druid";
        }

        @Override
        public String getRealname() {
            return nick;
        }

        @Override
        public IRCServer getServer() {
            return new IRCServer(host, false);
        }
    }, new Callback<IIRCState>() {

        @Override
        public void onSuccess(IIRCState aObject) {
            log.info("irc connection to server [%s] established", host);
            for (String chan : channels) {
                log.info("Joining channel %s", chan);
                irc.joinChannel(chan);
            }
        }

        @Override
        public void onFailure(Exception e) {
            log.error(e, "Unable to connect to irc server [%s]", host);
            throw new RuntimeException("Unable to connect to server", e);
        }
    });
    closed = false;
    return new Firehose() {

        // cached by hasMore(), handed out by nextRow()
        InputRow nextRow = null;

        @Override
        public boolean hasMore() {
            try {
                // Poll with a timeout so a concurrent close() is noticed even when the channel is idle
                while (true) {
                    Pair<DateTime, ChannelPrivMsg> nextMsg = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (closed) {
                        return false;
                    }
                    if (nextMsg == null) {
                        continue;
                    }
                    try {
                        nextRow = firehoseParser.parse(nextMsg);
                        if (nextRow != null) {
                            return true;
                        }
                    } catch (IllegalArgumentException iae) {
                        log.debug("ignoring invalid message in channel [%s]", nextMsg.rhs.getChannelName());
                    }
                }
            } catch (InterruptedException e) {
                // restore the interrupt flag; Thread.interrupted() would clear it instead
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted retrieving elements from queue", e);
            }
        }

        @Override
        public InputRow nextRow() {
            return nextRow;
        }

        @Override
        public Runnable commit() {
            return new Runnable() {

                @Override
                public void run() {
                    // nothing to see here
                }
            };
        }

        @Override
        public void close() throws IOException {
            try {
                log.info("disconnecting from irc server [%s]", host);
                irc.disconnect("");
            } finally {
                closed = true;
            }
        }
    };
}
Also used:
VariousMessageListenerAdapter (com.ircclouds.irc.api.listeners.VariousMessageListenerAdapter)
Firehose (io.druid.data.input.Firehose)
IRCApiImpl (com.ircclouds.irc.api.IRCApiImpl)
LinkedBlockingQueue (java.util.concurrent.LinkedBlockingQueue)
DateTime (org.joda.time.DateTime)
ChannelPrivMsg (com.ircclouds.irc.api.domain.messages.ChannelPrivMsg)
IOException (java.io.IOException)
IRCServer (com.ircclouds.irc.api.domain.IRCServer)
IRCApi (com.ircclouds.irc.api.IRCApi)
IIRCState (com.ircclouds.irc.api.state.IIRCState)
InputRow (io.druid.data.input.InputRow)
IServerParameters (com.ircclouds.irc.api.IServerParameters)
List (java.util.List)
Pair (io.druid.java.util.common.Pair)
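
For context, here is a minimal sketch of how a caller might drain the Firehose returned by connect(). The FirehoseDriver class and its drain method are illustrative, not part of Druid; the loop relies only on the Firehose contract visible above: hasMore() blocks until a row is available or the firehose is closed, and nextRow() is only meaningful after hasMore() has returned true.

import io.druid.data.input.Firehose;
import io.druid.data.input.InputRow;

import java.io.IOException;

public class FirehoseDriver {

    // Illustrative only: drains any Firehose, e.g. one obtained from
    // IrcFirehoseFactory.connect(parser). Firehose extends Closeable,
    // so try-with-resources guarantees the IRC disconnect in close().
    public static void drain(Firehose firehose) throws IOException {
        try (Firehose f = firehose) {
            while (f.hasMore()) {
                // valid only after hasMore() has returned true
                InputRow row = f.nextRow();
                // hand `row` to whatever consumes it, then persist progress
                f.commit().run();
            }
        }
    }
}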

Example 27 with InputRow

Use of io.druid.data.input.InputRow in project druid by druid-io.

The class PredicateFirehose, method nextRow:

@Override
public InputRow nextRow() {
    final InputRow row = savedInputRow;
    savedInputRow = null;
    return row;
}
Also used: InputRow (io.druid.data.input.InputRow)

Example 28 with InputRow

Use of io.druid.data.input.InputRow in project druid by druid-io.

The class PredicateFirehose, method hasMore:

@Override
public boolean hasMore() {
    if (savedInputRow != null) {
        return true;
    }
    while (firehose.hasMore()) {
        final InputRow row = firehose.nextRow();
        if (predicate.apply(row)) {
            savedInputRow = row;
            return true;
        }
        // Log every IGNORE_THRESHOLD rows so filtered rows are not discarded silently
        if (ignored % IGNORE_THRESHOLD == 0) {
            log.warn("[%,d] InputRow(s) ignored as they do not satisfy the predicate", ignored);
        }
        ignored++;
    }
    return false;
}
Also used: InputRow (io.druid.data.input.InputRow)
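
Examples 27 and 28 are two halves of a one-row look-ahead: hasMore() advances the wrapped firehose until a row satisfies the predicate and caches it, while nextRow() hands the cached row out exactly once. A self-contained sketch of the same pattern (the warn-logging of ignored rows is omitted, and FilteringFirehose is an illustrative name rather than the actual PredicateFirehose class):

import com.google.common.base.Predicate;
import io.druid.data.input.Firehose;
import io.druid.data.input.InputRow;

import java.io.IOException;

public class FilteringFirehose implements Firehose {

    private final Firehose delegate;
    private final Predicate<InputRow> predicate;
    private InputRow savedInputRow = null;

    public FilteringFirehose(Firehose delegate, Predicate<InputRow> predicate) {
        this.delegate = delegate;
        this.predicate = predicate;
    }

    @Override
    public boolean hasMore() {
        if (savedInputRow != null) {
            return true;
        }
        // advance the delegate until a row passes the predicate, caching it
        while (delegate.hasMore()) {
            final InputRow row = delegate.nextRow();
            if (predicate.apply(row)) {
                savedInputRow = row;
                return true;
            }
        }
        return false;
    }

    @Override
    public InputRow nextRow() {
        final InputRow row = savedInputRow;
        // clear the cache so the next hasMore() advances again
        savedInputRow = null;
        return row;
    }

    @Override
    public Runnable commit() {
        return delegate.commit();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}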

Example 29 with InputRow

Use of io.druid.data.input.InputRow in project druid by druid-io.

The class Plumbers, method addNextRow:

public static void addNextRow(final Supplier<Committer> committerSupplier, final Firehose firehose, final Plumber plumber, final boolean reportParseExceptions, final FireDepartmentMetrics metrics) {
    final InputRow inputRow;
    try {
        inputRow = firehose.nextRow();
    } catch (ParseException e) {
        if (reportParseExceptions) {
            throw e;
        } else {
            log.debug(e, "Discarded row due to exception, considering unparseable.");
            metrics.incrementUnparseable();
            return;
        }
    }
    if (inputRow == null) {
        if (reportParseExceptions) {
            throw new ParseException("null input row");
        } else {
            log.debug("Discarded null input row, considering unparseable.");
            metrics.incrementUnparseable();
            return;
        }
    }
    final int numRows;
    try {
        numRows = plumber.add(inputRow, committerSupplier);
    } catch (IndexSizeExceededException e) {
        // plumber.add should be swapping out indexes before they fill up.
        throw new ISE(e, "WTF?! Index size exceeded, this shouldn't happen. Bad Plumber!");
    }
    if (numRows == -1) {
        metrics.incrementThrownAway();
        log.debug("Discarded row[%s], considering thrownAway.", inputRow);
        return;
    }
    metrics.incrementProcessed();
}
Also used:
InputRow (io.druid.data.input.InputRow)
ISE (io.druid.java.util.common.ISE)
ParseException (io.druid.java.util.common.parsers.ParseException)
IndexSizeExceededException (io.druid.segment.incremental.IndexSizeExceededException)
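
Since addNextRow consumes exactly one row per call, the surrounding task drives it in a loop. A sketch of such a driver follows; the IngestLoop class, its parameter names, and the import paths are my best guesses for this Druid version, not code from the project.

import com.google.common.base.Supplier;

import io.druid.data.input.Committer;
import io.druid.data.input.Firehose;
import io.druid.segment.realtime.FireDepartmentMetrics;
import io.druid.segment.realtime.plumber.Plumber;
import io.druid.segment.realtime.plumber.Plumbers;

public class IngestLoop {

    // Illustrative driver: in Druid the realtime task supplies these objects.
    static void run(Supplier<Committer> committerSupplier, Firehose firehose, Plumber plumber,
                    boolean reportParseExceptions, FireDepartmentMetrics metrics) {
        while (firehose.hasMore()) {
            // Each call consumes one row and bumps exactly one of the
            // unparseable / thrownAway / processed counters.
            Plumbers.addNextRow(committerSupplier, firehose, plumber, reportParseExceptions, metrics);
        }
    }
}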

Example 30 with InputRow

Use of io.druid.data.input.InputRow in project druid by druid-io.

The class SqlBenchmark, method setup:

@Setup(Level.Trial)
public void setup() throws Exception {
    tmpDir = Files.createTempDir();
    log.info("Starting benchmark setup using tmpDir[%s], rows[%,d].", tmpDir, rowsPerSegment);
    if (ComplexMetrics.getSerdeForType("hyperUnique") == null) {
        ComplexMetrics.registerSerde("hyperUnique", new HyperUniquesSerde(HyperLogLogHash.getDefault()));
    }
    final BenchmarkSchemaInfo schemaInfo = BenchmarkSchemas.SCHEMA_MAP.get("basic");
    final BenchmarkDataGenerator dataGenerator = new BenchmarkDataGenerator(schemaInfo.getColumnSchemas(), RNG_SEED + 1, schemaInfo.getDataInterval(), rowsPerSegment);
    final List<InputRow> rows = Lists.newArrayList();
    for (int i = 0; i < rowsPerSegment; i++) {
        final InputRow row = dataGenerator.nextRow();
        if (i % 20000 == 0) {
            log.info("%,d/%,d rows generated.", i, rowsPerSegment);
        }
        rows.add(row);
    }
    log.info("%,d/%,d rows generated.", rows.size(), rowsPerSegment);
    final PlannerConfig plannerConfig = new PlannerConfig();
    final QueryRunnerFactoryConglomerate conglomerate = CalciteTests.queryRunnerFactoryConglomerate();
    final QueryableIndex index = IndexBuilder.create().tmpDir(new File(tmpDir, "1")).indexMerger(TestHelper.getTestIndexMergerV9()).rows(rows).buildMMappedIndex();
    this.walker = new SpecificSegmentsQuerySegmentWalker(conglomerate).add(DataSegment.builder().dataSource("foo").interval(index.getDataInterval()).version("1").shardSpec(new LinearShardSpec(0)).build(), index);
    final Map<String, Table> tableMap = ImmutableMap.<String, Table>of("foo", new DruidTable(new TableDataSource("foo"), RowSignature.builder().add("__time", ValueType.LONG).add("dimSequential", ValueType.STRING).add("dimZipf", ValueType.STRING).add("dimUniform", ValueType.STRING).build()));
    final Schema druidSchema = new AbstractSchema() {

        @Override
        protected Map<String, Table> getTableMap() {
            return tableMap;
        }
    };
    plannerFactory = new PlannerFactory(Calcites.createRootSchema(druidSchema), walker, CalciteTests.createOperatorTable(), plannerConfig);
    groupByQuery = GroupByQuery.builder().setDataSource("foo").setInterval(new Interval(JodaUtils.MIN_INSTANT, JodaUtils.MAX_INSTANT)).setDimensions(Arrays.<DimensionSpec>asList(new DefaultDimensionSpec("dimZipf", "d0"), new DefaultDimensionSpec("dimSequential", "d1"))).setAggregatorSpecs(Arrays.<AggregatorFactory>asList(new CountAggregatorFactory("c"))).setGranularity(Granularities.ALL).build();
    sqlQuery = "SELECT\n" + "  dimZipf AS d0,\n" + "  dimSequential AS d1,\n" + "  COUNT(*) AS c\n" + "FROM druid.foo\n" + "GROUP BY dimZipf, dimSequential";
}
Also used:
DruidTable (io.druid.sql.calcite.table.DruidTable)
Table (org.apache.calcite.schema.Table)
LinearShardSpec (io.druid.timeline.partition.LinearShardSpec)
Schema (org.apache.calcite.schema.Schema)
AbstractSchema (org.apache.calcite.schema.impl.AbstractSchema)
BenchmarkDataGenerator (io.druid.benchmark.datagen.BenchmarkDataGenerator)
HyperUniquesSerde (io.druid.query.aggregation.hyperloglog.HyperUniquesSerde)
CountAggregatorFactory (io.druid.query.aggregation.CountAggregatorFactory)
AggregatorFactory (io.druid.query.aggregation.AggregatorFactory)
DefaultDimensionSpec (io.druid.query.dimension.DefaultDimensionSpec)
QueryRunnerFactoryConglomerate (io.druid.query.QueryRunnerFactoryConglomerate)
SpecificSegmentsQuerySegmentWalker (io.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker)
TableDataSource (io.druid.query.TableDataSource)
QueryableIndex (io.druid.segment.QueryableIndex)
BenchmarkSchemaInfo (io.druid.benchmark.datagen.BenchmarkSchemaInfo)
PlannerConfig (io.druid.sql.calcite.planner.PlannerConfig)
InputRow (io.druid.data.input.InputRow)
PlannerFactory (io.druid.sql.calcite.planner.PlannerFactory)
File (java.io.File)
Interval (org.joda.time.Interval)
Setup (org.openjdk.jmh.annotations.Setup)
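
The benchmark rows come from BenchmarkDataGenerator, but anything implementing InputRow works with IndexBuilder.rows(...). Below is a hand-rolled equivalent using MapBasedInputRow; the column names mirror the RowSignature built above, while the RowExample class and the values are made up for illustration.

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

import io.druid.data.input.InputRow;
import io.druid.data.input.MapBasedInputRow;

import org.joda.time.DateTime;

public class RowExample {

    public static void main(String[] args) {
        // a timestamp, the list of dimension names, and a map of column values
        InputRow row = new MapBasedInputRow(
                new DateTime("2000-01-01T00:00:00Z"),
                ImmutableList.of("dimSequential", "dimZipf", "dimUniform"),
                ImmutableMap.<String, Object>of(
                        "dimSequential", "0",
                        "dimZipf", "1",
                        "dimUniform", "42"));
        // getDimension returns the value(s) for a dimension as a list of strings
        System.out.println(row.getDimension("dimZipf"));
    }
}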

Aggregations

InputRow (io.druid.data.input.InputRow): 81 usages
Test (org.junit.Test): 35 usages
MapBasedInputRow (io.druid.data.input.MapBasedInputRow): 24 usages
BenchmarkDataGenerator (io.druid.benchmark.datagen.BenchmarkDataGenerator): 22 usages
File (java.io.File): 18 usages
Setup (org.openjdk.jmh.annotations.Setup): 15 usages
HyperUniquesSerde (io.druid.query.aggregation.hyperloglog.HyperUniquesSerde): 14 usages
Firehose (io.druid.data.input.Firehose): 12 usages
OnheapIncrementalIndex (io.druid.segment.incremental.OnheapIncrementalIndex): 12 usages
IndexSpec (io.druid.segment.IndexSpec): 11 usages
ArrayList (java.util.ArrayList): 11 usages
IncrementalIndex (io.druid.segment.incremental.IncrementalIndex): 10 usages
DateTime (org.joda.time.DateTime): 10 usages
QueryableIndex (io.druid.segment.QueryableIndex): 9 usages
IOException (java.io.IOException): 9 usages
BenchmarkColumnSchema (io.druid.benchmark.datagen.BenchmarkColumnSchema): 8 usages
Interval (org.joda.time.Interval): 8 usages
ParseException (io.druid.java.util.common.parsers.ParseException): 7 usages
AggregatorFactory (io.druid.query.aggregation.AggregatorFactory): 6 usages
DataSegment (io.druid.timeline.DataSegment): 5 usages