Use of java.util.PriorityQueue in project pinot by linkedin.
The class GeneratorBasedRoutingTableBuilder, method computeRoutingTableFromExternalView.
@Override
public List<ServerToSegmentSetMap> computeRoutingTableFromExternalView(String tableName, ExternalView externalView, List<InstanceConfig> instanceConfigList) {
  // The default routing table algorithm tries to balance all available segments across all servers, so that each
  // server is hit on every query. This works fine with small clusters (say less than 20 servers) but for larger
  // clusters, this adds up to significant overhead (one request must be enqueued for each server, processed,
  // returned, deserialized, aggregated, etc.).
  //
  // For large clusters, we want to avoid hitting every server, as this also has an adverse effect on client tail
  // latency. This is due to the fact that a query cannot return until it has received a response from each server,
  // and the greater the number of servers that are hit, the more likely it is that one of the servers will be a
  // straggler (e.g. due to contention for query processing threads, GC, etc.). We also want to balance the segments
  // within any given routing table so that each server in the routing table has approximately the same number of
  // segments to process.
  //
  // To do so, we have a routing table generator that generates routing tables by picking a random subset of servers.
  // With this set of servers, we check if the set of segments served by these servers is complete. If the set of
  // segments served does not cover all of the segments, we compute the list of missing segments and pick a random
  // server that serves these missing segments until we have complete coverage of all the segments.
  //
  // We then order the segments in ascending number of replicas within our server set, in order to allocate the
  // segments with fewer replicas first. This ensures that segments that are 'easier' to allocate are more likely to
  // end up on a replica with fewer segments.
  //
  // Then, we pick a random replica for each segment, iterating from fewest replicas to most replicas, inversely
  // weighted by the number of segments already assigned to that replica. This ensures that we build a routing table
  // that's as even as possible.
  //
  // The algorithm to generate a routing table is thus:
  // 1. Compute the inverse external view, a mapping of servers to segments
  // 2. For each routing table to generate:
  //    a) Pick TARGET_SERVER_COUNT_PER_QUERY distinct servers
  //    b) Check if the server set covers all the segments; if not, add additional servers until it does
  //    c) Order the segments in our server set in ascending order of number of replicas present in our server set
  //    d) For each segment, pick a random replica with proper weighting
  //    e) Return that routing table
  //
  // Given that we can generate routing tables at will, we then generate many routing tables and use them to optimize
  // according to two criteria: the variance in workload per server for any individual table as well as the variance
  // in workload per server across all the routing tables. To do so, we generate an initial set of routing tables
  // according to a per-routing table metric and discard the worst routing tables.
  RoutingTableGenerator routingTableGenerator = buildRoutingTableGenerator();
  routingTableGenerator.init(externalView, instanceConfigList);
  PriorityQueue<Pair<Map<String, Set<String>>, Float>> topRoutingTables =
      new PriorityQueue<>(ROUTING_TABLE_COUNT, new Comparator<Pair<Map<String, Set<String>>, Float>>() {
        @Override
        public int compare(Pair<Map<String, Set<String>>, Float> left, Pair<Map<String, Set<String>>, Float> right) {
          // Float.compare sorts in ascending order and we want a max heap, so we need to return the negative of the
          // comparison
          return -Float.compare(left.getValue(), right.getValue());
        }
      });
  for (int i = 0; i < ROUTING_TABLE_COUNT; i++) {
    topRoutingTables.add(generateRoutingTableWithMetric(routingTableGenerator));
  }
  // Generate more routing tables and keep the top ROUTING_TABLE_COUNT ones
  for (int i = 0; i < (ROUTING_TABLE_GENERATION_COUNT - ROUTING_TABLE_COUNT); ++i) {
    Pair<Map<String, Set<String>>, Float> newRoutingTable = generateRoutingTableWithMetric(routingTableGenerator);
    Pair<Map<String, Set<String>>, Float> worstRoutingTable = topRoutingTables.peek();
    // If the new routing table is better than the worst one, keep it
    if (newRoutingTable.getRight() < worstRoutingTable.getRight()) {
      topRoutingTables.poll();
      topRoutingTables.add(newRoutingTable);
    }
  }
  // Return the best routing tables
  List<ServerToSegmentSetMap> routingTables = new ArrayList<>(topRoutingTables.size());
  while (!topRoutingTables.isEmpty()) {
    Pair<Map<String, Set<String>>, Float> routingTableWithMetric = topRoutingTables.poll();
    routingTables.add(new ServerToSegmentSetMap(routingTableWithMetric.getKey()));
  }
  return routingTables;
}
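The selection loop above is a general keep-the-best-K pattern: a max-heap of size K whose head is the worst survivor, so each new candidate needs only one comparison against peek(). A minimal standalone sketch of that pattern follows; BestKSelector, the generator, and the score comparator are illustrative names, not Pinot API.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Supplier;

final class BestKSelector {
  // Keeps the k lowest-scoring candidates out of totalToGenerate generated ones.
  // The reversed comparator turns the PriorityQueue into a max-heap on score, so
  // peek() is always the worst survivor and the keep-or-discard test is O(1).
  static <T> List<T> selectBestK(Supplier<T> generator, Comparator<T> byScoreAscending, int k, int totalToGenerate) {
    PriorityQueue<T> kept = new PriorityQueue<>(k, byScoreAscending.reversed());
    for (int i = 0; i < totalToGenerate; i++) {
      T candidate = generator.get();
      if (kept.size() < k) {
        kept.add(candidate);
      } else if (byScoreAscending.compare(candidate, kept.peek()) < 0) {
        kept.poll();          // drop the current worst survivor
        kept.add(candidate);  // admit the better candidate
      }
    }
    return new ArrayList<>(kept);
  }
}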
Use of java.util.PriorityQueue in project pinot by linkedin.
The class AggregationGroupByTrimmingService, method trimFinalResults.
/**
* Given an array of maps from group key to final result for each aggregation function, trim the results to topN size.
*/
@SuppressWarnings("unchecked")
@Nonnull
public List<GroupByResult>[] trimFinalResults(@Nonnull Map<String, Comparable>[] finalResultMaps) {
  List<GroupByResult>[] trimmedResults = new List[_numAggregationFunctions];
  for (int i = 0; i < _numAggregationFunctions; i++) {
    LinkedList<GroupByResult> groupByResults = new LinkedList<>();
    trimmedResults[i] = groupByResults;
    Map<String, Comparable> finalResultMap = finalResultMaps[i];
    if (finalResultMap.isEmpty()) {
      continue;
    }
    // Construct the priority queue.
    PriorityQueue<GroupKeyResultPair> priorityQueue =
        new PriorityQueue<>(_groupByTopN + 1, getGroupKeyResultPairComparator(_minOrders[i]));
    // Fill results into the priority queue; the head is the weakest result kept
    // so far, so once the size exceeds topN the weakest result is evicted.
    for (Map.Entry<String, Comparable> entry : finalResultMap.entrySet()) {
      String groupKey = entry.getKey();
      Comparable finalResult = entry.getValue();
      priorityQueue.add(new GroupKeyResultPair(groupKey, finalResult));
      if (priorityQueue.size() > _groupByTopN) {
        priorityQueue.poll();
      }
    }
    // Fill trimmed results into the list; polling returns results weakest-first,
    // so addFirst() leaves the list ordered strongest-first.
    while (!priorityQueue.isEmpty()) {
      GroupKeyResultPair groupKeyResultPair = priorityQueue.poll();
      GroupByResult groupByResult = new GroupByResult();
      // Do not remove trailing empty strings.
      String[] groupKeys = groupKeyResultPair._groupKey.split(GROUP_KEY_DELIMITER, -1);
      groupByResult.setGroup(Arrays.asList(groupKeys));
      groupByResult.setValue(AggregationFunctionUtils.formatValue(groupKeyResultPair._result));
      groupByResults.addFirst(groupByResult);
    }
  }
  return trimmedResults;
}
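Stripped of Pinot's group-key handling, the trimming step is a size-bounded heap: keep at most n entries, evict the weakest whenever the bound is exceeded, then drain and reverse. A minimal sketch under that reading; TopNTrimmer and its generics are illustrative, not Pinot API.

import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.Map;
import java.util.PriorityQueue;

final class TopNTrimmer {
  // Trims a result map to its top n entries by value. The heap's head is the
  // weakest entry kept so far, so exceeding the bound evicts the weakest;
  // draining with addFirst() leaves the deque ordered strongest-first.
  static <K, V extends Comparable<V>> Deque<Map.Entry<K, V>> trim(Map<K, V> results, int n) {
    Comparator<Map.Entry<K, V>> byValue = Map.Entry.comparingByValue();
    PriorityQueue<Map.Entry<K, V>> heap = new PriorityQueue<>(n + 1, byValue);
    for (Map.Entry<K, V> entry : results.entrySet()) {
      heap.add(entry);
      if (heap.size() > n) {
        heap.poll(); // evict the current weakest entry
      }
    }
    Deque<Map.Entry<K, V>> ordered = new ArrayDeque<>();
    while (!heap.isEmpty()) {
      ordered.addFirst(heap.poll());
    }
    return ordered;
  }
}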
Use of java.util.PriorityQueue in project pinot by linkedin.
The class BalanceNumSegmentAssignmentStrategy, method getAssignedInstances.
@Override
public List<String> getAssignedInstances(HelixAdmin helixAdmin, String helixClusterName, SegmentMetadata segmentMetadata, int numReplicas, String tenantName) {
  String serverTenantName;
  String tableName;
  if ("realtime".equalsIgnoreCase(segmentMetadata.getIndexType())) {
    tableName = TableNameBuilder.REALTIME_TABLE_NAME_BUILDER.forTable(segmentMetadata.getTableName());
    serverTenantName = ControllerTenantNameBuilder.getRealtimeTenantNameForTenant(tenantName);
  } else {
    tableName = TableNameBuilder.OFFLINE_TABLE_NAME_BUILDER.forTable(segmentMetadata.getTableName());
    serverTenantName = ControllerTenantNameBuilder.getOfflineTenantNameForTenant(tenantName);
  }
  List<String> selectedInstances = new ArrayList<String>();
  Map<String, Integer> currentNumSegmentsPerInstanceMap = new HashMap<String, Integer>();
  List<String> allTaggedInstances = HelixHelper.getEnabledInstancesWithTag(helixAdmin, helixClusterName, serverTenantName);
  for (String instance : allTaggedInstances) {
    currentNumSegmentsPerInstanceMap.put(instance, 0);
  }
  // Count the number of segments already assigned to each instance
  IdealState idealState = helixAdmin.getResourceIdealState(helixClusterName, tableName);
  if (idealState != null) {
    for (String partitionName : idealState.getPartitionSet()) {
      Map<String, String> instanceToStateMap = idealState.getInstanceStateMap(partitionName);
      if (instanceToStateMap != null) {
        for (String instanceName : instanceToStateMap.keySet()) {
          if (currentNumSegmentsPerInstanceMap.containsKey(instanceName)) {
            currentNumSegmentsPerInstanceMap.put(instanceName, currentNumSegmentsPerInstanceMap.get(instanceName) + 1);
          }
          // else, ignore. Do not add untagged servers to the map; this way, new
          // segments will not be allotted to a server whose tags have changed.
        }
      }
    }
  }
  // Select up to numReplicas instances with the fewest segments assigned. The
  // descending comparator keeps the most-loaded candidate at the head of the
  // queue, so it is the one evicted when the queue grows past numReplicas.
  PriorityQueue<Number2ObjectPair<String>> priorityQueue =
      new PriorityQueue<Number2ObjectPair<String>>(numReplicas, Pairs.getDescendingnumber2ObjectPairComparator());
  for (String key : currentNumSegmentsPerInstanceMap.keySet()) {
    priorityQueue.add(new Number2ObjectPair<String>(currentNumSegmentsPerInstanceMap.get(key), key));
    if (priorityQueue.size() > numReplicas) {
      priorityQueue.poll();
    }
  }
  while (!priorityQueue.isEmpty()) {
    selectedInstances.add(priorityQueue.poll().getB());
  }
  LOGGER.info("Segment assignment result for : " + segmentMetadata.getName() + ", in resource : "
      + segmentMetadata.getTableName() + ", selected instances: " + Arrays.toString(selectedInstances.toArray()));
  return selectedInstances;
}
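The queue here mirrors the bounded-heap trimming above but selects minima: descending order on the segment count keeps the most-loaded candidate at the head, ready to be evicted. A compact sketch of the same selection over a plain map; LeastLoadedPicker is an illustrative name, not Pinot code.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class LeastLoadedPicker {
  // Picks the k instances carrying the fewest segments. Ordering the heap in
  // descending count order keeps the most-loaded candidate at the head, so it
  // is the one evicted whenever the heap grows past k.
  static List<String> pick(Map<String, Integer> segmentsPerInstance, int k) {
    Comparator<Map.Entry<String, Integer>> byCountDescending =
        Comparator.comparingInt((Map.Entry<String, Integer> e) -> e.getValue()).reversed();
    PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(k + 1, byCountDescending);
    for (Map.Entry<String, Integer> entry : segmentsPerInstance.entrySet()) {
      heap.add(entry);
      if (heap.size() > k) {
        heap.poll(); // evict the most-loaded candidate
      }
    }
    List<String> selected = new ArrayList<>(heap.size());
    while (!heap.isEmpty()) {
      selected.add(heap.poll().getKey());
    }
    return selected;
  }
}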
Use of java.util.PriorityQueue in project hadoop by apache.
The class Folder, method run.
public int run() throws IOException {
  class JobEntryComparator implements Comparator<Pair<LoggedJob, JobTraceReader>> {
    public int compare(Pair<LoggedJob, JobTraceReader> p1, Pair<LoggedJob, JobTraceReader> p2) {
      LoggedJob j1 = p1.first();
      LoggedJob j2 = p2.first();
      return (j1.getSubmitTime() < j2.getSubmitTime()) ? -1 : (j1.getSubmitTime() == j2.getSubmitTime()) ? 0 : 1;
    }
  }
  // We initialize an empty heap so that if an error occurs before the real heap
  // is established, the finally block still runs cleanly.
  Queue<Pair<LoggedJob, JobTraceReader>> heap = new PriorityQueue<Pair<LoggedJob, JobTraceReader>>();
  try {
    LoggedJob job = reader.nextJob();
    if (job == null) {
      LOG.error("The job trace is empty");
      return EMPTY_JOB_TRACE;
    }
    // Skip jobs submitted before the starting time limit.
    if (startsAfter > 0) {
      LOG.info("starts-after time is specified. Initial job submit time : " + job.getSubmitTime());
      long approximateTime = job.getSubmitTime() + startsAfter;
      job = reader.nextJob();
      long skippedCount = 0;
      while (job != null && job.getSubmitTime() < approximateTime) {
        job = reader.nextJob();
        skippedCount++;
      }
      LOG.debug("Considering jobs with submit time greater than " + startsAfter + " ms. Skipped " + skippedCount + " jobs.");
      if (job == null) {
        LOG.error("No more jobs to process in the trace with 'starts-after'" + " set to " + startsAfter + "ms.");
        return EMPTY_JOB_TRACE;
      }
      LOG.info("The first job has a submit time of " + job.getSubmitTime());
    }
    firstJobSubmitTime = job.getSubmitTime();
    long lastJobSubmitTime = firstJobSubmitTime;
    int numberJobs = 0;
    long currentIntervalEnd = Long.MIN_VALUE;
    Path nextSegment = null;
    Outputter<LoggedJob> tempGen = null;
    if (debug) {
      LOG.debug("The first job has a submit time of " + firstJobSubmitTime);
    }
    final Configuration conf = getConf();
    try {
      // skewBufferLength entries.
      while (job != null) {
        final Random tempNameGenerator = new Random();
        lastJobSubmitTime = job.getSubmitTime();
        ++numberJobs;
        if (job.getSubmitTime() >= currentIntervalEnd) {
          if (tempGen != null) {
            tempGen.close();
          }
          nextSegment = null;
          for (int i = 0; i < 3 && nextSegment == null; ++i) {
            try {
              nextSegment = new Path(tempDir, "segment-" + tempNameGenerator.nextLong() + ".json.gz");
              if (debug) {
                LOG.debug("The next segment name is " + nextSegment);
              }
              FileSystem fs = nextSegment.getFileSystem(conf);
              try {
                if (!fs.exists(nextSegment)) {
                  break;
                }
                continue;
              } catch (IOException e) {
                // no code -- file did not already exist
              }
            } catch (IOException e) {
              // no code -- file exists now, or directory bad. We try three
              // times.
            }
          }
          if (nextSegment == null) {
            throw new RuntimeException("Failed to create a new file!");
          }
          if (debug) {
            LOG.debug("Creating " + nextSegment + " for a job with a submit time of " + job.getSubmitTime());
          }
          deletees.add(nextSegment);
          tempPaths.add(nextSegment);
          tempGen = new DefaultOutputter<LoggedJob>();
          tempGen.init(nextSegment, conf);
          long currentIntervalNumber = (job.getSubmitTime() - firstJobSubmitTime) / inputCycle;
          currentIntervalEnd = firstJobSubmitTime + ((currentIntervalNumber + 1) * inputCycle);
        }
        // content is in the same input cycle interval.
        if (tempGen != null) {
          tempGen.output(job);
        }
        job = reader.nextJob();
      }
    } catch (DeskewedJobTraceReader.OutOfOrderException e) {
      return OUT_OF_ORDER_JOBS;
    } finally {
      if (tempGen != null) {
        tempGen.close();
      }
    }
    if (lastJobSubmitTime <= firstJobSubmitTime) {
      LOG.error("All of your job[s] have the same submit time." + " Please just use your input file.");
      return ALL_JOBS_SIMULTANEOUS;
    }
    double submitTimeSpan = lastJobSubmitTime - firstJobSubmitTime;
    LOG.warn("Your input trace spans " + (lastJobSubmitTime - firstJobSubmitTime) + " ticks.");
    double foldingRatio = submitTimeSpan * (numberJobs + 1) / numberJobs / inputCycle;
    if (debug) {
      LOG.warn("run: submitTimeSpan = " + submitTimeSpan + ", numberJobs = " + numberJobs + ", inputCycle = " + inputCycle);
    }
    if (reader.neededSkewBufferSize() > 0) {
      LOG.warn("You needed a -skew-buffer-length of " + reader.neededSkewBufferSize() + " but no more, for this input.");
    }
    double tProbability = timeDilation * concentration / foldingRatio;
    if (debug) {
      LOG.warn("run: timeDilation = " + timeDilation + ", concentration = " + concentration + ", foldingRatio = " + foldingRatio);
      LOG.warn("The transcription probability is " + tProbability);
    }
    transcriptionRateInteger = (int) Math.floor(tProbability);
    transcriptionRateFraction = tProbability - Math.floor(tProbability);
    // Now read all the inputs in parallel
    heap = new PriorityQueue<Pair<LoggedJob, JobTraceReader>>(tempPaths.size(), new JobEntryComparator());
    for (Path tempPath : tempPaths) {
      JobTraceReader thisReader = new JobTraceReader(tempPath, conf);
      closees.add(thisReader);
      LoggedJob streamFirstJob = thisReader.getNext();
      long thisIndex = (streamFirstJob.getSubmitTime() - firstJobSubmitTime) / inputCycle;
      if (debug) {
        LOG.debug("A job with submit time of " + streamFirstJob.getSubmitTime() + " is in interval # " + thisIndex);
      }
      adjustJobTimes(streamFirstJob);
      if (debug) {
        LOG.debug("That job's submit time is adjusted to " + streamFirstJob.getSubmitTime());
      }
      heap.add(new Pair<LoggedJob, JobTraceReader>(streamFirstJob, thisReader));
    }
    Pair<LoggedJob, JobTraceReader> next = heap.poll();
    while (next != null) {
      maybeOutput(next.first());
      if (debug) {
        LOG.debug("The most recent job has an adjusted submit time of " + next.first().getSubmitTime());
        LOG.debug(" Its replacement in the heap will come from input engine " + next.second());
      }
      LoggedJob replacement = next.second().getNext();
      if (replacement == null) {
        next.second().close();
        if (debug) {
          LOG.debug("That input engine is depleted.");
        }
      } else {
        adjustJobTimes(replacement);
        if (debug) {
          LOG.debug("The replacement has an adjusted submit time of " + replacement.getSubmitTime());
        }
        heap.add(new Pair<LoggedJob, JobTraceReader>(replacement, next.second()));
      }
      next = heap.poll();
    }
  } finally {
    IOUtils.cleanup(null, reader);
    if (outGen != null) {
      outGen.close();
    }
    for (Pair<LoggedJob, JobTraceReader> heapEntry : heap) {
      heapEntry.second().close();
    }
    for (Closeable closee : closees) {
      closee.close();
    }
    if (!debug) {
      Configuration conf = getConf();
      for (Path deletee : deletees) {
        FileSystem fs = deletee.getFileSystem(conf);
        try {
          fs.delete(deletee, false);
        } catch (IOException e) {
          // no code
        }
      }
    }
  }
  return 0;
}
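The final phase of run() is a classic k-way merge: the heap holds one head job per temporary segment reader, the earliest job is polled, and the reader that supplied it contributes the replacement. A self-contained sketch of that merge over arbitrary sorted iterators; KWayMerge and Head are hypothetical names, not Hadoop API.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class KWayMerge {
  // Merges several individually sorted streams into one sorted sequence. The
  // heap holds one head element per stream; the smallest head is emitted and
  // the stream that supplied it contributes the replacement.
  static <E> List<E> merge(List<Iterator<E>> sortedStreams, Comparator<E> order) {
    final class Head {
      final E value;
      final Iterator<E> source;
      Head(E value, Iterator<E> source) {
        this.value = value;
        this.source = source;
      }
    }
    Comparator<Head> byHeadValue = Comparator.comparing((Head h) -> h.value, order);
    PriorityQueue<Head> heap = new PriorityQueue<>(Math.max(1, sortedStreams.size()), byHeadValue);
    for (Iterator<E> stream : sortedStreams) {
      if (stream.hasNext()) {
        heap.add(new Head(stream.next(), stream));
      }
    }
    List<E> merged = new ArrayList<>();
    Head next;
    while ((next = heap.poll()) != null) {
      merged.add(next.value);
      if (next.source.hasNext()) {
        heap.add(new Head(next.source.next(), next.source)); // refill from the same stream
      }
    }
    return merged;
  }
}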
Use of java.util.PriorityQueue in project meter by OleksandrKucherenko.
The class Meter, method stats.
/**
* Print captured statistics into provided output.
*
* @param log instance of logger
*/
@SuppressWarnings({ "PMD.AvoidInstantiatingObjectsInLoops" })
public void stats(final Output log) {
  final Config config = getConfig();
  final int totalSteps = mCurrent.Position.get();
  final List<Step> steps = mCurrent.ReportSteps;
  long totalSkipped = 0;
  Step subStep;
  for (int i = 0; i < totalSteps; i++) {
    steps.add(subStep = new Step(config, mCurrent, i));
    totalSkipped += subStep.Skipped;
  }
  // dump all steps
  for (final Step step : steps) {
    log.log((step.IsSkipped) ? Level.WARNING : Level.FINEST, config.OutputTag, step.toString());
  }
  // generate a summary of the tracking: top items by time, total time, total skipped time
  if (getConfig().ShowSummary) {
    log.log(Level.FINEST, config.OutputTag, DELIMITER);
    log.log(Level.INFO, config.OutputTag, String.format(Locale.US, "final: %.3f ms%s, steps: %d",
        toMillis(mCurrent.total() - totalSkipped),
        (totalSkipped > 1000) ? String.format(" (-%.3f ms)", toMillis(totalSkipped)) : "", totalSteps));
  }
  // create a sorted list of steps for the TOP-N items print
  final PriorityQueue<Step> pq = new PriorityQueue<>(totalSteps, Step.Comparator);
  pq.addAll(steps);
  // publish the longest steps
  if (config.ShowTopNLongest > 0) {
    log.log(Level.FINEST, config.OutputTag, DELIMITER);
    final int len = Math.min(pq.size(), config.ShowTopNLongest);
    for (int i = 1; i <= len; i++) {
      final Step step = pq.poll();
      if (null != step && !step.IsSkipped) {
        log.log(Level.INFO, config.OutputTag, "top-" + i + ": " + step.toString());
      }
    }
  }
  log.log(Level.FINEST, config.OutputTag, DELIMITER);
}
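Unlike the size-bounded heaps in the Pinot examples, stats() loads every step into the queue and polls only the first ShowTopNLongest entries, trading memory for simplicity. A minimal sketch of that variant over raw durations; TopNPrinter is an illustrative name, not part of the meter library.

import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNPrinter {
  // Prints the n longest durations: every element goes into the heap and only
  // the first n are polled, trading memory for simplicity compared to a
  // size-bounded heap.
  static void printTopN(List<Long> durationsMs, int n) {
    if (durationsMs.isEmpty()) {
      return; // PriorityQueue rejects an initial capacity of zero
    }
    PriorityQueue<Long> byLongestFirst = new PriorityQueue<>(durationsMs.size(), Comparator.reverseOrder());
    byLongestFirst.addAll(durationsMs);
    int limit = Math.min(byLongestFirst.size(), n);
    for (int i = 1; i <= limit; i++) {
      System.out.println("top-" + i + ": " + byLongestFirst.poll() + " ms");
    }
  }
}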