Use of com.linkedin.pinot.core.segment.index.converter.SegmentFormatConverter in project pinot by linkedin.
Class SegmentIndexCreationDriverImpl, method convertFormatIfNeeded.
// Why we use the format converter:
// There are three options for generating segments in v3 format correctly.
// 1. Generate v3 directly: This is efficient, but the v3 index writer needs to know the buffer size upfront.
//    Inverted, star and raw indexes don't know their index size upfront. This is also the least flexible approach
//    if we add more indexes in the future.
// 2. Hold data in memory: One way to work around predeclaring sizes in (1) is to allocate a "large" buffer (2GB?),
//    hold the data in memory and write the buffer out at the end. The memory requirement in this case increases
//    linearly with the number of columns. A variation of that is to mmap data to separate files, which is what
//    we are doing here.
// 3. Generate dictionary and forward indexes in v3, and generate inverted, star and raw indexes in
//    separate files, then add those files to the v3 index file. This leads to a lot of hodgepodge code to
//    handle multiple segment formats.
// Using the converter is similar to option (2), plus it's battle-tested code. We will roll out with
// this change to keep changes limited. Once we've migrated, we can implement approach (1) with an option to
// copy for indexes whose sizes we don't know upfront.
private void convertFormatIfNeeded(File segmentDirectory) throws Exception {
    SegmentVersion versionToGenerate = config.getSegmentVersion();
    if (versionToGenerate.equals(SegmentVersion.v1)) {
        // v1 by default
        return;
    }
    SegmentFormatConverter converter = SegmentFormatConverterFactory.getConverter(SegmentVersion.v1, SegmentVersion.v3);
    converter.convert(segmentDirectory);
}
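
The comment above describes writing each index to its own file and only later placing it in a single index file, once its size is known. The following is a minimal, self-contained sketch of that general idea using plain java.nio. It is not Pinot's converter or its v3 file layout. The class name SingleFileIndexSketch, the method combine, and the offset/size map are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Pinot code: every index is first written to its own
// file, then appended to one combined file after its final size is known.
class SingleFileIndexSketch {

    /** Appends each per-index file to combinedFile; returns index name -> {offset, size}. */
    static Map<String, long[]> combine(Map<String, Path> indexFiles, Path combinedFile) throws IOException {
        Map<String, long[]> layout = new LinkedHashMap<>();
        try (FileChannel out = FileChannel.open(combinedFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Map.Entry<String, Path> entry : indexFiles.entrySet()) {
                try (FileChannel in = FileChannel.open(entry.getValue(), StandardOpenOption.READ)) {
                    long offset = out.position(); // where this index starts in the combined file
                    long size = in.size();        // known only because the index is already fully written
                    long copied = 0;
                    while (copied < size) {
                        // transferTo writes at out's current position and advances it
                        copied += in.transferTo(copied, size - copied, out);
                    }
                    layout.put(entry.getKey(), new long[] {offset, size});
                }
            }
        }
        return layout;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment-sketch");
        Map<String, Path> indexFiles = new LinkedHashMap<>();
        indexFiles.put("col1.fwd", Files.write(dir.resolve("col1.fwd"), new byte[] {1, 2, 3}));
        indexFiles.put("col1.inv", Files.write(dir.resolve("col1.inv"), new byte[] {4, 5}));
        Map<String, long[]> layout = combine(indexFiles, dir.resolve("combined.idx"));
        for (Map.Entry<String, long[]> e : layout.entrySet()) {
            System.out.println(e.getKey() + " offset=" + e.getValue()[0] + " size=" + e.getValue()[1]);
        }
    }
}
```

Because sizes are read from the finished per-index files, nothing has to be predeclared. The cost is one extra copy of every index, which is the trade-off the comment accepts in exchange for reusing existing code.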