Search in sources:

Example 1 with SegmentizerFactory

Use of org.apache.druid.segment.loading.SegmentizerFactory in project druid by druid-io.

The class CustomSegmentizerFactoryTest, method testCustomSegmentizerPersist.

@Test
public void testCustomSegmentizerPersist() throws IOException {
    IncrementalIndex data = TestIndex.makeRealtimeIndex("druid.sample.numeric.tsv");
    File segment = new File(temporaryFolder.newFolder(), "segment");
    File persisted = INDEX_MERGER.persist(data, Intervals.of("2011-01-12T00:00:00.000Z/2011-05-01T00:00:00.000Z"), segment, new IndexSpec(null, null, null, null, new CustomSegmentizerFactory()), null);
    File factoryJson = new File(persisted, "factory.json");
    Assert.assertTrue(factoryJson.exists());
    SegmentizerFactory factory = JSON_MAPPER.readValue(factoryJson, SegmentizerFactory.class);
    Assert.assertTrue(factory instanceof CustomSegmentizerFactory);
}
Also used: IncrementalIndex(org.apache.druid.segment.incremental.IncrementalIndex) MMappedQueryableSegmentizerFactory(org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory) SegmentizerFactory(org.apache.druid.segment.loading.SegmentizerFactory) File(java.io.File) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)
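
The CustomSegmentizerFactory asserted on above is defined inside the test class itself and is not shown here. A minimal sketch of what such a factory might look like, assuming a Jackson type-name registration and delegation to the default memory-mapped loader (both the @JsonTypeName value and the delegation are illustrative assumptions, not the test's actual implementation):

import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonTypeName;
import java.io.File;
import org.apache.druid.segment.IndexIO;
import org.apache.druid.segment.Segment;
import org.apache.druid.segment.SegmentLazyLoadFailCallback;
import org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory;
import org.apache.druid.segment.loading.SegmentLoadingException;
import org.apache.druid.segment.loading.SegmentizerFactory;
import org.apache.druid.timeline.DataSegment;

// Hypothetical stand-in for the test's inner CustomSegmentizerFactory.
@JsonTypeName("customSegmentFactory")
public class CustomSegmentizerFactory implements SegmentizerFactory {

    private final MMappedQueryableSegmentizerFactory delegate;

    @JsonCreator
    public CustomSegmentizerFactory(@JacksonInject IndexIO indexIO) {
        // Delegate actual segment loading to the default memory-mapped
        // implementation; a real custom factory would do something different.
        this.delegate = new MMappedQueryableSegmentizerFactory(indexIO);
    }

    @Override
    public Segment factorize(
            DataSegment dataSegment,
            File parentDir,
            boolean lazy,
            SegmentLazyLoadFailCallback loadFailed
    ) throws SegmentLoadingException {
        return delegate.factorize(dataSegment, parentDir, lazy, loadFailed);
    }
}

The round-trip works because the merger writes whatever SegmentizerFactory the IndexSpec carries into factory.json (see Example 5), and reading it back with a suitably configured ObjectMapper restores the concrete type.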

Example 2 with SegmentizerFactory

Use of org.apache.druid.segment.loading.SegmentizerFactory in project druid by druid-io.

The class CustomSegmentizerFactoryTest, method testDefaultSegmentizerPersist.

@Test
public void testDefaultSegmentizerPersist() throws IOException {
    IncrementalIndex data = TestIndex.makeRealtimeIndex("druid.sample.numeric.tsv");
    File segment = new File(temporaryFolder.newFolder(), "segment");
    File persisted = INDEX_MERGER.persist(data, Intervals.of("2011-01-12T00:00:00.000Z/2011-05-01T00:00:00.000Z"), segment, new IndexSpec(null, null, null, null, null), null);
    File factoryJson = new File(persisted, "factory.json");
    Assert.assertTrue(factoryJson.exists());
    SegmentizerFactory factory = JSON_MAPPER.readValue(factoryJson, SegmentizerFactory.class);
    Assert.assertTrue(factory instanceof MMappedQueryableSegmentizerFactory);
}
Also used: IncrementalIndex(org.apache.druid.segment.incremental.IncrementalIndex) MMappedQueryableSegmentizerFactory(org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory) SegmentizerFactory(org.apache.druid.segment.loading.SegmentizerFactory) File(java.io.File) InitializedNullHandlingTest(org.apache.druid.testing.InitializedNullHandlingTest) Test(org.junit.Test)
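
Relative to Example 1, the only change is the fifth IndexSpec constructor argument, the segment loader: left null, IndexMergerV9 writes the default MMappedQueryableSegmentizerFactory into factory.json (the else-branch in Example 5's makeIndexFiles). A small sketch of the two spellings; treating the first four arguments as defaulted-when-null is an assumption about this constructor overload:

import org.apache.druid.segment.IndexSpec;
import org.apache.druid.segment.loading.SegmentizerFactory;

public class IndexSpecSketch {
    // Passing null as the segment loader selects the memory-mapped default;
    // passing a factory persists it verbatim to factory.json.
    static IndexSpec withLoader(SegmentizerFactory loaderOrNull) {
        return new IndexSpec(null, null, null, null, loaderOrNull);
    }
}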

Example 3 with SegmentizerFactory

Use of org.apache.druid.segment.loading.SegmentizerFactory in project druid by druid-io.

The class BroadcastSegmentIndexedTableTest, method setup.

@Before
public void setup() throws IOException, SegmentLoadingException {
    final ObjectMapper mapper = new DefaultObjectMapper();
    mapper.registerModule(new SegmentizerModule());
    final IndexIO indexIO = new IndexIO(mapper, () -> 0);
    mapper.setInjectableValues(new InjectableValues.Std().addValue(ExprMacroTable.class.getName(), TestExprMacroTable.INSTANCE).addValue(ObjectMapper.class.getName(), mapper).addValue(IndexIO.class, indexIO).addValue(DataSegment.PruneSpecsHolder.class, DataSegment.PruneSpecsHolder.DEFAULT));
    final IndexMerger indexMerger = new IndexMergerV9(mapper, indexIO, OffHeapMemorySegmentWriteOutMediumFactory.instance());
    Interval testInterval = Intervals.of("2011-01-12T00:00:00.000Z/2011-05-01T00:00:00.000Z");
    IncrementalIndex data = TestIndex.makeRealtimeIndex("druid.sample.numeric.tsv");
    File segment = new File(temporaryFolder.newFolder(), "segment");
    File persisted = indexMerger.persist(data, testInterval, segment, new IndexSpec(), null);
    File factoryJson = new File(persisted, "factory.json");
    Assert.assertTrue(factoryJson.exists());
    SegmentizerFactory factory = mapper.readValue(factoryJson, SegmentizerFactory.class);
    Assert.assertTrue(factory instanceof MMappedQueryableSegmentizerFactory);
    DataSegment dataSegment = new DataSegment(DATASOURCE, testInterval, DateTimes.nowUtc().toString(), ImmutableMap.of(), columnNames, ImmutableList.of(), null, null, segment.getTotalSpace());
    backingSegment = (QueryableIndexSegment) factory.factorize(dataSegment, segment, false, SegmentLazyLoadFailCallback.NOOP);
    columnNames = ImmutableList.<String>builder().add(ColumnHolder.TIME_COLUMN_NAME).addAll(backingSegment.asQueryableIndex().getColumnNames()).build();
    broadcastTable = new BroadcastSegmentIndexedTable(backingSegment, keyColumns, dataSegment.getVersion());
}
Also used: IndexMerger(org.apache.druid.segment.IndexMerger) IndexSpec(org.apache.druid.segment.IndexSpec) IncrementalIndex(org.apache.druid.segment.incremental.IncrementalIndex) IndexMergerV9(org.apache.druid.segment.IndexMergerV9) MMappedQueryableSegmentizerFactory(org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory) SegmentizerFactory(org.apache.druid.segment.loading.SegmentizerFactory) InjectableValues(com.fasterxml.jackson.databind.InjectableValues) DataSegment(org.apache.druid.timeline.DataSegment) TestExprMacroTable(org.apache.druid.query.expression.TestExprMacroTable) ExprMacroTable(org.apache.druid.math.expr.ExprMacroTable) IndexIO(org.apache.druid.segment.IndexIO) SegmentizerModule(org.apache.druid.jackson.SegmentizerModule) DefaultObjectMapper(org.apache.druid.jackson.DefaultObjectMapper) File(java.io.File) ObjectMapper(com.fasterxml.jackson.databind.ObjectMapper) Interval(org.joda.time.Interval) Before(org.junit.Before)
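
Two details of this setup are easy to miss: registering SegmentizerModule gives Jackson the subtype bindings it needs to resolve the polymorphic type in factory.json, and the InjectableValues entry for IndexIO is what allows the deserialized MMappedQueryableSegmentizerFactory to be constructed (it wraps an IndexIO, as Example 5's default branch shows). A condensed sketch of just that deserialization path, under those assumptions:

import com.fasterxml.jackson.databind.InjectableValues;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import org.apache.druid.jackson.DefaultObjectMapper;
import org.apache.druid.jackson.SegmentizerModule;
import org.apache.druid.segment.IndexIO;
import org.apache.druid.segment.loading.SegmentizerFactory;

public class FactoryJsonReader {
    // The minimum mapper configuration, condensed from the setup above,
    // needed to turn a factory.json file back into a SegmentizerFactory.
    static SegmentizerFactory read(File factoryJson) throws IOException {
        ObjectMapper mapper = new DefaultObjectMapper();
        mapper.registerModule(new SegmentizerModule());
        IndexIO indexIO = new IndexIO(mapper, () -> 0);
        mapper.setInjectableValues(
            new InjectableValues.Std().addValue(IndexIO.class, indexIO)
        );
        return mapper.readValue(factoryJson, SegmentizerFactory.class);
    }
}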

Example 4 with SegmentizerFactory

Use of org.apache.druid.segment.loading.SegmentizerFactory in project druid by druid-io.

The class SegmentManagerBroadcastJoinIndexedTableTest, method createSegment.

private DataSegment createSegment(IncrementalIndex data, String interval, String version) throws IOException {
    final DataSegment tmpSegment = new DataSegment(TABLE_NAME, Intervals.of(interval), version, Collections.emptyMap(), Collections.emptyList(), Collections.emptyList(), new NumberedShardSpec(0, 0), 9, 100);
    final String storageDir = DataSegmentPusher.getDefaultStorageDir(tmpSegment, false);
    final File segmentDir = new File(segmentDeepStorageDir, storageDir);
    FileUtils.mkdirp(segmentDir);
    IndexMerger indexMerger = new IndexMergerV9(objectMapper, indexIO, OffHeapMemorySegmentWriteOutMediumFactory.instance());
    SegmentizerFactory factory = new BroadcastJoinableMMappedQueryableSegmentizerFactory(indexIO, KEY_COLUMNS);
    indexMerger.persist(data, Intervals.of(interval), segmentDir, new IndexSpec(null, null, null, null, factory), null);
    final File factoryJson = new File(segmentDir, "factory.json");
    objectMapper.writeValue(factoryJson, factory);
    return tmpSegment.withLoadSpec(ImmutableMap.of("type", "local", "path", segmentDir.getAbsolutePath()));
}
Also used: IndexMerger(org.apache.druid.segment.IndexMerger) IndexSpec(org.apache.druid.segment.IndexSpec) IndexMergerV9(org.apache.druid.segment.IndexMergerV9) BroadcastJoinableMMappedQueryableSegmentizerFactory(org.apache.druid.segment.loading.BroadcastJoinableMMappedQueryableSegmentizerFactory) SegmentizerFactory(org.apache.druid.segment.loading.SegmentizerFactory) DataSegment(org.apache.druid.timeline.DataSegment) File(java.io.File) NumberedShardSpec(org.apache.druid.timeline.partition.NumberedShardSpec)
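
createSegment writes factory.json by hand here (objectMapper.writeValue) rather than relying on the merger, and returns a loadSpec pointing at the local segment directory. For completeness, a hedged sketch of the read side, reusing the local names from the helper above; the factorize call mirrors its use in Example 3:

// Deserialize the hand-written factory.json and let it materialize the
// segment; factorize throws SegmentLoadingException on failure.
SegmentizerFactory loaded = objectMapper.readValue(
        new File(segmentDir, "factory.json"),
        SegmentizerFactory.class
);
Segment segment = loaded.factorize(tmpSegment, segmentDir, false, SegmentLazyLoadFailCallback.NOOP);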

Example 5 with SegmentizerFactory

Use of org.apache.druid.segment.loading.SegmentizerFactory in project druid by druid-io.

The class IndexMergerV9, method makeIndexFiles.

private File makeIndexFiles(final List<IndexableAdapter> adapters, @Nullable final AggregatorFactory[] metricAggs, final File outDir, final ProgressIndicator progress, final List<String> mergedDimensions, final List<String> mergedMetrics, final Function<List<TransformableRowIterator>, TimeAndDimsIterator> rowMergerFn, final boolean fillRowNumConversions, final IndexSpec indexSpec, @Nullable final SegmentWriteOutMediumFactory segmentWriteOutMediumFactory) throws IOException {
    progress.start();
    progress.progress();
    List<Metadata> metadataList = Lists.transform(adapters, IndexableAdapter::getMetadata);
    final Metadata segmentMetadata;
    if (metricAggs != null) {
        AggregatorFactory[] combiningMetricAggs = new AggregatorFactory[metricAggs.length];
        for (int i = 0; i < metricAggs.length; i++) {
            combiningMetricAggs[i] = metricAggs[i].getCombiningFactory();
        }
        segmentMetadata = Metadata.merge(metadataList, combiningMetricAggs);
    } else {
        segmentMetadata = Metadata.merge(metadataList, null);
    }
    Closer closer = Closer.create();
    try {
        final FileSmoosher v9Smoosher = new FileSmoosher(outDir);
        FileUtils.mkdirp(outDir);
        SegmentWriteOutMediumFactory omf = segmentWriteOutMediumFactory != null ? segmentWriteOutMediumFactory : defaultSegmentWriteOutMediumFactory;
        log.debug("Using SegmentWriteOutMediumFactory[%s]", omf.getClass().getSimpleName());
        SegmentWriteOutMedium segmentWriteOutMedium = omf.makeSegmentWriteOutMedium(outDir);
        closer.register(segmentWriteOutMedium);
        long startTime = System.currentTimeMillis();
        Files.asByteSink(new File(outDir, "version.bin")).write(Ints.toByteArray(IndexIO.V9_VERSION));
        log.debug("Completed version.bin in %,d millis.", System.currentTimeMillis() - startTime);
        progress.progress();
        startTime = System.currentTimeMillis();
        try (FileOutputStream fos = new FileOutputStream(new File(outDir, "factory.json"))) {
            SegmentizerFactory customSegmentLoader = indexSpec.getSegmentLoader();
            if (customSegmentLoader != null) {
                mapper.writeValue(fos, customSegmentLoader);
            } else {
                mapper.writeValue(fos, new MMappedQueryableSegmentizerFactory(indexIO));
            }
        }
        log.debug("Completed factory.json in %,d millis", System.currentTimeMillis() - startTime);
        progress.progress();
        final Map<String, ValueType> metricsValueTypes = new TreeMap<>(Comparators.naturalNullsFirst());
        final Map<String, String> metricTypeNames = new TreeMap<>(Comparators.naturalNullsFirst());
        final List<ColumnCapabilities> dimCapabilities = Lists.newArrayListWithCapacity(mergedDimensions.size());
        mergeCapabilities(adapters, mergedDimensions, metricsValueTypes, metricTypeNames, dimCapabilities);
        final Map<String, DimensionHandler> handlers = makeDimensionHandlers(mergedDimensions, dimCapabilities);
        final List<DimensionMergerV9> mergers = new ArrayList<>();
        for (int i = 0; i < mergedDimensions.size(); i++) {
            DimensionHandler handler = handlers.get(mergedDimensions.get(i));
            mergers.add(handler.makeMerger(indexSpec, segmentWriteOutMedium, dimCapabilities.get(i), progress, closer));
        }
        /**
         *********** Setup Dim Conversions *************
         */
        progress.progress();
        startTime = System.currentTimeMillis();
        writeDimValuesAndSetupDimConversion(adapters, progress, mergedDimensions, mergers);
        log.debug("Completed dim conversions in %,d millis.", System.currentTimeMillis() - startTime);
        /**
         *********** Walk through data sets, merge them, and write merged columns ************
         */
        progress.progress();
        final TimeAndDimsIterator timeAndDimsIterator = makeMergedTimeAndDimsIterator(adapters, mergedDimensions, mergedMetrics, rowMergerFn, handlers, mergers);
        closer.register(timeAndDimsIterator);
        final GenericColumnSerializer timeWriter = setupTimeWriter(segmentWriteOutMedium, indexSpec);
        final ArrayList<GenericColumnSerializer> metricWriters = setupMetricsWriters(segmentWriteOutMedium, mergedMetrics, metricsValueTypes, metricTypeNames, indexSpec);
        List<IntBuffer> rowNumConversions = mergeIndexesAndWriteColumns(adapters, progress, timeAndDimsIterator, timeWriter, metricWriters, mergers, fillRowNumConversions);
        /**
         ********** Create Inverted Indexes and Finalize Build Columns ************
         */
        final String section = "build inverted index and columns";
        progress.startSection(section);
        makeTimeColumn(v9Smoosher, progress, timeWriter, indexSpec);
        makeMetricsColumns(v9Smoosher, progress, mergedMetrics, metricsValueTypes, metricTypeNames, metricWriters, indexSpec);
        for (int i = 0; i < mergedDimensions.size(); i++) {
            DimensionMergerV9 merger = mergers.get(i);
            merger.writeIndexes(rowNumConversions);
            if (merger.canSkip()) {
                continue;
            }
            ColumnDescriptor columnDesc = merger.makeColumnDescriptor();
            makeColumn(v9Smoosher, mergedDimensions.get(i), columnDesc);
        }
        progress.stopSection(section);
        /**
         *********** Make index.drd & metadata.drd files *************
         */
        progress.progress();
        makeIndexBinary(v9Smoosher, adapters, outDir, mergedDimensions, mergedMetrics, progress, indexSpec, mergers);
        makeMetadataBinary(v9Smoosher, progress, segmentMetadata);
        v9Smoosher.close();
        progress.stop();
        return outDir;
    } catch (Throwable t) {
        throw closer.rethrow(t);
    } finally {
        closer.close();
    }
}
Also used: ArrayList(java.util.ArrayList) SegmentWriteOutMediumFactory(org.apache.druid.segment.writeout.SegmentWriteOutMediumFactory) ColumnCapabilities(org.apache.druid.segment.column.ColumnCapabilities) SegmentWriteOutMedium(org.apache.druid.segment.writeout.SegmentWriteOutMedium) FileSmoosher(org.apache.druid.java.util.common.io.smoosh.FileSmoosher) Closer(org.apache.druid.java.util.common.io.Closer) ValueType(org.apache.druid.segment.column.ValueType) MMappedQueryableSegmentizerFactory(org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory) ColumnDescriptor(org.apache.druid.segment.column.ColumnDescriptor) SegmentizerFactory(org.apache.druid.segment.loading.SegmentizerFactory) AggregatorFactory(org.apache.druid.query.aggregation.AggregatorFactory) TreeMap(java.util.TreeMap) FileOutputStream(java.io.FileOutputStream) IntBuffer(java.nio.IntBuffer) File(java.io.File)
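
The factory.json written in this method is the hook all four examples above exercise: a custom loader carried by the IndexSpec is serialized verbatim, otherwise the default MMappedQueryableSegmentizerFactory wrapping this merger's IndexIO is written. A sketch of what the default branch serializes; the concrete type name in the output is whatever SegmentizerModule registers for the class, so the literal below is an illustrative assumption:

// Serializing the default factory with a SegmentizerModule-aware mapper
// yields a small polymorphic object, e.g. {"type":"mMapSegmentFactory"}.
ObjectMapper mapper = new DefaultObjectMapper();
mapper.registerModule(new SegmentizerModule());
String json = mapper.writeValueAsString(new MMappedQueryableSegmentizerFactory(indexIO));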

Aggregations

File (java.io.File)5 SegmentizerFactory (org.apache.druid.segment.loading.SegmentizerFactory)5 MMappedQueryableSegmentizerFactory (org.apache.druid.segment.loading.MMappedQueryableSegmentizerFactory)4 IncrementalIndex (org.apache.druid.segment.incremental.IncrementalIndex)3 IndexMerger (org.apache.druid.segment.IndexMerger)2 IndexMergerV9 (org.apache.druid.segment.IndexMergerV9)2 IndexSpec (org.apache.druid.segment.IndexSpec)2 InitializedNullHandlingTest (org.apache.druid.testing.InitializedNullHandlingTest)2 DataSegment (org.apache.druid.timeline.DataSegment)2 Test (org.junit.Test)2 InjectableValues (com.fasterxml.jackson.databind.InjectableValues)1 ObjectMapper (com.fasterxml.jackson.databind.ObjectMapper)1 FileOutputStream (java.io.FileOutputStream)1 IntBuffer (java.nio.IntBuffer)1 ArrayList (java.util.ArrayList)1 TreeMap (java.util.TreeMap)1 DefaultObjectMapper (org.apache.druid.jackson.DefaultObjectMapper)1 SegmentizerModule (org.apache.druid.jackson.SegmentizerModule)1 Closer (org.apache.druid.java.util.common.io.Closer)1 FileSmoosher (org.apache.druid.java.util.common.io.smoosh.FileSmoosher)1