Example 11 with DefaultSizeEstimator

use of org.apache.hudi.common.util.DefaultSizeEstimator in project hudi by apache.

the class SpillableMapBasedFileSystemView method createFileIdToPendingCompactionMap.

@Override
protected Map<HoodieFileGroupId, Pair<String, CompactionOperation>> createFileIdToPendingCompactionMap(Map<HoodieFileGroupId, Pair<String, CompactionOperation>> fgIdToPendingCompaction) {
    try {
        LOG.info("Creating Pending Compaction map using external spillable Map. Max Mem=" + maxMemoryForPendingCompaction + ", BaseDir=" + baseStoreDir);
        new File(baseStoreDir).mkdirs();
        Map<HoodieFileGroupId, Pair<String, CompactionOperation>> pendingMap = new ExternalSpillableMap<>(maxMemoryForPendingCompaction, baseStoreDir, new DefaultSizeEstimator<>(), new DefaultSizeEstimator<>(), diskMapType, isBitCaskDiskMapCompressionEnabled);
        pendingMap.putAll(fgIdToPendingCompaction);
        return pendingMap;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Also used : HoodieFileGroupId(org.apache.hudi.common.model.HoodieFileGroupId) ExternalSpillableMap(org.apache.hudi.common.util.collection.ExternalSpillableMap) IOException(java.io.IOException) DefaultSizeEstimator(org.apache.hudi.common.util.DefaultSizeEstimator) File(java.io.File) Pair(org.apache.hudi.common.util.collection.Pair)

Example 12 with DefaultSizeEstimator

use of org.apache.hudi.common.util.DefaultSizeEstimator in project hudi by apache.

the class SpillableMapBasedFileSystemView method createFileIdToPendingClusteringMap.

@Override
protected Map<HoodieFileGroupId, HoodieInstant> createFileIdToPendingClusteringMap(final Map<HoodieFileGroupId, HoodieInstant> fileGroupsInClustering) {
    try {
        LOG.info("Creating file group id to clustering instant map using external spillable Map. Max Mem=" + maxMemoryForClusteringFileGroups + ", BaseDir=" + baseStoreDir);
        new File(baseStoreDir).mkdirs();
        Map<HoodieFileGroupId, HoodieInstant> pendingMap = new ExternalSpillableMap<>(maxMemoryForClusteringFileGroups, baseStoreDir, new DefaultSizeEstimator<>(), new DefaultSizeEstimator<>(), diskMapType, isBitCaskDiskMapCompressionEnabled);
        pendingMap.putAll(fileGroupsInClustering);
        return pendingMap;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Also used : HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) HoodieFileGroupId(org.apache.hudi.common.model.HoodieFileGroupId) ExternalSpillableMap(org.apache.hudi.common.util.collection.ExternalSpillableMap) IOException(java.io.IOException) DefaultSizeEstimator(org.apache.hudi.common.util.DefaultSizeEstimator) File(java.io.File)

Example 13 with DefaultSizeEstimator

use of org.apache.hudi.common.util.DefaultSizeEstimator in project hudi by apache.

the class SpillableMapBasedFileSystemView method createFileIdToReplaceInstantMap.

@Override
protected Map<HoodieFileGroupId, HoodieInstant> createFileIdToReplaceInstantMap(final Map<HoodieFileGroupId, HoodieInstant> replacedFileGroups) {
    try {
        LOG.info("Creating file group id to replace instant map using external spillable Map. Max Mem=" + maxMemoryForReplaceFileGroups + ", BaseDir=" + baseStoreDir);
        new File(baseStoreDir).mkdirs();
        Map<HoodieFileGroupId, HoodieInstant> pendingMap = new ExternalSpillableMap<>(maxMemoryForReplaceFileGroups, baseStoreDir, new DefaultSizeEstimator<>(), new DefaultSizeEstimator<>(), diskMapType, isBitCaskDiskMapCompressionEnabled);
        pendingMap.putAll(replacedFileGroups);
        return pendingMap;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Also used : HoodieInstant(org.apache.hudi.common.table.timeline.HoodieInstant) HoodieFileGroupId(org.apache.hudi.common.model.HoodieFileGroupId) ExternalSpillableMap(org.apache.hudi.common.util.collection.ExternalSpillableMap) IOException(java.io.IOException) DefaultSizeEstimator(org.apache.hudi.common.util.DefaultSizeEstimator) File(java.io.File)
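
The three SpillableMapBasedFileSystemView methods above all follow the same pattern: a map is given a memory budget and two size estimators (one for keys, one for values), and entries that do not fit the budget go to disk. The following is only a minimal sketch of that idea, not Hudi's ExternalSpillableMap. The class name SpillSketch, its methods, and the use of a second HashMap as a stand-in for the on-disk store are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical illustration of a size-estimator-driven memory budget (not Hudi code). */
public class SpillSketch<K, V> {
    private final long maxInMemoryBytes;
    private final ToLongFunction<K> keySizer;    // plays the role of the key size estimator
    private final ToLongFunction<V> valueSizer;  // plays the role of the value size estimator
    private final Map<K, V> inMemory = new HashMap<>();
    private final Map<K, V> spilled = new HashMap<>(); // stand-in for a disk-backed map
    private long usedBytes = 0;

    public SpillSketch(long maxInMemoryBytes, ToLongFunction<K> keySizer, ToLongFunction<V> valueSizer) {
        this.maxInMemoryBytes = maxInMemoryBytes;
        this.keySizer = keySizer;
        this.valueSizer = valueSizer;
    }

    public void put(K key, V value) {
        // Drop any previous mapping first so its size is not counted twice.
        remove(key);
        long size = keySizer.applyAsLong(key) + valueSizer.applyAsLong(value);
        if (usedBytes + size <= maxInMemoryBytes) {
            inMemory.put(key, value);
            usedBytes += size;
        } else {
            spilled.put(key, value); // over budget: the entry goes to the "disk" map
        }
    }

    public V get(K key) {
        V value = inMemory.get(key);
        return value != null ? value : spilled.get(key);
    }

    private void remove(K key) {
        V old = inMemory.remove(key);
        if (old != null) {
            usedBytes -= keySizer.applyAsLong(key) + valueSizer.applyAsLong(old);
        } else {
            spilled.remove(key);
        }
    }

    public int inMemoryCount() {
        return inMemory.size();
    }

    public int spilledCount() {
        return spilled.size();
    }

    public static void main(String[] args) {
        // Budget of 20 "bytes"; each entry is estimated as key length + value length = 10.
        SpillSketch<String, String> map =
            new SpillSketch<String, String>(20, String::length, String::length);
        map.put("a", "123456789");
        map.put("b", "123456789");
        map.put("c", "123456789"); // exceeds the budget, so it is spilled
        System.out.println(map.inMemoryCount() + " in memory, " + map.spilledCount() + " spilled");
        // prints "2 in memory, 1 spilled"
    }
}
```

Here string length is used as a crude size estimate to keep the sketch deterministic; a real estimator would measure the object's actual memory footprint.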

Example 14 with DefaultSizeEstimator

use of org.apache.hudi.common.util.DefaultSizeEstimator in project hudi by apache.

the class TestBoundedInMemoryQueue method testException.

// Test to ensure that an exception in either the queueing thread or the BufferedIterator-reader
// thread is propagated to the other thread.
@SuppressWarnings("unchecked")
@Test
@Timeout(value = 60)
public void testException() throws Exception {
    final int numRecords = 256;
    final List<HoodieRecord> hoodieRecords = dataGen.generateInserts(instantTime, numRecords);
    final SizeEstimator<Tuple2<HoodieRecord, Option<IndexedRecord>>> sizeEstimator = new DefaultSizeEstimator<>();
    // queue memory limit
    HoodieLazyInsertIterable.HoodieInsertValueGenResult payload = getTransformFunction(HoodieTestDataGenerator.AVRO_SCHEMA).apply((HoodieAvroRecord) hoodieRecords.get(0));
    final long objSize = sizeEstimator.sizeEstimate(new Tuple2<>(payload.record, payload.insertValue));
    final long memoryLimitInBytes = 4 * objSize;
    // First, throw an exception from the queueIterator reader and check that the queueing thread
    // stops and throws the correct exception back.
    BoundedInMemoryQueue<HoodieRecord, Tuple2<HoodieRecord, Option<IndexedRecord>>> queue1 = new BoundedInMemoryQueue(memoryLimitInBytes, getTransformFunction(HoodieTestDataGenerator.AVRO_SCHEMA));
    // Produce
    Future<Boolean> resFuture = executorService.submit(() -> {
        new IteratorBasedQueueProducer<>(hoodieRecords.iterator()).produce(queue1);
        return true;
    });
    // waiting for permits to expire.
    while (!isQueueFull(queue1.rateLimiter)) {
        Thread.sleep(10);
    }
    // notify queueing thread of an exception and ensure that it exits.
    final Exception e = new Exception("Failing it :)");
    queue1.markAsFailed(e);
    final Throwable thrown1 = assertThrows(ExecutionException.class, resFuture::get, "exception is expected");
    assertEquals(HoodieException.class, thrown1.getCause().getClass());
    assertEquals(e, thrown1.getCause().getCause());
    // Second, raise an exception while queueing records; this exception should be propagated to
    // the queue iterator reader.
    final RuntimeException expectedException = new RuntimeException("failing record reading");
    final Iterator<HoodieRecord> mockHoodieRecordsIterator = mock(Iterator.class);
    when(mockHoodieRecordsIterator.hasNext()).thenReturn(true);
    when(mockHoodieRecordsIterator.next()).thenThrow(expectedException);
    BoundedInMemoryQueue<HoodieRecord, Tuple2<HoodieRecord, Option<IndexedRecord>>> queue2 = new BoundedInMemoryQueue(memoryLimitInBytes, getTransformFunction(HoodieTestDataGenerator.AVRO_SCHEMA));
    // Produce
    Future<Boolean> res = executorService.submit(() -> {
        try {
            new IteratorBasedQueueProducer<>(mockHoodieRecordsIterator).produce(queue2);
        } catch (Exception ex) {
            queue2.markAsFailed(ex);
            throw ex;
        }
        return true;
    });
    final Throwable thrown2 = assertThrows(Exception.class, () -> {
        queue2.iterator().hasNext();
    }, "exception is expected");
    assertEquals(expectedException, thrown2.getCause());
    // queueing thread should also have exited. make sure that it is not running.
    final Throwable thrown3 = assertThrows(ExecutionException.class, res::get, "exception is expected");
    assertEquals(expectedException, thrown3.getCause());
}
Also used : IndexedRecord(org.apache.avro.generic.IndexedRecord) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) HoodieException(org.apache.hudi.exception.HoodieException) ExecutionException(java.util.concurrent.ExecutionException) Tuple2(scala.Tuple2) BoundedInMemoryQueue(org.apache.hudi.common.util.queue.BoundedInMemoryQueue) DefaultSizeEstimator(org.apache.hudi.common.util.DefaultSizeEstimator) Test(org.junit.jupiter.api.Test) Timeout(org.junit.jupiter.api.Timeout)
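
The assertions on thrown1 and thrown3 in the test above rely on standard java.util.concurrent behavior: when a task submitted to an ExecutorService throws, Future.get() rethrows it wrapped in an ExecutionException whose cause is the original exception. The sketch below shows just that mechanism in isolation. The class and method names are hypothetical, and no Hudi classes are involved.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of exception propagation through a Future (not Hudi code). */
public class FuturePropagationSketch {

    /** Runs a failing task on another thread and returns the cause the caller observes. */
    public static Throwable causeSeenByCaller(RuntimeException failure) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The task fails on the worker thread, like a producer that hits a bad record.
            Callable<Boolean> task = () -> {
                throw failure;
            };
            Future<Boolean> result = pool.submit(task);
            try {
                result.get();
                return null; // not reached here, because the task always throws
            } catch (ExecutionException e) {
                return e.getCause(); // the original exception from the worker thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        RuntimeException expected = new RuntimeException("failing record reading");
        Throwable seen = causeSeenByCaller(expected);
        System.out.println(seen == expected); // prints "true"
    }
}
```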

Example 15 with DefaultSizeEstimator

use of org.apache.hudi.common.util.DefaultSizeEstimator in project hudi by apache.

the class BufferedConnectWriter method init.

private void init() {
    try {
        // Load and batch all incoming records in a map
        long memoryForMerge = IOUtils.getMaxMemoryPerPartitionMerge(context.getTaskContextSupplier(), config);
        LOG.info("MaxMemoryPerPartitionMerge => " + memoryForMerge);
        this.bufferedRecords = new ExternalSpillableMap<>(memoryForMerge, config.getSpillableMapBasePath(), new DefaultSizeEstimator(), new HoodieRecordSizeEstimator(new Schema.Parser().parse(config.getSchema())), config.getCommonConfig().getSpillableDiskMapType(), config.getCommonConfig().isBitCaskDiskMapCompressionEnabled());
    } catch (IOException io) {
        throw new HoodieIOException("Cannot instantiate an ExternalSpillableMap", io);
    }
}
Also used : HoodieRecordSizeEstimator(org.apache.hudi.common.util.HoodieRecordSizeEstimator) HoodieIOException(org.apache.hudi.exception.HoodieIOException) IOException(java.io.IOException) DefaultSizeEstimator(org.apache.hudi.common.util.DefaultSizeEstimator)

Aggregations

DefaultSizeEstimator (org.apache.hudi.common.util.DefaultSizeEstimator) 15
IOException (java.io.IOException) 10
HoodieRecordSizeEstimator (org.apache.hudi.common.util.HoodieRecordSizeEstimator) 9
HoodieRecord (org.apache.hudi.common.model.HoodieRecord) 8
Schema (org.apache.avro.Schema) 7
IndexedRecord (org.apache.avro.generic.IndexedRecord) 7
HoodieRecordPayload (org.apache.hudi.common.model.HoodieRecordPayload) 7
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest) 7
MethodSource (org.junit.jupiter.params.provider.MethodSource) 6
File (java.io.File) 5
ExternalSpillableMap (org.apache.hudi.common.util.collection.ExternalSpillableMap) 5
HoodieFileGroupId (org.apache.hudi.common.model.HoodieFileGroupId) 4
UncheckedIOException (java.io.UncheckedIOException) 3
ArrayList (java.util.ArrayList) 3
HoodieAvroRecord (org.apache.hudi.common.model.HoodieAvroRecord) 3
Test (org.junit.jupiter.api.Test) 3
GenericRecord (org.apache.avro.generic.GenericRecord) 2
HoodieAvroPayload (org.apache.hudi.common.model.HoodieAvroPayload) 2
HoodieKey (org.apache.hudi.common.model.HoodieKey) 2
HoodieInstant (org.apache.hudi.common.table.timeline.HoodieInstant) 2