Search in sources :

Example 1 with HoodieIndexException

use of org.apache.hudi.exception.HoodieIndexException in project hudi by apache.

the class SimpleBloomFilter, method serializeToString.

/**
 * Serialize the bloom filter as a string.
 */
@Override
public String serializeToString() {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(baos);
    try {
        filter.write(dos);
        byte[] bytes = baos.toByteArray();
        dos.close();
        return Base64CodecUtil.encode(bytes);
    } catch (IOException e) {
        throw new HoodieIndexException("Could not serialize BloomFilter instance", e);
    }
}
Also used : DataOutputStream(java.io.DataOutputStream) ByteArrayOutputStream(java.io.ByteArrayOutputStream) IOException(java.io.IOException) HoodieIndexException(org.apache.hudi.exception.HoodieIndexException)
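
A minimal, hypothetical usage sketch of the pattern above: build a filter, add a key, and serialize it. The factory call and its parameters are assumptions for illustration (verify the BloomFilterFactory signature in your Hudi version); HoodieIndexException is unchecked, so catching it is optional.

import org.apache.hudi.common.bloom.BloomFilter;
import org.apache.hudi.common.bloom.BloomFilterFactory;
import org.apache.hudi.common.bloom.BloomFilterTypeCode;
import org.apache.hudi.exception.HoodieIndexException;

public class BloomFilterSerializationExample {
    public static void main(String[] args) {
        // Assumed factory signature: (numEntries, errorRate, maxEntries, typeCode); check your Hudi version.
        BloomFilter filter = BloomFilterFactory.createBloomFilter(1000, 0.000001, -1, BloomFilterTypeCode.SIMPLE.name());
        filter.add("record-key-1");
        try {
            // serializeToString() Base64-encodes the filter's byte representation, as shown above.
            String serialized = filter.serializeToString();
            System.out.println("Serialized bloom filter length: " + serialized.length());
        } catch (HoodieIndexException e) {
            // HoodieIndexException is a RuntimeException wrapping the underlying IOException.
            System.err.println("Failed to serialize bloom filter: " + e.getMessage());
        }
    }
}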

Example 2 with HoodieIndexException

use of org.apache.hudi.exception.HoodieIndexException in project hudi by apache.

the class SparkHoodieHBaseIndex, method updateLocationFunction.

private Function2<Integer, Iterator<WriteStatus>, Iterator<WriteStatus>> updateLocationFunction() {
    return (partition, statusIterator) -> {
        List<WriteStatus> writeStatusList = new ArrayList<>();
        // Grab the global HBase connection
        synchronized (SparkHoodieHBaseIndex.class) {
            if (hbaseConnection == null || hbaseConnection.isClosed()) {
                hbaseConnection = getHBaseConnection();
            }
        }
        final long startTimeForPutsTask = DateTime.now().getMillis();
        LOG.info("startTimeForPutsTask for this task: " + startTimeForPutsTask);
        try (BufferedMutator mutator = hbaseConnection.getBufferedMutator(TableName.valueOf(tableName))) {
            final RateLimiter limiter = RateLimiter.create(multiPutBatchSize, TimeUnit.SECONDS);
            while (statusIterator.hasNext()) {
                WriteStatus writeStatus = statusIterator.next();
                List<Mutation> mutations = new ArrayList<>();
                try {
                    long numOfInserts = writeStatus.getStat().getNumInserts();
                    LOG.info("Num of inserts in this WriteStatus: " + numOfInserts);
                    LOG.info("Total inserts in this job: " + this.totalNumInserts);
                    LOG.info("multiPutBatchSize for this job: " + this.multiPutBatchSize);
                    // Any calls beyond `multiPutBatchSize` within a second will be rate limited
                    for (HoodieRecord rec : writeStatus.getWrittenRecords()) {
                        if (!writeStatus.isErrored(rec.getKey())) {
                            Option<HoodieRecordLocation> loc = rec.getNewLocation();
                            if (loc.isPresent()) {
                                if (rec.getCurrentLocation() != null) {
                                    // This is an update, no need to update index
                                    continue;
                                }
                                Put put = new Put(Bytes.toBytes(rec.getRecordKey()));
                                put.addColumn(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN, Bytes.toBytes(loc.get().getInstantTime()));
                                put.addColumn(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN, Bytes.toBytes(loc.get().getFileId()));
                                put.addColumn(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN, Bytes.toBytes(rec.getPartitionPath()));
                                mutations.add(put);
                            } else {
                                // Delete existing index for a deleted record
                                Delete delete = new Delete(Bytes.toBytes(rec.getRecordKey()));
                                mutations.add(delete);
                            }
                        }
                        if (mutations.size() < multiPutBatchSize) {
                            continue;
                        }
                        doMutations(mutator, mutations, limiter);
                    }
                    // process remaining puts and deletes, if any
                    doMutations(mutator, mutations, limiter);
                } catch (Exception e) {
                    Exception we = new Exception("Error updating index for " + writeStatus, e);
                    LOG.error(we);
                    writeStatus.setGlobalError(we);
                }
                writeStatusList.add(writeStatus);
            }
            final long endPutsTime = DateTime.now().getMillis();
            LOG.info("hbase puts task time for this task: " + (endPutsTime - startTimeForPutsTask));
        } catch (IOException e) {
            throw new HoodieIndexException("Failed to Update Index locations because of exception with HBase Client", e);
        }
        return writeStatusList.iterator();
    };
}
Also used : HoodieTable(org.apache.hudi.table.HoodieTable) Mutation(org.apache.hadoop.hbase.client.Mutation) Function2(org.apache.spark.api.java.function.Function2) Result(org.apache.hadoop.hbase.client.Result) Date(java.util.Date) RateLimiter(org.apache.hudi.common.util.RateLimiter) HoodieJavaRDD(org.apache.hudi.data.HoodieJavaRDD) Logger(org.apache.log4j.Logger) Delete(org.apache.hadoop.hbase.client.Delete) Partitioner(org.apache.spark.Partitioner) Configuration(org.apache.hadoop.conf.Configuration) Map(java.util.Map) HoodieDependentSystemUnavailableException(org.apache.hudi.exception.HoodieDependentSystemUnavailableException) HoodieSparkEngineContext(org.apache.hudi.client.common.HoodieSparkEngineContext) BufferedMutator(org.apache.hadoop.hbase.client.BufferedMutator) HoodieActiveTimeline(org.apache.hudi.common.table.timeline.HoodieActiveTimeline) HoodieIndexException(org.apache.hudi.exception.HoodieIndexException) Get(org.apache.hadoop.hbase.client.Get) Tuple2(scala.Tuple2) HoodieIndex(org.apache.hudi.index.HoodieIndex) Serializable(java.io.Serializable) List(java.util.List) HoodieRecordLocation(org.apache.hudi.common.model.HoodieRecordLocation) RegionLocator(org.apache.hadoop.hbase.client.RegionLocator) HBaseConfiguration(org.apache.hadoop.hbase.HBaseConfiguration) ResultScanner(org.apache.hadoop.hbase.client.ResultScanner) ReflectionUtils(org.apache.hudi.common.util.ReflectionUtils) SparkMemoryUtils(org.apache.hudi.client.utils.SparkMemoryUtils) JavaSparkContext(org.apache.spark.api.java.JavaSparkContext) Option(org.apache.hudi.common.util.Option) HashMap(java.util.HashMap) HoodieEngineContext(org.apache.hudi.common.engine.HoodieEngineContext) ArrayList(java.util.ArrayList) HTable(org.apache.hadoop.hbase.client.HTable) HoodieTableMetaClient(org.apache.hudi.common.table.HoodieTableMetaClient) EmptyHoodieRecordPayload(org.apache.hudi.common.model.EmptyHoodieRecordPayload) LinkedList(java.util.LinkedList) HoodieTimeline(org.apache.hudi.common.table.timeline.HoodieTimeline) JavaRDD(org.apache.spark.api.java.JavaRDD) Bytes(org.apache.hadoop.hbase.util.Bytes) HoodieRecord(org.apache.hudi.common.model.HoodieRecord) TableName(org.apache.hadoop.hbase.TableName) HoodieData(org.apache.hudi.common.data.HoodieData) HoodieWriteConfig(org.apache.hudi.config.HoodieWriteConfig) Iterator(java.util.Iterator) Put(org.apache.hadoop.hbase.client.Put) SparkConf(org.apache.spark.SparkConf) DateTime(org.joda.time.DateTime) HoodieHBaseIndexConfig(org.apache.hudi.config.HoodieHBaseIndexConfig) IOException(java.io.IOException) JavaPairRDD(org.apache.spark.api.java.JavaPairRDD) HoodieAvroRecord(org.apache.hudi.common.model.HoodieAvroRecord) ConnectionFactory(org.apache.hadoop.hbase.client.ConnectionFactory) Scan(org.apache.hadoop.hbase.client.Scan) TimeUnit(java.util.concurrent.TimeUnit) WriteStatus(org.apache.hudi.client.WriteStatus) HoodieRecordPayload(org.apache.hudi.common.model.HoodieRecordPayload) HRegionLocation(org.apache.hadoop.hbase.HRegionLocation) Connection(org.apache.hadoop.hbase.client.Connection) HoodieKey(org.apache.hudi.common.model.HoodieKey) LogManager(org.apache.log4j.LogManager)
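
The helper doMutations(...) is invoked above but its body is not shown in this snippet. The following is a minimal sketch of what such a batch-flush helper typically looks like; the RateLimiter method name and the flush-then-clear behavior are assumptions, not the verbatim Hudi implementation.

// Hypothetical sketch of the doMutations helper referenced above.
private void doMutations(BufferedMutator mutator, List<Mutation> mutations, RateLimiter limiter) throws IOException {
    if (mutations.isEmpty()) {
        return;
    }
    // Throttle so at most multiPutBatchSize mutations are sent per second (assumed RateLimiter API).
    limiter.tryAcquire(mutations.size());
    // BufferedMutator buffers the puts/deletes and sends them to HBase on flush().
    mutator.mutate(mutations);
    mutator.flush();
    mutations.clear();
}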

Example 3 with HoodieIndexException

use of org.apache.hudi.exception.HoodieIndexException in project hudi by apache.

the class HoodieKeyLookupHandle, method getBloomFilter.

private BloomFilter getBloomFilter() {
    BloomFilter bloomFilter = null;
    HoodieTimer timer = new HoodieTimer().startTimer();
    try {
        if (config.isMetadataBloomFilterIndexEnabled()) {
            bloomFilter = hoodieTable.getMetadataTable().getBloomFilter(partitionPathFileIDPair.getLeft(), partitionPathFileIDPair.getRight()).orElseThrow(() -> new HoodieIndexException("BloomFilter missing for " + partitionPathFileIDPair.getRight()));
        } else {
            try (HoodieFileReader reader = createNewFileReader()) {
                bloomFilter = reader.readBloomFilter();
            }
        }
    } catch (IOException e) {
        throw new HoodieIndexException(String.format("Error reading bloom filter from %s", getPartitionPathFileIDPair()), e);
    }
    LOG.info(String.format("Read bloom filter from %s in %d ms", partitionPathFileIDPair, timer.endTimer()));
    return bloomFilter;
}
Also used : HoodieTimer(org.apache.hudi.common.util.HoodieTimer) HoodieFileReader(org.apache.hudi.io.storage.HoodieFileReader) HoodieIndexException(org.apache.hudi.exception.HoodieIndexException) IOException(java.io.IOException) BloomFilter(org.apache.hudi.common.bloom.BloomFilter)
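
Once loaded, the bloom filter is normally consulted as a cheap pre-filter before doing the exact key check against the base file. A minimal sketch of that check, written as a hypothetical helper inside the lookup handle and assuming the mightContain(String) method on Hudi's BloomFilter interface:

// Hypothetical helper: bloom filters can return false positives but never false negatives,
// so a negative answer lets the handle skip reading the file for this key entirely.
private boolean mayContainKey(String recordKey) {
    return getBloomFilter().mightContain(recordKey);
}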

Example 4 with HoodieIndexException

use of org.apache.hudi.exception.HoodieIndexException in project hudi by apache.

the class HoodieIndexUtils, method filterKeysFromFile.

/**
 * Given a list of row keys and one file, return only row keys existing in that file.
 *
 * @param filePath            - File to filter keys from
 * @param candidateRecordKeys - Candidate keys to filter
 * @return List of candidate keys that are available in the file
 */
public static List<String> filterKeysFromFile(Path filePath, List<String> candidateRecordKeys, Configuration configuration) throws HoodieIndexException {
    ValidationUtils.checkArgument(FSUtils.isBaseFile(filePath));
    List<String> foundRecordKeys = new ArrayList<>();
    try {
        // Load all rowKeys from the file, to double-confirm
        if (!candidateRecordKeys.isEmpty()) {
            HoodieTimer timer = new HoodieTimer().startTimer();
            HoodieFileReader fileReader = HoodieFileReaderFactory.getFileReader(configuration, filePath);
            Set<String> fileRowKeys = fileReader.filterRowKeys(new TreeSet<>(candidateRecordKeys));
            foundRecordKeys.addAll(fileRowKeys);
            LOG.info(String.format("Checked keys against file %s, in %d ms. #candidates (%d) #found (%d)", filePath, timer.endTimer(), candidateRecordKeys.size(), foundRecordKeys.size()));
            if (LOG.isDebugEnabled()) {
                LOG.debug("Keys matching for file " + filePath + " => " + foundRecordKeys);
            }
        }
    } catch (Exception e) {
        throw new HoodieIndexException("Error checking candidate keys against file.", e);
    }
    return foundRecordKeys;
}
Also used : ArrayList(java.util.ArrayList) HoodieTimer(org.apache.hudi.common.util.HoodieTimer) HoodieFileReader(org.apache.hudi.io.storage.HoodieFileReader) HoodieIndexException(org.apache.hudi.exception.HoodieIndexException)
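
A hypothetical caller of filterKeysFromFile: the base file path and candidate record keys below are made-up values for illustration, and since HoodieIndexException is unchecked the catch block is optional.

import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hudi.exception.HoodieIndexException;
import org.apache.hudi.index.HoodieIndexUtils;

public class FilterKeysExample {
    public static void main(String[] args) {
        // Hypothetical base file path and candidate record keys.
        Path baseFile = new Path("/tmp/hudi/2021/01/01/abc123_0_20210101000000.parquet");
        List<String> candidates = Arrays.asList("key-1", "key-2", "key-3");
        try {
            // Returns only the candidate keys actually present in the file.
            List<String> found = HoodieIndexUtils.filterKeysFromFile(baseFile, candidates, new Configuration());
            System.out.println("Keys present in file: " + found);
        } catch (HoodieIndexException e) {
            System.err.println("Key lookup failed: " + e.getMessage());
        }
    }
}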

Example 5 with HoodieIndexException

use of org.apache.hudi.exception.HoodieIndexException in project hudi by apache.

the class HoodieDynamicBoundedBloomFilter, method serializeToString.

@Override
public String serializeToString() {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(baos);
    try {
        internalDynamicBloomFilter.write(dos);
        byte[] bytes = baos.toByteArray();
        dos.close();
        return Base64CodecUtil.encode(bytes);
    } catch (IOException e) {
        throw new HoodieIndexException("Could not serialize BloomFilter instance", e);
    }
}
Also used : DataOutputStream(java.io.DataOutputStream) ByteArrayOutputStream(java.io.ByteArrayOutputStream) IOException(java.io.IOException) HoodieIndexException(org.apache.hudi.exception.HoodieIndexException)
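
The counterpart to serializeToString is rebuilding a filter from the Base64 string. A minimal sketch, assuming BloomFilterFactory.fromString(serialized, typeCode) is the deserialization entry point; verify the exact signature and type code against your Hudi version.

import org.apache.hudi.common.bloom.BloomFilter;
import org.apache.hudi.common.bloom.BloomFilterFactory;
import org.apache.hudi.common.bloom.BloomFilterTypeCode;

public class BloomFilterRoundTrip {
    // Rebuild a bloom filter from the Base64 string produced by serializeToString().
    // The fromString(...) signature and the DYNAMIC_V0 type code are assumptions.
    public static BloomFilter deserialize(String serialized) {
        return BloomFilterFactory.fromString(serialized, BloomFilterTypeCode.DYNAMIC_V0.name());
    }
}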

Aggregations

HoodieIndexException (org.apache.hudi.exception.HoodieIndexException) 5
IOException (java.io.IOException) 4
ByteArrayOutputStream (java.io.ByteArrayOutputStream) 2
DataOutputStream (java.io.DataOutputStream) 2
ArrayList (java.util.ArrayList) 2
HoodieTimer (org.apache.hudi.common.util.HoodieTimer) 2
HoodieFileReader (org.apache.hudi.io.storage.HoodieFileReader) 2
Serializable (java.io.Serializable) 1
Date (java.util.Date) 1
HashMap (java.util.HashMap) 1
Iterator (java.util.Iterator) 1
LinkedList (java.util.LinkedList) 1
List (java.util.List) 1
Map (java.util.Map) 1
TimeUnit (java.util.concurrent.TimeUnit) 1
Configuration (org.apache.hadoop.conf.Configuration) 1
HBaseConfiguration (org.apache.hadoop.hbase.HBaseConfiguration) 1
HRegionLocation (org.apache.hadoop.hbase.HRegionLocation) 1
TableName (org.apache.hadoop.hbase.TableName) 1
BufferedMutator (org.apache.hadoop.hbase.client.BufferedMutator) 1