Search in sources :

Example 1 with NGramUTF8StringBinaryTokenizerFactory

use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory in project asterixdb by apache.

the class LSMInvertedIndexTestUtils method createHashedNGramInvIndexTestContext.

public static LSMInvertedIndexTestContext createHashedNGramInvIndexTestContext(LSMInvertedIndexTestHarness harness, InvertedIndexType invIndexType) throws IOException, HyracksDataException {
    ISerializerDeserializer[] fieldSerdes = getHashedIndexFieldSerdes(invIndexType);
    ITokenFactory tokenFactory = new HashedUTF8NGramTokenFactory();
    IBinaryTokenizerFactory tokenizerFactory = new NGramUTF8StringBinaryTokenizerFactory(TEST_GRAM_LENGTH, true, true, false, tokenFactory);
    LSMInvertedIndexTestContext testCtx = LSMInvertedIndexTestContext.create(harness, fieldSerdes, fieldSerdes.length - 1, tokenizerFactory, invIndexType, null, null, null, null, null, null);
    return testCtx;
}
Also used : IBinaryTokenizerFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory) ITokenFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory) HashedUTF8NGramTokenFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8NGramTokenFactory) ISerializerDeserializer(org.apache.hyracks.api.dataflow.value.ISerializerDeserializer) NGramUTF8StringBinaryTokenizerFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory)

Example 2 with NGramUTF8StringBinaryTokenizerFactory

use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory in project asterixdb by apache.

the class LSMInvertedIndexTestUtils method createNGramInvIndexTestContext.

public static LSMInvertedIndexTestContext createNGramInvIndexTestContext(LSMInvertedIndexTestHarness harness, InvertedIndexType invIndexType) throws IOException, HyracksDataException {
    ISerializerDeserializer[] fieldSerdes = getNonHashedIndexFieldSerdes(invIndexType);
    ITokenFactory tokenFactory = new UTF8NGramTokenFactory();
    IBinaryTokenizerFactory tokenizerFactory = new NGramUTF8StringBinaryTokenizerFactory(TEST_GRAM_LENGTH, true, true, false, tokenFactory);
    LSMInvertedIndexTestContext testCtx = LSMInvertedIndexTestContext.create(harness, fieldSerdes, fieldSerdes.length - 1, tokenizerFactory, invIndexType, null, null, null, null, null, null);
    return testCtx;
}
Also used : UTF8NGramTokenFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.UTF8NGramTokenFactory) HashedUTF8NGramTokenFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8NGramTokenFactory) IBinaryTokenizerFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory) ITokenFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory) ISerializerDeserializer(org.apache.hyracks.api.dataflow.value.ISerializerDeserializer) NGramUTF8StringBinaryTokenizerFactory(org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory)

Aggregations

ISerializerDeserializer (org.apache.hyracks.api.dataflow.value.ISerializerDeserializer)2 HashedUTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8NGramTokenFactory)2 IBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory)2 ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory)2 NGramUTF8StringBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory)2 UTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.UTF8NGramTokenFactory)1