Example 6 with ITokenFactory

Use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory in project asterixdb by apache.

From the class LSMInvertedIndexTestUtils, method createWordInvIndexTestContext.

public static LSMInvertedIndexTestContext createWordInvIndexTestContext(LSMInvertedIndexTestHarness harness, InvertedIndexType invIndexType) throws IOException, HyracksDataException {
    ISerializerDeserializer[] fieldSerdes = getNonHashedIndexFieldSerdes(invIndexType);
    ITokenFactory tokenFactory = new UTF8WordTokenFactory();
    IBinaryTokenizerFactory tokenizerFactory = new DelimitedUTF8StringBinaryTokenizerFactory(true, false, tokenFactory);
    LSMInvertedIndexTestContext testCtx = LSMInvertedIndexTestContext.create(harness, fieldSerdes, fieldSerdes.length - 1, tokenizerFactory, invIndexType, null, null, null, null, null, null);
    return testCtx;
}
Also used: DelimitedUTF8StringBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizerFactory), IBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory), ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory), ISerializerDeserializer (org.apache.hyracks.api.dataflow.value.ISerializerDeserializer), UTF8WordTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.UTF8WordTokenFactory), HashedUTF8WordTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8WordTokenFactory)

Example 7 with ITokenFactory

Use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory in project asterixdb by apache.

From the class LSMInvertedIndexTestUtils, method createNGramInvIndexTestContext.

public static LSMInvertedIndexTestContext createNGramInvIndexTestContext(LSMInvertedIndexTestHarness harness, InvertedIndexType invIndexType) throws IOException, HyracksDataException {
    ISerializerDeserializer[] fieldSerdes = getNonHashedIndexFieldSerdes(invIndexType);
    ITokenFactory tokenFactory = new UTF8NGramTokenFactory();
    IBinaryTokenizerFactory tokenizerFactory = new NGramUTF8StringBinaryTokenizerFactory(TEST_GRAM_LENGTH, true, true, false, tokenFactory);
    LSMInvertedIndexTestContext testCtx = LSMInvertedIndexTestContext.create(harness, fieldSerdes, fieldSerdes.length - 1, tokenizerFactory, invIndexType, null, null, null, null, null, null);
    return testCtx;
}
Also used: UTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.UTF8NGramTokenFactory), HashedUTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8NGramTokenFactory), IBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory), ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory), ISerializerDeserializer (org.apache.hyracks.api.dataflow.value.ISerializerDeserializer), NGramUTF8StringBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory)
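
Example 7 passes a gram length and several boolean flags to an n-gram tokenizer factory. As a rough illustration of what n-gram tokenization produces, the sketch below splits a string into overlapping grams and optionally pads both ends. It does not call any Hyracks class: the class name NGramSketch, the method grams, and the padding characters '#' and '$' are assumptions made for this illustration, and treating one of the flags as a padding switch is only a guess at its meaning.

```java
import java.util.ArrayList;
import java.util.List;

class NGramSketch {
    // Hypothetical padding characters, chosen for this sketch only.
    static final char PRE = '#';
    static final char POST = '$';

    // Returns every substring of length n, in order of position.
    // When usePrePost is true, the input is first padded with n-1 PRE
    // characters on the left and n-1 POST characters on the right.
    static List<String> grams(String s, int n, boolean usePrePost) {
        if (n <= 0) {
            throw new IllegalArgumentException("gram length must be positive");
        }
        StringBuilder sb = new StringBuilder();
        if (usePrePost) {
            for (int i = 0; i < n - 1; i++) {
                sb.append(PRE);
            }
        }
        sb.append(s);
        if (usePrePost) {
            for (int i = 0; i < n - 1; i++) {
                sb.append(POST);
            }
        }
        String padded = sb.toString();
        List<String> out = new ArrayList<>();
        // Slide a window of width n across the (possibly padded) string.
        for (int i = 0; i + n <= padded.length(); i++) {
            out.add(padded.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("data", 3, true));
        System.out.println(grams("data", 3, false));
    }
}
```

With padding, short strings still yield grams that mark the start and end of the text, which is one common reason to pad before building an n-gram index.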

Example 8 with ITokenFactory

Use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory in project asterixdb by apache.

From the class LSMInvertedIndexTestUtils, method createHashedWordInvIndexTestContext.

public static LSMInvertedIndexTestContext createHashedWordInvIndexTestContext(LSMInvertedIndexTestHarness harness, InvertedIndexType invIndexType) throws IOException, HyracksDataException {
    ISerializerDeserializer[] fieldSerdes = getHashedIndexFieldSerdes(invIndexType);
    ITokenFactory tokenFactory = new HashedUTF8WordTokenFactory();
    IBinaryTokenizerFactory tokenizerFactory = new DelimitedUTF8StringBinaryTokenizerFactory(true, false, tokenFactory);
    LSMInvertedIndexTestContext testCtx = LSMInvertedIndexTestContext.create(harness, fieldSerdes, fieldSerdes.length - 1, tokenizerFactory, invIndexType, null, null, null, null, null, null);
    return testCtx;
}
Also used: DelimitedUTF8StringBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizerFactory), HashedUTF8WordTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8WordTokenFactory), IBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory), ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory), ISerializerDeserializer (org.apache.hyracks.api.dataflow.value.ISerializerDeserializer)
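
Examples 6 and 8 differ only in the token factory (UTF8WordTokenFactory versus HashedUTF8WordTokenFactory) and in the field serializers they request. To make the word-versus-hashed distinction concrete, here is a minimal sketch that splits text on whitespace and returns either the words themselves or an integer hash per word. It is not the Hyracks implementation: the class WordTokenSketch, its methods, the lower-casing step, and the use of String.hashCode() as the hash are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordTokenSketch {
    // Splits on runs of whitespace and lower-cases each word.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) { // split() returns one empty string for blank input
                out.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Same tokens, but each word is replaced by a fixed-size integer.
    // String.hashCode() is a stand-in hash for this sketch only.
    static List<Integer> hashedWords(String text) {
        List<Integer> out = new ArrayList<>();
        for (String w : words(text)) {
            out.add(w.hashCode());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello  big World"));
        System.out.println(hashedWords("Hello  big World"));
    }
}
```

Hashed tokens have a fixed width, which would explain why the hashed test context asks for a different set of field serializers than the non-hashed one.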

Example 9 with ITokenFactory

Use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory in project asterixdb by apache.

From the class CountHashedGramTokensDescriptor, method createEvaluatorFactory.

@Override
public IScalarEvaluatorFactory createEvaluatorFactory(final IScalarEvaluatorFactory[] args) throws AlgebricksException {
    return new IScalarEvaluatorFactory() {

        private static final long serialVersionUID = 1L;

        @Override
        public IScalarEvaluator createScalarEvaluator(IHyracksTaskContext ctx) throws HyracksDataException {
            ITokenFactory tokenFactory = new HashedUTF8NGramTokenFactory();
            NGramUTF8StringBinaryTokenizer tokenizer = new NGramUTF8StringBinaryTokenizer(3, true, false, true, tokenFactory);
            return new GramTokensEvaluator(args, ctx, tokenizer, BuiltinType.AINT32);
        }
    };
}
Also used: IHyracksTaskContext (org.apache.hyracks.api.context.IHyracksTaskContext), GramTokensEvaluator (org.apache.asterix.runtime.evaluators.common.GramTokensEvaluator), NGramUTF8StringBinaryTokenizer (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizer), ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory), HashedUTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8NGramTokenFactory), IScalarEvaluatorFactory (org.apache.hyracks.algebricks.runtime.base.IScalarEvaluatorFactory)

Example 10 with ITokenFactory

Use of org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory in project asterixdb by apache.

From the class CountHashedWordTokensDescriptor, method createEvaluatorFactory.

@Override
public IScalarEvaluatorFactory createEvaluatorFactory(final IScalarEvaluatorFactory[] args) {
    return new IScalarEvaluatorFactory() {

        private static final long serialVersionUID = 1L;

        @Override
        public IScalarEvaluator createScalarEvaluator(IHyracksTaskContext ctx) throws HyracksDataException {
            ITokenFactory tokenFactory = new HashedUTF8WordTokenFactory();
            IBinaryTokenizer tokenizer = new DelimitedUTF8StringBinaryTokenizer(false, true, tokenFactory);
            return new WordTokensEvaluator(args, ctx, tokenizer, BuiltinType.AINT32);
        }
    };
}
Also used: HashedUTF8WordTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8WordTokenFactory), DelimitedUTF8StringBinaryTokenizer (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizer), IHyracksTaskContext (org.apache.hyracks.api.context.IHyracksTaskContext), IBinaryTokenizer (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizer), WordTokensEvaluator (org.apache.asterix.runtime.evaluators.common.WordTokensEvaluator), ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory), IScalarEvaluatorFactory (org.apache.hyracks.algebricks.runtime.base.IScalarEvaluatorFactory)
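
The descriptor names in Examples 9 and 10 (CountHashedGramTokens, CountHashedWordTokens) suggest that each hashed token also reflects how many times the same token has already appeared. The snippets do not show how that is done, so the following is only one plausible reading, sketched under stated assumptions: the class CountedHashSketch, the method countedHashes, and the mixing formula 31 * hash + occurrence are all hypothetical and are not taken from AsterixDB or Hyracks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountedHashSketch {
    // Hashes each token together with its occurrence number (1 for the
    // first time the token is seen, 2 for the second, and so on), so that
    // repeated tokens map to distinct integers.
    static List<Integer> countedHashes(List<String> tokens) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (String t : tokens) {
            // merge() returns the updated count for this token.
            int occurrence = seen.merge(t, 1, Integer::sum);
            // Hypothetical mixing step: fold the occurrence into the hash.
            out.add(31 * t.hashCode() + occurrence);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(countedHashes(List.of("a", "b", "a")));
    }
}
```

Folding the occurrence number into the hash turns a bag of tokens into a set of distinct values, which lets set-based similarity measures account for repeated tokens.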

Aggregations

ITokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory): 10
IScalarEvaluatorFactory (org.apache.hyracks.algebricks.runtime.base.IScalarEvaluatorFactory): 6
IHyracksTaskContext (org.apache.hyracks.api.context.IHyracksTaskContext): 6
ISerializerDeserializer (org.apache.hyracks.api.dataflow.value.ISerializerDeserializer): 4
HashedUTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8NGramTokenFactory): 4
HashedUTF8WordTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.HashedUTF8WordTokenFactory): 4
IBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizerFactory): 4
GramTokensEvaluator (org.apache.asterix.runtime.evaluators.common.GramTokensEvaluator): 3
WordTokensEvaluator (org.apache.asterix.runtime.evaluators.common.WordTokensEvaluator): 3
DelimitedUTF8StringBinaryTokenizer (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizer): 3
IBinaryTokenizer (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizer): 3
NGramUTF8StringBinaryTokenizer (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizer): 3
DelimitedUTF8StringBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizerFactory): 2
NGramUTF8StringBinaryTokenizerFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizerFactory): 2
UTF8NGramTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.UTF8NGramTokenFactory): 2
UTF8WordTokenFactory (org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.UTF8WordTokenFactory): 2