Example 1 with SliceUtf8.getCodePointAt

use of io.airlift.slice.SliceUtf8.getCodePointAt in project presto by prestodb.

the class StringFunctions method pad.

private static Slice pad(Slice text, long targetLength, Slice padString, int paddingOffset) {
    checkCondition(0 <= targetLength && targetLength <= Integer.MAX_VALUE, INVALID_FUNCTION_ARGUMENT, "Target length must be in the range [0.." + Integer.MAX_VALUE + "]");
    checkCondition(padString.length() > 0, INVALID_FUNCTION_ARGUMENT, "Padding string must not be empty");
    int textLength = countCodePoints(text);
    int resultLength = (int) targetLength;
    // if our target length is the same as our string then return our string
    if (textLength == resultLength) {
        return text;
    }
    // if our string is bigger than requested then truncate
    if (textLength > resultLength) {
        return SliceUtf8.substring(text, 0, resultLength);
    }
    // number of bytes in each code point
    int padStringLength = countCodePoints(padString);
    int[] padStringCounts = new int[padStringLength];
    for (int i = 0; i < padStringLength; ++i) {
        padStringCounts[i] = lengthOfCodePointSafe(padString, offsetOfCodePoint(padString, i));
    }
    // preallocate the result
    int bufferSize = text.length();
    for (int i = 0; i < resultLength - textLength; ++i) {
        bufferSize += padStringCounts[i % padStringLength];
    }
    Slice buffer = Slices.allocate(bufferSize);
    // fill in the existing string
    int countBytes = bufferSize - text.length();
    int startPointOfExistingText = (paddingOffset + countBytes) % bufferSize;
    buffer.setBytes(startPointOfExistingText, text);
    // assign the pad string while there's enough space for it
    int byteIndex = paddingOffset;
    for (int i = 0; i < countBytes / padString.length(); ++i) {
        buffer.setBytes(byteIndex, padString);
        byteIndex += padString.length();
    }
    // handle the tail: at most we assign padStringLength - 1 code points
    buffer.setBytes(byteIndex, padString.getBytes(0, paddingOffset + countBytes - byteIndex));
    return buffer;
}
Also used : Slice(io.airlift.slice.Slice) Slices.utf8Slice(io.airlift.slice.Slices.utf8Slice) Constraint(com.facebook.presto.type.Constraint) SliceUtf8.lengthOfCodePoint(io.airlift.slice.SliceUtf8.lengthOfCodePoint) SliceUtf8.offsetOfCodePoint(io.airlift.slice.SliceUtf8.offsetOfCodePoint)
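The pad method above works on UTF-8 bytes inside a Slice. As a rough illustration of the same length rules (return unchanged, truncate, or fill by cycling through the pad string's code points), here is a minimal sketch on plain java.lang.String. The class PadSketch and its method signature are hypothetical names for this example, and the boolean left flag is an assumed stand-in for paddingOffset; this is not Presto's implementation.

```java
// Hypothetical plain-JDK sketch of code-point-aware padding; not Presto's code.
final class PadSketch {
    static String pad(String text, int targetLength, String padString, boolean left) {
        if (targetLength < 0) {
            throw new IllegalArgumentException("Target length must not be negative");
        }
        if (padString.isEmpty()) {
            throw new IllegalArgumentException("Padding string must not be empty");
        }
        // Lengths are counted in code points, not in UTF-16 chars.
        int textLength = text.codePointCount(0, text.length());
        if (textLength == targetLength) {
            return text;
        }
        if (textLength > targetLength) {
            // Truncate on a code point boundary.
            return text.substring(0, text.offsetByCodePoints(0, targetLength));
        }
        // Cycle through the pad string's code points until the gap is filled.
        int[] pad = padString.codePoints().toArray();
        StringBuilder padding = new StringBuilder();
        for (int i = 0; i < targetLength - textLength; i++) {
            padding.appendCodePoint(pad[i % pad.length]);
        }
        return left ? padding + text : text + padding;
    }
}
```

With this sketch, pad("abc", 6, "xy", true) yields "xyxabc" and the same call with left set to false yields "abcxyx".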

Example 2 with SliceUtf8.getCodePointAt

use of io.airlift.slice.SliceUtf8.getCodePointAt in project presto by prestodb.

the class StringFunctions method fromUtf8.

@Description("decodes the UTF-8 encoded string")
@ScalarFunction
@LiteralParameters("x")
@SqlType(StandardTypes.VARCHAR)
public static Slice fromUtf8(@SqlType(StandardTypes.VARBINARY) Slice slice, @SqlType("varchar(x)") Slice replacementCharacter) {
    int count = countCodePoints(replacementCharacter);
    if (count > 1) {
        throw new PrestoException(INVALID_FUNCTION_ARGUMENT, "Replacement character string must be empty or a single character");
    }
    OptionalInt replacementCodePoint;
    if (count == 1) {
        try {
            replacementCodePoint = OptionalInt.of(getCodePointAt(replacementCharacter, 0));
        } catch (InvalidUtf8Exception e) {
            throw new PrestoException(INVALID_FUNCTION_ARGUMENT, "Invalid replacement character");
        }
    } else {
        replacementCodePoint = OptionalInt.empty();
    }
    return SliceUtf8.fixInvalidUtf8(slice, replacementCodePoint);
}
Also used : InvalidUtf8Exception(io.airlift.slice.InvalidUtf8Exception) PrestoException(com.facebook.presto.spi.PrestoException) OptionalInt(java.util.OptionalInt) Constraint(com.facebook.presto.type.Constraint) SliceUtf8.lengthOfCodePoint(io.airlift.slice.SliceUtf8.lengthOfCodePoint) SliceUtf8.offsetOfCodePoint(io.airlift.slice.SliceUtf8.offsetOfCodePoint) ScalarFunction(com.facebook.presto.spi.function.ScalarFunction) Description(com.facebook.presto.spi.function.Description) LiteralParameters(com.facebook.presto.spi.function.LiteralParameters) SqlType(com.facebook.presto.spi.function.SqlType)
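For readers without the airlift library at hand, the idea behind fromUtf8 (decode bytes, substituting an optional replacement for malformed input) can be approximated with the JDK's CharsetDecoder. This is only a sketch: FixInvalidUtf8Sketch and decodeLenient are hypothetical names, the replacement is restricted to a single non-surrogate BMP character as an assumption of this example, and the number of replacements emitted per malformed sequence may differ from SliceUtf8.fixInvalidUtf8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

// Hypothetical plain-JDK stand-in; not the airlift implementation.
final class FixInvalidUtf8Sketch {
    static String decodeLenient(byte[] bytes, OptionalInt replacementCodePoint) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        if (replacementCodePoint.isPresent()) {
            int codePoint = replacementCodePoint.getAsInt();
            // Assumption of this sketch: only a single BMP, non-surrogate character is accepted.
            if (!Character.isBmpCodePoint(codePoint) || Character.isSurrogate((char) codePoint)) {
                throw new IllegalArgumentException("Invalid replacement character");
            }
            decoder.onMalformedInput(CodingErrorAction.REPLACE)
                    .replaceWith(String.valueOf((char) codePoint));
        } else {
            // No replacement given: drop malformed bytes.
            decoder.onMalformedInput(CodingErrorAction.IGNORE);
        }
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing OptionalInt.empty() mirrors the count == 0 branch above, where invalid bytes are removed rather than replaced.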

Example 3 with SliceUtf8.getCodePointAt

use of io.airlift.slice.SliceUtf8.getCodePointAt in project hetu-core by openlookeng.

the class LikeFunctions method unescapeLiteralLikePattern.

public static Slice unescapeLiteralLikePattern(Slice pattern, Optional<Slice> escape) {
    if (!escape.isPresent()) {
        return pattern;
    }
    int escapeChar = getEscapeCharacter(escape).map(c -> (int) c).orElse(-1);
    @SuppressWarnings("resource") DynamicSliceOutput output = new DynamicSliceOutput(pattern.length());
    boolean escaped = false;
    int position = 0;
    while (position < pattern.length()) {
        int currentChar = getCodePointAt(pattern, position);
        int lengthOfCodePoint = lengthOfCodePoint(currentChar);
        if (!escaped && (currentChar == escapeChar)) {
            escaped = true;
        } else {
            output.writeBytes(pattern, position, lengthOfCodePoint);
            escaped = false;
        }
        position += lengthOfCodePoint;
    }
    checkEscape(!escaped);
    return output.slice();
}
Also used : NonStrictUTF8Encoding(io.airlift.jcodings.specific.NonStrictUTF8Encoding) Regex(io.airlift.joni.Regex) Slice(io.airlift.slice.Slice) StandardTypes(io.prestosql.spi.type.StandardTypes) INVALID_FUNCTION_ARGUMENT(io.prestosql.spi.StandardErrorCode.INVALID_FUNCTION_ARGUMENT) LiteralParameters(io.prestosql.spi.function.LiteralParameters) OP_LINE_ANCHOR(io.airlift.joni.constants.SyntaxProperties.OP_LINE_ANCHOR) INEFFECTIVE_META_CHAR(io.airlift.joni.constants.MetaChar.INEFFECTIVE_META_CHAR) DynamicSliceOutput(io.airlift.slice.DynamicSliceOutput) LiteralParameter(io.prestosql.spi.function.LiteralParameter) ScalarOperator(io.prestosql.spi.function.ScalarOperator) Failures.checkCondition(io.prestosql.util.Failures.checkCondition) OperatorType(io.prestosql.spi.function.OperatorType) PrestoException(io.prestosql.spi.PrestoException) SliceUtf8.getCodePointAt(io.airlift.slice.SliceUtf8.getCodePointAt) SliceUtf8.lengthOfCodePoint(io.airlift.slice.SliceUtf8.lengthOfCodePoint) LikePatternType(io.prestosql.spi.type.LikePatternType) OP_DOT_ANYCHAR(io.airlift.joni.constants.SyntaxProperties.OP_DOT_ANYCHAR) UTF_8(java.nio.charset.StandardCharsets.UTF_8) Chars.padSpaces(io.prestosql.spi.type.Chars.padSpaces) ScalarFunction(io.prestosql.spi.function.ScalarFunction) OP_ASTERISK_ZERO_INF(io.airlift.joni.constants.SyntaxProperties.OP_ASTERISK_ZERO_INF) Option(io.airlift.joni.Option) SqlType(io.prestosql.spi.function.SqlType) Syntax(io.airlift.joni.Syntax) Optional(java.util.Optional)
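The escape-handling loop above can be tried without the Presto or airlift classes. The following is a minimal sketch of the same state machine on a java.lang.String, stepping by code point with String.codePointAt and Character.charCount. LikeEscapeSketch and unescape are hypothetical names, and passing -1 as escapeChar is this sketch's way of saying "no escape character".

```java
// Hypothetical plain-JDK sketch of the unescape loop; not the hetu-core code.
final class LikeEscapeSketch {
    static String unescape(String pattern, int escapeChar) {
        StringBuilder output = new StringBuilder(pattern.length());
        boolean escaped = false;
        int position = 0;
        while (position < pattern.length()) {
            int current = pattern.codePointAt(position);
            if (!escaped && current == escapeChar) {
                // Drop the escape character itself and remember that the next one is literal.
                escaped = true;
            } else {
                output.appendCodePoint(current);
                escaped = false;
            }
            // Advance by the width of the code point just read (1 or 2 chars).
            position += Character.charCount(current);
        }
        if (escaped) {
            throw new IllegalArgumentException("Escape character must be followed by another character");
        }
        return output.toString();
    }
}
```

For example, unescape("50#%", '#') returns "50%", and a doubled escape such as "a##b" collapses to "a#b".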

Example 4 with SliceUtf8.getCodePointAt

use of io.airlift.slice.SliceUtf8.getCodePointAt in project hetu-core by openlookeng.

the class LikeFunctions method patternConstantPrefixBytes.

public static int patternConstantPrefixBytes(Slice pattern, Optional<Slice> escape) {
    int escapeChar = getEscapeCharacter(escape).map(c -> (int) c).orElse(-1);
    boolean escaped = false;
    int position = 0;
    while (position < pattern.length()) {
        int currentChar = getCodePointAt(pattern, position);
        if (!escaped && (currentChar == escapeChar)) {
            escaped = true;
        } else if (escaped) {
            checkEscape(currentChar == '%' || currentChar == '_' || currentChar == escapeChar);
            escaped = false;
        } else if ((currentChar == '%') || (currentChar == '_')) {
            return position;
        }
        position += lengthOfCodePoint(currentChar);
    }
    checkEscape(!escaped);
    return position;
}
Also used : NonStrictUTF8Encoding(io.airlift.jcodings.specific.NonStrictUTF8Encoding) Regex(io.airlift.joni.Regex) Slice(io.airlift.slice.Slice) StandardTypes(io.prestosql.spi.type.StandardTypes) INVALID_FUNCTION_ARGUMENT(io.prestosql.spi.StandardErrorCode.INVALID_FUNCTION_ARGUMENT) LiteralParameters(io.prestosql.spi.function.LiteralParameters) OP_LINE_ANCHOR(io.airlift.joni.constants.SyntaxProperties.OP_LINE_ANCHOR) INEFFECTIVE_META_CHAR(io.airlift.joni.constants.MetaChar.INEFFECTIVE_META_CHAR) DynamicSliceOutput(io.airlift.slice.DynamicSliceOutput) LiteralParameter(io.prestosql.spi.function.LiteralParameter) ScalarOperator(io.prestosql.spi.function.ScalarOperator) Failures.checkCondition(io.prestosql.util.Failures.checkCondition) OperatorType(io.prestosql.spi.function.OperatorType) PrestoException(io.prestosql.spi.PrestoException) SliceUtf8.getCodePointAt(io.airlift.slice.SliceUtf8.getCodePointAt) SliceUtf8.lengthOfCodePoint(io.airlift.slice.SliceUtf8.lengthOfCodePoint) LikePatternType(io.prestosql.spi.type.LikePatternType) OP_DOT_ANYCHAR(io.airlift.joni.constants.SyntaxProperties.OP_DOT_ANYCHAR) UTF_8(java.nio.charset.StandardCharsets.UTF_8) Chars.padSpaces(io.prestosql.spi.type.Chars.padSpaces) ScalarFunction(io.prestosql.spi.function.ScalarFunction) OP_ASTERISK_ZERO_INF(io.airlift.joni.constants.SyntaxProperties.OP_ASTERISK_ZERO_INF) Option(io.airlift.joni.Option) SqlType(io.prestosql.spi.function.SqlType) Syntax(io.airlift.joni.Syntax) Optional(java.util.Optional)
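The prefix scan above stops at the first unescaped wildcard. A minimal plain-JDK sketch of that logic follows. LikePrefixSketch and constantPrefixLength are hypothetical names, -1 is assumed to mean "no escape character", and the result is measured in UTF-16 chars rather than in UTF-8 bytes as in the Slice-based original.

```java
// Hypothetical plain-JDK sketch of the constant-prefix scan; not the hetu-core code.
final class LikePrefixSketch {
    static int constantPrefixLength(String pattern, int escapeChar) {
        boolean escaped = false;
        int position = 0;
        while (position < pattern.length()) {
            int current = pattern.codePointAt(position);
            if (!escaped && current == escapeChar) {
                escaped = true;
            } else if (escaped) {
                // Only wildcards and the escape character itself may be escaped.
                if (current != '%' && current != '_' && current != escapeChar) {
                    throw new IllegalArgumentException("Escape character must be followed by '%', '_' or the escape character itself");
                }
                escaped = false;
            } else if (current == '%' || current == '_') {
                // First unescaped wildcard: everything before it is constant.
                return position;
            }
            position += Character.charCount(current);
        }
        if (escaped) {
            throw new IllegalArgumentException("Escape character must be followed by another character");
        }
        return position;
    }
}
```

For the pattern "a#%b_c" with '#' as the escape character, the escaped "%" is treated as literal text and the scan stops at the "_" at index 4.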

Example 5 with SliceUtf8.getCodePointAt

use of io.airlift.slice.SliceUtf8.getCodePointAt in project urban-eureka by errir503.

the class StringFunctions method castToCodePoints.

private static int[] castToCodePoints(Slice slice) {
    int[] codePoints = new int[safeCountCodePoints(slice)];
    int position = 0;
    for (int index = 0; index < codePoints.length; index++) {
        codePoints[index] = getCodePointAt(slice, position);
        position += lengthOfCodePoint(slice, position);
    }
    return codePoints;
}
Also used : Constraint(com.facebook.presto.type.Constraint) SliceUtf8.lengthOfCodePoint(io.airlift.slice.SliceUtf8.lengthOfCodePoint) SliceUtf8.offsetOfCodePoint(io.airlift.slice.SliceUtf8.offsetOfCodePoint)
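To show what the getCodePointAt / lengthOfCodePoint pair in castToCodePoints is doing at the byte level, here is a minimal sketch that decodes UTF-8 by hand using only the JDK. Utf8CodePointsSketch and its methods are hypothetical stand-ins written for this example, they assume well-formed UTF-8 input, and they are not the airlift implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical plain-JDK stand-in for the decode loop; assumes well-formed UTF-8.
final class Utf8CodePointsSketch {
    // Byte length of the UTF-8 sequence that starts with the given lead byte.
    static int lengthOfCodePoint(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        throw new IllegalArgumentException("Not a UTF-8 lead byte: " + b);
    }

    // Decodes the code point whose encoding starts at the given byte position.
    static int codePointAt(byte[] utf8, int position) {
        int length = lengthOfCodePoint(utf8[position]);
        if (length == 1) {
            return utf8[position];
        }
        // Keep the payload bits of the lead byte, then append 6 bits per continuation byte.
        int codePoint = utf8[position] & (0xFF >> (length + 1));
        for (int i = 1; i < length; i++) {
            codePoint = (codePoint << 6) | (utf8[position + i] & 0x3F);
        }
        return codePoint;
    }

    // Same shape as castToCodePoints: read a code point, then advance by its byte length.
    static int[] toCodePoints(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int[] codePoints = new int[text.codePointCount(0, text.length())];
        int position = 0;
        for (int index = 0; index < codePoints.length; index++) {
            codePoints[index] = codePointAt(utf8, position);
            position += lengthOfCodePoint(utf8[position]);
        }
        return codePoints;
    }
}
```

The position variable counts bytes while index counts code points, which is why the two advance at different rates for non-ASCII text.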

Aggregations

SliceUtf8.lengthOfCodePoint (io.airlift.slice.SliceUtf8.lengthOfCodePoint): 24
Slice (io.airlift.slice.Slice): 12
SliceUtf8.offsetOfCodePoint (io.airlift.slice.SliceUtf8.offsetOfCodePoint): 12
Constraint (com.facebook.presto.type.Constraint): 6
NonStrictUTF8Encoding (io.airlift.jcodings.specific.NonStrictUTF8Encoding): 4
Option (io.airlift.joni.Option): 4
Regex (io.airlift.joni.Regex): 4
Syntax (io.airlift.joni.Syntax): 4
INEFFECTIVE_META_CHAR (io.airlift.joni.constants.MetaChar.INEFFECTIVE_META_CHAR): 4
OP_ASTERISK_ZERO_INF (io.airlift.joni.constants.SyntaxProperties.OP_ASTERISK_ZERO_INF): 4
OP_DOT_ANYCHAR (io.airlift.joni.constants.SyntaxProperties.OP_DOT_ANYCHAR): 4
OP_LINE_ANCHOR (io.airlift.joni.constants.SyntaxProperties.OP_LINE_ANCHOR): 4
DynamicSliceOutput (io.airlift.slice.DynamicSliceOutput): 4
InvalidUtf8Exception (io.airlift.slice.InvalidUtf8Exception): 4
SliceUtf8.getCodePointAt (io.airlift.slice.SliceUtf8.getCodePointAt): 4
Slices.utf8Slice (io.airlift.slice.Slices.utf8Slice): 4
UTF_8 (java.nio.charset.StandardCharsets.UTF_8): 4
Optional (java.util.Optional): 4
OptionalInt (java.util.OptionalInt): 4
PrestoException (io.prestosql.spi.PrestoException): 3