Example 1 with OrcDataSourceId

Use of io.prestosql.orc.OrcDataSourceId in project hetu-core by openlookeng.

From class TestDecimalStream, method testReadToEdgeOfChunkShort.

@Test
public void testReadToEdgeOfChunkShort() throws IOException {
    OrcChunkLoader loader = new TestingChunkLoader(
            new OrcDataSourceId("read to edge of chunk short"),
            ImmutableList.of(
                    encodeValues(ImmutableList.of(BigInteger.valueOf(Long.MAX_VALUE))),
                    encodeValues(ImmutableList.of(BigInteger.valueOf(Long.MAX_VALUE)))));
    DecimalInputStream stream = new DecimalInputStream(loader);
    // the first read consumes the whole first chunk, so the second read must
    // advance the loader across the chunk boundary
    assertEquals(nextShortDecimalValue(stream), Long.MAX_VALUE);
    assertEquals(nextShortDecimalValue(stream), Long.MAX_VALUE);
}
Also used: OrcDataSourceId (io.prestosql.orc.OrcDataSourceId), Test (org.testng.annotations.Test)
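
Across these tests the OrcDataSourceId argument is a free-form label: it identifies the data source in error messages and carries no other semantics, which is why the tests can pass plain descriptions. A minimal sketch of that contract, assuming the class's standard value-object behavior (equality and toString delegate to the id string):

import io.prestosql.orc.OrcDataSourceId;

public class OrcDataSourceIdSketch {
    public static void main(String[] args) {
        // Equal labels produce equal ids; the label itself is what shows up
        // verbatim in ORC diagnostics.
        OrcDataSourceId a = new OrcDataSourceId("read to edge of chunk short");
        OrcDataSourceId b = new OrcDataSourceId("read to edge of chunk short");
        System.out.println(a.equals(b)); // expected: true
        System.out.println(a);           // expected: read to edge of chunk short
    }
}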

Example 2 with OrcDataSourceId

Use of io.prestosql.orc.OrcDataSourceId in project hetu-core by openlookeng.

From class TestDecimalStream, method testSkipToEdgeOfChunkShort.

@Test
public void testSkipToEdgeOfChunkShort() throws IOException {
    OrcChunkLoader loader = new TestingChunkLoader(
            new OrcDataSourceId("skip to edge of chunk short"),
            ImmutableList.of(
                    encodeValues(ImmutableList.of(BigInteger.valueOf(Long.MAX_VALUE))),
                    encodeValues(ImmutableList.of(BigInteger.valueOf(Long.MAX_VALUE)))));
    DecimalInputStream stream = new DecimalInputStream(loader);
    // skipping the only value in the first chunk leaves the stream at the
    // chunk boundary; the read must then pull from the second chunk
    stream.skip(1);
    assertEquals(nextShortDecimalValue(stream), Long.MAX_VALUE);
}
Also used: OrcDataSourceId (io.prestosql.orc.OrcDataSourceId), Test (org.testng.annotations.Test)

Example 3 with OrcDataSourceId

Use of io.prestosql.orc.OrcDataSourceId in project hetu-core by openlookeng.

From class TestLongDecode, method assertVIntRoundTrip.

private static void assertVIntRoundTrip(SliceOutput output, long value, boolean signed) throws IOException {
    // write using Hive's code
    output.reset();
    if (signed) {
        writeVslong(output, value);
    } else {
        writeVulong(output, value);
    }
    Slice hiveBytes = Slices.copyOf(output.slice());
    // write using Presto's code, and verify they are the same
    output.reset();
    writeVLong(output, value, signed);
    Slice prestoBytes = Slices.copyOf(output.slice());
    // quick equality check first; assertEquals runs only on mismatch, to
    // produce a detailed failure message
    if (!prestoBytes.equals(hiveBytes)) {
        assertEquals(prestoBytes, hiveBytes);
    }
    // read using Hive's code
    if (signed) {
        long readValueOld = readVslong(hiveBytes.getInput());
        assertEquals(readValueOld, value);
    } else {
        long readValueOld = readVulong(hiveBytes.getInput());
        assertEquals(readValueOld, value);
    }
    // read using Presto's code
    long readValueNew = readVInt(signed, new OrcInputStream(OrcChunkLoader.create(
            new OrcDataSourceId("test"),
            hiveBytes,
            Optional.empty(),
            newSimpleAggregatedMemoryContext())));
    assertEquals(readValueNew, value);
}
Also used: OrcDataSourceId (io.prestosql.orc.OrcDataSourceId), Slice (io.airlift.slice.Slice)
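
For context, the round trip above pins down the ORC vint format that Hive's writeVslong/writeVulong and Presto's writeVLong must agree on: unsigned values are little-endian base-128 varints, and signed values are zigzag-mapped before encoding. A hedged sketch of the writer side (the helper names are illustrative, not the project's own):

import io.airlift.slice.SliceOutput;

final class VarintSketch {
    // Emit the low 7 bits per byte; the high bit marks that more bytes follow.
    static void writeUnsignedVarint(SliceOutput output, long value) {
        while ((value & ~0x7fL) != 0) {
            output.writeByte((int) (value & 0x7f) | 0x80);
            value >>>= 7;
        }
        output.writeByte((int) value);
    }

    // Zigzag-map signed values (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...) so small
    // magnitudes encode in few bytes, then reuse the unsigned encoding.
    static void writeSignedVarint(SliceOutput output, long value) {
        writeUnsignedVarint(output, (value << 1) ^ (value >> 63));
    }
}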

Example 4 with OrcDataSourceId

Use of io.prestosql.orc.OrcDataSourceId in project hetu-core by openlookeng.

From class OrcFileWriterFactory, method createFileWriter.

@Override
public Optional<HiveFileWriter> createFileWriter(
        Path path,
        List<String> inputColumnNames,
        StorageFormat storageFormat,
        Properties schema,
        JobConf configuration,
        ConnectorSession session,
        Optional<AcidOutputFormat.Options> acidOptions,
        Optional<HiveACIDWriteType> acidWriteType) {
    if (!OrcOutputFormat.class.getName().equals(storageFormat.getOutputFormat())) {
        return Optional.empty();
    }
    CompressionKind compression = getCompression(schema, configuration);
    // existing tables and partitions may have columns in a different order than the writer is providing, so build
    // an index to rearrange columns in the proper order
    List<String> fileColumnNames = getColumnNames(schema);
    List<Type> fileColumnTypes = getColumnTypes(schema).stream().map(hiveType -> hiveType.getType(typeManager)).collect(toList());
    List<Type> dataFileColumnTypes = fileColumnTypes;
    int[] fileInputColumnIndexes = fileColumnNames.stream().mapToInt(inputColumnNames::indexOf).toArray();
    Optional<HiveFileWriter> deleteDeltaWriter = Optional.empty();
    if (AcidUtils.isTablePropertyTransactional(schema) && !AcidUtils.isInsertOnlyTable(schema)) {
        ImmutableList<String> orcFileColumnNames = ImmutableList.of(
                OrcPageSourceFactory.ACID_COLUMN_OPERATION,
                OrcPageSourceFactory.ACID_COLUMN_ORIGINAL_TRANSACTION,
                OrcPageSourceFactory.ACID_COLUMN_BUCKET,
                OrcPageSourceFactory.ACID_COLUMN_ROW_ID,
                OrcPageSourceFactory.ACID_COLUMN_CURRENT_TRANSACTION,
                OrcPageSourceFactory.ACID_COLUMN_ROW_STRUCT);
        ImmutableList.Builder<RowType.Field> fieldsBuilder = ImmutableList.builder();
        for (int i = 0; i < fileColumnNames.size(); i++) {
            fieldsBuilder.add(new RowType.Field(Optional.of(fileColumnNames.get(i)), fileColumnTypes.get(i)));
        }
        ImmutableList<Type> orcFileColumnTypes = ImmutableList.of(INTEGER, BIGINT, INTEGER, BIGINT, BIGINT, RowType.from(fieldsBuilder.build()));
        fileColumnNames = orcFileColumnNames;
        fileColumnTypes = orcFileColumnTypes;
        if (acidWriteType.isPresent() && acidWriteType.get() == HiveACIDWriteType.UPDATE) {
            AcidOutputFormat.Options deleteOptions = acidOptions.get().clone().writingDeleteDelta(true);
            Path deleteDeltaPath = AcidUtils.createFilename(path.getParent().getParent(), deleteOptions);
            deleteDeltaWriter = createFileWriter(deleteDeltaPath, inputColumnNames, storageFormat, schema, configuration, session, Optional.of(deleteOptions), Optional.of(HiveACIDWriteType.DELETE));
        }
    }
    try {
        FileSystem fileSystem = hdfsEnvironment.getFileSystem(session.getUser(), path, configuration);
        OrcDataSink orcDataSink = createOrcDataSink(session, fileSystem, path);
        Optional<Supplier<OrcDataSource>> validationInputFactory = Optional.empty();
        if (HiveSessionProperties.isOrcOptimizedWriterValidate(session)) {
            validationInputFactory = Optional.of(() -> {
                try {
                    FileStatus fileStatus = fileSystem.getFileStatus(path);
                    return new HdfsOrcDataSource(
                            new OrcDataSourceId(path.toString()),
                            fileStatus.getLen(),
                            HiveSessionProperties.getOrcMaxMergeDistance(session),
                            HiveSessionProperties.getOrcMaxBufferSize(session),
                            HiveSessionProperties.getOrcStreamBufferSize(session),
                            false,
                            fileSystem.open(path),
                            readStats,
                            fileStatus.getModificationTime());
                } catch (IOException e) {
                    throw new PrestoException(HiveErrorCode.HIVE_WRITE_VALIDATION_FAILED, e);
                }
            });
        }
        Callable<Void> rollbackAction = () -> {
            fileSystem.delete(path, false);
            return null;
        };
        return Optional.of(new OrcFileWriter(
                orcDataSink,
                rollbackAction,
                fileColumnNames,
                fileColumnTypes,
                dataFileColumnTypes,
                compression,
                orcWriterOptions
                        .withStripeMinSize(HiveSessionProperties.getOrcOptimizedWriterMinStripeSize(session))
                        .withStripeMaxSize(HiveSessionProperties.getOrcOptimizedWriterMaxStripeSize(session))
                        .withStripeMaxRowCount(HiveSessionProperties.getOrcOptimizedWriterMaxStripeRows(session))
                        .withDictionaryMaxMemory(HiveSessionProperties.getOrcOptimizedWriterMaxDictionaryMemory(session))
                        .withMaxStringStatisticsLimit(HiveSessionProperties.getOrcStringStatisticsLimit(session)),
                writeLegacyVersion,
                fileInputColumnIndexes,
                ImmutableMap.<String, String>builder()
                        .put(HiveMetadata.PRESTO_VERSION_NAME, nodeVersion.toString())
                        .put(HiveMetadata.PRESTO_QUERY_ID_NAME, session.getQueryId())
                        .put("hive.acid.version", String.valueOf(AcidUtils.OrcAcidVersion.ORC_ACID_VERSION))
                        .build(),
                validationInputFactory,
                HiveSessionProperties.getOrcOptimizedWriterValidateMode(session),
                stats,
                acidOptions,
                acidWriteType,
                deleteDeltaWriter,
                path));
    } catch (IOException e) {
        throw new PrestoException(HiveErrorCode.HIVE_WRITER_OPEN_ERROR, "Error creating ORC file", e);
    }
}
Also used: StorageFormat (io.prestosql.plugin.hive.metastore.StorageFormat), FileSystem (org.apache.hadoop.fs.FileSystem), Flatten (org.weakref.jmx.Flatten), HiveUtil.getColumnTypes (io.prestosql.plugin.hive.HiveUtil.getColumnTypes), Callable (java.util.concurrent.Callable), INTEGER (io.prestosql.spi.type.IntegerType.INTEGER), FileStatus (org.apache.hadoop.fs.FileStatus), Supplier (java.util.function.Supplier), Inject (javax.inject.Inject), ImmutableList (com.google.common.collect.ImmutableList), ConnectorSession (io.prestosql.spi.connector.ConnectorSession), Managed (org.weakref.jmx.Managed), OrcDataSink (io.prestosql.orc.OrcDataSink), OrcConf (org.apache.orc.OrcConf), HiveUtil.getColumnNames (io.prestosql.plugin.hive.HiveUtil.getColumnNames), Objects.requireNonNull (java.util.Objects.requireNonNull), RowType (io.prestosql.spi.type.RowType), OrcPageSourceFactory (io.prestosql.plugin.hive.orc.OrcPageSourceFactory), Path (org.apache.hadoop.fs.Path), Type (io.prestosql.spi.type.Type), BIGINT (io.prestosql.spi.type.BigintType.BIGINT), ENGLISH (java.util.Locale.ENGLISH), PrestoException (io.prestosql.spi.PrestoException), Properties (java.util.Properties), ImmutableMap (com.google.common.collect.ImmutableMap), TypeManager (io.prestosql.spi.type.TypeManager), AcidOutputFormat (org.apache.hadoop.hive.ql.io.AcidOutputFormat), OrcWriterStats (io.prestosql.orc.OrcWriterStats), IOException (java.io.IOException), OrcDataSource (io.prestosql.orc.OrcDataSource), OrcWriterOptions (io.prestosql.orc.OrcWriterOptions), JobConf (org.apache.hadoop.mapred.JobConf), List (java.util.List), Collectors.toList (java.util.stream.Collectors.toList), OutputStreamOrcDataSink (io.prestosql.orc.OutputStreamOrcDataSink), CompressionKind (io.prestosql.orc.metadata.CompressionKind), HdfsOrcDataSource (io.prestosql.plugin.hive.orc.HdfsOrcDataSource), Optional (java.util.Optional), OrcOutputFormat (org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat), AcidUtils (org.apache.hadoop.hive.ql.io.AcidUtils), OrcDataSourceId (io.prestosql.orc.OrcDataSourceId)
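
One detail worth isolating from the writer factory above is the column-reordering index: for each column in the file's declared order, it records the position at which the writer's input supplies that column, or -1 if the input omits it. A self-contained sketch with hypothetical column names:

import java.util.Arrays;
import java.util.List;

final class ColumnIndexSketch {
    public static void main(String[] args) {
        // The file declares columns in one order; the writer provides another.
        List<String> fileColumnNames = List.of("b", "a", "c");
        List<String> inputColumnNames = List.of("a", "b");
        // Same construction as fileInputColumnIndexes in the method above.
        int[] fileInputColumnIndexes = fileColumnNames.stream()
                .mapToInt(inputColumnNames::indexOf)
                .toArray();
        // Prints [1, 0, -1]: "b" is input column 1, "a" is input column 0,
        // and "c" is absent from the input.
        System.out.println(Arrays.toString(fileInputColumnIndexes));
    }
}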

Example 5 with OrcDataSourceId

Use of io.prestosql.orc.OrcDataSourceId in project hetu-core by openlookeng.

From class TestDecimalStream, method testSkipToEdgeOfChunkLong.

@Test
public void testSkipToEdgeOfChunkLong() throws IOException {
    OrcChunkLoader loader = new TestingChunkLoader(
            new OrcDataSourceId("skip to edge of chunk long"),
            ImmutableList.of(
                    encodeValues(ImmutableList.of(BigInteger.valueOf(Long.MAX_VALUE))),
                    encodeValues(ImmutableList.of(BigInteger.valueOf(Long.MAX_VALUE)))));
    DecimalInputStream stream = new DecimalInputStream(loader);
    // skipping the only value in the first chunk leaves the stream at the
    // chunk boundary; the long-decimal read must then pull from the second chunk
    stream.skip(1);
    assertEquals(nextLongDecimalValue(stream), BigInteger.valueOf(Long.MAX_VALUE));
}
Also used: OrcDataSourceId (io.prestosql.orc.OrcDataSourceId), Test (org.testng.annotations.Test)

Aggregations

OrcDataSourceId (io.prestosql.orc.OrcDataSourceId): 9
IOException (java.io.IOException): 4
Path (org.apache.hadoop.fs.Path): 4
Test (org.testng.annotations.Test): 4
ImmutableList (com.google.common.collect.ImmutableList): 3
ImmutableMap (com.google.common.collect.ImmutableMap): 3
DataSize (io.airlift.units.DataSize): 3
OrcDataSource (io.prestosql.orc.OrcDataSource): 3
HdfsOrcDataSource (io.prestosql.plugin.hive.orc.HdfsOrcDataSource): 3
PrestoException (io.prestosql.spi.PrestoException): 3
ConnectorSession (io.prestosql.spi.connector.ConnectorSession): 3
Type (io.prestosql.spi.type.Type): 3
TypeManager (io.prestosql.spi.type.TypeManager): 3
ArrayList (java.util.ArrayList): 3
List (java.util.List): 3
ENGLISH (java.util.Locale.ENGLISH): 3
Objects.requireNonNull (java.util.Objects.requireNonNull): 3
Optional (java.util.Optional): 3
Properties (java.util.Properties): 3
FileStatus (org.apache.hadoop.fs.FileStatus): 3