Example 1 with HdfsContext

Use of io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext in project hetu-core by openlookeng.

From the class SyncPartitionMetadataProcedure, method doSyncPartitionMetadata.

private void doSyncPartitionMetadata(ConnectorSession session, String schemaName, String tableName, String mode) {
    SyncMode syncMode = toSyncMode(mode);
    HdfsContext hdfsContext = new HdfsContext(session, schemaName, tableName);
    HiveIdentity identity = new HiveIdentity(session);
    SemiTransactionalHiveMetastore metastore = ((HiveMetadata) hiveMetadataFactory.get()).getMetastore();
    SchemaTableName schemaTableName = new SchemaTableName(schemaName, tableName);
    Table table = metastore.getTable(identity, schemaName, tableName).orElseThrow(() -> new TableNotFoundException(schemaTableName));
    if (table.getPartitionColumns().isEmpty()) {
        throw new PrestoException(INVALID_PROCEDURE_ARGUMENT, "Table is not partitioned: " + schemaTableName);
    }
    Path tableLocation = new Path(table.getStorage().getLocation());
    Set<String> partitionsToAdd;
    Set<String> partitionsToDrop;
    try {
        FileSystem fileSystem = hdfsEnvironment.getFileSystem(hdfsContext, tableLocation);
        List<String> partitionsInMetastore = metastore.getPartitionNames(identity, schemaName, tableName).orElseThrow(() -> new TableNotFoundException(schemaTableName));
        List<String> partitionsInFileSystem = listDirectory(fileSystem, fileSystem.getFileStatus(tableLocation), table.getPartitionColumns(), table.getPartitionColumns().size()).stream().map(fileStatus -> fileStatus.getPath().toUri()).map(uri -> tableLocation.toUri().relativize(uri).getPath()).collect(toImmutableList());
        // partitions in file system but not in metastore
        partitionsToAdd = difference(partitionsInFileSystem, partitionsInMetastore);
        // partitions in metastore but not in file system
        partitionsToDrop = difference(partitionsInMetastore, partitionsInFileSystem);
    } catch (IOException e) {
        throw new PrestoException(HIVE_FILESYSTEM_ERROR, e);
    }
    syncPartitions(partitionsToAdd, partitionsToDrop, syncMode, metastore, session, table);
}
Also used : Path(org.apache.hadoop.fs.Path) MethodHandle(java.lang.invoke.MethodHandle) Partition(io.prestosql.plugin.hive.metastore.Partition) Provider(javax.inject.Provider) FileSystem(org.apache.hadoop.fs.FileSystem) HIVE_FILESYSTEM_ERROR(io.prestosql.plugin.hive.HiveErrorCode.HIVE_FILESYSTEM_ERROR) HdfsContext(io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext) Argument(io.prestosql.spi.procedure.Procedure.Argument) PRESTO_QUERY_ID_NAME(io.prestosql.plugin.hive.HiveMetadata.PRESTO_QUERY_ID_NAME) FileStatus(org.apache.hadoop.fs.FileStatus) Supplier(java.util.function.Supplier) Inject(javax.inject.Inject) HashSet(java.util.HashSet) SchemaTableName(io.prestosql.spi.connector.SchemaTableName) Procedure(io.prestosql.spi.procedure.Procedure) ImmutableList(com.google.common.collect.ImmutableList) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TableNotFoundException(io.prestosql.spi.connector.TableNotFoundException) HivePartitionManager.extractPartitionValues(io.prestosql.plugin.hive.HivePartitionManager.extractPartitionValues) Objects.requireNonNull(java.util.Objects.requireNonNull) INVALID_PROCEDURE_ARGUMENT(io.prestosql.spi.StandardErrorCode.INVALID_PROCEDURE_ARGUMENT) SemiTransactionalHiveMetastore(io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore) ENGLISH(java.util.Locale.ENGLISH) VARCHAR(io.prestosql.spi.type.StandardTypes.VARCHAR) HiveIdentity(io.prestosql.plugin.hive.authentication.HiveIdentity) PrestoException(io.prestosql.spi.PrestoException) ImmutableMap(com.google.common.collect.ImmutableMap) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) Set(java.util.Set) IOException(java.io.IOException) ThreadContextClassLoader(io.prestosql.spi.classloader.ThreadContextClassLoader) MethodHandleUtil.methodHandle(io.prestosql.spi.block.MethodHandleUtil.methodHandle) Sets(com.google.common.collect.Sets) List(java.util.List) Stream(java.util.stream.Stream) Table(io.prestosql.plugin.hive.metastore.Table) Column(io.prestosql.plugin.hive.metastore.Column)
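
The core of doSyncPartitionMetadata is a two-way set difference between partition names found on the file system and those registered in the metastore, after the directory URIs have been made relative to the table location. The following is a minimal standalone sketch of that idea using only the JDK. The class name PartitionSyncSketch, the helper relativeNames, the body of difference, and the sample URIs are all assumptions made for illustration; they are not taken from hetu-core.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class PartitionSyncSketch {
    // Elements of 'left' that are absent from 'right'.
    // Hypothetical stand-in for the difference(...) helper called in the example above.
    static Set<String> difference(List<String> left, List<String> right) {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    // Convert absolute partition directory URIs into names relative to the table location.
    static List<String> relativeNames(URI tableLocation, List<URI> partitionDirs) {
        return partitionDirs.stream()
                .map(uri -> tableLocation.relativize(uri).getPath())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data, invented for this sketch.
        URI tableLocation = URI.create("hdfs://nn/warehouse/t/");
        List<URI> directories = List.of(
                URI.create("hdfs://nn/warehouse/t/ds=2020-01-01"),
                URI.create("hdfs://nn/warehouse/t/ds=2020-01-02"));
        List<String> inFileSystem = relativeNames(tableLocation, directories);
        List<String> inMetastore = List.of("ds=2020-01-02", "ds=2020-01-03");

        // On the file system but not in the metastore: candidates to add.
        System.out.println("add:  " + difference(inFileSystem, inMetastore));  // add:  [ds=2020-01-01]
        // In the metastore but not on the file system: candidates to drop.
        System.out.println("drop: " + difference(inMetastore, inFileSystem));  // drop: [ds=2020-01-03]
    }
}
```

Which of the two sets is actually applied depends on the sync mode passed to the procedure; the sketch only shows how the candidate sets could be computed.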

Example 2 with HdfsContext

Use of io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext in project hetu-core by openlookeng.

From the class AbstractTestHive, method doInsertIntoNewPartition.

private void doInsertIntoNewPartition(HiveStorageFormat storageFormat, SchemaTableName tableName) throws Exception {
    // creating the table
    doCreateEmptyTable(tableName, storageFormat, CREATE_TABLE_COLUMNS_PARTITIONED);
    // insert the data
    String queryId = insertData(tableName, CREATE_TABLE_PARTITIONED_DATA);
    Set<String> existingFiles;
    try (Transaction transaction = newTransaction()) {
        // verify partitions were created
        HiveIdentity identity = new HiveIdentity(newSession());
        List<String> partitionNames = transaction.getMetastore(tableName.getSchemaName()).getPartitionNames(identity, tableName.getSchemaName(), tableName.getTableName()).orElseThrow(() -> new AssertionError("Table does not exist: " + tableName));
        assertEqualsIgnoreOrder(partitionNames, CREATE_TABLE_PARTITIONED_DATA.getMaterializedRows().stream().map(row -> "ds=" + row.getField(CREATE_TABLE_PARTITIONED_DATA.getTypes().size() - 1)).collect(toList()));
        // verify the node versions in partitions
        Map<String, Optional<Partition>> partitions = getMetastoreClient().getPartitionsByNames(identity, tableName.getSchemaName(), tableName.getTableName(), partitionNames);
        assertEquals(partitions.size(), partitionNames.size());
        for (String partitionName : partitionNames) {
            Partition partition = partitions.get(partitionName).get();
            assertEquals(partition.getParameters().get(PRESTO_VERSION_NAME), TEST_SERVER_VERSION);
            assertEquals(partition.getParameters().get(PRESTO_QUERY_ID_NAME), queryId);
        }
        // load the new table
        ConnectorSession session = newSession();
        ConnectorMetadata metadata = transaction.getMetadata();
        metadata.beginQuery(session);
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
        // verify the data
        MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.empty(), Optional.of(storageFormat));
        assertEqualsIgnoreOrder(result.getMaterializedRows(), CREATE_TABLE_PARTITIONED_DATA.getMaterializedRows());
        // record the existing data files for the rollback check below
        existingFiles = listAllDataFiles(transaction, tableName.getSchemaName(), tableName.getTableName());
        assertFalse(existingFiles.isEmpty());
        // test statistics
        for (String partitionName : partitionNames) {
            HiveBasicStatistics partitionStatistics = getBasicStatisticsForPartition(session, transaction, tableName, partitionName);
            assertEquals(partitionStatistics.getRowCount().getAsLong(), 1L);
            assertEquals(partitionStatistics.getFileCount().getAsLong(), 1L);
            assertGreaterThan(partitionStatistics.getInMemoryDataSizeInBytes().getAsLong(), 0L);
            assertGreaterThan(partitionStatistics.getOnDiskDataSizeInBytes().getAsLong(), 0L);
        }
    }
    Path stagingPathRoot;
    try (Transaction transaction = newTransaction()) {
        ConnectorSession session = newSession();
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        metadata.beginQuery(session);
        // "stage" insert data
        ConnectorInsertTableHandle insertTableHandle = metadata.beginInsert(session, tableHandle);
        stagingPathRoot = getStagingPathRoot(insertTableHandle);
        ConnectorPageSink sink = pageSinkProvider.createPageSink(transaction.getTransactionHandle(), session, insertTableHandle);
        sink.appendPage(CREATE_TABLE_PARTITIONED_DATA_2ND.toPage());
        Collection<Slice> fragments = getFutureValue(sink.finish());
        metadata.finishInsert(session, insertTableHandle, fragments, ImmutableList.of());
        // verify all temp files start with the unique prefix
        HdfsContext context = new HdfsContext(session, tableName.getSchemaName(), tableName.getTableName());
        Set<String> tempFiles = listAllDataFiles(context, stagingPathRoot);
        assertFalse(tempFiles.isEmpty());
        for (String filePath : tempFiles) {
            assertThat(new Path(filePath).getName()).startsWith(session.getQueryId());
        }
        // rollback insert
        transaction.rollback();
    }
    // verify the data is unchanged
    try (Transaction transaction = newTransaction()) {
        ConnectorSession session = newSession();
        ConnectorMetadata metadata = transaction.getMetadata();
        metadata.beginQuery(session);
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
        MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.empty(), Optional.empty());
        assertEqualsIgnoreOrder(result.getMaterializedRows(), CREATE_TABLE_PARTITIONED_DATA.getMaterializedRows());
        // verify we did not modify the table directory
        assertEquals(listAllDataFiles(transaction, tableName.getSchemaName(), tableName.getTableName()), existingFiles);
        // verify temp directory is empty
        HdfsContext context = new HdfsContext(session, tableName.getSchemaName(), tableName.getTableName());
        assertTrue(listAllDataFiles(context, stagingPathRoot).isEmpty());
    }
}
Also used : Path(org.apache.hadoop.fs.Path) Partition(io.prestosql.plugin.hive.metastore.Partition) HiveColumnHandle.bucketColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle.bucketColumnHandle) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) Optional(java.util.Optional) ConnectorInsertTableHandle(io.prestosql.spi.connector.ConnectorInsertTableHandle) HiveIdentity(io.prestosql.plugin.hive.authentication.HiveIdentity) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) Slices.utf8Slice(io.airlift.slice.Slices.utf8Slice) Slice(io.airlift.slice.Slice) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TestingConnectorSession(io.prestosql.testing.TestingConnectorSession) ConnectorMetadata(io.prestosql.spi.connector.ConnectorMetadata) HdfsContext(io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext) MaterializedResult(io.prestosql.testing.MaterializedResult) ConnectorPageSink(io.prestosql.spi.connector.ConnectorPageSink)
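
The rollback tests check that every staged file name starts with the query ID before the transaction is rolled back. Below is a minimal sketch of that check against the local file system using java.nio.file instead of HDFS. The class name StagingPrefixSketch, the method allFilesHavePrefix, the query ID string, and the file names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class StagingPrefixSketch {
    // True if every regular file under 'root' (searched recursively) has a name starting with 'queryId'.
    static boolean allFilesHavePrefix(Path root, String queryId) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .allMatch(file -> file.getFileName().toString().startsWith(queryId));
        }
    }

    public static void main(String[] args) throws IOException {
        // A local temporary directory stands in for the staging path root (assumption).
        Path staging = Files.createTempDirectory("staging");
        String queryId = "20200101_000000_00001_abcde"; // invented query ID

        // Simulate two staged data files, one of them inside a partition directory.
        Files.createFile(staging.resolve(queryId + "_part-0"));
        Path partitionDir = Files.createDirectories(staging.resolve("ds=2020-01-01"));
        Files.createFile(partitionDir.resolve(queryId + "_part-1"));

        System.out.println(allFilesHavePrefix(staging, queryId)); // true
    }
}
```

Note that allMatch is vacuously true for an empty directory, which is one reason the tests assert that the set of temp files is non-empty before checking the prefix.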

Example 3 with HdfsContext

Use of io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext in project hetu-core by openlookeng.

From the class AbstractTestHive, method doInsertIntoExistingPartition.

private void doInsertIntoExistingPartition(HiveStorageFormat storageFormat, SchemaTableName tableName) throws Exception {
    // creating the table
    doCreateEmptyTable(tableName, storageFormat, CREATE_TABLE_COLUMNS_PARTITIONED);
    MaterializedResult.Builder resultBuilder = MaterializedResult.resultBuilder(SESSION, CREATE_TABLE_PARTITIONED_DATA.getTypes());
    for (int i = 0; i < 3; i++) {
        // insert the data
        insertData(tableName, CREATE_TABLE_PARTITIONED_DATA);
        try (Transaction transaction = newTransaction()) {
            ConnectorSession session = newSession();
            ConnectorMetadata metadata = transaction.getMetadata();
            metadata.beginQuery(session);
            ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
            // verify partitions were created
            List<String> partitionNames = transaction.getMetastore(tableName.getSchemaName()).getPartitionNames(new HiveIdentity(session), tableName.getSchemaName(), tableName.getTableName()).orElseThrow(() -> new AssertionError("Table does not exist: " + tableName));
            assertEqualsIgnoreOrder(partitionNames, CREATE_TABLE_PARTITIONED_DATA.getMaterializedRows().stream().map(row -> "ds=" + row.getField(CREATE_TABLE_PARTITIONED_DATA.getTypes().size() - 1)).collect(toList()));
            // load the new table
            List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
            // verify the data
            resultBuilder.rows(CREATE_TABLE_PARTITIONED_DATA.getMaterializedRows());
            MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.empty(), Optional.of(storageFormat));
            assertEqualsIgnoreOrder(result.getMaterializedRows(), resultBuilder.build().getMaterializedRows());
            // test statistics
            for (String partitionName : partitionNames) {
                HiveBasicStatistics statistics = getBasicStatisticsForPartition(session, transaction, tableName, partitionName);
                assertEquals(statistics.getRowCount().getAsLong(), i + 1L);
                assertEquals(statistics.getFileCount().getAsLong(), i + 1L);
                assertGreaterThan(statistics.getInMemoryDataSizeInBytes().getAsLong(), 0L);
                assertGreaterThan(statistics.getOnDiskDataSizeInBytes().getAsLong(), 0L);
            }
        }
    }
    // test rollback
    Set<String> existingFiles;
    Path stagingPathRoot;
    try (Transaction transaction = newTransaction()) {
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorSession session = newSession();
        existingFiles = listAllDataFiles(transaction, tableName.getSchemaName(), tableName.getTableName());
        assertFalse(existingFiles.isEmpty());
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        metadata.beginQuery(session);
        // "stage" insert data
        ConnectorInsertTableHandle insertTableHandle = metadata.beginInsert(session, tableHandle);
        stagingPathRoot = getStagingPathRoot(insertTableHandle);
        ConnectorPageSink sink = pageSinkProvider.createPageSink(transaction.getTransactionHandle(), session, insertTableHandle);
        sink.appendPage(CREATE_TABLE_PARTITIONED_DATA.toPage());
        sink.appendPage(CREATE_TABLE_PARTITIONED_DATA.toPage());
        Collection<Slice> fragments = getFutureValue(sink.finish());
        metadata.finishInsert(session, insertTableHandle, fragments, ImmutableList.of());
        // verify all temp files start with the unique prefix
        HdfsContext context = new HdfsContext(session, tableName.getSchemaName(), tableName.getTableName());
        Set<String> tempFiles = listAllDataFiles(context, stagingPathRoot);
        assertFalse(tempFiles.isEmpty());
        for (String filePath : tempFiles) {
            assertThat(new Path(filePath).getName()).startsWith(session.getQueryId());
        }
        // verify statistics are visible from within the current transaction
        List<String> partitionNames = transaction.getMetastore(tableName.getSchemaName()).getPartitionNames(new HiveIdentity(session), tableName.getSchemaName(), tableName.getTableName()).orElseThrow(() -> new AssertionError("Table does not exist: " + tableName));
        for (String partitionName : partitionNames) {
            HiveBasicStatistics partitionStatistics = getBasicStatisticsForPartition(session, transaction, tableName, partitionName);
            assertEquals(partitionStatistics.getRowCount().getAsLong(), 5L);
        }
        // rollback insert
        transaction.rollback();
    }
    try (Transaction transaction = newTransaction()) {
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorSession session = newSession();
        metadata.beginQuery(session);
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
        // verify the data is unchanged
        MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.empty(), Optional.empty());
        assertEqualsIgnoreOrder(result.getMaterializedRows(), resultBuilder.build().getMaterializedRows());
        // verify we did not modify the table directory
        assertEquals(listAllDataFiles(transaction, tableName.getSchemaName(), tableName.getTableName()), existingFiles);
        // verify temp directory is empty
        HdfsContext hdfsContext = new HdfsContext(session, tableName.getSchemaName(), tableName.getTableName());
        assertTrue(listAllDataFiles(hdfsContext, stagingPathRoot).isEmpty());
        // verify statistics have been rolled back
        HiveIdentity identity = new HiveIdentity(session);
        List<String> partitionNames = transaction.getMetastore(tableName.getSchemaName()).getPartitionNames(identity, tableName.getSchemaName(), tableName.getTableName()).orElseThrow(() -> new AssertionError("Table does not exist: " + tableName));
        for (String partitionName : partitionNames) {
            HiveBasicStatistics partitionStatistics = getBasicStatisticsForPartition(session, transaction, tableName, partitionName);
            assertEquals(partitionStatistics.getRowCount().getAsLong(), 3L);
        }
    }
}
Also used : Path(org.apache.hadoop.fs.Path) HiveColumnHandle.bucketColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle.bucketColumnHandle) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) ConnectorInsertTableHandle(io.prestosql.spi.connector.ConnectorInsertTableHandle) Constraint(io.prestosql.spi.connector.Constraint) HiveIdentity(io.prestosql.plugin.hive.authentication.HiveIdentity) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) Slices.utf8Slice(io.airlift.slice.Slices.utf8Slice) Slice(io.airlift.slice.Slice) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TestingConnectorSession(io.prestosql.testing.TestingConnectorSession) ConnectorMetadata(io.prestosql.spi.connector.ConnectorMetadata) HdfsContext(io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext) MaterializedResult(io.prestosql.testing.MaterializedResult) ConnectorPageSink(io.prestosql.spi.connector.ConnectorPageSink)

Example 4 with HdfsContext

Use of io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext in project hetu-core by openlookeng.

From the class AbstractTestHive, method assertEmptyFile.

private void assertEmptyFile(HiveStorageFormat format) throws Exception {
    SchemaTableName tableName = temporaryTable("empty_file");
    try {
        List<Column> columns = ImmutableList.of(new Column("test", HIVE_STRING, Optional.empty()));
        createEmptyTable(tableName, format, columns, ImmutableList.of());
        try (Transaction transaction = newTransaction()) {
            ConnectorSession session = newSession();
            ConnectorMetadata metadata = transaction.getMetadata();
            metadata.beginQuery(session);
            ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
            List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
            Table table = transaction.getMetastore(tableName.getSchemaName()).getTable(new HiveIdentity(session), tableName.getSchemaName(), tableName.getTableName()).orElseThrow(AssertionError::new);
            // verify directory is empty
            HdfsContext context = new HdfsContext(session, tableName.getSchemaName(), tableName.getTableName());
            Path location = new Path(table.getStorage().getLocation());
            assertTrue(listDirectory(context, location).isEmpty());
            // read table with empty directory
            readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.of(0), Optional.of(ORC));
            // create empty file
            FileSystem fileSystem = hdfsEnvironment.getFileSystem(context, location);
            assertTrue(fileSystem.createNewFile(new Path(location, "empty-file")));
            assertEquals(listDirectory(context, location), ImmutableList.of("empty-file"));
            // read table with empty file
            MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.of(1), Optional.empty());
            assertEquals(result.getRowCount(), 0);
        }
    } finally {
        dropTable(tableName);
    }
}
Also used : Path(org.apache.hadoop.fs.Path) HiveColumnHandle.bucketColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle.bucketColumnHandle) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) Table(io.prestosql.plugin.hive.metastore.Table) SchemaTableName(io.prestosql.spi.connector.SchemaTableName) HiveIdentity(io.prestosql.plugin.hive.authentication.HiveIdentity) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) ViewColumn(io.prestosql.spi.connector.ConnectorViewDefinition.ViewColumn) Column(io.prestosql.plugin.hive.metastore.Column) SortingColumn(io.prestosql.plugin.hive.metastore.SortingColumn) FileSystem(org.apache.hadoop.fs.FileSystem) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TestingConnectorSession(io.prestosql.testing.TestingConnectorSession) ConnectorMetadata(io.prestosql.spi.connector.ConnectorMetadata) HdfsContext(io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext) MaterializedResult(io.prestosql.testing.MaterializedResult)

Example 5 with HdfsContext

Use of io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext in project hetu-core by openlookeng.

From the class AbstractTestHive, method doInsert.

private void doInsert(HiveStorageFormat storageFormat, SchemaTableName tableName) throws Exception {
    // creating the table
    doCreateEmptyTable(tableName, storageFormat, CREATE_TABLE_COLUMNS);
    MaterializedResult.Builder resultBuilder = MaterializedResult.resultBuilder(SESSION, CREATE_TABLE_DATA.getTypes());
    for (int i = 0; i < 3; i++) {
        insertData(tableName, CREATE_TABLE_DATA);
        try (Transaction transaction = newTransaction()) {
            ConnectorSession session = newSession();
            ConnectorMetadata metadata = transaction.getMetadata();
            metadata.beginQuery(session);
            // load the new table
            ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
            List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
            // verify the metadata
            ConnectorTableMetadata tableMetadata = metadata.getTableMetadata(session, getTableHandle(metadata, tableName));
            assertEquals(filterNonHiddenColumnMetadata(tableMetadata.getColumns()), CREATE_TABLE_COLUMNS);
            // verify the data
            resultBuilder.rows(CREATE_TABLE_DATA.getMaterializedRows());
            MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.empty(), Optional.empty());
            assertEqualsIgnoreOrder(result.getMaterializedRows(), resultBuilder.build().getMaterializedRows());
            // statistics
            HiveBasicStatistics tableStatistics = getBasicStatisticsForTable(session, transaction, tableName);
            assertEquals(tableStatistics.getRowCount().getAsLong(), CREATE_TABLE_DATA.getRowCount() * (i + 1));
            assertEquals(tableStatistics.getFileCount().getAsLong(), i + 1L);
            assertGreaterThan(tableStatistics.getInMemoryDataSizeInBytes().getAsLong(), 0L);
            assertGreaterThan(tableStatistics.getOnDiskDataSizeInBytes().getAsLong(), 0L);
        }
    }
    // test rollback
    Set<String> existingFiles;
    try (Transaction transaction = newTransaction()) {
        existingFiles = listAllDataFiles(transaction, tableName.getSchemaName(), tableName.getTableName());
        assertFalse(existingFiles.isEmpty());
    }
    Path stagingPathRoot;
    try (Transaction transaction = newTransaction()) {
        ConnectorSession session = newSession();
        ConnectorMetadata metadata = transaction.getMetadata();
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        // "stage" insert data
        metadata.beginQuery(session);
        ConnectorInsertTableHandle insertTableHandle = metadata.beginInsert(session, tableHandle);
        ConnectorPageSink sink = pageSinkProvider.createPageSink(transaction.getTransactionHandle(), session, insertTableHandle);
        sink.appendPage(CREATE_TABLE_DATA.toPage());
        sink.appendPage(CREATE_TABLE_DATA.toPage());
        Collection<Slice> fragments = getFutureValue(sink.finish());
        metadata.finishInsert(session, insertTableHandle, fragments, ImmutableList.of());
        // statistics, visible from within transaction
        HiveBasicStatistics tableStatistics = getBasicStatisticsForTable(session, transaction, tableName);
        assertEquals(tableStatistics.getRowCount().getAsLong(), CREATE_TABLE_DATA.getRowCount() * 5L);
        try (Transaction otherTransaction = newTransaction()) {
            // statistics, not visible from outside transaction
            HiveBasicStatistics otherTableStatistics = getBasicStatisticsForTable(session, otherTransaction, tableName);
            assertEquals(otherTableStatistics.getRowCount().getAsLong(), CREATE_TABLE_DATA.getRowCount() * 3L);
        }
        // verify all temp files start with the unique prefix
        stagingPathRoot = getStagingPathRoot(insertTableHandle);
        HdfsContext context = new HdfsContext(session, tableName.getSchemaName(), tableName.getTableName());
        Set<String> tempFiles = listAllDataFiles(context, stagingPathRoot);
        assertFalse(tempFiles.isEmpty());
        for (String filePath : tempFiles) {
            assertThat(new Path(filePath).getName()).startsWith(session.getQueryId());
        }
        // rollback insert
        transaction.rollback();
    }
    // verify temp directory is empty
    HdfsContext context = new HdfsContext(newSession(), tableName.getSchemaName(), tableName.getTableName());
    assertTrue(listAllDataFiles(context, stagingPathRoot).isEmpty());
    // verify the data is unchanged
    try (Transaction transaction = newTransaction()) {
        ConnectorSession session = newSession();
        ConnectorMetadata metadata = transaction.getMetadata();
        metadata.beginQuery(session);
        ConnectorTableHandle tableHandle = getTableHandle(metadata, tableName);
        List<ColumnHandle> columnHandles = filterNonHiddenColumnHandles(metadata.getColumnHandles(session, tableHandle).values());
        MaterializedResult result = readTable(transaction, tableHandle, columnHandles, session, TupleDomain.all(), OptionalInt.empty(), Optional.empty());
        assertEqualsIgnoreOrder(result.getMaterializedRows(), resultBuilder.build().getMaterializedRows());
        // verify we did not modify the table directory
        assertEquals(listAllDataFiles(transaction, tableName.getSchemaName(), tableName.getTableName()), existingFiles);
    }
    // verify statistics unchanged
    try (Transaction transaction = newTransaction()) {
        ConnectorSession session = newSession();
        HiveBasicStatistics statistics = getBasicStatisticsForTable(session, transaction, tableName);
        assertEquals(statistics.getRowCount().getAsLong(), CREATE_TABLE_DATA.getRowCount() * 3L);
        assertEquals(statistics.getFileCount().getAsLong(), 3L);
    }
}
Also used : Path(org.apache.hadoop.fs.Path) HiveColumnHandle.bucketColumnHandle(io.prestosql.plugin.hive.HiveColumnHandle.bucketColumnHandle) ColumnHandle(io.prestosql.spi.connector.ColumnHandle) ConnectorInsertTableHandle(io.prestosql.spi.connector.ConnectorInsertTableHandle) Constraint(io.prestosql.spi.connector.Constraint) ConnectorTableHandle(io.prestosql.spi.connector.ConnectorTableHandle) Slices.utf8Slice(io.airlift.slice.Slices.utf8Slice) Slice(io.airlift.slice.Slice) ConnectorSession(io.prestosql.spi.connector.ConnectorSession) TestingConnectorSession(io.prestosql.testing.TestingConnectorSession) ConnectorMetadata(io.prestosql.spi.connector.ConnectorMetadata) HdfsContext(io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext) MaterializedResult(io.prestosql.testing.MaterializedResult) ConnectorPageSink(io.prestosql.spi.connector.ConnectorPageSink) ConnectorTableMetadata(io.prestosql.spi.connector.ConnectorTableMetadata)

Aggregations

HdfsContext (io.prestosql.plugin.hive.HdfsEnvironment.HdfsContext) 70
Path (org.apache.hadoop.fs.Path) 48
HiveIdentity (io.prestosql.plugin.hive.authentication.HiveIdentity) 40
ConnectorSession (io.prestosql.spi.connector.ConnectorSession) 36
PrestoException (io.prestosql.spi.PrestoException) 34
SchemaTableName (io.prestosql.spi.connector.SchemaTableName) 28
ConnectorMetadata (io.prestosql.spi.connector.ConnectorMetadata) 26
ConnectorTableHandle (io.prestosql.spi.connector.ConnectorTableHandle) 26
ImmutableList (com.google.common.collect.ImmutableList) 24
TestingConnectorSession (io.prestosql.testing.TestingConnectorSession) 24
Slice (io.airlift.slice.Slice) 22
Table (io.prestosql.plugin.hive.metastore.Table) 22
ColumnHandle (io.prestosql.spi.connector.ColumnHandle) 22
List (java.util.List) 22
FileSystem (org.apache.hadoop.fs.FileSystem) 22
ConnectorInsertTableHandle (io.prestosql.spi.connector.ConnectorInsertTableHandle) 20
ConnectorPageSink (io.prestosql.spi.connector.ConnectorPageSink) 20
ImmutableList.toImmutableList (com.google.common.collect.ImmutableList.toImmutableList) 18
TableNotFoundException (io.prestosql.spi.connector.TableNotFoundException) 18
IOException (java.io.IOException) 18