
Example 61 with Schema

use of org.apache.iceberg.Schema in project hive by apache.

the class HiveIcebergSerDe method hiveSchemaOrThrow.

/**
 * Gets the hive schema and throws an exception if it is not provided. In the latter case it adds the
 * previousException as a root cause.
 * @param previousException If we had an exception previously
 * @param autoConversion When <code>true</code>, convert unsupported types to more permissive ones, like tinyint to
 *                       int
 * @return The hive schema parsed from the serDeProperties provided when the SerDe was initialized
 * @throws SerDeException If there is no schema information in the serDeProperties
 */
private Schema hiveSchemaOrThrow(Exception previousException, boolean autoConversion) throws SerDeException {
    List<String> names = Lists.newArrayList();
    names.addAll(getColumnNames());
    names.addAll(getPartitionColumnNames());
    List<TypeInfo> types = Lists.newArrayList();
    types.addAll(getColumnTypes());
    types.addAll(getPartitionColumnTypes());
    List<String> comments = Lists.newArrayList();
    comments.addAll(getColumnComments());
    comments.addAll(getPartitionColumnComments());
    if (!names.isEmpty() && !types.isEmpty()) {
        Schema hiveSchema = HiveSchemaUtil.convert(names, types, comments, autoConversion);
        LOG.info("Using hive schema {}", SchemaParser.toJson(hiveSchema));
        return hiveSchema;
    } else {
        throw new SerDeException("Please provide an existing table or a valid schema", previousException);
    }
}
Also used : Schema(org.apache.iceberg.Schema) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) TypeInfo(org.apache.hadoop.hive.serde2.typeinfo.TypeInfo) SerDeException(org.apache.hadoop.hive.serde2.SerDeException)
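
The autoConversion flag is easiest to see in isolation. Below is a minimal sketch, not taken from the Hive sources (the column name and comment are invented; TypeInfoUtils and serdeConstants are the same helpers used in Example 65), of how HiveSchemaUtil.convert is expected to widen an unsupported Hive type such as tinyint when autoConversion is true:

List<String> names = Arrays.asList("flag");
List<TypeInfo> types = Arrays.asList(TypeInfoUtils.getTypeInfoFromTypeString(serdeConstants.TINYINT_TYPE_NAME));
List<String> comments = Arrays.asList("stored as tinyint in Hive");
// With autoConversion=true the tinyint column should come back as an Iceberg int,
// matching the behaviour described in the hiveSchemaOrThrow Javadoc above.
Schema schema = HiveSchemaUtil.convert(names, types, comments, true);
System.out.println(schema.findField("flag").type());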

Example 62 with Schema

use of org.apache.iceberg.Schema in project hive by apache.

the class HiveIcebergStorageHandler method overlayTableProperties.

/**
 * Stores the serializable table data in the configuration.
 * Currently the following is handled:
 * <ul>
 *   <li>- Table - in case the table is serializable</li>
 *   <li>- Location</li>
 *   <li>- Schema</li>
 *   <li>- Partition specification</li>
 *   <li>- FileIO for handling table files</li>
 *   <li>- Location provider used for file generation</li>
 *   <li>- Encryption manager for encryption handling</li>
 * </ul>
 * @param configuration The configuration storing the catalog information
 * @param tableDesc The table which we want to store to the configuration
 * @param map The map of the configuration properties which we append with the serialized data
 */
@VisibleForTesting
static void overlayTableProperties(Configuration configuration, TableDesc tableDesc, Map<String, String> map) {
    Properties props = tableDesc.getProperties();
    Table table = IcebergTableUtil.getTable(configuration, props);
    String schemaJson = SchemaParser.toJson(table.schema());
    // map overrides tableDesc properties
    Maps.fromProperties(props).entrySet().stream()
        .filter(entry -> !map.containsKey(entry.getKey()))
        .forEach(entry -> map.put(entry.getKey(), entry.getValue()));
    map.put(InputFormatConfig.TABLE_IDENTIFIER, props.getProperty(Catalogs.NAME));
    map.put(InputFormatConfig.TABLE_LOCATION, table.location());
    map.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
    props.put(InputFormatConfig.PARTITION_SPEC, PartitionSpecParser.toJson(table.spec()));
    // serialize table object into config
    Table serializableTable = SerializableTable.copyOf(table);
    checkAndSkipIoConfigSerialization(configuration, serializableTable);
    map.put(InputFormatConfig.SERIALIZED_TABLE_PREFIX + tableDesc.getTableName(), SerializationUtil.serializeToBase64(serializableTable));
    // We need to remove this otherwise the job.xml will be invalid as column comments are separated with '\0' and
    // the serialization utils fail to serialize this character
    map.remove("columns.comments");
    // save schema into table props as well to avoid repeatedly hitting the HMS during serde initializations
    // this is an exception to the interface documentation, but it's a safe operation to add this property
    props.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
}
Also used : ExprNodeGenericFuncDesc(org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) TableDesc(org.apache.hadoop.hive.ql.plan.TableDesc) HadoopConfigurable(org.apache.iceberg.hadoop.HadoopConfigurable) ListIterator(java.util.ListIterator) URISyntaxException(java.net.URISyntaxException) Catalogs(org.apache.iceberg.mr.Catalogs) LoggerFactory(org.slf4j.LoggerFactory) Date(org.apache.hadoop.hive.common.type.Date) SemanticException(org.apache.hadoop.hive.ql.parse.SemanticException) JobID(org.apache.hadoop.mapred.JobID) AbstractSerDe(org.apache.hadoop.hive.serde2.AbstractSerDe) StatsSetupConst(org.apache.hadoop.hive.common.StatsSetupConst) OutputCommitter(org.apache.hadoop.mapred.OutputCommitter) AlterTableType(org.apache.hadoop.hive.ql.ddl.table.AlterTableType) Throwables(org.apache.iceberg.relocated.com.google.common.base.Throwables) Map(java.util.Map) Configuration(org.apache.hadoop.conf.Configuration) InputFormat(org.apache.hadoop.mapred.InputFormat) URI(java.net.URI) PrimitiveTypeInfo(org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo) HiveStorageHandler(org.apache.hadoop.hive.ql.metadata.HiveStorageHandler) HiveStoragePredicateHandler(org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler) Splitter(org.apache.iceberg.relocated.com.google.common.base.Splitter) OutputFormat(org.apache.hadoop.mapred.OutputFormat) ExprNodeDesc(org.apache.hadoop.hive.ql.plan.ExprNodeDesc) WriteEntity(org.apache.hadoop.hive.ql.hooks.WriteEntity) Collection(java.util.Collection) Partish(org.apache.hadoop.hive.ql.stats.Partish) HiveMetaHook(org.apache.hadoop.hive.metastore.HiveMetaHook) FileSinkDesc(org.apache.hadoop.hive.ql.plan.FileSinkDesc) InputFormatConfig(org.apache.iceberg.mr.InputFormatConfig) Schema(org.apache.iceberg.Schema) Collectors(java.util.stream.Collectors) SessionState(org.apache.hadoop.hive.ql.session.SessionState) PartitionSpecParser(org.apache.iceberg.PartitionSpecParser) Serializable(java.io.Serializable) SchemaParser(org.apache.iceberg.SchemaParser) List(java.util.List) Optional(java.util.Optional) TableProperties(org.apache.iceberg.TableProperties) SessionStateUtil(org.apache.hadoop.hive.ql.session.SessionStateUtil) HiveException(org.apache.hadoop.hive.ql.metadata.HiveException) LockType(org.apache.hadoop.hive.metastore.api.LockType) ConvertAstToSearchArg(org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg) HashMap(java.util.HashMap) ExprNodeDynamicListDesc(org.apache.hadoop.hive.ql.plan.ExprNodeDynamicListDesc) ArrayList(java.util.ArrayList) SearchArgument(org.apache.hadoop.hive.ql.io.sarg.SearchArgument) Utilities(org.apache.hadoop.hive.ql.exec.Utilities) JobStatus(org.apache.hadoop.mapred.JobStatus) PartitionTransformSpec(org.apache.hadoop.hive.ql.parse.PartitionTransformSpec) ExprNodeColumnDesc(org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc) Properties(java.util.Properties) Logger(org.slf4j.Logger) Timestamp(org.apache.hadoop.hive.common.type.Timestamp) ExprNodeConstantDesc(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc) Table(org.apache.iceberg.Table) HiveConf(org.apache.hadoop.hive.conf.HiveConf) Maps(org.apache.iceberg.relocated.com.google.common.collect.Maps) IOException(java.io.IOException) SerializationUtil(org.apache.iceberg.util.SerializationUtil) JobConf(org.apache.hadoop.mapred.JobConf) SnapshotSummary(org.apache.iceberg.SnapshotSummary) JobContext(org.apache.hadoop.mapred.JobContext) Deserializer(org.apache.hadoop.hive.serde2.Deserializer) Preconditions(org.apache.iceberg.relocated.com.google.common.base.Preconditions) 
JobContextImpl(org.apache.hadoop.mapred.JobContextImpl) HiveAuthorizationProvider(org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider) SerializableTable(org.apache.iceberg.SerializableTable) VisibleForTesting(org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting) Table(org.apache.iceberg.Table) SerializableTable(org.apache.iceberg.SerializableTable) TableProperties(org.apache.iceberg.TableProperties) Properties(java.util.Properties) VisibleForTesting(org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting)
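
For context, a minimal sketch of the consuming side, assuming code that only sees the populated configuration; the conf and tableName variables are illustrative and not part of the Hive sources. The schema and the serialized table can be recovered with the same InputFormatConfig keys written above:

// conf and tableName are hypothetical; the keys match those written by overlayTableProperties.
Schema schema = SchemaParser.fromJson(conf.get(InputFormatConfig.TABLE_SCHEMA));
Table table = SerializationUtil.deserializeFromBase64(
        conf.get(InputFormatConfig.SERIALIZED_TABLE_PREFIX + tableName));
// The partition spec above is stored in the table properties rather than in the map,
// and can be parsed back with PartitionSpecParser.fromJson(schema, specJson).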

Example 63 with Schema

use of org.apache.iceberg.Schema in project hive by apache.

the class TestHiveCatalog method testTableName.

@Test
public void testTableName() {
    Schema schema = new Schema(required(1, "id", Types.IntegerType.get(), "unique ID"), required(2, "data", Types.StringType.get()));
    PartitionSpec spec = PartitionSpec.builderFor(schema).bucket("data", 16).build();
    TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
    try {
        catalog.buildTable(tableIdent, schema).withPartitionSpec(spec).create();
        Table table = catalog.loadTable(tableIdent);
        Assert.assertEquals("Name must match", "hive.hivedb.tbl", table.name());
        TableIdentifier snapshotsTableIdent = TableIdentifier.of(DB_NAME, "tbl", "snapshots");
        Table snapshotsTable = catalog.loadTable(snapshotsTableIdent);
        Assert.assertEquals("Name must match", "hive.hivedb.tbl.snapshots", snapshotsTable.name());
    } finally {
        catalog.dropTable(tableIdent);
    }
}
Also used : TableIdentifier(org.apache.iceberg.catalog.TableIdentifier) Table(org.apache.iceberg.Table) Schema(org.apache.iceberg.Schema) PartitionSpec(org.apache.iceberg.PartitionSpec) Test(org.junit.Test)
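
The same three-part identifier should work for the other standard Iceberg metadata tables; a small sketch ("history" is a standard Iceberg metadata table name, not something this test exercises):

// Loading another metadata table of the same base table; the name follows the same
// catalog.db.table.metadataTable pattern asserted above.
Table historyTable = catalog.loadTable(TableIdentifier.of(DB_NAME, "tbl", "history"));
Assert.assertEquals("Name must match", "hive.hivedb.tbl.history", historyTable.name());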

Example 64 with Schema

use of org.apache.iceberg.Schema in project hive by apache.

the class HiveIcebergTestUtils method createEqualityDeleteFile.

/**
 * @param table The table to create the delete file for
 * @param deleteFilePath The path where the delete file should be created, relative to the table location root
 * @param equalityFields List of field names that should play a role in the equality check
 * @param fileFormat The file format that should be used for writing out the delete file
 * @param rowsToDelete The rows that should be deleted. It's enough to fill out the fields that are relevant for the
 *                     equality check, as listed in equalityFields; the rest of the fields are ignored
 * @return The DeleteFile created
 * @throws IOException If there is an error during DeleteFile write
 */
public static DeleteFile createEqualityDeleteFile(Table table, String deleteFilePath, List<String> equalityFields, FileFormat fileFormat, List<Record> rowsToDelete) throws IOException {
    List<Integer> equalityFieldIds = equalityFields.stream().map(id -> table.schema().findField(id).fieldId()).collect(Collectors.toList());
    Schema eqDeleteRowSchema = table.schema().select(equalityFields.toArray(new String[] {}));
    FileAppenderFactory<Record> appenderFactory = new GenericAppenderFactory(table.schema(), table.spec(), ArrayUtil.toIntArray(equalityFieldIds), eqDeleteRowSchema, null);
    EncryptedOutputFile outputFile = table.encryption().encrypt(HadoopOutputFile.fromPath(new org.apache.hadoop.fs.Path(table.location(), deleteFilePath), new Configuration()));
    PartitionKey part = new PartitionKey(table.spec(), eqDeleteRowSchema);
    part.partition(rowsToDelete.get(0));
    EqualityDeleteWriter<Record> eqWriter = appenderFactory.newEqDeleteWriter(outputFile, fileFormat, part);
    try (EqualityDeleteWriter<Record> writer = eqWriter) {
        writer.deleteAll(rowsToDelete);
    }
    return eqWriter.toDeleteFile();
}
Also used : Arrays(java.util.Arrays) HadoopOutputFile(org.apache.iceberg.hadoop.HadoopOutputFile) Types(org.apache.iceberg.types.Types) Text(org.apache.hadoop.io.Text) NestedField.optional(org.apache.iceberg.types.Types.NestedField.optional) DateWritable(org.apache.hadoop.hive.serde2.io.DateWritable) StandardStructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector) JobID(org.apache.hadoop.mapred.JobID) LongWritable(org.apache.hadoop.io.LongWritable) ByteBuffer(java.nio.ByteBuffer) BigDecimal(java.math.BigDecimal) TimestampUtils(org.apache.hadoop.hive.common.type.TimestampUtils) ArrayUtil(org.apache.iceberg.util.ArrayUtil) ByteBuffers(org.apache.iceberg.util.ByteBuffers) Map(java.util.Map) Configuration(org.apache.hadoop.conf.Configuration) GenericRecord(org.apache.iceberg.data.GenericRecord) PositionDeleteWriter(org.apache.iceberg.deletes.PositionDeleteWriter) LocalTime(java.time.LocalTime) PartitionKey(org.apache.iceberg.PartitionKey) ZoneOffset(java.time.ZoneOffset) Path(java.nio.file.Path) IntWritable(org.apache.hadoop.io.IntWritable) CloseableIterable(org.apache.iceberg.io.CloseableIterable) Timestamp(java.sql.Timestamp) UUID(java.util.UUID) Schema(org.apache.iceberg.Schema) Collectors(java.util.stream.Collectors) List(java.util.List) OffsetDateTime(java.time.OffsetDateTime) BooleanWritable(org.apache.hadoop.io.BooleanWritable) EncryptedOutputFile(org.apache.iceberg.encryption.EncryptedOutputFile) LocalDate(java.time.LocalDate) GenericAppenderFactory(org.apache.iceberg.data.GenericAppenderFactory) PositionDelete(org.apache.iceberg.deletes.PositionDelete) LocalDateTime(java.time.LocalDateTime) IcebergGenerics(org.apache.iceberg.data.IcebergGenerics) DoubleWritable(org.apache.hadoop.io.DoubleWritable) ArrayList(java.util.ArrayList) BytesWritable(org.apache.hadoop.io.BytesWritable) TimestampWritable(org.apache.hadoop.hive.serde2.io.TimestampWritable) Files(java.nio.file.Files) Table(org.apache.iceberg.Table) EqualityDeleteWriter(org.apache.iceberg.deletes.EqualityDeleteWriter) IOException(java.io.IOException) FileFormat(org.apache.iceberg.FileFormat) File(java.io.File) Record(org.apache.iceberg.data.Record) ObjectInspectorFactory(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory) Paths(java.nio.file.Paths) TimestampTZUtil(org.apache.hadoop.hive.common.type.TimestampTZUtil) PrimitiveObjectInspectorFactory(org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory) HiveDecimal(org.apache.hadoop.hive.common.type.HiveDecimal) FileAppenderFactory(org.apache.iceberg.io.FileAppenderFactory) DeleteFile(org.apache.iceberg.DeleteFile) Comparator(java.util.Comparator) HiveDecimalWritable(org.apache.hadoop.hive.serde2.io.HiveDecimalWritable) Assert(org.junit.Assert) FloatWritable(org.apache.hadoop.io.FloatWritable) Path(java.nio.file.Path) Configuration(org.apache.hadoop.conf.Configuration) EncryptedOutputFile(org.apache.iceberg.encryption.EncryptedOutputFile) Schema(org.apache.iceberg.Schema) GenericAppenderFactory(org.apache.iceberg.data.GenericAppenderFactory) PartitionKey(org.apache.iceberg.PartitionKey) GenericRecord(org.apache.iceberg.data.GenericRecord) Record(org.apache.iceberg.data.Record)
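
Note that the helper only writes the delete file; it does not attach it to the table. A sketch of typical follow-up usage, where the table, file path, field name, and rowsToDelete list are placeholders:

// rowsToDelete holds Records with at least the equality field ("customer_id") populated.
DeleteFile deleteFile = HiveIcebergTestUtils.createEqualityDeleteFile(
        table, "data/eq-deletes", Arrays.asList("customer_id"), FileFormat.PARQUET, rowsToDelete);
// Commit the equality deletes to the table as a row delta.
table.newRowDelta().addDeletes(deleteFile).commit();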

Example 65 with Schema

use of org.apache.iceberg.Schema in project hive by apache.

the class TestHiveSchemaUtil method testConversionWithoutLastComment.

@Test
public void testConversionWithoutLastComment() {
    Schema expected = new Schema(optional(1, "customer_id", Types.LongType.get(), "customer comment"), optional(2, "first_name", Types.StringType.get(), null));
    Schema schema = HiveSchemaUtil.convert(Arrays.asList("customer_id", "first_name"), Arrays.asList(TypeInfoUtils.getTypeInfoFromTypeString(serdeConstants.BIGINT_TYPE_NAME), TypeInfoUtils.getTypeInfoFromTypeString(serdeConstants.STRING_TYPE_NAME)), Arrays.asList("customer comment"));
    Assert.assertEquals(expected.asStruct(), schema.asStruct());
}
Also used : Schema(org.apache.iceberg.Schema) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) Test(org.junit.Test)
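
For the metastore-facing direction, a sketch that assumes the FieldSchema-based convert overloads in HiveSchemaUtil (suggested by the FieldSchema import above; treat the exact signatures as an assumption rather than documented API):

// Assumed overloads: HiveSchemaUtil.convert(Schema) -> List<FieldSchema> and
// HiveSchemaUtil.convert(List<FieldSchema>) -> Schema.
List<FieldSchema> hiveColumns = HiveSchemaUtil.convert(expected);
Schema roundTripped = HiveSchemaUtil.convert(hiveColumns);
// Field IDs in 'expected' are already 1..n, so the round trip should compare equal.
Assert.assertEquals(expected.asStruct(), roundTripped.asStruct());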

Aggregations

Schema (org.apache.iceberg.Schema): 126
Test (org.junit.Test): 93
Record (org.apache.iceberg.data.Record): 68
Table (org.apache.iceberg.Table): 55
PartitionSpec (org.apache.iceberg.PartitionSpec): 39
GenericRecord (org.apache.iceberg.data.GenericRecord): 36
FieldSchema (org.apache.hadoop.hive.metastore.api.FieldSchema): 30
List (java.util.List): 21
TableIdentifier (org.apache.iceberg.catalog.TableIdentifier): 20
IOException (java.io.IOException): 16
Types (org.apache.iceberg.types.Types): 16
ArrayList (java.util.ArrayList): 15
Map (java.util.Map): 14
HashMap (java.util.HashMap): 13
FileFormat (org.apache.iceberg.FileFormat): 13
UpdateSchema (org.apache.iceberg.UpdateSchema): 12
Path (org.apache.hadoop.fs.Path): 11
Collectors (java.util.stream.Collectors): 10
ImmutableList (org.apache.iceberg.relocated.com.google.common.collect.ImmutableList): 10
TestHelper (org.apache.iceberg.mr.TestHelper): 9