Example 36 with Record

use of org.apache.iceberg.data.Record in project hive by apache.

the class TestHiveIcebergSchemaEvolution method testMoveLastNameBeforeCustomerIdInIcebergTable.

@Test
public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
    // Move the last_name column before the customer_id in the table schema.
    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
    Schema customerSchemaLastNameFirst = new Schema(
        optional(1, "last_name", Types.StringType.get(), "This is last name"),
        optional(2, "customer_id", Types.LongType.get()),
        optional(3, "first_name", Types.StringType.get(), "This is first name"));
    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst)
        .add("Brown", 0L, "Alice")
        .add("Green", 1L, "Bob")
        .add("Pink", 2L, "Trudy");
    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
    // Run a 'select *' to check that the order of the columns in the result has changed.
    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
    HiveIcebergTestUtils.validateData(customersWithLastNameFirst, HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
    // Query the data with names and check if the result is the same as when the table was created.
    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
    rows = shell.executeStatement("SELECT * FROM default.customers");
    HiveIcebergTestUtils.validateData(customersWithLastNameFirst, HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
}
Also used : TestHelper(org.apache.iceberg.mr.TestHelper) Table(org.apache.iceberg.Table) Schema(org.apache.iceberg.Schema) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) Record(org.apache.iceberg.data.Record) Test(org.junit.Test)
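
The test above relies on two behaviours: after the column move, `SELECT *` returns columns in the new schema order, while a query that names its columns returns them in the requested order. The following is a minimal plain-Java sketch of that idea, not the Iceberg or Hive implementation. The class `ColumnOrderSketch` and its methods `moveBefore` and `project` are hypothetical names introduced only for illustration; rows are modelled as name-to-value maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnOrderSketch {

    // Hypothetical helper: returns a copy of the column list with `name` moved directly before `before`.
    static List<String> moveBefore(List<String> columns, String name, String before) {
        List<String> result = new ArrayList<>(columns);
        if (!result.remove(name)) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        int index = result.indexOf(before);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + before);
        }
        result.add(index, name);
        return result;
    }

    // Hypothetical helper: reads the row's values in the order of the requested columns.
    static List<Object> project(Map<String, Object> row, List<String> columns) {
        List<Object> values = new ArrayList<>();
        for (String column : columns) {
            values.add(row.get(column));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("customer_id", "first_name", "last_name");
        List<String> moved = moveBefore(original, "last_name", "customer_id");

        // Values are stored by column name, so reordering the schema does not touch the data.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("customer_id", 0L);
        row.put("first_name", "Alice");
        row.put("last_name", "Brown");

        // Analogue of 'SELECT *' after the move: follows the new schema order.
        System.out.println(project(row, moved));    // prints [Brown, 0, Alice]
        // Analogue of a query with explicit column names: follows the requested order.
        System.out.println(project(row, original)); // prints [0, Alice, Brown]
    }
}
```

In this sketch the schema change only affects positional access, which is why the test has to list `last_name` first in the `INSERT ... values` statement but can still validate the named query against the original records.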

Example 37 with Record

use of org.apache.iceberg.data.Record in project hive by apache.

the class TestHiveIcebergSchemaEvolution method testRemoveColumnFromIcebergTable.

@Test
public void testRemoveColumnFromIcebergTable() throws IOException {
    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
    // Remove the first_name column from the table.
    icebergTable.updateSchema().deleteColumn("first_name").commit();
    Schema customerSchemaWithoutFirstName = new Schema(
        optional(1, "customer_id", Types.LongType.get()),
        optional(2, "last_name", Types.StringType.get(), "This is last name"));
    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithoutFirstName)
        .add(0L, "Brown")
        .add(1L, "Green")
        .add(2L, "Pink");
    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
    HiveIcebergTestUtils.validateData(customersWithoutFirstName, HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
    // Run a 'select first_name' and check if an exception is thrown.
    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class, "Invalid table alias or column reference 'first_name'", () -> {
        shell.executeStatement("SELECT first_name FROM default.customers");
    });
    // Insert an entry from Hive to check if it can be inserted without the first_name column.
    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
    rows = shell.executeStatement("SELECT * FROM default.customers");
    customersWithoutFirstNameBuilder.add(4L, "Magenta");
    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
    HiveIcebergTestUtils.validateData(customersWithoutFirstName, HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
}
Also used : TestHelper(org.apache.iceberg.mr.TestHelper) Table(org.apache.iceberg.Table) Schema(org.apache.iceberg.Schema) FieldSchema(org.apache.hadoop.hive.metastore.api.FieldSchema) Record(org.apache.iceberg.data.Record) Test(org.junit.Test)

Example 38 with Record

use of org.apache.iceberg.data.Record in project hive by apache.

the class TestHiveIcebergSerDe method testDeserialize.

@Test
public void testDeserialize() {
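    HiveIcebergSerDe serDe = new HiveIcebergSerDe();
    // Generate a single random record for the test schema, using a fixed seed of 0.
    Record record = RandomGenericData.generate(schema, 1, 0).get(0);
    // Wrap the record in the Writable container that is handed to the SerDe.
    Container<Record> container = new Container<>();
    container.set(record);
    // Deserializing the container should return the wrapped record unchanged.
    Assert.assertEquals(record, serDe.deserialize(container));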
}
Also used : Container(org.apache.iceberg.mr.mapred.Container) Record(org.apache.iceberg.data.Record) Test(org.junit.Test)

Example 39 with Record

use of org.apache.iceberg.data.Record in project hive by apache.

the class TestIcebergRecordObjectInspector method testIcebergRecordObjectInspector.

@Test
public void testIcebergRecordObjectInspector() {
    Schema schema = new Schema(
        required(1, "integer_field", Types.IntegerType.get()),
        required(2, "struct_field", Types.StructType.of(
            Types.NestedField.required(3, "string_field", Types.StringType.get()))));
    Record record = RandomGenericData.generate(schema, 1, 0L).get(0);
    Record innerRecord = record.get(1, Record.class);
    StructObjectInspector soi = (StructObjectInspector) IcebergObjectInspector.create(schema);
    Assert.assertEquals(ImmutableList.of(record.get(0), record.get(1)), soi.getStructFieldsDataAsList(record));
    StructField integerField = soi.getStructFieldRef("integer_field");
    Assert.assertEquals(record.get(0), soi.getStructFieldData(record, integerField));
    StructField structField = soi.getStructFieldRef("struct_field");
    Object innerData = soi.getStructFieldData(record, structField);
    Assert.assertEquals(innerRecord, innerData);
    StructObjectInspector innerSoi = (StructObjectInspector) structField.getFieldObjectInspector();
    StructField stringField = innerSoi.getStructFieldRef("string_field");
    Assert.assertEquals(ImmutableList.of(innerRecord.get(0)), innerSoi.getStructFieldsDataAsList(innerRecord));
    Assert.assertEquals(innerRecord.get(0), innerSoi.getStructFieldData(innerData, stringField));
}
Also used : StructField(org.apache.hadoop.hive.serde2.objectinspector.StructField) Schema(org.apache.iceberg.Schema) Record(org.apache.iceberg.data.Record) StructObjectInspector(org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector) Test(org.junit.Test)

Example 40 with Record

use of org.apache.iceberg.data.Record in project hive by apache.

the class HiveIcebergRecordWriter method write.

@Override
public void write(Writable row) throws IOException {
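    // The incoming Writable is expected to be a Container wrapping the Iceberg record; unwrap it.
    Record record = ((Container<Record>) row).get();
    // Pass the record to the parent writer together with the spec and the partition derived from the record.
    super.write(record, spec, partition(record));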
}
Also used : Container(org.apache.iceberg.mr.mapred.Container) Record(org.apache.iceberg.data.Record)
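
Examples 38 and 40 both pass a record around inside a `Container` and unwrap it with `get()`. The following is a minimal plain-Java sketch of that wrap-and-unwrap pattern, assuming nothing about the real classes beyond the `set` and `get` calls shown above. The nested `Container` here is a hypothetical stand-in for `org.apache.iceberg.mr.mapred.Container`, and `ContainerSketch`, `CollectingWriter` and `unwrapAll` are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ContainerSketch {

    // Hypothetical stand-in for the real Container: a mutable holder for one value.
    static final class Container<T> {
        private T value;

        void set(T newValue) {
            this.value = newValue;
        }

        T get() {
            return value;
        }
    }

    // Hypothetical writer that unwraps each container and keeps only the payload.
    static final class CollectingWriter<T> {
        private final List<T> written = new ArrayList<>();

        void write(Container<T> row) {
            written.add(row.get());
        }

        List<T> written() {
            return written;
        }
    }

    // Wraps every record in the same container instance and writes it, then returns what was written.
    static List<String> unwrapAll(String... records) {
        Container<String> container = new Container<>();
        CollectingWriter<String> writer = new CollectingWriter<>();
        for (String record : records) {
            container.set(record);
            writer.write(container);
        }
        return writer.written();
    }

    public static void main(String[] args) {
        System.out.println(unwrapAll("a", "b")); // prints [a, b]
    }
}
```

The sketch reuses one container for every record to make the point that the writer stores the unwrapped payload rather than the wrapper; whether the real code path reuses containers is not shown in the examples above.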
