
Example 1 with HoodieInsertException

Use of org.apache.hudi.exception.HoodieInsertException in project hudi by apache.

From the class TestHoodieRowCreateHandle, method testInstantiationFailure.

@ParameterizedTest
@ValueSource(booleans = { true, false })
public void testInstantiationFailure(boolean enableMetadataTable) {
    // init config and table
    HoodieWriteConfig cfg = SparkDatasetTestUtils.getConfigBuilder(basePath, timelineServicePort)
        .withPath("/dummypath/abc/")
        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadataTable).build())
        .build();
    try {
        HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
        new HoodieRowCreateHandle(table, cfg, " def", UUID.randomUUID().toString(), "001",
            RANDOM.nextInt(100000), RANDOM.nextLong(), RANDOM.nextLong(),
            SparkDatasetTestUtils.STRUCT_TYPE);
        fail("Should have thrown exception");
    } catch (HoodieInsertException ioe) {
        // expected without metadata table
        if (enableMetadataTable) {
            fail("Should have thrown TableNotFoundException");
        }
    } catch (TableNotFoundException e) {
        // expected with metadata table
        if (!enableMetadataTable) {
            fail("Should have thrown HoodieInsertException");
        }
    }
}
Also used: TableNotFoundException (org.apache.hudi.exception.TableNotFoundException), HoodieTable (org.apache.hudi.table.HoodieTable), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), HoodieInsertException (org.apache.hudi.exception.HoodieInsertException), ValueSource (org.junit.jupiter.params.provider.ValueSource), ParameterizedTest (org.junit.jupiter.params.ParameterizedTest)
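
When a test expects one of two exception types depending on a parameter, the catch-and-fail branching above can be compressed with JUnit 5's assertThrows. A minimal sketch under the same assumptions; createHandleAgainstInvalidPath is a hypothetical stand-in for the handle construction shown above:

import static org.junit.jupiter.api.Assertions.assertThrows;

import org.apache.hudi.exception.HoodieInsertException;
import org.apache.hudi.exception.TableNotFoundException;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class InstantiationFailureSketch {

    @ParameterizedTest
    @ValueSource(booleans = { true, false })
    void failsWithExpectedException(boolean enableMetadataTable) {
        // Select the expected exception type up front instead of branching in catch blocks.
        Class<? extends Exception> expected =
            enableMetadataTable ? TableNotFoundException.class : HoodieInsertException.class;
        assertThrows(expected, () -> createHandleAgainstInvalidPath(enableMetadataTable));
    }

    // Hypothetical stand-in for the HoodieRowCreateHandle construction above.
    private void createHandleAgainstInvalidPath(boolean enableMetadataTable) {
        throw enableMetadataTable
            ? new TableNotFoundException("/dummypath/abc/")
            : new HoodieInsertException("instantiation failed");
    }
}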

Example 2 with HoodieInsertException

Use of org.apache.hudi.exception.HoodieInsertException in project hudi by apache.

From the class HoodieCreateHandle, method write.

/**
 * Writes all records passed.
 */
public void write() {
    Iterator<String> keyIterator;
    if (hoodieTable.requireSortedRecords()) {
        // Sorting the keys limits the amount of extra memory required for writing sorted records
        keyIterator = recordMap.keySet().stream().sorted().iterator();
    } else {
        keyIterator = recordMap.keySet().stream().iterator();
    }
    try {
        while (keyIterator.hasNext()) {
            final String key = keyIterator.next();
            HoodieRecord<T> record = recordMap.get(key);
            if (useWriterSchema) {
                write(record, record.getData().getInsertValue(tableSchemaWithMetaFields, config.getProps()));
            } else {
                write(record, record.getData().getInsertValue(tableSchema, config.getProps()));
            }
        }
    } catch (IOException io) {
        throw new HoodieInsertException("Failed to insert records for path " + path, io);
    }
}
Also used: HoodieInsertException (org.apache.hudi.exception.HoodieInsertException), IOException (java.io.IOException)
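
The catch block shows the standard Hudi idiom of wrapping the checked IOException in an unchecked HoodieInsertException so that callers deal with a single failure type. A minimal standalone sketch of the same idiom; writeRecord and the String payloads are hypothetical stand-ins for the real per-record write:

import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hudi.exception.HoodieInsertException;

class SortedWriteSketch {

    // A TreeMap keeps keys sorted on insertion, one alternative to sorting a key stream
    // when records must be written in key order.
    private final Map<String, String> recordMap = new TreeMap<>();

    public void writeAll(String path) {
        Iterator<String> keys = recordMap.keySet().iterator();
        try {
            while (keys.hasNext()) {
                writeRecord(recordMap.get(keys.next()));
            }
        } catch (IOException io) {
            // Wrap the checked IOException so callers handle one unchecked type.
            throw new HoodieInsertException("Failed to insert records for path " + path, io);
        }
    }

    // Hypothetical stand-in for the per-record write, which may fail with an IOException.
    private void writeRecord(String payload) throws IOException {
        if (payload == null) {
            throw new IOException("null payload");
        }
    }
}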

Example 3 with HoodieInsertException

Use of org.apache.hudi.exception.HoodieInsertException in project hudi by apache.

From the class TestHoodieClientOnCopyOnWriteStorage, method testPreCommitValidationWithMultipleInflights.

@Test
public void testPreCommitValidationWithMultipleInflights() throws Exception {
    int numRecords = 200;
    HoodiePreCommitValidatorConfig validatorConfig = HoodiePreCommitValidatorConfig.newBuilder()
        .withPreCommitValidator(SqlQuerySingleResultPreCommitValidator.class.getName())
        .withPrecommitValidatorSingleResultSqlQueries(COUNT_SQL_QUERY_FOR_VALIDATION + "#" + 500)
        .build();
    HoodieWriteConfig config = getConfigBuilder()
        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
            .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.NEVER)
            .build())
        .withPreCommitValidatorConfig(validatorConfig)
        .build();
    String instant1 = HoodieActiveTimeline.createNewInstantTime();
    try {
        insertWithConfig(config, numRecords, instant1);
        fail("Expected validation to fail because we only insert 200 rows. Validation is configured to expect 500 rows");
    } catch (HoodieInsertException e) {
        if (e.getCause() instanceof HoodieValidationException) {
            // expected: the configured count (500) does not match the 200 inserted rows
        } else {
            throw e;
        }
    }
    assertFalse(testTable.commitExists(instant1));
    assertTrue(testTable.inflightCommitExists(instant1));
    numRecords = 300;
    validatorConfig = HoodiePreCommitValidatorConfig.newBuilder()
        .withPreCommitValidator(SqlQuerySingleResultPreCommitValidator.class.getName())
        .withPrecommitValidatorSingleResultSqlQueries(COUNT_SQL_QUERY_FOR_VALIDATION + "#" + numRecords)
        .build();
    config = getConfigBuilder()
        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
            .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.NEVER)
            .build())
        .withPreCommitValidatorConfig(validatorConfig)
        .build();
    String instant2 = HoodieActiveTimeline.createNewInstantTime();
    // expect pre-commit validators to succeed. Note that validator is expected to exclude inflight instant1
    insertWithConfig(config, numRecords, instant2);
    assertTrue(testTable.inflightCommitExists(instant1));
    assertTrue(testTable.commitExists(instant2));
}
Also used: HoodieValidationException (org.apache.hudi.exception.HoodieValidationException), SqlQuerySingleResultPreCommitValidator (org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator), HoodiePreCommitValidatorConfig (org.apache.hudi.config.HoodiePreCommitValidatorConfig), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), HoodieInsertException (org.apache.hudi.exception.HoodieInsertException), ParameterizedTest (org.junit.jupiter.params.ParameterizedTest), Test (org.junit.jupiter.api.Test)
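
The catch block distinguishes validation failures from other insert failures by inspecting the cause. With JUnit 5 this can be asserted directly, since assertThrows returns the thrown exception. A minimal sketch, where runInsert is a hypothetical stand-in for insertWithConfig:

import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.apache.hudi.exception.HoodieInsertException;
import org.apache.hudi.exception.HoodieValidationException;

class ValidationFailureSketch {

    void expectValidationFailure() {
        // assertThrows returns the thrown exception, so the cause can be checked afterwards.
        HoodieInsertException e = assertThrows(HoodieInsertException.class, this::runInsert);
        assertTrue(e.getCause() instanceof HoodieValidationException);
    }

    // Hypothetical stand-in for insertWithConfig: a failed pre-commit validation
    // surfaces as a HoodieValidationException wrapped in a HoodieInsertException.
    private void runInsert() {
        throw new HoodieInsertException("Commit failed",
            new HoodieValidationException("query returned 200 rows, expected 500"));
    }
}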

Example 4 with HoodieInsertException

Use of org.apache.hudi.exception.HoodieInsertException in project hudi by apache.

From the class TestTableSchemaEvolution, method testCopyOnWriteTable.

@Test
public void testCopyOnWriteTable() throws Exception {
    // Create the table
    HoodieTableMetaClient.withPropertyBuilder()
        .fromMetaClient(metaClient)
        .setTimelineLayoutVersion(VERSION_1)
        .initTable(metaClient.getHadoopConf(), metaClient.getBasePath());
    HoodieWriteConfig hoodieWriteConfig = getWriteConfigBuilder(TRIP_EXAMPLE_SCHEMA).withRollbackUsingMarkers(false).build();
    SparkRDDWriteClient client = getHoodieWriteClient(hoodieWriteConfig);
    // Initial inserts with TRIP_EXAMPLE_SCHEMA
    int numRecords = 10;
    insertFirstBatch(hoodieWriteConfig, client, "001", initCommitTime, numRecords, SparkRDDWriteClient::insert, false, true, numRecords);
    checkReadRecords("000", numRecords);
    // Updates with the same schema are allowed
    final int numUpdateRecords = 5;
    updateBatch(hoodieWriteConfig, client, "002", "001", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, true, numUpdateRecords, numRecords, 2);
    checkReadRecords("000", numRecords);
    // Deletes with the same schema are allowed
    final int numDeleteRecords = 2;
    numRecords -= numDeleteRecords;
    deleteBatch(hoodieWriteConfig, client, "003", "002", initCommitTime, numDeleteRecords, SparkRDDWriteClient::delete, false, true, 0, numRecords);
    checkReadRecords("000", numRecords);
    // Insert with devolved schema is not allowed
    HoodieWriteConfig hoodieDevolvedWriteConfig = getWriteConfig(TRIP_EXAMPLE_SCHEMA_DEVOLVED);
    client = getHoodieWriteClient(hoodieDevolvedWriteConfig);
    final List<HoodieRecord> failedRecords = generateInsertsWithSchema("004", numRecords, TRIP_EXAMPLE_SCHEMA_DEVOLVED);
    try {
        // We cannot use insertBatch directly here because we want to insert records
        // with a devolved schema.
        writeBatch(client, "004", "003", Option.empty(), "003", numRecords, (String s, Integer a) -> failedRecords, SparkRDDWriteClient::insert, true, numRecords, numRecords, 1, false);
        fail("Insert with devolved scheme should fail");
    } catch (HoodieInsertException ex) {
        // no new commit
        HoodieTimeline curTimeline = metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
        assertTrue(curTimeline.lastInstant().get().getTimestamp().equals("003"));
        client.rollback("004");
    }
    // Update with devolved schema is not allowed
    try {
        updateBatch(hoodieDevolvedWriteConfig, client, "004", "003", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, true, numUpdateRecords, 2 * numRecords, 5);
        fail("Update with devolved scheme should fail");
    } catch (HoodieUpsertException ex) {
        // no new commit
        HoodieTimeline curTimeline = metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
        assertTrue(curTimeline.lastInstant().get().getTimestamp().equals("003"));
        client.rollback("004");
    }
    // Insert with an evolved schema is allowed
    HoodieWriteConfig hoodieEvolvedWriteConfig = getWriteConfig(TRIP_EXAMPLE_SCHEMA_EVOLVED);
    client = getHoodieWriteClient(hoodieEvolvedWriteConfig);
    final List<HoodieRecord> evolvedRecords = generateInsertsWithSchema("004", numRecords, TRIP_EXAMPLE_SCHEMA_EVOLVED);
    // We cannot use insertBatch directly here because we want to insert records
    // with an evolved schema.
    writeBatch(client, "004", "003", Option.empty(), initCommitTime, numRecords, (String s, Integer a) -> evolvedRecords, SparkRDDWriteClient::insert, true, numRecords, 2 * numRecords, 4, false);
    // new commit
    HoodieTimeline curTimeline = metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
    assertTrue(curTimeline.lastInstant().get().getTimestamp().equals("004"));
    checkReadRecords("000", 2 * numRecords);
    // Updates with the evolved schema are allowed
    final List<HoodieRecord> updateRecords = generateUpdatesWithSchema("005", numUpdateRecords, TRIP_EXAMPLE_SCHEMA_EVOLVED);
    writeBatch(client, "005", "004", Option.empty(), initCommitTime, numUpdateRecords, (String s, Integer a) -> updateRecords, SparkRDDWriteClient::upsert, true, numUpdateRecords, 2 * numRecords, 5, false);
    checkReadRecords("000", 2 * numRecords);
    // Now even the original schema cannot be used for updates as it is devolved
    // in relation to the current schema of the dataset.
    client = getHoodieWriteClient(hoodieWriteConfig);
    try {
        updateBatch(hoodieWriteConfig, client, "006", "005", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, true, numUpdateRecords, numRecords, 2);
        fail("Update with original scheme should fail");
    } catch (HoodieUpsertException ex) {
        // no new commit
        curTimeline = metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
        assertTrue(curTimeline.lastInstant().get().getTimestamp().equals("005"));
        client.rollback("006");
    }
    // Now even the original schema cannot be used for inserts as it is devolved
    // in relation to the current schema of the dataset.
    try {
        // We are not using insertBatch directly here because insertion of these
        // records will fail and we don't want to keep these records within
        // HoodieTestDataGenerator.
        failedRecords.clear();
        failedRecords.addAll(dataGen.generateInserts("006", numRecords));
        writeBatch(client, "006", "005", Option.empty(), initCommitTime, numRecords, (String s, Integer a) -> failedRecords, SparkRDDWriteClient::insert, true, numRecords, numRecords, 1, false);
        fail("Insert with original scheme should fail");
    } catch (HoodieInsertException ex) {
        // no new commit
        curTimeline = metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
        assertTrue(curTimeline.lastInstant().get().getTimestamp().equals("005"));
        client.rollback("006");
        // Drop the failed records from the data generator so that later batches do not attempt updates
        // or deletes for records which do not even exist.
        for (HoodieRecord record : failedRecords) {
            assertTrue(dataGen.deleteExistingKeyIfPresent(record.getKey()));
        }
    }
    // Revert to the older commit and ensure that the original schema can now
    // be used for inserts and updates.
    client.restoreToInstant("003");
    curTimeline = metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
    assertTrue(curTimeline.lastInstant().get().getTimestamp().equals("003"));
    checkReadRecords("000", numRecords);
    // Insert with original schema is allowed now
    insertBatch(hoodieWriteConfig, client, "007", "003", numRecords, SparkRDDWriteClient::insert, false, true, numRecords, 2 * numRecords, 1, Option.empty());
    checkReadRecords("000", 2 * numRecords);
    // Update with original schema is allowed now
    updateBatch(hoodieWriteConfig, client, "008", "007", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, true, numUpdateRecords, 2 * numRecords, 5);
    checkReadRecords("000", 2 * numRecords);
}
Also used: HoodieUpsertException (org.apache.hudi.exception.HoodieUpsertException), HoodieRecord (org.apache.hudi.common.model.HoodieRecord), HoodieTimeline (org.apache.hudi.common.table.timeline.HoodieTimeline), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), HoodieInsertException (org.apache.hudi.exception.HoodieInsertException), Test (org.junit.jupiter.api.Test)
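
Both failure branches in this test repeat the same catch, assert-no-new-commit, rollback sequence. A minimal sketch extracting that pattern into a helper; expectSchemaRejection is a hypothetical name, while the timeline and rollback calls are the same ones used above:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.apache.hudi.client.SparkRDDWriteClient;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieTimeline;
import org.apache.hudi.exception.HoodieException;

class SchemaRejectionSketch {

    // Hypothetical helper: run a write that is expected to fail with the given exception,
    // verify that no new commit completed, then roll back the aborted instant.
    static void expectSchemaRejection(Class<? extends HoodieException> expected, Runnable write,
            HoodieTableMetaClient metaClient, SparkRDDWriteClient client,
            String lastGoodInstant, String failedInstant) {
        assertThrows(expected, write::run);
        HoodieTimeline timeline =
            metaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants();
        assertEquals(lastGoodInstant, timeline.lastInstant().get().getTimestamp());
        client.rollback(failedInstant);
    }
}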

Example 5 with HoodieInsertException

Use of org.apache.hudi.exception.HoodieInsertException in project hudi by apache.

From the class TestTableSchemaEvolution, method testMORTable.

@Test
public void testMORTable() throws Exception {
    tableType = HoodieTableType.MERGE_ON_READ;
    // Create the table
    HoodieTableMetaClient.withPropertyBuilder()
        .fromMetaClient(metaClient)
        .setTableType(HoodieTableType.MERGE_ON_READ)
        .setTimelineLayoutVersion(VERSION_1)
        .initTable(metaClient.getHadoopConf(), metaClient.getBasePath());
    HoodieWriteConfig hoodieWriteConfig = getWriteConfig(TRIP_EXAMPLE_SCHEMA);
    SparkRDDWriteClient client = getHoodieWriteClient(hoodieWriteConfig);
    // Initial inserts with TRIP_EXAMPLE_SCHEMA
    int numRecords = 10;
    insertFirstBatch(hoodieWriteConfig, client, "001", initCommitTime, numRecords, SparkRDDWriteClient::insert, false, false, numRecords);
    checkLatestDeltaCommit("001");
    // Compact once so we can incrementally read later
    assertTrue(client.scheduleCompactionAtInstant("002", Option.empty()));
    client.compact("002");
    // Updates with the same schema are allowed
    final int numUpdateRecords = 5;
    updateBatch(hoodieWriteConfig, client, "003", "002", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, false, 0, 0, 0);
    checkLatestDeltaCommit("003");
    checkReadRecords("000", numRecords);
    // Deletes with the same schema are allowed
    final int numDeleteRecords = 2;
    numRecords -= numDeleteRecords;
    deleteBatch(hoodieWriteConfig, client, "004", "003", initCommitTime, numDeleteRecords, SparkRDDWriteClient::delete, false, false, 0, 0);
    checkLatestDeltaCommit("004");
    checkReadRecords("000", numRecords);
    // Insert with a devolved schema is not allowed
    HoodieWriteConfig hoodieDevolvedWriteConfig = getWriteConfig(TRIP_EXAMPLE_SCHEMA_DEVOLVED);
    client = getHoodieWriteClient(hoodieDevolvedWriteConfig);
    final List<HoodieRecord> failedRecords = generateInsertsWithSchema("004", numRecords, TRIP_EXAMPLE_SCHEMA_DEVOLVED);
    try {
        // We cannot use insertBatch directly here because we want to insert records
        // with a devolved schema and insertBatch inserts records using the TRIP_EXAMPLE_SCHEMA.
        writeBatch(client, "005", "004", Option.empty(), "003", numRecords, (String s, Integer a) -> failedRecords, SparkRDDWriteClient::insert, false, 0, 0, 0, false);
        fail("Insert with devolved scheme should fail");
    } catch (HoodieInsertException ex) {
        // no new commit
        checkLatestDeltaCommit("004");
        checkReadRecords("000", numRecords);
        client.rollback("005");
    }
    // Update with devolved schema is also not allowed
    try {
        updateBatch(hoodieDevolvedWriteConfig, client, "005", "004", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, false, 0, 0, 0);
        fail("Update with devolved scheme should fail");
    } catch (HoodieUpsertException ex) {
        // no new commit
        checkLatestDeltaCommit("004");
        checkReadRecords("000", numRecords);
        client.rollback("005");
    }
    // Insert with an evolved schema is allowed
    HoodieWriteConfig hoodieEvolvedWriteConfig = getWriteConfig(TRIP_EXAMPLE_SCHEMA_EVOLVED);
    client = getHoodieWriteClient(hoodieEvolvedWriteConfig);
    // We cannot use insertBatch directly here because we want to insert records
    // with an evolved schema and insertBatch inserts records using the TRIP_EXAMPLE_SCHEMA.
    final List<HoodieRecord> evolvedRecords = generateInsertsWithSchema("005", numRecords, TRIP_EXAMPLE_SCHEMA_EVOLVED);
    writeBatch(client, "005", "004", Option.empty(), initCommitTime, numRecords, (String s, Integer a) -> evolvedRecords, SparkRDDWriteClient::insert, false, 0, 0, 0, false);
    // new commit
    checkLatestDeltaCommit("005");
    checkReadRecords("000", 2 * numRecords);
    // Updates with the evolved schema are allowed
    final List<HoodieRecord> updateRecords = generateUpdatesWithSchema("006", numUpdateRecords, TRIP_EXAMPLE_SCHEMA_EVOLVED);
    writeBatch(client, "006", "005", Option.empty(), initCommitTime, numUpdateRecords, (String s, Integer a) -> updateRecords, SparkRDDWriteClient::upsert, false, 0, 0, 0, false);
    // new commit
    checkLatestDeltaCommit("006");
    checkReadRecords("000", 2 * numRecords);
    // Now even the original schema cannot be used for updates as it is devolved in relation to the
    // current schema of the dataset.
    client = getHoodieWriteClient(hoodieWriteConfig);
    try {
        updateBatch(hoodieWriteConfig, client, "007", "006", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, false, 0, 0, 0);
        fail("Update with original scheme should fail");
    } catch (HoodieUpsertException ex) {
        // no new commit
        checkLatestDeltaCommit("006");
        checkReadRecords("000", 2 * numRecords);
        client.rollback("007");
    }
    // Now even the original schema cannot be used for inserts as it is devolved in relation to the
    // current schema of the dataset.
    try {
        // We are not using insertBatch directly here because insertion of these
        // records will fail and we don't want to keep these records within HoodieTestDataGenerator as we
        // will be testing updates later.
        failedRecords.clear();
        failedRecords.addAll(dataGen.generateInserts("007", numRecords));
        writeBatch(client, "007", "006", Option.empty(), initCommitTime, numRecords, (String s, Integer a) -> failedRecords, SparkRDDWriteClient::insert, true, numRecords, numRecords, 1, false);
        fail("Insert with original scheme should fail");
    } catch (HoodieInsertException ex) {
        // no new commit
        checkLatestDeltaCommit("006");
        checkReadRecords("000", 2 * numRecords);
        client.rollback("007");
        // Drop the failed records from the data generator so that later batches do not attempt updates
        // or deletes for records which do not even exist.
        for (HoodieRecord record : failedRecords) {
            assertTrue(dataGen.deleteExistingKeyIfPresent(record.getKey()));
        }
    }
    // Rollback to the original schema
    client.restoreToInstant("004");
    checkLatestDeltaCommit("004");
    // Updates with original schema are now allowed
    client = getHoodieWriteClient(hoodieWriteConfig);
    updateBatch(hoodieWriteConfig, client, "008", "004", Option.empty(), initCommitTime, numUpdateRecords, SparkRDDWriteClient::upsert, false, false, 0, 0, 0);
    // new commit
    checkLatestDeltaCommit("008");
    checkReadRecords("000", 2 * numRecords);
    // Insert with original schema is allowed now
    insertBatch(hoodieWriteConfig, client, "009", "008", numRecords, SparkRDDWriteClient::insert, false, false, 0, 0, 0, Option.empty());
    checkLatestDeltaCommit("009");
    checkReadRecords("000", 3 * numRecords);
}
Also used: HoodieUpsertException (org.apache.hudi.exception.HoodieUpsertException), HoodieRecord (org.apache.hudi.common.model.HoodieRecord), HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig), HoodieInsertException (org.apache.hudi.exception.HoodieInsertException), Test (org.junit.jupiter.api.Test)
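
One MOR-specific detail above is that compaction is scheduled explicitly before being executed. A minimal sketch of that schedule-then-compact pairing, assuming an already configured SparkRDDWriteClient (note that compact's return type varies across Hudi versions; it is ignored here):

import org.apache.hudi.client.SparkRDDWriteClient;
import org.apache.hudi.common.util.Option;

class CompactionSketch {

    // Schedules a compaction at the given instant and executes it if scheduling succeeded,
    // mirroring the scheduleCompactionAtInstant/compact calls in the test above.
    static void compactAt(SparkRDDWriteClient client, String instantTime) {
        if (client.scheduleCompactionAtInstant(instantTime, Option.empty())) {
            client.compact(instantTime);
        }
    }
}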

Aggregations

HoodieInsertException (org.apache.hudi.exception.HoodieInsertException): 8 usages
IOException (java.io.IOException): 4 usages
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 4 usages
HoodieUpsertException (org.apache.hudi.exception.HoodieUpsertException): 4 usages
Test (org.junit.jupiter.api.Test): 3 usages
SqlQuerySingleResultPreCommitValidator (org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator): 2 usages
HoodieRecord (org.apache.hudi.common.model.HoodieRecord): 2 usages
HoodieIOException (org.apache.hudi.exception.HoodieIOException): 2 usages
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest): 2 usages
ArrayList (java.util.ArrayList): 1 usage
Arrays (java.util.Arrays): 1 usage
Collection (java.util.Collection): 1 usage
Collections (java.util.Collections): 1 usage
HashMap (java.util.HashMap): 1 usage
HashSet (java.util.HashSet): 1 usage
List (java.util.List): 1 usage
Map (java.util.Map): 1 usage
Properties (java.util.Properties): 1 usage
Set (java.util.Set): 1 usage
UUID (java.util.UUID): 1 usage