Example 6 with InternalRow

Use of org.apache.spark.sql.catalyst.InternalRow in the apache/iceberg project.

From class WritersBenchmark, the method writePartitionedLegacyFanoutDataWriter:

@Benchmark
@Threads(1)
public void writePartitionedLegacyFanoutDataWriter(Blackhole blackhole) throws IOException {
    FileIO io = table().io();
    OutputFileFactory fileFactory = newFileFactory();
    Schema writeSchema = table().schema();
    StructType sparkWriteType = SparkSchemaUtil.convert(writeSchema);
    SparkAppenderFactory appenders = SparkAppenderFactory.builderFor(table(), writeSchema, sparkWriteType)
        .spec(partitionedSpec)
        .build();
    TaskWriter<InternalRow> writer = new SparkPartitionedFanoutWriter(
        partitionedSpec, fileFormat(), appenders, fileFactory, io,
        TARGET_FILE_SIZE_IN_BYTES, writeSchema, sparkWriteType);
    try (TaskWriter<InternalRow> closableWriter = writer) {
        for (InternalRow row : rows) {
            closableWriter.write(row);
        }
    }
    blackhole.consume(writer.complete());
}
Also used: OutputFileFactory (org.apache.iceberg.io.OutputFileFactory), StructType (org.apache.spark.sql.types.StructType), Schema (org.apache.iceberg.Schema), InternalRow (org.apache.spark.sql.catalyst.InternalRow), FileIO (org.apache.iceberg.io.FileIO), Threads (org.openjdk.jmh.annotations.Threads), Benchmark (org.openjdk.jmh.annotations.Benchmark)
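
The benchmark discards its output through the Blackhole, but in a real write path the DataFile handles returned by complete() are what get committed to the table. A minimal sketch of that follow-up step, assuming a Table handle named table; the commit logic is illustrative and not part of the benchmark:

// Close the writer and collect the data files it produced.
DataFile[] dataFiles = writer.complete();
// Commit all produced files to the table in a single append operation.
AppendFiles append = table.newAppend();
for (DataFile dataFile : dataFiles) {
    append.appendFile(dataFile);
}
append.commit();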

Example 7 with InternalRow

Use of org.apache.spark.sql.catalyst.InternalRow in the apache/iceberg project.

From class WritersBenchmark, the method writePartitionedFanoutDataWriter:

@Benchmark
@Threads(1)
public void writePartitionedFanoutDataWriter(Blackhole blackhole) throws IOException {
    FileIO io = table().io();
    OutputFileFactory fileFactory = newFileFactory();
    SparkFileWriterFactory writerFactory = SparkFileWriterFactory.builderFor(table())
        .dataFileFormat(fileFormat())
        .dataSchema(table().schema())
        .build();
    FanoutDataWriter<InternalRow> writer = new FanoutDataWriter<>(
        writerFactory, fileFactory, io, fileFormat(), TARGET_FILE_SIZE_IN_BYTES);
    PartitionKey partitionKey = new PartitionKey(partitionedSpec, table().schema());
    StructType dataSparkType = SparkSchemaUtil.convert(table().schema());
    InternalRowWrapper internalRowWrapper = new InternalRowWrapper(dataSparkType);
    try (FanoutDataWriter<InternalRow> closeableWriter = writer) {
        for (InternalRow row : rows) {
            partitionKey.partition(internalRowWrapper.wrap(row));
            closeableWriter.write(row, partitionedSpec, partitionKey);
        }
    }
    blackhole.consume(writer);
}
Also used: OutputFileFactory (org.apache.iceberg.io.OutputFileFactory), StructType (org.apache.spark.sql.types.StructType), PartitionKey (org.apache.iceberg.PartitionKey), InternalRow (org.apache.spark.sql.catalyst.InternalRow), FanoutDataWriter (org.apache.iceberg.io.FanoutDataWriter), FileIO (org.apache.iceberg.io.FileIO), Threads (org.openjdk.jmh.annotations.Threads), Benchmark (org.openjdk.jmh.annotations.Benchmark)
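
Unlike the legacy TaskWriter in Example 6, FanoutDataWriter exposes its output through result() once it has been closed. A minimal sketch of collecting and committing those files, assuming the same illustrative Table handle named table:

// result() is only valid after the try-with-resources block has closed the writer.
DataWriteResult result = writer.result();
// Append every produced data file to the table in one commit.
AppendFiles append = table.newAppend();
result.dataFiles().forEach(append::appendFile);
append.commit();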

Example 8 with InternalRow

Use of org.apache.spark.sql.catalyst.InternalRow in the apache/iceberg project.

From class RewriteManifestsProcedure, the method toOutputRows:

private InternalRow[] toOutputRows(RewriteManifests.Result result) {
    int rewrittenManifestsCount = Iterables.size(result.rewrittenManifests());
    int addedManifestsCount = Iterables.size(result.addedManifests());
    InternalRow row = newInternalRow(rewrittenManifestsCount, addedManifestsCount);
    return new InternalRow[] { row };
}
Also used: InternalRow (org.apache.spark.sql.catalyst.InternalRow)
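
toOutputRows only packs the two counters into a single InternalRow; users normally reach this code through Spark's stored procedure syntax rather than calling it directly. A minimal sketch of the SQL entry point, assuming a SparkSession named spark and an Iceberg catalog registered as my_catalog (both illustrative):

// Rewrites the manifests of db.sample; the result row carries the
// rewritten_manifests_count and added_manifests_count computed above.
spark.sql("CALL my_catalog.system.rewrite_manifests('db.sample')").show();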

Example 9 with InternalRow

Use of org.apache.spark.sql.catalyst.InternalRow in the apache/iceberg project.

From class RollbackToSnapshotProcedure, the method call:

@Override
public InternalRow[] call(InternalRow args) {
    Identifier tableIdent = toIdentifier(args.getString(0), PARAMETERS[0].name());
    long snapshotId = args.getLong(1);
    return modifyIcebergTable(tableIdent, table -> {
        Snapshot previousSnapshot = table.currentSnapshot();
        table.manageSnapshots().rollbackTo(snapshotId).commit();
        InternalRow outputRow = newInternalRow(previousSnapshot.snapshotId(), snapshotId);
        return new InternalRow[] { outputRow };
    });
}
Also used: Snapshot (org.apache.iceberg.Snapshot), Identifier (org.apache.spark.sql.connector.catalog.Identifier), InternalRow (org.apache.spark.sql.catalyst.InternalRow)
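
The returned row pairs the snapshot id that was current before the rollback with the target snapshot id. A minimal sketch of the matching SQL call, with the same illustrative spark and my_catalog names and a made-up snapshot id:

// Rolls db.sample back to the given snapshot; the result row holds
// previous_snapshot_id and current_snapshot_id.
spark.sql("CALL my_catalog.system.rollback_to_snapshot('db.sample', 1234567890)").show();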

Example 10 with InternalRow

Use of org.apache.spark.sql.catalyst.InternalRow in the apache/iceberg project.

From class AddFilesProcedure, the method call:

@Override
public InternalRow[] call(InternalRow args) {
    Identifier tableIdent = toIdentifier(args.getString(0), PARAMETERS[0].name());
    CatalogPlugin sessionCat = spark().sessionState().catalogManager().v2SessionCatalog();
    Identifier sourceIdent = toCatalogAndIdentifier(args.getString(1), PARAMETERS[1].name(), sessionCat).identifier();
    Map<String, String> partitionFilter = Maps.newHashMap();
    if (!args.isNullAt(2)) {
        args.getMap(2).foreach(DataTypes.StringType, DataTypes.StringType, (k, v) -> {
            partitionFilter.put(k.toString(), v.toString());
            return BoxedUnit.UNIT;
        });
    }
    boolean checkDuplicateFiles;
    if (args.isNullAt(3)) {
        checkDuplicateFiles = true;
    } else {
        checkDuplicateFiles = args.getBoolean(3);
    }
    long addedFilesCount = importToIceberg(tableIdent, sourceIdent, partitionFilter, checkDuplicateFiles);
    return new InternalRow[] { newInternalRow(addedFilesCount) };
}
Also used: CatalogPlugin (org.apache.spark.sql.connector.catalog.CatalogPlugin), TableIdentifier (org.apache.spark.sql.catalyst.TableIdentifier), Identifier (org.apache.spark.sql.connector.catalog.Identifier), InternalRow (org.apache.spark.sql.catalyst.InternalRow)
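
The positional arguments above correspond to the procedure's named parameters: argument 2 is the optional partition filter and argument 3 is check_duplicate_files, which the code defaults to true when it is null. A minimal SQL-level sketch with illustrative catalog and table names:

// Imports only files from the data='2022' partition of a Hive table,
// keeping the default duplicate-file check enabled.
spark.sql(
    "CALL my_catalog.system.add_files(" +
    "table => 'db.iceberg_tbl', " +
    "source_table => 'db.hive_tbl', " +
    "partition_filter => map('data', '2022'))").show();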

Aggregations

InternalRow (org.apache.spark.sql.catalyst.InternalRow): 110
GenericInternalRow (org.apache.spark.sql.catalyst.expressions.GenericInternalRow): 33
Row (org.apache.spark.sql.Row): 30
StructType (org.apache.spark.sql.types.StructType): 29
Test (org.junit.Test): 28
Schema (org.apache.iceberg.Schema): 17
ArrayList (java.util.ArrayList): 16
List (java.util.List): 16
Test (org.junit.jupiter.api.Test): 14
File (java.io.File): 13
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest): 13
IOException (java.io.IOException): 12
HoodieWriteConfig (org.apache.hudi.config.HoodieWriteConfig): 12
Types (org.apache.iceberg.types.Types): 12
OutputFileFactory (org.apache.iceberg.io.OutputFileFactory): 11
GenericRecord (org.apache.avro.generic.GenericRecord): 10
HoodieKey (org.apache.hudi.common.model.HoodieKey): 10
FileAppender (org.apache.iceberg.io.FileAppender): 10
Map (java.util.Map): 9
Assert (org.junit.Assert): 9