
Example 1 with ClusteredDataWriter

Use of org.apache.iceberg.io.ClusteredDataWriter in project iceberg by apache, from the writePartitionedClusteredDataWriter method of the WritersBenchmark class.

@Benchmark
@Threads(1)
public void writePartitionedClusteredDataWriter(Blackhole blackhole) throws IOException {
    FileIO io = table().io();
    OutputFileFactory fileFactory = newFileFactory();
    // Build a Spark-aware writer factory for the table's file format and schema.
    SparkFileWriterFactory writerFactory =
        SparkFileWriterFactory.builderFor(table())
            .dataFileFormat(fileFormat())
            .dataSchema(table().schema())
            .build();
    // ClusteredDataWriter assumes the incoming rows are clustered by partition.
    ClusteredDataWriter<InternalRow> writer =
        new ClusteredDataWriter<>(writerFactory, fileFactory, io, fileFormat(), TARGET_FILE_SIZE_IN_BYTES);
    PartitionKey partitionKey = new PartitionKey(partitionedSpec, table().schema());
    StructType dataSparkType = SparkSchemaUtil.convert(table().schema());
    // Wraps each Spark InternalRow so the partition key can read its fields.
    InternalRowWrapper internalRowWrapper = new InternalRowWrapper(dataSparkType);
    try (ClusteredDataWriter<InternalRow> closeableWriter = writer) {
        for (InternalRow row : rows) {
            // Compute the row's partition key, then route the row to that partition's file.
            partitionKey.partition(internalRowWrapper.wrap(row));
            closeableWriter.write(row, partitionedSpec, partitionKey);
        }
    }
    // Keep the writer reachable so the JIT cannot eliminate the benchmarked work.
    blackhole.consume(writer);
}
Also used: org.apache.iceberg.PartitionKey, org.apache.iceberg.io.FileIO, org.apache.iceberg.io.OutputFileFactory, org.apache.spark.sql.catalyst.InternalRow, org.apache.spark.sql.types.StructType, org.openjdk.jmh.annotations.Benchmark, org.openjdk.jmh.annotations.Threads
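
In a real write path, closing the writer would be followed by collecting its output and committing the files. A minimal sketch of that step, not part of the source, assuming the same fields as the benchmark above (writerFactory, fileFactory, io, rows, partitionedSpec, partitionKey, internalRowWrapper); ClusteredDataWriter.result() returns a DataWriteResult only after the writer is closed:

    // Sketch: collect and commit the files produced by the writer.
    // Assumes the surrounding fields from the benchmark above.
    ClusteredDataWriter<InternalRow> writer =
        new ClusteredDataWriter<>(writerFactory, fileFactory, io, fileFormat(), TARGET_FILE_SIZE_IN_BYTES);
    try (ClusteredDataWriter<InternalRow> closeableWriter = writer) {
        for (InternalRow row : rows) {
            partitionKey.partition(internalRowWrapper.wrap(row));
            closeableWriter.write(row, partitionedSpec, partitionKey);
        }
    }
    // result() is only valid after close(); it lists the completed DataFiles.
    DataWriteResult result = writer.result();
    // Append the new files to the table in one atomic commit.
    AppendFiles append = table().newAppend();
    result.dataFiles().forEach(append::appendFile);
    append.commit();

DataWriteResult comes from org.apache.iceberg.io and AppendFiles from org.apache.iceberg; the benchmark omits the commit step since it measures only write throughput.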

Example 2 with ClusteredDataWriter

Use of org.apache.iceberg.io.ClusteredDataWriter in project iceberg by apache, from the writeUnpartitionedClusteredDataWriter method of the WritersBenchmark class.

@Benchmark
@Threads(1)
public void writeUnpartitionedClusteredDataWriter(Blackhole blackhole) throws IOException {
    FileIO io = table().io();
    OutputFileFactory fileFactory = newFileFactory();
    SparkFileWriterFactory writerFactory =
        SparkFileWriterFactory.builderFor(table())
            .dataFileFormat(fileFormat())
            .dataSchema(table().schema())
            .build();
    ClusteredDataWriter<InternalRow> writer =
        new ClusteredDataWriter<>(writerFactory, fileFactory, io, fileFormat(), TARGET_FILE_SIZE_IN_BYTES);
    try (ClusteredDataWriter<InternalRow> closeableWriter = writer) {
        for (InternalRow row : rows) {
            // An unpartitioned spec has no partition key, so pass null.
            closeableWriter.write(row, unpartitionedSpec, null);
        }
    }
    // Keep the writer reachable so the JIT cannot eliminate the benchmarked work.
    blackhole.consume(writer);
}
Also used: org.apache.iceberg.io.FileIO, org.apache.iceberg.io.OutputFileFactory, org.apache.spark.sql.catalyst.InternalRow, org.openjdk.jmh.annotations.Benchmark, org.openjdk.jmh.annotations.Threads
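
A note on the clustered contract: ClusteredDataWriter keeps a single file open at a time and expects rows grouped by partition, failing if a previously completed partition reappears. For input that is not sorted by partition, Iceberg provides org.apache.iceberg.io.FanoutDataWriter, which keeps one open file per partition at the cost of more memory. A minimal sketch of swapping it in, not part of the source, assuming the same fields as the partitioned example above and the same five-argument constructor shape:

    // Sketch: FanoutDataWriter tolerates unordered input.
    // Assumes the surrounding fields from the partitioned benchmark above.
    FanoutDataWriter<InternalRow> writer =
        new FanoutDataWriter<>(writerFactory, fileFactory, io, fileFormat(), TARGET_FILE_SIZE_IN_BYTES);
    try (FanoutDataWriter<InternalRow> closeableWriter = writer) {
        for (InternalRow row : rows) {
            // No ordering requirement: each partition's file stays open until close().
            partitionKey.partition(internalRowWrapper.wrap(row));
            closeableWriter.write(row, partitionedSpec, partitionKey);
        }
    }

The trade-off is pre-sorting cost versus open-file count: clustered writing suits sorted input, fanout suits arbitrary arrival order.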

Aggregations

ClusteredDataWriter (org.apache.iceberg.io.ClusteredDataWriter): 2 uses
FileIO (org.apache.iceberg.io.FileIO): 2 uses
OutputFileFactory (org.apache.iceberg.io.OutputFileFactory): 2 uses
InternalRow (org.apache.spark.sql.catalyst.InternalRow): 2 uses
Benchmark (org.openjdk.jmh.annotations.Benchmark): 2 uses
Threads (org.openjdk.jmh.annotations.Threads): 2 uses
PartitionKey (org.apache.iceberg.PartitionKey): 1 use
StructType (org.apache.spark.sql.types.StructType): 1 use