Use of org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter in project hudi by apache.
From the class HoodieSparkBootstrapSchemaProvider, method getBootstrapSourceSchemaParquet:
private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
    // Read the Parquet schema (MessageType) from the source file's footer.
    MessageType parquetSchema = new ParquetUtils().readSchema(context.getHadoopConf().get(), filePath);
    // Build the converter using Spark's default settings for
    // binary-as-string and INT96-as-timestamp handling.
    ParquetToSparkSchemaConverter converter = new ParquetToSparkSchemaConverter(
        Boolean.parseBoolean(SQLConf.PARQUET_BINARY_AS_STRING().defaultValueString()),
        Boolean.parseBoolean(SQLConf.PARQUET_INT96_AS_TIMESTAMP().defaultValueString()));
    StructType sparkSchema = converter.convert(parquetSchema);
    // Derive an Avro-safe record name and namespace from the table name.
    String tableName = HoodieAvroUtils.sanitizeName(writeConfig.getTableName());
    String structName = tableName + "_record";
    String recordNamespace = "hoodie." + tableName;
    // Convert the Spark StructType into the equivalent Avro schema.
    return AvroConversionUtils.convertStructTypeToAvroSchema(sparkSchema, structName, recordNamespace);
}
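The last three assignments above derive the Avro record name and namespace purely from the sanitized table name. A minimal standalone sketch of that naming convention, with the heavy Spark and Parquet dependencies left out (the class and method names here are hypothetical; the real sanitization is done by HoodieAvroUtils.sanitizeName):

```java
// Hypothetical helper illustrating the naming convention used above:
// "<table>_record" for the struct name and "hoodie.<table>" for the namespace.
public class AvroNameSketch {
    public static String structName(String sanitizedTableName) {
        return sanitizedTableName + "_record";
    }

    public static String recordNamespace(String sanitizedTableName) {
        return "hoodie." + sanitizedTableName;
    }

    public static void main(String[] args) {
        System.out.println(structName("trips"));      // trips_record
        System.out.println(recordNamespace("trips")); // hoodie.trips
    }
}
```

Both values feed into AvroConversionUtils.convertStructTypeToAvroSchema, which needs a valid Avro record name and namespace to emit a well-formed schema.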
Use of org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter in project Gaffer by gchq.
From the class SchemaUtils, method buildSparkSchema:
public StructType buildSparkSchema(final String group) {
    // Convert the group's Parquet schema to a Spark StructType,
    // with binary-as-string and INT96-as-timestamp both disabled.
    final StructType sType = new ParquetToSparkSchemaConverter(false, false).convert(getParquetSchema(group));
    // Cache the result so later lookups for the same group skip the conversion.
    groupToSparkSchema.put(group, sType);
    return sType;
}
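The Gaffer snippet caches each converted schema in a per-group map. A minimal standalone sketch of that memoization pattern, using a plain Function<String, String> as a stand-in for the real Parquet-to-Spark conversion (the class here is hypothetical; only the caching idea is taken from the snippet):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for Gaffer's groupToSparkSchema cache: convert a
// group's schema once, then serve the cached result on repeat lookups.
public class SchemaCacheSketch {
    private final Map<String, String> groupToSchema = new ConcurrentHashMap<>();
    private final Function<String, String> converter;

    public SchemaCacheSketch(Function<String, String> converter) {
        this.converter = converter;
    }

    // Runs the converter on first request for a group, caches thereafter.
    public String buildSchema(String group) {
        return groupToSchema.computeIfAbsent(group, converter);
    }

    public static void main(String[] args) {
        AtomicInteger conversions = new AtomicInteger();
        SchemaCacheSketch cache = new SchemaCacheSketch(g -> {
            conversions.incrementAndGet();
            return "schema(" + g + ")";
        });
        System.out.println(cache.buildSchema("Edge")); // schema(Edge), converted
        System.out.println(cache.buildSchema("Edge")); // schema(Edge), cached
        System.out.println(conversions.get());         // 1
    }
}
```

Using computeIfAbsent instead of the check-then-put in the original also makes the cache safe under concurrent calls for the same group.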