
Example 11 with LogicalType

Use of org.apache.flink.table.types.logical.LogicalType in project flink by apache.

The class OrcColumnarRowInputFormat, method createPartitionedFormat.

/**
 * Creates a partitioned {@link OrcColumnarRowInputFormat} whose partition column values are
 * extracted from the split rather than read from the ORC file.
 */
public static <SplitT extends FileSourceSplit>
        OrcColumnarRowInputFormat<VectorizedRowBatch, SplitT> createPartitionedFormat(
                OrcShim<VectorizedRowBatch> shim,
                Configuration hadoopConfig,
                RowType tableType,
                List<String> partitionKeys,
                PartitionFieldExtractor<SplitT> extractor,
                int[] selectedFields,
                List<OrcFilters.Predicate> conjunctPredicates,
                int batchSize,
                Function<RowType, TypeInformation<RowData>> rowTypeInfoFactory) {
    // TODO FLINK-25113 all this partition keys code should be pruned from the orc format,
    // because now FileSystemTableSource uses FileInfoExtractorBulkFormat for reading partition
    // keys.
    String[] tableFieldNames = tableType.getFieldNames().toArray(new String[0]);
    LogicalType[] tableFieldTypes = tableType.getChildren().toArray(new LogicalType[0]);
    List<String> orcFieldNames = getNonPartNames(tableFieldNames, partitionKeys);
    int[] orcSelectedFields = getSelectedOrcFields(tableFieldNames, selectedFields, orcFieldNames);
    ColumnBatchFactory<VectorizedRowBatch, SplitT> batchGenerator =
            (SplitT split, VectorizedRowBatch rowBatch) -> {
        // create and initialize the row batch
        ColumnVector[] vectors = new ColumnVector[selectedFields.length];
        for (int i = 0; i < vectors.length; i++) {
            String name = tableFieldNames[selectedFields[i]];
            LogicalType type = tableFieldTypes[selectedFields[i]];
            vectors[i] =
                    partitionKeys.contains(name)
                            ? createFlinkVectorFromConstant(
                                    type, extractor.extract(split, name, type), batchSize)
                            : createFlinkVector(rowBatch.cols[orcFieldNames.indexOf(name)], type);
        }
        return new VectorizedColumnBatch(vectors);
    };
    return new OrcColumnarRowInputFormat<>(
            shim,
            hadoopConfig,
            convertToOrcTypeWithPart(tableFieldNames, tableFieldTypes, partitionKeys),
            orcSelectedFields,
            conjunctPredicates,
            batchSize,
            batchGenerator,
            rowTypeInfoFactory.apply(
                    new RowType(
                            Arrays.stream(selectedFields)
                                    .mapToObj(i -> tableType.getFields().get(i))
                                    .collect(Collectors.toList()))));
}
Also used: VectorizedRowBatch (org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch), VectorizedColumnBatch (org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch), LogicalType (org.apache.flink.table.types.logical.LogicalType), RowType (org.apache.flink.table.types.logical.RowType)
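
The core move in this example is splitting the table's RowType into parallel name and LogicalType arrays, then filtering out the partition columns that the ORC file itself does not store. Below is a minimal, self-contained sketch of that step; the three-column layout, field names, and the "dt" partition key are invented for illustration:

import java.util.Arrays;
import java.util.List;

import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class RowTypeSplitSketch {

    public static void main(String[] args) {
        // A three-column table whose last column ("dt") is a partition key.
        RowType tableType =
                RowType.of(
                        new LogicalType[] {
                            new IntType(),
                            new VarCharType(VarCharType.MAX_LENGTH),
                            new VarCharType(10)
                        },
                        new String[] {"id", "name", "dt"});

        // The same extraction createPartitionedFormat performs above.
        String[] tableFieldNames = tableType.getFieldNames().toArray(new String[0]);
        LogicalType[] tableFieldTypes = tableType.getChildren().toArray(new LogicalType[0]);

        // Partition values come from split metadata, so only the non-partition
        // fields remain in the physical ORC schema.
        List<String> partitionKeys = Arrays.asList("dt");
        for (int i = 0; i < tableFieldNames.length; i++) {
            if (!partitionKeys.contains(tableFieldNames[i])) {
                System.out.println(tableFieldNames[i] + ": " + tableFieldTypes[i].asSummaryString());
            }
        }
        // Prints:
        // id: INT
        // name: VARCHAR(2147483647)
    }
}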

Example 12 with LogicalType

Use of org.apache.flink.table.types.logical.LogicalType in project flink by apache.

The class HiveTableSink, method createBulkWriterFactory.

private Optional<BulkWriter.Factory<RowData>> createBulkWriterFactory(String[] partitionColumns, StorageDescriptor sd) {
    String serLib = sd.getSerdeInfo().getSerializationLib().toLowerCase();
    int formatFieldCount = tableSchema.getFieldCount() - partitionColumns.length;
    String[] formatNames = new String[formatFieldCount];
    LogicalType[] formatTypes = new LogicalType[formatFieldCount];
    for (int i = 0; i < formatFieldCount; i++) {
        formatNames[i] = tableSchema.getFieldName(i).get();
        formatTypes[i] = tableSchema.getFieldDataType(i).get().getLogicalType();
    }
    RowType formatType = RowType.of(formatTypes, formatNames);
    if (serLib.contains("parquet")) {
        Configuration formatConf = new Configuration(jobConf);
        sd.getSerdeInfo().getParameters().forEach(formatConf::set);
        return Optional.of(ParquetRowDataBuilder.createWriterFactory(formatType, formatConf, hiveVersion.startsWith("3.")));
    } else if (serLib.contains("orc")) {
        Configuration formatConf = new ThreadLocalClassLoaderConfiguration(jobConf);
        sd.getSerdeInfo().getParameters().forEach(formatConf::set);
        TypeDescription typeDescription = OrcSplitReaderUtil.logicalTypeToOrcType(formatType);
        return Optional.of(hiveShim.createOrcBulkWriterFactory(formatConf, typeDescription.toString(), formatTypes));
    } else {
        return Optional.empty();
    }
}
Also used: Configuration (org.apache.hadoop.conf.Configuration), ThreadLocalClassLoaderConfiguration (org.apache.flink.orc.writer.ThreadLocalClassLoaderConfiguration), LogicalType (org.apache.flink.table.types.logical.LogicalType), RowType (org.apache.flink.table.types.logical.RowType), TypeDescription (org.apache.orc.TypeDescription)
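
The ORC branch hinges on converting the Flink RowType into an ORC TypeDescription via OrcSplitReaderUtil.logicalTypeToOrcType, the same helper the method above calls. A minimal sketch of that conversion; the two-field schema is invented, and the import path is assumed to be the flink-orc module's:

import org.apache.flink.orc.OrcSplitReaderUtil;
import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.orc.TypeDescription;

public class OrcSchemaSketch {

    public static void main(String[] args) {
        LogicalType[] formatTypes = {new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH)};
        String[] formatNames = {"id", "name"};
        RowType formatType = RowType.of(formatTypes, formatNames);

        // Same conversion the ORC branch performs before building the writer factory.
        TypeDescription typeDescription = OrcSplitReaderUtil.logicalTypeToOrcType(formatType);

        // Expected to print an ORC struct schema along the lines of:
        // struct<id:bigint,name:string>
        System.out.println(typeDescription.toString());
    }
}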

Example 13 with LogicalType

Use of org.apache.flink.table.types.logical.LogicalType in project flink by apache.

The class ParquetColumnarRowSplitReaderTest, method testProject.

@Test
public void testProject() throws IOException {
    // prepare parquet file
    int number = 1000;
    List<Row> records = new ArrayList<>(number);
    for (int i = 0; i < number; i++) {
        Integer v = i;
        records.add(newRow(v));
    }
    Path testPath = createTempParquetFile(TEMPORARY_FOLDER.newFolder(), PARQUET_SCHEMA, records, rowGroupSize);
    // test reader
    LogicalType[] fieldTypes = new LogicalType[] { new DoubleType(), new TinyIntType(), new IntType() };
    ParquetColumnarRowSplitReader reader =
            new ParquetColumnarRowSplitReader(
                    false,
                    true,
                    new Configuration(),
                    fieldTypes,
                    new String[] { "f7", "f2", "f4" },
                    VectorizedColumnBatch::new,
                    500,
                    new org.apache.hadoop.fs.Path(testPath.getPath()),
                    0,
                    Long.MAX_VALUE);
    int i = 0;
    while (!reader.reachedEnd()) {
        ColumnarRowData row = reader.nextRecord();
        assertEquals(i, row.getDouble(0), 0);
        assertEquals((byte) i, row.getByte(1));
        assertEquals(i, row.getInt(2));
        i++;
    }
    reader.close();
}
Also used: Path (org.apache.flink.core.fs.Path), Configuration (org.apache.hadoop.conf.Configuration), ArrayList (java.util.ArrayList), LogicalType (org.apache.flink.table.types.logical.LogicalType), TinyIntType (org.apache.flink.table.types.logical.TinyIntType), IntType (org.apache.flink.table.types.logical.IntType), BigIntType (org.apache.flink.table.types.logical.BigIntType), SmallIntType (org.apache.flink.table.types.logical.SmallIntType), VectorizedColumnBatch (org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch), DoubleType (org.apache.flink.table.types.logical.DoubleType), ColumnarRowData (org.apache.flink.table.data.columnar.ColumnarRowData), Row (org.apache.flink.types.Row), Test (org.junit.Test)
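
The projection in this test pairs each requested Parquet field name with a Flink LogicalType by position: column 0 of the returned rows is "f7" read as DOUBLE, column 1 is "f2" as TINYINT, column 2 is "f4" as INT. A minimal sketch of that positional pairing, with the names and types copied from the test:

import org.apache.flink.table.types.logical.DoubleType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.TinyIntType;

public class ProjectionSketch {

    public static void main(String[] args) {
        // Positional pairing: fieldTypes[i] describes projectedNames[i].
        LogicalType[] fieldTypes = {new DoubleType(), new TinyIntType(), new IntType()};
        String[] projectedNames = {"f7", "f2", "f4"};
        for (int i = 0; i < fieldTypes.length; i++) {
            System.out.println(projectedNames[i] + " -> " + fieldTypes[i].asSummaryString());
        }
        // Prints:
        // f7 -> DOUBLE
        // f2 -> TINYINT
        // f4 -> INT
    }
}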

Example 14 with LogicalType

Use of org.apache.flink.table.types.logical.LogicalType in project flink by apache.

The class ParquetColumnarRowSplitReaderTest, method innerTestPartitionValues.

private void innerTestPartitionValues(Path testPath, Map<String, Object> partSpec, boolean nullPartValue) throws IOException {
    LogicalType[] fieldTypes = new LogicalType[] {
            new VarCharType(VarCharType.MAX_LENGTH), new BooleanType(), new TinyIntType(),
            new SmallIntType(), new IntType(), new BigIntType(), new FloatType(),
            new DoubleType(), new TimestampType(9), new DecimalType(5, 0),
            new DecimalType(15, 0), new DecimalType(20, 0), new DecimalType(5, 0),
            new DecimalType(15, 0), new DecimalType(20, 0), new BooleanType(),
            new DateType(), new TimestampType(9), new DoubleType(), new TinyIntType(),
            new SmallIntType(), new IntType(), new BigIntType(), new FloatType(),
            new DecimalType(5, 0), new DecimalType(15, 0), new DecimalType(20, 0),
            new VarCharType(VarCharType.MAX_LENGTH)
    };
    ParquetColumnarRowSplitReader reader =
            ParquetSplitReaderUtil.genPartColumnarRowReader(
                    false,
                    true,
                    new Configuration(),
                    IntStream.range(0, 28).mapToObj(i -> "f" + i).toArray(String[]::new),
                    Arrays.stream(fieldTypes)
                            .map(TypeConversions::fromLogicalToDataType)
                            .toArray(DataType[]::new),
                    partSpec,
                    new int[] { 7, 2, 4, 15, 19, 20, 21, 22, 23, 18, 16, 17, 24, 25, 26, 27 },
                    rowGroupSize,
                    new Path(testPath.getPath()),
                    0,
                    Long.MAX_VALUE);
    int i = 0;
    while (!reader.reachedEnd()) {
        ColumnarRowData row = reader.nextRecord();
        // common values
        assertEquals(i, row.getDouble(0), 0);
        assertEquals((byte) i, row.getByte(1));
        assertEquals(i, row.getInt(2));
        // partition values
        if (nullPartValue) {
            for (int j = 3; j < 16; j++) {
                assertTrue(row.isNullAt(j));
            }
        } else {
            assertTrue(row.getBoolean(3));
            assertEquals(9, row.getByte(4));
            assertEquals(10, row.getShort(5));
            assertEquals(11, row.getInt(6));
            assertEquals(12, row.getLong(7));
            assertEquals(13, row.getFloat(8), 0);
            assertEquals(6.6, row.getDouble(9), 0);
            assertEquals(DateTimeUtils.toInternal(Date.valueOf("2020-11-23")), row.getInt(10));
            assertEquals(LocalDateTime.of(1999, 1, 1, 1, 1), row.getTimestamp(11, 9).toLocalDateTime());
            assertEquals(DecimalData.fromBigDecimal(new BigDecimal(24), 5, 0), row.getDecimal(12, 5, 0));
            assertEquals(DecimalData.fromBigDecimal(new BigDecimal(25), 15, 0), row.getDecimal(13, 15, 0));
            assertEquals(DecimalData.fromBigDecimal(new BigDecimal(26), 20, 0), row.getDecimal(14, 20, 0));
            assertEquals("f27", row.getString(15).toString());
        }
        i++;
    }
    reader.close();
}
Also used: Path (org.apache.flink.core.fs.Path), Configuration (org.apache.hadoop.conf.Configuration), TypeConversions (org.apache.flink.table.types.utils.TypeConversions), BooleanType (org.apache.flink.table.types.logical.BooleanType), LogicalType (org.apache.flink.table.types.logical.LogicalType), BigIntType (org.apache.flink.table.types.logical.BigIntType), BigDecimal (java.math.BigDecimal), TinyIntType (org.apache.flink.table.types.logical.TinyIntType), IntType (org.apache.flink.table.types.logical.IntType), SmallIntType (org.apache.flink.table.types.logical.SmallIntType), FloatType (org.apache.flink.table.types.logical.FloatType), DoubleType (org.apache.flink.table.types.logical.DoubleType), TimestampType (org.apache.flink.table.types.logical.TimestampType), DecimalType (org.apache.flink.table.types.logical.DecimalType), DataType (org.apache.flink.table.types.DataType), ColumnarRowData (org.apache.flink.table.data.columnar.ColumnarRowData), VarCharType (org.apache.flink.table.types.logical.VarCharType), DateType (org.apache.flink.table.types.logical.DateType)
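
The decimal assertions above depend on DecimalData.fromBigDecimal rescaling the raw value to the declared DecimalType precision and scale. A minimal sketch of that behavior, with the values copied from the test; the null-on-overflow case in the last line is my understanding of the API, not something this test exercises:

import java.math.BigDecimal;

import org.apache.flink.table.data.DecimalData;

public class DecimalSketch {

    public static void main(String[] args) {
        // Rescale 24 to DECIMAL(5, 0) and 26 to DECIMAL(20, 0), as the test expects.
        DecimalData compact = DecimalData.fromBigDecimal(new BigDecimal(24), 5, 0);
        DecimalData wide = DecimalData.fromBigDecimal(new BigDecimal(26), 20, 0);
        System.out.println(compact); // 24
        System.out.println(wide); // 26

        // A value that overflows the declared precision yields null (assumed behavior).
        System.out.println(DecimalData.fromBigDecimal(new BigDecimal("123456"), 5, 0));
    }
}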

Example 15 with LogicalType

Use of org.apache.flink.table.types.logical.LogicalType in project flink by apache.

The class ParquetColumnarRowInputFormatTest, method innerTestPartitionValues.

private void innerTestPartitionValues(Path testPath, List<String> partitionKeys, boolean nullPartValue) throws IOException {
    LogicalType[] fieldTypes = new LogicalType[] {
            new VarCharType(VarCharType.MAX_LENGTH), new BooleanType(), new TinyIntType(),
            new SmallIntType(), new IntType(), new BigIntType(), new FloatType(),
            new DoubleType(), new TimestampType(9), new DecimalType(5, 0),
            new DecimalType(15, 0), new DecimalType(20, 0), new DecimalType(5, 0),
            new DecimalType(15, 0), new DecimalType(20, 0), new BooleanType(),
            new DateType(), new TimestampType(9), new DoubleType(), new TinyIntType(),
            new SmallIntType(), new IntType(), new BigIntType(), new FloatType(),
            new DecimalType(5, 0), new DecimalType(15, 0), new DecimalType(20, 0),
            new VarCharType(VarCharType.MAX_LENGTH)
    };
    RowType rowType =
            RowType.of(
                    fieldTypes,
                    IntStream.range(0, 28).mapToObj(i -> "f" + i).toArray(String[]::new));
    int[] projected = new int[] { 7, 2, 4, 15, 19, 20, 21, 22, 23, 18, 16, 17, 24, 25, 26, 27 };
    RowType producedType =
            new RowType(
                    Arrays.stream(projected)
                            .mapToObj(i -> rowType.getFields().get(i))
                            .collect(Collectors.toList()));
    ParquetColumnarRowInputFormat<FileSourceSplit> format =
            ParquetColumnarRowInputFormat.createPartitionedFormat(
                    new Configuration(),
                    producedType,
                    InternalTypeInfo.of(producedType),
                    partitionKeys,
                    PartitionFieldExtractor.forFileSystem("my_default_value"),
                    500,
                    false,
                    true);
    FileStatus fileStatus = testPath.getFileSystem().getFileStatus(testPath);
    AtomicInteger cnt = new AtomicInteger(0);
    forEachRemaining(
            format.createReader(
                    EMPTY_CONF,
                    new FileSourceSplit(
                            "id",
                            testPath,
                            0,
                            Long.MAX_VALUE,
                            fileStatus.getModificationTime(),
                            fileStatus.getLen())),
            row -> {
        int i = cnt.get();
        // common values
        assertEquals(i, row.getDouble(0), 0);
        assertEquals((byte) i, row.getByte(1));
        assertEquals(i, row.getInt(2));
        // partition values
        if (nullPartValue) {
            for (int j = 3; j < 16; j++) {
                assertTrue(row.isNullAt(j));
            }
        } else {
            assertTrue(row.getBoolean(3));
            assertEquals(9, row.getByte(4));
            assertEquals(10, row.getShort(5));
            assertEquals(11, row.getInt(6));
            assertEquals(12, row.getLong(7));
            assertEquals(13, row.getFloat(8), 0);
            assertEquals(6.6, row.getDouble(9), 0);
            assertEquals(DateTimeUtils.toInternal(Date.valueOf("2020-11-23")), row.getInt(10));
            assertEquals(LocalDateTime.of(1999, 1, 1, 1, 1), row.getTimestamp(11, 9).toLocalDateTime());
            assertEquals(DecimalData.fromBigDecimal(new BigDecimal(24), 5, 0), row.getDecimal(12, 5, 0));
            assertEquals(DecimalData.fromBigDecimal(new BigDecimal(25), 15, 0), row.getDecimal(13, 15, 0));
            assertEquals(DecimalData.fromBigDecimal(new BigDecimal(26), 20, 0), row.getDecimal(14, 20, 0));
            assertEquals("f27", row.getString(15).toString());
        }
        cnt.incrementAndGet();
    });
}
Also used: FileStatus (org.apache.flink.core.fs.FileStatus), FileSourceSplit (org.apache.flink.connector.file.src.FileSourceSplit), Configuration (org.apache.hadoop.conf.Configuration), BooleanType (org.apache.flink.table.types.logical.BooleanType), LogicalType (org.apache.flink.table.types.logical.LogicalType), BigIntType (org.apache.flink.table.types.logical.BigIntType), RowType (org.apache.flink.table.types.logical.RowType), BigDecimal (java.math.BigDecimal), TinyIntType (org.apache.flink.table.types.logical.TinyIntType), IntType (org.apache.flink.table.types.logical.IntType), SmallIntType (org.apache.flink.table.types.logical.SmallIntType), FloatType (org.apache.flink.table.types.logical.FloatType), AtomicInteger (java.util.concurrent.atomic.AtomicInteger), DoubleType (org.apache.flink.table.types.logical.DoubleType), TimestampType (org.apache.flink.table.types.logical.TimestampType), DecimalType (org.apache.flink.table.types.logical.DecimalType), VarCharType (org.apache.flink.table.types.logical.VarCharType), DateType (org.apache.flink.table.types.logical.DateType)
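
The producedTypeInfo argument above is built with InternalTypeInfo.of, which bridges a logical RowType to the TypeInformation&lt;RowData&gt; the runtime expects. A minimal sketch of that bridge; the two-field row is invented for illustration:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class ProducedTypeSketch {

    public static void main(String[] args) {
        RowType producedType =
                RowType.of(
                        new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                        new String[] {"id", "name"});

        // TypeInformation<RowData> backed by the logical row type, as passed to
        // createPartitionedFormat above.
        TypeInformation<RowData> typeInfo = InternalTypeInfo.of(producedType);
        System.out.println(typeInfo);
    }
}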

Aggregations

LogicalType (org.apache.flink.table.types.logical.LogicalType): 192
DataType (org.apache.flink.table.types.DataType): 53
RowType (org.apache.flink.table.types.logical.RowType): 53
RowData (org.apache.flink.table.data.RowData): 45
List (java.util.List): 29
ArrayList (java.util.ArrayList): 28
TableException (org.apache.flink.table.api.TableException): 25
TimestampType (org.apache.flink.table.types.logical.TimestampType): 25
Internal (org.apache.flink.annotation.Internal): 21
IntType (org.apache.flink.table.types.logical.IntType): 21
Map (java.util.Map): 20
ValidationException (org.apache.flink.table.api.ValidationException): 20
ArrayType (org.apache.flink.table.types.logical.ArrayType): 19
DecimalType (org.apache.flink.table.types.logical.DecimalType): 19
LocalZonedTimestampType (org.apache.flink.table.types.logical.LocalZonedTimestampType): 17
Test (org.junit.Test): 17
BigIntType (org.apache.flink.table.types.logical.BigIntType): 16
LegacyTypeInformationType (org.apache.flink.table.types.logical.LegacyTypeInformationType): 16
GenericRowData (org.apache.flink.table.data.GenericRowData): 15
Arrays (java.util.Arrays): 14