Example 1 with ResultVectorCacheImpl

Use of org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl in project drill by apache.

The class TestConstantColumnLoader, method testConstantColumnLoader.

/**
 * Test the constant column loader using one VARCHAR column of each
 * cardinality: one required ("a") and one optional ("b"). The loader
 * fills every row of the batch with the column's constant value.
 */
@Test
public void testConstantColumnLoader() {
    final MajorType aType = MajorType.newBuilder().setMinorType(MinorType.VARCHAR).setMode(DataMode.REQUIRED).build();
    final MajorType bType = MajorType.newBuilder().setMinorType(MinorType.VARCHAR).setMode(DataMode.OPTIONAL).build();
    final List<ConstantColumnSpec> defns = new ArrayList<>();
    defns.add(new DummyColumn("a", aType, "a-value"));
    defns.add(new DummyColumn("b", bType, "b-value"));
    final ResultVectorCacheImpl cache = new ResultVectorCacheImpl(fixture.allocator());
    final ConstantColumnLoader staticLoader = new ConstantColumnLoader(cache, defns);
    // Create a batch
    staticLoader.load(2);
    // Verify
    final TupleMetadata expectedSchema = new SchemaBuilder()
        .add("a", aType)
        .add("b", bType)
        .buildSchema();
    final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
        .addRow("a-value", "b-value")
        .addRow("a-value", "b-value")
        .build();
    new RowSetComparison(expected).verifyAndClearAll(fixture.wrap(staticLoader.load(2)));
    staticLoader.close();
}
Also used : SingleRowSet(org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet) RowSetComparison(org.apache.drill.test.rowSet.RowSetComparison) ConstantColumnSpec(org.apache.drill.exec.physical.impl.scan.project.ConstantColumnLoader.ConstantColumnSpec) MajorType(org.apache.drill.common.types.TypeProtos.MajorType) TupleMetadata(org.apache.drill.exec.record.metadata.TupleMetadata) ArrayList(java.util.ArrayList) SchemaBuilder(org.apache.drill.exec.record.metadata.SchemaBuilder) ResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl) SubOperatorTest(org.apache.drill.test.SubOperatorTest) Test(org.junit.Test)
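The essence of the example above can be reduced to a small sketch: a constant-column loader writes the same per-file value into every row position of a batch. The class and method names below are illustrative stand-ins, not Drill's ConstantColumnLoader API.

```java
// Illustrative sketch (not Drill API): the core of a constant-column
// loader is repeating one value across all row positions of a batch.
public class ConstantFillSketch {

    // Fill a column of rowCount rows with a single constant value.
    public static String[] fillConstant(String value, int rowCount) {
        String[] column = new String[rowCount];
        for (int i = 0; i < rowCount; i++) {
            column[i] = value; // same value in every row, one write per row position
        }
        return column;
    }

    public static void main(String[] args) {
        // Mirrors the test above: a 2-row batch where column "a" is "a-value" throughout.
        String[] col = fillConstant("a-value", 2);
        System.out.println(java.util.Arrays.toString(col));
    }
}
```

The real loader does the same per value vector, writing through a column writer rather than into an array.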

Example 2 with ResultVectorCacheImpl

Use of org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl in project drill by apache.

The class TestNullColumnLoader, method testCachedTypesMapToNullable.

/**
 * Drill requires "schema persistence": if a scan operator reads two
 * files, F1 and F2, then the scan operator must provide the same
 * vectors from both readers. Not just the same types: the same value
 * vector instances (populated, of course, with different data).
 * <p>
 * Test the case in which the reader for F1 found columns (a, b, c)
 * but F2 found only (a, b), requiring that we fill in column c with
 * nulls, using the same type it had in file F1. We use a vector cache
 * to pull off this trick. This test ensures that the null column
 * mechanism looks in that vector cache when asked to create a
 * nullable column.
 */
@Test
public void testCachedTypesMapToNullable() {
    final List<ResolvedNullColumn> defns = new ArrayList<>();
    defns.add(makeNullCol("req"));
    defns.add(makeNullCol("opt"));
    defns.add(makeNullCol("rep"));
    defns.add(makeNullCol("unk"));
    // Populate the cache with a column of each mode.
    final ResultVectorCacheImpl cache = new ResultVectorCacheImpl(fixture.allocator());
    cache.vectorFor(SchemaBuilder.columnSchema("req", MinorType.FLOAT8, DataMode.REQUIRED));
    final ValueVector opt = cache.vectorFor(SchemaBuilder.columnSchema("opt", MinorType.FLOAT8, DataMode.OPTIONAL));
    final ValueVector rep = cache.vectorFor(SchemaBuilder.columnSchema("rep", MinorType.FLOAT8, DataMode.REPEATED));
    // Use nullable Varchar for unknown null columns.
    final MajorType nullType = Types.optional(MinorType.VARCHAR);
    final NullColumnLoader staticLoader = new NullColumnLoader(cache, defns, nullType, false);
    // Create a batch
    final VectorContainer output = staticLoader.load(2);
    // Verify vectors are reused
    assertSame(opt, output.getValueVector(1).getValueVector());
    assertSame(rep, output.getValueVector(2).getValueVector());
    // Verify values and types
    final TupleMetadata expectedSchema = new SchemaBuilder()
        .addNullable("req", MinorType.FLOAT8)
        .addNullable("opt", MinorType.FLOAT8)
        .addArray("rep", MinorType.FLOAT8)
        .addNullable("unk", MinorType.VARCHAR)
        .buildSchema();
    final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
        .addRow(null, null, new int[] {}, null)
        .addRow(null, null, new int[] {}, null)
        .build();
    RowSetUtilities.verify(expected, fixture.wrap(output));
    staticLoader.close();
}
Also used : ValueVector(org.apache.drill.exec.vector.ValueVector) SingleRowSet(org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet) MajorType(org.apache.drill.common.types.TypeProtos.MajorType) TupleMetadata(org.apache.drill.exec.record.metadata.TupleMetadata) ArrayList(java.util.ArrayList) SchemaBuilder(org.apache.drill.exec.record.metadata.SchemaBuilder) ResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl) NullResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.NullResultVectorCacheImpl) VectorContainer(org.apache.drill.exec.record.VectorContainer) SubOperatorTest(org.apache.drill.test.SubOperatorTest) Test(org.junit.Test)
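The reuse that assertSame verifies above can be sketched in plain Java. FakeVector and the method names here are hypothetical stand-ins for Drill's ValueVector and ResultVectorCacheImpl.vectorFor: the cache is keyed by column name and hands back the identical instance when the requested type matches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the vector-cache contract that gives Drill its
// "schema persistence": same name + same type => same instance returned.
public class VectorCacheSketch {

    // Stand-in for a value vector: just a name and a type tag.
    public static final class FakeVector {
        final String name;
        final String type;
        FakeVector(String name, String type) { this.name = name; this.type = type; }
    }

    private final Map<String, FakeVector> cache = new HashMap<>();

    // Return the cached vector if name and type match; otherwise create
    // (and cache) a fresh one, replacing any mismatched entry.
    public FakeVector vectorFor(String name, String type) {
        FakeVector v = cache.get(name);
        if (v != null && Objects.equals(v.type, type)) {
            return v; // reuse: the same instance, not merely the same type
        }
        v = new FakeVector(name, type);
        cache.put(name, v);
        return v;
    }
}
```

This is why the test checks identity with assertSame rather than equality: the contract is about instance reuse, not type equivalence.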

Example 3 with ResultVectorCacheImpl

Use of org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl in project drill by apache.

The class TestMissingColumnLoader, method testVectorCache.

/**
 * Drill requires "schema persistence": if a scan operator reads two
 * files, F1 and F2, then the scan operator must provide the same
 * vectors from both readers. Not just the same types: the same value
 * vector instances (populated, of course, with different data).
 * <p>
 * Test the case in which the reader for F1 found columns (a, b, c)
 * but F2 found only (a, b), requiring that we fill in column c with
 * nulls, using the same type it had in file F1. We use a vector cache
 * to pull off this trick. This test ensures that the missing-column
 * mechanism looks in that vector cache when asked to create a
 * nullable column.
 */
@Test
public void testVectorCache() {
    TupleMetadata missingCols = new SchemaBuilder()
        .addNullable("req", MinorType.FLOAT8)
        .addNullable("opt", MinorType.FLOAT8)
        .addArray("rep", MinorType.FLOAT8)
        .addDynamic("unk")
        .build();
    // Populate the cache with a column of each mode.
    final ResultVectorCacheImpl cache = new ResultVectorCacheImpl(fixture.allocator());
    cache.vectorFor(SchemaBuilder.columnSchema("req", MinorType.FLOAT8, DataMode.REQUIRED));
    final ValueVector opt = cache.vectorFor(SchemaBuilder.columnSchema("opt", MinorType.FLOAT8, DataMode.OPTIONAL));
    final ValueVector rep = cache.vectorFor(SchemaBuilder.columnSchema("rep", MinorType.FLOAT8, DataMode.REPEATED));
    // Use nullable Varchar for unknown null columns.
    final MajorType nullType = Types.optional(MinorType.VARCHAR);
    StaticBatchBuilder handler = new MissingColumnHandlerBuilder()
        .inputSchema(missingCols)
        .vectorCache(cache)
        .nullType(nullType)
        .build();
    assertNotNull(handler);
    // Create a batch
    handler.load(2);
    final VectorContainer output = handler.outputContainer();
    // Verify vectors are reused
    assertSame(opt, output.getValueVector(1).getValueVector());
    assertSame(rep, output.getValueVector(2).getValueVector());
    // Verify values and types
    final TupleMetadata expectedSchema = new SchemaBuilder()
        .addNullable("req", MinorType.FLOAT8)
        .addNullable("opt", MinorType.FLOAT8)
        .addArray("rep", MinorType.FLOAT8)
        .addNullable("unk", MinorType.VARCHAR)
        .buildSchema();
    final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
        .addRow(null, null, new int[] {}, null)
        .addRow(null, null, new int[] {}, null)
        .build();
    RowSetUtilities.verify(expected, fixture.wrap(output));
    handler.close();
}
Also used : ValueVector(org.apache.drill.exec.vector.ValueVector) SingleRowSet(org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet) TupleMetadata(org.apache.drill.exec.record.metadata.TupleMetadata) MajorType(org.apache.drill.common.types.TypeProtos.MajorType) SchemaBuilder(org.apache.drill.exec.record.metadata.SchemaBuilder) ResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl) NullResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.NullResultVectorCacheImpl) VectorContainer(org.apache.drill.exec.record.VectorContainer) SubOperatorTest(org.apache.drill.test.SubOperatorTest) Test(org.junit.Test) EvfTest(org.apache.drill.categories.EvfTest)

Example 4 with ResultVectorCacheImpl

Use of org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl in project drill by apache.

The class TestConstantColumnLoader, method testFileMetadata.

@Test
public void testFileMetadata() {
    FileMetadata fileInfo = new FileMetadata(new Path("hdfs:///w/x/y/z.csv"), new Path("hdfs:///w"));
    List<ConstantColumnSpec> defns = new ArrayList<>();
    FileMetadataColumnDefn iDefn = new FileMetadataColumnDefn(ScanTestUtils.SUFFIX_COL, ImplicitFileColumns.SUFFIX);
    FileMetadataColumn iCol = new FileMetadataColumn(ScanTestUtils.SUFFIX_COL, iDefn, fileInfo, null, 0);
    defns.add(iCol);
    String partColName = ScanTestUtils.partitionColName(1);
    PartitionColumn pCol = new PartitionColumn(partColName, 1, fileInfo, null, 0);
    defns.add(pCol);
    ResultVectorCacheImpl cache = new ResultVectorCacheImpl(fixture.allocator());
    ConstantColumnLoader staticLoader = new ConstantColumnLoader(cache, defns);
    // Create a batch
    staticLoader.load(2);
    // Verify
    TupleMetadata expectedSchema = new SchemaBuilder()
        .add(ScanTestUtils.SUFFIX_COL, MinorType.VARCHAR)
        .addNullable(partColName, MinorType.VARCHAR)
        .buildSchema();
    SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
        .addRow("csv", "y")
        .addRow("csv", "y")
        .build();
    new RowSetComparison(expected).verifyAndClearAll(fixture.wrap(staticLoader.load(2)));
    staticLoader.close();
}
Also used : Path(org.apache.hadoop.fs.Path) FileMetadataColumnDefn(org.apache.drill.exec.physical.impl.scan.file.FileMetadataColumnDefn) SingleRowSet(org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet) FileMetadata(org.apache.drill.exec.physical.impl.scan.file.FileMetadata) ArrayList(java.util.ArrayList) PartitionColumn(org.apache.drill.exec.physical.impl.scan.file.PartitionColumn) RowSetComparison(org.apache.drill.test.rowSet.RowSetComparison) ConstantColumnSpec(org.apache.drill.exec.physical.impl.scan.project.ConstantColumnLoader.ConstantColumnSpec) TupleMetadata(org.apache.drill.exec.record.metadata.TupleMetadata) SchemaBuilder(org.apache.drill.exec.record.metadata.SchemaBuilder) FileMetadataColumn(org.apache.drill.exec.physical.impl.scan.file.FileMetadataColumn) ResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl) SubOperatorTest(org.apache.drill.test.SubOperatorTest) Test(org.junit.Test)
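The expected values "csv" and "y" in the test above come from decomposing the file path against the scan root. A rough sketch of that derivation, using plain string operations (the method names are illustrative, not Drill's FileMetadata API):

```java
// Illustrative sketch (not Drill API): deriving the file-name suffix and
// partition directories from a file path and the scan root directory.
public class FilePathMetadataSketch {

    // File-name suffix: the text after the last '.', or "" if none.
    public static String suffix(String filePath) {
        String name = filePath.substring(filePath.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    // Partition directories: path components between the scan root and the
    // file name. For root "hdfs:///w" and file "hdfs:///w/x/y/z.csv" these
    // are ["x", "y"], so partition level 1 resolves to "y".
    public static String[] partitionDirs(String filePath, String root) {
        String rel = filePath.substring(root.length());
        if (rel.startsWith("/")) {
            rel = rel.substring(1);
        }
        String[] parts = rel.split("/");
        // Drop the trailing file name; what remains are the partition dirs.
        String[] dirs = new String[parts.length - 1];
        System.arraycopy(parts, 0, dirs, 0, dirs.length);
        return dirs;
    }
}
```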

Example 5 with ResultVectorCacheImpl

Use of org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl in project drill by apache.

The class TestNullColumnLoader, method testCachedTypesAllowRequired.

/**
 * Suppose, as in the previous test, that one of the missing columns
 * is a required column. The null-column mechanism can create the
 * "null" column as a required column, then fill it with empty values
 * (zero or ""), if the scan operator deems that helpful.
 */
@Test
public void testCachedTypesAllowRequired() {
    final List<ResolvedNullColumn> defns = new ArrayList<>();
    defns.add(makeNullCol("req"));
    defns.add(makeNullCol("opt"));
    defns.add(makeNullCol("rep"));
    defns.add(makeNullCol("unk"));
    // Populate the cache with a column of each mode.
    final ResultVectorCacheImpl cache = new ResultVectorCacheImpl(fixture.allocator());
    cache.vectorFor(SchemaBuilder.columnSchema("req", MinorType.FLOAT8, DataMode.REQUIRED));
    final ValueVector opt = cache.vectorFor(SchemaBuilder.columnSchema("opt", MinorType.FLOAT8, DataMode.OPTIONAL));
    final ValueVector rep = cache.vectorFor(SchemaBuilder.columnSchema("rep", MinorType.FLOAT8, DataMode.REPEATED));
    // Use nullable Varchar for unknown null columns.
    final MajorType nullType = Types.optional(MinorType.VARCHAR);
    final NullColumnLoader staticLoader = new NullColumnLoader(cache, defns, nullType, true);
    // Create a batch
    final VectorContainer output = staticLoader.load(2);
    // Verify vectors are reused
    assertSame(opt, output.getValueVector(1).getValueVector());
    assertSame(rep, output.getValueVector(2).getValueVector());
    // Verify values and types
    final TupleMetadata expectedSchema = new SchemaBuilder()
        .add("req", MinorType.FLOAT8)
        .addNullable("opt", MinorType.FLOAT8)
        .addArray("rep", MinorType.FLOAT8)
        .addNullable("unk", MinorType.VARCHAR)
        .buildSchema();
    final SingleRowSet expected = fixture.rowSetBuilder(expectedSchema)
        .addRow(0.0, null, new int[] {}, null)
        .addRow(0.0, null, new int[] {}, null)
        .build();
    RowSetUtilities.verify(expected, fixture.wrap(output));
    staticLoader.close();
}
Also used : ValueVector(org.apache.drill.exec.vector.ValueVector) SingleRowSet(org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet) MajorType(org.apache.drill.common.types.TypeProtos.MajorType) TupleMetadata(org.apache.drill.exec.record.metadata.TupleMetadata) ArrayList(java.util.ArrayList) SchemaBuilder(org.apache.drill.exec.record.metadata.SchemaBuilder) ResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl) NullResultVectorCacheImpl(org.apache.drill.exec.physical.resultSet.impl.NullResultVectorCacheImpl) VectorContainer(org.apache.drill.exec.record.VectorContainer) SubOperatorTest(org.apache.drill.test.SubOperatorTest) Test(org.junit.Test)
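The allowRequired behavior can be sketched with Java arrays: a required column has no null slots, so a missing required column is filled with the type's default value, while a nullable column represents "no value" as null. The class and method names below are illustrative only.

```java
// Illustrative sketch (not Drill API) of the allowRequired = true case:
// fill a missing REQUIRED column with defaults rather than nulls.
public class RequiredFillSketch {

    // Required double column: no null slots exist, so every row gets 0.0.
    public static double[] fillRequiredDouble(int rowCount) {
        // Java zero-initializes primitive arrays, mirroring "fill with zero".
        return new double[rowCount];
    }

    // Nullable double column: "no value" is represented as null instead.
    public static Double[] fillNullableDouble(int rowCount) {
        // Object arrays initialize to null, mirroring "fill with nulls".
        return new Double[rowCount];
    }
}
```

This matches the expected rows in the test above: the required "req" column holds 0.0 while the nullable columns hold null.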

Aggregations

ResultVectorCacheImpl (org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl): 5 uses
SingleRowSet (org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet): 5 uses
SchemaBuilder (org.apache.drill.exec.record.metadata.SchemaBuilder): 5 uses
TupleMetadata (org.apache.drill.exec.record.metadata.TupleMetadata): 5 uses
SubOperatorTest (org.apache.drill.test.SubOperatorTest): 5 uses
Test (org.junit.Test): 5 uses
ArrayList (java.util.ArrayList): 4 uses
MajorType (org.apache.drill.common.types.TypeProtos.MajorType): 4 uses
NullResultVectorCacheImpl (org.apache.drill.exec.physical.resultSet.impl.NullResultVectorCacheImpl): 3 uses
VectorContainer (org.apache.drill.exec.record.VectorContainer): 3 uses
ValueVector (org.apache.drill.exec.vector.ValueVector): 3 uses
ConstantColumnSpec (org.apache.drill.exec.physical.impl.scan.project.ConstantColumnLoader.ConstantColumnSpec): 2 uses
RowSetComparison (org.apache.drill.test.rowSet.RowSetComparison): 2 uses
EvfTest (org.apache.drill.categories.EvfTest): 1 use
FileMetadata (org.apache.drill.exec.physical.impl.scan.file.FileMetadata): 1 use
FileMetadataColumn (org.apache.drill.exec.physical.impl.scan.file.FileMetadataColumn): 1 use
FileMetadataColumnDefn (org.apache.drill.exec.physical.impl.scan.file.FileMetadataColumnDefn): 1 use
PartitionColumn (org.apache.drill.exec.physical.impl.scan.file.PartitionColumn): 1 use
Path (org.apache.hadoop.fs.Path): 1 use