Example 1 with KuduScanToken

use of org.apache.kudu.client.KuduScanToken in project presto by prestodb.

the class KuduClientSession method buildKuduSplits.

public List<KuduSplit> buildKuduSplits(KuduTableLayoutHandle layoutHandle) {
    reTryKerberos(kerberosAuthEnabled);
    KuduTableHandle tableHandle = layoutHandle.getTableHandle();
    KuduTable table = tableHandle.getTable(this);
    final int primaryKeyColumnCount = table.getSchema().getPrimaryKeyColumnCount();
    KuduScanToken.KuduScanTokenBuilder builder = client.newScanTokenBuilder(table);
    TupleDomain<ColumnHandle> constraintSummary = layoutHandle.getConstraintSummary();
    if (!addConstraintPredicates(table, builder, constraintSummary)) {
        return ImmutableList.of();
    }
    Optional<Set<ColumnHandle>> desiredColumns = layoutHandle.getDesiredColumns();
    if (desiredColumns.isPresent()) {
        if (desiredColumns.get().contains(KuduColumnHandle.ROW_ID_HANDLE)) {
            List<Integer> columnIndexes = IntStream.range(0, primaryKeyColumnCount).boxed().collect(Collectors.toList());
            for (ColumnHandle columnHandle : desiredColumns.get()) {
                if (columnHandle instanceof KuduColumnHandle) {
                    KuduColumnHandle k = (KuduColumnHandle) columnHandle;
                    int index = k.getOrdinalPosition();
                    if (index >= primaryKeyColumnCount) {
                        columnIndexes.add(index);
                    }
                }
            }
            builder.setProjectedColumnIndexes(columnIndexes);
        } else {
            List<Integer> columnIndexes = desiredColumns.get().stream().map(handle -> ((KuduColumnHandle) handle).getOrdinalPosition()).collect(toImmutableList());
            builder.setProjectedColumnIndexes(columnIndexes);
        }
    }
    List<KuduScanToken> tokens = builder.build();
    return tokens.stream().map(token -> toKuduSplit(tableHandle, token, primaryKeyColumnCount)).collect(toImmutableList());
}
Also used : EquatableValueSet(com.facebook.presto.common.predicate.EquatableValueSet) PartitionDesign(com.facebook.presto.kudu.properties.PartitionDesign) QUERY_REJECTED(com.facebook.presto.spi.StandardErrorCode.QUERY_REJECTED) DiscreteValues(com.facebook.presto.common.predicate.DiscreteValues) GENERIC_INTERNAL_ERROR(com.facebook.presto.spi.StandardErrorCode.GENERIC_INTERNAL_ERROR) Type(org.apache.kudu.Type) HashPartitionDefinition(com.facebook.presto.kudu.properties.HashPartitionDefinition) KuduScanner(org.apache.kudu.client.KuduScanner) KuduException(org.apache.kudu.client.KuduException) SortedRangeSet(com.facebook.presto.common.predicate.SortedRangeSet) Schema(org.apache.kudu.Schema) KuduTableProperties(com.facebook.presto.kudu.properties.KuduTableProperties) SchemaTableName(com.facebook.presto.spi.SchemaTableName) SchemaNotFoundException(com.facebook.presto.spi.SchemaNotFoundException) Map(java.util.Map) Marker(com.facebook.presto.common.predicate.Marker) ColumnSchema(org.apache.kudu.ColumnSchema) AlterTableOptions(org.apache.kudu.client.AlterTableOptions) PartialRow(org.apache.kudu.client.PartialRow) KuduUtil.reTryKerberos(com.facebook.presto.kudu.KuduUtil.reTryKerberos) ColumnDesign(com.facebook.presto.kudu.properties.ColumnDesign) ImmutableList.toImmutableList(com.google.common.collect.ImmutableList.toImmutableList) Set(java.util.Set) KuduClient(org.apache.kudu.client.KuduClient) Collectors(java.util.stream.Collectors) Range(com.facebook.presto.common.predicate.Range) KuduPredicate(org.apache.kudu.client.KuduPredicate) ColumnTypeAttributes(org.apache.kudu.ColumnTypeAttributes) Objects(java.util.Objects) List(java.util.List) ColumnMetadata(com.facebook.presto.spi.ColumnMetadata) SchemaEmulation(com.facebook.presto.kudu.schema.SchemaEmulation) Optional(java.util.Optional) Ranges(com.facebook.presto.common.predicate.Ranges) IntStream(java.util.stream.IntStream) Logger(com.facebook.airlift.log.Logger) DecimalType(com.facebook.presto.common.type.DecimalType) RangePartitionDefinition(com.facebook.presto.kudu.properties.RangePartitionDefinition) PrestoException(com.facebook.presto.spi.PrestoException) RangePartition(com.facebook.presto.kudu.properties.RangePartition) ArrayList(java.util.ArrayList) ImmutableList(com.google.common.collect.ImmutableList) ConnectorTableMetadata(com.facebook.presto.spi.ConnectorTableMetadata) CreateTableOptions(org.apache.kudu.client.CreateTableOptions) IOException(java.io.IOException) KuduTable(org.apache.kudu.client.KuduTable) Domain(com.facebook.presto.common.predicate.Domain) TupleDomain(com.facebook.presto.common.predicate.TupleDomain) KuduScanToken(org.apache.kudu.client.KuduScanToken) TableNotFoundException(com.facebook.presto.spi.TableNotFoundException) ColumnHandle(com.facebook.presto.spi.ColumnHandle) KuduSession(org.apache.kudu.client.KuduSession) ValueSet(com.facebook.presto.common.predicate.ValueSet)
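
The row-id branch of buildKuduSplits builds its projection by listing every primary-key column index first and then appending only the requested columns that lie past the key. The sketch below isolates that index arithmetic in plain Java, with no Kudu or Presto dependency. The class name ProjectionIndexSketch and the method projectWithRowId are hypothetical names introduced for illustration, and the input is assumed to be a list of column ordinals already extracted from the column handles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Presto: mirrors the index logic of the row-id branch.
final class ProjectionIndexSketch {

    // Returns the key column indexes 0..primaryKeyColumnCount-1, followed by every
    // desired ordinal that is not a key column, in the order the ordinals were given.
    static List<Integer> projectWithRowId(int primaryKeyColumnCount, List<Integer> desiredOrdinals) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < primaryKeyColumnCount; i++) {
            indexes.add(i);
        }
        for (int ordinal : desiredOrdinals) {
            // Key columns are already present, so only non-key ordinals are appended.
            if (ordinal >= primaryKeyColumnCount) {
                indexes.add(ordinal);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Two key columns; the query asks for columns 4, 1 and 3.
        System.out.println(projectWithRowId(2, Arrays.asList(4, 1, 3))); // prints [0, 1, 4, 3]
    }
}
```

Column 1 is requested but is not added a second time, because it is already covered by the key prefix.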

Example 2 with KuduScanToken

use of org.apache.kudu.client.KuduScanToken in project beam by apache.

the class KuduServiceImpl method createTabletScanners.

@Override
public List<byte[]> createTabletScanners(KuduIO.Read spec) throws KuduException {
    try (KuduClient client = getKuduClient(spec.getMasterAddresses())) {
        KuduTable table = client.openTable(spec.getTable());
        KuduScanToken.KuduScanTokenBuilder builder = client.newScanTokenBuilder(table);
        configureBuilder(spec, table.getSchema(), builder);
        List<KuduScanToken> tokens = builder.build();
        return tokens.stream().map(t -> uncheckCall(t::serialize)).collect(Collectors.toList());
    }
}
Also used : RowError(org.apache.kudu.client.RowError) Preconditions.checkNotNull(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull) LoggerFactory(org.slf4j.LoggerFactory) RowResultIterator(org.apache.kudu.client.RowResultIterator) Callable(java.util.concurrent.Callable) KuduScanner(org.apache.kudu.client.KuduScanner) KuduException(org.apache.kudu.client.KuduException) Schema(org.apache.kudu.Schema) Common(org.apache.kudu.Common) NoSuchElementException(java.util.NoSuchElementException) RowResult(org.apache.kudu.client.RowResult) Logger(org.slf4j.Logger) IOException(java.io.IOException) AbstractKuduScannerBuilder(org.apache.kudu.client.AbstractKuduScannerBuilder) KuduClient(org.apache.kudu.client.KuduClient) Collectors(java.util.stream.Collectors) KuduTable(org.apache.kudu.client.KuduTable) KuduPredicate(org.apache.kudu.client.KuduPredicate) SessionConfiguration(org.apache.kudu.client.SessionConfiguration) List(java.util.List) KuduScanToken(org.apache.kudu.client.KuduScanToken) BoundedSource(org.apache.beam.sdk.io.BoundedSource) AsyncKuduClient(org.apache.kudu.client.AsyncKuduClient) KuduSession(org.apache.kudu.client.KuduSession) Preconditions.checkState(org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState)
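
createTabletScanners calls uncheckCall(t::serialize) inside a stream, but the helper itself is not shown in the snippet. The following is a minimal sketch of what such a helper might look like, assuming it simply rethrows checked exceptions as unchecked ones so that the call fits inside a lambda. The body of uncheckCall here is an assumption, not Beam's actual implementation, and FakeToken is a hypothetical stand-in for a token whose serialize() declares a checked exception.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.stream.Collectors;

final class UncheckSketch {

    // Assumed behaviour: run the callable and wrap any checked exception in a RuntimeException.
    static <T> T uncheckCall(Callable<T> callable) {
        try {
            return callable.call();
        } catch (RuntimeException e) {
            throw e; // already unchecked, pass through unchanged
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical stand-in for a scan token; serialize() declares a checked exception.
    static final class FakeToken {
        private final String id;

        FakeToken(String id) {
            this.id = id;
        }

        byte[] serialize() throws IOException {
            if (id.isEmpty()) {
                throw new IOException("nothing to serialize");
            }
            return id.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Same shape as the stream in createTabletScanners: one byte[] per token.
    static List<byte[]> serializeAll(List<FakeToken> tokens) {
        return tokens.stream().map(t -> uncheckCall(t::serialize)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<byte[]> bytes = serializeAll(Arrays.asList(new FakeToken("ab"), new FakeToken("c")));
        System.out.println(bytes.size()); // prints 2
    }
}
```

Without a wrapper of this kind, the method reference could not be used directly in map, because java.util.function.Function does not permit checked exceptions.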

Example 3 with KuduScanToken

use of org.apache.kudu.client.KuduScanToken in project apex-malhar by apache.

the class AbstractKuduInputPartitioner method getKuduScanTokensForSelectAllColumns.

/**
 * Builds the set of scan tokens for a full-table scan, i.e. the equivalent of SELECT * FROM TABLE.
 * The list is used to compute the partition pie assignments for all planned operator partitions: each
 * operator gets a slice of the pie as if all columns were selected. When a query is later processed,
 * that query is used to generate the scan tokens applicable to it. Because the partition pie represents
 * the entire data set, the scan assignments for the current query will be a subset of it.
 * @return The list of scan tokens as if the entire table were being scanned.
 * @throws Exception if the connection to the Kudu cluster cannot be closed.
 */
public List<KuduScanToken> getKuduScanTokensForSelectAllColumns() throws Exception {
    // The partition strategy is based on a SELECT * rather than the current query, because we do not
    // want to optimize for just the current query. This prevents rapid throttling of operator
    // instances when the scan patterns are erratic. On the other hand, it might result in underutilized
    // operator resources in the DAG, but the assignments will at least be consistent.
    ApexKuduConnection apexKuduConnection = prototypeKuduInputOperator.getApexKuduConnectionInfo().build();
    try {
        KuduClient clientHandle = apexKuduConnection.getKuduClient();
        KuduTable table = apexKuduConnection.getKuduTable();
        KuduScanToken.KuduScanTokenBuilder builder = clientHandle.newScanTokenBuilder(table);
        // Project every column so that the tokens describe a full-table scan.
        List<String> allColumns = new ArrayList<>();
        List<ColumnSchema> columnList = table.getSchema().getColumns();
        for (ColumnSchema column : columnList) {
            allColumns.add(column.getName());
        }
        builder.setProjectedColumnNames(allColumns);
        LOG.debug("Building the partition pie assignments for the input operator");
        return builder.build();
    } finally {
        // Close the connection even if building the tokens fails.
        apexKuduConnection.close();
    }
}
Also used : KuduScanToken(org.apache.kudu.client.KuduScanToken) ApexKuduConnection(org.apache.apex.malhar.kudu.ApexKuduConnection) KuduClient(org.apache.kudu.client.KuduClient) ArrayList(java.util.ArrayList) KuduTable(org.apache.kudu.client.KuduTable) ColumnSchema(org.apache.kudu.ColumnSchema)
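
The Javadoc above describes a "partition pie": the full-table tokens fix which operator owns which part of the data, and the tokens of a later query land on a subset of those owners. The sketch below illustrates that idea in plain Java. It assumes a simple round-robin assignment of tablets to operators, which is an assumption made for illustration and not necessarily the strategy used by apex-malhar. The names PartitionPieSketch, buildPie and tabletsFor are hypothetical, and tablets are represented by plain string ids instead of KuduScanToken objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartitionPieSketch {

    // Assigns every tablet of the full-table scan to an operator index, round-robin (assumed strategy).
    static Map<String, Integer> buildPie(List<String> allTabletIds, int operatorCount) {
        Map<String, Integer> pie = new LinkedHashMap<>();
        for (int i = 0; i < allTabletIds.size(); i++) {
            pie.put(allTabletIds.get(i), i % operatorCount);
        }
        return pie;
    }

    // Returns the tablets of one query that fall into the slice owned by the given operator.
    static List<String> tabletsFor(Map<String, Integer> pie, List<String> queryTabletIds, int operator) {
        List<String> mine = new ArrayList<>();
        for (String tabletId : queryTabletIds) {
            Integer owner = pie.get(tabletId);
            if (owner != null && owner == operator) {
                mine.add(tabletId);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        Map<String, Integer> pie = buildPie(Arrays.asList("t0", "t1", "t2", "t3"), 2);
        // A query whose predicates prune tablet t0 still maps onto the same owners.
        System.out.println(tabletsFor(pie, Arrays.asList("t1", "t2", "t3"), 1)); // prints [t1, t3]
    }
}
```

Because the pie is computed once from the full-table scan, an operator's slice stays stable from query to query, at the cost of some operators possibly receiving no work for a selective query.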

Example 4 with KuduScanToken

use of org.apache.kudu.client.KuduScanToken in project apex-malhar by apache.

the class KuduInputOperatorCommons method truncateTable.

public void truncateTable() throws Exception {
    AbstractKuduPartitionScanner<UnitTestTablePojo, InputOperatorControlTuple> scannerForDeletingRows = unitTestStepwiseScanInputOperator.getScanner();
    List<KuduScanToken> scansForAllTablets = unitTestStepwiseScanInputOperator.getPartitioner().getKuduScanTokensForSelectAllColumns();
    ApexKuduConnection aCurrentConnection = scannerForDeletingRows.getConnectionPoolForThreads().get(0);
    KuduSession aSessionForDeletes = aCurrentConnection.getKuduClient().newSession();
    KuduTable currentTable = aCurrentConnection.getKuduTable();
    for (KuduScanToken aTabletScanToken : scansForAllTablets) {
        KuduScanner aScanner = aTabletScanToken.intoScanner(aCurrentConnection.getKuduClient());
        while (aScanner.hasMoreRows()) {
            RowResultIterator itrForRows = aScanner.nextRows();
            while (itrForRows.hasNext()) {
                RowResult aRow = itrForRows.next();
                int intRowKey = aRow.getInt("introwkey");
                String stringRowKey = aRow.getString("stringrowkey");
                long timestampRowKey = aRow.getLong("timestamprowkey");
                Delete aDeleteOp = currentTable.newDelete();
                aDeleteOp.getRow().addInt("introwkey", intRowKey);
                aDeleteOp.getRow().addString("stringrowkey", stringRowKey);
                aDeleteOp.getRow().addLong("timestamprowkey", timestampRowKey);
                aSessionForDeletes.apply(aDeleteOp);
            }
        }
    }
    aSessionForDeletes.close();
    // Sleep to allow for scans to complete
    Thread.sleep(2000);
}
Also used : Delete(org.apache.kudu.client.Delete) KuduScanToken(org.apache.kudu.client.KuduScanToken) KuduSession(org.apache.kudu.client.KuduSession) KuduTable(org.apache.kudu.client.KuduTable) RowResultIterator(org.apache.kudu.client.RowResultIterator) RowResult(org.apache.kudu.client.RowResult) KuduScanner(org.apache.kudu.client.KuduScanner)

Example 5 with KuduScanToken

use of org.apache.kudu.client.KuduScanToken in project hive by apache.

the class KuduInputFormat method computeSplits.

private List<KuduInputSplit> computeSplits(Configuration conf) throws IOException {
    try (KuduClient client = KuduHiveUtils.getKuduClient(conf)) {
        // Hive depends on FileSplits so we get the dummy Path for the Splits.
        Job job = Job.getInstance(conf);
        JobContext jobContext = ShimLoader.getHadoopShims().newJobContext(job);
        Path[] paths = FileInputFormat.getInputPaths(jobContext);
        Path dummyPath = paths[0];
        String tableName = conf.get(KUDU_TABLE_NAME_KEY);
        if (StringUtils.isEmpty(tableName)) {
            throw new IllegalArgumentException(KUDU_TABLE_NAME_KEY + " is not set.");
        }
        if (!client.tableExists(tableName)) {
            throw new IllegalArgumentException("Kudu table does not exist: " + tableName);
        }
        KuduTable table = client.openTable(tableName);
        List<KuduPredicate> predicates = KuduPredicateHandler.getPredicates(conf, table.getSchema());
        KuduScanToken.KuduScanTokenBuilder tokenBuilder = client.newScanTokenBuilder(table).setProjectedColumnNames(getProjectedColumns(conf));
        for (KuduPredicate predicate : predicates) {
            tokenBuilder.addPredicate(predicate);
        }
        List<KuduScanToken> tokens = tokenBuilder.build();
        List<KuduInputSplit> splits = new ArrayList<>(tokens.size());
        for (KuduScanToken token : tokens) {
            List<String> locations = new ArrayList<>(token.getTablet().getReplicas().size());
            for (LocatedTablet.Replica replica : token.getTablet().getReplicas()) {
                locations.add(replica.getRpcHost());
            }
            splits.add(new KuduInputSplit(token, dummyPath, locations.toArray(new String[0])));
        }
        return splits;
    }
}
Also used : Path(org.apache.hadoop.fs.Path) KuduScanToken(org.apache.kudu.client.KuduScanToken) ArrayList(java.util.ArrayList) KuduTable(org.apache.kudu.client.KuduTable) LocatedTablet(org.apache.kudu.client.LocatedTablet) KuduPredicate(org.apache.kudu.client.KuduPredicate) KuduClient(org.apache.kudu.client.KuduClient) JobContext(org.apache.hadoop.mapreduce.JobContext) Job(org.apache.hadoop.mapreduce.Job)
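
computeSplits turns each token's tablet replicas into a String[] of host names, which it passes to the split as its locations. The sketch below shows just that conversion in plain Java, under the assumption that a replica can be reduced to its RPC host. Replica here is a hypothetical stand-in for LocatedTablet.Replica, and SplitLocationSketch and locationsOf are names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitLocationSketch {

    // Hypothetical stand-in for LocatedTablet.Replica: only the RPC host is modelled.
    static final class Replica {
        private final String rpcHost;

        Replica(String rpcHost) {
            this.rpcHost = rpcHost;
        }

        String getRpcHost() {
            return rpcHost;
        }
    }

    // Collects one host name per replica, in replica order, as a String array.
    static String[] locationsOf(List<Replica> replicas) {
        List<String> locations = new ArrayList<>(replicas.size());
        for (Replica replica : replicas) {
            locations.add(replica.getRpcHost());
        }
        return locations.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] hosts = locationsOf(Arrays.asList(new Replica("host-a"), new Replica("host-b")));
        System.out.println(Arrays.toString(hosts)); // prints [host-a, host-b]
    }
}
```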

Aggregations

KuduScanToken (org.apache.kudu.client.KuduScanToken): 8
ArrayList (java.util.ArrayList): 6
KuduTable (org.apache.kudu.client.KuduTable): 6
KuduClient (org.apache.kudu.client.KuduClient): 5
KuduPredicate (org.apache.kudu.client.KuduPredicate): 4
KuduSession (org.apache.kudu.client.KuduSession): 4
IOException (java.io.IOException): 3
KuduScanner (org.apache.kudu.client.KuduScanner): 3
List (java.util.List): 2
Collectors (java.util.stream.Collectors): 2
ColumnSchema (org.apache.kudu.ColumnSchema): 2
Schema (org.apache.kudu.Schema): 2
KuduException (org.apache.kudu.client.KuduException): 2
RowResult (org.apache.kudu.client.RowResult): 2
RowResultIterator (org.apache.kudu.client.RowResultIterator): 2
Logger (com.facebook.airlift.log.Logger): 1
DiscreteValues (com.facebook.presto.common.predicate.DiscreteValues): 1
Domain (com.facebook.presto.common.predicate.Domain): 1
EquatableValueSet (com.facebook.presto.common.predicate.EquatableValueSet): 1
Marker (com.facebook.presto.common.predicate.Marker): 1