
Example 1 with DistributionSizeError

Use of io.cdap.cdap.etl.api.join.error.DistributionSizeError in project cdap by caskdata.

From the class JoinDistribution, method validate.

public Collection<JoinError> validate(List<JoinStage> stages) {
    List<JoinError> errors = new ArrayList<>();
    // Distribution supports at most two input stages.
    if (stages.size() > 2) {
        errors.add(new JoinError("Only two stages can be joined if a distribution factor is specified"));
    }
    // The skewed stage must be named explicitly.
    if (skewedStageName == null) {
        errors.add(new DistributionStageError("Distribution requires skewed stage name to be defined"));
    }
    // The distribution factor must be a positive integer.
    if (distributionFactor < 1) {
        errors.add(new DistributionSizeError("Distribution size must be greater than 0"));
    }
    // Look up the skewed stage by name; null means no stage in the list matches.
    JoinStage leftStage = stages.stream()
        .filter(s -> s.getStageName().equals(skewedStageName))
        .findFirst()
        .orElse(null);
    if (leftStage == null) {
        errors.add(new DistributionStageError(
            String.format("Skewed stage '%s' does not match any of the specified stages", skewedStageName)));
    } else if (!leftStage.isRequired()) {
        // The skewed stage must be a required side of the join (inner or left outer).
        errors.add(new DistributionStageError(
            String.format("Distribution only supports inner or left outer joins, the skewed stage '%s' must be required",
                          skewedStageName)));
    }
    // Distribution and broadcast are mutually exclusive.
    if (stages.stream().anyMatch(JoinStage::isBroadcast)) {
        errors.add(new BroadcastError("Distribution cannot be used if either stage will be broadcast"));
    }
    return errors;
}
Also used: JoinError (io.cdap.cdap.etl.api.join.error.JoinError), ArrayList (java.util.ArrayList), BroadcastError (io.cdap.cdap.etl.api.join.error.BroadcastError), DistributionStageError (io.cdap.cdap.etl.api.join.error.DistributionStageError), DistributionSizeError (io.cdap.cdap.etl.api.join.error.DistributionSizeError)
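
The checks in validate can be tried without the CDAP libraries. The following is a minimal sketch under stated assumptions, not the project's code: JoinDistributionSketch and Stage are hypothetical stand-ins for JoinDistribution and JoinStage, and errors are returned as plain message strings instead of JoinError subclasses. Only the order and conditions of the checks are taken from the method above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinDistributionSketch {

    // Hypothetical stand-in for JoinStage: only the properties that validate reads.
    static final class Stage {
        final String name;
        final boolean required;
        final boolean broadcast;

        Stage(String name, boolean required, boolean broadcast) {
            this.name = name;
            this.required = required;
            this.broadcast = broadcast;
        }
    }

    // Mirrors the checks of validate, collecting messages instead of JoinError objects.
    static List<String> validate(String skewedStageName, int distributionFactor, List<Stage> stages) {
        List<String> errors = new ArrayList<>();
        if (stages.size() > 2) {
            errors.add("Only two stages can be joined if a distribution factor is specified");
        }
        if (skewedStageName == null) {
            errors.add("Distribution requires skewed stage name to be defined");
        }
        if (distributionFactor < 1) {
            errors.add("Distribution size must be greater than 0");
        }
        // equals(null) is false for every stage, so a null name also yields "does not match".
        Stage skewed = stages.stream()
            .filter(s -> s.name.equals(skewedStageName))
            .findFirst()
            .orElse(null);
        if (skewed == null) {
            errors.add(String.format("Skewed stage '%s' does not match any of the specified stages",
                                     skewedStageName));
        } else if (!skewed.required) {
            errors.add(String.format("Distribution only supports inner or left outer joins, "
                                     + "the skewed stage '%s' must be required", skewedStageName));
        }
        if (stages.stream().anyMatch(s -> s.broadcast)) {
            errors.add("Distribution cannot be used if either stage will be broadcast");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Illustrative stage names; "purchases" is required, "users" is optional.
        List<Stage> stages = Arrays.asList(
            new Stage("purchases", true, false),
            new Stage("users", false, false));

        // Valid configuration: no errors are collected.
        System.out.println(validate("purchases", 4, stages));
        // Factor below 1 and a non-required skewed stage: two errors are collected.
        validate("users", 0, stages).forEach(System.out::println);
    }
}
```

Note that the checks are independent and accumulate: in this sketch, a null skewed stage name produces both the "requires skewed stage name" message and the "does not match" message, because the name lookup still runs after the null check.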

Example 2 with DistributionSizeError

Use of io.cdap.cdap.etl.api.join.error.DistributionSizeError in project cdap by cdapio.

From the class JoinDistribution, method validate.

The method body and the classes it uses are identical to Example 1.

Aggregations

BroadcastError (io.cdap.cdap.etl.api.join.error.BroadcastError): 2
DistributionSizeError (io.cdap.cdap.etl.api.join.error.DistributionSizeError): 2
DistributionStageError (io.cdap.cdap.etl.api.join.error.DistributionStageError): 2
JoinError (io.cdap.cdap.etl.api.join.error.JoinError): 2
ArrayList (java.util.ArrayList): 2