Example 16 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class ExternalDatasets method makeTrackable.

/**
   * If the output is an external sink, an external dataset is created for tracking purposes and returned.
   * If the output is a regular dataset, it is already trackable, so the same output is returned.
   *
   * @param admin {@link Admin} used to create external dataset
   * @param output output to be tracked
   * @return an external dataset if output is an external sink, otherwise the same output is returned
   */
public static Output makeTrackable(Admin admin, Output output) {
    // If output is not an external sink, return the same output as it can be tracked by itself.
    if (!(output instanceof Output.OutputFormatProviderOutput)) {
        return output;
    }
    // Output wraps an OutputFormatProvider; inspect the provider to decide whether an external dataset is needed.
    String outputName = output.getName();
    OutputFormatProvider outputFormatProvider = ((Output.OutputFormatProviderOutput) output).getOutputFormatProvider();
    Map<String, String> outputFormatConfiguration = outputFormatProvider.getOutputFormatConfiguration();
    // A provider that is itself a Dataset can be tracked without creating an external dataset.
    if (outputFormatProvider instanceof Dataset) {
        return output;
    }
    // Output is an external sink, create an external dataset so that it can be tracked.
    try {
        // Create an external dataset for the output format for lineage tracking
        Map<String, String> arguments = new HashMap<>();
        arguments.put("output.format.class", outputFormatProvider.getOutputFormatClassName());
        arguments.putAll(outputFormatConfiguration);
        if (!admin.datasetExists(outputName)) {
            // Note: the dataset properties are the same as the arguments, because both are mixed together
            // in a single configuration object and cannot be identified separately (CDAP-5674).
            // For the same reason, the properties of the created external dataset will contain runtime arguments.
            admin.createDataset(outputName, EXTERNAL_DATASET_TYPE, DatasetProperties.of(arguments));
        } else {
            // Check if the external dataset name clashes with an existing CDAP Dataset
            String datasetType = admin.getDatasetType(outputName);
            if (!EXTERNAL_DATASET_TYPE.equals(datasetType)) {
                throw new IllegalArgumentException("An external sink cannot have the same name as an existing CDAP Dataset instance " + outputName);
            }
        }
        return Output.ofDataset(outputName, Collections.unmodifiableMap(arguments)).alias(output.getAlias());
    } catch (DatasetManagementException e) {
        throw Throwables.propagate(e);
    }
}
Also used : DatasetManagementException(co.cask.cdap.api.dataset.DatasetManagementException) HashMap(java.util.HashMap) Dataset(co.cask.cdap.api.dataset.Dataset) OutputFormatProvider(co.cask.cdap.api.data.batch.OutputFormatProvider)
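
The create-or-verify step in makeTrackable can be illustrated without CDAP on the classpath. The block below is a minimal sketch, not the CDAP implementation: FakeAdmin, ensureExternalDataset, and the value of EXTERNAL_TYPE are hypothetical stand-ins that mimic only the three Admin calls used above (datasetExists, getDatasetType, createDataset).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: every type here is a hypothetical stand-in, not part of the CDAP API.
final class TrackableSketch {

    // Assumed type name for external datasets; the real constant lives in CDAP.
    static final String EXTERNAL_TYPE = "externalDataset";

    /** In-memory registry mapping a dataset name to its type. */
    static final class FakeAdmin {
        private final Map<String, String> types = new HashMap<>();

        boolean datasetExists(String name) {
            return types.containsKey(name);
        }

        String getDatasetType(String name) {
            return types.get(name);
        }

        void createDataset(String name, String type) {
            types.put(name, type);
        }
    }

    /**
     * Creates an external dataset named after the sink if none exists.
     * If a dataset with that name exists but has a different type, the name
     * clashes with a regular dataset and the call is rejected.
     */
    static String ensureExternalDataset(FakeAdmin admin, String sinkName) {
        if (!admin.datasetExists(sinkName)) {
            admin.createDataset(sinkName, EXTERNAL_TYPE);
        } else if (!EXTERNAL_TYPE.equals(admin.getDatasetType(sinkName))) {
            throw new IllegalArgumentException(
                "An external sink cannot have the same name as an existing dataset: " + sinkName);
        }
        return sinkName;
    }

    public static void main(String[] args) {
        FakeAdmin admin = new FakeAdmin();
        // First call registers the sink; a second call with the same name is a no-op.
        ensureExternalDataset(admin, "s3Sink");
        ensureExternalDataset(admin, "s3Sink");
        System.out.println("type of s3Sink: " + admin.getDatasetType("s3Sink"));
    }
}
```

Calling ensureExternalDataset twice with the same sink name is harmless, while reusing the name of a dataset of another type fails fast, which mirrors the name-clash check in the method above.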

Example 17 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class ExploreTableManager method generateDisableStatement.

private String generateDisableStatement(DatasetId datasetId, DatasetSpecification spec) throws ExploreException {
    String tableName = tableNaming.getTableName(datasetId, spec.getProperties());
    String databaseName = ExploreProperties.getExploreDatabaseName(spec.getProperties());
    // If table does not exist, nothing to be done
    try {
        exploreService.getTableInfo(datasetId.getNamespace(), databaseName, tableName);
    } catch (TableNotFoundException e) {
        // Ignore exception, since this means table was not found.
        return null;
    }
    Dataset dataset = null;
    try (SystemDatasetInstantiator datasetInstantiator = datasetInstantiatorFactory.create()) {
        dataset = datasetInstantiator.getDataset(datasetId);
        if (dataset instanceof FileSet || dataset instanceof PartitionedFileSet) {
            // Do not drop the explore table if the dataset is reusing an existing table
            if (FileSetProperties.isUseExisting(spec.getProperties())) {
                return null;
            }
        }
        return generateDeleteStatement(dataset, databaseName, tableName);
    } catch (IOException e) {
        LOG.error("Exception creating dataset classLoaderProvider for dataset {}.", datasetId, e);
        throw new ExploreException("Exception instantiating dataset " + datasetId);
    } finally {
        Closeables.closeQuietly(dataset);
    }
}
Also used : FileSet(co.cask.cdap.api.dataset.lib.FileSet) PartitionedFileSet(co.cask.cdap.api.dataset.lib.PartitionedFileSet) Dataset(co.cask.cdap.api.dataset.Dataset) SystemDatasetInstantiator(co.cask.cdap.data.dataset.SystemDatasetInstantiator) IOException(java.io.IOException)

Example 18 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class ExploreExecutorHttpHandler method doAddPartition.

private void doAddPartition(HttpRequest request, HttpResponder responder, DatasetId datasetId) {
    Dataset dataset;
    try (SystemDatasetInstantiator datasetInstantiator = datasetInstantiatorFactory.create()) {
        dataset = datasetInstantiator.getDataset(datasetId);
        if (dataset == null) {
            responder.sendString(HttpResponseStatus.NOT_FOUND, "Cannot load dataset " + datasetId);
            return;
        }
    } catch (IOException e) {
        String classNotFoundMessage = isClassNotFoundException(e);
        if (classNotFoundMessage != null) {
            JsonObject json = new JsonObject();
            json.addProperty("handle", QueryHandle.NO_OP.getHandle());
            responder.sendJson(HttpResponseStatus.OK, json);
            return;
        }
        LOG.error("Exception instantiating dataset {}.", datasetId, e);
        responder.sendString(HttpResponseStatus.INTERNAL_SERVER_ERROR, "Exception instantiating dataset " + datasetId.getDataset());
        return;
    }
    try {
        if (!(dataset instanceof PartitionedFileSet)) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "not a partitioned dataset.");
            return;
        }
        Partitioning partitioning = ((PartitionedFileSet) dataset).getPartitioning();
        Reader reader = new InputStreamReader(new ChannelBufferInputStream(request.getContent()));
        Map<String, String> properties = GSON.fromJson(reader, new TypeToken<Map<String, String>>() {
        }.getType());
        String fsPath = properties.get("path");
        if (fsPath == null) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "path was not specified.");
            return;
        }
        PartitionKey partitionKey;
        try {
            partitionKey = PartitionedFileSetArguments.getOutputPartitionKey(properties, partitioning);
        } catch (Exception e) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "invalid partition key: " + e.getMessage());
            return;
        }
        if (partitionKey == null) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "no partition key was given.");
            return;
        }
        QueryHandle handle = exploreTableManager.addPartition(datasetId, properties, partitionKey, fsPath);
        JsonObject json = new JsonObject();
        json.addProperty("handle", handle.getHandle());
        responder.sendJson(HttpResponseStatus.OK, json);
    } catch (Throwable e) {
        LOG.error("Got exception:", e);
        responder.sendString(HttpResponseStatus.INTERNAL_SERVER_ERROR, e.getMessage());
    }
}
Also used : InputStreamReader(java.io.InputStreamReader) Dataset(co.cask.cdap.api.dataset.Dataset) JsonObject(com.google.gson.JsonObject) Reader(java.io.Reader) PartitionedFileSet(co.cask.cdap.api.dataset.lib.PartitionedFileSet) IOException(java.io.IOException) BadRequestException(co.cask.cdap.common.BadRequestException) ExploreException(co.cask.cdap.explore.service.ExploreException) SQLException(java.sql.SQLException) DatasetManagementException(co.cask.cdap.api.dataset.DatasetManagementException) JsonSyntaxException(com.google.gson.JsonSyntaxException) UnsupportedTypeException(co.cask.cdap.api.data.schema.UnsupportedTypeException) Partitioning(co.cask.cdap.api.dataset.lib.Partitioning) SystemDatasetInstantiator(co.cask.cdap.data.dataset.SystemDatasetInstantiator) TypeToken(com.google.common.reflect.TypeToken) PartitionKey(co.cask.cdap.api.dataset.lib.PartitionKey) ChannelBufferInputStream(org.jboss.netty.buffer.ChannelBufferInputStream) QueryHandle(co.cask.cdap.proto.QueryHandle)

Aggregations

Dataset (co.cask.cdap.api.dataset.Dataset) 18
IOException (java.io.IOException) 11
DatasetManagementException (co.cask.cdap.api.dataset.DatasetManagementException) 7
SystemDatasetInstantiator (co.cask.cdap.data.dataset.SystemDatasetInstantiator) 6
DatasetInstantiationException (co.cask.cdap.api.data.DatasetInstantiationException) 3
UnsupportedTypeException (co.cask.cdap.api.data.schema.UnsupportedTypeException) 3
PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet) 3
BadRequestException (co.cask.cdap.common.BadRequestException) 3
DatasetSpecification (co.cask.cdap.api.dataset.DatasetSpecification) 2
PartitionKey (co.cask.cdap.api.dataset.lib.PartitionKey) 2
Partitioning (co.cask.cdap.api.dataset.lib.Partitioning) 2
TopicNotFoundException (co.cask.cdap.api.messaging.TopicNotFoundException) 2
ServiceUnavailableException (co.cask.cdap.common.ServiceUnavailableException) 2
CustomDatasetApp (co.cask.cdap.data2.dataset2.customds.CustomDatasetApp) 2
CustomOperations (co.cask.cdap.data2.dataset2.customds.CustomOperations) 2
DefaultTopLevelExtendsDataset (co.cask.cdap.data2.dataset2.customds.DefaultTopLevelExtendsDataset) 2
DelegatingDataset (co.cask.cdap.data2.dataset2.customds.DelegatingDataset) 2
TopLevelDataset (co.cask.cdap.data2.dataset2.customds.TopLevelDataset) 2
TopLevelDirectDataset (co.cask.cdap.data2.dataset2.customds.TopLevelDirectDataset) 2
TopLevelExtendsDataset (co.cask.cdap.data2.dataset2.customds.TopLevelExtendsDataset) 2