
Example 6 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class DatasetAdminService method writeSystemMetadata.

private void writeSystemMetadata(DatasetId datasetInstanceId, final DatasetSpecification spec, DatasetProperties props, final DatasetTypeMeta typeMeta, final DatasetType type, final DatasetContext context, boolean existing, UserGroupInformation ugi) throws IOException {
    // add system metadata for user datasets only
    if (DatasetsUtil.isUserDataset(datasetInstanceId)) {
        Dataset dataset = null;
        try {
            try {
                dataset = ImpersonationUtils.doAs(ugi, new Callable<Dataset>() {

                    @Override
                    public Dataset call() throws Exception {
                        return type.getDataset(context, spec, DatasetDefinition.NO_ARGUMENTS);
                    }
                });
            } catch (Exception e) {
                LOG.warn("Exception while instantiating Dataset {}", datasetInstanceId, e);
            }
            // Make sure to write whatever system metadata can be derived,
            // even if the instantiation above throws an exception
            SystemMetadataWriter systemMetadataWriter;
            if (existing) {
                systemMetadataWriter = new DatasetSystemMetadataWriter(metadataStore, datasetInstanceId, props, dataset, typeMeta.getName(), spec.getDescription());
            } else {
                long createTime = System.currentTimeMillis();
                systemMetadataWriter = new DatasetSystemMetadataWriter(metadataStore, datasetInstanceId, props, createTime, dataset, typeMeta.getName(), spec.getDescription());
            }
            systemMetadataWriter.write();
        } finally {
            if (dataset != null) {
                dataset.close();
            }
        }
    }
}
Also used : DatasetSystemMetadataWriter(co.cask.cdap.data2.metadata.system.DatasetSystemMetadataWriter) Dataset(co.cask.cdap.api.dataset.Dataset) SystemMetadataWriter(co.cask.cdap.data2.metadata.system.SystemMetadataWriter) Callable(java.util.concurrent.Callable) IncompatibleUpdateException(co.cask.cdap.api.dataset.IncompatibleUpdateException) DatasetManagementException(co.cask.cdap.api.dataset.DatasetManagementException) IOException(java.io.IOException) NotFoundException(co.cask.cdap.common.NotFoundException) BadRequestException(co.cask.cdap.common.BadRequestException)
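The snippet above instantiates the dataset under the caller's UserGroupInformation and tolerates failure, so whatever system metadata can still be derived gets written. Since Callable is a functional interface, the same instantiation can be written more compactly with a lambda. A minimal sketch, reusing the variables of the method above and assuming the doAs(ugi, Callable<T>) signature implied by the snippet:

// Same best-effort instantiation as above, with a lambda instead of the anonymous Callable.
Dataset dataset = null;
try {
    dataset = ImpersonationUtils.doAs(ugi, () -> type.getDataset(context, spec, DatasetDefinition.NO_ARGUMENTS));
} catch (Exception e) {
    // Tolerate the failure: system metadata is still written for a null dataset below.
    LOG.warn("Exception while instantiating Dataset {}", datasetInstanceId, e);
}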

Example 7 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class DatasetModulesDeployer method loadAndDeployModule.

private void loadAndDeployModule(ClassLoader artifactClassLoader, String className, Location jarLocation, String moduleName, NamespaceId namespaceId) throws ClassNotFoundException, IllegalAccessException, InstantiationException, DatasetManagementException {
    // note: using app class loader to load module class
    @SuppressWarnings("unchecked") Class<Dataset> clazz = (Class<Dataset>) artifactClassLoader.loadClass(className);
    try {
        // note: we can deploy a module, or create a module from a Dataset class
        // note: it seems dangerous to instantiate the dataset module here, but this will be fine once
        //       deployment moves into an isolated user environment (e.g. a separate YARN container)
        DatasetModuleId moduleId = namespaceId.datasetModule(moduleName);
        DatasetModule module;
        if (DatasetModule.class.isAssignableFrom(clazz)) {
            module = (DatasetModule) clazz.newInstance();
        } else if (Dataset.class.isAssignableFrom(clazz)) {
            if (systemDatasetFramework.hasSystemType(clazz.getName())) {
                return;
            }
            DatasetTypeId typeId = namespaceId.datasetType(clazz.getName());
            if (datasetFramework.hasType(typeId) && !allowDatasetUncheckedUpgrade) {
                return;
            }
            module = new SingleTypeModule(clazz);
        } else {
            throw new IllegalArgumentException(String.format("Cannot use class %s to add dataset module: it must be of type DatasetModule or Dataset", clazz.getName()));
        }
        LOG.info("Adding module: {}", clazz.getName());
        datasetFramework.addModule(moduleId, module, jarLocation);
    } catch (ModuleConflictException e) {
        LOG.info("Conflict while deploying module {}: {}", moduleName, e.getMessage());
        throw e;
    }
}
Also used : DatasetModuleId(co.cask.cdap.proto.id.DatasetModuleId) DatasetTypeId(co.cask.cdap.proto.id.DatasetTypeId) ModuleConflictException(co.cask.cdap.data2.dataset2.ModuleConflictException) Dataset(co.cask.cdap.api.dataset.Dataset) SingleTypeModule(co.cask.cdap.data2.dataset2.SingleTypeModule) DatasetModule(co.cask.cdap.api.dataset.module.DatasetModule)
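The branch above either instantiates the class directly when it already is a DatasetModule, or wraps a plain Dataset class in a SingleTypeModule so it can be deployed as a one-type module. A minimal sketch of that dispatch factored into a helper; the helper name is hypothetical, and it assumes, as the usage above suggests, that SingleTypeModule accepts a Class<? extends Dataset>:

// Hypothetical helper mirroring the dispatch above: decide how a loaded class
// becomes a DatasetModule before datasetFramework.addModule(...) is called.
@SuppressWarnings("unchecked")
private static DatasetModule asModule(Class<?> clazz) throws IllegalAccessException, InstantiationException {
    if (DatasetModule.class.isAssignableFrom(clazz)) {
        // The class ships its own module definition; instantiate it directly.
        return (DatasetModule) clazz.newInstance();
    }
    if (Dataset.class.isAssignableFrom(clazz)) {
        // A bare Dataset class is wrapped so it can be deployed as a single-type module.
        return new SingleTypeModule((Class<? extends Dataset>) clazz);
    }
    throw new IllegalArgumentException(String.format(
        "Cannot use class %s to add dataset module: it must be of type DatasetModule or Dataset", clazz.getName()));
}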

Example 8 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class ExistingEntitySystemMetadataWriter method writeSystemMetadataForDatasets.

private void writeSystemMetadataForDatasets(NamespaceId namespace, DatasetFramework dsFramework) throws DatasetManagementException, IOException, NamespaceNotFoundException {
    SystemDatasetInstantiatorFactory systemDatasetInstantiatorFactory = new SystemDatasetInstantiatorFactory(locationFactory, dsFramework, cConf);
    try (SystemDatasetInstantiator systemDatasetInstantiator = systemDatasetInstantiatorFactory.create()) {
        for (DatasetSpecificationSummary summary : dsFramework.getInstances(namespace)) {
            final DatasetId dsInstance = namespace.dataset(summary.getName());
            DatasetProperties dsProperties = DatasetProperties.of(summary.getProperties());
            String dsType = summary.getType();
            Dataset dataset = null;
            try {
                try {
                    dataset = impersonator.doAs(dsInstance, new Callable<Dataset>() {

                        @Override
                        public Dataset call() throws Exception {
                            return systemDatasetInstantiator.getDataset(dsInstance);
                        }
                    });
                } catch (Exception e) {
                    LOG.warn("Exception while instantiating dataset {}", dsInstance, e);
                }
                SystemMetadataWriter writer = new DatasetSystemMetadataWriter(metadataStore, dsInstance, dsProperties, dataset, dsType, summary.getDescription());
                writer.write();
            } finally {
                if (dataset != null) {
                    dataset.close();
                }
            }
        }
    }
}
Also used : SystemDatasetInstantiatorFactory(co.cask.cdap.data.dataset.SystemDatasetInstantiatorFactory) DatasetSystemMetadataWriter(co.cask.cdap.data2.metadata.system.DatasetSystemMetadataWriter) SystemDatasetInstantiator(co.cask.cdap.data.dataset.SystemDatasetInstantiator) Dataset(co.cask.cdap.api.dataset.Dataset) DatasetProperties(co.cask.cdap.api.dataset.DatasetProperties) ProgramSystemMetadataWriter(co.cask.cdap.data2.metadata.system.ProgramSystemMetadataWriter) ViewSystemMetadataWriter(co.cask.cdap.data2.metadata.system.ViewSystemMetadataWriter) SystemMetadataWriter(co.cask.cdap.data2.metadata.system.SystemMetadataWriter) AppSystemMetadataWriter(co.cask.cdap.data2.metadata.system.AppSystemMetadataWriter) ArtifactSystemMetadataWriter(co.cask.cdap.data2.metadata.system.ArtifactSystemMetadataWriter) StreamSystemMetadataWriter(co.cask.cdap.data2.metadata.system.StreamSystemMetadataWriter) DatasetSpecificationSummary(co.cask.cdap.proto.DatasetSpecificationSummary) Callable(java.util.concurrent.Callable) DatasetManagementException(co.cask.cdap.api.dataset.DatasetManagementException) NamespaceNotFoundException(co.cask.cdap.common.NamespaceNotFoundException) IOException(java.io.IOException) DatasetId(co.cask.cdap.proto.id.DatasetId)
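Examples 6 and 8 share the same shape: instantiate the dataset best-effort under impersonation, write whatever system metadata can be derived even if instantiation failed, and close the dataset in a finally block. A minimal sketch of that shape extracted into a hypothetical helper (the helper and its java.util.function.Function parameter are assumptions, not CDAP API):

// Hypothetical helper capturing the shared pattern: obtain a Dataset best-effort,
// hand it (possibly null) to a metadata writer factory, and always close it.
private void writeMetadataBestEffort(DatasetId id, Callable<Dataset> instantiator,
                                     java.util.function.Function<Dataset, SystemMetadataWriter> writerFactory)
    throws IOException {
    Dataset dataset = null;
    try {
        try {
            dataset = instantiator.call();
        } catch (Exception e) {
            LOG.warn("Exception while instantiating dataset {}", id, e);
        }
        // Write even when the dataset could not be instantiated.
        writerFactory.apply(dataset).write();
    } finally {
        if (dataset != null) {
            dataset.close();
        }
    }
}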

Example 9 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class ExploreExecutorHttpHandler method doDropPartition.

private void doDropPartition(HttpRequest request, HttpResponder responder, DatasetId datasetId) {
    Dataset dataset;
    try (SystemDatasetInstantiator datasetInstantiator = datasetInstantiatorFactory.create()) {
        dataset = datasetInstantiator.getDataset(datasetId);
        if (dataset == null) {
            responder.sendString(HttpResponseStatus.NOT_FOUND, "Cannot load dataset " + datasetId);
            return;
        }
    } catch (IOException e) {
        String classNotFoundMessage = isClassNotFoundException(e);
        if (classNotFoundMessage != null) {
            JsonObject json = new JsonObject();
            json.addProperty("handle", QueryHandle.NO_OP.getHandle());
            responder.sendJson(HttpResponseStatus.OK, json);
            return;
        }
        LOG.error("Exception instantiating dataset {}.", datasetId, e);
        responder.sendString(HttpResponseStatus.INTERNAL_SERVER_ERROR, "Exception instantiating dataset " + datasetId);
        return;
    }
    try {
        if (!(dataset instanceof PartitionedFileSet)) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "not a partitioned dataset.");
            return;
        }
        Partitioning partitioning = ((PartitionedFileSet) dataset).getPartitioning();
        Reader reader = new InputStreamReader(new ChannelBufferInputStream(request.getContent()));
        Map<String, String> properties = GSON.fromJson(reader, new TypeToken<Map<String, String>>() {
        }.getType());
        PartitionKey partitionKey;
        try {
            partitionKey = PartitionedFileSetArguments.getOutputPartitionKey(properties, partitioning);
        } catch (Exception e) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "invalid partition key: " + e.getMessage());
            return;
        }
        if (partitionKey == null) {
            responder.sendString(HttpResponseStatus.BAD_REQUEST, "no partition key was given.");
            return;
        }
        QueryHandle handle = exploreTableManager.dropPartition(datasetId, properties, partitionKey);
        JsonObject json = new JsonObject();
        json.addProperty("handle", handle.getHandle());
        responder.sendJson(HttpResponseStatus.OK, json);
    } catch (Throwable e) {
        LOG.error("Got exception:", e);
        responder.sendString(HttpResponseStatus.INTERNAL_SERVER_ERROR, e.getMessage());
    }
}
Also used : InputStreamReader(java.io.InputStreamReader) Dataset(co.cask.cdap.api.dataset.Dataset) JsonObject(com.google.gson.JsonObject) Reader(java.io.Reader) PartitionedFileSet(co.cask.cdap.api.dataset.lib.PartitionedFileSet) IOException(java.io.IOException) BadRequestException(co.cask.cdap.common.BadRequestException) ExploreException(co.cask.cdap.explore.service.ExploreException) SQLException(java.sql.SQLException) DatasetManagementException(co.cask.cdap.api.dataset.DatasetManagementException) JsonSyntaxException(com.google.gson.JsonSyntaxException) UnsupportedTypeException(co.cask.cdap.api.data.schema.UnsupportedTypeException) Partitioning(co.cask.cdap.api.dataset.lib.Partitioning) SystemDatasetInstantiator(co.cask.cdap.data.dataset.SystemDatasetInstantiator) TypeToken(com.google.common.reflect.TypeToken) PartitionKey(co.cask.cdap.api.dataset.lib.PartitionKey) ChannelBufferInputStream(org.jboss.netty.buffer.ChannelBufferInputStream) QueryHandle(co.cask.cdap.proto.QueryHandle)
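The handler decodes the request body into a Map<String, String> with Gson; the anonymous com.google.common.reflect.TypeToken subclass is what preserves the generic type across erasure, so values are deserialized as strings rather than generic Objects. A small standalone sketch of just that parsing step; the class name and the JSON body are made up for illustration:

import java.util.Map;

import com.google.common.reflect.TypeToken;
import com.google.gson.Gson;

// Stand-alone illustration of the Gson/TypeToken idiom used in doDropPartition.
public class PartitionPropertiesParser {

    private static final Gson GSON = new Gson();

    static Map<String, String> parse(String json) {
        // The anonymous TypeToken subclass captures Map<String, String>, so Gson
        // deserializes string values instead of generic Objects.
        return GSON.fromJson(json, new TypeToken<Map<String, String>>() { }.getType());
    }

    public static void main(String[] args) {
        // Hypothetical partition fields; the real field names come from the
        // dataset's Partitioning, as in the handler above.
        Map<String, String> properties = parse("{\"year\": \"2017\", \"month\": \"06\"}");
        System.out.println(properties);   // {year=2017, month=06}
    }
}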

Example 10 with Dataset

use of co.cask.cdap.api.dataset.Dataset in project cdap by caskdata.

the class ExternalDatasets method makeTrackable.

/**
   * If the input is an external source, an external dataset is created for tracking purposes and returned.
   * If the input is a regular dataset or a stream, it is already trackable, so the same input is returned.
   *
   * @param admin {@link Admin} used to create the external dataset
   * @param input input to be tracked
   * @return an external dataset if the input is an external source, otherwise the same input that was passed in
   */
public static Input makeTrackable(Admin admin, Input input) {
    // If input is not an external source, return the same input as it can be tracked by itself.
    if (!(input instanceof Input.InputFormatProviderInput)) {
        return input;
    }
    // Input is an external source, create an external dataset so that it can be tracked.
    String inputName = input.getName();
    InputFormatProvider inputFormatProvider = ((Input.InputFormatProviderInput) input).getInputFormatProvider();
    Map<String, String> inputFormatConfiguration = inputFormatProvider.getInputFormatConfiguration();
    // this too can be tracked by itself without creating an external dataset
    if (inputFormatProvider instanceof Dataset) {
        return input;
    }
    try {
        // Create an external dataset for the input format for lineage tracking
        Map<String, String> arguments = new HashMap<>();
        arguments.put("input.format.class", inputFormatProvider.getInputFormatClassName());
        arguments.putAll(inputFormatConfiguration);
        if (!admin.datasetExists(inputName)) {
            // Note: the dataset properties are the same as the arguments since we cannot identify them separately
            // since they are mixed up in a single configuration object (CDAP-5674)
            // Also, the properties of the external dataset created will contain runtime arguments for the same reason.
            admin.createDataset(inputName, EXTERNAL_DATASET_TYPE, DatasetProperties.of(arguments));
        } else {
            // Check if the external dataset name clashes with an existing CDAP Dataset
            String datasetType = admin.getDatasetType(inputName);
            if (!EXTERNAL_DATASET_TYPE.equals(datasetType)) {
                throw new IllegalArgumentException("An external source cannot have the same name as an existing CDAP Dataset instance " + inputName);
            }
        }
        return Input.ofDataset(inputName, Collections.unmodifiableMap(arguments)).alias(input.getAlias());
    } catch (DatasetManagementException e) {
        throw Throwables.propagate(e);
    }
}
Also used : DatasetManagementException(co.cask.cdap.api.dataset.DatasetManagementException) InputFormatProvider(co.cask.cdap.api.data.batch.InputFormatProvider) HashMap(java.util.HashMap) Dataset(co.cask.cdap.api.dataset.Dataset)
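A hedged usage sketch of makeTrackable: an Input backed by a raw InputFormatProvider gets a backing external dataset so lineage can be recorded for it, while a dataset-backed Input would pass through unchanged. The provider implementation, the names, and the Input.of(name, provider) factory used here are assumptions for illustration; only getInputFormatClassName() and getInputFormatConfiguration() are taken from the method above.

// Hypothetical caller, assuming "admin" is an Admin instance as in the method signature above.
InputFormatProvider externalSource = new InputFormatProvider() {
    @Override
    public String getInputFormatClassName() {
        return "org.apache.hadoop.mapreduce.lib.input.TextInputFormat";   // assumed input format
    }

    @Override
    public Map<String, String> getInputFormatConfiguration() {
        // Assumed configuration key and path; a real provider supplies its own.
        return Collections.singletonMap("input.path", "/tmp/external/events");
    }
};
Input trackable = ExternalDatasets.makeTrackable(admin, Input.of("externalEvents", externalSource));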

Aggregations

Dataset (co.cask.cdap.api.dataset.Dataset) 18
IOException (java.io.IOException) 11
DatasetManagementException (co.cask.cdap.api.dataset.DatasetManagementException) 7
SystemDatasetInstantiator (co.cask.cdap.data.dataset.SystemDatasetInstantiator) 6
DatasetInstantiationException (co.cask.cdap.api.data.DatasetInstantiationException) 3
UnsupportedTypeException (co.cask.cdap.api.data.schema.UnsupportedTypeException) 3
PartitionedFileSet (co.cask.cdap.api.dataset.lib.PartitionedFileSet) 3
BadRequestException (co.cask.cdap.common.BadRequestException) 3
DatasetSpecification (co.cask.cdap.api.dataset.DatasetSpecification) 2
PartitionKey (co.cask.cdap.api.dataset.lib.PartitionKey) 2
Partitioning (co.cask.cdap.api.dataset.lib.Partitioning) 2
TopicNotFoundException (co.cask.cdap.api.messaging.TopicNotFoundException) 2
ServiceUnavailableException (co.cask.cdap.common.ServiceUnavailableException) 2
CustomDatasetApp (co.cask.cdap.data2.dataset2.customds.CustomDatasetApp) 2
CustomOperations (co.cask.cdap.data2.dataset2.customds.CustomOperations) 2
DefaultTopLevelExtendsDataset (co.cask.cdap.data2.dataset2.customds.DefaultTopLevelExtendsDataset) 2
DelegatingDataset (co.cask.cdap.data2.dataset2.customds.DelegatingDataset) 2
TopLevelDataset (co.cask.cdap.data2.dataset2.customds.TopLevelDataset) 2
TopLevelDirectDataset (co.cask.cdap.data2.dataset2.customds.TopLevelDirectDataset) 2
TopLevelExtendsDataset (co.cask.cdap.data2.dataset2.customds.TopLevelExtendsDataset) 2