CAT - Catalogue
Project Links | |
---|---|
Software GitHub Repository | https://github.com/ds2-eu/cat.git |
Progress GitHub Project | https://github.com/orgs/ds2-eu/projects/38 |
General Description
The Catalogue module supports the exchange of data within and across data spaces. It ensures robust data governance, secure data exchanges, and compliance with sovereignty requirements. The main goal of the DS2 catalogue is to enhance the functionality of catalogue systems in existing reference architectures so that they support both intra-data space and inter-data space operations. This includes defining data models for data product offers, data product offer searches, and interactions with members of other data spaces, thereby fostering collaboration across data spaces. The module is listed as optional because the same tasks can technically be handled at participant level, but doing so creates considerable overhead for data consumers and providers. In practice it is a core module of DS2 for most dataspace sharing scenarios.
The core function of the module is to support the creation of data assets (Data Products) and to provide publication and search interfaces with associated metadata and data model schemas that support validation. The module ensures that data products are appropriately created, described, and maintained within the catalogue, including the definition of metadata and access policies. It provides intuitive user and technical interfaces for publishing and discovering data assets, so that users can search for and access relevant data products efficiently. It also supports robust data models defined with schemas to ensure data integrity, together with descriptions of the data service interfaces used to access the data (mainly based on IDSA reference architecture connectors). Key features of the module are trust building, governance compliance, and interoperability. Catalogues contribute to trust by ensuring that data products and their metadata are accurately described and reliably managed. They help enforce data governance policies by providing controlled access and visibility to data assets. By adhering to standardized data models such as DCAT (Data Catalog Vocabulary, https://www.w3.org/TR/vocab-dcat-3/), catalogues support data exchange and integration across data spaces.
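To make the role of DCAT more concrete, the following is a minimal sketch of how a data product offer could be described as a DCAT dataset in JSON-LD. It is an illustration only and not the DS2 data model: the helper names `make_data_product` and `missing_fields`, the identifier, the access URL, and the choice of required properties are all assumptions. Only the `dcat:` and `dct:` terms come from the DCAT and Dublin Core vocabularies.

```python
import json

# Namespace prefixes from the DCAT recommendation and Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def make_data_product(identifier, title, description, keywords, access_url):
    """Build a minimal DCAT dataset description as a JSON-LD dictionary."""
    return {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": access_url},
            }
        ],
    }


def missing_fields(product, required=("dct:title", "dct:description", "dcat:distribution")):
    """Return the required properties that are absent or empty (assumed rule set)."""
    return [name for name in required if not product.get(name)]


offer = make_data_product(
    identifier="urn:example:data-product:1",  # hypothetical identifier
    title="Example sensor readings",
    description="Hourly readings from an example sensor network.",
    keywords=["sensors", "example"],
    access_url="https://connector.example.org/api/data/1",  # hypothetical endpoint
)
print(json.dumps(offer, indent=2))
```

A real deployment would validate offers against the schemas defined in DS2 rather than the small property check used here.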
Architecture
The figure below shows how the module fits into the DS-DS environment.
The figure below represents the actors, internal structure, primary sub-components, primary DS2 module interfaces, and primary other interfaces of the module.
Component Definition
This module has the following subcomponents and other functions:
- Catalogue UI: Supports the different users of DS2 in accessing and managing the DS2 offer catalogue. Although shown as separate components, it is implemented as a single web UI and backend component using the Catalogue Microservices API. Depending on the participant and user role, the UI provides a different user experience:
  - Catalogue UI for data providers: Extends the existing UIs provided by connector implementations with better support for DS2 metadata descriptions. It also supports sharing offers into multiple dataspaces using the Catalogue Microservices, and may support creating metadata based on selected vocabularies.
  - Catalogue UI for data consumers: Extends the existing UIs provided by connector implementations with better metadata descriptions and enables browsing and consuming offers shared by multiple dataspaces. It presents to the user the extended metadata associated with the DS2 catalogue and its offerings.
  - Catalogue UI for cross-dataspace data sharing: Provides a UI for sharing offerings between dataspaces, for example by defining catalogues of catalogues or interoperable data schemas and vocabularies.
- Catalogue Microservices: Contains an extensible set of microservices that provide the Catalogue Module API for querying and managing catalogue offerings. Initially these include:
  - Access Control: Uses DS2 trust system services to authorise which functionalities and data provided by the other Catalogue microservices the client UI is allowed to access.
  - Catalogues: Uses an open-source catalogue interface to create and manage DS2-specific catalogue types, verifying them against the data models defined in DS2 for interoperable data sharing across data spaces.
  - Metadata: Provides other DS2 components with access to metadata-related properties in catalogues and datasets. It also provides API functionality for inferred hierarchical ontologies for the UI component.
  - Catalogue Federation: Supports the IDSA Catalogue Protocol Specification for sharing catalogues in DCAT format. The service can access catalogues (typically from connectors), combine them, and provide the combined catalogues using the protocol. This is optional because it may require participant access rights to the individual dataspaces. Federation may also be supported through the Catalogue Module UI in the connector by sending the connector catalogue directly to the main DS2 catalogue. The provider UI has access rights to the dataspace connector APIs and, using the catalogue extension, to the p2p federated catalogue of the dataspace. Alignment with the older version of the IDSA catalogue data model, and transformation between it and DCAT, may also be provided by this component.
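The combining step described for Catalogue Federation can be sketched as follows. This is a simplified illustration, assuming each connector catalogue is already available as a JSON-LD dictionary with a `dcat:dataset` list. It does not implement the IDSA catalogue protocol itself, and the function name `federate`, the federated catalogue identifier, and the use of `dct:isPartOf` to record the source catalogue are assumptions made for the example.

```python
def federate(catalogues, federated_id="urn:example:federated-catalogue"):
    """Combine DCAT catalogues into one, keeping the first copy of each dataset."""
    seen = set()
    merged = []
    for catalogue in catalogues:
        source = catalogue.get("@id")
        for dataset in catalogue.get("dcat:dataset", []):
            dataset_id = dataset.get("@id")
            # Skip datasets that were already taken from an earlier catalogue.
            if dataset_id is not None and dataset_id in seen:
                continue
            if dataset_id is not None:
                seen.add(dataset_id)
            # Copy the dataset so the input is untouched, and record its origin.
            merged.append({**dataset, "dct:isPartOf": {"@id": source}})
    return {"@id": federated_id, "@type": "dcat:Catalog", "dcat:dataset": merged}


# Hypothetical catalogues as they might be read from two connectors.
connector_a = {
    "@id": "urn:example:connector-a",
    "dcat:dataset": [{"@id": "urn:example:ds-1"}, {"@id": "urn:example:ds-2"}],
}
connector_b = {
    "@id": "urn:example:connector-b",
    "dcat:dataset": [{"@id": "urn:example:ds-2"}, {"@id": "urn:example:ds-3"}],
}

combined = federate([connector_a, connector_b])
print([d["@id"] for d in combined["dcat:dataset"]])
```

Keeping the first copy of a duplicated dataset is only one possible policy. A federation service could equally prefer the most recently modified description.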
The microservices rely on lower-level APIs that are already provided through the stable APIs of open-source components. To avoid vendor lock-in, the operations that CAT needs from these APIs are documented and can serve as a reference specification if the underlying platforms need to be re-implemented with different components.
- Catalogue Management: Core operations for creating and managing catalogues and datasets in DCAT format, together with the resources (files) associated with them. The reference implementation is built on the management API provided by CKAN portals, but it depends only on a selected set of API operations and a specific configuration of the CKAN platform, so this subcomponent could be implemented with another platform if needed. Existing CKAN portals can also be used, potentially with restrictions on DS2 catalogue functionality. The CKAN API provides operations for managing datasets, resources, tags, organizations, and groups, allowing CAT microservices to create, update, delete, and search these objects within the catalogue. It also supports user management, activity tracking, and flexible querying options for catalogue management. Whilst CKAN provides a single catalogue, separate logical catalogues can be managed by assigning datasets to different organizations or groups. The DCAT extension helps to organize and expose datasets in a way that simulates multiple DCAT catalogues within one CKAN instance by leveraging organizations, groups, and tags, each of which can be presented as a distinct catalogue when its metadata is published in DCAT format. Some DS2 catalogue functionality may require specific CKAN configuration or must be implemented with the CKAN extension mechanism, so CKAN is treated here as an internal component.
- Connector Catalogue Extension: Provides a catalogue for IDSA information model-based descriptions of catalogues, offers, and agreements, implemented as an EDC connector extension. CAT optionally extends this existing component to support the extended metadata required by DS2. The Connector Catalogue Extension should also support extended metadata for the data model of the IDSA federated catalogue protocol specification, which is already implemented by the EDC connector, as described in the next chapter.
- Data Repository: To provide the metadata UI, CAT optionally supports a selected set of repository platforms that store resources documenting what is offered by connectors. Documentation provided for data may be used to create extended metadata and data schemas. The user can select suitable documents to add to a DS2 catalogue offer so that they are analysed for extended metadata. At minimum, simple document file upload is supported by the UI.
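The idea of presenting one CKAN instance as several logical catalogues, described under Catalogue Management, can be illustrated with a short sketch. It assumes the datasets are available as CKAN-style package dictionaries with `name`, `title`, `organization`, and `tags` fields, and groups them into one DCAT catalogue per organization. The function name `logical_catalogues`, the `unassigned` placeholder, and the sample data are hypothetical, and this is not how the CKAN DCAT extension is implemented.

```python
from collections import defaultdict


def logical_catalogues(packages):
    """Group CKAN-style package dicts into one DCAT catalogue per organization."""
    grouped = defaultdict(list)
    for package in packages:
        # The owning organization is a nested dict; datasets without one
        # are collected under a placeholder key (an assumption of this sketch).
        organization = package.get("organization") or {}
        owner = organization.get("name", "unassigned")
        grouped[owner].append(
            {
                "@type": "dcat:Dataset",
                "dct:identifier": package["name"],
                "dct:title": package.get("title", package["name"]),
                "dcat:keyword": [tag["name"] for tag in package.get("tags", [])],
            }
        )
    return {
        owner: {"@type": "dcat:Catalog", "dct:title": owner, "dcat:dataset": datasets}
        for owner, datasets in grouped.items()
    }


# Hypothetical sample data shaped like CKAN package dictionaries.
packages = [
    {"name": "traffic-counts", "title": "Traffic counts",
     "organization": {"name": "dataspace-a"}, "tags": [{"name": "mobility"}]},
    {"name": "air-quality", "title": "Air quality",
     "organization": {"name": "dataspace-b"}, "tags": []},
    {"name": "bus-stops", "title": "Bus stops",
     "organization": {"name": "dataspace-a"}, "tags": []},
]
catalogues = logical_catalogues(packages)
print(sorted(catalogues))
```

The same grouping could be done by CKAN group or tag instead of organization, depending on how the dataspaces are mapped onto the platform.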
Screenshots
Commercial Information
Organisation(s) | License Nature | License |
---|---|---|
VTT | Open Source | TBD |
Top Features
- Provides a DS Catalog website containing data products from multiple dataspaces.
- Provides an API to register and query catalog products (in development).
- Provides a metadata store for adding extended metadata to Data Products and querying it (to be implemented later).
- Provides a UI for data product providers to read assets from the EDC connector catalogue, add metadata, and send them to the Catalog website using the API (in development).
- Provides a UI for data product consumers to browse Data Product offers (initially the Catalog website can be used).
- Provides a dataspace simulation for testing.
How To Install
The CAT module is not expected to be installed by use cases in this phase of the project, because it depends on the DKAN platform and its required tools. DKAN provides the initial UI for browsing the dataspaces and Data Products. DKAN does not need to run on the same server as the rest of the CAT software. You can set up your own DKAN installation with its external React UI. After installation, the React UI and the components that tailor the data models must be adapted for DS2. No automatic tailoring of the UI is available at the moment.
The provider and consumer UIs, the microservices providing the DS2 API, and the RDF metadata storage will be installed using Docker Compose. You need to configure the link to the API of the tailored DKAN installation together with its API key.
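As an illustration of the configuration step, the sketch below reads the DKAN API link and API key from environment variables and fails early if either is missing. The variable names `DKAN_API_URL` and `DKAN_API_KEY`, the `DkanSettings` class, and the example values are hypothetical. The actual CAT configuration mechanism is not described here and may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DkanSettings:
    """Connection settings for the tailored DKAN installation."""
    api_url: str
    api_key: str


def load_settings(env=None):
    """Read the DKAN settings from a mapping (the process environment by default)."""
    env = os.environ if env is None else env
    # Hypothetical variable names; report every missing one at once.
    missing = [name for name in ("DKAN_API_URL", "DKAN_API_KEY") if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return DkanSettings(
        api_url=env["DKAN_API_URL"].rstrip("/"),  # normalise the trailing slash
        api_key=env["DKAN_API_KEY"],
    )


# Example with explicit values instead of the real environment.
settings = load_settings(
    {"DKAN_API_URL": "https://dkan.example.org/api/", "DKAN_API_KEY": "example-key"}
)
print(settings.api_url)
```

With Docker Compose, values like these would typically be passed to the containers through an environment section or an env file.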
Requirements
DKAN requirements can be found on https://dkan.readthedocs.io/en/latest/installation/index.html#requirements.
CAT module server requirements will be defined later.
Software
- DKAN catalog platform (https://getdkan.org/) based on the Drupal Content Management System (https://new.drupal.org/home)
- FastAPI for CAT API Gateway (https://fastapi.tiangolo.com/)
- Moleculer.js microservices for API functionality (https://moleculer.services/index.html)
- React.js for UI (https://react.dev/)
- react-jsonschema-form module for editing metadata (https://rjsf-team.github.io/react-jsonschema-form/docs/)
- Fuseki as metadata store (https://jena.apache.org/documentation/fuseki2/)
Summary of installation steps
- Install the DKAN platform with its external React UI and tailor the UI and data model components for DS2
- Install the DS2 CAT API and UI module and configure it to use the DKAN instance
Detailed steps
- A sandbox DKAN installation is easiest to set up by following the video guide at https://www.youtube.com/watch?v=SnA22Lb6r_M
  - This works only on a local machine, but you can expose it through a proxy such as nginx.
- A full DKAN installation can be done by following the general DKAN guidelines (https://dkan.readthedocs.io/en/latest/installation/index.html), but you need to know how to install Drupal and the other required tools.
- Clone the CAT GitHub project. Tailor the link to the DKAN instance, then start the CAT module with the docker compose up command in the main project directory.
How To Use
This guideline is preliminary:
- If you already have a connector:
  - Configure the provider UI to use your own EDC connector. Select a Data Product, add extended metadata, and send it to the catalogue.
  - Check that your Data Product is available in DKAN.
- If you want to test with simulated connectors:
  - Use the CAT dataspace simulation from the CAT subdirectory. It is not part of Docker Compose yet.
  - Configure a dataspace and a set of connectors and data products in the subdirectory of connectors.
  - Run the start script. It runs the connectors and registers them in DKAN.
  - Check your data products in DKAN. They will be in the Simulated Dataspace.
  - Run the stop script. It stops the connectors and unregisters them from DKAN.
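The "check that your product is available" steps above can also be done programmatically. The sketch below assumes that the DKAN instance exposes its search API at `/api/1/search` with `fulltext` and `page-size` query parameters and returns a JSON object whose `results` entries carry a `title`. Verify this against the API documentation of your DKAN version. The base URL, the helper names, and the sample response are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, text, page_size=20):
    """Build a DKAN search URL (endpoint path is an assumption, see above)."""
    query = urlencode({"fulltext": text, "page-size": page_size})
    return f"{base_url.rstrip('/')}/api/1/search?{query}"


def fetch_results(url):
    """Fetch and decode the JSON search response; requires a reachable DKAN."""
    with urlopen(url) as response:
        return json.load(response)


def titles_found(search_response, expected_titles):
    """Report, per expected title, whether it appears in the search results."""
    results = search_response.get("results", {})
    # Accept either a mapping of result id to item or a plain list of items.
    items = results.values() if isinstance(results, dict) else results
    titles = {item.get("title") for item in items}
    return {title: title in titles for title in expected_titles}


# Hypothetical response, used here instead of a live request.
sample_response = {
    "total": "1",
    "results": {"dkan_dataset/1": {"title": "Example sensor readings"}},
}
print(search_url("http://localhost", "Simulated Dataspace"))
print(titles_found(sample_response, ["Example sensor readings", "Missing product"]))
```

Against a running instance, `fetch_results(search_url(...))` would replace the sample response.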
Other Information
No other information at the moment for CAT.
OpenAPI Specification
Open API documentation for CAT TBD
Additional Links
TBD