RDA - Terminology and Artifacts
Solution Packages: Solution packages are shareable bundles of data, pipelines, and configurations.
Pipelines: Pipelines are data operation workflows, represented as Directed Acyclic Graphs (DAGs).
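To make the DAG idea concrete, the sketch below models a pipeline as a few data operations chained in dependency order. This is a minimal illustration only; the stage names (ingest, sanitize, transform) and data shapes are hypothetical and are not actual RDA bots.

```python
# Illustrative only: a pipeline as a tiny DAG of data operations.
# Stage names and data shapes are hypothetical, not actual RDA bots.

def ingest():
    # Pretend this pulls rows from a datasource.
    return [{"host": "web-01", "cpu": 91}, {"host": "db-01", "cpu": 42}]

def sanitize(rows):
    # Drop rows missing required fields.
    return [r for r in rows if "host" in r and "cpu" in r]

def transform(rows):
    # Flag hosts with high CPU usage.
    return [{**r, "alert": r["cpu"] > 80} for r in rows]

# Each stage depends only on the output of the previous one, so execution
# follows the edges of the DAG: ingest -> sanitize -> transform.
result = transform(sanitize(ingest()))
print(result)
```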
Bots: A bot performs a particular task or data function. A bot's name may carry one of three prefixes:
#: The bot is source filtered. Data queries and filters are applied while the data is being queried from the datasource.
*: The bot is destination filtered. Data queries and filters are applied after the complete data has been retrieved from the datasource. This is not efficient for large amounts of data, but it can be used when the datasource does not support filtering at query time (see the sketch after this list).
@: API endpoint. The bot is a wrapper for an API offered by a datasource.
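The practical difference between the # and * prefixes is where the filter runs. The hedged sketch below contrasts the two approaches against a generic SQL datasource: source filtering pushes the predicate into the query, while destination filtering pulls everything and filters in memory. The table and column names are hypothetical; this is not actual RDA bot code.

```python
# Hedged illustration of source vs destination filtering (not actual RDA bot code).
import sqlite3

# Set up a throwaway in-memory datasource with a hypothetical "events" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (host TEXT, severity TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("web-01", "critical"), ("web-02", "info"), ("db-01", "critical")])

# '#' style, source filtered: the predicate is pushed into the query,
# so only matching rows ever leave the datasource.
source_filtered = conn.execute(
    "SELECT host, severity FROM events WHERE severity = ?", ("critical",)
).fetchall()

# '*' style, destination filtered: fetch everything, then filter in memory.
# This works even if the datasource cannot filter, but is costly for large data.
all_rows = conn.execute("SELECT host, severity FROM events").fetchall()
destination_filtered = [row for row in all_rows if row[1] == "critical"]

assert source_filtered == destination_filtered
```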
Dataset: A dataset holds the data.
Configurations: A configuration stores information such as credentials, which systems to connect to, and other parameters.
Plugins: A plugin defines which system to connect to and its credentials.
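The exact schema of a configuration is product specific, so the snippet below only sketches the kind of information a configuration and its plugin typically carry: which system to connect to, credentials, and extra parameters. All field names here are hypothetical and do not reflect an actual RDA schema.

```python
# Hypothetical sketch of the kind of information a configuration holds.
# Field names are illustrative only; they do not reflect an actual RDA schema.
example_configuration = {
    "name": "prod-elasticsearch",            # label used to reference this configuration
    "plugin": "elasticsearch",               # which system/plugin to connect with
    "endpoint": "https://es.example.com:9200",
    "credentials": {                          # normally stored securely, never inline
        "username": "rda_reader",
        "password": "<secret>",
    },
    "parameters": {"timeout_seconds": 30, "verify_tls": True},
}
```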
| Artifact | Description |
| --- | --- |
| Pipelines | A pipeline performs a specific sequence of tasks such as data ingestion, data sanitization, data analysis, and data transformation, including applying ML algorithms. |
| Datasets | A dataset is saved tabular data, similar to a dataframe. It contains the actual data. |
| Dictionary Dataset | A dictionary dataset is a dataset used to enrich other datasets (see the enrichment sketch after this table). |
| Models | A model is a Machine Learning model produced as the result of ML training in a pipeline. |
| Bot Source Configurations | A bot source configuration captures the configuration (if any) and lists a specific set of bots for automation. |
| Solution Package | A solution package is a bundle of pipelines, datasets, formatting templates, ML models, bot source configurations, etc., assembled to accomplish a specific outcome. |
| Data Streams | A data stream is used to notify other pipelines or to exchange data. Pipelines interact with other pipelines using data streams. |
| Traces | Traces track how data moves through a pipeline, capturing how the pipeline interacts with defined systems, artifacts, etc. |
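As a hedged illustration of how a dictionary dataset can enrich another dataset, the pandas sketch below joins a small lookup table of host owners onto an events table. The column names and the use of pandas are assumptions made for this example only; they do not describe the RDA enrichment mechanism itself.

```python
# Hedged illustration: enriching a dataset with a dictionary (lookup) dataset.
# Column names and the use of pandas are assumptions for this example only.
import pandas as pd

events = pd.DataFrame({
    "host": ["web-01", "db-01", "web-02"],
    "severity": ["critical", "warning", "info"],
})

# Dictionary dataset: a small lookup table used purely for enrichment.
host_owners = pd.DataFrame({
    "host": ["web-01", "db-01"],
    "owner": ["web-team", "dba-team"],
})

# A left join keeps every event row and adds the owner where a match exists.
enriched = events.merge(host_owners, on="host", how="left")
print(enriched)
```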