RDA - Terminology and Artifacts

RDA Terminology

Terminology Reference

Solution Packages: Solution packages are shareable bundles with data, pipelines, and configurations.

Pipelines: Pipelines are data operation workflows, or Directed Acyclic Graphs (DAGs).
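To make the DAG idea concrete, here is a minimal sketch in plain Python. It is not RDA pipeline syntax: the step names (`ingest`, `sanitize`, `flag_hot`), the sample rows, and the `run_pipeline` helper are all hypothetical and exist only for illustration. Each step declares the steps it depends on, and the steps run in an order that respects those dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps. Each one receives the results of the steps run so far.
def ingest(_results):
    # Stand-in for reading from a real datasource.
    return [{"host": "web-1", "cpu": 91}, {"host": "db-1", "cpu": 35}]

def sanitize(results):
    # Drop rows with a missing CPU reading.
    return [row for row in results["ingest"] if row["cpu"] is not None]

def flag_hot(results):
    # Keep only the hosts above an assumed 80% threshold.
    return [row["host"] for row in results["sanitize"] if row["cpu"] > 80]

STEPS = {"ingest": ingest, "sanitize": sanitize, "flag_hot": flag_hot}

# Edges of the DAG: step name -> set of steps it depends on.
DEPENDS_ON = {"ingest": set(), "sanitize": {"ingest"}, "flag_hot": {"sanitize"}}

def run_pipeline(steps, depends_on):
    """Run every step once, in an order that respects the dependencies."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        results[name] = steps[name](results)
    return results

results = run_pipeline(STEPS, DEPENDS_ON)
```

Because the graph is acyclic, a valid execution order always exists; `TopologicalSorter` raises `CycleError` if a dependency loop is introduced.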

Bots: A bot handles a particular task or data function. A bot's name can carry one of three prefixes:

#: Bot is source filtered. Queries and filters are applied while the data is being queried from the datasource.

*: Bot is destination filtered. Queries and filters are applied after the complete data has been retrieved from the datasource. This is inefficient for large amounts of data, but it can be used when the datasource does not support filtering at query time.

@: API endpoint. The bot is a wrapper for an API offered by a datasource.
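The difference between source filtering and destination filtering can be sketched in a few lines of plain Python. This is an illustration only, not RDA code: the `Datasource` class, its `query` method, the `rows_transferred` counter, and the sample records are all hypothetical. The counter stands in for the amount of data that leaves the datasource.

```python
# Hypothetical in-memory records; a real datasource would be a database or an API.
RECORDS = [
    {"id": 1, "severity": "critical"},
    {"id": 2, "severity": "info"},
    {"id": 3, "severity": "critical"},
]

class Datasource:
    """Toy datasource that counts how many rows it hands back to the caller."""

    def __init__(self, records):
        self._records = records
        self.rows_transferred = 0

    def query(self, predicate=None):
        # With a predicate, filtering happens here, before any rows are returned.
        rows = [r for r in self._records if predicate is None or predicate(r)]
        self.rows_transferred += len(rows)
        return rows

def is_critical(row):
    return row["severity"] == "critical"

def source_filtered(datasource):
    # The filter travels with the query, so only matching rows are transferred.
    return datasource.query(predicate=is_critical)

def destination_filtered(datasource):
    # Everything is retrieved first, then filtered on the caller's side.
    return [row for row in datasource.query() if is_critical(row)]
```

Both functions return the same rows. The difference is how much data is transferred, which is why destination filtering becomes costly on large datasets.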

Dataset: A dataset holds the data.

Configurations: A configuration stores information such as credentials, which systems to connect to, parameters, etc.

Plugins: A plugin defines which system to connect to and its credentials.

RDA Artifacts:

Artifact: Description

Pipelines: A pipeline performs a specific sequence of tasks such as data ingestion, data analysis, data sanitization, and data transformation, including applying ML algorithms.

Datasets: A dataset is saved tabular data, similar to a dataframe.

Dictionary Dataset: A dictionary dataset is a dataset used to enrich other datasets.

Models: A model is a Machine Learning model produced by ML training in a pipeline.

Bot Source Configurations: A bot source configuration captures the configuration (if any) and lists a specific set of bots for automation.

Solution Package: A solution package is a bundle of pipelines, datasets, formatting templates, ML models, bot source configurations, etc., that accomplishes a specific outcome.

Data Streams: A data stream is used to notify other pipelines or to exchange data. Pipelines interact with other pipelines through data streams.

Traces: Traces track how data moves through a pipeline. They capture how a pipeline interacts with defined systems, artifacts, etc.
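As an illustration of how a dictionary dataset enriches another dataset, the following plain-Python sketch looks up each row's key in a dictionary dataset and copies the extra columns across. It is not RDA syntax: the `enrich` function, the `host` and `owner` columns, and the sample rows are hypothetical.

```python
def enrich(rows, dictionary, key):
    """Return copies of rows with columns from the matching dictionary entry added.

    Rows with no matching entry receive None for each added column.
    A dictionary column with the same name as a row column overwrites it.
    """
    index = {entry[key]: entry for entry in dictionary}
    extra_columns = sorted({col for entry in dictionary for col in entry} - {key})

    enriched = []
    for row in rows:
        match = index.get(row.get(key), {})
        new_row = dict(row)  # copy, so the input dataset is left untouched
        for column in extra_columns:
            new_row[column] = match.get(column)
        enriched.append(new_row)
    return enriched

# Hypothetical dataset to enrich.
alerts = [{"host": "web-1", "cpu": 91}, {"host": "cache-9", "cpu": 12}]

# Hypothetical dictionary dataset mapping hosts to owning teams.
host_owners = [
    {"host": "web-1", "owner": "platform-team"},
    {"host": "db-1", "owner": "data-team"},
]

enriched = enrich(alerts, host_owners, key="host")
```

Here the dictionary dataset acts as a lookup table: the alert for `web-1` gains an `owner` column, while `cache-9`, which has no entry, gets `None`.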