Data Mapping cfxdm - dm:dedup
dm:dedup: This cfxdm tag removes duplicate values from the queried data for one or more selected columns.
It returns the unique values of a single selected column or, when more than one column is selected, the unique combinations of values, treating the selected columns as one combined value.
dm:dedup syntax:
columns (optional): A column, or a comma-separated list of columns, on which de-duplication of the data is applied.
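The effect of the columns parameter can be illustrated outside RDA with a minimal Python sketch. This is not the cfxdm implementation: the dedup helper, the sample rows, and the choices to keep the first occurrence and to compare whole rows when columns is omitted are all assumptions made for illustration.

```python
def dedup(rows, columns=None):
    """Keep the first row seen for each distinct key.

    rows: list of dicts. columns: comma-separated column names, or None to
    compare whole rows (assumed behavior, not confirmed for dm:dedup).
    """
    names = [c.strip() for c in columns.split(",")] if columns else None
    seen = set()
    unique = []
    for row in rows:
        if names is None:
            key = tuple(sorted(row.items()))
        else:
            # The selected columns are compared as one combined value.
            key = tuple(row.get(name) for name in names)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data, used only to show the difference between
# de-duplicating on one column and on a combination of columns.
rows = [
    {"city": "Austin", "state": "TX"},
    {"city": "Austin", "state": "TX"},
    {"city": "Austin", "state": "MN"},
]
by_city = dedup(rows, "city")
by_city_state = dedup(rows, "city,state")
print(len(by_city), len(by_city_state))
```

With a single column, all three sample rows collapse into one. With both columns, the Texas and Minnesota rows are distinct combined values, so two rows remain.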
This section explains how to load a CSV file into a dataset. The saved dataset is then used to show how the dm:dedup function removes duplicates from stored data.
Download the incidents.csv file to the local machine using a standard web browser, as shown below.
This example captures the default dm:dedup functionality.
Example-1 captures dedup functionality for an inline dataset.
Step 1: Create an empty pipeline named dm_dedupe_example_1 using AIOps studio, as shown in the screenshot below.
Step 2: Add the following pipeline code/commands to the pipeline created above, as shown in the screenshot below:
You can copy the code below into your pipeline and execute it in your environment.
##### This pipeline creates a set of records with duplicate IP addresses and hostnames
##### RDA function dm:dedup is used to demonstrate this example.
##### This pipeline adds a couple of rows with duplicate IP addresses and hostnames
##### Uses dm function 'dedup' to remove duplicate values from IP addresses and hostnames
@dm:empty
--> @dm:addrow ipaddress = '10.10.1.1' & hostname = 'host-1-1' & id = 'a1'
--> @dm:addrow ipaddress = '10.10.1.2' & id = 'a2'
--> @dm:addrow ipaddress = '10.10.1.2' & id = 'a3'
--> @dm:addrow ipaddress = '10.10.1.3' & id = 'a4'
--> @dm:addrow ipaddress = '10.10.1.3' & id = 'a5'
--> @dm:addrow hostname = 'host-4-4' & id = 'a6'
--> @dm:addrow hostname = 'host-4-4' & id = 'a7'
--> @dm:addrow id = 'a5'
--> @dm:dedup columns = 'ipaddress,hostname'
--> *dm:filter * get id, hostname, ipaddress
Step 3: Click the verify button to make sure the pipeline syntax and code are correct (as shown below)
Step 4: Click the execute button to run the pipeline. RDA executes the pipeline without any errors (as shown below)
Step 5: RDA uses the dm function 'dm:dedup' to remove duplicate entries from the selected columns (IP Address, hostname) and prints the output for each row of the dataset, as shown in the following screenshot.
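To reason about which rows of the Example-1 dataset could survive de-duplication on 'ipaddress,hostname', the following Python sketch rebuilds the same eight rows and applies a combined-key dedup. It rests on two assumptions that the text above does not confirm: that the first occurrence of each combination is kept, and that a column missing from a row is treated as an empty value equal to other empty values. The helper name dedup_on is hypothetical, and the RDA output in the screenshot remains the authoritative result.

```python
def dedup_on(rows, column_names):
    """Keep the first row for each distinct combination of the given columns."""
    seen = set()
    unique = []
    for row in rows:
        # Assumption: a column absent from a row is read as None.
        key = tuple(row.get(name) for name in column_names)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# The rows added by @dm:addrow in Example-1, in the same order.
rows = [
    {"ipaddress": "10.10.1.1", "hostname": "host-1-1", "id": "a1"},
    {"ipaddress": "10.10.1.2", "id": "a2"},
    {"ipaddress": "10.10.1.2", "id": "a3"},
    {"ipaddress": "10.10.1.3", "id": "a4"},
    {"ipaddress": "10.10.1.3", "id": "a5"},
    {"hostname": "host-4-4", "id": "a6"},
    {"hostname": "host-4-4", "id": "a7"},
    {"id": "a5"},
]

unique = dedup_on(rows, ["ipaddress", "hostname"])
surviving_ids = [row["id"] for row in unique]
print(surviving_ids)
```

Under these assumptions, the second row of each duplicated pair (a3, the first a5, and a7) is dropped, and the last row survives because its combination of two empty values has not been seen before.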
Example-2 captures the default dm:dedup functionality for a dataset loaded from a CSV file.
Step 1: Download 'incidents.csv' from the local file system to the AIOps RDA environment (as shown below)
Step 2: Upload the file 'incidents.csv' to AIOps studio using the file browser (as shown below)
Step 3: Add a new empty pipeline named "dm_dedup_example_2" as shown below and click the "Save" button (this step creates an empty pipeline and saves it to AIOps studio).
Step 4: Add the following pipeline commands to the empty pipeline text field that you created in Step 3.
You can copy the code below into your pipeline and execute it in your environment.
##### This pipeline loads the incidents.csv file into AIOps Studio.
##### AIOps studio stores the data loaded from the incidents.csv file
##### into a local dataset named 'incidents-summary'.
##### Prints the data that was stored
@files:loadfile filename = "incidents.csv"
--> @dm:save name = 'incidents-summary'
--> *dm:filter *
Step 5: Execute the pipeline and check the data loaded from incidents.csv using inspect data, as shown below (screenshot-1 and screenshot-2)
Step 6: Edit the pipeline created in Step 4 to add the dm:dedup function, as in the following pipeline code, and click verify to check the pipeline code (as shown below).
##### This pipeline loads the incidents.csv file into AIOps Studio.
##### AIOps studio stores the data loaded from the incidents.csv file
##### into a local dataset named 'incidents-summary'.
##### Prints the data that was stored, then removes rows with
##### duplicate values in the 'Summary' column.
@files:loadfile filename = "incidents.csv"
--> @dm:save name = 'incidents-summary'
--> *dm:filter *
--> @dm:dedup columns = 'Summary'
Step 7: Click the execute button to run the pipeline. RDA executes the pipeline without any errors (as shown below)
Step 8: RDA uses dm:dedup to remove duplicate values from the requested column(s) and prints the output as shown below. Note: More columns can be selected as part of the dm:dedup function
Note: incidents.csv contained 436 rows before the dedup function was run on the dataset. After dedup is run, the dataset is reduced to 158 rows (as shown in the above screenshots). In this example, the dedup column selected is 'Summary'. Users can pick other columns from the dataset instead.
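The before-and-after row count described in the note can be reproduced in spirit with a small, self-contained Python sketch that uses only the standard csv module. The CSV content below is made-up sample data (it is not incidents.csv, whose columns other than 'Summary' are not listed here), and dedup_by_column is a hypothetical helper that assumes the first row for each 'Summary' value is the one kept.

```python
import csv
import io

# Hypothetical stand-in for a CSV file with a 'Summary' column.
SAMPLE_CSV = """Id,Summary
INC-1,Disk usage high
INC-2,Disk usage high
INC-3,Link down
INC-4,Disk usage high
"""


def dedup_by_column(records, column):
    """Keep the first record for each distinct value in `column`."""
    seen = set()
    unique = []
    for record in records:
        value = record[column]
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique


records = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unique = dedup_by_column(records, "Summary")
print(f"{len(records)} rows before, {len(unique)} rows after")
```

Comparing the two counts is a quick way to check how many rows a given dedup column removes before choosing that column in a pipeline.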