Data Mapping cfxdm - dm:mergecolumns
Data merge from multiple columns to a single column
dm:mergecolumns: This cfxdm function selects multiple columns using the include and exclude options, each given as a regular expression, and merges the selected columns into a single target column.
dm:mergecolumns syntax:
include (Mandatory): A regular expression that matches the column or columns to include in the merge.
exclude (Optional): A regular expression that matches the column or columns to exclude from the merge.
to (Mandatory): Target column that receives the merged data from the selected columns.
Note: After the selected columns are merged into the target column, the source columns are removed from the final output.
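The selection and removal behavior described above can be sketched in plain Python. This is only an illustration, not the cfxdm implementation: the function name merge_columns, the sep parameter, the use of re.search for matching, the extra host column, and the choice to join values in column order with an underscore are all assumptions, since this page does not say how dm:mergecolumns combines the values.

```python
import re

def merge_columns(rows, include, to, exclude=None, sep="_"):
    """Hypothetical stand-in for dm:mergecolumns, for illustration only."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    merged = []
    for row in rows:
        # Keep columns whose names match include and do not match exclude.
        selected = [c for c in row
                    if inc.search(c) and not (exc and exc.search(c))]
        # Source columns that were merged are dropped from the output row.
        out = {c: v for c, v in row.items() if c not in selected}
        # Assumption: values are joined in column order with `sep`.
        out[to] = sep.join(str(row[c]) for c in selected)
        merged.append(out)
    return merged

rows = [{
    "flow.client_addr": "10.95.133.42",
    "flow.server_addr": "10.95.133.40",
    "flow.service_port": "9092",
    "host": "a",  # hypothetical extra column that include does not match
}]
result = merge_columns(rows, include=r"flow\..*",
                       exclude=r"service_port", to="flow.unique_id")
print(result)
```

In this sketch the exclude pattern keeps flow.service_port out of the merge, so it survives as its own column while the two address columns are replaced by flow.unique_id.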
Step 1: Create an empty pipeline named dm_mergecolumns_example_1 in AIOps studio, as shown in the screenshot below.
Step 2: Add the following pipeline code to the pipeline created above, as shown in the screenshot below:
You can copy the code below into your pipeline and run it in your environment.
##### This pipeline creates a set of records (a dataset) from a netflow
##### environment.
##### The dataset includes flow.client_addr, flow.server_addr, and flow.service_port as
##### source columns. The pipeline uses dm:mergecolumns to merge the source columns
##### into a single target column.
@dm:empty
--> @dm:addrow flow.client_addr = '10.95.133.42' & flow.server_addr = '10.95.133.40' & flow.service_port = '9092'
--> @dm:addrow flow.client_addr = '10.95.122.170' & flow.server_addr = '10.95.122.169' & flow.service_port = '9092'
--> @dm:addrow flow.client_addr = '10.95.122.170' & flow.server_addr = '10.95.122.169' & flow.service_port = '9092'
--> @dm:addrow flow.client_addr = '10.95.122.170' & flow.server_addr = '10.95.122.169' & flow.service_port = '9092'
--> @dm:addrow flow.client_addr = '10.95.122.205' & flow.server_addr = '10.95.122.212' & flow.service_port = '9300'
--> @dm:addrow flow.client_addr = '10.95.117.35' & flow.server_addr = '10.95.117.37' & flow.service_port = '686'
--> @dm:addrow flow.client_addr = '10.95.122.105' & flow.server_addr = '10.95.122.212' & flow.service_port = '443'
--> @dm:addrow flow.client_addr = '10.95.122.105' & flow.server_addr = '10.95.122.212' & flow.service_port = '9300'
--> @dm:addrow flow.client_addr = '10.95.122.121' & flow.server_addr = '10.95.122.212' & flow.service_port = '9092'
--> @dm:addrow flow.client_addr = '10.95.122.109' & flow.server_addr = '10.95.122.212' & flow.service_port = '8443'
--> @dm:mergecolumns include = 'flow.client_addr|flow.server_addr|flow.service_port' & to = 'flow.unique_id'
Step 3: Click the Verify button to confirm that the pipeline syntax and code are correct (as shown below).
Step 4: Click the Execute button to run the pipeline. RDA should execute it without errors (as shown below).
Step 5: RDA uses dm:mergecolumns to merge the specified columns into the target column and prints the resulting output, as shown in the following screenshot.
dm:mergecolumns is useful when rows in a dataset need a unique key: combining a few columns into one column produces that key.
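To illustrate why a merged key helps, here is a minimal Python sketch that builds a key from a few of the sample flows and counts repeats. The underscore separator and the variable names are assumptions chosen for illustration; the actual format of the value that RDA writes to flow.unique_id may differ.

```python
from collections import Counter

# (client_addr, server_addr, service_port) values taken from the sample rows.
flows = [
    ("10.95.133.42", "10.95.133.40", "9092"),
    ("10.95.122.170", "10.95.122.169", "9092"),
    ("10.95.122.170", "10.95.122.169", "9092"),
    ("10.95.122.170", "10.95.122.169", "9092"),
    ("10.95.122.105", "10.95.122.212", "443"),
    ("10.95.122.105", "10.95.122.212", "9300"),
]

# Assumption: the key is the three values joined with "_".
keys = ["_".join(flow) for flow in flows]

# Count how often each combined key occurs.
counts = Counter(keys)
distinct = len(counts)
most_common_key, occurrences = counts.most_common(1)[0]

print(distinct)                      # number of distinct flows
print(most_common_key, occurrences)  # the most repeated flow and its count
```

With a single key column, duplicate flows can be found by grouping on one field instead of comparing three columns at a time.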