cfxdm - dm:mergecolumns
Data merge from multiple columns to a single column
dm:mergecolumns: This cfxdm tag lets the user select multiple columns with the include and exclude options, each given as a regular expression, and merge them into a single target column.
dm:mergecolumns syntax:
    include (Mandatory): Regular expression that selects the column or columns to merge.
    exclude (Optional): Regular expression that removes matching columns from the selection.
    to (Mandatory): Target column that receives the merged data from the selected columns.
Note: After the selected columns are merged into the target column, the source columns are removed from the final output.
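The behavior described above can be approximated outside cfxdm with a minimal Python sketch. This is not the cfxdm implementation: the function name merge_columns, the "_" separator, and the use of re.search for matching are all assumptions made for illustration, since this page does not state how merged values are joined or how the regular expressions are anchored.

```python
import re

def merge_columns(rows, include, to, exclude=None, sep="_"):
    """Hypothetical model of dm:mergecolumns on a list of dict rows.

    Assumptions (not confirmed by the cfxdm documentation):
    - a column is selected when the include regex is found in its name
    - values are converted to strings and joined with `sep`
    """
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    out = []
    for row in rows:
        # Columns matching include, minus those matching exclude.
        selected = [
            c for c in row
            if inc.search(c) and not (exc and exc.search(c))
        ]
        merged = sep.join(str(row[c]) for c in selected)
        # Source columns are dropped; only the target column carries their data.
        new_row = {c: v for c, v in row.items() if c not in selected}
        new_row[to] = merged
        out.append(new_row)
    return out
```

In this sketch, columns that match exclude stay in the output untouched, because only the selected source columns are removed.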
The example below uses Elasticsearch as an extension to query Netflow data, then passes the results to dm:mergecolumns, which selects specific columns using include, exclude, or both, and merges them into a single target column.
Enter the command below to select the Netflow tag (#es:netflow). (In this example, es is the label that identifies the Elasticsearch extension and its tags, which point to the Netflow data index. The label is defined when the extension is added, either in the cfxdx configuration file or through the UI.)
tag #es:netflow
The Netflow tag includes many columns; this exercise uses only a few of them. For examples of include and exclude regular expressions, refer to the dm:selectcolumns documentation.
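As a rough illustration of how include and exclude patterns can narrow a column list, here is a small Python sketch. The helper select_columns and the two patterns are hypothetical, and matching with re.search is an assumption; cfxdm may apply its patterns differently. The column names are the ones that appear on this page.

```python
import re

# Column names taken from this page.
columns = [
    "@timestamp",
    "flow.ip_protocol",
    "flow.client_addr",
    "flow.server_addr",
    "flow.service_port",
]

def select_columns(columns, include, exclude=None):
    """Keep columns that match include and do not match exclude (assumed semantics)."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    return [c for c in columns if inc.search(c) and not (exc and exc.search(c))]

# Include every flow.* column, then exclude the protocol column.
print(select_columns(columns, include=r"^flow\.", exclude=r"protocol$"))
# ['flow.client_addr', 'flow.server_addr', 'flow.service_port']
```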
Example 1: Select three columns from the Netflow tag using the include option and merge them into a single column.
Get the TCP protocol data from the Elasticsearch Netflow tag (#es:netflow) for the last 1 hour, select the three columns below, and merge their values into a single target column.
Source Columns:
    flow.client_addr
    flow.server_addr
    flow.service_port
Target Column:
    flow.unique_id
Get the data with the three source columns above for a quick review.
data `flow.ip_protocol` contains 'TCP' and `@timestamp` after -1 hour GET `flow.client_addr`, `flow.server_addr`, `flow.service_port`
Extend the query by passing the queried data to the dm:mergecolumns tag: include all three columns and merge them into a single column, as described above.
data `flow.ip_protocol` contains 'TCP' and `@timestamp` after -1 hour GET `flow.client_addr`, `flow.server_addr`, `flow.service_port` --> dm:mergecolumns include = 'flow.client_addr|flow.server_addr|flow.service_port' & to = 'flow.unique_id'
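To make the expected shape of the result concrete, the following Python sketch applies the same include pattern and target column to two made-up rows. The sample values, the helper merge_row, and the "_" separator are assumptions for illustration only; the real Netflow data and the exact format that dm:mergecolumns produces for flow.unique_id may differ.

```python
import re

INCLUDE = "flow.client_addr|flow.server_addr|flow.service_port"  # pattern from Example 1
TARGET = "flow.unique_id"

# Hypothetical sample rows; not real Netflow data.
rows = [
    {"flow.client_addr": "10.0.0.5", "flow.server_addr": "10.0.1.9", "flow.service_port": 443},
    {"flow.client_addr": "10.0.0.7", "flow.server_addr": "10.0.1.9", "flow.service_port": 22},
]

def merge_row(row, include, to, sep="_"):
    """Merge matching columns into `to` and drop the sources (assumed semantics)."""
    pattern = re.compile(include)
    selected = [c for c in row if pattern.search(c)]
    merged = sep.join(str(row[c]) for c in selected)
    remaining = {c: v for c, v in row.items() if c not in selected}
    remaining[to] = merged
    return remaining

result = [merge_row(r, INCLUDE, TARGET) for r in rows]
for r in result:
    print(r)
# {'flow.unique_id': '10.0.0.5_10.0.1.9_443'}
# {'flow.unique_id': '10.0.0.7_10.0.1.9_22'}
```

Each output row contains only flow.unique_id, which mirrors the note that source columns are removed after the merge.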