cfxdm - dm:mergecolumns

Data merge from multiple columns to a single column

dm:mergecolumns: This cfxdm tag selects multiple columns with the include and exclude options, each given as a regular expression, and merges the selected columns into a single target column.

dm:mergecolumns syntax:

  • include (Mandatory): Regular expression that matches the column or columns to include.

  • exclude (Optional): Regular expression that matches the column or columns to exclude.

  • to (Mandatory): Target column that receives the merged data from the selected columns.

Note: After the selected columns are merged into the target column, the source columns are removed from the final output.
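The behavior of the three options can be modeled with a short Python sketch. This is only an illustration, not the cfxdm implementation: the function name merge_columns, the sample row, and in particular the way values are combined (joined as strings with a space separator) are assumptions, since this page does not state how the merged value is formatted.

```python
import re

def merge_columns(rows, include, to, exclude=None, sep=" "):
    """Illustrative model of dm:mergecolumns (not the actual cfxdm code)."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    merged_rows = []
    for row in rows:
        # Select columns whose names match `include` and do not match `exclude`.
        selected = [
            name for name in row
            if inc.search(name) and not (exc and exc.search(name))
        ]
        # Unselected columns are kept; the selected source columns are dropped.
        out = {name: value for name, value in row.items() if name not in selected}
        # Assumption: values are joined as strings with `sep`.
        out[to] = sep.join(str(row[name]) for name in selected)
        merged_rows.append(out)
    return merged_rows

# Hypothetical sample row, for illustration only.
rows = [
    {"flow.client_addr": "10.0.0.5", "flow.server_addr": "10.0.0.9",
     "flow.service_port": 443, "flow.ip_protocol": "TCP"},
]

result = merge_columns(
    rows,
    include=r"flow\.client_addr|flow\.server_addr|flow\.service_port",
    to="flow.unique_id",
)
print(result)
```

In this model the three matched columns disappear from the output row and flow.unique_id appears in their place, while flow.ip_protocol is left untouched because it does not match the include pattern.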

The example below uses Elasticsearch as an extension to query Netflow data and passes the results to dm:mergecolumns, which selects specific columns using include, exclude, or both, and merges them into a single target column.

Enter the command below to select the Netflow tag (#es:netflow). (In this example, es is the label that identifies the Elasticsearch extension and its tags, which point to the Netflow data index. The label is defined when the extension is added, either in the cfxdx configuration file or through the UI.)

tag #es:netflow

The Netflow tag includes many columns; this exercise uses only a few of them. For examples of include / exclude regular expressions, refer to the dm:selectcolumns documentation.

Example 1: Select three columns from the Netflow tag using the include option and merge them into a single column.

Get the TCP protocol data from the Elasticsearch Netflow tag (#es:netflow) for the last 1 hour, select the three columns below, and merge their values into a single target column.

Source Columns:

  • flow.client_addr

  • flow.server_addr

  • flow.service_port

Target Column:

  • flow.unique_id

First, get the data with the three source columns above for a quick review.

data `flow.ip_protocol` contains 'TCP' and `@timestamp` after -1 hour GET `flow.client_addr`, `flow.server_addr`, `flow.service_port`

Extend the query by passing the three selected columns into the dm:mergecolumns tag: include all three columns and merge them into the single target column, as described above.

data `flow.ip_protocol` contains 'TCP' and `@timestamp` after -1 hour GET `flow.client_addr`, `flow.server_addr`, `flow.service_port` --> dm:mergecolumns include = 'flow.client_addr|flow.server_addr|flow.service_port' & to = 'flow.unique_id'
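The include pattern in the query above uses unescaped dots, and in standard regular expression syntax a dot matches any character. The following Python sketch shows what that means for column selection. It assumes cfxdm evaluates the pattern with Python-like regex semantics, which this page does not confirm; the last two column names in the list are hypothetical and are added only to show the difference.

```python
import re

# Column names: the first three come from the example above,
# the last two are hypothetical and added for comparison.
columns = [
    "flow.client_addr",
    "flow.server_addr",
    "flow.service_port",
    "flow.ip_protocol",
    "flow_client_addr",
]

# Pattern exactly as written in the query: '.' matches any character.
as_written = re.compile(r"flow.client_addr|flow.server_addr|flow.service_port")
# Stricter variant: '\.' matches only a literal dot.
escaped = re.compile(r"flow\.client_addr|flow\.server_addr|flow\.service_port")

matched_as_written = [c for c in columns if as_written.search(c)]
matched_escaped = [c for c in columns if escaped.search(c)]

print(matched_as_written)
print(matched_escaped)
```

With the Netflow columns used in this example both patterns select the same three columns, so the query works as written. Escaping the dots only matters when the data also contains similarly named columns such as the hypothetical flow_client_addr.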
