Splunk: Using Regex to Simplify Your Data

Splunk is an extremely powerful tool for extracting information from machine data, but machine data is often structured in a way that makes sense to a particular application or process while appearing as a garbled mess to the rest of us. Splunk allows you to cater for this and retrieve meaningful information using regular expressions (regex).

You can write your own regex to retrieve information from machine data, but it’s important to understand that Splunk does this behind the scenes anyway, so rather than writing your own regex, let Splunk do the heavy lifting for you.

The Splunk field extractor is a WYSIWYG regex editor.

When working with something like Netscaler log files, it’s not uncommon to see the same information represented in different ways within the same log file. For example, userIDs and source IP addresses appear in two different locations on different lines.

Source 192.168.001.001:10426 – Destination 10.75.001.001:2598 – username:domainname x987654:int – applicationName Standard Desktop – IE10
Context [email protected] – SessionId: 123456- desktop.biz User x987654: Group ABCDEFG

As you can see, the log file contains the following userID and IP address, but in different formats.

userID = x987654
IP address = 192.168.001.001
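
To make the two layouts concrete, here is a minimal Python sketch that pulls the same values out of both sample events with one hand-written pattern per layout. The pattern names, the field names userID and src_ip, and the patterns themselves are assumptions for illustration, not what Splunk generates. Python's re module is also not identical to the PCRE engine Splunk uses, although the (?P<name>...) named-group syntax is shared.

```python
import re

# The two sample events from the log excerpt above, copied as they appear.
EVENTS = [
    "Source 192.168.001.001:10426 – Destination 10.75.001.001:2598 – "
    "username:domainname x987654:int – applicationName Standard Desktop – IE10",
    "Context [email protected] – SessionId: 123456- desktop.biz "
    "User x987654: Group ABCDEFG",
]

# Hypothetical patterns, one per layout (not the ones the field extractor builds).
USER_AFTER_USERNAME = re.compile(r"username:domainname\s+(?P<userID>\w+):")
USER_AFTER_USER = re.compile(r"\bUser\s+(?P<userID>\w+):")
SOURCE_IP = re.compile(r"Source\s+(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):\d+")


def extract(event):
    """Return whichever fields the hand-written patterns find in one event."""
    fields = {}
    for pattern in (USER_AFTER_USERNAME, USER_AFTER_USER, SOURCE_IP):
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields


for event in EVENTS:
    print(extract(event))
```

Each event needs a different pattern to yield the userID, which is the problem the rest of this walkthrough solves with a single combined extraction.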

The question then arises, how can we combine these into a single field within Splunk?
The answer is: regex.

You could write these regular expressions yourself, but be warned: although Splunk uses the PCRE (Perl Compatible Regular Expressions) flavor of regex, in practice there are subtle differences (such as no formal use of lookahead or lookbehind).

So how can you combine two different regex strings to build a single field in Splunk? The easiest way is to let Splunk build the regex for you in the field extractor.

If you work through the wizard using the Regular Expression option and select the user value in your log file (the value that follows “username:domainname”), you’ll reach the save screen.
(Be sure to use the validate step in the wizard, as this lets you eliminate any false positives and automatically refines your regex so that it works accurately.)
Stop when you get to the save screen.
Don’t click Finish.

Copy the regular expression from this screen and save it in Notepad.
Then use the small back arrow (highlighted in red) to move back through the wizard to the Select Sample screen again.
Now go through the same process, but this time select the userID that follows “User”.
When you get back to the save screen, you’ll notice Splunk has built another regex for this use case.

Take a copy of this regex and put it in Notepad.
Those with an eagle eye will notice Splunk has inadvertently included a trailing space with this capture (underlined above). We’ll get rid of this when we merge these into a single regex using the following logic.
[First regex]|[Second regex](?P<fieldname>Capture regex)

Essentially, all we’ve done is join the two Splunk-created regex strings together using the pipe ‘|’.
Copy the new joined regex and again click the back arrow to return to the first screen.
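
The joining logic can be sketched outside Splunk as follows. This is a minimal Python illustration under stated assumptions: the two prefixes are hypothetical stand-ins for the regexes copied from the wizard, and the field name userID is chosen for the example. The sketch wraps the two alternatives in a non-capturing group so that the single named capture applies to both branches. It captures with \w+, which stops at the first non-word character and so leaves out any trailing space.

```python
import re

# Hypothetical stand-ins for the two regexes copied from the wizard.
FIRST_PREFIX = r"username:domainname\s+"
SECOND_PREFIX = r"\bUser\s+"

# \w+ stops at the first non-word character, so neither the trailing
# space nor the colon ends up inside the captured value.
CAPTURE = r"(?P<userID>\w+)"

# (?:...|...) limits the pipe to the two prefixes; the capture follows both.
COMBINED = re.compile(rf"(?:{FIRST_PREFIX}|{SECOND_PREFIX}){CAPTURE}")

# Fragments of the two sample events shown earlier.
LINE_ONE = "username:domainname x987654:int – applicationName Standard Desktop – IE10"
LINE_TWO = "SessionId: 123456- desktop.biz User x987654: Group ABCDEFG"


def user_id(event):
    """Return the userID from either layout, or None if neither prefix is present."""
    match = COMBINED.search(event)
    return match.group("userID") if match else None


print(user_id(LINE_ONE))
print(user_id(LINE_TWO))
```

Using one capture group after the alternation, rather than one per branch, avoids defining the same group name twice, which Python's re module rejects.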

But this time, click on “I prefer to write the regular expression myself”.

Then paste your regex and be sure to click on the Preview button to confirm the results are what you’re after.
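
If you want a similar sanity check outside the wizard, the sketch below runs a candidate pattern over a handful of sample events and reports what each one yields. It is only an illustration: the preview function, the pattern, and the third sample event (included to show a non-match) are all hypothetical, and Python's re module stands in for Splunk's regex engine.

```python
import re


def preview(pattern, events, field):
    """Return (event, extracted value or None) for each sample event."""
    compiled = re.compile(pattern)
    results = []
    for event in events:
        match = compiled.search(event)
        results.append((event, match.group(field) if match else None))
    return results


# Hypothetical combined pattern and sample events; the last one should not match.
PATTERN = r"(?:username:domainname\s+|\bUser\s+)(?P<userID>\w+)"
SAMPLES = [
    "username:domainname x987654:int",
    "desktop.biz User x987654: Group ABCDEFG",
    "SessionId: 123456- desktop.biz",
]

for event, value in preview(PATTERN, SAMPLES, "userID"):
    status = "no match" if value is None else repr(value)
    print(f"{status:>10}  {event}")
```

Printing the value with repr makes stray whitespace visible, which is the kind of problem the trailing-space capture mentioned earlier would cause.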

And you’re done. Click Save and then Finish, and you can now search on a field that combines multiple regexes, so the field is populated correctly whichever format an event uses.