During data analysis, you will inevitably reach a point where you start generating averages, standard deviations, percentiles and so on in an attempt to summarise your data succinctly. Statistical functions such as avg(), stdev(), median() and percX() can help with this.
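
As a minimal sketch of how these functions are typically used, the search fragment below computes several summary statistics in a single stats call. It assumes the numeric field is called g (the field name used for the example data later in this tip); the output field names and the choice of the 25th and 75th percentiles are illustrative only.

```spl
| stats count as n avg(g) as mean stdev(g) as std_dev median(g) as median perc25(g) as p25 perc75(g) as p75
```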

However, it can sometimes be the case that such statistics on their own won’t give you a true picture of what your data actually represents. For example, suppose we compute some statistics for a set of data, and we get:

From these values we might conclude that our data has many values in the 50-60 range.

We can check whether this is actually the case by using the bin command to plot a histogram of our values, i.e. a plot of the counts of a particular field of our data, grouped into bins.

The bin command will group values of a similar magnitude into the same “bin”. The size of each bin is set with the span option, and the maximum number of bins with the bins option. For example, we can request up to 100 bins with a size of 0.1 as follows (note that ‘g’ is the name of the field for this set of data):

| bin g bins=100 span=0.1 as bin_values

If we then perform a stats count on these new bin values and sort them:

| stats count as bin_count by bin_values
| sort + bin_values

We get the following plot:

The data are clearly distributed around two different mean values. In this case the first curve is centred on a value of ~35 and the second curve is centred on a value of ~76. What is remarkable is that in the 400,000 data items used in this example, none of them have a value that lies between 40 and 69 – a fact not at all obvious from the initial set of statistics that we compiled.
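
One way to confirm a gap like this directly, rather than reading it off the chart, is to count the events that fall inside it. The sketch below assumes the same field g and uses the boundaries of 40 and 69 quoted above; a result of 0 would mean that no events lie in that range.

```spl
| where g >= 40 AND g <= 69
| stats count as values_in_gap
```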

In summary, if you are using statistical commands as part of your analysis, use the bin command to quickly check that your data is actually distributed in the way that you believe it to be.
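
If you would like to try this out without access to a similar data set, the sketch below generates made-up data with makeresults and plots its histogram. It is not the data used in this tip: the event count, the two cluster centres of 35 and 76, the uniform spread of 5 either side of each centre, and the bin size of 1 are all assumptions chosen purely for illustration.

```spl
| makeresults count=10000
| eval centre = if((random() % 2) == 0, 35, 76)
| eval g = centre + (random() % 1000) / 100 - 5
| bin g span=1 as bin_values
| stats count as bin_count by bin_values
| sort + bin_values
```

Replacing the last three lines with a stats call such as avg(g) should give an average of roughly 55, a value that sits in a range where this synthetic data has no events at all.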


Posted by: Andrew MacLeod

Andrew is a certified Splunk Admin and has worked for iDelta for over two years. Previously, he worked as an actuarial analyst in the life and pensions industry - a role that he was in for over 7 years before deciding to embark on a career change into the IT industry. He holds an MPhys degree in theoretical physics from the University of Edinburgh. Outside of work he is a big puzzle fan, with a particular penchant for things cruciverbal and mathematical.