AWS CloudWatch metrics provide a useful foundation for building a monitoring solution across your AWS cloud resources. For years now, the Splunk Add-on for Amazon Web Services has provided the ability to ingest these CloudWatch metrics by polling the AWS API. In this article we will look at a new way of ingesting AWS CloudWatch metrics: CloudWatch Metric Streams.

API polling of AWS CloudWatch metrics has always had some issues, namely:

  • Ingestion delay: CloudWatch metrics are aggregated over time, and the API call to retrieve a given time block can happen up to 5 minutes after that. If you are using ITSI to build KPIs out of those metrics, the KPI search could run up to another 5 minutes later. Chained together, these delays mean a significant lag before you are notified of an issue.
  • Scalability: Polling the AWS API does not scale well and can introduce additional ingestion delays when collecting data from a medium to large cloud environment.
  • API limits: AWS imposes various API rate limits that can result in throttling of the API calls, leading to further delays.

Earlier this year AWS launched CloudWatch Metric Streams, enabling delivery of CloudWatch metrics to a Kinesis Data Firehose. This results in faster delivery of the metric data and is a more scalable solution. The feature was designed with delivery to partner systems in mind: alongside Splunk, delivery to New Relic, Datadog, Dynatrace and Sumo Logic is also supported. In the case of Splunk, this new metric data ingestion method is supported on Splunk Enterprise as well as Splunk Infrastructure Monitoring (previously SignalFx). This AWS Blog Post from March 2021 provides an overview.

Stream CloudWatch Metrics To Splunk

The setup is quite straightforward; however, there are some key points to note.

Pre-requisites

You’ll need to have the following in place before setting up CloudWatch metric streams:

  • Splunk Enterprise server with HTTP Event Collector (HEC) enabled
  • The HEC port must be accessible from the Internet
  • The HEC port must be secured (HTTPS) with a certificate issued by a trusted Certificate Authority, matching the hostname that the Kinesis Data Firehose will send data to (a quick way to verify this is shown below)
  • An HTTP Event Collector token that will be used by Firehose
  • An AWS account with some running resources (e.g. an EC2 instance) that can emit CloudWatch metrics
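Before moving on, it can be worth sanity-checking that the HEC endpoint is reachable over HTTPS with a valid certificate. Here is a minimal Python sketch; the hostname and token are placeholders for your own values, and it uses the standard HEC health-check endpoint:

import json
import urllib.request

# Placeholders - substitute your own HEC details
HEC_URL = "https://splunk.example.com:8088"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

# The HEC health endpoint returns HTTP 200 when the collector is up.
# urllib verifies TLS certificates by default, so a successful call also
# confirms the certificate is CA-signed and matches the hostname, which
# is exactly what Kinesis Data Firehose requires.
req = urllib.request.Request(
    HEC_URL + "/services/collector/health",
    headers={"Authorization": "Splunk " + HEC_TOKEN},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, json.loads(resp.read()))

If this fails with a certificate error, Firehose will fail in the same way, so it is worth fixing before continuing.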

Steps to implement

There are essentially two steps to complete on the AWS side:

  1. Set up a Kinesis Data Firehose to send data to Splunk
  2. Publish your CloudWatch metrics to the Kinesis Data Firehose

Setup Kinesis Data Firehose

From the AWS Console, search for “Kinesis Data Firehose”. Move your cursor over “Kinesis” and the “Kinesis Data Firehose” option will appear; click on it.

[Screenshot: Accessing the Kinesis Data Firehose section of the AWS Console]

On the “Delivery Streams” page click on “Create Delivery Stream”:

[Screenshot: Create a Kinesis Data Firehose by clicking on Create Delivery Stream]

Choose Source and Destination

Now we start to configure the Kinesis Data Firehose. The first settings boil down to three simple questions:

  • Where is the data coming from?
  • Where is the data going to?
  • What should the Kinesis Data Firehose be called?

For source we choose “Direct PUT”, i.e. something will put data directly onto the delivery stream. For destination we choose Splunk:

[Screenshot: Source and Destination options for the Kinesis Data Firehose]

Transform Records

This optional setting allows a Lambda function to process the data prior to sending it to the destination. For now we will skip this section, but in part two of this article we will show how it can be used.

Destination Settings

The following settings need to be completed:

  • Splunk cluster endpoint: this is your HEC endpoint
    • it must be accessible from the Internet
    • it must be HTTPS
    • it must use a CA-signed certificate
  • Splunk endpoint type: raw or event
    • for now use raw, as the CloudWatch metric stream data is not in the HEC event format
    • in part two we will switch to event, as the Lambda function will correctly format the payload for the HEC event endpoint
  • Authentication token: create a Splunk HEC token and paste it in here
[Screenshot: Kinesis Data Firehose Splunk destination settings]

Backup Settings

Data that fails to transmit will be saved in an S3 bucket. You can choose an existing bucket (Browse) or create a new one (Create).

Advanced Settings

  • CloudWatch error logging – this will help you debug any issues
  • Permissions – the wizard will set up an IAM role for you, granting the required permissions
  • Tags – add tags as required

Finally…

Click “Create delivery stream”.
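If you prefer to script this rather than click through the console, the same delivery stream can be created with the AWS SDK. The sketch below uses Python and boto3; the stream name, endpoint, token and ARNs are placeholders, and it assumes an IAM role and S3 bucket for failed-event backup already exist (the console wizard creates these for you):

import boto3

firehose = boto3.client("firehose")

# Placeholders - substitute your own values
HEC_URL = "https://splunk.example.com:8088"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"
BACKUP_ROLE_ARN = "arn:aws:iam::123456789012:role/firehose-backup-role"
BACKUP_BUCKET_ARN = "arn:aws:s3:::my-firehose-backup-bucket"

firehose.create_delivery_stream(
    DeliveryStreamName="cloudwatch-metrics-to-splunk",
    DeliveryStreamType="DirectPut",  # CloudWatch Metric Streams will PUT records directly
    SplunkDestinationConfiguration={
        "HECEndpoint": HEC_URL,
        "HECEndpointType": "Raw",  # switch to "Event" once a transform Lambda is in place
        "HECToken": HEC_TOKEN,
        # Only records that fail delivery to Splunk are written to S3
        "S3BackupMode": "FailedEventsOnly",
        "S3Configuration": {
            "RoleARN": BACKUP_ROLE_ARN,
            "BucketARN": BACKUP_BUCKET_ARN,
        },
    },
)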

Publish CloudWatch Metric Streams to Firehose

From “CloudWatch Metrics” in the AWS Console, choose the “Streams” option from the left-hand menu, then click “Create metric stream”. There are a small number of settings that need to be completed:

  • Namespaces you wish to stream: choose either all metrics or a selected subset of the available AWS CloudWatch metric namespaces (e.g. EC2, S3, Lambda, etc.)
    • alternatively you can select all and then choose to exclude specific namespaces
  • Configuration: choose the “existing firehose” option
    • select the Kinesis Data Firehose stream you set up in the previous section
  • Service role – for the purposes of this POC we chose “Create and use a new service role”
  • Change output format – choose JSON (text format) here instead of OpenTelemetry (binary format)
  • Finally, choose a name for the stream and then click “Create metric stream” (the API equivalent is sketched below)
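For reference, here is a minimal boto3 sketch of the same step. The stream name, role ARN and Firehose ARN are placeholders; the role must allow CloudWatch to write to the Firehose (firehose:PutRecord and firehose:PutRecordBatch):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholders - substitute your own values
FIREHOSE_ARN = "arn:aws:firehose:eu-west-1:123456789012:deliverystream/cloudwatch-metrics-to-splunk"
ROLE_ARN = "arn:aws:iam::123456789012:role/metric-stream-to-firehose-role"

cloudwatch.put_metric_stream(
    Name="cloudwatch-metrics-to-splunk",
    FirehoseArn=FIREHOSE_ARN,
    RoleArn=ROLE_ARN,
    OutputFormat="json",  # JSON text format rather than OpenTelemetry binary
    # Stream only selected namespaces; omit IncludeFilters to stream everything
    IncludeFilters=[
        {"Namespace": "AWS/EC2"},
        {"Namespace": "AWS/Lambda"},
    ],
)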

Checking Output

Search your index in Splunk and you should now have events coming through. To parse the incoming data, the btool command output below shows the props settings that were added:

/opt/splunk/bin/splunk cmd btool props list aws:firehose:json --debug

/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf [aws:firehose:json]
/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf LINE_BREAKER = ([\n\r]+){"metric_stream_name"
/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf MAX_TIMESTAMP_LOOKAHEAD = 10
/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf SHOULD_LINEMERGE = false
/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf TIME_FORMAT = %s
/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf TIME_PREFIX = "timestamp":
/opt/splunk/etc/apps/cloudwatch_firehose/default/props.conf TRUNCATE = 200000

This results in events that are well formed JSON and have the timestamp correctly extracted:

[Screenshot: CloudWatch CPU metric delivered via CloudWatch Metric Streams and Firehose]
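For reference, each record in the stream is a JSON object along these lines (an abridged example based on the format AWS documents for metric streams; the values are illustrative):

{"metric_stream_name":"cloudwatch-metrics-to-splunk","account_id":"123456789012","region":"eu-west-1","namespace":"AWS/EC2","metric_name":"CPUUtilization","dimensions":{"InstanceId":"i-0123456789abcdef0"},"timestamp":1623841200000,"value":{"max":1.35,"min":1.35,"sum":1.35,"count":1.0},"unit":"Percent"}

Note that the timestamp is in epoch milliseconds, which is presumably why MAX_TIMESTAMP_LOOKAHEAD is set to 10 in the props above: with TIME_FORMAT = %s, only the first ten digits (the seconds) are read.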

This is perfectly usable but leaves us with a couple of potential problems:

  1. The format of the events and the metadata differs from that produced by the Add-on for Amazon Web Services, so it won’t work with the Splunk App for AWS or any existing searches, dashboards or ITSI KPIs you have defined
  2. The data is delivered as events, not Splunk metrics

In part two we will look at addressing this through use of a Lambda transformation function, plugged into the Kinesis Data Firehose.

Troubleshooting

You can use the CloudWatch Logs output from the Kinesis Data Firehose to investigate any issues sending the data to Splunk.

  • From AWS CloudWatch in the AWS Console, choose the Logs > Logs Insights option on the left-hand side
  • Select the Kinesis Data Firehose log group from the log group drop-down
  • Click Run Query
  • Inspect the log messages to view any errors being generated (this can also be scripted, as shown below)
[Screenshot: Using CloudWatch Logs Insights to check for Firehose delivery errors]
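The same check can be run programmatically with boto3. In this sketch the log group name is a placeholder; Firehose error logs typically live under /aws/kinesisfirehose/<stream-name>:

import time
import boto3

logs = boto3.client("logs")

# Placeholder - the Firehose error log group for your delivery stream
LOG_GROUP = "/aws/kinesisfirehose/cloudwatch-metrics-to-splunk"

# Look for delivery errors over the last hour
query = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | sort @timestamp desc | limit 20",
)

# Logs Insights queries are asynchronous - poll until complete
results = logs.get_query_results(queryId=query["queryId"])
while results["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query["queryId"])

for row in results["results"]:
    print({f["field"]: f["value"] for f in row})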

Need some additional resources to help deliver your AWS monitoring on Splunk? Click here to get in touch.


For 2021 we’ve committed to posting a new Splunk tip every week!

If you want to keep up to date on tips like the one above then sign up below:

Subscribe to our newsletter to receive regular updates from iDelta, including news and updates, information on upcoming events, and Splunk tips and tricks from our team of experts. You can also find us on Twitter and LinkedIn.

Posted by: Stuart Robertson

Stuart Robertson is the Consulting Director at iDelta. He is one of the initial founders of iDelta and has worked there since its formation in 2001. Stuart holds various certifications in Core Splunk and ITSI. Stuart also holds a BSc (Hons) in Computing Science from the University of Glasgow.
