Category: Observability

Integrating Splunk ITSI and Observability Cloud for Unified Insights

The Splunk Observability Cloud suite (O11y) delivers powerful real-time infrastructure and application monitoring capabilities, while Splunk IT Service Intelligence (ITSI) enables holistic and fully customisable service modelling and impact analysis. When these two technologies are integrated, they effortlessly bridge the gap between tracking infrastructure performance and the overall well-being of your business service.

Making Splunk Core Aware of O11y

A fundamental aspect of integrating ITSI and O11y is making observability metrics available to Splunk Core, and in turn, to Splunk ITSI and IT Essentials Work. For this you’ll need…

This is a Splunk built add-on available on Splunkbase: Splunk Infrastructure Monitoring Add-on.
While the name points to the SIM portion of the O11y suite, the Splunk Infrastructure Monitoring Add-on facilitates access to all O11y metrics, including APM, RUM and Synthetic Monitoring metrics.
NOTE: It is only O11y metric data that can be made available to Splunk Core – not the traces and spans from which these metric results and metadata originate.

SIM Add-on Integration Options

The add-on offers two integration options:
1. Enable Splunk Core to Query O11y Metric Stores
The Splunk Infrastructure Monitoring Add-on introduces a new SPL command called “sim” which allows you to specify a SignalFlow program for querying observability metrics in an SPL search. The SignalFlow program will be run on the remote O11y instance, and the returned metrics can then be processed in the remainder of the SPL search. 

2. Ingesting O11y Metrics into Splunk Indexes
The add-on also contains modular inputs which can be used to index O11y metrics in Splunk Core indexes. You are able to configure these modular inputs by specifying a SignalFlow program which will be run periodically to query the desired O11y metric summaries and index the results in Splunk Core.

NOTE: Ensure that the “stash” source type is always used for the data collected by these modular inputs (as in their default state) so that the collected metrics will not count toward Splunk licence charges.

Where to Install the SIM Add-on

Depending on which integration options are required, the add-on will need to be installed in at least one of these Splunk Core nodes:

Search Heads:
Required on any Search Heads where the “sim” command will be used in SPL searches to query O11y metrics.  In particular, this add-on will be required on Splunk ITSI instances utilising the “sim” command in KPI searches.

Indexers:
Required on any Indexer node/cluster where target metric store indexes are created for ingesting O11y metrics via the SIM add-on modular inputs. The add-on creates an index called “sim_metrics“ which should be used as the default target for O11y metrics as it will not count toward Splunk licence charges (and remember to specify “stash” sourcetype in the modular inputs as noted above).

Forwarders:
Required on any Heavy Forwarder node which will be running the SIM add-on modular inputs to query O11y metrics.

Which Integration Option Is Best?

While it is not possible to give a “one size fits all” answer, consider the following:

The “sim” command is lightning-fast
This is because the metric store of O11y is lightning-fast. By design, the O11y platform is capable of storing and retrieving massive volumes of highly granular data in real time. So performance is rarely a consideration when writing SPL searches using the “sim” command.

The Modular Inputs Duplicate Predetermined Metric Summaries
With the modular inputs of the add-on, you are able to decide ahead of time what O11y metric data you’d like to summarise and index in Splunk Core and at what intervals. While this will only be a subset of the original data that is being indexed, it is still duplication which might not be necessary in a given use case. More to the point, searching the summarised data indexed in Splunk Core lacks the flexibility of using “sim” searches to query metrics directly from O11y, which can be changed on the fly without ever needing to update any modular inputs or re-ingest any data.

Querying O11y directly with the “sim” command would often be the more desirable option.  However, in some scenarios it may be necessary to index O11y metrics in Splunk Core, e.g if security policies prevent certain Splunk Core users from getting direct access to O11y.
TIP: Use the O11y plot editor to create and test SignalFlow programs which can then be copied into “sim” commands in Splunk Core searches and ITSI KPIs.

Enriching ITSI with O11y Knowledge

The sky’s the limit when modelling systems in ITSI, and for large or complex service models you’ll want to leverage templates and pre-built components instead of re-inventing the wheel.
Content Packs are the mechanism in ITSI for bundling pre-built components, and for O11y content in particular there is…

The Content Pack bundles a set of valuable ITSI knowledge objects which can be leveraged for managing and visualising O11y data, including:
> Services and KPIs
> Service Templates and KPI Base Searches
> Glass Tables and a Service Analyser
> Entity Types and Entity Import Jobs

As with those of any ITSI content pack, many of the above components may not be directly usable for a given use case. They may instead serve as examples or initial templates to the custom content you will be creating.
At the very least, the below entity import jobs from the content pack are invaluable for effortlessly bringing in all O11y-discovered objects to the ITSI entity database:
> ITSI Import Objects – Get_OS_Hosts
> ITSI Import Objects – Get_RUM_*
> ITSI Import Objects – Get_SIM_AWS_*
> ITSI Import Objects – Get_SIM_Azure_*
> ITSI Import Objects – Get_SIM_GCP_*
> ITSI Import Objects – SSM_get_entities_*
> ITSI Import Objects – Splunk-APM Application Entity Search

Whatever the situation, it is in your best interest to install the Content Pack for Splunk Observability Cloud in ITSI when integrating with the O11y suite.

Installing the O11y Content Pack

The latest O11y Content Pack requires the following two add-ons to be installed in the Splunk Core environment first:
> Splunk Infrastructure Monitoring Add-on – The Splunk-built add-on described earlier in this document
> Splunk Synthetic Monitoring Add-on – A SplunkWorks-built add-on (not formally released by Splunk)

Also, if the Content Pack for Splunk Infrastructure monitoring was previously installed in ITSI, then there are additional migration steps to perform before installing the O11y content pack:
> Migrate from the Content Pack for Splunk Infrastructure Monitoring to the Content Pack for Splunk Observability Cloud topic

After the above items are addressed, the method for installing the Content Pack in ITSI is the same as with any other content pack, i.e. via Configuration > Data Integrations > Content Library.
TIP: When installing the content pack, consider using the option of adding a prefix to the names of imported content such as services, service templates and KPI base searches. That way they can be easily identified as examples which can be copied from. This is not so important for items like the entity import jobs (and you may then need to separate imports for differently named objects).

Unified Alerting with O11y and ITSI

In an environment armed with ITSI, an ideal strategy is to consolidate alert management  with ITSI as the central point for processing alerts originating from any Splunk sources such as O11y, as well as from external systems. ITSI’s advanced analytics can be leveraged to implement intelligent alert logic and the alerts actions can interface to Splunk On-Call for escalation management.

This Content Pack is required in ITSI for integrating O11y and ITSI alerting. It comes with correlation searches and aggregation policies that are utilised in the integration procedure (as noted in the High Level Implementation Plan further below).
Installing this Content Pack requires additional version-dependent actions as well as an update to the “Itsi_kpi_attributes” lookup. Please follow the below installation instructions:
Installing and Configuring the Content Pack for ITSI Monitoring and Alerting

Universal Alerting

Splunk have defined the Universal Alerting Field Normalisation Standard in ITSI for which there are pre-built correlation searches provided in the Monitoring and Alerting Content Pack. Normalising alerts to adhere to this schema ensures that alerts from any source can be processed in a common fashion using the pre-built content.
The schema details many fields, many of which are optional, and the following 4 are mandatory for any alert to comply:
> src: the target of the alert, e.g. host, device, service etc.
> signature: a string which uniquely identifies the type of alert
> vendor_severity: the original vendor-specific severity/health/status string
> severity_id: normalised severity

High Level Implementation Plan

  1. Configure O11y to send alerts to Splunk Enterprise or Cloud Platform:
    This requires creating an alert index in Splunk Core (labelled “Alert Index” in the above diagram), and a HEC endpoint. Then in O11y you can configure a new “Webhook” integration to send alerts to the HEC endpoint.
  2. Normalise O11y alerts to conform to the ITSI Universal Alerting schema
  3. Configure “Universal Correlation Search – o11y” to create notable events:
    This correlation search is shipped with the ITSI Monitoring and Alerting content pack
  4. Configure the “Episodes by Application/SRC o11y” notable event aggregation policy (NEAP):
    Also shipped with the ITSI Monitoring and Alerting content pack
  5. Configure ITSI correlation searches for monitoring aggregated episodes:
    The below 2 searches, also from the content pack:
    “Episode Monitoring – Set Episode to Highest Alarm Severity o11y”
    “Episode Monitoring – Trigger OnCall Incident”
  6. Integrate Splunk On-Call with ITSI:
    This requires installation of the Splunk On-Call (VictorOps) addon in Splunk core, and configuring it with the details of an O11y Splunk On-Call account
  7. Configure action rules in the ITSI NEAP from step 4 for Splunk On-Call Integration
  8. Configure Splunk On-Call with appropriate escalation policies

Full implementation details are documented on the Splunk Lantern site: Managing the lifecycle of an alert from detection to remediation

Next Steps

Now you have the playbook to integrate the Splunk Observability Cloud suite with Splunk ITSI. 
JDS excels in delivering tailored solutions for our customers where we integrate their O11y suite with Splunk ITSI, optimising alert management and reducing Mean Time to Resolution (MTTR).
Reach out if you would like help or advice in improving your observability and troubleshooting efficiency with Splunk Observability Cloud and Splunk ITSI.


Read a recent JDS Customer Success Story here.

Anatomy of a New Relic FSO Dashboard

Full stack observability (FSO) is of paramount importance in today’s digital landscape due to the complexity of applications and infrastructure.  Modern business systems can involve distributed architectures, microservices, and containers, necessitating a holistic view to properly understand how all components interact.  The rise of cloud-native environments and DevOps practices also underscores the need for full stack observability to support resource optimisation, collaboration, and continuous delivery.

The New Relic FSO Dashboard offers a powerful and customisable platform that empowers developers and operations teams with real-time insights and comprehensive data visualisation.

So what does the New Relic FSO dashboard actually look like?

The ‘Business Overview’ dashboard displays important business metrics such as Average Order values By Day, Purchase Volume, Revenue Trend, Sales Funnel, and Abandoned Cart Rate.

In the second tab, the IT Platform Overview dashboard provides detailed information about application and infrastructure performance and status, including availability, daily site visitors, slow pages, most-used features, and mobile device types accessing the platform. This allows for quick identification of issues and prompt corrective actions.

The third tab, Cloud Migration Dashboard, helps track the progress of the migration journey and provides insights into how the cloud environment is being executed. The Percent of On-premises Hosts vs AWS Hosts, Average CPU Usage by AWS vs On-premises Host Location, Total Number of Hosts by Application, and Average Response Time by Host Location are displayed to show the before and after application/host response times, allowing you to determine the extent of improvement in response time after migration. This dashboard enables developers and IT operations teams to monitor the progress of migration in real time.

The User Metrics Dashboard provides an overall view of user behaviour to optimise applications and enhance user experience. It includes metrics such as visitor count, app usage, page visits, average response time, page loading time, and visitor countries/cities.

The Data Analytics Dashboard helps track database performance, identify and resolve performance issues, and the Application Metrics Dashboard provides real-time information on application performance, including response time, error rate, and throughput, allowing for proactive optimisation.

Additionally, there is a dashboard that helps manage AWS costs. Many companies experience difficulties in cost management when migrating to the cloud. In the AWS Budget Overview dashboard, you can examine the total cost divided into actual costs, forecasted costs, and limits. You can also view important metrics such as Pre-production Budget, Total Cost Trends, Budget Trending, and Estimated changes per service. This allows you to track cloud costs in real time and quickly identify areas that require cost optimisation. Furthermore, it provides valuable assistance in planning future budgeting related to the cloud environment.

As a trusted, Gold partner of New Relic, JDS has a team of experts who can not only implement the New Relic FSO dashboard, but also elevate your dashboard experience, take it to the next level, and unleash its full potential.  From designing visually appealing layouts, to fine-tuning performance metrics and alerts, JDS can maximise the value of New Relic’s powerful observability platform. Your applications and users will thank you for it!

The Power of FSO with New Relic

In the sophisticated landscape of today’s IT environments, the journey from telemetry data to business information demonstrates the complexity involved in achieving full-stack observability. Businesses in the throes of digital transformation have found the intricacy of their IT infrastructure compounding. Incorporating modern technologies such as cloud-based systems, microservices, containers, and serverless architectures, along with DevOps and Site Reliability Engineering (SRE) practices, is changing the shape of our technological future.

Observability surpasses traditional monitoring by providing a more comprehensive understanding of ‘why and how‘ an issue occurred. This holistic approach integrates telemetry data—logs, events, metrics, and traces—and presents it on a unified dashboard, unveiling hidden issues and promoting a profound understanding of their root causes. The concept of “observability” offers a departure from traditional IT performance management, providing a comprehensive view of system performance. By combining telemetry data into a unified dashboard, observability allows for an all-inclusive approach to managing cost, resource allocation, and customer experience, addressing the challenges of fragmented data management.

Why is “observability” gaining momentum?

Statistics indicate a significant shift towards multi-cloud usage. This, along with the growing expectation for superior digital experiences, has put enormous strain on IT teams. The challenge lies in delivering innovative features and services at breakneck speeds, while ensuring the seamless operation of existing systems.

JDS sees a lot of over-provisioned cloud-based platforms, as was typical with on-premise configurations. Historically, the main driver for over-provisioning was to cater for spikes, however modern cloud platforms can handle variations in load a lot more effectively. The trick to optimising cloud platforms is to measure and monitor usage. New Relic is a highly efficient tool to help you get the most from your cloud investment.

Conventional IT performance management strategies have hit their peak, instigating the transition towards “observability”. In New Relic’s 2022 Observability Forecast Report, 75% of the 1,614 global respondents confirmed their top-level management’s endorsement of observability technologies. Furthermore, a robust 78% consider observability to be crucial in realising significant business objectives.

One major factor driving the observability market is the availability of new standards and technologies like Open Telemetry (Otel) and extended Berkeley Packet Filter (eBPF). Even with the increase in funding and new technology, data growth is still the main driver behind the explosion of the observability market in recent years. Data is expected to grow at a 35% CAGR through 2025​​.

The New Relic FSO platform offers observability solutions that harness the power of cloud and AI technologies, providing comprehensive functionalities such as telemetry data collection, analysis, visualisation, anomaly detection, and alerts. For those inclined towards open-source observability tools, options such as Prometheus, Grafana, Zabbix, and Jaeger are available.

Why is full-stack observability so important?

Observability isn’t just pivotal for developers, engineers, and IT teams; it’s also crucial for improving customer experience. Real-time monitoring and issue detection are vital for businesses to prevent downtime and ensure optimal performance, avoiding negative impacts on revenue and customer sentiment.

Data-driven decision-making becomes feasible through the vast amount of insight delivered from full stack observability. Organisations can analyse performance metrics, error rates and user behaviour patterns to make informed decisions that improve their products, services, and infrastructure. Additionally, full stack observability can play a crucial role in security monitoring and compliance adherence, with the ability to detect threats and vulnerabilities while meeting regulatory requirements.  

In conclusion, as the digital panorama continues to evolve, the concept of observability has emerged as a key tool in the arsenal of modern businesses. By bridging the technological chasm, it enables a unified, bird’s-eye view of telemetry and business data across disparate teams, thereby amplifying operational efficiency and enriching the digital user experience. 
As we tread deeper into the realm of artificial intelligence, the significance of AI-powered operations (AIOps) is becoming increasingly apparent. This surge in importance is exemplified by the proliferation of observability services now facilitating AIOps environments. Overall, full stack observability empowers businesses to deliver reliable services, remain agile, and drive competitiveness in the dynamic digital realm.

Our favourite announcements from Splunk .conf23

Following an incredible week in Vegas for Splunk .conf23, the JDS team is excited to see all the new and upcoming features for the Splunk platform including AI, Observability, Security and IoT.

Here is a recap of some of our favourite announcements from Splunk .conf23:

Splunk Enterprise 9.1

A new Splunk version was released a week prior to Splunk .conf23, which included some welcome features across the board, the main ones being:

  • Improved ingest action to AWS S3
  • New Federated Search modes
  • New features for Dashboard Studio

Searching logs directly in S3 – without having to ingest them into Splunk, is a widely anticipated feature that according to Splunk Docs, should be generally available very soon. With customers often struggling to balance their licensing for ingestion and retention, this feature will allow customers to keep low-value or old data in S3 while still being able to search it.

Splunk AI Assistant

The newly announced AI Assistant will not only help users find data within the Splunk platform, but will also generate SPL to search and report on it. The AI Assistant app is currently in preview but customers can sign-up to download the app at https://pre-release.splunk.com/preview/aiassist

Splunk Cloud

Splunk and Microsoft have formed a strategic partnership to bring Splunk Cloud to customers that are leveraging Azure as their cloud platform of choice, supplementing Splunk’s existing offerings with AWS and GCP.    

As a result of this partnership, Splunk and Microsoft have committed to developing more “out-of-the-box” integration capabilities. In addition, customers will now be able to spend Azure credits to buy Splunk Core, Enterprise Security and ITSI in their customer-managed environments. This is expected to be rolled out globally over the next year.

Splunk AIOps

Splunk announced the release of the Splunk App for Anomaly Detection. Anomaly Detection is already included in the existing Machine Learning Toolkit (MLTK) app but this new app has a guided wizard which will make setting up Anomaly Detection easier for users that don’t have a background in Machine Learning (ML).

The Deep Learning Toolkit has also received an update (5.1) and a rename to the “Splunk App for Data Science and Deep Learning”. It now includes a “Neural Network Designer Assistant” once again improving the accessibility of ML to those without a ML background.

One other small ML improvement is in ITSI’s Adaptive Threshold feature. Adaptive Thresholds, which dynamically creates thresholds based on historical data, can now be configured to ignore anomalies. For example, a recent P1 incident that resulted in a spike of a KPI will be excluded from threshold calculation, resulting in more accurate thresholds.

Security

TwinWave, which Splunk bought in Nov 2022, has been integrated into the Splunk portfolio and renamed Splunk Attack Analyzer. It boasts a tight integration with Splunk SOAR so that customers can automate the detonation of suspicious URLs and files in unattributable environments and subsequently feed the results back into the SOAR platform.

Enterprise Security Content Update (ESCU) 4.6 has also been released, including 6 new ML detections written by the Splunk Threat Research Team to protect against the latest threats that are being observed in the wild.

Observability 

ITSI 4.17.0 was released at the beginning of June, focusing more on improving the platform than adding new features. A couple of these improvements are:

  • Saved Searches within content packs are disabled by default.
  • A new entity clean-up command which removes searches that are no longer creating or updating entities. 
  • New dashboards to troubleshoot entity discovery issues.
  • KPI sparklines have been updated so they no longer have the “spiky” up & down visual on small time ranges – This was a common complaint from all ITSI customers.
  • Custom dashboards for viewing episodes – Each episode can now show a custom SimpleXML or Dashboard Studio dashboard so customers can customise what is shown inside of the Episode Review page. https://docs.splunk.com/Documentation/ITSI/latest/EA/EpisodeInfo#Add_an_episode_dashboard

Another welcome announcement was the introduction of Unified Identity, which enables users to log into Splunk Observability Cloud with SSO using their Splunk Cloud Platform credentials.

Splunk Edge Hub

Splunk formally announced Edge Hub at .conf, though we’ve already heard of a few organisations trying them out. It’s purpose is to combat the “data deluge” by filtering & aggregating data before it leaves the local network via Internet or internal WAN, but It’s also capable of collecting various environmental sensors (temperature, noise levels, etc) out-of-the-box. Better yet, you can see these stats directly from the built-in screen. We look forward to seeing how customers use these devices in their environments.

Splunk Edge Processor

Splunk has also added some important features to the Edge Processor product. Customers can now export their data to Splunk using Splunk HEC (HTTP Event Collector), which is easier for customers to manage. In addition, the long-awaited SPL2 has also been added to Edge Processor which is interesting because it’s yet to reach many other products (ie Splunk Core). SPL2 extends SPL with many more commands that will make it easier for customers to parse and manipulate their data in Edge Processor before it gets sent into Splunk.

It’s an exciting time for Splunk users, and JDS is pumped to be at the forefront of these latest advancements. 

One Platform. Full Stack. In Context

JDS has a proud history of working with industry-leading tools and ensuring they provide value for your business. We are excited to share that one of our major partners, Cisco, has announced their much-anticipated Full-Stack Observability (FSO) Platform at CiscoLive Las Vegas this month. We have been looking forward to the launch of the FSO platform which will help us unlock much greater value in Observability data. This will benefit our clients by allowing them to bring in a wider variety of data across the app and infrastructure stack, enriched with business context and activity data so you can ensure your tech is optimised for maximum business performance.
https://www.cisco.com/c/en_ca/solutions/full-stack-observability.html?socialshare=lightbox-fso-video

Most of our clients are involved in some level of digital transformation – be it moving to cloud-native or SaaS stacks, simplifying customer experiences with digital apps, or streamlining business processes with smart tech. This has typically meant a lot more moving parts and every time something isn’t right, a new needle-in-the-haystack challenge is presented. Being able to observe a customer’s journey and experience, including all of the technical and business elements involved, pinpoint problems or identify high-value optimisations, is critical for operational success. 

Businesses need the ability to get fast answers to questions like “where is slowness occurring”, “how can we optimise resource usage” or “where can we improve conversion.” Cisco FSO Platform has a ubiquitous and context-rich data platform, with flexible query tools and packaged solutions, to ensure IT is working at its best.

An Overview of the Cisco FSO Platform

The Cisco FSO Platform was designed from the ground up to provide end-to-end visibility across complex, hybrid and multi-cloud environments. It delivers an extensible, entity-based data model that provides the flexibility to ingest any observability data with business context. By leveraging OpenTelemetry and harnessing the power of Metrics, Events, Logs, and Traces (MELT) to seamlessly collect and analyse data generated by any source, the FSO Platform is a versatile and comprehensive solution to capture observability data across an enterprise.

Right out of the gate there are features for application visibility, security insights, resource and cost optimisation, plus partner-led tools for financial visibility and capacity planning. 


Cloud Native Application Observability

One of the standout features of the Cisco FSO Platform is its Cloud Native Application Observability capability. This feature provides deep visibility into cloud-native environments, allowing organisations to monitor and troubleshoot their applications with ease. By providing insights into digital experiences, ensuring performance alignment with end-user expectations, prioritising actions, and reducing risks, businesses can gain valuable insights into the performance and behaviour of their applications. This allows customers the ability to identify and resolve issues before they impact users.

The Verdict

The Cisco FSO Platform is an innovative solution that offers an impressive suite of features that enable businesses to enhance digital experiences, mitigate risks, and drive operational efficiency.  

The Platform represents a significant milestone in Cisco’s FSO strategy, and shows their commitment to providing a comprehensive observability solution for clients. While other observability platforms can ingest data at scale, they face challenges in understanding and building a view of services.  Cisco’s approach was to build a solution that utilises an entity model at its core, which can be tailored to overcome these limitations. This is crucial with the complexity of modern applications spanning cloud, on-premise, microservices, SaaS, and serverless technologies, yet still needing to understand your customers’ digital journey and experience as they interact with your business. 

We will be keeping a keen eye on developments and look forward to sharing our experiences as we work with our customers to rationalise their observability strategies, harnessing the unique capabilities of the FSO Platform.