Sumo Logic

About AWS Observability

Learn about the features, benefits, and resources created by the AWS Observability solution.

What is observability?

Observability is the degree to which the internal states of a system can be determined from its external outputs.

For our purposes, observability is the ability to observe an application from the outside and understand what is happening inside the application and its services. Observability helps ensure that the application is running reliably: the system is available (up and running), performant, and secure.

Why observability?

Modern applications are increasingly complex, as they leverage distributed technologies, cloud infrastructure, and container and orchestration tools. In addition, the connections between microservices, orchestrators, and underlying cloud resources are also growing in complexity. This complexity makes unforeseen events (the "unknown unknowns" of risk) more prevalent, and they come with mysterious behaviors and failure modes. This can cause major issues in your overall incident remediation workflow, which can be broken down into three steps.

  • Monitor critical indicators of reliability such as errors or latency. Sometimes these unknown-unknown errors don't directly impact the metrics that you are tracking, which makes the issues more difficult to monitor.
  • Diagnose or isolate services or resources that might be the immediate cause of reliability issues. These unknown unknowns could impact systems in obscure ways. For example, the culprit service’s metrics might look alright, but a downstream service that consumes this service might have abnormal metrics, which could lead an SRE down the wrong path. There is no tribal knowledge that can help guide the SRE in the right direction.
  • Troubleshoot and uncover root cause(s) to guide recovery and ensure on-going application reliability. As in the case of the diagnosis step, the unknown unknowns might make it difficult to find the root cause.

Monitoring, diagnosing, and troubleshooting such issues is harder because there are no existing runbooks that can help resolve issues quickly. This problem is compounded by the fact that modern applications also emit astonishing amounts of machine data across the stack.

All this complexity, along with data deluge and unknown behaviors, can make it impossible to recover systems quickly if you don’t have a way to make sense of all of the information. This is why organizations need an Observability Solution.

Availability 

The feature is available in the following account plans.

Account Type | Account Level
Cloud Flex | Trial, Professional, and Enterprise
Credits | Trial, Essentials, Enterprise Operations, Enterprise Security, and Enterprise Suite

The AWS Observability Solution provides an established framework to simplify the monitoring and troubleshooting of your AWS cloud infrastructure. The AWS Observability Solution can be deployed across multiple accounts and AWS Regions to:

  • Minimize the time it takes to get operational insights across your AWS infrastructure.
  • Identify elements that are subject to specific operational issues across your AWS infrastructure.
  • Minimize the time it takes to assign operational issues to the correct business units and functional teams in your AWS infrastructure.
  • Expedite troubleshooting and root cause isolation for incidents in your apps and microservices running on AWS infrastructure via Root Cause Explorer.

Effectively monitor your AWS infrastructure

There are approximately 150 discrete services available in AWS, including compute, network, storage, database, tooling, management, security, developer tools, and analytics to name a few.

All of these services are part of AWS. However, troubleshooting across different services can be problematic when you use separate AWS accounts to manage costs and give teams independent administrative control. Different AWS accounts often have different settings across Availability Zones or Regions that make it difficult to get a clear picture of overall application health.

Likewise, when troubleshooting an operational issue you might get a notification of an issue with a specific instance of a service without being able to determine:

  • What other parts of the application or environment are affected?

  • What other Regions are affected?

  • What is the hierarchy of elements related to the AWS service?

  • What logs are associated with the alerting metric and vice versa?

In short, you are unable to quickly get high-level insights for applications that span multiple AWS services. When you are alerted, it is difficult to trace the alert to the underlying root cause. The Sumo Logic AWS Observability Solution addresses these challenges.

System architecture

Sumo Logic provides an AWS CloudFormation template that automates the setup and installation of the AWS Observability Solution for a given account and region. This allows you to configure the monitoring of your AWS infrastructure for optimum results.

[Figure: AWS Observability Solution architecture]

Chart a map of your AWS infrastructure

The Sumo Logic AWS solution pulls in data from all AWS services and accounts in your cloud infrastructure to provide a unified view of your environment. You can navigate from overview dashboards of the infrastructure and drill down into account, AWS Region, service, or entity views. The intuitive navigation enables you to quickly resolve issues, minimize downtime, and improve system availability.

[Figure: Expand Namespace]

Working with AWS account hierarchies

AWS services are available to you through your AWS accounts, which are also used for billing and for managing various aspects of your cloud infrastructure. AWS recommends that you use multiple AWS accounts to manage costs across business units and functional teams. In this way, you can provide different levels of administrative control over various AWS resources.

In an AWS account, you can choose resources hosted in multiple locations worldwide. These locations are composed of AWS Regions and Availability Zones. Each AWS Region is a separate geographic area (for example, us-west-2 is in Oregon, USA) and has multiple, isolated locations known as Availability Zones. You can provision specified resources (such as databases and load balancers) across multiple Availability Zones, to ensure high availability and failover support. 

Root Cause Explorer

Root Cause Explorer is an add-on to the AWS Observability Solution and relies on AWS CloudWatch metrics to enable on-call staff, DevOps, and infrastructure engineers to expedite troubleshooting and root cause isolation for incidents in their apps and microservices running on AWS infrastructure. 

Root Cause Explorer helps you correlate unusual spikes in AWS CloudWatch metrics, referred to as Events of Interest, using context associated with the incident, including timeline, AWS account, AWS Region, AWS namespaces, resource identifiers, AWS tags, metric type, metric name, and more. For example, if an organization’s microservice in AWS us-west-2 is experiencing unusual user response times, the on-call user can use Root Cause Explorer to correlate Events of Interest on over 600 AWS CloudWatch metrics across 9 AWS service namespaces (such as AWS EC2 and Amazon RDS) to isolate the probable cause to a specific set of AWS EC2 instances that serve the given microservice in AWS us-west-2 and may be overloaded.
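
As a rough illustration of what correlating by context means, the sketch below filters a hand-made list of Events of Interest by region and time window, then counts them per namespace and entity. It is not Sumo Logic's implementation or API: the `EventOfInterest` record, the `correlate` helper, the 30-minute window, and all sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EventOfInterest:
    # Hypothetical record; fields mirror some of the context dimensions named above.
    account: str
    region: str
    namespace: str
    entity: str
    metric: str
    start: datetime


def correlate(events, region, incident_time, window=timedelta(minutes=30)):
    """Count events per (namespace, entity) in one region near the incident time."""
    counts = Counter()
    for event in events:
        in_region = event.region == region
        in_window = abs(event.start - incident_time) <= window
        if in_region and in_window:
            counts[(event.namespace, event.entity)] += 1
    # Most frequently affected entities first.
    return counts.most_common()


incident = datetime(2024, 1, 1, 12, 0)  # made-up incident time
events = [
    EventOfInterest("prod", "us-west-2", "aws/ec2", "i-0abc", "CPUUtilization", incident - timedelta(minutes=5)),
    EventOfInterest("prod", "us-west-2", "aws/ec2", "i-0abc", "NetworkIn", incident - timedelta(minutes=3)),
    EventOfInterest("prod", "us-west-2", "aws/rds", "db-1", "ReadLatency", incident - timedelta(minutes=10)),
    EventOfInterest("prod", "us-east-1", "aws/ec2", "i-0def", "CPUUtilization", incident - timedelta(minutes=2)),
    EventOfInterest("prod", "us-west-2", "aws/ec2", "i-0abc", "CPUUtilization", incident - timedelta(hours=3)),
]
ranked = correlate(events, "us-west-2", incident)
for (namespace, entity), count in ranked:
    print(namespace, entity, count)
```

In this toy data set, the EC2 instance with two nearby events ranks first, which is the kind of narrowing the example in the text describes.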

Sumo Logic Apps for AWS Observability

The AWS Observability Solution provides an intuitive dashboard framework that mirrors industry-standard AWS hierarchies using a suite of Sumo Logic apps that provide insights into AWS operational and security services across an entire AWS infrastructure. The following Sumo Logic AWS Observability Solution apps provide the ability to quickly isolate and solve problems with their specialized pre-configured dashboards.

  • AWS Observability API Gateway: The Amazon API Gateway service allows you to create RESTful APIs and WebSocket APIs for real-time two-way communication applications in containerized and serverless environments, as well as web applications.
    Sumo Logic's AWS Observability API Gateway dashboards provide insights into Amazon API Gateway tasks while accepting and processing concurrent API calls throughout your infrastructure, including traffic management, CORS support, authorization and access control, throttling, monitoring, and API version management.
  • AWS Observability Application Load Balancer: The AWS Application Load Balancer service functions at the application layer. It receives requests, evaluates the listener rules in priority order to determine which rule to apply, and then selects a target from the target group.
    Sumo Logic's AWS Elastic Load Balancing ULM app is a unified logs and metrics (ULM) app that gives you visibility into the health of your AWS Application Load Balancer and target groups. The app's preconfigured dashboards provide insights into latency, request and host status, threat intel, and HTTP backend codes by Availability Zone and target group.
  • AWS Observability EC2 Metrics: The Amazon Elastic Compute Cloud (Amazon EC2) service provides secure, resizable compute capacity in the cloud, giving you complete control of your computing resources.
    Sumo Logic's AWS EC2 Metrics ULM app collects local host metrics and displays them using predefined search queries and dashboards. App dashboards provide a visual analysis of local host metrics for CPU, disk, memory, network, and TCP.
  • AWS Observability Lambda: The AWS Lambda service allows you to run code without the burden of provisioning or managing servers.
    Sumo Logic's AWS Observability Lambda is a unified logs and metrics app for monitoring operation and performance trends in the AWS Lambda functions in your account. The app uses AWS Lambda logs and metrics from CloudWatch, as well as CloudTrail AWS Lambda Data Events. Preconfigured dashboards provide insights into executions, such as memory and duration usage by function versions or aliases, as well as performance metrics such as errors, throttles, invocations, and concurrent executions.
  • AWS Observability DynamoDB: The Amazon DynamoDB service is a fast and flexible NoSQL database service that provides consistent, single-digit millisecond latency at any scale.
    Sumo Logic's AWS Observability DynamoDB is a unified logs and metrics app that provides operational insights into your Amazon DynamoDB solution. The app provides preconfigured dashboards that allow you to monitor key metrics, and to view throttle events, errors, and latency, so you can plan the capacity of your Amazon DynamoDB.
  • AWS Observability RDS Metrics: The Amazon Relational Database Service (RDS) allows you to easily set up, operate, and scale a relational database in your cloud infrastructure.
    Sumo Logic's AWS Observability RDS Metrics provides visibility into your Amazon Relational Database Service (RDS) metrics collected from the CloudWatch metrics. Preconfigured dashboards allow you to monitor your Amazon RDS system's overview, CPU, memory, storage, network transmit and receive throughput, read and write operations, database connection count, disk queue depth, and more.
  • Amazon ECS Dashboards: The Amazon Elastic Container Service is a scalable, container management service that is used to manage containers in a cluster. With dashboards for Amazon ECS, you can monitor capacity and resource utilization of ECS components as well as quickly identify changes made to your clusters to help with troubleshooting. 
  • Amazon ElastiCache: Amazon ElastiCache allows you to set up, run, and scale popular open-source compatible in-memory data stores in the cloud.
    The Amazon ElastiCache dashboards provide visibility into key event and performance analytics that enable proactive diagnosis and response to system and environment issues. Use the preconfigured dashboards for at-a-glance analysis of event status trends, locations, successes and failures, as well as system health and performance metrics. The dashboards also have additional performance insights for Redis clusters.
  • AWS Network Load Balancer: The AWS Network Load Balancer service distributes OSI Layer 4 (transport layer) traffic (e.g., TCP, UDP, TLS) and can handle over a million requests per second.
    The AWS Network Load Balancer dashboards provide insights that help you ensure that your network load balancers are operating as expected and backend hosts are healthy, and help you quickly identify errors.
  • Global Intelligence for AWS CloudTrail DevOps: The Global Intelligence for AWS CloudTrail DevOps app helps you accelerate root cause analysis for incidents by providing error rate and configuration insights benchmarked from Sumo Logic’s AWS customers for nine AWS services: EC2, Lambda, Auto Scaling, S3, ELB, RDS, DynamoDB, ElastiCache, and Redshift. Benchmark dashboards are integrated with the AWS Observability Solution at the account and region level.

Data Collection for the AWS Observability Solution

Sumo Logic collects logs, metrics, and events including AWS EC2 Host Metrics, CloudWatch logs and metrics, and CloudTrail logs. The collected data streams are enriched with the following metadata:

  • Account. This is an alias for your AWS account—for example, production, development, or stage—that you supply when you install the solution.
  • Namespace. This is the name of the AWS service and is automatically added by either the Host Metrics Source or the AWS Metadata (Tag) Source installed by the template, for example, aws/apigateway, aws/applicationelb, aws/dynamodb, aws/lambda, aws/rds, and so on.
  • Region. This is the AWS region, for example, us-east-1, us-west-2, and so on.
  • Entity. This represents either the AWS resource name or ID, depending on the AWS service being monitored.

This new metadata can also be used in ad-hoc logs and metrics searches.
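
To make the four metadata fields concrete, the sketch below tags sample records with them and then filters on two of the fields, much as an ad-hoc search would. It is illustrative only: the `enrich` and `matches` helpers, the record shape, and the sample values are assumptions, not part of any Sumo Logic collector or query API.

```python
def enrich(record, account, region, namespace, entity):
    """Return a copy of the record with the four metadata fields attached."""
    enriched = dict(record)
    enriched.update(
        account=account,      # alias supplied at install time, e.g. "production"
        region=region,        # e.g. "us-east-1"
        namespace=namespace,  # e.g. "aws/lambda"
        entity=entity,        # resource name or ID, depending on the service
    )
    return enriched


def matches(record, **filters):
    """True if every given metadata field equals the requested value."""
    return all(record.get(key) == value for key, value in filters.items())


# Hypothetical sample records.
records = [
    enrich({"message": "Task timed out"}, "production", "us-east-1", "aws/lambda", "checkout-fn"),
    enrich({"message": "Throttled"}, "development", "us-west-2", "aws/dynamodb", "orders"),
]
prod_lambda = [r for r in records if matches(r, account="production", namespace="aws/lambda")]
print(prod_lambda)
```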

Understanding the Observability Solution

The Observability Solution offers a unified platform for logs, metrics, traces, and metadata at the following layers:

  • Application
  • Microservices 
  • Cloud
  • Orchestrator
  • Container

The solution understands how the different datasets and services are connected, and stitches those relationships into an entity workflow that makes it more intuitive for users to get a holistic view of their service. The workflow also enables easier and faster monitoring, diagnosing, and troubleshooting.

The solution also offers features and capabilities that support each step of the troubleshooting process.

  • Monitor your systems effectively with new and improved alerting and dashboarding capabilities. The Observability Solution includes rich pre-built content that you can leverage to quickly start monitoring specific services.
  • Diagnose issues quickly using features like the Entity Explorer, trace analytics, and the Metrics Explorer.
  • Troubleshoot issues and find root causes through Behavior insights, Root Cause Explorer, and log search.

Resources created or modified at deployment

The CloudFormation template creates a number of resources in AWS and in Sumo Logic. 

Resources created in AWS

The AWS CloudFormation template execution creates or modifies the following resources in the AWS account if you are not already collecting data from those AWS services. If you are, the AWS CloudFormation template will simply integrate with your existing collector sources.

AWS Data Source | AWS Resources Created | Applicable AWS Observability Dashboards
AWS CloudTrail Logs | S3 Bucket, SNS Topic, AWS Trail, SNS Subscription, AWS Lambda, IAM Roles | AWS API Gateway, AWS Lambda, Amazon DynamoDB, Amazon RDS, Amazon ECS, Amazon ElastiCache
Amazon CloudWatch Metrics | AWS Lambda, IAM Roles, Kinesis Firehose, CloudWatch Metrics Stream | AWS API Gateway, AWS Lambda, Amazon DynamoDB, AWS Application Load Balancer, Amazon RDS, Amazon ECS, Amazon ElastiCache, AWS Network Load Balancer
Amazon Application Load Balancer logs | S3 Bucket, SNS Topic, SNS Subscription, AWS Lambda, IAM Role | AWS Application Load Balancer
AWS Lambda CloudWatch logs | AWS Lambda, IAM Roles | AWS Lambda

If you are using an existing bucket to collect AWS Application ELB logs, the Amazon S3 bucket policy for this bucket will be updated to include the policy below if it does not already exist:

{
  "Sid": "AwsAlbLogs",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam:::root"
  },
  "Action": [
    "s3:PutObject"
  ],
  "Resource": "arn:aws:s3:::{bucket_name}/*"
}
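
The "add only if missing" behavior described above can be sketched as follows. This is not the template's actual code: the `ensure_statement` helper is hypothetical, matching on `Sid` is an assumption about how an existing statement would be detected, the bucket name is made up, and the principal ARN is copied from the policy above as-is.

```python
import copy
import json


def ensure_statement(policy, statement):
    """Return a policy containing `statement`, adding it only if its Sid is absent."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    statements = updated.setdefault("Statement", [])
    if not any(s.get("Sid") == statement["Sid"] for s in statements):
        statements.append(statement)
    return updated


bucket_name = "example-alb-logs"  # hypothetical bucket name
alb_statement = {
    "Sid": "AwsAlbLogs",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam:::root"},  # copied as-is from the policy above
    "Action": ["s3:PutObject"],
    "Resource": f"arn:aws:s3:::{bucket_name}/*",
}

existing = {"Version": "2012-10-17", "Statement": []}
once = ensure_statement(existing, alb_statement)
twice = ensure_statement(once, alb_statement)  # second call changes nothing
print(json.dumps(twice, indent=2))
```

Returning a copy rather than mutating the input keeps the original policy available for comparison before anything is written back to the bucket.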

Resources created in Sumo Logic

The AWS CloudFormation template execution creates the following resources in Sumo Logic.

Resource | Name
App folder | AWS Observability-<Version> <Date of installation>
Alerts | AWS Observability <Version> <Date and Time of Installation>
Hosted Collector | aws-observability-<AccountAlias>-<AccountID>
Field Extraction Rule | AwsObservabilityFieldExtractionRule, AwsObservabilityAlbAccessLogsFER, AwsObservabilityApiGatewayCloudTrailLogsFER, AwsObservabilityDynamoDBCloudTrailLogsFER, AwsObservabilityLambdaCloudWatchLogsFER, AwsObservabilityRdsCloudTrailLogsFER, AwsObservabilityECSCloudTrailLogsFER, AwsObservabilityElastiCacheCloudTrailLogsFER
Explorer View | AWS Observability
Metric Rules | AwsObservabilityRDSClusterMetricsEntityRule, AwsObservabilityRDSInstanceMetricsEntityRule, AwsObservabilityNLBMetricsEntityRule
CloudTrail source | cloudtrail-logs-<AWS::Region>
CloudWatch logs (HTTP) source | cloudwatch-logs-<AWS::Region>
Kinesis Firehose for Metrics | cloudwatch-metrics-<AWS::Region>
CloudWatch Metrics source | cloudwatch-metrics-<AWS::Region>-ApplicationELB, cloudwatch-metrics-<AWS::Region>-ApiGateway, cloudwatch-metrics-<AWS::Region>-DynamoDB, cloudwatch-metrics-<AWS::Region>-Lambda, cloudwatch-metrics-<AWS::Region>-ELB, cloudwatch-metrics-<AWS::Region>-RDS, cloudwatch-metrics-<AWS::Region>-ECS, cloudwatch-metrics-<AWS::Region>-NetworkELB, cloudwatch-metrics-<AWS::Region>-ElastiCache, cloudwatch-metrics-<AWS::Region>-SQS, cloudwatch-metrics-<AWS::Region>-SNS
Amazon S3 ALB log source | alb-logs-<AWS::Region>
Kinesis Firehose for Logs | kinesis-firehose-cloudwatch-logs-<AWS::Region>
Inventory Source | inventory-<AWS::Region>
XRay Source | xray-<AWS::Region>
S3 Bucket Name | aws-observability-logs-<StackID>
Fields | account, region, namespace, tablename, loadbalancer, functionname, apiname, dbidentifier, dbinstanceidentifier, dbclusteridentifier, instanceid, clustername, cacheclusterid, networkloadbalancer
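
The name patterns listed above are regular enough to express as templates, which can help when you need to predict or search for the resources a deployment created. The sketch below expands a few of them for one account and region. The `resource_names` helper and the sample alias, account ID, and region are hypothetical; only patterns that appear in the list above are used.

```python
def resource_names(account_alias, account_id, region, namespaces):
    """Expand a few of the documented name patterns for one account and region."""
    names = {
        "hosted_collector": f"aws-observability-{account_alias}-{account_id}",
        "cloudtrail_source": f"cloudtrail-logs-{region}",
        "cloudwatch_logs_source": f"cloudwatch-logs-{region}",
        "alb_log_source": f"alb-logs-{region}",
        "inventory_source": f"inventory-{region}",
        "xray_source": f"xray-{region}",
    }
    # One CloudWatch Metrics source per service suffix, e.g. "Lambda" or "RDS".
    names["cloudwatch_metrics_sources"] = [
        f"cloudwatch-metrics-{region}-{suffix}" for suffix in namespaces
    ]
    return names


# Sample values are made up for illustration.
names = resource_names("production", "123456789012", "us-west-2", ["Lambda", "RDS"])
for key, value in names.items():
    print(key, value)
```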