Sumo Logic

About AWS Observability

Learn about the features, benefits, and resources created by the AWS Observability solution.

What is observability? 

Observability is the degree to which the internal states of a system can be determined from its external outputs.

For our purposes, observability is the ability to observe an application from the outside and understand what is happening inside the application and its services. Observability helps ensure that the application runs reliably, meaning that the system is available, performant, and secure.

Why observability?

Modern applications are increasingly complex, as they leverage distributed technologies, cloud infrastructure, and container and orchestration tools. In addition, the connections between microservices, orchestrators, and underlying cloud resources are also growing in complexity. This complexity makes unforeseen events (the unknown unknowns of risk) more prevalent, and these events come with mysterious behaviors and failure modes. This can cause major issues in your overall incident remediation workflow, which can be broken down into three steps.

  • Monitor critical indicators of reliability such as errors or latency. Sometimes errors of the unknown-unknown type don't directly impact the metrics that you are tracking, which makes the issues more difficult to monitor.

  • Diagnose or isolate services or resources that might be the immediate cause of reliability issues. These unknown unknowns could impact systems in obscure ways. For example, the culprit service’s metrics might look normal, but a downstream service that consumes this service might have abnormal metrics, which could lead an SRE down the wrong path. There is no tribal knowledge that can help guide the SRE in the right direction.

  • Troubleshoot and uncover root cause(s) to guide recovery and ensure ongoing application reliability. As in the diagnosis step, the unknown unknowns might make it difficult to find the root cause.


Monitoring, diagnosing, and troubleshooting such issues is harder because there are no existing runbooks that can help resolve issues quickly. This problem is compounded by the fact that modern applications also emit astonishing amounts of machine data across the stack. 

All this complexity, along with data deluge and unknown behaviors, can make it impossible to recover systems quickly if you don’t have a way to make sense of all of the information. This is why organizations need an Observability Solution. 

System architecture

Sumo Logic provides an AWS CloudFormation template that automates the setup and installation of the AWS Observability Solution for a given account and region. This allows you to configure the monitoring of your AWS infrastructure for optimum results.

[Figure: AWS Observability architecture]

Data Collection for the AWS Observability Solution

Sumo Logic collects logs, metrics, and events including AWS EC2 Host Metrics, CloudWatch logs and metrics, and CloudTrail logs. The collected data streams are enriched with the following metadata:

  • Account. This is an alias for your AWS account—for example, production, development, or stage—that you supply when you install the solution. 
  • Namespace. This is the name of the AWS service and is automatically added by either the Host Metrics Source or the AWS Metadata (Tag) Source installed by the template, for example, aws/apigateway, aws/applicationelb, aws/dynamodb, aws/lambda, aws/rds, and so on.
  • Region. This is the AWS region, for example, us-east-1, us-west-2, and so on.
  • Entity. This represents either the AWS resource name or ID, depending on the AWS service being monitored.

[Figure: AWS Observability data collection]

This new metadata can also be used in ad-hoc logs and metrics searches.
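
As a rough illustration of how these fields can scope an ad-hoc search, the sketch below assembles a key=value filter string from the metadata fields listed above. The helper name build_scope and the example values are hypothetical, and the exact query syntax your search accepts should be confirmed against the Sumo Logic documentation.

```python
# Hypothetical helper: builds a "key=value" scope string from the metadata
# fields (account, region, namespace, entity) described above.
ALLOWED_FIELDS = ("account", "region", "namespace", "entity")


def build_scope(**metadata: str) -> str:
    """Return a space-separated key=value filter for the given metadata fields."""
    unknown = set(metadata) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unsupported metadata field(s): {sorted(unknown)}")
    # Emit fields in a fixed order so the same inputs always give the same string.
    return " ".join(
        f"{field}={metadata[field]}" for field in ALLOWED_FIELDS if field in metadata
    )


scope = build_scope(account="production", region="us-east-1", namespace="aws/lambda")
print(scope)  # account=production region=us-east-1 namespace=aws/lambda
```

The resulting string would typically be placed at the start of a log or metrics query so that the rest of the query only sees data from the chosen account, region, and service.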

Understanding the Observability Solution

The Observability Solution offers a unified platform for logs, metrics, traces, and metadata at the following layers:

  • Application
  • Microservices
  • Cloud
  • Orchestrator
  • Container 

The solution understands how the different datasets and services are connected, and stitches those relationships into an entity workflow that makes it more intuitive for users to get a holistic view of their service. The workflow also enables easier and faster monitoring, diagnosing, and troubleshooting.

The solution also offers features and capabilities that support each step of the troubleshooting process.

  • Monitor your systems effectively with new and improved alerting and dashboarding capabilities. The Observability Solution includes rich pre-built content that you can leverage to quickly start monitoring specific services. 

  • Diagnose issues quickly using features like the Entity Explorer, trace analytics, and the Metrics Explorer.

  • Troubleshoot issues and find root causes through Behavior insights, Root Cause Explorer, and log search.

Resources created or modified

This section lists the resources that the CloudFormation template creates in AWS and Sumo Logic.

AWS Resources

The AWS CloudFormation template execution creates or modifies the following resources in the AWS account if you are not already collecting data from those AWS services. If you are, the AWS CloudFormation template will simply integrate with your existing collector sources.

AWS Data Source | AWS Resources Created | Applicable AWS Observability Dashboards
AWS CloudTrail Logs | S3 Bucket, SNS Topic, AWS Trail, SNS Subscription, AWS Lambda, IAM Roles | AWS API Gateway ULM, AWS Lambda ULM, Amazon DynamoDB ULM, Amazon RDS ULM
Amazon CloudWatch Metrics | AWS Lambda, IAM Roles | AWS API Gateway ULM, AWS Lambda ULM, Amazon DynamoDB ULM, AWS Application Load Balancer ULM
Amazon Application Load Balancer logs | S3 Bucket, SNS Topic, SNS Subscription, AWS Lambda, IAM Role | AWS Application Load Balancer ULM
AWS Lambda CloudWatch logs | AWS Lambda, IAM Roles | AWS Lambda ULM

If you are using an existing bucket to collect AWS Application Load Balancer logs, the Amazon S3 bucket policy for that bucket will be updated to include the statement below, if it is not already present:
{
  "Sid": "AwsAlbLogs",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam:::root"
  },
  "Action": [
    "s3:PutObject"
  ],
  "Resource": "arn:aws:s3:::{bucket_name}/*"
}
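
The "add only if not already present" behavior described above can be sketched as an idempotent merge on the policy document. This is a minimal illustration, not the template's actual implementation: the function name ensure_statement, the choice to match on Sid, and the assumption that Statement is a list are all hypothetical. The statement is copied from the policy above with its placeholders left as they appear.

```python
import copy

# Statement copied from the policy shown above, placeholders included.
ALB_LOGS_STATEMENT = {
    "Sid": "AwsAlbLogs",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam:::root"},
    "Action": ["s3:PutObject"],
    "Resource": "arn:aws:s3:::{bucket_name}/*",
}


def ensure_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of `policy` that contains `statement`.

    Matching on Sid is an assumption made for this sketch; the input
    policy is never modified.
    """
    updated = copy.deepcopy(policy)
    statements = updated.setdefault("Statement", [])
    if not any(s.get("Sid") == statement["Sid"] for s in statements):
        statements.append(copy.deepcopy(statement))
    return updated


existing = {"Version": "2012-10-17", "Statement": []}
once = ensure_statement(existing, ALB_LOGS_STATEMENT)
twice = ensure_statement(once, ALB_LOGS_STATEMENT)
print(len(once["Statement"]), len(twice["Statement"]))  # 1 1
```

Applying the merge a second time leaves the policy unchanged, which is the property that makes it safe to rerun against a bucket that already carries the statement.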

Resources created in Sumo Logic

The AWS CloudFormation template execution creates the following resources in Sumo Logic.

Resource | Name
App folder | Sumo Logic AWS Observability Apps-<Date of installation>
Hosted Collector | aws-observability-<AccountAlias>
Field Extraction Rule | AwsObservabilityFieldExtractionRule, AwsObservabilityAlbAccessLogsFER, AwsObservabilityApiGatewayCloudTrailLogsFER, AwsObservabilityDynamoDBCloudTrailLogsFER, AwsObservabilityLambdaCloudWatchLogsFER, AwsObservabilityRdsCloudTrailLogsFER
Explorer View | AWS Observability
Metric Rules | AwsObservabilityRDSClusterMetricsEntityRule, AwsObservabilityRDSInstanceMetricsEntityRule, AwsObservabilityALBMetricsEntityRule, AwsObservabilityLambdaMetricsEntityRule, AwsObservabilityApiGatewayMetricsEntityRule, AwsObservabilityDynamoDBMetricsEntityRule, AwsObservabilityEC2MetricsEntityRule
CloudTrail source | <AccountAlias>-aws-observability-cloudtrail-logs-<AWS::Region>
CloudWatch logs (HTTP) source | <AccountAlias>-cloudwatch-logs-<AWS::Region>
CloudWatch Metrics source | <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ApplicationELB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ApiGateway, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-DynamoDB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-Lambda, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ELB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-RDS, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ECS, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-NetworkELB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ElastiCache
Amazon S3 Alb log source | <AccountAlias>-aws-observability-alb-logs-s3-<AWS::Region>
Inventory Source | <AccountAlias>-inventory-<AWS::Region>
XRay Source | <AccountAlias>-xray-aws-<AWS::Region>
S3 Bucket Name | aws-observability-logs-<StackID>
Fields | account, region, namespace, tablename, loadbalancer, functionname, apiname, dbidentifier, dbinstanceidentifier, dbclusteridentifier, instanceid
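
The collector and source names listed above follow fixed patterns built from the account alias and the AWS region. The sketch below expands those patterns for one alias and region, which can help when checking that an installation produced the expected resources. The function name expected_source_names and the sample values "prod" and "us-east-1" are hypothetical; the patterns themselves are taken from the list above.

```python
# Service suffixes used by the CloudWatch Metrics sources listed above.
CLOUDWATCH_METRICS_SUFFIXES = [
    "ApplicationELB", "ApiGateway", "DynamoDB", "Lambda", "ELB",
    "RDS", "ECS", "NetworkELB", "ElastiCache",
]


def expected_source_names(account_alias: str, region: str) -> dict:
    """Expand the documented naming patterns for one account alias and region."""
    names = {
        "Hosted Collector": f"aws-observability-{account_alias}",
        "CloudTrail source": f"{account_alias}-aws-observability-cloudtrail-logs-{region}",
        "CloudWatch logs (HTTP) source": f"{account_alias}-cloudwatch-logs-{region}",
        "Amazon S3 Alb log source": f"{account_alias}-aws-observability-alb-logs-s3-{region}",
        "Inventory Source": f"{account_alias}-inventory-{region}",
        "XRay Source": f"{account_alias}-xray-aws-{region}",
    }
    # One CloudWatch Metrics source is created per service suffix.
    names["CloudWatch Metrics source"] = [
        f"{account_alias}-cloudwatch-metrics-{region}-{suffix}"
        for suffix in CLOUDWATCH_METRICS_SUFFIXES
    ]
    return names


names = expected_source_names("prod", "us-east-1")
print(names["Hosted Collector"])   # aws-observability-prod
print(names["CloudTrail source"])  # prod-aws-observability-cloudtrail-logs-us-east-1
```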