About AWS Observability
What is observability?
Observability is the degree to which the internal states of a system can be determined from its external outputs.
For our purposes, observability is the ability to observe an application from the outside and understand what is happening inside the application and its services. Observability helps ensure that the application is running reliably: the system is available, performant, and secure.
Why observability?
Modern applications are increasingly complex, as they leverage distributed technologies, cloud infrastructure, and container and orchestration tools. In addition, the connections between microservices, orchestrators, and underlying cloud resources are also growing in complexity. This complexity leads to situations where unforeseen events (unknown unknowns in terms of risk) are more prevalent and come with mysterious behaviors and failure modes. This can cause major issues in your overall incident remediation workflow, which can be broken down into three steps:
- Monitor critical indicators of reliability such as errors or latency. Sometimes these unknown-unknown errors don't directly impact the metrics that you are tracking, which makes the issues more difficult to monitor.
- Diagnose or isolate services or resources that might be the immediate cause of reliability issues. These unknown unknowns could impact systems in obscure ways. For example, the culprit service’s metrics might look normal, but a downstream service that consumes this service might have abnormal metrics, which could lead an SRE down the wrong path. There is no tribal knowledge that can help guide the SRE in the right direction.
- Troubleshoot and uncover root causes to guide recovery and ensure ongoing application reliability. As in the diagnosis step, the unknown unknowns might make it difficult to find the root cause.
Monitoring, diagnosing, and troubleshooting such issues are harder because there are no existing runbooks that can help resolve them quickly. This problem is compounded by the fact that modern applications also emit very large amounts of machine data across the stack.
All this complexity, along with the data deluge and unknown behaviors, can make it impossible to recover systems quickly if you don’t have a way to make sense of all of the information. This is why organizations need an Observability Solution.
System architecture
Sumo Logic provides an AWS CloudFormation template that automates the setup and installation of the AWS Observability Solution for a given account and region. This allows you to configure the monitoring of your AWS infrastructure for optimum results.
Data Collection for the AWS Observability Solution
Sumo Logic collects logs, metrics, and events including AWS EC2 Host Metrics, CloudWatch logs and metrics, and CloudTrail logs. The collected data streams are enriched with the following metadata:
- Account. This is an alias for your AWS account—for example, production, development, or stage—that you supply when you install the solution.
- Namespace. This is the name of the AWS service and is automatically added by either the Host Metrics Source or the AWS Metadata (Tag) Source installed by the template, for example, aws/apigateway, aws/applicationelb, aws/dynamodb, aws/lambda, aws/rds, and so on.
- Region. This is the AWS region, for example, us-east-1, us-west-2, and so on.
- Entity. This represents either the AWS resource name or id, depending on the AWS service being monitored.
This new metadata can also be used in ad-hoc logs and metrics searches.
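To make the four metadata fields concrete, here is a minimal Python sketch of a record that carries them and a small filter over those fields. The `EnrichedRecord` class, the `matches` helper, and all sample values are hypothetical illustrations, not Sumo Logic's internal representation or query syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichedRecord:
    """Hypothetical model of one collected record plus the four metadata fields."""
    account: str    # alias supplied at install time, e.g. "production"
    namespace: str  # AWS service name, e.g. "aws/lambda"
    region: str     # AWS region, e.g. "us-east-1"
    entity: str     # resource name or id, depending on the service
    message: str    # raw log line or metric payload (illustrative only)


def matches(record: EnrichedRecord, **wanted: str) -> bool:
    """Return True when every requested metadata field equals the given value."""
    return all(getattr(record, key) == value for key, value in wanted.items())


# Sample data is invented for illustration.
records = [
    EnrichedRecord("production", "aws/lambda", "us-east-1", "checkout-fn", "timeout"),
    EnrichedRecord("development", "aws/rds", "us-west-2", "orders-db", "slow query"),
]

# Narrow the stream to production Lambda records using the metadata alone.
prod_lambda = [
    r for r in records if matches(r, account="production", namespace="aws/lambda")
]
```

Filtering on metadata rather than on message content is the point of the enrichment: the same four keys apply to logs and metrics alike.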
Understanding the Observability Solution
The Observability Solution offers a unified platform for logs, metrics, traces, and metadata at the following layers:
- Application
- Microservices
- Cloud
- Orchestrator
- Container
The solution understands how the different datasets and services are connected, and stitches those relationships into an entity workflow that makes it more intuitive for users to get a holistic view of their service. The workflow also enables easier and faster monitoring, diagnosing, and troubleshooting.
The solution also offers features and capabilities that support each step of the troubleshooting process.
- Monitor your systems effectively with new and improved alerting and dashboarding capabilities. The Observability Solution includes rich pre-built content that you can leverage to quickly start monitoring specific services.
- Diagnose issues quickly using features like the Entity Explorer, trace analytics, and the Metrics Explorer.
- Troubleshoot issues and find root causes through Behavior insights, Root Cause Explorer, and log search.
Resources created or modified
This section lists the resources that the CloudFormation template creates in AWS and Sumo Logic.
AWS Resources
The AWS CloudFormation template execution creates or modifies the following resources in the AWS account if you are not already collecting data from those AWS services. If you are, the AWS CloudFormation template will simply integrate with your existing collector sources.
AWS Data Source | AWS Resources Created | Applicable AWS Observability Dashboards |
---|---|---|
AWS CloudTrail Logs | S3 Bucket, SNS Topic, AWS Trail, SNS Subscription, AWS Lambda, IAM Roles | AWS API Gateway ULM, AWS Lambda ULM, Amazon DynamoDB ULM, Amazon RDS ULM |
Amazon CloudWatch Metrics | AWS Lambda, IAM Roles | AWS API Gateway ULM, AWS Lambda ULM, Amazon DynamoDB ULM |
Amazon Application Load Balancer logs | S3 Bucket, SNS Topic, SNS Subscription, AWS Lambda, IAM Role | AWS Application Load Balancer ULM |
AWS Lambda CloudWatch logs | AWS Lambda, IAM Roles | AWS Lambda ULM |
If you are using an existing bucket to collect AWS Application ELB logs, the Amazon S3 bucket policy for this bucket will be updated to include the policy statement below, if it does not already exist:
    {
      "Sid": "AwsAlbLogs",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::root"
      },
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::{bucket_name}/*"
    }
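The "add only if missing" behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the template's actual implementation: the helper names `build_alb_statement` and `ensure_alb_statement` are hypothetical, and matching on the `Sid` is an assumed way of deciding whether the statement already exists. The principal ARN is passed in as a parameter because its account segment is not filled in above.

```python
import copy

ALB_LOGS_SID = "AwsAlbLogs"  # Sid taken from the statement shown above


def build_alb_statement(bucket_name: str, principal_arn: str) -> dict:
    """Build the ALB log-delivery statement for the given bucket (hypothetical helper)."""
    return {
        "Sid": ALB_LOGS_SID,
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": ["s3:PutObject"],
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
    }


def ensure_alb_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of `policy` that contains `statement` exactly once.

    Assumption: a statement counts as already present when its Sid matches.
    """
    updated = copy.deepcopy(policy)
    statements = updated.setdefault("Statement", [])
    if isinstance(statements, dict):  # IAM also allows a single statement object
        statements = [statements]
        updated["Statement"] = statements
    if not any(s.get("Sid") == statement["Sid"] for s in statements):
        statements.append(statement)
    return updated


# Usage: the ARN is copied as it appears above; supply the real principal yourself.
existing_policy = {"Version": "2012-10-17", "Statement": []}
alb_statement = build_alb_statement("my-bucket", "arn:aws:iam:::root")
merged_once = ensure_alb_statement(existing_policy, alb_statement)
merged_twice = ensure_alb_statement(merged_once, alb_statement)
```

Working on a deep copy keeps the original policy untouched, and applying the function a second time leaves the result unchanged.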
Resources created in Sumo Logic
The AWS CloudFormation template execution creates the following resources in Sumo Logic.
Resource | Name |
---|---|
App folder | Sumo Logic AWS Observability Apps-<Date of installation> |
Hosted Collector | aws-observability-<AccountAlias> |
Field Extraction Rule | AwsObservabilityFieldExtractionRule, AwsObservabilityAlbAccessLogsFER, AwsObservabilityApiGatewayCloudTrailLogsFER, AwsObservabilityDynamoDBCloudTrailLogsFER, AwsObservabilityLambdaCloudWatchLogsFER, AwsObservabilityRdsCloudTrailLogsFER |
Explorer View | AWS Observability |
Metric Rules | AwsObservabilityRDSClusterMetricsEntityRule, AwsObservabilityRDSInstanceMetricsEntityRule, AwsObservabilityALBMetricsEntityRule, AwsObservabilityLambdaMetricsEntityRule, AwsObservabilityApiGatewayMetricsEntityRule, AwsObservabilityDynamoDBMetricsEntityRule, AwsObservabilityEC2MetricsEntityRule |
CloudTrail source | <AccountAlias>-aws-observability-cloudtrail-logs-<AWS::Region> |
CloudWatch logs (HTTP) source | <AccountAlias>-cloudwatch-logs-<AWS::Region> |
CloudWatch Metrics source | <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ApplicationELB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ApiGateway, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-DynamoDB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-Lambda, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ELB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-RDS, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ECS, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-NetworkELB, <AccountAlias>-cloudwatch-metrics-<AWS::Region>-ElastiCache |
Amazon S3 Alb log source | <AccountAlias>-aws-observability-alb-logs-s3-<AWS::Region> |
Inventory Source | <AccountAlias>-inventory-<AWS::Region> |
XRay Source | <AccountAlias>-xray-aws-<AWS::Region> |
S3 Bucket Name | aws-observability-logs-<StackID> |
Fields | account, region, namespace, tablename, loadbalancer, functionname, apiname, dbidentifier, dbinstanceidentifier, dbclusteridentifier, instanceid |
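The naming patterns in the table can be expanded for a given account alias and region to predict which names to look for after installation. The following Python sketch only applies the patterns listed above; the function name `sumo_source_names`, the dictionary keys, and the sample alias are hypothetical. The S3 bucket name is left out because it depends on the stack ID.

```python
# CloudWatch Metrics suffixes, copied from the table above.
METRIC_NAMESPACES = [
    "ApplicationELB", "ApiGateway", "DynamoDB", "Lambda", "ELB",
    "RDS", "ECS", "NetworkELB", "ElastiCache",
]


def sumo_source_names(account_alias: str, region: str) -> dict:
    """Expand the documented name patterns for one account alias and region."""
    return {
        "collector": f"aws-observability-{account_alias}",
        "cloudtrail": f"{account_alias}-aws-observability-cloudtrail-logs-{region}",
        "cloudwatch_logs": f"{account_alias}-cloudwatch-logs-{region}",
        "cloudwatch_metrics": [
            f"{account_alias}-cloudwatch-metrics-{region}-{ns}"
            for ns in METRIC_NAMESPACES
        ],
        "alb_logs": f"{account_alias}-aws-observability-alb-logs-s3-{region}",
        "inventory": f"{account_alias}-inventory-{region}",
        "xray": f"{account_alias}-xray-aws-{region}",
    }


# Usage with an invented alias and a real region name.
names = sumo_source_names("prod", "us-east-1")
for metric_source in names["cloudwatch_metrics"]:
    print(metric_source)
```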