Before you install the Sumo Logic App for Amazon CloudFront, you'll need to perform the following steps:
- Grant Sumo Logic access to an Amazon S3 bucket.
- Enable CloudFront logging.
- Confirm that logs are being delivered to the Amazon S3 bucket.
- Add an AWS Source to Sumo Logic.
- Install the Sumo Logic App for Amazon CloudFront.
When you create an AWS Source, you associate it with a Hosted Collector. Before creating the Source, identify the Hosted Collector you want to use or create a new Hosted Collector. For instructions, see Configure a Hosted Collector.
Create an AWS Source
- In Sumo Logic select Manage Data > Collection > Collection.
- On the Collectors page, click Add Source next to a Hosted Collector, either an existing Hosted Collector or one you have created for this purpose.
- Select your AWS Source type.
- Enter a name for the new Source. A description is optional.
- For Bucket Name, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS, for example:
- For Path Expression, enter the string that matches the S3 objects you'd like to collect. You can use one wildcard (*) in this string. Recursive path expressions use a single wildcard and do NOT use a leading forward slash. See About Amazon Path Expressions for details.
- Collection should begin. Choose or enter how far back you'd like to begin collecting historical logs. You can either:
- Choose a predefined value from dropdown list, ranging from "Now" to “72 hours ago” to “All Time”, or
- Enter a relative value. To enter a relative value, click the Collection should begin field and press the delete key on your keyboard to clear the field. Then, enter a relative time expression, for example
-1w. You can define when you want collection to begin in terms of months (M), weeks (w), days (d), hours (h) and minutes (m).
- For Source Category, enter any string to tag the output collected from this Source. (Category metadata is stored in a searchable field called _sourceCategory.)
- For AWS Access you have two Access Method options. Select Role-based access or Key access based on the AWS authentication you are providing. Role-based access is preferred, this was completed in the prerequisite step Grant Sumo Logic access to an AWS Product.
- For Scan Interval, use the default of 5 minutes. Alternately, enter the frequency Sumo Logic will scan your S3 bucket for new data. To learn more about Scan Interval considerations, see About setting the S3 Scan Interval.
- Set any of the following under Advanced:
- Enable Timestamp Parsing. This option is selected by default. If it's deselected, no timestamp information is parsed at all.
- Time Zone. There are two options for Time Zone. You can use the time zone present in your log files, and then choose an option in case time zone information is missing from a log message. Or, you can have Sumo Logic completely disregard any time zone information present in logs by forcing a time zone. It's very important to have the proper time zone set, no matter which option you choose. If the time zone of logs can't be determined, Sumo Logic assigns logs UTC; if the rest of your logs are from another time zone your search results will be affected.
- Timestamp Format. By default, Sumo Logic will automatically detect the timestamp format of your logs. However, you can manually specify a timestamp format for a Source. See Timestamps, Time Zones, Time Ranges, and Date Formats for more information.
- Enable Multiline Processing. See Collecting Multiline Logs for details on multiline processing and its options. This is enabled by default. Use this option if you're working with multiline messages (for example, log4J or exception stack traces). Deselect this option if you want to avoid unnecessary processing when collecting single-message-per-line files (for example, Linux system.log). Choose one of the following:
- Infer Boundaries. Enable when you want Sumo Logic to automatically attempt to determine which lines belong to the same message. If you deselect the Infer Boundaries option, you will need to enter a regular expression in the Boundary Regex field to use for detecting the entire first line of multiline messages.
- Boundary Regex. You can specify the boundary between messages using a regular expression. Enter a regular expression that matches the entire first line of every multiline message in your log files.
- Create any Processing Rules you'd like for the AWS Source.
- When you are finished configuring the Source click Save.
Multiline Processing Boundary Regex Example
If your CloudFront log message is of this format:
2017-06-13 22:02:13 SYD1 ..............
You could use this Boundary Regex:
Sample Log Message
2017-09-27 00:21:12 ORD51-M1 335 188.8.131.52 GET domain.cloudfront.net /content/FDW/HLS/HLS360p/FDW_s01e002_tv_hv_or_en_xx_HLS360p_16x9_00_v00154.ts 403 https://www.company.com/stream/food-wars/s01e002 Mozilla/5.0%2520(Macintosh;%2520Intel%2520Mac%2520OS%2520X%252010.12;%2520rv:54.0)%2520Gecko/20100101%2520Firefox/54.0 Policy=eyJTdGF0ZW1lbnQiOiBbeyJSZXNvdXJjZSI6Imh0dHBzOi8vZDM5dG5yZzRlbnBrNncuY2xvdWRmcm9udC5uZXQvY29udGVudC9GRFcvSExTL0hMUzM2MHAvRkRXX3MwMWUwMDJfdHZfaHZfb3JfZW5feHhfSExTMzYwcF8xNng5XzAwX3YwMDE1NC50cyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTUwMzk4MTc1OX0sIklwQWRkcmVzcyI6eyJBV1M6U291cmNlSXAiOiI2NS4zMC4xLjEzOC8zMiJ9LCJEYXRlR3JlYXRlclRoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTUwMzk3NDI1OX19fV19&Signature=NCDlKRp6Nv9triqA1r-RBulrMXlvQCRxH16c3dP4RGdmwx8yQO0d75%7EdN94-wwaQ2x7NDlzNUrn7IXkUyHJN3S9kdx7RfVt-gQw9E3hMc4rYYe5NVR0wAeye%7E3gMKuFY%7EhshJqMrbE96HmzzhgQ5qS9gW797PDiwddCmtjYxqgndfF7jO2JJ9QwSpHfqcn5Ceo89Ra0mxwjo4JYu8JfiDhbOAkTU7mpy1ql%7EmYOuwc4zntjMK%7ERKOtcrV3sP9uIunpdh6Ur0-pOmPYTJt13VgUfoYmFB0CJnc8TMosN8ouqMIlSnLXfeKiIdDiP%7EGQKtYeZ54NVx6PqrmOQBSVhikg__&Key-Pair-Id=APKAIWFUV66JZQCBHYXA - Error h5BKcPRKo5oIEz0KZ06V6bRCTJttiW_WUJQmT71jjTnYGE8pA1kfQA== domain.cloudfront.net https 722 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Error HTTP/2.0
Count of HTTP Status Codes
_sourceCategory= aws/cf | parse "*\t*\t*\t*\t*\t*\t*\t*\t*\t*\t*\t*\t*\t*\t*" as _filedate,_ftime,edgeloc, scbytes, c_ip,method,cs_host,uri_stem,status,referer,user_agent,uri_query,cookie,edgeresult,requestid
| count as count by status
| sort by count