Unlike other Source types, Amazon S3 bucket Sources do not support the double asterisk (**) in the file path expression. How do I instruct the Hosted Collector to traverse multiple directories? Normally, I would do something like this:
What is the alternative, as double wildcards are not supported?
Amazon S3 interprets wildcards in the file path differently than other Sources. S3 has no real directories the way a traditional filesystem does. An object key is a flat string, and forward slashes are simply characters within it. Because of this, a single asterisk in an S3 Source path matches across slashes, which accomplishes the same goal as using two asterisks in other Sources.
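To make the difference concrete, here is a simplified model of the two matching styles. It is a sketch, not Sumo Logic's actual matcher. It handles only the "*" wildcard and assumes S3 treats "*" as "any run of characters, including /". The helper names are hypothetical.

```python
import re


def _wildcard_regex(pattern: str, star: str) -> re.Pattern:
    # Escape the literal parts and join them with the chosen wildcard expansion.
    return re.compile(star.join(re.escape(part) for part in pattern.split("*")), re.DOTALL)


def s3_style_match(pattern: str, key: str) -> bool:
    # S3 keys are flat strings, so "*" may consume "/" characters too.
    return _wildcard_regex(pattern, ".*").fullmatch(key) is not None


def filesystem_style_match(pattern: str, key: str) -> bool:
    # A traditional filesystem glob: "*" stops at each "/" separator.
    return _wildcard_regex(pattern, "[^/]*").fullmatch(key) is not None


key = "My_S3_Bucket_Name/CloudTrail/2014/12/05/20141205.json.gz"
print(s3_style_match("My_S3_Bucket_Name/CloudTrail/*", key))          # True
print(filesystem_style_match("My_S3_Bucket_Name/CloudTrail/*", key))  # False
```

The single asterisk reaches files several "folders" deep only under the S3-style interpretation. Note that key matching is case-sensitive, so "Cloudtrail/*" would not match keys under "CloudTrail/".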
For example, CloudTrail logging generates a new folder every day, producing object keys like these:
My_S3_Bucket_Name/CloudTrail/2014/12/05/20141205.json.gz
My_S3_Bucket_Name/CloudTrail/2013/11/04/20131104.json.gz
My_S3_Bucket_Name/CloudTrail/2012/10/03/20121003.json.gz
To gather all logs under a directory structure that is constantly changing, set the S3 Source file path to My_S3_Bucket_Name/CloudTrail/* when creating your S3 Source. This collects everything under CloudTrail, no matter how many year, month, and day folders sit below it. Match the case of the key exactly: S3 keys are case-sensitive, so Cloudtrail/* would not match CloudTrail/.
When you configure the Source's Collect From Time parameter, selecting All Time collects every matching log, which in some cases can mean several years of data. To avoid this, specify a start date instead.
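The following sketch illustrates how much the start date can matter. It does not reflect how the Source actually determines an object's time. As a stand-in, it reads the date from the YYYY/MM/DD folders in the example CloudTrail keys. The helpers key_date and keys_to_collect are hypothetical, and collect_from=None models All Time.

```python
from datetime import date
from typing import Iterable, List, Optional


def key_date(key: str) -> date:
    # Hypothetical: parse bucket/CloudTrail/YYYY/MM/DD/file from the example keys.
    year, month, day = (int(part) for part in key.split("/")[2:5])
    return date(year, month, day)


def keys_to_collect(keys: Iterable[str], collect_from: Optional[date]) -> List[str]:
    # None models "All Time": every matching key is ingested.
    if collect_from is None:
        return list(keys)
    # Otherwise keep only objects dated on or after the chosen start date.
    return [k for k in keys if key_date(k) >= collect_from]


keys = [
    "My_S3_Bucket_Name/CloudTrail/2014/12/05/20141205.json.gz",
    "My_S3_Bucket_Name/CloudTrail/2013/11/04/20131104.json.gz",
    "My_S3_Bucket_Name/CloudTrail/2012/10/03/20121003.json.gz",
]
print(len(keys_to_collect(keys, None)))              # 3
print(len(keys_to_collect(keys, date(2014, 1, 1))))  # 1
```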