The overwhelming majority of log data processed by Sumo Logic represents nearly real-time messages. As such, timestamp detection and data indexing systems are optimized to handle streams of data originating in the recent past.
When you ingest old or historical data, especially when it is mixed with recent or real-time data in one Source, its timestamps may occasionally be misinterpreted.
This article describes the assumptions Sumo Logic makes about customer data, offers tips to help you make sure your data is handled correctly, and explains when to contact Sumo Logic Support about historical data uploads.
Assumption: Data is less than 365 Days Old
Sumo Logic assumes that all log message times fall within a window of -1 year through +2 days compared to the current time. Any log message with a parsed timestamp outside of that window is automatically re-stamped with the current time.
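The window rule above can be illustrated with a minimal Python sketch. This is not Sumo Logic's implementation; the function name `effective_timestamp` is hypothetical, and treating one year as 365 days with inclusive bounds is an assumption made only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Window bounds taken from the article. Treating "1 year" as 365 days and
# the bounds as inclusive are assumptions made for this sketch.
MAX_PAST = timedelta(days=365)
MAX_FUTURE = timedelta(days=2)


def effective_timestamp(parsed: datetime, now: datetime) -> datetime:
    """Return the parsed timestamp if it is inside the window, else `now`."""
    if now - MAX_PAST <= parsed <= now + MAX_FUTURE:
        return parsed
    return now


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = now - timedelta(days=30)    # inside the window, kept as-is
too_old = now - timedelta(days=400)  # outside the window, re-stamped

print(effective_timestamp(recent, now) == recent)  # True
print(effective_timestamp(too_old, now) == now)    # True
```

A check like this can be run against a sample of your own data before uploading, to estimate how many messages would fall outside the window.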
Assumption: Data from a Source will have Similar Timestamps
Sumo Logic also assumes that log messages coming from a particular Source will all have timestamps that are close together. If a message comes through that appears to be more than one day earlier or later than recent messages from that Source, it will be auto-corrected to match the current time.
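The following Python sketch shows why mixing a backfill with live data in one Source can cause re-stamping. It is an illustration under stated assumptions, not the actual detection logic: the function name `source_timestamp` is hypothetical, and comparing each message against a single "last seen" timestamp per Source is a simplification of "recent messages from that Source".

```python
from datetime import datetime, timedelta, timezone

# One-day tolerance taken from the article. Comparing against a single
# "last seen" timestamp per Source is an assumption of this sketch.
MAX_DRIFT = timedelta(days=1)


def source_timestamp(parsed: datetime, last_seen: datetime, now: datetime) -> datetime:
    """Keep `parsed` if it is within one day of `last_seen`, else use `now`."""
    if abs(parsed - last_seen) > MAX_DRIFT:
        return now
    return parsed


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
last_seen = now - timedelta(minutes=5)  # the Source has been sending live data
live = now - timedelta(minutes=1)       # another live message
backfill = now - timedelta(days=90)     # a historical message in the same Source

print(source_timestamp(live, last_seen, now) == live)     # True
print(source_timestamp(backfill, last_seen, now) == now)  # True
```

In this model the 90-day-old message loses its original timestamp only because it arrives alongside live data, which is the situation a dedicated Source for historical data avoids.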
Best Practices for Working with Historical Data
Use the following tips for working with historical data:
- Try to avoid mixing old data and new data in the same Source. As a best practice, create dedicated Sources specifically for historical data.
- To ingest very old data (timestamps earlier than one year in the past), you will need to contact Sumo Logic Support. If no adjustment is made to your account, this data will be re-stamped with the current time.
- To avoid quota throttling during the initial upload of a historical backlog, break up your data and load it in chunks. Load the chunks in sequence, oldest to newest.
- Searching historical data is generally slower than searching more recent data.
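The chunking tip above can be prepared offline with a small helper. The Python sketch below only orders and batches records; it does not upload anything. The `Record` shape, the function name `chronological_chunks`, and the chunk size are all hypothetical choices for illustration, and a suitable chunk size depends on your account's ingest quota.

```python
from datetime import datetime, timezone
from typing import Iterator, List, Tuple

# Hypothetical record shape: (parsed timestamp, raw log line).
Record = Tuple[datetime, str]


def chronological_chunks(records: List[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Yield batches of records ordered from oldest to newest."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ordered = sorted(records, key=lambda record: record[0])
    for start in range(0, len(ordered), chunk_size):
        yield ordered[start:start + chunk_size]


records = [
    (datetime(2024, 3, 1, tzinfo=timezone.utc), "march"),
    (datetime(2024, 1, 1, tzinfo=timezone.utc), "january"),
    (datetime(2024, 2, 1, tzinfo=timezone.utc), "february"),
]
batches = list(chronological_chunks(records, chunk_size=2))
for batch in batches:
    print([line for _, line in batch])
```

Each batch can then be sent to a Source dedicated to historical data, one batch at a time, so that timestamps within the Source advance steadily from oldest to newest.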