As your production environment grows and more components are added, it becomes harder to identify the moment when things go wrong.
Static alerts used to be the only solution to this type of problem: users would set an alert to fire when the number of errors exceeded a certain threshold within a certain timeframe. The problem with this approach is that it is very sensitive to traffic spikes and to the nature of your production environment, so it creates many false-positive alerts while missing real events.
Coralogix’s Volume Anomaly introduces an automatic model that learns the ratio of error logs and bad API responses across all of your environments, components, and services, for the different hours of the day and days of the week, so you get pinpointed insight into any abnormal error behavior across your stack.
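To make the contrast with static alerts concrete, here is a minimal sketch in Python. It is not Coralogix's actual model, which is not described here. The names `static_alert` and `RatioBaseline`, the three-sigma rule, and the `min_spread` floor are all assumptions chosen for illustration. The sketch keeps a history of error ratios per component, weekday, and hour, and flags a new observation only when its ratio is far above that history.

```python
from collections import defaultdict
from statistics import mean, pstdev


def static_alert(error_count, threshold=100):
    """Classic static alert: fires when the raw error count exceeds a fixed threshold."""
    return error_count > threshold


class RatioBaseline:
    """Toy baseline (illustrative only): error-ratio history per (component, weekday, hour)."""

    def __init__(self):
        self._history = defaultdict(list)

    def observe(self, component, weekday, hour, errors, total):
        """Record one past observation of errors out of total logs."""
        if total > 0:
            self._history[(component, weekday, hour)].append(errors / total)

    def is_anomalous(self, component, weekday, hour, errors, total,
                     sigmas=3.0, min_samples=4, min_spread=0.005):
        """Flag the observation if its error ratio is far above the learned ratio."""
        samples = self._history.get((component, weekday, hour), [])
        if total == 0 or len(samples) < min_samples:
            return False  # not enough history to judge
        # Floor the spread so a very flat history does not flag tiny noise.
        spread = max(pstdev(samples), min_spread)
        return errors / total > mean(samples) + sigmas * spread


# Hypothetical history: about 1% errors for "checkout" on Mondays at 14:00.
baseline = RatioBaseline()
for errors, total in [(10, 1000), (12, 1000), (9, 1000), (11, 1000)]:
    baseline.observe("checkout", 0, 14, errors, total)

# Traffic spike: ten times the traffic, same error ratio.
spike_static = static_alert(110)
spike_ratio = baseline.is_anomalous("checkout", 0, 14, 110, 10000)

# Real incident: normal traffic, 8% errors.
incident_static = static_alert(80)
incident_ratio = baseline.is_anomalous("checkout", 0, 14, 80, 1000)
```

In this example the static threshold fires on the harmless traffic spike and stays silent during the real incident, while the ratio-based check does the opposite.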
Volume Anomalies also draw on our ability to understand your error types, version upgrades, and log severity to deliver an automatic analysis of the reason for the error spike, pointing out the suspected error types and suspected versions that caused the issue.
Manual:
1) In your Insights timeline, look for the Volume Anomaly event.
2) The “Error volume anomaly” tab contains:
The model of normal error behavior vs. the currently observed behavior during the specified anomaly timeframe.
Suspected Errors – errors generated by that same component that occurred more often than is normal for this time of day during the volume anomaly timeframe.
Top Errors – the most common errors from that component during the anomaly timeframe.
Newly Introduced Errors – errors that occurred in that component for the first time during the anomaly timeframe.
3) The Logs and Loggregation tabs contain all error logs generated by the anomalous component during the anomaly timeframe, along with their templates.
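The three error lists in the “Error volume anomaly” tab can be pictured with a short Python sketch. This is an illustration only, not how Coralogix computes them. The function `summarize_errors`, its parameters, and the `factor` cutoff for "more than normal" are hypothetical, and the sample data is made up.

```python
from collections import Counter


def summarize_errors(anomaly_errors, baseline_counts, known_templates,
                     factor=2.0, top_n=3):
    """Group errors seen during an anomaly window into three illustrative lists.

    anomaly_errors:  error templates observed during the anomaly timeframe
    baseline_counts: typical count per template for this time of day (assumed input)
    known_templates: templates already seen before the anomaly timeframe
    """
    observed = Counter(anomaly_errors)
    # Top Errors: the most common templates in the window.
    top = [template for template, _ in observed.most_common(top_n)]
    # Newly Introduced Errors: templates never seen before the window.
    new = sorted(t for t in observed if t not in known_templates)
    # Suspected Errors: known templates well above their usual count.
    suspected = sorted(
        t for t, count in observed.items()
        if t in known_templates and count > factor * baseline_counts.get(t, 0)
    )
    return {"suspected": suspected, "top": top, "new": new}


# Made-up sample data for one component.
window = (["timeout"] * 40 + ["db connection refused"] * 25
          + ["null pointer"] * 5 + ["cache miss"] * 3)
usual = {"timeout": 35, "db connection refused": 2, "cache miss": 4}
seen_before = {"timeout", "db connection refused", "cache miss"}

summary = summarize_errors(window, usual, seen_before)
```

With this sample data, "timeout" is the top error but is not suspected because it is close to its usual count. "db connection refused" is suspected, and "null pointer" is newly introduced.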
Note that you also get an email notification whenever a volume anomaly occurs. You can modify the components on which you want to be alerted in our notification settings.