alerts/count
This check estimates how many times a given alert would fire. It runs the expr query from every alerting rule against the selected Prometheus servers and reports how many unique alerts it would generate. If for and/or keep_firing_for are set on an alert, they are used to adjust the results.
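As an illustration, here is a sketch of an alerting rule showing the three fields this check reads. The alert name, metric, threshold, and durations are hypothetical and only serve as an example.

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate  # hypothetical alert name
        # expr is the query the check runs against each selected Prometheus server
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (job) > 10
        # the condition must hold for this long before the alert fires
        for: 10m
        # the alert keeps firing for this long after the condition clears
        keep_firing_for: 5m
```

With a rule like this, each distinct job label returned by expr could produce its own alert, which is what the estimate counts.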
Configuration
Syntax:
alerts {
  range    = "1h"
  step     = "1m"
  resolve  = "5m"
  minCount = 0
  comment  = "..."
  severity = "bug|warning|info"
}
range - query range, how far to look back. 1h would mean that pint will query the last 1h of metrics. Defaults to 1d.
step - query resolution. For the most accurate result use a step equal to scrape_interval; use a larger step (lower resolution) if that would load too many samples. Defaults to 1m.
resolve - duration after which stale alerts are resolved. Defaults to 5m.
minCount - minimal number of alerts for this check to report it. Defaults to 0. Set this to a non-zero value if you want this check to report only when the estimated number of alerts is high enough.
comment - set a custom comment that will be added to reported problems.
severity - set a custom severity for reported issues. Defaults to info. This can only be set when minCount is set to a non-zero value.
How to enable it
This check is not enabled by default because it requires explicit configuration to work. To enable it, add one or more prometheus {...} blocks and a rule {...} block with this check's config.
Example:
prometheus "prod" {
  uri     = "https://prometheus-prod.example.com"
  timeout = "60s"
}
rule {
  alerts {
    range   = "1d"
    step    = "1m"
    resolve = "5m"
  }
}
Report an error if there would be too many (>=50) alerts firing:
prometheus "prod" {
  uri     = "https://prometheus-prod.example.com"
  timeout = "60s"
}
rule {
  alerts {
    range    = "1d"
    step     = "1m"
    resolve  = "5m"
    minCount = 50
    comment  = "You cannot add a rule that would immediately fire 50+ alerts, fix the problem first"
    severity = "bug"
  }
}
How to disable it
You can disable this check globally by adding this config block:
checks {
  disabled = ["alerts/count"]
}
You can also disable it for all rules inside a given file by adding a comment anywhere in that file. Example:
# pint file/disable alerts/count
Or you can disable it per rule by adding a comment to it. Example:
# pint disable alerts/count
If you want to disable only individual instances of this check you can add a more specific comment.
# pint disable alerts/count($prometheus)
Where $prometheus is the name of the Prometheus server to disable.
Example:
# pint disable alerts/count(prod)
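To show the comment in context, here is a sketch of a rule file that disables this check for the "prod" server on a single rule only. The alert name and expression are hypothetical, and placing the comment directly above the rule is an assumption about what "adding a comment to it" looks like in practice.

```yaml
groups:
  - name: example
    rules:
      # pint disable alerts/count(prod)
      - alert: InstanceDown  # hypothetical alert name
        expr: up == 0
        for: 5m
```

Other rules in the same file, and any other configured Prometheus servers, would still be checked.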
How to snooze it
You can disable this check until a given time by adding a comment to it. Example:
# pint snooze $TIMESTAMP alerts/count
Where $TIMESTAMP is either an RFC3339 formatted timestamp or a YYYY-MM-DD date. Adding this comment will disable alerts/count until $TIMESTAMP, after which the check will be re-enabled.
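As a sketch, the two accepted timestamp forms could look like this on two rules. The dates, alert names, and expressions are hypothetical, and the placement of each comment above its rule is an assumption.

```yaml
groups:
  - name: example
    rules:
      # Date-only form (YYYY-MM-DD)
      # pint snooze 2030-01-31 alerts/count
      - alert: InstanceDown  # hypothetical alert name
        expr: up == 0

      # RFC3339 form
      # pint snooze 2030-01-31T00:00:00Z alerts/count
      - alert: TargetMissing  # hypothetical alert name
        expr: absent(up{job="api"})
```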