rule/dependency
This check validates rule dependencies in two scenarios:
Removed dependencies
When running pint ci, this check validates that any removed recording rule isn’t still being used by other rules.
Removing any recording rule that is a dependency of other rules is likely to make them stop working, unless there’s some other source of the metric that was produced by the removed rule.
Example: consider these two rules, one generates down:count metric that is then used by the alert rule:
groups:
- name: ...
rules:
- record: down:count
expr: count(up == 0) by(job)
- alert: Job is down
expr: down:count > 0
If we were to edit this file and delete the recording rule:
groups:
- name: ...
rules:
- alert: Job is down
expr: down:count > 0
This would leave our alert rule broken because Prometheus would no longer have down:count metric.
This check tries to detect scenarios like this but works across all files.
Cross-group dependencies
This check warns when a rule uses a metric produced by a recording rule in a different group.
Each group is evaluated independently with no guaranteed execution order. During evaluation, recording rules write results to a transaction that is committed to TSDB at the end of the group run. Queries within the same group can see uncommitted results from earlier rules in that group, but queries in other groups can only see values that have been committed to TSDB.
In practice, if rule B depends on rule A, they should be in the same group. Otherwise, B will see a stale value of A from the previous evaluation cycle, adding up to one evaluation interval of lag.
Example:
groups:
- name: recordings
rules:
- record: error:rate5m
expr: rate(errors_total[5m])
- name: alerts
rules:
- alert: HighErrorRate
expr: error:rate5m > 0.1
The HighErrorRate alert uses error:rate5m from a different group. Both groups run at fixed intervals but start at different times. With a one-minute interval:
- 0s -
recordingsevaluateserror:rate5mand commits to TSDB. - 55s -
alertsquerieserror:rate5mand sees a 55-second-old value. - 60s -
recordingsruns again. - 115s -
alertsruns again, still seeing stale data.
Each time alerts runs, it sees a value that is already up to one interval old.
Moving both rules to the same group eliminates this lag:
groups:
- name: errors
rules:
- record: error:rate5m
expr: rate(errors_total[5m])
- alert: HighErrorRate
expr: error:rate5m > 0.1
- 0s -
errorsevaluateserror:rate5mand writes to the transaction. - 0s -
errorsevaluatesHighErrorRate, which sees the value from the previous step with no lag.
Now the alert always sees the freshly computed value.
Configuration
This check doesn’t have any configuration options.
How to enable it
This check is enabled by default.
How to disable it
You can disable this check globally by adding this config block:
checks {
disabled = ["rule/dependency"]
}
You can also disable it for all rules inside given file by adding a comment anywhere in that file. Example:
# pint file/disable rule/dependency
Or you can disable it per rule by adding a comment to it. Example:
# pint disable rule/dependency
How to snooze it
You can disable this check until given time by adding a comment to it. Example:
# pint snooze $TIMESTAMP rule/dependency
Where $TIMESTAMP is either use RFC3339 formatted or YYYY-MM-DD. Adding this comment will disable rule/dependency until $TIMESTAMP, after that check will be re-enabled.