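As the screenshot of the alert event details shows:
The tags belong to the monitored instance 172.16.15.205,
while the note belongs to a different instance:
The tags and note for 172.16.15.205 are as follows: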
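Could the object's note have been modified after the alert fired, which would explain the discrepancy?
Nothing was changed at all. Please help investigate.
We would need to look at the logs, for example by searching for the keyword rule_eval.
I have pasted the relevant log below, please take a look. Replies here are limited to 600 characters.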
Jan 26 18:30:43 victorymetrics n9e: 2024-01-26 18:30:43.633223 DEBUG process/process.go:309 rule_eval:alert-3-3 event:xxx
Jan 26 18:30:43 victorymetrics n9e: 2024-01-26 18:30:43.633223 DEBUG process/process.go:309 rule_eval:alert-3-3 event:&{Id:0 Cate:prometheus Cluster:promxy-victoriametrics DatasourceId:3 GroupId:5 GroupName:运维-OPS Hash:5381c4436475ac2919056aa4cfcbbc58 RuleId:3 RuleName:CPU使用率大于等于-95% RuleNote: RuleProd:metric RuleAlgo: Severity:1 PromForDuration:240 PromQl:100 - (avg by (ident,company,business,env,project,service,host)(irate(node_cpu_seconds_total{mode="idle",service!~".-db|cdh|hadoop|clickhouse|bigdata-.|bi-baobiao-app",env="prod",project!~"bigdata",ident!~"10.0.102.22|10.0.101.167|10.100.57.18|10.102.2.13|10.102.2.52|10.102.2.40|10.100.61.24|10.0.0.228|10.0.102.26|10.102.2.16|10.0.101.52|10.0.0.241|10.0.103.56|10.0.100.119",attribute!="db",resourcetype="vm",project!="ci"}[5m]) )) * 100 >= 95 RuleConfig:{"inhibit":false,"queries":[{"prom_ql":"100 - (avg by (ident,company,business,env,project,service,host)(irate(node_cpu_seconds_total{mode="idle",service!~".-db|cdh|hadoop|clickhouse|bigdata-.|bi-baobiao-app",env="prod",project!~"bigdata",ident!~"10.0.102.22|10.0.101.167|10.100.57.18|10.102.2.13|10.102.2.52|10.102.2.40|10.100.61.24|10.0.0.228|10.0.102.26|10.102.2.16|10.0.101.52|10.0.0.241|10.0.103.56|10.0.100.119",attribute!="db",resourcetype="vm",project!="ci"}[5m]) )) * 100 \u003e= 95","severity":1}]} RuleConfigJson:map[inhibit:false queries:[map[prom_ql:100 - (avg by (ident,company,business,env,project,service,host)(irate(node_cpu_seconds_total{mode="idle",service!~".-db|cdh|hadoop|clickhouse|bigdata-.|bi-baobiao-app",env="prod",project!~"bigdata",ident!~"10.0.102.22|10.0.101.167|10.100.57.18|10.102.2.13|10.102.2.52|10.102.2.40|10.100.61.24|10.0.0.228|10.0.102.26|10.102.2.16|10.0.101.52|10.0.0.241|10.0.103.56|10.0.100.119",attribute!="db",resourcetype="vm",project!="ci"}[5m]) )) * 100 >= 95 severity:1]]] PromEvalInterval:60 Callbacks:https://xxxxxx CallbacksJSON:[https://xxxxx] RunbookUrl: NotifyRecovered:1 NotifyChannels: NotifyChannelsJSON:[] NotifyGroups: 
NotifyGroupsJSON:[] NotifyGroupsObj:[] TargetIdent:10.0.101.211 TargetNote:航天开票-第三方 TriggerTime:1706265043 TriggerValue:97.68333 Tags:business=xz,,company=dsl,,env=prod,,host=172.16.15.205,,ident=172.16.15.205,,project=qydx,,rulename=CPU使用率大于等于-95%,,service=app TagsJSON:[business=xz company=dsl env=prod host=172.16.15.205 ident=172.16.15.205 project=qydx rulename=CPU使用率大于等于-95% service=app] TagsMap:map[business:xz company:dsl env:prod host:172.16.15.205 ident:172.16.15.205 project:qydx rulename:CPU使用率大于等于-95% service:app] Annotations:{} AnnotationsJSON:map[] IsRecovered:false NotifyUsersObj:[] LastEvalTime:1706265043 LastEscalationNotifyTime:0 LastSentTime:0 NotifyCurNumber:0 FirstTriggerTime:0 ExtraConfig: Status:0 Claimant: SubRuleId:0} fire
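In the entry above, TargetIdent is 10.0.101.211 (with TargetNote 航天开票-第三方) while the ident tag is 172.16.15.205, so the note appears to come from a different target than the tags. A rough Python sketch for spotting this in other rule_eval entries is below. The function name `extract_target_fields` and the regular expressions are my own assumptions, based only on the field layout of this one log entry, not on any documented n9e log format.

```python
import re


def extract_target_fields(event_text):
    """Extract TargetIdent, TargetNote and the ident tag from the text of one event.

    Hypothetical helper: the patterns assume the layout seen in the pasted entry
    (TargetIdent:<v> TargetNote:<v> TriggerTime:... TagsMap:map[... ident:<v> ...]).
    Returns None if TargetIdent or the ident tag cannot be found.
    """
    target_ident = re.search(r"TargetIdent:(\S+)", event_text)
    # The note may contain spaces, so read up to the next known field name.
    target_note = re.search(r"TargetNote:(.*?) TriggerTime:", event_text)
    ident_tag = re.search(r"TagsMap:map\[[^\]]*?\bident:([^\s\]]+)", event_text)
    if not target_ident or not ident_tag:
        return None
    return {
        "target_ident": target_ident.group(1),
        "target_note": target_note.group(1) if target_note else "",
        "ident_tag": ident_tag.group(1),
        # True when the note was looked up for a different target than the tags describe.
        "mismatch": target_ident.group(1) != ident_tag.group(1),
    }


if __name__ == "__main__":
    # Shortened copy of the fields from the entry pasted in this thread.
    sample = (
        "rule_eval:alert-3-3 event:&{TargetIdent:10.0.101.211 "
        "TargetNote:航天开票-第三方 TriggerTime:1706265043 "
        "TagsMap:map[business:xz host:172.16.15.205 ident:172.16.15.205 "
        "project:qydx] Annotations:{}} fire"
    )
    print(extract_target_fields(sample))
```

Running each event from the n9e log through this helper and keeping only the results where `mismatch` is true should show whether the problem is limited to this one rule or affects others as well.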
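Please check the logs again: when was the note for 172.16.15.205 changed to what it shows now? My current guess is that it was changed after the alert was sent.
How can I view the modification history? As far as I remember, that change was made a very long time ago, several months back.
How about editing the current note once more and trying to trigger the alert again? The behavior you describe does not quite match my understanding :)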