Kafka alert has been disabled, but it keeps alerting


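Kafka is currently running normally; the only alert is for partitions with fewer than 3 replicas. Yesterday I changed the name of the alert rule and saved it, but the alerts that come out still use the previous rule name. Could someone take a look? Thanks.
[screenshot]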

The system log shows:
2023-08-10 19:59:00.830011 DEBUG process/process.go:288 rule_eval:alert-1-44 event:&{Id:0 Cate:prometheus Cluster:prometheus DatasourceId:1 GroupId:6 GroupName:kafka 主机组 Hash:8c566fab0fb812c8f72020e6111a9298 RuleId:44 RuleName:Risk of data loss - number of replicas less than 3 - exporter RuleNote: RuleProd:metric RuleAlgo: Severity:2 PromForDuration:120 PromQl: RuleConfig:{"inhibit":false,"queries":[{"prom_ql":"sum(kafka_topic_partition_in_sync_replica) by (topic) \u003c 3 ","severity":2}]} RuleConfigJson:map[inhibit:false queries:[map[prom_ql:sum(kafka_topic_partition_in_sync_replica) by (topic) < 3 severity:2]]] PromEvalInterval:30 Callbacks: CallbacksJSON:[] RunbookUrl: NotifyRecovered:1 NotifyChannels:wecom NotifyChannelsJSON:[wecom] NotifyGroups:1 NotifyGroupsJSON:[1] NotifyGroupsObj:[] TargetIdent: TargetNote: TriggerTime:1691668740 TriggerValue:1 Tags:rulename=Risk of data loss - number of replicas less than 3 - exporter,,service=kafka,,topic=carStatus TagsJSON:[rulename=Risk of data loss - number of replicas less than 3 - exporter service=kafka topic=carStatus] TagsMap:map[rulename:Risk of data loss - number of replicas less than 3 - exporter service:kafka topic:carStatus] Annotations:{} AnnotationsJSON:map[] IsRecovered:false NotifyUsersObj:[] LastEvalTime:1691668740 LastSentTime:0 NotifyCurNumber:0 FirstTriggerTime:0} fire

4 Answers

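Whether an alert is firing cannot be judged from this log. What matters is whether you are still receiving alert notifications. This log only says that the PromQL query did return data; further logic afterwards decides whether this event should actually be generated.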
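To see for yourself what the query returns, the expression from `RuleConfig` can be run directly against the Prometheus HTTP API. This is only a sketch: the Prometheus address is an assumption (substitute the one behind your datasource), and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# The expression from RuleConfig in the log line.
PROMQL = "sum(kafka_topic_partition_in_sync_replica) by (topic) < 3"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def topics_from_response(payload: dict) -> dict:
    """Map topic -> value for every series in an instant-vector response."""
    return {
        series["metric"].get("topic", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def query_topics(base_url: str, promql: str = PROMQL) -> dict:
    """Run the query; any topic returned here currently matches the rule."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return topics_from_response(json.load(resp))


# Example (hypothetical address):
# print(query_topics("http://127.0.0.1:9090"))
```

An empty result means the expression no longer matches any topic; a non-empty result only confirms what the debug log already says, namely that the query found data.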

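The rule update time in your screenshot is later than the time at which the rule was processed in the system log, so it is hard to judge your problem. Can you find evidence that the rule name was updated before the point in time at which the rule was processed?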
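To line up the two times, note that the `TriggerTime` field in the log is a Unix timestamp in seconds. A minimal sketch, assuming the server clock is UTC+8 (which is consistent with the log timestamps in this thread); `rule_updated_at` is a hypothetical value that you would read from the rule page:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server clock is UTC+8, as the log timestamps suggest.
CST = timezone(timedelta(hours=8))


def trigger_time_to_local(trigger_time: int) -> datetime:
    """Convert the TriggerTime field (Unix seconds) to a UTC+8 datetime."""
    return datetime.fromtimestamp(trigger_time, tz=CST)


def evaluated_after_update(trigger_time: int, rule_updated_at: datetime) -> bool:
    """True if the evaluation happened after the rule was last updated."""
    return trigger_time_to_local(trigger_time) > rule_updated_at


# TriggerTime taken from the first log line in the question.
print(trigger_time_to_local(1691668740).isoformat())  # 2023-08-10T19:59:00+08:00

# Hypothetical update time; replace with the value shown on the rule page.
rule_updated_at = datetime(2023, 8, 10, 20, 30, tzinfo=CST)
print(evaluated_after_update(1691668740, rule_updated_at))  # False
```

Only a log line for which this check returns `True` can show whether the new name took effect.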

Alerts are still being sent at the moment.
[screenshot]
[screenshot]

2023-08-11 11:28:29.559676 DEBUG process/process.go:288 rule_eval:alert-1-44 event:&{Id:0 Cate:prometheus Cluster:prometheus DatasourceId:1 GroupId:6 GroupName:kafka 主机组 Hash:335419ca17d2a1a1c862acaf87bb6626 RuleId:44 RuleName:Risk of data loss - number of replicas less than 3 - exporter RuleNote: RuleProd:metric RuleAlgo: Severity:2 PromForDuration:120 PromQl: RuleConfig:{"inhibit":false,"queries":[{"prom_ql":"sum(kafka_topic_partition_in_sync_replica) by (topic) \u003c 3 ","severity":2}]} RuleConfigJson:map[inhibit:false queries:[map[prom_ql:sum(kafka_topic_partition_in_sync_replica) by (topic) < 3 severity:2]]] PromEvalInterval:30 Callbacks: CallbacksJSON:[] RunbookUrl: NotifyRecovered:1 NotifyChannels:wecom NotifyChannelsJSON:[wecom] NotifyGroups:1 NotifyGroupsJSON:[1] NotifyGroupsObj:[] TargetIdent: TargetNote: TriggerTime:1691724509 TriggerValue:1 Tags:rulename=Risk of data loss - number of replicas less than 3 - exporter,,service=kafka,,topic=newRealTimeLocation TagsJSON:[rulename=Risk of data loss - number of replicas less than 3 - exporter service=kafka topic=newRealTimeLocation] TagsMap:map[rulename:Risk of data loss - number of replicas less than 3 - exporter service:kafka topic:newRealTimeLocation] Annotations:{} AnnotationsJSON:map[] IsRecovered:false NotifyUsersObj:[] LastEvalTime:1691724509 LastSentTime:0 NotifyCurNumber:0 FirstTriggerTime:0} fire
2023-08-11 11:28:29.559747 DEBUG process/process.go:288 rule_eval:alert-1-44 event:&{Id:0 Cate:prometheus Cluster:prometheus DatasourceId:1 GroupId:6 GroupName:kafka 主机组 Hash:89d445f6faa02c630e897e3edda1b445 RuleId:44 RuleName:Risk of data loss - number of replicas less than 3 - exporter RuleNote: RuleProd:metric RuleAlgo: Severity:2 PromForDuration:120 PromQl: RuleConfig:{"inhibit":false,"queries":[{"prom_ql":"sum(kafka_topic_partition_in_sync_replica) by (topic) \u003c 3 ","severity":2}]} RuleConfigJson:map[inhibit:false queries:[map[prom_ql:sum(kafka_topic_partition_in_sync_replica) by (topic) < 3 severity:2]]] PromEvalInterval:30 Callbacks: CallbacksJSON:[] RunbookUrl: NotifyRecovered:1 NotifyChannels:wecom NotifyChannelsJSON:[wecom] NotifyGroups:1 NotifyGroupsJSON:[1] NotifyGroupsObj:[] TargetIdent: TargetNote: TriggerTime:1691724509 TriggerValue:1 Tags:rulename=Risk of data loss - number of replicas less than 3 - exporter,,service=kafka,,topic=takeover_data_test TagsJSON:[rulename=Risk of data loss - number of replicas less than 3 - exporter service=kafka topic=takeover_data_test] TagsMap:map[rulename:Risk of data loss - number of replicas less than 3 - exporter service:kafka topic:takeover_data_test] Annotations:{} AnnotationsJSON:map[] IsRecovered:false NotifyUsersObj:[] LastEvalTime:1691724509 LastSentTime:0 NotifyCurNumber:0 FirstTriggerTime:0} fire
2023-08-11 11:28:29.649300 INFO dispatch/log.go:20 event(44ee73a6395f3b86f07b521214aab7de triggered) consume: rule_id=44 cluster:prometheus [rulename=Risk of data loss - number of replicas less than 3 - exporter service=kafka topic=vehicle_eventType_data]2@1691724509
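To check which rule name the engine is still using per topic, the relevant fields can be pulled out of the `process.go` debug lines. This is a rough sketch that assumes the layout of the lines quoted in this thread; the function name and the shortened sample line are for illustration only.

```python
import re

# Patterns for the fields of interest in the process.go debug line.
# Assumption: the layout matches the lines quoted in this thread.
RULE_ID = re.compile(r"RuleId:(\d+)")
RULE_NAME = re.compile(r"RuleName:(.*?) RuleNote:")
TOPIC = re.compile(r"topic=(\w+)")
TRIGGER_TIME = re.compile(r"TriggerTime:(\d+)")


def summarize(line: str) -> dict:
    """Extract rule id, rule name, topic and trigger time from one log line."""
    return {
        "rule_id": int(RULE_ID.search(line).group(1)),
        "rule_name": RULE_NAME.search(line).group(1),
        "topic": TOPIC.search(line).group(1),
        "trigger_time": int(TRIGGER_TIME.search(line).group(1)),
    }


# Shortened copy of a line from this thread (unrelated fields dropped).
sample = (
    "DEBUG process/process.go:288 rule_eval:alert-1-44 event:&{Id:0 "
    "RuleId:44 RuleName:Risk of data loss - number of replicas less than 3 "
    "- exporter RuleNote: RuleProd:metric TriggerTime:1691724509 "
    "Tags:rulename=Risk of data loss - number of replicas less than 3 - "
    "exporter,,service=kafka,,topic=newRealTimeLocation} fire"
)
print(summarize(sample))
```

If `rule_name` still shows the old name on lines whose `trigger_time` is later than the save, that output is worth posting together with the version number.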

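Could you also provide the following information:
1. A screenshot of the alert rule configuration page, with the browser address bar included
2. The Nightingale (夜莺) version number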