helm一键安装夜莺,如何监控pod状态异常报警呢?
可以试试这个Promql,没记错的话,Promql中的指标是通过KSM采集的:
(sum(kube_pod_container_status_last_terminated_reason{reason=~\"Error|OOMKilled\"}) by (namespace,pod,container) >0)
* on(namespace,pod,container)
sum(increase(kube_pod_container_status_restarts_total) >0) by(namespace,pod,container)