categraf Prometheus plugin failure


Every so often, the categraf Prometheus plugin stops collecting the metrics found through Prometheus auto-discovery, and only a restart brings it back.
Version: v0.2.11-909d5e2e7b3cbe830bdfcaa89bfbcc5bbdb93e64

Error log:
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.468Z caller=prometheus.go:659 level=warn msg="Received broken pipe"
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.468Z caller=prometheus.go:688 level=info msg="Stopping scrape discovery manager..."
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.468Z caller=prometheus.go:702 level=info msg="Stopping notify discovery manager..."
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.468Z caller=prometheus.go:724 level=info msg="Stopping scrape manager..."
Mar 31 14:11:11 prodjenkins-ops-01 categraf: 2023/03/31 14:11:11 main.go:94: I! received signal: broken pipe
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.474Z caller=prometheus.go:684 level=info msg="Scrape discovery manager stopped"
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.522Z caller=prometheus.go:698 level=info msg="Notify discovery manager stopped"
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.534Z caller=prometheus.go:718 level=info msg="Scrape manager stopped"
Mar 31 14:11:11 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:11.594Z caller=dedupe.go:112 component=remote level=info remote_name=39b83e url=http://192.168.8.121:19000/prometheus/v1/write msg="Stopping remote storage..."
Mar 31 14:11:12 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:12.601Z caller=dedupe.go:112 component=remote level=info remote_name=39b83e url=http://192.168.8.121:19000/prometheus/v1/write msg="WAL watcher stopped" queue=39b83e
Mar 31 14:11:12 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:12.601Z caller=dedupe.go:112 component=remote level=info remote_name=39b83e url=http://192.168.8.121:19000/prometheus/v1/write msg="Stopping metadata watcher..."
Mar 31 14:11:12 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:12.601Z caller=dedupe.go:112 component=remote level=info remote_name=39b83e url=http://192.168.8.121:19000/prometheus/v1/write msg="Scraped metadata watcher stopped"
Mar 31 14:11:17 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:17.796Z caller=dedupe.go:112 component=remote level=info remote_name=39b83e url=http://192.168.8.121:19000/prometheus/v1/write msg="Remote storage stopped."
Mar 31 14:11:17 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:17.796Z caller=notifier.go:608 level=info component=notifier msg="Stopping notification manager..."
Mar 31 14:11:17 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:17.800Z caller=prometheus.go:876 level=info msg="Notifier manager stopped"
Mar 31 14:11:17 prodjenkins-ops-01 categraf: ts=2023-03-31T06:11:17.800Z caller=prometheus.go:889 level=info msg="See you next time!"
Mar 31 14:16:28 prodjenkins-ops-01 systemd-logind: Removed session 28.
Mar 31 14:16:28 prodjenkins-ops-01 systemd: Removed slice User Slice of curl.

1 Answer

It's managed by systemd, right?

It received a SIGPIPE signal:

Received broken pipe

Yes, it runs under systemd. Is there a way to fix this?

Which version are you using? Upgrade to a newer release; handling for the SIGPIPE signal has been added there.

Version: v0.2.11-909d5e2e7b3cbe830bdfcaa89bfbcc5bbdb93e64