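Nightingale v6 ga4: a host collected by categraf shows up under Infrastructure, but it is missing from the monitoring dashboards and Prometheus returns no data for it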


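Good morning,

I have opened ports 9090 and 17000 on the firewall, but the host only shows up under Infrastructure; there is no data anywhere else.
Any suggestions on how to troubleshoot this? Or is it still an iptables port problem?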


9 Answers

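Found the problem. In the test deployment Prometheus was installed with yum install prometheus, but in production it was installed from a tarball, and the startup arguments were missing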
--enable-feature=remote-write-receiver
I saw this in the n9e log:

Apr 23 12:18:49 gaia_pro_nightingale n9e[27773]: 2023-04-23 12:18:49.205002 WARNING writer/writer.go:72 post to http://172.22.204.190:9090/api/v1/write got error: push data with remote write request got status code: 404, response body: remote write receiver needs to be enabled with --enable-feature=remote-write-receiver
Apr 23 12:18:49 gaia_pro_nightingale n9e[27773]: 2023-04-23 12:18:49.205055 WARNING writer/writer.go:73 example timeseries:labels:<name:"__name__" value:"categraf_go_memstats_lookups_total" > labels:<name:"env" value:"prod_db" > labels:<name:"ident" value:"gaia_pro_db_bcsj1" > labels:<name:"busigroup" value:"gaia_mysql_srm" > samples:<timestamp:1682223528538 >
Apr 23 12:19:04 gaia_pro_nightingale n9e[27773]: 2023-04-23 12:19:04.020335 WARNING writer/writer.go:72 post to http://172.22.204.190:9090/api/v1/write got error: push data with remote write request got status code: 404, response body: remote write receiver needs to be enabled with --enable-feature=remote-write-receiver
Apr 23 12:19:04 gaia_pro_nightingale n9e[27773]: 2023-04-23 12:19:04.020384 WARNING writer/writer.go:73 example timeseries:labels:<name:"__name__" value:"processes_zombies" > labels:<name:"env" value:"prod_db" > labels:<name:"ident" value:"gaia_pro_db_bcsj1" > labels:<name:"busigroup" value:"gaia_mysql_srm" > samples:<timestamp:1682223543535 >

After adding the startup flag and restarting Prometheus, everything works. Thanks, everyone!
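For anyone hitting the same symptom, here is a minimal sketch of a filter that pulls the rejecting remote-write targets out of n9e log lines like the ones above. The function name, the log location, and the install paths in the comments are assumptions for illustration, not anything n9e or Prometheus ships.

```shell
#!/bin/sh
# Reads log lines on stdin and prints each distinct URL that n9e reported
# as rejecting pushes because the remote write receiver is disabled.
# (Hypothetical helper name; the matched message text is taken from the log above.)
rejected_remote_write_targets() {
  grep 'remote write receiver needs to be enabled' \
    | grep -o 'post to [^ ]*' \
    | awk '{ print $3 }' \
    | sort -u
}

# Example, assuming the n9e lines land in /var/log/messages:
#   grep n9e /var/log/messages | rejected_remote_write_targets
#
# The fix from this thread, with hypothetical install paths:
#   /opt/prometheus/prometheus \
#     --config.file=/opt/prometheus/prometheus.yml \
#     --enable-feature=remote-write-receiver
```

If the filter prints a URL, that Prometheus instance is reachable but is refusing writes, so the firewall is probably not the cause.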

Try running a query for it on the instant query page.
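The same check can be run against the Prometheus HTTP API directly. A rough sketch follows; the helper name is made up, and the address and ident value are simply the ones that appear in this thread's logs.

```shell
#!/bin/sh
# Builds a PromQL expression that counts all series carrying the given
# ident label (hypothetical helper; categraf attaches ident to every series
# in the example timeseries shown in the n9e log).
ident_series_count_query() {
  printf 'count({ident="%s"})' "$1"
}

# Example, assuming Prometheus listens on 172.22.204.190:9090:
#   curl -sG 'http://172.22.204.190:9090/api/v1/query' \
#     --data-urlencode "query=$(ident_series_count_query gaia_pro_db_bcsj1)"
```

An empty result list in the response means Prometheus has stored nothing for that host, which points at the write path rather than at the dashboards.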

If there is no data in Prometheus, try lifting the firewall restrictions first.

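1. v6 architecture overview: https://flashcat.cloud/blog/nightingale-v6-arch/ . You can also follow the SRETalk video channel, which has architecture walkthroughs, or join the Nightingale Huangpu Camp QQ group (group number: 479290895); recordings of past live streams are in the group's shared files.
2. categraf needs a correctly configured writers section to push data to the server. Please post the writer section of your categraf config.toml.
3. If categraf fails to push data it logs the failure, so check its log for clues.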
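To make points 2 and 3 concrete, here is a small sketch that lists the url values under the [[writers]] sections of a categraf config.toml, so each one can be probed from the categraf host. The function name and the config path in the comments are assumptions; the parsing is a simple line-based approximation, not a full TOML parser.

```shell
#!/bin/sh
# Prints the url value of every [[writers]] section in the given TOML file.
# url keys in other sections are ignored.
writer_urls() {
  awk '
    # Any section header switches the flag on only if it is [[writers]].
    /^[ \t]*\[/ { in_writers = ($0 ~ /^[ \t]*\[\[writers\]\]/) }
    # Inside [[writers]], print the first double-quoted string on a url line.
    in_writers && /^[ \t]*url[ \t]*=/ {
      if (match($0, /"[^"]*"/)) print substr($0, RSTART + 1, RLENGTH - 2)
    }
  ' "$1"
}

# Example, assuming categraf lives in /opt/categraf:
#   for u in $(writer_urls /opt/categraf/conf/config.toml); do
#     # Any HTTP status code means the port is reachable; 000 means it is not.
#     curl -s -o /dev/null -w "$u -> %{http_code}\n" -X POST "$u"
#   done
```

An empty POST is not a valid remote-write request, so an error status is expected here; the probe only separates "port blocked" from "server answered".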

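Speaking from my own beginner-level mistakes: I once had this symptom because I had misconfigured n9e, and the error was visible in the n9e log. For the specifics, though, follow ulricqin's advice: understand the overall architecture first, then go through each component's logs and configuration looking for clues. If nothing turns up, post your configuration and logs so that people here can help you find the problem more easily.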

The categraf log looks normal:

cat messages-20230423 | grep categraf
Apr 21 16:26:35 gaia_pro_db_bcsj1 rz[220633]: [root] categraf.tar.gz/ZMODEM: error: zgethdr returned -1
Apr 21 16:26:35 gaia_pro_db_bcsj1 rz[220633]: [root] categraf.tar.gz/ZMODEM: error
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 I! tracing disabled
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 main.go:128: I! runner.binarydir: /opt/categraf
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 main.go:129: I! runner.hostname: gaia_pro_db_bcsj1
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 main.go:130: I! runner.fd_limits: (soft=65536, hard=65536)
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 main.go:131: I! runner.vm_limits: (soft=unlimited, hard=unlimited)
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 provider.go:69: I! use input provider: [local]
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 traces_agent.go:19: I! traces agent disabled!
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 prometheus_agent.go:19: I! prometheus scraping disabled!
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 ibex_agent.go:19: I! ibex agent disabled!
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 agent.go:39: I! agent starting
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:155: E! input: local.arp_packet not supported
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.conntrack started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.cpu started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.disk started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.diskio started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.greenplum started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.ipvs started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:155: E! input: local.jolokia_agent_kafka not supported
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:155: E! input: local.jolokia_agent_misc not supported
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.kernel started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.kernel_vmstat started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.linux_sysctl_fs started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.mem started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.mysql started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.net started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.netstat started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.nfsclient started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:155: E! input: local.oracle not supported
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.processes started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.self_metrics started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.sockstat started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 metrics_agent.go:209: I! input: local.system started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 agent.go:47: I! [*agent.MetricsAgent] started
Apr 21 16:55:42 gaia_pro_db_bcsj1 categraf: 2023/04/21 16:55:42 agent.go:50: I! agent started

The writer points to n9e on port 17000:

[writer_opt]
batch = 1000
chan_size = 1000000

[[writers]]
url = "http://172.22.204.189:17000/prometheus/v1/write"

# Basic auth username
basic_auth_user = ""

# Basic auth password
basic_auth_pass = ""

Check System Configuration > Data Sources. My guess is that no data source has been added.