Nightingale 5.0: data writes fail because of timestamps


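On Nightingale 5.0, the Nightingale server (n9e server) fails to write to Prometheus. Troubleshooting showed that the server's system time was 6s behind the actual time. After correcting the system time and restarting both Prometheus and n9e server, the errors still continue: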
2023-06-19 17:35:45.511959 WARNING writer/writer.go:67 example timeseries:labels:<name:"name" value:"net_packets_recv" > labels:<name:"cloudpods_id" value:"7e1bb184-f029-4be2-8469-e8af7fa009c5" > labels:<name:"id" value:"p-hfidc-t4-28" > labels:<name:"host_name" value:"p-hfidc-t4-28" > labels:<name:"interface" value:"cali0bcbf516ad9" > labels:<name:"ident" value:"10.161.17.21" > labels:<name:"cloud" value:"cloudpods" > labels:<name:"region" value:"hf-idc" > samples:<value:2.5354865e+07 timestamp:1687167345092 >
2023-06-19 17:35:45.512164 WARNING writer/writer.go:66 post to http://127.0.0.1:9090/api/v1/write got error: push data with remote write request got status code: 400, response body: out of bounds

Is there any other way to recover from this?
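
One quick check is to decode the `timestamp` field of the example sample and compare it with the time on the log line. The sketch below does that for the values in the log above. It assumes the log time is in UTC+8, which the post does not state, so treat the result as indicative only.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log lines above.
sample_ts_ms = 1687167345092  # "timestamp" of the example sample, in milliseconds

# ASSUMPTION: the n9e log time is local time in UTC+8 (not stated in the post).
assumed_log_tz = timezone(timedelta(hours=8))
log_time = datetime(2023, 6, 19, 17, 35, 45, 511959, tzinfo=assumed_log_tz)

# Convert the sample timestamp to an aware datetime and compare.
sample_time = datetime.fromtimestamp(sample_ts_ms / 1000, tz=timezone.utc)
skew_seconds = (log_time - sample_time).total_seconds()

print("sample time:", sample_time.astimezone(assumed_log_tz).isoformat())
print("log time   :", log_time.isoformat())
print("log time minus sample time (s):", round(skew_seconds, 3))
```

Under that timezone assumption, the printed example sample is less than a second older than the log line. That would suggest the sample shown in the log is not necessarily the one Prometheus rejected, and that another series in the same request may carry the bad timestamp.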

1 Answer

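You can try upgrading to v6.0.0.ga12. The new version has an optimization: data from different machines is no longer mixed into a single batch sent to Prometheus. Prometheus has one very annoying behavior when it receives remote write data: if a single sample in a batch has a bad timestamp, it drops the entire batch.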
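
To illustrate why splitting batches by machine helps, here is a toy sketch. It is not n9e's code and not Prometheus's actual bounds check: `accepts`, `group_by_ident`, `count_written`, the host names, and the `min_valid_ms` cutoff are all hypothetical, chosen only to model "one bad sample rejects the whole batch".

```python
from collections import defaultdict

def accepts(batch, min_valid_ms):
    """Toy receiver: rejects the whole batch if any sample is older than the cutoff."""
    return all(s["timestamp_ms"] >= min_valid_ms for s in batch)

def group_by_ident(samples):
    """Split samples into one batch per reporting machine (the `ident` label)."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["ident"]].append(s)
    return list(groups.values())

def count_written(batches, min_valid_ms):
    """Count the samples that belong to accepted batches."""
    return sum(len(b) for b in batches if accepts(b, min_valid_ms))

# Hypothetical data: host-b has a slow clock, so its timestamp is too old.
samples = [
    {"ident": "host-a", "timestamp_ms": 1_000_000, "value": 1.0},
    {"ident": "host-a", "timestamp_ms": 1_000_015, "value": 2.0},
    {"ident": "host-b", "timestamp_ms": 900_000, "value": 3.0},
]
min_valid_ms = 950_000  # hypothetical cutoff used by the toy receiver

mixed_written = count_written([samples], min_valid_ms)                # one mixed batch
split_written = count_written(group_by_ident(samples), min_valid_ms)  # one batch per host

print("mixed batch :", mixed_written, "samples written")
print("per-host    :", split_written, "samples written")
```

In this model the mixed batch loses every sample, while per-host batching loses only the samples from the machine with the bad clock.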

Is my question (https://answer.flashcat.cloud/questions/10010000000002808) the same kind of issue as this one? Do I need to upgrade n9e? I'm currently on v6.0.0-ga.11. @ulricqin