Articles published by OpsDaiLou

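Architecture:

Grafana + Prometheus + third-party exporters

  • Grafana: queries the data and presents the monitored items on dashboard panels in one place.
  • Prometheus: configured with separate jobs to scrape metrics from multiple sources, which Grafana then queries and displays.
  • Third-party exporters: collect the relevant data for Prometheus to scrape and store. For example: node_exporter, blackbox_exporter, redis_exporter, postgres_exporter, nginx-module-vts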


Promtail error

level=warn ts=2025-05-07T06:44:22.715765487Z caller=client.go:369 component=client host=local-loki:3100 msg="error sending batch, will retry" status=500 error="server returned HTTP status 500 Internal Server Error (500): rpc error: code = DeadlineExceeded desc = context deadline exceeded"

Check the Loki logs

level=info ts=2025-05-07T06:44:11.830518711Z caller=checkpoint.go:615 msg="starting checkpoint"
level=info ts=2025-05-07T06:44:11.830788697Z caller=checkpoint.go:340 msg="attempting checkpoint for" dir=/data/wal/checkpoint.1237659
level=warn ts=2025-05-07T06:44:20.643216928Z caller=logging.go:86 traceID=1f665e71538b4138 orgID=fake msg="POST /loki/api/v1/push (500) 5.012615543s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 266310; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; "
level=warn ts=2025-05-07T06:44:21.530545119Z caller=logging.go:86 traceID=1e71d2fb0b8fa807 orgID=fake msg="POST /loki/api/v1/push (500) 5.289903404s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 217308; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; "
level=warn ts=2025-05-07T06:44:22.441075158Z caller=logging.go:86 traceID=646a00c6b84f9eff orgID=fake msg="POST /loki/api/v1/push (500) 5.100367208s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 265673; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; "
level=warn ts=2025-05-07T06:44:22.448904508Z caller=logging.go:86 traceID=7c39e6b8a70cd408 orgID=fake msg="POST /loki/api/v1/push (500) 5.31823812s Response: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\\n\" ws: false; Connection: close; Content-Length: 267119; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; "
level=info ts=2025-05-07T06:45:04.018620273Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2025-05-07T06:45:04.018708423Z caller=index_set.go:86 msg="uploading table index_20215"

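"context deadline exceeded" means the request ran past the deadline (timeout) set on its context.

Every failed request took just over 5s, which suggests that a configured timeout is being hit.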
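The 5s observation can be checked by extracting the request durations from the Loki request log. The following is only a minimal sketch: the regular expression and the function name `push_durations` are illustrative assumptions, and the sample lines are abbreviated copies of the log entries shown above.

```python
import re

# Matches the status code and duration in Loki's request log, e.g.
# 'POST /loki/api/v1/push (500) 5.012615543s'
PUSH_RE = re.compile(r"POST /loki/api/v1/push \((\d{3})\) ([0-9.]+)s")


def push_durations(lines):
    """Return a (status, seconds) tuple for every push request found in lines."""
    results = []
    for line in lines:
        match = PUSH_RE.search(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results


# Abbreviated samples taken from the Loki log above.
SAMPLE = [
    'msg="POST /loki/api/v1/push (500) 5.012615543s Response: ..."',
    'msg="POST /loki/api/v1/push (500) 5.289903404s Response: ..."',
    'msg="POST /loki/api/v1/push (500) 5.100367208s Response: ..."',
    'msg="POST /loki/api/v1/push (500) 5.31823812s Response: ..."',
    'msg="uploading tables"',  # unrelated line, ignored by the regex
]

if __name__ == "__main__":
    for status, seconds in push_durations(SAMPLE):
        print(status, f"{seconds:.3f}s")
```

All four failed pushes land slightly above 5 seconds, which is consistent with a 5s timeout plus a little request overhead.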

Fix

Reference: https://github.com/grafana/loki/issues/6182#issuecomment-1695993787

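# The ingester_client block configures how the distributor connects to the ingesters. It only applies when running all components, the distributor, or the querier.
ingester_client:
  remote_timeout: 10s # remote request timeout on the client side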

Default configuration

# The remote request timeout on the client side.
# CLI flag: -ingester.client.timeout
[remote_timeout: <duration> | default = 5s]

The Promtail logs contain the error Per stream rate limit exceeded (limit: 3MB/sec). Loki's per-stream maximum byte rate was exceeded, which can cause data loss and abnormal query results.

Promtail error log

level=warn ts=2025-05-06T03:09:37.961274973Z caller=client.go:369 component=client host=local-loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): entry with timestamp 2025-05-06 03:09:37.873760873 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream '{agent=\"promtail\", filename=\"/opt/test_service/logs/info.log\", hostname=\"test-001\", job=\"service_logs\", service=\"test_service\"}' totaling 494791B, consider splitting a stream via additional labels or contact your Loki administrator to see if the limit can be increased' for stream: {agent=\"promtail\", filename=\"/opt/test_service/logs/info.log\", hostname=\"test-001\", job=\"service_logs\", service=\"test_service\"},"
level=warn ts=2025-05-06T03:09:39.610424292Z caller=client.go:369 component=client host=local-loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): entry with timestamp 2025-05-06 03:09:39.552467567 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream '{agent=\"promtail\", filename=\"/opt/test_service/logs/info.log\", hostname=\"test-001\", job=\"service_logs\", service=\"test_service\"}' totaling 2404B, consider splitting a stream via additional labels or contact your Loki administrator to see if the limit can be increased' for stream: {agent=\"promtail\", filename=\"/opt/test_service/logs/info.log\", hostname=\"test-001\", job=\"service_logs\", service=\"test_service\"},"


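Problem description

When querying logs from the Loki data source in Grafana, the results for certain time ranges did not match the logs in the actual files.

Investigation showed that Promtail was logging the error server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 41943040 bytes/sec). The ingestion rate had exceeded the limit configured in Loki, so requests were being rate limited. The Loki configuration was therefore adjusted to resolve this.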
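The limit in the error message is reported in bytes per second, while Loki's `ingestion_rate_mb` setting under `limits_config` is expressed in MB. A small conversion makes the two easier to compare. This is only a sketch: the helper name `bytes_to_mib` is hypothetical, and it assumes 1 MB here means 1024 * 1024 bytes.

```python
BYTES_PER_MIB = 1024 * 1024  # assumption: 1 MB is treated as 1 MiB


def bytes_to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / BYTES_PER_MIB


# Limit reported in the Promtail error above.
REPORTED_LIMIT = 41943040

if __name__ == "__main__":
    print(bytes_to_mib(REPORTED_LIMIT))  # 40.0
```

Under that assumption the reported limit corresponds to 40 MB/sec, which is the value to compare against when raising the limit.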


Promtail sends data to Loki through Nginx and received the following error

level=error ts=2025-05-08T01:21:05.955472787Z caller=client.go:380 component=client host=local-loki:3100 msg="final error sending batch" status=413 error="server returned HTTP status 413 Request Entity Too Large (413): <html>"

The Nginx logs show

2025/05/08 09:21:06 [error] 21#21: *62769236 client intended to send too large body: 1312028 bytes, client: 10.40.30.155, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "local-loki:3100"
 - | 10.40.30.155 | - | 08/May/2025:09:21:06 +0800 | POST /loki/api/v1/push HTTP/1.1 | 413 | 183 | - | promtail/2.6.1 | - | 0.032 | - | - | - | local-loki:3100

The request exceeded Nginx's maximum request body size, so Nginx rejected it with a 413 (Request Entity Too Large) error.
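The rejected body size can be compared against the Nginx limit directly. The sketch below assumes the proxy was still using Nginx's default `client_max_body_size` of 1m (1048576 bytes). The excerpt does not show the actual setting, so that value is an assumption, and the function name `exceeds_body_limit` is hypothetical.

```python
# Assumption: Nginx is using its default client_max_body_size of 1m.
NGINX_DEFAULT_MAX_BODY = 1 * 1024 * 1024

# Body size reported in the Nginx error log above.
REJECTED_BODY_BYTES = 1312028


def exceeds_body_limit(body_bytes, limit_bytes=NGINX_DEFAULT_MAX_BODY):
    """Return True if Nginx would reject a body of this size with a 413.

    A limit of 0 disables the body size check in Nginx.
    """
    if limit_bytes == 0:
        return False
    return body_bytes > limit_bytes


if __name__ == "__main__":
    print(exceeds_body_limit(REJECTED_BODY_BYTES))  # True
    print(REJECTED_BODY_BYTES - NGINX_DEFAULT_MAX_BODY)  # 263452 bytes over
```

With the default limit, the 1312028-byte push is about 257 KiB too large, so any replacement limit needs to sit comfortably above the largest batch Promtail sends.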
