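Loki "Ingestion rate limit exceeded" Error: Troubleshooting and Resolution
Problem Description
When querying logs from the Loki data source in Grafana, the results for certain time ranges did not match the logs in the actual files.
Investigation showed that Promtail was logging the error server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 41943040 bytes/sec). The ingestion rate had exceeded the limit configured in Loki, so pushes were being throttled. The fix was to adjust the Loki configuration.
Troubleshooting
Checking the Promtail logs shows the following errors: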
level=warn ts=2025-05-06T02:44:14.759325913Z caller=client.go:369 component=client host=local-loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 41943040 bytes/sec) while attempting to ingest '1' lines totaling '9350335' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2025-05-06T02:44:15.662897277Z caller=client.go:369 component=client host=local-loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 41943040 bytes/sec) while attempting to ingest '1' lines totaling '9350335' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
In other words, the server returned HTTP 429 Too Many Requests: a push of 1 line totaling 9350335 bytes exceeded the ingestion rate limit for user fake (limit: 41943040 bytes/sec), and Loki suggests reducing log volume or asking the Loki administrator whether the limit can be increased. Here fake is not a forged user: it is the tenant ID Loki uses when multi-tenancy is disabled (auth_enabled: false).
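To see how often and how badly the limit is being hit, the fields of this error can be pulled out of the Promtail log programmatically. The following is a minimal sketch, not part of Promtail or Loki: the function name parse_rate_limit_error and the dictionary keys are hypothetical, and the regular expression is written only against the message format shown in the log lines above.

```python
import re

# Pattern written against the error text shown in the Promtail log above.
PATTERN = re.compile(
    r"Ingestion rate limit exceeded for user (?P<tenant>\S+) "
    r"\(limit: (?P<limit>\d+) bytes/sec\) "
    r"while attempting to ingest '(?P<lines>\d+)' lines "
    r"totaling '(?P<bytes>\d+)' bytes"
)


def parse_rate_limit_error(line):
    """Return the tenant, limit, and batch size from a 429 error line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "tenant": match.group("tenant"),
        "limit_bytes_per_sec": int(match.group("limit")),
        "lines": int(match.group("lines")),
        "batch_bytes": int(match.group("bytes")),
    }


# Error portion of the log line quoted in this article.
sample = (
    "server returned HTTP status 429 Too Many Requests (429): "
    "Ingestion rate limit exceeded for user fake (limit: 41943040 bytes/sec) "
    "while attempting to ingest '1' lines totaling '9350335' bytes, "
    "reduce log volume or contact your Loki administrator"
)

info = parse_rate_limit_error(sample)
print(info)
```

Running this over a whole Promtail log file (one call per line) would give the distribution of rejected batch sizes, which is useful when choosing new limits.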
Ingestion rate limit exceeded means that the log write rate of the tenant fake exceeded the limit configured in Loki. The limit is 41943040 bytes/sec, i.e. 40 MB/s (40 × 1024 × 1024 bytes).
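The unit conversion can be checked with a few lines of arithmetic. This sketch assumes, as the log value itself implies, that Loki's "MB" here means 1024 × 1024 bytes; the variable names are only illustrative.

```python
MIB = 1024 * 1024  # bytes per "MB" as implied by 40 -> 41943040

limit_bytes = 41943040  # limit reported in the Promtail error
batch_bytes = 9350335   # size of the rejected batch in the same error

limit_mb = limit_bytes / MIB
batch_mb = batch_bytes / MIB

print(f"limit: {limit_mb:.0f} MB/s")
print(f"rejected batch: {batch_mb:.2f} MB")
```

The rejected batch is a single line of roughly 8.92 MB, which is worth keeping in mind when looking at the burst size setting later on.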
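Promtail is being rate-limited by Loki. Check limits_config in the Loki configuration:
limits_config:
  # Maximum number of unique series returned by a query
  max_query_series: 100000
  retention_period: 72h         # retention period
  per_stream_rate_limit: 3MB    # maximum byte rate per stream, per second
  reject_old_samples: true      # whether old samples are rejected
  reject_old_samples_max_age: 24h      # maximum age before samples are rejected
  enforce_metric_name: false
  #ingestion_rate_mb: 50 # per-user ingestion rate limit, in MB per second
  ingestion_rate_mb: 40 # per-user ingestion rate limit, in MB per second
  # Maximum size of a single log line
  # max_line_size: 30mb
  # max_line_size_truncate: true
ingestion_rate_mb is set to 40, so the per-user ingestion rate limit is 40 MB/s.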
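Solution
Adjust the Loki configuration
ingestion_rate_mb had previously been raised to 40 MB/s. It is now raised to 100 MB/s, and the burst size is set to 200 MB.
The updated configuration: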
limits_config:
  ingestion_rate_mb: 100 # maximum per-user log write rate per second (unit: MB)
  ingestion_burst_size_mb: 200 # maximum per-user burst write size (unit: MB)
For reference, the Loki documentation describes these two settings as follows:
# Per-user ingestion rate limit in sample size per second. Sample size includes
# size of the logs line and the size of structured metadata labels. Units in MB.
# CLI flag: -distributor.ingestion-rate-limit-mb
[ingestion_rate_mb: <float> | default = 4]
# Per-user allowed ingestion burst size (in sample size). Units in MB. The burst
# size refers to the per-distributor local rate limiter even in the case of the
# 'global' strategy, and should be set at least to the maximum logs size
# expected in a single push request.
# CLI flag: -distributor.ingestion-burst-size-mb
[ingestion_burst_size_mb: <float> | default = 6]
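The documentation's advice that the burst size should be at least as large as the biggest single push may explain why one line of 9350335 bytes was rejected under a 40 MB/s limit. The sketch below is a simplified token bucket model, not Loki's actual code. It assumes that the limiter behaves like a standard token bucket and that ingestion_burst_size_mb was left at its documented default of 6 in the original configuration (the limits_config shown earlier does not set it). The TokenBucket class and its method names are hypothetical.

```python
class TokenBucket:
    """Simplified token bucket: refills at `rate` bytes/sec, capped at `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # the bucket starts full
        self.last = 0.0      # time of the last update, in seconds

    def allow(self, now, size):
        # Refill for the elapsed time, never exceeding the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size > self.tokens:
            return False  # in this model the push would be answered with a 429
        self.tokens -= size
        return True


MB = 1024 * 1024
PUSH = 9350335  # bytes in the rejected batch from the Promtail log

# Assumed original settings: 40 MB/s rate, burst at the documented default of 6 MB.
old = TokenBucket(rate=40 * MB, burst=6 * MB)
# Adjusted settings from this article: 100 MB/s rate, 200 MB burst.
new = TokenBucket(rate=100 * MB, burst=200 * MB)

# One push per second for five seconds against each bucket.
old_results = [old.allow(now=float(t), size=PUSH) for t in range(5)]
new_results = [new.allow(now=float(t), size=PUSH) for t in range(5)]

print(old_results)  # [False, False, False, False, False]
print(new_results)  # [True, True, True, True, True]
```

In this model a push larger than the burst size can never succeed, no matter how high the rate is or how long Promtail waits before retrying, because the bucket never holds enough tokens. Under these assumptions, setting ingestion_burst_size_mb explicitly matters at least as much as raising ingestion_rate_mb.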