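I am currently evaluating Loki and am running out of disk space because of the number of chunks.
My instance runs in Docker containers, using the docker-compose setup from the official documentation (Loki, Promtail, and Grafana; see the docker-compose.yml below).
I am mostly using the default configuration for Loki and Promtail, apart from an adjusted retention period (I need 3 months) and a higher ingestion rate and ingestion burst size (see the configs below).
I bind-mounted a volume containing 1 TB of log files (MS Exchange logs) and set up a Promtail job that uses only a single label.
The resulting chunks keep eating up disk space, and I have had to expand the VM disk step by step to 1 TB.
Currently I have 0.9 TB of chunks. Shouldn't this be far less (something like 25% of the initial log size)? Last weekend I stopped the Promtail container to avoid running out of disk space. Today I started Promtail again and got the following warning: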
level=warn ts=2022-01-24T08:54:57.763739304Z caller=client.go:349 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 12582912 bytes/sec) while attempting to ingest '2774' lines totaling '1048373' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
I had this warning before, and increasing ingestion_rate_mb to 12 and ingestion_burst_size_mb to 24 fixed it...
I am kind of at a dead end right now.
Docker Compose
version: "3"
networks:
  loki:
services:
  loki:
    image: grafana/loki:2.4.1
    container_name: loki
    restart: always
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ${DATADIR}/loki/etc:/etc/loki:rw
    networks:
      - loki
  promtail:
    image: grafana/promtail:2.4.1
    container_name: promtail
    restart: always
    volumes:
      - /var/log/exchange:/var/log
      - ${DATADIR}/promtail/etc:/etc/promtail
    ports:
      - "1514:1514" # for syslog-ng
      - "9080:9080" # for http web interface
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    volumes:
      - grafana_var:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - loki
volumes:
  grafana_var:
Loki config:
server:
  http_listen_port: 3100
common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
# https://grafana.com/docs/loki/latest/configuration/#limits_config
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 12
  ingestion_burst_size_mb: 24
  per_stream_rate_limit: 12MB
chunk_store_config:
  max_look_back_period: 336h
table_manager:
  retention_deletes_enabled: true
  retention_period: 2190h
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_encoding: snappy
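The Loki config sets chunk_encoding to snappy, which favors speed over compression ratio. As a hedged sketch, assuming disk usage matters more than CPU in this setup, the ingester section could use gzip instead (the documented default for this option). This would only affect newly written chunks, and the actual savings on Exchange logs have not been measured here.

```yaml
ingester:
  # Assumption: trading some CPU for a better compression ratio is acceptable.
  # snappy is faster but typically compresses less than gzip.
  chunk_encoding: gzip
```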
Promtail config:
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: exchange
    static_configs:
      - targets:
          - localhost
        labels:
          job: exchangelog
          __path__: /var/log/*/*/*log
You use boltdb-shipper, but no compactor (note that I am not an authority on Loki setup, but this seems like an oversight when file size is the concern...) - Jan 'splite' K.
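
To illustrate the comment, here is a minimal sketch of what a compactor section might look like for boltdb-shipper on the filesystem store in Loki 2.4. The working directory path and the interval and delay values are assumptions, not taken from the question. The retention_period simply repeats the 2190h already used under table_manager. Note that the compactor compacts index files and, with retention enabled, deletes expired chunks. It does not recompress existing chunks, so it may not shrink the 0.9 TB on its own.

```yaml
compactor:
  # Hypothetical path under the configured path_prefix (/loki).
  working_directory: /loki/compactor
  shared_store: filesystem
  # Assumed values; tune to taste.
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h

limits_config:
  # Compactor-based retention reads the period from limits_config.
  retention_period: 2190h
```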