Describe the issue
I noticed this behaviour: sometimes Fluent Bit pods (usually one) just stop sending logs to the configured outputs (Elasticsearch in this case).
There is no error message or any other unusual log line, apart from the one reporting that the config file changed (level=info time=2024-09-03T09:57:40Z msg="Config file changed, reloading...").
I configured Fluent Bit via namespace-scoped CRDs (FluentBitConfig and Output; I also have the cluster-wide ClusterFilter, ClusterInput, and ClusterFluentBitConfig CRDs), and I noticed that the fluent-bit-config ConfigMap changes the order of the rendered CRDs (e.g., the order of the configured outputs changes from time to time), although the overall content remains the same.
The underlying idea is to create one Elasticsearch index per Kubernetes namespace and forward all of that namespace's traffic there.
The generated configuration (produced entirely from the operator CRDs) looks like this:
[Service]
    Http_Server true
    Log_Level debug
    Parsers_File /fluent-bit/config/parsers.conf
    Parsers_File /fluent-bit/config/parsers_multiline.conf
[Input]
    Name tail
    Path /var/log/containers/*.log
    Read_from_Head false
    Refresh_Interval 10
    Skip_Long_Lines true
    DB /fluent-bit/tail/pos.db
    DB.Sync Normal
    Mem_Buf_Limit 100MB
    Parser cri
    Tag kube.*
    storage.type memory
[Filter]
    Name kubernetes
    Match kube.*
    Kube_URL https://kubernetes.default.svc:443
    Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
    Labels false
    Annotations false
[Filter]
    Name nest
    Match kube.*
    Operation lift
    Nested_under kubernetes
    Add_prefix kubernetes_
[Filter]
    Name modify
    Match kube.*
    Remove stream
    Remove kubernetes_pod_id
    Remove kubernetes_host
    Remove kubernetes_container_hash
[Filter]
    Name nest
    Match kube.*
    Operation nest
    Wildcard kubernetes_*
    Nest_under kubernetes
    Remove_prefix kubernetes_
[Filter]
    Name rewrite_tag
    Match kube.*
    Rule $kubernetes['namespace_name'] ^(NAMESPACE1)$ 977c7a0a4554b6071fdf7e33484f3853.$TAG false
    Emitter_Name re_emitted_977c7a0a4554b6071fdf7e33484f3853
[Filter]
    Name rewrite_tag
    Match kube.*
    Rule $kubernetes['namespace_name'] ^(NAMESPACE2)$ a692cd3035eea3e14eae8ab89d9aed1b.$TAG false
    Emitter_Name re_emitted_a692cd3035eea3e14eae8ab89d9aed1b
...
...
[Output]
    Name es
    Match_Regex ^977c7a0a4554b6071fdf7e33484f3853\.(?:kube|service)\.(.*)
    Host ELASTICSEARCH_HOST
    Port 9200
    HTTP_User admin
    HTTP_Passwd password
    Index index-namespace1
    Logstash_Format false
    Time_Key @timestamp
    Generate_ID false
    Suppress_Type_Name On
    tls On
    tls.verify false
[Output]
    Name es
    Match_Regex ^a692cd3035eea3e14eae8ab89d9aed1b\.(?:kube|service)\.(.*)
    Host ELASTICSEARCH_HOST
    Port 9200
    HTTP_User admin
    HTTP_Passwd password
    Index index-namespace2
    Logstash_Format false
    Time_Key @timestamp
    Generate_ID false
    Suppress_Type_Name On
    tls On
    tls.verify false
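For context, a namespace-scoped Output CRD that the operator would render into one of the [Output] sections above could look roughly like the sketch below; a namespace-scoped FluentBitConfig in the same namespace would then pick it up via a label selector. This is only an illustrative sketch, not the exact manifests used here: the resource name, the fluentbit.fluent.io/enabled label, and the field spellings are assumptions and should be checked against the CRD schema of your fluent-operator version (credentials and TLS settings are omitted).

# Illustrative sketch only; field names and labels are assumptions, not taken from this issue.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: Output
metadata:
  name: es-namespace1                     # hypothetical name
  namespace: NAMESPACE1
  labels:
    fluentbit.fluent.io/enabled: "true"   # assumed label that the namespace-scoped FluentBitConfig selects on
spec:
  # Match only records re-tagged by the per-namespace rewrite_tag filter shown above.
  matchRegex: ^977c7a0a4554b6071fdf7e33484f3853\.(?:kube|service)\.(.*)
  es:
    host: ELASTICSEARCH_HOST
    port: 9200
    index: index-namespace1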
I noticed that, if I check the Prometheus targets, that pod shows up in the DOWN state:
Get "http://10.42.139.104:2020/api/v2/metrics/prometheus": context deadline exceeded
I also tried to curl the pod manually (from another pod) and, indeed, it is not reachable, while all the other pods belonging to the same DaemonSet are reachable and keep sending their logs to the intended outputs.
Any idea what is going on? I also checked whether there could be resource (CPU/RAM) issues (via kubectl describe and kubectl get pods -o yaml), but nothing obvious appeared. The pod is not failing and is not being restarted.
Once the pod is restarted, everything works again as intended.
To Reproduce
I can't really reproduce the issue, since it happens randomly and it is "solved" once the pod is killed (and a new one is created).
Expected behavior
The logs are sent to the intended output, or at least there is an informative error message that highlights the cause of this behavior.
Your Environment
How did you install fluent operator?
Via the Helm chart (with Argo CD, from the https://fluent.github.io/helm-charts/ repository), with the following parameters:
I disabled the default inputs and filters because I configure them myself in order to use the namespace-scoped CRDs.
Additional context
No response

Seems we ran into the same issue. It happens randomly on random nodes. For me, it seems fluent-bit just starts to hang sometimes when a reload (or multiple reloads in a short period) happens. @marcozov, have you found any workaround?