Issues with Fluentd shipping nginx logs #159

Open
max-blue opened this issue Jun 15, 2023 · 0 comments
Labels
bug Something isn't working

Comments

max-blue commented Jun 15, 2023

Describe the bug
I'm experiencing an issue when Fluentd attempts to send logs to Elasticsearch. The problem appears to be in parsing the log data: the errors are "Failed to parse content to map" and "Invalid UTF-8 middle byte 0x5c", which suggests an encoding issue in the data being sent to Elasticsearch.

Version of Helm and Kubernetes:

Helm Version: 3.12.1

Kubernetes Version:

Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.25.9-eks-0a21954

Which version of the chart:
13.9.0

What happened:
When Fluentd attempts to send logs to Elasticsearch, it fails with "Failed to parse content to map" and "Invalid UTF-8 middle byte 0x5c". The error appears to be caused by the encoding of the log data, and it happens specifically with Nginx Ingress Controller logs.
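
A possible workaround (just a sketch, not tested; it assumes the stock record_transformer filter with enable_ruby, and the tag pattern and field name are assumptions that would need to match my pipeline) would be to scrub invalid bytes out of the log field before it reaches the Elasticsearch output:

<filter kubernetes.var.log.containers.ingress-nginx-**>
  @type record_transformer
  enable_ruby true
  <record>
    # Ruby String#scrub replaces invalid byte sequences with the given character,
    # so the record is valid UTF-8 before it is serialized for Elasticsearch.
    # Depending on where this sits in the pipeline, the key may be "message" instead of "log".
    log ${record["log"].to_s.scrub("?")}
  </record>
</filter>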

What you expected to happen:
Fluentd should correctly send logs to Elasticsearch without encountering parsing errors.

How to reproduce it (as minimally and precisely as possible):
Fluentd Helm values:

fluentdElasticsearch:
  enabled: true
  syncPolicy: {}
  version: "13.10.0"
  values:
    elasticsearch:
      auth:
        enabled: true
        user: "kubernetes"
        existingSecret:
          name: elasticsearch
          key: password
      buffer:
        chunkLimitSize: 4M
      hosts:
        - "mycluster.us-east-1.aws.found.io:9243"
      log400Reason: true
      logLevel: warn
      logstash:
        prefix: logs-default
      requestTimeout: 15s
      scheme: https
      suppressTypeName: true
    env:
      FLUENT_UID: '0'
      LIVENESS_THRESHOLD_SECONDS: 900
      STUCK_THRESHOLD_SECONDS: 1800
      TZ: US/Eastern
    priorityClassName: system-node-critical
    resources:
      limits:
        cpu: 500m
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 2Gi
    serviceMonitor:
      enabled: true
    tolerations:
      - operator: Exists
    image:
      repository: ecr.aws/fluentd
      tag: v4.3.9
      pullPolicy: Always
    livenessProbe:
      enabled: false
    configMaps:
      useDefaults:
        containersInputConf: false
        systemInputConf: false
    extraConfigMaps:
      containers.input.conf: |-
        <source>
          @id fluentd-containers.log
          @type tail
          path /var/log/containers/*.log
          pos_file /var/log/containers.log.pos
          tag raw.kubernetes.*
          read_from_head true
          <parse>
            @type multi_format
            <pattern>
              format json
              time_key time
              time_format %Y-%m-%dT%H:%M:%S.%NZ
            </pattern>
            <pattern>
              format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
              time_format %Y-%m-%dT%H:%M:%S.%N%:z
            </pattern>
          </parse>
        </source>
        # Detect exceptions in the log output and forward them as one log entry.
        <match raw.kubernetes.**>
          @id raw.kubernetes
          @type detect_exceptions
          remove_tag_prefix raw
          message log
          stream stream
          multiline_flush_interval 5
          max_bytes 500000
          max_lines 1000
        </match>
        <filter kubernetes.**>
          @id filter_kubernetes_metadata
          @type kubernetes_metadata
        </filter>
        # Concatenate multi-line logs
        <filter kubernetes.**>
          @type concat
          key "message"
          stream_identity_key "$.docker.container_id"
          multiline_end_regexp "/\n$/"
          separator ""
          timeout_label @NORMAL
          flush_interval 5
        </filter>
        <filter kubernetes.**>
          @type record_transformer
          <record>
            cluster "#{ENV['CLUSTER_NAME']}"
          </record>
        </filter>
        # Fixes json fields in Elasticsearch
        <filter kubernetes.**>
          @id filter_parser
          @type parser
          key_name log
          reserve_data true
          remove_key_name_field false
          <parse>
            @type multi_format
            <pattern>
              format json
              time_key ""
            </pattern>
            <pattern>
              format none
            </pattern>
          </parse>
        </filter>
        <match /.+fluentd-elasticsearch.+/>
          @type rewrite_tag_filter
          <rule>
            key message
            pattern /Rejected by Elasticsearch/
            tag elasticsearch-rejected
          </rule>
        </match>
        <match /.+(elastalert|fluentd|kubecost|metricbeat|my-collector).+/>
          @type null
        </match>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /^[\n\r\s]*$/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /\/manage\/(health|info)/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key $.kubernetes.container_name
            pattern /^(my-collector|logstash|liveness-probe)$/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /^Read timed out$/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /Successfully connected to (.+):6161[67]/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /Adding headers AuthenticatedUser, Authorization and x-company-userinfo/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /^\[UserRolesAndPermissionsClient#fetchUserRolesAndPermissions\]/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /^\[Consumer clientId=gist-service-.+ groupId=myGroup\]/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /Inside method fetchUserRolesAndPermissions to fetch user permissions and child role/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /(START|END) request GET:\/v1\/live-view\/data/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /^x-company-userinfo token null$/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /DefaultDnsRawRecord/
          </exclude>
        </filter>
        <filter kubernetes.**>
          @type grep
          <exclude>
            key message
            pattern /DefaultDnsQuestion/
          </exclude>
        </filter>
        <filter kubernetes.var.log.containers.gist-**>
          @type grep
          <exclude>
            key message
            pattern /Get scheduled entry/
          </exclude>
        </filter>
        <filter kubernetes.**.database-*-0_*_database-**>
          @type concat
          key message
          multiline_start_regexp /^\d{4}(-\d{1,2}){2} \d{2}(:\d{1,2}){2}\.\d{3} [A-Z]{3} \[\d*\]/
          flush_interval 5
          timeout_label @NORMAL
        </filter>
        <filter kubernetes.**.database-*-0_*_database-**>
          @type grep
          <exclude>
            key message
            pattern /unsupported\ frontend\ protocol/
          </exclude>
        </filter>
        <filter kubernetes.var.log.containers.**>
          @type concat
          key message
          stream_identity_key "$.docker.container_id"
          multiline_start_regexp /^(\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\] )?((\[)?(\d{4}(-\d{1,2}){2})|(\d{1,2}-\D{3}-\d{4}))[T\ ]\d{1,2}(:\d{1,2}){2}([\.\,]\d{3,6})?(\])?/
          flush_interval 5
          timeout_label @NORMAL
        </filter>
        <filter kubernetes.var.log.containers.licensing-**>
          @type concat
          key message
          multiline_start_regexp /^\[?20\d{2}(-\d{1,2}){2}/
          flush_interval 5
          timeout_label @NORMAL
        </filter>
        <filter kubernetes.var.log.containers.licensing-**>
          @type parser
          key_name message
          reserve_data true
          reserve_time true
          <parse>
            @type multiline
            format_firstline /^\[?20\d{2}(-\d{1,2}){2}/
            format1 /^(?<message_time>\[?20\d{2}(-\d{1,2}){2}( |T){1}\d{1,2}:\d{1,2}:\d{2}\.\d{3}Z?\]?) (?<message>.*)/
          </parse>
        </filter>
        <filter kubernetes.var.log.containers.**-media-service-**>
          @type grep
          <exclude>
            key message
            pattern /^Successfully connected to ssl/
          </exclude>
        </filter>
        <filter kubernetes.var.log.containers.**-gateway-**>
          @type grep
          <exclude>
            key message
            pattern /^{/
          </exclude>
        </filter>
        <filter kubernetes.var.log.containers.**-gateway-**>
          @type grep
          <exclude>
            key message
            pattern /\[39mDEBUG/
          </exclude>
        </filter>
        <filter kubernetes.var.log.containers.**-gateway-**>
          @type concat
          key message
          multiline_start_regexp /^.?(\[(?!entity|binary)|\d{1,3}\.\d{1,3}\.)/
          flush_interval 5
          timeout_label @NORMAL
        </filter>
        <filter kubernetes.var.log.containers.**-gateway-**>
          @type parser
          key_name message
          reserve_data true
          reserve_time true
          <parse>
            @type multiline
            format_firstline /^.?(\[(?!entity|binary)|\d{1,3}\.\d{1,3}\.)/
            format1 /^.?(\[(?!entity|binary)|\d{1,3}\.\d{1,3}\.)(?<message>.*)/
          </parse>
        </filter>

Anything else we need to know:
I'm getting the error below in the fluentd-elasticsearch logs:

{"retry_time":2,"next_retry_seconds":"2023-06-15 13:49:05 3634480173375825512677/4398046511104000000000 -0400","chunk":"5fc7957e2e81bbff46b3b84856ab18e9","error" │
│ :"#<Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure: could not push logs to Elasticsearch cluster ({:host=>\"my-cluster.us-east-1.aws.found.io\", :port=>9243, :scheme=>\"htt │
│ ps\", :user=>\"kubernetes\", :password=>\"obfuscated\", :path=>\"\"}): [400] {\"error\":{\"root_cause\":[{\"type\":\"parse_exception\",\"reason\":\"Failed to parse content to map\"}],\"type\":\"parse_exception\ │
│ ",\"reason\":\"Failed to parse content to map\",\"caused_by\":{\"type\":\"json_parse_exception\",\"reason\":\"Invalid UTF-8 middle byte 0x5c\\n at [Source: (byte[])\\\"{\\\"stream\\\":\\\"stderr\\\",\\\"log\\\" │
│ :\\\"2023/05/25 00:03:00 [error] 66#66: *704 broken header: \\\\\\\"\\\\u0016\\\\u0003\\\\u0001\\\\u0000�\\\\u0001\\\\u0000\\\\u0000�\\\\u0003\\\\u0003�\\\\u0010�,���hK�S-\\\\u007F�\\\\u000e�m\\\\\\\\���Mlۿ��pT │
│ �� 4��\\\\u0018��]d��8^%�AJH�u+]��6Vd\\\\u0007�t�%\\\\u0000&�+�/�,�0̨̩�\\\\t�\\\\u0013�\\\",\\\"docker\\\":{\\\"container_id\\\":\\\"bb7888033a32580cae2256fb6f081fd64f28e3aa7a97da38e16cfd6dd65a2305\\\"},\\\"kuber │
│ netes\\\":{\\\"container_name\\\":\\\"controller\\\",\\\"namespace_name\\\":\\\"nginx-internal\\\",\\\"pod_name\\\":\\\"ingress-nginx-internal-controller-5cb994bc6b-z29gr\\\",\\\"contain\\\"[truncated 1295 byte │
│ s]; line: 1, column: 110]\"}},\"status\":400}>","message":"[elasticsearch] failed to flush the buffer. retry_time=2 next_retry_seconds=2023-06-15 13:49:05 3634480173375825512677/4398046511104000000000 -0400 chu │
│ nk=\"5fc7957e2e81bbff46b3b84856ab18e9\" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error=\"could not push logs to Elasticsearch cluster ({:host=>\\\"mycluster.us-east-1.aws.found.io\\\", :port=>9243, :scheme=>\\\"https\\\", :user=>\\\"kubernetes\\\", :password=>\\\"obfuscated\\\", :path=>\\\"\\\"}): [400] {\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"pars │
│ e_exception\\\",\\\"reason\\\":\\\"Failed to parse content to map\\\"}],\\\"type\\\":\\\"parse_exception\\\",\\\"reason\\\":\\\"Failed to parse content to map\\\",\\\"caused_by\\\":{\\\"type\\\":\\\"json_parse_ │
│ exception\\\",\\\"reason\\\":\\\"Invalid UTF-8 middle byte 0x5c\\\\n at [Source: (byte[])\\\\\\\"{\\\\\\\"stream\\\\\\\":\\\\\\\"stderr\\\\\\\",\\\\\\\"log\\\\\\\":\\\\\\\"2023/05/25 00:03:00 [error] 66#66: *70 │
│ 4 broken header: \\\\\\\\\\\\\\\"\\\\\\\\u0016\\\\\\\\u0003\\\\\\\\u0001\\\\\\\\u0000\\uFFFD\\\\\\\\u0001\\\\\\\\u0000\\\\\\\\u0000\\uFFFD\\\\\\\\u0003\\\\\\\\u0003\\uFFFD\\\\\\\\u0010\\uFFFD,\\uFFFD\\uFFFD\\uF │
│ FFDhK\\uFFFDS-\\\\\\\\u007F\\uFFFD\\\\\\\\u000e\\uFFFDm\\\\\\\\\\\\\\\\\\uFFFD\\uFFFD\\uFFFDMl\\u06FF\\uFFFD\\uFFFDpT\\uFFFD\\uFFFD 4\\uFFFD\\uFFFD\\\\\\\\u0018\\uFFFD\\uFFFD]d\\uFFFD\\uFFFD8^%\\uFFFDAJH\\uFFFD │
│ u+]\\uFFFD\\uFFFD6Vd\\\\\\\\u0007\\uFFFDt\\uFFFD%\\\\\\\\u0000&\\uFFFD+\\uFFFD/\\uFFFD,\\uFFFD0\\u0329\\u0328\\uFFFD\\\\\\\\t\\uFFFD\\\\\\\\u0013\\uFFFD\\\\\\\",\\\\\\\"docker\\\\\\\":{\\\\\\\"container_id\\\\\ │
│ \\":\\\\\\\"bb7888033a32580cae2256fb6f081fd64f28e3aa7a97da38e16cfd6dd65a2305\\\\\\\"},\\\\\\\"kubernetes\\\\\\\":{\\\\\\\"container_name\\\\\\\":\\\\\\\"controller\\\\\\\",\\\\\\\"namespace_name\\\\\\\":\\\\\\\ │
│ "nginx-internal\\\\\\\",\\\\\\\"pod_name\\\\\\\":\\\\\\\"ingress-nginx-internal-controller-5cb994bc6b-z29gr\\\\\\\",\\\\\\\"contain\\\\\\\"[truncated 1295 bytes]; line: 1, column: 110]\\\"}},\\\"status\\\":400} │
│ \""}                                                                                               

I added the following config changes to the nginx ingress controller, with no luck:

log-format-escape-json: true
log-format-upstream: {"timestamp":"$time_iso8601","network":{"forwarded_ip":"$http_x_forwarded_for"},"user":{"name":"$remote_user"},"user_agent":{"original":"$http_user_agent"},"http":{"version":"$server_protocol","request":{"body":{"bytes":"$body_bytes_sent"},"bytes":"$request_length","method":"$request_method","referrer":"$http_referer"},"response":{"body":{"bytes":"$body_bytes_sent"},"bytes":"$bytes_sent","status_code":"$status","time":"$request_time"},"upstream":{"bytes":"$upstream_response_length","status_code":"$upstream_status","time":"$upstream_response_time","address":"$upstream_addr","name":"$proxy_upstream_name"}},"url":{"domain":"$host","path":"$uri","query":"$args","original":"$request_uri"}}

Nginx outputs valid JSON.
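
For reference, this is roughly how those two options sit in the controller ConfigMap (sketch only; the ConfigMap name is an assumption based on the release name, and ConfigMap data values have to be strings, so true is quoted):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-internal-controller   # assumption: depends on the Helm release
  namespace: nginx-internal
data:
  log-format-escape-json: "true"
  log-format-upstream: '...'   # the full JSON format string shown above

As far as I can tell, these options only shape the access log on stdout; the "broken header" line in the error above is nginx error-log output on stderr, which log-format-escape-json does not touch, which may be why the change made no difference.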
