I've encountered multiple occasions, on random hosts, where filebeat inexplicably started to re-consume a log file from the beginning. For reference, filebeat is configured to publish the collected logs to kafka. I was able to confirm that filebeat is in fact re-consuming the log file by observing the kafka metadata in our logs (which includes the kafka timestamp, topic, and offset) as well as the byte offset included by filebeat. It's also guaranteed that this was not due to a consumer group offset reset, as the duplicate entries had identical byte offsets and our topic retention period is only 12h. The duplicated log entries were often from logs generated weeks or even months in the past.
I realize that a change of inode or device id could cause a file to be re-consumed. Even though I am highly doubtful this is the case, I would like to be able to confirm it beyond doubt. This information could be obtained by consuming the registry file, although I'm not convinced that method would be appropriate or work reliably. Even if it did, the volume of changes to this file can be very large, especially on hosts with thousands of log files to tail. This would essentially result in a massive, unnecessary increase in logs being shipped and add substantial stress on elasticsearch. As a better option, I propose adding the file inode and device id as meta fields sent along with each log entry, optionally enabled. I can't see this having a negative impact on performance, and it would allow improved diagnostics for issues such as this one.
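For context on what the proposed meta fields would carry: the (device id, inode) pair is what inode-based file identity is built on, and it can be read from a plain stat call. A minimal sketch in Python (illustrative only, not filebeat code):

```python
import os

def file_identity(path):
    """Return the (device id, inode) pair for a file.

    This is the kind of identity an inode-based tracker keys on: it is
    stable across renames, but changes if the file is deleted and a new
    file is created at the same path (or the inode is reused elsewhere),
    which is one way a tailer can be tricked into re-reading a file.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

Attaching these two integers to every event would let duplicates be correlated with an identity change after the fact, without having to replay the registry file.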
We are in the process of updating the filebeat configuration across all hosts from the legacy log input to the new filestream input. Once completed, we'll be able to specify the most appropriate fingerprint option depending on the use case. It would still be beneficial to be able to optionally add that information to the log event. In the worst case, at least the ability to add the resulting fingerprint value should be acceptable.