Ability to include file inode and deviceid in log meta fields #40056

hartfordfive · 2024-06-28T12:21:15Z

I've encountered multiple occasions, on random hosts, where Filebeat inexplicably started to re-consume a log file from the beginning. For reference, Filebeat is configured to publish the collected logs to Kafka. I was able to confirm that Filebeat is in fact re-consuming the log file by observing the Kafka metadata in our logs (Kafka timestamp, topic, and offset) as well as the byte offset included by Filebeat. This also cannot be due to a consumer group offset reset, since the duplicate entries had identical byte offsets and our topic retention period is only 12h. The duplicated log entries were often from logs generated weeks or even months in the past.

I realize that a change of inode or deviceid could cause a file to be re-consumed. Even though I am highly doubtful this is the case, I would like to be able to confirm it without a doubt. I realize this information could be obtained by consuming the registry file, although I'm not convinced that method would be appropriate or work properly. Even if it did, the volume of changes to this file can be very large, especially on hosts with thousands of log files to tail. This would essentially result in a massive, unnecessary increase in logs being shipped and add substantial stress on Elasticsearch. As a better option, I propose adding the file inode and deviceid as meta fields sent along with each log entry, which could be optionally enabled. I can't see this having a negative impact on performance, and it could allow for improved diagnostics for issues such as this one.
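For spot checks on a suspect host, the two identity values in question can be read directly from the OS. This is a minimal Python sketch, not Filebeat code; `file_identity` is a hypothetical helper name, and it only shows what the OS reports at the moment it is called.

```python
import os


def file_identity(path):
    """Return (inode, device_id) as currently reported by the OS for path.

    Hypothetical helper for illustration; not part of Filebeat.
    """
    st = os.stat(path)
    return st.st_ino, st.st_dev
```

Recording this pair for a log file before and after a suspected re-read would show whether the inode or device id actually changed, for example when a tool replaces the file through a rename instead of appending to it.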

ycombinator · 2024-06-28T18:23:51Z

> I realize that a change of inode or deviceid could cause a file to be re-consumed.

Have you considered using the filestream input with the file_identity: fingerprint setting? It was built specifically to address the inode change situation; you can read more about it in this blog post.
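To illustrate why a content-based identity is unaffected by inode changes, here is a rough Python sketch of the idea. It assumes the fingerprint is a SHA-256 hash over a fixed byte range at the start of the file; the `offset=0` and `length=1024` defaults and the choice to skip files shorter than that range are assumptions of this sketch, and `fingerprint` is a hypothetical function, not Filebeat's implementation.

```python
import hashlib


def fingerprint(path, offset=0, length=1024):
    """Hash a fixed byte range of the file to identify it by content.

    Returns None when the file is too short to cover the range
    (an assumption made for this sketch).
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(length)
    if len(chunk) < length:
        return None
    return hashlib.sha256(chunk).hexdigest()
```

Two files with different inodes but the same leading bytes yield the same value here, which is why a copy or rename would not look like a new file under this scheme. The flip side is that distinct files sharing an identical header would collide.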

elasticmachine · 2024-06-28T18:24:00Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

hartfordfive · 2024-06-29T13:19:45Z

We are in the process of updating the Filebeat configuration across all hosts from the legacy log input to the new filestream input. Once that is completed, we'll be able to specify the most appropriate fingerprint option depending on the use case. It would still be beneficial to be able to optionally add that information to the log event. At the very least, being able to add the resulting fingerprint value would be acceptable.

pierrehilbert · 2024-06-30T17:55:29Z

Wouldn't the work done in #36065 solve your issue?

botelastic bot added the needs_team label · Jun 28, 2024

ycombinator added the question and Team:Elastic-Agent-Data-Plane labels · Jun 28, 2024

botelastic bot removed the needs_team label · Jun 28, 2024
