Ability to include file inode and deviceid in log meta fields #40056

Open
hartfordfive opened this issue Jun 28, 2024 · 4 comments
Labels
question Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@hartfordfive
Contributor

On multiple occasions, on random hosts, I've seen Filebeat inexplicably start re-consuming a log file from the beginning. For reference, Filebeat is configured to publish the collected logs to Kafka. I was able to confirm that Filebeat really is re-consuming the log file by looking at the Kafka metadata in our logs (Kafka timestamp, topic, and offset) together with the byte offset that Filebeat includes. We can also rule out a consumer group offset reset: the duplicate entries had identical byte offsets, and our topic retention period is only 12h. The duplicated log entries often came from logs generated weeks or even months earlier.
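The check described above (same Filebeat byte offset, new Kafka position) can be automated. Below is a minimal Go sketch, assuming the events have already been consumed from Kafka into a simple struct. The `event` and `kafkaPos` types, their mapping to Filebeat's `log.file.path` and `log.offset` fields, and `findRereads` are hypothetical illustrations, not part of Filebeat. Kafka offsets are per partition, so the partition is part of the position.

```go
package main

import "fmt"

// kafkaPos identifies a message in Kafka; offsets are only unique per partition.
type kafkaPos struct {
	Partition int32
	Offset    int64
}

// event is a hypothetical, trimmed-down view of a consumed log event.
type event struct {
	FilePath   string // Filebeat's log.file.path
	ByteOffset int64  // Filebeat's log.offset
	Kafka      kafkaPos
}

// fileOffset is the (file, byte offset) pair that should map to exactly one line.
type fileOffset struct {
	path   string
	offset int64
}

// findRereads returns events whose (file, byte offset) pair was already seen
// at a different Kafka position, i.e. the same line was read and published
// again. Consuming the same Kafka message twice (same position) is not flagged.
func findRereads(events []event) []event {
	firstSeen := make(map[fileOffset]kafkaPos)
	var rereads []event
	for _, e := range events {
		key := fileOffset{path: e.FilePath, offset: e.ByteOffset}
		if pos, ok := firstSeen[key]; ok {
			if pos != e.Kafka {
				rereads = append(rereads, e)
			}
			continue
		}
		firstSeen[key] = e.Kafka
	}
	return rereads
}

func main() {
	events := []event{
		{FilePath: "/var/log/app.log", ByteOffset: 0, Kafka: kafkaPos{0, 100}},
		{FilePath: "/var/log/app.log", ByteOffset: 57, Kafka: kafkaPos{0, 101}},
		// Same Kafka message consumed again (consumer-side replay): ignored.
		{FilePath: "/var/log/app.log", ByteOffset: 0, Kafka: kafkaPos{0, 100}},
		// Same file bytes published again under new Kafka positions: flagged.
		{FilePath: "/var/log/app.log", ByteOffset: 0, Kafka: kafkaPos{1, 5000}},
		{FilePath: "/var/log/app.log", ByteOffset: 57, Kafka: kafkaPos{1, 5001}},
	}
	for _, e := range findRereads(events) {
		fmt.Printf("re-read %s at byte %d (partition %d, offset %d)\n",
			e.FilePath, e.ByteOffset, e.Kafka.Partition, e.Kafka.Offset)
	}
}
```

Keying on the file path plus byte offset is what separates a Filebeat-side re-read (new Kafka position) from a consumer-side replay (same position).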

I realize that a change of inode or device ID could cause a file to be re-consumed. Even though I highly doubt that is the case here, I would like to be able to confirm it beyond any doubt. I realize this information could be obtained by consuming the registry file, although I'm not convinced that approach would be appropriate or work properly. Even if it did, the volume of changes to this file can be very large, especially on hosts with thousands of log files to tail. It would essentially result in a massive, unnecessary increase in the volume of logs being shipped and add substantial load on Elasticsearch. As a better option, I propose adding the file inode and device ID as meta fields sent along with each log entry, optionally enabled. I don't expect this to have a negative impact on performance, and it would improve diagnostics for issues such as this one.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 28, 2024
@ycombinator
Contributor

I realize that a change of inode or deviceid could cause a file to be re-consumed.

Have you considered using the filestream input with the file_identity: fingerprint setting? It was built specifically to address the inode change situation; you can read more about it in this blog post.

@ycombinator ycombinator added question Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jun 28, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 28, 2024
@hartfordfive
Contributor Author

We are in the process of migrating the Filebeat configuration on all hosts from the legacy log input to the new filestream input. Once that's complete, we'll be able to choose the most appropriate fingerprint option for each use case. It would still be beneficial to be able to optionally add that information to the log event. In the worst case, just being able to add the resulting fingerprint value would be OK.

@pierrehilbert
Collaborator

Wouldn't the work done in #36065 solve your issue?


4 participants