Does pyelftools support multi-threaded handling? #465

Open
ytxloveyou opened this issue Apr 6, 2023 · 5 comments

Comments

@ytxloveyou

When the input ELF files are very large, the processing time becomes very long. So I am wondering whether pyelftools could support multi-threaded or multi-process handling.

@sevaa
Contributor

sevaa commented Apr 11, 2023

Python is notoriously single threaded :)

Pyelftools tries to lazy-load where possible. What's the usage scenario exactly? ELF proper parsing, or DWARF? For one thing, the DWARF handling portions of the library tend to load sections as a whole, with no support for progressive loading.

@ytxloveyou
Author

ytxloveyou commented Apr 11, 2023

> Python is notoriously single threaded :)
>
> Pyelftools tries to lazy-load where possible. What's the usage scenario exactly? ELF proper parsing, or DWARF? For one thing, the DWARF handling portions of the library tend to load sections as a whole, with no support for progressive loading.


OK, thanks for the feedback. I am trying to use it to reconstruct and analyze variable types (structures, unions, and so on) from the ELF DWARF info. When the ELF file is smaller than 5 MB, it is fine, but for bigger files it can take 5-15 minutes to go through all the compile units and their contents. So I am wondering whether multi-processing or multi-threading could help in this case.
If not, maybe I need to find out how to do it in C, or find some other way to run the work in multiple processes at the same time.

The reason I raised this issue is that at our company (automotive products) we use the Vector toolchain (which you may or may not know). It includes an ASAP2 tool that can analyze ELF files to generate symbols with types, and it runs very fast. I would really like to know how they achieve that.
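
For reference, the multi-process idea raised above could be sketched roughly as follows. This is only a sketch under assumptions, not something pyelftools provides: the names `belongs_to_worker`, `count_dies_in_share`, and `count_dies_parallel` are hypothetical, and counting DIEs stands in for the real type analysis. Each worker process opens the ELF file on its own and handles every N-th compile unit, so the section loading is repeated per worker; whether the net result is faster would have to be measured.

```python
from concurrent.futures import ProcessPoolExecutor


def belongs_to_worker(cu_index, worker_index, worker_count):
    """Round-robin assignment: CU number i goes to worker i % worker_count."""
    return cu_index % worker_count == worker_index


def count_dies_in_share(path, worker_index, worker_count):
    """Worker: open the ELF file independently and walk only this worker's CUs."""
    # Imported here so the helpers above stay usable without pyelftools.
    from elftools.elf.elffile import ELFFile

    total = 0
    with open(path, "rb") as f:
        dwarf = ELFFile(f).get_dwarf_info()
        for i, cu in enumerate(dwarf.iter_CUs()):
            if belongs_to_worker(i, worker_index, worker_count):
                # Placeholder workload: replace with the real per-CU analysis.
                total += sum(1 for _ in cu.iter_DIEs())
    return total


def count_dies_parallel(path, worker_count=4):
    """Sum the per-worker DIE counts; each worker runs in a separate process."""
    with ProcessPoolExecutor(max_workers=worker_count) as pool:
        futures = [
            pool.submit(count_dies_in_share, path, k, worker_count)
            for k in range(worker_count)
        ]
        return sum(f.result() for f in futures)
```

When trying this, `count_dies_parallel(...)` should be called from under an `if __name__ == "__main__":` guard, since process pools re-import the main module on some platforms. Threads would likely not help here, because the parsing is CPU-bound Python code.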

@sevaa
Contributor

sevaa commented Apr 11, 2023

So it's DWARF. Have you timed the execution - which call exactly is taking 15 minutes?
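
One possible way to answer the timing question is to run the workload under `cProfile` and look at the functions with the largest cumulative time. The sketch below is only an illustration: `profile_call` and `walk_all_dies` are hypothetical names, and `walk_all_dies` is a stand-in for whatever the real script does.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def walk_all_dies(path):
    """Stand-in workload (assumption): touch every DIE in every compile unit."""
    # Imported here so profile_call stays usable without pyelftools.
    from elftools.elf.elffile import ELFFile

    count = 0
    with open(path, "rb") as f:
        dwarf = ELFFile(f).get_dwarf_info()
        for cu in dwarf.iter_CUs():
            for _die in cu.iter_DIEs():
                count += 1
    return count
```

Calling `profile_call(walk_all_dies, path_to_elf)` and printing the returned report should show whether the time goes into section loading, CU iteration, or the DIE walk.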

@ytxloveyou
Author

Just assume that there are more than 10000 compile units. Calling cu_iter might then take 5 minutes. Then, for each compile unit, if I need to export all variables with their types, I need to loop through all the DT members to find the relevant type definitions, and that process can take 10 minutes in some cases. Something like that... maybe I did something wrong.

@sevaa
Contributor

sevaa commented Apr 11, 2023

So off the top of my head, one thing you can try is iterating through DIEs smarter. Are you after all variables, or all static lifetime variables? Globals, static class members, or both? When it comes to modern DWARF, navigating to a sibling is a fast operation. You could try that instead of scrolling through all DIEs.
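
A minimal sketch of that idea, assuming the variables of interest are direct children of each CU's top DIE (typical for C globals; C++ namespace members and static class members are nested deeper and would need extra handling). The function name `top_level_variables` is hypothetical. As far as I know, `iter_children()` follows `DW_AT_sibling` where the producer emitted it, so the subtrees of functions and types are skipped rather than parsed.

```python
def top_level_variables(cu):
    """Yield (name, die) for DW_TAG_variable DIEs directly under the CU's top DIE."""
    top = cu.get_top_DIE()
    for die in top.iter_children():
        if die.tag != "DW_TAG_variable":
            continue
        name_attr = die.attributes.get("DW_AT_name")
        # pyelftools stores string attribute values as bytes.
        name = name_attr.value.decode("utf-8", "replace") if name_attr else None
        yield name, die
```

From each yielded DIE, the type can then be followed on demand, for example with `die.get_DIE_from_attribute("DW_AT_type")`, instead of scanning every DIE in the CU up front.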

Iterating between compile units is not a long operation per se - there is a linked list-like data structure there, going from a CU to the next CU is fast. That said, the section needs to be loaded first, and that's a time consuming piece of I/O.
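
To check how the time splits between section loading and CU iteration on a particular file, a coarse timer like the one below might be enough. This is a sketch; `phase` and `time_dwarf_phases` are hypothetical helper names, and the phase labels are arbitrary.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase(name, timings):
    """Record the wall-clock duration of the enclosed block in timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def time_dwarf_phases(path):
    """Return (cu_count, timings) for loading DWARF sections and iterating CUs."""
    # Imported here so the timer above stays usable without pyelftools.
    from elftools.elf.elffile import ELFFile

    timings = {}
    with open(path, "rb") as f:
        elf = ELFFile(f)
        with phase("load sections", timings):
            dwarf = elf.get_dwarf_info()
        with phase("iterate CUs", timings):
            cu_count = sum(1 for _ in dwarf.iter_CUs())
    return cu_count, timings
```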

One more thing I thought of: you might be able to shave some time off the I/O by partially reimplementing get_dwarf_info() so that it does not load the miscellaneous sections. There is no built-in support for lazy loading of those (that I know of), so you'd have to roll your own. It's a good idea for improvement, though. Depending on what kind of information you want to dump, loclists might still be necessary.
