MemoryError and no partitions found #41

Open
Frank071 opened this issue Sep 28, 2018 · 16 comments

Comments

Frank071 commented Sep 28, 2018

RecuperaBit consistently crashes with a MemoryError:

INFO:root:Found NTFS boot sector at sector 1317656575
INFO:root:First scan completed
INFO:root:Parsing MFT entries
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 367, in main
    parts.update(scanner.get_partitions())
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 698, in get_partitions
    parsed = parse_file_record(dump)
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 149, in parse_file_record
    attributes = _attributes_reader(entry, header['off_first'])
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 109, in _attributes_reader
    attr, name = parse_mft_attr(entry[offset:])
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 79, in parse_mft_attr
    nonresident = unpack(attr, attr_nonresident_fmt)
  File "/root/RecuperaBit-master/recuperabit/utils.py", line 98, in unpack
    result[label] = formatter(data[low:high+1])
MemoryError

Watching the process via top, I see that memory use rises to about 4 GB and then the process dies. It looks as if the sheer number of files (or whatever) it finds is what causes this.

I found that 'partitioned_files' grows beyond 100,000, so around line 720 in ntfs.py I added some code to stop the process when it reaches 50,000. This "solves" the MemoryError, but now I end up with "0 partitions found" :-(
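
The hack looked roughly like this (a sketch from memory; parse_with_cap and the record dict layout are invented for illustration, only the partitioned_files name comes from ntfs.py):

import logging

MAX_FILES = 50000  # my arbitrary cap

def parse_with_cap(records, partitioned_files):
    # Hypothetical wrapper around the MFT parsing loop: stop adding
    # entries once the cap is hit, instead of exhausting memory.
    for record in records:
        if len(partitioned_files) >= MAX_FILES:
            logging.warning('Reached %d file records, stopping early',
                            MAX_FILES)
            break
        partitioned_files[record['id']] = record
    return partitioned_files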

I already ran PhotoRec on this image file and that produced a lot of files, but all with cryptic names, so I hoped RecuperaBit would rush to the rescue... Any help is appreciated (and if not, perhaps this report can at least help the tool exit gracefully when its memory use grows erratically).

Frank071 commented:

I see that '0 partitions' also comes up in #17, so it might be unrelated to my short-circuiting of the scan process.
Issue #17 ends with a request for a dump of a sector, but it is still open. Would it help if I dumped some?

...
INFO:root:Found NTFS file record at sector 6296932
INFO:root:Found NTFS file record at sector 6296934
INFO:root:Found NTFS file record at sector 6296936
INFO:root:Found NTFS file record at sector 6296938
INFO:root:Found NTFS file record at sector 6296940
INFO:root:Found NTFS file record at sector 6296942
...

While we're at it: the save file contains 3.7 million lines...


Lazza commented Sep 28, 2018

Can you provide detailed specifications about your environment, the Python version (and implementation), system architecture, OS, etcetera?

Also, how large is the disk image you are analyzing?

Do you have enough swap?

Frank071 commented:

The disk image is roughly 630 GB. This is a Linux amd64 environment, better known as SystemRescueCd 4.14.32-std522, with Python 3.6.3 (I am not sure what you mean by 'implementation'). The system has 24 GB available and that is not the limit (no OOM action or anything), so swap is not used (the counter stays at 0).


Lazza commented Sep 29, 2018

> with Python 3.6.3

It shouldn't even run on Python 3, given that it uses Python 2 syntax.

> I am not sure what you mean by 'implementation'

I meant CPython vs PyPy.


Frank071 commented Sep 30, 2018

> It shouldn't even run on Python 3, given that it uses Python 2 syntax.

Sorry... my bad... I run it with Python 2 (2.7.14).

> > I am not sure what you mean by 'implementation'
>
> I meant CPython vs PyPy.

CPython


Lazza commented Oct 2, 2018

The fact that it stops at precisely 4 GB looks a bit strange. Have you checked that the Python executable is really 64-bit? Could you try with PyPy?
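
A quick way to check from within the interpreter itself (works on both CPython and PyPy):

import platform
import struct

# A 32-bit interpreter has 4-byte pointers, a 64-bit one has 8-byte pointers
print(struct.calcsize('P') * 8)    # prints 32 or 64
print(platform.architecture()[0])  # e.g. '64bit'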

Unfortunately, some large or heavily fragmented disks currently require a lot of memory.


Frank071 commented Oct 3, 2018

Although 'uname -m' reports this system as 64-bit, all binaries - including python2 - are 32-bit. So that explains the 4 GB limit. As I am on SystemRescueCd the options are limited; 'pypy', for instance, is not available. So I fear it stops here - although it would help if we could think of a mechanism that allows partial recovery.


Lazza commented Oct 4, 2018

> As I am on SystemRescueCd the options are limited

You could opt for a different live distro (ensuring it's 64-bit) and see whether it still crashes. PyPy can usually be installed from the repositories of several distributions.


Frank071 commented Oct 5, 2018

I had to physically get to the system, but it is now running a proper 64-bit environment and I have PyPy available. That solves the crashing and it gets a great deal further (I see MATCH lines), but it eventually gets killed for consuming too much memory (> 16 GB). It hadn't recovered anything by the time it was killed :(


Lazza commented Oct 5, 2018

I wish I could say I have an easy solution for that, but currently... I don't. In the future RecuperaBit might (should?) use a SQLite file to store the thousands of records, artifacts, etc., but that would require a rewrite that I cannot promise at this time due to lack of time.

What I can suggest as a workaround is one of the following:

  • Create some very large swap files and mount them (you can add as much swap as you want if you have space on a disk)
  • Edit RecuperaBit to prune partitions under a certain size (see this comment here); a rough sketch follows below
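
The pruning could look something like this (a minimal sketch for main.py; MIN_FILES and the part.files attribute are assumptions, the actual partition interface in RecuperaBit may differ):

MIN_FILES = 100  # hypothetical threshold, tune to your image

# parts is the dict filled by parts.update(scanner.get_partitions())
# in main.py; keep only partitions with a meaningful number of files.
parts = dict(
    (key, part) for key, part in parts.items()
    if len(part.files) >= MIN_FILES
)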


Frank071 commented Oct 7, 2018

OK, adding swap did not help much (I tried that before reading your comment), but the pruning did wonders. I now got through to the recovery stage. Nice! Perhaps pruning could become a command line option, or even an intelligent feature based on the number of partitions found?


Lazza commented Oct 9, 2018

The problem is that with pruning you are discarding valuable information and you might recover fewer files (there is no way to tell whether the partitions you discard are indeed useless).

The user interface would definitely benefit from some improvement, but hopefully the backend can also be optimized a bit.


mirh commented Feb 3, 2020

Like, I don't want to be that guy... but is it really normal that a fairly modest 230 GB image takes 5 GB of RAM to process?
OK, apparently it is.


Lazza commented Feb 6, 2020

@mirh I understand your concern; currently most of the processing for the reconstruction is done in RAM. Ideally, RecuperaBit should leverage a SQLite DB for much better efficiency.
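
To illustrate the idea (a sketch only; the table layout and record fields are invented, not the actual RecuperaBit data model):

import sqlite3

# Keep parsed MFT records on disk so memory use stays flat no matter
# how many records the scan finds.
conn = sqlite3.connect('scan.db')
conn.execute('CREATE TABLE IF NOT EXISTS mft_records '
             '(record_id INTEGER PRIMARY KEY, sector INTEGER, '
             'parent INTEGER, name TEXT)')

def store_record(record_id, sector, parent, name):
    conn.execute('INSERT OR REPLACE INTO mft_records VALUES (?, ?, ?, ?)',
                 (record_id, sector, parent, name))

# After the scan, commit and stream the records back in order instead
# of holding the whole tree in RAM:
conn.commit()
for record_id, sector, parent, name in conn.execute(
        'SELECT * FROM mft_records ORDER BY record_id'):
    pass  # rebuild the directory tree incrementally here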


mirh commented Feb 7, 2020

It would be cool to see that drop by a factor of 3 to 5...
Could the process also be multithreaded/parallelized somewhat? The competition seems to be "ready" as soon as it has finished reading the image, whereas RB takes a very long time on top of that.


Lazza commented Jun 12, 2020

Regarding the process that figures out the partition boundaries: it could probably be partially parallelized. That is indeed quite an interesting suggestion.

I am really sad that these days the amount of free time I can dedicate to improving the tool is near zero. 😢
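
For the record, a toy sketch of what a parallel first scan could look like (this is not how RecuperaBit's scanner is actually structured; the image path, chunk size and the NTFS OEM ID check are only for illustration):

import multiprocessing
import os

SECTOR_SIZE = 512
CHUNK = 1024 * 1024  # sectors per worker task

def scan_chunk(args):
    # Worker: look for the NTFS boot sector signature in one chunk.
    path, first = args
    hits = []
    with open(path, 'rb') as img:
        img.seek(first * SECTOR_SIZE)
        for i in range(CHUNK):
            sector = img.read(SECTOR_SIZE)
            if not sector:
                break
            if sector[3:11] == b'NTFS    ':  # 8-byte OEM ID
                hits.append(first + i)
    return hits

if __name__ == '__main__':
    path = 'disk.img'  # hypothetical image
    total = os.path.getsize(path) // SECTOR_SIZE
    tasks = [(path, s) for s in range(0, total, CHUNK)]
    pool = multiprocessing.Pool()
    for hits in pool.imap_unordered(scan_chunk, tasks):
        for sector in hits:
            print('Possible NTFS boot sector at sector %d' % sector)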
