Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHA3 randomlly crashes on GitHub MacOS x64 action runner #3843

Open
marty1885 opened this issue Dec 15, 2023 · 7 comments
Open

SHA3 randomlly crashes on GitHub MacOS x64 action runner #3843

marty1885 opened this issue Dec 15, 2023 · 7 comments

Comments

@marty1885
Copy link

marty1885 commented Dec 15, 2023

Hi,

A library that I maintain optionally depends on Botan-3. We have been finding that the SHA3 test randomly crashes on our CI runs with the following error. And only SHA3 is failing. All other hashes we test (MD5, SHA256, SHA1, BLAKE2b) and OS (Linux) runs as expected. I've also verified this is not a null dereference on my side as the returned hasher is null-checked. Nor the input parameter is invalid.

       Start 55: Hash.SHA256
55/57 Test #55: Hash.SHA256 ............................   Passed    0.02 sec
      Start 56: Hash.SHA3
56/57 Test #56: Hash.SHA3 ..............................***Exception: Illegal  0.01 sec
      Start 57: Hash.BLAKE2b
57/57 Test #57: Hash.BLAKE2b ...........................   Passed    0.02 sec

98% tests passed, 1 tests failed out of 57

Errors while running CTest
Total Test time (real) =   0.95 sec
Output from these tests are in: /Users/runner/work/trantor/trantor/build/Testing/Temporary/LastTest.log

Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
The following tests FAILED:
make: *** [test] Error 8
	 56 - Hash.SHA3 (ILLEGAL)

You can find the full CI run here. https://github.com/an-tao/trantor/actions/runs/7224966423/job/19687414958?pr=309
And the code in question: an-tao/trantor#309

@randombit
Copy link
Owner

I don't see anything wrong with your usage.

The interesting thing here is that you say it fails only randomly or occasionally. Otherwise I would immediately suspect a miscompilation of some kind.

Are you able to get a backtrace?

make: *** [test] Error 8
	 56 - Hash.SHA3 (ILLEGAL)

could this be related to a SIGILL? I'm wondering if we are somehow jumping to the BMI2 enabled codepath on a machine that doesn't support it.

@marty1885
Copy link
Author

here's the backtrace I'm able to get. I added a signal handler to catch SIGILL. But don't have the debug symbols for Botan. Looks it jumped to NULL?

0   hash_unittest                       0x000000010f9c1380 _ZZ4mainENK3$_0clEi + 32
1   hash_unittest                       0x000000010f9c1359 _ZZ4mainEN3$_08__invokeEi + 9
2   libsystem_platform.dylib            0x00007ff812e90dfd _sigtramp + 29
3   ???                                 0x0000000000000000 0x0 + 0
4   libbotan-3.2.2.0.dylib              0x00000001102a7a9f _ZN5Botan18Keccak_Permutation7permuteEv + 89
5   libbotan-3.2.2.0.dylib              0x0000000110239c61 _ZN5Botan5SHA_312final_resultENSt3__14spanIhLm18446744073709551615EEE + 31
6   hash_unittest                       0x000000010f9c2dc2 _ZN7trantor5utils4sha3EPKvm + 98
7   hash_unittest                       0x000000010f9c09dc _ZN14Hash_SHA3_Test8TestBodyEv + 60
8   hash_unittest                       0x000000010fa1a06b _ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc + 123
9   hash_unittest                       0x000000010f9def4a _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc + 106
10  hash_unittest                       0x000000010f9dee93 _ZN7testing4Test3RunEv + 195
11  hash_unittest                       0x000000010f9dfef2 _ZN7testing8TestInfo3RunEv + 290
12  hash_unittest                       0x000000010f9e0fcd _ZN7testing9TestSuite3RunEv + 317
13  hash_unittest                       0x000000010f9f027d _ZN7testing8internal12UnitTestImpl11RunAllTestsEv + 1005
14  hash_unittest                       0x000000010fa1eeab _ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc + 123
15  hash_unittest                       0x000000010f9efc2a _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc + 106
16  hash_unittest                       0x000000010f9efb15 _ZN7testing8UnitTest3RunEv + 197
17  hash_unittest                       0x000000010f9c1156 main + 54
18  dyld                                0x000000011beac52e start + 462

@reneme
Copy link
Collaborator

reneme commented Dec 18, 2023

I noticed that you're using a sort-of strong type to write the output value to. The type is defined as:

struct Hash256
{
    unsigned char bytes[32];
};

... and you're using it like so:

template <typename Hash>
inline bool attemptHash(const std::string_view& name,
                        Hash& hash,
                        const void* data,
                        size_t len)
{
    auto hashFunction = Botan::HashFunction::create(name);
    // [...]

    hashFunction->update((const unsigned char*)data, len);
    hashFunction->final((unsigned char*)&hash);
    //                  ~~~ assumes that address of object is equal to address of buffer ~~~
    return true;
}

Hash256 sha3(const void* data, size_t len)
{
    Hash256 hash;
    if (attemptHash("SHA-3(256)", hash, data, len))
        return hash;

    // [...]
    return hash
}

You're assuming that the address of the member bytes is always equal to the address of the object hash. Now, I'm not saying that this is illegal and must be the cause of the sporadic crashes. Nevertheless, it strikes me as quite unusual, to be honest.

Maybe explicitly take the address of hash.bytes (in attemptHash) at least?

@marty1885
Copy link
Author

You are right but I think it is unrelated. Even under extreme conditions, the object size cannot be smaller then the data I declared. Even if there's a huge buffer between the beginning of the object and the data, it will still be valid memory.

@reneme
Copy link
Collaborator

reneme commented Dec 19, 2023

I successfully reproduced the crash in my fork, and after fixing the UB I pointed out, the crash seems to go away. Even running the unit tests many times in a single build job and/or restarting the build job several times didn't trigger the crash anymore. When I re-introduce the UB it comes back quite consistently.

Even under extreme conditions, the object size cannot be smaller then the data I declared.

Technically, I agree with you. And unfortunately, I can't explain why this UB seems to cause these spurious crashes. I'm going to blame some compiler optimizations going rogue.

Please try the fix in your branch.

@reneme
Copy link
Collaborator

reneme commented Feb 5, 2024

@marty1885 Any news here? Otherwise, I'd like to close the issue

@marty1885
Copy link
Author

Hey, sorry for the delay. It was CNY and stuff. I made a new branch and applied the changes. Still having the issue. PR is on the following link

an-tao/trantor#322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants