Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation faults when using shapes with automated user Debug Tools #9352

Open
louwrentius opened this issue Jun 26, 2024 · 7 comments
Open
Labels
24.04 bug Something isn't working

Comments

@louwrentius
Copy link

Describe the Bug

This issue originates from this forum thread

During our investigation on our Staging/Acceptance environment for another stability issue, we found out we can reproducibly crash coolwsd with a segmentation fault, when we run the automated typer with two simultaneous users in the same document.

Per instruction on the forum thread, I've tried to collect information with gdb (see attachments). Please note that I'm inexperienced with gdb and may require additional guidance to collect the information you need to troubleshoot this issue.

Steps to Reproduce

  1. Go to Nextcloud Files
  2. Open a (new or existing) document for testing
  3. Start the Debug Tools (CTRL-Shift-ALT-D)
  4. Under Functionality: check 'Typer'
  5. Under Automated User, check both 'Enable automated user' and 'Insert and delete shape'
  6. The Debug Tools will start to type content and insert/delete random shapes.
  7. Now, use a second user account to open the same document and follow steps three to five. In our case, this triggers a segmentation fault and (usually) an automated restart of the coolwsd service.

Expected Behavior

No crash / segmentation faults.

Actual Behavior

As the second user enables the automated user and the 'insert and delete shape' option in the debug tools, it takes mere seconds to 'crash' coolwsd / cooforkit.

Attachments

make-run-output.txt.zip
coolwsd-SIGSEGV-snippet.txt
coolwsd-gdb-trace.txt

Desktop

  • Collabora version: compiled from source (main branch) on June 24th
  • OS and version: Debian 12
  • Browser and version: any browser

This issue is also confirmed on ubuntu 22.04 with .deb from collabora mirror: 24.04.4-1

@louwrentius louwrentius added bug Something isn't working unconfirmed labels Jun 26, 2024
@mmeeks
Copy link
Contributor

mmeeks commented Jun 28, 2024

Great catch. If you've built yourself; did you build with debugging symbols - and can you walk back through your stack-trace with:

addr2line -e + /usr/src/instdir/program/libmergedlo.so 0x7f117359d782

etc. - up the stack-trace; sadly it's not so easy to do stack unwinding and symbol resolution reliably in the signal handler where the trace is dumped, and source file offsets are very dependent on compiler version & options.

@mmeeks
Copy link
Contributor

mmeeks commented Jun 28, 2024

With a self-build here:
COOLWSD version: 24.04.4.2 git hash: 1a08eff
LOKit version: Collabora OfficeDev 24.04.4.2 git hash: 12ff439
Served by: openSUSE Leap 15.5

Writer: survives minutes with no problems.

@mmeeks
Copy link
Contributor

mmeeks commented Jun 28, 2024

Eventually got it:
#5 0x00007f986a050910 in () at /lib64/libpthread.so.0
#6 VclReferenceBase::acquire() const (this=) at include/vcl/vclreferencebase.hxx:37
#7 rtl::Referencevcl::Window::set(vcl::Window*) (this=0x55e75696c570, pBody=0x0) at include/rtl/ref.hxx:139
#8 VclPtrvcl::Window::operator=(vcl::Window*) (this=0x55e75696c570, pBody=0x0) at include/vcl/vclptr.hxx:176
#9 vcl::Cursor::ImplDoShow(bool, bool) (this=0x55e756d649e0, bDrawDirect=true, bRestore=false) at vcl/source/window/cursor.cxx:227
#10 0x00007f986729383a in SdrObjEditView::SdrEndTextEdit(bool) (this=0x55e75135ace0, bDontDeleteReally=) at svx/source/svdraw/svdedxv.cxx:1798
#11 0x00007f9859e319c1 in SwFEShell::EndTextEdit() (this=0x55e751356dd0) at sw/source/core/frmedt/feshview.cxx:1268
#12 0x00007f985a3e08f1 in SwEditWin::EnterDrawMode(MouseEvent const&, Point const&) (this=this@entry=0x55e756371b00, rMEvt=..., aDocPos=Point = {...}) at sw/source/uibase/docvw/edtwin.cxx:5531
#13 0x00007f985a3de3c1 in SwEditWin::MouseButtonDown(MouseEvent const&) (this=0x55e756371b00, _rMEvt=) at sw/source/uibase/docvw/edtwin.cxx:3230
#14 0x00007f98681b2d18 in (anonymous namespace)::LOKPostAsyncEvent(void*, void*) (pEv=pEv@entry=0x55e756a0bfc0) at sfx2/source/view/lokhelper.cxx:924
#15 0x00007f98681b19b9 in (anonymous namespace)::postEventAsync((anonymous namespace)::LOKAsyncEventData*) (pEvent=0x55e756a0bfc0) at sfx2/source/view/lokhelper.cxx:973
#16 0x00007f985a547441 in SwXTextDocument::postMouseEvent(int, int, int, int, int, int) (this=, nType=0, nX=8415, nY=2389, nCount=1, nButtons=1, nModifier=) at sw/source/uibase/uno/unotxdoc.cxx:3764
#17 0x00007f986997a503 in doc_postMouseEvent(_LibreOfficeKitDocument*, int, int, int, int, int, int) (pThis=0x55e7564356d0, nType=0, nX=8415, nY=2389, nCount=1, nButtons=0, nModifier=0) at desktop/source/lib/init.cxx:5481
#18 0x000055e74f842643 in lok::Document::postMouseEvent(int, int, int, int, int, int) (this=0x55e756323670, nType=0, nX=8415, nY=2389, nCount=1, nButtons=1, nModifier=0) at /opt/libreoffice/co-24.04/include/LibreOfficeKit/LibreOfficeKit.hxx:297
#19 0x000055e74f82b28c in ChildSession::mouseEvent(StringVector const&, LokEventTargetEnum) (this=0x55e75628b6b0, tokens=..., target=LokEventTargetEnum::Document) at kit/ChildSession.cpp:1797
#20 0x000055e74f80ec8d in ChildSession::_handleInput(char const*, int) (this=0x55e75628b6b0, buffer=0x55e7580714a0 "mouse type=buttondown x=8415 y=2389.9999199999997 count=1 buttons=1 modifier=0modifier=0!", length=78) at kit/ChildSession.cpp:543
#21 0x000055e74f93ba68 in Session::handleMessage(std::vector<char, std::allocator > const&) (this=0x55e75628b6b0, data=std::vector of length 78, capacity 78 = {...}) at common/Session.cpp:288
#22 0x000055e74f8b24c2 in Document::forwardToChild(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::vector<char, std::allocator > const&) (this=0x55e75614b800, prefix="child-064", payload=std::vector of length 88, capacity 88 = {...}) at kit/Kit.cpp:2035
#23 0x000055e74f8b5b64 in Document::drainQueue() (this=0x55e75614b800) at kit/Kit.cpp:2258

@mmeeks
Copy link
Contributor

mmeeks commented Jun 28, 2024

in frame #10 SdrObjEditView::SdrEndTextEdit(bool) (this=0x55e75135ace0, bDontDeleteReally=) at svx/source/svdraw/svdedxv.cxx:1798

(gdb) p pTECursorBuffer
$11 = (vcl::Cursor *) 0x55e756d649e0
(gdb) p pTextEditCursorBuffer
$12 = (vcl::Cursor *) 0x0

Quite why we duplicate the member pointer into a local variable and then use it later is not clear to me at a glance. Interesting that dates back to the 1st checkin in 2000 when it was:

  • Cursor* pTECursorMerker=pTextEditCursorMerker;

I guess new tests find old problems :-) Potentially a reference counted type would help here.

@louwrentius
Copy link
Author

louwrentius commented Jun 28, 2024

Thanks for the investigation. I will try and recompile with debug symbols enabled and use addr2line to give you the output you need.

The file you were referencing with the addr2line command is part of this file:

wget https://github.com/CollaboraOnline/online/releases/download/for-code-assets/core-co-24.04-assets.tar.gz

And I think that's not build with symbols?

Just to set expectations, I have no experience writing C++ or debugging it so I may need elementary school level instructions to help you.

@mmeeks mmeeks added 24.04 and removed unconfirmed labels Jun 28, 2024
@mmeeks
Copy link
Contributor

mmeeks commented Jun 28, 2024

It's fine - I have the trace now :-) The hope was that:

diff --git a/svx/source/svdraw/svdedxv.cxx b/svx/source/svdraw/svdedxv.cxx
index 4afe96b4c657..6f8bc13cf879 100644
--- a/svx/source/svdraw/svdedxv.cxx
+++ b/svx/source/svdraw/svdedxv.cxx
@@ -1566,7 +1566,7 @@ SdrEndTextEditKind SdrObjEditView::SdrEndTextEdit(bool bDontDeleteReally)
 {
     SdrEndTextEditKind eRet = SdrEndTextEditKind::Unchanged;
     rtl::Reference<SdrTextObj> pTEObj = mxWeakTextEditObj.get();
-    vcl::Window* pTEWin = mpTextEditWin;
+    VclPtr<vcl::Window> pTEWin = mpTextEditWin;
     OutlinerView* pTEOutlinerView = mpTextEditOutlinerView;
     vcl::Cursor* pTECursorBuffer = pTextEditCursorBuffer;
     SdrUndoManager* pUndoEditUndoManager = nullptr;

would fix it. But it seems it's a tougher nut. Perhaps @quikee has some ideas - but we prolly need to run this under ASAN or do a chunk more code-reading & analysis it seems.

@mmeeks
Copy link
Contributor

mmeeks commented Jun 28, 2024

==25265==ERROR: AddressSanitizer: heap-use-after-free on address 0x606026819b74 at pc 0x7f4c66961cd2 bp 0x7ffd4b931eb0 sp 0x7ffd4b931ea8
READ of size 1 at 0x606026819b74 thread T0 (kitbroker_004)
#0 0x7f4c66961cd1 (/opt/collaboraoffice/program/libmergedlo.so+0x142accd1)
/home/collabora/online-buildscripts/staging/builddir/libreoffice/vcl/source/window/cursor.cxx:200
#1 0x7f4c64d640a0 (/opt/collaboraoffice/program/libmergedlo.so+0x126af0a0)
#2 0x7f4c212a499a (/opt/collaboraoffice/program/../program/libswlo.so+0x514799a)
#3 0x7f4c2366cb26 (/opt/collaboraoffice/program/../program/libswlo.so+0x750fb26)
#4 0x7f4c622c3af7 (/opt/collaboraoffice/program/libmergedlo.so+0xfc0eaf7)
#5 0x7f4c622c5c97 (/opt/collaboraoffice/program/libmergedlo.so+0xfc10c97)
#6 0x7f4c622fadb5 (/opt/collaboraoffice/program/libmergedlo.so+0xfc45db5)
#7 0x7f4c622faaed (/opt/collaboraoffice/program/libmergedlo.so+0xfc45aed)
#8 0x7f4c62b9c49b (/opt/collaboraoffice/program/libmergedlo.so+0x104e749b)
#9 0x7f4c66dd36e5 (/opt/collaboraoffice/program/libmergedlo.so+0x1471e6e5)
#10 0x7f4c66dc4ec3 (/opt/collaboraoffice/program/libmergedlo.so+0x1470fec3)
...
0x606026819b74 is located 52 bytes inside of 64-byte region [0x606026819b40,0x606026819b80)
freed by thread T0 (kitbroker_004) here:
#0 0x8ca308 (/usr/bin/coolforkit+0x8ca308)
/home/collabora/lode/packages/llvm-llvmorg-12.0.1.src/compiler-rt/lib/asan/asan_new_delete.cpp:172 (discriminator 5)
#1 0x7f4c5ff2f4ae (/opt/collaboraoffice/program/libmergedlo.so+0xd87a4ae)
/home/collabora/online-buildscripts/staging/builddir/libreoffice/include/o3tl/deleter.hxx:46
previously allocated by thread T0 (kitbroker_004) here:
#0 0x8c9408 (/usr/bin/coolforkit+0x8c9408)
#1 0x7f4c5fe97cc0 (/opt/collaboraoffice/program/libmergedlo.so+0xd7e2cc0)
/home/collabora/online-buildscripts/staging/builddir/libreoffice/editeng/source/editeng/impedit.hxx:1353

@timar staging has a terrible number of context frames configured in ASAN for allocations & frees - can we bump that up to at least 10 frames ? what we have here is not that helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
24.04 bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

2 participants