Closed Bug 1063723 Opened 10 years ago Closed 5 years ago

crash in nsDocumentViewer::OnDonePrinting @ address 0x5a5a5a5a/e5e5e5e9 use-afer-free

Categories

(Thunderbird :: General, defect)

30 Branch
x86
All
defect
Not set
critical

Tracking

(thunderbird_esr68 verified, thunderbird73 verified)

VERIFIED FIXED
Thunderbird 74.0
Tracking Status
thunderbird_esr68 --- verified
thunderbird73 --- verified

People

(Reporter: wsmwk, Assigned: benc)

References

Details

(5 keywords, Whiteboard: [regression:TB30][needinfo mats when we have STR])

Crash Data

Attachments

(2 files, 1 obsolete file)

New crash in version 31.0. Rank #38 The earliest crash I find is version 30. Most crashes are @ address 0x5a5a5a5a This bug was filed from the Socorro interface and is report bp-7e684bd2-b61a-46b1-8368-c33a72140827. ============================================================= 0 xul.dll nsDocumentViewer::OnDonePrinting() layout/base/nsDocumentViewer.cpp 1 xul.dll nsDocumentViewer::Print(nsIPrintSettings*, nsIWebProgressListener*) layout/base/nsDocumentViewer.cpp 2 xul.dll nsMsgPrintEngine::PrintMsgWindow() mailnews/base/src/nsMsgPrintEngine.cpp 3 xul.dll nsPrintMsgWindowEvent::Run() mailnews/base/src/nsMsgPrintEngine.cpp 4 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 5 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp http://hg.mozilla.org/releases/mozilla-esr31/annotate/51a9591df62f/layout/base/nsDocumentViewer.cpp#l4285 hg@1 4276 // We are done printing, now cleanup hg@1 4277 if (mDeferredWindowClose) { ehsan@80187 4278 mDeferredWindowClose = false; trev.saunde 4279 nsCOMPtr<nsIDOMWindow> win = trev.saunde 4280 do_GetInterface(static_cast<nsIDocShell*>(mContainer)); hg@1 4281 if (win) hg@1 4282 win->Close(); hg@1 4283 } else if (mClosingWhilePrinting) { hg@1 4284 if (mDocument) { ayg@106310 4285 mDocument->SetScriptGlobalObject(nullptr);
#14 crash for TB31.4.0
Still a topcrash, #16 in 38.3.0.
Do we need a testcase for this? (So far I'm not successful in getting one) bp-93859b33-1d6c-4806-9cdb-231f92160305
Who are you asking? Surely you need a reproducible case. Note that the crash is in M-C (layout/base): Frame Module Signature Source 0 xul.dll nsDocumentViewer::OnDonePrinting() layout/base/nsDocumentViewer.cpp 1 xul.dll nsDocumentViewer::Print(nsIPrintSettings*, nsIWebProgressListener*) layout/base/nsDocumentViewer.cpp
Component: General → Layout
Product: Thunderbird → Core
Whiteboard: [regression:TB30] → [regression:TB30][tbird topcrash]
Of course a reproducible test case is great, but it is not the norm in fixing crashes. This crash seems to be associated with object lifetime management issues in the m-c print code, and they probably would be less than enthusiastic about accepting patches from us without a reproducible case.
Flags: needinfo?(bugs)
Flags: needinfo?(bugs) → needinfo?(mats)
Flags: needinfo?(mats)
Given that 100% of the reported crashes are for TB it seems likely that it's something on a higher level in TB that needs fixing. Maybe they fail to hold a strong ref on the document / print engine for the lifetime of the print process? or something like that... it's really hard to tell without STR. The symptoms does look a bit similar to bug 830236, perhaps the STR is similar too? I.e. start a print job (perhaps it needs to be slow), then close the TB window it was invoked from or quit TB altogether. need-info? me when you have STR and I'll take a look.
Component: Layout → General
Product: Core → Thunderbird
Version: Trunk → unspecified
So far at least, in TB45 this is ranked #35 compared to #13 in version TB38. So perhaps something in 45 changed. Not also Thunderbird printing bug 1257112 seems to be gone in TB45. Or the signature has changed.
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #10) > So far at least, in TB45 this is ranked #35 compared to #13 in version TB38. > So perhaps something in 45 changed. now #11 rank for 45.2.0
address is now e5e5e5e9 Most of the users crashing are one-offs, or they are not reporting all their crashes. So rather improbable that we will find someone who can provide good info. But, I am hoping this will be helpd by what is described in bug 532395.
Summary: crash in nsDocumentViewer::OnDonePrinting() @ address 0x5a5a5a5a → crash in nsDocumentViewer::OnDonePrinting @ address 0x5a5a5a5a/e5e5e5e9
add to that less than 20% of crash comments mentioning something other than printing - so highly unlikely a printing testcase will emerge
Whiteboard: [regression:TB30][tbird topcrash] → [regression:TB30][needinfo mats when we have STR]
> need-info? me when you have STR and I'll take a look. I have one user just report, that he crashes always on the third print attempt. Happens only on one computer - similar computers don't experience the crash. Uninstalled printers and reinstalled, same results.
#17 for 45.8.0. But ~#50 for 52.1.0 and 52.1.1, so -topcrash
#14 crash for 52.8.0 A higher than "normal" percentage of crashes are it and de locales
So far as I can tell this signature is gone in version 60
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME

(In reply to Wayne Mery (:wsmwk) from comment #17)

So far as I can tell this signature is gone in version 60

I wonder whatever gave me that idea. Clearly a still a strong top crash in both 60 and 68 - https://crash-stats.mozilla.com/signature/?product=Thunderbird&signature=nsDocumentViewer%3A%3AOnDonePrinting&date=%3E%3D2019-03-14T17%3A42%3A00.000Z&date=%3C2019-09-14T17%3A42%3A00.000Z#graphs

bp-54b2d5d0-f295-47da-9d42-43d590190914 68.1.0

m_kata do you see a useful approach to solving this?

Status: RESOLVED → REOPENED
Flags: needinfo?(m_kato)
Resolution: WORKSFORME → ---

(In reply to Wayne Mery (:wsmwk) from comment #18)

(In reply to Wayne Mery (:wsmwk) from comment #17)

So far as I can tell this signature is gone in version 60

I wonder whatever gave me that idea. Clearly a still a strong top crash in both 60 and 68 - https://crash-stats.mozilla.com/signature/?product=Thunderbird&signature=nsDocumentViewer%3A%3AOnDonePrinting&date=%3E%3D2019-03-14T17%3A42%3A00.000Z&date=%3C2019-09-14T17%3A42%3A00.000Z#graphs

bp-54b2d5d0-f295-47da-9d42-43d590190914 68.1.0

m_kata do you see a useful approach to solving this?

Hmm, I guess that nsDocumentViewer::OnDonePrinting uses freed/destroyed nsDocumentViewer (this). So it may be able to fix by adding KungFuDeathGrip into nsDocumentViewer::Print...

Flags: needinfo?(m_kato)

Who can move comment 19 forward for us?

Flags: needinfo?(jorgk)

Good question. There are already a few KungFuDeathGrip in that code, for example here:
https://searchfox.org/mozilla-central/rev/b58e44b74ef2b4a44bdfb4140c2565ac852504be/layout/base/nsDocumentViewer.cpp#1018

It's not really our code, so hard to poke around in it. Strangely I can't see any crashes for FF. Makoto-san, can you propose a patch. Just add RefPtr<nsDocumentViewer> kungFuDeathGrip(this); into nsDocumentViewer::Print. But how to convince the reviewer? And if we do, we can then remove some of the other "kung fus", right?

Flags: needinfo?(jorgk) → needinfo?(m_kato)

Ryan, can you reach out to some crash reporters? And, are there any useful crash comments from the past 6 months?

Flags: needinfo?(ryanvm)

Nothing interesting in the comments that I can see, mostly just people saying it happened when they were trying to print something or canceling printing.

Flags: needinfo?(ryanvm)
Flags: needinfo?(m_kato)

Makoto-san, can you have a look?

Flags: needinfo?(m_kato)

(In reply to Jorg K (GMT+1) (PTO to 19th Jan 2020, sporadically reading bugmail) from comment #21)

Good question. There are already a few KungFuDeathGrip in that code, for example here:
https://searchfox.org/mozilla-central/rev/b58e44b74ef2b4a44bdfb4140c2565ac852504be/layout/base/nsDocumentViewer.cpp#1018

It's not really our code, so hard to poke around in it. Strangely I can't see any crashes for FF. Makoto-san, can you propose a patch. Just add RefPtr<nsDocumentViewer> kungFuDeathGrip(this); into nsDocumentViewer::Print. But how to convince the reviewer? And if we do, we can then remove some of the other "kung fus", right?

All crash case seems be be that print job is failure.

mWebBrowserPrint is hold by nsMsgPrintEngine. Although I don't know PrintMsgWindow is called before previous print job is finished, if nsMsgPrintEngine is reused, mWebBrowserPrint may be released by newer PrintMsgWindow.

If so, this may be fixed by the following, but I am not sure.

void nsMsgPrintEngine::PrintMsgWindow() {
...
        nsCOMPtr<nsIWebBrowserPrint> webBrowserPrint = mWebBrowserPrint;
        rv = webBrowserPrint->Print(mPrintSettings,
                                    (nsIWebProgressListener *)this);
Flags: needinfo?(m_kato)

Thanks m_kato.

Over to Magnus (or someone else)

Flags: needinfo?(mkmelin+mozilla)
Attached patch bug1063723_print_crash.patch (obsolete) — Splinter Review
Assignee: nobody → mkmelin+mozilla
Status: REOPENED → ASSIGNED
Flags: needinfo?(mkmelin+mozilla)
Attachment #9119042 - Flags: review?(benc)
Comment on attachment 9119042 [details] [diff] [review] bug1063723_print_crash.patch Review of attachment 9119042 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/src/nsMsgPrintEngine.cpp @@ +549,5 @@ > rv = mStartupPPObs->Observe(nullptr, nullptr, nullptr); > } > } else { > mPrintSettings->SetPrintSilent(mCurrentlyPrintingURI != 0); > + nsCOMPtr<nsIWebBrowserPrint> webBrowserPrint = mWebBrowserPrint; Please add a comment here, something like: Create a strong reference to avoid a crash. Maybe Ben has some better wording. We used to call this "strong reference" in the context of the MOZ_CAN_RUN_SCRIPT stuff we did a while ago.
Comment on attachment 9119042 [details] [diff] [review] bug1063723_print_crash.patch Review of attachment 9119042 [details] [diff] [review]: ----------------------------------------------------------------- According to the stack trace, it's crashing inside the `mWebBrowserPrint->Print(...)` call. Since `mWebBrowserPrint` is _only_ ever used by `PrintMsgWindow()`, I can't see how anything could be releasing it unless `PrintMsgWindow()` is being called again - recursively - from within the `mWebBrowserPrint->Print()` call. Which would seem to be a really bad thing, if true. If `PrintMsgWindow()` is not being re-entered, I can't see how holding another local reference `mWebBrowserPrint` will help.
Attachment #9119042 - Flags: review?(benc) → review-

Here's a patch to turn mWebBrowserPrint into a local variable instead.
In theory, that might better handle recursive calling of nsMsgPrintEngine::PrintMsgWindow()... but if that really is what's happening (and I have my doubts), then surely there'll be bigger problems...

So I can't really see this helping, but hey.

I'll have a quick go and try and see if I can replicate the bug by artificially failing the print job.

Assignee: mkmelin+mozilla → benc
Attachment #9119641 - Flags: review+

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/8386d49168fa
Move nsMsgPrintEngine mWebBrowserPrint to function scope to avoid potential reuse crash. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 74.0
Attachment #9119042 - Attachment is obsolete: true

Beta has ~3 crashes per week, so to determine the effect of this patch I suggest we uplift it right to beta 73.

Crash Signature: [@ nsDocumentViewer::OnDonePrinting()] [@ nsDocumentViewer::OnDonePrinting ] → [@ nsDocumentViewer::OnDonePrinting ]
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(mkmelin+mozilla)
Attachment #9119641 - Flags: approval-comm-beta+
Comment on attachment 9119641 [details] [diff] [review] 1063723-move-mWebBrowserPrint-to-local-scope-1.patch Ben is the patch good to take for ESR 68?
Flags: needinfo?(benc)
Comment on attachment 9119641 [details] [diff] [review] 1063723-move-mWebBrowserPrint-to-local-scope-1.patch Rob, can you land this early so we know it sticks? Let's take it on ESR because this has proven to be good in 73 and 74 beta. [1] Although we can't know about 75 because of Bug 1620314 - Printing a message isn't working in Daily 75.0a1 and beta 75 [1] https://crash-stats.mozilla.org/signature/?release_channel=%21release&signature=nsDocumentViewer%3A%3AOnDonePrinting&date=%3E%3D2019-12-28T13%3A59%3A00.000Z&date=%3C2020-03-28T13%3A59%3A00.000Z
Flags: needinfo?(rob)
Attachment #9119641 - Flags: approval-comm-esr68+

I have a try build with the patch applied (just started it, so still running) at https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=c3758dd8a668e03870f85b183b397e6cd34faa13

That patch applies fine, but the changed file does not compile now.

/builds/worker/workspace/build/src/clang/bin/clang++ -o nsMsgPrintEngine.o -c  -I/builds/worker/workspace/build/src/obj-thunderbird/dist/stl_wrappers -I/builds/worker/workspace/build/src/obj-thunderbird/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DNDEBUG=1 -DTRIMMED=1 -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/comm/mailnews/base/src -I/builds/worker/workspace/build/src/obj-thunderbird/comm/mailnews/base/src -I/builds/worker/workspace/build/src/dom/base -I/builds/worker/workspace/build/src/obj-thunderbird/dist/include -I/builds/worker/workspace/build/src/obj-thunderbird/dist/include/nspr -I/builds/worker/workspace/build/src/obj-thunderbird/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-thunderbird/mozilla-config.h -Qunused-arguments -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++1z-compat -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fcrash-diagnostics-dir=/builds/worker/artifacts -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-thunderbird/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -fomit-frame-pointer -funwind-tables -Werror  -MD -MP -MF .deps/nsMsgPrintEngine.o.pp   /builds/worker/workspace/build/src/comm/mailnews/base/src/nsMsgPrintEngine.cpp
/builds/worker/workspace/build/src/comm/mailnews/base/src/nsMsgPrintEngine.cpp:338:19: error: use of undeclared identifier 'mWebBrowserPrint'
            domWin, mWebBrowserPrint, mPrintSettings, this, aIsForPrinting,
Flags: needinfo?(rob)

I found that ShowPrintProgressDialog accepts NULL for the mWebBrowserPrint parameter, so I changed it and now the above patch compiles when applied to comm-esr68.

It doesn't look like ShowPrintProgressDialog does anything with that variable(?) and I'm able to print, print preview, etc with the build I did with it applied.

s/NULL/nullptr/.

Thanks for checking it. I assume Ben is off the hook.

Flags: needinfo?(benc)
Summary: crash in nsDocumentViewer::OnDonePrinting @ address 0x5a5a5a5a/e5e5e5e9 → crash in nsDocumentViewer::OnDonePrinting @ address 0x5a5a5a5a/e5e5e5e9 use-afer-free

verified gone in ESr

correcting mistake

See Also: → 1610797
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: