Closed Bug 1342366 Opened 7 years ago Closed 7 years ago

Crash in nsWyciwygChannel::GetCharsetAndSource

Categories

(Core :: Networking, defect)

51 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 54+ fixed
firefox53 --- fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: philipp, Assigned: michal)

Details

(Keywords: crash, Whiteboard: [necko-active])

Crash Data

Attachments

(2 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-1b533675-4b62-4c72-9f25-5ef0c2170224.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsWyciwygChannel::GetCharsetAndSource(int*, nsACString_internal&) 	netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp:641
1 	xul.dll 	mozilla::net::WyciwygChannelParent::OnStartRequest(nsIRequest*, nsISupports*) 	netwerk/protocol/wyciwyg/WyciwygChannelParent.cpp:276
2 	xul.dll 	nsWyciwygChannel::NotifyListener() 	netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp:855
3 	xul.dll 	mozilla::detail::RunnableMethodImpl<void ( nsDocument::*)(void), 0, 0>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:768
4 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1067
5 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
6 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:225
7 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
8 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:262
9 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:283
10 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4401
11 	xul.dll 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp:4534
12 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4625

this cross-platform crash signature has been going on for a while. many user comments in crash reports mention being on imdb.com and clicking on the back button (navigating back a page) there.
Murder in a locked room...
Assignee: nobody → michal.novotny
Whiteboard: [necko-active]
Attached patch fix (obsolete) — Splinter Review
It's crashing on mCacheEntry because it is nulled out on background thread in CloseCacheEntryInternal().

The patch changes:
- CloseCacheEntry() is called in NotifyListener() after OnStart/OnStart is called and after mIsPending is set to false
- GetCharsetAndSource() returns early if mIsPending is false, so no race is possible even if the listener calls it later than from OnStart/OnStop.
- OpenCacheEntry() changed so that OnCacheEntryAvailable() is called only when opening entry for reading
- OnCacheEntryAvailable() was cleaned a bit, there was IMO a bug that channel was not canceled when ReadFromCache() failed.
Attachment #8842825 - Flags: review?(honzab.moz)
Comment on attachment 8842825 [details] [diff] [review]
fix

Review of attachment 8842825 [details] [diff] [review]:
-----------------------------------------------------------------

I think with cache2 we can just back bug 599127 out (or the relevant parts of it)?  That would fix this crash and remove some unnecessary complexity, IMO.

::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
@@ +440,5 @@
>    // open a cache entry for this channel...
>    // mIsPending set to true since OnCacheEntryAvailable may be called
>    // synchronously and fails when mIsPending found false.
>    mIsPending = true;
> +  nsresult rv = OpenCacheEntry(mURI, false);

I liked the flags more, here you have to know what the bool means.  But whatever.

@@ +629,5 @@
>      aCharset = mCharset;
>      return NS_OK;
>    }
>  
> +  if (!mIsPending) {

so, is mIsPending accessed only on a single thread?  some thread assertions would be good to add.

@@ +637,1 @@
>    if (!mCacheEntry) {

this is accessed concurrently.  the usual problem - a refptr accessed on more than one thread.  this should be locked, imo.

OTOH, there is no need for the CloseCacheEntryInternal method anymore.  Could we just back bug 599127 out?

@@ +870,5 @@
>      mListener = nullptr;
>      mListenerContext = nullptr;
>    }
>  
>    mIsPending = false;

hmm.. interesting we live with this so long.  The correct pending state should change as following:

->OnStart()
mIsPending = false
->OnStop()
Attachment #8842825 - Flags: review?(honzab.moz) → feedback-
I frequently get this crash when using the back button to navigate through IMDB pages.
Michal, what do you thing about comment 4?
Flags: needinfo?(michal.novotny)
Backing out bug 599127 makes sense. I'll create a patch soon.
Flags: needinfo?(michal.novotny)
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #8842825 - Attachment is obsolete: true
Attachment #8845681 - Flags: review?(honzab.moz)
Comment on attachment 8845681 [details] [diff] [review]
patch v2

Review of attachment 8845681 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, but I think we can simplify even a bit more, + few other notes (for a possible followup)

::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
@@ -130,5 @@
> -    // with bug 913828.
> -    MOZ_ASSERT(NS_IsMainThread());
> -    nsCOMPtr<nsICacheService> service =
> -      do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
> -  }

I'd rather leave this bit.  I am not sure if comm-central has migrated to cache2 already or not.

@@ +431,5 @@
>      rv = mCacheEntry->SetMetaDataElement("inhibit-persistent-caching", "1");
>      if (NS_FAILED(rv)) return rv;
>    }
>  
> +  if (mNeedToSetSecurityInfo) {

what if you just always set it?  will it have any performance/IO op effect?

@@ +480,5 @@
>  NS_IMETHODIMP
>  nsWyciwygChannel::SetSecurityInfo(nsISupports *aSecurityInfo)
>  {
> +  if (mMode == READING) {
> +    return NS_ERROR_UNEXPECTED;

maybe even assert(false)?

@@ -628,5 @@
> -  if (mCharsetAndSourceSet) {
> -    *aSource = mCharsetSource;
> -    aCharset = mCharset;
> -    return NS_OK;
> -  }

looks like that this was working even when we didn't have a cache entry.  shouldn't we preserve it?

can SetCharsetAndSource be called also when we are in the READ mode?  If yes, then it's a bit weird.  

If no, OTOH, all these mNeedToWrite* etc flags are here IMO just because we were opening the cache entry for writing asynchronously.  Now we open for write with OpenTruncate, so that after AsyncOpen we always have an entry.  So, I think we can remove all these 'caches' to write to the cache entry later, or not?

@@ -855,5 @@
>  
>  void
>  nsWyciwygChannel::NotifyListener()
> -{    
> -  if (mListener) {

any reason not to check we have a listener?  it scares me a bit...  also, from experience - turned to an actual new rule when calling callbacks, the code should even be like this:

nsCOMPtr<> listener = mListener;
if (listener)
  listener->OnStart()
mPending = false
if (listener)
  listener->OnStop()

mListener = nullptr;

the same should be in OnStart/StopRequest

but up to you to do this change in a followup to separate potential regressions.
Attachment #8845681 - Flags: review?(honzab.moz) → feedback+
(In reply to Honza Bambas (:mayhemer) from comment #10)
> ::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
> @@ -130,5 @@
> > -    // with bug 913828.
> > -    MOZ_ASSERT(NS_IsMainThread());
> > -    nsCOMPtr<nsICacheService> service =
> > -      do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
> > -  }
> 
> I'd rather leave this bit.  I am not sure if comm-central has migrated to
> cache2 already or not.

And how is this related? WyciwygChannel will no longer use cache off the main thread, so why is it needed?

> @@ +431,5 @@
> >      rv = mCacheEntry->SetMetaDataElement("inhibit-persistent-caching", "1");
> >      if (NS_FAILED(rv)) return rv;
> >    }
> >  
> > +  if (mNeedToSetSecurityInfo) {
> 
> what if you just always set it?  will it have any performance/IO op effect?

I didn't measure it, but it seems to me unnecessary to serialize security info and write it into metadata in CacheEntry::SetSecurityInfo every time document.write is called.

> @@ -628,5 @@
> > -  if (mCharsetAndSourceSet) {
> > -    *aSource = mCharsetSource;
> > -    aCharset = mCharset;
> > -    return NS_OK;
> > -  }
> 
> looks like that this was working even when we didn't have a cache entry. 
> shouldn't we preserve it?
> 
> can SetCharsetAndSource be called also when we are in the READ mode?  If
> yes, then it's a bit weird.

I'm pretty sure it is called also in READ mode, but I don't remember where does the call come from. Both GetCharsetAndSource and SetCharsetAndSource are called in READ mode when we have the entry. GetCharsetAndSource isn't called in WRITE mode and SetCharsetAndSource is called before first write so mode is still NONE. So it should work with the way how we use wyciwyg channel now. I can add couple of assertions to make sure we know when the use case changes.

> If no, OTOH, all these mNeedToWrite* etc flags are here IMO just because we
> were opening the cache entry for writing asynchronously.  Now we open for
> write with OpenTruncate, so that after AsyncOpen we always have an entry. 
> So, I think we can remove all these 'caches' to write to the cache entry
> later, or not?

No, as stated above, it is called before we open the cache entry.

> >  void
> >  nsWyciwygChannel::NotifyListener()
> > -{    
> > -  if (mListener) {
> 
> any reason not to check we have a listener?  it scares me a bit...  also,

It cannot be null. Listener is checked in AsyncOpen and there is no other place where it could be nulled out.

> from experience - turned to an actual new rule when calling callbacks, the
> code should even be like this:
> 
> nsCOMPtr<> listener = mListener;
> if (listener)
>   listener->OnStart()
> mPending = false
> if (listener)
>   listener->OnStop()
> 
> mListener = nullptr;
> 
> the same should be in OnStart/StopRequest

I'm not against.
Flags: needinfo?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #11)
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > ::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
> > @@ -130,5 @@
> > > -    // with bug 913828.
> > > -    MOZ_ASSERT(NS_IsMainThread());
> > > -    nsCOMPtr<nsICacheService> service =
> > > -      do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
> > > -  }
> > 
> > I'd rather leave this bit.  I am not sure if comm-central has migrated to
> > cache2 already or not.
> 
> And how is this related? WyciwygChannel will no longer use cache off the
> main thread, so why is it needed?

Yes!  That's true.  Remove it then.

> 
> > @@ +431,5 @@
> > >      rv = mCacheEntry->SetMetaDataElement("inhibit-persistent-caching", "1");
> > >      if (NS_FAILED(rv)) return rv;
> > >    }
> > >  
> > > +  if (mNeedToSetSecurityInfo) {
> > 
> > what if you just always set it?  will it have any performance/IO op effect?
> 
> I didn't measure it, but it seems to me unnecessary to serialize security
> info and write it into metadata in CacheEntry::SetSecurityInfo every time
> document.write is called.

OK, let's leave it.

> 
> > @@ -628,5 @@
> > > -  if (mCharsetAndSourceSet) {
> > > -    *aSource = mCharsetSource;
> > > -    aCharset = mCharset;
> > > -    return NS_OK;
> > > -  }
> > 
> > looks like that this was working even when we didn't have a cache entry. 
> > shouldn't we preserve it?
> > 
> > can SetCharsetAndSource be called also when we are in the READ mode?  If
> > yes, then it's a bit weird.
> 
> I'm pretty sure it is called also in READ mode, but I don't remember where
> does the call come from. Both GetCharsetAndSource and SetCharsetAndSource
> are called in READ mode when we have the entry. GetCharsetAndSource isn't
> called in WRITE mode and SetCharsetAndSource is called before first write so
> mode is still NONE. So it should work with the way how we use wyciwyg
> channel now. I can add couple of assertions to make sure we know when the
> use case changes.

That's alright, still, this is bug to a fix a crash, not wysiwyg channel.

> 
> > If no, OTOH, all these mNeedToWrite* etc flags are here IMO just because we
> > were opening the cache entry for writing asynchronously.  Now we open for
> > write with OpenTruncate, so that after AsyncOpen we always have an entry. 
> > So, I think we can remove all these 'caches' to write to the cache entry
> > later, or not?
> 
> No, as stated above, it is called before we open the cache entry.
> 
> > >  void
> > >  nsWyciwygChannel::NotifyListener()
> > > -{    
> > > -  if (mListener) {
> > 
> > any reason not to check we have a listener?  it scares me a bit...  also,
> 
> It cannot be null. Listener is checked in AsyncOpen and there is no other
> place where it could be nulled out.

I'd still rather leave at least the null check in.

> 
> > from experience - turned to an actual new rule when calling callbacks, the
> > code should even be like this:
> > 
> > nsCOMPtr<> listener = mListener;
> > if (listener)
> >   listener->OnStart()
> > mPending = false
> > if (listener)
> >   listener->OnStop()
> > 
> > mListener = nullptr;
> > 
> > the same should be in OnStart/StopRequest
> 
> I'm not against.

OK, I'll change the f+ to r+, feel free to update the patch and just carry forward.
Flags: needinfo?(honzab.moz)
Comment on attachment 8845681 [details] [diff] [review]
patch v2

r+ based on comment 12.  We can't remove some internal caches for charset and secinfo.
Attachment #8845681 - Flags: feedback+ → review+
Attached patch patch v4Splinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7df6560f902c0f07d671e1ec652b89b89cc9570a&selectedJob=83729531

I had to change the assertion in nsWyciwygChannel::SetCharsetAndSource. With e10s turned on, nsWyciwygChannel::SetCharsetAndSource is called in reading mode when we already closed and released the entry. This is now a new bug, it happens without this patch too. I'll file a follow-up bug for this.
Attachment #8845971 - Attachment is obsolete: true
Attachment #8847471 - Flags: review+
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/749de78b4831
Crash in nsWyciwygChannel::GetCharsetAndSource, r=honzab
https://hg.mozilla.org/mozilla-central/rev/749de78b4831
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
No reports on Nightly since this fix landed. Is this something we can consider uplifting to Aurora/Beta/ESR52?
Flags: needinfo?(michal.novotny)
Comment on attachment 8847471 [details] [diff] [review]
patch v4

Approval Request Comment
[Feature/Bug causing the regression]: 599127 + e10s
[User impact if declined]: possible crash when navigating back to wyciwyg document
[Is this code covered by automated tests?]: there are some tests that generate wyciwyg documents as well as navigate to them
[Has the fix been verified in Nightly?]: there are no reports of the crash after landing the patch
[Needs manual test from QE? If yes, steps to reproduce]: no, because there are no exact STR to trigger the crash
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: the risk is low
[Why is the change risky/not risky?]: the patch mostly reverts changes from bug 599127 and simplifies the code a lot
[String changes made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8847471 - Flags: approval-mozilla-beta?
Attachment #8847471 - Flags: approval-mozilla-aurora?
Comment on attachment 8847471 [details] [diff] [review]
patch v4

Fix a crash. Aurora54+ & Beta53+.
Attachment #8847471 - Flags: approval-mozilla-beta?
Attachment #8847471 - Flags: approval-mozilla-beta+
Attachment #8847471 - Flags: approval-mozilla-aurora?
Attachment #8847471 - Flags: approval-mozilla-aurora+
needs rebasing for beta:

rafting 408128:d40938cc2aee "Bug 1342366 - Crash in nsWyciwygChannel::GetCharsetAndSource, r=honzab a=gchang"
merging netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
warning: conflicts while merging netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(michal.novotny)
Attached patch patch for beta53 (obsolete) — Splinter Review
Flags: needinfo?(michal.novotny)
Comment on attachment 8852094 [details] [diff] [review]
patch for beta53

This doesn't apply to Beta at all.

patching file netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
Hunk #1 FAILED at 8
Hunk #3 FAILED at 416
Hunk #4 succeeded at 456 with fuzz 1 (offset 5 lines).
Hunk #5 FAILED at 508
Hunk #6 FAILED at 620
Hunk #8 FAILED at 697
Hunk #9 FAILED at 793
6 out of 9 hunks FAILED -- saving rejects to file netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp.rej
patching file netwerk/protocol/wyciwyg/nsWyciwygChannel.h
Hunk #2 FAILED at 38
Hunk #3 succeeded at 101 with fuzz 1 (offset 0 lines).
1 out of 3 hunks FAILED -- saving rejects to file netwerk/protocol/wyciwyg/nsWyciwygChannel.h.rej
abort: patch failed to apply
Attachment #8852094 - Attachment is obsolete: true
Flags: needinfo?(michal.novotny)
Attached patch patch for beta53Splinter Review
(In reply to Ryan VanderMeulen [:RyanVM] from comment #24)
> Comment on attachment 8852094 [details] [diff] [review]
> patch for beta53
> 
> This doesn't apply to Beta at all.

Sorry, the patch was against an old beta tree (although I'm pretty sure I did hg pull && hg up). This patch should work.
Flags: needinfo?(michal.novotny)
On the bright side, I bet it'll work nicely for ESR52 when the time comes :D
Comment on attachment 8852266 [details] [diff] [review]
patch for beta53

crash-stats isn't showing any hits with signature after 53.0b7, which is exactly what we'd expect given when the patch was uplifted. Fx52 hits this pretty frequently, so I think we should move forward with getting this onto esr52 for the next release. I've verified that it grafts cleanly.
Attachment #8852266 - Flags: approval-mozilla-esr52?
Comment on attachment 8852266 [details] [diff] [review]
patch for beta53

While this crash is low volume, 50 crash occurrences in past 7 days, I am still taking it because the fix has worked well on nightly/beta/release. ESR52+
Attachment #8852266 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
(In reply to Michal Novotny (:michal) from comment #19)
> [Is this code covered by automated tests?]: there are some tests that
> generate wyciwyg documents as well as navigate to them
> [Has the fix been verified in Nightly?]: there are no reports of the crash
> after landing the patch
> [Needs manual test from QE? If yes, steps to reproduce]: no, because there
> are no exact STR to trigger the crash

Setting qe-verify- based on Michal's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: