Investigate intermittent Windows 7 and 8 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation

RESOLVED FIXED in Firefox 38, Firefox OS v2.2

Status

()

Core
Networking
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: RyanVM, Assigned: valentin)

Tracking

({crash})

Trunk
mozilla40
x86
Windows 7
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox37 wontfix, firefox38 fixed, firefox39 fixed, firefox40 fixed, firefox-esr31 wontfix, b2g-v2.2 fixed, b2g-master fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

4 years ago
We seem to have a number of crashes in ~nsStandardURL that are hit in automation (see deps). The common theme seems to be Windows 7 as the underlying platform. Is there anyone on the networking team that can maybe investigate if there's an underlying cause that needs fixing here?
Flags: needinfo?(jduell.mcbugs)
(Reporter)

Updated

4 years ago
Summary: Investigate intermittent Windows 7 [@ nsStandardURL::~nsStandardURL()] → Investigate intermittent Windows 7 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation
Valentin: you've been looking at a lot of URI stuff.  Can you take this?  We can get you a Windows machine if you need one.
Flags: needinfo?(jduell.mcbugs) → needinfo?(valentin.gosu)
I've actually crossed paths with this issue while working on bug 1009648.
It only occurs on windows, in debug builds, and the crash point is the DumpLeakedURLs mechanism.
I'm not really sure if it's because of a compiler peculiarity on Windows, or if nsStandardURL::ShutdownGlobalObjects gets called at the same time with ::~nsStandardURL on different threads.
My guess is that it's a combination of the two.
I'll see what we can do...
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
(Reporter)

Comment 3

4 years ago
Any news here? :)
Flags: needinfo?(valentin.gosu)
(Reporter)

Comment 4

4 years ago
We're seeing this crashes on Windows XP lately too. For example:
https://treeherder.mozilla.org/logviewer.html#?job_id=4961339&repo=mozilla-inbound
Given that the race is fairly difficult to reproduce I think I might have to submit a patch to m-c and see if we get any more crashes.
(Reporter)

Updated

3 years ago
Depends on: 1116316
(Reporter)

Updated

3 years ago
Blocks: 1116316
No longer depends on: 1116316
Blocks: 1145481
(In reply to Valentin Gosu [:valentin] from comment #5)
> Given that the race is fairly difficult to reproduce I think I might have to
> submit a patch to m-c and see if we get any more crashes.

Jason, is there somebody who could look at this, or failing that, disable this code on Windows?  It is happening extremely frequently.  Bug 1145481 happened like 14 times today alone.  This is causing some severe tree stability issues.
Flags: needinfo?(jduell.mcbugs)
Summary: Investigate intermittent Windows 7 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation → Investigate intermittent Windows 7 and 8 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation
Nathan, could you try doing a TSan run with DEBUG_DUMP_URLS_AT_SHUTDOWN defined in netwerk/base/nsStandardURL.cpp?  You'd want to run browser/devtools/webconsole/test/browser_webconsole_view_source.js as that seems to be hitting some shutdown race condition all over the place in bug 1145481.  So you could try running that directory or however we run that.  (It is Windows-only, but presumably the race isn't platform specific.)  Thanks.
Flags: needinfo?(nfroyd)
I'm thinking that there's a URL being released off the main thread during shutdown.
The TSan results should provide some insight.
(In reply to Valentin Gosu [:valentin] from comment #8)
> I'm thinking that there's a URL being released off the main thread during
> shutdown.
> The TSan results should provide some insight.

Presumably we'd catch that with owning thread assertions, wouldn't we?  nsStandardURL isn't threadsafe.  Unless it's UAF or something...
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #9)
> Presumably we'd catch that with owning thread assertions, wouldn't we? 
> nsStandardURL isn't threadsafe.  Unless it's UAF or something...

Oh, I was confused.  I guess it is some URI thing that is threadsafe, not URL.  I can try running this test in an ASan build and see if turns up anything.
(In reply to Andrew McCreight [:mccr8] from comment #10)
> I can try running this test in an ASan build and see if turns up anything.
I ran an ASan build on this directory with NS_BUILD_REFCNT_LOGGING defined a few times, but it didn't turn up anything.
Honza, can I get your feedback on this issue?
My take on it is that the static DumpLeakedURLs gets destroyed while at the same time URLs are released on other threads. I suspect the reason this is only a problem on Windows is because the compiler calls the destructors for static objects in a slightly different way.

There are a few things we could do here:
1. Consider disabling DEBUG_DUMP_URLS_AT_SHUTDOWN for Windows until we figure out a better way of tracking leaked URLs.
2. Introduce some locking.
3. Assert that URLs are only destroyed on the main thread.

What do you think?
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(honzab.moz)
nsStandardURL is not using threadsafe refcounting, so I don't think it should be getting destroyed on other threads.
I added a MOZ_ASSERT(NS_IsMainThread()) to the destructor, and it seems there are a bunch of them :)
(In reply to Valentin Gosu [:valentin] from comment #14)
> I added a MOZ_ASSERT(NS_IsMainThread()) to the destructor, and it seems
> there are a bunch of them :)

Hmm, I guess it is okay from a refcounting perspective if it is always on the same thread?  If they nsStandardURL are really being destroyed on multiple threads, then a quick fix would be to check if NS_IsMainThread() before touching the global variable.  I think for debugging purposes we only care about held by main thread objects.  Or just a lock, I guess.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> Nathan, could you try doing a TSan run with DEBUG_DUMP_URLS_AT_SHUTDOWN
> defined in netwerk/base/nsStandardURL.cpp?  You'd want to run
> browser/devtools/webconsole/test/browser_webconsole_view_source.js as that
> seems to be hitting some shutdown race condition all over the place in bug
> 1145481.  So you could try running that directory or however we run that. 
> (It is Windows-only, but presumably the race isn't platform specific.) 
> Thanks.

I ran this through TSan a couple of times, but it didn't seem to want to run right, complaining about unimportable javascript modules.  This sort of thing isn't unheard of when running with TSan, and it sounds like y'all might have a better handle on what's going wrong anyway.
Flags: needinfo?(nfroyd)
The first patch in bug 1129784 seems to trigger this fairly consistently
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1129784#c9 for a try push).

This is blocking me from landing that patch, so is there an ETA for a fix here?
Blocks: 1129784
Flags: needinfo?(valentin.gosu)
Created attachment 8584798 [details] [diff] [review]
Only track leaked URLs on the main thread
Attachment #8584798 - Flags: review?(honzab.moz)
(In reply to Valentin Gosu [:valentin] from comment #19)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=12706798e4d7

You canceled the needinfo without any further information. Will this patch resolve the problem when it lands?
(In reply to Eddy Bruel [:ejpbruel] from comment #20)
> (In reply to Valentin Gosu [:valentin] from comment #19)
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=12706798e4d7
> 
> You canceled the needinfo without any further information. Will this patch
> resolve the problem when it lands?

Yes. Sorry I didn't make that clear.
(In reply to Valentin Gosu [:valentin] from comment #21)
> (In reply to Eddy Bruel [:ejpbruel] from comment #20)
> > (In reply to Valentin Gosu [:valentin] from comment #19)
> > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=12706798e4d7
> > 
> > You canceled the needinfo without any further information. Will this patch
> > resolve the problem when it lands?
> 
> Yes. Sorry I didn't make that clear.

Thanks. Honza, do you think you would be able to review this quickly? This would allow me to land bug 1129784.
Comment on attachment 8584798 [details] [diff] [review]
Only track leaked URLs on the main thread

Review of attachment 8584798 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with comments addressed

::: netwerk/base/nsStandardURL.cpp
@@ +270,5 @@
>      // default parser in case nsIStandardURL::Init is never called
>      mParser = net_GetStdURLParser();
>  
>  #ifdef DEBUG_DUMP_URLS_AT_SHUTDOWN
> +    if (!NS_IsMainThread()) {

I'd rather see this as:

if (NS_IsMainThread()) {
  do the leak detect stuff...
}

Since once someone may add a code bellow it and may be nicely screwed...

applies to all the nsStandardURL blocks.

@@ +308,5 @@
>  DumpLeakedURLs::~DumpLeakedURLs()
>  {
> +    if (!NS_IsMainThread()) {
> +        return;
> +    }

I believe this will always happen on the main thread, but who knows on which thread win is calling this.  Maybe add an assertion?
Attachment #8584798 - Flags: review?(honzab.moz) → review+
Flags: needinfo?(honzab.moz)
Thanks Honza!
Created attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread
Attachment #8584798 - Attachment is obsolete: true
(Reporter)

Comment 28

3 years ago
https://hg.mozilla.org/mozilla-central/rev/c4c48aad5eb4
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox40: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
(Reporter)

Comment 29

3 years ago
Please nominate this for Aurora/Beta/b2g37 approval when you get a chance.
Flags: needinfo?(valentin.gosu)
Comment on attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread

Approval Request Comment
[Feature/regressing bug #]: old
[User impact if declined]: this causes intermittent failures on tree herder
[Describe test coverage new/current, TreeHerder]: this code runs very frequently
[Risks and why]: very low. This is debug-only code
[String/UUID change made/needed]: none
Flags: needinfo?(valentin.gosu)
Attachment #8586027 - Flags: approval-mozilla-aurora?
Comment on attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread

See previous comment.
Attachment #8586027 - Flags: approval-mozilla-beta?
Attachment #8586027 - Flags: approval-mozilla-b2g37?
Comment on attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread

Should be in 38 beta 3.
Attachment #8586027 - Flags: approval-mozilla-beta?
Attachment #8586027 - Flags: approval-mozilla-beta+
Attachment #8586027 - Flags: approval-mozilla-aurora?
Attachment #8586027 - Flags: approval-mozilla-aurora+
(Reporter)

Updated

3 years ago
status-b2g-v2.2: --- → affected
status-b2g-master: --- → fixed
status-firefox37: --- → wontfix
status-firefox38: --- → affected
status-firefox39: --- → affected
status-firefox-esr31: --- → wontfix
(Reporter)

Comment 33

3 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/58dca3f7560a
status-firefox38: affected → fixed

Updated

3 years ago
Attachment #8586027 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.