Closed Bug 1099209 Opened 10 years ago Closed 9 years ago

Investigate intermittent Windows 7 and 8 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation

Categories

(Core :: Networking, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox37 --- wontfix
firefox38 --- fixed
firefox39 --- fixed
firefox40 --- fixed
firefox-esr31 --- wontfix
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: RyanVM, Assigned: valentin)

References

Details

(Keywords: crash)

Attachments

(1 file, 1 obsolete file)

We seem to have a number of crashes in ~nsStandardURL that are hit in automation (see deps). The common theme seems to be Windows 7 as the underlying platform. Is there anyone on the networking team that can maybe investigate if there's an underlying cause that needs fixing here?
Flags: needinfo?(jduell.mcbugs)
Summary: Investigate intermittent Windows 7 [@ nsStandardURL::~nsStandardURL()] → Investigate intermittent Windows 7 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation
Valentin: you've been looking at a lot of URI stuff.  Can you take this?  We can get you a Windows machine if you need one.
Flags: needinfo?(jduell.mcbugs) → needinfo?(valentin.gosu)
I've actually crossed paths with this issue while working on bug 1009648.
It only occurs on windows, in debug builds, and the crash point is the DumpLeakedURLs mechanism.
I'm not really sure if it's because of a compiler peculiarity on Windows, or if nsStandardURL::ShutdownGlobalObjects gets called at the same time with ::~nsStandardURL on different threads.
My guess is that it's a combination of the two.
I'll see what we can do...
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Any news here? :)
Flags: needinfo?(valentin.gosu)
We're seeing this crashes on Windows XP lately too. For example:
https://treeherder.mozilla.org/logviewer.html#?job_id=4961339&repo=mozilla-inbound
Given that the race is fairly difficult to reproduce I think I might have to submit a patch to m-c and see if we get any more crashes.
Depends on: 1116316
Blocks: 1116316
No longer depends on: 1116316
(In reply to Valentin Gosu [:valentin] from comment #5)
> Given that the race is fairly difficult to reproduce I think I might have to
> submit a patch to m-c and see if we get any more crashes.

Jason, is there somebody who could look at this, or failing that, disable this code on Windows?  It is happening extremely frequently.  Bug 1145481 happened like 14 times today alone.  This is causing some severe tree stability issues.
Flags: needinfo?(jduell.mcbugs)
Summary: Investigate intermittent Windows 7 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation → Investigate intermittent Windows 7 and 8 [@ nsStandardURL::~nsStandardURL()] crashes hit in automation
Nathan, could you try doing a TSan run with DEBUG_DUMP_URLS_AT_SHUTDOWN defined in netwerk/base/nsStandardURL.cpp?  You'd want to run browser/devtools/webconsole/test/browser_webconsole_view_source.js as that seems to be hitting some shutdown race condition all over the place in bug 1145481.  So you could try running that directory or however we run that.  (It is Windows-only, but presumably the race isn't platform specific.)  Thanks.
Flags: needinfo?(nfroyd)
I'm thinking that there's a URL being released off the main thread during shutdown.
The TSan results should provide some insight.
(In reply to Valentin Gosu [:valentin] from comment #8)
> I'm thinking that there's a URL being released off the main thread during
> shutdown.
> The TSan results should provide some insight.

Presumably we'd catch that with owning thread assertions, wouldn't we?  nsStandardURL isn't threadsafe.  Unless it's UAF or something...
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #9)
> Presumably we'd catch that with owning thread assertions, wouldn't we? 
> nsStandardURL isn't threadsafe.  Unless it's UAF or something...

Oh, I was confused.  I guess it is some URI thing that is threadsafe, not URL.  I can try running this test in an ASan build and see if turns up anything.
(In reply to Andrew McCreight [:mccr8] from comment #10)
> I can try running this test in an ASan build and see if turns up anything.
I ran an ASan build on this directory with NS_BUILD_REFCNT_LOGGING defined a few times, but it didn't turn up anything.
Honza, can I get your feedback on this issue?
My take on it is that the static DumpLeakedURLs gets destroyed while at the same time URLs are released on other threads. I suspect the reason this is only a problem on Windows is because the compiler calls the destructors for static objects in a slightly different way.

There are a few things we could do here:
1. Consider disabling DEBUG_DUMP_URLS_AT_SHUTDOWN for Windows until we figure out a better way of tracking leaked URLs.
2. Introduce some locking.
3. Assert that URLs are only destroyed on the main thread.

What do you think?
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(honzab.moz)
nsStandardURL is not using threadsafe refcounting, so I don't think it should be getting destroyed on other threads.
I added a MOZ_ASSERT(NS_IsMainThread()) to the destructor, and it seems there are a bunch of them :)
(In reply to Valentin Gosu [:valentin] from comment #14)
> I added a MOZ_ASSERT(NS_IsMainThread()) to the destructor, and it seems
> there are a bunch of them :)

Hmm, I guess it is okay from a refcounting perspective if it is always on the same thread?  If they nsStandardURL are really being destroyed on multiple threads, then a quick fix would be to check if NS_IsMainThread() before touching the global variable.  I think for debugging purposes we only care about held by main thread objects.  Or just a lock, I guess.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> Nathan, could you try doing a TSan run with DEBUG_DUMP_URLS_AT_SHUTDOWN
> defined in netwerk/base/nsStandardURL.cpp?  You'd want to run
> browser/devtools/webconsole/test/browser_webconsole_view_source.js as that
> seems to be hitting some shutdown race condition all over the place in bug
> 1145481.  So you could try running that directory or however we run that. 
> (It is Windows-only, but presumably the race isn't platform specific.) 
> Thanks.

I ran this through TSan a couple of times, but it didn't seem to want to run right, complaining about unimportable javascript modules.  This sort of thing isn't unheard of when running with TSan, and it sounds like y'all might have a better handle on what's going wrong anyway.
Flags: needinfo?(nfroyd)
The first patch in bug 1129784 seems to trigger this fairly consistently
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1129784#c9 for a try push).

This is blocking me from landing that patch, so is there an ETA for a fix here?
Blocks: 1129784
Flags: needinfo?(valentin.gosu)
Attachment #8584798 - Flags: review?(honzab.moz)
(In reply to Valentin Gosu [:valentin] from comment #19)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=12706798e4d7

You canceled the needinfo without any further information. Will this patch resolve the problem when it lands?
(In reply to Eddy Bruel [:ejpbruel] from comment #20)
> (In reply to Valentin Gosu [:valentin] from comment #19)
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=12706798e4d7
> 
> You canceled the needinfo without any further information. Will this patch
> resolve the problem when it lands?

Yes. Sorry I didn't make that clear.
(In reply to Valentin Gosu [:valentin] from comment #21)
> (In reply to Eddy Bruel [:ejpbruel] from comment #20)
> > (In reply to Valentin Gosu [:valentin] from comment #19)
> > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=12706798e4d7
> > 
> > You canceled the needinfo without any further information. Will this patch
> > resolve the problem when it lands?
> 
> Yes. Sorry I didn't make that clear.

Thanks. Honza, do you think you would be able to review this quickly? This would allow me to land bug 1129784.
Comment on attachment 8584798 [details] [diff] [review]
Only track leaked URLs on the main thread

Review of attachment 8584798 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with comments addressed

::: netwerk/base/nsStandardURL.cpp
@@ +270,5 @@
>      // default parser in case nsIStandardURL::Init is never called
>      mParser = net_GetStdURLParser();
>  
>  #ifdef DEBUG_DUMP_URLS_AT_SHUTDOWN
> +    if (!NS_IsMainThread()) {

I'd rather see this as:

if (NS_IsMainThread()) {
  do the leak detect stuff...
}

Since once someone may add a code bellow it and may be nicely screwed...

applies to all the nsStandardURL blocks.

@@ +308,5 @@
>  DumpLeakedURLs::~DumpLeakedURLs()
>  {
> +    if (!NS_IsMainThread()) {
> +        return;
> +    }

I believe this will always happen on the main thread, but who knows on which thread win is calling this.  Maybe add an assertion?
Attachment #8584798 - Flags: review?(honzab.moz) → review+
Flags: needinfo?(honzab.moz)
Thanks Honza!
Attachment #8584798 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/c4c48aad5eb4
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Please nominate this for Aurora/Beta/b2g37 approval when you get a chance.
Flags: needinfo?(valentin.gosu)
Comment on attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread

Approval Request Comment
[Feature/regressing bug #]: old
[User impact if declined]: this causes intermittent failures on tree herder
[Describe test coverage new/current, TreeHerder]: this code runs very frequently
[Risks and why]: very low. This is debug-only code
[String/UUID change made/needed]: none
Flags: needinfo?(valentin.gosu)
Attachment #8586027 - Flags: approval-mozilla-aurora?
Comment on attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread

See previous comment.
Attachment #8586027 - Flags: approval-mozilla-beta?
Attachment #8586027 - Flags: approval-mozilla-b2g37?
Comment on attachment 8586027 [details] [diff] [review]
Only track leaked URLs on the main thread

Should be in 38 beta 3.
Attachment #8586027 - Flags: approval-mozilla-beta?
Attachment #8586027 - Flags: approval-mozilla-beta+
Attachment #8586027 - Flags: approval-mozilla-aurora?
Attachment #8586027 - Flags: approval-mozilla-aurora+
Attachment #8586027 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: