Closed Bug 1035075 Opened 5 years ago Closed 5 years ago

crash in mozilla::net::ProxyAutoConfig::MyIPAddress(JS::CallArgs const&)

Categories

(Core :: Networking: HTTP, defect, critical)

32 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox31 --- wontfix
firefox32 - wontfix
firefox33 + fixed
firefox34 + fixed
firefox-esr31 33+ fixed

People

(Reporter: andrei, Assigned: dragana)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-f761b777-af4a-45f5-afbc-77a7f2140707.
=============================================================

We experienced this crash via Mozmill. Looks to be  some sort of proxy issue.
(As additional data, we do have custom proxies set up on the machine where the crash occurred).

Not sure what more info I can get here. The test log doesn't have anything useful:
> 03:29:03 TEST-END | testToolbar/testHomeButton.js | finished in 328ms
> 03:29:06 *** WIFI GEO: shutdown called
> 03:29:06 mozcrash INFO | Saved minidump as /Users/mozauto/Library/Application Support/Firefox/Crash Reports/pending/9FA0C864-F6F9-4F73-ADE6-35D856F445BE.dmp
> 03:29:06 mozcrash INFO | Saved app info as /Users/mozauto/Library/Application Support/Firefox/Crash Reports/pending/9FA0C864-F6F9-4F73-ADE6-35D856F445BE.extra
> 03:29:06 PROCESS-CRASH | /Users/mozauto/jenkins/workspace/mozilla-aurora_functional/data/mozmill-tests/firefox/tests/functional/testToolbar/testHomeButton.js | application crashed [Unknown top frame]
> 03:29:06 Crash dump filename: /Users/mozauto/jenkins/workspace/mozilla-aurora_functional/data/profile/minidumps/9FA0C864-F6F9-4F73-ADE6-35D856F445BE.dmp
> 03:29:06 No symbols path given, can't process dump.
> 03:29:06 MINIDUMP_STACKWALK not set, can't process dump.
> 03:29:06 RESULTS | Passed: 105
Crash-stats shows the function MyIPAddress crashing in multiple locations lately:
https://crash-stats.mozilla.com/query/?query_type=simple&query=MyIPAddress
OS: Mac OS X → All
Version: unspecified → 32 Branch
steve, want to put this on your queue?
Flags: needinfo?(sworkman)
Done.
Assignee: nobody → sworkman
Status: NEW → ASSIGNED
Flags: needinfo?(sworkman)
based on comment 0 and some comments in another bug I am going to dup with this, I think its a shutdown issue.
Duplicate of this bug: 1020506
Attached patch 1035075.1 (obsolete) — Splinter Review
Assignee: sworkman → mcmanus
Attachment #8458891 - Flags: review?(sworkman)
Comment on attachment 8458891 [details] [diff] [review]
1035075.1

Review of attachment 8458891 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking this Pat. Sorry I didn't get to it sooner.
Changes look good r=me.
Attachment #8458891 - Flags: review?(sworkman) → review+
Crashes were still happening with the backout, so I relanded this. https://hg.mozilla.org/integration/mozilla-inbound/rev/f698c8166457
Flags: needinfo?(mcmanus)
Two other notes:
 * this orange is opt mochitest-7, not debug.  (They're split into different numbers of chunks.)
 * the orange is not 100% of the time, but probably somewhere around 80% of the time
wild guess about the orange: my patch makes the observer code synchronously wait for the pac thread to shutdown. Before it shuts down it actually cancels all the resolutions in the pac queue (it has always done this) - and now the main thread waits for those cancels to happen.

its possible that in the case of the orange some extra requests are being canceled due to the synchronous wait. If this was the case the test was probably always technically racy.

if that guess is right, two options might be:
 * figure out what aspect of the test is dependent on pac resolution after the shutdown observers run and fix it
 * remove the cancel semantics at gecko shutdown time - but probably keep them at reload time
Flags: needinfo?(mcmanus)
dragana offered to try and push this one over the finish line - Thanks!
Assignee: mcmanus → dd.mozilla
I was trying to figure out how ProxyAutoConfig works. 
isn't it the problem that mPAC gets destroyed but sRunning is not set to nullptr? maybe i not sure if i got it right, would it be enough just to set sRunning to nullptr in destructor?
Flags: needinfo?(mcmanus)
(In reply to Dragana Damjanovic from comment #16)
> I was trying to figure out how ProxyAutoConfig works. 
> isn't it the problem that mPAC gets destroyed but sRunning is not set to
> nullptr? maybe i not sure if i got it right, would it be enough just to set
> sRunning to nullptr in destructor?

hey Dragana,

proxyautoconfig::sRunning is only non-null while the js engine is actually executing - its only there so that some C++ helper functions that are bound to pac specific js functions can find the object data.

I agree with your inference that the stack we see is UAF of sRunning. however, I think its because mPAC is being destroyed from another thread while the pac thread is actually executing the JS. resetting sRunning when that happens isn't going to solve the problem - the whole js context is also destroyed. If the js engine calls a c++ helper funcion and sRunning is null, something is very wrong.

I think this is happening because nspacman::shutdown() is not synchronous with joining the pac thread. the dtor that does try and join the pac thread (thread->shutdown()) only happens synchronously on the main thread, if the dtor is reached from another thread (I suspect it is the socket thread in this case - making 3 threads in question) then that thread join is proxied back to the main thread and perhaps not exectued before mPAC is destroyed.. that would be consistent with the stack trace.

My patch tries to fix that by having the shutdown() (which happens on the main thread anyhow) also join the pac thread.. that means it will wait until the pac thread has run all of its events and isn't using a js context.

does that help?

of course that still leaves the orange it created to deal with.
Flags: needinfo?(mcmanus)
Attached patch fix v2 (obsolete) — Splinter Review
The problem was with thread shutdown() function.  This function blocks current execution but continue serving the thread queue. So mPACMan->Shutdown() is called and blocked and the next event in the main thread queue is called which again calls mPACMan->Shutdown().

So now I dispatch a runnable to wait for the PAC thread to shutdown.
Attachment #8458891 - Attachment is obsolete: true
Attachment #8472297 - Flags: review?(mcmanus)
Comment on attachment 8472297 [details] [diff] [review]
fix v2

Review of attachment 8472297 [details] [diff] [review]:
-----------------------------------------------------------------

> The problem was with thread shutdown() function.  This function blocks
> current execution but continue serving the thread queue. So
> mPACMan->Shutdown() is called and blocked and the next event in the main
> thread queue is called which again calls mPACMan->Shutdown().

That's a good diagnosis. I had forgotten that shutdown() isn't just join() - it runs events on the current thread.

Generally re-entrant stuff is a good way to have bugs - so let's go with both your patch and a simple little addition that checks ::shutdown for double calling.

r=mcmanus if that tests out ok for you

::: netwerk/base/src/nsPACMan.cpp
@@ +124,5 @@
> +  }
> +
> +  NS_IMETHODIMP Run()
> +  {
> +    NS_ABORT_IF_FALSE(NS_IsMainThread(), "wrong thread");

if (mPACMan->mPACThread) {}

@@ +312,5 @@
>  
>  void
>  nsPACMan::Shutdown()
>  {
>    NS_ABORT_IF_FALSE(NS_IsMainThread(), "pacman must be shutdown on main thread");

if (mShutdown) {
 return;
}

// that might avoid some re-entrancy problems.
Attachment #8472297 - Flags: review?(mcmanus) → review+
Attached patch fix v3Splinter Review
Attachment #8472297 - Attachment is obsolete: true
Attachment #8472868 - Flags: review+
Attachment #8472868 - Flags: review+
Attachment #8472868 - Flags: review?(mcmanus)
Comment on attachment 8472868 [details] [diff] [review]
fix v3

Review of attachment 8472868 [details] [diff] [review]:
-----------------------------------------------------------------

When I said "r=mcmanus if that tests out ok for you" that meant you didn't need to r? it again as long as you made and tested the changes I indicated (which you did! Thanks!). If you didn't like those suggestions, then the r? cycle would have to go around the merry go round again.
Attachment #8472868 - Flags: review?(mcmanus) → review+
Duplicate of this bug: 1054897
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/8b5435b03ae8
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Flags: qe-verify-
We've just experienced this crash on 32 (release) again:
https://crash-stats.mozilla.com/report/index/384aa689-8cec-4493-a725-1d3282140826

Should the fix be uplifted at least to 33?
(Given that 32 is already out the door)
Flags: needinfo?(ryanvm)
Flags: needinfo?(dd.mozilla)
Questions like that should be addressed to the bug's assignee, not the guy who landed the patch for them.
Flags: needinfo?(ryanvm)
[Tracking Requested - why for this release]:
I'm not sure about the impact of this crash and any side-effects it could cause, but I think backporting it to 33 and 31ESR would be good. The crash ADI is not that high for this crash but it looks like that people on 31.0 see this most often:

Firefox 	31.0 	93.67 %	429
Firefox 	32.0b8 	2.40 %	11
Firefox 	32.0b7 	1.31 %	6
Firefox 	33.0a2 	0.44 %	2 

Dragana may be able to give better details about possible side-effects.
Based on comment 28, I'm marking this as won't fix in 32 (already at RC1). We can fix this in 33 and 31.2.0 (ships alongside 33).
Flags: needinfo?(dd.mozilla)
Dragana - Now that this has been on m-c for a week, can you please request approval to uplift to 33 (aurora) and ESR31?
Flags: needinfo?(dd.mozilla)
Comment on attachment 8472868 [details] [diff] [review]
fix v3

Hi--I'm filling this out for :mcmanus, who's out for a week.  Sorry about imperfect information as a result.

Approval Request Comment
[Feature/regressing bug #]: not sure exactly :( but we're seeing crashes back to 31
[User impact if declined]: crashes 
[Describe test coverage new/current, TBPL]:  comment 10 seems to indicate we're hitting this codepath during testing.
[Risks and why]: low risk: just adds logic to wait during shutdown, and patch has been baking on m-c for a week
[String/UUID change made/needed]: no

If this is not a sec:{high,crit} bug, please state case for ESR consideration:
crashing on ESR (see comment 28)
Attachment #8472868 - Flags: approval-mozilla-esr31?
Attachment #8472868 - Flags: approval-mozilla-aurora?
Flags: needinfo?(dd.mozilla)
Attachment #8472868 - Flags: approval-mozilla-esr31?
Attachment #8472868 - Flags: approval-mozilla-esr31+
Attachment #8472868 - Flags: approval-mozilla-aurora?
Attachment #8472868 - Flags: approval-mozilla-aurora+
Duplicate of this bug: 1050187
You need to log in before you can comment on or make changes to this bug.