Closed Bug 1137274 Opened 10 years ago Closed 7 years ago

startup main thread crash in nsProtocolProxyService::SetupPACThread()

Categories

(MailNews Core :: Networking, defect)

x86
All
defect
Not set
critical

Tracking

(thunderbird38- affected, thunderbird40 affected, thunderbird41 affected, thunderbird48 affected, thunderbird49 affected, thunderbird_esr38 affected, thunderbird_esr45 affected)

VERIFIED WORKSFORME
Tracking Status
thunderbird38 - affected
thunderbird40 --- affected
thunderbird41 --- affected
thunderbird48 --- affected
thunderbird49 --- affected
thunderbird_esr38 --- affected
thunderbird_esr45 --- affected

People

(Reporter: wsmwk, Unassigned)

References

Details

(Keywords: crash, regression, topcrash-thunderbird, Whiteboard: [blocked on bug 791645][startupcrash][addon:foxyproxy][workaround:AutoProxy])

Crash Data

New signature starting 38.0a2, report bp-fefb1053-2f74-4538-a961-aadfe2150225.
=============================================================
0 xul.dll nsProtocolProxyService::SetupPACThread() netwerk/base/nsProtocolProxyService.cpp
1 xul.dll nsProtocolProxyService::Resolve_Internal(nsIChannel*, nsProtocolInfo const&, unsigned int, bool*, nsIProxyInfo**) netwerk/base/nsProtocolProxyService.cpp
2 xul.dll nsProtocolProxyService::DeprecatedBlockingResolve(nsIChannel*, unsigned int, nsIProxyInfo**) netwerk/base/nsProtocolProxyService.cpp
3 xul.dll MsgExamineForProxy(nsIChannel*, nsIProxyInfo**) c:/builds/moz2_slave/tb-c-aurora-w32-ntly-000000000/build/mailnews/base/util/nsMsgUtils.cpp:2118
4 xul.dll nsPop3Protocol::LoadUrl(nsIURI*, nsISupports*) c:/builds/moz2_slave/tb-c-aurora-w32-ntly-000000000/build/mailnews/local/src/nsPop3Protocol.cpp:990
5 xul.dll nsPop3Service::GetMail(bool, nsIMsgWindow*, nsIUrlListener*, nsIMsgFolder*, nsIPop3IncomingServer*, nsIURI**) c:/builds/moz2_slave/tb-c-aurora-w32-ntly-000000000/build/mailnews/local/src/nsPop3Service.cpp:138

http://hg.mozilla.org/releases/mozilla-aurora/annotate/57f387aaa54b/netwerk/base/nsProtocolProxyService.cpp#l987
mcmanus@111917 985 if (mSystemProxySettings &&
mcmanus@111917 986 NS_SUCCEEDED(mSystemProxySettings->GetMainThreadOnly(&mainThreadOnly)) &&
mcmanus@111917 987 !mainThreadOnly) {
Top crash, but probably only one user. I looked at it and cannot make sense of it unfortunately.
Mostly seen in 38.0a2, and now the #1 crash for 38 beta. Some rare examples in 31.5.0 (only 2-3), like
bp-69918d26-7b5f-46af-b4d4-1d9232150308
bp-9299893e-1a65-4a2f-89ac-f83e82150308

Some (perhaps the majority of) stacks are:
0 xul.dll nsProtocolProxyService::SetupPACThread() netwerk/base/nsProtocolProxyService.cpp
1 xul.dll nsProtocolProxyService::Resolve_Internal(nsIChannel*, nsProtocolInfo const&, unsigned int, bool*, nsIProxyInfo**) netwerk/base/nsProtocolProxyService.cpp
2 xul.dll nsProtocolProxyService::DeprecatedBlockingResolve(nsIChannel*, unsigned int, nsIProxyInfo**) netwerk/base/nsProtocolProxyService.cpp
3 xul.dll MsgExamineForProxy(nsIChannel*, nsIProxyInfo**) c:/builds/moz2_slave/tb-c-aurora-w32-ntly-000000000/build/mailnews/base/util/nsMsgUtils.cpp:2118

Others are along the lines of:
0 xul.dll nsProtocolProxyService::SetupPACThread() netwerk/base/src/nsProtocolProxyService.cpp
1 xul.dll nsProtocolProxyService::Resolve_Internal(nsIURI*, nsProtocolInfo const&, unsigned int, bool*, nsIProxyInfo**) netwerk/base/src/nsProtocolProxyService.cpp
2 xul.dll nsProtocolProxyService::AsyncResolveInternal(nsIURI*, unsigned int, nsIProtocolProxyCallback*, nsICancelable**, bool) netwerk/base/src/nsProtocolProxyService.cpp
3 xul.dll nsProtocolProxyService::AsyncResolve2(nsIURI*, unsigned int, nsIProtocolProxyCallback*, nsICancelable**) netwerk/base/src/nsProtocolProxyService.cpp
4 xul.dll mozilla::net::nsHttpChannel::ResolveProxy() netwerk/protocol/http/nsHttpChannel.cpp
5 xul.dll mozilla::net::nsHttpChannel::AsyncOpen(nsIStreamListener*, nsISupports*) netwerk/protocol/http/nsHttpChannel.cpp
6 xul.dll nsURILoader::OpenURI(nsIChannel*, unsigned int, nsIInterfaceRequestor*) uriloader/base/nsURILoader.cpp
7 xul.dll nsDocShell::DoChannelLoad(nsIChannel*, nsIURILoader*, bool) docshell/base/nsDocShell.cpp
Summary: main thread crash in nsProtocolProxyService::SetupPACThread() → startup main thread crash in nsProtocolProxyService::SetupPACThread()
Whiteboard: [startupcrash]
xref bug 791645 since DeprecatedBlockingResolve is what gets us there
The only theory that I have been able to come up with for this crash is that mSystemProxySettings is somehow being changed by another thread. I believe this code runs as part of initialization, so perhaps we could trigger that initialization early in the startup process? We did that as a workaround in another bug.
Version: unspecified → 38
(In reply to Kent James (:rkent) from comment #1)
> ... probably only one user.

That user accounts for only about 20% of the crashes, so this would still be a topcrash if we removed that one user.

(In reply to Magnus Melin from comment #3)
> xref bug 791645 since DeprecatedBlockingResolve is what gets us there

Does fixing that help us? Or is there more to it, given comment 4?
Flags: needinfo?(mkmelin+mozilla)
I'm guessing it probably would make the code do things differently enough that it would no longer be a problem, or at least a different problem.
Flags: needinfo?(mkmelin+mozilla)
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #5)
> (In reply to Kent James (:rkent) from comment #1)
> > ... probably only one user.
>
> only about 20% of the crashes, which would still be a topcrash if we removed
> that one user.

Actually 40% for that user f...@orange.fr, but still a topcrash if we subtract him out.
Depends on: 791645
This crash occurs in the if() check in this code:

nsProtocolProxyService::SetupPACThread()
{
  ...
  if (mSystemProxySettings &&
      NS_SUCCEEDED(mSystemProxySettings->GetMainThreadOnly(&mainThreadOnly)) &&
      !mainThreadOnly) {

The only interpretation I have been able to give of this is that mSystemProxySettings->GetMainThreadOnly is being called with null mSystemProxySettings, which can only happen if mSystemProxySettings is being changed in another thread.

mcmanus, is that a likely interpretation? If so, would some mutex-like protection around the variable help?

This is likely to be the #1 crash for Thunderbird after initial release, until enough affected users abandon Thunderbird. The crash also occurs in Firefox, though rarely.
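To make the reasoning about the if() check concrete, here is a minimal standalone sketch of the same short-circuit pattern. It is not the Gecko code: SystemProxySettings, ProxyService and CanUseSettingsOffMainThread are hypothetical stand-ins, and std::shared_ptr stands in for an XPCOM strong reference. The only point it illustrates is that taking a local strong reference before the test guarantees that the pointer that was null-checked is the pointer that gets dereferenced, even if the member is cleared re-entrantly during the call.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the system proxy settings object; not the Gecko interface.
struct SystemProxySettings {
  bool mainThreadOnly = true;

  // Same shape as the call in the bug: report success and write the answer
  // through an out-parameter.
  bool GetMainThreadOnly(bool* aOut) const {
    assert(aOut != nullptr);
    *aOut = mainThreadOnly;
    return true;
  }
};

// Hypothetical stand-in for the proxy service.
struct ProxyService {
  std::shared_ptr<SystemProxySettings> mSystemProxySettings;

  // Returns true only when a settings object exists, the query succeeds,
  // and the object says it may be used off the main thread.
  bool CanUseSettingsOffMainThread() const {
    // Local strong reference: the object that is null-checked below is the
    // same object that is dereferenced, whatever happens to the member.
    std::shared_ptr<SystemProxySettings> settings = mSystemProxySettings;
    bool mainThreadOnly = true;
    return settings && settings->GetMainThreadOnly(&mainThreadOnly) &&
           !mainThreadOnly;
  }
};
```

Note that this does not make writes from another thread safe; if the member really were modified off the main thread, it would still need a mutex or an atomic holder, which is the "mutex-like protection" asked about above.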
Flags: needinfo?(mcmanus)
(In reply to Kent James (:rkent) from comment #8)
> This crash occurs in the if() check in this code:
>
> nsProtocolProxyService::SetupPACThread()
> {
> ...
> if (mSystemProxySettings &&
>
> NS_SUCCEEDED(mSystemProxySettings->GetMainThreadOnly(&mainThreadOnly)) &&
> !mainThreadOnly) {
>
> The only interpretation I have been able to give of this is that
> mSystemProxySettings->GetMainThreadOnly is being called with null
> mSystemProxySettings, which can only happen if mSystemProxySettings is being
> changed in another thread.
>
> mcmanus, is that a likely interpretation? If so, would some mutex-like
> protection around the variable help?

I'm not sure I totally understand your threading, but the only place that mSystemProxySettings gets set is the init() method, which should be main thread only, and SetupPACThread should also be main thread only. That test is intended to run on the main thread to find out whether it's OK to run the system proxy settings functions off the main thread (which is highly desirable).

> This is likely to be the #1 crash for Thunderbird after initial release,
> until enough affected users abandon Thunderbird. The crash also occurs in
> Firefox, though rarely.
Flags: needinfo?(mcmanus)
linux nsProtocolProxyService::ApplyFilters bp-c3e7cb71-00ac-478a-bebd-476182150501
Crash Signature: [@ nsProtocolProxyService::SetupPACThread()] → [@ nsProtocolProxyService::SetupPACThread()] [@ nsProtocolProxyService::ApplyFilters]
OS: Windows NT → All
I don't have any good ideas about how to proceed with this bug. The issues here are not critical enough to block the release, so I'm going to drop the tracking flag. The issues are probably internal to the proxy management code, but its maintainers will not have much incentive to work on this, since we are using a special, deprecated case. We need to switch to the current async implementation.
Not sure how TB is kept in sync with the code base of MC but I can't help but notice this happened around the same time as bug 436344 got fixed.
(In reply to Jesper Hansen from comment #13)
> Not sure how TB is kept in sync with the code base of MC but I can't help
> but notice this happened around the same time as bug 436344 got fixed.

What do you think?
Flags: needinfo?(rkent)
Flags: needinfo?(mkmelin+mozilla)
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #14)
> (In reply to Jesper Hansen from comment #13)
> > Not sure how TB is kept in sync with the code base of MC but I can't help
> > but notice this happened around the same time as bug 436344 got fixed.
>
> What do you think?

Comment 3 points in the same direction. Basically, the proxy service was made async but they left in the older sync part for applications (such as Thunderbird) that had not updated. Something though must have been introduced at that point that is causing these issues. Because we are not using the same base calls as Firefox, it does not get much attention.

I'm hoping that the comments in bug 1173837 that the crash occurs with the Foxy Proxy addon enabled will allow a way to reproduce this. I don't think the issue is the addon itself, but any way to make this reproducible might lead us to the root cause.
Flags: needinfo?(rkent)
bug 1125372 should be seen in addition to fixing bug 436344
Flags: needinfo?(mkmelin+mozilla)
Since bug 1173837 has been marked as a duplicate, I think I should paste my comment here:

"Just adding a confirmation. I updated to TB 38 on the release channel today and got a constant crash on startup. After a lot of experiments, I finally traced it to a conflict with the FoxyProxy extension. I have switched to AutoProxy and TB now seems to be working OK."
#9 topcrash is nsPACMan::IsPACURI(nsIURI*)
bp-71605ed8-735f-417e-a704-9647f2150615
bp-55b89ff3-a6b8-4eb7-9baf-ec7382150612
#13 is nsProtocolProxyService::ApplyFilters
#3 is nsProtocolProxyService::SetupPACThread
Crash Signature: [@ nsProtocolProxyService::SetupPACThread()] [@ nsProtocolProxyService::ApplyFilters] → [@ nsProtocolProxyService::SetupPACThread()] [@ nsProtocolProxyService::ApplyFilters] [@ nsPACMan::IsPACURI(nsIURI*) ]
Whiteboard: [startupcrash] → [startupcrash][addon:foxyproxy][workaround:AutoProxy]
I've investigated further, and all three of the proxy crashes are FoxyProxy related. Looking at the source, that addon overrides the proxy service at the XPCOM level, which is quite fragile. The situation was not nearly as clear in the betas, but I sampled about 30 reports in 38.0.1 and every one had FoxyProxy installed, and I can reproduce these exact crashes when I run it myself. So we can probably get rid of at least 95% of these crashes by disabling the addon.
As far as I know, FoxyProxy is the only auto-switch-proxy solution that supports Gmail. AutoProxy only supports the http(s) protocols and is no longer being developed. I hope you can solve this; it's very important for Chinese users.
I think I should mention this: asyncResolve is not asynchronous in the Firefox code (see bug 1152332) or in Thunderbird. Here is an example of asyncResolve calling our aasyncResolve synchronously:

console.log: /////////////// asyncResolve2
console.log: // 1 asyncResolve
console.log: // 2 aasyncResolve
console.log: ****/*/////***//// applyFilter: "https://www.mozilla.org/thunderbird/img/tb5/start/tb-daily-logo.png"
console.log: // 3b aasyncResolve
console.log: // 4b aasyncResolve
console.log: /////////////// asyncResolve2
console.log: // 1 asyncResolve
console.log: // 2 aasyncResolve
console.log: ****/*/////***//// applyFilter: "http://clients1.google.com/ocsp"
console.log: // 3b aasyncResolve
console.log: // 4b aasyncResolve
console.log: /////////////// asyncResolve2
console.log: // 1 asyncResolve
console.log: // 2 aasyncResolve
console.log: ****/*/////***//// applyFilter: "http://ocsp.digicert.com/"
console.log: // 3b aasyncResolve
console.log: // 4b aasyncResolve
----
Notice how it goes 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4.

In bug 1152332 I described that the Chat in Thunderbird depends on asyncResolve returning straight away, as onProxyAvailable cannot be called before then.
What I went ahead and did was to ensure that asyncResolve became async in the addon, as it was otherwise sync in nature, producing this result instead [1]:

console.log: /////////////// asyncResolve2
console.log: // 1 asyncResolve
console.log: /////////////// asyncResolve2
console.log: // 1 asyncResolve
console.log: // 2 aasyncResolve
console.log: ****/*/////***//// applyFilter: "https://www.mozilla.org/thunderbird/img/tb5/start/tb-daily-logo.png"
console.log: // 3b aasyncResolve
console.log: // 4b aasyncResolve
console.log: // 2 aasyncResolve
console.log: ****/*/////***//// applyFilter: "http://clients1.google.com/ocsp"
console.log: // 3b aasyncResolve
console.log: // 4b aasyncResolve
console.log: /////////////// asyncResolve2
console.log: // 1 asyncResolve
console.log: // 2 aasyncResolve
console.log: ****/*/////***//// applyFilter: "http://ocsp.digicert.com/"
console.log: // 3b aasyncResolve
console.log: // 4b aasyncResolve
---
Notice how it goes 1, 1, 2, 3, 4, 2, 3, 4, 1, 2, 3, 4.

This was required to fix a bug in the Chat of Thunderbird, but it also helped immensely in speeding up proxying, although it might not look like much, as the timers are still executed on the main thread. We are thinking of moving those into a threaded worker of sorts, but workers are not supported by all versions of Firefox, so we're trying this for compatibility until we bump the min-version up.

The bug described in this report happens a lot more often because of this change in the addon, but only in Thunderbird. I must add that it keeps happening even if I revert to the old method.

----------------------------------------
What matters:
Looking at what happens in nsProtocolProxyService.cpp ApplyFilters [2] (since that's where I crashed last): mFilters has been cleared or modified during execution. I tried putting breakpoints in the two methods that modify mFilters, without result. Is it modified outside of nsProtocolProxyService.cpp?
[1] http://code.getfoxyproxy.org/Plugin/commit/?id=c365a2bbbdd8052e50c80739d99a5e679df45873
[2] http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/base/nsProtocolProxyService.cpp#1900
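The difference between the two logs in the comment above can be reproduced with a small standalone model. This is only a sketch under stated assumptions: EventQueue, Resolve and RunTwoRequests are hypothetical names, the queue stands in for whatever deferral mechanism the addon uses (the comment mentions timers on the main thread), and the numbers 1 to 4 mirror the numbered log steps. It shows why running steps 2 to 4 inline gives 1, 2, 3, 4, 1, 2, 3, 4, while deferring them lets the second call log its step 1 before the first call's remaining steps run.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded event queue standing in for main-thread timers.
struct EventQueue {
  std::deque<std::function<void()>> tasks;

  void Dispatch(std::function<void()> task) {
    assert(task != nullptr);
    tasks.push_back(std::move(task));
  }

  // Runs queued tasks in FIFO order until the queue is empty.
  void RunAll() {
    while (!tasks.empty()) {
      std::function<void()> task = std::move(tasks.front());
      tasks.pop_front();
      task();
    }
  }
};

// Records step 1 on entry, then runs steps 2 to 4 inline or via the queue.
void Resolve(EventQueue& queue, std::vector<int>& trace, bool deferred) {
  trace.push_back(1);
  auto rest = [&trace] {
    trace.push_back(2);
    trace.push_back(3);
    trace.push_back(4);
  };
  if (deferred) {
    queue.Dispatch(rest);  // the caller regains control before steps 2 to 4
  } else {
    rest();                // steps 2 to 4 finish before the caller continues
  }
}

// Issues two back-to-back requests, then drains the queue.
std::vector<int> RunTwoRequests(bool deferred) {
  EventQueue queue;
  std::vector<int> trace;
  Resolve(queue, trace, deferred);
  Resolve(queue, trace, deferred);
  queue.RunAll();
  return trace;
}
```

With deferred set to false the trace is 1, 2, 3, 4, 1, 2, 3, 4; with deferred set to true it is 1, 1, 2, 3, 4, 2, 3, 4, matching the start of the second log.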
Hi Jesper,

We're happy to work with you trying to get Foxy Proxy working again in Thunderbird. We can even land patches in the underlying mozilla code for Thunderbird builds that are not accepted in Firefox builds.

But in the short run I have little choice but to force disabling of Foxy Proxy in Thunderbird 38. Users crash soon after startup, and most don't know it is Foxy Proxy causing this. For the first few weeks after release this will be one of our largest crashers, then go down because people will just abandon Thunderbird or the update.

As for the actual bug, I spent quite a bit of time in the debugger last night looking at this, and the issue appears to have something to do with the objects being released prematurely, so I would not expect to see an explicit reset of variables like mFilters. The debugger though seemed to be quite confused about what the actual object is, which may be related to your override of the underlying XPCOM object.

Had you asked me a week ago if what you were trying to do was even possible, I would have said no. (But I know something about the impossible: my ExQuilla addon solves essentially the same "impossible" problem of overriding a C++ method with a JavaScript object, though in a completely different manner.) I hope you succeed, but this is a very difficult problem, and AFAIK unsupported in the Mozilla platform.
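As an illustration of the failure mode discussed in the last two comments (a filter list that changes, or whose entries are released, while it is being walked), here is a standalone sketch. It does not reproduce the Gecko data structures: Filter, FilterList and ApplyAll are hypothetical, and std::shared_ptr stands in for XPCOM reference counting. It shows one common defensive pattern, iterating over a snapshot that holds its own strong references, so that a callback which clears the list cannot invalidate the loop or free the object currently running.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct FilterList;

// Hypothetical filter: a callback that is allowed to mutate the list it is
// registered in (for example, by unregistering itself or everything).
struct Filter {
  std::function<void(FilterList&)> apply;
};

struct FilterList {
  std::vector<std::shared_ptr<Filter>> filters;
  int applied = 0;

  void ApplyAll() {
    // The snapshot keeps every filter alive and keeps the iteration range
    // stable, whatever the callbacks do to `filters`.
    const std::vector<std::shared_ptr<Filter>> snapshot = filters;
    for (const std::shared_ptr<Filter>& filter : snapshot) {
      assert(filter != nullptr);
      filter->apply(*this);
      ++applied;
    }
  }
};
```

Iterating over `filters` directly would be undefined behaviour as soon as a callback cleared or resized the vector, which is the same class of problem as a list entry being released while a caller still holds a raw pointer to it.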
I asked jorgev (AMO lead) to mark foxyproxy and foxyproxy-basic as incompatible with Thunderbird 38 and above, as a lot of users are getting hit by this as they update.
foxyproxy installed - NO proxy settings made, one imap account, and checking for mail on startup disabled...
TB 36.0a1 2014-11-25 works
TB 37.0a1 2014-12-24 works
TB 38.0a1 2015-01-17 works
TB 38.0a1 2015-01-31 fails
So there is a 2 week window to explore.
Crash rates are way down since disabling. I'm going to stop tracking for now.
(In reply to Kent James (:rkent) from comment #22)
> Hi Jesper,
>
> We're happy to work with you trying to get Foxy Proxy working again in
> Thunderbird. We can even land patches in the underlying mozilla code for
> Thunderbird builds that are not accepted in Firefox builds.

Jesper, in what bug #s are the fixes being worked? And is bug 436344 the best choice of bugs to blame for introducing the crashes?
Flags: needinfo?(jesper)
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #27)
> (In reply to Kent James (:rkent) from comment #22)
> > Hi Jesper,
> >
> > We're happy to work with you trying to get Foxy Proxy working again in
> > Thunderbird. We can even land patches in the underlying mozilla code for
> > Thunderbird builds that are not accepted in Firefox builds.
>
> Jesper, in what bug #s are the fixes being worked?
>
> And, is bug 436344 the best choice of bugs to blame for introducing the
> crashes?

Bug 436344 is the single source of this bug, since it removed DeprecatedBlockingResolve from the idl. Bug 791645 is where the work is being done.
Flags: needinfo?(jesper)
Blocks: 436344
See Also: → 1152332
See Also: → 1183890, 1183733
Crash Signature: [@ nsProtocolProxyService::SetupPACThread()] [@ nsProtocolProxyService::ApplyFilters] [@ nsPACMan::IsPACURI(nsIURI*) ] → [@ nsProtocolProxyService::SetupPACThread()] [@ nsProtocolProxyService::ApplyFilters] [@ nsPACMan::IsPACURI(nsIURI*) ] [@ nsProtocolProxyService::SetupPACThread] [@ nsPACMan::IsPACURI ]
(In reply to Lu Wei from comment #20)
> As far as I know, Foxyproxy is the only auto-switch-proxy solution that
> supports Gmail. AutoProxy only supports http(s) protocol, and has stopped
> developing. Hope you can solve this, it's very important for chinese users.

I use FoxyProxy to switch proxies for different email accounts. For example, I have one FoxyProxy string pattern that routes Gmail IMAP through a local SOCKS5 proxy, while my other email accounts don't need proxies. Currently FoxyProxy is the only add-on I can use for this purpose, so I'm forced to use an old Thunderbird version. (But I do need new Thunderbird features such as the recently fixed bug 653342, which is quite important for Chinese users.) I hope this bug will be fixed soon. Thanks.

Asmwarrior
Whiteboard: [startupcrash][addon:foxyproxy][workaround:AutoProxy] → [blocked on bug 791645][startupcrash][addon:foxyproxy][workaround:AutoProxy]
Component: Networking: POP → Networking
This appears to be gone. No crashes past version 48.0b99
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Asmwarrior reports no crashes and I don't find any obviously related new crash signatures
Status: RESOLVED → VERIFIED