Closed
Bug 581628
Opened 15 years ago
Closed 15 years ago
Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible [@ nsUrlClassifierDBServiceWorker::ApplyUpdate][@ nsUrlClassifierDBServiceWorker::FinishUpdate()]
Categories
(Toolkit :: Safe Browsing, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | .14-fixed |
People
(Reporter: skunk, Assigned: skunk)
References
Details
(Keywords: crash, Whiteboard: [qa-examined-192])
Crash Data
Attachments
(1 file)
919 bytes,
patch
|
sdwilsh
:
review+
bzbarsky
:
approval2.0+
dveditz
:
approval1.9.2.14+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100716 Ubuntu/10.04 (lucid) Firefox/3.6.7
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100716 Ubuntu/10.04 (lucid) Firefox/3.6.7
This issue was originally reported in Ubuntu's Launchpad tracker. I am filing it here since it reaches fairly deep into the code, and is likely not an artifact of Ubuntu's modifications.
https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/585061
I use Firefox on a system in which home directories are served from an AFS file server. When a user's Kerberos authentication expires, access to the entire home directory is cut off, and all attempts to read or write to it fail with EACCES.
This occurs regularly on my employee workstation: I stay logged in for weeks at a time, with a more-or-less permanent instance of Firefox running, and every night after I leave work my authentication expires and Firefox sits for a few hours without any access to its configuration files under ~/.mozilla/. When I return in the morning, I renew my authentication, and my home directory becomes accessible again, but by this point Firefox has usually vanished without a trace.
I debugged this problem using a local build of (Ubuntu-patched) Firefox, and found where the segfaults were occurring: nsUrlClassifierDBService.cpp. The code assumes in a couple of places that mConnection is non-NULL, but every time the browser crashed on me, that assumption had been broken.
I am attaching a preliminary patch that, for the past few months, has pretty conclusively eliminated the crashes for me. Firefox now always makes it till the morning when I get in, sometimes with a couple "A script has become unresponsive..." dialogs, but otherwise ready to go once I renew my authentication.
This patch isn't meant to be committed as-is, but it should point out the problems with the current code.
Reproducible: Sometimes
Steps to Reproduce:
1. Make heavy use of Firefox (lots of open tabs, lots of background JS running, etc.)
2. Cut off access to $HOME for a few hours, cold-turkey
3. Wait for segfault
Assignee | ||
Comment 1•15 years ago
|
||
![]() |
||
Comment 2•15 years ago
|
||
Over to necko for lack of a better option... Benjamin, dcamp, do you know whether anyone maintains URLClassifier at this point?
Status: UNCONFIRMED → NEW
Component: General → Networking
Ever confirmed: true
Keywords: crash
QA Contact: general → networking
![]() |
||
Updated•15 years ago
|
Component: Networking → Phishing Protection
Product: Core → Firefox
QA Contact: networking → phishing.protection
![]() |
||
Comment 3•15 years ago
|
||
Jesse says maybe ddahl is planning to look at this code? This could be a nice limited place to start. ;)
See Also: → https://launchpad.net/bugs/585061
Comment 4•15 years ago
|
||
dwitte and sdwilsh have both proposed rewriting it, although they are both busy folk. I think the original component was correct, though.
Comment 5•15 years ago
|
||
I can probably review this, but it's going to be at least a week...
![]() |
||
Updated•15 years ago
|
Attachment #459993 -
Flags: review?(sdwilsh)
daniel: for the future, please try to use hg diff with --git -p, see https://developer.mozilla.org/en/Mercurial_FAQ
and don't add // foo after }s
Summary: Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible → Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible [@ nsUrlClassifierDBServiceWorker::ApplyUpdate][@ nsUrlClassifierDBServiceWorker::FinishUpdate()]
Assignee | ||
Comment 7•15 years ago
|
||
Timeless, as I said, this patch is not intended to be committed as-is. It only serves to indicate the specific locations where the segfaults are occurring (in addition to providing a simple workaround that proves the point).
Why did you assign this bug to me? I am not a Mozilla developer.
![]() |
||
Comment 8•15 years ago
|
||
I think timeless was under the impression that the patch was something to aim to check in (in which case the patch author would normally be the assignee; anyone who posts a patch is sorta a "Mozilla developer" if you think about it... ;) ).
Assigning back to default.
Assignee: skunk → nobody
Comment 9•15 years ago
|
||
(In reply to comment #5)
> I can probably review this, but it's going to be at least a week...
sdwilsh: If you can review that would be awesome, we are in heavy blockerland with DevTools features right now.
Comment 10•15 years ago
|
||
does this mean I don't need to review this?
![]() |
||
Comment 11•15 years ago
|
||
Well, we need someone to fix the bug. If the attached patch is acceptable, then we should check it in. If not, why not? We should at least get _that_ info in the bug so someone can pick it up as needed.
Assignee | ||
Comment 12•15 years ago
|
||
There may be a deeper problem going on ("why is mConnection NULL?", "why are these methods being called when mConnection is NULL?"), but for now, just sidestepping the NULL-dereferences would be great.
I would just eyeball the patch, and make appropriate changes manually. It's pretty trivial.
Comment 13•15 years ago
|
||
(In reply to comment #12)
> There may be a deeper problem going on ("why is mConnection NULL?", "why are
> these methods being called when mConnection is NULL?"), but for now, just
> sidestepping the NULL-dereferences would be great.
For what it's worth, I think we need to figure out those questions first. Blindly wallpapering over the issue is not the way to go. Clearly some invariants held by the code are being broken here, and it could mean that other things are broken as a result.
![]() |
||
Comment 14•15 years ago
|
||
> why is mConnection NULL?
Probably because nsUrlClassifierDBServiceWorker::OpenDb failed?
> why are these methods being called when mConnection is NULL?
This is a good question. BeginUpdate propagates the error out, but is someone eating it later on or something? nsUrlClassifierStreamUpdater::DownloadUpdates seems to check the return value...
![]() |
||
Comment 15•15 years ago
|
||
On the other hand, nsUrlClassifierDBService::BeginUpdate uses an async proxy. Wouldn't that lose the error?
Comment 16•15 years ago
|
||
Ugh, yes.
Assignee | ||
Comment 17•15 years ago
|
||
I don't think anyone is against tracking down the "real" problem in this bug. But given that everyone here has plenty on their plates, and that debugging this issue appears non-trivial, putting in a workaround till a fix is found would ease the practical impact of this bug.
(You can have it print a GLib warning or somesuch to the terminal so that devs don't forget it's there. Not like it'll come up terribly often, anyway.)
For my part, my patched Firefox has worked very well---and my use case is pretty punishing. I still get some long-running-instance wonkiness at times, and maybe my changes don't help with that. But compared to a straight-up SIGSEGV, I'll take the wonking any day.
Comment 18•15 years ago
|
||
Comment on attachment 459993 [details] [diff] [review]
Preliminary patch
We should just take this as a stopgap. Not sure when someone will have cycles to look into how we could ever get into this state.
Attachment #459993 -
Flags: review?(sdwilsh) → review+
![]() |
||
Updated•15 years ago
|
Assignee: nobody → skunk
![]() |
||
Updated•15 years ago
|
Attachment #459993 -
Flags: approval2.0+
![]() |
||
Comment 19•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 20•15 years ago
|
||
Can the patch be pushed to 3.6.x? (Just noticed it's not in 3.6.13.)
Updated•15 years ago
|
Attachment #459993 -
Flags: approval1.9.2.14?
Comment 21•15 years ago
|
||
Comment on attachment 459993 [details] [diff] [review]
Preliminary patch
Approved for 1.9.2.14, a=dveditz for release-drivers
Attachment #459993 -
Flags: approval1.9.2.14? → approval1.9.2.14+
Updated•15 years ago
|
Keywords: checkin-needed
![]() |
||
Comment 22•15 years ago
|
||
status1.9.2:
--- → .14-fixed
Keywords: checkin-needed
Comment 23•15 years ago
|
||
From the context, there seems to be no real way for me, as QA, to verify this fix. I don't have the correct setup to reproduce the problem. Daniel, can you try using a nightly 1.9.2 firefox build from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.2/ and seeing if the build fixes the issue?
Updated•15 years ago
|
Whiteboard: [qa-examined-192]
Assignee | ||
Comment 24•15 years ago
|
||
Al: I'm presuming the ideal is to reproduce a crash with 3.6.13, and observe the lack of one with 3.6.14pre. However, whichever of the two I install, the auto-update installs the new 3.6.14pre nightly---which may be affecting the behavior of the bug. How should I go about testing this?
Comment 25•15 years ago
|
||
The autoupdate shouldn't be triggered for many hours. If you download a clean 3.6.13 released build from the website and run it, it shouldn't autoupdate very soon. You can then download the current 3.6.14pre build and try it there.
The other option is to use the profilemanager to set up a different profile for this but you shouldn't need to do so.
Comment 26•15 years ago
|
||
(In reply to comment #24)
> Al: I'm presuming the ideal is to reproduce a crash with 3.6.13, and observe
> the lack of one with 3.6.14pre. However, whichever of the two I install, the
> auto-update installs the new 3.6.14pre nightly---which may be affecting the
> behavior of the bug. How should I go about testing this?
You can disable auto-update in about:config. set 'app.update.auto' to false
Assignee | ||
Comment 27•14 years ago
|
||
Okay, I've been testing Namoroka 3.6.13 versus 3.6.14pre (post-dating this fix), and I'm afraid to say the results are less than conclusive.
* I get crashes with 3.6.13, although not as frequently as with Ubuntu's 3.6.13;
* I get crashes with 3.6.14pre, also not as frequently, though these are due to SIGBUS (seemingly stemming from Flash), which is a different failure mode from this bug;
* Ubuntu 3.6.13 plus my patch is the only build that has proven uncrashable so far;
* Because the Namoroka builds are optimized, there's not much I can do in the way of post-mortem to see what's going on.
All of these are with the same ~/.mozilla directory, and thus the same runtime configuration.
Wish I could report a before-vs.-after smoking gun, but the reality is more of a muddle.
Comment 28•14 years ago
|
||
(In reply to comment #27)
> * Because the Namoroka builds are optimized, there's not much I can do in the
> way of post-mortem to see what's going on.
We have symbols available, but I'm not really sure how to get them and use them on linux...
Comment 29•14 years ago
|
||
You can fetch them using the script as described here:
https://developer.mozilla.org/en/Using_the_Mozilla_symbol_server#Downloading_symbols_on_Linux_.2f_Mac_OS_X
Assignee | ||
Comment 30•14 years ago
|
||
It's not just symbol names (which were easily obtainable for the Ubuntu build), but also the stack trace. In debugging the original bug, I got to nsUrlClassifierDBServer only with a debug/unoptimized build; the optimized build + debug symbols yielded nothing.
Updated•14 years ago
|
Crash Signature: [@ nsUrlClassifierDBServiceWorker::ApplyUpdate]
[@ nsUrlClassifierDBServiceWorker::FinishUpdate()]
Updated•11 years ago
|
Product: Firefox → Toolkit
You need to log in
before you can comment on or make changes to this bug.
Description
•