User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:22.214.171.124) Gecko/20100716 Ubuntu/10.04 (lucid) Firefox/3.6.7 Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:126.96.36.199) Gecko/20100716 Ubuntu/10.04 (lucid) Firefox/3.6.7 This issue was originally reported in Ubuntu's Launchpad tracker. I am filing it here since it reaches fairly deep into the code, and is likely not an artifact of Ubuntu's modifications. https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/585061 I use Firefox on a system in which home directories are served from an AFS file server. When a user's Kerberos authentication expires, access to the entire home directory is cut off, and all attempts to read or write to it fail with EACCES. This occurs regularly on my employee workstation: I stay logged in for weeks at a time, with a more-or-less permanent instance of Firefox running, and every night after I leave work my authentication expires and Firefox sits for a few hours without any access to its configuration files under ~/.mozilla/. When I return in the morning, I renew my authentication, and my home directory becomes accessible again, but by this point Firefox has usually vanished without a trace. I debugged this problem using a local build of (Ubuntu-patched) Firefox, and found where the segfaults were occurring: nsUrlClassifierDBService.cpp. The code assumes in a couple of places that mConnection is non-NULL, but every time the browser crashed on me, that assumption had been broken. I am attaching a preliminary patch that, for the past few months, has pretty conclusively eliminated the crashes for me. Firefox now always makes it till the morning when I get in, sometimes with a couple "A script has become unresponsive..." dialogs, but otherwise ready to go once I renew my authentication. This patch isn't meant to be committed as-is, but it should point out the problems with the current code. Reproducible: Sometimes Steps to Reproduce: 1. Make heavy use of Firefox (lots of open tabs, lots of background JS running, etc.) 2. Cut off access to $HOME for a few hours, cold-turkey 3. Wait for segfault
Over to necko for lack of a better option... Benjamin, dcamp, do you know whether anyone maintains URLClassifier at this point?
Status: UNCONFIRMED → NEW
Component: General → Networking
Ever confirmed: true
QA Contact: general → networking
Component: Networking → Phishing Protection
Product: Core → Firefox
QA Contact: networking → phishing.protection
Jesse says maybe ddahl is planning to look at this code? This could be a nice limited place to start. ;)
dwitte and sdwilsh have both proposed rewriting it, although they are both busy folk. I think the original component was correct, though.
I can probably review this, but it's going to be at least a week...
Attachment #459993 - Flags: review?(sdwilsh)
daniel: for the future, please try to use hg diff with --git -p, see https://developer.mozilla.org/en/Mercurial_FAQ and don't add // foo after }s
Summary: Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible → Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible [@ nsUrlClassifierDBServiceWorker::ApplyUpdate][@ nsUrlClassifierDBServiceWorker::FinishUpdate()]
Timeless, as I said, this patch is not intended to be committed as-is. It only serves to indicate the specific locations where the segfaults are occurring (in addition to providing a simple workaround that proves the point). Why did you assign this bug to me? I am not a Mozilla developer.
I think timeless was under the impression that the patch was something to aim to check in (in which case the patch author would normally be the assignee; anyone who posts a patch is sorta a "Mozilla developer" if you think about it... ;) ). Assigning back to default.
Assignee: skunk → nobody
(In reply to comment #5) > I can probably review this, but it's going to be at least a week... sdwilsh: If you can review that would be awesome, we are in heavy blockerland with DevTools features right now.
does this mean I don't need to review this?
Well, we need someone to fix the bug. If the attached patch is acceptable, then we should check it in. If not, why not? We should at least get _that_ info in the bug so someone can pick it up as needed.
There may be a deeper problem going on ("why is mConnection NULL?", "why are these methods being called when mConnection is NULL?"), but for now, just sidestepping the NULL-dereferences would be great. I would just eyeball the patch, and make appropriate changes manually. It's pretty trivial.
(In reply to comment #12) > There may be a deeper problem going on ("why is mConnection NULL?", "why are > these methods being called when mConnection is NULL?"), but for now, just > sidestepping the NULL-dereferences would be great. For what it's worth, I think we need to figure out those questions first. Blindly wallpapering over the issue is not the way to go. Clearly some invariants held by the code are being broken here, and it could mean that other things are broken as a result.
> why is mConnection NULL? Probably because nsUrlClassifierDBServiceWorker::OpenDb failed? > why are these methods being called when mConnection is NULL? This is a good question. BeginUpdate propagates the error out, but is someone eating it later on or something? nsUrlClassifierStreamUpdater::DownloadUpdates seems to check the return value...
On the other hand, nsUrlClassifierDBService::BeginUpdate uses an async proxy. Wouldn't that lose the error?
I don't think anyone is against tracking down the "real" problem in this bug. But given that everyone here has plenty on their plates, and that debugging this issue appears non-trivial, putting in a workaround till a fix is found would ease the practical impact of this bug. (You can have it print a GLib warning or somesuch to the terminal so that devs don't forget it's there. Not like it'll come up terribly often, anyway.) For my part, my patched Firefox has worked very well---and my use case is pretty punishing. I still get some long-running-instance wonkiness at times, and maybe my changes don't help with that. But compared to a straight-up SIGSEGV, I'll take the wonking any day.
Comment on attachment 459993 [details] [diff] [review] Preliminary patch We should just take this as a stopgap. Not sure when someone will have cycles to look into how we could ever get into this state.
Attachment #459993 - Flags: review?(sdwilsh) → review+
Assignee: nobody → skunk
Attachment #459993 - Flags: approval2.0+
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Can the patch be pushed to 3.6.x? (Just noticed it's not in 3.6.13.)
Comment on attachment 459993 [details] [diff] [review] Preliminary patch Approved for 188.8.131.52, a=dveditz for release-drivers
Attachment #459993 - Flags: approval184.108.40.206? → approval220.127.116.11+
status1.9.2: --- → .14-fixed
From the context, there seems to be no real way for me, as QA, to verify this fix. I don't have the correct setup to reproduce the problem. Daniel, can you try using a nightly 1.9.2 firefox build from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.2/ and seeing if the build fixes the issue?
Al: I'm presuming the ideal is to reproduce a crash with 3.6.13, and observe the lack of one with 3.6.14pre. However, whichever of the two I install, the auto-update installs the new 3.6.14pre nightly---which may be affecting the behavior of the bug. How should I go about testing this?
The autoupdate shouldn't be triggered for many hours. If you download a clean 3.6.13 released build from the website and run it, it shouldn't autoupdate very soon. You can then download the current 3.6.14pre build and try it there. The other option is to use the profilemanager to set up a different profile for this but you shouldn't need to do so.
(In reply to comment #24) > Al: I'm presuming the ideal is to reproduce a crash with 3.6.13, and observe > the lack of one with 3.6.14pre. However, whichever of the two I install, the > auto-update installs the new 3.6.14pre nightly---which may be affecting the > behavior of the bug. How should I go about testing this? You can disable auto-update in about:config. set 'app.update.auto' to false
Okay, I've been testing Namoroka 3.6.13 versus 3.6.14pre (post-dating this fix), and I'm afraid to say the results are less than conclusive. * I get crashes with 3.6.13, although not as frequently as with Ubuntu's 3.6.13; * I get crashes with 3.6.14pre, also not as frequently, though these are due to SIGBUS (seemingly stemming from Flash), which is a different failure mode from this bug; * Ubuntu 3.6.13 plus my patch is the only build that has proven uncrashable so far; * Because the Namoroka builds are optimized, there's not much I can do in the way of post-mortem to see what's going on. All of these are with the same ~/.mozilla directory, and thus the same runtime configuration. Wish I could report a before-vs.-after smoking gun, but the reality is more of a muddle.
(In reply to comment #27) > * Because the Namoroka builds are optimized, there's not much I can do in the > way of post-mortem to see what's going on. We have symbols available, but I'm not really sure how to get them and use them on linux...
You can fetch them using the script as described here: https://developer.mozilla.org/en/Using_the_Mozilla_symbol_server#Downloading_symbols_on_Linux_.2f_Mac_OS_X
It's not just symbol names (which were easily obtainable for the Ubuntu build), but also the stack trace. In debugging the original bug, I got to nsUrlClassifierDBServer only with a debug/unoptimized build; the optimized build + debug symbols yielded nothing.
Crash Signature: [@ nsUrlClassifierDBServiceWorker::ApplyUpdate] [@ nsUrlClassifierDBServiceWorker::FinishUpdate()]
Component: Phishing Protection → Phishing Protection
Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.