581628 - Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible [@ nsUrlClassifierDBServiceWorker::ApplyUpdate][@ nsUrlClassifierDBServiceWorker::FinishUpdate()]

Assignee

Description

15 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100716 Ubuntu/10.04 (lucid) Firefox/3.6.7
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100716 Ubuntu/10.04 (lucid) Firefox/3.6.7

This issue was originally reported in Ubuntu's Launchpad tracker. I am filing it here since it reaches fairly deep into the code, and is likely not an artifact of Ubuntu's modifications.

https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/585061

I use Firefox on a system in which home directories are served from an AFS file server. When a user's Kerberos authentication expires, access to the entire home directory is cut off, and all attempts to read or write to it fail with EACCES.

This occurs regularly on my employee workstation: I stay logged in for weeks at a time, with a more-or-less permanent instance of Firefox running, and every night after I leave work my authentication expires and Firefox sits for a few hours without any access to its configuration files under ~/.mozilla/. When I return in the morning, I renew my authentication, and my home directory becomes accessible again, but by this point Firefox has usually vanished without a trace.

I debugged this problem using a local build of (Ubuntu-patched) Firefox, and found where the segfaults were occurring: nsUrlClassifierDBService.cpp. The code assumes in a couple of places that mConnection is non-NULL, but every time the browser crashed on me, that assumption had been broken.

I am attaching a preliminary patch that, for the past few months, has pretty conclusively eliminated the crashes for me. Firefox now always makes it till the morning when I get in, sometimes with a couple of "A script has become unresponsive..." dialogs, but otherwise ready to go once I renew my authentication. This patch isn't meant to be committed as-is, but it should point out the problems with the current code.

Reproducible: Sometimes

Steps to Reproduce:
1. Make heavy use of Firefox (lots of open tabs, lots of background JS running, etc.)
2. Cut off access to $HOME for a few hours, cold-turkey
3. Wait for segfault
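
The description above says the crash comes from code that assumes mConnection is non-NULL. A minimal C++ sketch of that failure mode, and of the kind of guard a workaround would add, might look like the following. Everything here is a hypothetical stand-in for illustration (SketchStatus, SketchConnection, SketchWorker and their methods are assumptions, not the real Mozilla classes or the attached patch); only the member name mConnection and the method name ApplyUpdate are borrowed from the report.

```cpp
#include <memory>

// Hypothetical status codes; the real code uses nsresult values.
enum class SketchStatus { Ok, NotInitialized };

// Hypothetical stand-in for a database connection.
struct SketchConnection {
    int commits = 0;
    void Commit() { ++commits; }
};

// Hypothetical stand-in for the worker that owns the connection.
class SketchWorker {
public:
    // Simulates opening the database. When the home directory is
    // inaccessible the open fails and mConnection stays null.
    SketchStatus OpenDb(bool homeAccessible) {
        if (!homeAccessible) {
            mConnection.reset();
            return SketchStatus::NotInitialized;
        }
        mConnection.reset(new SketchConnection());
        return SketchStatus::Ok;
    }

    // Guarded version: reports an error instead of dereferencing
    // a null mConnection (the unguarded form would crash here).
    SketchStatus ApplyUpdate() {
        if (!mConnection) {
            return SketchStatus::NotInitialized;
        }
        mConnection->Commit();
        return SketchStatus::Ok;
    }

    // Number of successful commits, or 0 when there is no connection.
    int Commits() const { return mConnection ? mConnection->commits : 0; }

private:
    std::unique_ptr<SketchConnection> mConnection;
};
```

With this guard, calling ApplyUpdate() after a failed OpenDb(false) returns an error code rather than touching the null pointer. It sidesteps the crash but does not explain why the method is reached in that state.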

Daniel Richard G.

Assignee

Comment 1

15 years ago

Attached patch: Preliminary patch

Boris Zbarsky [:bzbarsky]

Comment 2

15 years ago

Over to necko for lack of a better option... Benjamin, dcamp, do you know whether anyone maintains URLClassifier at this point?

Status: UNCONFIRMED → NEW

Component: General → Networking

Ever confirmed: true

Keywords: crash

QA Contact: general → networking

Boris Zbarsky [:bzbarsky]

Updated

15 years ago

Component: Networking → Phishing Protection

Product: Core → Firefox

QA Contact: networking → phishing.protection

Boris Zbarsky [:bzbarsky]

Comment 3

15 years ago

Jesse says maybe ddahl is planning to look at this code? This could be a nice limited place to start. ;)

Launchpad

Updated

15 years ago

See Also: → https://launchpad.net/bugs/585061

Benjamin Smedberg

Comment 4

15 years ago

dwitte and sdwilsh have both proposed rewriting it, although they are both busy folk. I think the original component was correct, though.

Shawn Wilsher :sdwilsh

Comment 5

15 years ago

I can probably review this, but it's going to be at least a week...

Boris Zbarsky [:bzbarsky]

Updated

15 years ago

Attachment #459993 - Flags: review?(sdwilsh)

timeless

Updated

15 years ago

Assignee: nobody → skunk

timeless

Comment 6

15 years ago

daniel: for the future, please try to use hg diff with --git -p, see https://developer.mozilla.org/en/Mercurial_FAQ and don't add // foo after }s

Summary: Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible → Segfaults in nsUrlClassifierDBService.cpp when homedir is inaccessible [@ nsUrlClassifierDBServiceWorker::ApplyUpdate][@ nsUrlClassifierDBServiceWorker::FinishUpdate()]

Daniel Richard G.

Assignee

Comment 7

15 years ago

Timeless, as I said, this patch is not intended to be committed as-is. It only serves to indicate the specific locations where the segfaults are occurring (in addition to providing a simple workaround that proves the point). Why did you assign this bug to me? I am not a Mozilla developer.

Boris Zbarsky [:bzbarsky]

Comment 8

15 years ago

I think timeless was under the impression that the patch was something to aim to check in (in which case the patch author would normally be the assignee; anyone who posts a patch is sorta a "Mozilla developer" if you think about it... ;) ). Assigning back to default.

Assignee: skunk → nobody

David Dahl :ddahl

Comment 9

15 years ago

(In reply to comment #5)
> I can probably review this, but it's going to be at least a week...

sdwilsh: If you can review, that would be awesome; we are in heavy blockerland with DevTools features right now.

Shawn Wilsher :sdwilsh

Comment 10

15 years ago

Does this mean I don't need to review this?

Boris Zbarsky [:bzbarsky]

Comment 11

15 years ago

Well, we need someone to fix the bug. If the attached patch is acceptable, then we should check it in. If not, why not? We should at least get _that_ info in the bug so someone can pick it up as needed.

Daniel Richard G.

Assignee

Comment 12

15 years ago

There may be a deeper problem going on ("why is mConnection NULL?", "why are these methods being called when mConnection is NULL?"), but for now, just sidestepping the NULL-dereferences would be great. I would just eyeball the patch, and make appropriate changes manually. It's pretty trivial.

Shawn Wilsher :sdwilsh

Comment 13

15 years ago

(In reply to comment #12)
> There may be a deeper problem going on ("why is mConnection NULL?", "why are
> these methods being called when mConnection is NULL?"), but for now, just
> sidestepping the NULL-dereferences would be great.

For what it's worth, I think we need to figure out those questions first. Blindly wallpapering over the issue is not the way to go. Clearly some invariants held by the code are being broken here, and it could mean that other things are broken as a result.

Boris Zbarsky [:bzbarsky]

Comment 14

15 years ago

> why is mConnection NULL?

Probably because nsUrlClassifierDBServiceWorker::OpenDb failed?

> why are these methods being called when mConnection is NULL?

This is a good question. BeginUpdate propagates the error out, but is someone eating it later on or something? nsUrlClassifierStreamUpdater::DownloadUpdates seems to check the return value...

Boris Zbarsky [:bzbarsky]

Comment 15

15 years ago

On the other hand, nsUrlClassifierDBService::BeginUpdate uses an async proxy. Wouldn't that lose the error?

Shawn Wilsher :sdwilsh

Comment 16

15 years ago

Ugh, yes.
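
The concern raised in comments 14 to 16 (a worker-side failure that the caller never sees because the call goes through an async proxy) can be sketched in C++ as follows. This is a simplified, single-threaded illustration under stated assumptions: ProxyStatus, ProxiedWorker and FireAndForgetProxy are hypothetical names invented for the example, and the queue stands in for whatever dispatch mechanism the real proxy uses. It is not the actual nsUrlClassifierDBService code.

```cpp
#include <functional>
#include <queue>

// Hypothetical status codes; the real code uses nsresult values.
enum class ProxyStatus { Ok, Failure };

// Hypothetical worker: BeginUpdate fails when the database cannot be opened.
struct ProxiedWorker {
    bool canOpenDb = false;  // models an inaccessible home directory
    bool dbOpen = false;

    ProxyStatus BeginUpdate() {
        dbOpen = canOpenDb;
        return dbOpen ? ProxyStatus::Ok : ProxyStatus::Failure;
    }

    // Later stage of the update; guarded so the sketch reports the
    // broken state instead of crashing.
    ProxyStatus FinishUpdate() {
        return dbOpen ? ProxyStatus::Ok : ProxyStatus::Failure;
    }
};

// Hypothetical fire-and-forget proxy: it queues the call and reports
// success immediately, so the worker's real result is never returned.
class FireAndForgetProxy {
public:
    explicit FireAndForgetProxy(ProxiedWorker& worker) : mWorker(worker) {}

    ProxyStatus BeginUpdate() {
        // The worker's return value is discarded inside the queued task.
        mQueue.push([this] { (void)mWorker.BeginUpdate(); });
        return ProxyStatus::Ok;  // the caller only learns that the call was queued
    }

    // Runs all queued calls, as the worker thread eventually would.
    void Drain() {
        while (!mQueue.empty()) {
            mQueue.front()();
            mQueue.pop();
        }
    }

private:
    ProxiedWorker& mWorker;
    std::queue<std::function<void()>> mQueue;
};
```

In this sketch the caller of FireAndForgetProxy::BeginUpdate() always sees success, even when the worker's BeginUpdate() fails once the queue is drained, so later stages such as FinishUpdate() can still be reached with no open database.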

Daniel Richard G.

Assignee

Comment 17

15 years ago

I don't think anyone is against tracking down the "real" problem in this bug. But given that everyone here has plenty on their plates, and that debugging this issue appears non-trivial, putting in a workaround till a fix is found would ease the practical impact of this bug. (You can have it print a GLib warning or somesuch to the terminal so that devs don't forget it's there. Not like it'll come up terribly often, anyway.) For my part, my patched Firefox has worked very well---and my use case is pretty punishing. I still get some long-running-instance wonkiness at times, and maybe my changes don't help with that. But compared to a straight-up SIGSEGV, I'll take the wonking any day.

Shawn Wilsher :sdwilsh

Comment 18

15 years ago

Comment on attachment 459993: Preliminary patch

We should just take this as a stopgap. Not sure when someone will have cycles to look into how we could ever get into this state.

Attachment #459993 - Flags: review?(sdwilsh) → review+

Boris Zbarsky [:bzbarsky]

Updated

15 years ago

Assignee: nobody → skunk

Boris Zbarsky [:bzbarsky]

Updated

15 years ago

Attachment #459993 - Flags: approval2.0+

Boris Zbarsky [:bzbarsky]

Comment 19

15 years ago

Pushed http://hg.mozilla.org/mozilla-central/rev/ee499eb8ad11

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Daniel Richard G.

Assignee

Comment 20

15 years ago

Can the patch be pushed to 3.6.x? (Just noticed it's not in 3.6.13.)

Shawn Wilsher :sdwilsh

Updated

15 years ago

Attachment #459993 - Flags: approval1.9.2.14?

Daniel Veditz [:dveditz]

Comment 21

15 years ago

Comment on attachment 459993: Preliminary patch

Approved for 1.9.2.14, a=dveditz for release-drivers

Attachment #459993 - Flags: approval1.9.2.14? → approval1.9.2.14+

Shawn Wilsher :sdwilsh

Updated

15 years ago

Keywords: checkin-needed

Boris Zbarsky [:bzbarsky]

Comment 22

15 years ago

Pushed http://hg.mozilla.org/releases/mozilla-1.9.2/rev/17e93767af48

status1.9.2: --- → .14-fixed

Keywords: checkin-needed (removed)

Al Billings [:abillings - ex-MoCo]

Comment 23

15 years ago

From the context, there seems to be no real way for me, as QA, to verify this fix. I don't have the correct setup to reproduce the problem. Daniel, can you try using a nightly 1.9.2 firefox build from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.2/ and seeing if the build fixes the issue?

Al Billings [:abillings - ex-MoCo]

Updated

15 years ago

Whiteboard: [qa-examined-192]

Daniel Richard G.

Assignee

Comment 24

15 years ago

Al: I'm presuming the ideal is to reproduce a crash with 3.6.13, and observe the lack of one with 3.6.14pre. However, whichever of the two I install, the auto-update installs the new 3.6.14pre nightly---which may be affecting the behavior of the bug. How should I go about testing this?

Al Billings [:abillings - ex-MoCo]

Comment 25

15 years ago

The autoupdate shouldn't be triggered for many hours. If you download a clean 3.6.13 released build from the website and run it, it shouldn't autoupdate very soon. You can then download the current 3.6.14pre build and try it there. The other option is to use the profilemanager to set up a different profile for this but you shouldn't need to do so.

David Dahl :ddahl

Comment 26

15 years ago

(In reply to comment #24)
> Al: I'm presuming the ideal is to reproduce a crash with 3.6.13, and observe
> the lack of one with 3.6.14pre. However, whichever of the two I install, the
> auto-update installs the new 3.6.14pre nightly---which may be affecting the
> behavior of the bug. How should I go about testing this?

You can disable auto-update in about:config: set 'app.update.auto' to false.

Daniel Richard G.

Assignee

Comment 27

14 years ago

Okay, I've been testing Namoroka 3.6.13 versus 3.6.14pre (post-dating this fix), and I'm afraid to say the results are less than conclusive.

* I get crashes with 3.6.13, although not as frequently as with Ubuntu's 3.6.13;
* I get crashes with 3.6.14pre, also not as frequently, though these are due to SIGBUS (seemingly stemming from Flash), which is a different failure mode from this bug;
* Ubuntu 3.6.13 plus my patch is the only build that has proven uncrashable so far;
* Because the Namoroka builds are optimized, there's not much I can do in the way of post-mortem to see what's going on.

All of these are with the same ~/.mozilla directory, and thus the same runtime configuration. Wish I could report a before-vs.-after smoking gun, but the reality is more of a muddle.

Shawn Wilsher :sdwilsh

Comment 28

14 years ago

(In reply to comment #27)
> * Because the Namoroka builds are optimized, there's not much I can do in the
> way of post-mortem to see what's going on.

We have symbols available, but I'm not really sure how to get them and use them on Linux...

(not currently active) Ted Mielczarek

Comment 29

14 years ago

You can fetch them using the script as described here: https://developer.mozilla.org/en/Using_the_Mozilla_symbol_server#Downloading_symbols_on_Linux_.2f_Mac_OS_X

Daniel Richard G.

Assignee

Comment 30

14 years ago

The problem is not just symbol names (which were easily obtainable for the Ubuntu build), but also the stack trace. In debugging the original bug, I got to nsUrlClassifierDBService only with a debug/unoptimized build; the optimized build + debug symbols yielded nothing.

Nobody; OK to take it and work on it

Updated

14 years ago

Crash Signature: [@ nsUrlClassifierDBServiceWorker::ApplyUpdate] [@ nsUrlClassifierDBServiceWorker::FinishUpdate()]

Nobody; OK to take it and work on it

Updated

11 years ago

Product: Firefox → Toolkit