563538 - Browser freezes due a lock contention across Places statements

Reporter

Description

•

14 years ago

Attached file Hang sample — Details

+++ This bug was initially created as a clone of Bug #559217 +++

Encountered another ~30sec hang, and grabbed a clean sample. This is with a current trunk nightly (Gecko/20100503). Happened shortly after restart with a few tabs restored, was entering javascript:prompt("foo") into the URL bar.

Shawn Wilsher :sdwilsh

Comment 1

•

14 years ago

We can't do anything with this one short of making it async :(

Marco Bonardo [:mak]

Comment 2

•

14 years ago

also making it async could not be so easy, this is part of browser.js::getShortcutOrURI that has a pretty sync scope.

Dave Townsend [:mossop]

Comment 3

•

14 years ago

What kind of frequency is this happening for people?

Marco Bonardo [:mak]

Comment 4

•

14 years ago

I'd think that is really hard to tell since it can happen randomly. I've never seen it personally.

Shawn Wilsher :sdwilsh

Comment 5

•

14 years ago

(In reply to comment #3)
> What kind of frequency is this happening for people?
I have only ever heard complaints about it from dolske fwiw.  Other bad hangs we've had I heard about it from *lots* of people, and that was really before I was known as the guy to talk to about this kind of stuff.

Steve England [:stevee]

Comment 6

•

14 years ago

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.5pre) Gecko/20100515 Namoroka/3.6.5pre ID:20100515042315

I don't know if this is the bug I'm seeing on the 1.9.2 branch, but I always get two browser lockups after I start firefox where the HD churns away for a minute or two. It might be something to do with my very large history, or not.

I don't see the lockups on the trunk with the same places.sqlite fwiw. But that doesn't narrow it down much.

Markus Stange [:mstange]

Comment 7

•

14 years ago

I've hung in this stack while compiling Firefox in the background. I didn't profile often, though, so I'm not sure if specifically this stack is very common. But multi-second freezes in general happen very often to me during compilation.

Dave Townsend [:mossop]

Comment 8

•

14 years ago

Doesn't sound common enough to block on right now

blocking2.0: ? → -

Justin Dolske [:Dolske]

Reporter

Comment 9

•

14 years ago

I've noticed that this _seems_ to happen regularly, when I've been running a build for a day or two, restart to update to the latest lightly, and within a minute or so of restarting it will hang.

I caught it in a debug build this time, the main thread hung in nsNavHistory::RegisterOpenPage() while a background thread is doing... something. I didn't have any luck extracting a useful statement in gdb, but by breaking on the CompletionNotifier's Run() + some lucky stepping, I ended up in nsPlacesExpiration.js#629.

So it looks like I git hit with a long-running expiration right after startup, which in turn is blocking the main thread.

Shawn Wilsher :sdwilsh

Comment 10

•

14 years ago

(In reply to comment #9)
> I caught it in a debug build this time, the main thread hung in
> nsNavHistory::RegisterOpenPage() while a background thread is doing...
> something. I didn't have any luck extracting a useful statement in gdb, but by
> breaking on the CompletionNotifier's Run() + some lucky stepping, I ended up in
> nsPlacesExpiration.js#629.
Marco!

Marco Bonardo [:mak]

Comment 11

•

14 years ago

The first expiration happens 3 minutes after startup, is this compatible with your timings? We could split the expiration statements in 2 parts to break the running thread eventually.

Marco Bonardo [:mak]

Comment 12

•

14 years ago

actually, for the getShortcutORURI thing, I don't expect us to have an incredible amount of keywords, we could have what, 20, 30 at a maximum (actually I think I have about 5). At first request we could cache all keyword/uri associations in a string hash, and serve them from it. At least the check "is this a keyword" would be faster after the first request and immediatly fallback to the default.

Regarding expiration I have a patch in Bug 562644 that removes the only sync part of the service, and avoid doing double queries... should improve expiration performance quite a bit.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 13

•

14 years ago

(In reply to comment #12)
> At first request we could cache all keyword/uri associations in a string hash,
> and serve them from it.

I like this idea!

Marco Bonardo [:mak]

Updated

•

14 years ago

Depends on: 572030

Marco Bonardo [:mak]

Comment 14

•

14 years ago

The keywords hash idea is partially implemented in bug 572030, it still needs some work right now.

Depends on: 562644

Justin Dolske [:Dolske]

Reporter

Comment 15

•

14 years ago

Brendan reports seeing a hang-after-startup that may very well be this too (CC'd). I'm also still seeing this regularly, though I've become numb to just starting the browser and doing something else for a few minutes (my hangs seem to be in the 1-3 minute ballpark, now). Renomming.

(In reply to comment #11)
> The first expiration happens 3 minutes after startup, is this compatible with
> your timings?

Err, why do we do this? Seems like that should at least be on an idle timer (though that would just make the hang happen at random less-noticeable times).

blocking2.0: - → ?

Dave Townsend [:mossop]

Comment 16

•

14 years ago

This is perhaps what I'm seeing too. Though last time I sampled I saw the background thread in autocomplete stuff and then the main thread getting an annotation (favicons maybe?). It's really annoying.

Brendan Eich [:brendan]

Comment 17

•

14 years ago

Why isn't this assigned yet? If it's what is biting me, it happens often but unpredictably, and it's a blocker in my book -- I restart every time there's a new m-c nightly.

/be

Robert Sayre

Updated

•

14 years ago

blocking2.0: ? → betaN+

Robert Sayre

Updated

•

14 years ago

Assignee: nobody → mak77

Robert Sayre

Comment 18

•

14 years ago

This bug is a good example of what Lilly calls "table stakes". It doesn't matter how many nifty WebGL or IndexDB features we have if users encounter 30 second hangs and tell their friends.

This blocks Firefox 4 and needs to be fixed ASAP. If I have the wrong assignee, go ahead change it to another person.

Marco Bonardo [:mak]

Comment 19

•

14 years ago

(In reply to comment #15)
> Err, why do we do this? Seems like that should at least be on an idle timer
> (though that would just make the hang happen at random less-noticeable times).

Expiring on idle would be insufficient for anyone, we do also that when needed but it would also be completely useless for a bunch of users that don't hit idle. Expiration is not more expensive than before, it is less and we don't have evidence that is caused by expiration (that is also running in the async thread and doing really small steps). I just guessed because it runs on a timer, but I also saw some hang and it was not running in that moment.

Favicons are not annotations and are on the async thread as well, even if a small part is still sync and should be fixed (we improved it but it's not yet ready). A bunch of other stuff could ask for annotations, it's too generic.

There is nothing more than a fact that someone is seeing hangs randomly, we have planned a bunch of improvements but there is no evidence what could improve this to prioritize those. I just have the keywords bug that I can finish, but there is no evidence if it will help or not.

There is not even evidence if this bug is exactly what people is seeing.
If you think someone other has time and knowledge to reach a solution faster feel free to change assignee, I won't mind.

Robert Sayre

Comment 20

•

14 years ago

(In reply to comment #19)
> 
> There is not even evidence if this bug is exactly what people is seeing.
> If you think someone other has time and knowledge to reach a solution faster
> feel free to change assignee, I won't mind.

Your team's job is to find out what this bug is, and make sure it is fixed.

Dietrich Ayala (:dietrich)

Comment 21

•

14 years ago

(In reply to comment #20)
> (In reply to comment #19)
> > 
> > There is not even evidence if this bug is exactly what people is seeing.
> > If you think someone other has time and knowledge to reach a solution faster
> > feel free to change assignee, I won't mind.
> 
> Your team's job is to find out what this bug is, and make sure it is fixed.

He didn't imply that it wasn't. Only that if anyone else had a better solution, they're welcome to hop in :)

As Marco noted, there are a couple of in-progress bugs that might affect this one. We'll try to get those in as soon as possible. Since Dolske can reproduce regularly, we'll work with him to test fixes.

Status: NEW → ASSIGNED

Shawn Wilsher :sdwilsh

Comment 22

•

14 years ago

I also have some ideas on how we can improve our concurrency for read-only tasks in places based on stuff we learned from our IndexedDB work which will reduce lock contention as well.

Marco Bonardo [:mak]

Comment 23

•

14 years ago

I can most likely reduce even more expiration work, so that we run minimum number of needed statements. I can figure out a less API-changing approach for caching keywords and put down a list of statements we still run sync without user explicit request (so during a common navigation session).
Also killing temp tables should cut by 30% execution times of most of our statements, I should start working on that so it's ready sooner than later, lazy_add killing is already WIP, and new sqlite should be coming.

Please, anyone who can capture samples of the hangs should provide them to help figure out where we still fail in your case.

Dave Townsend [:mossop]

Comment 24

•

14 years ago

Attached file Hang sample — Details

Here is a sample from today's build shortly after I started it (a little under 5 minutes). Normally I get hung when typing in the address bar but in this case the hang was while typing in a bugzilla comment.

Shawn Wilsher :sdwilsh

Comment 25

•

14 years ago

(In reply to comment #24)
> Here is a sample from today's build shortly after I started it (a little under
> 5 minutes). Normally I get hung when typing in the address bar but in this case
> the hang was while typing in a bugzilla comment.
That is places code again, but not sure what is triggering it because it's in JS :(

Marco Bonardo [:mak]

Comment 26

•

14 years ago

I can guess, since he was not activating anything in our UI, livemarks updating could be triggering getItemAnnotation. Btw, there will always be some kind of locking because our threads are part sync, part async, we can just try to make them rare reducing the time spent in statements, unless we want to spend 6 months just rewriting all the module, that we are doing in chunks.

Dave Townsend [:mossop]

Comment 27

•

14 years ago

(In reply to comment #26)
> I can guess, since he was not activating anything in our UI, livemarks updating
> could be triggering getItemAnnotation. Btw, there will always be some kind of
> locking because our threads are part sync, part async, we can just try to make
> them rare reducing the time spent in statements, unless we want to spend 6
> months just rewriting all the module, that we are doing in chunks.

I don't believe that I have any livemarks.

Brendan Eich [:brendan]

Comment 28

•

14 years ago

Is this a deadlock? I.e., is no progress made and no CPU time accrued once the threads enter this state?

/be

Shawn Wilsher :sdwilsh

Comment 29

•

14 years ago

(In reply to comment #28)
> Is this a deadlock? I.e., is no progress made and no CPU time accrued once the
> threads enter this state?
No, it's just one thread holding the lock for longer than expected.  There's more lock contention that we'd like to have here, and we need to get rid of it.

Dave Townsend [:mossop]

Comment 30

•

14 years ago

(In reply to comment #29)
> (In reply to comment #28)
> > Is this a deadlock? I.e., is no progress made and no CPU time accrued once the
> > threads enter this state?
> No, it's just one thread holding the lock for longer than expected.  There's
> more lock contention that we'd like to have here, and we need to get rid of it.

Is there a quick way we can add logging to a build to indicate which statements end up waiting on a lock? I guess that doesn't tell us what the evil long running background thread is that is causing the problem but might help us avoid it.

Shawn Wilsher :sdwilsh

Comment 31

•

14 years ago

Nothing comes to mind immediately.  I'll think about it some more and maybe I'll come up with something on the plane tomorrow.

Brendan Eich [:brendan]

Comment 32

•

14 years ago

A mutex should never be held across any I/O operation (or of course an unbounded or even just overlong CPU operation).

A condvar can be the right tool if the contending thread is able to wait, but of course the main thread has to service its event queue, so effectively try-locking may be the only recourse.

Is this sqlite code or our code, or a mix?

/be

Shawn Wilsher :sdwilsh

Comment 33

•

14 years ago

(In reply to comment #32)
> Is this sqlite code or our code, or a mix?
This is SQLite code and it's how they ensure that access to the database is serialized.

This is exactly why I advocate everyone to use the asynchronous storage API because it's lock free on the calling thread (usually the main thread).

Robert Sayre

Comment 34

•

14 years ago

(In reply to comment #33)
> 
> This is exactly why I advocate everyone to use the asynchronous storage API

Yes, but comment 26 says

>  unless we want to spend 6 months just rewriting all the module, that we are doing in chunks.

6 months seems like a Really Long Time, but I don't have a lot of data here. Shipping with hangs like this doesn't fly, so where does that leave us?

Shawn Wilsher :sdwilsh

Comment 35

•

14 years ago

(In reply to comment #34)
> 6 months seems like a Really Long Time, but I don't have a lot of data here.
> Shipping with hangs like this doesn't fly, so where does that leave us?
I'm not sure six months is an accurate estimate, but I'm not sure we've got one either.  We've been converting places over to the asynchronous API in chunks each release.

We should see if this is still an issue once mak gets his patch up to nuke nsPlacesDBFlush for two reasons:
1) This would get rid of a lot of the I/O (or at least the large amounts of I/O) we do on the background thread (every two minutes + whenever we add a bookmark).
2) All of our queries get a lot simpler so they should run faster.

I don't have a bug number handy, but I bet mak does.

Marco Bonardo [:mak]

Comment 36

•

14 years ago

bug 552023 is temp tables removal (bug 499828, that i'm working on, should come first to simplify code removing redundant stuff), bug 556631 is making frecency async, there is other stuff we can/should convert but will need a list of queries that are running sync during user's browsing, Storage PRLog should be able to generate the list easily. I have an expiration patch on disk that should further reduce timed work, will soon file a bug for it.

Shawn Wilsher :sdwilsh

Comment 37

•

14 years ago

Another interesting question for folk seeing this:
1) Do you have Weave (Firefox Sync) installed?
2) If yes to (1), do you have more than one computer using it?

(trying to figure out the increase in places activity)

Marco Bonardo [:mak]

Comment 38

•

14 years ago

Shawn, I still have concerns about the fact Storage uses BEGIN IMMEDIATE TRANSACTION on async statements execution, is there a reason to differ from DEFERRED that is used in the sync state?
IMMEDIATE means a RESERVED lock is setup immediately on the DB regardless when the database will be accessed or written to. DEFERRED does instead set a SHARED lock till first write happens then lock changes to RESERVED. Using DEFERRED would give us a larger safe time for reads, unless there is a specific reason IMMEDIATE is being used.

Marco Bonardo [:mak]

Comment 39

•

14 years ago

code reference for comment 38:
http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageAsyncStatementExecution.cpp#564

Shawn Wilsher :sdwilsh

Comment 40

•

14 years ago

As I recall, it was because of this from the SQLite docs:
Because the acquisition of locks is deferred until they are needed, it is possible that another thread or process could create a separate transaction and write to the database after the BEGIN on the current thread has executed. 

But given that we don't have any way to determine if a statement is read-only or write-only, we may end up creating an immediate transaction for read statements, which would be undesirable.  We should probably switch that to a deferred transaction until we can do that.  Can you file a bug for that, and mark it blocking this?

Marco Bonardo [:mak]

Updated

•

14 years ago

Depends on: 580240

Marco Bonardo [:mak]

Comment 41

•

14 years ago

Filed bug 580240. Notice this involves only multiple statements executed at once through dbconn.executeAsync, not single async statements, we can probably evaluate splitting batches if they are locking for too long times.
For example this causes autocomplete to lock the database for read-only statements.
Also expiration and DBflush are using this method, I'll tweak the former as far as I can (will evaluate chunking statements if needed), and the second is going to be killed along with temp tables.

Marco Bonardo [:mak]

Updated

•

14 years ago

Depends on: 552023, 556631

Shawn Wilsher :sdwilsh

Comment 42

•

14 years ago

(In reply to comment #41)
> For example this causes autocomplete to lock the database for read-only
> statements.
We can create another db connection for things like this which are read-only so there is only file lock contention (and we can use multiple readers) as noted in comment 22.

> Also expiration and DBflush are using this method, I'll tweak the former as far
> as I can (will evaluate chunking statements if needed), and the second is going
> to be killed along with temp tables.
Well, at least with DBFlush we did this to reduce the number of fsyncs which also hurt us on the main thread due to I/O we do on it still.  Expiration is likely in the same boat.

Dave Townsend [:mossop]

Comment 43

•

14 years ago

(In reply to comment #37)
> Another interesting question for folk seeing this:
> 1) Do you have Weave (Firefox Sync) installed?
> 2) If yes to (1), do you have more than one computer using it?

I see this without Weave installed, though Weave had been installed in the profile previously.

Brendan Eich [:brendan]

Comment 44

•

14 years ago

No Weave here.

Do sqlite maintainers know about API-exposed mutexes being held across I/O being implicated here? That is a flawed design, if I'm understanding things here. It is unnecessary and it can't scale, never mind blowing back as in this bug (again if this diagnosis is correct).

Mutexes may be implemented with competitive spinning on SMP systems. They are not ever for waiting indefinitely or even "too long". Any unbounded computation or input/output operation inside a critical section wants a condvar, and if a thread can't wait it will have to poll.

/be

Justin Dolske [:Dolske]

Reporter

Comment 45

•

14 years ago

(In reply to comment #37)
> Another interesting question for folk seeing this:
> 1) Do you have Weave (Firefox Sync) installed?

I do now, but it was not installed when I had previously reported and commented on this bug.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

14 years ago

(In reply to comment #44)
> Do sqlite maintainers know about API-exposed mutexes being held across I/O
> being implicated here? That is a flawed design, if I'm understanding things
> here. It is unnecessary and it can't scale, never mind blowing back as in this
> bug (again if this diagnosis is correct).

Yes, at least since January when we discussed this exact issue in bug 524780.

Marco Bonardo [:mak]

Updated

•

14 years ago

Blocks: 560198

Marco Bonardo [:mak]

Updated

•

14 years ago

Depends on: 580228

Dave Townsend [:mossop]

Comment 48

•

14 years ago

Attached file Another hang — Details

Don't know if you want more of these, in this case I wasn't doing anything in Firefox, just switched back to it to find it hung and it remained so for over a minute.

Dave Townsend [:mossop]

Comment 49

•

14 years ago

(In reply to comment #48)
> Created attachment 459997 [details]
> Another hang
> 
> Don't know if you want more of these, in this case I wasn't doing anything in
> Firefox, just switched back to it to find it hung and it remained so for over a
> minute.

Actually just noticed that the background thread in this case is the url classifier service, why would the annotation service be blocked on that?

Shawn Wilsher :sdwilsh

Comment 50

•

14 years ago

(In reply to comment #49)
> Actually just noticed that the background thread in this case is the url
> classifier service, why would the annotation service be blocked on that?
There's another thread further down that is the async execution thread which I suspect is places.

Marco Bonardo [:mak]

Comment 51

•

14 years ago

when you capture the sample, could you check name of the annotation with the debugger from itemHasAnnotation or getItemAnnotation? identifying the caller would be interesting.

Ed Lee :Mardak

Comment 52

•

14 years ago

Attached file annotation watcher component — Details

I've been using this component to track how long calls to annotation service takes, but they're all fast. And when the browser does hang, it's not annotation related for me. On startup I get..

1) 0ms getItemsWithAnnotation("livemark/feedURI") PlacesUtils.jsm:1092
2) 0ms addObserver([object]) nsMicrosummaryService.js:95
3) 0ms getItemsWithAnnotation("livemark/feedURI") nsLivemarkService.js:131
4) 0ms getItemsWithAnnotation("microsummary/generatorURI") nsMicrosummaryService:536

Any suggestions for other services to maybe track?

If you want to try out the component, drop the attached into your components directory and update browser.manifest with these 2 lines:

component {ee5a097b-6ee6-4c6a-8eca-6bce87d570e9} AnnoWatcher.js
contract @mozilla.org/browser/annotation-service;1 {ee5a097b-6ee6-4c6a-8eca-6bce87d570e9}

Boris Zbarsky [:bzbarsky]

Updated

•

14 years ago

Blocks: 581978

Drew Willcoxon :adw

Comment 53

•

14 years ago

Ed, there are many non-scriptable methods in the anno service, so it's possible your annotation watcher (which is cool) is missing some.  The stack in bug 581978 comment 10 enters the anno service through GetPageAnnotationString for instance.

Shawn Wilsher :sdwilsh

Updated

•

14 years ago

Depends on: 582703

Ed Lee :Mardak

Comment 54

•

14 years ago

Error: 32248ms
getPageAnnotation([object XPCWrappedNative_NoHelper],"weave/guid")@:0
anno([object XPCWrappedNative_NoHelper],"weave/guid")@resource://services-sync/util.js:217
GUIDForUri([object XPCWrappedNative_NoHelper],true)@resource://services-sync/engines/history.js:62
HT_onVisit([object XPCWrappedNative_NoHelper],649125,1280579641520209,3054,649117,1,[object Object])@resource://services-sync/engines/history.js:277

Source File: file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/AnnoWatcher.js
Line: 32

Triggered by weave when getting a page/history annotation "weave/guid".

Seems like getting item annotations are very fast though -- there were a ton of these just /before/ that 32second delay:

Error: 0ms
getItemAnnotation(1164,"PlacesOrganizer/OrganizerQuery")@:0
anno(1164,"PlacesOrganizer/OrganizerQuery")@resource://services-sync/util.js:217
(1164,4,[object Array])@resource://services-sync/engines/bookmarks.js:1101
("MobileBookmarks")@resource://services-sync/engines/bookmarks.js:1101
_ensureMobileQuery()@resource://services-sync/engines/bookmarks.js:1111
BMT__ignore(1098)@resource://services-sync/engines/bookmarks.js:1055
BMT_onItemChanged(1098,"favicon",false,"https://mail.google.com/mail/images/favicon.ico",0,1)@resource://services-sync/engines/bookmarks.js:1134

Source File: file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/AnnoWatcher.js
Line: 32

Shawn Wilsher :sdwilsh

Comment 55

•

14 years ago

(In reply to comment #54)
> Seems like getting item annotations are very fast though -- there were a ton of
> these just /before/ that 32second delay:
The common case may be fast, but the failure case makes us fall on our faces, skid about 100 feet, and then fall off a cliff without a parachute.  We need an async annotation API for sync at the very least stat, I think (maybe just an async guid API for now?).

Ed Lee :Mardak

Comment 57

•

14 years ago

I just restarted my minefield updating to 20100804 and after a bunch of fast annotation calls:

Error: 28952ms
getItemsWithAnnotation("microsummary/generatorURI")@:0
()@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsMicrosummaryService.js:536
MSS__updateMicrosummaries()@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsMicrosummaryService.js:182
MSS_notify([object XPCWrappedNative_NoHelper])@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsMicrosummaryService.js:151

However, the very first thing in my log is also to a call to getItemsWithAnnotation and that returned instantly:

Error: 0ms
getItemsWithAnnotation("livemark/feedURI")@:0
PU_getMostRecentFolderForFeedURI([object XPCWrappedNative_NoHelper])@resource://gre/modules/PlacesUtils.jsm:1092
PSB_updateState()@chrome://browser/content/browser.js:2308
([object XPCWrappedNative_NoHelper],null,[object XPCWrappedNative_NoHelper])@chrome://browser/content/browser.js:7508
([object Object],0,[object Array])@chrome://browser/content/tabbrowser.xml:279
_callProgressListeners([object XULElement],"onLocationChange",[object Array],true,false)@chrome://browser/content/tabbrowser.xml:276
updateCurrentBrowser()@chrome://browser/content/tabbrowser.xml:759
onselect([object Event])@chrome://browser/content/browser.xul:1
@:0
set_selectedIndex(7)@chrome://global/content/bindings/tabbox.xml:645
set_selectedPanel([object XULElement])@chrome://global/content/bindings/tabbox.xml:664
set_selectedIndex(7)@chrome://global/content/bindings/tabbox.xml:394
set_selectedItem([object XULElement])@chrome://global/content/bindings/tabbox.xml:426
set_selectedTab([object XULElement])@chrome://global/content/bindings/tabbox.xml:106
set_selectedTab([object XULElement])@chrome://browser/content/tabbrowser.xml:1720
sss_restoreHistoryPrecursor([object ChromeWindow],[object Array],[object Array],7,0,0)@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsSessionStore.js:2029
sss_restoreWindow([object ChromeWindow],[object Object],true)@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsSessionStore.js:1923
sss_onLoad([object ChromeWindow])@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsSessionStore.js:598
sss_init([object ChromeWindow])@file:///Applications/Firefox%20x.x%20-%20Minefield.app/Contents/MacOS/components/nsSessionStore.js:305
init([object ChromeWindow])@:0
delayedStartup(false,false,134)@chrome://browser/content/browser.js:4787
@:0

Marco Bonardo [:mak]

Comment 58

•

14 years ago

hm, both microsummaries and livemark services are caching items and not hitting the db more than once.
the livemarks stuff probably comes from getMostRecentFolderForFeedURI that is called to update the Star button status, it will query more than once till livemarks service is fully initialized.
getItemsWithAnnotation is accessing only moz_items_annos and moz_annos, so something must be locking moz_items_annos on the other thread, but I can't find any obvious culprit, so there must some global lock on all the database.
Edward how much time was elapsed from browser start? Did this happen almost immediately after startup?
Ideally WAL should help these cases since any reader can run while the db is being written using a snapshot of the data.

Ed Lee :Mardak

Comment 59

•

14 years ago

(In reply to comment #58)
> Edward how much time was elapsed from browser start? Did this happen almost
> immediately after startup?
After restarting, I was able to type in the location bar and see some results, but typing some more resulted in a beachball. So probably about 10 seconds after the browser loaded?

Marco Bonardo [:mak]

Updated

•

14 years ago

Depends on: 573492

Ed Lee :Mardak

Updated

•

14 years ago

Depends on: 584927

Marco Bonardo [:mak]

Updated

•

14 years ago

Depends on: 546253

Marco Bonardo [:mak]

Updated

•

14 years ago

Whiteboard: [needs dependencies]

[not reading bugmail]

Updated

•

14 years ago

Depends on: 591072

Marco Bonardo [:mak]

Updated

•

14 years ago

No longer depends on: 591072

Shawn Wilsher :sdwilsh

Comment 60

•

14 years ago

Hot to get the statement running on the async thread when you see the hang:
1) In gdb, get to the async thread, and get into the stack from for mozilla::storage::AsyncExecuteStatements::executeStatement.
2) call sqlite3_sql(aStatement)

This method returns a const char * of the SQL string, and we can then identify which statement is running in the background.

Shawn Wilsher :sdwilsh

Updated

•

14 years ago

No longer depends on: 582703

Marco Bonardo [:mak]

Updated

•

14 years ago

Keywords: helpwanted

Whiteboard: [needs dependencies] → [needs dependencies][see comment 60]

Marco Bonardo [:mak]

Comment 61

•

14 years ago

Unassigning since this is mostly meta and investigated by the whole team on the branch, Shawn will also post more details later on the hang.

Assignee: mak77 → nobody

Status: ASSIGNED → NEW

Shawn Wilsher :sdwilsh

Comment 62

•

14 years ago

That's actually over in bug 582703 comment 12.

(no longer active)

Comment 63

•

14 years ago

Why are we blocking on this meta bug?

Robert Sayre

Comment 64

•

14 years ago

This is not a meta bug. It does have dependent bugs.

Shawn Wilsher :sdwilsh

Comment 65

•

14 years ago

(In reply to comment #64)
> This is not a meta bug. It does have dependent bugs.
Although it should be fixed by those dependent bugs.  We'll have usable nightlies on the Places branch that people can evaluate once bug 616001 [needs review] and bug 617761 [needs owner] are finished up.

Tomas

Comment 66

•

14 years ago

So this meta bug is fixed?

Marco Bonardo [:mak]

Comment 67

•

14 years ago

Ideally it is since all reported causes have been fixed so far. If anyone of those that were able to reproduce the hangs could tell if they can still reproduce anything or things improved, we'd like to know.

Marco Bonardo [:mak]

Comment 68

•

14 years ago

I think Dolske reproduced a hang due to a missing call to finalize, but I can't find the filed bug.  Shawn or Dolke could you add it to dependencies please?

Shawn Wilsher :sdwilsh

Comment 69

•

14 years ago

(In reply to comment #68)
> I think Dolske reproduced a hang due to a missing call to finalize, but I can't
> find the filed bug.  Shawn or Dolke could you add it to dependencies please?
I asked dolske to file it, not sure if he did though.

Dietrich Ayala (:dietrich)

Comment 70

•

14 years ago

Closing this. Re-open if there are new and related hangs reported.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Hang sample 14 years ago Justin Dolske [:Dolske] 32.40 KB, text/plain		Details
Hang sample 14 years ago Dave Townsend [:mossop] 15.43 KB, text/plain		Details
Another hang 14 years ago Dave Townsend [:mossop] 25.56 KB, text/plain		Details
annotation watcher component 14 years ago Ed Lee :Mardak 1.37 KB, text/plain		Details