<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Updated

•

14 years ago

Assignee: nobody → mbrubeck

Comment 1

•

14 years ago

I've done some simple logging to time some of the parent-process functions that run during page load. Tested on Galaxy Tab. The only significant time I've found so far is onLocationChange, which spends 26ms total including 12ms in BrowserUI.updateURI, 5ms in removeTransientNotifications (?!), and 6ms in scrollContentToTop. The UI was visibly unresponsive for much longer than 26ms, so I'll need to improve on this ad-hoc profiling.

Updated

•

14 years ago

tracking-fennec: ? → 2.0b4+

Comment 2

•

14 years ago

I tried commenting out the favicon and bookmark code that runs during page load, but I'm still seeing noticeable hangs. I'm profiling the front-end JavaScript code to look for long-running calls, but haven't found much. I'll try debugging/profiling this at a platform level next. We do sometimes spend ~100ms in BrowserUI.sizeControls, possibly because of a getBoundingClientRect call that could be avoided in most cases.

Comment 3

•

14 years ago

This hang is very noticable on the Galaxy Tab, where it sometimes lasts >2 seconds. It is hardly noticeable at all on the T-Mobile G2. I suspect that the difference is related to the much slower filesystem performance on the Tab.

Doug Turner (:dougt)

Assignee

Comment 4

•

14 years ago

we shouldn't be touching the filesystem during a load. :D

Comment 5

•

14 years ago

(In reply to comment #4) > we shouldn't be touching the filesystem during a load. :D I thought I'd strace the parent process while loading google.com Nokia-N900-42-11:~# strace -c -p 9690 Process 9690 attached - interrupt to quit ^CProcess 9690 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 72.69 0.027619 40 689 poll 10.60 0.004028 13 310 futex 9.88 0.003752 9 438 6 read 3.45 0.001312 6 220 write 2.81 0.001068 1 1757 gettimeofday 0.32 0.000122 1 99 fcntl64 0.24 0.000092 0 679 ioctl 0.00 0.000000 0 1 restart_syscall 0.00 0.000000 0 7 1 open 0.00 0.000000 0 1 close 0.00 0.000000 0 39 36 access 0.00 0.000000 0 4 munmap 0.00 0.000000 0 4 clone 0.00 0.000000 0 3 mprotect 0.00 0.000000 0 29 _llseek 0.00 0.000000 0 4 sched_setscheduler 0.00 0.000000 0 4 sched_get_priority_max 0.00 0.000000 0 4 sched_get_priority_min 0.00 0.000000 0 5 mmap2 0.00 0.000000 0 5 stat64 0.00 0.000000 0 5 lstat64 0.00 0.000000 0 54 fstat64 0.00 0.000000 0 280 clock_gettime 0.00 0.000000 0 6 shmat 0.00 0.000000 0 4 shmdt 0.00 0.000000 0 6 shmctl ------ ----------- ----------- --------- --------- ---------------- 100.00 0.037993 4657 43 total

Comment 6

•

14 years ago

strace on N900 of just the load of google.com in it. No startup or anything else.

Doug Turner (:dougt)

Assignee

Comment 7

•

14 years ago

nothing surprising.

Comment 8

•

14 years ago

I can't reproduce this bug on Galaxy Tab using the latest nightly from the Places branch.

Doug Turner (:dougt)

Assignee

Comment 9

•

14 years ago

matt, what about the m-c branch?

Comment 10

•

14 years ago

(In reply to comment #9) > matt, what about the m-c branch? Good question... Actually, it turns out I was accidentally testing this on m-c instead of places. (I had both branded and unbranded builds installed simultaneously, from testing a different bug, and got confused.) So it looks like this works for me on m-c. Is anyone else still able to reproduce this hang?

Status: NEW → ASSIGNED

Comment 11

•

14 years ago

I am still seeing it

Comment 12

•

14 years ago

Unassigning since I can no longer reproduce this. I stopped seeing this around the time that I had to uninstall/reinstall to fix some issues from an patch in bug 623999. So it may have been related to the size or contents of my old profile.

Assignee: mbrubeck → nobody

Keywords: perf

Updated

•

14 years ago

Status: ASSIGNED → NEW

Updated

•

14 years ago

tracking-fennec: 2.0b4+ → 2.0next+

Updated

•

14 years ago

tracking-fennec: 2.0next+ → 2.0b4+

[:fabrice] Fabrice Desré

Comment 13

•

14 years ago

running strace -c -p on my G2 when loading google.com : % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.52 23.103940 55538 416 1 recv 0.25 0.057499 710 81 ioctl 0.10 0.023799 643 37 SYS_224 0.05 0.012085 111 109 writev 0.04 0.009978 48 210 msgget 0.02 0.003908 279 14 mmap2 0.01 0.001680 187 9 open 0.00 0.000946 79 12 munmap 0.00 0.000917 28 33 getpid 0.00 0.000824 412 2 read 0.00 0.000246 19 13 close 0.00 0.000062 31 2 dup 0.00 0.000061 61 1 mprotect 0.00 0.000061 31 2 brk 0.00 0.000031 16 2 lseek 0.00 0.000000 0 2 getpriority ------ ----------- ----------- --------- --------- ---------------- 100.00 23.216037 945 1 total

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 14

•

14 years ago

(In reply to comment #13) > running strace -c -p on my G2 when loading google.com : Based on this, I assume you have no hang on the G2

[:fabrice] Fabrice Desré

Comment 15

•

14 years ago

(In reply to comment #14) > (In reply to comment #13) > > running strace -c -p on my G2 when loading google.com : > > Based on this, I assume you have no hang on the G2 Yes - I should have told. Just thought it could be useful to compare with a slow device.

Comment 16

•

14 years ago

With a patch that does this + if (NS_IsMainThread()) { + printf(" [%ld] BONED!\n", PR_Now() / PR_USEC_PER_SEC); + } in SQLiteMutex::lock, I'm pretty consistently seeing the pattern below when loading google.com. ++DOMWINDOW == 7 (0x24def68) [serial = 9] [outer = 0x1fc3f30] ++DOCSHELL 0x7f0cfc8c18a0 == 3 ++DOMWINDOW == 8 (0x7f0cfc89ea68) [serial = 10] [outer = (nil)] ++DOMWINDOW == 9 (0x7f0cfc3079f8) [serial = 11] [outer = 0x7f0cfc89e9f0] [1295585870] BONED! [1295585870] BONED! [1295585870] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! --DOMWINDOW == 8 (0x233adc8) [serial = 5] [outer = (nil)] [url = about:blank] [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! The first three bonings happen right as the page is loading. These are likely the cause of the 1-second hangs being seen in this bug. The second two are after 5 seconds, then 5 seconds again, which doesn't seem accidental. No idea what any of this is, will look at getting some PrintJSStack()s in a bit. Does anyone know if these are fixed on places?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 17

•

14 years ago

First three are (gdb) bt #0 mozilla::storage::SQLiteMutex::lock (this=0x7fffe08c4f38) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:128 #1 0x00007ffff4a40188 in SQLiteMutexAutoLock (this=0x7fffffffcfe0, aMutex=...) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:183 #2 0x00007ffff4a3ea5a in mozilla::storage::Connection::GetTransactionInProgress (this=0x7fffe08c4f00, _inProgress=0x7fffffffd03c) at /home/cjones/mozilla/mozilla-central/storage/src/mozStorageConnection.cpp:914 #3 0x00007ffff418391f in nsDOMStoragePersistentDB::MaybeCommitInsertTransaction (this=0x7fffe08c48c8) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:1069 #4 0x00007ffff41803e2 in nsDOMStoragePersistentDB::GetKeyValue (this=0x7fffe08c48c8, aStorage=0x90e1e0, aKey=..., aValue=..., aSecure=0x7fffffffd3bc) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:617 #5 0x00007ffff4179cb8 in nsDOMStorageDBWrapper::GetKeyValue (this=0x7fffe08c4810, aStorage=0x90e1e0, aKey=..., aValue=..., aSecure=0x7fffffffd3bc) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorageDBWrapper.cpp:142 #6 0x00007ffff416faff in DOMStorageImpl::GetDBValue (this=0x90e1e0, aKey=..., aValue=..., aSecure=0x7fffffffd3bc) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:877 #7 0x00007ffff4170829 in DOMStorageImpl::GetValue (this=0x90e1e0, aCallerSecure=false, aKey=..., aResult=0x7fffffffd6d8) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:1198 #8 0x00007ffff4188957 in mozilla::dom::StorageParent::RecvGetValue (this=0x95f330, aCallerSecure=@0x7fffffffd721, aSessionOnly=@0x7fffffffd722, aKey=..., aItem=0x7fffffffd730, rv=0x7fffffffd6d8) at /home/cjones/mozilla/mozilla-central/dom/src/storage/StorageParent.cpp:116 #9 0x00007ffff4d5f2f2 in mozilla::dom::PStorageParent::OnMessageReceived (this=0x95f330, __msg=..., __reply=@0x7fffffffd9a0) at PStorageParent.cpp:328 Next five are around the following, all above nsDOMStoragePersistentDB::FlushTemporaryTables (gdb) bt #0 mozilla::storage::SQLiteMutex::lock (this=0x7fffe08c6928) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:128 #1 0x00007ffff4a40188 in SQLiteMutexAutoLock (this=0x7fffffffd290, aMutex=...) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:183 #2 0x00007ffff4a3eaec in mozilla::storage::Connection::BeginTransactionAs (this=0x7fffe08c68f0, aTransactionType=0) at /home/cjones/mozilla/mozilla-central/storage/src/mozStorageConnection.cpp:930 #3 0x00007ffff37db6ee in mozStorageTransaction (this=0x7fffffffd300, aConnection=0x7fffe08c68f0, aCommitOnComplete=0, aType=0) at ../../dist/include/mozStorageHelper.h:77 #4 0x00007ffff417fa46 in nsDOMStoragePersistentDB::FlushTemporaryTables (this=0x7fffe08c4810, force=false) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:535 #5 0x00007ffff4179a6b in nsDOMStorageDBWrapper::FlushAndDeleteTemporaryTables (this=0x7fffe08c4810, force=false) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorageDBWrapper.cpp:104 #6 0x00007ffff416dcc9 in nsDOMStorageManager::Observe (this=0x7fffe02cac00, aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", aData=0x0) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:486 #7 0x00007ffff4f0f771 in nsObserverList::NotifyObservers (this=0x9db7c8, aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", someData=0x0) at /home/cjones/mozilla/mozilla-central/xpcom/ds/nsObserverList.cpp:130 #8 0x00007ffff4f1140e in nsObserverService::NotifyObservers (this=0x7fffe00f5160, aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", someData=0x0) at /home/cjones/mozilla/mozilla-central/xpcom/ds/nsObserverService.cpp:182 #9 0x00007ffff416d837 in nsDOMStorageManager::Observe (this=0x7fffe02cac00, aSubject=0x7fffe0c1de70, aTopic=0x7ffff5b5393d "timer-callback", aData=0x0) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:449

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 18

•

14 years ago

The PStorage:GetValue is from an "insecure caller", so I would assume google.com. The nsDOMStorageDBWrapper::FlushAndDeleteTemporaryTables seems related to that. PStorage is an entirely 'sync' protocol. This is a *big* *big* problem! It means that web content can force the fennec UI to not only block, but block on disk IO! If there's not a feasible fix for this, I would strongly recommend disabling PStorage for 4.0. I also see some boning loading old-school sites like timecube.com. Will poke in a bit.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 19

•

14 years ago

Attached file Full backtraces from other lock()ers — Details

Got these backtraces from loading timecube.com, after several loads of about:blank. tl;dr nsOfflineCacheDevice::Init nsCookieService::TryInitDB (x2, similar stacks) ContentParent::RecvStartVisitedQuery, ending up in History::GetIsVisitedStatement/Connection::Clone (x4, similar stacks)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 20

•

14 years ago

We're doing this (gdb) printf "%s", PrintJSStack() 0 anonymous(aTotalResults = -1, aAddonCount = 2, aAddons = [object Object],[object Object]) ["about:home":326] header = [object HTMLHeadingElement @ 0x7fffd9d35580 (native @ 0xf8cbb0)] loading = [object HTMLDivElement @ 0x7fffd9d35478 (native @ 0xf8ce30)] list = [object HTMLDivElement @ 0x7fffd9d35420 (native @ 0xf8ca30)] this = [object Object] 1 anonymous(aDelay = 10000) ["about:home":351] file = [xpconnect wrapped nsILocalFile @ 0x888ec0 (native @ 0xdebd40)] this = [object Object] 2 updateAddons() ["about:home":363] this = [object Window @ 0xf56020 (native @ 0xf54468)] 3 initAddons() ["about:home":368] this = [object Window @ 0xf56020 (native @ 0xf54468)] 4 init() ["about:home":119] this = [object Window @ 0xf56020 (native @ 0xf54468)] 5 onload(event = [object Event @ 0x80f5d0 (native @ 0xf51c20)]) ["about:home":1] this = [object Window @ 0xf56020 (native @ 0xf54468)] near startup too. Not sure if that's already known.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 21

•

14 years ago

(gdb) printf "%s", PrintJSStack() 0 anonymous() ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":257] db = undefined this = [object Object] 1 anonymous() ["resource://gre/modules/XPCOMUtils.jsm":176] this = [object Object] 2 anonymous() ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":287] replacementText = "" this = [object Object] 3 anonymous() ["resource://gre/modules/XPCOMUtils.jsm":176] this = [object Object] 4 PAC_getBoundSearchQuery(aTokens = , aMatchBehavior = 1) ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":892] query = undefined this = [object Object] 5 PAC_getSearch(aTokens = ) ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":860] this = [object Object] 6 PAC_startSearch(aListener = [xpconnect wrapped nsIAutoCompleteObserver @ 0x7fffe058d390 (native @ 0x7fffe037f2f0)], aPreviousResult = null, aSearchParam = "", aSearchString = "") ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":484] queries = undefined tokens = undefined query = undefined result = [xpconnect wrapped nsIAutoCompleteSimpleResult @ 0x7fffe058c1d0 (native @ 0x7fffe058c160)] searchParamParts = this = [object Object] 7 fetch(onResult = undefined, query = "") ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/AutoCompleteCache.js":93] this = [object Object] 8 anonymous([xpconnect wrapped nsITimer @ 0x12b31b0 (native @ 0x1116a00)]) ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/AutoCompleteCache.js":140] this = [object Object]

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 22

•

14 years ago

No more sore thumbs while browsing around normally, with desktop-fennec. This doesn't match my experience on device, so I'm not sure if any of the backtraces above are "the bug".

Comment 23

•

14 years ago

(In reply to comment #17) These two are local storage. (In reply to comment #19) > nsOfflineCacheDevice::Init Don't know anything about this. > nsCookieService::TryInitDB (x2, similar stacks) This only happens during startup (we have to do this at some point; no avoiding it) > ContentParent::RecvStartVisitedQuery, ending up in > History::GetIsVisitedStatement/Connection::Clone (x4, similar stacks) You should only ever see this once...very confused why you'd see it four times! (In reply to comment #20) I believe the add-ons manager does some synchronous disk I/O still. (In reply to comment #21) This is initializing the location bar code; should happen only once.

Comment 24

•

14 years ago

(In reply to comment #23) > (In reply to comment #20) > I believe the add-ons manager does some synchronous disk I/O still. This is the creation of the add-on recommended cache. It's a Fennec only concept and this actual path should only happen for new profiles. Afterward, the cache is refreshed during normal add-on update checking. > > (In reply to comment #21) > This is initializing the location bar code; should happen only once. Actually, Fennec has a cache for the default autocomplete lookup ("") and this is very likely the refreshing of that cache. We only update the cache on certain triggers though.

Comment 25

•

14 years ago

(In reply to comment #24) > Actually, Fennec has a cache for the default autocomplete lookup ("") and this > is very likely the refreshing of that cache. We only update the cache on > certain triggers though. It is certainly doing the initialization of the location bar code too, though. It shouldn't be happening more than once (and if it is, somebody is likely calling a createInstance on it instead of getService, which would be unfortunate).

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 26

•

14 years ago

(In reply to comment #25) > (In reply to comment #24) > > Actually, Fennec has a cache for the default autocomplete lookup ("") and this > > is very likely the refreshing of that cache. We only update the cache on > > certain triggers though. > It is certainly doing the initialization of the location bar code too, though. > It shouldn't be happening more than once (and if it is, somebody is likely > calling a createInstance on it instead of getService, which would be > unfortunate). I see your point. Nope. We use getService: http://mxr.mozilla.org/mobile-browser/source/components/AutoCompleteCache.js#63

Comment 27

•

14 years ago

(In reply to comment #23) > > ContentParent::RecvStartVisitedQuery, ending up in > > History::GetIsVisitedStatement/Connection::Clone (x4, similar stacks) > You should only ever see this once...very confused why you'd see it four times! Sorry, "x4, similar stacks" was unclear --- I hit SQLiteMutex::lock four times with similar stacks, in quick succession. The stacks were probably all part of the same operation, but I didn't verify.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 28

•

14 years ago

(In reply to comment #27) > Sorry, "x4, similar stacks" was unclear --- I hit SQLiteMutex::lock four times > with similar stacks, in quick succession. The stacks were probably all part of > the same operation, but I didn't verify. I should note that SQLiteMutex is only our wrapper for the internal mutex SQLite uses (and exposes) for situations where we need a set of SQLite operations to be atomic. We don't use it much. If you actually want to see all times we actually acquire a sqlite3_mutex on the main thread, you'll want to do something like this unit test does: https://mxr.mozilla.org/mozilla-central/source/storage/test/test_true_async.cpp

Comment 29

•

14 years ago

Great to know, thanks!

Honza Bambas (:mayhemer)

Comment 30

•

14 years ago

Chris, thanks for the analyzes. Stacks from comment 17 can be optimized: - the first one: we can keep the information of having a transaction on the connection using our own flag - the second one: we don't have to begin a transaction every time we iterate tables to be potentially flushed, we can do it at the moment we find an actual data to flush (will not happen that often) Other way is to use Shawn's proposal to use async API for DOM storage. I found the sync approach pretty fast in Firefox, so I avoided more complexity to an already complex patch for temporary tables. Although the code is quit stable, I think it is too late to change it to use async API. So, the main question: what's causing (technically) such huge dalay to acquire the mutex? Is it some concurrency with other connections?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 31

•

14 years ago

(In reply to comment #30) > Other way is to use Shawn's proposal to use async API for DOM storage. I found > the sync approach pretty fast in Firefox, so I avoided more complexity to an > already complex patch for temporary tables. We should do this, regardless (after 4.0 though). Looks like you filed bug 627635 on this though. > Although the code is quit stable, I think it is too late to change it to use > async API. Agreed. > So, the main question: what's causing (technically) such huge dalay to acquire > the mutex? Is it some concurrency with other connections? SQLite holds the mutex while doing disk I/O.

Comment 32

•

14 years ago

(In reply to comment #30) > Other way is to use Shawn's proposal to use async API for DOM storage. I found > the sync approach pretty fast in Firefox Did you test on machines with slow disks/filesystems, like the Samsung Galaxy S***? The fundamental problem with the current implementation is that it blocks the UI thread on content-controlled disk IO. This is incompatible with our target of a responsive UI and 60fps UI animations.

Brad Lassey [:blassey] (use needinfo?)

Updated

•

14 years ago

tracking-fennec: 2.0b4+ → 2.0b5+

Honza Bambas (:mayhemer)

Updated

•

14 years ago

Blocks: 600307

Updated

•

14 years ago

Assignee: nobody → doug.turner

Comment 33

•

14 years ago

not blocking b5 anymore. this seems to happen less and less, but has not gone away entirely. still blocking 2.0 so we keep trying to find any potential issue.

tracking-fennec: 2.0b5+ → 2.0+

Comment 34

•

14 years ago

Let's try using strace on the galaxy s

Comment 35

•

14 years ago

This might be fixed by either of the two listed depend bugs. If not, we should remove from blocker list.

Depends on: 634666, localStorageIO