Closed Bug 622112 Opened 14 years ago Closed 14 years ago

Fennec hangs ~1 second during page load

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(fennec2.0+)

RESOLVED INCOMPLETE
Tracking Status
fennec 2.0+ ---

People

(Reporter: aakashd, Assigned: dougt)

References

Details

(Keywords: perf)

Attachments

(1 file)

Build Id: Mozilla/5.0 (Android; Linux armv71; rv:2.0b9pre) Gecko/20101230 Namoroka/4.0b9pre Fennec/4.0b4pre Note: This is a follow-up of bug https://bugzilla.mozilla.org/show_bug.cgi?id=621802 There's a very noticeable hang for ~1 sec once the title and favicon are updated for the page and content is loading onto the window. The pages I tried and found this issue on are: www.espn.com www.facebook.com bugzilla.mozilla.org The pages I tried and did NOT find this issue on are: addons.mozilla.org about:firefox Also, there's a noticeable hang when going to the controls panel as well which I think it related to this.
tracking-fennec: --- → ?
Assignee: nobody → mbrubeck
I've done some simple logging to time some of the parent-process functions that run during page load. Tested on Galaxy Tab. The only significant time I've found so far is onLocationChange, which spends 26ms total including 12ms in BrowserUI.updateURI, 5ms in removeTransientNotifications (?!), and 6ms in scrollContentToTop. The UI was visibly unresponsive for much longer than 26ms, so I'll need to improve on this ad-hoc profiling.
tracking-fennec: ? → 2.0b4+
I tried commenting out the favicon and bookmark code that runs during page load, but I'm still seeing noticeable hangs. I'm profiling the front-end JavaScript code to look for long-running calls, but haven't found much. I'll try debugging/profiling this at a platform level next. We do sometimes spend ~100ms in BrowserUI.sizeControls, possibly because of a getBoundingClientRect call that could be avoided in most cases.
This hang is very noticable on the Galaxy Tab, where it sometimes lasts >2 seconds. It is hardly noticeable at all on the T-Mobile G2. I suspect that the difference is related to the much slower filesystem performance on the Tab.
we shouldn't be touching the filesystem during a load. :D
(In reply to comment #4) > we shouldn't be touching the filesystem during a load. :D I thought I'd strace the parent process while loading google.com Nokia-N900-42-11:~# strace -c -p 9690 Process 9690 attached - interrupt to quit ^CProcess 9690 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 72.69 0.027619 40 689 poll 10.60 0.004028 13 310 futex 9.88 0.003752 9 438 6 read 3.45 0.001312 6 220 write 2.81 0.001068 1 1757 gettimeofday 0.32 0.000122 1 99 fcntl64 0.24 0.000092 0 679 ioctl 0.00 0.000000 0 1 restart_syscall 0.00 0.000000 0 7 1 open 0.00 0.000000 0 1 close 0.00 0.000000 0 39 36 access 0.00 0.000000 0 4 munmap 0.00 0.000000 0 4 clone 0.00 0.000000 0 3 mprotect 0.00 0.000000 0 29 _llseek 0.00 0.000000 0 4 sched_setscheduler 0.00 0.000000 0 4 sched_get_priority_max 0.00 0.000000 0 4 sched_get_priority_min 0.00 0.000000 0 5 mmap2 0.00 0.000000 0 5 stat64 0.00 0.000000 0 5 lstat64 0.00 0.000000 0 54 fstat64 0.00 0.000000 0 280 clock_gettime 0.00 0.000000 0 6 shmat 0.00 0.000000 0 4 shmdt 0.00 0.000000 0 6 shmctl ------ ----------- ----------- --------- --------- ---------------- 100.00 0.037993 4657 43 total
strace on N900 of just the load of google.com in it. No startup or anything else.
nothing surprising.
I can't reproduce this bug on Galaxy Tab using the latest nightly from the Places branch.
matt, what about the m-c branch?
(In reply to comment #9) > matt, what about the m-c branch? Good question... Actually, it turns out I was accidentally testing this on m-c instead of places. (I had both branded and unbranded builds installed simultaneously, from testing a different bug, and got confused.) So it looks like this works for me on m-c. Is anyone else still able to reproduce this hang?
Status: NEW → ASSIGNED
I am still seeing it
Unassigning since I can no longer reproduce this. I stopped seeing this around the time that I had to uninstall/reinstall to fix some issues from an patch in bug 623999. So it may have been related to the size or contents of my old profile.
Assignee: mbrubeck → nobody
Keywords: perf
Status: ASSIGNED → NEW
tracking-fennec: 2.0b4+ → 2.0next+
tracking-fennec: 2.0next+ → 2.0b4+
running strace -c -p on my G2 when loading google.com : % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.52 23.103940 55538 416 1 recv 0.25 0.057499 710 81 ioctl 0.10 0.023799 643 37 SYS_224 0.05 0.012085 111 109 writev 0.04 0.009978 48 210 msgget 0.02 0.003908 279 14 mmap2 0.01 0.001680 187 9 open 0.00 0.000946 79 12 munmap 0.00 0.000917 28 33 getpid 0.00 0.000824 412 2 read 0.00 0.000246 19 13 close 0.00 0.000062 31 2 dup 0.00 0.000061 61 1 mprotect 0.00 0.000061 31 2 brk 0.00 0.000031 16 2 lseek 0.00 0.000000 0 2 getpriority ------ ----------- ----------- --------- --------- ---------------- 100.00 23.216037 945 1 total
(In reply to comment #13) > running strace -c -p on my G2 when loading google.com : Based on this, I assume you have no hang on the G2
(In reply to comment #14) > (In reply to comment #13) > > running strace -c -p on my G2 when loading google.com : > > Based on this, I assume you have no hang on the G2 Yes - I should have told. Just thought it could be useful to compare with a slow device.
With a patch that does this + if (NS_IsMainThread()) { + printf(" [%ld] BONED!\n", PR_Now() / PR_USEC_PER_SEC); + } in SQLiteMutex::lock, I'm pretty consistently seeing the pattern below when loading google.com. ++DOMWINDOW == 7 (0x24def68) [serial = 9] [outer = 0x1fc3f30] ++DOCSHELL 0x7f0cfc8c18a0 == 3 ++DOMWINDOW == 8 (0x7f0cfc89ea68) [serial = 10] [outer = (nil)] ++DOMWINDOW == 9 (0x7f0cfc3079f8) [serial = 11] [outer = 0x7f0cfc89e9f0] [1295585870] BONED! [1295585870] BONED! [1295585870] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! [1295585875] BONED! --DOMWINDOW == 8 (0x233adc8) [serial = 5] [outer = (nil)] [url = about:blank] [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! [1295585880] BONED! The first three bonings happen right as the page is loading. These are likely the cause of the 1-second hangs being seen in this bug. The second two are after 5 seconds, then 5 seconds again, which doesn't seem accidental. No idea what any of this is, will look at getting some PrintJSStack()s in a bit. Does anyone know if these are fixed on places?
First three are (gdb) bt #0 mozilla::storage::SQLiteMutex::lock (this=0x7fffe08c4f38) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:128 #1 0x00007ffff4a40188 in SQLiteMutexAutoLock (this=0x7fffffffcfe0, aMutex=...) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:183 #2 0x00007ffff4a3ea5a in mozilla::storage::Connection::GetTransactionInProgress (this=0x7fffe08c4f00, _inProgress=0x7fffffffd03c) at /home/cjones/mozilla/mozilla-central/storage/src/mozStorageConnection.cpp:914 #3 0x00007ffff418391f in nsDOMStoragePersistentDB::MaybeCommitInsertTransaction (this=0x7fffe08c48c8) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:1069 #4 0x00007ffff41803e2 in nsDOMStoragePersistentDB::GetKeyValue (this=0x7fffe08c48c8, aStorage=0x90e1e0, aKey=..., aValue=..., aSecure=0x7fffffffd3bc) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:617 #5 0x00007ffff4179cb8 in nsDOMStorageDBWrapper::GetKeyValue (this=0x7fffe08c4810, aStorage=0x90e1e0, aKey=..., aValue=..., aSecure=0x7fffffffd3bc) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorageDBWrapper.cpp:142 #6 0x00007ffff416faff in DOMStorageImpl::GetDBValue (this=0x90e1e0, aKey=..., aValue=..., aSecure=0x7fffffffd3bc) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:877 #7 0x00007ffff4170829 in DOMStorageImpl::GetValue (this=0x90e1e0, aCallerSecure=false, aKey=..., aResult=0x7fffffffd6d8) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:1198 #8 0x00007ffff4188957 in mozilla::dom::StorageParent::RecvGetValue (this=0x95f330, aCallerSecure=@0x7fffffffd721, aSessionOnly=@0x7fffffffd722, aKey=..., aItem=0x7fffffffd730, rv=0x7fffffffd6d8) at /home/cjones/mozilla/mozilla-central/dom/src/storage/StorageParent.cpp:116 #9 0x00007ffff4d5f2f2 in mozilla::dom::PStorageParent::OnMessageReceived (this=0x95f330, __msg=..., __reply=@0x7fffffffd9a0) at PStorageParent.cpp:328 Next five are around the following, all above nsDOMStoragePersistentDB::FlushTemporaryTables (gdb) bt #0 mozilla::storage::SQLiteMutex::lock (this=0x7fffe08c6928) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:128 #1 0x00007ffff4a40188 in SQLiteMutexAutoLock (this=0x7fffffffd290, aMutex=...) at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:183 #2 0x00007ffff4a3eaec in mozilla::storage::Connection::BeginTransactionAs (this=0x7fffe08c68f0, aTransactionType=0) at /home/cjones/mozilla/mozilla-central/storage/src/mozStorageConnection.cpp:930 #3 0x00007ffff37db6ee in mozStorageTransaction (this=0x7fffffffd300, aConnection=0x7fffe08c68f0, aCommitOnComplete=0, aType=0) at ../../dist/include/mozStorageHelper.h:77 #4 0x00007ffff417fa46 in nsDOMStoragePersistentDB::FlushTemporaryTables (this=0x7fffe08c4810, force=false) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:535 #5 0x00007ffff4179a6b in nsDOMStorageDBWrapper::FlushAndDeleteTemporaryTables (this=0x7fffe08c4810, force=false) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorageDBWrapper.cpp:104 #6 0x00007ffff416dcc9 in nsDOMStorageManager::Observe (this=0x7fffe02cac00, aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", aData=0x0) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:486 #7 0x00007ffff4f0f771 in nsObserverList::NotifyObservers (this=0x9db7c8, aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", someData=0x0) at /home/cjones/mozilla/mozilla-central/xpcom/ds/nsObserverList.cpp:130 #8 0x00007ffff4f1140e in nsObserverService::NotifyObservers (this=0x7fffe00f5160, aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", someData=0x0) at /home/cjones/mozilla/mozilla-central/xpcom/ds/nsObserverService.cpp:182 #9 0x00007ffff416d837 in nsDOMStorageManager::Observe (this=0x7fffe02cac00, aSubject=0x7fffe0c1de70, aTopic=0x7ffff5b5393d "timer-callback", aData=0x0) at /home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:449
The PStorage:GetValue is from an "insecure caller", so I would assume google.com. The nsDOMStorageDBWrapper::FlushAndDeleteTemporaryTables seems related to that. PStorage is an entirely 'sync' protocol. This is a *big* *big* problem! It means that web content can force the fennec UI to not only block, but block on disk IO! If there's not a feasible fix for this, I would strongly recommend disabling PStorage for 4.0. I also see some boning loading old-school sites like timecube.com. Will poke in a bit.
Got these backtraces from loading timecube.com, after several loads of about:blank. tl;dr nsOfflineCacheDevice::Init nsCookieService::TryInitDB (x2, similar stacks) ContentParent::RecvStartVisitedQuery, ending up in History::GetIsVisitedStatement/Connection::Clone (x4, similar stacks)
We're doing this (gdb) printf "%s", PrintJSStack() 0 anonymous(aTotalResults = -1, aAddonCount = 2, aAddons = [object Object],[object Object]) ["about:home":326] header = [object HTMLHeadingElement @ 0x7fffd9d35580 (native @ 0xf8cbb0)] loading = [object HTMLDivElement @ 0x7fffd9d35478 (native @ 0xf8ce30)] list = [object HTMLDivElement @ 0x7fffd9d35420 (native @ 0xf8ca30)] this = [object Object] 1 anonymous(aDelay = 10000) ["about:home":351] file = [xpconnect wrapped nsILocalFile @ 0x888ec0 (native @ 0xdebd40)] this = [object Object] 2 updateAddons() ["about:home":363] this = [object Window @ 0xf56020 (native @ 0xf54468)] 3 initAddons() ["about:home":368] this = [object Window @ 0xf56020 (native @ 0xf54468)] 4 init() ["about:home":119] this = [object Window @ 0xf56020 (native @ 0xf54468)] 5 onload(event = [object Event @ 0x80f5d0 (native @ 0xf51c20)]) ["about:home":1] this = [object Window @ 0xf56020 (native @ 0xf54468)] near startup too. Not sure if that's already known.
(gdb) printf "%s", PrintJSStack() 0 anonymous() ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":257] db = undefined this = [object Object] 1 anonymous() ["resource://gre/modules/XPCOMUtils.jsm":176] this = [object Object] 2 anonymous() ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":287] replacementText = "" this = [object Object] 3 anonymous() ["resource://gre/modules/XPCOMUtils.jsm":176] this = [object Object] 4 PAC_getBoundSearchQuery(aTokens = , aMatchBehavior = 1) ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":892] query = undefined this = [object Object] 5 PAC_getSearch(aTokens = ) ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":860] this = [object Object] 6 PAC_startSearch(aListener = [xpconnect wrapped nsIAutoCompleteObserver @ 0x7fffe058d390 (native @ 0x7fffe037f2f0)], aPreviousResult = null, aSearchParam = "", aSearchString = "") ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/nsPlacesAutoComplete.js":484] queries = undefined tokens = undefined query = undefined result = [xpconnect wrapped nsIAutoCompleteSimpleResult @ 0x7fffe058c1d0 (native @ 0x7fffe058c160)] searchParamParts = this = [object Object] 7 fetch(onResult = undefined, query = "") ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/AutoCompleteCache.js":93] this = [object Object] 8 anonymous([xpconnect wrapped nsITimer @ 0x12b31b0 (native @ 0x1116a00)]) ["file:///home/cjones/mozilla/mm-dbg/dist/bin/components/AutoCompleteCache.js":140] this = [object Object]
No more sore thumbs while browsing around normally, with desktop-fennec. This doesn't match my experience on device, so I'm not sure if any of the backtraces above are "the bug".
(In reply to comment #17) These two are local storage. (In reply to comment #19) > nsOfflineCacheDevice::Init Don't know anything about this. > nsCookieService::TryInitDB (x2, similar stacks) This only happens during startup (we have to do this at some point; no avoiding it) > ContentParent::RecvStartVisitedQuery, ending up in > History::GetIsVisitedStatement/Connection::Clone (x4, similar stacks) You should only ever see this once...very confused why you'd see it four times! (In reply to comment #20) I believe the add-ons manager does some synchronous disk I/O still. (In reply to comment #21) This is initializing the location bar code; should happen only once.
(In reply to comment #23) > (In reply to comment #20) > I believe the add-ons manager does some synchronous disk I/O still. This is the creation of the add-on recommended cache. It's a Fennec only concept and this actual path should only happen for new profiles. Afterward, the cache is refreshed during normal add-on update checking. > > (In reply to comment #21) > This is initializing the location bar code; should happen only once. Actually, Fennec has a cache for the default autocomplete lookup ("") and this is very likely the refreshing of that cache. We only update the cache on certain triggers though.
(In reply to comment #24) > Actually, Fennec has a cache for the default autocomplete lookup ("") and this > is very likely the refreshing of that cache. We only update the cache on > certain triggers though. It is certainly doing the initialization of the location bar code too, though. It shouldn't be happening more than once (and if it is, somebody is likely calling a createInstance on it instead of getService, which would be unfortunate).
(In reply to comment #25) > (In reply to comment #24) > > Actually, Fennec has a cache for the default autocomplete lookup ("") and this > > is very likely the refreshing of that cache. We only update the cache on > > certain triggers though. > It is certainly doing the initialization of the location bar code too, though. > It shouldn't be happening more than once (and if it is, somebody is likely > calling a createInstance on it instead of getService, which would be > unfortunate). I see your point. Nope. We use getService: http://mxr.mozilla.org/mobile-browser/source/components/AutoCompleteCache.js#63
(In reply to comment #23) > > ContentParent::RecvStartVisitedQuery, ending up in > > History::GetIsVisitedStatement/Connection::Clone (x4, similar stacks) > You should only ever see this once...very confused why you'd see it four times! Sorry, "x4, similar stacks" was unclear --- I hit SQLiteMutex::lock four times with similar stacks, in quick succession. The stacks were probably all part of the same operation, but I didn't verify.
(In reply to comment #27) > Sorry, "x4, similar stacks" was unclear --- I hit SQLiteMutex::lock four times > with similar stacks, in quick succession. The stacks were probably all part of > the same operation, but I didn't verify. I should note that SQLiteMutex is only our wrapper for the internal mutex SQLite uses (and exposes) for situations where we need a set of SQLite operations to be atomic. We don't use it much. If you actually want to see all times we actually acquire a sqlite3_mutex on the main thread, you'll want to do something like this unit test does: https://mxr.mozilla.org/mozilla-central/source/storage/test/test_true_async.cpp
Chris, thanks for the analyzes. Stacks from comment 17 can be optimized: - the first one: we can keep the information of having a transaction on the connection using our own flag - the second one: we don't have to begin a transaction every time we iterate tables to be potentially flushed, we can do it at the moment we find an actual data to flush (will not happen that often) Other way is to use Shawn's proposal to use async API for DOM storage. I found the sync approach pretty fast in Firefox, so I avoided more complexity to an already complex patch for temporary tables. Although the code is quit stable, I think it is too late to change it to use async API. So, the main question: what's causing (technically) such huge dalay to acquire the mutex? Is it some concurrency with other connections?
(In reply to comment #30) > Other way is to use Shawn's proposal to use async API for DOM storage. I found > the sync approach pretty fast in Firefox, so I avoided more complexity to an > already complex patch for temporary tables. We should do this, regardless (after 4.0 though). Looks like you filed bug 627635 on this though. > Although the code is quit stable, I think it is too late to change it to use > async API. Agreed. > So, the main question: what's causing (technically) such huge dalay to acquire > the mutex? Is it some concurrency with other connections? SQLite holds the mutex while doing disk I/O.
(In reply to comment #30) > Other way is to use Shawn's proposal to use async API for DOM storage. I found > the sync approach pretty fast in Firefox Did you test on machines with slow disks/filesystems, like the Samsung Galaxy S***? The fundamental problem with the current implementation is that it blocks the UI thread on content-controlled disk IO. This is incompatible with our target of a responsive UI and 60fps UI animations.
tracking-fennec: 2.0b4+ → 2.0b5+
Blocks: 600307
Assignee: nobody → doug.turner
not blocking b5 anymore. this seems to happen less and less, but has not gone away entirely. still blocking 2.0 so we keep trying to find any potential issue.
tracking-fennec: 2.0b5+ → 2.0+
Let's try using strace on the galaxy s
This might be fixed by either of the two listed depend bugs. If not, we should remove from blocker list.
Depends on: 634666, localStorageIO
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE
No longer blocks: 600307
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: