Closed Bug 799261 Opened 13 years ago Closed 13 years ago

Crash (and phone reboot) on visiting website

Categories

(Core :: Networking, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 764758

People

(Reporter: mgoodwin, Assigned: mayhemer)

References

()

Details

(Keywords: crash, csectype-dos)

Attachments

(3 files)

Issue: Phone crashes on visiting website. Reproduced on multiple devices (though I've only seen it on a Nexus S). STR: 1) Visit http://asimilyas.webfactional.com/site_media/mobile/ in Firefox on B2G 2) Observe crash and reboot
Also crashes desktop firefox...
Group: core-security
(In reply to Mark Goodwin [:mgoodwin] from comment #1) > Also crashes desktop firefox... Can you get a crash report URL?
Keywords: crash
(In reply to Jason Smith [:jsmith] from comment #2) > (In reply to Mark Goodwin [:mgoodwin] from comment #1) > > Also crashes desktop firefox... > > Can you get a crash report URL? https://crash-stats.mozilla.com/report/index/bp-40c1f5f2-2c64-4356-9be1-278d52121008
I am guessing the crash is this line in the attached logcat: I/Gecko ( 1093): [Child 1093] ###!!! ABORT: aborting because of fatal error: file /data/jenkins/jobs/build-otoro-ics/workspace/gecko/dom/ipc/ContentChild.cpp, line 820
Attached file callstack
mUpdate->OnByteProgress(mBytesRead); (gdb) p mBytesRead $1 = -1 mBytesRead is from type unsigned and produces an integer overflow.
I meant to say OnByteProgress requires an unsigned but mBytesRead is signed (-1). nsOfflineCacheUpdate::OnByteProgress(uint64_t byteIncrement)
Component: General → Networking
Product: Boot2Gecko → Core
Looks like this code was introduced in bug 744710. Honza, how bad is this? Is whatever OnByteProgress set used to guard access to any arrays or whatever?
Looks similar-ish to bug 764758...
We know there is also bug 764758 that IMO is just a duplicate. Unfortunatelly, current m-c desktop browser build doesn't crash for me locally (win7-64 box) when I go to the URL. Mark, what other settings and what version of the _desktop firefox_ are you using that crashes for you? Looks like this happens more often on OSX. I suspect we get call to OnStopRequest more then one time, but could also well be a different issue. We can fix this by a simple null check, however the logic of the code doesn't allow mUpdate member to be null.
Mark, do you have web apps installed?
(In reply to Honza Bambas (:mayhemer) from comment #11) > Mark, what other settings and what version of the _desktop firefox_ are you > using that crashes for you? Looks like this happens more often on OSX. The crash happened on a (stock) nightly (m-c desktop build). It's not happening on the current m-c.
(In reply to Honza Bambas (:mayhemer) from comment #12) > Mark, do you have web apps installed? I do.
I can reproduce on Mac. Finally, web apps are not involved. GDB is hard to use, so it may take a little while :)
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Status: ASSIGNED → NEW
OK, getting closer... This actually can be quit serious according a debug failure like this (win7, debug build, ran in debugger, after 10th reload of the <URL>): > msvcr100d.dll!memset(unsigned char * dst=0x000000dd, unsigned char value='p', unsigned long count=208602408) Line 127 Asm msvcr100d.dll!_free_dbg_nolock(void * pUserData=0x0c6f0548, int nBlockUse=1) Line 1430 + 0x1b bytes C++ msvcr100d.dll!_free_dbg(void * pUserData=0x0c6f0548, int nBlockUse=1) Line 1265 + 0xd bytes C++ msvcr100d.dll!free(void * pUserData=0x0c6f0548) Line 49 + 0xb bytes C++ mozalloc.dll!moz_free(void * ptr=0x0c6f0548) Line 51 + 0xa bytes C++ xul.dll!operator delete(void * ptr=0x0c6f0548) Line 224 + 0xa bytes C++ xul.dll!nsOfflineCacheUpdateItem::`scalar deleting destructor'() + 0x20 bytes C++ xul.dll!nsOfflineCacheUpdateItem::Release() Line 283 + 0x171 bytes C++ xul.dll!nsRefPtr<nsOfflineCacheUpdateItem>::~nsRefPtr<nsOfflineCacheUpdateItem>() Line 875 C++ xul.dll!nsRefPtr<nsOfflineCacheUpdateItem>::`scalar deleting destructor'() + 0xf bytes C++ xul.dll!nsTArrayElementTraits<nsRefPtr<nsOfflineCacheUpdateItem> >::Destruct(nsRefPtr<nsOfflineCacheUpdateItem> * e=0x068fd020) Line 360 C++ xul.dll!nsTArray<nsRefPtr<nsOfflineCacheUpdateItem>,nsTArrayDefaultAllocator>::DestructRange(unsigned int start=0, unsigned int count=35) Line 1223 + 0x9 bytes C++ xul.dll!nsTArray<nsRefPtr<nsOfflineCacheUpdateItem>,nsTArrayDefaultAllocator>::RemoveElementsAt(unsigned int start=0, unsigned int count=35) Line 944 C++ xul.dll!nsTArray<nsRefPtr<nsOfflineCacheUpdateItem>,nsTArrayDefaultAllocator>::Clear() Line 955 C++ xul.dll!nsTArray<nsRefPtr<nsOfflineCacheUpdateItem>,nsTArrayDefaultAllocator>::~nsTArray<nsRefPtr<nsOfflineCacheUpdateItem>,nsTArrayDefaultAllocator>() Line 441 + 0xf bytes C++ xul.dll!nsOfflineCacheUpdate::~nsOfflineCacheUpdate() Line 1158 + 0x37 bytes C++ xul.dll!nsOfflineCacheUpdate::`scalar deleting destructor'() + 0xf bytes C++ xul.dll!nsOfflineCacheUpdate::Release() Line 1135 + 0x16c bytes C++ xul.dll!nsRefPtr<nsOfflineCacheUpdate>::~nsRefPtr<nsOfflineCacheUpdate>() Line 875 C++ xul.dll!mozilla::docshell::OfflineCacheUpdateGlue::~OfflineCacheUpdateGlue() Line 56 + 0x21 bytes C++ xul.dll!mozilla::docshell::OfflineCacheUpdateGlue::`scalar deleting destructor'() + 0xf bytes C++ xul.dll!mozilla::docshell::OfflineCacheUpdateGlue::Release() Line 42 + 0x160 bytes C++ xul.dll!ReleaseSliceNow(int slice=100, void * data=0x09436e20) Line 569 + 0xe bytes C++ xul.dll!XPCIncrementalReleaseRunnable::ReleaseNow(bool limited=true) Line 627 + 0x10 bytes C++ xul.dll!XPCIncrementalReleaseRunnable::Run() Line 660 C++ We may schedule nsOfflineCacheUpdate::Run() after that update has already finished (and failed) with the first explicit item being a 404. This may run items that had already been canceled again (we also fail this [1] assertion). I don't understand at this very moment why there is such a memory corruption. Will continue with that tomorrow. [1] http://hg.mozilla.org/mozilla-central/file/ec10630b1a54/uriloader/prefetch/nsOfflineCacheUpdate.cpp#l1710
Status: NEW → ASSIGNED
I'm no longer able to reproduce this heap corruption. It could be that I made some mistake in the debugger and corrupted the memory my self. I also cannot reproduce the null deref on my win7 box. Testing further with a log-enabled OSX nightly. Will refer tomorrow.
https://tbpl.mozilla.org/?tree=Try&rev=03c64f4ca0d3 God it! First, forget about the crash in comment 16. I am persuaded it was some mistake I made in the debugger, like manual memory corruption or so. Cause of the crash, quit racy: - important note here is that the first item in the manifest leads to a 404 - we post nsOfflineCacheUpdate::Run() up to the count of the parallel limit - there may be a lot of them, up to 16 - the channel for the first item finishes with an error (like 404) before all those nsOfflineCacheUpdate::Run() gets processed - nsOfflineCacheUpdate switches to STATE_FINISHED and cancels all items that are in progress (what are most of them) - canceling an item means to a) cancel its http channel if there is such yet b) set its mState to UNINITIALIZED - then, another nsOfflineCacheUpdate::Run() gets called and searches for an item that is in state UNINITIALIZED (most of them) ; [this is the bug actually, it should just immediately return when the update is already done] - it finds it and opens another channel on it - problem is that OnStopRequest on the item posts nsOfflineCacheUpdateItem::Run() [method of the item in this case, not the update] that may get triggered later, after the previous step - call to nsOfflineCacheUpdateItem::Run() is already in the queue while we have opened a new (second) channel for which the same item is again a listener - nsOfflineCacheUpdateItem::Run() gets fired, clears mUpdate - the second channel now calls OnStopRerquest => Bang! I don't think this is that highly serious issue. Patch will be very simple and easy to uplift.
I'll add the patch to bug 764758.
(In reply to Honza Bambas (:mayhemer) from comment #18) Excellent! Well done.
(In reply to Mark Goodwin [:mgoodwin] from comment #20) BTW, thanks for the test case and all info you've provided. I don't think I would fix this w/o your help, at least that quickly.
How does the crash as described in comment 18 end up rebooting a phone?
Severity: normal → critical
Keywords: csec-dos
nevermind -- missed the "b2g" and was thinking Fennec
I've just landed a fix on inbound as part of bug 764758 here: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=d7b878ec3d7d. This can be duped at will.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: