Closed
Bug 775090
Opened 10 years ago
Closed 10 years ago
Firefox startup crash in PR_EnumerateAddrInfo | PR_GetHostByAddr
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: marcia, Unassigned)
References
Details
(Keywords: crash, qawanted, topcrash, Whiteboard: [startupcrash])
Crash Data
This bug was filed from the Socorro interface and is report bp-3cfc2b99-4374-4cf0-a9f4-0009c2120718.
=============================================================
This crash appears as a new signature that occurs only in 14.0.1: https://crash-stats.mozilla.com/report/list?signature=PR_EnumerateAddrInfo%20|%20PR_GetHostByAddr%20|%20PR_ExitMonitor%20|%20nspr4.dll@0x26cf

Currently the advanced query shows it ranking #15. Suspect a third-party issue, but manual correlations are not showing anything so far. Comments so far are not useful.

Frame  Module        Signature                        Source
0      nspr4.dll     PR_EnumerateAddrInfo             nsprpub/pr/src/misc/prnetdb.c:2117
1      nspr4.dll     PR_GetHostByAddr                 nsprpub/pr/src/misc/prnetdb.c:1171
2      nspr4.dll     PR_ExitMonitor                   nsprpub/pr/src/threads/prmon.c:132
3      nspr4.dll     nspr4.dll@0x26cf
4      winmm.dll     timeGetTime
5      xul.dll       nsSocketTransportService::Poll   netwerk/base/src/nsSocketTransportService2.cpp:431
6      nspr4.dll     PR_ExitMonitor                   nsprpub/pr/src/threads/prmon.c:132
7      xul.dll       nsSocketTransportService::Run    netwerk/base/src/nsSocketTransportService2.cpp:652
8      xul.dll       nsThread::ProcessNextEvent       xpcom/threads/nsThread.cpp:656
9      xul.dll       nsThread::ThreadFunc             xpcom/threads/nsThread.cpp:289
10     nspr4.dll     _PR_NativeRunThread              nsprpub/pr/src/threads/combined/pruthr.c:426
11     nspr4.dll     pr_root                          nsprpub/pr/src/md/windows/w95thred.c:122
12     msvcr100.dll  _callthreadstartex               f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:314
13     msvcr100.dll  _threadstartex                   f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:292
14     kernel32.dll  BaseThreadInitThunk
15     ntdll.dll     __RtlUserThreadStart
16     ntdll.dll     _RtlUserThreadStart
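The signature in the crash-stats URL above matches the signatures of frames 0-3 joined with " | ". The sketch below only illustrates that relationship; `join_signature` is a hypothetical helper, and the real Socorro signature generation is assumed to be more involved than this.

```python
# Frames 0-3 as listed in the report above: (module, signature).
FRAMES = [
    ("nspr4.dll", "PR_EnumerateAddrInfo"),
    ("nspr4.dll", "PR_GetHostByAddr"),
    ("nspr4.dll", "PR_ExitMonitor"),
    ("nspr4.dll", "nspr4.dll@0x26cf"),
]


def join_signature(frames, depth=4):
    """Join the top `depth` frame signatures with ' | ' (hypothetical helper)."""
    return " | ".join(sig for _module, sig in frames[:depth])


signature = join_signature(FRAMES)
print(signature)
```

Frame 3 has no symbol, so the module name plus offset stands in for a function name.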
Updated•10 years ago
Whiteboard: [startupcrash]
Comment 1•10 years ago
Let's wait for more data - volume is pretty low so far from what I can see. Since this crash is only affecting 14.0.1, possible causes could be:
* External issue where our beta population isn't representative
* External issue where only {release channel, version 14} is affected
* Bug 772282, which hasn't been on a beta release yet
* A crash signature move for some reason
* A buildID-specific build issue

Let's keep up with this, investigate any actionable leads, and discuss during tomorrow's channel meeting.
tracking-firefox14:
--- → ?
Comment 2•10 years ago
(In reply to Marcia Knous [:marcia] from comment #0)
> Comments so far are not useful.

FWIW, a German commenter mentions getting these crashes even in Safe Mode and after "reinstalling".
Comment 3•10 years ago
It seems related to IPv6: 89% (449/502) vs. 34% (22863/66763) wship6.dll
Comment 4•10 years ago
(In reply to Scoobidiver from comment #3)
> It seems related to IPv6:
> 89% (449/502) vs. 34% (22863/66763) wship6.dll

Given this, CC'ing the networking team. Did we take any major IPv6 changes recently? Also tracking for 14, given that this is #14 on the top crash list currently. We need to start thinking about the types of software that only target our release software but, for some reason, wouldn't show up in correlations - firewalls?
Reporter | ||
Comment 5•10 years ago
Here are some URLs so far:

186 about:blank
23 about:home
5 about:sessionrestore
4 http://www.google.co.uk/
4 jar:file:///C:/Program%20Files/Mozilla%20Firefox/omni.ja!/chrome/browser/content
4 http://www.facebook.com/
2 http://www.mozilla.com/ru/firefox/14.0.1/whatsnew/?oldversion=13.0.1
2 http://vuku.ru/
2 http://www.mozilla.com/en-US/firefox/14.0.1/whatsnew/?oldversion=13.0.1
1 http://www.lenta.ru/
1 http://bl150w.blu150.mail.live.com/default.aspx#!/mail/InboxLight.aspx
1 http://s2.gladiators.ru/xml/main/news.php?id=11443&enableChat=1
1 https://www.google.com/search?q=facebook&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:
1 http://gmx.de/
1 http://www.diesiedleronline.de/de/spielen
1 http://go.divx.com/divx/windows/uninstallsurvey/de
1 http://www.facebook.com/home.php?
1 https://www.google.com/search?q=soundcloud&ie=utf-8&oe=utf-8&aq=t&rls=org.mozill
1 http://support.mozilla.org/1/firefox/14.0.1/WINNT/es-ES/prefs-main
1 http://www.aol.com/
1 http://www.yandex.ru/?clid=187997
1 http://www.dailymail.co.uk/home/index.html
1 https://www.mozilla.org/de/download/?product=firefox-14.0.1&os=win&lang=de
1 http://www.google.ca/
1 http://www.chip.de/downloads/Adobe-Flash-Player_13003561.html
1 http://www.iransetup.com/
1 http://www.repubblica.it/
1 http://google.se/
1 https://www.google.com/search?q=jesus%20manuel%20chavez%20plascencia&ie=utf-8&oe
1 http://yandex.ru/yandsearch?text=%D0%BE%D0%B4%D0%BD%D0%BE%D0%BA%D0%BB%D0%B0%D1%8
1 http://smotri.com/broadcast/list/
1 http://odnoklassniki.ru/
1 https://www.norsk-tipping.no/
1 http://www.hentaimedia.com/
1 http://www.apeha.ru/
1 http://www.yandex.ru/?vid=101&clid=48577
1 http://www.ask.com/?o=10148&l=dis&tb=PTV
1 https://www.google.com/
1 https://www.google.de/search?q=Firefox+14&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla
1 http://www.yahoo.com/?ilc=1
1 http://go.microsoft.com/fwlink/?LinkId=69157
1 https://login.live.com/
1 http://firefox.yandex.ru/
1 https://services.addons.mozilla.org/en-US/firefox/discovery/pane/14.0.1/WINNT/no
1 http://get.adobe.com/de/flashplayer/
1 http://www.jeanmarcmorandini.com/
1 http://www.jappy.de/
1 http://www.google.cl/
1 https://www.google.com/search?q=vkontakte&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla
1 http://de.mg41.mail.yahoo.com/
1 http://donbass.ua/news/health/2012/07/18/v-donbasse-malyshi-travjatsja-rakami-i-
1 https://www.google.com/search?q=screen%20for%20desk&ie=utf-8&oe=utf-8&aq=t&rls=o
1 about:addons
1 http://de-de.facebook.com/
1 http://home.webalta.ru/
Comment 6•10 years ago
I can see the stack gets screwed at the call to nspr4.dll@0x26cf. When you disassemble it, timeGetTime calls code at address 700826C0 (winmm.dll!_soundPlay@8+0F1h); my winmm.dll is loaded at 70080000. This could be related to ASLR, but ASLR has been enabled since Fx13 (bug 728429). In one of the reports, nspr4.dll is loaded at 0x6dac0000 and winmm.dll at 0x73f50000, so I don't quite understand. Other threads are just waiting.
Comment 7•10 years ago
CC'ing Benjamin and Kyle, since they stand behind ASLR.
Comment 8•10 years ago
Crashes on line 2117 at a call to _pr_ipv6_is_present().

Note, this is the same function call that crashes in bug 718389, and that bug does not have a resolution yet.

Checked changelog for PR_EnumerateAddrInfo and _pr_ipv6_is_present:
-- no changes since 2008 in the code for the functions.

The following thoughts seem to correlate with Honza's comment 6: In bug 718389 the crash went up and down with different builds/releases, so I'm wondering if it's an intermittent build thing? I'm also wondering if the function (_pr_ipv6_is_present) is being declared but not defined due to a build error? Or not linked correctly? The error is EXCEPTION_ACCESS_VIOLATION_READ at the time the function call is made, so something isn't being read right.

I don't know how likely that is, but it seems worth it to ask the question.
Comment 9•10 years ago
Updates to 14.0.1 are now fully throttled, which will stop the bleeding. If this does in fact end up being a one-off (two-off?) build-related issue, let's figure out how to identify that the issue is present in a build. We may be able to get away with re-spinning 14.0.1 for all remaining users who will update.
Comment 10•10 years ago
See also:
https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=timeGetTime&reason_type=contains&date=07%2F19%2F2012%2019%3A38%3A30&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=_PR_MD_PR_POLL%20|%20_MD_CURRENT_THREAD%20|%20timeGetTime
Strictly only on 13.0.1.

https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=timeGetTime&reason_type=contains&date=07%2F19%2F2012%2019%3A38%3A30&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=_PR_MD_PR_POLL%20|%20timeGetTime
(https://crash-stats.mozilla.com/report/index/4f5f6a0d-702b-4fd7-928c-7a9ac2120718)
(https://crash-stats.mozilla.com/report/index/bac2cde4-0bf6-460f-82e0-d288f2120719)

The second set of reports is strictly on 13.* and 14.0.1 versions. So I really bet this is some ASLR regression...
Updated•10 years ago
tracking-firefox15:
--- → +
Comment 11•10 years ago
(In reply to Honza Bambas (:mayhemer) from comment #10)
> The second set of reports is strictly on 13.* and 14.0.1 versions. So I
> really bet this is some ASLR regression...

Is there a way to confirm without STR, prior to re-spinning?
Comment 12•10 years ago
Er, Firefox has had ASLR for a *long* time. All we did was make sure that binary extensions have it too. I think it's very unlikely that this is related.
Comment 13•10 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #12)
> Er, Firefox has had ASLR for a *long* time. All we did was make sure that
> binary extensions have it too. I think it's very unlikely that this is
> related.

Right, but see Comment 8 - we suspect this has happened before and gone away before as well. What else is variable in that way besides ASLR? PGO?
Comment 14•10 years ago
ASLR is not variable in that way. ASLR has been on for years.
Comment 15•10 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #14)
> ASLR is not variable in that way. ASLR has been on for years.

OK, understood. I don't know the build system inside out, so this might be a rathole-type of question, but what variable things are there - is there anything else that would change the binary between releases? Something Windows related? Something that only affects certain users? What can we check in the release binaries? No accusation - trying to rule out possibilities.

Per Honza's comment 6, the stack trace is also weird. There have been no changes to the code. How likely is it that winmm.dll is corrupted or infected? Or how likely is it that the crash reporter is reporting wrong because of an optimization? And, of most importance, is there any way we can check this?

I'm going to continue to investigate the code and build from Necko/NSPR side - is there something weird affecting the build flags around _pr_ipv6_is_present()?
tracking-firefox15:
+ → ---
Comment 16•10 years ago
(sorry to interrupt the actual technical conversation) Just wanted to let you know that I re-checked, and the only other bug that landed between our final 14 beta and the 14.0.1 release build is bug 772841. Doubt that could be related though.
tracking-firefox15:
--- → +
Comment 17•10 years ago
(Ooops sorry for the tracking-firefox15 change - that was a refresh issue, unintentional)
Comment 18•10 years ago
Update after looking at code in and around the stack trace:

The last part of the stack trace is just weird. It doesn't fit with the call chain in mozilla-release. PR_GetHostByAddr() doesn't call PR_EnumerateAddrInfo() according to what I see in mxr: http://mxr.mozilla.org/mozilla-release/source/nsprpub/pr/src/misc/prnetdb.c#1013
And the line number is the end of the function. I know that compiler optimization can make it tricky for the stack trace to be collected sometimes, so I checked for places where these two functions were called next to each other just in case: I don't see any such occurrences. So, I'm not sure what code to look at for the last three calls mentioned.

Moving higher up the crash stack trace, before winmm.dll:
-- PR_IntervalNow: no recent changes to the code
-- The call at netwerk/base/src/nsSocketTransportService2.cpp:431 is actually PR_IntervalToSeconds() - no recent changes to that code either. The rest of the stack points to timeGetTime which is in PR_IntervalNow, so I'm not sure that this line number is right.
-- In Poll, I did notice that the param *interval is not null-checked and is set with the return value of PR_IntervalNow. That is a potential problem, but Poll seems only to be called by DoPollIteration, and interval is declared right before the call, so it shouldn't be null.
-- In 2011, there was a change from PRBool to bool; this affects param bool wait, but, again, the passed in var is of type bool. And I don't think this should have made a difference anyway.

I can keep looking at it from different angles, but something has been corrupted in the actual code path, or the reporter is having a difficult time understanding the trace. Not sure how to proceed with that. I'll also poke around in nspr4.dll and xul.dll tomorrow (Friday) to check what I can.

As per comment 6 and 15, I'm still wondering about winmm.dll. But then why don't we see more crashes affected by winmm.dll? And why only this stack trace? Why is PR_IntervalNow not crashing all over?

Please comment if you see something here that I don't.
Comment 19•10 years ago
(In reply to Steve Workman [:sworkman] from comment #15)
> Or how likely is it that the crash reporter is reporting wrong
> because of an optimization?

That surely can happen, bsmedberg and others might know more there. I heard that loading a minidump with the MSVC debugger can lead to better stacks in such cases as it has more data available to walk the stack correctly.
Comment 20•10 years ago
I can't believe that nobody has looked at the minidump in a debugger; guessing based on the crash-stats dump is not a great use of time for something this critical:

> nspr4.dll!PR_EnumerateAddrInfo(iterPtr=0x00000000, base=0x01525eb0, port=0x01bb, result=0x08af88b8) Line 2117 C
  xul.dll!nsDNSRecord::GetNextAddr(port=0x01bb, addr=0x08af88b8) Line 150 C++
  xul.dll!nsSocketTransport::OnSocketEvent(type=0x00000000, status=0x00000000, param=0x05b9b6c0) Line 1490 C++
  nspr4.dll!PR_ExitMonitor(mon=) Line 134 C
  xul.dll!nsThread::ProcessNextEvent(mayWait=true, result=0x045ef927) Line 662 C++
  xul.dll!NS_ProcessNextEvent_P(thread=0x0101de80, mayWait=true) Line 245 C++
  xul.dll!nsSocketTransportService::Run() Line 654 C++
  xul.dll!nsThread::ProcessNextEvent(mayWait=true, result=0x045ef9c4) Line 662 C++
  xul.dll!nsThread::ThreadFunc(arg=0x0201de01) Line 289 C++
  nspr4.dll!_PR_NativeRunThread(arg=0x0200f640) Line 448 C

The disassembly is:

  2109: PR_IMPLEMENT(void *) PR_EnumerateAddrInfo(void *iterPtr,
  2110:                                           const PRAddrInfo *base,
  2111:                                           PRUint16 port,
  2112:                                           PRNetAddr *result)
  2113: {
6D1487C0  push  ebp
6D1487C1  mov   ebp,esp
6D1487C3  and   esp,0FFFFFFF8h
6D1487C6  push  ecx
  2114: #if defined(_PR_HAVE_GETADDRINFO)
  2115:     PRADDRINFO *ai;
  2116: #if defined(_PR_INET6_PROBE)
  2117:     if (!_pr_ipv6_is_present()) {
6D1487C7  cmp   dword ptr ds:[6D02858Ch],0    <-- CRASH HERE reading 0x6D02858C
6D1487CE  push  ebx
6D1487CF  push  esi
6D1487D0  push  edi
6D1487D1  je    6D14EAF0
6D1487D7  cmp   dword ptr [__type_info_root_node+0BCh (6D167A88h)],0
6D1487DE  je    6D14EAFA

_pr_ipv6_is_present and PR_CallOnce are inlined at 0x6D14EAF0 in this function:

  2117:     if (!_pr_ipv6_is_present()) {
6D14EAF0  call  _PR_InitStuff (6D143F40h)
6D14EAF5  jmp   PR_EnumerateAddrInfo+17h (6D1487D7h)
6D14EAFA  mov   eax,1
6D14EAFF  mov   ecx,65817A8Ch
6D14EB04  xchg  eax,dword ptr [ecx]
6D14EB06  test  eax,eax
6D14EB08  jne   6D14EB46
6D14EB0A  call  _pr_init_ipv6 (6D145920h)

So the initial "cmp" should be checking _pr_initialized from here: http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prinit.c#771 and then the second cmp is checking once->initialized two lines below.

According to MSVC, &_pr_initialized

&_pr_initialized    0x6d16858c __pr_initialized    int *

So it kinda looks to me like the compiler got the wrong address for _pr_initialized in the code.
Comment 21•10 years ago
Trying this in nightly shows that the matching code in nightly is loading/checking the correct address of __pr_initialized.
Comment 22•10 years ago
According to dumpbin, the initial disassembly of PR_EnumerateAddrInfo is:

_PR_EnumerateAddrInfo:
  100087C0: 55                    push  ebp
  100087C1: 8B EC                 mov   ebp,esp
  100087C3: 83 E4 F8              and   esp,0FFFFFFF8h
  100087C6: 51                    push  ecx
  100087C7: 83 3D 8C 85 02 10 00  cmp   dword ptr [__pr_initialized],0

So when relocated into this minidump, I believe the address of __pr_initialized should have been 0x1002858c - 0x10000000 (original base address) + 0x6d140000 (new base address) == 0x6d16858c, which is what MSVC said.

So ISTM that either:
* the memory got corrupted
* the relocation process produced the wrong result

Either way I don't think this is a problem with PGO or code generation.
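For anyone wanting to re-check the relocation arithmetic, here is a minimal sketch. It decodes the operand from the instruction bytes quoted from dumpbin and applies the relocation delta. The preferred base of 0x10000000 is an assumption, inferred from the function moving from 0x100087C0 (dumpbin) to 0x6D1487C0 (debugger); the helper names are made up for illustration.

```python
import struct

# Raw bytes of the instruction at 0x100087C7, copied from the dumpbin output.
CMP_BYTES = bytes([0x83, 0x3D, 0x8C, 0x85, 0x02, 0x10, 0x00])


def decode_cmp_abs32(code):
    """Decode `cmp dword ptr [disp32], imm8` (opcode 83 /7, ModRM 3D)."""
    if code[0] != 0x83 or code[1] != 0x3D:
        raise ValueError("not a cmp dword ptr [disp32], imm8 instruction")
    (disp32,) = struct.unpack_from("<I", code, 2)  # little-endian absolute address
    return disp32, code[6]


def relocate(address, preferred_base, actual_base):
    """Apply the base-relocation delta to an absolute 32-bit address."""
    return (address - preferred_base + actual_base) & 0xFFFFFFFF


PREFERRED_BASE = 0x10000000  # assumption: inferred from 0x100087C0 -> 0x6D1487C0
ACTUAL_BASE = 0x6D140000     # load address of nspr4.dll in this minidump

operand, imm = decode_cmp_abs32(CMP_BYTES)
expected = relocate(operand, PREFERRED_BASE, ACTUAL_BASE)
observed = 0x6D02858C        # faulting read address from comment 20

print(hex(operand), hex(expected), hex(expected ^ observed))
```

The XOR at the end shows which bits differ between the address MSVC reports for the symbol and the address the crashing instruction actually read.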
Comment 23•10 years ago
OK, I have a theory. If you look at the addresses for the crashes, they *all* end in 02858c, and the first byte is either 00, 10, 5c, 5e, 60, 63, 65, 66, 67, 68, 69, 6a, 6b, 6c, 6d, 6e, 6f, etc. Also, the absolute addresses later in the function are correct. Therefore it's extremely likely that we're dealing with a memory corruption bug here, not a compiler/linker bug. In other words, something is corrupting the first byte of this address, leaving the rest untouched. (And this makes this bug so much harder to figure out :( )
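The pattern described above can be checked mechanically. This sketch rebuilds the crash addresses from the leading bytes listed in the comment (the trailing "etc." is not expanded) and compares their low three bytes with the unrelocated operand 0x1002858C from the dumpbin listing in comment 22. It is only a consistency check on the quoted numbers and says nothing about what modifies the byte.

```python
# Leading bytes listed in the comment above; the "etc." is deliberately left out.
LEADING_BYTES = [0x00, 0x10, 0x5C, 0x5E, 0x60, 0x63, 0x65, 0x66, 0x67,
                 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F]
COMMON_SUFFIX = 0x02858C          # every crash address ends in 02858c
UNRELOCATED_OPERAND = 0x1002858C  # operand as stored in nspr4.dll on disk

# Rebuild the 32-bit crash addresses implied by the comment.
crash_addresses = [(top << 24) | COMMON_SUFFIX for top in LEADING_BYTES]


def low24(value):
    """Return the low three bytes of a 32-bit address."""
    return value & 0xFFFFFF


all_match = all(low24(a) == low24(UNRELOCATED_OPERAND) for a in crash_addresses)
print(all_match, [hex(a) for a in crash_addresses[:3]])
```

Under these inputs every address keeps the on-disk low three bytes, and the faulting address from comment 20 (0x6D02858C) is one of the rebuilt values.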
Comment 24•10 years ago
Correlation report:

PR_EnumerateAddrInfo | PR_GetHostByAddr | PR_ExitMonitor | nspr4.dll@0x26cf|EXCEPTION_ACCESS_VIOLATION_READ (502 crashes)
89% (449/502) vs. 34% (22863/66763) wship6.dll
88% (444/502) vs. 35% (23402/66763) WSHTCPIP.DLL
88% (444/502) vs. 36% (24074/66763) Wldap32.dll
83% (419/502) vs. 33% (22038/66763) NapiNSP.dll
82% (413/502) vs. 33% (22026/66763) pnrpnsp.dll
81% (409/502) vs. 33% (21952/66763) nlaapi.dll
71% (355/502) vs. 28% (18675/66763) FWPUCLNT.DLL
71% (355/502) vs. 28% (18784/66763) RpcRtRemote.dll
100% (502/502) vs. 58% (38474/66763) rasadhlp.dll
71% (355/502) vs. 29% (19184/66763) cryptsp.dll
75% (375/502) vs. 33% (21900/66763) DWrite.dll
100% (502/502) vs. 59% (39116/66763) browsercomps.dll
100% (502/502) vs. 59% (39552/66763) softokn3.dll
100% (502/502) vs. 59% (39690/66763) firefox.exe
100% (502/502) vs. 59% (39712/66763) xpcom.dll
100% (502/502) vs. 60% (40065/66763) dbghelp.dll
94% (473/502) vs. 55% (36545/66763) nssckbi.dll
94% (473/502) vs. 55% (36711/66763) freebl3.dll
94% (473/502) vs. 55% (36716/66763) nssdbm3.dll
95% (477/502) vs. 56% (37532/66763) feclient.dll
95% (475/502) vs. 56% (37624/66763) winrnr.dll
90% (454/502) vs. 54% (36262/66763) rsaenh.dll
101% (507/502) vs. 65% (43414/66763) mswsock.dll
78% (390/502) vs. 42% (27911/66763) t2embed.dll
100% (502/502) vs. 64% (42988/66763) dnsapi.dll
80% (401/502) vs. 45% (29889/66763) ntmarta.dll
100% (502/502) vs. 66% (43784/66763) wintrust.dll
99% (499/502) vs. 69% (46376/66763) urlmon.dll
55% (278/502) vs. 27% (17821/66763) explorerframe.dll
55% (278/502) vs. 27% (17839/66763) dui70.dll
56% (280/502) vs. 27% (18283/66763) duser.dll
88% (444/502) vs. 61% (41004/66763) propsys.dll
95% (478/502) vs. 70% (46401/66763) iertutil.dll
100% (501/502) vs. 76% (50827/66763) wininet.dll
88% (444/502) vs. 66% (44321/66763) powrprof.dll
88% (444/502) vs. 67% (44652/66763) winnsi.dll
88% (444/502) vs. 67% (44652/66763) nsi.dll
88% (444/502) vs. 67% (44658/66763) IPHLPAPI.DLL
88% (444/502) vs. 67% (44662/66763) dwmapi.dll
83% (418/502) vs. 65% (43554/66763) MMDevAPI.dll
83% (415/502) vs. 65% (43376/66763) AudioSes.dll
80% (401/502) vs. 63% (41833/66763) normaliz.dll
88% (444/502) vs. 73% (48423/66763) lpk.dll
71% (355/502) vs. 56% (37240/66763) devobj.dll
71% (355/502) vs. 56% (37244/66763) sechost.dll
71% (355/502) vs. 56% (37244/66763) CRYPTBASE.dll
71% (355/502) vs. 56% (37244/66763) KERNELBASE.dll
69% (346/502) vs. 54% (36320/66763) profapi.dll
71% (355/502) vs. 56% (37590/66763) cfgmgr32.dll
90% (454/502) vs. 78% (51742/66763) msctf.dll
28% (143/502) vs. 16% (10414/66763) mdnsNSP.dll
29% (147/502) vs. 16% (11010/66763) WLIDNSP.DLL
76% (380/502) vs. 64% (42994/66763) psapi.dll
17% (85/502) vs. 6% (3973/66763) ntdsapi.dll
20% (101/502) vs. 10% (6875/66763) wshbth.dll
22% (112/502) vs. 15% (9953/66763) d3d10.dll
22% (112/502) vs. 15% (9953/66763) d3d10core.dll
23% (117/502) vs. 16% (10754/66763) dxgi.dll
23% (116/502) vs. 16% (10683/66763) d3d10_1core.dll
23% (116/502) vs. 16% (10683/66763) d3d10_1.dll
22% (111/502) vs. 15% (10036/66763) d2d1.dll
9% (45/502) vs. 4% (2414/66763) wkscli.dll
16% (80/502) vs. 11% (7245/66763) AudioEng.dll

Has anybody in QA attempted to reproduce this on ipv6 machines? That seems to have high relevancy to the crash happening.
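The correlation figures above are plain ratios, so they are easy to recompute and rank. Below is a small sketch over four rows copied from the report; `summarize` is a hypothetical helper, and ranking by percentage-point difference is an assumption about how one might read the report, not how the correlation tool orders it.

```python
# (module, crashes_with_module, signature_crashes, all_with_module, all_crashes)
ROWS = [
    ("wship6.dll", 449, 502, 22863, 66763),
    ("WSHTCPIP.DLL", 444, 502, 23402, 66763),
    ("mswsock.dll", 507, 502, 43414, 66763),
    ("firefox.exe", 502, 502, 39690, 66763),
]


def percent(part, whole):
    """Whole-number percentage, as printed in the report."""
    return round(100 * part / whole)


def summarize(rows):
    """Return (module, sig %, baseline %, difference, count_exceeds_total) rows,
    sorted by the percentage-point difference, largest first."""
    out = []
    for module, hit, total, base_hit, base_total in rows:
        sig_pct = percent(hit, total)
        base_pct = percent(base_hit, base_total)
        out.append((module, sig_pct, base_pct, sig_pct - base_pct, hit > total))
    return sorted(out, key=lambda row: row[3], reverse=True)


summary = summarize(ROWS)
for row in summary:
    print(row)
```

The last field flags rows such as mswsock.dll (507/502), where the module count exceeds the number of crashes; the report does not say why.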
Reporter | ||
Comment 25•10 years ago
Adding QA wanted and seeing about ipv6 machines.
Keywords: qawanted
Reporter | ||
Comment 26•10 years ago
The other thing that Juan and I talked about a few minutes ago was whether anything from the July 10th Patch Tuesday might have tickled something here. He mentioned there were comments relating to people rolling back their machines/brownouts - he can clarify further, since I think those comments were in Spanish.
Comment 27•10 years ago
(In reply to Marcia Knous [:marcia] from comment #25)
> Adding QA wanted and seeing about ipv6 machines.

I tried this on a Win 7 VM, fully updated - no crash. Network Adapter and test-ipv6.org both say that IPv6 is enabled. Maybe QA will have different luck.
Comment 28•10 years ago
We had a user report this at https://support.mozilla.org/en-US/questions/932781. It ended up being bp-7ac3330b-2a28-4c3c-8599-696122120720. If needed, I can reach out to the user to gather needed information.
Reporter | ||
Comment 29•10 years ago
Tyler: If you could please get some additional information from the user, that would be great. I would be interested in whether they applied any MS patches from the most recent Patch Tuesday. Also, I notice they have Ad Aware, but I'm wondering which version, as there are three on the site.

(In reply to Tyler Downer [:Tyler] from comment #28)
> We had a user report this at
> https://support.mozilla.org/en-US/questions/932781. It ended up being
> bp-7ac3330b-2a28-4c3c-8599-696122120720. If needed i can reach out to the
> user to gather needed information.
Comment 30•10 years ago
(In reply to Tyler Downer [:Tyler] from comment #28)
> We had a user report this at
> https://support.mozilla.org/en-US/questions/932781. It ended up being
> bp-7ac3330b-2a28-4c3c-8599-696122120720. If needed i can reach out to the
> user to gather needed information.

This looks like an instance of this crash to me...
Comment 31•10 years ago
cor-el asked what may be a telling question on https://support.mozilla.org/en-US/questions/932781?page=2.

"What are the connection settings?
Tools > Options > Advanced : Network : Connection > Settings
https://support.mozilla.org/kb/Options+window+-+Advanced+panel

Does it help if you disable IPv6?
http://kb.mozillazine.org/Error_loading_websites#IPv6"
Comment 32•10 years ago
I have reached out to the user and am waiting for a reply to my questions about the info Marcia asked for above, as well as cor-el's comment. As soon as I receive those, I will update here.
Comment 33•10 years ago
(In reply to Alex Keybl [:akeybl] from comment #31)
> Does it help if you disable IPv6?

Tried this in my VM; still not reproducible. I doubt this should make a difference, though - the function in question is looking at OS capability for NSPR rather than a configuration pref in Firefox.

Nonetheless, playing with IPv6 config is worthwhile. I did a little bit of that when trying to reproduce in my Win 7 VM: specifically, I tried different combinations of the Windows 7 network adapter settings, IPv4 only, v6 only and v4 & v6 - still not reproducible. I'll search for other Windows IPv6 settings - maybe something in the registry?

Re the memory address of the var being wrong, beyond reproducing the bug and hooking up a debugger, I'm not sure how else to determine what is causing it.
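The distinction drawn above, OS capability versus an application pref, can be illustrated with a small probe. This is not NSPR's `_pr_ipv6_is_present()`; it is a rough sketch that assumes "capability" can be approximated by whether the OS will hand out an AF_INET6 socket, and `ipv6_socket_available` is a made-up name.

```python
import socket


def ipv6_socket_available():
    """Return True if the OS lets us create an AF_INET6 TCP socket."""
    if not socket.has_ipv6:
        # The Python build itself has no IPv6 support compiled in.
        return False
    try:
        probe = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        # The OS refused, e.g. the IPv6 stack is not installed or is disabled.
        return False
    probe.close()
    return True


available = ipv6_socket_available()
print("IPv6 socket creation possible:", available)
```

A browser-level "disable IPv6" setting would not change the result of a probe like this, which is consistent with the doubt expressed in the comment.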
Comment 34•10 years ago
This signature is dropping down the crash charts day-to-day (it's a permanent startup crasher of course, so that makes sense). We've now had close to 50 million ADU at our peak late last week, but we've only had (2593 throttled crashes) * (10x throttling factor) / (~3 launches per user at the least). That's about 8k users. Once we fully unthrottle, we can expect no more than another ~10k users lost (sad, I know).

Given the fact that this is such an amorphous issue, possibly malware related, and we don't have any actionable leads at the moment, let's deprioritize this to an ongoing investigation as opposed to a chemspill driver (15+). We should be looking into hardening around this and bug 718389 for the next release.

That being said, if anybody has ideas of what we could ask affected users to run (something with full crash dumps enabled?) instead of asking them for access to their computer, please do share. Also, any methods of comparing the builds affected by bug 718389 and this bug would be greatly appreciated.

Steve - feel free to reassign later this week when you're going out of town.
Assignee: nobody → sworkman
status-firefox14:
--- → wontfix
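The user estimate in comment 34 is straightforward arithmetic and can be reproduced as follows. The three inputs are the figures quoted in that comment; the function name is made up for illustration.

```python
THROTTLED_CRASHES = 2593  # crash reports received while throttled
THROTTLING_FACTOR = 10    # "10x throttling factor"
LAUNCHES_PER_USER = 3     # "~3 launches per user at the least"


def estimate_affected_users(crashes, factor, launches_per_user):
    """Scale throttled reports up to all crashes, then divide by launches per user."""
    return crashes * factor / launches_per_user


estimate = estimate_affected_users(
    THROTTLED_CRASHES, THROTTLING_FACTOR, LAUNCHES_PER_USER
)
print(round(estimate))
```

Because three launches per user is stated as a minimum, the result reads as an upper bound: more launches per affected user would mean fewer distinct users behind the same number of reports.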
Comment 35•10 years ago
What we're seeing here is a byte in the code page getting modified after the binary corresponding to it has been loaded by the OS loader. This is very worrying, as it means that some code in our address space is modifying the page's protection bits and overwriting a byte in our native code. This may result in any number of weird cases (crashes or worse) if this turns out to happen systematically on different addresses... I think determining why this is happening is potentially more important than the percentages of users affected by this crash.
Comment 36•10 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #35)

So, this is outside my range of expertise in the DNS code - It definitely seems like there's a serious issue here - is there anything to be done to harden the build or code against this? I ask because it seems unlikely that we'll get debugger access to an affected machine, and I haven't been able to reproduce it for debugging internally. Or what other options are there?
Comment 37•10 years ago
(In reply to Steve Workman [:sworkman] from comment #36)
> (In reply to Ehsan Akhgari [:ehsan] from comment #35)
> So, this is outside my range of expertise in the DNS code - It definitely
> seems like there's a serious issue here - is there anything to be done to
> harden the build or code against this? I ask because it seems unlikely that
> we'll get debugger access to an affected machine, and I haven't been able to
> reproduce it for debugging internally. Or what other options are there?

Just to be clear, there is probably nothing wrong with our DNS code. See comment 20 through comment 23. The generated code in nspr4.dll attempts to access the correct address. However, the code that gets loaded from nspr4.dll has one of its bytes modified, which causes the CPU to try to read from an invalid address, which is what triggers the crash. What we need to find out here is what is modifying our code pages. I don't believe we can protect against this kind of stuff easily since we have no way of checking whether a page in our code has been modified before starting to execute it.
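To make "one of its bytes modified" concrete, here is an offline comparison of the instruction as it should look after relocation with the bytes implied by the debugger listing in comment 20. Both byte strings are derived rather than copied from a dump: the expected one assumes the operand relocates to 0x6D16858C, the observed one is reconstructed from `cmp dword ptr ds:[6D02858Ch],0`. This is a post-mortem diff, not a runtime defence, so it does not contradict the point that there is no cheap in-process check.

```python
def diff_offsets(expected, observed):
    """Return the offsets at which two equal-length byte strings differ."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]


# cmp dword ptr [0x6D16858C], 0  (derived: operand after a correct relocation)
EXPECTED = bytes([0x83, 0x3D, 0x8C, 0x85, 0x16, 0x6D, 0x00])
# cmp dword ptr [0x6D02858C], 0  (derived: from the listing in comment 20)
OBSERVED = bytes([0x83, 0x3D, 0x8C, 0x85, 0x02, 0x6D, 0x00])

changed = diff_offsets(EXPECTED, OBSERVED)
print(changed, [hex(EXPECTED[i] ^ OBSERVED[i]) for i in changed])
```

With these inputs a single byte of the 7-byte instruction differs, which matches the description of a one-byte modification in otherwise intact code.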
Comment 38•10 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #37)
> Just to be clear, there is probably nothing wrong with our DNS code. See
> comment 20 through comment 23.

Yup, that's what I understood - if there's a way to harden it or something else though ... but you think not. Or at least not easily.

> The generated code in nspr4.dll attempts to
> access the correct address. However, the code that gets loaded from
> nspr4.dll has one of its bytes modified, which causes the CPU to try to read
> from an invalid address, which is what triggers the crash. What we need to
> find out here is what is modifying our code pages. I don't believe we can
> protect against this kind of stuff easily since we have no way of checking
> whether a page in our code has been modified before starting to execute it.

:( Yeah, it was a long shot to ask.
Reporter | ||
Updated•10 years ago
QA Contact: mozillamarcia.knous
Reporter | ||
Comment 39•10 years ago
Juan and I have installed various versions of Ad Aware from the http://lavasoft.com/ site. So far I have not generated any crashes. http://lavasoft.com/products/ad_aware_pro.php has both safe networking and safe browsing options so that is the one we are targeting ATM.
Comment 40•10 years ago
(In reply to comment #38)
> (In reply to Ehsan Akhgari [:ehsan] from comment #37)
> > Just to be clear, there is probably nothing wrong with our DNS code. See
> > comment 20 through comment 23.
>
> Yup, that's what I understood - if there's a way to harden it or something else
> though ... but you think not. Or at least not easily.

Not without some black arts, and serious performance penalties. :(
Comment 41•10 years ago
Re-assigning to you Tyler, since I'm going on vacation for two weeks and it looks like we're still exploring the possibility of getting the user's machine - it seems like you're a good point person for the time being. I suggest you coordinate with bsmedberg or alex regarding re-assignment if/when you get the user's machine. Thanks!
Assignee: sworkman → tdowner
Updated•10 years ago
Blocks: buildID-specific
Comment 42•10 years ago
User has not gotten back to me after attempting to reach out to them through several channels. I've also looked for other users affected by this crash, and haven't found any (or at least none have gotten back to me with crash IDs so I can see if it is this crash). Unassigning.
Assignee: tdowner → nobody
Comment 43•10 years ago
I received a response from one user who was experiencing this problem. He/she was running four or five tabs with social networking sites, hotmail, and his school's site (duoc.cl), and had the following antivirus programs installed: eset online scanner, ccleaner, malwarebytes.

The problem happened while upgrading Firefox. The user removed Firefox from his machine and could not send us crash IDs for this.
Updated•10 years ago
Component: General → Networking
Product: Firefox → Core
Comment 44•10 years ago
(In reply to juan becerra [:juanb] from comment #43)
> I received a response from one user who was experiencing this problem, and
> he/she was running four or five tabs with social networking sites, hotmail,
> his school's site (duoc.cl), and that he had the following antivirus
> programs installed: eset online scanner, ccleaner, malwarebytes

Can we try re-testing with these sites and apps installed on a few Windows machines? Should be fairly quick to test, since post-install Firefox should just not start if we're able to repro.
Comment 45•10 years ago
I've been testing on a couple of machines, one real and one virtual, with the software mentioned in comment #43, doing all sorts of common user actions including application updates, but so far I have not encountered any problems.
Comment 46•10 years ago
Another user has offered to help and he provided us with a system profile. A quick look shows a few modules associated with Firefox that look suspicious, Google Desktop, RealPlayer Browser Record Plugin, AOL 9.1. Other software installed in that machine that looks suspicious: SkyCaddie, Babylon 1.5.3.17, Prevx Computer Security Investigator, Skype 5.0.0.156 We can try installing all of this and see if we are able to reproduce the problem, but I will hold off on uploading the system profile information (xml file) until we hear back from the user if this is ok.
Reporter | ||
Comment 47•10 years ago
This moved down a bit in rank in 14.0.1 to #40 in the last week, but it would still be good to figure out what is going on here.
Comment 48•10 years ago
(In reply to comment #46)
> Another user has offered to help and he provided us with a system profile. A
> quick look shows a few modules associated with Firefox that look suspicious,
> Google Desktop, RealPlayer Browser Record Plugin, AOL 9.1.
>
> Other software installed in that machine that looks suspicious: SkyCaddie,
> Babylon 1.5.3.17, Prevx Computer Security Investigator, Skype 5.0.0.156
>
> We can try installing all of this and see if we are able to reproduce the
> problem, but I will hold off on uploading the system profile information (xml
> file) until we hear back from the user if this is ok.

FWIW I installed RealPlayer and couldn't reproduce. I could not get download links for the other two.
Comment 49•10 years ago
There aren't any crashes for versions other than 14.0.1 for this, so I'm untracking it for 15.
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Comment 50•4 years ago
some websites : https://cysec-co.com http://www.repubblica.it/