Closed Bug 1394483 Opened 7 years ago Closed 7 years ago

FireFox/Win7 misaligned access in xul.dll with high cpu usage

Categories

(Core :: General, defect)

55 Branch
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: mynova, Unassigned)

Details

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0 Build ID: 20170814072924 Steps to reproduce: This issue has been like that for years on my Windows7 systems, finally decided to file a bz to see if anyone can help. Since searching online found very similar reports: http://forums.mozillazine.org/viewtopic.php?f=38&t=2580941 without followup all the way through.. I've tried the refresh Firefox, and that did not help, after a few hours or day of usage, with maybe 3 windows and 60 tabs, or sometimes less, I find FF in 100% cpu loop, on a 4 cpu system, that appears as 25% cpu. Actual results: FF locks up on 1 cpu and there are 2 threads one called: Firefox.exe+0x62ed and another xul.dll+0x680db6. the firefox thread has this stack: 0, ntoskrnl.exe!_misaligned_access+0x1833 1, ntoskrnl.exe!KeSetEvent+0x3f2 2, dxgmms1.sys!VidMmInterface+0x292b0 3, dxgmms1.sys!VidMmInterface+0x2bfe9 4, dxgkrnl.sys!TdrResetFromTimeout+0x139f0 5, dxgkrnl.sys!TdrResetFromTimeout+0x129ff 6, win32k.sys!EngUnlockDirectDrawSurface+0x2c62 7, ntoskrnl.exe!KeSynchronizeExecution+0x3a23 8, wow64win.dll+0x416aa 9, wow64win.dll+0x27c9d 10, wow64.dll!Wow64SystemServiceEx+0xd7 11, wow64cpu.dll!TurboDispatchJumpAddressEnd+0x2d 12, wow64.dll!Wow64SystemServiceEx+0x1ce 13, wow64.dll!Wow64KiUserCallbackDispatcher+0x204 14, wow64win.dll+0x32caa 15, ntdll.dll!KiUserCallbackDispatcher+0x1f 16, wow64win.dll+0x4019a 17, wow64win.dll+0x32877 18, wow64win.dll+0x1c137 19, wow64.dll!Wow64SystemServiceEx+0xd7 20, wow64cpu.dll!TurboDispatchJumpAddressEnd+0x2d 21, wow64.dll!Wow64SystemServiceEx+0x1ce 22, wow64.dll!Wow64LdrpInitialize+0x42a 23, ntdll.dll!RtlUniform+0x6e6 24, ntdll.dll!EtwEventSetInformation+0x1da09 25, ntdll.dll!LdrInitializeThunk+0xe 26, gdi32.dll!D3DKMTRender+0x15 (No unwind info) 27, igd10iumd32.dll!OpenAdapter10_2+0x4d18f (No unwind info) 28, igd10iumd32.dll!OpenAdapter10_2+0x4d5e4 (No unwind info) 29, igd10iumd32.dll!OpenAdapter10_2+0x131074 (No unwind info) 30, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x2e910 (No unwind info) 31, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x55538 (No unwind info) 32, xul.dll!PrintJSStack+0x13ed35 (No unwind info) 33, xul.dll!PrintJSStack+0x13ea56 (No unwind info) 34, xul.dll!PrintJSStack+0x13d8bf (No unwind info) 35, xul.dll!workerlz4_decompress+0x8a3a7 (No unwind info) 36, xul.dll!workerlz4_decompress+0x239c9c (No unwind info) 37, xul.dll+0x3618b2 (No unwind info) 38, xul.dll+0x3610b5 (No unwind info) 39, xul.dll+0x361000 (No unwind info) 40, xul.dll+0x360fdd (No unwind info) 41, user32.dll!gapfnScSendMessage+0x332 (No unwind info) 42, user32.dll!GetThreadDesktop+0xd7 (No unwind info) 43, user32.dll!GetThreadDesktop+0x185 (No unwind info) 44, user32.dll!GetThreadDesktop+0x1e1 (No unwind info) 45, ntdll.dll!KiUserCallbackDispatcher+0x2e (No unwind info) 46, user32.dll!DispatchMessageW+0xf (No unwind info) 47, xul.dll+0x35f406 (No unwind info) 48, xul.dll+0x35f125 (No unwind info) 49, xul.dll+0x1a0e3c (No unwind info) 50, xul.dll+0x30c233 (No unwind info) 51, xul.dll+0x427abf (No unwind info) 52, xul.dll+0x427a8e (No unwind info) 53, xul.dll+0x50257e (No unwind info) 54, xul.dll+0x50230d (No unwind info) 55, xul.dll+0x5022c2 (No unwind info) 56, xul.dll+0x502044 (No unwind info) 57, xul.dll+0x5d16e6 (No unwind info) 58, xul.dll+0x697f14 (No unwind info) 59, xul.dll+0x697ed6 (No unwind info) 60, firefox.exe+0x1bb2 (No unwind info) 61, firefox.exe+0x6277 (No unwind info) 62, kernel32.dll!BaseThreadInitThunk+0x12 (No unwind info) 63, ntdll.dll!RtlInitializeExceptionChain+0x63 (No unwind info) 64, ntdll.dll!RtlInitializeExceptionChain+0x36 (No unwind info) and the xul.dll has this stack: 0, ntoskrnl.exe!_misaligned_access+0x1833 1, ntoskrnl.exe!KeWaitForMultipleObjects+0xf5d 2, ntoskrnl.exe!KeWaitForMutexObject+0x19f 3, ntoskrnl.exe!_misaligned_access+0xbd4 4, ntoskrnl.exe!_misaligned_access+0x186d 5, ntoskrnl.exe!IoFreeErrorLogEntry+0x287 6, ntdll.dll!RtlAllocateHeap+0x76 7, 0x50bb16b140 8, 0x774be17213223ad0 9, 0x3800000000 10, 0x774be17213223ad0 11, 0x3800000000 12, 0x11457e67c 13, 0x2ce50b18774be40c 14, 0x6a028c00000035 15, 0x230001006a0000 16, 0x1457edbc0008904e 17, 0x6a000000000000 18, 0x226ecf8c8 19, 0x5800000000 20, 0x1457e65000c6156b 21, 0x6a0000604fa9e7 22, 0x5000000000 23, 0x13223ad0bb16b140 24, 0x1457e65c0000000b 25, 0x50604449ea 26, 0x604449831457e680 27, 0x1457e6f41457e6a0 28, 0x1457e70800000000 29, 0x13223ad01457e6e4 30, 0x1457e6d0bb16b140 31, 0x1457e6a06043bb6a 32, 0xbb79cdd01457e6f4 33, 0xbb79cdd0bb546120 34, 0x605180c01457e6e8 35, 0x1457e6b000000000 36, 0x80b0175c1457e738 37, 0x1457e72000000000 38, 0x80b0175c26ecfc88 39, 0x1457e6e800000001 40, 0x1457e7401457e6e4 41, 0xbb546120603d82f4 42, 0x1457e740bb79cdd0 43, 0x603d82f41457e768 44, 0x80b0175c1457e708 45, 0x1f 46, ntdll.dll!RtlAllocateHeap+0x76 (No unwind info) 47, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x2b2f97 (No unwind info) 48, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x1fcf9a (No unwind info) 49, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x1fcf33 (No unwind info) 50, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x1f411a (No unwind info) 51, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x1908a4 (No unwind info) 52, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x19017c (No unwind info) 53, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x77508 (No unwind info) 54, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x66803 (No unwind info) 55, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x6644d (No unwind info) 56, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x663c2 (No unwind info) 57, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x66140 (No unwind info) 58, igdusc32.dll!USC::CShaderInstruction::ResetPrecision+0x2f83d (No unwind info) 59, igdusc32.dll!USC::CShaderInstruction::SupportsPredicate+0x6951 (No unwind info) 60, igdusc32.dll!USC::CShaderInstruction::SupportsPredicate+0x90c2 (No unwind info) 61, igdusc32.dll!USC::CShaderInstruction::SupportsPredicate+0xa340 (No unwind info) 62, igd10iumd32.dll!OpenAdapter10_2+0x128739 (No unwind info) 63, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x4a7e8 (No unwind info) 64, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x1d2f2 (No unwind info) 65, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x19d02 (No unwind info) 66, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x10859 (No unwind info) 67, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x929 (No unwind info) 68, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x2bac1 (No unwind info) 69, d3d11.dll!D3D11CreateDeviceAndSwapChain+0xa9c (No unwind info) 70, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x3b39e (No unwind info) 71, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x3afde (No unwind info) 72, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x3a8b7 (No unwind info) 73, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x1811f (No unwind info) 74, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x7829 (No unwind info) 75, d3d11.dll!D3D11CreateDeviceAndSwapChain+0x7758 (No unwind info) 76, xul.dll!PrintJSStack+0xcd79d (No unwind info) 77, xul.dll!PrintJSStack+0xcd768 (No unwind info) 78, xul.dll!PrintJSStack+0xcd307 (No unwind info) 79, xul.dll!PrintJSStack+0xcd2db (No unwind info) 80, xul.dll!PrintJSStack+0x13c56a (No unwind info) 81, xul.dll!PrintJSStack+0x143899 (No unwind info) 82, xul.dll+0x23ddfe (No unwind info) 83, xul.dll+0x30d3bb (No unwind info) 84, xul.dll+0x23ce62 (No unwind info) 85, xul.dll+0x683447 (No unwind info) 86, xul.dll+0x427abf (No unwind info) 87, xul.dll+0x427a8e (No unwind info) 88, xul.dll+0x680e8c (No unwind info) 89, xul.dll+0x680dc1 (No unwind info) 90, kernel32.dll!BaseThreadInitThunk+0x12 (No unwind info) 91, mozglue.dll!mozilla::BaseTimeDurationPlatformUtils::TicksFromMilliseconds+0xd3 (No unwind info) 92, ntdll.dll!RtlInitializeExceptionChain+0x63 (No unwind info) 93, ntdll.dll!RtlInitializeExceptionChain+0x36 (No unwind info) Expected results: FF should not lock up on such a high cpu/misasligned access.
Above stacks are with FF 55.0.3 (32bit), running on 64bit Windows 7 system.
Hi Nova, Can you please retest this issue using a new Firefox profile to eliminate custom settings as a possible cause (https://goo.gl/L92rLJ)? If the issue is still reproducible, can you try collecting a performance profile to see if anything stands out there? The link below has directions for how to collect and submit a profile. https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem
Flags: needinfo?(mynova)
Ok, created new profile, this takes a few days to reproduce.. (lost all my user/passwords, as they were stored in that other profile :) will report back when/if it reproduces.
btw, after new profile, added 7 LEGACY addons, without which I might as well switch browser all together :) 1) Adobe Acrobad DC 2) Classic Theme Restorer 3) Google Translator for FF 4) NoScript 5) Right Links 6) Session Manager 7) Tab Mix Plus given that all these will be desupported soon, I doubt they are the cause, but if u see no value, i.e. one must debug with no addons, then let me know, and we can close this bug, as i said.. not possible to use the browser w/o these adds and if the price is to restart every few days, so be it :) Nova
ok, it reproduced, but then went to the performance report url, and it says one must run the nightly developer build?? oh dear.. I suggest you change the reply to state that up front, i.e. that one must run the dev build.. i'll try that now.. each time it formats env and must reinstall all things again...
Thanks for testing! You didn't lost your data, are still in the other profile. In order to return to the old profile you can use Profile Manager. Here are more information about this: https://goo.gl/VTpi82. So you reproduce the issue with all the 7 legacy add-ons installed? I forget to mention that you have to use the Nightly build in order to collect the performance profile, sorry about that. You can download it from here: https://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/ You don't have to install it, you can just extract the zip file. Also you can open it with the created profile where you reproduce the issue. Please let me know if you need more information.
no worries, yes, I need these (now legacy) addons to get things done.. issue reproduce relatively quick. u said no need to install it from which file to extract the exe? found this: https://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-57.0a1.en-US.win32.installer.exe but i guess u mean some .zip to extract just the binary and replace in the existing tree?
Here is the file: https://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-57.0a1.en-US.win32.zip After you extract the file, there is a "firefox" folder where you can find the "firefox.exe" file. For opening it with a profile you should follow the steps from the provided link in comment 8.
Renamed firefox.exe to firefox-original.exe, and attempted to put in place the firefox.zip from above zip file, however it does not execute, says: Error: Platform version '55.0.3' is not compatible with minVersion >= 57.0a1 MaxVersion < 57.0a1, so it sounds like to debug i would need to fully install/use version 57.. this is very strange, isn't there some 'tracing' or instrumentation or way to split the binary into threads and figure out what is spinning on cpu, this is a tight loop left untouched it runs for hours or days, and cpu becomes overheated.. on Chrome there is a thread for each tab , so it is relatively easy to spot mis-behaving tabs.. With all these years and efforts, there is no such methods on Firefox? I really want to help, but running development version of firefox, it sounds like all the addons will not work? Sorry for all this noise, I posted what I thought should be more than enough to isolate the cpu loop on the first posting of the bug, and as found others ran into the same issue long ago: http://forums.mozillazine.org/viewtopic.php?f=38&t=2580941, and i've been having it for over a YEAR, until had enough then decided to file this bug.. but it may be more than a normal/avg user can handle. My suggestion is to add an instrumentation/tracing method that user , any user, with any version enables, then runs their test, and uploads the test/traces... Nova
Went back to default firefox.exe binary, and within 2hrs it is now spinning 100% (of 1 cpu), so that is 25% solid. Ran process monitor (sysinternals), and it has Stack Summary, so uploading few screenshot of that. then in that trace it has many attempts to access a volume with serial EACB-A7C3 (which happens to be my Win7 OS disk C:), except it calls it as follows: Date & Time: 09/02/2017 12:16:17 AM Event Class: File System Operation: QueryInformationVolume Result: BUFFER OVERFLOW Path: C:\Program Files (x86)\Mozilla Firefox\firefox.exe TID: 5508 Duration: 0.0000064 VolumeCreationTime: 11/05/2012 1:57:40 AM VolumeSerialNumber: EACB-A7C3 SupportsObjects: True VolumeLabel: WinX This volume label is no longer used for over 2 years, since I've moved to Windows 7 from WinXp. There is a save-Stack for that exact call, and I'll upload that as well. that buffer overflow looks bad i would think, and it's in the cpu loop, repeatedly called upon... another line shows this call failing with similar result: Date & Time: 09/02/2017 12:16:17 AM Event Class: File System Operation: QueryAllInformationFile Result: BUFFER OVERFLOW Path: C:\Program Files (x86)\Mozilla Firefox\firefox.exe TID: 5508 Duration: 0.0000069 CreationTime: 09/01/2017 11:40:06 PM LastAccessTime: 08/28/2017 7:37:01 PM LastWriteTime: 08/28/2017 7:37:01 PM ChangeTime: 09/01/2017 11:46:55 PM FileAttributes: A AllocationSize: 532,480 EndOfFile: 532,432 NumberOfLinks: 1 DeletePending: False Directory: False IndexNumber: 0x6500000003c878 EaSize: 0 Access: Read Attributes, Synchronize Position: 0 Mode: Synchronous IO Non-Alert AlignmentRequirement: Word stack for that is called: Stack-for-Bufferoverflow2.CSV will upload these files in ffcpu.zip see if that helps anything.
Flags: needinfo?(mynova)
Attached file FFCPU.ZIP
output from Stack Summary, as well as proc mon, showing buffer overflow, not sure if that is the issue though.
when wrote: "volume label is no longer used " the volume name CHANGED from WinX to Win7, I'm surprised that FF still uses the OLD name, can't find it anywhere, on dev manager/storage the new name Win7 appears as the LABEL, but the same Volume ID number is there. I have removed/reinstalled FF many times after moving from WinXP to Win7, so not sure how it still has this WinX volume name. Searching registry for WinX or the volume ID yielded no hits.
Current volume info: Volume in drive C is Win7 Volume Serial Number is EACB-A7C3
I've pressed SUSPEND on that firefox.exe in case u guys want to debug this spinner.. (new profile and freshly reset), i suspect it might be the OS disk having been upgraded (3 years ago) from winxp to win7, however, no other app/program is having any issues, only FF spinning on cpu.. and printing the old disk volume label, which i have no clue how it has access to...
From comment 9 I understand that the issue is only reproducible with the legacy add-ons installed, right? So probably there is an add-on that caused this. Can you please disable one by one to see which add-on caused this? Also, did you managed to collect a performance profile as I suggested in comment 4?
Flags: needinfo?(mynova)
1) I've attempted to collect, however, see comment 11, it seems that i must reinstall completely firefox 57... just dropping in the binary did not do it. 2) It seems the firefox framework requires a specific binary to debug, and the current one, that's spinning 100% cannot be diagnosed, even though procmon clearly shows a tight loop as above screenshot.. debugging such issues should be the foundation of any design/implementation, esp. one that supports addons.. More I use Chrome, it seems there a TAB is a PROCESS, and such issues are much easier to handle.. 3) I did not try without the addons, b/c it would be useless funtionality-wise, however, for DEBUG purposes did that just now, started in SAFEMODE, and all addons show as disabled. I'll report back in few day.
in safe-mode, with all addons disabled, it did not reproduce in few days, now enabled 1/2 and rest disabled.
Moving to Core:XUL for more attention.
Component: Untriaged → XUL
Product: Firefox → Core
Component: XUL → General
(In reply to Nova from comment #19) > in safe-mode, with all addons disabled, it did not reproduce in few days, > now enabled 1/2 and rest disabled. How are things performing now?
I'm not a developer, so having a hard time tracking this enable/disabled on 4 PCs. If there was a simple way to debug this I would do it. Either drop in a debug firefox.exe or somehow "create dump file" (Process Hacker has this option), when the issue reproduces. It might be add-on related, in which case I would argue it is useless to chase this in this debugging methods, since all these addons will be obsoleted by newer firefox soon enough anyways. If you do have a simple method to debugging, let me know, I cannot uninstall firefox and reinstall a full nightly build, I'm simple an end user, not a developer and naively thought that the original installation allows for debugging of cpu loops.
Flags: needinfo?(mynova)
(In reply to Nova from comment #22) > I'm not a developer, so having a hard time tracking this enable/disabled on > 4 PCs. If there was a simple way to debug this I would do it. Either drop in > a debug firefox.exe or somehow "create dump file" (Process Hacker has this > option), when the issue reproduces. It might be add-on related, in which > case I would argue it is useless to chase this in this debugging methods, > since all these addons will be obsoleted by newer firefox soon enough > anyways. > > If you do have a simple method to debugging, let me know, I cannot uninstall > firefox and reinstall a full nightly build, I'm simple an end user, not a > developer and naively thought that the original installation allows for > debugging of cpu loops. Thanks for trying to figure it out :) A goal for us at this point was identifying the addon that might be the culprit, so that we could identify the addon author. I don't have a simple debugging to offer, other than enabling addons one at a time until it becomes obvious which addon is contributing or causing the issue. Totally understand if you don't have time to test this out given the timeframe required to get the issue to manifest. I'm going to close this INCOMPLETE. Feel free to re-open if you do find time to test it.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Thanks. Hopefully you come up with some "fencing" framework where by such debugging can be all programatic, and misbehaving addons can simply be immediately flagged, as they consume cpu/IO/memory, then terminated. Left to the users (authors of addons and users of them) this is a neverending debug in my view, since it's not just the addon, it can be a "combination of addons" coupled with the exact pages that are browsed.. in short, possibilities unlimited :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: