Closed Bug 1130029 Opened 10 years ago Closed 4 years ago

Possible memory corruption with drag&drop to Amazon Cloud Drive

Categories

(Core :: DOM: Copy & Paste and Drag & Drop, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: cjones, Unassigned)

Details

(Keywords: memory-leak, sec-moderate, Whiteboard: The crazy behavior is more concerning than the eventual OOM crashes.)

I'm afraid this bug report isn't too informative, but I've seen the troubling symptom twice now and feel obligated to file.

The symptom is firefox misbehaving seemingly randomly: parts of the UI disappearing, pages suddenly misrendering, random failures to execute actions like drag&drop. In neither case did firefox crash; I quickly force-killed it out of concern of memory corruption. An OOM state is possible, but in the first instance of this symptom, |top| didn't show firefox using excessive memory.

My setup is
 * fedora 21 x86-64 machine with 32GB memory and <2 yr old CPU
 * firefox 35 distro build
 * ~10 browser windows open with a total of maybe ~100 tabs

In the last few days I've started uploading a large number of photos to Amazon Cloud Drive. The bad behavior has happened during two upload sessions. I've never seen the behavior before.

My very approximate STR are
 0. load Cloud Drive (and other ~10 windows and ~100 tabs)
 1. use nautilus to select 100-200MB of images to upload
 2. drag&drop images to Cloud Drive
 3. go off and browse other sites
 4. when the upload chunk completes, goto step (1)

There are lots of moving parts (and probably not stringently stress tested) in this process, so a gnome bug is certainly not unlikely. But I've only seen the symptoms manifest in firefox.

Again, I wish I could provide more info.
So far no luck reproducing, but I don't have that many images locally (or I have, but Amazon doesn't allow file names like 'Screenshot from 2014-07-22 19:52:55.png').
I saw the apparent memory-corruption symptom again just now, and I've seen something possibly related where picture upload just fails with Amazon claiming that Firefox doesn't support drag&drop (???). This is after successfully uploading a lot of pictures. Only restarting firefox seems to make that symptom go away. This has happened a handful of times, maybe 5.

The possible corruption just now happened when I attempted to upload 160 images with total size ~500MB. I looked at memory again and here's what the state was

KiB Mem : 32579012 total,   594452 free,  4783448 used, 27201112 buff/cache
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
27358 cjones    20   0 3510676 1.866g 110776 S  22.2  6.0  71:59.84 firefox

So again, FF isn't using an absurd amount of memory. Unsurprisingly, a huge chunk of system RAM is devoted to buffer cache, but even so there's a good amount of free mem, 600MB. If FF were using a huge amount of memory just *before* this snapshot and then dropped it after OOM pressure, then I would have expected the reclaimed memory to be in the "Free" pool instead of buffercache. So this looks pretty strongly like memory corruption.

Next time I see the random failure with the "drag&drop unsupported" symptom, I'll see if I can find an error message in console.
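The memory figures quoted in this comment can be sanity-checked with a little arithmetic. Below is a minimal Python sketch using only the numbers from the |top| snapshot; the variable and function names are illustrative, and it assumes that top's "g" suffix means GiB.

```python
# Values copied from the top snapshot quoted in this comment (KiB unless noted).
MEM_TOTAL_KIB = 32579012
MEM_FREE_KIB = 594452
MEM_USED_KIB = 4783448
BUFF_CACHE_KIB = 27201112
FIREFOX_RES_GIB = 1.866  # assumption: top's "g" suffix means GiB

KIB_PER_GIB = 1024 * 1024


def share_of_total(kib: float) -> float:
    """Return kib as a percentage of total system memory."""
    return 100.0 * kib / MEM_TOTAL_KIB


# Convert firefox's resident set to KiB so it is comparable with the header line.
firefox_res_kib = FIREFOX_RES_GIB * KIB_PER_GIB
firefox_pct = share_of_total(firefox_res_kib)
cache_pct = share_of_total(BUFF_CACHE_KIB)
free_mib = MEM_FREE_KIB / 1024

# The three header buckets should add up to the reported total.
accounted_kib = MEM_FREE_KIB + MEM_USED_KIB + BUFF_CACHE_KIB

print(f"firefox RES: {firefox_pct:.1f}% of RAM")
print(f"buff/cache:  {cache_pct:.1f}% of RAM")
print(f"free:        {free_mib:.0f} MiB")
print(f"accounted:   {accounted_kib} of {MEM_TOTAL_KIB} KiB")
```

Under those assumptions the firefox resident set comes out at about 6% of RAM, in line with top's %MEM column, and the free, used and buff/cache buckets add up exactly to the reported total.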
This could use some help from QA.
Just got the "Your browser doesn't support this, use Chrome" error again. Nothing in the console.
If you're worried we've got memory corruption could you try this in an ASAN build? It's possible that will require too much memory to get this far, but it might tease something out. https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-release-linux64-asan/ we also have asan builds on other branches, but that's closest to what you have in comment 0.
I guess I should have added the ni? with the last comment as instructed in your username
Flags: needinfo?(cjones.bugs)
I've finished my initial mega-upload, so I'm not going to have real-world usage patterns like before for the foreseeable future. If there's not much interest at moz in following this up, please feel free to close the bug.
Flags: needinfo?(cjones.bugs)
QA Contact: kjozwiak
I think I ended up reproducing what Chris described in comment #0 when using the latest release build. You'll get to a point where fx will start behaving very strangely: pages will not scroll correctly, text will randomly get selected when trying to scroll, titles will be incorrect, etc. I uploaded a .gif that illustrates the possible issues:

* http://gfycat.com/ApprehensiveLividBlacklab

I also tried this on m-c and got a bunch of crashes. I tried reproducing the crashes on m-r but couldn't get it to crash after about 2 hours of trying. The crashes happen pretty often on m-c:

- https://crash-stats.mozilla.com/report/index/878761b2-f755-4344-82eb-0f2b82150416
- https://crash-stats.mozilla.com/report/index/4ad62cf8-5d31-4f52-a9db-53de32150416
- https://crash-stats.mozilla.com/report/index/677cb287-4df7-4e57-b650-d79ff2150416

Like Chris mentioned, this starts happening once you have a bunch of windows/tabs opened and start dropping files into the browser for upload. Everything starts slowing down, and at some point things start breaking in the m-r build, while on m-c you'll get constant crashes.

Chris, is this the same behavior you noticed when doing your mega upload?
Flags: needinfo?(cjones.bugs)
Build Used: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-release-linux64-asan/1429112120/

I also reproduced the same behavior I saw in comment #8 in an ASAN build and received the following:

ASAN:SIGSEGV
=================================================================
==4351==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000310 (pc 0x7fb318a49341 sp 0x7fff2dadcea0 bp 0x7fff2dadcf70 T0)
    #0 0x7fb318a49340 in nsQueryReferent nsIWeakReferenceUtils.h:39
    #1 0x7fb31b770ecc in ActorDestroy TabChild.cpp:1668
    #2 0x7fb317a18f3d in DestroySubtree PBrowserChild.cpp:3324
    #3 0x7fb317a17351 in Send__delete__ PBrowserChild.cpp:1571
    #4 0x7fb31b7947c4 in Run TabChild.cpp:780
    #5 0x7fb31705738f in ProcessNextEvent nsThread.cpp:855
    #6 0x7fb3170b473a in NS_ProcessNextEvent nsThreadUtils.cpp:265
    #7 0x7fb3178cf1d9 in Run MessagePump.cpp:99
    #8 0x7fb31787c04c in RunInternal message_loop.cc:233
    #9 0x7fb31bc48fd7 in Run nsBaseAppShell.cpp:164
    #10 0x7fb31d746722 in XRE_RunAppShell nsEmbedFunctions.cpp:738
    #11 0x7fb31787c04c in RunInternal message_loop.cc:233
    #12 0x7fb31d745d24 in XRE_InitChildProcess nsEmbedFunctions.cpp:575
    #13 0x48a9f1 in content_process_main plugin-container.cpp:211
    #14 0x7fb314c83ec4 in __libc_start_main libc-start.c:287
    #15 0x489dcc in _start ??:?

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 ??
==4351==ABORTING

Quick note, this is really, really hard to reproduce. You need a lot of opened windows/tabs and need to keep dragging 200+ files for a good 40 minutes before the browser completely bricks. Users could potentially run into this if they have a lot of files and periodically upload them in the same browser session.
Ok, that is an e10s-enabled profile then. We have had dnd in e10s for just a couple of days. (But the crash doesn't look dnd related.)
(In reply to Kamil Jozwiak [:kjozwiak] from comment #9)
> ASAN:SIGSEGV
> =================================================================
> ==4351==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000310 (pc
> 0x7fb318a49341 sp 0x7fff2dadcea0 bp 0x7fff2dadcf70 T0)
> #0 0x7fb318a49340 in nsQueryReferent nsIWeakReferenceUtils.h:39
> #1 0x7fb31b770ecc in ActorDestroy TabChild.cpp:1668
> #2 0x7fb317a18f3d in DestroySubtree PBrowserChild.cpp:3324
> #3 0x7fb317a17351 in Send__delete__ PBrowserChild.cpp:1571
> #4 0x7fb31b7947c4 in Run TabChild.cpp:780
> #5 0x7fb31705738f in ProcessNextEvent nsThread.cpp:855
> #6 0x7fb3170b473a in NS_ProcessNextEvent nsThreadUtils.cpp:265
> #7 0x7fb3178cf1d9 in Run MessagePump.cpp:99
> #8 0x7fb31787c04c in RunInternal message_loop.cc:233
> #9 0x7fb31bc48fd7 in Run nsBaseAppShell.cpp:164
> #10 0x7fb31d746722 in XRE_RunAppShell nsEmbedFunctions.cpp:738
> #11 0x7fb31787c04c in RunInternal message_loop.cc:233
> #12 0x7fb31d745d24 in XRE_InitChildProcess nsEmbedFunctions.cpp:575
> #13 0x48a9f1 in content_process_main plugin-container.cpp:211
> #14 0x7fb314c83ec4 in __libc_start_main libc-start.c:287
> #15 0x489dcc in _start ??:?

Which changeset is this from? check about:buildconfig
(I'd like to know which line TabChild.cpp:1668 is about)
> Which changeset is this from? check about:buildconfig
> (I'd like to know which line TabChild.cpp:1668 is about)

It doesn't list the changeset when going into about:buildconfig :/ I was using the build I mentioned in comment #9. Here's the info from about:support (not sure if that's even helpful):

* Version: 37.0.2
* browser.startup.homepage_override.buildID: 20150415083520

Let me try reproducing with an m-c/m-r ASAN build that I've built myself, so I can provide a changeset if I can reproduce. Olli, is there anywhere else I can look for the changeset in that build?
(In reply to Kamil Jozwiak [:kjozwiak] from comment #8)
> Chris, is this the same behavior you noticed when doing your mega upload?

Yep, that definitely looks and sounds like the same flavor of symptoms.
Olli, here's the one that I received when using m-c asan changeset ec1351f9bc58: (I'll try it with m-r asan next)

ASAN:SIGSEGV
=================================================================
==5160==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000047c0ce sp 0x7f004ce64d20 bp 0x7f004ce64d30 T2)
    #0 0x47c0cd in mozalloc_abort mozalloc_abort.cpp:33
    #1 0x7f0058980d15 in Abort nsDebugImpl.cpp:475
    #2 0x7f00589809dc in NS_DebugBreak nsDebugImpl.cpp:428
    #3 0x7f00593c7255 in ~Logger logging.cc:47
    #4 0x7f00593e4dc6 in LogWrapper logging.h:59
    #5 0x7f00593e4c5c in RandInt rand_util.cc:20
    #6 0x7f0059415d6a in GenerateRandomChannelID child_process_info.cc:58
    #7 0x7f0059455581 in InitializeChannel GeckoChildProcessHost.cpp:408
    #8 0x7f0059454b9b in RunPerformAsyncLaunch GeckoChildProcessHost.cpp:488
    #9 0x7f005945bde5 in DispatchToMethod<mozilla::ipc::GeckoChildProcessHost, bool (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > >, base::ProcessArchitecture), std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > >, base::ProcessArchitecture> tuple.h:400
    #10 0x7f00593cd5f4 in RunTask message_loop.cc:361
    #11 0x7f00593ce6a7 in DoWork message_loop.cc:456
    #12 0x7f00593d1d9c in Run message_pump_libevent.cc:328
    #13 0x7f00593cb8cc in RunInternal message_loop.cc:233
    #14 0x7f00593f5273 in ThreadMain thread.cc:170
    #15 0x7f00593f580c in ThreadFunc platform_thread_posix.cc:39
    #16 0x7f006a945181 in start_thread pthread_create.c:312 (discriminator 2)
    #17 0x7f0069a4647c in clone clone.S:111

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 ??
Thread T2 (Gecko_IOThread) created by T0 here:
    #0 0x424cf6 in __interceptor_pthread_create _asan_rtl_
    #1 0x7f00593f4c43 in CreateThread platform_thread_posix.cc:144
    #2 0x7f0058b162e0 in NS_InitXPCOM2 XPCOMInit.cpp:544
    #3 0x7f006044c37b in Initialize nsAppRunner.cpp:1394
    #4 0x7f006044d295 in XRE_main nsAppRunner.cpp:4472
    #5 0x47b07a in do_main nsBrowserApp.cpp:294
    #6 0x7f006996dec4 in __libc_start_main libc-start.c:287

==5160==ABORTING
Flags: needinfo?(cjones.bugs)
I got the following with the latest version of asan m-r using changeset b95583c8e7e7, which looks similar to the one I got on m-c in comment #15:

[Parent 40458] ###!!! ABORT: file /home/kjozwiak/code/mozilla-release/ipc/chromium/src/base/rand_util_posix.cc, line 19
[Parent 40458] ###!!! ABORT: file /home/kjozwiak/code/mozilla-release/ipc/chromium/src/base/rand_util_posix.cc, line 19
ASAN:SIGSEGV
=================================================================
==40458==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f5b9b278b3e sp 0x7f5b7d32fd20 bp 0x7f5b7d32fd30 T4)
    #0 0x7f5b9b278b3d in mozalloc_abort mozalloc_abort.cpp:37
    #1 0x7f5b8992b1c5 in Abort nsDebugImpl.cpp:469
    #2 0x7f5b8992ae8c in NS_DebugBreak nsDebugImpl.cpp:426
    #3 0x7f5b8a3441c5 in ~Logger logging.cc:47
    #4 0x7f5b8a300026 in LogWrapper logging.h:59
    #5 0x7f5b8a351c4c in RandInt rand_util.cc:20
    #6 0x7f5b8a376e50 in GenerateRandomChannelID child_process_info.cc:58
    #7 0x7f5b8a3a4db1 in InitializeChannel GeckoChildProcessHost.cpp:411
    #8 0x7f5b8a3a43cb in RunPerformAsyncLaunch GeckoChildProcessHost.cpp:486
    #9 0x7f5b8a3ab5a5 in DispatchToMethod<mozilla::ipc::GeckoChildProcessHost, bool (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > >, base::ProcessArchitecture), std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > >, base::ProcessArchitecture> tuple.h:400
    #10 0x7f5b8a34a334 in RunTask message_loop.cc:361
    #11 0x7f5b8a34b3e7 in DoWork message_loop.cc:447
    #12 0x7f5b8a2faeac in Run message_pump_libevent.cc:328
    #13 0x7f5b8a34860c in RunInternal message_loop.cc:233
    #14 0x7f5b8a369c69 in ThreadMain thread.cc:170
    #15 0x7f5b8a2fc67c in ThreadFunc platform_thread_posix.cc:39
    #16 0x7f5b9ae86181 in start_thread pthread_create.c:312 (discriminator 2)
    #17 0x7f5b99f8747c in clone clone.S:111

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 ??
Thread T4 (Gecko_IOThread) created by T0 here:
    #0 0x424996 in __interceptor_pthread_create _asan_rtl_
    #1 0x7f5b8a2fc47e in CreateThread platform_thread_posix.cc:144
    #2 0x7f5b8a3696b8 in StartWithOptions thread.cc:92
    #3 0x7f5b89ab03f0 in NS_InitXPCOM2 XPCOMInit.cpp:538
    #4 0x7f5b90bf17a4 in Initialize nsAppRunner.cpp:1389
    #5 0x7f5b90bf276d in XRE_main nsAppRunner.cpp:4507
    #6 0x47ad1a in do_main nsBrowserApp.cpp:292
    #7 0x7f5b99eaeec4 in __libc_start_main libc-start.c:287

==40458==ABORTING
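One rough way to check that the m-c and m-r traces really are "similar" is to compare the function names frame by frame, ignoring addresses and line numbers. The Python sketch below does that for the top frames copied from the two traces in this bug. The helper names (frame_functions, common_prefix_len) are hypothetical and not part of any Mozilla or AddressSanitizer tooling, and the regular expression is only an assumption about the "#N 0xADDR in FUNCTION ..." frame layout seen here.

```python
import re
from typing import List

# Matches "#N 0xADDR in FUNCTION"; the function name stops at whitespace or
# at "<" so template arguments are dropped.
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+([^\s<]+)")


def frame_functions(trace: str) -> List[str]:
    """Return the function name of every frame found in an ASan trace."""
    return [m.group(2) for m in FRAME_RE.finditer(trace)]


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Count how many leading frames share the same function name."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


# Top frames copied from the m-c trace (changeset ec1351f9bc58).
MC_TRACE = """
#0 0x47c0cd in mozalloc_abort mozalloc_abort.cpp:33
#1 0x7f0058980d15 in Abort nsDebugImpl.cpp:475
#2 0x7f00589809dc in NS_DebugBreak nsDebugImpl.cpp:428
#3 0x7f00593c7255 in ~Logger logging.cc:47
#4 0x7f00593e4dc6 in LogWrapper logging.h:59
#5 0x7f00593e4c5c in RandInt rand_util.cc:20
"""

# Top frames copied from the m-r trace (changeset b95583c8e7e7).
MR_TRACE = """
#0 0x7f5b9b278b3d in mozalloc_abort mozalloc_abort.cpp:37
#1 0x7f5b8992b1c5 in Abort nsDebugImpl.cpp:469
#2 0x7f5b8992ae8c in NS_DebugBreak nsDebugImpl.cpp:426
#3 0x7f5b8a3441c5 in ~Logger logging.cc:47
#4 0x7f5b8a300026 in LogWrapper logging.h:59
#5 0x7f5b8a351c4c in RandInt rand_util.cc:20
"""

mc_frames = frame_functions(MC_TRACE)
mr_frames = frame_functions(MR_TRACE)
shared = common_prefix_len(mc_frames, mr_frames)
print(f"{shared} of {len(mc_frames)} leading frames match by function name")
```

Matching on function names alone is deliberately loose: it tolerates the line-number differences between branches, but it cannot tell apart two unrelated functions that happen to share a name (for example the several Run frames in these traces).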
Keywords: mlk, sec-moderate
Whiteboard: The crazy behavior is more concerning than the eventual OOM crashes.
Group: core-security → dom-core-security
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
Group: dom-core-security