Closed Bug 1205110 Opened 10 years ago Closed 5 years ago

frequent crashes when visiting linkedin.com profiles, often with high purple buffer value

Categories

(Core :: General, defect)

Version: 40 Branch
Hardware: x86
OS: All
Type: defect
Priority: Not set
Severity: critical

Tracking

RESOLVED WORKSFORME
Performance Impact none
Tracking Status
platform-rel --- +

People

(Reporter: bugzilla.mozilla.org, Unassigned)

Details

(Keywords: crash, perf, Whiteboard: [platform-rel-Linkedin])

Attachments

(10 files, 3 obsolete files)

70.83 KB, image/png
85.37 KB, image/png
87.31 KB, image/png
115.05 KB, image/png
102.80 KB, image/png
230.41 KB, image/png
44.83 KB, text/plain
183.79 KB, application/x-gzip
496.19 KB, application/x-gzip
676.80 KB, application/x-gzip
For quite a while I've been getting reproducible crashes when visiting profile pages on linkedin.com. Going by https://community.linkedin.com/questions/2148/mozilla-firefox-not-loading-linkedin-pages.html others seem to be experiencing similar issues. What happens is that upon visiting one of the triggering pages, FF becomes unresponsive, greys out, consumes 100% of CPU, and quickly increases its RAM usage until it crashes. Example crashes: bp-40972f72-d0a9-4347-9ac3-504672150915, bp-890c81e7-827c-46eb-9540-9b4152150915. Possibly related bugs: bug 1165934, bug 1140519, bug 1098484.
I've deleted cookies to no avail.
OS: Unspecified → Linux
Hardware: Unspecified → x86
I'm running Ubuntu Trusty.
This is reproducible in safe-mode.
See Also: → 1165934
Severity: normal → critical
Crash Signature: [@ OOM | small ]
Keywords: crash
Are you still seeing this? If so, is there any chance that you can do some process sampling or checks with the gecko profiler ( https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem ) to get a better idea of what's hanging here?
Component: General → Untriaged
Flags: needinfo?(bugzilla.mozilla.org)
I am sorry about the delay in responding. The problem is still the same. I will need to read up on the linked page before I can provide the additional information requested.
Flags: needinfo?(bugzilla.mozilla.org)
Attached image Screenshot.png
Hi, The issue is pretty hard to reproduce, but it's still reproducible on the latest Firefox release (45.0.2, Build ID 20160407164938). I can only reproduce it if opening Firefox and going to LinkedIn.com is the first thing I do once Linux x32 is up and running. Opening 5-6 tabs of people's profiles results in a browser hang, an unresponsive slow script dialog and eventually a crash (bp-8354bf87-5323-476f-8e47-2ec202160415). I've tried to get a gecko profile, but it has proved impossible due to the immediate hang. As the browser hangs, the CPU usage is all over the place; see the attached screenshot. The issue is only reproducible on Linux x32, and it is no longer reproducible if you have worked on the machine before attempting to reproduce it. Thanks, Cipri
(In reply to Rolf Leggewie from comment #6) > Apparently, this is the #1 crasher for FF44 with a number of open bug > tickets for it. Unfortunately, OOM|small can happen for a lot of different reasons (you'll see it basically any time Firefox fails to allocate a small amount of memory due to lack of contiguous address space), so that by itself doesn't say much. What matters more for diagnosing what's wrong here is what's below that (i.e. where we were trying to allocate memory when we failed to do so). Unfortunately, that's also broken for some reason (we should be seeing more useful call stack information than just libxul.so at random addresses)! I don't understand why we're not getting usable crash stacks in the submitted crash reports. I know at least the people on my team have been explicitly using official Mozilla binaries rather than Ubuntu's while trying to reproduce. Ted, do you have any thoughts on what might be going on?
Flags: needinfo?(ted)
Are these Ubuntu binaries? The build ID there doesn't match the build id from our official 40.0.3 Linux x86 build: http://ftp.mozilla.org/pub/firefox/candidates/40.0.3-candidates/build1/linux-i686/en-US/firefox-40.0.3.txt If Ubuntu builds are missing symbols we need Chris Coulson to help.
Flags: needinfo?(ted)
OK, there's another issue there (from the modules tab): Ø libxul.so 000000000000000000000000000000000 libxul.so This is probably due to us trying to `mmap` libxul and failing due to OOM (or address space exhaustion or fragmentation or whatever). Breakpad generates the debug identifiers there by `mmap`ing each file and reading some data out of it: https://chromium.googlesource.com/breakpad/breakpad/+/master/src/client/linux/minidump_writer/linux_dumper.cc#148 On Windows we VirtualAlloc some extra memory and free it in the pre-minidump-writing callback to work around this problem: https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/toolkit/crashreporter/nsExceptionHandler.cpp#396 We could do the same thing on Linux (although it's probably only necessary on Linux/x86, since x86-64 has lots of address space). Alternately we could try to fix Breakpad to not `mmap` the entire file.
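On Linux, the analogue of the Windows VirtualAlloc trick would be to reserve a block of address space with mmap() early on and unmap it just before the minidump is written, so that Breakpad's own mmap() of each module has room to succeed. The following is a minimal sketch under stated assumptions: it assumes some hook that runs before dumping is available, and the function names, the global variable, and the 16 MiB size are all hypothetical, not taken from nsExceptionHandler.cpp or Breakpad.

```c
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and MAP_NORESERVE in glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical reservation size; a real value would need tuning. */
#define EMERGENCY_RESERVE_BYTES ((size_t)16 * 1024 * 1024)

/* Hypothetical global holding the reserved range (NULL when none). */
static void *g_emergency_reserve = NULL;

/* Call early at startup, while address space is still plentiful.
   PROT_NONE + MAP_NORESERVE claims a range of virtual addresses
   without committing physical memory or swap. Returns 0 on success. */
static int reserve_emergency_address_space(void) {
    void *p = mmap(NULL, EMERGENCY_RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    g_emergency_reserve = p;
    return 0;
}

/* Call from a (hypothetical) hook that runs just before the minidump is
   written. Unmapping the reservation leaves a hole in the address space
   that later mmap() calls, such as the dumper mapping each module file,
   can use. Returns 1 if a reservation was released, 0 if there was none,
   and -1 if munmap() failed. */
static int release_emergency_address_space(void) {
    if (g_emergency_reserve == NULL) {
        return 0;
    }
    if (munmap(g_emergency_reserve, EMERGENCY_RESERVE_BYTES) != 0) {
        return -1;
    }
    g_emergency_reserve = NULL;
    return 1;
}
```

The intended call order is: reserve once at startup, release only on the crash path. As noted above, this is mostly relevant to 32-bit x86, where address space is scarce; the alternative of changing Breakpad to not `mmap` the entire file would remove the need for a reservation altogether.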
Flags: needinfo?(bugzilla.mozilla.org)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11) > ... > We could do the same thing on Linux (although it's probably only necessary > on Linux/x86, since x86-64 has lots of address space). Alternately we could > try to fix Breakpad to not `mmap` the entire file. followup bug?
Flags: needinfo?(ted)
StartingLinkedIn
MemoryEatenUp, but swap hardly touched
System Monitor
All memory used now
After killing Firefox CPU is freed and memory released.
Attachment #8764651 - Attachment is obsolete: true
Ubuntu System monitor
In my case, with Firefox 45.0 on Ubuntu 16.04 LTS 64-bit, LinkedIn's "People you may know" page gobbles up all memory. I have attached the screenshots above to illustrate the issue. I will run the Gecko profiler and report back.
Component: Untriaged → DOM: CSS Object Model
Product: Firefox → Core
For an out-of-memory crash, the stack is uninteresting. What is needed is knowledge of what memory usage is increasing. Can you keep an eye on about:memory and see what memory usage is going up?
Component: DOM: CSS Object Model → General
Flags: needinfo?(ciprian.muresan)
Keywords: qawanted
Attached file linkedin.memory2.txt
I've attached the logs I got through about:memory, but honestly I'm not entirely sure of how to interpret it. Could you please help me understand this?
Flags: needinfo?(ciprian.muresan) → needinfo?(dbaron)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #13) > (In reply to Ted Mielczarek [:ted.mielczarek] from comment #11) > > ... > > We could do the same thing on Linux (although it's probably only necessary > > on Linux/x86, since x86-64 has lots of address space). Alternately we could > > try to fix Breakpad to not `mmap` the entire file. > > followup bug? Feel free to file one.
Flags: needinfo?(ted)
This: │ ├──289.36 MB (56.11%) ── purple-buffer seems the most suspicious. mccr8, could you take a look, or if not, bounce to somebody else?
Flags: needinfo?(dbaron) → needinfo?(continuation)
Maybe Olli could take a look on Tuesday. If not, I should be able to on Wednesday or Thursday.
I have seen a large-ish purple buffer a few times, though I can't reproduce the rest of it. I'll try to figure out what is in the purple buffer.
Assignee: nobody → continuation
Well, I can't reproduce the issue here, and I am not seeing a large purple buffer, so I am not going to be able to investigate this.
Assignee: continuation → nobody
Flags: needinfo?(continuation)
platform-rel: --- → ?
Whiteboard: [platform-rel-Linkedin]
platform-rel: ? → +
Florin, is it possible to get some help with QA on LinkedIn profiling to see if this is still happening in case there have been design changes on their end since this was filed?
Flags: needinfo?(florin.mezei)
I'm sorry for not getting back to this ticket sooner. I continued to experience this problem reproducibly until I switched my i386 system to an amd64 kernel and amd64 Firefox. It seems the larger address space available to 64-bit builds makes the problem harder to reproduce there.
Flags: needinfo?(bugzilla.mozilla.org)
Ciprian, could you please take a look and see if this is still reproducible on latest Firefox?
Flags: needinfo?(florin.mezei) → needinfo?(ciprian.muresan)
Attached file memory-report-linkedin.json.gz (obsolete) —
The issue is still reproducible on the latest Firefox release (51.0.1) and on the latest Nightly (54.0a1, Build ID 20170214110212). bp-678566e5-5847-459c-a84c-985ab2170215 - Release crash bp-e8d90d36-78a3-43a1-ad10-128762170215 - Nightly crash Attached about:memory logs from when the issue started to appear.
Flags: needinfo?(ciprian.muresan)
Attached file memory-report-linkedin2.json.gz (obsolete) —
Attached about:memory logs after I let the issue manifest itself for a while.
Is that the memory log from only the parent process? Can you attach one from the content process, please?
Flags: needinfo?(ciprian.muresan)
(In reply to Ciprian Muresan [:cmuresan] from comment #32) > bp-e8d90d36-78a3-43a1-ad10-128762170215 - Nightly crash The stack is around allocating from GetBorderTopWidth.
Apparently, if I wait too much before saving memory logs, it won't save logs for the content process.
Attachment #8837539 - Attachment is obsolete: true
Attachment #8837540 - Attachment is obsolete: true
Flags: needinfo?(ciprian.muresan)
Thanks, Ciprian. Olli, WDYT about the memory report here? It doesn't look like a whole lot of memory used to me.
Flags: needinfo?(bugs)
Nick, do you have any thoughts on how we could capture what's causing the OOM here?
Flags: needinfo?(n.nethercote)
Various things...

On Windows we have a mechanism that periodically takes memory reports if we're close to running out of address space, and the most recent memory report gets incorporated into the crash report. But that doesn't run on Linux. So the best way to make progress is to get memory reports from about:memory as close to the point of crashing as possible.

The "linkedin.memory2.txt" attachment *does* have the content process. Search for "Web Content (pid 16186)".

The only suspicious thing I see in the "linkedin.memory2.txt" attachment is the high purple buffer value, which dbaron mentioned in comment 25. It's possible that was a temporary transient thing. Other than that, the most important numbers are as follows.

> Main Process
> 340.95 MB ── resident
> 1,045.40 MB ── vsize
>
> Web Content (pid 16186)
> 624.46 MB ── resident
> 2,216.17 MB ── vsize

Those look pretty normal. It's typical on Linux for vsize to be significantly higher than resident. (E.g. in my current Linux session I have 278 & 1206 in the main process and 410 & 1024 in the content process.) I wouldn't expect this to be a problem... unless you're running a 32-bit build, which is not standard on Linux. Rolf said in comment 30 that he was running a 32-bit build, then switched to 64-bit and the problem went away. And Ciprian is also running a 32-bit build (comment 7).
In the memory-report-linkedin3.json.gz attachment, the only surprising things I see are these:

> Web Content (pid 2561)
>
> ├────459 (12.96%) -- top(https://www.linkedin.com/, id=6442450949)/active
> │ ├──389 (10.99%) -- window(https://www.linkedin.com/)/dom
> │ │ ├──378 (10.67%) ── event-listeners
> │ │ └───11 (00.31%) ── event-targets
> │ ├───37 (01.04%) ++ window(https://ad-emea.doubleclick.net/adi/linkedin.dart/oz-winner;optout=false;lang=en;tile=2;sz=300x250;s=0;v=6;u=cQENd4lKkTBjdD1LnSRvgz9f;mod=50;title=en;func=qa;coid=1963799;ind=4;occ=407;pocc=3;pocc=8687;pocc=3076;cntry=ro;reg=0;sub=0;gdr=m;seg=9005;seg=548;sjt=554;tile_p=2;adsuite=v2.2.6-min;sfadapter=t;ord=3358423959628?li_ads_3p=control)/dom
> │ └───33 (00.93%) ++ (9 tiny)
> ├────342 (09.66%) -- top(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH, id=6442450959)/active
> │ ├──257 (07.26%) -- window(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH)/dom
> │ │ ├──255 (07.20%) ── event-listeners
> │ │ └────2 (00.06%) ── event-targets
> │ └───85 (02.40%) ++ (12 tiny)
> ├────333 (09.40%) -- top(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH, id=6442450953)/active
> │ ├──257 (07.26%) -- window(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH)/dom
> │ │ ├──255 (07.20%) ── event-listeners
> │ │ └────2 (00.06%) ── event-targets
>
> 2,116 (100.0%) -- observer-service-suspect
> ├────522 (24.67%) ── referent(topic=memory-pressure)
> ├────205 (09.69%) ── referent(topic=xpcom-shutdown)
> ├────159 (07.51%) ── referent(topic=dom-private-storage2-changed)
> ├────159 (07.51%) ── referent(topic=dom-storage2-changed)
> ├────159 (07.51%) ── referent(topic=network:offline-status-changed)
> ├────138 (06.52%) ── referent(topic=service-worker-get-client)
> ├────114 (05.39%) ── referent(topic=chrome-flush-skin-caches)
> ├────110 (05.20%) ── referent(topic=agent-sheet-added)
> ├────110 (05.20%) ── referent(topic=agent-sheet-removed)
> ├────110 (05.20%) ── referent(topic=author-sheet-added)
> ├────110 (05.20%) ── referent(topic=author-sheet-removed)
> ├────110 (05.20%) ── referent(topic=user-sheet-added)
> └────110 (05.20%) ── referent(topic=user-sheet-removed)
>
> Main Process
>
> 2,163 (100.0%) -- event-counts
> ├──2,145 (99.17%) -- window-objects
> │ ├──1,695 (78.36%) -- top(chrome://browser/content/browser.xul, id=3)/active
> │ │ ├──1,693 (78.27%) -- window(chrome://browser/content/browser.xul)/dom
> │ │ │ ├──1,661 (76.79%) ── event-listeners
> │ │ │ └─────32 (01.48%) ── event-targets
> │ │ └──────2 (00.09%) ── window(about:blank)/dom/event-targets [2]
> │ ├────200 (09.25%) -- top(about:memory, id=106)/active
> │ │ ├──184 (08.51%) -- window(about:newtab)/dom
> │ │ │ ├──183 (08.46%) ── event-listeners
> │ │ │ └────1 (00.05%) ── event-targets
> │ │ └───16 (00.74%) ++ window(about:memory)/dom
> │ ├────198 (09.15%) -- top(about:newtab, id=150)/active/window(about:newtab)/dom
> │ │ ├──197 (09.11%) ── event-listeners
> │ │ └────1 (00.05%) ── event-targets

That seems like a lot of event listeners and observers. Not sure what to make of that. It could just be LinkedIn, or maybe we have some kind of leak? The fact that we have many event listeners in the main process is interesting.

Anyway, I think the biggest question here is about 32-bit builds on Linux. I don't think we distribute them. Are they a high priority? A standard 64-bit build should avoid these problems.
Flags: needinfo?(n.nethercote)
Whiteboard: [platform-rel-Linkedin] → [platform-rel-Linkedin][qf]
This is incredibly reproducible on Firefox 52.0.2 64-bit on Windows 10. Merely scrolling through your social feed, or even worse, checking messages will kill the content process. I've eliminated all extensions (except for the Gecko profiler), and I even set dom.ipc.processCount to 50 to ensure that my tab was in its own process, thus eliminating other website interference.
Scrolling down LinkedIn keeps creating more and more DOM content, so that naturally takes more and more memory. But how much... that is hard to say. (This could be something totally different, but it reminds me a bit of a similar issue on Facebook, where they kept adding more and more objects to some array.) But so far I haven't managed to reproduce this in FF. No unexpected memory usage or slowness. I did get some weird behavior in Chrome: after scrolling down for a while, scrolling became really janky and the relevant process started to use CPU constantly. Kenan, could you perhaps use about:memory when you start to see some badness? Hopefully the child process isn't completely blocked and about:memory can still get a memory report.
Flags: needinfo?(bugs) → needinfo?(koenigseggcc)
Whiteboard: [platform-rel-Linkedin][qf] → [platform-rel-Linkedin][qf-]
I reported this issue, or at least the issue I see, to LinkedIn.
Hi Olli, I have an about:memory measurement from my session as the tab was loading. However, to clarify, the content process dies (i.e. it says 'Gah. Your tab just crashed.') pretty much 100% of the time when browsing to messages. I've seen this on every computer I use on Windows 10. The profiles have a bunch of tabs in them, so if you search for 'linkedin' I think that will give you what you want.
Memory profile when loading LinkedIn in new tab.
Memory profile after LinkedIn crashes its content process.
Flags: needinfo?(koenigseggcc)
Hmm, is the tab-load memory report somehow busted? I can't seem to load it locally into about:memory. Oh, by the way, remove the Gecko Profiler if you're seeing crashes.
Kenan, do you get crash-reports?
I've disabled the Gecko profiler. That seems to have made the crashes less likely, but they still happen. I do get crash reports, and I just added a comment to one of them with a link to this ticket.
Could you give a link to the relevant crash report here?
"Uptime 16 seconds" hints that it isn't about LinkedIn. And the stack trace shows that the crash is in the Gecko profiler.
(In reply to Kenan from comment #50) > I think this is one of them. > https://crash-stats.mozilla.com/report/index/6bcec12d-6ca5-4bc9-916a-0d5db2170408 See bug 1330193
Keywords: perf
Keywords: qawanted

Can you still reproduce?

Flags: needinfo?(koenigseggcc)
Flags: needinfo?(bugzilla.mozilla.org)
OS: Linux → All
Summary: frequent crashes when visiting linkedin.com profiles → frequent crashes when visiting linkedin.com profiles, often with high purple buffer value

Have not seen this issue for a long while, now on 80.0.1 (64-bit) Ubuntu 20.04.

I think things are fine. Sorry for the slow response.

Flags: needinfo?(koenigseggcc)

Resolving this per the last few comments. Thank you!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(bugzilla.mozilla.org)
Performance Impact: --- → -
Whiteboard: [platform-rel-Linkedin][qf-] → [platform-rel-Linkedin]