frequent crashes when visiting linkedin.com profiles

Status: NEW
Assignee: Unassigned
Product: Core
Component: General
Priority: --
Severity: critical
Opened: 2 years ago
Updated: 3 months ago
Reporter: Rolf Leggewie
Blocks: 1 bug
Version: 40 Branch
Hardware: x86
OS: Linux
Keywords: crash, perf, qawanted
Tracking flags: platform-rel +
Whiteboard: [platform-rel-Linkedin][qf-]
Attachments: 10 attachments, 3 obsolete attachments

(Reporter)

Description

2 years ago
For quite a while I've been getting reproducible crashes when visiting profile pages on linkedin.com.  Going by https://community.linkedin.com/questions/2148/mozilla-firefox-not-loading-linkedin-pages.html others seem to be experiencing similar issues.

What happens is that upon visiting one of the triggering pages, FF will become unresponsive, grey out, consume 100% of CPU and have a quick increase in RAM usage until it crashes.

Example crashes
bp-40972f72-d0a9-4347-9ac3-504672150915
bp-890c81e7-827c-46eb-9540-9b4152150915

possibly related bugs
bug 1165934
bug 1140519
bug 1098484
(Reporter)

Comment 1

2 years ago
I've deleted cookies to no avail.
(Reporter)

Updated

2 years ago
OS: Unspecified → Linux
Hardware: Unspecified → x86
(Reporter)

Comment 2

2 years ago
I'm running Ubuntu Trusty.
(Reporter)

Comment 3

2 years ago
This is reproducible in safe-mode.

Updated

2 years ago
See Also: → bug 1165934

Updated

2 years ago
Severity: normal → critical
Crash Signature: [@ OOM | small ]
Keywords: crash

Comment 4

2 years ago
Are you still seeing this? If so, is there any chance that you can do some process sampling or checks with the gecko profiler ( https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem ) to get a better idea of what's hanging here?
Component: General → Untriaged
Flags: needinfo?(bugzilla.mozilla.org)
(Reporter)

Comment 5

2 years ago
I am sorry about the delay in responding.

The problem is still the same.  I will have to read up on the linked page before I can provide the additional information requested.
Flags: needinfo?(bugzilla.mozilla.org)
(Reporter)

Comment 6

2 years ago
Apparently, this is the #1 crasher for FF44 with a number of open bug tickets for it.

http://wc.devil.mx/proxify.php?u=rSKmbsBFPEniK20TNhrhQ6p1BcqcoIq5JkQgHVP5L%2F3arQZvCO3K6n4K16RUiUFttnLxX94E8bhtOe530xhnPq%2BU93Z6boDCc0pGvQ%3D%3D
Created attachment 8741682 [details]
Screenshot.png

Hi,

The issue is pretty hard to reproduce, but it's still reproducible on the latest Firefox release (45.0.2, Build ID 20160407164938). I can only reproduce if opening Firefox and going to LinkedIn.com is the first thing I do once Linux x32 is up and running. Opening 5-6 tabs of people's profiles results in a browser hang, unresponsive slow script dialog and eventually crash (bp-8354bf87-5323-476f-8e47-2ec202160415). I've tried to get a gecko profile, but it has proved impossible due to the immediate hang.
As the browser hangs the CPU usage is all over the place, see the attached screenshot.
The issue is only reproducible on Linux x32, and it is no longer reproducible if you have worked on the machine before attempting to reproduce it.

Thanks,
Cipri
(In reply to Rolf Leggewie from comment #6)
> Apparently, this is the #1 crasher for FF44 with a number of open bug
> tickets for it.

Unfortunately, OOM|small can happen for a lot of different reasons (you'll see that basically any time Firefox fails to allocate a small amount of memory due to lack of contiguous address space), so that by itself doesn't say much. What matters more for diagnosing what's wrong here is what's below that (i.e. where we were trying to allocate memory when we failed to do so). Unfortunately, that's also broken for some reason (we should be seeing more useful call stack information than just libxul.so at random addresses)!

I don't understand why we're not getting usable crash stacks in the submitted crash reports. I know at least the people on my team have been explicitly using official Mozilla binaries rather than Ubuntu's while trying to reproduce. Ted, do you have any thoughts for what might be going on?
Flags: needinfo?(ted)
Are these Ubuntu binaries? The build ID there doesn't match the build id from our official 40.0.3 Linux x86 build:
http://ftp.mozilla.org/pub/firefox/candidates/40.0.3-candidates/build1/linux-i686/en-US/firefox-40.0.3.txt

If Ubuntu builds are missing symbols we need Chris Coulson to help.
Flags: needinfo?(ted)
OK, there's another issue there (from the modules tab):
 Ø libxul.so 		000000000000000000000000000000000	libxul.so

This is probably due to us trying to `mmap` libxul and failing due to OOM (or address space exhaustion or fragmentation or whatever). Breakpad generates the debug identifiers there by `mmap`ing each file and reading some data out of it:
https://chromium.googlesource.com/breakpad/breakpad/+/master/src/client/linux/minidump_writer/linux_dumper.cc#148

On Windows we VirtualAlloc some extra memory and free it in the pre-minidump-writing callback to work around this problem:
https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/toolkit/crashreporter/nsExceptionHandler.cpp#396

We could do the same thing on Linux (although it's probably only necessary on Linux/x86, since x86-64 has lots of address space). Alternately we could try to fix Breakpad to not `mmap` the entire file.
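
The reserve-then-release idea described above can be sketched roughly as follows. This is only an illustration in Python, not the crash reporter's actual code: the class name `AddressSpaceReserve`, the `on_crash` helper, the `write_dump` callback and the 8 MiB size are all hypothetical.

```python
import mmap

RESERVE_BYTES = 8 * 1024 * 1024  # hypothetical size, not taken from the Firefox source


class AddressSpaceReserve:
    """Hold an anonymous mapping so its address space can be handed back later."""

    def __init__(self, size=RESERVE_BYTES):
        # Anonymous mapping (fileno -1): grabs address space at startup,
        # while plenty is still available.
        self._block = mmap.mmap(-1, size)
        self.size = size

    @property
    def held(self):
        return self._block is not None

    def release(self):
        """Unmap the reserve; safe to call more than once."""
        if self._block is not None:
            self._block.close()
            self._block = None


def on_crash(reserve, write_dump):
    # Free the reserve first so the dump writer has room for its own mappings.
    reserve.release()
    return write_dump()
```

The point of the pattern is ordering: the reserve is taken early and released immediately before the dump writer starts mapping files, so the writer is less likely to fail for the same reason the process crashed.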

Comment 12

2 years ago
Are you able to get a short profile?
 https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler
Flags: needinfo?(bugzilla.mozilla.org)

Comment 13

2 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> ...
> We could do the same thing on Linux (although it's probably only necessary
> on Linux/x86, since x86-64 has lots of address space). Alternately we could
> try to fix Breakpad to not `mmap` the entire file.

followup bug?
Flags: needinfo?(ted)

Comment 14

2 years ago
Created attachment 8764649 [details]
Screenshot from 2016-06-18 21-58-14_StartingLinkedIn.png

StartingLinkedIn

Comment 15

2 years ago
Created attachment 8764650 [details]
Screenshot from 2016-06-18 22-00-24_MemoryEatenUp.png

MemoryEatenUp, but swap hardly touched

Comment 16

2 years ago
Created attachment 8764651 [details]
Screenshot from 2016-06-18 22-01-40_SystemMonitor.png

System Monitor

Comment 17

2 years ago
Created attachment 8764654 [details]
Screenshot from 2016-06-18 22-02-56_AllMemoryUsed.png

All memory used now

Comment 18

2 years ago
Created attachment 8764656 [details]
Screenshot from 2016-06-18 22-03-27_killed Firefox.png

After killing Firefox CPU is freed and memory released.
Attachment #8764651 - Attachment is obsolete: true

Comment 19

2 years ago
Created attachment 8764658 [details]
Screenshot from 2016-06-18 22-01-40_SystemMonitor.png

Ubuntu System monitor

Comment 20

2 years ago
In my case, with Firefox 45.0 on Ubuntu 16.04 LTS 64-bit, LinkedIn's "People you may know" page gobbles up all memory. I have attached the screenshots above to illustrate the issue. I will run the Gecko profiler and report back.
I've managed to get a crash on the latest Nightly:
https://crash-stats.mozilla.com/report/index/75c61a78-5dff-4efc-a4bb-843d82160623
Component: Untriaged → DOM: CSS Object Model
Product: Firefox → Core
For an out-of-memory crash, the stack is uninteresting.  What is needed is knowledge of what memory usage is increasing.  Can you keep an eye on about:memory and see what memory usage is going up?
Component: DOM: CSS Object Model → General
Flags: needinfo?(ciprian.muresan)
Keywords: qawanted
Created attachment 8764860 [details]
linkedin.memory2.txt

I've attached the logs I got through about:memory, but honestly I'm not entirely sure of how to interpret it. Could you please help me understand this?
Flags: needinfo?(ciprian.muresan) → needinfo?(dbaron)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #13)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> > ...
> > We could do the same thing on Linux (although it's probably only necessary
> > on Linux/x86, since x86-64 has lots of address space). Alternately we could
> > try to fix Breakpad to not `mmap` the entire file.
> 
> followup bug?

Feel free to file one.
Flags: needinfo?(ted)
This:
│  ├──289.36 MB (56.11%) ── purple-buffer
seems the most suspicious.

mccr8, could you take a look, or if not, bounce to somebody else?
Flags: needinfo?(dbaron) → needinfo?(continuation)
Maybe Olli could take a look on Tuesday. If not, I should be able to on Wednesday or Thursday.
I have seen a large-ish purple buffer a few times, though I can't reproduce the rest of it. I'll try to figure out what is in the purple buffer.
Assignee: nobody → continuation
Well, I can't reproduce the issue here, and I am not seeing a large purple buffer, so I am not going to be able to investigate this.
Assignee: continuation → nobody
Flags: needinfo?(continuation)
platform-rel: --- → ?
Whiteboard: [platform-rel-Linkedin]
platform-rel: ? → +
Florin, is it possible to get some help with QA on LinkedIn profiling to see if this is still happening in case there have been design changes on their end since this was filed?
Flags: needinfo?(florin.mezei)
(Reporter)

Comment 30

11 months ago
I'm sorry for not getting back to this ticket sooner.  I continued to experience this problem reproducibly until I switched my i386 system to an amd64 kernel and amd64 Firefox.  It seems the larger 64-bit address space makes the problem harder to reproduce there.
Flags: needinfo?(bugzilla.mozilla.org)
Ciprian, could you please take a look and see if this is still reproducible on latest Firefox?
Flags: needinfo?(florin.mezei) → needinfo?(ciprian.muresan)
Created attachment 8837539 [details]
memory-report-linkedin.json.gz

The issue is still reproducible on the latest Firefox release (51.0.1) and on the latest Nightly (54.0a1, Build ID 20170214110212).

bp-678566e5-5847-459c-a84c-985ab2170215 - Release crash
bp-e8d90d36-78a3-43a1-ad10-128762170215 - Nightly crash

Attached about:memory logs from when the issue started to appear.
Flags: needinfo?(ciprian.muresan)
Created attachment 8837540 [details]
memory-report-linkedin2.json.gz

Attached about:memory logs taken after I let the issue manifest itself for a while.
Is that the memory log from only the parent process? Can you attach one from the content process, please?
Flags: needinfo?(ciprian.muresan)
(In reply to Ciprian Muresan [:cmuresan] from comment #32)
> bp-e8d90d36-78a3-43a1-ad10-128762170215 - Nightly crash

The stack is around allocating from GetBorderTopWidth.
Created attachment 8837987 [details]
memory-report-linkedin3.json.gz

Apparently, if I wait too long before saving memory logs, it won't save logs for the content process.
Attachment #8837539 - Attachment is obsolete: true
Attachment #8837540 - Attachment is obsolete: true
Flags: needinfo?(ciprian.muresan)
Thanks, Ciprian. Olli, WDYT about the memory report here? It doesn't look like a whole lot of memory used to me.
Flags: needinfo?(bugs)
Nick, do you have any thoughts on how we could capture what's causing the OOM here?
Flags: needinfo?(n.nethercote)
Various things...

On Windows we have a mechanism that periodically takes memory reports if we're close to running out of address space, and the most recent memory report gets incorporated into the crash report. But that doesn't run on Linux. So the best way to make progress is to get memory reports from about:memory as close to the point of crashing as possible.
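
As a complement to about:memory, the vsize and resident figures for a Firefox process on Linux could also be logged from outside the browser while reproducing, which still works if the tab is hung. A minimal sketch, assuming the usual `VmSize:` and `VmRSS:` lines of `/proc/<pid>/status`; the helper names are made up for illustration.

```python
import re

# Matches lines such as "VmSize:\t 2269358 kB" in /proc/<pid>/status.
_FIELD = re.compile(r"^(VmSize|VmRSS):\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vm_status(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    return {name: int(kb) for name, kb in _FIELD.findall(status_text)}


def read_vm_status(pid):
    """Read and parse the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as f:
        return parse_vm_status(f.read())
```

Calling `read_vm_status(pid)` in a loop with the content process pid would give a rough timeline of address-space growth up to the crash.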

The "linkedin.memory2.txt" attachment *does* have the content process. Search for "Web Content (pid 16186)".

The only suspicious thing I see in the "linkedin.memory2.txt" attachment is the high purple buffer value, which dbaron mentioned in comment 25. It's possible that was a temporary transient thing. Other than that, the most important numbers are as follows.

> Main Process
>   340.95 MB ── resident
> 1,045.40 MB ── vsize
> 
> Web Content (pid 16186)
>   624.46 MB ── resident
> 2,216.17 MB ── vsize

Those look pretty normal. It's typical on Linux for vsize to be significantly higher than resident. (E.g. in my current Linux session I have 278 & 1206 in the main process and 410 & 1024 in the content process.) I wouldn't expect this to be a problem... unless you're running a 32-bit build, which is not standard on Linux. Rolf said in comment 30 that he was running a 32-bit build, then switched to 64-bit and the problem went away. And Ciprian is also running a 32-bit build (comment 7).
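
To make the 32-bit concern concrete, here is a rough back-of-the-envelope calculation. It assumes a 3 GiB user address space (a common split on 32-bit Linux kernels, not something confirmed in this bug) and treats the MB figures from about:memory as MiB.

```python
MIB = 1024 * 1024

# Assumption: 3 GiB of user address space on a 32-bit Linux kernel.
USER_SPACE_32BIT = 3 * 1024 * MIB


def address_space_used(vsize_mib, limit_bytes):
    """Return the fraction of the given address-space limit taken by vsize."""
    return (vsize_mib * MIB) / limit_bytes


# vsize values quoted from linkedin.memory2.txt above.
for name, vsize in [("Main Process", 1045.40), ("Web Content", 2216.17)]:
    frac = address_space_used(vsize, USER_SPACE_32BIT)
    print(f"{name}: {frac:.0%} of the assumed 32-bit user address space")
```

Under those assumptions the content process is already using roughly 72% of its address space, so fragmentation could plausibly make even small allocations fail.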

In the memory-report-linkedin3.json.gz attachment, the only surprising things I see are these:

> Web Content (pid 2561)
> 
>    ├────459 (12.96%) -- top(https://www.linkedin.com/, id=6442450949)/active
>    │    ├──389 (10.99%) -- window(https://www.linkedin.com/)/dom
>    │    │  ├──378 (10.67%) ── event-listeners
>    │    │  └───11 (00.31%) ── event-targets
>    │    ├───37 (01.04%) ++ window(https://ad-emea.doubleclick.net/adi/linkedin.dart/oz-winner;optout=false;lang=en;tile=2;sz=300x250;s=0;v=6;u=cQENd4lKkTBjdD1LnSRvgz9f;mod=50;title=en;func=qa;coid=1963799;ind=4;occ=407;pocc=3;pocc=8687;pocc=3076;cntry=ro;reg=0;sub=0;gdr=m;seg=9005;seg=548;sjt=554;tile_p=2;adsuite=v2.2.6-min;sfadapter=t;ord=3358423959628?li_ads_3p=control)/dom
>    │    └───33 (00.93%) ++ (9 tiny)
>    ├────342 (09.66%) -- top(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH, id=6442450959)/active
>    │    ├──257 (07.26%) -- window(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH)/dom
>    │    │  ├──255 (07.20%) ── event-listeners
>    │    │  └────2 (00.06%) ── event-targets
>    │    └───85 (02.40%) ++ (12 tiny)
>    ├────333 (09.40%) -- top(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH, id=6442450953)/active
>    │    ├──257 (07.26%) -- window(https://www.linkedin.com/in/hanspeschke?authType=NAME_SEARCH&authToken=jgA3&locale=de_DE&srchid=4932510911487150419061&srchindex=1&srchtotal=229198&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A4932510911487150419061%2CVSRPtargetId%3A361578890%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue%2CauthType%3ANAME_SEARCH)/dom
>    │    │  ├──255 (07.20%) ── event-listeners
>    │    │  └────2 (00.06%) ── event-targets
> 
> 2,116 (100.0%) -- observer-service-suspect
> ├────522 (24.67%) ── referent(topic=memory-pressure)
> ├────205 (09.69%) ── referent(topic=xpcom-shutdown)
> ├────159 (07.51%) ── referent(topic=dom-private-storage2-changed)
> ├────159 (07.51%) ── referent(topic=dom-storage2-changed)
> ├────159 (07.51%) ── referent(topic=network:offline-status-changed)
> ├────138 (06.52%) ── referent(topic=service-worker-get-client)
> ├────114 (05.39%) ── referent(topic=chrome-flush-skin-caches)
> ├────110 (05.20%) ── referent(topic=agent-sheet-added)
> ├────110 (05.20%) ── referent(topic=agent-sheet-removed)
> ├────110 (05.20%) ── referent(topic=author-sheet-added)
> ├────110 (05.20%) ── referent(topic=author-sheet-removed)
> ├────110 (05.20%) ── referent(topic=user-sheet-added)
> └────110 (05.20%) ── referent(topic=user-sheet-removed)
> 
> Main Process
> 
> 2,163 (100.0%) -- event-counts
> ├──2,145 (99.17%) -- window-objects
> │  ├──1,695 (78.36%) -- top(chrome://browser/content/browser.xul, id=3)/active
> │  │  ├──1,693 (78.27%) -- window(chrome://browser/content/browser.xul)/dom
> │  │  │  ├──1,661 (76.79%) ── event-listeners
> │  │  │  └─────32 (01.48%) ── event-targets
> │  │  └──────2 (00.09%) ── window(about:blank)/dom/event-targets [2]
> │  ├────200 (09.25%) -- top(about:memory, id=106)/active
> │  │    ├──184 (08.51%) -- window(about:newtab)/dom
> │  │    │  ├──183 (08.46%) ── event-listeners
> │  │    │  └────1 (00.05%) ── event-targets
> │  │    └───16 (00.74%) ++ window(about:memory)/dom
> │  ├────198 (09.15%) -- top(about:newtab, id=150)/active/window(about:newtab)/dom
> │  │    ├──197 (09.11%) ── event-listeners
> │  │    └────1 (00.05%) ── event-targets

That seems like a lot of event listeners and observers. Not sure what to make
of that. It could just be LinkedIn, or maybe we have some kind of leak? The fact that we have many event listeners in the main process is interesting.
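
For anyone who wants to compare these counts across several saved reports, a small script along these lines might help. It is a sketch that assumes the saved about:memory file is JSON with a top-level `reports` list whose entries carry `process`, `path` and `amount` fields; the function names are made up for illustration.

```python
import gzip
import json
from collections import defaultdict


def load_report(path):
    """Load a memory report saved from about:memory (.json or .json.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def totals_by_process(report, leaf):
    """Sum 'amount' per process over entries whose path ends in /<leaf>."""
    totals = defaultdict(int)
    for entry in report.get("reports", []):
        if entry["path"].endswith("/" + leaf):
            totals[entry["process"]] += entry["amount"]
    return dict(totals)
```

For example, `totals_by_process(load_report("memory-report-linkedin3.json.gz"), "event-listeners")` would give one total per process, which makes it easier to see whether the listener count keeps growing between two snapshots.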

Anyway, I think the biggest question here is about 32-bit builds on Linux. I
don't think we distribute them. Are they a high priority? A standard 64-bit
build should avoid these problems.
Flags: needinfo?(n.nethercote)
Whiteboard: [platform-rel-Linkedin] → [platform-rel-Linkedin][qf]

Comment 40

9 months ago
This is incredibly reproducible on Firefox 52.0.2 64-bit on Windows 10. Merely scrolling through your social feed, or even worse, checking messages will kill the content process. I've eliminated all extensions (except for the Gecko profiler), and I even set dom.ipc.processCount to 50 to ensure that my tab was in its own process, thus eliminating other website interference.

Comment 41

9 months ago
Scrolling down LinkedIn keeps creating more and more DOM content, so that naturally takes more and more memory. How much is hard to say.
(This could be something totally different, but it reminds me a bit of a similar issue on Facebook, where they kept adding more and more objects to some array.)

But so far I haven't managed to reproduce this on FF. No unexpected memory usage or slowness.

I did get some weird behavior in Chrome. After scrolling down for awhile, scrolling became really jank-y and the relevant process started to take some CPU constantly.

Kenan, could you perhaps use about:memory when you start to see some badness? Hopefully the child process isn't completely blocked and about:memory can still get a memory report.
Flags: needinfo?(bugs) → needinfo?(koenigseggcc)

Updated

9 months ago
Whiteboard: [platform-rel-Linkedin][qf] → [platform-rel-Linkedin][qf-]

Comment 42

9 months ago
I reported this issue, or at least the issue I see, to LinkedIn.

Comment 43

9 months ago
Hi Olli,

I have an about:memory measurement from my session as the tab was loading. However, to clarify, the content process dies (i.e. it says 'Gah. Your tab just crashed.') pretty much 100% of the time when browsing to messages. I've seen this on every computer I use on Windows 10.

The profiles have a bunch of tabs in them, so if you search for 'linkedin' I think that will give you what you want.

Comment 44

9 months ago
Created attachment 8855101 [details]
tab load memory-report.json

Memory profile when loading LinkedIn in new tab.

Comment 45

9 months ago
Created attachment 8855102 [details]
after crash memory-report.json

Memory profile after LinkedIn crashes its content process.
Flags: needinfo?(koenigseggcc)

Comment 46

9 months ago
Hmm, is the tab load memory-report somehow busted? I can't seem to load it locally in about:memory.

Oh, btw, remove Gecko Profiler if you're seeing crashes.

Comment 47

9 months ago
Kenan, do you get crash-reports?

Updated

9 months ago
Depends on: 1354047

Comment 48

9 months ago
I've disabled the Gecko profiler. That seems to have made the crashes less likely, but they still happen. I do get crash reports, and I just added a comment to one of them referencing the link to this ticket.

Comment 49

9 months ago
Could you give a link to the relevant crash report here?

Comment 51

9 months ago
"Uptime 	16 seconds" hints that it isn't about LinkedIn.
And the stack trace shows that the crash is in the Gecko profiler.
(In reply to Kenan from comment #50)
> I think this is one of them.
> https://crash-stats.mozilla.com/report/index/6bcec12d-6ca5-4bc9-916a-0d5db2170408

See bug 1330193