Open Bug 1414593 Opened 7 years ago Updated 2 years ago

Permaorange Ccov tp5o_webext | application crashed [@ libc-2.23.so + 0x35428]

Categories

(Firefox :: New Tab Page, defect, P3)

Tracking

()

Tracking Status
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, regression, Whiteboard: [stockwell unknown] [AS61MVP])

Crash Data

Summary: Intermittent tp5o_webext | application crashed [@ libc-2.23.so + 0x35428] → Permaorange Ccov tp5o_webext | application crashed [@ libc-2.23.so + 0x35428]
- filed 4 days ago
- 33 failures in the last 7 days
- occurs only on linux64-ccov platform and opt build type
- recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=143018864&lineNumber=17201
Flags: needinfo?(rwood)
Whiteboard: [stockwell needswork]
The root cause here is an Activity Stream update in bug 1410541:
https://hg.mozilla.org/mozilla-central/rev/d1f0c44b2d79

Some related parts of the crash log:
[task 2017-11-07T20:09:24.851Z] 20:09:24     INFO -  PROCESS-CRASH | tp5o_webext | application crashed [@ libc-2.23.so + 0x35428]
[task 2017-11-07T20:09:24.852Z] 20:09:24     INFO -  Crash dump filename: /tmp/tmpEOfFTQ/profile/minidumps/3804c9ff-92fb-de08-5dd8-326d2f3cc90d.dmp
[task 2017-11-07T20:09:24.853Z] 20:09:24     INFO -  Operating system: Linux
[task 2017-11-07T20:09:24.853Z] 20:09:24     INFO -                    0.0.0 Linux 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64
[task 2017-11-07T20:09:24.854Z] 20:09:24     INFO -  CPU: amd64
[task 2017-11-07T20:09:24.854Z] 20:09:24     INFO -       family 6 model 62 stepping 4
[task 2017-11-07T20:09:24.855Z] 20:09:24     INFO -       4 CPUs
[task 2017-11-07T20:09:24.856Z] 20:09:24     INFO -  GPU: UNKNOWN
[task 2017-11-07T20:09:24.857Z] 20:09:24     INFO -  Crash reason:  SIGABRT
[task 2017-11-07T20:09:24.857Z] 20:09:24     INFO -  Crash address: 0x3e800000543
[task 2017-11-07T20:09:24.858Z] 20:09:24     INFO -  Process uptime: not available
[task 2017-11-07T20:09:24.858Z] 20:09:24     INFO -  Thread 0 (crashed)
[task 2017-11-07T20:09:24.859Z] 20:09:24     INFO -   0  libc-2.23.so + 0x35428
[task 2017-11-07T20:09:24.860Z] 20:09:24     INFO -      rax = 0x0000000000000000   rdx = 0x0000000000000006
[task 2017-11-07T20:09:24.860Z] 20:09:24     INFO -      rcx = 0xffffffffffffffff   rbx = 0x00007fc81b76e000
[task 2017-11-07T20:09:24.861Z] 20:09:24     INFO -      rsi = 0x0000000000000543   rdi = 0x0000000000000543
[task 2017-11-07T20:09:24.861Z] 20:09:24     INFO -      rbp = 0x00007fc815c701dc   rsp = 0x00007ffcafb9d2d8
[task 2017-11-07T20:09:24.861Z] 20:09:24     INFO -       r8 = 0x0000000002b0eda0    r9 = 0xfeff0011feff0900
[task 2017-11-07T20:09:24.862Z] 20:09:24     INFO -      r10 = 0x0000000000000008   r11 = 0x0000000000000202
[task 2017-11-07T20:09:24.862Z] 20:09:24     INFO -      r12 = 0x0000000000000216   r13 = 0x00007fc815c70390
[task 2017-11-07T20:09:24.866Z] 20:09:24     INFO -      r14 = 0x0000000000002c8e   r15 = 0x0000000002a94cc0
[task 2017-11-07T20:09:24.867Z] 20:09:24     INFO -      rip = 0x00007fc81a2ed428
[task 2017-11-07T20:09:24.867Z] 20:09:24     INFO -      Found by: given as instruction pointer in context
[task 2017-11-07T20:09:24.868Z] 20:09:24     INFO -   1  libc-2.23.so + 0x3702a
[task 2017-11-07T20:09:24.868Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d2e0   rip = 0x00007fc81a2ef02a
[task 2017-11-07T20:09:24.869Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.869Z] 20:09:24     INFO -   2  libfontconfig.so.1.9.0 + 0x261dc
[task 2017-11-07T20:09:24.870Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d398   rip = 0x00007fc815c701dc
[task 2017-11-07T20:09:24.870Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.871Z] 20:09:24     INFO -   3  libfontconfig.so.1.9.0 + 0x26390
[task 2017-11-07T20:09:24.871Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d3a8   rip = 0x00007fc815c70390
[task 2017-11-07T20:09:24.872Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.872Z] 20:09:24     INFO -   4  libc-2.23.so + 0x8453c
[task 2017-11-07T20:09:24.873Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d3c0   rip = 0x00007fc81a33c53c
[task 2017-11-07T20:09:24.873Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.874Z] 20:09:24     INFO -   5  libc-2.23.so + 0x190210
[task 2017-11-07T20:09:24.874Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d3c8   rip = 0x00007fc81a448210
[task 2017-11-07T20:09:24.875Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.875Z] 20:09:24     INFO -   6  libc-2.23.so + 0x193760
[task 2017-11-07T20:09:24.876Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d3d0   rip = 0x00007fc81a44b760
[task 2017-11-07T20:09:24.876Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.877Z] 20:09:24     INFO -   7  libc-2.23.so + 0x190210
[task 2017-11-07T20:09:24.877Z] 20:09:24     INFO -      rsp = 0x00007ffcafb9d3e0   rip = 0x00007fc81a448210
[task 2017-11-07T20:09:24.878Z] 20:09:24     INFO -      Found by: stack scanning
[task 2017-11-07T20:09:24.878Z] 20:09:24     INFO -   8  libfontconfig.so.1.9.0 + 0x261dc
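
A quick sanity check on the frames above, offered as a rough sketch rather than anything from the original analysis: subtracting each module offset from the matching rip should give one load address per module. The sample text copies four frames from the log with the task/INFO prefix removed and the register lines trimmed to rsp and rip. The name `module_bases` is made up for illustration.

```python
import re

# Frames copied from the log above (prefixes and extra registers trimmed).
SAMPLE = """\
 0  libc-2.23.so + 0x35428
    rip = 0x00007fc81a2ed428
 1  libc-2.23.so + 0x3702a
    rsp = 0x00007ffcafb9d2e0   rip = 0x00007fc81a2ef02a
 2  libfontconfig.so.1.9.0 + 0x261dc
    rsp = 0x00007ffcafb9d398   rip = 0x00007fc815c701dc
 3  libfontconfig.so.1.9.0 + 0x26390
    rsp = 0x00007ffcafb9d3a8   rip = 0x00007fc815c70390
"""

FRAME_RE = re.compile(r"^\s*\d+\s+(\S+) \+ (0x[0-9a-f]+)\s*$")
RIP_RE = re.compile(r"\brip = (0x[0-9a-f]+)")


def module_bases(text):
    """Map each module name to the set of load addresses implied by its frames."""
    bases = {}
    current = None  # (module, offset) of the frame header seen most recently
    for line in text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            current = (frame.group(1), int(frame.group(2), 16))
            continue
        rip = RIP_RE.search(line)
        if rip and current:
            module, offset = current
            bases.setdefault(module, set()).add(int(rip.group(1), 16) - offset)
            current = None
    return bases


if __name__ == "__main__":
    for module, addrs in module_bases(SAMPLE).items():
        print(module, [hex(a) for a in sorted(addrs)])
```

Each module resolves to a single base, so the offsets are at least self-consistent. Note that frames 2 and 3 were found by stack scanning and their rip values equal the rbp and r13 values in frame 0, so they may be saved register values rather than real callers.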


:Mardak, I see you authored bug 1410541. Can you help figure out why we crash during code coverage collection when using web extensions in Talos?
Blocks: 1410541
Flags: needinfo?(rwood) → needinfo?(edilee)
How did you bisect and test? It looks like you used `try: -b o -p linux64-ccov -t g5-e10s --artifact`. I'll try to bisect within the Activity Stream changes.

https://github.com/mozilla/activity-stream/compare/6245c446e61bb13fed4bd8f7c600de44d9fed1ef...95b4c35393b7192d680d1291b6960200be5e7570
Flags: needinfo?(edilee) → needinfo?(jmaher)
I did a bunch of try pushes with that syntax, but oddly the jobs were not added. Once the builds were done, I had to use 'add new jobs' for the g5 job.

If you have a way to bisect the activity stream data, I am happy to do another large round of try pushes and babysit them. Likewise, point me at the try pushes you have and I will add the jobs :)
Flags: needinfo?(jmaher)
Thanks for identifying the root cause here. Hopefully this is an easy fix; if not, it can wait a few weeks until resolving it makes the most sense.
Last week's summary:
- 46 total failures
- platform: linux64-ccov
- build: opt

Here is a recent log:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=145858061&lineNumber=17409

And a snippet from it:

[task 2017-11-18T21:58:40.953Z] 21:58:40     INFO -  TEST-INFO | 1337: exit 6
[task 2017-11-18T21:58:40.969Z] 21:58:40     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/ZvMxtzqHRbykZKZOFvT99A/artifacts/public/build/target.crashreporter-symbols.zip
[task 2017-11-18T21:58:48.681Z] 21:58:48     INFO -  mozcrash Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpL6RM1R/profile/minidumps/2d5ec068-bd42-e86b-0206-533542a20a6f.dmp /tmp/tmpkyRSAz
[task 2017-11-18T21:58:56.789Z] 21:58:56     INFO -  mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/2d5ec068-bd42-e86b-0206-533542a20a6f.dmp
[task 2017-11-18T21:58:56.804Z] 21:58:56     INFO -  mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/2d5ec068-bd42-e86b-0206-533542a20a6f.extra
[task 2017-11-18T21:58:56.805Z] 21:58:56     INFO -  PROCESS-CRASH | tp5o_webext | application crashed [@ libc-2.23.so + 0x35428]
[task 2017-11-18T21:58:56.805Z] 21:58:56     INFO -  Crash dump filename: /tmp/tmpL6RM1R/profile/minidumps/2d5ec068-bd42-e86b-0206-533542a20a6f.dmp
[task 2017-11-18T21:58:56.806Z] 21:58:56     INFO -  Operating system: Linux
[task 2017-11-18T21:58:56.806Z] 21:58:56     INFO -                    0.0.0 Linux 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64
[task 2017-11-18T21:58:56.806Z] 21:58:56     INFO -  CPU: amd64
[task 2017-11-18T21:58:56.807Z] 21:58:56     INFO -       family 6 model 62 stepping 4
[task 2017-11-18T21:58:56.807Z] 21:58:56     INFO -       4 CPUs
[task 2017-11-18T21:58:56.808Z] 21:58:56     INFO -  GPU: UNKNOWN
[task 2017-11-18T21:58:56.808Z] 21:58:56     INFO -  Crash reason:  SIGABRT
[task 2017-11-18T21:58:56.808Z] 21:58:56     INFO -  Crash address: 0x3e800000539
[task 2017-11-18T21:58:56.809Z] 21:58:56     INFO -  Process uptime: not available
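
A note on reading the crash address, as a hedged sketch: for a self-raised SIGABRT on Linux, the reported address is most likely not a memory address. Assuming the crash reporter prints the `si_addr` field of `siginfo_t`, which shares storage with `si_pid` and `si_uid`, the low 32 bits would be the sending pid and the high 32 bits the uid. That reading matches the `TEST-INFO | 1337: exit 6` line in the snippet. The helper name below is hypothetical.

```python
def split_sigabrt_address(addr):
    """Split a 64-bit crash address into (high 32 bits, low 32 bits).

    Assumption: high word is the uid, low word is the pid of the process
    that raised the signal.
    """
    return addr >> 32, addr & 0xFFFFFFFF


if __name__ == "__main__":
    # Crash addresses taken from the two logs in this bug.
    for addr in (0x3E800000543, 0x3E800000539):
        uid, pid = split_sigabrt_address(addr)
        print(f"{addr:#x}: uid={uid} pid={pid}")
```

Under that assumption both crashes decode to uid 1000, with pid 1347 for the 2017-11-07 log (matching rdi and rsi = 0x543 there) and pid 1337 for the 2017-11-18 log, which is consistent with the process calling abort() on itself rather than faulting on a bad pointer.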
Flags: needinfo?(edilee)
Flags: needinfo?(edilee) → needinfo?(andrei.br92)
Assignee: nobody → andrei.br92
Flags: needinfo?(andrei.br92)
I have this patch which should account for the extra coverage changes needed:
https://pastebin.mozilla.org/9073271

then:
./mach talos-test -a tp5o_webext
Seems there was a similar crash fixed recently in bug 1411322. I created bug 1419779 for this.
andreio tracked this down to the splitting of `createChannel` to have a separate `simulateMessagesForExistingTabs`:
https://github.com/mozilla/activity-stream/pull/3695/files#diff-a12f94b26c6ac483de6e2a17547ced6bR141
Status: NEW → ASSIGNED
Iteration: --- → 1.26
Component: Talos → Activity Streams: Newtab
Priority: P5 → P2
Product: Testing → Firefox
Version: Version 3 → unspecified
See Also: → 1419779
Iteration: 1.26 → 1.27
Iteration: 1.27 → 60.1 - Jan 29
Iteration: 60.1 - Jan 29 → 60.2 - Feb 12
Whiteboard: [stockwell unknown] → [stockwell unknown] [AS60MVP]
Iteration: 60.2 - Feb 12 → 60.3 - Feb 26
Priority: P2 → P1
Flags: needinfo?(andrei.br92)
Iteration: 60.3 - Feb 26 → 60.4 - Mar 12
Iteration: 60.4 - Mar 12 → ---
Priority: P1 → P3
Iteration: --- → 61.1 - Mar 26
Whiteboard: [stockwell unknown] [AS60MVP] → [stockwell unknown] [AS61MVP]
Iteration: 61.1 - Mar 26 → ---
Iteration: --- → 61.2 - Apr 9
Priority: P3 → P2
Blocks: 1437659
Iteration: 61.2 - Apr 9 → ---
Flags: needinfo?(andrei.br92)
Assignee: andrei.br92 → nobody
Severity: critical → normal
Status: ASSIGNED → NEW
Priority: P2 → P3
Any plan to fix this? Is this leak affecting real users too?
Flags: needinfo?(edilee)
andreio had previously taken a look and sounds like he ran into issues reproducing it on try. The failures seem to be mostly for linux code coverage runs, so probably not affecting real users.
Flags: needinfo?(edilee)
(In reply to Ed Lee :Mardak from comment #19)
> andreio had previously taken a look and sounds like he ran into issues
> reproducing it on try. The failures seem to be mostly for linux code
> coverage runs, so probably not affecting real users.

Usually the only difference with code coverage builds is that they are slower than other builds. Maybe the leak somehow depends on timing.
Component: Activity Streams: Newtab → New Tab Page
Severity: normal → S3