591898 - [Linux SeaMonkey 2.1, crashtest] 457362-1.xhtml segfaults on tinderbox but not locally

Reporter

Description

•

15 years ago

On 13th August the SeaMonkey Linux comm-central-trunk debug test crashtest was switched from cn-sea-qm-centos5-01 to cb-seamonkey-linux-01. Since then, crashtest 457362-1.xhtml segfaults on that tinderbox with:

TEST-UNEXPECTED-FAIL | file:///builds/slave/comm-central-trunk-linux-debug-unittest-crashtest/build/reftest/tests/layout/base/crashtests/457362-1.xhtml | Exited with code -11 during test run

If you run the tests locally with:

make crashtest TEST_PATH=layout/base/crashtests/crashtests.list

there is no segfault:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 305 (0 pass, 305 load only)
REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 2 (0 known fail, 0 known asserts, 0 random, 2 skipped, 0 slow)
REFTEST INFO | Total canvas count = 0

The pushes that happened in the window between passing and failing are:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29114207a571&tochange=e1d20276ef6d
http://hg.mozilla.org/comm-central/pushloghtml?fromchange=16d674d435cf&tochange=579f8b02ac29
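Editorial note: the "Exited with code -11" in the failure line above reads like a signal exit. The sketch below assumes the harness reports the status the way Python's subprocess module does, where a negative value -N means the child was terminated by signal N. The helper name describe_exit_code is hypothetical and is not part of the reftest harness.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit status in the convention used by Python's subprocess module."""
    if code < 0:
        # A negative value means the child was terminated by signal number -code.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    return f"exited normally with status {code}"


print(describe_exit_code(-11))
```

Under that assumption, -11 corresponds to signal 11, which is SIGSEGV on Linux and matches the segfault described in this report.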

Shyam Mani [:fox2mike]

Comment 1

•

15 years ago

Seems more like releng territory, punting to them.

Assignee: server-ops → nobody

Component: Server Operations: Tinderbox Maintenance → Release Engineering

QA Contact: mrz → release

Robert Kaiser

Comment 2

•

15 years ago

(In reply to comment #0)
> On 13th August the SeaMonkey Linux comm-central-trunk debug test crashtest was
> switched from cn-sea-qm-centos5-01 to cb-seamonkey-linux-01,

It was never switched between any boxes; it just executes on whatever box is free for it. Those two are just two random possibilities out of the five we have. If it only fails on a single box, it might be a box problem and we need to find it; otherwise this is a code problem, which from all I've seen is way more likely.

Component: Release Engineering → Testing Infrastructure

Product: mozilla.org → SeaMonkey

QA Contact: release → testing-infrastructure

Version: other → Trunk

Robert Kaiser

Updated

•

15 years ago

Summary: Crashtest 457362-1.xhtml segfaults on tinderbox but not locally → [SeaMonkey 2.1, crashtest] 457362-1.xhtml segfaults on tinderbox but not locally

Ian Neal

Reporter

Comment 3

•

15 years ago

It's just strange that the tests work locally but don't on tinderbox, someone with tinderbox access would need to debug the seg fault. Perhaps a package requirement introduced by one of the pushes in http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29114207a571&tochange=e1d20276ef6d

Robert Kaiser

Comment 4

•

15 years ago

(In reply to comment #3)
> It's just strange that the tests work locally but don't on tinderbox, someone
> with tinderbox access would need to debug the seg fault. Perhaps a package
> requirement introduced by one of the pushes in
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29114207a571&tochange=e1d20276ef6d

I haven't seen a trace of any such thing, at least for Linux. But then, all the OSes tested for Firefox are different from what we test right now, because we can't afford Talos minis. They run all their tests exclusively on such boxes, with different OSes than the builder machines, while we run them all on the builder machines. So if anything is needed on test boxes only, we don't see it, and have not much of a chance to get the right packages.

Oh, and I doubt anyone with debug knowledge and access to our build boxes exists, unless Callek knows enough (I don't), and then we have few enough boxes that we can't just set one aside for a longer period to dig in very much.

Bruno 'Aqualon' Escherl

Comment 5

•

15 years ago

Attached file: output of ddd bt full during test hang

I got a hang during that test, attached ddd to it and got the output in the attachment. I cut it after line #40 since it was endlessly going on with the same output around nsWindow::OnExposeEvent (hit ctrl+c at about #6000). But it's not completely reproducible: I also got clean test runs without any crash or hang for that test.

Bruno 'Aqualon' Escherl

Comment 6

•

15 years ago

I let the test run again and at some point, I've got the following output:

REFTEST TEST-START | file:///home/i6stud/sibresch/nobackup/seamonkey_hg/trees/comm-central/mozilla/layout/base/crashtests/457362-1.xhtml
++DOMWINDOW == 92 (0x2ab7fff84468) [serial = 1526] [outer = 0x2ab7f80b0c00]
WARNING: g_closure_ref: assertion `closure->ref_count < CLOSURE_MAX_REF_COUNT' failed: 'glib warning', file /home/i6stud/sibresch/nobackup/seamonkey_hg/trees/comm-central/mozilla/toolkit/xre/nsSigHandlers.cpp, line 193

(seamonkey-bin:31479): GLib-GObject-CRITICAL **: g_closure_ref: assertion `closure->ref_count < CLOSURE_MAX_REF_COUNT' failed
WARNING: g_closure_ref: assertion `closure->ref_count < CLOSURE_MAX_REF_COUNT' failed: 'glib warning', file /home/i6stud/sibresch/nobackup/seamonkey_hg/trees/comm-central/mozilla/toolkit/xre/nsSigHandlers.cpp, line 193

The second block was repeated endlessly again. I interrupted the test again and crashtest.log had 4.7 Million lines.
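Editorial note: a log of several million lines that mostly repeats the same block is easier to read once identical messages are counted. The following is a minimal sketch, not part of any Mozilla tooling; the function name summarize_log, the PID-masking pattern, and the abbreviated sample lines (including the second process ID) are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Process IDs such as "(seamonkey-bin:31479)" differ between runs, so mask them
# to let otherwise identical messages be counted together.
PID_RE = re.compile(r"\((?P<prog>[\w.-]+):\d+\)")


def summarize_log(lines: Iterable[str], top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` most frequent non-empty lines with their counts."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[PID_RE.sub(r"(\g<prog>:PID)", line)] += 1
    return counts.most_common(top)


# Abbreviated, illustrative sample; a real run would iterate over the log file.
sample = [
    "(seamonkey-bin:31479): GLib-GObject-CRITICAL **: g_closure_ref: assertion failed",
    "(seamonkey-bin:31480): GLib-GObject-CRITICAL **: g_closure_ref: assertion failed",
    "REFTEST TEST-START | 457362-1.xhtml",
]
for text, count in summarize_log(sample):
    print(f"{count:>8}  {text}")
```

For a real log, passing an open file object (for example `open("crashtest.log")`) instead of the sample list streams the file line by line, so the whole log never has to be held in memory at once.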

Bruno 'Aqualon' Escherl

Comment 7

•

15 years ago

m-c rev 29114207a571 doesn't give me a hang, but m-c rev 9fd11a17eb1a does (all tests run with a 64bit SeaMonkey debug build).

I sometimes also get a warning after 457362-1.xhtml is loaded, but no idea if this is related:

WARNING: ContentViewer exists outside gHistoryMaxViewer range: '!viewer', file /home/i6stud/sibresch/nobackup/seamonkey_hg/trees/comm-central/mozilla/docshell/shistory/src/nsSHistory.cpp, line 846

The strange thing is, I don't see the issue if I only run the crashtests from mozilla/layout/base/. I have to run make crashtest in mozilla/.

CCing Markus as the patch author of bug 506826 and roc/dbaron as the reviewers. Perhaps they can tell us more about this issue.
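Editorial note: with one revision that does not hang and one that does, the remaining candidates can be narrowed by bisection (Mercurial offers this as hg bisect). The sketch below shows the idea only. The function first_bad, the is_bad predicate, and the placeholder revisions rev-a, rev-b and rev-c are hypothetical, and it assumes the revisions are listed in chronological order with every revision after the first bad one also bad.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad revision by binary search.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and that
    every revision after the first bad one is bad as well.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the first bad revision is at mid or earlier
        else:
            lo = mid  # the first bad revision is after mid
    return revisions[hi]


# Placeholder history: only the two endpoints come from this comment.
history = ["29114207a571", "rev-a", "rev-b", "rev-c", "9fd11a17eb1a"]
hanging = {"rev-b", "rev-c", "9fd11a17eb1a"}  # pretend test results
print(first_bad(history, lambda rev: rev in hanging))
```

Because the hang is reported as intermittent, a real is_bad check would need to run the crashtests several times per revision before declaring a revision good.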

Bruno 'Aqualon' Escherl

Comment 8

•

15 years ago

Additional info: every time I attach ddd, I get a different beginning of the stack trace. Only the repeating OnExposeEvent calls are the same.

Bruno 'Aqualon' Escherl

Comment 9

•

15 years ago

I've tested again on another machine with a newer kernel (2.6.31.12 instead of 2.6.27.45) and a newer Xorg (1.6.5 instead of 1.5.2) and can't see the problem there. So maybe it's an issue with the OS itself :/

Robert Kaiser

Comment 10

•

15 years ago

(In reply to comment #9)
> I've tested again on another machine with a newer kernel (2.6.31.12 instead of
> 2.6.27.45) and a newer Xorg (1.6.5 instead of 1.5.2) and can't see the problem
> there. So maybe it's an issue with the OS itself :/

Make sure what you test is a debug build, as you found an assertion failure there, and assertions are fatal on debug but ignored on optimized builds.

It's of course entirely possible that the GTK version in CentOS 5 has some subtle problem we are running into there, but while we can upgrade it in some way as long as it doesn't harm the runtime requirements of the builds generated on the same machines, we'd need RPMs applicable to this OS. And we can't run tests on any other platform, as we can't afford the luxury of running Talos with a different set of machines and OSes like FF does.

Ian Neal

Reporter

Comment 11

•

15 years ago

I've tested again on a debug build, still not crashing.

My kernel is 2.6.32.19-163.fc12.i686
My xorg-x11-server-Xorg is 1.7.6-4.fc12.i686
My gtk2 is 2.18.9-3.fc12.i686
My gcc is 4.4.4 2010630 (Red Hat 4.4.4-10)

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Summary: [SeaMonkey 2.1, crashtest] 457362-1.xhtml segfaults on tinderbox but not locally → [Linux SeaMonkey 2.1, crashtest] 457362-1.xhtml segfaults on tinderbox but not locally

Serge Gautherie (:sgautherie)

Comment 12

•

15 years ago

http://brasstacks.mozilla.com/topfails/tests/SeaMonkey doesn't list this test :-/

Depends on: 564234

Justin Wood (:Callek)

Updated

•

15 years ago

Whiteboard: [orange]

Justin Wood (:Callek)

Updated

•

15 years ago

Depends on: 587189

Ed Morley [:emorley]

Comment 13

•

13 years ago

Mass marking whiteboard:[orange] bugs WFM (to clean up TBPL bug suggestions) that:

* Haven't changed in > 6 months
* Whose whiteboard contains none of the strings: {disabled,marked,random,fuzzy,todo,fails,failing,annotated,leave open,time-bomb}
* Passed a (quick) manual inspection of bug summary/whiteboard to ensure they weren't a false positive.

I've also gone through and searched for cases where the whiteboard wasn't labelled correctly after test disabling, by using attachment description & basic comment searches.

However, if the test this bug was about has in fact been disabled/annotated/..., please accept my apologies & reopen/mark the whiteboard appropriately so this doesn't get re-closed in the future (and please ping me via IRC or email so I can try to tweak the saved searches to avoid more edge cases).

Sorry for the spam!

Filter on: #FFA500

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → WORKSFORME

Nobody; OK to take it and work on it

Assignee

Updated

•

13 years ago

Keywords: intermittent-failure

Nobody; OK to take it and work on it

Assignee

Updated

•

13 years ago

Whiteboard: [orange]