Closed Bug 711639 Opened 13 years ago Closed 12 years ago

Some test failures result in multiple "Fetching summary failed" messages before succeeding, or just never succeed

Categories

(Tree Management Graveyard :: TBPL, defect)

Type: defect
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

Details

The most common one, but not the only one we have right now, is https://tbpl.mozilla.org/php/getParsedLog.php?id=7986895&tree=Mozilla-Inbound with the failure lines being

73429 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_audio_event_adopt.html | Test timed out.
73517 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_buffered.html | Test timed out.
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_bug448534.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_bug448534.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | plugin process 2250 | automationutils.processLeakLog() | missing output line for total leaks!

so my guess is that we're picking an unfortunate search term out of | plugin process 2250 | and timing out searching Bugzilla for that.
Bug 484123 seems like it might be a problem, maybe from lines like:
TEST-UNEXPECTED-FAIL | unknown test url | [SimpleTest/SimpleTest.js, window.onerror] - An error occurred: uncaught exception: [Exception... "An attempt was made to use an object that is not, or is no longer, usable"  code: "11" nsresult: "0x8053000b (NS_ERROR_DOM_INVALID_STATE_ERR)"  location: "http://mochi.test:8888/tests/dom/tests/mochitest/ajax/offline/test_updatingManifest.html Line: 121"] at :0
https://tbpl.mozilla.org/php/getParsedLog.php?id=7998520&tree=Mozilla-Inbound / bug 706897 is the other one that is extremely resistant to loading the summary, as you can see by how often I've given up lately.

Much as I like blaming bug searches, since they've been our usual source of summary-loading pain, I just don't see anything in those which should be slow to fetch.

Well, unless there's something I'm not seeing in http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/php/inc/AnnotatedSummaryGenerator.php that causes us to search for " " when the filename-containing bit instead contains words and spaces. sw:orange&summary=%20 is a brutally slow thing to hit the API for, just like "" used to be before we stopped searching for that.
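To make that suspicion concrete, here is a minimal sketch in Python of the kind of guard that would avoid a blank or whitespace search. It is not the logic in AnnotatedSummaryGenerator.php, which is PHP and is not quoted in this bug. The function name bug_search_term and the rule "skip the search when the middle field contains whitespace" are assumptions made only for illustration.

```python
import posixpath
from typing import Optional


def bug_search_term(failure_line: str) -> Optional[str]:
    """Return a term worth searching Bugzilla for, or None to skip the search.

    Hypothetical helper: assumes failure lines have the shape
    "STATUS | subject | message" as in the log excerpts above.
    """
    fields = [field.strip() for field in failure_line.split(" | ")]
    if len(fields) < 3:
        # Lines such as "Thread 0 (crashed)" have no subject field at all.
        return None

    subject = fields[1]
    # "plugin process 2250" and "unknown test url" are words and spaces,
    # not a file name, so there is nothing useful to search for.
    if not subject or any(ch.isspace() for ch in subject):
        return None

    # Use the file name, not the whole path, as the search term.
    term = posixpath.basename(subject)
    return term or None
```

With a guard like this, the test_audio_event_adopt.html line would yield a file name to search for, while the "plugin process 2250" line would yield None and no query would be sent.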
Summary: Stop searching Bugzilla for things that we won't find, and timing out not finding them → Some test failures result in multiple "Fetching summary failed" messages before succeeding, or just never succeed
To make it even harder to debug, it probably was bug searches, but the real summary would be "Some failures result in extremely slow bug searches while bmo is failed over to the San Jose colo."
And no, I can't believe that I was dumb enough to taunt Happy Fun Phx.
I don't have any theory for how this could be the case, since fetching the log itself has never actually been the problem (you can see that from how quickly the scrape has always shown up, that being how long it takes to fetch the log and not do much to it), but somehow this was fixed by bug 717005. I started seeing successful fetches of the ones that were always trouble when that was on tbpl-dev, and today I actually got a couple of failed runs with ~5000 failures to display suggestions. They still wouldn't submit bug comments, so some bug I filed long, long ago about knowing what the maximum comment size is and limiting to it has been incorrectly closed, but that's not this bug.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard