Closed Bug 1494195 Opened 7 years ago Closed 7 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_XX.js | Test timed out

Categories

(DevTools :: Memory, defect, P2)

defect

Tracking

(firefox64 fixed)

RESOLVED FIXED
Firefox 64
Tracking Status
firefox64 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jimb)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

Filed by: toros [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=201597595&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/V0xQLCtAQ1SnWU3m168bQQ/runs/0/artifacts/public/logs/live_backing.log 03:13:10 INFO - TEST-START | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging_perf.js 03:13:13 INFO - TEST-PASS | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging_perf.js | took 3226ms 03:13:13 INFO - TEST-START | toolkit/components/reputationservice/test/unit/test_app_rep.js 03:13:13 INFO - TEST-PASS | toolkit/components/reputationservice/test/unit/test_app_rep.js | took 297ms 03:13:13 INFO - TEST-START | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_02.js 03:18:13 WARNING - TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_02.js | Test timed out 03:18:13 INFO - TEST-INFO took 300000ms 03:18:13 INFO - >>>>>>> 03:18:13 INFO - (xpcshell/head.js) | test MAIN run_test pending (1) 03:18:14 INFO - (xpcshell/head.js) | test pending (2) 03:18:14 INFO - PID 7280 | DBG-SERVER: Packet 0 sent from "root" 03:18:14 INFO - PID 7280 | Connecting client. 03:18:14 INFO - (xpcshell/head.js) | test MAIN run_test finished (2) 03:18:14 INFO - running event loop 03:18:14 INFO - PID 7280 | DBG-SERVER: Received packet 0: { 03:18:14 INFO - PID 7280 | "from": "root", 03:18:14 INFO - PID 7280 | "applicationType": "xpcshell-tests", 03:18:14 INFO - PID 7280 | "testConnectionPrefix": "server1.conn0.", 03:18:14 INFO - PID 7280 | "traits": { 03:18:14 INFO - PID 7280 | "sources": true, 03:18:14 INFO - PID 7280 | "highlightable": true, 03:18:14 INFO - PID 7280 | "networkMonitor": true, 03:18:14 INFO - PID 7280 | "storageInspector": true, 03:18:14 INFO - PID 7280 | "storageInspectorReadOnly": true, 03:18:14 INFO - PID 7280 | "conditionalBreakpoints": true, 03:18:14 INFO - PID 7280 | "debuggerSourceActors": true, 03:18:14 INFO - PID 7280 | "wasmBinarySource": true, 03:18:14 INFO - PID 7280 | "bulk": true, 03:18:14 INFO - PID 7280 | "selectorEditable": true, 03:18:14 INFO - PID 7280 | "getCssPath": true, 03:18:14 INFO - PID 7280 | "getXPath": true, 03:18:14 INFO - PID 7280 | "directorScripts": true, 03:18:14 INFO - PID 7280 | "noBlackBoxing": false, 03:18:14 INFO - PID 7280 | "noPrettyPrinting": false, 03:18:14 INFO - PID 7280 | "memoryActorAllocations": true, 03:18:14 INFO - PID 7280 | "webConsoleCommands": true, 03:18:14 INFO - PID 7280 | "allowChromeProcess": false, 03:18:14 INFO - PID 7280 | "profilerDataFilterable": true, 03:18:14 INFO - PID 7280 | "heapSnapshots": true, 03:18:14 INFO - PID 7280 | "documentLoadingMarkers": true, 03:18:14 INFO - PID 7280 | "webExtensionAddonConnect": true 03:18:14 INFO - PID 7280 | } 03:18:14 INFO - PID 7280 | }
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_02.js | Test timed out → Intermittent TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_XX.js | Test timed out
(In reply to Andreea Pavel [:apavel] from comment #5) > r/b point out to Bug 1493887: > > https://treeherder.mozilla.org/#/jobs?repo=mozilla- > inbound&selectedJob=201885192&searchStr=windows,7,pgo,xpcshell,tests,test- > windows7-32-pgo%2Fopt-xpcshell, > x(x)&group_state=expanded&tochange=0bb55b604bfdba0b208d2e6c71ae825d483e6a73&f > romchange=1160f8e0e57354c7aac8cd092d6173560dc43aa5 > > > Dao, can you please take a look? This doesn't sound right. The patch for bug 1493887 is in front-end UI code and should be irrelevant to xpcshell tests.
Flags: needinfo?(dao+bmo)
Related error in the test logs: 09:38:56 INFO - PID 8188 | console.error: "Error while calling actor 'heapSnapshotFile's method 'transferHeapSnapshot'" "No heap snapshot with id: 192.f" 09:38:56 INFO - PID 8188 | console.error: "transferHeapSnapshot@resource://devtools/shared/base-loader.js -> resource://devtools/server/actors/heap-snapshot-file.js:45:13\\nasync*handler@resource://devtools/shared/base-loader.js -> resource://devtools/shared/protocol.js:1178:21\\nonPacket@resource://devtools/shared/base-loader.js -> resource://devtools/server/main.js:1711:15\\nsend/<@resource://devtools/shared/base-loader.js -> resource://devtools/shared/transport/local-transport.js:64:11\\nexports.makeInfallible/<@resource://devtools/shared/base-loader.js -> resource://devtools/shared/ThreadSafeDevToolsUtils.js:109:14\\nexports.makeInfallible/<@resource://devtools/shared/base-loader.js -> resource://devtools/shared/ThreadSafeDevToolsUtils.js:109:14\\n_do_main@Z:\\\\task_1538039047\\\\build\\\\tests\\\\xpcshell\\\\head.js:219:3\\n_execute_test@Z:\\\\task_1538039047\\\\build\\\\tests\\\\xpcshell\\\\head.js:533:5\\n@-e:1:1\\n" 09:38:56 INFO - PID 8188 | DBG-SERVER: Packet 8 sent from "server1.conn0.heapSnapshotFileActor3" 09:38:56 INFO - PID 8188 | DBG-SERVER: Received packet 8: { 09:38:56 INFO - PID 8188 | "from": "server1.conn0.heapSnapshotFileActor3", 09:38:56 INFO - PID 8188 | "error": "unknownError", 09:38:56 INFO - PID 8188 | "message": "No heap snapshot with id: 192.f" 09:38:56 INFO - PID 8188 | } This is a bit puzzling that the pointed change could cause a spike in intermittents here though.
Also why this error doesn't fail the test earlier and we have to hit the timeout ?
`192.f` is indeed an invalid snapshot id according to the code in https://searchfox.org/mozilla-central/rev/819cd31a93fd50b7167979607371878c4d6f18e8/devtools/shared/heapsnapshot/HeapSnapshotFileUtils.js#49-51: function isValidSnapshotFileId(snapshotId) { return /^\d+(\-\d+)?$/.test(snapshotId); }
Note: before the error, `saveHeapSnapshot` returned this snapshot id in the same log: 09:38:56 INFO - PID 8188 | DBG-SERVER: Packet 5 sent to "server1.conn0.memoryActor11" 09:38:56 INFO - PID 8188 | DBG-SERVER: Received packet 5: { 09:38:56 INFO - PID 8188 | "type": "saveHeapSnapshot", 09:38:56 INFO - PID 8188 | "to": "server1.conn0.memoryActor11" 09:38:56 INFO - PID 8188 | } 09:38:56 INFO - PID 8188 | DBG-SERVER: Packet 6 sent from "server1.conn0.memoryActor11" 09:38:56 INFO - PID 8188 | DBG-SERVER: Received packet 6: { 09:38:56 INFO - PID 8188 | "snapshotId": "192.f", 09:38:56 INFO - PID 8188 | "from": "server1.conn0.memoryActor11" 09:38:56 INFO - PID 8188 | }
My current guess is that something's wrong here: https://searchfox.org/mozilla-central/rev/819cd31a93fd50b7167979607371878c4d6f18e8/devtools/shared/heapsnapshot/HeapSnapshot.cpp#1466-1467 outSnapshotId.Assign(Substring(outFilePath, tempPath.Length() + 1, outFilePath.Length() - tempPath.Length() - sizeof(".fxsnapshot.gz"))); But I don't see what, and I don't see why it wouldn't fail all the time. Filesystem limitations ? This happens mostly on Windows but also on Linux with the jsdcov config.
Actually I could reproduce on my Linux.
I instrumented the code to print the file path. I get /tmp/197.fxsnapshot-1.gz. I guess this is the explanation :) This means that how we create the snapshot id isn't unique enough when we create several snapshots in the same ms. When dealing with Firefox only this is good enough, but in tests we always execute the same operations in around the same timeframe, and therefore we often get the same ms value. That's why I reproduce easily on my Linux, and likely this happens on the test VM because /tmp isn't cleared between tests (?). I guess this is more present on Windows because of the lower timer precision. Jim, what's the right way to fix this? Should we extract `outSnapshotId` differently (for example from the start of the string to the first dot)?
I still don't know why this changed especially with bug 1491946 though.
> I still don't know why this changed especially with bug 1491946 though. ah I know: likely before bug 1491946, the `-1` suffix was added to `197` instead of `fxsnapshot`. Therefore we were getting a valid snapshot id.
Component: General → Memory
See Also: → 1495473
Priority: P5 → P2
I think my prefered fix would be to change File::CreateUnique [1] to special case the `.gz` (and possible others ? .bz2, .xz come to my mind) and try to find a previous dot for these extensions, as this behavior can break other use cases (eg saving a .tar.gz could give a .tar-1.gz and I believe that archival applications won't like this). [1] https://searchfox.org/mozilla-central/rev/a11c128b90ea85d7bb68f00c5de52c0794e0b215/xpcom/io/nsLocalFileCommon.cpp#48
julienw: Hi, Are there any updates on this? This is in the disable-recommended list now. Thanks!
Flags: needinfo?(felash)
I'm waiting either for feedback and guidance, or for somebody that knows more C++ than myself to do it.
Flags: needinfo?(felash)
Joel there are 135 total failures in the last 7 days, should we make a disable patch for this ?
Flags: needinfo?(jmaher)
> njn: Hi, Could you please have a look into this? Thank you! I might be able to help if somebody gives me clear instructions on how I might help. I'd prefer not to piece together context from a stream of comments.
Flags: needinfo?(n.nethercote)
:njn, this test ( devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_XX.js ) is failing at a very high rate. this is an xpcshell based test that runs, and it is failing mostly on windows 7 opt, but also frequent enough on windows 10 opt. You can get log files for yesterdays failures here: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-10-10&endday=2018-10-10&tree=all&bug=1494195
Flags: needinfo?(jmaher)
:njn, I investigated this in the previous comments (I agree that all the "intermittent" comments are a bit overwhelming). See especially comment 11 to 19. The root cause for this issue is that we rely on `nsLocalFileCommon::CreateUnique` to generate a unique file name, and the code uses the basename for this file name as a unique ID. The change in bug 1491946 made the file name `XXX.fxsnapshot.gz`, with 2 dots, and so when there's a name conflict `createUnique` generates a `XXX.fxsnapshot-1.gz`, while before we were using `XXX-1.fxsnapshot`. Comment 22 is my proposed fix, but this is a bit out of my knowledge (crossplatform path handling in C++). Another option would be to somehow generate a unique name without using `createUnique`. I wanted some feedback from :jimb but couldn't get any so far. Hope this helps
Blocks: 1491946
IMO the first step to do here is the usual: back out the patch that is causing the problems. After that, I suppose changing CreateUnique() to handle double suffixes specially would fix it. But I'm wary about changing long-lived XPCOM functions like that, because it's possible that would fix this problem but cause other problems elsewhere. I figure the CreateUnique() call in question is this one: https://searchfox.org/mozilla-central/source/devtools/shared/heapsnapshot/HeapSnapshot.cpp#1436? Assuming that's right, I think the following would be a better solution: instead of generating the unique filename based on `XXX.fxsnapshot.gz`, generate it based on `X.fxsnapshot` and then append ".gz" afterwards. jimb definitely will be able to make that change.
Yes, there should be no problem backing out that patch, if it's causing difficulties.
Flags: needinfo?(jimb)
The original patch interacts badly with `nsLocalFileCommon::CreateUnique`, which begins generating filenames like "XXX.fxsnapshot-1.gz", which no longer match the "*.fxsnapshot.gz" pattern.
Assignee: nobody → jimb
Status: NEW → ASSIGNED
Attachment #9016472 - Flags: review?(nfitzgerald)
(In reply to Jim Blandy :jimb from comment #40) > Try push: I don't see any devtools failures that involve memory here.
Attachment #9016472 - Flags: review?(nfitzgerald) → review+
jimb: Hi, can we add the checkin-needed tag for your patch? Thanks!
Flags: needinfo?(jimb)
Adding the checkin-needed
Flags: needinfo?(jimb)
Keywords: checkin-needed
Pushed by ebalazs@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7f35ad97d197 Back out bug 1491946, using '.fxsnapshot.gz' for memory heap snapshots. r=fitzgen
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 64
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: