Closed
Bug 1494195
Opened 7 years ago
Closed 7 years ago
Intermittent TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_XX.js | Test timed out
Categories
(DevTools :: Memory, defect, P2)
DevTools
Memory
Tracking
(firefox64 fixed)
RESOLVED
FIXED
Firefox 64
| Tracking | Status | |
|---|---|---|
| firefox64 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: jimb)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])
Attachments
(1 file)
|
10.88 KB,
patch
|
fitzgen
:
review+
|
Details | Diff | Splinter Review |
Filed by: toros [at] mozilla.com
https://treeherder.mozilla.org/logviewer.html#?job_id=201597595&repo=mozilla-inbound
https://queue.taskcluster.net/v1/task/V0xQLCtAQ1SnWU3m168bQQ/runs/0/artifacts/public/logs/live_backing.log
03:13:10 INFO - TEST-START | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging_perf.js
03:13:13 INFO - TEST-PASS | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging_perf.js | took 3226ms
03:13:13 INFO - TEST-START | toolkit/components/reputationservice/test/unit/test_app_rep.js
03:13:13 INFO - TEST-PASS | toolkit/components/reputationservice/test/unit/test_app_rep.js | took 297ms
03:13:13 INFO - TEST-START | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_02.js
03:18:13 WARNING - TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_02.js | Test timed out
03:18:13 INFO - TEST-INFO took 300000ms
03:18:13 INFO - >>>>>>>
03:18:13 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
03:18:14 INFO - (xpcshell/head.js) | test pending (2)
03:18:14 INFO - PID 7280 | DBG-SERVER: Packet 0 sent from "root"
03:18:14 INFO - PID 7280 | Connecting client.
03:18:14 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
03:18:14 INFO - running event loop
03:18:14 INFO - PID 7280 | DBG-SERVER: Received packet 0: {
03:18:14 INFO - PID 7280 | "from": "root",
03:18:14 INFO - PID 7280 | "applicationType": "xpcshell-tests",
03:18:14 INFO - PID 7280 | "testConnectionPrefix": "server1.conn0.",
03:18:14 INFO - PID 7280 | "traits": {
03:18:14 INFO - PID 7280 | "sources": true,
03:18:14 INFO - PID 7280 | "highlightable": true,
03:18:14 INFO - PID 7280 | "networkMonitor": true,
03:18:14 INFO - PID 7280 | "storageInspector": true,
03:18:14 INFO - PID 7280 | "storageInspectorReadOnly": true,
03:18:14 INFO - PID 7280 | "conditionalBreakpoints": true,
03:18:14 INFO - PID 7280 | "debuggerSourceActors": true,
03:18:14 INFO - PID 7280 | "wasmBinarySource": true,
03:18:14 INFO - PID 7280 | "bulk": true,
03:18:14 INFO - PID 7280 | "selectorEditable": true,
03:18:14 INFO - PID 7280 | "getCssPath": true,
03:18:14 INFO - PID 7280 | "getXPath": true,
03:18:14 INFO - PID 7280 | "directorScripts": true,
03:18:14 INFO - PID 7280 | "noBlackBoxing": false,
03:18:14 INFO - PID 7280 | "noPrettyPrinting": false,
03:18:14 INFO - PID 7280 | "memoryActorAllocations": true,
03:18:14 INFO - PID 7280 | "webConsoleCommands": true,
03:18:14 INFO - PID 7280 | "allowChromeProcess": false,
03:18:14 INFO - PID 7280 | "profilerDataFilterable": true,
03:18:14 INFO - PID 7280 | "heapSnapshots": true,
03:18:14 INFO - PID 7280 | "documentLoadingMarkers": true,
03:18:14 INFO - PID 7280 | "webExtensionAddonConnect": true
03:18:14 INFO - PID 7280 | }
03:18:14 INFO - PID 7280 | }
Updated•7 years ago
|
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_02.js | Test timed out → Intermittent TEST-UNEXPECTED-TIMEOUT | devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_XX.js | Test timed out
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 5•7 years ago
|
||
r/b point out to Bug 1493887:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&selectedJob=201885192&searchStr=windows,7,pgo,xpcshell,tests,test-windows7-32-pgo%2Fopt-xpcshell,x(x)&group_state=expanded&tochange=0bb55b604bfdba0b208d2e6c71ae825d483e6a73&fromchange=1160f8e0e57354c7aac8cd092d6173560dc43aa5
Dao, can you please take a look?
Flags: needinfo?(dao+bmo)
Comment 6•7 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #5)
> r/b point out to Bug 1493887:
>
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&selectedJob=201885192&searchStr=windows,7,pgo,xpcshell,tests,test-
> windows7-32-pgo%2Fopt-xpcshell,
> x(x)&group_state=expanded&tochange=0bb55b604bfdba0b208d2e6c71ae825d483e6a73&f
> romchange=1160f8e0e57354c7aac8cd092d6173560dc43aa5
>
>
> Dao, can you please take a look?
This doesn't sound right. The patch for bug 1493887 is in front-end UI code and should be irrelevant to xpcshell tests.
Flags: needinfo?(dao+bmo)
| Comment hidden (Intermittent Failures Robot) |
Comment 9•7 years ago
|
||
I think this started on: bug 1491946
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&selectedJob=202206564&searchStr=windows,7,pgo,xpcshell,tests,test-windows7-32-pgo%2Fopt-xpcshell,x(x)&tochange=0bb55b604bfdba0b208d2e6c71ae825d483e6a73&fromchange=20faad8222c235821be825d16f7a4cf776b7a098
jimb, could you, please, take a look?
Flags: needinfo?(jimb)
| Comment hidden (Intermittent Failures Robot) |
Comment 11•7 years ago
|
||
Related error in the test logs:
09:38:56 INFO - PID 8188 | console.error: "Error while calling actor 'heapSnapshotFile's method 'transferHeapSnapshot'" "No heap snapshot with id: 192.f"
09:38:56 INFO - PID 8188 | console.error: "transferHeapSnapshot@resource://devtools/shared/base-loader.js -> resource://devtools/server/actors/heap-snapshot-file.js:45:13\\nasync*handler@resource://devtools/shared/base-loader.js -> resource://devtools/shared/protocol.js:1178:21\\nonPacket@resource://devtools/shared/base-loader.js -> resource://devtools/server/main.js:1711:15\\nsend/<@resource://devtools/shared/base-loader.js -> resource://devtools/shared/transport/local-transport.js:64:11\\nexports.makeInfallible/<@resource://devtools/shared/base-loader.js -> resource://devtools/shared/ThreadSafeDevToolsUtils.js:109:14\\nexports.makeInfallible/<@resource://devtools/shared/base-loader.js -> resource://devtools/shared/ThreadSafeDevToolsUtils.js:109:14\\n_do_main@Z:\\\\task_1538039047\\\\build\\\\tests\\\\xpcshell\\\\head.js:219:3\\n_execute_test@Z:\\\\task_1538039047\\\\build\\\\tests\\\\xpcshell\\\\head.js:533:5\\n@-e:1:1\\n"
09:38:56 INFO - PID 8188 | DBG-SERVER: Packet 8 sent from "server1.conn0.heapSnapshotFileActor3"
09:38:56 INFO - PID 8188 | DBG-SERVER: Received packet 8: {
09:38:56 INFO - PID 8188 | "from": "server1.conn0.heapSnapshotFileActor3",
09:38:56 INFO - PID 8188 | "error": "unknownError",
09:38:56 INFO - PID 8188 | "message": "No heap snapshot with id: 192.f"
09:38:56 INFO - PID 8188 | }
This is a bit puzzling that the pointed change could cause a spike in intermittents here though.
Comment 12•7 years ago
|
||
Also why this error doesn't fail the test earlier and we have to hit the timeout ?
Comment 13•7 years ago
|
||
`192.f` is indeed an invalid snapshot id according to the code in https://searchfox.org/mozilla-central/rev/819cd31a93fd50b7167979607371878c4d6f18e8/devtools/shared/heapsnapshot/HeapSnapshotFileUtils.js#49-51:
function isValidSnapshotFileId(snapshotId) {
return /^\d+(\-\d+)?$/.test(snapshotId);
}
Comment 14•7 years ago
|
||
Note: before the error, `saveHeapSnapshot` returned this snapshot id in the same log:
09:38:56 INFO - PID 8188 | DBG-SERVER: Packet 5 sent to "server1.conn0.memoryActor11"
09:38:56 INFO - PID 8188 | DBG-SERVER: Received packet 5: {
09:38:56 INFO - PID 8188 | "type": "saveHeapSnapshot",
09:38:56 INFO - PID 8188 | "to": "server1.conn0.memoryActor11"
09:38:56 INFO - PID 8188 | }
09:38:56 INFO - PID 8188 | DBG-SERVER: Packet 6 sent from "server1.conn0.memoryActor11"
09:38:56 INFO - PID 8188 | DBG-SERVER: Received packet 6: {
09:38:56 INFO - PID 8188 | "snapshotId": "192.f",
09:38:56 INFO - PID 8188 | "from": "server1.conn0.memoryActor11"
09:38:56 INFO - PID 8188 | }
Comment 15•7 years ago
|
||
My current guess is that something's wrong here:
https://searchfox.org/mozilla-central/rev/819cd31a93fd50b7167979607371878c4d6f18e8/devtools/shared/heapsnapshot/HeapSnapshot.cpp#1466-1467
outSnapshotId.Assign(Substring(outFilePath, tempPath.Length() + 1,
outFilePath.Length() - tempPath.Length() - sizeof(".fxsnapshot.gz")));
But I don't see what, and I don't see why it wouldn't fail all the time. Filesystem limitations ? This happens mostly on Windows but also on Linux with the jsdcov config.
Comment 16•7 years ago
|
||
Actually I could reproduce on my Linux.
Comment 17•7 years ago
|
||
I instrumented the code to print the file path. I get /tmp/197.fxsnapshot-1.gz. I guess this is the explanation :)
This means that how we create the snapshot id isn't unique enough when we create several snapshots in the same ms.
When dealing with Firefox only this is good enough, but in tests we always execute the same operations in around the same timeframe, and therefore we often get the same ms value. That's why I reproduce easily on my Linux, and likely this happens on the test VM because /tmp isn't cleared between tests (?).
I guess this is more present on Windows because of the lower timer precision.
Jim, what's the right way to fix this? Should we extract `outSnapshotId` differently (for example from the start of the string to the first dot)?
Comment 18•7 years ago
|
||
I still don't know why this changed especially with bug 1491946 though.
Comment 19•7 years ago
|
||
> I still don't know why this changed especially with bug 1491946 though.
ah I know: likely before bug 1491946, the `-1` suffix was added to `197` instead of `fxsnapshot`. Therefore we were getting a valid snapshot id.
Updated•7 years ago
|
Component: General → Memory
Updated•7 years ago
|
Priority: P5 → P2
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 22•7 years ago
|
||
I think my prefered fix would be to change File::CreateUnique [1] to special case the `.gz` (and possible others ? .bz2, .xz come to my mind) and try to find a previous dot for these extensions, as this behavior can break other use cases (eg saving a .tar.gz could give a .tar-1.gz and I believe that archival applications won't like this).
[1] https://searchfox.org/mozilla-central/rev/a11c128b90ea85d7bb68f00c5de52c0794e0b215/xpcom/io/nsLocalFileCommon.cpp#48
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 26•7 years ago
|
||
julienw: Hi, Are there any updates on this?
This is in the disable-recommended list now. Thanks!
Flags: needinfo?(felash)
Comment 27•7 years ago
|
||
I'm waiting either for feedback and guidance, or for somebody that knows more C++ than myself to do it.
Flags: needinfo?(felash)
Comment 28•7 years ago
|
||
This is a frequent intermittent, with 234 failures in the last 2 weeks.
Recent failure logs:
https://treeherder.mozilla.org/logviewer.html#?job_id=203994316&repo=mozilla-central&lineNumber=10691
https://treeherder.mozilla.org/logviewer.html#?job_id=203907753&repo=mozilla-central&lineNumber=11127
https://treeherder.mozilla.org/logviewer.html#?job_id=203986992&repo=mozilla-central&lineNumber=10608
njn: Hi, Could you please have a look into this? Thank you!
Flags: needinfo?(n.nethercote)
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 32•7 years ago
|
||
Joel there are 135 total failures in the last 7 days, should we make a disable patch for this ?
Flags: needinfo?(jmaher)
| Comment hidden (Intermittent Failures Robot) |
Comment 34•7 years ago
|
||
> njn: Hi, Could you please have a look into this? Thank you!
I might be able to help if somebody gives me clear instructions on how I might help. I'd prefer not to piece together context from a stream of comments.
Flags: needinfo?(n.nethercote)
Comment 35•7 years ago
|
||
:njn, this test ( devtools/server/tests/unit/test_MemoryActor_saveHeapSnapshot_XX.js ) is failing at a very high rate. this is an xpcshell based test that runs, and it is failing mostly on windows 7 opt, but also frequent enough on windows 10 opt. You can get log files for yesterdays failures here:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-10-10&endday=2018-10-10&tree=all&bug=1494195
Flags: needinfo?(jmaher)
Comment 36•7 years ago
|
||
:njn, I investigated this in the previous comments (I agree that all the "intermittent" comments are a bit overwhelming).
See especially comment 11 to 19. The root cause for this issue is that we rely on `nsLocalFileCommon::CreateUnique` to generate a unique file name, and the code uses the basename for this file name as a unique ID. The change in bug 1491946 made the file name `XXX.fxsnapshot.gz`, with 2 dots, and so when there's a name conflict `createUnique` generates a `XXX.fxsnapshot-1.gz`, while before we were using `XXX-1.fxsnapshot`.
Comment 22 is my proposed fix, but this is a bit out of my knowledge (crossplatform path handling in C++).
Another option would be to somehow generate a unique name without using `createUnique`. I wanted some feedback from :jimb but couldn't get any so far.
Hope this helps
Blocks: 1491946
Comment 37•7 years ago
|
||
IMO the first step to do here is the usual: back out the patch that is causing the problems.
After that, I suppose changing CreateUnique() to handle double suffixes specially would fix it. But I'm wary about changing long-lived XPCOM functions like that, because it's possible that would fix this problem but cause other problems elsewhere.
I figure the CreateUnique() call in question is this one: https://searchfox.org/mozilla-central/source/devtools/shared/heapsnapshot/HeapSnapshot.cpp#1436? Assuming that's right, I think the following would be a better solution: instead of generating the unique filename based on `XXX.fxsnapshot.gz`, generate it based on `X.fxsnapshot` and then append ".gz" afterwards. jimb definitely will be able to make that change.
| Assignee | ||
Comment 38•7 years ago
|
||
Yes, there should be no problem backing out that patch, if it's causing difficulties.
Flags: needinfo?(jimb)
| Assignee | ||
Comment 39•7 years ago
|
||
The original patch interacts badly with `nsLocalFileCommon::CreateUnique`, which begins generating filenames like "XXX.fxsnapshot-1.gz", which no longer match the "*.fxsnapshot.gz" pattern.
| Assignee | ||
Comment 40•7 years ago
|
||
| Comment hidden (Intermittent Failures Robot) |
| Assignee | ||
Comment 42•7 years ago
|
||
(In reply to Jim Blandy :jimb from comment #40)
> Try push:
I don't see any devtools failures that involve memory here.
Updated•7 years ago
|
Attachment #9016472 -
Flags: review?(nfitzgerald) → review+
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 45•7 years ago
|
||
jimb: Hi, can we add the checkin-needed tag for your patch? Thanks!
Flags: needinfo?(jimb)
Comment 47•7 years ago
|
||
Pushed by ebalazs@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7f35ad97d197
Back out bug 1491946, using '.fxsnapshot.gz' for memory heap snapshots. r=fitzgen
Keywords: checkin-needed
Comment 48•7 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox64:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 64
| Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•