Closed Bug 1701168 Opened 3 years ago Closed 3 years ago

Perma [tier2] TEST-UNEXPECTED-TIMEOUT | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser (launcher.spec.ts) | expected PASS

Categories

(Remote Protocol :: CDP, defect, P5)

defect

Tracking

(firefox88 unaffected, firefox89 fixed, firefox90 fixed)

RESOLVED FIXED
90 Branch
Tracking Status
firefox88 --- unaffected
firefox89 --- fixed
firefox90 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: whimboo)

References

(Regression, )

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell disable-recommended])

Attachments

(2 files, 1 obsolete file)

Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=334525580&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/YASpRlvpTx6CkMvJ-_vW-Q/runs/0/artifacts/public/logs/live_backing.log


[task 2021-03-26T07:44:47.740Z] TEST-START | Launcher specs Puppeteer Puppeteer.connect should support ignoreHTTPSErrors option (launcher.spec.ts)
[task 2021-03-26T07:44:47.741Z] PID 361 | ["test-start",{"title":"should support ignoreHTTPSErrors option","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should support ignoreHTTPSErrors option","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","currentRetry":0}]
[task 2021-03-26T07:44:47.785Z] PID 361 | *** You are running in headless mode.
[task 2021-03-26T07:44:48.088Z] PID 361 | [GFX1-]: glxtest: libEGL initialize failed
[task 2021-03-26T07:44:48.088Z] PID 361 | [GFX1-]: glxtest: Unable to open a connection to the X server
[task 2021-03-26T07:44:48.088Z] PID 361 | [GFX1-]: glxtest: libEGL initialize failed
[task 2021-03-26T07:44:48.090Z] PID 361 | 1616744688090	RemoteAgent	DEBUG	Resetting recommended pref prompts.contentPromptSubDialog
[task 2021-03-26T07:44:48.091Z] PID 361 | 1616744688090	RemoteAgent	ERROR	unable to stop listener: [Exception... "ServiceManager::GetService returned failure code:"  nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)"  location: "JS frame :: chrome://remote/content/cdp/observers/ChannelEventSink.jsm :: ChannelEventSinkFactory.getService :: line 110"  data: no] Stack trace: ChannelEventSinkFactory.getService()@ChannelEventSink.jsm:110
[task 2021-03-26T07:44:48.092Z] PID 361 | NetworkObserver()@NetworkObserver.jsm:58
[task 2021-03-26T07:44:48.092Z] PID 361 | get networkObserver()@Session.jsm:69
[task 2021-03-26T07:44:48.093Z] PID 361 | destructor()@Session.jsm:55
[task 2021-03-26T07:44:48.093Z] PID 361 | onClosed()@Connection.jsm:282
[task 2021-03-26T07:44:48.093Z] PID 361 | close()@WebSocketTransport.jsm:63
[task 2021-03-26T07:44:48.094Z] PID 361 | close()@Connection.jsm:268
[task 2021-03-26T07:44:48.094Z] PID 361 | destructor()@Target.jsm:57
[task 2021-03-26T07:44:48.094Z] PID 361 | destroyTarget()@TargetList.jsm:91
[task 2021-03-26T07:44:48.095Z] PID 361 | destructor()@TargetList.jsm:102
[task 2021-03-26T07:44:48.095Z] PID 361 | close()@RemoteAgent.jsm:136
[task 2021-03-26T07:44:49.170Z] PID 361 | 1616744689170	RemoteAgent	DEBUG	Setting recommended pref prompts.contentPromptSubDialog to false
[task 2021-03-26T07:44:49.179Z] PID 361 | DevTools listening on ws://localhost:38407/devtools/browser/38f14bf3-4e51-4c53-8f5f-9ecde3c74435

[task 2021-03-26T07:44:51.710Z] PID 361 | 1616744691708	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","method":"Runtime.executionContextCreated","params":{"context":{"id":2,"o ... es/nested-frames.html","name":"__puppeteer_utility_world__","auxData":{"isDefault":false,"frameId":"37","type":"isolated"}}}}
[task 2021-03-26T07:44:51.710Z] PID 361 | 1616744691708	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","id":23,"result":{"executionContextId":2}}
[task 2021-03-26T07:44:51.710Z] PID 361 | 1616744691708	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","id":22,"result":{"identifier":"6648ee01-43df-4847-8d9f-98098074ba3c"}}
[task 2021-03-26T07:44:51.710Z] PID 361 | 1616744691709	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"id":24,"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","error":{"message":"No frame for given id found","data":"createIs ... /cdp/domains/DomainCache.jsm:101:25\nreceiveMessage@chrome://remote/content/cdp/sessions/ContentProcessSession.jsm:84:45\n"}}
[task 2021-03-26T07:44:51.710Z] PID 361 | 1616744691709	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"id":25,"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","error":{"message":"No frame for given id found","data":"createIs ... /cdp/domains/DomainCache.jsm:101:25\nreceiveMessage@chrome://remote/content/cdp/sessions/ContentProcessSession.jsm:84:45\n"}}
[task 2021-03-26T07:44:51.710Z] PID 361 | 1616744691710	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"id":26,"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","error":{"message":"No frame for given id found","data":"createIs ... /cdp/domains/DomainCache.jsm:101:25\nreceiveMessage@chrome://remote/content/cdp/sessions/ContentProcessSession.jsm:84:45\n"}}
[task 2021-03-26T07:44:51.711Z] PID 361 | 1616744691711	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"id":27,"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","error":{"message":"No frame for given id found","data":"createIs ... /cdp/domains/DomainCache.jsm:101:25\nreceiveMessage@chrome://remote/content/cdp/sessions/ContentProcessSession.jsm:84:45\n"}}
[task 2021-03-26T07:44:51.747Z] PID 361 | 1616744691746	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","method":"Page.createIsolatedWorld","params":{"frameId":"19","worldName":"__puppeteer_utility_world__","grantUniveralAccess":true},"id":28}
[task 2021-03-26T07:44:51.748Z] PID 361 | 1616744691746	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","method":"Emulation.setDeviceMetricsOverride","params":{"mobile":false,"width":800,"height":600,"deviceScaleFactor":1,"screenOrientation":{"angle":0,"type":"portraitPrimary"}},"id":29}
[task 2021-03-26T07:44:51.748Z] PID 361 | 1616744691747	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","method":"Emulation.setTouchEmulationEnabled","params":{"enabled":false},"id":30}
[task 2021-03-26T07:44:51.748Z] PID 361 | 1616744691747	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","id":30,"result":{}}
[task 2021-03-26T07:44:51.749Z] PID 361 | 1616744691748	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","method":"Runtime.executionContextCreated","params":{"context":{"id":2,"origin":"about:blank","name":"__puppeteer_utility_world__","auxData":{"isDefault":false,"frameId":"19","type":"isolated"}}}}
[task 2021-03-26T07:44:51.749Z] PID 361 | 1616744691748	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","id":28,"result":{"executionContextId":2}}
[task 2021-03-26T07:44:51.749Z] PID 361 | 1616744691748	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"80bded0e-fb07-4a22-8af5-3be62b3736d8","id":29,"result":{}}
[task 2021-03-26T07:44:51.787Z] PID 361 | 1616744691786	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","method":"Emulation.setDeviceMetricsOverride","params":{"mobile":false,"width":800,"height":600,"deviceScaleFactor":1,"screenOrientation":{"angle":0,"type":"portraitPrimary"}},"id":31}
[task 2021-03-26T07:44:51.787Z] PID 361 | 1616744691787	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","method":"Emulation.setTouchEmulationEnabled","params":{"enabled":false},"id":32}
[task 2021-03-26T07:44:51.787Z] PID 361 | 1616744691787	RemoteAgent	TRACE	<-(connection {41c99779-a8e8-45e6-b293-cc074eff52f2}) {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","id":32,"result":{}}
[task 2021-03-26T07:45:09.646Z] TEST-UNEXPECTED-TIMEOUT | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser (launcher.spec.ts) | expected PASS
[task 2021-03-26T07:45:09.646Z] TEST-INFO took 20014ms
[task 2021-03-26T07:45:09.646Z] PID 361 | ["fail",{"title":"should be able to reconnect to a disconnected browser","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","duration":20014,"currentRetry":0,"err":"Timeout of 20000ms exceeded. For async tests and hooks, ensure \"done()\" is called; if returning a Promise, ensure it resolves. (/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts)","stack":"Error: Timeout of 20000ms exceeded. For async tests and hooks, ensure \"done()\" is called; if returning a Promise, ensure it resolves. (/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts)\n    at createTimeoutError (/builds/worker/checkouts/gecko/remote/test/puppeteer/node_modules/mocha/lib/errors.js:491:15)\n    at Test.Runnable._timeoutError (/builds/worker/checkouts/gecko/remote/test/puppeteer/node_modules/mocha/lib/runnable.js:431:10)\n    at Timeout.<anonymous> (/builds/worker/checkouts/gecko/remote/test/puppeteer/node_modules/mocha/lib/runnable.js:246:24)\n    at ontimeout (timers.js:436:11)\n    at tryOnTimeout (timers.js:300:5)\n    at listOnTimeout (timers.js:263:5)\n    at Timer.processTimers (timers.js:223:10)"}]
[task 2021-03-26T07:45:09.646Z] TEST-START | Launcher specs Puppeteer Puppeteer.connect should be able to connect to the same page simultaneously (launcher.spec.ts)
[task 2021-03-26T07:45:09.647Z] PID 361 | ["test-start",{"title":"should be able to connect to the same page simultaneously","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to connect to the same page simultaneously","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","currentRetry":0}]
[task 2021-03-26T07:45:09.647Z] TEST-FAIL | Launcher specs Puppeteer Puppeteer.connect should be able to connect to the same page simultaneously (launcher.spec.ts) | took 1ms
[task 2021-03-26T07:45:09.648Z] PID 361 | ["fail",{"title":"should be able to connect to the same page simultaneously","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to connect to the same page simultaneously","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","duration":2,"currentRetry":0,"err":"Could not find expected browser (firefox) locally. Run `PUPPETEER_PRODUCT=firefox npm install` to download a supported Firefox browser binary.","stack":"Error: Could not find expected browser (firefox) locally. Run `PUPPETEER_PRODUCT=firefox npm install` to download a supported Firefox browser binary.\n    at FirefoxLauncher.launch (/builds/worker/checkouts/gecko/remote/test/puppeteer/src/node/Launcher.ts:279:30)"}]
[task 2021-03-26T07:45:09.648Z] TEST-START | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect (launcher.spec.ts)
[task 2021-03-26T07:45:09.648Z] PID 361 | ["test-start",{"title":"should be able to reconnect","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to reconnect","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","currentRetry":0}]
[task 2021-03-26T07:45:09.649Z] TEST-FAIL | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect (launcher.spec.ts) | took 1ms
[task 2021-03-26T07:45:09.649Z] PID 361 | ["fail",{"title":"should be able to reconnect","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to reconnect","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","duration":1,"currentRetry":0,"err":"Could not find expected browser (firefox) locally. Run `PUPPETEER_PRODUCT=firefox npm install` to download a supported Firefox browser binary.","stack":"Error: Could not find expected browser (firefox) locally. Run `PUPPETEER_PRODUCT=firefox npm install` to download a supported Firefox browser binary.\n    at FirefoxLauncher.launch (/builds/worker/checkouts/gecko/remote/test/puppeteer/src/node/Launcher.ts:279:30)"}]
[task 2021-03-26T07:45:09.649Z] TEST-START | Launcher specs Puppeteer Puppeteer.executablePath should work (launcher.spec.ts)
[task 2021-03-26T07:45:09.650Z] TEST-SKIP | Launcher specs Puppeteer Puppeteer.executablePath should work (launcher.spec.ts) | took 0ms
[task 2021-03-26T07:45:09.650Z] PID 361 | ["pending",{"title":"should work","fullTitle":"Launcher specs Puppeteer Puppeteer.executablePath should work","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","currentRetry":0}]```
Flags: needinfo?(shivams2799)

I don't see why bug 1700423 should be the regression here. As such I did a backfill for two more changesets:

https://treeherder.mozilla.org/jobs?repo=autoland&tochange=685f36f262fa8ba519d7cdbf05c581b62b87636d&fromchange=b210027c920358c08ed92ca72f68aaa952f32f84&searchStr=pup

Lets see which of these two actually caused it.

Component: Lint and Formatting → Agent
Product: Firefox Build System → Remote Protocol

Btw. there are two command send:

https://treeherder.mozilla.org/logviewer?job_id=334525580&repo=autoland&lineNumber=16921-16922

[task 2021-03-26T07:44:51.787Z] PID 361 | 1616744691786	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","method":"Emulation.setDeviceMetricsOverride","params":{"mobile":false,"width":800,"height":600,"deviceScaleFactor":1,"screenOrientation":{"angle":0,"type":"portraitPrimary"}},"id":31}
[task 2021-03-26T07:44:51.787Z] PID 361 | 1616744691787	RemoteAgent	TRACE	(connection {41c99779-a8e8-45e6-b293-cc074eff52f2})-> {"sessionId":"4b4d4d3f-e082-4088-a9b4-3b26ac89e0bb","method":"Emulation.setTouchEmulationEnabled","params":{"enabled":false},"id":32}

For none of both a response is returned.

Matt to run the page specific tests you can do:

  1. Change the following line to describe.only(
  2. Run mach puppeteer-test --subset -vv
Component: Agent → Layout
Product: Remote Protocol → Core

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #4)

Matt to run the page specific tests you can do:

  1. Change the following line to describe.only(
  2. Run mach puppeteer-test --subset -vv

I get 'RuntimeError: Process hasn't been started yet' from mach when trying to run that :(

I can reproduce it locally and the causing changeset is:
https://hg.mozilla.org/integration/autoland/rev/bd899813fd0c

(In reply to Matt Woodrow (:mattwoodrow) from comment #6)

I get 'RuntimeError: Process hasn't been started yet' from mach when trying to run that :(

Is that the only log output that you can see here? If not mind supplying some more lines?

Blocks: 1701674

Matt, it would be great if we could make some progress on that bug.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #8)

I can reproduce it locally and the causing changeset is:
https://hg.mozilla.org/integration/autoland/rev/bd899813fd0c

Markus, in the case of Matt being away, maybe you might have an idea why this changeset is causing a hang?

Flags: needinfo?(mstange.moz)

I'm not sure. The patch causes some things to happen without an initial delay. Maybe there's some kind of observer that's registered within that delay, which now ends up being registered after the thing happened, and so now it never fires?
I won't have time to debug this anytime soon, sorry.

Flags: needinfo?(mstange.moz)

There are 79 total failures in the last 7 days on linux1804-64 opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=335429022&repo=autoland&lineNumber=16902

[task 2021-04-05T05:42:42.341Z] TEST-START | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser (launcher.spec.ts)
[task 2021-04-05T05:42:42.341Z] PID 356 | ["test-start",{"title":"should be able to reconnect to a disconnected browser","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","currentRetry":0}]
[task 2021-04-05T05:42:42.343Z] PID 356 | 1617601362340 RemoteAgent TRACE <-(connection {ea1c3793-119b-4834-8642-3aa0ccf4211b}) {"method":"Target.targetDestroyed","params":{"targetId":"68b90adc-0038-4031-b7fa-8e00008043cf"}}
[task 2021-04-05T05:42:42.343Z] PID 356 | 1617601362341 RemoteAgent TRACE <-(connection {137097be-1a3b-43e1-9637-40341359e75f}) {"method":"Target.targetDestroyed","params":{"targetId":"68b90adc-0038-4031-b7fa-8e00008043cf"}}
[task 2021-04-05T05:42:42.363Z] PID 356 | JavaScript error: chrome://remote/content/cdp/sessions/TabSession.jsm, line 53: TypeError: can't access property "sendAsyncMessage", this.mm is null
[task 2021-04-05T05:42:42.363Z] PID 356 | JavaScript error: chrome://remote/content/cdp/domains/parent/Emulation.jsm, line 154: TypeError: can't access property "customUserAgent", browsingContext is null
[task 2021-04-05T05:42:42.363Z] PID 356 | JavaScript error: chrome://remote/content/cdp/sessions/TabSession.jsm, line 78: TypeError: can't access property "sendAsyncMessage", this.mm is null
[task 2021-04-05T05:42:42.388Z] PID 356 | *** You are running in headless mode.
[task 2021-04-05T05:42:42.673Z] PID 356 | [GFX1-]: glxtest: libEGL initialize failed
[task 2021-04-05T05:42:42.674Z] PID 356 | [GFX1-]: glxtest: Unable to open a connection to the X server
[task 2021-04-05T05:42:42.674Z] PID 356 | [GFX1-]: glxtest: libEGL initialize failed
[task 2021-04-05T05:42:42.724Z] PID 356 | 1617601362723 RemoteAgent DEBUG Resetting recommended pref prompts.contentPromptSubDialog
[task 2021-04-05T05:42:42.725Z] PID 356 | 1617601362724 RemoteAgent ERROR unable to stop listener: [Exception... "ServiceManager::GetService returned failure code:" nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)" location: "JS frame :: chrome://remote/content/cdp/observers/ChannelEventSink.jsm :: ChannelEventSinkFactory.getService :: line 110" data: no] Stack trace: ChannelEventSinkFactory.getService()@ChannelEventSink.jsm:110
[task 2021-04-05T05:42:42.725Z] PID 356 | NetworkObserver()@NetworkObserver.jsm:58
[task 2021-04-05T05:42:42.726Z] PID 356 | get networkObserver()@Session.jsm:69
[task 2021-04-05T05:42:42.726Z] PID 356 | destructor()@Session.jsm:55
[task 2021-04-05T05:42:42.726Z] PID 356 | onClosed()@Connection.jsm:282
[task 2021-04-05T05:42:42.726Z] PID 356 | close()@WebSocketTransport.jsm:63
[task 2021-04-05T05:42:42.726Z] PID 356 | close()@Connection.jsm:268
[task 2021-04-05T05:42:42.726Z] PID 356 | destructor()@Target.jsm:57
[task 2021-04-05T05:42:42.727Z] PID 356 | destroyTarget()@TargetList.jsm:91
[task 2021-04-05T05:42:42.727Z] PID 356 | destructor()@TargetList.jsm:102
[task 2021-04-05T05:42:42.727Z] PID 356 | close()@RemoteAgent.jsm:136
[task 2021-04-05T05:42:43.667Z] PID 356 | 1617601363666 RemoteAgent DEBUG Setting recommended pref prompts.contentPromptSubDialog to false
[task 2021-04-05T05:42:43.676Z] PID 356 | DevTools listening on ws://localhost:45358/devtools/browser/b82b03e9-1ba0-4899-8b24-5efecdb39f61

[task 2021-04-05T05:42:44.261Z] PID 356 | 1617601364261 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"id":26,"sessionId":"a366302c-217b-42d6-8b5b-b39f5d99b02b","error":{"message":"No frame for given id found","data":"createIs ... /cdp/domains/DomainCache.jsm:101:25\nreceiveMessage@chrome://remote/content/cdp/sessions/ContentProcessSession.jsm:84:45\n"}}
[task 2021-04-05T05:42:44.261Z] PID 356 | 1617601364261 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"id":27,"sessionId":"a366302c-217b-42d6-8b5b-b39f5d99b02b","error":{"message":"No frame for given id found","data":"createIs ... /cdp/domains/DomainCache.jsm:101:25\nreceiveMessage@chrome://remote/content/cdp/sessions/ContentProcessSession.jsm:84:45\n"}}
[task 2021-04-05T05:42:44.262Z] PID 356 | 1617601364261 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"sessionId":"dcb7a614-bf67-4ecd-86b8-d7d1b6a02f94","method":"Runtime.executionContextCreated","params":{"context":{"id":2,"origin":"about:blank","name":"puppeteer_utility_world","auxData":{"isDefault":false,"frameId":"19","type":"isolated"}}}}
[task 2021-04-05T05:42:44.262Z] PID 356 | 1617601364261 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"sessionId":"dcb7a614-bf67-4ecd-86b8-d7d1b6a02f94","id":28,"result":{"executionContextId":2}}
[task 2021-04-05T05:42:44.301Z] PID 356 | 1617601364300 RemoteAgent TRACE (connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1})-> {"sessionId":"a366302c-217b-42d6-8b5b-b39f5d99b02b","method":"Emulation.setDeviceMetricsOverride","params":{"mobile":false,"width":800,"height":600,"deviceScaleFactor":1,"screenOrientation":{"angle":0,"type":"portraitPrimary"}},"id":29}
[task 2021-04-05T05:42:44.301Z] PID 356 | 1617601364301 RemoteAgent TRACE (connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1})-> {"sessionId":"a366302c-217b-42d6-8b5b-b39f5d99b02b","method":"Emulation.setTouchEmulationEnabled","params":{"enabled":false},"id":30}
[task 2021-04-05T05:42:44.302Z] PID 356 | 1617601364301 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"sessionId":"a366302c-217b-42d6-8b5b-b39f5d99b02b","id":30,"result":{}}
[task 2021-04-05T05:42:44.302Z] PID 356 | 1617601364301 RemoteAgent TRACE (connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1})-> {"sessionId":"dcb7a614-bf67-4ecd-86b8-d7d1b6a02f94","method":"Emulation.setDeviceMetricsOverride","params":{"mobile":false,"width":800,"height":600,"deviceScaleFactor":1,"screenOrientation":{"angle":0,"type":"portraitPrimary"}},"id":31}
[task 2021-04-05T05:42:44.302Z] PID 356 | 1617601364302 RemoteAgent TRACE (connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1})-> {"sessionId":"dcb7a614-bf67-4ecd-86b8-d7d1b6a02f94","method":"Emulation.setTouchEmulationEnabled","params":{"enabled":false},"id":32}
[task 2021-04-05T05:42:44.302Z] PID 356 | 1617601364302 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"sessionId":"dcb7a614-bf67-4ecd-86b8-d7d1b6a02f94","id":32,"result":{}}
[task 2021-04-05T05:42:44.303Z] PID 356 | 1617601364303 RemoteAgent TRACE <-(connection {85eb3b63-2ad0-4c60-9b91-9d0ece1938c1}) {"sessionId":"a366302c-217b-42d6-8b5b-b39f5d99b02b","id":29,"result":{}}
[task 2021-04-05T05:43:02.354Z] TEST-UNEXPECTED-TIMEOUT | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser (launcher.spec.ts) | expected PASS
[task 2021-04-05T05:43:02.354Z] TEST-INFO took 20013ms
[task 2021-04-05T05:43:02.355Z] PID 356 | ["fail",{"title":"should be able to reconnect to a disconnected browser","fullTitle":"Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts","duration":20014,"currentRetry":0,"err":"Timeout of 20000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts)","stack":"Error: Timeout of 20000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/builds/worker/checkouts/gecko/remote/test/puppeteer/test/launcher.spec.ts)\n at createTimeoutError (/builds/worker/checkouts/gecko/remote/test/puppeteer/node_modules/mocha/lib/errors.js:491:15)\n at Test.Runnable._timeoutError (/builds/worker/checkouts/gecko/remote/test/puppeteer/node_modules/mocha/lib/runnable.js:431:10)\n at Timeout.<anonymous> (/builds/worker/checkouts/gecko/remote/test/puppeteer/node_modules/mocha/lib/runnable.js:246:24)\n at ontimeout (timers.js:436:11)\n at tryOnTimeout (timers.js:300:5)\n at listOnTimeout (timers.js:263:5)\n at Timer.processTimers (timers.js:223:10)"}]

Henrik are you working on this until Matt is back?

Flags: needinfo?(hskupin)
Whiteboard: [retriggered] → [retriggered][stockwell needswork:owner]

No, I don't see what I can do here unless Matt can have a look at the patch. Note that also the external Puppeteer CI is failing with the same failure now. When will Matt be back? How important is bug 1675614? There are also 3 other regressions listed, so I wonder if a backout would be feasible given that it cannot be fixed in the short term? But not sure who can / will decide that.

Flags: needinfo?(hskupin) → needinfo?(aryx.bugmail)

Markus, what's your take on backing out bug 1675614 for its regressions (see also comment 16)?

Flags: needinfo?(aryx.bugmail) → needinfo?(mstange.moz)
See Also: → 1673938

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #17)

Markus, what's your take on backing out bug 1675614 for its regressions (see also comment 16)?

I don't think we should at this point. All the regressions look to be racy tests that are failing due to paint timing changing, and are unlikely to reflect problems that would affect our users. It's also a fairly significant win in responsiveness, reducing median keypress latency by 20%.

I'm trying to figure out how to reproduce this test failure, going to try reinstalling linux in a VM to see if it reproduces there. This is a tier-2 test though, so won't be my top priority for the week sorry.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(matt.woodrow)

Matt, is there anything I could help with? Maybe some more logging enabled? While this suite is still tier2 we already agreed in our team to make it tier1 given it's outside importance.

As the Puppeteer CI results show this problem manifests only on Linux and Windows, but not on MacOS:
https://github.com/puppeteer/puppeteer/runs/2221412765

Flags: needinfo?(matt.woodrow)
OS: Unspecified → All
Hardware: Unspecified → All

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #19)

Matt, is there anything I could help with? Maybe some more logging enabled? While this suite is still tier2 we already agreed in our team to make it tier1 given it's outside importance.

As the Puppeteer CI results show this problem manifests only on Linux and Windows, but not on MacOS:
https://github.com/puppeteer/puppeteer/runs/2221412765

Logging what happens with _awaitViewportDimensions (ideally before and after the change) would probably help to narrow down where we're failing to get a response

Flags: needinfo?(matt.woodrow)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #4)

Matt to run the page specific tests you can do:

  1. Change the following line to describe.only(
  2. Run mach puppeteer-test --subset -vv

I'm still not having any success with this. Managed to get it to run on linux, but I get 19 different unexpected timeouts over the ~10mins it took to run, and none of them look anything like the log from treeherder.

(In reply to Matt Woodrow (:mattwoodrow) from comment #23)

  1. Change the following line to describe.only(
  2. Run mach puppeteer-test --subset -vv

I'm still not having any success with this. Managed to get it to run on linux, but I get 19 different unexpected timeouts over the ~10mins it took to run, and none of them look anything like the log from treeherder.

If other tests in that file are also failing you add the .only suffix directly to the test in question.

Note that we have three tests on Linux that are failing with a timeout now, and all of them started failing with your patch:

TEST-UNEXPECTED-TIMEOUT | Launcher specs Puppeteer Puppeteer.connect should be able to reconnect to a disconnected browser (launcher.spec.ts) | expected PASS
TEST-UNEXPECTED-TIMEOUT | Page Page.close should not be visible in browser.pages (page.spec.ts) | expected PASS
TEST-UNEXPECTED-TIMEOUT | Target should be able to use the default page in the browser (target.spec.ts) | expected PASS

Maybe it might be easier to investigate those others because they do not explicitly start a new browser instance. I'll have a look and check which of them might be easier to debug / investigate.

So the test failure from page.spec.ts helped me here.

The problem is actually related to setDeviceMetricsOverride(), and specifically to [await this.executeInChild("_awaitViewportDimensions", ...)](await this.executeInChild("_awaitViewportDimensions"). This calls into _awaitViewportDimensions().

Here the Promise for the resize event never gets fulfilled, which means that for an expected resize of the browser's DOM window there is no resize event fired.

Note that when adding some dump() statements the code intermittently passes. But nevertheless it's clearly missing resize events.

Flags: needinfo?(matt.woodrow)

(In reply to Markus Stange [:mstange] from comment #11)

I'm not sure. The patch causes some things to happen without an initial delay. Maybe there's some kind of observer that's registered within that delay, which now ends up being registered after the thing happened, and so now it never fires?

Looking at setDeviceMetricsOverride,

    const targetWidth = width > 0 ? width : layoutViewport.clientWidth;
    const targetHeight = height > 0 ? height : layoutViewport.clientHeight;

    browser.style.setProperty("min-width", targetWidth + "px");
    browser.style.setProperty("max-width", targetWidth + "px");
    browser.style.setProperty("min-height", targetHeight + "px");
    browser.style.setProperty("max-height", targetHeight + "px");

    // Wait until the viewport has been resized
    await this.executeInChild("_awaitViewportDimensions", {
      width: targetWidth,
      height: targetHeight,
    });

which basically means we first set expected width/height and then wait for the resize in content. I guess the resize now happens too quickly and is already done by the time we call _awaitViewportDimensions in the child process.

I'll try some patches around that idea.

No, _awaitViewportDimensions returns early if the current dimension is already the expected one. But it sounds like it could be a race that happens after running this check and before the event listener has been added.

I'll check what happens when we register the promise before setting the browser dimensions.

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #28)

I'll check what happens when we register the promise before setting the browser dimensions.

Julian actually did an attempt yesterday to register the promise earlier, but also that failed:
https://treeherder.mozilla.org/jobs?repo=try&revision=6d7dab5bb7437a8015240a4e9329d8c7c9c09c74&selectedTaskRun=Ti92eyteQwSSTTRRCGNuKg.0

The expected resize event actually never gets fired and it still hangs.

Attached patch dumping code (obsolete) — Splinter Review

Ok, here some updates. I added some more code around the lines Julian has already added. Specifically a new dumping function that gets executed each 100ms and simply dumps the innerWidth and innerHeight of the content window (see the attached patch). Doing that makes the test pass!

But whenever I do not access both properties and dump a simple string, the hang can be seen again because no resize event is being emitted. Olli mentioned to me that there might be perhaps a missing refreshdriver tick?

Attached patch Dumping codeSplinter Review

Sorry, the last patch actually contained working code so that the hang doesn't appear. Here the corrected version.

Attachment #9215415 - Attachment is obsolete: true

There is one additional fact... With the dumping patch applied there will be two tabs open, and the 2nd one selected. By running the test you can see that we are waiting for the resize event, and that it doesn't happen. But when the hang occurs and I actually move the mouse above the 1st tab in the tab bar (no click needed) the resize event will instantly be emitted.

That does sound like a real bug related to refreshdriver handling.

I also pushed a try build this morning by running the Puppeteer tests with a debug Firefox build. Treeherder still shows the job as running while Taskcluster itself marked it as failed:

https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=UxsPNlDpStem-zLkVVFIfQ.0&revision=f88e70c18cb9d12112804955cb18506e5b795b22&searchStr=pup

The live logs indicate that the resize event is never fired until the tab gets closed:

https://firefox-ci-tc.services.mozilla.com/tasks/UxsPNlDpStem-zLkVVFIfQ/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FUxsPNlDpStem-zLkVVFIfQ%2Fruns%2F0%2Fartifacts%2Fpublic%252Flogs%252Flive.log#L328-566

[task 2021-04-15T09:39:45.729Z] PID 357 | 1618479585728	RemoteAgent	TRACE	<-(connection {1c5a9f30-9cb7-4e97-88ae-fe650e083634}) {"sessionId":"b3ba42cf-8ed0-4e2f-9d7e-74130b908b92","id":29,"result":{}}
[task 2021-04-15T09:39:45.742Z] PID 357 | ** 1618479585741: dumper
[task 2021-04-15T09:39:45.742Z] PID 357 | ** 1618479585742: Registered resize listener
[task 2021-04-15T09:39:45.743Z] PID 357 | ** 1618479585742: Resize id 1
[task 2021-04-15T09:39:45.744Z] PID 357 | ** 1618479585742: modified browser dimensions
[task 2021-04-15T09:39:45.744Z] PID 357 | ** 1618479585743: awaiting for resize event
[task 2021-04-15T09:39:45.812Z] PID 357 | ** 1618479585811: dumper
[..]
[task 2021-04-15T09:40:05.278Z] PID 357 | ** 1618479605277: dumper
[task 2021-04-15T09:40:05.334Z] TEST-UNEXPECTED-TIMEOUT | Page Page.close should not be visible in browser.pages (page.spec.ts) | expected PASS
[..]
[task 2021-04-15T09:40:05.336Z] PID 357 | 1618479605335	RemoteAgent	TRACE	(connection {1c5a9f30-9cb7-4e97-88ae-fe650e083634})-> {"method":"Target.disposeBrowserContext","params":{"browserContextId":6},"id":30}
[task 2021-04-15T09:40:05.345Z] PID 357 | [Parent 401, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111 (NS_ERROR_NOT_AVAILABLE): file /builds/worker/checkouts/gecko/netwerk/cache/nsApplicationCacheService.cpp:161
[task 2021-04-15T09:40:05.355Z] PID 357 | [Child 453, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7064
[task 2021-04-15T09:40:05.355Z] PID 357 | [Child 453, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7067
[task 2021-04-15T09:40:05.357Z] PID 357 | ** 1618479605356: dumper
[task 2021-04-15T09:40:05.367Z] PID 357 | 1618479605367	RemoteAgent	TRACE	<-(connection {1c5a9f30-9cb7-4e97-88ae-fe650e083634}) {"method":"Target.targetDestroyed","params":{"targetId":"a997e629-2a57-4e62-8094-f8de1ba82810"}}
[task 2021-04-15T09:40:05.369Z] PID 357 | ** 1618479605368: Resize event

I run this job with Pernosco enabled, and I will talk with Kyle in how to get a recording.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #36)

I run this job with Pernosco enabled, and I will talk with Kyle in how to get a recording.

As he replied he might get around to do it today. Matt, if the above already gives you more information please let us know. Thanks.

:whimboo asked me to record this in Pernosco. I had to record all of the mach puppeteer-test stuff to get it so there's some non-Firefox stuff in here too.

https://pernos.co/debug/6dyRXaIiUtxGxWtCuO8QdQ/index.html

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #34)

There is one additional fact... With the dumping patch applied there will be two tabs open, and the 2nd one selected. By running the test you can see that we are waiting for the resize event, and that it doesn't happen. But when the hang occurs and I actually move the mouse above the 1st tab in the tab bar (no click needed) the resize event will instantly be emitted.

Is the resize event listening on the 1st tab?

From what I can tell, we defer reflowing for the resize because the tab/presshell is hidden. Previously we got lucky and had a paint scheduled, which flushes all pending work (including the resize), but now we're painting earlier and this doesn't happen.

As soon as you do something which triggers the refresh driver to flush layout, then we process the delayed resize, and the event gets fired.

Flags: needinfo?(matt.woodrow) → needinfo?(hskupin)

Doing something that queries computed style or layout would also force it to flush immediately.

(In reply to Matt Woodrow (:mattwoodrow) from comment #42)

Is the resize event listening on the 1st tab?
From what I can tell, we defer reflowing for the resize because the tab/presshell is hidden. Previously we got lucky and had a paint scheduled, which flushes all pending work (including the resize), but now we're painting earlier and this doesn't happen.

Yes, looks like we seem to hang for the first tab, which indeed is in the background.

As soon as you do something which triggers the refresh driver to flush layout, then we process the delayed resize, and the event gets fired.

But how to trigger the layout flush? We already check the innerWidth and innerHeight in the content process, which by your words should trigger a reflow and emit the resize event. But that doesn't work, and after testing I noticed that it only works via a call to setTimeout and a value greater than 0:

win.setTimeout(() => win.innerHeight, 1);

Using 0 for the delayed execution or accessing win.innerHeight doesn't make a difference.

Flags: needinfo?(hskupin) → needinfo?(matt.woodrow)

It means the simplest change on our side could be:

--- a/remote/cdp/domains/content/Emulation.jsm
+++ b/remote/cdp/domains/content/Emulation.jsm
@@ -29,17 +29,18 @@ class Emulation extends ContentProcessDo
   async _awaitViewportDimensions({ width, height }) {
     const win = this.content;
 
-    if (win.innerWidth === width && win.innerHeight === height) {
-      return;
-    }
-
     await new Promise(resolve => {
-      win.addEventListener("resize", function resized() {
+      win.setTimeout(() => {
         if (win.innerWidth === width && win.innerHeight === height) {
-          win.removeEventListener("resize", resized);
           resolve();
         }
-      });
+        win.addEventListener("resize", function resized() {
+          if (win.innerWidth === width && win.innerHeight === height) {
+            win.removeEventListener("resize", resized);
+            resolve();
+          }
+        });
+      }, 1);
     });
   }
 }

But this simply feels wrong, and somewhat paving over an underlying issue.

Just talked to Olli via Element and he proposed me to make use of a requestAnimationFrame handler. Running with that seems to work all fine, and so far not a single failure.

Olli, shall we take that as a fix for this bug? Or is there something else we should worry about in Gecko itself?

Flags: needinfo?(matt.woodrow) → needinfo?(bugs)

I think that fix should be fine. It would be good to know why the test passed before and why it doesn't anymore, but debugging that might take some time.

Flags: needinfo?(bugs)
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Component: Layout → CDP
Product: Core → Remote Protocol
Attachment #9217065 - Attachment description: WIP: Bug 1701168 - [remote] Properly wait for resize events on browser dimension changes. → Bug 1701168 - [remote] Properly wait for resize events on browser dimension changes.
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/788d8c4c6003
[remote] Properly wait for resize events on browser dimension changes. r=remote-protocol-reviewers,jdescottes
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch

I've submitted https://github.com/puppeteer/puppeteer/pull/7114 to re-enable the test upstream.

Blocks: 1673938
See Also: 1673938

Ryan, I would like to request uplift of this patch to beta, but sadly our product doesn't have the flag set yet (already filed bug 1706768). Would you have to wait until it has been done, or is there a way to get it uplifted? It will fix the perma failure for pup jobs that we currently have on beta.

Flags: needinfo?(ryanvm)

Comment on attachment 9217065 [details]
Bug 1701168 - [remote] Properly wait for resize events on browser dimension changes.

Beta/Release Uplift Approval Request

  • User impact if declined: Under some conditions the a call to Emulation.setDeviceMetricsOverride can hang on Linux, and cause timeouts for CDP clients like Puppeteer.

This patch also fixes a permanent failure for pup jobs.

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It just delays forcing a layout flush until the next requestAnimationFrame.
  • String changes made/needed:
Attachment #9217065 - Flags: approval-mozilla-beta?
Flags: needinfo?(ryanvm)

Comment on attachment 9217065 [details]
Bug 1701168 - [remote] Properly wait for resize events on browser dimension changes.

Low risk, has tests and is an 89 regression, uplift approved for 89 beta 3, thanks.

Attachment #9217065 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: