Closed Bug 1486993 Opened 6 years ago Closed 6 years ago

Perma ts_paint_heavy | 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/P-Ym59czSdyGKkGIIYwBQQ/artifacts/public%2Ftoday-simple.tgz due to failing hook

Categories

(Testing :: Talos, defect, P5)

Version 3
defect

Tracking

(firefox62 unaffected, firefox63 fixed)

RESOLVED FIXED
Tracking Status
firefox62 --- unaffected
firefox63 --- fixed

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])

Tarek: Hi. 

Tomprince looked into this when we found the failure and he discovered that:

tomprince> This hook (https://tools.taskcluster.net/hooks/garbage/heavyprofile) looks like it is failing (https://tools.taskcluster.net/groups/Up162HlfTROcPkEHOVKYeQ/tasks/Up162HlfTROcPkEHOVKYeQ/runs/0/logs/public%2Flogs%2Flive.log)

Noticed the it started perma-failing here: https://tinyurl.com/yclmvsdc

Can you please, take a look?
Flags: needinfo?(tarek)
See Also: → 1478291
From the log:

Updating profile located at '/app/profile'
Starting the Fox...
Traceback (most recent call last):
  File "/usr/local/bin/hp-creator", line 11, in <module>
    load_entry_point('heavy-profile', 'console_scripts', 'hp-creator')()
  File "/app/heavy-profile/heavyprofile/creator.py", line 123, in main
    loop.run_until_complete(build_profile(args))
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "/app/heavy-profile/heavyprofile/creator.py", line 85, in build_profile
    Firefox(**caps)) as session:
  File "/usr/local/lib/python3.6/site-packages/arsenic/__init__.py", line 16, in __aenter__
    self.session = await start_session(self.service, self.browser, self.bind)
  File "/usr/local/lib/python3.6/site-packages/arsenic/__init__.py", line 29, in start_session
    return await driver.new_session(browser, bind=bind)
  File "/usr/local/lib/python3.6/site-packages/arsenic/webdriver.py", line 50, in new_session
    session_id = response['sessionId']
KeyError: 'sessionId'
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f7703c40a90>
[taskcluster 2018-08-29 00:03:02.716Z] === Task Finished ===

Bug 1486467 changed sessionId to session_id
Blocks: 1486467
Severity: normal → blocker
Wdspec tests are working as expected:

https://treeherder.mozilla.org/logviewer.html#?job_id=196382519&repo=mozilla-inbound

> 1535530470580	Marionette	TRACE	0 <- [1,1,null,{"sessionId":"b3cb50a5-481a-483c-b7a2-58b1a043492e","capabilities":{"browserName":"firefox","browserVersion":"63.0a ... 3,"moz:profile":"/tmp/rust_mozprofile.pqEjVRuicNSl","moz:useNonSpecCompliantPointerOrigin":false,"moz:webdriverClick":true}}]
> 1535530470585	webdriver::server	DEBUG	<- 200 OK {"value":{"sessionId":"b3cb50a5-481a-483c-b7a2-58b1a043492e","capabilities":{"acceptInsecureCerts":false,"browserName":"firefox","browserVersion":"63.0a1","moz:accessibilityChecks":false,"moz:geckodriverVersion":"0.21.0","moz:headless":false,"moz:processID":1113,"moz:profile":"/tmp/rust_mozprofile.pqEjVRuicNSl","moz:useNonSpecCompliantPointerOrigin":false,"moz:webdriverClick":true,"pageLoadStrategy":"normal","platformName":"linux","platformVersion":"4.4.0-1014-aws","rotatable":false,"setWindowRect":true,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000},"unhandledPromptBehavior":"dismiss and notify"}}}

The session id gets correctly serialized in geckodriver (webdriver) as `sessionId`.

I don't know how Talos is using all that, so I would have to check.
After digging more through the code I have seen that heavy-profile is using an outdated version of arsenic, which doesn't check for errors at all:

https://github.com/tarekziade/arsenic/blob/master/src/arsenic/webdriver.py#L48-L50

Upstream has at least a bit of error checks, whereby this code looks awful. :(

https://github.com/HDE/arsenic/blob/master/src/arsenic/webdriver.py#L50-L58

Basically when I run this code locally it works just fine with my changes for geckodriver. But the important thing to note is that it DOESN'T use the version of Firefox as downloaded and unpacked, but the system default one as given by the PATH environment variable.

When I don't have a version of Firefox in the PATH, I get exactly the same error as above. Sadly without any information if there was a real error in creating the session. And I expect that this happened here.

It would be great if arsenic could be rebased against upstream/master so that we get at least a bit more details. But as long as Tarek isn't around it will be hard.
I tried to run the job via an interactive task and upgrading Arsenic to the upstream/master version. Here the result of what I get:

Starting the Fox...
Traceback (most recent call last):
  File "/usr/local/bin/hp-creator", line 11, in <module>
    load_entry_point('heavy-profile', 'console_scripts', 'hp-creator')()
  File "/app/heavy-profile/heavyprofile/creator.py", line 123, in main
    loop.run_until_complete(build_profile(args))
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "/app/heavy-profile/heavyprofile/creator.py", line 85, in build_profile
    Firefox(**caps)) as session:
  File "/usr/local/lib/python3.6/site-packages/arsenic/__init__.py", line 16, in __aenter__
    self.session = await start_session(self.service, self.browser, self.bind)
  File "/usr/local/lib/python3.6/site-packages/arsenic/__init__.py", line 29, in start_session
    return await driver.new_session(browser, bind=bind)
  File "/usr/local/lib/python3.6/site-packages/arsenic/webdriver.py", line 57, in new_session
    raise SessionStartError(err_resp['error'], err_resp.get('message', ''), original_response)
arsenic.errors.SessionStartError: unknown error: newSession
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f905a7efb00>

Printing the response here shows a panic in geckodriver:

*** response: {'value': {'error': 'unknown error', 'message': 'newSession', 'stacktrace': 'stack backtrace:\n   0:           0x5787ed - back
trace::backtrace::trace::h59229d13f6a8837d\n   1:           0x578942 - backtrace::capture::Backtrace::new::h23089c033eded8f0\n   2:         
  0x4463cc - webdriver::error::WebDriverError::new::hd11f864b1fdf4510\n   3:           0x454c93 - geckodriver::marionette::MarionetteSession
::response::h1547050944cdade1\n   4:           0x469104 - geckodriver::marionette::MarionetteConnection::send_command::h5f9bb9f7e777eadb\n  
 5:           0x42625f - <webdriver::server::Dispatcher<T, U>>::run::h8f5348b8f5f7c053\n   6:           0x40b22c - std::panicking::try::do_c
all::hb67c6fb6bcd96195\n   7:           0x5dc20a - panic_unwind::__rust_maybe_catch_panic\n                        at /checkout/src/libpanic
_unwind/lib.rs:98\n   8:           0x41b943 - <F as alloc::boxed::FnBox<A>>::call_box::h4100941edc372034\n   9:           0x5d48a4 - alloc::
boxed::{{impl}}::call_once<(),()>\n                        at /checkout/src/liballoc/boxed.rs:650\n                         - std::sys_commo
n::thread::start_thread\n                        at /checkout/src/libstd/sys_common/thread.rs:21\n                         - std::sys::imp::
thread::{{impl}}::new::thread_start\n                        at /checkout/src/libstd/sys/unix/thread.rs:84'}}

Upgrading geckodriver to version 0.21.0 gives me:

*** response: {'value': {'sessionId': '49ff1359-62ee-4ea5-8712-ffbb62c747d6', 'capabilities': {'acceptInsecureCerts': True, 'browserName': '
firefox', 'browserVersion': '63.0a1', 'capabilities': {'desiredCapabilities': {'acceptInsecureCerts': True, 'browserName': 'firefox', 'mario
nette': True}}, 'marionette': True, 'moz:accessibilityChecks': False, 'moz:headless': True, 'moz:processID': 95, 'moz:profile': '/app/profil
e', 'moz:useNonSpecCompliantPointerOrigin': False, 'moz:webdriverClick': True, 'pageLoadStrategy': 'normal', 'platformName': 'linux', 'platf
ormVersion': '4.4.0-1014-aws', 'rotatable': False, 'setWindowRect': True, 'timeouts': {'implicit': 0, 'pageLoad': 300000, 'script': 30000}, 
'unhandledPromptBehavior': 'dismiss and notify'}}}

So we should get the task definition update to make use of geckodriver 0.21.0. I filed bug 1487031 for that.
I can confirm that with the latest code from the heavy-profile repository it works fine once geckodriver has been upgraded.

So lets find the task configuration and get bug 1487031 fixed.
No longer depends on: 1487031
disabled in bug 1487031
Flags: needinfo?(tarek)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [stockwell disable-recommended] → [stockwell fixed:other]
You need to log in before you can comment on or make changes to this bug.