Closed
Bug 1204488
Opened 9 years ago
Closed 9 years ago
[pulsetranslator] Missing Pulse notifications for Firefox beta/release builds on normalized queue
Categories
(Webtools :: Pulse, defect)
Webtools
Pulse
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: whimboo, Assigned: whimboo)
Details
(Whiteboard: [qa-automation-blocked])
Attachments
(2 files, 3 obsolete files)
18.84 KB,
text/plain
|
Details | |
3.43 KB,
patch
|
whimboo
:
review+
|
Details | Diff | Splinter Review |
For the last Firefox 41.0b9 beta candidate builds we haven't gotten any pulse message that the builds have been made available. As result no single test has been run. All was working fine for Firefox 41.0b8 on Sep 7th for both staging and production: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=4ed5ddd4ab04&exclusion_profile=false https://treeherder.allizom.org/#/jobs?repo=mozilla-beta&revision=4ed5ddd4ab04 First I thought it might have been because of the replacement of the CI master machines on bug 1200800. But I'm not sure anymore because staging was replaced on Sep 4th and processed the builds on Sep 7th. So I feel it would be more of an issue regarding any of the Pulse tools we make use of. Or maybe Releng has changed the way what gets exposed in the pulse messages. Ben or Rail, has there anything been changed here lately?
Assignee | ||
Comment 1•9 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #0) > Or maybe Releng has changed the way what gets exposed in the pulse messages. > Ben or Rail, has there anything been changed here lately? I should also put ni? on them. ;)
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
Assignee | ||
Comment 2•9 years ago
|
||
Maybe this is due to the landing of bug 1150162?
Comment 3•9 years ago
|
||
I don't think we changed anything that would stop sending pulse messages for any buildbot jobs. Maybe it was some kind of hick up?
Flags: needinfo?(rail)
Comment 4•9 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #2) > Maybe this is due to the landing of bug 1150162? This definitely didn't cause it. The release pulse messages aren't processed any differently than the CI ones, so I'd very surprised if they aren't coming through. I retriggered a release update verify job, and saw a "build.release-mozilla-release-macosx64_release_update_verify_6.5.started" message when watching on Pulse Inspector. I also saw "build.release-mozilla-release-macosx64_release_update_verify_6.5.finished" shortly after killing it.
Flags: needinfo?(bhearsum)
Assignee | ||
Comment 5•9 years ago
|
||
Ben, this bug is not about updates but about builds. Means routing keys like 'build.release-mozilla-beta-linux_build.8.log_uploaded'.
Assignee | ||
Comment 6•9 years ago
|
||
Asking for ni again to ensure that we clarify comment 4 and comment 5. Thanks.
Flags: needinfo?(bhearsum)
Comment 7•9 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #5) > Ben, this bug is not about updates but about builds. Means routing keys like > 'build.release-mozilla-beta-linux_build.8.log_uploaded'. Doesn't matter, these messages aren't handled specially. I didn't trigger a build to test this, because it's more risky than triggering a test. Maybe you can watch in Pulse Inspector after the Beta starts today?
Flags: needinfo?(bhearsum)
Assignee | ||
Comment 8•9 years ago
|
||
Ok, so I have running a local debug script which will log all build messages containing 'release'. Lets see tomorrow what might be the cause of it. Thank you for the clarification.
Assignee | ||
Comment 9•9 years ago
|
||
So I was running the queue and got some build notifications from the builds exchange for release-mozilla-release and release-mozilla-esr38. I pushed them locally through pulsetranslator to check if the tool stumbles over a problem but all looks fine and the normalized build messages are getting created. Then I took some of them to push into mozmill-ci's pulse listener and also here everything is working fine. The Jenkins build is getting triggered. Given those results I can only assume that something is wrong with publishing release builds to the normalized queue. Interestingly all works fine for nightly builds!?
Component: Infrastructure → Pulse
Flags: needinfo?(mcote)
Flags: needinfo?(jgriffin)
Product: Mozilla QA → Webtools
QA Contact: hskupin
Summary: Missing Pulse notifications for Firefox 41.0b9 builds (no single tests have been run) → [pulsetranslator] Missing Pulse notifications for Firefox beta/release builds on normalized queue
Version: Firefox 42 → other
Assignee | ||
Comment 10•9 years ago
|
||
This is definitely a problem with publishing release builds by pulsetranslator. Beside my local debug client for the exchange/build queue, I also setup one for exchange/build/normalized and received message differ in the fact that the normalized queue does not list any single release related message.
Whiteboard: [qa-automation-blocked]
Assignee | ||
Comment 11•9 years ago
|
||
Jonathan, I think it is time that we get someone else with Heroku access who is online during European time. I asked IT if they can help, but even the sysadmin doesn't have access to Heroku. It's like an isolated silo. :(
Assignee | ||
Comment 12•9 years ago
|
||
fox2mike actually sent me some logs and it looks like we get timeout errors: 2015-09-15T06:30:56.762957+00:00 app[worker.1]: LogTimeoutError: key: build.release-mozilla-release-macosx64_repack_17.1.log_uploaded, url: http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/41.0-candidates/build1/logs/release-mozilla-release-macosx64_repack_17-bm84-build1-build1.txt.gz The full log can be seen as attachment. Fox2mike also restarted the service but I don't see that this helped. We are still failing.
Comment 13•9 years ago
|
||
Just out of curiosity, why don't you watch for #.finished instead of #.log_uploaded? Is this to make sure that treeherder has logs to process?
Comment 14•9 years ago
|
||
pulsetranslator listens for #.log_uploaded because logs are relevant to some of its consumers, e.g., ActiveData
Flags: needinfo?(jgriffin)
Comment 16•9 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #12) > Created attachment 8661109 [details] > log.txt > > fox2mike actually sent me some logs and it looks like we get timeout errors: > > 2015-09-15T06:30:56.762957+00:00 app[worker.1]: LogTimeoutError: key: > build.release-mozilla-release-macosx64_repack_17.1.log_uploaded, url: > http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/41.0-candidates/ > build1/logs/release-mozilla-release-macosx64_repack_17-bm84-build1-build1. > txt.gz > > The full log can be seen as attachment. > > Fox2mike also restarted the service but I don't see that this helped. We are > still failing. Yes, I see these errors in papertrail. This seems to be a side effect of the netapp migration. The log url's being published to pulse by build are on http://stage.mozilla.org..., which doesn't exist any longer. When browsing, it gets redirected to archive.mozilla.org, but apparently that doesn't happen in pulsetranslator. I'll see if I can work around it there, but it would be better to have buildbot publish the actual log url.
Assignee | ||
Comment 18•9 years ago
|
||
So after talking to Jonathan on IRC we figured that the issue should be located in the get_url_info() method of loghandler. There we only check for a 200 response code and wait until we reach the timeout. Given that it is a redirect we will never get this code returned: https://github.com/mozilla/pulsetranslator/blob/master/pulsetranslator/loghandler.py#L26 I suggested to switch to the requests module so we do not have to differentiate between HTTPSConnection and HTTPConnection. Given the importance of this issue for us I will have a look at it.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Assignee | ||
Comment 19•9 years ago
|
||
The reason why we are failing is that we are doing a HEAD request, which returns immediately but not following the redirect. So what we would need is an additional check for 301 and retry the new location. A patch should be ready soon.
Assignee | ||
Comment 20•9 years ago
|
||
Here the patch which will fix this problem and also enhances what we actually log to the loghandler error log. It should give us a better insight in the future what's actually wrong with the HEAD request.
Attachment #8661423 -
Flags: review?(jgriffin)
Assignee | ||
Comment 21•9 years ago
|
||
Updated patch so we also log an error if the logurl is 404 or any other kind of failure.
Attachment #8661423 -
Attachment is obsolete: true
Attachment #8661423 -
Flags: review?(jgriffin)
Attachment #8661425 -
Flags: review?(jgriffin)
Assignee | ||
Comment 22•9 years ago
|
||
Well, it's time for bed. This time without the debugging code. Sorry.
Attachment #8661425 -
Attachment is obsolete: true
Attachment #8661425 -
Flags: review?(jgriffin)
Attachment #8661427 -
Flags: review?(jgriffin)
Comment 23•9 years ago
|
||
Comment on attachment 8661427 [details] [diff] [review] Patch v1.2 Review of attachment 8661427 [details] [diff] [review]: ----------------------------------------------------------------- Nice, thanks ::: pulsetranslator/loghandler.py @@ +33,5 @@ > > + # In case of a redirect make another head request for the new location. > + if resp.status_code == 301: > + resp.raise_for_status() > + resp = requests.head(resp.headers['location']) resp.raise_for_status() should be after the new requests.head()
Attachment #8661427 -
Flags: review?(jgriffin) → review+
Updated•9 years ago
|
Flags: needinfo?(jgriffin)
Assignee | ||
Comment 24•9 years ago
|
||
Fixed line ordering. I will push it in a moment.
Attachment #8661427 -
Attachment is obsolete: true
Attachment #8661468 -
Flags: review+
Assignee | ||
Comment 25•9 years ago
|
||
Landed patch on master as: https://github.com/mozilla/pulsetranslator/commit/9b759687fdde84830f95f9fa2904d640af510bee Jonathan, can you please publish?
Flags: needinfo?(jgriffin)
Comment 26•9 years ago
|
||
remote: Compressing source files... done. remote: Building source: remote: remote: -----> Python app detected remote: -----> Installing dependencies with pip remote: Collecting requests==2.7.0 (from -r requirements.txt (line 3)) remote: Downloading requests-2.7.0-py2.py3-none-any.whl (470kB) remote: Installing collected packages: requests remote: Successfully installed requests-2.7.0 remote: You are using pip version 7.1.0, however version 7.1.2 is available. remote: You should consider upgrading via the 'pip install --upgrade pip' command. remote: remote: -----> Discovering process types remote: Procfile declares types -> worker remote: remote: -----> Compressing... done, 45.6MB remote: -----> Launching... done, v21 remote: https://pulsetranslator.herokuapp.com/ deployed to Heroku remote: remote: Verifying deploy... done. To https://git.heroku.com/pulsetranslator.git 15415df..9b75968 master -> master
Flags: needinfo?(jgriffin)
Comment 27•9 years ago
|
||
This caused: Sep 15 14:49:17 pulsetranslator app/worker.2: Traceback (most recent call last): Sep 15 14:49:17 pulsetranslator app/worker.2: File "pulsetranslator/runservice.py", line 12, in <module> Sep 15 14:49:17 pulsetranslator app/worker.2: from pulsetranslator import PulseBuildbotTranslator Sep 15 14:49:17 pulsetranslator app/worker.2: File "/app/pulsetranslator/pulsetranslator.py", line 21, in <module> Sep 15 14:49:17 pulsetranslator app/worker.2: from loghandler import LogHandler Sep 15 14:49:17 pulsetranslator app/worker.2: File "/app/pulsetranslator/loghandler.py", line 13, in <module> Sep 15 14:49:17 pulsetranslator app/worker.2: from .translatorexceptions import LogTimeoutError Sep 15 14:49:17 pulsetranslator app/worker.2: ValueError: Attempted relative import in non-package fixing
Comment 28•9 years ago
|
||
fixed with remote: Compressing source files... done. remote: Building source: remote: remote: -----> Python app detected remote: -----> Installing dependencies with pip remote: You are using pip version 7.1.0, however version 7.1.2 is available. remote: You should consider upgrading via the 'pip install --upgrade pip' command. remote: remote: -----> Discovering process types remote: Procfile declares types -> worker remote: remote: -----> Compressing... done, 45.6MB remote: -----> Launching... done, v22 remote: https://pulsetranslator.herokuapp.com/ deployed to Heroku remote: remote: Verifying deploy.... done. To https://git.heroku.com/pulsetranslator.git 9b75968..c0aa93c master -> master
Assignee | ||
Comment 29•9 years ago
|
||
> ValueError: Attempted relative import in non-package I really do not understand this failure. The folder contains a __init__.py file so it should be a module directory. Also all is working fine for me locally with the relative import. Jonathan, any idea why that happens? Maybe it's how it gets published to Heroku? The actual fix for this landed by jgriffin as: https://github.com/mozilla/pulsetranslator/commit/c0aa93c830f16d7b5396bf3ae6b88c24b0dd5d87 Thinking more about the 301 check it might have been put into a loop until we have a status code not equal to it. But we won't have multiple redirects for our stuff so fixing it might not be that important right now. I'm closing this bug and will keep an eye on it for the next beta release or a new build for the 41.0 release if it has to happen.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 30•9 years ago
|
||
Hmm it seems to work when calling 'runtranslator', but not if invoking python runservice.py directly.
Assignee | ||
Comment 31•9 years ago
|
||
Maybe its the chained import of relative modules. Anyway I don't think it's important to spend any more time on it. Thanks for fixing it!
You need to log in
before you can comment on or make changes to this bug.
Description
•