Closed
Bug 1048737
Opened 10 years ago
Closed 8 years ago
addons cron on mxr-processor1 takes too long to run
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ashish, Unassigned)
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3289])
History: https://bugzilla.mozilla.org/buglist.cgi?list_id=10923699&short_desc=mxr-processor1.private.scl3.mozilla.com&query_format=advanced&short_desc_type=allwordssubstr&component=Server%20Operations%3A%20MOC&product=mozilla.org

> root 3467 0.0 0.0 106100 1164 ? S Jul31 0:00 /bin/sh /data/www/mxr.mozilla.org/update-full-onetree.sh addons
> root 7149 0.0 0.0 106100 436 ? S Aug01 0:00 \_ /bin/sh /data/www/mxr.mozilla.org/update-full-onetree.sh addons
> root 7150 0.0 0.0 141156 2152 ? S Aug01 0:00 \_ perl update-xref.pl addons
> root 7159 0.0 0.0 106092 1024 ? S Aug01 0:00 \_ sh -c time /data/www/mxr.mozilla.org/genxref /data/mxr-data/addons/addons >> /data/mxr-data/addons/genxref.log 2>&1
> root 7160 99.4 0.8 327552 139744 ? R Aug01 5690:11 \_ /usr/bin/perl /data/www/mxr.mozilla.org/genxref /data/mxr-data/addons/addons

genxref always takes a very long time (more than a week) to complete. I wonder when it last completed a full run, because each time the check alerts, the on-call kills the process (as per documentation). The current run, from July 31, was definitely spawned by hand in Bug 1047212. So, to determine how long a run actually takes, I'm not killing genxref.
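As a rough aid for reading the ps output above, here is a minimal Python sketch that converts the CPU TIME field of the genxref process into days. It assumes the field uses the BSD-style `minutes:seconds` layout that `ps aux` normally prints; the helper name `parse_ps_cpu_time` is made up for illustration and is not part of MXR.

```python
from datetime import timedelta


def parse_ps_cpu_time(field: str) -> timedelta:
    """Convert a ps TIME field, assumed to be "MMM:SS", into a timedelta."""
    minutes, seconds = field.split(":")
    return timedelta(minutes=int(minutes), seconds=int(seconds))


# TIME value taken from the genxref line (PID 7160) in the ps output.
cpu = parse_ps_cpu_time("5690:11")
cpu_days = cpu.total_seconds() / 86400
print(f"genxref CPU time so far: {cpu} ({cpu_days:.2f} days)")
```

Under that assumption the process had already used close to four days of CPU, which fits its 99.4% CPU share since Aug 1.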
Comment 1•10 years ago
<nagios-scl3:#sysadmins> Tue 05:27:21 PDT [5435] mxr-processor1.private.scl3.mozilla.com:File Age - /var/lock/mxr/long is OK: OK: 1dir(s) -- /var/lock/mxr/long: 0 files (http://m.mozilla.org/File+Age+-+/var/lock/mxr/long)
Reporter
Comment 2•10 years ago
:jakem - Do you have any suggestions here?
Flags: needinfo?(nmaul)
Summary: mxr-processor1.private.scl3 takes too long to run → addons cron on mxr-processor1 takes too long to run
Comment 3•10 years ago
No great ideas, sorry. Our best bet will be to work with the AMO folks to develop a better way for this to work. Or perhaps we could reduce it to running monthly. @jorgev might be the best person to talk to here... not sure. I've CC'd him. Failing that, @oremj or @jason would be my next guesses. Also CC'd.

I do know that this is a fairly unusual test by MXR standards. MXR expects to check out code from a repo and then scan it... there is no such repo in this case, so instead there's a script that checks a database to find the paths/names of all current addons, then fetches them. It has some intelligence for minimizing the amount of data it has to fetch (relying on already-fetched content), but I'm not convinced it works properly in all cases. It then must also unpack every XPI that it fetched so it can scan their contents. Finally, MXR can scan the files as normal.

It could also be that we've simply outgrown this design altogether. I imagine the sheer volume of AMO Addons has only increased over time, so maybe it simply doesn't complete in a reasonable time frame anymore.

MXR is on the way out and DXR is the new hotness. I don't know how soon that might be usable, but it might be worth considering a stopgap solution here, with the proper solution being engineered for use with DXR, not MXR.

It's theoretically possible to spin up another processor node that attempts to process only the Addons tree. This won't really make addons faster, but we could set separate schedules/alerts on it, and it would have less of an effect on the other trees.

I also suspect several pieces of the current addons tree processing are serial in nature. If they could be parallelized, they might run much faster overall (given enough cores and I/O throughput). Doing this probably requires a separate box though... parallelizing addons on the existing system would have the effect of choking off CPU from other trees. It might run in days (or even hours) instead of weeks, but that's way too long for everything else to be stalled. :)
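To illustrate the parallelization idea for one stage, here is a hedged Python sketch that unpacks already-fetched XPI files concurrently. It is not the existing MXR script: the function names, directory layout, and worker count are assumptions, and it relies only on the fact that an XPI is a ZIP archive. Threads are used to keep the sketch simple; whether a process pool or a separate box would be needed depends on the real workload.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unpack_xpi(xpi_path, dest_root):
    """Extract one XPI (a ZIP archive) into dest_root/<archive stem>/."""
    target = Path(dest_root) / Path(xpi_path).stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(xpi_path) as archive:
        archive.extractall(target)
    return target


def unpack_all(xpi_paths, dest_root, workers=4):
    """Unpack several XPIs at once; results keep the input order."""
    paths = list(xpi_paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: unpack_xpi(p, dest_root), paths))
```

A call such as `unpack_all(Path("fetched").glob("*.xpi"), "unpacked")` would extract each archive into its own subdirectory (both directory names are hypothetical). The genxref scan itself is a separate stage and is not covered here.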
Flags: needinfo?(nmaul)
Comment 4•10 years ago
I can't find the bug atm, but I found a case where two add-ons were causing genxref to loop for days. I managed to "fix" it, but the regexps are so bewildering that I couldn't tell why it worked or whether it broke something else. The issue went away before I pushed anything, though.

MXR is going away, so significant time or cost should not be spent on this. If someone wants to debug and provide a patch, I'm happy to apply it. Otherwise, I would actually suggest we create a new semi-monthly cronjob that just runs addons, and tweak nagios if needed.

Also, Dev Services, not WebOps. :-)
Component: WebOps: Other → WebOps: Source Control
Comment 5•10 years ago
I'm not familiar with the implementation of the Add-ons MXR, but I would be happy to help make it more efficient. An obvious first question to ask would be if we need to process all add-ons every time, or if we can just update the index incrementally with only the ones that have been updated since the last run (a couple hundred per week).
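To make the incremental idea concrete, here is a minimal Python sketch of the selection step only. The catalog entries, dates, and the name `addons_to_reindex` are invented for illustration; the real list would come from the AMO database, and whether genxref can merge partial results into an existing index is not established in this bug.

```python
from datetime import datetime, timezone


def addons_to_reindex(addons, last_run):
    """Return the names of add-ons modified after last_run, sorted.

    addons is an iterable of (name, last_modified) pairs.
    """
    return sorted(name for name, modified in addons if modified > last_run)


# Hypothetical data: timestamp of the previous indexing run and a tiny catalog.
last_run = datetime(2014, 7, 31, tzinfo=timezone.utc)
catalog = [
    ("addon-a", datetime(2014, 8, 2, tzinfo=timezone.utc)),
    ("addon-b", datetime(2014, 6, 15, tzinfo=timezone.utc)),
    ("addon-c", datetime(2014, 8, 4, tzinfo=timezone.utc)),
]
print(addons_to_reindex(catalog, last_run))
```

Only the entries changed since the previous run are returned, so the fetch and unpack stages would have a few hundred add-ons to handle per week instead of the whole collection.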
Comment 6•10 years ago
This just came up again. Is there anything we can do to help here before it goes away?
Updated•10 years ago
Group: infra
Component: WebOps: Source Control → General
Product: Infrastructure & Operations → Developer Services
Updated•10 years ago
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/678] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3279] [kanban:https://kanbanize.com/ctrl_board/4/678]
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3279] [kanban:https://kanbanize.com/ctrl_board/4/678] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3284] [kanban:https://kanbanize.com/ctrl_board/4/678]
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3284] [kanban:https://kanbanize.com/ctrl_board/4/678] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3289] [kanban:https://kanbanize.com/ctrl_board/4/678]
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3289] [kanban:https://kanbanize.com/ctrl_board/4/678] → [kanban:https://kanbanize.com/ctrl_board/4/678]
Updated•10 years ago
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/678] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3289]
Updated•9 years ago
Assignee: server-ops-webops → nobody
QA Contact: nmaul
Comment 7•8 years ago
Service decommissioned.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED