Closed
Bug 1048920
Opened 10 years ago
Closed 10 years ago
OrangeFactor/logparser not ingesting data since 29th July
Categories
(Tree Management Graveyard :: OrangeFactor, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
Details
On:
http://brasstacks.mozilla.com/orangefactor/?display=OrangeFactor&endday=2014-08-05&startday=2014-07-28&tree=trunk

There are no new data-points since 29th July, and:
"plus 1646 oranges with no daily test-run count"
Reporter
Comment 1•10 years ago
At the end of /home/webtools/apps/logparser/savelogs.err was:

2014-08-05 08:17:24,939 - BuildLogMonitor - ERROR - HTTP Error 404: Not Found
Traceback (most recent call last):
  File "/home/webtools/apps/logparser/src/logparser/logparser/savelogs.py", line 170, in on_build_complete
    buildername=buildername)
  File "/home/webtools/apps/logparser/src/logparser/logparser/savelogs.py", line 134, in _download_and_parse_log
    remote = urllib2.urlopen(logurl)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

I tried restarting the logparser service and then started seeing the following in savelogs.err:

...
2014-08-05 08:23:03,942 - BuildLogMonitor - ERROR - [Errno 2] No such file or directory: u'/home/webtools/apps/logparser/incoming-logs/mozilla-inbound-panda_android-android-opt-1406667129-jsreftest-1.txt.gz'
Traceback (most recent call last):
  File "/home/webtools/apps/logparser/src/logparser/logparser/savelogs.py", line 60, in parse
    lp.parseFiles()
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 103, in parseFiles
    raise inst
IOError: [Errno 2] No such file or directory: u'/home/webtools/apps/logparser/incoming-logs/mozilla-inbound-panda_android-android-opt-1406667129-jsreftest-1.txt.gz'
2014-08-05 08:23:06,298 - BuildLogMonitor - ERROR - error parsing file /home/webtools/apps/logparser/incoming-logs/mozilla-inbound-panda_android-android-debug-1406667129-mochitest-7.txt.gz
Traceback (most recent call last):
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 99, in parseFiles
    testdata = self._parseSingleFile(logname)
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 70, in _parseSingleFile
    fp = open(log, "rb")
IOError: [Errno 2] No such file or directory: u'/home/webtools/apps/logparser/incoming-logs/mozilla-inbound-panda_android-android-debug-1406667129-mochitest-7.txt.gz'
2014-08-05 08:23:06,298 - BuildLogMonitor - ERROR - [Errno 2] No such file or directory: u'/home/webtools/apps/logparser/incoming-logs/mozilla-inbound-panda_android-android-debug-1406667129-mochitest-7.txt.gz'
Traceback (most recent call last):
  File "/home/webtools/apps/logparser/src/logparser/logparser/savelogs.py", line 60, in parse
    lp.parseFiles()
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 103, in parseFiles
    raise inst
IOError: [Errno 2] No such file or directory: u'/home/webtools/apps/logparser/incoming-logs/mozilla-inbound-panda_android-android-debug-1406667129-mochitest-7.txt.gz'
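The first traceback shows the HTTPError from urllib2.urlopen(logurl) propagating out of _download_and_parse_log. As an illustration only, here is one way a log monitor might treat a missing log as something to skip rather than an unhandled error. This is not the logparser's actual code: it uses Python 3's urllib.request instead of the Python 2.6 urllib2 seen in the traceback, and the names fetch_log and opener are hypothetical.

```python
import logging
import urllib.error
import urllib.request

log = logging.getLogger("BuildLogMonitor")


def fetch_log(logurl, opener=urllib.request.urlopen):
    """Return the raw log bytes, or None if the server answers 404.

    `fetch_log` and `opener` are hypothetical names; `opener` exists only
    so the function can be exercised without network access.
    """
    try:
        remote = opener(logurl)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            # The log is not available: record it and move on.
            log.warning("log not found, skipping: %s", logurl)
            return None
        raise  # any other HTTP error still surfaces to the caller
    try:
        return remote.read()
    finally:
        remote.close()
```

Whether skipping is appropriate depends on why the 404s occur, which this bug does not establish.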
Reporter
Comment 2•10 years ago
[root@orangefactor1.dmz.phx1 logparser]# ps aux | egrep '^webtools'
webtools 29966 46.7 1.6 319408 32532 ? Sl 08:15 6:07 /home/webtools/apps/logparser/bin/python /home/webtools/apps/logparser/bin/savelogs --es --durable --es-server=elasticsearch-zlb.webapp.scl3.mozilla.com:9200 --es-server=elasticsearch-zlb.dev.vlan81.phx1.mozilla.com:9200 --savedir=/home/webtools/apps/logparser/incoming-logs --outputdir=/home/webtools/apps/logparser/finished-logs --outputlog=/home/webtools/apps/logparser/savelogs.out --errorlog=/home/webtools/apps/logparser/savelogs.err --pidfile=/home/webtools/apps/logparser/logparser.pid
webtools 29973 47.0 0.9 224696 18924 ? S 08:15 6:09 /home/webtools/apps/logparser/bin/python /home/webtools/apps/logparser/bin/savelogs --es --durable --es-server=elasticsearch-zlb.webapp.scl3.mozilla.com:9200 --es-server=elasticsearch-zlb.dev.vlan81.phx1.mozilla.com:9200 --savedir=/home/webtools/apps/logparser/incoming-logs --outputdir=/home/webtools/apps/logparser/finished-logs --outputlog=/home/webtools/apps/logparser/savelogs.out --errorlog=/home/webtools/apps/logparser/savelogs.err --pidfile=/home/webtools/apps/logparser/logparser.pid
webtools 29974 45.6 0.9 224792 19040 ? R 08:15 5:58 /home/webtools/apps/logparser/bin/python /home/webtools/apps/logparser/bin/savelogs --es --durable --es-server=elasticsearch-zlb.webapp.scl3.mozilla.com:9200 --es-server=elasticsearch-zlb.dev.vlan81.phx1.mozilla.com:9200 --savedir=/home/webtools/apps/logparser/incoming-logs --outputdir=/home/webtools/apps/logparser/finished-logs --outputlog=/home/webtools/apps/logparser/savelogs.out --errorlog=/home/webtools/apps/logparser/savelogs.err --pidfile=/home/webtools/apps/logparser/logparser.pid
[root@orangefactor1.dmz.phx1 logparser]# cat logparser.pid
29966

Are we supposed to have multiple processes running?
Flags: needinfo?(jgriffin)
Comment 3•10 years ago
Yes, there are 3 processes...1 the parent process, and 2 child processes that handle the logs themselves. I'll take a look and see what else might be going wrong.
Flags: needinfo?(jgriffin)
Reporter
Comment 4•10 years ago
Ah ok - thank you. With the "No such file or directory" IOError in comment 1, my first thought was a race condition between the processes (for that exception at least). For the 404s, the stdout log seems to show test and build logs being parsed successfully, so I don't know if it's just the odd log (for special job types perhaps, e.g. fuzzer?) that we're not finding?
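If the "No such file or directory" errors did come from two worker processes picking up the same file (only a hypothesis at this point in the bug), one common way to rule that out is for each worker to claim a file with an atomic rename before opening it. The sketch below is not taken from the logparser; claim_log and the directory layout are hypothetical.

```python
import os


def claim_log(path, claimed_dir):
    """Try to take ownership of `path` by moving it into `claimed_dir`.

    Returns the new path on success, or None if the source file was
    already gone (for example because another worker claimed it first).
    Assumes `claimed_dir` exists and is on the same filesystem as `path`,
    where os.rename is atomic on POSIX.
    """
    target = os.path.join(claimed_dir, os.path.basename(path))
    try:
        os.rename(path, target)
    except FileNotFoundError:
        # Someone else moved or deleted the file before we got to it.
        return None
    return target
```

A worker would then parse only the path returned by claim_log and skip the file when it gets None, so a lost race becomes a no-op instead of an IOError.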
Comment 5•10 years ago
So the logparser is behaving correctly, but for some reason it's far behind, and is currently processing logs from July 30. I'm not sure why this is the case. We can see if it catches up naturally, or I can kill the pending queue. Any preferences?
Reporter
Comment 6•10 years ago
Do we think that the additional suites/platforms/repos added in bug 817269 might be the cause of the backlog? Given that the actual log parsing itself (beyond the reading of the header of the log) isn't actually resulting in anything we use at the moment - could we perhaps skip parsing the whole log to speed things up? I'm also unsure as to why we extract quite so much from the log, when many of the fields are already present in the pulse data? Either way, roll on OrangeFactor v2 based on the treeherder API! :-)
Comment 7•10 years ago
(In reply to Ed Morley [:edmorley] from comment #6)
> Do we think that the additional suites/platforms/repos added in bug 817269
> might be the cause of the backlog?
>

Possibly. I'm going to turn off Talos tests again and that should help things a bit.
Comment 8•10 years ago
(In reply to Ed Morley [:edmorley] from comment #6)
> Given that the actual log parsing itself (beyond the reading of the header
> of the log) isn't actually resulting in anything we use at the moment -
> could we perhaps skip parsing the whole log to speed things up? I'm also
> unsure as to why we extract quite so much from the log, when many of the
> fields are already present in the pulse data?

It's because we used to display the actual failure data from the logs in OF, but that hasn't worked in a long time due to problems with ES. We could certainly hook things up more efficiently now, e.g., by writing to ES directly based on pulse data.
Comment 9•10 years ago
Seems like the logparser is slowly catching up...it's now parsing logs from Aug 2.
Comment 10•10 years ago
This has caught up now; I'm not sure if turning off Talos test parsing caused this or not, but it seems resolved.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Reporter
Comment 11•10 years ago
Great, thank you :-)
Assignee
Updated•10 years ago
Product: Testing → Tree Management
Updated•4 years ago
Product: Tree Management → Tree Management Graveyard