Closed Bug 1136380 Opened 9 years ago Closed 9 years ago

Get "Log could not be found" intermittently when loading a logline

Categories

(Tree Management :: Treeherder, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: camd)

References

Details

Attachments

(1 file)

Clicking the line again usually loads it fine.

I was able to reproduce this on prod with this log: https://treeherder.mozilla.org/logviewer.html#?job_id=1079889&repo=mozilla-central

Reloading the page a few times and clicking the line sometimes triggers that error.

The logslice endpoint returns: 
{"detail": "[Errno 13] Permission denied: '/data/www/treeherder.mozilla.org/treeherder-service/treeherder/webapp/log_cache/tmp8E1HW4'"}

This was what we got before my fix here:
https://github.com/mozilla/treeherder-service/commit/05dd682eb1c60a06c7badba1a915e08885b5c8d1
Priority: -- → P2
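The `tmp8E1HW4` name in that error looks like a name produced by Python's `tempfile.mkstemp`. That suggests the log cache first writes each entry to a temporary file inside `log_cache` and then moves it into place. Treeherder's actual cache code is not shown here, so the following is only a minimal sketch assuming that write pattern. `write_cache_entry` is a hypothetical name. If the process cannot write to the directory, `mkstemp` raises `PermissionError` with errno 13 (`EACCES`) and names the temp path, which matches the message above.

```python
import os
import tempfile


def write_cache_entry(cache_dir: str, key: str, data: bytes) -> str:
    """Hypothetical helper: atomically write `data` to `cache_dir/key`.

    The payload goes to a uniquely named tmp* file in the same directory
    and is then renamed over the final path. If this process lacks write
    permission on `cache_dir`, mkstemp raises PermissionError
    ("[Errno 13] Permission denied: '<cache_dir>/tmp...'").
    """
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        final_path = os.path.join(cache_dir, key)
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Don't leave a stray temp file behind if the write or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The pattern works on every node whose cache directory is writable by the service account. It fails only on a node where it isn't, which would explain why a retry that lands on a different web node succeeds.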
I think this is a regression from the Django 1.7 switch - I don't suppose you could take a look at this and bug 1133273? Given:
<RyanVM|sheriffduty> camd: btw, I'm hearing complaints from devs about the "Log failed to load" caching bug

It would be good to tie up the loose ends from some of the recent landings before we move onto other stuff / new features :-)
Assignee: nobody → cdawson
Blocks: 1119479
Priority: P2 → P1
I think I can reproduce Ed's error reliably by scrolling "up" in any log until I hit the top chunk, or by scrolling to the bottom and then back upwards. In the latter scenario I see paired green/red dialogs.

I'll include a screen grab of what I'm seeing for posterity.
Attached image scrollToTopChunk
What I observe when scrolling to any top chunk on production.
I also see one that may be caused by a very long log step (it could have a different cause):
https://treeherder.mozilla.org/logviewer.html#?job_id=1184268&repo=mozilla-central
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=097cad4009ca (OSX 10.10 opt, Mochitest2, classified)

There is also an issue with the run-data header: it is so wide that the failure step container is pushed off to the right, out of frame. I'll open a separate bug for that if there isn't one already.
David, I'm pretty sure this is the issue I saw you mention on IRC (I only glanced at the IRC logs, since I had some New Relic alerts earlier for something else), though there wasn't enough detail there to be sure. Can you confirm it's the same?

Cameron, I don't suppose you could take a look at this? I'm pretty sure it's a regression from your Django update, which landed over a month ago now, so it would be good to get this wrapped up soon, before we start anything new :-)
Flags: needinfo?(cdawson)
I didn't check the console; I'll do that the next time it happens, but from the feel of it, it's the same.
Flags: needinfo?(cdawson) → needinfo?
Transcribed from IRC, where camd and fubar were looking at it this afternoon:

camd   it turns out it's trying to read from a cache file and intermittently getting a permission problem

fubar  one web node had the log_cache dir owned by varnish instead of treeherder
fubar  changed it to treeherder:treeherder and I am unable to reproduce the error
fubar  oh ho ho. treeherder user has different uids
fubar  ok, pulled the odd node out, fixed the user, restarted things, and I think we're all set
Flags: needinfo?
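fubar's diagnosis comes down to two conditions to verify on each web node. First, `log_cache` must be owned by the `treeherder` account rather than `varnish`. Second, `treeherder` must have the same uid on every node. The comments don't spell out exactly how the uid mismatch produced the failure. The following is a minimal per-node audit sketch, assuming a POSIX host and a known expected uid. `audit_cache_dir` and its parameters are hypothetical, not part of Treeherder.

```python
import os
import pwd
from typing import List, Optional


def audit_cache_dir(path: str, user: str, expected_uid: Optional[int] = None) -> List[str]:
    """Hypothetical check: return problems found with `path` on this node.

    An empty list means the directory is owned by `user` and, if
    `expected_uid` is given, that `user` has that uid here.
    """
    try:
        account = pwd.getpwnam(user)
    except KeyError:
        return [f"user {user!r} does not exist on this node"]

    problems = []
    # The same account name must map to the same uid on every node.
    if expected_uid is not None and account.pw_uid != expected_uid:
        problems.append(f"user {user!r} has uid {account.pw_uid}, expected {expected_uid}")

    # The cache directory must belong to the service account (e.g. not varnish).
    st = os.stat(path)
    if st.st_uid != account.pw_uid:
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)  # uid with no local account
        problems.append(f"{path} is owned by {owner!r}, not {user!r}")
    return problems
```

Running it on every node with the same `expected_uid` (for example, from a config-management run) would flag the odd node without anyone having to reproduce the intermittent error.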
This appears fixed. Marking resolved (thanks to Fubar).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Cameron Dawson [:camd] from comment #9)
> This appears fixed. Marking resolved (thanks to Fubar).

Thank you both for figuring this out! :-)
Component: Treeherder: Log Viewer → TreeHerder