Closed Bug 1183246 Opened 5 years ago Closed 1 year ago

We frequently parse logs twice

Categories

(Tree Management :: Treeherder: Log Parsing & Classification, task, P2)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: wlach, Unassigned)

References

Details

It seems like we're occasionally parsing the same log twice in treeherder (see bug 1182282). 

The current "protection" we have against this happening consists only of seeing if the log in question has already been parsed: 

https://github.com/mozilla/treeherder/blob/master/treeherder/log_parser/tasks.py#L26
https://github.com/mozilla/treeherder/blob/master/treeherder/log_parser/utils.py#L24

... however, that doesn't account for the case where a log parse is already in progress. I wonder if, before starting to actually parse the log, we should set a log parse status of "parsing" (and not parse the log if that status is set). Thoughts? Other solutions?
Flags: needinfo?(mdoglio)
Flags: needinfo?(emorley)
Flags: needinfo?(cdawson)
See Also: → 1182282
This might make an interesting bug for :moijes12 to work on, depending on what others say.
I agree.  This seems like a good first bug, to be sure.
Flags: needinfo?(cdawson)
It occurred to me later that it might be a problem if log parsing got accidentally interrupted and no retry was scheduled. Then logs could be stuck in the "parsing" state indefinitely. Do we need to set some kind of timeout here?
I think memcached entries for logs in progress are the answer here: no DB table churn, and expiry for free.
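A minimal sketch of that idea. In a real deployment this would use memcached's atomic `add` with a TTL (e.g. via Django's cache API); here a plain dict with expiry timestamps stands in so the pattern is self-contained. The key name, TTL, and function names are illustrative, not Treeherder's actual code.

```python
# Illustrative stand-in for a memcached "parse in progress" guard.
# With real memcached, try_start_parse would be cache.add(key, 1, TTL),
# which is atomic and expires on its own if a worker dies mid-parse.
import time

PARSE_LOCK_TTL = 15 * 60  # assumed timeout; a stuck "parsing" entry expires

_locks = {}  # stand-in for memcached: key -> expiry timestamp


def try_start_parse(job_log_id):
    """Return True if we acquired the right to parse this log."""
    key = "log-parse-in-progress:%s" % job_log_id
    now = time.time()
    if _locks.get(key, 0) > now:
        return False  # another worker is (or recently was) parsing this log
    _locks[key] = now + PARSE_LOCK_TTL
    return True


def finish_parse(job_log_id):
    """Release the guard once the parse completes (or fails for good)."""
    _locks.pop("log-parse-in-progress:%s" % job_log_id, None)
```

The expiry answers the "stuck in parsing state indefinitely" concern from the earlier comment: if no one ever calls `finish_parse`, the entry simply lapses after the TTL.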
Flags: needinfo?(emorley)
Not convinced this is a good first bug; we need to look at all the root causes (e.g. repeat parsing should only happen if, say, parsing is slow and the user selects the job in the UI, thus triggering an _hp task; or are we broken in other ways?).
If a parse-log task fails, we have a retry policy in place.
Flags: needinfo?(mdoglio)
The current celery setup doesn't allow multiple workers to pick up the same task. When a worker acknowledges a task, it is removed from the queue. In case of a catchable failure, the retry mechanism triggers a new task, which is then picked up by one of the workers in the pool as if it were new. Ideally, all the tasks that we want to be able to retry should be idempotent. If that's not the case, it may be a problem.
Priority: -- → P2
Component: Treeherder: Data Ingestion → Treeherder: Log Parsing & Classification

Cam, is this still an issue (see comment 7)? Thanks.

Flags: needinfo?(cdawson)

Whee, this bug is so old I'm still CC'ed on it.

I wrote this bug before I understood much about treeherder's architecture. I think :mdoglio's last comment was right: we should rely on celery not to perform the same operation more than once. For the rare case where this happens anyway, we should make sure any db writes performed as a result of log parsing are idempotent, as I did for perfherder in comment 0.
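A minimal sketch of the idempotent-write pattern recommended above, using sqlite3 as a stand-in for Treeherder's database. The table and column names are illustrative; the point is that a unique key plus `INSERT OR IGNORE` makes replaying the same write (e.g. from a duplicate or retried task) harmless.

```python
# Illustrative only: table/column names are assumptions, not Treeherder's schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE failure_line ("
    " job_log_id INTEGER, line_no INTEGER, message TEXT,"
    " UNIQUE (job_log_id, line_no))"
)


def store_parsed_lines(lines):
    """Write parsed failure lines; safe to call more than once per log."""
    conn.executemany(
        "INSERT OR IGNORE INTO failure_line VALUES (?, ?, ?)", lines)
    conn.commit()


lines = [(1, 10, "TEST-UNEXPECTED-FAIL | foo"), (1, 42, "timeout")]
store_parsed_lines(lines)
store_parsed_lines(lines)  # a retried task replays the write: no duplicates
```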

My recommendation would be to resolve this as worksforme.

Type: defect → task
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME

Nothing needed from me.

Flags: needinfo?(cdawson)