test_cycle_data.py intermittently fails on Travis in test_cycle_job_with_performance_data()

RESOLVED FIXED

Status

Tree Management
Treeherder
P1
normal
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: emorley, Assigned: wlach)

Tracking

Details

(Reporter)

Description

a year ago
This is a new test, added in bug 1265503.

Example intermittent failure:

_____________________ test_cycle_job_with_performance_data _____________________

test_repository = <Repository: test_treeherder_jobs development>
failure_classifications = None
jm = <treeherder.model.derived.jobs.JobsModel object at 0x7f4aa4cac1d0>
sample_data = <tests.sampledata.SampleData object at 0x7f4aa6e32cd0>
sample_resultset = [{'author': 'Eric Chou <echou@mozilla.com>', 'push_timestamp': 1384347643, 'revision': '45f8637cb9f78f19cb8463ff174e81...86"

This reverts commit c624b65c1333d2f951533ff5f5d3094681cb1916.', 'repository': 'test_treeherder_jobs', ...}]}, ...]
test_perf_signature = <PerformanceSignature: tttttttttttttttttttttttttttttttttttttttt mysuite mytest win win7 x86 2017-01-03 23:00:00.096378>

    def test_cycle_job_with_performance_data(test_repository, failure_classifications,
                                             jm, sample_data, sample_resultset,
                                             test_perf_signature):
        # build a date that will cause the data to be cycled
        time_now = time.time()
        cycle_date_ts = int(time_now - 7 * 24 * 3600)
    
        job_data = sample_data.job_data[:1]
        job_data[0]['job']['submit_timestamp'] = cycle_date_ts
>       test_utils.do_job_ingestion(jm, job_data, sample_resultset, False)

tests/model/test_cycle_data.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_utils.py:101: in do_job_ingestion
    jm.store_job_data(blobs)
treeherder/model/derived/jobs.py:633: in store_job_data
    job, push_id, lower_tier_signatures)
treeherder/model/derived/jobs.py:996: in _load_job
    self._schedule_log_parsing(jl, result)
treeherder/model/derived/jobs.py:1053: in _schedule_log_parsing
    parse_job_log(func_name, routing_key, job_log)
treeherder/log_parser/tasks.py:57: in parse_job_log
    signature.apply_async()
../../../venv/lib/python2.7/site-packages/celery/canvas.py:251: in apply_async
    return _apply(args, kwargs, **options)
../../../venv/lib/python2.7/site-packages/celery/app/task.py:565: in apply_async
    link=link, link_error=link_error, **options)
../../../venv/lib/python2.7/site-packages/celery/app/task.py:763: in apply
    request=request, propagate=throw)
../../../venv/lib/python2.7/site-packages/celery/app/trace.py:355: in eager_trace_task
    uuid, args, kwargs, request)
../../../venv/lib/python2.7/site-packages/celery/app/trace.py:250: in trace_task
    task_request, exc, uuid, RETRY, call_errbacks=False,
../../../venv/lib/python2.7/site-packages/celery/app/trace.py:240: in trace_task
    R = retval = fun(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (1L, 'normal'), kwargs = {}, e = timeout('timed out',)
number_of_prior_retries = 0, params = {'number_of_prior_retries': 0}
timeout = 10

    @wraps(f)
    def inner(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except self.NON_RETRYABLE_EXCEPTIONS:
            raise
        except Exception as e:
            number_of_prior_retries = task_func.request.retries
            # Whilst the New Relic agent does report the exception that caused a retry,
            # it does so in a form like:
            #   `celery.exceptions:Retry: Retry in 640s: error('Error -3 while decompressing: incorrect header check',)`
            # ...which causes all retry exceptions to be lumped together in the same
            # `celery.exceptions:Retry` group. The original exception is then only
            # reported to New Relic once the max number of retries has been reached.
            # As such we manually report the retried exceptions to New Relic here, so
            # that the original exception is shown verbatim immediately, and then filter
            # out the automatic `celery.exceptions:Retry` exceptions via the web UI. See:
            # https://docs.newrelic.com/docs/agents/python-agent/back-end-services/python-agent-celery#ignoring-task-retry-errors
            params = {
                "number_of_prior_retries": number_of_prior_retries,
            }
            newrelic.agent.record_exception(params=params)
            # Implement exponential backoff with some randomness to prevent
            # thundering herd type problems. Constant factor chosen so we get
            # reasonable pause between the fastest retries.
            timeout = 10 * int(random.uniform(1.9, 2.1) ** number_of_prior_retries)
>           raise task_func.retry(exc=e, countdown=timeout)
E           Retry: Retry in 10s: timeout('timed out',)

treeherder/workers/task.py:57: Retry
----------------------------- Captured stderr call -----------------------------
[2017-01-03 23:00:30,415] ERROR [treeherder.log_parser.utils:55] Failed to download/parse log for 1: timed out
[2017-01-03 23:00:31,069] ERROR [treeherder.model.derived.jobs:644] Retry in 10s: timeout('timed out',)
Traceback (most recent call last):
  File "/home/travis/build/mozilla/treeherder/treeherder/model/derived/jobs.py", line 633, in store_job_data
    job, push_id, lower_tier_signatures)
  File "/home/travis/build/mozilla/treeherder/treeherder/model/derived/jobs.py", line 996, in _load_job
    self._schedule_log_parsing(jl, result)
  File "/home/travis/build/mozilla/treeherder/treeherder/model/derived/jobs.py", line 1053, in _schedule_log_parsing
    parse_job_log(func_name, routing_key, job_log)
  File "/home/travis/build/mozilla/treeherder/treeherder/log_parser/tasks.py", line 57, in parse_job_log
    signature.apply_async()
  File "/home/travis/venv/lib/python2.7/site-packages/celery/canvas.py", line 251, in apply_async
    return _apply(args, kwargs, **options)
  File "/home/travis/venv/lib/python2.7/site-packages/celery/app/task.py", line 565, in apply_async
    link=link, link_error=link_error, **options)
  File "/home/travis/venv/lib/python2.7/site-packages/celery/app/task.py", line 763, in apply
    request=request, propagate=throw)
  File "/home/travis/venv/lib/python2.7/site-packages/celery/app/trace.py", line 355, in eager_trace_task
    uuid, args, kwargs, request)
  File "/home/travis/venv/lib/python2.7/site-packages/celery/app/trace.py", line 250, in trace_task
    task_request, exc, uuid, RETRY, call_errbacks=False,
  File "/home/travis/venv/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/travis/build/mozilla/treeherder/treeherder/workers/task.py", line 57, in inner
    raise task_func.retry(exc=e, countdown=timeout)
Retry: Retry in 10s: timeout('timed out',)
I guess this is on me to fix. I'll start by just disabling the test.
Assignee: nobody → wlachance

Comment 2

a year ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/78f66bc7c5f831acd675ef0f7224ac07f3ec8889
Bug 1328451 - Add a mock log parser fixture to cycle data test

Prevents an intermittent from popping up
Actually fixing the test was extremely simple, so I just did that.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
(Reporter)

Comment 4

a year ago
Awesome - thank you :-)
You need to log in before you can comment on or make changes to this bug.