Closed Bug 1473068 Opened 2 years ago Closed 3 months ago

Periodic "An Unexpected Error Occurred" when browsing reports and comments

Categories

(Socorro :: Webapp, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: marcia, Assigned: willkg)

Details

Attachments

(8 files)

Seen while looking at crash stats.

Periodically while looking through Socorro I will get this error. It doesn't happen consistently, but it was commonly seen when trying to look at comments or reports.

See attached screenshot for an example.
The only errors Sentry shows on the 2nd are assertion errors for a missing elasticsearch index, which AFAIK shouldn't trigger an error message like "Forbidden".
I just hit this again while doing another query a few moments ago. It happened when I added the "Moz Crash Reason Field" to the query.
I've also been seeing this for a few days (weeks?).

I tend to get this when opening a URL from the "Crash Signature" link in Bugzilla.

Perhaps a different bug, but highly reproducible: I also get "Bad request" / "The request's parameters could not be understood." errors when I open several valid URLs from a Firefox bookmarks folder with "Open all in tabs". Specifically, I have 4 crash-stats URLs in the folder and usually all four fail. If I open the URLs individually, there is no problem.
Seen again this morning while looking at a signature.
I've been hitting intermittent issues in the morning like these. I finally think I've got it reproducible:

1. go to https://crash-stats.mozilla.com/
2. log in
3. do a bunch of stuff like super searches
4. go home for the night
5. come back next morning and then suddenly I'm getting 403 errors for XHR requests and "Bad request" for non-XHR requests

Judging by that, I'm pretty sure the problem is the SessionRefresh middleware which is refreshing the Mozilla SSO session.
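
A rough sketch of why the overnight gap would matter, assuming the middleware keeps an expiry timestamp in the session and compares it to the clock on each request. The key name `id_token_expiration` and the 15-minute lifetime are placeholders for illustration, not values taken from Socorro or mozilla-django-oidc.

```python
import time

# Hypothetical session key and lifetime, for illustration only.
EXPIRY_KEY = "id_token_expiration"
RENEW_AFTER_SECONDS = 15 * 60


def mark_authenticated(session, now=None):
    """Record when the current SSO check stops being considered fresh."""
    now = time.time() if now is None else now
    session[EXPIRY_KEY] = now + RENEW_AFTER_SECONDS


def needs_refresh(session, now=None):
    """Return True when the stored expiry is missing or in the past."""
    now = time.time() if now is None else now
    return session.get(EXPIRY_KEY, 0) <= now


session = {}
mark_authenticated(session, now=0)
print(needs_refresh(session, now=60))         # a minute later: False
print(needs_refresh(session, now=12 * 3600))  # next morning: True
```

Under that assumption, every request made the next morning would be a candidate for a refresh, which matches step 5 above.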

Making this a P2 to look into further.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Component: General → Webapp
Priority: -- → P2
seeing this many times every day now - 30-50% of queries fail
Wayne: When you say "seeing this many times", can you be more specific? Do you have steps to reproduce?
Flags: needinfo?(vseerror)
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #7)
> Wayne: When you say "seeing this many times", can you be more specific? Do
> you have steps to reproduce?

I haven't tried it yet in Firefox safe mode (I use Nightly). In a crash bug report, to open a crash signature link (like in https://bugzilla.mozilla.org/show_bug.cgi?id=1356399 ), I typically right-click. For a few weeks now I often get ....

https://crash-stats.mozilla.com/oidc/callback/?code=IB7nzGOoRLbMsrS-&state=Meq1IsnI8TJ18I4HI8Y1sLr7NfHq2N7a#crash-reports

Bad Request
The request's parameters could not be understood.

Error Message:

<!DOCTYPE html>
<html lang="en-US" class="production">
    <head>
        <meta charset="UTF-8" />
        <title>Bad Request</title>
        

        <link href="/static/css/crashstats-base.min.1d48c34e39de.css" rel="stylesheet" type="text/css" />
        
<style type="text/css">
ul.errorlist li {
    list-style: unset;
    margin-left: 30px;
}
.body p {
    color: red;
}
</style>

    </head>
    <body>
        <div class="page-header">
            <a href="/">
    <span class="title">Mozilla Crash Reports</span>
</a>
        </div>
        
<div id="mainbody">
    <div class="page-heading">
        <h2>Bad Request</h2>
    </div>
    <div class="panel">
        <div class="title">
            <h2>The request's parameters could not be understood.</h2>
        </div>
        <div class="body">
            
                <p>
                    Error Message:
                </p>
                <pre>
                </pre>
            
        </div>
    </div>
</div>
        <div id="footer" class="page-footer">
            <div class="nav">
    <div class="about">
        <b>Mozilla Crash Reports</b> - Powered by <a href="https://github.com/mozilla/socorro">Socorro</a> - All dates are UTC
    </div>
    <ul>
        <li><a href="/documentation/">User Documentation</a></li>
        <li><a href="/api/">API</a></li>
        <li><a href="/crontabber-state/">Crontabber State</a></li>
        <li><a href="https://github.com/mozilla/socorro">Source</a></li>
        <li><a href="https://www.mozilla.org/privacy/websites/">Privacy Policy</a></li>
    </ul>
</div>
        </div>
        
        <script type="text/javascript" src="/static/js/error.min.c59e02301ac7.js" charset="utf-8"></script>
    </body>
</html>
Flags: needinfo?(vseerror)
I also have a bookmark group which has a few crash-stats links (queries).  When I open the group typically all the links fail. Right now it's not failing for me. (see comment 3)
I talked to Peter about this. He and I (mostly him) spent a lot of time figuring out how API and XHR things work with the Mozilla SSO and building in the bits into mozilla-django-oidc to support that.

I'm going to do a quick fix now and write up a bug for a more correct/extensive fix later.
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/9757b33a29b86486d5930109329afa4fd0126e10
fix bug 1473068: fix session refresh errors with xhr

The search page does XHR requests when switching between searches. Other
pages do similar things. When the SessionRefresh middleware handles those
requests, it can return a 403 or a redirect, but the js can't handle that.

This fixes the SessionRefresh middleware to ignore those urls and not
refresh the session on them.
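
The behaviour described in this commit message can be sketched roughly as follows. This is a standalone model, not the middleware's actual code: the `Request` class, the `refresh_decision` function, and the example paths are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical exempt list; real paths would come from the site's URL config.
EXEMPT_PATHS = frozenset({"/search/results/"})


@dataclass
class Request:
    path: str
    method: str = "GET"
    is_xhr: bool = False
    session_expired: bool = False


def refresh_decision(request, exempt_paths=EXEMPT_PATHS):
    """Return 'pass', 'redirect', or '403' for a request."""
    # Only GET requests with a stale session are candidates for a refresh.
    if request.method != "GET" or not request.session_expired:
        return "pass"
    # Exempt URLs skip the refresh, so XHR callers never see a 403.
    if request.path in exempt_paths:
        return "pass"
    # Page JS cannot follow a redirect to the SSO, so XHR gets a 403 instead.
    return "403" if request.is_xhr else "redirect"


print(refresh_decision(Request("/search/results/", is_xhr=True, session_expired=True)))
print(refresh_decision(Request("/other/xhr/", is_xhr=True, session_expired=True)))
```

The first call prints `pass` and the second prints `403`. The trade-off is that exempt URLs are served even when the SSO session is stale.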

https://github.com/mozilla-services/socorro/commit/a1f03fa1294da0d04623ca2790ec515f83597189
Merge pull request #4537 from willkg/1473068-session-refresh

fix bug 1473068: fix session refresh errors with xhr
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
We don't do deploys on Fridays, but I'll deploy it as soon as I can on Monday.
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/2b54b37e4aa3bcb6f2b57c1278a79967ce4212d4
fix bug 1473068: fix session refresh errors with xhr (buginfo)

This adds another url used by the supersearch page to the exempt urls
list.

https://github.com/mozilla-services/socorro/commit/94e9bd4b0c59cb42441c72b7761b220f501accce
Merge pull request #4540 from willkg/1473068-buginfo

fix bug 1473068: fix session refresh errors with xhr (buginfo)
Hello Will - Sadly I am still seeing this fairly regularly in Socorro when looking at nightly crash data. Sometimes reloading the page fixes it.
Marcia: Is it exactly the same as the screenshot you posted? Is it in any way different?

When you say "sometimes reloading the page fixes it", what happens when that doesn't fix it? What do you do?
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #19)
> Marcia: Is it exactly the same as the screenshot you posted? Is it in any
> way different?
> 
> When you say "sometimes reloading the page fixes it", what happens when that
> doesn't fix it? What do you do?

Yes, it looks the same as the first screenshot I posted. Sometimes if I get the error, I cut and paste the crash signature and perform the search from the search bar at the top of the screen. One thing to note - In the super search field, I am often changing the "To" date to be tomorrow's date, in order to capture the most recent crashes. Not sure if that factors in or not...
Reopening this per comment #18.

I don't have any good guesses as to what might be going on, so I'm going to need steps to reproduce. I've got a pretty full plate, so this isn't something I can get to soon.

If someone else could look into it or shed any light on the issue, that'd be helpful!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Mark as defect.

Type: task → defect

Unassigning myself from this. I haven't been able to reproduce it, but that doesn't mean it's not still a problem. I don't have any good ideas on how to go forward.

Assignee: willkg → nobody

Add some more XHR endpoints to exempt from session renewal checks.

Assignee: nobody → jwhitlock

The latest changes, which will be deployed soon, should eliminate this issue when loading most tabs in the signature report. The aggregations and graph tabs are still vulnerable, but will require deeper changes, such as in the external library mozilla-django-oidc.

Please continue to report when this happens. Screenshots and URLs are helpful.

(In reply to John Whitlock [:jwhitlock] from comment #26)

> The latest changes, which will be deployed soon, should eliminate this issue when loading most tabs in the signature report. The aggregations and graph tabs are still vulnerable, but will require deeper changes, such as in the external library mozilla-django-oidc.
> 
> Please continue to report when this happens. Screenshots and URLs are helpful.

Thanks John. I have continued to see this issue since I am in Socorro a good portion of the day. I will mention in Channel Meeting today to see if anyone else besides me sees the issue.

The changes are deployed. If our theory is correct, on signature reports the tabs Summary, Reports, Bugzilla, Comments, and Correlations should not show "An Unexpected Error Occurred". Please report if it does. The tabs Aggregations and Graphs may continue showing the bug. We're discussing the best way to apply the fix to these as well.

Given that no one has said anything here, I think we're probably fine now.

If there are still issues, please reopen this with the urls you had problems with.

Status: REOPENED → RESOLVED
Closed: 2 years ago → 1 year ago
Resolution: --- → FIXED

Oh, whoops--I missed comment #26.

Outstanding:

The aggregations and graph tabs are still vulnerable, but will require deeper changes, such as in the external library mozilla-django-oidc.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

We're not going to get to this any time soon, so I'm bumping it down to P3.

Priority: P2 → P3

I agree on P3; the error should be triggered much less often now that most API endpoints are exempted.

I've proposed two changes to the upstream project mozilla-django-oidc, which are different approaches to exempting a set of URLs. If either is merged and released, we can update Socorro's configuration for the remaining vulnerable views.

Bumping this to P2. Gabriele is hitting it periodically.

Priority: P3 → P2

Gabriele, do you recall which URLs you are seeing this on? I'd expect the aggregations or graphs tabs, but not on the other ones.

(and sorry if this is the wrong Gabriele!)

Flags: needinfo?(gsvelto)

Middle-clicking a few links here so that the tabs would be opened in rapid sequence always triggered this issue for me... but now it doesn't anymore. I'm really puzzled, I'm sure I've run into this issue just a couple of weeks ago.

Flags: needinfo?(gsvelto)

Thanks. This doesn't sound like the issue addressed in June 2019 - the XHR URLs on that page are /search/results/ and /search/fields, which have been in the OIDC_EXEMPT_URLS fields forever.
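
For reference, an exemption of that kind would look roughly like this in a Django settings module. This is a sketch, not Socorro's actual settings: the exact path strings (including trailing slashes) are assumptions, and all other middleware is omitted.

```python
# settings.py (sketch): only the parts relevant to session refresh.

MIDDLEWARE = [
    # ... other middleware omitted ...
    "mozilla_django_oidc.middleware.SessionRefresh",
]

# XHR endpoints used by the search page; SessionRefresh skips these.
# Path strings are assumed for illustration.
OIDC_EXEMPT_URLS = [
    "/search/fields/",
    "/search/results/",
]
```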

The issue may have been load on the Elasticsearch server, potentially from a different user. Will and Brian have been adjusting ES in the past few weeks, and maybe their mitigations helped as well.

If it happens again (May 2021?), please post the URL and when you saw it.

I just hit this again: I had three tabs open on various crashes and I restarted Firefox. Upon restart Firefox tried to reload the three tabs at the same time and they all returned the error page.

John: Do you think this bug is a manifestation of the problem discussed here?: https://github.com/mozilla/mozilla-django-oidc/pull/345

It's hard to tell without more information, like the URLs on https://crash-stats.mozilla.org that were displaying an error page or errors in the developer tools. I also have reservations about the proposed patch on PR 345.

The links from https://dbaron.org/mozilla/crashes-by-build go to /search pages, and make XHR requests to /search/results and /search/fields. I'm having trouble finding the errors returned from those pages. I can only find 11 requests with https://dbaron.org/... as the referrer, with 302s (for redirect to login) at the start of the session, and 200s (success) for the later requests when the bug was filed.

https://github.com/mozilla-services/socorro/pull/5561 updated mozilla-django-oidc to pick up the changes John made to exempt urls configuration and also better session handling. We theorize both those fixes should help with this bug.
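
This thread doesn't spell out what "better session handling" covers. One mechanism that would fit the "several tabs at once" reports is a login flow that keeps a single OIDC `state` value per session, so each new tab overwrites the previous tab's state before its callback arrives. That is a guess, not something confirmed here. The sketch below models it with plain dicts, and every function and key name in it is made up for illustration.

```python
import secrets


def start_login_single(session):
    """Store one state value, replacing any earlier one (assumed behaviour)."""
    state = secrets.token_urlsafe(16)
    session["state"] = state
    return state


def callback_single(session, state):
    """Accept the callback only if it matches the single stored state."""
    return session.get("state") == state


def start_login_multi(session):
    """Keep every outstanding state so parallel flows do not collide."""
    state = secrets.token_urlsafe(16)
    session.setdefault("states", set()).add(state)
    return state


def callback_multi(session, state):
    """Accept any outstanding state once, then forget it."""
    states = session.get("states", set())
    if state in states:
        states.discard(state)
        return True
    return False


# Three tabs start a login before any callback comes back.
single = {}
tabs = [start_login_single(single) for _ in range(3)]
print([callback_single(single, s) for s in tabs])

multi = {}
tabs = [start_login_multi(multi) for _ in range(3)]
print([callback_multi(multi, s) for s in tabs])
```

In the single-slot model only the last tab's callback validates, and the other two would end up on an error page like "Bad Request". In the multi-slot model all three succeed.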

John has switched to MLS full-time, so I'm going to grab this from him and look into it soon.

Assignee: jwhitlock → willkg

willkg merged PR #5565: "bug 1473068: add signature urls to OIDC_EXEMPT_URLS" in c170a9e.

I can't reproduce the issue, so I'm going to do a light test on stage to make sure it didn't break anything, then push it to prod.

I pushed the latest changes to prod in bug #1664250.

Can anyone still reproduce this?

I'll let it sit for a week and then close it if I haven't heard from anyone.

It's been a week. I don't see any new instances of the problem in Sentry and no one has said anything, so I'm marking this FIXED!

Status: REOPENED → RESOLVED
Closed: 1 year ago → 3 months ago
Resolution: --- → FIXED