Closed Bug 666022 Opened 13 years ago Closed 13 years ago

Some Firefox 4.0.1 -> 5.0 partial updates download as complete updates

Categories

(mozilla.org Graveyard :: Server Operations, task)


Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: u279076, Unassigned)


Attachments

(1 file)

For some locales, an error displays in the update dialog stating something to the effect of "partial update cannot be verified, downloading complete update". As a result, some locales actually get a complete update instead of a partial update.

See the attached brasstacks log for details (simply select one of the locales reported as 'complete').

An example download URL for a "complete" update is as follows:
http://mozilla.mirrors.tds.net/pub/mozilla.org/firefox/releases/5.0/update/mac/sr/firefox-5.0.complete.mar

Please let me know if you need help interpreting the brasstacks logs.
This is kinda strange. The finalURI of the partial patch still points to our download server and not to a mirror.
Rob, can you help us identify what the status error 2147549183 means?
That is NS_ERROR_UNEXPECTED, which is a generic error; it comes from nsIncrementalDownload, which returns it in several cases.
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsIncrementalDownload.cpp
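For reference, 2147549183 is just the decimal form of the nsresult value 0x8000FFFF, which is NS_ERROR_UNEXPECTED. A quick sanity check (a minimal Python sketch; the constant comes from the comments above, not recomputed from the Mozilla sources):

# Map the decimal status the updater logs back to its hex nsresult form.
status = 2147549183
print(hex(status))          # 0x8000ffff
# NS_ERROR_UNEXPECTED is 0x8000FFFF, so the logged status is that error.
assert status == 0x8000FFFF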

Perhaps one of the networking folks can help figure out why? cc'ing Josh
Would also be handy if you can post in this bug whether you can reproduce manually.
Can we tell if it's a problem talking to download.m.o, or the actual mirror?
(In reply to comment #5)
> Would also be handy if you can post in this bug whether you can reproduce
> manually.

It only failed a couple of times for all of the tests we ran. So no, I haven't tried that yet. But is there a way to let NSPR not recreate the log file for each start of Firefox? If that's possible I could re-run our automation and log all HTTP request/response headers.
(In reply to comment #6)
> Can we tell if it's a problem talking to download.m.o, or the actual mirror?
I highly suspect it is download.m.o, since the URL is
http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1

(In reply to comment #7)
> (In reply to comment #5)
> > Would also be handy if you can post in this bug whether you can reproduce
> > manually.
> 
> It only failed a couple of times for all of the tests we ran. So no, I
> haven't tried that yet. But is there a way to let NSPR not recreate the log
> file for each start of Firefox? If that's possible I could re-run our
> automation and log all HTTP request/response headers.
I'm going to leave that to someone on the networking team.
Maybe Jason Duell can help figure this out?
> is there a way to let NSPR not recreate the log
> file for each start of Firefox?

Not by default.  If you apply this patch, and set "NSPR_LOG_MODULES=nsHttp:5,notrunc" in your environment, you should append to one big log file:

https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=534764&attachment=485332

More generally, I don't understand the details of how partial updates use necko requests well enough to have an idea of what's broken here, and I've never personally waded into nsIncrementalDownload.cpp, but I'm happy to try to be of more use if someone can clue me in on what's going wrong (the download of an incremental/partial update is failing, but only for certain locales, and rarely enough that we can't capture it in a debugger? Sounds like fun so far :)
(In reply to comment #10)
> Not by default.  If you apply this patch, and set
> "NSPR_LOG_MODULES=nsHttp:5,notrunc" in your environment, you should append
> to one big log file:
> 
> https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=534764&attachment=485332

I can't patch the builds I'm testing with Mozmill. So it would be nice to get this checked in at some point.

I will re-run those update tests now and simply check whether it could be related to a massive amount of requests like we had yesterday.
There were 25 failures out of 248 locales in the update checks that led to this bug (see the URL). 

I ran 500 requests against download.m.o (using curl) and there were only 302 responses - no timeouts or other errors I could see. That hit both the Phoenix and San Jose datacenters, 250 each, and according to the X-Backend-Server header it hit pp-app-dist01..09 and pm-app-dist01..08. 
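For reference, a minimal sketch of a similar check (Python standard library; an approximation of the curl loop described above, not the exact commands used):

import http.client

# Hit the bouncer URL from this bug repeatedly without following redirects
# and tally the status code and X-Backend-Server header of each response.
URL_PATH = "/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1"

def check_bouncer(attempts=10):
    tally = {}
    for _ in range(attempts):
        conn = http.client.HTTPConnection("download.mozilla.org", timeout=30)
        conn.request("GET", URL_PATH)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        key = (resp.status, resp.getheader("X-Backend-Server", "unknown"))
        tally[key] = tally.get(key, 0) + 1
        conn.close()
    return tally

if __name__ == "__main__":
    for (status, backend), count in sorted(check_bouncer().items()):
        print("%d via %s: %d" % (status, backend, count))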

So it's not an issue now, but perhaps it was before. It will be interesting to see if the problem happens again. If it does I'll bet it's different locales.

What do you mean by 'massive amount of requests', whimboo? There's a pretty high background level of update checks all day long, so it takes a lot of press/public awareness to raise the request rate significantly by manual checks. It might be possible for the machines serving download.m.o to get very busy due to other work. Perhaps mrz can suggest someone who can comment on that.
(In reply to comment #12)
> What do you mean by 'massive amount of requests' whimboo ? There's a pretty
> high background level of update checks all day long, so it takes a lot of

We don't run those tests outside of the usual release testing work. It's probably something we should do, to check whether issues like this also happen when we are not pushing a new release to the public. I can remember that we have seen this issue in the past, but we never reported it as a bug. A day after the release everything was fine, which matches what you noticed when running your own tests.

My current test run is still active, but so far I can't see this issue:

http://mozmill-archive.brasstacks.mozilla.com/#/update/detail?branch=5.0&channel=release&from=2011-06-22&to=2011-06-22&target=5.0
Everything works now. I really have the impression it's related to our release days.
Sounds like we've done all the debugging we can here :(.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Whimboo really wants this fixed before the next release, re-opening.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
IT, we would appreciate some input from your side. We were seeing odd behaviour from some update attempts that QA did yesterday. Specifically:
* Firefox checks for an update, receives a response from AUS like this: https://aus3.mozilla.org/update/3/Firefox/4.0.1/20110413222027/WINNT_x86-msvc/sr/release/Windows_NT%205.1/default/default/update.xml?force=1
* Firefox attempts to download the partial (http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=win&lang=son&force=1), but doesn't appear to get redirected properly (the "final URI spec" below should be a URL to the requested MAR on a mirror, not still download.m.o):
*** AUS:SVC Downloader:downloadUpdate - downloading from http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1 to /private/var/folders/rR/rRPQYt0bGFSKcIpcKmYxRU+++TM/-Tmp-/tmpvUx87Q.binary/Firefox.app/Contents/MacOS/updates/0/update.mar
*** AUS:SVC Downloader:onStartRequest - original URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1, final URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1
*** AUS:SVC Downloader:onStopRequest - original URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1, final URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1, status: 2147549183
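For reference, a minimal sketch (Python; a hypothetical helper, assuming these AUS:SVC lines are available in a plain-text log) that pulls out this failure signature, i.e. onStopRequest lines whose final URI still points at download.mozilla.org:

import re
import sys

# Match the Downloader:onStopRequest line format shown above.
LINE_RE = re.compile(
    r"Downloader:onStopRequest - original URI spec: (?P<orig>\S+), "
    r"final URI spec: (?P<final>\S+), status: (?P<status>\d+)"
)

def find_unredirected(log_path):
    """Return (final URI, status) pairs for downloads that never left bouncer."""
    hits = []
    with open(log_path) as log:
        for line in log:
            m = LINE_RE.search(line)
            if m and m.group("final").startswith("http://download.mozilla.org/"):
                hits.append((m.group("final"), int(m.group("status"))))
    return hits

if __name__ == "__main__":
    for final_uri, status in find_unredirected(sys.argv[1]):
        print("not redirected: %s (status %d = %s)" % (final_uri, status, hex(status)))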

Unfortunately, we don't have full HTTP headers, as Firefox doesn't log them during updates.

Do we know of any download.m.o machines that were acting up yesterday?
Could they act up in such a way that they would not redirect a request, under heavy load or other conditions experienced yesterday?
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
If there are no mirrors available serving the file in question, you will get a page that suggests downloading from releases.mozilla.org, with a manual link on it. The only time you'll get that is if bouncer is in good working condition and doesn't think there are any mirrors available serving the file in question. If bouncer is over capacity, you'd get redirected to status.mozilla.com (for the hardhat page).
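For reference, a minimal sketch (Python standard library) of how a client-side check could distinguish the three outcomes described above; the host and query string are the ones from this bug, while the content-type/location checks are assumptions, not the exact markup bouncer serves:

import http.client

def classify_bouncer_response(path):
    # Request the bouncer URL without following redirects and classify the
    # response as: redirect to a mirror, hardhat redirect, or fallback page.
    conn = http.client.HTTPConnection("download.mozilla.org", timeout=30)
    conn.request("GET", path)
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    if resp.status == 302 and "status.mozilla.com" in location:
        return "hardhat (bouncer over capacity)"
    if resp.status == 302:
        return "redirected to mirror: " + location
    if resp.status == 200 and "text/html" in (resp.getheader("Content-Type") or ""):
        return "fallback page (no mirrors thought to serve the file)"
    return "unexpected response: %d" % resp.status

if __name__ == "__main__":
    print(classify_bouncer_response(
        "/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1"))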
(In reply to comment #18)
> If there are no mirrors available serving the file in question, you will get
> a page that suggests downloading from releases.mozilla.org with a manual
> link on it.  The only time you'll get that is if bouncer is in good working
> condition and doesn't think there's any mirrors available that are serving
> the file in question.  If bouncer is over capacity you'd get redirected to
> status.mozilla.com (for the hardhat page)

Sounds to me like we're hitting the "no mirrors available" state, given that it only happens right after a major release, and we never get redirected.

Is there any way we can force Bouncer to always serve files to our own machines? If not, this sounds like CANTFIX to me.
I suspect it would be difficult to implement, but it seems like if bouncer knows there are no mirrors it can redirect the user to, then AUS could use that information and not offer an update in that case.
btw: that wouldn't help Mozmill, but it would help the users.
re: comment 11: Henrik, that patch is for a bug that got fixed another way, so it's not planned to land. But if an append mode for NSPR logs would be useful to you for other purposes, we can open a new bug for it (use component NSPR and CC me).
Thanks Jason. I have filed bug 666376.
Henrik rebooted the machine that had issues (qa-horus). When he reran the tests, the problem didn't occur again. If we spot specific problems with mirrors on the day we release, then let's file them.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → WORKSFORME
(In reply to comment #24)
> Henrik rebooted the machine that had issues (qa-horus). When he reran the
> tests the problem didn't occur again. If we spot specific problems with
> mirrors on the day we release then lets file them.

No, we are not talking about the same issue here. The restart fixed another issue I had noticed but which we never filed as a bug. The failures in this bug have also been seen on another machine.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → mrz
Is this still an issue?
Resolving as incomplete, since I'm not clear on whether there is an action IT can take right now on it.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → INCOMPLETE
Now that bug 666376 is fixed in Firefox 9, we can revisit this bug once Firefox 9 is released and the same issue happens again. Anthony, when we have reached that release and you see this issue again, please immediately run an update test via our testrun_update.py script after setting the following environment variables:

export NSPR_LOG_MODULES=nsHttp:5,append
export NSPR_LOG_FILE=log.txt
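For automation, a minimal sketch (Python; the environment variables are the ones above, while the way testrun_update.py is invoked here is a placeholder, not the script's real interface) of wiring this into a scripted run:

import os
import subprocess

# Copy the current environment and add the NSPR logging settings from above.
env = os.environ.copy()
env["NSPR_LOG_MODULES"] = "nsHttp:5,append"      # "append" keeps one growing log file
env["NSPR_LOG_FILE"] = os.path.abspath("log.txt")

# Invoke the update test run with that environment (invocation is a placeholder).
subprocess.check_call(["python", "testrun_update.py"], env=env)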

I'm leaving this bug as closed for now.
Product: mozilla.org → mozilla.org Graveyard