Closed Bug 666022 Opened 13 years ago Closed 13 years ago

Some Firefox 4.0.1 -> 5.0 partial updates download as complete updates

Categories

(mozilla.org Graveyard :: Server Operations, task)


Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: u279076, Unassigned)


Attachments

(1 file)

For some locales, an error displays in the update dialog stating something to the effect of "partial update cannot be verified, downloading complete update". As a result, some locales actually get a complete update instead of a partial update.

See the attached brasstacks log for details (simply select one of the locales reported as 'complete').

An example download URL for a "complete" update is as follows:
http://mozilla.mirrors.tds.net/pub/mozilla.org/firefox/releases/5.0/update/mac/sr/firefox-5.0.complete.mar

Please let me know if you need help interpreting the brasstacks logs.
This is kinda strange. The finalURI of the partial patch still points to our download server and not to a mirror.
Rob, can you help us identify what the status error 2147549183 means?
That is NS_ERROR_UNEXPECTED, which is a generic error; it comes from nsIncrementalDownload, which returns it in several cases.
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsIncrementalDownload.cpp
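For reference, 2147549183 is just the decimal form of the nsresult value 0x8000FFFF, which is NS_ERROR_UNEXPECTED. A quick sanity check (a minimal Python sketch; the constant comes from the comments above, not recomputed from the Mozilla sources):

# Map the decimal status the updater logs back to its hex nsresult form.
status = 2147549183
print(hex(status))          # 0x8000ffff
# NS_ERROR_UNEXPECTED is 0x8000FFFF, so the logged status is that error.
assert status == 0x8000FFFF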

Perhaps one of the networking folks can help figure out why? cc'ing Josh
Would also be handy if you can post in this bug whether you can reproduce manually.
Can we tell if it's a problem talking to download.m.o, or the actual mirror?
(In reply to comment #5)
> Would also be handy if you can post in this bug whether you can reproduce
> manually.

It only failed a couple of times for all of the tests we ran. So no, I haven't tried that yet. But is there a way to let NSPR not recreate the log file for each start of Firefox? If that's possible I could re-run our automation and log all HTTP request/response headers.
(In reply to comment #6)
> Can we tell if it's a problem talking to download.m.o, or the actual mirror?
I highly suspect it is download.m.o, since the URL is
http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1

(In reply to comment #7)
> (In reply to comment #5)
> > Would also be handy if you can post in this bug whether you can reproduce
> > manually.
> 
> It only failed a couple of times for all of the tests we ran. So no, I
> haven't tried that yet. But is there a way to let NSPR not recreate the log
> file for each start of Firefox? If that's possible I could re-run our
> automation and log all HTTP request/response headers.
I'm going to leave that to someone on the networking team.
Maybe Jason Duell can help figure this out?
> is there a way to let NSPR not recreate the log
> file for each start of Firefox?

Not by default.  If you apply this patch, and set "NSPR_LOG_MODULES=nsHttp:5,notrunc" in your environment, you should append to one big log file:

https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=534764&attachment=485332

More generally, I don't understand the details of how partial updates use necko requests well enough to have an idea of what's broken here, and I've never personally waded into nsIncrementalDownload.cpp, but I'm happy to try to be of more use if someone can clue me in on what's going wrong (the download of an incremental/partial update is failing, but only for certain locales, and rarely enough that we can't capture it in a debugger? Sounds like fun so far :)
(In reply to comment #10)
> Not by default.  If you apply this patch, and set
> "NSPR_LOG_MODULES=nsHttp:5,notrunc" in your environment, you should append
> to one big log file:
> 
> https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=534764&attachment=485332

I can't patch the builds I'm testing with Mozmill. So it would be nice to get this checked in at some point.

I will re-run those update tests now and simply check whether it could be related to a massive amount of requests like we had yesterday.
There were 25 failures out of 248 locales in the update checks that led to this bug (see the URL). 

I ran 500 requests against download.m.o (using curl) and there were only 302 responses - no timeouts or other errors I could see. That hit both the Phoenix and San Jose datacenters, 250 each, and according to the X-Backend-Server header it hit pp-app-dist01..09 and pm-app-dist01..08. 
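For reference, a minimal sketch of a similar check (Python standard library; an approximation of the curl loop described above, not the exact commands used):

import http.client

# Hit the bouncer URL from this bug repeatedly without following redirects
# and tally the status code and X-Backend-Server header of each response.
URL_PATH = "/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1"

def check_bouncer(attempts=10):
    tally = {}
    for _ in range(attempts):
        conn = http.client.HTTPConnection("download.mozilla.org", timeout=30)
        conn.request("GET", URL_PATH)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        key = (resp.status, resp.getheader("X-Backend-Server", "unknown"))
        tally[key] = tally.get(key, 0) + 1
        conn.close()
    return tally

if __name__ == "__main__":
    for (status, backend), count in sorted(check_bouncer().items()):
        print("%d via %s: %d" % (status, backend, count))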

So it's not an issue now, but perhaps it was before. It will be interesting to see if the problem happens again. If it does I'll bet it's different locales.

What do you mean by 'massive amount of requests', whimboo? There's a pretty high background level of update checks all day long, so it takes a lot of press/public awareness to raise the request rate significantly by manual checks. It might be possible for the machines serving download.m.o to get very busy due to other work. Perhaps mrz can suggest someone who can comment on that.
(In reply to comment #12)
> What do you mean by 'massive amount of requests' whimboo ? There's a pretty
> high background level of update checks all day long, so it takes a lot of

We don't run those tests outside of the usual release testing work. It's probably something we should do, to check whether issues like this also happen when we are not pushing a new release to the public. I can remember that we have seen this issue in the past, but we never reported it as a bug. A day after the release everything was fine, which matches what you noticed when running your own tests.

My current test run is still active, but so far I can't see this issue:

http://mozmill-archive.brasstacks.mozilla.com/#/update/detail?branch=5.0&channel=release&from=2011-06-22&to=2011-06-22&target=5.0
Everything works now. I really have the impression it's related to our release days.
Sounds like we've done all the debugging we can here :(.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Whimboo really wants this fixed before the next release, re-opening.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
IT, we would appreciate some input from your side. We were seeing odd behaviour from some update attempts that QA did yesterday. Specifically:
* Firefox checks for an update, receives a response from AUS like this: https://aus3.mozilla.org/update/3/Firefox/4.0.1/20110413222027/WINNT_x86-msvc/sr/release/Windows_NT%205.1/default/default/update.xml?force=1
* Firefox attempts to download the partial (http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=win&lang=son&force=1), but doesn't appear to get redirected properly (the "final URI spec" below should be a URL to the requested MAR on a mirror, not still download.m.o):
*** AUS:SVC Downloader:downloadUpdate - downloading from http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1 to /private/var/folders/rR/rRPQYt0bGFSKcIpcKmYxRU+++TM/-Tmp-/tmpvUx87Q.binary/Firefox.app/Contents/MacOS/updates/0/update.mar
*** AUS:SVC Downloader:onStartRequest - original URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1, final URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1
*** AUS:SVC Downloader:onStopRequest - original URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1, final URI spec: http://download.mozilla.org/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1, status: 2147549183
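For reference, a minimal sketch (Python; a hypothetical helper, assuming these AUS:SVC lines are available in a plain-text log) that pulls out this failure signature, i.e. onStopRequest lines whose final URI still points at download.mozilla.org:

import re
import sys

# Match the Downloader:onStopRequest line format shown above.
LINE_RE = re.compile(
    r"Downloader:onStopRequest - original URI spec: (?P<orig>\S+), "
    r"final URI spec: (?P<final>\S+), status: (?P<status>\d+)"
)

def find_unredirected(log_path):
    """Return (final URI, status) pairs for downloads that never left bouncer."""
    hits = []
    with open(log_path) as log:
        for line in log:
            m = LINE_RE.search(line)
            if m and m.group("final").startswith("http://download.mozilla.org/"):
                hits.append((m.group("final"), int(m.group("status"))))
    return hits

if __name__ == "__main__":
    for final_uri, status in find_unredirected(sys.argv[1]):
        print("not redirected: %s (status %d = %s)" % (final_uri, status, hex(status)))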

Unfortunately, we don't have full HTTP headers, as Firefox doesn't log them during updates.

Do we know of any download.m.o machines that were acting up yesterday?
Could they act up in such a way that they would not redirect a request, under heavy load or other conditions experienced yesterday?
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
If there are no mirrors available serving the file in question, you will get a page that suggests downloading from releases.mozilla.org, with a manual link on it. The only time you'll get that is if bouncer is in good working condition and doesn't think there are any mirrors available serving the file in question. If bouncer is over capacity, you'd get redirected to status.mozilla.com (for the hardhat page).
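For reference, a minimal sketch (Python standard library) of how a client-side check could distinguish the three outcomes described above; the host and query string are the ones from this bug, while the content-type/location checks are assumptions, not the exact markup bouncer serves:

import http.client

def classify_bouncer_response(path):
    # Request the bouncer URL without following redirects and classify the
    # response as: redirect to a mirror, hardhat redirect, or fallback page.
    conn = http.client.HTTPConnection("download.mozilla.org", timeout=30)
    conn.request("GET", path)
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    if resp.status == 302 and "status.mozilla.com" in location:
        return "hardhat (bouncer over capacity)"
    if resp.status == 302:
        return "redirected to mirror: " + location
    if resp.status == 200 and "text/html" in (resp.getheader("Content-Type") or ""):
        return "fallback page (no mirrors thought to serve the file)"
    return "unexpected response: %d" % resp.status

if __name__ == "__main__":
    print(classify_bouncer_response(
        "/?product=firefox-5.0-partial-4.0.1&os=osx&lang=sr&force=1"))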
(In reply to comment #18)
> If there are no mirrors available serving the file in question, you will get
> a page that suggests downloading from releases.mozilla.org with a manual
> link on it.  The only time you'll get that is if bouncer is in good working
> condition and doesn't think there's any mirrors available that are serving
> the file in question.  If bouncer is over capacity you'd get redirected to
> status.mozilla.com (for the hardhat page)

Sounds to me like we're hitting the "no mirrors available" state, given that it only happens right after a major release, and we never get redirected.

Is there any way we can force Bouncer to always serve files to our own machines? If not, this sounds like CANTFIX to me.
I suspect it would be difficult to implement, but it seems like if bouncer knows there are no mirrors it can redirect the user to, then AUS could use that information and not offer an update in that case.
btw: that wouldn't help Mozmill, but it would help the users.
re: comment 11: Henrik, that patch is for a bug that got fixed another way, so it's not planned to land. But if an append mode for NSPR logs would be useful to you for other purposes, we can open a new bug for it (use component NSPR and CC me).
Thanks Jason. I have filed bug 666376.
Henrik rebooted the machine that had issues (qa-horus). When he reran the tests, the problem didn't occur again. If we spot specific problems with mirrors on the day we release, then let's file them.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → WORKSFORME
(In reply to comment #24)
> Henrik rebooted the machine that had issues (qa-horus). When he reran the
> tests the problem didn't occur again. If we spot specific problems with
> mirrors on the day we release then lets file them.

No, we are not talking about the same issue here. The restart fixed another issue I had noticed but which we never filed as a bug. The failures in this bug have also been seen on another machine.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → mrz
Is this still an issue?
Resolving as incomplete, since I'm not clear on whether there is an action IT can take right now on it.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → INCOMPLETE
Now that bug 666376 is fixed in Firefox 9, we can revisit this bug once Firefox 9 is released and the same issue happens again. Anthony, when we have reached that release and you see this issue again, please immediately run an update test via our testrun_update.py script after setting the following environment variables:

export NSPR_LOG_MODULES=nsHttp:5,append
export NSPR_LOG_FILE=log.txt
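For automation, a minimal sketch (Python; the environment variables are the ones above, while the way testrun_update.py is invoked here is a placeholder, not the script's real interface) of wiring this into a scripted run:

import os
import subprocess

# Copy the current environment and add the NSPR logging settings from above.
env = os.environ.copy()
env["NSPR_LOG_MODULES"] = "nsHttp:5,append"      # "append" keeps one growing log file
env["NSPR_LOG_FILE"] = os.path.abspath("log.txt")

# Invoke the update test run with that environment (invocation is a placeholder).
subprocess.check_call(["python", "testrun_update.py"], env=env)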

I'm leaving this bug as closed for now.
Product: mozilla.org → mozilla.org Graveyard