781012 - sanity check hg 2.5.4 on hgssh.stage.dmz.scl3.mozilla.com

hwine

Reporter

Description

13 years ago

IT has staged a 2.2.3 on hg1.dmz.scl3.mozilla.com - test to ensure no known breakage. Bonus points for documenting all the tests to simplify the next update cycle.

hwine

Reporter

Updated

13 years ago

Assignee: nobody → catlee

Chris AtLee [:catlee]

Updated

13 years ago

Whiteboard: [reit] → [reit][buildduty]

Justin Wood (:Callek)

Assignee

Comment 1

13 years ago

(In reply to Hal Wine [:hwine] from comment #0)
> IT has staged a 2.2.3 on hg1.dmz.scl3.mozilla.com - test to ensure no known
> breakage.

Per IRC this was hgweb1.dmz.scl3.mozilla.com, which is itself a VHost that doesn't respond to that name and only responds to hg.m.o, so you should adjust HOSTS locally for that.

Summary: sanity check hg 2.2.3 on hg1.dmz.scl3.mozilla.com → sanity check hg 2.2.3 on hgweb1.dmz.scl3.mozilla.com

Chris AtLee [:catlee]

Comment 2

13 years ago

I tried using bld-centos6-hp-002.build.mozilla.org (10.12.52.43) to test, but I got connection timeouts. bkero, let me know when I should try again.

Chris AtLee [:catlee]

Updated

13 years ago

Assignee: catlee → nobody

Ben Kero [:bkero]

Comment 3

13 years ago

Can you ensure that "nc -z hgweb1.dmz.scl3.mozilla.com 80" actually works? Otherwise we're going to have to add a network flow. Or wait for 781925 to finish, then actually use that address.
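
The `nc -z` check only asks whether a TCP connection to the port can be opened. A minimal Python sketch of the same probe, assuming Python 3 is available on the test box; the function name and the 5-second default timeout are illustrative choices, not anything from this bug. The explicit timeout keeps a blocked flow from taking minutes to report failure.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Roughly what `nc -z host port` checks: connect, send nothing, close.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS lookup failures.
        return False
```

Usage: `port_is_open("hgweb1.dmz.scl3.mozilla.com", 80)` returns False quickly when no network flow exists, instead of hanging.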

hwine

Reporter

Comment 4

13 years ago

There is no flow from build-vpn to DMZ (although the host cited in comment #2 doesn't have any tools installed). From a build-vpn box with tools:

[cltbld@buildbot-master33 ~]$ time nc -zv hgweb1.dmz.scl3.mozilla.com 80
nc: connect to hgweb1.dmz.scl3.mozilla.com port 80 (tcp) failed: Connection timed out

real    3m9.000s
user    0m0.000s
sys     0m0.001s
[cltbld@buildbot-master33 ~]$ host hgweb1.dmz.scl3.mozilla.com
hgweb1.dmz.scl3.mozilla.com has address 10.22.74.32

NOTE: the following command can be used to verify URLs without modifying the /etc/hosts file:

curl -IH "Host: hg.mozilla.org" http://hgweb1.dmz.scl3.mozilla.com/build/tools
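
The curl command works because the VHost selects a site from the Host header, not from the name used to reach the machine. A minimal standard-library Python equivalent, as a sketch only; the function name and 10-second timeout are illustrative assumptions:

```python
import http.client


def head_with_host(server: str, path: str, host_header: str,
                   port: int = 80, timeout: float = 10.0) -> int:
    """Send `HEAD path` to `server` while presenting `host_header` as Host.

    Comparable to: curl -IH "Host: <host_header>" http://<server><path>
    Returns the HTTP status code.
    """
    conn = http.client.HTTPConnection(server, port, timeout=timeout)
    try:
        # Passing "Host" explicitly makes http.client skip its automatic Host
        # header, so the VHost sees host_header rather than `server`.
        conn.request("HEAD", path, headers={"Host": host_header})
        return conn.getresponse().status
    finally:
        conn.close()
```

Usage: `head_with_host("hgweb1.dmz.scl3.mozilla.com", "/build/tools", "hg.mozilla.org")` should return 200 once the flow is open.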

Rail Aliiev [:rail]

Comment 5

13 years ago

Still the same, connection time out for tcp 80.

hwine

Reporter

Comment 6

13 years ago

We need to test ssh access as well, so a VM is being set up in bug 781995 for this purpose.

Depends on: 781995

hwine

Reporter

Comment 7

13 years ago

correct blocking bug number - apologies for spam

Depends on: 781955
No longer depends on: 781995

bhearsum@mozilla.com (:bhearsum)

Comment 8

13 years ago

This doesn't look like something actionable by buildduty to me.

Whiteboard: [reit][buildduty] → [reit]

Shyam Mani [:fox2mike]

Comment 9

13 years ago

Host is hg-test.dmz.scl3.mozilla.com.

Hal/releng, can we get a list of the tests you'd like to run? We'd like to make sure you have the right firewall holes opened up on this host for you.

Summary: sanity check hg 2.2.3 on hgweb1.dmz.scl3.mozilla.com → sanity check hg 2.3.x on hg-test.dmz.scl3.mozilla.com

hwine

Reporter

Comment 10

13 years ago

Shyam, We need http access (assuming https is terminated at the lb), and ssh access. That's it. Also, we need some dummy content we can update - one of my user repos is fine.

Shyam Mani [:fox2mike]

Comment 11

13 years ago

Hal, I was planning on getting a local dump of the entire repo for you to play with; or, to save time, we could just do a few important ones, like m-c and try?

hwine

Reporter

Comment 12

13 years ago

Yes - our interest is that nothing changed with the functionality of the hooks we care about. m-c & try would be good candidates to capture the hooks we care about. Also, on the test box, it'd be handy to have a shell account with sufficient rights to view all log files (to confirm actions, since it won't be hooked up to an operational pushlog, etc.)

hwine

Reporter

Comment 13

13 years ago

Note: holding on this to understand the bug introduced in hg 2.1, see http://bz.selenic.com/show_bug.cgi?id=3648#c8

Current discussion is about hg client versions. We want to confirm that updating the server version (from the current 2.0.2) won't cause the problem more often or for more users.

OS: Mac OS X → All

Hardware: x86 → All

See Also: → http://bz.selenic.com/show_bug.cgi?id=3648

Shyam Mani [:fox2mike]

Comment 14

13 years ago

Sorry again, the right host is hgssh.stage.dmz.scl3.mozilla.com

Summary: sanity check hg 2.3.x on hg-test.dmz.scl3.mozilla.com → sanity check hg 2.3.x on hgssh.stage.dmz.scl3.mozilla.com

Ben Kero [:bkero]

Comment 15

13 years ago

I've created a user for you, the VM is available to test at hgssh.stage.dmz.scl3.mozilla.com. Please login with the username 'hwine' which has sudo access. The hg repos are mounted read-only, please do whatever testing you need on there.

hwine

Reporter

Comment 16

13 years ago

(In reply to Hal Wine [:hwine] from comment #13)
> note: holding on this to understand bug introduced in hg 2.1 see
> http://bz.selenic.com/show_bug.cgi?id=3648#c8
>
> Current discussion is about hg client versions - want to confirm that
> updating server version (from current 2.0.2) won't cause the problem more
> often or for more users.

Per http://bz.selenic.com/show_bug.cgi?id=3648#c10 the case folding bug is client side only. Yay!

Ben Kero [:bkero]

Comment 17

13 years ago

does anything else need to be done for this?

hwine

Reporter

Comment 18

13 years ago

Yes - this is not yet done - the other issue was a temporary blocker. Now we can get back to the normal process.

(not currently active) Ted Mielczarek

Comment 19

13 years ago

FWIW, I ran the hghooks/pushlog unit tests on:

$ hg --version
Mercurial Distributed SCM (version 2.3.2)

on my local machine and they all pass, so no problems there.
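
When test results are reported against a specific hg release, it helps to assert the version programmatically rather than eyeball it. A small sketch, assuming the `hg --version` first line keeps the "(version X.Y[.Z]...)" form shown above; the helper names are illustrative:

```python
import re
import subprocess

# Matches "(version 2.3.2)", "(version 2.5)" and "(version 2.5.4+20-abcdef)".
VERSION_RE = re.compile(r"\(version (\d+)\.(\d+)(?:\.(\d+))?")


def parse_hg_version(text: str) -> tuple:
    """Extract (major, minor, patch) from `hg --version` output."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError("no Mercurial version found in: %r" % text)
    # A missing patch component (e.g. "2.5") is treated as 0.
    return tuple(int(g) if g else 0 for g in m.groups())


def installed_hg_version(hg: str = "hg") -> tuple:
    """Run `hg --version` and parse its output."""
    out = subprocess.run([hg, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return parse_hg_version(out)
```

Tuples compare element by element, so `installed_hg_version() >= (2, 5, 4)` is a usable guard before running the hook and pushlog test suites.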

Ben Kero [:bkero]

Comment 20

13 years ago

This is a quarterly goal for us. Can we get a status update so we can fix 741353?

hwine

Reporter

Comment 21

13 years ago

I'll bring it up for discussion at our next team meeting, and we'll update after that.

Shyam Mani [:fox2mike]

Comment 22

13 years ago

(In reply to Hal Wine [:hwine] from comment #21)
> I'll bring it up for discussion at our next team meeting, and we'll update
> after that.

Hal, any updates here?

hwine

Reporter

Comment 23

13 years ago

(In reply to Shyam Mani [:fox2mike] from comment #22)
>
> Hal, any updates here?

Yeah - we're swamped, so this is unlikely to complete this quarter without help.

Our main grief is we have to cross check across all the hg client versions we support (see, e.g. bug 779569 comment #0 - slightly out of date, but the list is still long). If you know of any compatibility matrix that shows the testing done by the mercurial team as part of their release process, that'd be very helpful.

(not currently active) Ted Mielczarek

Comment 24

13 years ago

(In reply to Hal Wine [:hwine] from comment #23)
> Our main grief is we have to cross check across all the hg client versions
> we support (see, e.g. bug 779569 comment #0 - slightly out of date, but the
> list is still long).
>

I don't understand what you're afraid of here. The build slaves don't do anything particularly complicated, they just clone and pull/update, right? Unless you're afraid upstream broke the wire protocol this doesn't seem worth the effort. I think we should assume that our upstream tools have done at least minimal QA and not waste our time trying to do it for them. (Especially when we don't have the time to waste, as evidenced by how long this bug has been sitting.)

Chris AtLee [:catlee]

Comment 25

13 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #24)
> (In reply to Hal Wine [:hwine] from comment #23)
> > Our main grief is we have to cross check across all the hg client versions
> > we support (see, e.g. bug 779569 comment #0 - slightly out of date, but the
> > list is still long).
> >
>
> I don't understand what you're afraid of here. The build slaves don't do
> anything particularly complicated, they just clone and pull/update, right?
> Unless you're afraid upstream broke the wire protocol this doesn't seem
> worth the effort. I think we should assume that our upstream tools have done
> at least minimal QA and not waste our time trying to do it for them.
> (Especially when we don't have the time to waste, as evidenced by how long
> this bug has been sitting.)

Upstream has broken the wire protocol for us before. For example, they changed how each side advertised its heads to the other, which busted build slaves trying to clone the try repo: the HTTP headers exploded in size and caused the load balancer to drop or truncate the requests.
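
The header blow-up described above scales with the number of heads in try. A rough back-of-envelope sketch, under stated assumptions: each head travels as a 40-character hex node ID, IDs are joined by single spaces, and the argument is split across request headers of at most 1024 bytes each. The chunk size is an assumption about the server's advertised limit, not a measured value, and header-name overhead is ignored.

```python
import math

NODE_HEX_LEN = 40  # a full changeset ID in hex


def encoded_nodes_len(n_nodes: int) -> int:
    """Bytes for n_nodes hex IDs joined by single spaces (assumed encoding)."""
    if n_nodes <= 0:
        return 0
    return n_nodes * NODE_HEX_LEN + (n_nodes - 1)


def header_chunks(payload_len: int, chunk: int = 1024) -> int:
    """Number of headers of at most `chunk` bytes needed for the payload."""
    return math.ceil(payload_len / chunk)


for heads in (1, 100, 10_000):
    size = encoded_nodes_len(heads)
    print(heads, "heads ->", size, "bytes in", header_chunks(size), "headers")
```

Under these assumptions 10,000 heads is roughly 400 KB of request headers, far beyond what a typical load balancer is configured to accept per request.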

(not currently active) Ted Mielczarek

Comment 26

13 years ago

Thanks for the info. That doesn't sound like "broke the wire protocol" to me, but "broke using our load balancer". That's a pretty fair thing to test against.

hwine

Reporter

Comment 27

13 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #24)
> I think we should assume that our upstream tools have done
> at least minimal QA and not waste our time trying to do it for them.

Agreed that we shouldn't duplicate any upstream testing. But besides the case mentioned in comment 25, there have been some hg client side regressions, so we do need some level of internal testing. For example, hg 2.1 introduced a client side regression (still not fixed) that impacted case-insensitive file systems (Mac & Windows for us). You might recall the hassle: http://bz.selenic.com/show_bug.cgi?id=3648#c8. (Fortunately, we move slowly enough to not have been affected by this bug in production. :/)
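
One cheap guard against the case-insensitive file system class of problems is to look for tracked paths that would collide when case is folded. A sketch, assuming the input is the list of paths from `hg manifest`; `str.casefold()` only approximates how HFS+ and NTFS fold names (HFS+ also normalizes Unicode), so treat this as a first-pass check:

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by case.

    Such paths map to the same file on a case-insensitive file system.
    """
    groups = defaultdict(set)
    for p in paths:
        groups[p.casefold()].add(p)
    # Keep only groups with more than one spelling; sort for stable output.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Usage: feed it `subprocess.run(["hg", "manifest"], capture_output=True, text=True).stdout.splitlines()` from a clone; an empty result means no case-folding collisions among tracked files.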

Justin Wood (:Callek)

Assignee

Updated

13 years ago

Assignee: nobody → bugspam.Callek

Ed Morley [:emorley]

Comment 28

13 years ago

Mercurial 2.5 has just been released (http://mercurial.selenic.com/wiki/WhatsNew) with some substantial performance improvements. Would be good to see if we can get onto 2.3 as soon as possible, to then pave the way for 2.5 :-)

hwine

Reporter

Comment 29

12 years ago

Another item to test - interaction with MercurialVCS per bug 828029

Ben Kero [:bkero]

Comment 30

12 years ago

I'd suggest we just upgrade straight to 2.5. I can bang out the script to test all versions in an afternoon, but only on Linux. If you're concerned with OS X (case insensitive) and Windows support, you'll have to do that legwork on your own. The thing we seem to be concerned with here is wire protocol, not platform support. What operations/tests should this be doing? hg update? hg clone? hg pull? Between all mercurial versions 1.5 or newer?
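
The clone/pull/update matrix asked about above could start as a generated command plan. A sketch under explicit assumptions: the version list, the `~/hg-envs/<version>/bin/hg` virtualenv layout, and the repository URL are all hypothetical placeholders, not an agreed setup.

```python
import os
import shlex

# Hypothetical list: substitute the client versions releng actually supports.
CLIENT_VERSIONS = ["1.5.4", "2.0.2", "2.3.2", "2.5.4"]

# Hypothetical URL for the stage server's mozilla-central.
REPO_URL = "http://hgssh.stage.dmz.scl3.mozilla.com/mozilla-central"


def hg_binary(version):
    # Assumed layout: one virtualenv per Mercurial version under ~/hg-envs/.
    return os.path.expanduser(os.path.join("~", "hg-envs", version, "bin", "hg"))


def build_plan(versions, repo_url, workdir="."):
    """Return clone, pull and update commands (argv lists) per client version."""
    plan = []
    for version in versions:
        hg = hg_binary(version)
        dest = os.path.join(workdir, "clone-" + version)
        plan.append([hg, "clone", "-U", repo_url, dest])     # clone without working copy
        plan.append([hg, "-R", dest, "pull"])                # pull into existing clone
        plan.append([hg, "-R", dest, "update", "default"])   # then populate working copy
    return plan


if __name__ == "__main__":
    for cmd in build_plan(CLIENT_VERSIONS, REPO_URL):
        print(shlex.join(cmd))
```

Each argv list can be passed to `subprocess.run` and the exit codes collected into a per-version pass/fail table; keeping plan generation separate from execution makes the matrix easy to review before hitting the server.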

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 31

12 years ago

(In reply to Ben Kero [:bkero] from comment #30)
> I'd suggest we just upgrade straight to 2.5.

The (supposedly) final 2.5.x version, 2.5.3, just got released.

hwine

Reporter

Updated

12 years ago

Depends on: 867470

hwine

Reporter

Updated

12 years ago

Depends on: 868279

Justin Wood (:Callek)

Assignee

Comment 32

12 years ago

:ted,

You mentioned in IRC that you had run the hooks through the testsuite for 2.5.4, and that one had failed. Can you officially say/claim in here the following:

* For the hook/test that failed {one of:}
** The test is broken, with the hook being fine, and will be fixed in Bug XXX
** The hook is broken and has problem X, described in Bug XXX, and will be fixed there.
** The results of the failed test are {x}, but I {ted} have no time to look further... Bug XXX has been opened to track the issue.
* That the [remaining] full list of hooks are working in a satisfactory way with hg 2.5.4.
** (making sure that we either have a full test suite that was run, or by human code scanning/etc of hooks without automated tests)
* There are no config changes or patches blocking this deployment from the hook side.

:ted, :fox2mike, :bkero,

* hgweb templates?
-- I would like to identify which teams/persons will be able to qualify that our hgweb templates meet our use-case needs.
-- Two primary concerns here around that:
-- changes that break functionality [for human or app] (e.g. the pushlog json views)
-- changes that will surprise our community/developers in their use-cases.
-- surprises will likely not block rollout but will need to be enumerated and communicated prior to cutover to the new hg server.

Flags: needinfo?(ted)

Flags: needinfo?(shyam)

Flags: needinfo?(bkero)

Summary: sanity check hg 2.3.x on hgssh.stage.dmz.scl3.mozilla.com → sanity check hg 2.5.4 on hgssh.stage.dmz.scl3.mozilla.com

(not currently active) Ted Mielczarek

Comment 33

12 years ago

The hooks work fine with 2.5.4. I tested the pushlog extension and I'm having some issues. I pushed one change to fix an issue I hit: http://hg.mozilla.org/hgcustom/pushlog/rev/e4b06bae4d5b

Now the tests are running (and claim they're passing), but there's a lot of error spew from the hgweb HTTP server the test suite runs, so I'm not sure whether they're actually passing. If the pushlog test suite succeeds, then the pushlog JSON+ATOM feeds should be fine, and it runs some very basic sanity checks on the HTML view, so that shouldn't be horribly broken.
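
Beyond the test suite, a direct check is to fetch json-pushes from production and from the stage server and diff them. A sketch that assumes the response is a JSON object keyed by push ID, each entry carrying `user`, `date` and `changesets`; verify that shape against real output before relying on it. Fetching is left out (the curl Host-header trick from comment 4 works for that).

```python
import json


def load_pushes(text):
    """Parse json-pushes output into {push_id: (user, date, changesets)}.

    Assumed shape: {"<id>": {"user": ..., "date": ..., "changesets": [...]}}
    """
    data = json.loads(text)
    return {int(pid): (p["user"], p["date"], tuple(p["changesets"]))
            for pid, p in data.items()}


def diff_pushes(prod_text, stage_text):
    """Return sorted push IDs that are missing from, or differ between, the two."""
    prod, stage = load_pushes(prod_text), load_pushes(stage_text)
    return sorted(pid for pid in prod.keys() | stage.keys()
                  if prod.get(pid) != stage.get(pid))
```

An empty list means both servers report identical pushes for the requested range.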

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 34

12 years ago

I poked at it a bit more and the errors I was seeing seem harmless; they look like they're just a result of the way I'm running the tests. I pushed a wallpaper fix, so all the pushlog tests now pass with hg 2.5.4.

Shyam Mani [:fox2mike]

Comment 35

12 years ago

Not quite sure what use I can be with the templates stuff, ben might have a better answer for you there.

Flags: needinfo?(shyam)

Ben Kero [:bkero]

Comment 36

12 years ago

I've just finished setting up the SSH and HTTP portions of the hgssh.stage server. You have 2 repositories in place, mozilla-central and try. All templates should be the same, and all extensions should be the same. Please let me know if anything is missing.

Flags: needinfo?(bkero)

Justin Wood (:Callek)

Assignee

Comment 37

12 years ago

First [known] failure/issue:
* pushing to try/ is hanging, in a test with no trychooser syntax present.
** Expected is remote: aborting with an informative error message.
** client not returning to shell

:bkero, can you tell me if there is anything obvious here to fix, as well as what version of the hg hooks we have installed?

(I'm working in https://etherpad.mozilla.org/hg-server-testing-plan so far, which has a link to my pastebin of output -- I'll drop final plans to real bug/wiki when done)
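
To tell a genuine hang apart from an abort or a slow-but-finishing push, the push can be wrapped with a hard timeout. A sketch; the function name and the 600-second default are arbitrary choices. Mercurial typically exits 255 on an abort, and `hg push` also exits 1 when there is nothing to push, so a non-zero status alone is not conclusive and the captured output should be read too.

```python
import subprocess


def run_push(cmd, timeout_s=600.0):
    """Run a push command; return (status, output).

    status is 'ok' (exit 0), 'aborted' (any non-zero exit) or 'timeout'.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "aborted"
    return status, proc.stdout + proc.stderr
```

Usage, assuming the stage server exposes try at the same path as production: `run_push(["hg", "push", "-f", "ssh://hgssh.stage.dmz.scl3.mozilla.com/try"])`. For a commit without trychooser syntax the expected result is `"aborted"` with a `remote:` explanation; a `"timeout"` result warrants checking server load before calling it a hang.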

Flags: needinfo?(bkero)

Ben Kero [:bkero]

Comment 38

12 years ago

I talked with Callek on IRC last night and we determined (with the --debug flag and running htop on the server) that the "hanging" was actually a large system load on the server (8+ on a 1-core server). It eventually finished, I believe. I've also fixed the permission issue you were referring to earlier. Callek, is there anything more you need from me?

Flags: needinfo?(bkero)

hwine

Reporter

Comment 39

12 years ago

per Callek in IRC, all good to go. Callek - please confirm & resolve.

Status: NEW → ASSIGNED

Flags: needinfo?(bugspam.Callek)

Justin Wood (:Callek)

Assignee

Comment 40

12 years ago

(In reply to Hal Wine [:hwine] from comment #39)
> per Callek in IRC, all good to go. Callek - please confirm & resolve.

Indeed.

I thought of one helpful tidbit that would be good for me: attaching the cltbld pubkey to my @gmail.com privkey for my hg user on the host. This would simplify the Windows testing so I don't have to pull my own privkey onto a Windows server, since ssh -A doesn't work for connecting to our Windows systems.

Flags: needinfo?(bugspam.Callek)

Ben Kero [:bkero]

Comment 41

12 years ago

I've added the cltbld public key to Callek's LDAP credentials as he requested so he can do further testing on Windows.

Justin Wood (:Callek)

Assignee

Comment 42

12 years ago

Was just working on this, since I woke up early today, and realized (with Usul's help) that we added the cltbld@gitolite key, not the actual hg production key. :Usul is fixing that up now, but I wanted an explicit additional request here for the fix.

Ludovic Hirlimann [:Usul]

Comment 43

12 years ago

added callek's http://pastebin.mozilla.org/2422550 key to ldap too

Justin Wood (:Callek)

Assignee

Comment 44

12 years ago

This has been sanity checked and releng signs off.

I note that Bug 774766 was deployed outside of my testing window, and any fallout from the upgrade around that change is unverified. (My comments regarding that are in its bug.)

I also note that due to technical limitations we did not test anything with the load balancer:
* see-related https://bugzil.la/781012#c25

The following was part of my sanity check:
* Clone over ssh yields the same repo end state as clone over ssh of the current production repo
* Clone over http yields the same repo end state as clone over http of the current production repo
* Pushlog/json-pushes yields correct results
* Pushing multiple heads on a branch fails
* Pulling by rev for try continues to only get one head grabbed
* Pushing to try still works with more heads.
* Hg phase behavior will not regress in the upgrade

The following are suggestions/improvements to be made during or shortly after the upgrade:
* A single push to try (to make sure the cache of the repo is populated before it opens; the initial push will be slow, but additional ones after that should be fast)
* Set the mercurial phase for try to be non-publishing (Bug 725362)
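
The "same repo end state" and "multiple heads" checks in this list can be scripted. A sketch using only `hg log --template` and `hg heads --template`; helper names are illustrative. It compares changeset sets only, so phases and bookmarks would need separate checks.

```python
import subprocess
from collections import Counter


def hg_lines(repo, *args):
    """Run `hg -R repo <args>` and return stdout as a list of lines."""
    result = subprocess.run(["hg", "-R", repo, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def changeset_ids(repo):
    # One full 40-character node ID per line.
    return set(hg_lines(repo, "log", "--template", "{node}\n"))


def same_end_state(repo_a, repo_b):
    """True if two clones contain exactly the same changesets."""
    return changeset_ids(repo_a) == changeset_ids(repo_b)


def branch_heads(repo):
    return hg_lines(repo, "heads", "--template", "{node} {branch}\n")


def heads_per_branch(lines):
    """Count heads per branch from lines of the form '<node> <branch>'."""
    return Counter(line.split(" ", 1)[1] for line in lines if line.strip())


def multi_head_branches(lines):
    """Branches with more than one head, sorted by name."""
    return sorted(b for b, n in heads_per_branch(lines).items() if n > 1)
```

Usage: `same_end_state("clone-prod", "clone-stage")` for the clone comparison, and `multi_head_branches(branch_heads("clone-stage"))` after a rejected multi-head push to confirm no branch gained a second head.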

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

12 years ago

Product: mozilla.org → Release Engineering