Closed Bug 560444 Opened 14 years ago Closed 14 years ago

[amo] Upgrade to Python 2.6.4 or 2.6.5

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

All
Other
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jbalogh, Assigned: fox2mike)

References

Details

We'll have to create new virtualenvs when we upgrade the Python version.

I have 2.6.4 locally.  I installed 2.6.5 on khan.  I cannot reproduce bug 560381 on either.

Khan has 2.6.2 system-wide.  I'm guessing (hoping) that preview has the same.

This may affect bug 557941 and bug 559085.

We want to get this installed well before 5.10 rolls out on 5/4. It hurts our load testing when Python crashes in the middle of a request.

I couldn't find anything in http://python.org/download/releases/2.6.5/NEWS.txt suggesting what the problem was.
Blocks: 560381
Blocks: 557941, 559085
Upping this to critical. If preview is not on 2.6.2, feel free to reduce it.
Severity: normal → critical
Preview is on 2.6.2

[root@pm-app-amo24 ~]# rpm -qi python26
Name        : python26                     Relocations: (not relocatable)
Version     : 2.6.2                             Vendor: (none)
Over to Jeremy; I'm guessing he built these last time.
Assignee: server-ops → jeremy.orem+bugs
I don't think these page after they're assigned, so upping this to blocker is mostly for show. But this is blocking us from finishing load testing, which is blocking our 5.9.1 push to next.amo, and it's blocking QA on preview with 500 errors. This is important.
Severity: critical → blocker
Blocks: 559374
Grabbing this back; it seems I did the RPMs last time, and I think I've managed to update them successfully.
Assignee: jeremy.orem+bugs → shyam
Updated:
  python26.i386 0:2.6.5-geekymedia1.1.rhel5

Dependency Updated:
  python26-devel.i386 0:2.6.5-geekymedia1.1.rhel5    python26-libs.i386 0:2.6.5-geekymedia1.1.rhel5    python26-tools.i386 0:2.6.5-geekymedia1.1.rhel5    tkinter26.i386 0:2.6.5-geekymedia1.1.rhel5

Complete!

[root@pm-app-amo24 ~]# python26
Python 2.6.5 (r265:79063, Apr 20 2010, 23:45:36) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Preview is now on 2.6.5 and you should be unblocked.

I'm having some issues building the x86_64 RPM; I'll look into that before I close the bug.
Status: NEW → ASSIGNED
Figured that out, thanks to Dave. We now have x86_64 and i686 RPMs for python-2.6.5.

Jeff, how do I rebuild the virtualenvs? Also, how did you install 2.6.5 on khan? From source?

Leaving this open till the virtualenvs are rebuilt.
(In reply to comment #7)
> Jeff, how do I rebuild the virtualenvs? Also, how did you install 2.6.5 on
> khan? from source? 

2.6.5 isn't on khan yet; we used your RPMs for its current version and should do the same this time. Please upgrade it too.


And thanks for your help!
Done.

[root@khan ~]# python26 
Python 2.6.5 (r265:79063, Apr 20 2010, 23:45:36) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
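Rather than eyeballing the interpreter banner on each host, the version can be checked by a small script, which also works for confirming that a rebuilt virtualenv picked up the new Python (run it with that virtualenv's own `bin/python`). A minimal sketch: `parse_banner_version` and `at_least` are hypothetical helper names, and the `(2, 6, 5)` minimum is simply the target version from this bug.

```python
import re
import sys

# Matches the first line of the interactive banner, e.g.
# "Python 2.6.5 (r265:79063, Apr 20 2010, 23:45:36)".
BANNER_RE = re.compile(r"^Python (\d+)\.(\d+)\.(\d+)")


def parse_banner_version(banner):
    """Return (major, minor, micro) parsed from an interpreter banner line."""
    match = BANNER_RE.match(banner)
    if match is None:
        raise ValueError("not a Python banner: %r" % (banner,))
    return tuple(int(part) for part in match.groups())


def at_least(version, minimum=(2, 6, 5)):
    """True if the first three fields of version are >= minimum."""
    return tuple(version[:3]) >= minimum


if __name__ == "__main__":
    # sys.version_info describes whichever interpreter runs this script,
    # including the one inside a virtualenv.
    print(at_least(sys.version_info))
```

Comparing tuples keeps the check correct across minor versions, for example `(2, 7, 0)` counts as newer than `(2, 6, 5)`.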
Could you put the source RPM in dm-nagios01:/mnt/packages/mrepo-src/5Server-SRPMS/?
To rebuild virtualenvs:

1. virtualenv --python=/path/to/python26 /path/to/new/virtualenv
2. Run the `pip install` command from the update script to get this started.
3. rm -rf /path/to/old/virtualenv
4. ln -s /path/to/new/virtualenv /path/to/old/virtualenv
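The four steps can be written out as a dry-run plan so the exact commands can be reviewed before anything is deleted on prod. This is only a sketch: `plan_rebuild` is a hypothetical helper, the paths are placeholders, and the `requirements` file stands in for whatever `pip install` line the update script actually uses. The function runs nothing; it just returns argv lists that could be passed to `subprocess.check_call` one at a time.

```python
import os


def plan_rebuild(python, new_env, old_env, requirements):
    """Return the virtualenv rebuild steps as argv lists, without running them.

    `requirements` is a hypothetical stand-in for the update script's
    real `pip install` arguments.
    """
    pip = os.path.join(new_env, "bin", "pip")
    return [
        ["virtualenv", "--python=" + python, new_env],  # step 1: new env
        [pip, "install", "-r", requirements],           # step 2: packages
        ["rm", "-rf", old_env],                          # step 3: site breaks here...
        ["ln", "-s", new_env, old_env],                  # step 4: ...and recovers here
    ]


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for argv in plan_rebuild("/usr/bin/python26", "/data/venv-2.6.5",
                             "/data/venv", "requirements.txt"):
        print(" ".join(argv))
```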

Now we'll have a new virtualenv at the same path as the old one. The site will be broken between steps 3 and 4, so any ideas on how to avoid that are welcome. But it should be OK to take the site down for a couple of minutes when we run this on prod.
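One common way to close the gap between steps 3 and 4 is to keep the old path as a symlink and replace that symlink atomically: create the new link under a temporary name, then rename it over the old one. A POSIX-only sketch; `swap_symlink` is a hypothetical helper, and it assumes the old path is already a symlink (rename cannot replace a real directory with a link), so the very first migration off a plain directory would still need the short outage once.

```python
import os


def swap_symlink(target, link_path):
    """Point link_path at target with no moment where link_path is missing.

    POSIX-only. Assumes link_path is absent or already a symlink; if it is
    a real directory, the rename fails and an OSError is raised.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Build the new link under a temporary name in the same directory so
    # the rename below stays within one filesystem.
    tmp_path = os.path.join(
        link_dir, ".%s.tmp%d" % (os.path.basename(link_path), os.getpid()))
    os.symlink(target, tmp_path)
    try:
        # On POSIX, rename() atomically replaces an existing symlink at
        # link_path: other processes see either the old link or the new one.
        os.rename(tmp_path, link_path)
    except OSError:
        os.unlink(tmp_path)  # don't leave the temporary link behind
        raise
```

With this in place, steps 3 and 4 collapse into a single `swap_symlink("/path/to/new/virtualenv", "/path/to/old/virtualenv")` call, and the old virtualenv directory can be removed afterwards at leisure.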
(In reply to comment #10)
> Will you put the src rpm in dm-nagios01:/mnt/packages/mrepo-src/5Server-SRPMS/.

Done.
This didn't fix the bug we were hoping it would fix. Are you doing anything special when you build the package?

If I use your new khan package, I still get the crashes.

If I use a package of 2.6.5 built on khan (just a ./configure; make) I no longer get the crashes.
Well, I honestly don't know. Left to my own choice, I wouldn't run RHEL for bleeding-edge stuff. But we need hardware support and bleeding-edge stuff at the same time, which is kind of like having our cake and eating it too.

What is it crashing on? Maybe that'll give us a clue. I'm really shooting in the dark here.
Heh, get a load of this:

python: Modules/gcmodule.c:262: update_refs: Assertion `gc->gc.gc_refs != 0' failed.

The source for that has a huge, scary comment above it. oremj straced it, and it was crashing all over the code, nowhere in particular.
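Because the crash turns up at random points, one common way to narrow down a refcounting bug like this is to make Python's cyclic garbage collector run far more often, so it walks a damaged object soon after the damage happens instead of much later. The sketch below does that with the standard `gc` module; `stress` and its parameters are hypothetical names, and it only makes such a crash show up sooner and closer to its cause, it does not fix anything.

```python
import gc


def stress(func, iterations=1000, threshold=1):
    """Call func repeatedly while the cyclic GC runs as often as possible.

    A low generation-0 threshold makes the collector run after almost every
    new container allocation, and the explicit gc.collect() forces a full
    collection after each call. Returns the number of calls made.
    """
    old_thresholds = gc.get_threshold()
    gc.set_threshold(threshold)
    try:
        for _ in range(iterations):
            func()
            gc.collect()
    finally:
        # Restore the original thresholds even if func() raises.
        gc.set_threshold(*old_thresholds)
    return iterations
```

Wrapping the suspect request handler (or a template render) in `stress` is slow, but it trades a crash "all over the code" for one that tends to fire right after the code path that corrupted the refcount.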
Sigh. :|

Jeremy, any ideas?

This RPM-built Python for RHEL is a hack in itself, since it allows multiple versions of Python to coexist. The RHEL 6 beta defaults to Python 2.6.2; I'm not sure whether that's worth trying, but we won't really take the beta to production.
Hudson started failing with this today, so now we can't run tests.
Failing with what? These random Python crashes? What do you propose we do to solve this?

IMHO, this RHEL + Python setup is hackery. It always was, and always will be :)

If you want everything built from source, I'd defer to Jeremy and Dave; they were the ones behind the RPM in the first place. They have their reasons, and I'd agree with them if this RPM weren't so much of a hack.
(In reply to comment #18)
> Failing with what? these random python crashes? What do you propose we do to
> solve this?
Yes, with the crash in comment 15.

> IMHO, this RHEL + Python is a hackery. It always was, always will be :)
> 
> If you want everything to be built from source, I'd defer to Jeremy and Dave,
> they were the ones for the rpm in the first place, they have their reasons and
> I'd agree with them if this rpm wasn't this much of a hack.

I don't know enough about packaging to comment here. Building it from source on khan works; could we build it from source somewhere, turn that into a package, and distribute it to our boxes?
If you guys want to `./configure && make install` our own Python, I think this problem will go away.  I've only seen this bug with the RHEL package, and I couldn't reproduce it with the Python I built on khan.  This is really not something I want to dive into.

But if it comes to that, I can reproduce our bug at will; the problem is that the current Python package is a black box.

Seeing the patches that are being applied to Python might help.

A debug build of Python installed with this package should let us look at what's going on inside Python when this crashes.

Someone filed a redhat bug about a similar problem: https://bugzilla.redhat.com/show_bug.cgi?id=573156
(In reply to comment #20)
 
> Seeing the patches that are being applied to Python might help.

How can I get these to you? Your homedir on khan?
 
> A debug build of Python installed with this package should let us look at
> what's going on inside Python when this crashes.

Updated:
  python26-debuginfo.i386 0:2.6.5-geekymedia1.1.rhel5

This is on preview. Let me know if that helps you find out what's going on.
(In reply to comment #13)
> If I use a package of 2.6.5 built on khan (just a ./configure; make) I no
> longer get the crashes.

So I just tried this, and it complains that autoconf is too old, which is probably why the RPM has a patch changing the autoconf requirement. Did you install a newer autoconf on khan or something?
(In reply to comment #22)
> (In reply to comment #13)
> > If I use a package of 2.6.5 built on khan (just a ./configure; make) I no
> > longer get the crashes.
> 
> So I just tried this, and it complains that autoconf is too old.  Which is
> probably why the rpm has a patch changing the autoconf requirement.  Did you
> install a newer autoconf on khan or something?

Never mind; the answer to that is: don't run autoconf first. :)
OK, I rebuilt the RPM with most of the Red Hat patches removed and installed it on khan. Care to give that a try?
(In reply to comment #24)
> ok, I rebuilt the rpm with the majority of the RH patches removed, and
> installed on khan.  Care to give that a try?

Is this /usr/bin/python26 ?
(In reply to comment #21)
> (In reply to comment #20)
> 
> > Seeing the patches that are being applied to Python might help.
> 
> How can I get these to you? homedir on khan? 

Sure.  Just let me know where you drop it on khan and I'll find it.
(In reply to comment #25)
> (In reply to comment #24)
> > ok, I rebuilt the rpm with the majority of the RH patches removed, and
> > installed on khan.  Care to give that a try?
> 
> Is this /usr/bin/python26 ?

/usr/bin/python26 was touched last night, so I think that's the one. I made a new virtualenv using it and still get the same crashes.
(In reply to comment #26)

> Sure.  Just let me know where you drop it on khan and I'll find it.

Kind of moot, since Dave rebuilt everything without most of the patches? Dave, which patches did you leave in?
Thanks for the help, guys. I figured out what was going on in bug 560381 comment 2. Having the RPM sources to build and edit Python did the trick.

I don't know whether the crash is caused by something Red Hat is changing, but for now I'm content to work around the problem in our code. Eventually I'll work out a smaller test case and figure out whether this belongs in the Red Hat or the Python bug tracker.

We can use whatever Python 2.6.x version you're comfortable with, though I tested and fixed the bug on justdave's new 2.6.5 sources.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Excellent. What was the fix?
<oremj> jbalogh: what was the fix for the python crash?
<jbalogh> an exception was being raised inside a lambda
<jbalogh> and the traceback for the exception wasn't getting refcounted properly
<jbalogh> so the fix is to avoid raising that exception
<oremj> interesting
<jbalogh> we're going to monkeypatch jinja to get around it
<oremj> so you can reproduce reliably by raising an exception in a lambda?
<jbalogh> I haven't tried that yet
<oremj> or is it still somewhat random?
<jbalogh> http://pastie.org/936661
<jbalogh> that one does fine
<jbalogh> it's something more intricate, I guess
<jbalogh> switching the bad jinja code to use def __getitem__ instead of __getitem__ = lambda: fixed the problem though
<jbalogh> but it only happened when interacting with cache-machine
<jbalogh> so I'm still a bit stumped
<oremj> very strange
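The workaround described in the chat (a `def` instead of a `lambda` for `__getitem__`, applied by monkeypatching) can be illustrated with a small stand-in class. This is a hypothetical sketch, not jinja's actual code: `LazyLookup`, `_getitem`, and `apply_workaround` are invented names. In ordinary Python both versions behave identically and raise the same `KeyError`; the difference only mattered because of the interpreter-level refcounting problem in this bug.

```python
class LazyLookup(object):
    """Hypothetical stand-in for the jinja object whose lookup raised."""

    # Original style: the special method is a lambda bound in the class
    # body, so a failed lookup propagates its exception through a lambda.
    __getitem__ = lambda self, key: self._fail(key)

    def _fail(self, key):
        raise KeyError(key)


def _getitem(self, key):
    # Workaround style: an ordinary def with the same behavior.
    return self._fail(key)


def apply_workaround(cls):
    """Monkeypatch cls so __getitem__ is a def-based function, not a lambda."""
    cls.__getitem__ = _getitem
    return cls
```

Calling `apply_workaround(LazyLookup)` once at startup replaces the method on the class itself, so every instance, existing or new, uses the def-based version; special methods like `__getitem__` are looked up on the type, not the instance.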
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard