Closed Bug 335715 Opened 18 years ago Closed 17 years ago

hang while loading (lots of) images

Categories

(Core :: Graphics: ImageLib, defect)

1.8 Branch
defect
Not set
critical

RESOLVED WORKSFORME

People

(Reporter: ShawnCarnell, Unassigned)

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2

We have several folks witnessing firefox.exe hang when they visit a page
we're developing.  We are seeing this happen in our dev environment but
you can see pretty much the exact page at
http://imakealpha.com/.pages/edit.jsp (free login needed; apologies).

The facts: 

    * This doesn't happen in IE.
    * FF becomes unresponsive to UI and doesn't redraw.
    * When hung, FF consumes up to 50% of the CPU depending on the
      client box.
    * The hang appears to be related to the request/response ordering.
          o The problem only happens on identical dev hosts and not on
            our qa or prod hosts (yet).
          o If i perturb the request/response ordering by starting
            Charles on my client, things work.
            (http://xk72.com/charles/index.php)
    * The hang appears to be related to image processing.
          o The hang happens while loading images, often at about the
            same image even for different users.
          o The stack for FF consistently shows that it's looping
            in/around jpeg_idct_islow
            (http://lxr.mozilla.org/mozilla1.8/ident?i=jpeg_idct_islow)
                + For me, FF is making about 300 context switches per
                  second for jpeg_idct_islow.
          o Ironically, once when FF hung for me, Thunderbird froze,
            too, in jpeg_free_large.  (Hasn't repro'ed.  Red herring?)
    * If i debug firefox.exe, VS.NET reports that "The process appears
      to be deadlocked (or is not running any user-mode code)."
    * Ethereal consistently notes that 16-17 image responses have
      malformed packets (always the same images) but there appears to be
      nothing wrong with the images.

I really need to get more visibility into what's going on inside FF.  I
can't find a debug build of it online.  Even if i built one, turning on
debugging/logging might perturb the system enough to mask the bug.

Any ideas on what the problem could be or on how to go about diagnosing it? 

Reproducible: Always

Steps to Reproduce:
1. Just visit the URL and witness FF start to load but then hang.  

Unfortunately, this only manifests on an internal server but the external URL above is essentially the same page.  I think it's a timing issue.

Actual Results:  
FF hangs and waits, consuming some CPU in jpeg_idct_islow.

Expected Results:  
FF should just load the page.

Please feel free to reach out to me.  I'm actively debugging this and i'm eager to supply any additional information that folks need me to dig up.
Attached file Shark stack trace
This does not show a thread stuck in jpeg_idct_islow as we see on WinXP.
Shark Time Profile available at http://hometown.aol.com/shawncarnell/ff_hang_time_profile.mshark
http://hometown.aol.com/shawncarnell/ff_hang_unresponsive_apps.mshark was run with   "unresponsive applications" checked.
*** Bug 335705 has been marked as a duplicate of this bug. ***
Assignee: nobody → pavlov
Component: General → ImageLib
Product: Firefox → Core
QA Contact: general
Version: unspecified → 1.8 Branch
I get 
Browser Not Supported!

We don't currently support your browser, but we're working on it now so check back soon. If you don't want to wait, you can use the latest versions of Internet Explorer or Firefox today.
 

If I try testing with recent trunk builds...  can you open up the UA sniffing to allow a build like:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060425 Minefield/3.0a1

to get access to the test case?

I am so sorry about that.  A new build went out today that had a completely broken config.   We should not be doing UA checks, we should be doing capability checks.  I will try to correct this going forward.

I've had the production config fixed and you should be able to get in now.
We are still struggling to produce a consistently reproducible case for this outside of our dev environment but we've narrowed it down significantly.  The problem appears to crop up only with images containing EXIF data.  For instance, when we used the exif.jpg image in our QA system, we saw firefox.exe hang.  When we replaced it with no_exif.jpg, everything worked.  (See attachments.)

The circumstances under which we see the hang are hard to replicate.  We have a lot of XHR and heavy DOM node creation going on and that does appear to be a factor.  What is consistent so far is that non-EXIF images cause no problems but  images containing EXIF can (but do not always).

Also, if we set the EXIF images as the background of a tag (img, div, etc.), all is well.  If we set them via the img src attribute, we hang.
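
The img src versus background observation above could be turned into a reduced test case. The following is only a sketch, assuming a static page is enough to show the difference (the report suggests XHR and DOM node creation also play a role, so it may not reproduce the hang). The function name build_test_page and the "?n=" cache-busting query string are hypothetical and not part of the original page.

```python
from html import escape


def build_test_page(image_url: str, count: int, as_background: bool = False) -> str:
    """Build a minimal HTML page that references image_url `count` times.

    Hypothetical helper: "?n=<i>" is appended as an assumed cache-buster so
    that every element triggers its own image request. It assumes image_url
    has no query string of its own.
    """
    url = escape(image_url, quote=True)
    items = []
    for i in range(count):
        if as_background:
            # Variant reported as working: image set as a CSS background.
            items.append(
                f'<div style="width:100px;height:100px;'
                f"background-image:url('{url}?n={i}')\"></div>"
            )
        else:
            # Variant reported as hanging: image set through the src attribute.
            items.append(f'<img src="{url}?n={i}" width="100" height="100">')
    body = "\n".join(items)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"
```

Writing build_test_page(url, 200) and build_test_page(url, 200, as_background=True) to two files, once for exif.jpg and once for no_exif.jpg, would give four pages to compare in the affected build.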

Could it be that the EXIF data is being written past the bounds of whatever buffer/struct it is supposed to live in?   

The images are attached and available at:
http://hometown.aol.com/shawncarnell/exif.jpg
http://hometown.aol.com/shawncarnell/no_exif.jpg
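
To sort a larger set of images into "has EXIF" and "no EXIF" before testing, the JPEG marker segments can be scanned for an APP1 segment that starts with the "Exif" identifier. This is a minimal sketch, not how ImageLib or libjpeg handles these segments; the function name has_exif_segment is hypothetical, and the scan stops at the first start-of-scan marker.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 segment tagged 'Exif'."""
    if data[:2] != b"\xff\xd8":              # SOI marker missing: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False                     # lost sync with the marker stream
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                         # standalone marker, no length field
            continue
        if marker in (0xDA, 0xD9):           # SOS or EOI: header segments are over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False                     # malformed length field
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length                    # skip marker plus segment body
    return False
```

For the two attachments, one would read each file with open(path, "rb").read() and pass the bytes in; if the file names are accurate, exif.jpg should report True and no_exif.jpg False.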
We now have a debug build of FF 1.5.0.2 Linux with which we can repro.  Is there some particular tracing you recommend we turn on to try and get more diagnostics?
I am able to reproduce this internally with Bon Echo 2.0a but have not been able to reproduce it on the external site yet with that browser version (only with 1.5.0.2).
*** Bug 330632 has been marked as a duplicate of this bug. ***
*** Bug 339910 has been marked as a duplicate of this bug. ***
On Windows:
- I use the Thunderbird mail client (Windows XP).
- I opened the Linux kernel mailing list from a news server.
- The filter option "unread only" is active and I cannot change it.  I do not know whether that is relevant.

Every time I open this newsgroup, Thunderbird hangs with this symptom (it eats 100% of my CPU and shows a high CSwitch Delta).  I have no debugger, but Process Explorer gives me an address:
thunderbird.exe!jpeg_free_large+0x4041

Thunderbird version 1.5.0.7 (20060909)

This is a screenshot with opened process explorer:
http://home.arcor.de/henryn/thunderbird-hangs-on-jpeg_free_large+0x4041.png


On Linux:
- I have the same data under Linux, with the same effect.
- I quickly clicked into the frame with the title lines and pressed the END key.  Every time I run Thunderbird and open this newsgroup, it hangs after the same news entry.
- I started Thunderbird with
strace -ff -o thunderbird thunderbird -t -T 2>strace.log
- Before the problem begins, process 4766 prints these lines:

getppid()                               = 4762                                                                  
read(9, "\200\362\301\10\0\0\0\0000v\324\277\220\364\32@\250q\327"..., 148) = 148                               
mmap2(NULL, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4154a000             
mprotect(0x4154a000, 4096, PROT_NONE)   = 0                                                                     
clone(child_stack=0x41749ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|SIGRT_1) = 4771                 
kill(4762, SIGRTMIN)                    = 0       

- The heavy endless loop cannot be seen in the strace output.
- Process 4766 only prints these lines every 2-3 seconds:

  poll([{fd=9, events=POLLIN}], 1, 2000)  = 0                                                                   
  getppid()

- Process 4762 is consuming all the CPU time.
- There is no more output after 4762 hangs.
- There is no output after I kill it with "kill -15 4762".  It seems to have been terminated (SIGRTMIN) but is still waiting for something.  These are the last lines from process 4762:

sched_get_priority_max(0)               = 0                                                                   
sched_get_priority_min(0)               = 0                                                                   
getrlimit(RLIMIT_STACK, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0                                 
pipe([9, 10])                           = 0                                                                   
clone(child_stack=0x8ca6e00, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND) = 4766                        
write(10, "\0\0\0\0\5\0\0\0\0\0\0\0\240L\33@(j\27@\234\205\0@1s\235"..., 148) = 148                           
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0                                                             
write(10, "\200\362\301\10\0\0\0\0P{\324\277\220\364\32@X\224\311"..., 148) = 148                             
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0                                                             
rt_sigsuspend([])                    = ? ERESTARTNOHAND (To be restarted)                                     
--- SIGRTMIN (Unknown signal 32) @ 0 (0) ---

- I have stored all of the complete strace files in one archive (21 KB):
http://home.arcor.de/henryn/thunderbird-20061121.tgz
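
strace only records system calls and signals, so a loop that spins in user space (for example inside a JPEG routine) would produce no further lines; that is consistent with the trace going silent while the process uses all the CPU. The per-process files can still be summarized to see the last completed call and how often each call occurred. This is a rough sketch: the regular expression is an assumption tuned to the lines quoted above, and summarize_strace is a hypothetical name.

```python
import re
from collections import Counter

# Matches completed syscall lines such as:  getppid()    = 4762
# Signal lines ("--- SIGRTMIN ... ---") and calls without a result are skipped.
SYSCALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s+=\s+(-?\w+|\?)")


def summarize_strace(lines):
    """Count syscalls in one strace output file and return the last completed call."""
    counts = Counter()
    last = None
    for line in lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue
        name, _args, result = match.groups()
        counts[name] += 1
        last = (name, result)
    return counts, last
```

With "strace -ff -o thunderbird", each process gets its own file named after its pid, so the file for the spinning process would presumably be thunderbird.4762; passing open("thunderbird.4762") to summarize_strace shows where the system call activity stopped.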
I have installed the new Windows Thunderbird version 1.5.0.8 (20061025); the problem does not occur there.
Assignee: pavlov → nobody
QA Contact: imagelib
This is WFM, tested with:
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4 ID:2007051502
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070626 ID:2007062604

@OP: do you still see the issue?
While we could reproduce this with rv:1.8.0.2 Gecko/20060308 Firefox/1.5.0.2 we have not been able to reproduce this with rv:1.8.1.5 Gecko/20070713 Firefox/2.0.0.5.
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME