Closed Bug 233497 Opened 21 years ago Closed 20 years ago

Strange xprint printing behaviour ((null) in printer list)

Categories

(Core :: Printing: Output, defect)

All
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: roland.mainz, Assigned: dbaron)

References

()

Details

Attachments

(3 files, 1 obsolete file)

FireFox 0.8 release build for Linux/x86
(http://mozilla.oregonstate.edu/pub/mozilla.org/firefox/releases/0.8/firefox-0.8-i686-pc-linux-gnu.tar.gz)
exhibits some strage behaviour for printing.

Steps to reproduce:
1. Open print dialog
2. Do NOT go to the print job options dialog
3. Printing works OK
4. Open print dialog
5. Go to the print job options dialog, change paper size to "iso-a3"
6. Print

Result:
- No print job will be queued.
- No error dialog appears (if printing fails the print errror dialog always
comes up, there is (theoretically) no way to bypass or disable it).
- Any further attempts to print fail (e.g. prefs.js gets values which need to be
manually cleaned!!).

After some digging it appears that all paper names have a "(null)/" prefix. This
usually indicates a tray name - but in this case no trays are defined for that
specific printer.

Looking at
http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsDeviceContextSpecG.cpp#864
it seems that something impossible happens:
864       /* Either "paper" or "tray/paper" */
865       if (default_medium->tray_name) {
866         papername = nsPrintfCString(256, "%s/%s", default_medium->tray_name,
default_medium->medium_name);
867       }
868       else {
869         papername.Assign(default_medium->medium_name);
870       }

|default_medium->tray_name| is |nsnull| but the |nsPrintfCString(...)| gets
executed, adding the "(null)/" prefix and then the paper name.
I guess this is a BAD compiler issue... ;-((
Assignee: blake → bryner
Component: General → Printing
Product: Firebird → Browser
Version: unspecified → Trunk
I don't have a printer handy to test this with, but I tried to test this with
printing to a file and do not see the problem using the steps given.

I also tried to make a compiler testcase and was unsuccessful in getting any
behavior like this to happen.

Maybe someone better than me at x86 assembly can tell whether that function is
doing the right thing in the released binaries.
If someone wants to take a stab at this, that would be great.  The section of
interest is just after the call to XpuGetMediumSourceSizeList().
I compiled the FireFox release branch source on my Linux/x86 laptop without
--enable-optimize and the problem does _not_ occur. Everything works perfectly.
Compiler version for the (working) test build was:
-- snip --
% gcc --version
gcc (GCC) 3.3 20030226 (prerelease) (SuSE Linux)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-- snip --

.. bryner said on IRC he used gcc 3.3.2 (stock with no patches).
Test build with --enable-optimize=-O compiled with gcc 3.3 20030226 (prerelease)
(SuSE Linux) works perfectly, too.
Test build with "--enable-optimize=-Os -g" compiled with gcc 3.3 20030226
(prerelease) (SuSE Linux) suffers from the problem described in comment #0.

Test build with "--enable-optimize=-Os" compiled with gcc 3.3 20030226
(prerelease) (SuSE Linux) suffers from the problem described in comment #0 ...
;-(
Test build with --enable-optimize=-O2 compiled with gcc 3.3 20030226
(prerelease) (SuSE Linux) works.
So far it seems to be a problem related to gcc's optimisation level "-Os" ...
;-(

Test build with --enable-optimize=-O3 compiled with gcc 3.3 20030226
(prerelease) (SuSE Linux) works.
bryner:
Your yesterday's idea to work around with -fno-strict-aliasing does not solve
the problem - the build compiled with "--enable-optimize=-Os
-fno-strict-aliasing" still suffers from the bugs ;-(
The least painful thing to do is to figure out what the command line is, 
remove all -c -g and -o ... options, and add -S -fverbose-asm. You should get 
a .s file which you can pass through c++filt.

With my gcc 3.3.2 and -Os I get:

        call    XpuGetMediumSourceSizeList@PLT
        movl    %eax, -528(%ebp)        #  mlist
        addl    $28, %esp
        testl   %eax, %eax
        je      .L287
        leal    -384(%ebp), %edi
        pushl   %edi
        call    nsXPIDLCString::nsXPIDLCString[in-charge]()@PLT
        popl    %ecx
        movl    -528(%ebp), %eax        #  mlist
        flds    16(%eax)        #  <variable>.ma2
        fadds   12(%eax)        #  <variable>.ma1
        cmpl    $0, (%eax)      #  <variable>.tray_name
        fstpl   -536(%ebp)      #  total_width
        flds    24(%eax)        #  <variable>.ma4
        fadds   20(%eax)        #  <variable>.ma3
        fstpl   -544(%ebp)      #  total_height
        je      .L288
        pushl   4(%eax) #  <variable>.medium_name
        leal    -188(%ebp), %esi
        pushl   (%eax)  #  <variable>.tray_name
        leal    .LC85@GOTOFF(%ebx), %eax
        pushl   %eax
        pushl   $256
        pushl   %esi
        call    nsPrintfCString::nsPrintfCString[in-charge](...)@PLT
        addl    $20, %esp
        pushl   %esi
        pushl   %edi
        call    nsXPIDLCString::operator=(nsACString const&)@PLT
        pushl   %esi
        call    nsPrintfCString::~nsPrintfCString [in-charge]()@PLT
        addl    $12, %esp
        jmp     .L289
.L288:
        movl    -528(%ebp), %edx        #  mlist
        pushl   4(%edx) #  <variable>.medium_name
        pushl   %edi
        call    nsACString::Assign(char const*)@PLT
        popl    %eax
        popl    %edx
.L289:

which is similar to the dump, including the null test. Perhaps something else 
is wrong.
I am sick of this. I've answered enougth emails about Xprint-being-broken-in
FireFox... ;-(

Taking myself.
Assignee: bryner → roland.mainz
Status: NEW → ASSIGNED
Attached patch Patch for 2004-03-09-trunk (obsolete) — Splinter Review
Comment on attachment 143442 [details] [diff] [review]
Patch for 2004-03-09-trunk

Requesting r= ...
Attachment #143442 - Flags: review?(cls)
Comment on attachment 143442 [details] [diff] [review]
Patch for 2004-03-09-trunk

I really hate those local compile rules.  Unless I misread something, the
comments in the makefiles do not agree with the findings of this bug.  If just
-Os is broken (comment #9), then why turn off all higher level optimizations
when -O2 works (also comment #9).? Instead of removing all optimizations from
those files, use $(subst) to remove the known bad flag (-Os).  Or leave it and
let's find the proper fix for the bug.	And use $(OBJ_SUFFIX) instead of
harcoding .o .
Attachment #143442 - Flags: review?(cls) → review-
Has anyone attempted to figure out what code is being miscompiled, and how?
cls wrote:
> (From update of attachment 143442 [details] [diff] [review])
> I really hate those local compile rules. 

I don't like that, too - but that's the only way to get rid of the problem from
my side.

> Unless I misread something, the
> comments in the makefiles do not agree with the findings of this bug.  If just
> -Os is broken (comment #9), then why turn off all higher level optimizations
> when -O2 works (also comment #9).?

Someone in #mdk-cooker pointed out last week that -Os is not the only option
where such problems appear. People can use some of gcc's -f*-switches after -O2
to get the same behaviour as -Os (I guess this is the same stuff as Sun
Workshop's -fast macro which expands to machine/platform/OS-specific switches).
Additionally -O2 increases the object file size - cutting down the optimizer
level would save some bytes in the resulting binary (e.g. some footprint work
since the matching code isn't very performance sensitive).

> Instead of removing all optimizations from
> those files, use $(subst) to remove the known bad flag (-Os).

See comment above... there is no single "known bad flag".

> Or leave it and let's find the proper fix for the bug.

I'll now three weeks since I filed the bug and I am sick of getting the whips
that Xprint is broken in FireFox (Xprint isn't the only functionality which
seems to have problems with -Os but I don't care about the rest). A proper fix
can AFAIK only be done from the compiler side... and that will take some time
until such a fix is available in the various linux distributions.

> And use $(OBJ_SUFFIX) instead of harcoding .o .

OK, I'll fix that.
David Baron wrote:
> Has anyone attempted to figure out what code is being miscompiled, and how?

I am no x86 assember guru... I can't help with that...
Attachment #143442 - Attachment is obsolete: true
Attachment #143448 - Flags: review?(cls)
Comment on attachment 143448 [details] [diff] [review]
New patch for 2004-03-09-trunk

I'm not going to approve any variant of this patch.  

The only reason I would consider this to be a valid bug is because mozilla.org
ships binaries built with -Os.	Otherwise, we're back to the same old issue of
people incorrectly assuming that any super-high optimization compiler flag can
be used safely with any piece of code.	We've been over this. "Don't do that."
Attachment #143448 - Flags: review?(cls)
Flags: blocking1.7b?
cls wrote:
> I'm not going to approve any variant of this patch.  

OK, what should I do now ?
1. I can't fix gcc. And even if there would be a fix for gcc it would take
months until these patches are commonly available in the single Linux
distributions.
2. I can't fix this in the Mozilla *.cpp files since this is no bug caused by
that code and even restructuring the code doesn't help.

> The only reason I would consider this to be a valid bug is because mozilla.org
> ships binaries built with -Os.

FireFox builds are being build with "-Os -g" by default.
Please do not blame me for that, I didn't start shipping builds compiled with
untested optimizer flags.
cls wrote:
> (From update of attachment 143448 [details] [diff] [review])
> I'm not going to approve any variant of this patch.

BTW: Why were other people then allowed to use this as workaround to fix
compiler bugs and I am not ?
-- snip --
% grep -r -- "_OPTIMI" js netwerk | fgrep Makefile.in
js/src/Makefile.in:     $(CC) -o $@ -c $(filter-out $(MOZ_OPTIMIZE_FLAGS),
$(COMPILE_CFLAGS)) $<
js/src/Makefile.in:     $(CC) -o $@ -c $(filter-out $(MOZ_OPTIMIZE_FLAGS),
$(COMPILE_CFLAGS)) $<
netwerk/protocol/ftp/src/Makefile.in:MODULE_OPTIMIZE_FLAGS=-O -g
etc.
-- snip --
I think it's important to have a plan beyond hacking the Makefile.  In the case
of the OS2 workaround in js/src/Makefile.in, bug 224487, the bug was in part of
the OS, not in the compiler.  It was also in part of the OS that apparently
could not be fixed.

Roland, what's your plan to get gcc fixed here?

/be
Brendan Eich wrote:
> Roland, what's your plan to get gcc fixed here?

File a gcc bug and ask for help. But that won't happen before the 1.7b freeze
(and if I recall it correctly - bryner may correct me - some branches for the
*birds/*foxen will be made after the freeze... and then I have to run after the
single branch maintainers).
The problem for me is that I am a) no x86 assembler guru b) unable to isolate
the problem to a single statement... otherwise I would come up with a reduced
testcase for the gcc folks.
How about isolating to a single file?  Why must all those subdirectories of gfx
be deoptimized wholesale?

/be
(In reply to comment #23)
> cls wrote:
> > (From update of attachment 143448 [details] [diff] [review])
> > I'm not going to approve any variant of this patch.
> 
> BTW: Why were other people then allowed to use this as workaround to fix
> compiler bugs and I am not ?

I didn't approve those changes either.  The module/platform owners did, which
they still could do for this patch.

I'm a bit concerned that no one (here) has replicated this problem and all of
the testing has been done with a single compiler which lists itself as a
pre-release.  I'm setting up a xprint server now to see if I can replicate the
problem using the steps from comment #0.
Summary: Strange printing behaviour → Strange xprint printing behaviour
Brendan Eich wrote:
> How about isolating to a single file?  Why must all those subdirectories of 
> gfx be deoptimized wholesale?

I guess it may be sufficient to compile
nsDeviceContextSpecGTK.cpp/nsDeviceContextSpecXlib.cpp with -O instead of -Os,
the other parts of the patch to clamp the optimizer for gfx/src/xprint and
gfx/src/xprintutils/ are to avoid further problems with optimizer bugs. I am not
much happy with having more of these problems so I am trying to kill these
issues once-and-forever in one patch (not a good solution but I am trying to
limit the amount of time I spend at mozilla.org).
With the 0.8 release from mozilla.org, I can duplicate the problem.  I also see
the (null)/ trays listed.  However, with the nightly ff build and my local
build, I do not see the problem.  This was tested on RH9 using gcc 3.2.2-5 for
the local build.  The nightly build used gcc 3.3.3 according to about:buildconfig .
cls wrote:
> I'm a bit concerned that no one (here) has replicated this problem and all of
> the testing has been done with a single compiler which lists itself as a
> pre-release.

The bug is present in the FireFox 0.8 release (I guess this was compiled on Red
Hat ?!), FireFox compiled with SuSE 8.2's compiler (that's the "gcc (GCC) 3.3
20030226 (prerelease) (SuSE Linux)" compiler you're quoting here) and FireFox
compiled with SuSE 9.0's compiler.
Is "Firefox compiled with..." the Firefox 0.8 release or a recent cvs tree?  
cls wrote:
> Is "Firefox compiled with..." the Firefox 0.8 release or a recent cvs tree

CVS build from 2004-03-02-trunk.
recompiling gtk/nsDeviceContextSpecG.cpp with "-Os -fno-gcse" "fixes" this bug
in a Seamonkey build.
gcc version 3.3.2 20031022 (Red Hat Linux 3.3.2-1)  (Fedora Core 1)

gcse does:
     Perform a global common subexpression elimination pass.  This pass
     also performs global constant and copy propagation.

I also attempted recompiling with the list of optimizations for -Os, but it
didn't exhibit the bug.
Summary: Strange xprint printing behaviour → Strange xprint printing behaviour ((null) in printer list)
this bug also occurs with the Fedora development gcc (3.3.3-2.1), but the Fedora
bleeding -edge development gcc (gcc34-3.4.0-0.6) works ok.

I filed a bug with Red Hat with a testcase in hopes they'll fix it for FC2:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118026
This is a guess at a workaround.  Could someone who is set up to test this try
it?
That said, it might be better to switch to -Os -fno-gcse, since this seems like
something that could cause bugs elsewhere.
David Baron wrote:
> That said, it might be better to switch to -Os -fno-gcse, since this seems 
> like something that could cause bugs elsewhere.

Unfortunately this is out of our control whether the distributions use "-Os" or
"-Os -fno-gcse". FireFox people started the "-Os"-game and now these
optimisation flags are spreading across all distributions (and I doubt cls would
approve a configure.in test made from the testcase in
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118026) ... ;-(

I'll take your patch (attachment 143605 [details] [diff] [review]) when it gets rid of the problem.
Comment on attachment 143605 [details] [diff] [review]
possible workaround

ajschult - can you test the patch, please ?
Attachment #143605 - Flags: review?(ajschult)
jakub@redhat.com pointed to http://gcc.gnu.org/PR12308 and said the next Fedora
gcc RPMs would have the fix.

Based on my limited understanding of the bug @gcc.org, the optimization flags
that are triggering the bug here probably just set up the proper environment for
the bug to occur.  For instance, the testcase @gcc.org has the bug with -O2. 
-fno-gcse might or might not resolve all symptoms of the bug.

It might be worthwhile build a custom gcc so that at least mozilla.org's
binaries work ok.

I'll test out the dbaron's Mozilla patch and the gcc patch tonight.

Also, gcc-3.2.3-24 is not affected by this bug (at least not my testcase), so
any configure.in test could be confined to gcc > 3.2
If there's a relevant compiler bug that we can detect with a configure.in test,
we should do it (the test should fail and tell people what options to use
instead).  (We already have a test for a long long bug with gcc 2.7.x that shows
up with --enable-pedantic.)

Also, -O2 apparently does -fgcse as well.
Comment on attachment 143605 [details] [diff] [review]
possible workaround

this patch was successful in working around the gcc bug.
Attachment #143605 - Flags: review?(ajschult) → review+
Attachment #143605 - Flags: superreview?(bryner) → superreview?(roc)
Comment on attachment 143605 [details] [diff] [review]
possible workaround

should we try and get this patch into beta or final?
Attachment #143605 - Flags: approval1.7b?
Chris hofmann wrote
> (From update of attachment 143605 [details] [diff] [review])
> should we try and get this patch into beta or final

"Beta", please.

And I strongly suggest to add a test in "configure.in" for "final" to prevent
people from using optimizer switches which end up in problems like described in
this bug.
Attachment #143605 - Flags: superreview?(roc) → superreview+
Attachment #143605 - Flags: approval1.7b? → approval1.7b+
Workaround checked in.  Marking fixed since the original issue reported in this
bug is fixed.  If you think we need a configure test, please file a separate bug
preferably with a testcase for the compiler problem.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Assignee: roland.mainz → dbaron
Blocks: 237594
David Baron wrote:
> Workaround checked in. 

Thanks! :)

> Marking fixed since the original issue reported in 
> this bug is fixed.  If you think we need a configure test, please file a 
> separate bug preferably with a testcase for the compiler problem

I filed bug 237594 for that.
Flags: blocking1.7b?
dbaron:
BTW: Thank you very much for porting the patch to gfx/src/xlib/ ... :)
*** Bug 239179 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: