clean up format ambiguities

RESOLVED FIXED in Bugzilla 2.18

Status

()

Bugzilla
Bugzilla-General
P2
normal
RESOLVED FIXED
16 years ago
5 years ago

People

(Reporter: myk, Assigned: gerv)

Tracking

2.15
Bugzilla 2.18
Dependency tree / graph

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

16 years ago
The concept of formats is ambiguous because it represents two kinds of
variations in appearance: structural (f.e. classic and modern versions of the
query form) and representational (f.e. HTML and RDF versions of bug lists).  All
structural variations *have* a representation (i.e. content type), but all
representational variations *are* a representation, which leads to confusion
about how users should request variations.

Structural and representational variations should be disambiguated from each
other by distinguishing between the two in filenames and URL/form parameters via
the following scheme:

Variations in structure are called "formats."  Each format has a unique name,
and the name of a template file implementing a format consists of the name of
the script (or equivalent identifier) followed by a dash, the format name, a
period, the filename extension representing the format's content type, another
period, and the "tmpl" extension, f.e.: 

foo-classic.html.tmpl
foo-modern.html.tmpl

Formats may be specified by users via the "format" URL/form parameter.

Variations in representation are called "content types."  Each content type has
a unique extension, and the name of a template file implementing a content type
consists of the name of the script (or equivalent identifier), followed by a
period, the extension, another period, and the "tmpl" extension, f.e.:

bar.html.tmpl
bar.xml.tmpl

Content types may be specified by users via the "ctype" URL/form parameter.

Scripts may support both kinds of variations, in which case the "format"
parameter takes precedence over the "ctype" parameter.  Scripts may not
implement both kinds of variations in a single file, at least not until someone
implements a data structure for storing variations that indexes them by both
format name and extension (there is no need for this now, and it's unclear that
there ever will be, but the scheme above does not do anything to prohibit its
implementation if anyone ever discovers a reason to do it).

Bugzilla should fail gracefully when it cannot find the requested variation.
Bugzilla should have ways for installations and users to specify default variations.
(Reporter)

Comment 1

16 years ago
Created attachment 84999 [details] [diff] [review]
work-in-progress: complete but untested

Here's a patch implementing the scheme suggested by this bug report.
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.18
(Reporter)

Updated

16 years ago
Blocks: 132181
(Assignee)

Comment 2

16 years ago
> Scripts may support both kinds of variations, in which case the "format"
> parameter takes precedence over the "ctype" parameter.

I don't understand what you mean by "take precedence". If I ask for
fred.cgi?format=foo&ctype=rdf
I should get the template
fred-foo.rdf.tmpl
or an error, if it doesn't exist.

Also, what impact does this have on 2.16? If we are going to rename lots of
templates (e.g. buglist-rdf.rdf.tmpl -> buglist.rdf.tmpl) then we should discuss
this idea and, if we decide it's a good one, do it now rather than later...

The default format should be "" (as now; i.e. fred.ctype.tmpl), and the default
ctype should be defined in the CGI script itself.

Gerv
Gerv
(Reporter)

Comment 3

16 years ago
>I don't understand what you mean by "take precedence". If I ask for
>fred.cgi?format=foo&ctype=rdf
>I should get the template
>fred-foo.rdf.tmpl
>or an error, if it doesn't exist.

If someone asks for a template that doesn't exist, but half of their request
refers to a template that does exist, it's a good bet they wanted the existing
template, so it makes a lot more sense (because it makes our users happier more
of the time) to give them what they probably wanted then throw an error message.

>Also, what impact does this have on 2.16? If we are going to rename lots of
>templates (e.g. buglist-rdf.rdf.tmpl -> buglist.rdf.tmpl) then we should
>discuss this idea and, if we decide it's a good one, do it now rather than
>later...

It has no impact on 2.16, because 2.16 needs to ship and this is too big a
change to go in at the last minute.  It's a shame, but it's better if we ship
and then make big changes in 2.18 than hold 2.16 for another month or more while
we work these changes out.  Also, the idea that we can get all the big changes
into 2.16 if we just hold out a little longer is silly.  We're going to make big
changes in 2.18 no matter how long we hold 2.16.

>The default format should be "" (as now; i.e. fred.ctype.tmpl), and the default
>ctype should be defined in the CGI script itself.

This code doesn't specifically address default formats (in fact it specifically
bypasses that issue).  Let's go argue about that elsewhere. :-)
(Reporter)

Updated

16 years ago
No longer blocks: 132181
(Reporter)

Updated

16 years ago
Blocks: 150049
(Assignee)

Comment 4

16 years ago
> If someone asks for a template that doesn't exist, but half of their request
> refers to a template that does exist, it's a good bet they wanted the existing
> template, so it makes a lot more sense (because it makes our users happier more
> of the time) to give them what they probably wanted then throw an error message.

This causes problems, because Template Toolkit looks for the given filename
along a variable path. We'd have to make a call to try one filename and, if that
failed, try another one. That's horrible and icky.

Why can't it be simple?

foo.cgi
<nothing>               -> foo.html.tmpl 
format=simple           -> foo-simple.html.tmpl
ctype=xml               -> foo.xml.tmpl
format=simple&ctype=xml -> foo-simple.xml.tmpl

As Bugzilla is primarily a web-based application, it makes sense for <nothing>
to map to ctype=html and format="".

This means we can get rid of all the GetOutputFormats() and
ValidateOutputFormats() stuff, and just construct the filename we look for along
the simple lines listed above. It separates structural and representational
variations, easily allows both at once (a simple RDF version, or whatever), and
is generally easy. We stop the user requesting invalid combinations using the UI.

What's wrong with it? :-) I'm sure there's something...

Gerv
(Assignee)

Updated

15 years ago
Blocks: 162151

Comment 5

15 years ago
I like Gerv's idea and only
 ./list/list-rdf.rdf.tmpl
 ./bug/choose-xml.html.tmpl
need to be changed.
> the default ctype should be defined in the CGI script itself
Yes :-) This is e.g. needed for the emails which use the template system.
(Assignee)

Comment 6

15 years ago
> > the default ctype should be defined in the CGI script itself
> Yes :-) This is e.g. needed for the emails which use the template system.

It's not, you know. :-) Those templates would just be called (as they, in fact,
are) someemail.txt.tmpl . And remember, using the formats system is optional for
any given call to the Template Toolkit - it's still perfectly possible to
hard-code template names (such as someemail.txt.tmpl).

Gerv
(Reporter)

Comment 7

15 years ago
I'm uneasy about Gerv's proposal, but I can't put my finger on what exactly is
the matter with it.  As it stands, it sounds good, but something is nagging me
about it.
(Assignee)

Comment 8

15 years ago
Created attachment 98718 [details] [diff] [review]
Patch v.1

OK, here's an initial patch.

In addition to this patch, list/list-rdf.rdf.tmpl would need renaming to
list/list.rdf.tmpl, removing the redundancy. It would then be requested with
"ctype=rdf". This is the only file whose name doesn't fit the new mechanism.

Bug 150049 is a meta-bug for template format improvements. It has five
dependencies, including this bug. Here is the impact on the other four:

Bug 140513 - fixed (pass "txt" as the third parameter)
Bug 143604 - fixed (ctype=html does a no-op)
Bug 148133 - fixed (it now returns text/plain if the ctype is not found, which
		    seems reasonable)
Bug 149061 - not fixed. It's outside the scope.

Gerv
(Reporter)

Comment 9

15 years ago
Comment on attachment 98718 [details] [diff] [review]
Patch v.1

>Index: globals.pl

>@@ -1515,119 +1515,27 @@

+    my ($templatename, $format, $ctype) = @_;

Nit: "name" is redundant in "templatename".

>+    return 
>+    { 
>+        'template'    => $templatename , 
>+        'extension'   => $ctype , 
>+        'contenttype' => $::contenttypes->{$ctype} || "text/plain" , 

Nit: It would make sense to change the name of the "contenttype" property 
to "ctype" since that's what we now use in the URL and in this function.


>@@ -202,12 +202,14 @@

>+my $format = GetFormat("reports/duplicates", 
>+                       $::FORM{'format'}, 
>+                       $::FORM{'ctype'});

Nit: better as

my $format = 
  GetFormat("reports/duplicates", $::FORM{'format'}, $::FORM{'ctype'});


>Index: enter_bug.cgi

>@@ -382,8 +382,10 @@

>+my $format = GetFormat("bug/create/create", 
>+                       $::FORM{'format'}, 
>+                       $::FORM{'ctype'});

See previous nit.


>Index: query.cgi

>@@ -363,7 +363,7 @@

+print "$format->{'contenttype'}\n\n";

-> print "Content-Type: $format->{'contenttype'}\n\n";

Are you going to create a separate patch to convert list-rdf.rdf.tmpl?
Attachment #98718 - Flags: review-
So what if someone passes 'format=blah' into the script, where blahis not a
valid format?
(Assignee)

Comment 11

15 years ago
-> me. I haven't beeng getting mail on this bug :-(

Gerv
Assignee: justdave → gerv
(Assignee)

Comment 12

15 years ago
Created attachment 99441 [details] [diff] [review]
Patch v.2

Here's version 2, with Myk's comments addressed.

> So what if someone passes 'format=blah' into the script, where blahis not a
> valid format?

You get a standard "template not found" error. But I think this is reasonable -
the UI should only allow people to choose valid formats. Of course you can
define an invalid one by editing URLs, but you can create errors by editing
URLs all over Bugzilla. Nothing nasty or insecure happens.

Gerv
Attachment #84999 - Attachment is obsolete: true
Attachment #98718 - Attachment is obsolete: true
Yes, but currently you get told that, rather than the ThrowTemplateError that
you'll now get.

If ctype/format are wrong, then you should throw an error/die rather than
silently accept it, I think
(Assignee)

Comment 14

15 years ago
> If ctype/format are wrong, then you should throw an error/die rather than
> silently accept it, I think

The current patch lets Template Toolkit look for the file which ctype and/or
format defines, and throws an error if it's not found. The only other thing we
could do would be what happened previously - i.e. to replicate that process, and
throw an error if it's not found before we get to TT. I don't think we gain much
by throwing the error earlier, apart from the fact that we say "dodgy
format/ctype" instead of "template not found". And it means we have to duplicate
the code TT has for tracking down template files.

Gerv
Hmm. I guess I'm convinced, as long as the template error message mentions the
template/format/etc name somewhere in there, so that an admin can diagnose it. I
think it will, since its actually a template name by the time TT gets to it.

What about the silent detainting?
(Assignee)

Comment 16

15 years ago
> What about the silent detainting?

I don't quite understand... The ctype and format parameters are checked:

+    # Security - allow letters and a hyphen only
+    $ctype =~ s/[^a-zA-Z\-]//g;
+    $format =~ s/[^a-zA-Z\-]//g;

Is that what you mean?

Gerv
Right, they're checked, but failures are silently corrected. Why should a format
of 'html*' become 'html' ?

ThrowUserError for non-valid stuff.
(Reporter)

Comment 18

15 years ago
Comment on attachment 99441 [details] [diff] [review]
Patch v.2

Why should we throw a user error because of a typo?  If we probably know what
the user wanted, we should give it to them.  That behavior is better in more
situations, and actual errors (where the user mistypes one format name and gets
another) are exceedingly rare and easily spotted and corrected.

>+my $format = 
>+           GetFormat("reports/duplicates", $::FORM{'format'}, $::FORM{'ctype'});

Nit: Other parts of our code use half-level (two space) indentation rather than
right-justification in these situations, and I think it's more consistent,
readable, and maintainable:

my $format = 
  GetFormat("reports/duplicates", $::FORM{'format'}, $::FORM{'ctype'});

r=myk

This is a pretty obvious fix and doesn't need second review, but you do need to
fix existing template files (i.e. list-rdf.rdf.tmpl) and figure out how not to
break existing applications using format=rdf.
Attachment #99441 - Flags: review+
(Assignee)

Comment 19

15 years ago
> Why should we throw a user error because of a typo?

Well, users shouldn't be typing format names - they should be manipulating
forms. But I agree - the current code is more friendly.

I'll just re-checkin list-rdf.rdf.tmpl as list.rdf.tmpl. It's only had a couple
of minor tweaks since it was moved last anyway.

> You need to... figure out how not to break existing applications using 
> format=rdf.

That's more of a problem. Are there already many of these applications?
Presumably mozbot is one. Can we just notify everyone and put this one down to
experience, or are we going to have to add a hack which converts format=rdf to
ctype=rdf on that CGI?

Gerv
(Assignee)

Comment 20

15 years ago
Fixed.

Checking in globals.pl;
/cvsroot/mozilla/webtools/bugzilla/globals.pl,v  <--  globals.pl
new revision: 1.200; previous revision: 1.199
done
Checking in buglist.cgi;
/cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v  <--  buglist.cgi
new revision: 1.193; previous revision: 1.192
done
Checking in duplicates.cgi;
/cvsroot/mozilla/webtools/bugzilla/duplicates.cgi,v  <--  duplicates.cgi
new revision: 1.26; previous revision: 1.25
done
Checking in enter_bug.cgi;
/cvsroot/mozilla/webtools/bugzilla/enter_bug.cgi,v  <--  enter_bug.cgi
new revision: 1.72; previous revision: 1.71
done
Checking in query.cgi;
/cvsroot/mozilla/webtools/bugzilla/query.cgi,v  <--  query.cgi
new revision: 1.104; previous revision: 1.103
done
Checking in post_bug.cgi;
/cvsroot/mozilla/webtools/bugzilla/post_bug.cgi,v  <--  post_bug.cgi
new revision: 1.65; previous revision: 1.64
done
Removing list-rdf.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list-rdf.rdf.tmpl,v
 <--  list-rdf.rdf.tmpl
new revision: delete; previous revision: 1.3
done
RCS file:
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rdf.tmpl,v
done
Checking in list.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rdf.tmpl,v  <--
 list.rdf.tmpl
initial revision: 1.1
done


Removing list-rdf.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list-rdf.rdf.tmpl,v
 <--  list-rdf.rdf.tmpl
new revision: delete; previous revision: 1.3
done
RCS file:
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rdf.tmpl,v
done
Checking in list.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rdf.tmpl,v  <--
 list.rdf.tmpl
initial revision: 1.1
done

Gerv
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
(Reporter)

Comment 21

15 years ago
Created attachment 99716 [details] [diff] [review]
patch v3: hack

I think we should hack it, and here it is.
(Assignee)

Comment 22

15 years ago
Comment on attachment 99716 [details] [diff] [review]
patch v3: hack

r=gerv - seems OK code. I sort of disagree, though, because this seems like the
sort of nipped-it-in-the-bud situation where we can avoid having to carry a
hack for the next X years. 

But, I suppose we did do a release with the old way...

We should at least deprecate this in the 2.18 release notes. Do we have a
relnote keyword?

Gerv
Attachment #99716 - Flags: review+
We could just upgrade botbot - hixie hasn't applied that patch to the sources yet.
(Reporter)

Comment 24

15 years ago
I know of two tools using format=rdf (not including mozbot), and there are
undoubtedly others.  I don't think we can say we nipped it in the bud,
especially when some installations (b.m.o notwithstanding) don't upgrade
regularly (and thus may not upgrade from 2.16 to a later release for a few years).
(Assignee)

Comment 25

15 years ago
OK, the hack it is, then. Unless we provide them a one-liner they can apply to
_their_ Bugzillas:
$::FORM{'format'} = "rdf" if $::FORM{'ctype'} = "rdf";
and then get them to upgrade their client software.

Well, I can hope. :-)

Gerv
(Reporter)

Comment 26

15 years ago
Presumably you meant the reverse, but that won't work (whether in the tree or
distributed on demand) because it causes Bugzilla to look for
"list-rdf.rdf.tmpl" instead of "list.rdf.tmpl".  Hack checked in:

Checking in buglist.cgi;
/cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v  <--  buglist.cgi
new revision: 1.194; previous revision: 1.193
done
QA Contact: matty_is_a_geek → default-qa

Comment 27

5 years ago
(In reply to Gervase Markham [:gerv] from comment #12)
> the UI should only allow people to choose valid formats. Of course you can
> define an invalid one by editing URLs, but you can create errors by editing
> URLs all over Bugzilla. Nothing nasty or insecure happens.

This mostly means "why should I bother about security here?". Your assumption is wrong, and bug 842038 is a perfect example of why invalid parameters must be rejected, not silently sanitized. bbaetz in comment 13 and comment 17 was right to worry about such partial validations. FYI, Bugzilla 4.4 and newer now loudly reject invalid format and ctype values.
(Assignee)

Comment 28

5 years ago
Bug 842038, AIUI, is a case of using untrusted data before it's been sanitized, which is never a good idea. (It's somewhat hard to discuss it here as that bug is not yet open.)

Gerv
You need to log in before you can comment on or make changes to this bug.