Closed Bug 115910 Opened 23 years ago Closed 12 years ago

[PatchReader] Download only part of a patch when viewing

Categories

(Bugzilla :: Creating/Changing Bugs, enhancement, P3)

2.15
enhancement

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: jayvdb, Unassigned)

References

Details

Attachments

(2 files, 10 obsolete files)

Patches are often very large, and one has no idea of the length until the whole
patch is on its way.  Once its all in the browser, a large patch is quite
difficult to review.  Printing parts of a patch is not easy.  It would be very
useful to allow the patch to be 'browsed', giving finer granularity of control
to the reviewer.
Attached patch v0.1 (obsolete) — Splinter Review
This patch allows a 'patch' to be browsed, and each file within the diff to be
displayed separately.
Keywords: patch
Reassigning to patch author.
Assignee: myk → zeroJ
Attached file Example (obsolete) —
Attached file Patch.pm : Patch parser (obsolete) —
Patch.pm provides a much improved patch browser.  The perl still leaves much to
be desired, however the parser is now able to handle 'diff' and 'cvs diff'
outputs, unified or context, single file or recursive patches.

The output includes (if available) 'cvs command used', 'patch date', 'Base
Directory' (for 'diff's run in a single directory), List of files with the
fields Filename, Format, No. of Lines, No. of Blocks, Base Revision and Base
Timestamp.
Filenames now include the complete path relative to where the cvs diff command
was executed (preferably the top level).

Each file of the diff is able to be viewed separately as text/plain.

It has been tested with the follow patch types: 
diff file file (quite useless to browse)
diff -u file file
diff -r dir dir
diff -ur dir dir
diff -ru dir dir
cvs diff (in various locations within a workpit)
cvs diff -u
Priority: -- → P3
Target Milestone: --- → Bugzilla 2.18
Attached file Patch.pm (obsolete) —
After much grappling with perl, this is now a proper module, however the parser
only handles Unified diffs (cvs diff, diff or diff -r).
Attached patch uses Patch.pm (obsolete) — Splinter Review
Attached file template/default/attachment/patch.atml (obsolete) —
John: this looks quite cool. Does it still work? :-)

Gerv
As its only hook into the Bugzilla codebase is the link on 'patch' in the
attachment view, you guys would have to change a _lot_ to break it. :)

If there are any changes you can suggest, I will happy to integrate them before
posting a fresh patch.
Status: NEW → ASSIGNED
> As its only hook into the Bugzilla codebase is the link on 'patch' in the
> attachment view, you guys would have to change a _lot_ to break it. :)

Well, the template filenames and directories would be wrong for a start :-)

Could you please create a new single-file cvs diff -uN, with Patch.pm being in
the new Bugzilla subdirectory for modules, and in package Bugzilla, and all the
template stuff fixed? Also, your template formatting is a bit awry, and your
code has no comments. :-P

Gerv

(to add files without cvs write access, see http://www.red-bean.com/cvsutils/)
Attached patch v2: rewrite (obsolete) — Splinter Review
There are several other perl Patch modules being written (revml & VCS) at the
moment so this one is hopefully just a stop-gap until such time as they are
finished.  There are two parsers in Patch.pm to provide a little efficiency for
the unusual case where the name of a file within a patch is known before the
patch has been parsed, as is the case when viewing it over the web.
Attachment #62139 - Attachment is obsolete: true
Attachment #62168 - Attachment is obsolete: true
Attachment #64493 - Attachment is obsolete: true
Attachment #64494 - Attachment is obsolete: true
Attachment #67574 - Attachment is obsolete: true
Attachment #67575 - Attachment is obsolete: true
Attachment #67576 - Attachment is obsolete: true
Attached file v2: example
Attachment #62169 - Attachment is obsolete: true
Comment on attachment 95735 [details] [diff] [review]
v2: rewrite

This is nice.

I skipped the parsing part of Patch.pm - it looked ok, and it worked on the
uified patched I had, so...

Whats the performance like on large patches?

I looked at the existing modules you mentioned - revml looks like an xml
format, and VCS didn't look like it was anything to do with this. Sigh.

<attachment.cgi>

>+    DisplayError("Patch $::FORM{'id'} is not a known format.");

That sentance has gramatical problems. It needs 'in' between 'not' and 'a', but
what about: "Attachmed $::FORM{'id'} could not be parsed", or something
similar, instead?

Alos, DisplayError shouldn't be used any more - use a different sub, and put
the text into the messages template for localisation

>+  # Return the appropriate HTTP response headers.
>+  if($contenttype ne "text/plain" || $ispatch == 0) {
>+      view();
>+  }

This (and the other one) should only need to test $ispatch, although I guess it
doesn't hurt.

<The template>

>+[%# INTERFACE:
>+  # patch: hash. This is the hash created by Bugzilla::Patch
>+  #%]

s/hash/object/g, no?

>+[% PROCESS global/header.html.tmpl 
>+  title = title
>+  h1 = h1
>+  h2 = h2
>+%]

Passing the variables that way shouldn't be needed


>+  [% IF patch.cmddate %]

why 'cmddate'?

>+  [% FOREACH file = patch.files %]
>+    <tr>
>+      <td>
>+        [% IF file.filename %]
>+          <a href=attachment.cgi?id=[% attachid %]&amp;action=viewpatchfile&amp;filename=[% file.filename %]>
>+            [% file.filename | replace('/','/ ') %]

			       | html, too

>+<br>

Can you give a summary at the bottom of the table if theres more than one file
involved? ie total # of files, hunks, lines added/deleted?

Again, this looks great, thanks
Attachment #95735 - Flags: review-
Attached patch v2.1 (obsolete) — Splinter Review
Bugzilla::Patch now understands Normal, Context and Unified formats, and is a
little finer tuned as well.

> Whats the performance like on large patches?

Patches of 1MB - 5MB use 0.5 - 0.6s (User + System) to view in the conventional
manner on my box, and ~1 - 3s to view as a file list.  I consider this
reasonable as the network & browser didn't grind to a halt all of a sudden, and
the process completed in 2/3 - 1/3 of the time.

Should I add a UserInGroup ( Param () ) around this to prevent abuse? 
'editbugs' would be a good default IMO.
Attachment #95735 - Attachment is obsolete: true
Comment on attachment 97364 [details] [diff] [review]
v2.1

You left Patch.pm out.....
Attachment #97364 - Flags: review-
Comment on attachment 97364 [details] [diff] [review]
v2.1

Sorry guys; misplaced FILTER html in eventual patch.

see template/en/default/attachment/patch.html.tmpl

+  [% IF patch.base_dir FILTER html %]
Blocks: 119502
I'll review this if you attach Patch.pm

You don't need a UserInGroup check; 3 seconds for a 5 Meg patch is perfectly
acceptable...
Attached patch v2.2Splinter Review
If b.m.o had a patch viewer ... it would show that attachment 97365 [details] [diff] [review] contained a
new file 'Bugzilla/ Patch.pm' among others :)
Attachment #97364 - Attachment is obsolete: true
oops - how did I miss that?

(actually, it woudl show that attachment 97365 [details] [diff] [review] was not a bugzilla patch, and
attachment 97364 [details] [diff] [review] had the .pm :)
*nod* im having a perl day :)
Comment on attachment 97366 [details] [diff] [review]
v2.2

>Index: attachment.cgi
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v
>retrieving revision 1.20
>diff -u -r1.20 attachment.cgi
>--- attachment.cgi	26 Aug 2002 06:16:51 -0000	1.20
>+++ attachment.cgi	31 Aug 2002 04:07:51 -0000
>@@ -44,6 +44,9 @@

>+
>+  $patch->parse;

What happens if the parsing fails?


>Index: Bugzilla/Patch.pm
>===================================================================
>RCS file: Bugzilla/Patch.pm
>diff -N Bugzilla/Patch.pm
>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ Bugzilla/Patch.pm	31 Aug 2002 04:07:51 -0000
>@@ -0,0 +1,323 @@


>+
>+# Make it harder for us to do dangerous things in Perl.
>+use strict;
>+
>+# Bundle the functions in this file together into the "Patch" package.
>+package Bugzilla::Patch;

I think we decided that package came before |use strict|


>+  # Aggressively parses the patch to determine its format,
>+  # bailing as soon it does not understand
>+  my $formatcheck = sub { 
>+    $_ = shift;
>+    # ignore any embedded comments and known headers 
>+    return if (/^(\s*#|diff|\? |Only in |Index:|====|RCS|[-+*]{3} |retrieving|cvs)/o);

Why /o ?

>+    return "Unified" if (/^@@/);
>+    return "Normal" if (/^\d/);
>+    return "Context" if (/^\*+$/);
>+    return "Unknown";
>+  };

I don't know if formatcheck is the right word. _get_format, maybe? Why is this
a closure, anyway, instead of a package-level sub (called via $self)?

>+
>+  # Read in data and verify format
>+  my @lines;
>+  if (ref($self->{'data'}) eq 'SCALAR') {
>+    ${$self->{'data'}} =~ s/\r\n/\n/g;
>+    @lines = split(/\n/,${$self->{'data'}});
>+    my $line;
>+    my $i = 0;
>+    while (! defined $self->{'format'} && $i < $#lines) {
>+      $line = $lines[$i++];
>+      $self->{'format'} ||= &$formatcheck($line);
>+    }
>+  } elsif (ref($self->{'data'}) eq 'GLOB') {
>+    while(my $line = readline $self->{'data'}) {
>+      chomp($line);
>+      push(@lines,$line);
>+      $self->{'format'} ||= &$formatcheck($line);
>+    }
>+  }

When is this not passed a SCALAR? Is there a valid case where we're passed a
filehandle/typeglob?

>+  # Notes:
>+  # - modified lines (^!) in Context diffs are counted twice
>+  #   and halfed before being counted as both added and removed

spelling - should be halved

[patch.html.tmpl]

>+[% title = BLOCK %]View Patch #[% attachid %] for Bug #[% bugid %][% END %]

Should that say attachment, instead? I know that we know that its a patch by
the time we get here, but everywhere else says attachment...

I'm not fussed about this, though

[message.html.tmpl]

>+  [% ELSIF message_tag == "unknown_patch_format" %]
>+    [% title = "Unknown Patch Format" %]
>+    Attachment [% bugid %] could not be parsed.

You want to print the attachment id, not the bugid, here...
Attachment #97366 - Flags: review-
it would also be nice if the patch description showed in patch.html.tmpl, but if
thats a lot of effort, its not essential
> What happens if the parsing fails?

At this point the possible failures include 0 or 1 files, or files with 0/0 or
the like.  If error messages are going to result from cases like these (as
opposed to error handling within the patch template), then the terms of 'error'
need to be upheld on attachment.  As the saying goes ... the show must go on.

The other alternative that I can see is to go back to only parsing Unified (and
maybe Context), in which case the rules are simpler.

> >+    return if (/^(\s*#|diff|\? |Only in |Index:|====|RCS|[-+*]{3}
|retrieving|cvs)/o);
> Why /o ?

At one stage it made a difference.

> I don't know if formatcheck is the right word. _get_format, maybe? Why is this
> a closure, anyway, instead of a package-level sub (called via $self)?
..
> When is this not passed a SCALAR? Is there a valid case where we're passed a
filehandle/typeglob?

I had left the file handle support in as I am using it for other purposes. 
Testing new formats would be one case, but I suppose that space-related
arguments in that context are hard to justify.  Would it be acceptable if the
format checking code was re-written to not lean towards the GLOB's performance,
thereby reducing the overhead of keeping it in?

> Should that say attachment, instead? I know that we know that its a patch by
> the time we get here, but everywhere else says attachment...

The user has just clicked on 'patch', and most of the UI would need to change to
accomodate this.  e.g. 'Generated' on an Attachment does not really gel IMO.

> it would also be nice if the patch description showed in patch.html.tmpl, but 
> if thats a lot of effort, its not essential

No trouble.
I guess that the glob support is ok. That still doesnt' explain why it needs to
be a closure, though.

Testing is always good :)

Actually, why don't you call formatcheck at the end of all the data, rather than
in the loop with a ||= ?
> Actually, why don't you call formatcheck at the end of all the data, rather than
> in the loop with a ||= ?

The intention was to stop reading from the file handle as soon as the format
checker fails, thus @{$patch->{'lines'}}[-1] would be the line that caused the
format to be unknown, to assist someone adding new patch formats without having
to place a slew of debug lines in the package.  Attachment 95735 [details] [diff] (aka v2) had
this as a post condition of both the file and scalar cases.  This will be
achieved by other means, and the closure with it.

I presume more perf. gains can be made by moving some of the state=0 code from
parseDiff into the new sub.  At the same time I'll add some of those =head1 that
are springing up in the codebase.  This is not turning out to be a 'stop-gap'.
*sigh* :)

Thanks for the directions to date.
Well, its not the perf issues which are theproblem - it just looked ugly :)

GLOB reading can be slow if you want, since its just for testing.

Yeah, docs are always good :) Just the .pm, though.
ping? This is almos tready to go in, and it would be a shame if it bitrotted...
I intend to get to this one night this week.  Sorry about the delay.
ping?
[RFE] is deprecated in favor of severity: enhancement.  They have the same meaning.
Severity: normal → enhancement
Depends on: 174942
John, are you still with us?
Summary: [rfe] browse patch contents → Browse patch contents
Patch Viewer (bug 174942) is in the tree, and allows you to *view* only part of
a patch, but has no faculty yet for only *downloading* part of a really large
patch.  With the architecture it would be simple to do, however--there are
classes in PatchIterator to narrow down a patch to a select set of files.  I
won't resolve this fixed, therefore, but changing the summary to reflect the
part of the problem that is not yet solved.  I am not certain of the value of
the just-download-part-of-a-patch thing, but I'd be happy to assist anyone who's
interested in doing it (and possibly eventually I will write the patch).

I do suggest that it would be best to integrate with Patch Viewer, however.  A
lot of work has gone into making the patches look good and easy to review.
Blocks: bz-patch
Summary: Browse patch contents → Download only part of a patch when viewing
Unloved bugs targetted for 2.18 but untouched since 9-15-2003 are being
retargeted to 2.20
If you plan to act on one immediately, go ahead and pull it back to 2.18.
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Bugzilla 2.20 feature set is now frozen as of 15 Sept 2004.  Anything flagged
enhancement that hasn't already landed is being pushed out.  If this bug is
otherwise ready to land, we'll handle it on a case-by-case basis, please set the
blocking2.20 flag to '?' if you think it qualifies.
Target Milestone: Bugzilla 2.20 → Bugzilla 2.22
Target Milestone: Bugzilla 2.22 → ---
QA Contact: mattyt-bugzilla → default-qa
Summary: Download only part of a patch when viewing → [PatchReader] Download only part of a patch when viewing
Assignee: jayvdb → create-and-change
Status: ASSIGNED → NEW
I don't think performance is a problem (at least not in 2012), so the question is not about the bandwidth. And we can already collapse parts of the patch file by file, which is what this bug is mostly about. So this WFM.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: