Bug 712570 - Java-HtmlParser with endless Loop
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: HTML: Parser
Version: unspecified
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Reported: 2011-12-21 00:19 PST by Markus Kull
Modified: 2012-05-23 06:16 PDT

Attachments
oom.html Malformed HTML which causes endless loop / OOM e.g. with HTML2HTML (40 bytes, text/plain)
2011-12-21 00:19 PST, Markus Kull
HG changeset patch of proposed fix (2.26 KB, patch)
2011-12-21 00:37 PST, Markus Kull
hsivonen: review+

Description Markus Kull 2011-12-21 00:19:51 PST
Created attachment 583415
oom.html: Malformed HTML which causes endless loop / OOM, e.g. with HTML2HTML

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/10.04 Chromium/15.0.874.121 Chrome/15.0.874.121 Safari/535.2

Steps to reproduce:

- Downloaded the original Java source of the htmlparser used in Mozilla from http://about.validator.nu/htmlparser/htmlparser-1.3.1.zip
- Used htmlparser-1.3.1.jar, not htmlparser-1.3.1-with-transitions.jar
- Parsed the attached oom.html, e.g. with HTML2HTML.java



Actual results:

- Endless loop that ends in an OutOfMemoryError due to repeated allocations


Expected results:

- no endless loop
Comment 1 Markus Kull 2011-12-21 00:36:14 PST
This problem does NOT affect Firefox. It only occurs when using the original Java source of the htmlparser with the HotSpot workaround applied. As the newest Java source is stored in the Mozilla repository, I am reporting it here.

Cause and fix: One method in the Tokenizer is too large to be JIT-compiled by the Java HotSpot compiler, which causes slow performance. As a workaround for HotSpot, a script splits the method into two methods. htmlparser-1.3.1.jar contains the split method; htmlparser-1.3.1-with-transitions.jar contains the unsplit method. Only htmlparser-1.3.1.jar, with the HotSpot workaround applied, is affected by the problem.
The HotSpot workaround seems to introduce a bug: a "break stateloop;" inside the nested "workAroundHotSpotHugeMethodLimit" switch does not correctly finish the outer Tokenizer switch. The attached fix corrects this.
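
To illustrate the class of bug described above, here is a minimal sketch in Java. It is not the actual Tokenizer code and not the attached patch. The class SplitLoopSketch, the states DATA and TAG, the STOP sentinel, and the honorStop flag are all hypothetical names made up for this illustration. The idea: once some states are moved into a helper method, a "break stateloop;" can no longer reach the outer loop, so the helper has to report the exit request and the caller has to act on it. If the caller ignores it, the loop spins without consuming input. The sketch caps the iteration count so that the broken variant still terminates.

```java
public class SplitLoopSketch {
    static final int DATA = 0;
    static final int TAG = 1;
    // Hypothetical sentinel: "the helper wanted to break the outer loop".
    static final int STOP = -1;

    /** Holds the states that were moved out of the big method. Returns the next state or STOP. */
    static int helperStates(int state, char[] buf, int[] pos) {
        switch (state) {
            case TAG:
                if (pos[0] == buf.length) {
                    // In the unsplit method this would have been "break stateloop;".
                    return STOP;
                }
                char c = buf[pos[0]++];
                return c == '>' ? DATA : TAG;
            default:
                return state;
        }
    }

    /** Runs the state loop and returns the number of iterations, capped at maxIterations. */
    static int run(String input, boolean honorStop, int maxIterations) {
        char[] buf = input.toCharArray();
        int[] pos = {0};
        int state = DATA;
        int iterations = 0;
        stateloop: while (iterations < maxIterations) {
            iterations++;
            switch (state) {
                case DATA:
                    if (pos[0] == buf.length) {
                        break stateloop;
                    }
                    if (buf[pos[0]++] == '<') {
                        state = TAG;
                    }
                    continue stateloop;
                default:
                    int next = helperStates(state, buf, pos);
                    if (next == STOP) {
                        if (honorStop) {
                            // Correct: propagate the helper's exit request.
                            break stateloop;
                        }
                        // Broken: request ignored, state unchanged, no input consumed.
                        continue stateloop;
                    }
                    state = next;
                    continue stateloop;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(run("a<b", true, 1000));  // exit request honored
        System.out.println(run("a<b", false, 1000)); // exit request ignored, hits the cap
    }
}
```

With the input "a<b", which ends inside a tag (an input chosen for this sketch, not the content of oom.html), the variant that honors the exit request stops after 4 iterations, while the variant that ignores it runs until the cap of 1000 is reached. Without such a cap, and with an allocation per iteration, the latter pattern would match the reported endless loop followed by an OutOfMemoryError.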

PS: I would appreciate a release of the newest Java sources, especially again in a public Maven repo.
Comment 2 Markus Kull 2011-12-21 00:37:20 PST
Created attachment 583419
HG changeset patch of proposed fix
Comment 3 Henri Sivonen (:hsivonen) 2012-05-23 06:13:52 PDT
Comment on attachment 583419
HG changeset patch of proposed fix

Thank you for catching and fixing this!

Landed: https://hg.mozilla.org/projects/htmlparser/rev/e278d506aab7

I'm sorry this fell through the cracks over Christmas.
Comment 4 Henri Sivonen (:hsivonen) 2012-05-23 06:16:29 PDT
(In reply to Markus Kull from comment #1)
> PS: I would appreciate a release of the newest Java sources, especially
> again in a public Maven repo.

I intend to make a new release once I manage to integrate a pom.xml contribution for applying the HotSpot workaround from within a Maven build in a way that complies with the Maven repo rules that require binary artifacts to be derived from the source solely by running Maven.
