Java-HtmlParser with endless Loop

RESOLVED FIXED

Status


Core
HTML: Parser
6 years ago
5 years ago

People

(Reporter: Markus Kull, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

6 years ago
Created attachment 583415 [details]
oom.html  Malformed HTML which causes endless loop / OOM e.g. with HTML2HTML

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/10.04 Chromium/15.0.874.121 Chrome/15.0.874.121 Safari/535.2

Steps to reproduce:

- Downloaded the original Java source of the htmlparser used in Mozilla from
http://about.validator.nu/htmlparser/htmlparser-1.3.1.zip .
- Used htmlparser-1.3.1.jar, not the -with-transitions.jar.
- Parsed the attached oom.html, e.g. with HTML2HTML.java.



Actual results:

- Endless loop that ends in an OutOfMemoryError due to repeated allocations


Expected results:

- no endless loop
(Reporter)

Comment 1

6 years ago
This problem does NOT affect Firefox. It only occurs when using the original Java source of the htmlparser with the HotSpot workaround applied. As the newest Java source is stored in the Mozilla repository, I am reporting it here.

Cause and fix: One method in the Tokenizer is too large to be JIT-compiled by the Java HotSpot compiler, which makes it slow. As a HotSpot workaround, a script splits the method into two methods. htmlparser-1.3.1.jar contains the split method; htmlparser-1.3.1-with-transitions contains the unsplit method. Only htmlparser-1.3.1.jar, with the HotSpot workaround applied, is affected by the problem.
The HotSpot workaround appears to introduce a bug: a "break stateloop;" inside the nested "workAroundHotSpotHugeMethodLimit" switch does not correctly terminate the outer Tokenizer switch. The attached fix corrects this.
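
To illustrate the kind of failure described above, here is a minimal sketch. It is not the actual Tokenizer code and not the attached patch. The class StateLoopSketch, the methods helper and run, the DONE marker and the iteration cap are all hypothetical and exist only for this demonstration. It assumes the split-off code has its own loop labeled stateloop, so that a "break stateloop;" inside it leaves only the inner loop unless the caller checks the result and breaks as well.

```java
// Hypothetical miniature of a split state machine; not the real Tokenizer.
class StateLoopSketch {
    static final int DONE = -1; // assumed marker meaning "stop tokenizing"

    // Stands in for the split-off half of the large method.
    static int helper(int state) {
        stateloop: for (;;) {
            switch (state) {
                case 2:
                    state = DONE;
                    break stateloop; // leaves only this helper's loop
                default:
                    state = 2;
                    continue stateloop;
            }
        }
        return state;
    }

    // Returns the number of outer loop iterations that were started.
    // maxIterations is a safety cap so the buggy variant terminates here.
    static int run(boolean propagateBreak, int maxIterations) {
        int state = 0;
        int iterations = 0;
        stateloop: for (;;) {
            if (++iterations > maxIterations) {
                break stateloop; // demo-only cap
            }
            switch (state) {
                case 0:
                    state = 1;
                    continue stateloop;
                default:
                    int result = helper(state);
                    if (propagateBreak && result == DONE) {
                        break stateloop; // fixed variant: honor the inner break
                    }
                    // Buggy variant: the inner break is lost and we loop again.
                    continue stateloop;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(run(true, 1000));  // prints 2
        System.out.println(run(false, 1000)); // prints 1001 (stopped by the cap)
    }
}
```

Without the cap, the variant that ignores the helper's result would never leave the outer loop, which matches the reported symptom if each iteration also allocates memory.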

PS: I would appreciate a release of the newest Java sources, especially again in a public Maven repo.
(Reporter)

Comment 2

6 years ago
Created attachment 583419 [details] [diff] [review]
HG changeset patch of proposed fix
Comment on attachment 583419 [details] [diff] [review]
HG changeset patch of proposed fix

Thank you for catching and fixing this!

Landed: https://hg.mozilla.org/projects/htmlparser/rev/e278d506aab7

I'm sorry this fell through the cracks over Christmas.
Attachment #583419 - Attachment is patch: true
Attachment #583419 - Flags: review+
(In reply to Markus Kull from comment #1)
> PS: I would appreciate a release of the newest java sources, especially
> again in a public maven repo.

I intend to make a new release once I manage to integrate a pom.xml contribution that applies the HotSpot workaround from within a Maven build. The integration has to comply with the Maven repo rules, which require binary artifacts to be derived from the source solely by running Maven.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED