Created attachment 583415
oom.html

Malformed HTML which causes an endless loop / OOM, e.g. with HTML2HTML

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/10.04 Chromium/15.0.874.121 Chrome/15.0.874.121 Safari/535.2

Steps to reproduce:
- Downloaded the original Java source of the htmlparser used in Mozilla from http://about.validator.nu/htmlparser/htmlparser-1.3.1.zip
- Used htmlparser-1.3.1.jar, not the -with-transitions.jar
- Parsed the attached oom.html, e.g. with HTML2HTML.java

Actual results:
- Endless loop, ending in an OutOfMemoryError due to repeated allocations

Expected results:
- No endless loop
This problem does NOT affect Firefox; it only occurs when using the original Java source of the htmlparser with the HotSpot workaround applied. As the newest Java source is stored in the Mozilla repository, I am reporting it here.

Cause and fix: One method in the Tokenizer is too large to be JIT-compiled by the Java HotSpot compiler, which causes slow performance. As a workaround for HotSpot, a script splits the method into two methods. htmlparser-1.3.1.jar contains the split method; htmlparser-1.3.1-with-transitions.jar contains the unsplit method. Only htmlparser-1.3.1.jar, with the HotSpot workaround applied, is affected by the problem. It seems the HotSpot workaround introduces a bug: a "break stateloop;" inside the nested "workAroundHotSpotHugeMethodLimit" switch does not correctly finish the outer Tokenizer switch. The attached fix corrects this.

PS: I would appreciate a release of the newest Java sources, especially again in a public Maven repo.
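To illustrate the kind of failure described in the cause analysis above, here is a minimal sketch in plain Java. It is not the actual Tokenizer code and not the attached patch. It assumes, purely for illustration, that some states of a labeled state loop were moved into a helper method, where a "break stateloop;" can no longer reach the outer loop and has to be signalled some other way. All names (SplitStateLoopSketch, splitOffStates, LEAVE_LOOP, the signalExit flag and the maxSteps cap) are hypothetical.

```java
// Hypothetical sketch of a split state loop; not the real htmlparser Tokenizer.
final class SplitStateLoopSketch {
    static final int DATA = 0;
    static final int TAG_OPEN = 1;
    // Sentinel the helper returns when the caller should leave stateloop.
    static final int LEAVE_LOOP = -1;

    // Stand-in for the split-off half of the big method. A labeled break
    // cannot cross a method boundary, so the helper can only return a value.
    static int splitOffStates(int state, char[] buf, int[] pos, boolean signalExit) {
        switch (state) {
            case TAG_OPEN:
                if (pos[0] == buf.length) {
                    // This was "break stateloop;" before the split.
                    // signalExit == false models the bug: the state is returned
                    // unchanged, so the caller re-enters it without progress.
                    return signalExit ? LEAVE_LOOP : state;
                }
                pos[0]++;
                return DATA;
            default:
                throw new IllegalStateException("unknown state " + state);
        }
    }

    // Returns the number of loop iterations, or -1 if maxSteps was exceeded.
    // The cap exists only so the buggy variant terminates in this demo.
    static int run(String input, boolean signalExit, int maxSteps) {
        char[] buf = input.toCharArray();
        int[] pos = {0};
        int state = DATA;
        int steps = 0;
        stateloop:
        for (;;) {
            if (++steps > maxSteps) {
                return -1;
            }
            switch (state) {
                case DATA:
                    if (pos[0] == buf.length) {
                        break stateloop;
                    }
                    state = (buf[pos[0]++] == '<') ? TAG_OPEN : DATA;
                    continue stateloop;
                default: {
                    int next = splitOffStates(state, buf, pos, signalExit);
                    if (next == LEAVE_LOOP) {
                        break stateloop; // propagate the helper's exit request
                    }
                    state = next;
                    continue stateloop;
                }
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("exit signalled: " + run("ab<", true, 1000));
        System.out.println("exit dropped:   " + run("ab<", false, 1000));
    }
}
```

In this sketch the input "ab<" ends while the machine is in a split-off state. With the exit signalled, the loop stops after 4 iterations. With the exit dropped, the loop makes no progress and only the artificial cap stops it, which corresponds to the endless loop (and, with allocations inside the loop, the eventual OutOfMemoryError) reported here.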
Comment on attachment 583419
HG changeset patch of proposed fix

Thank you for catching and fixing this!

Landed:
https://hg.mozilla.org/projects/htmlparser/rev/e278d506aab7

I'm sorry this fell through the cracks over Christmas.
(In reply to Markus Kull from comment #1)
> PS: I would appreciate a release of the newest java sources, especially
> again in a public maven repo.

I intend to make a new release once I manage to integrate a pom.xml contribution that applies the HotSpot workaround from within a Maven build, in a way that complies with the Maven repo rules requiring binary artifacts to be derived from the source solely by running Maven.