User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414
When saving a page as an HTML file, it should be the page as it currently
appears in the browser. Specifically, any text that's been entered in form
fields should be preserved, radio button and checkbox state be retained, etc.
One place this would be very desirable is in saving pages that you're filling
out as part of an online transaction. Right now you only have the option to
print the pages to have a record of your inputs, but that approach has many
limitations, including not being able to copy & paste data from the saved page
later, having text that appears in TEXTAREAs with scrollbars be truncated, etc.
This came up for me today when I was trying to enter a bunch of text in an
online evaluation form and the site was giving me a generic insert error. When
I reported the problem to the webmaster, he requested I send him a screenshot of
the data I was trying to submit, but I had to reply that this would be
insufficient because the form in question was many screenfulls long and I had
typed much text in TEXTAREAs that would be truncated in screenshots. Mozilla
not giving me any way to save the form with my inputs, I was forced to save the
basic HTML and then manually edit it to add SELECTED attributes to OPTION tags,
CHECKED to checkboxes and radio buttons, VALUE attributes to TEXT inputs, and
paste typed text into the TEXTAREAs.
This feature would also be very useful to proactively protect oneself from data
loss on those pages that give you an error (e.g. because you've been typing so
long that your session expired) and then when you hit Back all the form fields
have been cleared to their default state. If you had typed a large amount of
text that you wanted to make sure you didn't lose, you could simply save the
page, and then if you got an error you could easily copy and paste your text
into a new instance of the page when retrying.
Steps to Reproduce:
1. Change some inputs on a form page.
2. Do Save Page As.
3. Open the file.
The file does not include any changes you made to the form inputs.
It should preserve your entered text and other form input changes.
jst, biesi, darin, thoughts? This seems like a reasonable idea to me...
hm, yeah, I think this is a good idea.
This is an automated message, with ID "auto-resolve01".
This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.
While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.
If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.
The latest beta releases can be obtained from:
This needs to be done over in webbrowserpersist...
We could perhaps do this by comparing value to defaultvalue for textarea/input, and so forth for the other controls.
Ryan, do you want to take a stab at this?
(In reply to comment #5)
> We could perhaps do this by comparing value to defaultvalue for textarea/input,
> and so forth for the other controls.
> Ryan, do you want to take a stab at this?
Sure, I'll take a shot!
I've got a few questions though, should we deal with select's for example (as in make sure the item that was selected will be selected when the saved file is loaded)? Also, should we save the file path for input type="file" for example?
Dealing with <select>s could be nice, but might be a bit of trouble; could do that in a followup if desired.
I don't think we should save file paths. It wouldn't do any good anyway, since there's no way to prefill a file input by parsing the HTML, so...
OK, I'll leave the handling of selects too a follow up bug. I'll try to look at this sometime this week or weekend.
Created attachment 277463 [details] [diff] [review]
approach 1: update attrs before serialization
Taking this bug with permission from Ryan.
The content serialization code works by recursively serializing nodes with stored attributes into a string of tags with key="value" pairs. The problem is that attributes are used to store default values, while DOM state, such as user input to form elements, is stored in the elements themselves. Thus, this patch puts some code in right before attributes are serialized and updates the attributes to reflect the current state of the given element, including any user input. The existing framework takes care of the rest.
Attached patch handles textarea, textbox, checkbox, radio, and option/select.
That doesn't look like the right approach. It changes the DOM you're saving, which is a problem, imo. In particular, saving a website could break its functioning.
What you want to do instead is to munge things on the cloned "fixup" nodes that web browser persist creates.
Created attachment 277698 [details] [diff] [review]
As much as possible in nsWebBrowserPersist
Okay, so doing stuff in webbrowserpersist works for checkboxes, radio buttons, textboxes, and selects... but not textareas.
The problem is that there's an assumption in the design of the current system, that attributes are the only thing that could need fixing. But textareas violate that rule: textareas may have child nodes in the original DOM that shouldn't be serialized as-is! Combined with the desire to leave the original DOM intact, that leaves us between a rock and a hard place.
1) Go ahead and modify the DOM when serializing textareas. The only thing that would be modified would be either adding or changing the content of a single #text node under each textarea. Advantages: simple to write and understand. Disadvantages: modifies uncloned DOM.
2) Add a special case to nsDocumentEncoder::SerializeToStringRecursive's handling of child node serialization. Advantages: simple to write and understand, does not modify uncloned DOM. Disadvantages: violates the document-language-agnosticism of nsDocumentEncoder, adds a QI before looping over child nodes (perf issue?).
3) Play fast and loose with the content serializer interface, add a method like EncoderShouldSkipChildSerialization (or some such). This would, at the end of the day, be rather like option number 2. It would be more flexible, but also harder to understand, code-wise, because it would split the work done in one place in option 2 into at least three separate places in at least two files. More, if you could the unused stubs in the non-HTML content serializer implementations. Advantages: arguably less hackish than option 2. Disadvantages: necessitates adding stubs to at least four content serializer implementations. Added complexity may not outweigh less hackish nature.
Other approaches I'm missing?
Since I already attached a patch that does option 1, this patch tries option 2 on for size. I actually started with approach 3 at the very beginning, but discarded it when I found the joys of nsContentUtils::SetNodeTextContent in patch #1.
Also, since it's not just URI-related attributes that are being fixed up now, perhaps the name of nsWebBrowserPersist::CloneNodeWithFixedUpURIAttributes should have the "URI" removed?
Peter, Jonas, what do you think is a good approach for the textarea issue?
My instinct is that we should expand nsIDocumentEncoderNodeFixup to allow fixing up the kids too (that is, move the fixup up a level from SerializeNodeStart and allow fixupNode() to return a boolean or something indicating whether the clone's kids should be used instead of the original node's kids.
Yeah, that sounds like a good idea.
Created attachment 277856 [details] [diff] [review]
* Extends the fixupNode method with an [out] boolean param
* Removes "URI" from "CloneNodeWithFixedUpURIAttributes", since we're now fixing up more than just URI attributes. Technically, with textareas, we're fixing up more than just attributes, too, but I think it's clearer with just URI removed.
* Moves a call to FixupNode out of SerializeNodeStart, along with a new default parameter so that all the other callers of SerializeNodeStart (who don't directly iterate over child nodes) can continue to assume that SerializeNodeStart will do fixup for them.
Comment on attachment 277856 [details] [diff] [review]
>+ nsCOMPtr<nsIDOMNode> node, maybeFixedNode;
|node| can just be an nsIDOMNOde* here, no?
>+ * apply recursive serialization to the children of the cloned node, instead
>+ rv = nodeAsInput->GetValue(valueStr);
>+ if (NS_SUCCEEDED(rv) && !valueStr.IsEmpty())
So... if NS_FAILED(rv) should we really go on? Not that that will happen here. On the other hand, why not do this if valueStr.IsEmpty()?
>+ outElt->SetAttribute(NS_LITERAL_STRING("value"), valueStr);
Would it be clearer to SetDefaultValue(valueStr)?
>+ if (checked)
>+ outElt->SetAttribute(checkedAttr, NS_LITERAL_STRING("true"));
Again, would |outElt->SetDefaultChecked(checked)| make more sense? Especially since checked="true" is not actually valid HTML...
Toss in a "default: break" to prevent build warnings?
>+ nsCOMPtr<nsIDOMHTMLTextAreaElement> nodeAsTextArea = do_QueryInterface(aNodeIn);
>+ nsCOMPtr<nsIDOMNode> out = do_QueryInterface(*aNodeOut);
You just need that for convenience, right? Why not:
nsIDOMNode* out = *aNodeOut;
>+ // Remove the textarea's child text nodes (should be at most one)
Shouldn't be any, in fact...
>+ rv = doc->CreateTextNode(valueStr, getter_AddRefs(childText));
etc. How about replacing all that with a QI to nsIDOM3Node followed by a call to SetTextContent(valueStr) on he result. That will handle removing existing kids (none), creating the text node, setting it to the right string, appending it, etc, etc.
>+ nsCOMPtr<nsIDOMHTMLOptionElement> nodeAsOption = do_QueryInterface(aNodeIn);
>+ if (selected)
>+ outElt->SetAttribute(selectedStr, NS_LITERAL_STRING("true"));
Rest looks good.
Created attachment 278065 [details] [diff] [review]
update to review comments 1
Updated to review comments.
I believe it is actually possible for the fixed up text area element we get to have child nodes, if mPersistFlags has PERSIST_FLAGS_FIXUP_ORIGINAL_DOM. It usually won't, but look at
>+ // Avoid superfluous value="" serialization
>+ if (valueStr.IsEmpty())
This bit isn't strictly necessary, it just makes the serialization a little bit cleaner. You're right, though, about the last patch's logic being a little dodgy. In the case of an un-cloned textbox with a default value which the user then erased, the last patch would have serialized the default value even though the user had explicitly changed it.
Comment on attachment 278065 [details] [diff] [review]
update to review comments 1
>+ if (valueStr.IsEmpty())
I guess SetDefaultValue() doesn't remove the attr, yeah. In that case, RemoveAttribute/SetAttribute is clearer, I think.
With that changed back, r=bzbarsky
Created attachment 278185 [details] [diff] [review]
Goes back to SetAttribute instead of SetDefaultValue
Carrying over r+, requesting sr.
Just a note, I was looking at bug 115107 and realized that script elements follow the textarea pattern. That is, we want serialize (in nsDocumentEncoder) a #text child for script elements generated from information stored in the script element, in nsWebBrowserPersist.
You mean <style> elements, right?
Err, yeah, those things. ;-)
Or <link>. Yes, that could be done. We'd probably want to fix some CSS serializer bugs first...
Comment on attachment 278185 [details] [diff] [review]
Goes back to SetAttribute instead of SetDefaultValue
Looks great, sorry for the extremely slow review on this one :(
Created attachment 331425 [details] [diff] [review]
Same as last patch, updated to mozilla-central
This patch is the same content diff as patch 278185, but done with hg in a mozilla-central tree instead of Firefox 3; thus, a few of the line numbers are slightly different.
I don't have commit access, so I'm carrying over r+sr and marking this as checkin-needed.
Checked in. It'd be good to have some tests here too.....
Created attachment 331631 [details]
Semi-manual testcase for form state
I don't know how to go about doing automated testing of webbrowserpersist, but here's the testcase I used when writing the patch, cleaned up and possibly suitable for adding to Litmus.
Steps to test:
1) Open testcase in browser
2) Click "Fill Form" button
3) File > Save Page As
4) Open saved page and verify that the form is still filled.
Could write a test that saves a webpage to a file and make sure it saves the right thing...
Created attachment 333367 [details] [diff] [review]
Mochitest version of above testcase
This patch adds a test (and the test directory) to /embedding/test/. The directory /embedding/tests/ already exists, but it contains compiled-code and xpcshell tests, and convention seems to be to put mochitests in a test/ dir.
The test itself uses wbp in a manner similar to File > Save Page As. It loads a form in an iframe, fills it out, and saves the form to a file in the temp dir. Because mochitests can't read file:/// URLs, the test then reads back the persisted file contents and loads the contents into a second iframe. Finally, the two iframes are confirmed to have the same form state.
As a last note, the "loads the contents into an iframe" bit seems to trigger a known (but still unfortunate) assertion:
###!!! ASSERTION: Adding child where we already have a child? This will likely misbehave: 'Error', file c:/moz/mozilla-central/docshell/shistory/src/nsSHEntry.cpp, line 599
Created attachment 333368 [details] [diff] [review]
Updated mochitest patch
Sorry for the bugspam, but the patch above just needed one minor fix:
@@ -0,0 +1,141 @@
@@ -0,0 +1,142 @@
Comment on attachment 333368 [details] [diff] [review]
Updated mochitest patch
This looks great! Thanks!
Test checked in.