Closed Bug 59687 Opened 24 years ago Closed 21 years ago

Wallet should use displayble text to determine meaning of fields

Categories

(Toolkit :: Form Manager, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: morse, Assigned: morse)

References

Details

Attachments

(7 files, 2 obsolete files)

THE WAY IT IS
-------------

Currently wallet tries to identify the meaning of a field by examining the value 
of the name attribute of the html input tag.  Sometime a website uses meaningful 
names (like "zipcode") which makes this job simple.  But more often than not, 
they use cryptic names (like 3.0.3.27.1.1.31.11.16.9.11 -- yes, that's what is 
actually found on the apple site for the zipcode field).

So wallet uses a table that contains URL-specific mappings for determining the 
meaning of field names used on specific websites.  That table has to be quite 
large in order to support a large number of websites.  But no matter how large 
it is, there will still be many websites that would not be supported.

Furthermore, the table is hosted on a server and pinged at the beginning of 
every browser session to see if it has changed and needs to be redownloaded.  
This also requires a procedure for maintaining the table on the server and to 
date none has been put in place.  And it requires a lot of specialized code in 
the browser to do the downloading in background mode.

THE WAY IT SHOULD BE
--------------------

Rather than using the name attribute of the input tag, wallet could use the 
displayable text preceding the input tag.  That's much more meaningful because 
it has to mean something to the user who is viewing it.  This would allow the 
entire mechanism for dealing involving URL-specific mappings to be discarded.  
Significant reduction in the amount of data actually stored in memory.  And no 
longer any need to do dynamic updates for a server site.
Summary: Wallet should use displayble text to determine meaning of fields → [w]Wallet should use displayble text to determine meaning of fields
The patches for this are as follows:

1. extensions/wallet/src/wallet.cpp
   all the coding changes are here

2. extensions/wallet/src/SchemaConcat.tbl,FieldSchema.tbl
   some changes to wallet tables to allow for steamlining of schema names

3. extensions/wallet/src/URLFieldSchema.tbl
   very large table no longer needed
 
4. extensions/wallet/src/SchemeStrings.tbl,PositionalSchema.tbl,StateSchema.tbl
   new tables for determining schema from the text displayed on the screen

5. extensions/wallet/src/makefile.win,Makefile.in,MANIFEST
   xpinstall/packager/packages-unix,packages-mac,packages-win

   new makefiles and install files for added/removed tables
I left out describing one patch in the list above.  Here it is:

6. extensions/wallet/editor/interview.html
   streamlining of interview form
Attached patch preliminary patch for wallet.cpp (obsolete) — Splinter Review
As mentioned above, all the substanitive changes are in wallet.cpp.  Here's a 
summary of what these changes are doing.

1. Add the ability to detect schema from the displayable text appearing
on the screen.  Besides triggering on the text, it also makes use of the
following:

  - Positional information for a set of consecutive input fields having
    no intervening displayable text.  For example, two consecutive fields
    for "zipcode" would be interpreted as the first being the 5-digit
    zipcode and the second the 4-digit zipcode extension.

  - State information to distinguish between such things as billing
    address and shipping address

2. Remove the use of the url-specific field-to-schema mapping table. 
The hit-ratio obtained by using the displayable text is so high, that
there is no longer a need for these url-specific mappings and all the
overhead that this entailed.  This means we no longer have to donwload
tables from a server and all that code has been removed.

3. Fix a bug in the processing of concatenated schema which came to
light when I was testing out the new algorithms.
Status: NEW → ASSIGNED
Putting all the diffs into a single attachment is usually best. Use "cvs diff
-N" to get new files included as part of the single diff (you must therefore cvs
add them locally first).

r=dveditz  Getting rid of the URL-specific tables will save tons of memory.
There are some intl issues that need to be worked on, but I've been cc'd on the
conversation between i18n and morse and it'll get dealt with -- not a good
reason to block this improvement.
Ok, comments on the patch:
* you have some cut'n pasted code, right around the tests for BY_LENGTH 
(not sure what functions they're in) - some of it looks somewhat complex (a 
loop, some temp variables, etc) - can you make this into a shared function 
that these cases can use?
* All over the place (such as the one mentioned above) you are using temporary 
variables like "ptr1" - When writing new code, can you use somewhat more 
descriptive names, like sublist1? 
No well-typed variable should ever need a name like "ptr"
* for the EqualsWithConversion stuff - I can see you were trying to use 
NS_LITERAL_STRING, but gave up by commenting it out - there are other 
examples of doing this correctly in the code.. I think it's as simple as saying
if (foo.EqualsWithConversion(NS_LITERAL_STRING("bar").get)

* Code like this:
+    result = elementNode->GetParentNode(getter_AddRefs(parent));
+    if ((NS_FAILED(result)) || !parent) {
+      /* no parent, we've reached the top of the tree */
+      atEnd = PR_TRUE;
+      return;
+    } else {
+      /* parent obtained */
+      elementNode = parent;
+      return;
+    }

"return" should be outside of this if() - if you (or someone else) later fixes a 
bug in this code, they might not see both returns and might cause other 
bugs..

* for checking if something is in the A-Z,a-z range, use the static 
nsString::IsAlpha, and for digits, nsString::IsDigit
oops, I meant "foo.Equals(NS_LITERAL_STRING(bar).get())"
adding myself to cc for followup comments & review
Comments on alecf's comments:

> you have some cut'n pasted code ... can you make this into a shared function 
> that these cases can use?

Problem here is that I meant to do cut-and-paste whereas I did copy-and-paste 
instead.  There should only be one instance of this code.  It was harmless the 
way I had it, but unnecessary.  Thanks for catching that.

> All over the place you are using temporary variables like "ptr1" ... can you > 
> use somewhat more descriptive names, like sublist1? 

done.  I just changed all such occurences throughout the module, not just in 
this added code.

> for the EqualsWithConversion stuff ... I think it's as simple as saying
> if (foo.EqualsWithConversion(NS_LITERAL_STRING("bar").get)

No that doesn't work either.  The way I had it is exactly as it is in 
nsGenericElement.cpp#1785:

   nsAutoString ..., prefix;
   if (prefix.Equals(NS_LITERAL_STRING("on"))) {

The only difference is that I am doing EqualsIgnoreCase whereas that code is 
doing just Equals.  I suspect something is not completely hooked up in the 
strings module.

> return" should be outside of this if()

done

> for checking if something is in the A-Z,a-z range, use the static 
> nsString::IsAlpha, and for digits, nsString::IsDigit

As far as I can tell (from an lxr search), those routines are defined but never 
used anywhere in our codebase.  So I changed my code to use the lower-level 
isalpha and isdigit instead.

Will attach revised patch for wallet.cpp in a moment.
Attached patch revised patch for wallet.cpp (obsolete) — Splinter Review
I just caught this in the latest patch:

+          for (i=0; i<text.Length(); i++) {
+            c = text.CharAt(i);
+
+            /* break out if an alphanumeric character is found */
+
+            nsresult res = nsServiceManager::GetService(kUnicharUtilCID, 
kICaseConversionIID,
+                                        (nsISupports**)&gCaseConv);
+
+
+            nsIUGenCategory* intl =  nsnull;
+            nsresult rv = nsServiceManager::GetService(kUnicharUtilCID, 
kIUGenCategoryIID,
+                                        (nsISupports**)&intl);

You are doing a GetService() on every iteration through this loop, but you are 
never NS_RELEASE()ing either service, nor even using gCaseConv...

I suggest always using nsCOMPtr, and never storing services in global/member 
variables unless you have done profiling that proves that GetService() is 
actually taking up a siginifigant amount of time.
Arghh!  That code was added in response to dveditz's comments about using an 
i18n friendly way of testing for alphabetic characters.  The i18n group has yet 
to be able to give us a functioning routine to do so.  They did suggest the one 
in UnicharUtil but their implementation of that is incomplete (currently the 
getting that service fails and my fall-back code is getting executed).

So I just commented out my attempt to get their service.  If and when we go back 
to using this service, I will incorporate your comments above.

Attaching another patch which has this code commented out.
sr=alecf.. thanks for making the changes!
r=dveditz for latest patch
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Summary: [w]Wallet should use displayble text to determine meaning of fields → Wallet should use displayble text to determine meaning of fields
Whiteboard: [w]
Whiteboard: [w]
Attachment #19157 - Attachment is obsolete: true
Attachment #19039 - Attachment is obsolete: true
Attachment #19159 - Attachment description: Another revised patch for wallet.cpp -- doesn't use UnicharUtil → Another revised patch for wallet.cpp -- doesn't use UnicharUtil [Checked in: Comment 21]
Changing:
*(H) PC -> All
*(OS) Windows NT -> All
*(S) R.Fixed -> "ReOpened"  <-- !!

While investigating for bug 221623, I noticed something very odd:

Attachment 19159 [details] [diff] (the latest) reads |#define IgnoreFieldNames|;
whereas the checkin from v1.267 to v1.268 reads |//#define IgnoreFieldNames| !
[ (wallet.cpp: '-patch -> +checkin' diff)
 @@ -25,6 +25,8 @@
  */
  
  #define AutoCapture
-+#define IgnoreFieldNames
++//#define IgnoreFieldNames
 +
  #include "wallet.h"
  #include "singsign.h"
@@ -881,10 +882,10 @@
 +            /* found an unused value for the multi-rhs item */
              value += value2;
 +            index00 += 2;
-           }
++          }
 +          if (index00 > index00max) {
 +            index00max = index00;
-+          }
+           }
          }
 +
          itemList = nsnull;
]
(I ignore the two '}' lines for now: should we care ?)

Since this happened 3 years ago, and has remained unchanged since then,
and since bug 221623 purpose is to do code cleanup only,
(untill FireBird wallet is back-ported...)

Should someone (not me !) look into activating this 'IgnoreFieldNames' case,
or can I go forward and remove this commented out '#define' ?

(CC: bryner, who wrote the FB wallet.)
Blocks: 221623
Status: RESOLVED → REOPENED
OS: Windows NT → All
Hardware: PC → All
Resolution: FIXED → ---
Back to fixed. Patch discrepancy aside this appears to work as designed.
Status: REOPENED → RESOLVED
Closed: 24 years ago21 years ago
Resolution: --- → FIXED
Product: Core → Toolkit
QA Contact: tpreston → form.manager
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: