Closed Bug 578561 Opened 14 years ago Closed 12 years ago

sdb_getTempDir returns NULL with new versions of sqlite (e.g., version 3.6.22)

Categories

(NSS :: Libraries, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.14.2

People

(Reporter: wtc, Assigned: KaiE)

References

Details

Attachments

(4 files, 9 obsolete files)

5.15 KB, patch
rrelyea
: superreview+
Details | Diff | Splinter Review
793 bytes, patch
Details | Diff | Splinter Review
573 bytes, patch
KaiE
: review+
Details | Diff | Splinter Review
3.88 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
On Ubuntu Hardy, the system libsqlite3.so is version 3.4.2. sdb_getTempDir returns "/var/tmp". This is because in sdb_getTempDirCallback, when cname[i] is "file", cval[i] is "/var/tmp/etilqs_JeRzm8lia5xOBMl" (in one execution I traced in gdb). With the new versions of sqlite, such as the one shipped with NSS or the system libsqlite3.so in Ubuntu Lucid (both are version 3.6.22), sdb_getTempDir returns NULL. This is because in sdb_getTempDirCallback, when cname[i] is "file", cval[i] is "". I wonder if "" means the "temp" data The consequence of a null tempDir in sdb_init() is that the the filesystem speed test is skipped, so enableCache is PR_FALSE (the initial value), and we get poor performance if the NSS database is on NFS. See Chromium issue http://crbug.com/48585.
I wrote an incomplete sentence at the end of the second paragraph in comment 0. What I meant is: I wonder if "" means the "temp" database ("TEMPORARY TABLE") is in memory.
I found that "" does NOT mean an in-memory database. An in-memory database seems to be represented by ":memory:", but I haven't verified that. In any case, that's not important. I also concluded that with sqlite 3.6.22 it's impossible to get the pathname of the "temp" database. (Note: the "temp" database is immediately unlinked after it's opened, so it doesn't really have a pathname to speak of.) It's also impossible to get the directory used for temporary files from sqlite 3.6.22. So I'm afraid that sdb_getTempDir can't be implemented with sqlite 3.6.22.
Priority: P1 → P2
Target Milestone: 3.13 → 3.13.1
Target Milestone: 3.13.1 → 3.13.2
Blocks: 783994
Yes, I also see this bug. Automatic enabling of the shared db cache doesn't work.
Attached patch Patch v1 - crashes? (obsolete) — Splinter Review
This is a patch to try harder to get a path to a temporary directory. I believe tmpnam is an ANSI C function, so it should be OK to use cross platform?
Assignee: wtc → kaie
Attachment #679869 - Flags: review?(rrelyea)
Attachment #679869 - Attachment description: Patch v1 → Patch v1 - crashes?
Attachment #679869 - Attachment is obsolete: true
Attachment #679869 - Flags: review?(rrelyea)
Attached patch patch v2 (obsolete) — Splinter Review
After reading the tmpnam manpage in more detail ;) here is a better patch. If you don't like this patch, then we would have to: - either change sdb_measureAccess to use a filename created with tmpfile - or add a function to NSPR that implements a new function to obtain the system preferred temporary directory, using each platform's preferred modern APIs
Attachment #679883 - Flags: review?(rrelyea)
Comment on attachment 679883 [details] [diff] [review] patch v2 Either Bob or Wan-Teh, what do you think?
Attachment #679883 - Flags: review?(wtc)
I would remove the comment + * we will free tmpFileName after we're done. prior to checkin.
In addition, given that we have difficulties to reliably find the path to a temporary directory that is guaranteed to be on a local filesystem, I propose a different approach that doesn't require a temporary directory. Given that NSS uses an in-memory cache database, where no files are being created on disk at all, the speed of a temporary filesystem seems irrelevant. We need to know the speed of accesing files on a local filesystem. I think we don't need a writeable directory, the speed measurement to access nonexistant filenames is a read-only test. Why don't we simply test for filenames on the top level of the primary drive? On Windows-systems we'd use c:\ and on Unix-like systems we'd use / for the directory.
Attached patch ALTERNATIVE patch v3 (obsolete) — Splinter Review
This patch implements the alternative approach, which works correctly on my system, tested with database on local drive and on remote NFS drive.
Attachment #680100 - Flags: review?(rrelyea)
Attachment #680100 - Flags: review?(wtc)
Comment on attachment 679883 [details] [diff] [review] patch v2 r+... I'd also be ok with using tempnam bob
Attachment #679883 - Flags: review?(rrelyea) → review+
Comment on attachment 680100 [details] [diff] [review] ALTERNATIVE patch v3 r- this will fail horribly if we can't right to '/', which is pretty frequently the case for unix. bob
Attachment #680100 - Flags: review?(rrelyea) → review-
(In reply to Robert Relyea from comment #11) > > r- this will fail horribly if we can't right to '/', which is pretty > frequently the case for unix. We don't want to write to / The test simply attempts to check whether filenames in / exist,
Your right. I still prefer patch 2, however.
Attachment #680100 - Attachment is obsolete: true
Attachment #680100 - Flags: review?(wtc)
Comment on attachment 679883 [details] [diff] [review] patch v2 Thanks for the r+ I suspect that "tmpnam" is more widely available. I agree that "tempnam" looks cleaner and more advanced. I believe v2 has the disadvantage that we might get a temporary directory that is on the same network filesystem than the user's home directory, and as a result, we'd still use the network filesystem. For now, I'll check in v2. I'll start with a new patch that tries Bob's proposal to use "tempnam", but I'm worried it's not available on all platforms? Let's see how Tinderbox behaves. If I see failures, I'll back out and retry with "tmpnam". With the r-'ed patch v3 I believe the probability that we actually perform our comparison on a local filesystem is higher - because c:\ is usually an internal disk and / is usually the local mount point, with everything remote being on lower hierarchies.
Attachment #679883 - Flags: review?(wtc)
Attached patch patch v2 with comment fixed (obsolete) — Splinter Review
Comment on attachment 681518 [details] [diff] [review] patch v2 - but using "tempnam" instead [backed out] This patch checked in. Let's see if it works on all platforms. Checking in sdb.c; /cvsroot/mozilla/security/nss/lib/softoken/sdb.c,v <-- sdb.c new revision: 1.25; previous revision: 1.24 done
Attachment #681518 - Attachment description: patch v2 - but using "tempnam" instead → patch v2 - but using "tempnam" instead [checked in]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: 3.13.2 → 3.14.1
It's more important that we get a pretty good guess at where sqlite will put it's tempdb then where the local file system lives. The test is to determine if it's better for us to use a tmpdb cache and periodically refresh it than to use the db in the networked filesystem. If both the tmpdb and the real db are both in the networked filesystem, then caching is moot. bob
(In reply to Robert Relyea from comment #18) > It's more important that we get a pretty good guess at where sqlite will put > it's tempdb then where the local file system lives. The test is to determine > if it's better for us to use a tmpdb cache and periodically refresh it than > to use the db in the networked filesystem. If both the tmpdb and the real db > are both in the networked filesystem, then caching is moot. Either you misremember, or I misunderstand our code.
(In reply to Kai Engert (:kaie) from comment #19) > > Either you misremember, or I misunderstand our code. My point is, our tmpdb caching is in RAM, not on disk. We only need to test if access to files in the database directory is slow - and for that, we should use a known fast directory as comparison (that's what patch v3 attempted to do).
Bob clarified to me the following: Despite the fact that we request that sqlite stores the temporary database in memory, sqlite might still decide to use a file on disk. Bob saw that to happen in the past. So, in order to consider the alternative patch v3, we would have to investigate very carefully if this behavior can still be seen, and in which environments. The above is just for the record. For now, let's hope this bug has been sufficiently fixed by patch v2.
I have backed out the patch for this bug because they caused bug 817233: Checking in lib/softoken/sdb.c; /cvsroot/mozilla/security/nss/lib/softoken/sdb.c,v <-- sdb.c new revision: 1.27; previous revision: 1.26 done Here is the crash: Crash reason: SIGSEGV Crash address: 0x0 Thread 4 (crashed) 0 libmozglue.so!arena_dalloc [jemalloc.c : 4652 + 0x0] r4 = 0x00305dc8 r5 = 0x00005dc8 r6 = 0x00001a40 r7 = 0x00000000 r8 = 0x00300000 r9 = 0x4e6f8549 r10 = 0x00305dc8 fp = 0x00000004 sp = 0x4e103c18 lr = 0x4e6e9174 pc = 0x80c0a6f4 Found by: given as instruction pointer in context 1 libsoftokn3.so (deleted) + 0x1e172 r4 = 0x00305dc8 r5 = 0x4e6fd65c r6 = 0x00001a40 r7 = 0x000006c3 r8 = 0x00000001 r9 = 0x4e6f8549 r10 = 0x4e6f85f6 fp = 0x00000004 sp = 0x4e103c40 pc = 0x4e6e9174 Found by: call frame info 2 libsoftokn3.so (deleted) + 0x1da3e sp = 0x4e103c70 pc = 0x4e6e8a40 Found by: stack scanning 3 libsoftokn3.so (deleted) + 0x2d52a sp = 0x4e103c84 pc = 0x4e6f852c Found by: stack scanning 4 libsoftokn3.so (deleted) + 0x1e75a sp = 0x4e103ca8 pc = 0x4e6e975c Found by: stack scanning 5 libsoftokn3.so (deleted) + 0x22816 sp = 0x4e103ce8 pc = 0x4e6ed818 Found by: stack scanning 6 libsoftokn3.so (deleted) + 0xdf0e sp = 0x4e103d58 pc = 0x4e6d8f10 Found by: stack scanning 7 libsoftokn3.so (deleted) + 0xe166 sp = 0x4e103dc0 pc = 0x4e6d9168 Found by: stack scanning 8 libsoftokn3.so (deleted) + 0xe56a sp = 0x4e103df8 pc = 0x4e6d956c Found by: stack scanning 9 libsoftokn3.so (deleted) + 0xe646 sp = 0x4e103e50 pc = 0x4e6d9648 Found by: stack scanning 10 libnss3.so!secmodUnlockMutext [pk11load.c : 49 + 0x6] sp = 0x4e103e58 pc = 0x5116edf4 Found by: stack scanning 11 0x52c3560e r4 = 0x00000003 sp = 0x4e103e60 pc = 0x52c35610 Found by: call frame info 12 libnss3.so!secmod_ModuleInit [pk11load.c : 221 + 0xe] sp = 0x4e103e68 pc = 0x5116eb04 Found by: stack scanning 13 libnss3.so!secmod_LoadPKCS11Module [pk11load.c : 457 + 0xe] r4 = 0x52c35610 r5 = 0x00000000 r6 = 0x00000001 r7 = 0x4e103f54 r8 = 0x00000000 r9 = 0x4e3363a8 r10 = 0x500fdd34 fp = 0x5009acd0 sp = 0x4e103ed8 pc = 0x5116f588 Found by: call frame info 14 libnss3.so!SECMOD_LoadModule [pk11pars.c : 1010 + 0xa] r4 = 0x52c35610 r5 = 0x52c35410 r6 = 0x500f0000 r7 = 0x00000001 r8 = 0x00000000 r9 = 0x4e3363a8 r10 = 0x500fdd34 fp = 0x5009acd0 sp = 0x4e103f48 pc = 0x51180a00 Found by: call frame info 15 libnss3.so!SECMOD_LoadModule [pk11pars.c : 1045 + 0x2] r4 = 0x52c35410 r5 = 0x500fdd30 r6 = 0x500ef400 r7 = 0x500f0000 r8 = 0x00000000 r9 = 0x4e3363a8 r10 = 0x500fdd34 fp = 0x5009acd0 sp = 0x4e103f88 pc = 0x51180aa0 Found by: call frame info 16 libnss3.so!nss_Init [nssinit.c : 438 + 0xe] r4 = 0x5008dfb0 r5 = 0x00000000 r6 = 0x500ed920 r7 = 0x4e36d880 r8 = 0x4e3363a4 r9 = 0x4e3363a8 r10 = 0x500ef400 fp = 0x5009acd0 sp = 0x4e103fc8 pc = 0x51157488 Found by: call frame info 17 libnss3.so!NSS_Initialize [nssinit.c : 816 + 0x96] r4 = 0x00000000 r5 = 0x00000000 r6 = 0x00000000 r7 = 0x00000000 r8 = 0x00000001 r9 = 0x00000000 r10 = 0x00000001 fp = 0x00000000 sp = 0x4e104048 pc = 0x51157c4c Found by: call frame info 18 libxul.so!nsNSSComponent::InitializeNSS(bool) [nsNSSComponent.cpp : 1687 + 0x1a] r4 = 0x52c01f00 r5 = 0x54a8e620 r6 = 0x54886b68 r7 = 0x54871dae r8 = 0x4e392030 r9 = 0x00000001 r10 = 0x548712c0 fp = 0x54ac14ec sp = 0x4e1040b8 pc = 0x53c24efc Found by: call frame info 19 libxul.so!nsNSSComponent::Init() [nsNSSComponent.cpp : 1922 + 0xa] r4 = 0x52c01f00 r5 = 0x4e10415c r6 = 0x00000000 r7 = 0x548712c0 r8 = 0x4e392030 r9 = 0x00000000 r10 = 0x548712c0 fp = 0x54ac14ec sp = 0x4e104150 pc = 0x53c25ec0 Found by: call frame info
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Brian Smith (:bsmith) from comment #22) > Here is the crash: > > Crash reason: SIGSEGV > Crash address: 0x0 Hmm...now that I look at it, this stack trace is not helpful. Sorry. When I actually ran GDB against fennec, I saw that the crash was occuring in the line PORT_Free here: + if (mustFreeTempDir) + PORT_Free(tempDir); And tempDir was non-NULL. Perhaps we need to be calling free(tempDir) instead of PORT_free(tempdir) when tempDir was returned from tempnam?
Attached patch Patch v3 with tempnam (obsolete) — Splinter Review
Maybe that's safer. Worth an Mozilla Android try run?
Attachment #681518 - Attachment description: patch v2 - but using "tempnam" instead [checked in] → patch v2 - but using "tempnam" instead [backed out]
Hmmm,, why did we crash, though...? It's OK for us to through up our hands and say "I don't know the answer" in this case. We shouldn't crash, though. It's particularly OK for this to fail on android. It's pretty unlikely that we are using shared filesystems on android devices... bob
Right, the question is, given that our default is still the old database, why did we crash in code related to the new shared database? I also was under the impression that Android doesn't use shared-db yet, and this code should get executed when using shared db, only.
I started a Mozilla try build that combines NSS 3.14.1 beta1 with patch v3. https://tbpl.mozilla.org/?tree=Try&rev=46a58e466c1d
(In reply to Kai Engert (:kaie) from comment #27) > I started a Mozilla try build that combines NSS 3.14.1 beta1 with patch v3. > https://tbpl.mozilla.org/?tree=Try&rev=46a58e466c1d That crashed, too. I'm not yet convinced this patch is repsonsible for the crash. I've started another try build with this bug backed out: https://tbpl.mozilla.org/?tree=Try&rev=10fd1a86b735
Yes, using NSS 3.14.1 beta1 WITHOUT these patches make Android pass all tests. After spending time to root my tablet, I was able to look at the files in the profile directory. To my surprise: Firefox Android uses cert9 + key4 ! I want to find out why. I haven't yet found the Android-specific NSS init code yet that intructs NSS to use it the new sql db.
Android builds NSS with: NSS_DISABLE_DBM I used patch v3 plus tracing. What we get is: tempDir: 0x0 TEMP: 0x0 TMP: 0x0 tempnam: 0x6372b050 /data/data/org.mozilla.fennec/app_tmp/tmp.OzyqQ11552 end: 0x6372b075 /tmp.OzyqQ11552 len: 37 shortening, new tempDir: 0x6372b050 /data/data/org.mozilla.fennec/app_tmp measuring 0x6372b050 /data/data/org.mozilla.fennec/app_tmp decision enable: 0 tempDir: 0x0 TEMP: 0x0 TMP: 0x0 tempnam: 0x6372c058 /data/data/org.mozilla.fennec/app_tmp/tmp.RyaKE11552 end: 0x6372c07d /tmp.RyaKE11552 len: 37 shortening, new tempDir: 0x6372c058 /data/data/org.mozilla.fennec/app_tmp measuring 0x6372c058 /data/data/org.mozilla.fennec/app_tmp decision enable: 0 My tracing also showed that we call "free" (not PORT_Free). This is expected to be correct (according to the tempnam manpage). If I disable the call to free: no crash. I still fail to understand why. Maybe there is a bug in Android's implementation of the tempnam function? Maybe I'm not allowed to modify the buffer returned by tempnam? I could try doing a strcpy instead of manipulating the buffer returned by tempnam.
I've looked at the implementation of tempnam on Android, and I couldn't find any bugs yet. It seems correct to free the result using "free". Well, it shouldn't cause problems, but maybe the memory allocation strategy used by sdb_measureAccess isn't the best. Right now it will allocate+free temporary name strings up to 10000 times. And while it's expensive, it certainly shouldn't trigger any crashes. Lacking ideas, as a shoot into the wild, I'm doing another try build, where I've changed sdb_measureAccess to allocate a buffer only once, and use it for the entire loop. In addition, I've started yet another try build where I don't manipulate the result of tempnam - instead I'll copy the directory substring using PL_strndup, and will free the result from tempnam immediately (prior to running through sdb_measureAccess. (For now, documenting this for myself, so I'll know later what I did.)
tempnam is the culprit. This code crashes on Android: char *tn = tempnam(0,0); if (tn) free(tn); However, it crashes, only, when built as part of Mozilla's infrastructure to build Firefox for Android. If I build locally on my system, using NDK v8, as part of a NSS command line tool, no crash. (Both binaries executed on the same tablet.)
Attached patch Patch v4 (obsolete) — Splinter Review
Ok, let's not use any of the temp-filename functions. Let's simply try to use all the potential sources for temporary directory names, and give up if we cannot find anything. tempnam uses environment variable TMPDIR (which is the standard from the single unix standard), and falls back to P_tmpdir, which appears to be a cross platform standard. Let's try that. Before that, we'll try environment variables TEMP and TMP, which probably work on Windows. According to https://en.wikipedia.org/wiki/TMPDIR the variable TEMPDIR is another potential source.
Attachment #679883 - Attachment is obsolete: true
Attachment #681517 - Attachment is obsolete: true
Attachment #681518 - Attachment is obsolete: true
Attachment #687939 - Attachment is obsolete: true
Attachment #688468 - Flags: review?(rrelyea)
Target Milestone: 3.14.1 → 3.14.2
Comment on attachment 688468 [details] [diff] [review] Patch v4 >+ tempDir = getenv("TEMP"); >+ if (!tempDir) >+ tempDir = getenv("TMP"); >+ if (!tempDir) >+ tempDir = getenv("TMPDIR"); >+ if (!tempDir) >+ tempDir = getenv("TEMPDIR"); On Ubuntu Linux, none of these environment variables is set. Also, I've never seen the TEMPDIR environment variable before. Which OS uses it? >+ if (!tempDir) >+ tempDir = P_tmpdir; /* from stdio.h */ I can't find documentation that P_tmpdir is defined in <stdio.h>. So P_tmpdir may not be universally available. I think we need to take very different approaches. Some possible alternative solutions are: 1. If the NSS database directory is on a network file system, set enableCache to PR_TRUE. In this approach, we need to write code to test the file system type, which is another can of worms. I prefer this approach. 2. We can set the global variable sqlite3_temp_directory to tell sqlite which directory it should use as the temp directory: http://mxr.mozilla.org/mozilla-central/ident?i=sqlite3_temp_directory But it is risky to change a global variable because there may be multiple sqlite users in a process. 3. We can replicate how sqlite chooses the directory for its temporary files. This is implementation details of sqlite, but I think this kind of logic is unlikely to change frequently. 4. We can pay the sqlite maintainers to implement this performance test inside sqlite so that it will enable caching automatically.
(In reply to Wan-Teh Chang from comment #34) > > >+ if (!tempDir) > >+ tempDir = P_tmpdir; /* from stdio.h */ > > I can't find documentation that P_tmpdir is defined in <stdio.h>. > So P_tmpdir may not be universally available. It is documented in the manual page for tempnam. Also, I found it documented at http://msdn.microsoft.com/en-us/library/aa273387%28v=vs.60%29.aspx Given that tempnam is an early standard, I was hoping that we can expect P_tmpdir to be available everywhere where we need it, at least on all UNIX-like systems. > Also, I've never seen the TEMPDIR environment variable before. > Which OS uses it? I don't know. I took it from the link in comment 33.
(In reply to Wan-Teh Chang from comment #34) > > I think we need to take very different approaches. Some possible > alternative solutions are: > > 1. If the NSS database directory is on a network file system, > set enableCache to PR_TRUE. In this approach, we need to write > code to test the file system type, which is another can of worms. > I prefer this approach. > > 2. We can set the global variable sqlite3_temp_directory to tell > sqlite which directory it should use as the temp directory: > http://mxr.mozilla.org/mozilla-central/ident?i=sqlite3_temp_directory > But it is risky to change a global variable because there may be > multiple sqlite users in a process. > > 3. We can replicate how sqlite chooses the directory for its > temporary files. This is implementation details of sqlite, but > I think this kind of logic is unlikely to change frequently. > > 4. We can pay the sqlite maintainers to implement this performance > test inside sqlite so that it will enable caching automatically. I don't disagree, but I would hope that patch v4 can cover many more scenarios than we have covered today, and your alternative proposals seem like muchs more work, therefore I propose to get a simple patch like v4 landed first. It it turns out that we run into problems because P_tempdir isn't defined on some platforms, we can use an #ifdef to disable it on those platforms.
>1. If the NSS database directory is on a network file system, > set enableCache to PR_TRUE. As you pointed out, determining this isn't necessarily easier then determining tempdir. > 4. We can pay the sqlite maintainers to implement this performance > test inside sqlite so that it will enable caching automatically. I'm not sure that they may have all the appropriate information to do caching. We really only care about database consistency, and it's OK if some app creates a new file that we don't see for 1-3 seconds. For sqlite in general, that may not be acceptable. I believe the general attitude of the sqlite maintainers is 'shared filesystems with databases are evil, don't do it'. One other simple solution would be to use tempnam, but punt on android. bob
Attached patch Patch v5 (obsolete) — Splinter Review
(In reply to Robert Relyea from comment #37) > > One other simple solution would be to use tempnam, but punt on android. I like your proposal. Bob, what do you think about this combined strategy? It uses tempnam in most scenarios, except on Android, where we will use the directory that is used internally by tempnam. BTW, my argument for checking all potential environment variables for a temporary directory first: On systems where a network file system is being used, the administrator might have set one of the temp environment variables to a known fast filesystem.
Attachment #688468 - Attachment is obsolete: true
Attachment #688468 - Flags: review?(rrelyea)
Attachment #688981 - Flags: review?(rrelyea)
I submitted a try build for all Android platforms with 3.14.1beta1 plus patch v5. https://tbpl.mozilla.org/?tree=Try&rev=4d6b17c795b6 (we already know the tempnam approach works on the other platforms)
Comment on attachment 688981 [details] [diff] [review] Patch v5 Review of attachment 688981 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/nss/lib/softoken/sdb.c @@ +1832,5 @@ > + tempDir = sdb_getTempDir(sqlDB); /* PORT_Free */ > + > + if (!tempDir) { > + mustPortFreeTempDir = PR_FALSE; /* getenv will return references */ > + tempDir = getenv("TEMP"); getenv is not thread-safe. You must use PR_GetEnv. @@ +1840,5 @@ > + tempDir = getenv("TMPDIR"); > + if (!tempDir) > + tempDir = getenv("TEMPDIR"); > + if (!tempDir) { > +#ifdef ANDROID We should not be doing *any* of this on Android because Android will never have a network filesystem, and because this test impacts the startup performance of the app. Please move the #if ANDROID test higher. @@ +1845,5 @@ > + /* Avoid tempnam on Android because of crashes in Mozilla code, > + * see bug 578561 and bug 817233. */ > + tempDir = P_tmpdir; /* from stdio.h */ > +#else > + tempDir = tempnam(NULL, NULL); /* destroy result using free */ There are several negative security considerations with tempnam, because applications that use tempnam are subject to race conditions; another application could create a file with the same name between the time tempnam returns and the time the file is created. For this reason, in particular, we should avoid using tempnam altogether on all platforms.
(In reply to Wan-Teh Chang from comment #34) > > 3. We can replicate how sqlite chooses the directory for its > temporary files. This is implementation details of sqlite, but > I think this kind of logic is unlikely to change frequently. I will implement this approach.
(In reply to Wan-Teh Chang from comment #34) > I think we need to take very different approaches. Some possible > alternative solutions are: > > 1. If the NSS database directory is on a network file system, > set enableCache to PR_TRUE. In this approach, we need to write > code to test the file system type, which is another can of worms. > I prefer this approach. > > 2. We can set the global variable sqlite3_temp_directory to tell > sqlite which directory it should use as the temp directory: > http://mxr.mozilla.org/mozilla-central/ident?i=sqlite3_temp_directory > But it is risky to change a global variable because there may be > multiple sqlite users in a process. > > 3. We can replicate how sqlite chooses the directory for its > temporary files. This is implementation details of sqlite, but > I think this kind of logic is unlikely to change frequently. > > 4. We can pay the sqlite maintainers to implement this performance > test inside sqlite so that it will enable caching automatically. We can also defer the decision to the application, and have the application set sqlite3_temp_directory. I believe that Mozilla has some support contract for sqlite and we may be able to sponsor an improvement to sqlite here. Richard, do you have any suggestions, especially related to the alternatives that Wan-Teh posted? AFAICT, we're basically trying to cache some data locally if the sqlite database is on a network filesystem. Perhaps others can explain better what we'd like.
Target Milestone: 3.14.2 → 3.14.1
I studied three versions of sqlite: the version we used before (sqlite3.c, rev. 1.4), the version we are using, and the latest release (SQLite 3.7.14.1). The way sqlite3 determines its temp directory has not changed since the version we are using. There was a small change between the version we used before and the version we are using. So this logic is very stable. This patch has the following changes. 1. Remove the old sdb_getTempDir() code because it always fails with current versions of sqlite. 2. Reimplement sdb_getTempDir() to replicate how sqlite determines its temp directory. Note that this is really the same approach as Kai's patches. The improvement is that it does an "educated guess", based on the code in three versions of sqlite, rather than using tempnam() or the environment variables TMP/TEMP/TMPDIR/TEMPDIR. The use of tempnam() was especially bad because sqlite doesn't use that function at all. Kai: I have thoroughly tested this patch on Linux and Windows, including verifying that the old sdb_getTempDir() code doesn't work any more. I am offering this patch to you. Bob: this patch can be easily improved to skip the sdb_measureAccess() code for mobile devices that are unlikely to store NSS databases in a network filesystem.
Attachment #689052 - Flags: superreview?(rrelyea)
Attachment #689052 - Flags: review?(kaie)
(In reply to Kai Engert (:kaie) from comment #32) > tempnam is the culprit. This code crashes on Android: > > char *tn = tempnam(0,0); > if (tn) free(tn); > > However, it crashes, only, when built as part of Mozilla's infrastructure to > build Firefox for Android. > > If I build locally on my system, using NDK v8, as part of a NSS command line > tool, no crash. > > (Both binaries executed on the same tablet.) Firefox uses jemalloc. (You can see jemalloc.c in the crash stack in comment 22.) Perhaps jemalloc does more error checking than the default malloc. Perhaps this crash shows jemalloc fails to replace the default malloc perfectly on Android, so the return value of tempnam() is still allocated using the default malloc.
(In reply to Brian Smith (:bsmith) from comment #42) > I believe that Mozilla has some support contract for sqlite and we may be > able to sponsor an improvement to sqlite here. > > Richard, do you have any suggestions, especially related to the alternatives > that Wan-Teh posted? AFAICT, we're basically trying to cache some data > locally if the sqlite database is on a network filesystem. Perhaps others > can explain better what we'd like. Yes - Mozilla has the highest level of support available for SQLite. All you have to do is ask, and we will do things for you. But, after reading through all the comments on this ticket, I still don't quite understand what it is you are trying to do and what problem you are trying to address nor how we SQLite developers can help you. Can somebody explain? Perhaps by private email?
(In reply to D. Richard Hipp from comment #45) > But, after reading through all the comments on this ticket, I still don't > quite understand what it is you are trying to do and what problem you are > trying to address nor how we SQLite developers can help you. Can somebody > explain? Perhaps by private email? In short, we must find out whether the directory that contains an sqlite file is on a slow network filesystem, by comparing performance of the directory with that of a temporary directory on a local disk. In order to perform the comparison, we need to find a fast temporary directory. In more detail: We need a cross platform function to find the path of a temporary directory on a local disk, and we hoped that sqlite could provide that to us, by telling us the directory where sqlite stores temporary tables. The reason is that the storage of our sqlite file might be on a network disk - but we don't know whether the file is local or on the network. Using sqlite on a network file system is very slow, because it makes frequent accesses to disk. In particular, sqlite checks very often whether or not a lock file (or a journal file) exists, and that's very slow if we are on a network file system. In order to work around this performance bottleneck, we want to test the speed of a local (temporary) directory, by repeatedly accessing non-existing filenames. If this test is faster than accessing files in the directory containing our persistent sqlite file, then we will use a performance optimization. In order for our test to succeed, and in order to make the best decision, we must reliably find a path to a temporary directory on a local disk. (Note that we had tried the strategy to ask sqlite to create a temporary database, and we tried to learn where sqlite stores it - but that strategy is unreliable, that's what the initial comment in this bug refers to. The approach to ask sqlite was to use sqlite3_exec("Pragma database_list"), but the information regarding the temp directory can be missing. Because it is sometimes (on some platforms) missing, this strategy is insufficient, because we need a strategy that works on all platforms (works at least on Linux, OSX, Windows, Android)) (FYI, not really relevant for the question, but here is a description of our optimization. Once we decide that we are dealing with an sqlite file stored on a slow network filesystem, we will create a temporary sqlite table with pragma=memory, and we will copy information from the persistent database to an sqlite table in memory ("PRAGMA temp_store=MEMORY"), and when only reading information, we will read from that temporary area. We will refresh our temporary storage after write operations.)
(In reply to Brian Smith (:bsmith) from comment #40) > > getenv is not thread-safe. You must use PR_GetEnv. I didn't know we had PR_GetEnv. Apparently we have many places in Mozilla code that uses getenv! We should have a separate discussion when PR_GetEnv must be used instead (let's do that in another bug or by email). > > + tempDir = getenv("TMPDIR"); > > + if (!tempDir) > > + tempDir = getenv("TEMPDIR"); > > + if (!tempDir) { > > +#ifdef ANDROID > > We should not be doing *any* of this on Android because Android will never > have a network filesystem, and because this test impacts the startup > performance of the app. Please move the #if ANDROID test higher. I'm not sure about "never" in the sense of "computers will never use RAM beyond 640 KB". Future mobile devices could store application data on a filesystem in the cloud. If some consumers of NSS request the ability to disable our filesystem performance test, I'd prefer a #define symbol for that, which the application's build environment can set, instead of hardcoding such special behavior to a platform in general. > There are several negative security considerations with tempnam, because > applications that use tempnam are subject to race conditions; another > application could create a file with the same name between the time tempnam > returns and the time the file is created. > For this reason, in particular, we should avoid using tempnam altogether on > all platforms. Brian, you missed that we'll never ever open files with the filename returned by tempnam. Our intention is to extract the name of the directory used by tempnam, and we will measure access speed to nonexistent filenames in that directory. The warnings about race conditions that you mentioned do not apply to our scenario.
(In reply to Kai Engert (:kaie) from comment #46) > We need a cross platform function to find the path of a temporary directory > on a local disk, Thanks for the very clear explanation, Kai! Suppose we provide you with a new sqlite3_file_control() verb that asks SQLite to create a temporary filename for you, using the same mechanism it would normally use to create a temporary filename. Like this: char *zTempName = 0; sqlite3_file_control(db, 0, SQLITE_FCNTL_GET_TEMPNAME, (void*)&zTempName); if( zTempName==0 ){ /* Malloc failure, or SQLITE_FCNTL_GET_TEMPNAME not implemented because you ** are using an older SQLite. */ }else{ /* Use zTempName to find the temporary directory */ sqlite3_free(zTempName); /* Free malloc-ed memory when done */ } Would such an interface solve your problem. If you answer quickly, we still might have time to land this in the 3.7.15 release of SQLite - but please do let us know quickly as 3.7.15 is due out in about a week!
(In reply to D. Richard Hipp from comment #48) > > Suppose we provide you with a new sqlite3_file_control() verb that asks > SQLite to create a temporary filename for you, using the same mechanism it > would normally use to create a temporary filename. Like this: Thanks a lot for your offer! We'll think about it. We'll discuss it later today during our weekly conference call.
Pro Wan-Teh's patch: - same logic as sqlite today Contra Wan-Teh's patch: - if sqlite's implementation ever changes, we might test a different directory than the one that would actually be used. Pro Richard's offering: - we'll always test using the directory that sqlite uses for temp tables, even if sqlite changes in the future. Contra Richard's offering: - requires us to upgrade sqlite - we still need a fallback strategy if the API returns "null" Pro Kai's patch: - simpler code, no mixing of sqlite API and fallback strategy Contra Kai's patch: - might test different directory then sqlite would use Let's make a decision in the NSS conf call today which approach we prefer.
Dear Richard, the answer is yes, thank you very much for offering, we'd glady appreciate the new feature you have proposed and would wait for it to become available in the next official sqlite release. If I understand correctly, based on your description, this won't be a new linker dependency - it will use an existing API with a new runtime flag - which also sounds great. We're looking forward to it! Thanks!
On the NSS call we made 2 decisions about thie TempDir issue: 1) That the sqlite team's proposal was ideal, and we should move to it. (see Kai's previous comment). 2) On Android we can just turn the caching off always, but we should do so in a way that was generic. This patch provides a generic way of turning off the perf tests to determine whether or not we should cache. To cache by default, add -DNSS_SDB_USE_CACHE=1 and to not cache by default add -DNSS_SDB_USE_CACHE=0. This latter define should be added to the android flags in coreconf. I did not add them since the android patch has not yet been checked in. NOTE: With this code, the environement variable may still override this decision. bob
(In reply to Kai Engert (:kaie) from comment #51) > Your requested changes are in the SQLite source tree here: http://www.sqlite.org/src/info/1a63b1d5fa These changes will be in the 3.7.15 release of SQLite which was originally scheduled for Wednesday, 2012-12-12, but which has been delayed by unrelated issues. You can find an example of how to use the new interface here: http://www.sqlite.org/src/artifact/f62769c989146?ln=5329-5354 There are no new C-level APIs. Instead, the change is the addition of a new integer "opcode" constant to the preexisting sqlite3_file_control() interface. The new integer constant is given a #define SQLITE_FCNTL_TEMPFILENAME with a value of 16. If you link against a version of SQLite that does not support this feature, and invoke the file-control, you should get back SQLITE_NOTFOUND. If it all works, you should get back SQLITE_OK. We have more testing to do, and documentation to write, before the official release. In the meantime, if you need a bootleg "sqlite3.c" and "sqlite3.h" file that includes this feature, for testing, please send me or any other member of the SQLite development an email and we will get you copies straight away. Also, please let us know if you encounter any problems or need tweaks to the interface.
Richard: thanks a lot for the fast response. I think your change in http://www.sqlite.org/src/info/1a63b1d5fa will meet our requirement. However, it may be more convenient for us if the new file control "opcode" returns the temp file directory instead (i.e., what unixTempFileDir() returns), for two reasons: 1. It would better match our current code, which uses a sdb_getTempDir() function to get (or guess) the sqlite temp directory name. This is just a minor problem though. 2. Our code generates 10000 file names in the sqlite temp directory and tests their existence. If the file control "opcode" returns a temp file name, it causes the temp file name to be malloc'ed and free'ed 10000 times during our test. Returning the temp file directory name would give us more flexibility on how the memory for the temp file names is allocated. We can certainly infer the temp directory name from a temp file name returned by the file control opcode. So this isn't a big problem for us. I just wanted you to consider whether returning a temp file name or the temp file directory is more useful to sqlite users in general. Thanks.
(In reply to Kai Engert (:kaie) from comment #50) We still need to fall back on our own sdb_getTempDir() function when running with a system sqlite library that doesn't support the new SQLITE_FCNTL_TEMPFILENAME opcode. I think my patch is a better candidate as the fallback because it replicates the logic used in current versions of sqlite.
(In reply to Wan-Teh Chang from comment #54) > 2. Our code generates 10000 file names in the sqlite temp directory > and tests their existence. If the file control "opcode" returns a > temp file name, it causes the temp file name to be malloc'ed and free'ed > 10000 times during our test. Returning the temp file directory name > would give us more flexibility on how the memory for the temp file names > is allocated. > So, you want to change SQLITE_FCNTL_TEMPFILENAME into SQLITE_FCNTL_TEMPDIR and you want it return a static string that does not need to be passed into sqlite3_free().... That can be easily done on unix. But it won't work on Windows. On windows we use a call to GetTempPathW() to have the operating system write the temp file folder name into a preallocated buffer. And we would need to malloc() for space for that buffer.
Wan-Teh, I don't understand why we need to ask for a change. I agree, getting just the directory name would be a closer match to our need, but a file name is good enough, I think? If it's a file, we can easily use the known platform directory separator to find its last occurence, and change the contents of the following byte to zero. I also don't understand the issue about allocation. If we need 10000 names, we can copy what we got from sqlite, and manipulate our own buffer 10000 times. I'm very pleased that Richard helped us that quickly. I think receiving a dynamic buffer is safest, it avoids race conditions about static buffers, and it avoids the need to communicate buffer sizes.
Richard: I'm requesting a file control opcode that returns the temp directory name rather than a temp file name. The return value can be a dynamically allocated string that must be freed with sqlite3_free(). I did not request the return value to be a static string. BUT, I am not sure if this alternative opcode is more useful to SQLite users in general. So you need to decide which opcode to provide. If you provide the SQLITE_FCNTL_TEMPFILENAME opcode, we will very likely chop off the last component of its return value to get the temp directory name. NOTE: an opcode for the temp directory name would be similar to the existing PRAGMA temp_store_directory, with a subtle difference. (The temp_store_directory pragma deals with sqlite3_temp_directory, whereas the actual temp directory name may be something else if sqlite3_temp_directory is NULL.) It's important to "harmonize" the opcode and the pragma, or document their difference clearly.
Brian (bsmith): I checked out Android's "Bionic" libc source tree from its git repository. I can confirm Kai's findings that its tempnam() function returns a string allocated with malloc(). Caveat: I only looked at the tip of the Bionic source tree. So, the crash reported in this bug is strong evidence that Firefox's jemalloc fails to replace the default malloc() perfectly on Android. This is worth looking into.
Richard: As wtc says, we just need the tempfile directory, but we can get it from the tempfilename, so I believe your proposal is fine. Having it return the filename is probably best since there may be a need in the future to get the full filename. Also, as wtc points out, allocating the string to be freed with sqlite3_free() is fine. WTC and Kai: We do need a fallback for the case where the function returns NULL. At this point I would lean toward's wtc's proposal of just replicating the sqlite internal function. My main objection to doing so before was future proof (what if sqlite decided to change it). Since we couldn't guarrentee 100% compatibility, then getting 80% with tempnam seemed sufficient. With the new sqlite function, we know we can always get the right directory because either 1) sqlite told us, or 2) sqlite didn't tell us, so we know it was an older version of sqlite and we know exactly how it calculated the name. My only worry with wtc's method is to make sure we get the platform dependencies right. In the future we will presumably start depending on some future sqlite feature that doesn't exist now. At that point we can remove the fallback code altogether. bob
Comment on attachment 689052 [details] [diff] [review] Replicate how sqlite determines its temp directory (checked in) r+ but only as a fallback to the sqlite function (as we've discussed in the bug).
Attachment #689052 - Flags: superreview?(rrelyea) → superreview+
(In reply to Robert Relyea from comment #60) > My only worry with wtc's method is to make sure we get the platform > dependencies right. I neglected to explain this. SQLite proper has only supported three platforms: Unix, Windows, and OS/2. In the latest SQLite 3.7.14.1 release, OS/2 code is gone. This is why my patch only has Windows and Unix implementations. (It seems that open source projects have collectively started to stop taking OS/2 patches.)
Comment on attachment 689052 [details] [diff] [review] Replicate how sqlite determines its temp directory (checked in) Patch checked in on the NSS trunk (NSS 3.14.1). Checking in sdb.c; /cvsroot/mozilla/security/nss/lib/softoken/sdb.c,v <-- sdb.c new revision: 1.28; previous revision: 1.27 done
Attachment #689052 - Attachment description: Replicate how sqlite determines its temp directory → Replicate how sqlite determines its temp directory (checked in)
I forgot to include this sqlite.def change in my patch (attachment 689052 [details] [diff] [review]). Checked in on the NSS trunk (NSS 3.14.1). Checking in sqlite.def; /cvsroot/mozilla/security/nss/lib/sqlite/sqlite.def,v <-- sqlite.def new revision: 1.6; previous revision: 1.5 done
Attachment #689559 - Flags: review?(rrelyea)
Moved the target milestone to NSS 3.14.2. I excluded my checkins in comment 63 and comment 64 from NSS 3.14.1.
Status: REOPENED → ASSIGNED
Target Milestone: 3.14.1 → 3.14.2
Depends on: 818771
Comment on attachment 689559 [details] [diff] [review] Export the sqlite3_temp_directory data symbol (checked in) I want to clarify for Bob what this does, Wan-Teh please correct me if I'm wrong. sqlite3_temp_directory is an OPTIONAL directory that an application might set, as a preferred place for temporary tables. I understand that by default, that variable is EMPTY. Wan-Teh used code that reads this variable - in order to fully simulate sqlite today's behaviour (which we will use as the fallback behaviour, if the new API is unavailable).
Depends on: 821895
No longer depends on: 818771
Comment on attachment 689559 [details] [diff] [review] Export the sqlite3_temp_directory data symbol (checked in) Kai, your explanation of this patch is correct. In addition, sqlite3_temp_directory suffers from the same problem as the ASN.1 templates in NSS. It is a data symbol, so it cannot be safely exported from a DLL on Windows. So we don't export sqlite3_temp_directory from our sqlite3.dll, and my function doesn't use (because we can't use) sqlite3_temp_directory on Windows. Note: sqlite provides a sqlite3_win32_set_directory function for setting sqlite3_temp_directory on Windows, but doesn't provide a way to get the value of sqlite3_temp_directory. Kai: note that the temp directory returned by sqlite on Windows is in UTF-8. You need to convert it to the current code page before passing it to PR_Access. You can use the sqlite3_win32_utf8_to_mbcs code to do that.
Attached patch Patch v6 (obsolete) — Splinter Review
Patch v6, makes use of the new API from sqlite 3.7.15, and calls the code that Wan-Teh had added as a fallback action. This builds, but not yet tested. If you're interested to review, please go ahead - if not, feel free to wait until I have tested it.
Attachment #688981 - Attachment is obsolete: true
Attachment #688981 - Flags: review?(rrelyea)
Comment on attachment 692479 [details] [diff] [review] Patch v6 Review of attachment 692479 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozilla/security/nss/lib/softoken/sdb.c @@ +267,5 @@ > + if (sqlrv != SQLITE_OK || !tempName) { > + /* Malloc failure, or SQLITE_FCNTL_TEMPFILENAME not implemented > + * because we are using an older SQLite. */ > + return sdb_getFallbackTempDir(); > + } This should be done as follows: if (sqlrv == SQLITE_NOTFOUND) { /* SQLITE_FCNTL_TEMPFILENAME not implemented because we are using * an older SQLite. */ return sdb_getFallbackTempDir(); } if (sqlrv != SQLITE_OK) { return NULL; } (If malloc failed, it's not worth trying our sdb_getFallbackTempDir().) Then, at this point we should trust sqlite and assume tempName is not NULL. In addition, on Windows we need to convert the UTF-8 pathname returned by sqlite to the current code page, as I said above.
(In reply to Wan-Teh Chang from comment #67) > ... my function doesn't use (because we can't use) sqlite3_temp_directory > on Windows. > > ... note that the temp directory returned by sqlite on > Windows is in UTF-8. You need to convert it to the current > code page before passing it to PR_Access. You can use > the sqlite3_win32_utf8_to_mbcs code to do that. sqlite3_win32_utf8_to_mbcs isn't exported by sqlite, not declared in the header! We cannot read variable sqlite3_temp_directory on Windows, that means we don't use it on Windows, and that means, we don't need to worry about its encoding. Regarding the API to get the temp filename, which we use to extract the directory: The windows implementation DOES already call sqlite3_win32_utf8_to_mbcs. Only the part that uses the POTENTIALLY set variable sqlite3_temp_directory DOESN'T call sqlite3_win32_utf8_to_mbcs. There are two ways to interprete that: - either sqlite is inconsistent - or sqlite3_temp_directory isn't allowed to contain utf8 I'd conclude, let's not worry about sqlite3_temp_directory and sqlite3_win32_utf8_to_mbcs on Windows. If this ever turns out to be a problem, for Windows applications, then the sqlite internals should be fixed to be consistent (either by explicitly promising that sqlite3_temp_directory will never be UTF8, or by exporting sqlite3_win32_utf8_to_mbcs for consumption).
Depends on: 818275
Attached patch Patch v7Splinter Review
(In reply to Wan-Teh Chang from comment #69) > > + if (sqlrv != SQLITE_OK || !tempName) { > > + /* Malloc failure, or SQLITE_FCNTL_TEMPFILENAME not implemented > > + * because we are using an older SQLite. */ > > + return sdb_getFallbackTempDir(); > > + } > > This should be done as follows: Changed as requested.
Attachment #692479 - Attachment is obsolete: true
Attachment #692586 - Flags: review?(wtc)
Attachment #689052 - Flags: review?(kaie)
Comment on attachment 689559 [details] [diff] [review] Export the sqlite3_temp_directory data symbol (checked in) Wan-Teh, thanks for confirming. Given this is non-Windows only, and matches the style of exporting DATA in other NSS *.def files: r=kaie
Attachment #689559 - Flags: review+
Comment on attachment 689559 [details] [diff] [review] Export the sqlite3_temp_directory data symbol (checked in) kai's review is sufficient.
Attachment #689559 - Flags: review?(rrelyea)
Comment on attachment 692586 [details] [diff] [review] Patch v7 I believe I have sufficiently addressed all change requests, and explained why my changes are sufficient. I don't understand why I didn't get a final review from Wan-Teh. In order to push things forward, I'm requesting review from Bob instead.
Attachment #692586 - Flags: review?(wtc) → review?(rrelyea)
This patch together with the patch from bug 818275 has passed the NSS test suite on all platforms in a "try build".
Comment on attachment 692586 [details] [diff] [review] Patch v7 r+ rrelyea
Attachment #692586 - Flags: review?(rrelyea) → review+
Thanks again for your reviews and for the sqlite enhancements to get this bug fixed. Checking in sdb.c; /cvsroot/mozilla/security/nss/lib/softoken/sdb.c,v <-- sdb.c new revision: 1.30; previous revision: 1.29 done Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: