Closed Bug 368152 Opened 14 years ago Closed 14 years ago

spidermonkey regexp "debugging" mode

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED

People

(Reporter: crowderbt, Assigned: crowderbt)

Details

Attachments

(3 files, 4 obsolete files)

Attached patch js_DisassembleRegex and friends (obsolete) — Splinter Review
dis() in the JS shell should be able to handle regular expressions.  This is a first stab at a patch that displays at list of op-codes.  It is probably at least partly wrong.
Attachment #252727 - Flags: review?(mrbkap)
Comment on attachment 252727 [details] [diff] [review]
js_DisassembleRegex and friends

>? jsreops.tbl

This is missing from the diff.

>Index: js.c
>+        if (JSVAL_IS_REGEXP(cx, argv[i])) {
>+            JSRegExp *re = JS_GetPrivate(cx, JSVAL_TO_OBJECT(argv[i]));
>+            js_DisassembleRegex(cx, re);

Probably worth checking for impossible errors from js_DisassembleRegex.

>+            return JS_TRUE;

continue?

>Index: jsregexp.c
>+#ifdef DEBUG
>+const char *reop_names[] = {
>+#define REOP_DEF(opcode, name) name,
>+#include "jsreops.tbl"
>+#undef REOP_DEF
>+    0

Nit: How about NULL, since these are pointers.

>+    do {
>+        op = (REOp)*pc++;
>+        const char *opname = reop_names[op];

This won't compile as C.

>+        printf("%06d:  %s", pc - re->program, opname);
>+        switch (op) {
>+        case REOP_ALTPREREQ:

Nit: In SpiderMonkey, switches use two half-step indentations:
switch (op) {
  case REOP_ALTPREREQ:
    pc += .....;
    break;
  case ...:

>+#endif

Comment this as being for the #ifdef DEBUG?
Attachment #252727 - Flags: review?(mrbkap) → review-
Attached patch v2 (obsolete) — Splinter Review
I added output for FLAT1 and FLAT, too, and made js_DisassembleRegex() take a FILE* parameter ala js_Disassemble()
Attachment #252727 - Attachment is obsolete: true
Attachment #252799 - Flags: review?(mrbkap)
Comment on attachment 252799 [details] [diff] [review]
v2

Actually, this is a silly idea.  A better idea is to show the path of the regex executing (including stack-depth), using the same ops-table change I've made here.  I will repurpose the bug shortly and add a REGEX_DEBUG preprocessor guard, or something similar, to wrap it all.
Attachment #252799 - Attachment is obsolete: true
Attachment #252799 - Flags: review?(mrbkap)
Attached patch bugzilla as backup (obsolete) — Splinter Review
Ignore the man behind the curtain, this is just me not wanting to lose this work while I hack on other things.
Attached patch better rev (obsolete) — Splinter Review
This is a poor mimmic of perl's "use re 'debug'" concept.  It's already proven useful for me, though.  It makes one actual logic-change, which is in PushBackTrackState, setting result->parenIndex whenever a backtrack state is pushed, not ONLY in the case of the parenCount being non-0.  This should be a relatively low-cost change for the sake of debugging clarity.  The addition of the  PREPARE_REPEAT macro just helps to clean-up some code for future debugging additions.
Attachment #255265 - Attachment is obsolete: true
Attachment #256502 - Flags: review?(mrbkap)
Summary: Disassemble for JS regular expressions in js shell → spidermonkey JavaScript "debugging" mode
Comment on attachment 256502 [details] [diff] [review]
better rev

>Index: jsregexp.c
>+#ifdef GNUC
>+int re_debug(const char *fmt, ...) __attribute__ ((format(printf, 1, 2)));
>+#endif

This is currently unused because GCC defines __GNUC__. If the condition was fixed, you'd get multiple-definition errors anyway.

>+    printf(" \"");
>+    while (*chrs && i++ < length) {
>+        printf("%c", (char)*chrs++);
>+    }

Can you use printf(" \"%*s\"", length, chrs) here?

>+#ifdef REGEXP_DEBUG
>+    const char *opname = reop_names[op];
>+    char stackpad[40];
>+    int i, j;
>+    for (i = 0, j = 0; i < gData->stateStackTop; i++) {
>+        stackpad[j++] = ' ';
>+        stackpad[j++] = ' ';
>+    }
>+    stackpad[j] = '\0';
>+    re_debug("\n%06d: %s%s", pc - gData->regexp->program,
>+            stackpad, opname);

Nit: the 2nd line of the re_debug call is underindented by a space.

>+#endif

It might be worth splitting the above code (which is repeated below) into a new function. Also, are we guaranteed that gData->stateStackTop is less than 20? Or is there some constant we could use for the stackpad or something?

>+#define PREPARE_REPEAT()                                                      \
>+    curState->index = x->cp - gData->cpbegin;                                 \
>+    curState->continue_op = REOP_MINIMALREPEAT;                               \
>+    curState->continue_pc = pc;                                               \
>+    pc += ARG_LEN;                                                            \
>+    for (k = curState->parenSoFar; k < parenSoFar; k++)                       \
>+        x->parens[k].index = -1;                                              \
>+    PUSH_STATE_STACK(gData);                                                  \
>+    op = (REOp) *pc++;                                                        \
>+    JS_ASSERT(op < REOP_LIMIT);
>+    

Wrap the body of the macro in JS_BEGIN_MACRO/JS_END_MACRO to avoid odd expansion and empty statements?

Also, uber-nit: the line after the assertion has some trailing whitespace ;-).
printf("%c", ...) should not be used -- use putchar(...) instead if you must. But mrbkap has the right idea (he's missing a dot, though):

printf("%.*s", len, buf) will print len chars from buf.

BTW, chrs seems cybercruddy to me -- chars is more readable and consistent with other jschar *chars variables in SpiderMonkey.

/be
Attached patch v3Splinter Review
The indentation handling is dramatically better in this version, couldn't use the printf formatting suggestions for the char-drawing, but could use it for indentation.  Used putchar() instead of printf("%c") though (printf("%.*s") won't work for jschar* strings).
Attachment #256502 - Attachment is obsolete: true
Attachment #256502 - Flags: review?(mrbkap)
Attachment #257041 - Flags: review?(mrbkap)
Comment on attachment 257041 [details] [diff] [review]
v3

Cool.
Attachment #257041 - Flags: review?(mrbkap) → review+
Summary: spidermonkey JavaScript "debugging" mode → spidermonkey regexp "debugging" mode
jsregexp.c: 3.137
jsreops.tbl: 3.1
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Thanks to Brendan for pointing out the compiler warnings here.  Also noticed a bug with building with OPT and REGEXP_DEBUG, so that's fixed here, too.
Attachment #259842 - Flags: review?(mrbkap)
Attachment #259842 - Flags: review?(mrbkap) → review+
With the latest commit GCC started to produce:

/home/igor/m/trunk/mozilla/js/src/jsregexp.c: In function ‘PushBackTrackState’:
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:2078: warning: format ‘%ld’ expects type ‘long int’, but argument 2 has type ‘unsigned int’
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:2078: warning: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘size_t’
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:2078: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘size_t’
/home/igor/m/trunk/mozilla/js/src/jsregexp.c: In function ‘ExecuteREBytecode’:
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:2860: warning: format ‘%ld’ expects type ‘long int’, but argument 2 has type ‘size_t’
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:3225: warning: format ‘%ld’ expects type ‘long int’, but argument 2 has type ‘unsigned int’
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:3225: warning: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘size_t’
/home/igor/m/trunk/mozilla/js/src/jsregexp.c:3225: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘size_t’

The reason for that is that one must cast to (unsigned long) the argument. This is not only nuisance since C allows (and Windows 64 uses it unless I am wrong) sizeof(size_t) > sizeof(unsigned long). 
Attached patch with castsSplinter Review
Another rev on this.  I'm actually ripping out a slightly misleading chunk of this because it's misleading and the cast looks ugly.  Igor, let me know if this is better, please?
Attachment #259920 - Flags: review?(igor)
Attachment #259920 - Flags: review?(igor) → review+
jsregexp.c: 3.139
You need to log in before you can comment on or make changes to this bug.