Author Topic: recursive descent parser ?  (Read 9163 times)

SteveA

  • Guest
recursive descent parser ?
« on: March 01, 2012, 05:23:33 PM »
Hey guys,
Over at BP.org, Herman has been talking about and working on a math parser and that peaked my interest, because I've written a number of them myself.
Generally, I code them in C or Asm, but, I have done so using Bxbasic as well.
I dug out one of the latest versions I had written in Bxb and was testing it.
I have long known that there is a bug in it, but, it's a Bxb bug rather than the parser itself.
I have pretty much figured out where the bug originates from, but, haven't decided how to correct it.
Anyway, the bug only appears under a specific set of circumstances.

So, I wanted to try the parser code on another interpreter, just to see what the reaction would be.
I decided to try Yabasic first and I made the necessary modifications for Yabasic.
After making the hundred or so annoying tweaks, I fired it up.
The result is that the code terminates abnormally under Yabasic.

The parser is a LALR-RD parser, (Look Ahead, Left to Right, Recursive Descent).

Now, this is code that works just fine under Bxbasic and produces correct results.
I have included both versions of the basic script along with the executables.
If some one can explain it, please do.

Steve

SteveA

  • Guest
Re: recursive descent parser ?
« Reply #1 on: March 02, 2012, 05:38:25 PM »
Okay, it took a bit of detective work, but, I think I figured it out.
Attached is a working version of a Recursive Descent Parser for Yabasic.

Code: [Select]
' rdparser.yab 
' Copyright:(c) sarbayo, 2012

' special credits to: Jack Crenshaw's:
'   "How to Build a Compiiler"

' This is an: LALR-RD Parser (Look-Ahead, Left-to-Right, Recursive-Descent)
' Based on: Rdparser.c, in Bxbasic

 
' --------------------- SYSTEM VARIABLES ---------------------
    DIM pstring$(1)
    tholder$ = ""
    xstring$ = ""
    epos = 1
    xlen = 0
    dreturn = 0
    ireturn = 0

' ----- begin program -------------
'    CLEAR SCREEN

    main()
    Print
   
label TheEnd
'    Pause 10
    END
' ===========================================================

SUB main()
    Local value, ii, instring$

    value = 0
    ii = 0
    instring$ = ""

    Print
    Print "LALR Parser"
    Print
    Print "Enter Expression: Example: 1+2*3*4*10/5"
    Print "                  ";
    INPUT instring$

    IF (instring$ = "") THEN
        GOTO DoneMain
    ENDIF

    xlen = LEN(instring$)
    ii = xlen + 1
    REDIM pstring$(ii)

    FOR ii = 1 TO xlen
        pstring$(ii) = MID$(instring$, ii, 1)
    NEXT ii
    pstring$(ii) = CHR$(0)

    Expression()
    value = dreturn
    Print "          ", "result = ", value

label DoneMain
'
END SUB
' --------- end main ---------


SUB Expression()
    Local ch$, pii, isaddop, ich, eValue

    pii = epos
    ch$ = pstring$(pii)
    IsAddop(ch$)
    isaddop = ireturn

    IF (isaddop = 1) THEN
        eValue = 0
    ELSE
        Term()
        eValue = dreturn
        pii = epos
        ch$ = pstring$(pii)
    ENDIF

    IsAddop(ch$)
    isaddop = ireturn

    WHILE (isaddop = 1)
        ich = ASC(ch$)
        SWITCH ich
            CASE 43
                Match()
                Term()
                eValue = eValue + dreturn
                BREAK
            CASE 45
                Match()
                Term()
                eValue = eValue - dreturn
                BREAK
        END SWITCH

        pii = epos
        ch$ = pstring$(pii)
        IsAddop(ch$)
        isaddop = ireturn
    WEND

    dreturn = eValue
END SUB
' ---------- end Expression ----------


SUB Term()
    Local ch$, pii, tValue, ismultop, ich

    Factor()
    tValue = dreturn
    pii = epos
    ch$ = pstring$(pii)

    IsMultop(ch$)
    ismultop = ireturn
   
    WHILE (ismultop = 1)
        ich = ASC(ch$)

        SWITCH ich
            CASE 42
                Match()
                Factor()
                tValue = tValue * dreturn
                BREAK
            CASE 47
                Match()
                Factor()
                tValue = tValue / dreturn
                BREAK
            CASE 94
                Match()
                Factor()
                tValue = tValue ^ dreturn
                BREAK
            CASE 37
                Match()
                Factor()
                ireturn = INT(dreturn)
                tValue = INT(tValue)
                tValue = mod(tValue, ireturn)
                BREAK
        END SWITCH

        pii = epos
        ch$ = pstring$(pii)
        IsMultop(ch$)
        ismultop = ireturn
    WEND

    dreturn = tValue
END SUB
' ---------- end Term ----------


SUB Factor()
    Local ch$, pii, isalpha, fvalue

    pii = epos
    ch$ = pstring$(pii)

    IF (ch$ = "(") THEN
         Match()
         Expression()
        fvalue = dreturn
        Match()
    ELSE
        GetNum()
        fvalue = dreturn
    ENDIF

    dreturn = fvalue
END SUB
' ---------- end Factor ----------



' ====================== rdp utility routines ======================

SUB Match()
    Local pii

    pii = epos

    IF (pii < xlen) THEN
         GetChar()
         SkipWhite()
    ENDIF

END SUB
'---------- end Match ----------


SUB GetNum()
    Local ch$, pii, isdigit, abcode, ln, gvalue

    abcode = 12
    ln = 1
    pii = epos
    ch$ = pstring$(pii)

    IF (pii <= xlen) THEN
        IsDigit(ch$)
        isdigit = ireturn

        IF (isdigit = 0) THEN
            tholder$ =  "Numeric Value"
            abort(abcode, ln)
        ENDIF

        asc2dbl()
        gvalue = dreturn
        pii = epos
        ch$ = pstring$(pii)

        IsDigit(ch$)
        isdigit = ireturn

        IF (isdigit = 1) THEN
            While (isdigit = 1)
                pii = pii + 1
                ch$ = pstring$(pii)
                IsDigit(ch$)
                isdigit = ireturn
            WEND
            epos = pii
        ENDIF
        SkipWhite()
    ENDIF

    dreturn = gvalue
END SUB
' ---------- end GetNum ----------


SUB asc2dbl()
    Local ch$, cvalue$, pii, isdigit, fvalue

    cvalue$ = ""
    pii = epos
    ch$ = pstring$(pii)
    isdigit = 1

    While (isdigit = 1)
        cvalue$ = cvalue$ + ch$
        pii = pii + 1
        ch$ = pstring$(pii)
        IsDigit(ch$)
        isdigit = ireturn
    WEND

    fvalue = VAL(cvalue$)
    epos = pii

    dreturn = fvalue
END SUB
' ------- end asc2dbl ---------


SUB GetChar()
    epos = epos + 1
END SUB
' ---------- end GetChar ----------


SUB IsAddop(ch$)
    Local rval

    rval = 0

    IF (ch$ = "+" OR ch$ = "-") THEN
        rval = 1
    ENDIF

    ireturn = rval
END SUB
' ---------- end IsAddop ----------


SUB IsMultop(ch$)
    Local rval

    rval = 0

    IF (ch$ = "*" OR ch$ = "^" OR ch$ = "%" OR ch$ = "/") THEN
        rval = 1
    ENDIF

    ireturn = rval
END SUB
' ---------- end IsMultop ----------


SUB IsDigit(ch$)
    Local test, ival

    test = 0
    ival = 0

    ival = ASC(ch$)

    IF (ival >= 48 AND ival <= 57 OR ch$ = ".") THEN
        test = 1
    ENDIF

    ireturn = test
END SUB
' ---------- end IsDigit ----------


SUB IsAlpha(ch$)
    Local test, ival

    test = 0
    ival = 0

    ival = ASC(ch$)

    IF (ival >= 65 AND ival <= 90 OR ival >= 97 AND ival <= 122) THEN
        test = 1
    ENDIF

    ireturn = test
END SUB
' ----------end IsAlpha -----------


SUB IsWhite(ch$)
    Local test, tab$

    test = 0

    tab$ = CHR$(9)

    IF (ch$ = " " OR ch$ = tab$) THEN
        test = -1
    ENDIF

    ireturn = test
END SUB
' ---------- end IsWhite ----------


SUB SkipWhite()
    Local ch$, pii, iswhite

    ch$ = ""
    pii = 0
    iswhite = 0

    pii = epos
    ch$ = pstring$(pii)

    IsWhite(ch$)
    iswhite = ireturn

    WHILE (iswhite <> 0)
        GetChar()
        pii = epos
        ch$ = pstring$(pii)
         IsWhite(ch$)
        iswhite = ireturn
    WEND

END SUB
' ---------- end SkipWhite ----------


SUB abort(code, linendx)
    Local instring$, ii

    instring$ = ""
    ii = 0

    FOR ii = 1 TO xlen
        instring$ = instring$ + pstring$(ii)
    NEXT ii

    BEEP

    Switch code
        Case 12
            Print
            Print "Expected ", tholder$, " ",
            Print ": in line: ", linendx
            Print instring$
            Print "code(", code, ")"
            BREAK
        DEFAULT
            Print "Program aborted, undefined error."
            BREAK
    END SWITCH

    END
END SUB
' ----------end abort -----------


To try it, simply enter: yabasic rdparser.yab
you will be prompted to enter a numeric expression.
This is a stripped down version, no algebraic functions are available in this version.

Example: 1+2*(3*4)*(10/5)


Offline John

  • Forum Support / SB Dev
  • Posts: 3598
    • ScriptBasic Open Source Project
Re: recursive descent parser ?
« Reply #2 on: March 02, 2012, 10:12:00 PM »
Steve,

Here is a ScriptBasic version of your Yabasic parser.

John

Code: [Select]
' rdparser.yab  
' Copyright:(c) sarbayo, 2012

' special credits to: Jack Crenshaw's:
'   "How to Build a Compiiler"

' This is an: LALR-RD Parser (Look-Ahead, Left-to-Right, Recursive-Descent)
' Based on: Rdparser.c, in Bxbasic

  
' --------------------- SYSTEM VARIABLES ---------------------
'   DIM pstring$[1]
    tholder$ = ""
    xstring$ = ""
    epos = 1
    xlen = 0
    dreturn = 0
    ireturn = 0

' ----- begin program -------------
'    CLEAR SCREEN

    CALL main()
    Printnl
    
TheEnd:
'    Pause 10
    END
' ===========================================================

SUB main
    Local value, ii, instring$

    value = 0
    ii = 0
    instring$ = ""

    Printnl
    Print "LALR Parser\n"
    Printnl
    Print "Enter Expression: Example: 1+2*3*4*10/5\n"
    Print "                  "
    LINE INPUT instring$

    IF (instring$ = "") THEN
        GOTO DoneMain
    ENDIF

    xlen = LEN(instring$)
    ii = xlen + 1
'   REDIM pstring$[ii]
    UNDEF pstring$

    FOR ii = 1 TO xlen
        pstring$[ii] = MID$(instring$, ii, 1)
    NEXT ii
    pstring$[ii] = CHR$(0)

    CALL Expression()
    value = dreturn
    Print "          ", "result = ", FORMAT("%g",value),"\n"

DoneMain:
'
END SUB
' --------- end main ---------


SUB Expression
    Local ch$, pii, isaddop, ich, eValue

    pii = epos
    ch$ = pstring$[pii]
    CALL IsAddop(ch$)
    isaddop = ireturn

    IF (isaddop = 1) THEN
        eValue = 0
    ELSE
        CALL Term()
        eValue = dreturn
        pii = epos
        ch$ = pstring$[pii]
    ENDIF

    CALL IsAddop(ch$)
    isaddop = ireturn

    WHILE (isaddop = 1)
        ich = ASC(ch$)
        IF ich = 43 THEN
            CALL Match()
            CALL Term()
            eValue = eValue + dreturn
        ELSEIF ich = 45 THEN
            CALL Match()
            CALL Term()
            eValue = eValue - dreturn
        ENDIF

        pii = epos
        ch$ = pstring$[pii]
        CALL IsAddop(ch$)
        isaddop = ireturn
    WEND

    dreturn = eValue
END SUB
' ---------- end Expression ----------


SUB Term
    Local ch$, pii, tValue, ismultop, ich

    CALL Factor()
    tValue = dreturn
    pii = epos
    ch$ = pstring$[pii]

    CALL IsMultop(ch$)
    ismultop = ireturn
    
    WHILE (ismultop = 1)
        ich = ASC(ch$)

        IF ich = 42 THEN
            CALL Match()
            CALL Factor()
            tValue = tValue * dreturn
        ELSEIF ich = 47 THEN
            CALL Match()
            CALL Factor()
            tValue = tValue / dreturn
        ELSEIF ich = 94 THEN
            CALL Match()
            CALL Factor()
            tValue = tValue ^ dreturn
        ELSEIF ich = 37 THEN
            CALL Match()
            CALL Factor()
            ireturn = INT(dreturn)
            tValue = INT(tValue)
            tValue = tValue % ireturn
        ENDIF

        pii = epos
        ch$ = pstring$[pii]
        CALL IsMultop(ch$)
        ismultop = ireturn
    WEND

    dreturn = tValue
END SUB
' ---------- end Term ----------


SUB Factor
    Local ch$, pii, isalpha, fvalue

    pii = epos
    ch$ = pstring$[pii]

    IF (ch$ = "(") THEN
         CALL Match()
         CALL Expression()
         fvalue = dreturn
         CALL Match()
    ELSE
        CALL GetNum()
        fvalue = dreturn
    ENDIF

    dreturn = fvalue
END SUB
' ---------- end Factor ----------



' ====================== rdp utility routines ======================

SUB Match
    Local pii

    pii = epos

    IF (pii < xlen) THEN
         CALL GetChar()
         CALL SkipWhite()
    ENDIF

END SUB
'---------- end Match ----------


SUB GetNum
    Local ch$, pii, isdigit, abcode, ln, gvalue

    abcode = 12
    ln = 1
    pii = epos
    ch$ = pstring$[pii]

    IF (pii <= xlen) THEN
        CALL IsDigit(ch$)
        isdigit = ireturn

        IF (isdigit = 0) THEN
            tholder$ =  "Numeric Value"
            CALL abort(abcode, ln)
        ENDIF

        CALL asc2dbl()
        gvalue = dreturn
        pii = epos
        ch$ = pstring$[pii]

        CALL IsDigit(ch$)
        isdigit = ireturn

        IF (isdigit = 1) THEN
            While (isdigit = 1)
                pii = pii + 1
                ch$ = pstring$[pii]
                CALL IsDigit(ch$)
                isdigit = ireturn
            WEND
            epos = pii
        ENDIF
        CALL SkipWhite()
    ENDIF

    dreturn = gvalue
END SUB
' ---------- end GetNum ----------


SUB asc2dbl
    Local ch$, cvalue$, pii, isdigit, fvalue

    cvalue$ = ""
    pii = epos
    ch$ = pstring$[pii]
    isdigit = 1

    While (isdigit = 1)
        cvalue$ = cvalue$ + ch$
        pii = pii + 1
        ch$ = pstring$[pii]
        CALL IsDigit(ch$)
        isdigit = ireturn
    WEND

    fvalue = VAL(cvalue$)
    epos = pii

    dreturn = fvalue
END SUB
' ------- end asc2dbl ---------


SUB GetChar
    epos = epos + 1
END SUB
' ---------- end GetChar ----------


SUB IsAddop(ch$)
    Local rval

    rval = 0

    IF (ch$ = "+" OR ch$ = "-") THEN
        rval = 1
    ENDIF

    ireturn = rval
END SUB
' ---------- end IsAddop ----------


SUB IsMultop(ch$)
    Local rval

    rval = 0

    IF (ch$ = "*" OR ch$ = "^" OR ch$ = "%" OR ch$ = "/") THEN
        rval = 1
    ENDIF

    ireturn = rval
END SUB
' ---------- end IsMultop ----------


SUB IsDigit(ch$)
    Local test, ival

    test = 0
    ival = 0

    ival = ASC(ch$)

    IF (ival >= 48 AND ival <= 57 OR ch$ = ".") THEN
        test = 1
    ENDIF

    ireturn = test
END SUB
' ---------- end IsDigit ----------


SUB IsAlpha(ch$)
    Local test, ival

    test = 0
    ival = 0

    ival = ASC(ch$)

    IF (ival >= 65 AND ival <= 90 OR ival >= 97 AND ival <= 122) THEN
        test = 1
    ENDIF

    ireturn = test
END SUB
' ----------end IsAlpha -----------


SUB IsWhite(ch$)
    Local test, tab$

    test = 0

    tab$ = CHR$(9)

    IF (ch$ = " " OR ch$ = tab$) THEN
        test = -1
    ENDIF

    ireturn = test
END SUB
' ---------- end IsWhite ----------


SUB SkipWhite
    Local ch$, pii, iswhite

    ch$ = ""
    pii = 0
    iswhite = 0

    pii = epos
    ch$ = pstring$[pii]

    CALL IsWhite(ch$)
    iswhite = ireturn

    WHILE (iswhite <> 0)
        CALL GetChar()
        pii = epos
        ch$ = pstring$[pii]
        CALL IsWhite(ch$)
        iswhite = ireturn
    WEND

END SUB
' ---------- end SkipWhite ----------


SUB abort(code, linendx)
    Local instring$, ii

    instring$ = ""
    ii = 0

    FOR ii = 1 TO xlen
        instring$ = instring$ & pstring$[ii]
    NEXT ii

    PRINT "BEEP\n"

    IF code = 12 THEN
        Printnl
        Print "Expected ", tholder$, " "
        Print ": in line: ", linendx,"\n"
        Print instring$,"\n"
        Print "code(", code, ")\n"
    ELSE
        Print "Program aborted, undefined error.\n"
    ENDIF

    END
END SUB
' ----------end abort -----------

jrs@laptop:~/sb/test$ scriba parse.sb

LALR Parser

Enter Expression: Example: 1+2*3*4*10/5
                  1+2*3*4*10/5
          result = 5.8

jrs@laptop:~/sb/test$

FYI: I had to add the CALL keyword for SUBs as most were used before they were declared.
« Last Edit: March 03, 2012, 01:00:24 AM by JRS »

SteveA

  • Guest
Re: recursive descent parser ?
« Reply #3 on: March 03, 2012, 02:44:34 PM »
Here is a ScriptBasic version of your Yabasic parser.

Code: [Select]
...<snip>

FYI: I had to add the CALL keyword for SUBs as most were used before they were declared.


Yeh, Yabasic doesn't use the keyword CALL for SUBs.
I had to strip them out of the Yabasic version.
Bxbasic requires CALL.

Quote
Enter Expression: Example: 1+2*3*4*10/5
                  1+2*3*4*10/5
          result = 5.8


If 5.8 is the answer SB returns, then there is something wrong.  ???
The correct answer should be 49:-\

I hadn't had the time to run it through ScriptBasic yet, so I didn't know what changes it might need.
I'll run it thru SB and see what's going on.
Thanks John.

Offline John

  • Forum Support / SB Dev
  • Posts: 3598
    • ScriptBasic Open Source Project
Re: recursive descent parser ?
« Reply #4 on: March 03, 2012, 09:43:40 PM »
I found the problem. Yabasic uses + to concatenate strings and ScriptBasic uses &. I must of missed this statement in the asc2dbl SUB.

Code: [Select]
cvalue$ = cvalue$ & ch$

jrs@laptop:~/sb/test$ scriba parse.sb

LALR Parser

Enter Expression: Example: 1+2*3*4*10/5
                  1+2*3*4*10/5
          result = 49

jrs@laptop:~/sb/test$

SteveA

  • Guest
Re: recursive descent parser ?
« Reply #5 on: March 04, 2012, 04:13:15 PM »
I found the problem. Yabasic uses + to concatenate strings and ScriptBasic uses &. I must of missed this statement in the asc2dbl SUB.

cvalue$ = cvalue$ & ch$


Oooh!
I never would have found that!
Haha! (LOL).  :D
Somehow, that totally slipped by me.  :-\

Last evening I was adding all kinds of diagnostic code, so I could trace every step.
I knew it was failing when I divided by 10. That was because 10 is two (2) ascii characters.

I'm sure glad you found that John, I was planning on staying up late tonight, until I found the bug!  (LOL)  :D

Steve

Offline John

  • Forum Support / SB Dev
  • Posts: 3598
    • ScriptBasic Open Source Project
Re: recursive descent parser ?
« Reply #6 on: March 04, 2012, 05:26:50 PM »
It would have been harder to find if you didn't use the $ string indicator. I normally don't use the $ for strings or string functions. (optional in SB)

You could also do this ...

Code: [Select]
cvalue$ &= ch$

Quote
Last evening I was adding all kinds of diagnostic code, so I could trace every step.

If running under Windows, SB has a single step debugger preprocessor you could use.

scriba -idbg yourscript

You can also look at values of variables, set breakpoints and other stuff you would expect in a debugger.

The goal after I get the IUP/CD/IM extension modules done is to create an IDE and embed this debugger into it. IUP already has a nice GUI layout tool that should round out the SB development environment. (all platforms)

« Last Edit: March 04, 2012, 06:54:31 PM by JRS »