Author Topic: LIKE +  (Read 21672 times)

Offline AIR

  • BASIC Developer
  • Posts: 932
  • Coder
Re: LIKE +
« Reply #15 on: November 03, 2018, 12:06:32 AM »
Thinking about this some more, I have to revisit the regex portion that I'm using.

While it works with the simple string I tested with, it gets confused when I duplicate the string to find additional matches... I have to change that part into a loop to process additional "hits".


AIR.


Offline John

  • Forum Support / SB Dev
  • Posts: 3597
    • ScriptBasic Open Source Project
Re: LIKE +
« Reply #16 on: November 03, 2018, 12:52:19 AM »
I'm not good enough with regex to know its limitations.

My approach is much different. You may end up last to release a working LIKE based on this latest news.   :o

I will post what I have soon. It will not yet support multiple occurrences or handle the failure case where no match is found. That may change depending on whether my concept pans out.

LIKE Tips

  • For every JOKER there is a pair of patterns. (except the first and last)
  • Pattern strings always start and end with a JOKER. Normally the associated data for these are ignored.
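
A small C++ sketch of the two tips above, assuming the pattern is simply cut at every "*". The helper name split_on_joker is hypothetical; it is not part of ScriptBasic or of any submission in this thread. A pattern that starts and ends with a JOKER yields an empty first and last piece, and every interior JOKER sits between a pair of literal pieces.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: cut a LIKE-style pattern at every '*' (joker).
// The pieces are the literal texts surrounding the jokers. An empty first or
// last piece means the pattern starts or ends with a joker.
std::vector<std::string> split_on_joker(const std::string& pattern) {
    std::vector<std::string> pieces;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type star = pattern.find('*', start);
        if (star == std::string::npos) {
            pieces.push_back(pattern.substr(start));  // tail after the last joker
            break;
        }
        pieces.push_back(pattern.substr(start, star - start));
        start = star + 1;  // skip the joker itself
    }
    return pieces;
}
```

A pattern with n jokers always produces n + 1 pieces, which is where the "pair of patterns" per JOKER comes from.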
« Last Edit: November 03, 2018, 02:20:46 AM by John »

Offline AIR

Re: LIKE +
« Reply #17 on: November 03, 2018, 05:30:59 AM »
Quote from: John
I'm not good enough with regex to know its limitations.

Turns out I needed to loop in order to meet the first requirement, grabbing additional occurrences/hits. 
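
For anyone following along, here is a minimal sketch of that kind of loop. It swaps in std::sregex_iterator rather than the regex_search/suffix loop used in the submission, mainly because the iterator keeps every position relative to the original string. The Hit struct and find_all name are made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record for one regex hit; not the JOKERINFO used in the thread.
struct Hit {
    std::string text;
    std::size_t position;  // 0-based offset into the original input
};

// Collect every non-overlapping match of `expr` in `input`.
std::vector<Hit> find_all(const std::string& input, const std::string& expr) {
    std::vector<Hit> hits;
    const std::regex term(expr);
    for (std::sregex_iterator it(input.begin(), input.end(), term), end; it != end; ++it) {
        hits.push_back({it->str(), static_cast<std::size_t>(it->position())});
    }
    return hits;
}
```

Calling find_all on a doubled test string should give one Hit per occurrence, so the caller can loop over the result or just check its size.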

Quote from: John
My approach is much different. You may end up last to release a working LIKE based on this latest news.   :o

I'm almost done, will post an update shortly....

Quote from: John
I will post what I have soon. It will not yet support multiple occurrences or handle the failure case where no match is found. That may change depending on whether my concept pans out.

LIKE Tips

  • For every JOKER there is a pair of patterns. (except the first and last)
  • Pattern strings always start and end with a JOKER. Normally the associated data for these are ignored.

This is LIKE++, things can change!!!!  Right now, putting a JOKER at the beginning or end in my implementation causes nothing to be found.  I don't see the need to include that, since you still have access to the line/text that includes what you're looking to extract (position zero).

AIR.

Offline AIR

Re: LIKE +
« Reply #18 on: November 03, 2018, 09:13:31 AM »
Updated version:  Returns multiple occurrences, and allows joker front and back...still a WIP.

Code: C++
/*
 *   "LIKE" keyword challenge submission by AIR
 *
 *   Using C++
 *
 *   Emulates the "*" option used in ScriptBasic's LIKE function
 *
 *   Only tested with the string below, YMMV
 *
 *   Compile with: g++ --std=c++11 like.cpp -o like
 */

#include <iostream>
#include <string>
#include <regex>
#include <vector>

using namespace std;

struct JOKERINFO {
    string text;
    int length;
    int position;    // offset into the original input string
    int object_num;  // 1-based number of the occurrence this entry belongs to
    // JOKERINFO(const string text,const int length,const int position): text(text),length(length),position(position){}
};

vector< JOKERINFO > JOKER;

int LIKE(string input,string expr) {
    const string group = "\\b(\\w+)";   // regex that stands in for one joker
    size_t pos = 0;
    size_t offset = 0;                  // where src begins within input
    smatch match;
    string src(input);

    // Translate the pattern: drop a leading "*", turn every other "*" into a capture group
    while( ( pos = expr.find("*", pos) ) != string::npos) {
        if (pos==0) {
            expr.erase(pos,1);
        } else {
            expr.replace(pos, 1, group);
            pos += group.length();      // continue after the inserted group
        }
    }

    regex term(expr);

    int count=0;
    // Search repeatedly, restarting after each hit, to collect every occurrence
    while ( regex_search (src, match, term) ) {
        if (match.length(0) == 0) break;    // an empty match would loop forever

        for (size_t x=0 ; x < match.size(); x++){
            JOKERINFO info;
            info.text = match[x].str();
            info.length = static_cast<int>(match[x].length());
            // match.position() is relative to src, so add the offset back
            info.position = static_cast<int>(offset + match.position(x));
            info.object_num = count+1;
            JOKER.push_back(info);
        }

        count++;

        // Remember how far into the original string the next search starts
        offset += match.position(0) + match.length(0);
        src = match.suffix().str();
    }

    return count;
}

int main(int argc, char **argv) {

    string input = "You can see in the program that the vector 'JOKER' is declared as 'global'.";
    input.append(input);
    // cout << input << "\n\n";

    // SHOULD OUTPUT "JOKER is global"
    // and value of JOKER[x] plus position in original string and length of match
    if ( LIKE( input, "*vector '*' is * as '*'" ) ) {
        // for (size_t y =0; y < JOKER.size(); y++){
        //     cout << JOKER[y].text << endl;
        // }
        cout << "** FIRST OBJECT **" << endl;
        cout << "OBJECT Number: " << JOKER[3].object_num << "\n\n";
        cout << JOKER[1].text << " is " << JOKER[3].text << "\n";
        cout << JOKER[1].text << " starts at position: " << JOKER[1].position << " and is " << JOKER[1].length << " Characters Long." <<endl;
        cout << JOKER[3].text << " starts at position: " << JOKER[3].position << " and is " << JOKER[3].length << " Characters Long.\n\n";

        cout << "\n\n** SECOND OBJECT **" << endl;
        cout << "OBJECT Number: " << JOKER[5].object_num << "\n\n";
        cout << JOKER[5].text << " is " << JOKER[7].text << "\n";
        cout << JOKER[5].text << " starts at position: " << JOKER[5].position << " and is " << JOKER[5].length << " Characters Long." <<endl;
        cout << JOKER[7].text << " starts at position: " << JOKER[7].position << " and is " << JOKER[7].length << " Characters Long.\n\n";
    }
    return 0;
}

AIR.

Offline John

Re: LIKE +
« Reply #19 on: November 03, 2018, 11:16:49 AM »
I think your original comment about this challenge may be accurate. This is well beyond Floyd or the word count challenge.

If anyone else decides to make a go for it, I highly recommend having Script BASIC installed on your system.
« Last Edit: November 03, 2018, 11:26:28 AM by John »

Offline AIR

Re: LIKE +
« Reply #20 on: November 03, 2018, 11:28:59 AM »
A rudimentary BACON version, just to show it's capable, but I'm not finishing this....

Code: Text
RECORD JOKERINFO
    LOCAL text$
    LOCAL length TYPE int
    LOCAL position TYPE int
END RECORD

GLOBAL JOKER[30] TYPE JOKERINFO_type

a$="You can see in the program that the vector 'JOKER' is declared as 'global'."
like$="*vector '*' is * as '*'"

' Split the pattern at each joker; the pieces act as delimiters
SPLIT like$ BY "*" TO array$ SIZE size

FOR x =0 TO size -1
    int i = INSTR(a$,array$[x])
    IF i THEN
        ' Capture the text between this delimiter and the next one
        hit$ = INBETWEEN$(a$,array$[x],array$[x+1])
        JOKER[x].text$ = hit$
        JOKER[x].length = LEN(hit$)
        JOKER[x].position = i+LEN(hit$)
    END IF
NEXT

PRINT JOKER[1].text$," is ",JOKER[3].text$
PRINT JOKER[1].text$," starts at position: ", JOKER[1].position, " and is ", JOKER[1].length, " Characters in length."
PRINT JOKER[3].text$," starts at position: ", JOKER[3].position, " and is ", JOKER[3].length, " Characters in length."
« Last Edit: November 03, 2018, 11:33:35 AM by AIR »

Offline John

Re: LIKE +
« Reply #21 on: November 03, 2018, 11:38:07 AM »
If my SB LIKEX and JOKERS direction works out, this challenge may turn out to be relatively simple. Patterns are delimiters in full dress.

When this challenge is over, LIKEX is joining the T (Tools) extension module along with the array sort I did in BASIC.
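
To illustrate the "patterns are delimiters" idea without regex, here is a rough C++ sketch. It is not the LIKEX code, which had not been posted at this point; the function name extract_between is hypothetical, and it assumes the caller has already cut the pattern into its non-empty literal pieces. It only handles the first occurrence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: treat the literal pieces of a pattern as delimiters and
// return the text captured between each consecutive pair (first occurrence only).
// Returns an empty vector when a literal piece cannot be found in order.
// Assumption: every entry in `literals` is non-empty.
std::vector<std::string> extract_between(const std::string& source,
                                         const std::vector<std::string>& literals) {
    std::vector<std::string> captures;
    if (literals.empty()) return captures;

    // Locate the first delimiter; captures start right after it.
    std::string::size_type found = source.find(literals[0]);
    if (found == std::string::npos) return captures;
    std::string::size_type cursor = found + literals[0].size();

    for (std::size_t i = 1; i < literals.size(); ++i) {
        found = source.find(literals[i], cursor);
        if (found == std::string::npos) return {};  // pattern does not match
        captures.push_back(source.substr(cursor, found - cursor));
        cursor = found + literals[i].size();
    }
    return captures;
}
```

With the test sentence from this thread and the literals "vector '", "' is ", " as '" and "'", the captures come back in order as JOKER, declared and global.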
« Last Edit: November 04, 2018, 01:01:34 AM by John »

Offline John

Re: LIKE +
« Reply #22 on: November 03, 2018, 11:43:00 AM »
Quote from: AIR
A rudimentary BACON version, just to show it's capable, but I'm not finishing this....

Can you ping Peter and see if he is interested?

I'm surprised Mike hasn't made an appearance with FBSL or Oxygen Basic.
« Last Edit: November 03, 2018, 12:19:46 PM by John »

Offline AIR

Re: LIKE +
« Reply #23 on: November 03, 2018, 06:50:07 PM »
Quote from: John
If my SB LIKEX and JOKERS direction works out, this challenge may turn out to be relatively simple.

When this challenge is over, LIKEX is joining the T (Tools) extension module along with the array sort I did in BASIC.

Question:  Will you be re-implementing LIKE/JOKER, or will you be enhancing them with wrapper code/functions?  Either is a valid approach, just wondering....

AIR.

Offline John

Re: LIKE +
« Reply #24 on: November 03, 2018, 09:05:12 PM »
LIKEX is an enhanced version of LIKE but written from scratch in SB and not using native LIKE.

Sorry about the delay in getting something posted. My concept is sound and I'm working on optimizing it. I may change the name of the new LIKE function.

Quote
EXTRACT
remove or take out, especially by effort or force.

MATCH (array) will probably replace JOKER

You know I'm a BASIC guy who is driven to present code in the simplest, most readable form I'm able to deliver. The Firefox and Floyd challenges should be good enough examples.


Trust me, it will be worth the wait.

« Last Edit: November 04, 2018, 01:29:58 AM by John »

Offline AIR

Re: LIKE +
« Reply #25 on: November 04, 2018, 12:31:21 PM »
I like those two keywords, good choices!!

(BTW, BCX/MBC had an EXTRACT$ function, but it acted more like a LEFT$ where instead of a position you provided a string, and would have everything to the left of that returned.)
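
Roughly what that older behaviour amounts to, as a C++ sketch. The name left_of is made up here, and what happens when the marker string is missing is my assumption, not something checked against BCX/MBC.

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described above: return everything
// to the left of the first occurrence of `marker`.
// Assumption: if `marker` is absent, the whole input is returned unchanged.
std::string left_of(const std::string& text, const std::string& marker) {
    std::string::size_type at = text.find(marker);
    return at == std::string::npos ? text : text.substr(0, at);
}
```

So left_of("name=value", "=") gives back "name", which is quite different from pulling out the text a joker stands for.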

Anyway, I was tired of looking at C++, so I decided to play around with the Bacon version I submitted earlier.

One of the things this does, which I don't know if you've considered, is return the number of matches as an integer.  So you can loop over the MATCH array if you want, or use it as a check.

Code: Text
RECORD MATCHINFO
    LOCAL text$ TYPE STRING
    LOCAL capture$ TYPE STRING
    LOCAL length TYPE int
    LOCAL position TYPE int
END RECORD

GLOBAL MATCH[30] TYPE MATCHINFO_type

' Fills MATCH with the text captured by each joker
' and returns the highest MATCH index that received a capture
FUNCTION EXTRACT (source$,pattern$) TYPE int
    LOCAL i TYPE int
    LOCAL res TYPE int
    LOCAL capture$ TYPE STRING

    ' Split the pattern at each joker; the pieces act as delimiters
    SPLIT pattern$ BY "*" TO array$ SIZE size
    FOR x = 0 TO size -1
        i = INSTR(source$,array$[x])
        IF i >= 0 THEN
            local info = {0} TYPE MATCHINFO_type
            IF x = 0 THEN
                ' Position zero holds the whole source text
                capture$ = source$
            ELSE
                capture$ = INBETWEEN$(source$,array$[x],array$[x+1])
            END IF
            IF LEN(capture$) THEN
                res = x
                info.text$ = capture$
                info.length = LEN(capture$)
                info.position = INSTR(source$,capture$)
                MATCH[x]=info
            END IF
        END IF
    NEXT
    RETURN res
END FUNCTION

a$="You can see in the program that the array 'MATCH' is declared as 'global'."
like$="*array '*' is * as '*'."

ret = EXTRACT(a$,like$)
IF ret THEN
    PRINT "The number of matched substrings is: ",ret
    PRINT MATCH[1].text$," is a ",MATCH[3].text$, " array of structs/records."
    PRINT MATCH[1].text$," starts at position: ", MATCH[1].position, " and is ", MATCH[1].length, " Characters in length."
    PRINT MATCH[3].text$," starts at position: ", MATCH[3].position, " and is ", MATCH[3].length, " Characters in length."
END IF

AIR.
« Last Edit: November 04, 2018, 12:34:12 PM by AIR »

Offline John

Re: LIKE +
« Reply #26 on: November 04, 2018, 12:44:49 PM »
Quote from: AIR
Anyway, I was tired of looking at C++, so I decided to play around with the Bacon version I submitted earlier.

Great approach!  :D

I wish SB had an INBETWEEN$ function. It would save me a second pass at finalizing the MATCH array. I'm not done optimizing yet so I may get close to your BaCon submission.
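
For readers who don't know BaCon, an INBETWEEN$-style helper is easy to approximate. The C++ sketch below is only an illustration of the idea: the name inbetween and its handling of missing delimiters are assumptions, and BaCon's real INBETWEEN$ may differ in details.

```cpp
#include <string>

// Hypothetical approximation of an INBETWEEN$-style helper: the text between
// the first `left` and the next `right` that follows it.
// Assumption: returns "" if either delimiter is missing.
std::string inbetween(const std::string& text,
                      const std::string& left,
                      const std::string& right) {
    std::string::size_type begin = text.find(left);
    if (begin == std::string::npos) return "";
    begin += left.size();  // capture starts right after the left delimiter
    std::string::size_type end = text.find(right, begin);
    if (end == std::string::npos) return "";
    return text.substr(begin, end - begin);
}
```

One call per joker, with the neighbouring literal pieces as delimiters, is all the BaCon submission needs to pull out each capture in a single pass.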

Quote from: AIR
One of the things this does, which I don't know if you've considered, is return the number of matches as an integer.

MATCH[0,0] contains the total match occurrences. JOKER(1...n)

Quote from: JRS Post #4
LIKEX will return the number of occurrences and undef if no matches are found of the pattern string being passed.

I have learned to appreciate a returned value of undef. The refusal of assignment result.
« Last Edit: November 04, 2018, 02:12:59 PM by John »

Offline AIR

Re: LIKE +
« Reply #27 on: November 04, 2018, 07:09:20 PM »
Updated BaCon version (tested with attached text file):

Code: Text
' /*
'  *   "EXTRACT" (was "LIKE") keyword challenge submission by AIR
'  *
'  *   Using BaCon version 3.8 on Darwin x86_64
'  *
'  *   Emulates the "*" option used in ScriptBasic's LIKE function
'  *
'  *   This version will return multiple matches, it is up to the
'  *   programmer to decide how to use them. :)
'  *
'  *   Only tested with the attached file, YMMV
'  *
'  *
'  *   TODO:
'  *            Return the entire line of the matched query
'  *            in addition to the match
'  */

RECORD MATCHINFO
    LOCAL text$ TYPE STRING
    LOCAL fulltext$ TYPE STRING
    LOCAL length TYPE int
    LOCAL position TYPE int
END RECORD

GLOBAL MATCH[100] TYPE MATCHINFO_type


FUNCTION EXTRACT (source$,pattern$) TYPE int
    LOCAL i,index TYPE int
    LOCAL res TYPE int
    LOCAL capture$ TYPE STRING

    ' Process the source line by line, and cut the pattern at each joker
    SPLIT source$ BY NL$ TO list$ SIZE list_size
    SPLIT pattern$ BY "*" TO array$ SIZE size

    index = 0
    FOR y = 0 TO list_size-1
        FOR x = 0 TO size -1
            ' PRINT LEN(list$[y])
            ' Skip empty lines
            IF LEN(list$[y]) > 0 THEN
                i = INSTR(list$[y],array$[x])
                IF i >= 0 THEN
                    local info = {0} TYPE MATCHINFO_type
                    IF x = 0 THEN
                        capture$ = list$[y]
                    ELSE
                        IF LEN(capture$) THEN
                            capture$ = INBETWEEN$(list$[y],array$[x],array$[x+1])
                        END IF
                    END IF
                    ' Only store real captures, not the whole line from x = 0
                    IF LEN(capture$) and x THEN
                        info.text$ = capture$
                        info.length = LEN(capture$)
                        info.position = INSTR(source$,capture$)
                        MATCH[index]=info
                        INCR index
                        res = index
                    END IF
                END IF
            END IF
        NEXT
    NEXT
    RETURN res
END FUNCTION

a$=LOAD$("anchors.txt")
pattern$="*<a href=*>*"


ret = EXTRACT(a$,pattern$)

IF ret THEN
    PRINT NL$, "Using EXTRACT pattern: ", pattern$, NL$
    FOR x = 0 TO ret -1
        PRINT MATCH[x].text$," starts at position: ", MATCH[x].position, " and is ", MATCH[x].length, " Characters in length.",NL$
    NEXT
ELSE
    PRINT "No Matches Found."
END IF

AIR.

Offline John

Re: LIKE +
« Reply #28 on: November 04, 2018, 11:10:14 PM »
Maybe the next challenge could be SPLITA :)

Offline jalih

  • Advocate
  • Posts: 111
Re: LIKE +
« Reply #29 on: November 05, 2018, 05:19:19 AM »
Quote from: John
Maybe the next challenge could be SPLITA :)

On that challenge, the 8th programming language would be really hard to beat...
« Last Edit: November 05, 2018, 08:42:02 AM by jalih »