AllBASIC Forum

BASIC User Group => Code Challenges => Topic started by: John on November 01, 2018, 04:39:29 PM

Title: LIKE +
Post by: John on November 01, 2018, 04:39:29 PM
This code challenge is to reproduce the Script BASIC LIKE / JOKER and WILDCARD pattern matching functionality.

LIKE Docs (https://www.scriptbasic.org/docs/ug/ug_25.113.html)

I will be validating all LIKE submissions against SB native LIKE.

Extra Points

Find a user-requested number of matches, or ALL of them, based on the pattern string. SB LIKE only finds the first occurrence.

Return the JOKER count as a result of LIKE rather than TRUE (-1) / FALSE (0). I always wondered why Peter didn't use JOKER(0) to contain the total number.

Another nice extension would be for each JOKER to also return the starting position of its value in the search string.


Title: Re: LIKE +
Post by: John on November 01, 2018, 08:29:41 PM
Quote from: AIR
If you look at match.c in the SB source, you'll see that this is not a trivial thing to implement from scratch.  There's a lot of parsing going on in that module.

That's good to know. My LIKEX and JOKERS will seem all the more impressive.  :-*
Title: Re: LIKE +
Post by: jalih on November 02, 2018, 12:16:04 AM
Quote from: AIR
If you look at match.c in the SB source, you'll see that this is not a trivial thing to implement from scratch.  There's a lot of parsing going on in that module.

Quote from: John
That's good to know. My LIKEX and JOKERS will seem all the more impressive.  :-*

I know what a wildcard is, but what is a joker?

There is a short and simple wildcard matching algorithm, built around a single while loop, described in an old Dr. Dobb's Journal article (http://www.drdobbs.com/architecture-and-design/matching-wildcards-an-algorithm/210200888).
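
For readers who want the general shape of such a loop, here is a minimal C++ sketch. It is not the code from the linked article. The name `wildcard_match` is made up for illustration, and the sketch assumes `*` matches any run of characters (including none) and `?` matches exactly one, which may differ from Script BASIC's LIKE rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from the linked article): returns true when
// the whole of `text` matches `pattern`. '*' matches any run of characters,
// '?' matches exactly one character.
bool wildcard_match(const std::string& text, const std::string& pattern) {
    std::size_t t = 0, p = 0;
    std::size_t star = std::string::npos;  // index of the most recent '*'
    std::size_t resume = 0;                // text index to retry from

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;      // remember the star and first try matching nothing
            resume = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++t;             // ordinary character or '?' matched
            ++p;
        } else if (star != std::string::npos) {
            p = star + 1;    // mismatch: let the last star swallow one more char
            t = ++resume;
        } else {
            return false;    // mismatch with no star to fall back on
        }
    }
    // Only trailing stars may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}
```

For example, `wildcard_match("hello world", "h*o w?rld")` is true and `wildcard_match("abc", "a*d")` is false. The sketch only answers yes or no; it does not capture what each `*` matched.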
Title: Re: LIKE +
Post by: John on November 02, 2018, 12:34:44 AM
Welcome jalih!

Check out the link in the first post for the online Script BASIC LIKE documentation.

Think of LIKE as using text patterns as brackets around the data you really want. A JOKER is written as * in the pattern and stands for the data that the literal text does not cover.
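
The "brackets" idea can be sketched in a few lines of C++. This is only an illustration under assumptions: `between` is a made-up helper name, it handles a single joker between two literal strings, and it takes the first occurrence of each bracket.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: finds the first `left`, then the next `right` after
// it, and stores the text between them in `out`. Returns false when either
// bracket is missing (in which case `out` is left untouched).
bool between(const std::string& text, const std::string& left,
             const std::string& right, std::string& out) {
    std::size_t begin = text.find(left);
    if (begin == std::string::npos) return false;
    begin += left.size();                      // joker starts after the left bracket
    std::size_t end = text.find(right, begin);
    if (end == std::string::npos) return false;
    out = text.substr(begin, end - begin);     // joker ends where the right bracket starts
    return true;
}
```

With the pattern `<title>*</title>`, the left bracket is `<title>`, the right bracket is `</title>`, and the joker is whatever `between` stores in `out`.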
Title: Re: LIKE +
Post by: John on November 02, 2018, 02:07:03 AM
LIKEX and JOKERS Preview.

LIKEX
This is syntax compatible with SB native LIKE but accepts an optional parameter indicating how many occurrences of the passed pattern string are to be found. * means all occurrences. LIKEX will return the number of occurrences, or undef if no matches of the pattern string are found.

JOKERS
JOKER is being expanded to not only return the data represented by the * placeholder but the following information as well.

  • Length of the referenced JOKERS index returned data.
  • Starting position in the match string for JOKERS index returned string being referenced.

Title: Re: LIKE +
Post by: AIR on November 02, 2018, 10:54:42 AM
Preliminary submission, does not include "Extra Point" items, and only uses "*":

Code: C++
/*
 *   "LIKE" keyword challenge submission by AIR
 *
 *   Using C++
 *
 *   Implements the "*" option used in ScriptBasic's LIKE function
 *
 *   Only tested with the string below, YMMV
 *
 *   Compile with: g++ --std=c++11 like.cpp -o like
 */

#include <iostream>
#include <string>
#include <regex>
#include <vector>

using namespace std;

vector<string> JOKER;

int LIKE(string input, string expr) {
    size_t pos = 0;
    smatch match;

    // Turn every "*" into a regex capture group. Other regex
    // metacharacters in expr (such as ".") are passed through unescaped.
    while( ( pos = expr.find("*", pos) ) != string::npos) {
        expr.replace(pos, 1, "(.+)");
        pos += 1;
    }

    regex term(expr);

    JOKER.clear();  // drop results left over from an earlier call
    int i = regex_search(input, match, term, regex_constants::match_any);
    // match[0] is the whole match; match[1..n] are the jokers
    for (size_t x = 0; x < match.size(); x++) {
        JOKER.push_back(match[x].str());
    }
    return i;
}

int main(int argc, char **argv) {

    string input = "You can see in the program that the vector 'JOKER' is declared as 'global'.";

    // SHOULD OUTPUT "JOKER is global"
    if ( LIKE( input, "vector '*' is * as '*'." ) ) {
        cout << JOKER[1] << " is " << JOKER[3] << endl;
    }
    return 0;
}

AIR.
Title: Re: LIKE +
Post by: John on November 02, 2018, 10:58:37 AM
Thanks AIR for kicking this off with your C++ submission.

I hope to post my Script BASIC first round submission sometime today.

Title: Re: LIKE +
Post by: AIR on November 02, 2018, 11:38:02 AM
Quote from: John
LIKEX and JOKERS Preview.

LIKEX
This is syntax compatible with SB native LIKE but accepts an optional parameter indicating how many occurrences of the passed pattern string are to be found. * means all occurrences. LIKEX will return the number of occurrences, or undef if no matches of the pattern string are found.

JOKERS
JOKER is being expanded to not only return the data represented by the * placeholder but the following information as well.

  • Length of the referenced JOKERS index returned data.
  • Starting position in the match string for JOKERS index returned string being referenced.

Can you clarify this? It's not very clear what you mean. Pseudocode is fine.

AIR.
Title: Re: LIKE +
Post by: John on November 02, 2018, 12:16:50 PM
The SB example I'll post should clear up any questions about the challenge goals. Folks should use the SB LIKE docs before going after extra points.

LIKEX  Enhance LIKE to do multiple occurrences of the pattern string.

JOKERS Extend JOKER to also provide its length and its position in the match string.
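
As pseudo code of sorts, the two goals can be sketched in C++ without regex. Everything named here (`Joker`, `likex`, `limit`) is hypothetical, not the Script BASIC implementation. The sketch assumes `*` is the only joker, that the pattern begins and ends with literal text, that each literal is taken at its first occurrence, and that positions are zero-based (Script BASIC string positions are one-based).

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Joker {
    std::string text;      // data the '*' stood for
    std::size_t position;  // zero-based offset in the searched string
    std::size_t length;    // length of `text`
};

// Hypothetical LIKEX-style helper. Finds up to `limit` occurrences of
// `pattern` in `text` (limit 0 means all) and stores one Joker per '*' per
// occurrence in `jokers`. Returns the number of occurrences found.
std::size_t likex(const std::string& text, const std::string& pattern,
                  std::size_t limit, std::vector<Joker>& jokers) {
    // Split the pattern on '*' into its literal pieces.
    std::vector<std::string> lits;
    std::size_t from = 0, star;
    while ((star = pattern.find('*', from)) != std::string::npos) {
        lits.push_back(pattern.substr(from, star - from));
        from = star + 1;
    }
    lits.push_back(pattern.substr(from));

    jokers.clear();
    // Assumption of this sketch: literal text at both ends of the pattern.
    if (lits.front().empty() || lits.back().empty()) return 0;

    std::size_t count = 0, cursor = 0;
    while (limit == 0 || count < limit) {
        std::size_t at = text.find(lits[0], cursor);
        if (at == std::string::npos) break;
        std::size_t pos = at + lits[0].size();
        std::vector<Joker> found;
        bool complete = true;
        for (std::size_t i = 1; i < lits.size(); ++i) {
            std::size_t next = text.find(lits[i], pos);
            if (next == std::string::npos) { complete = false; break; }
            found.push_back({text.substr(pos, next - pos), pos, next - pos});
            pos = next + lits[i].size();
        }
        if (!complete) break;  // no further full occurrence
        jokers.insert(jokers.end(), found.begin(), found.end());
        ++count;
        cursor = pos;          // continue after this occurrence
    }
    return count;
}
```

Called with a limit of 0 on a text that contains the sentence twice, this returns 2 and fills six jokers; with a limit of 1 it stops after the first three.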

Title: Re: LIKE +
Post by: AIR on November 02, 2018, 02:04:49 PM
Okay, here's another submission, this one also returns the position of the MATCH in the original string...

Code: C++
/*
 *   "LIKE" keyword challenge submission by AIR
 *
 *   Using C++
 *
 *   Emulates the "*" option used in ScriptBasic's LIKE function
 *
 *   Only tested with the string below, YMMV
 *
 *   Compile with: g++ --std=c++11 like.cpp -o like
 */

#include <iostream>
#include <string>
#include <regex>
#include <utility>
#include <vector>

using namespace std;

// Each entry pairs the matched text with its offset in the input string
vector< pair<string,int> > JOKER;
#define TEXT first
#define POSITION second

int LIKE(string input, string expr) {
    size_t pos = 0;
    smatch match;

    // Turn every "*" into a regex capture group
    while( ( pos = expr.find("*", pos) ) != string::npos) {
        expr.replace(pos, 1, "(.+)");
        pos += 1;
    }

    regex term(expr);

    JOKER.clear();  // drop results left over from an earlier call
    int i = regex_search(input, match, term, regex_constants::match_any);
    for (size_t x = 0; x < match.size(); x++) {
        JOKER.push_back(make_pair(match[x].str(), static_cast<int>(match.position(x))));
    }
    return i;
}

int main(int argc, char **argv) {

    string input = "You can see in the program that the vector 'JOKER' is declared as 'global'.";

    // SHOULD OUTPUT "JOKER is global"
    // and value of JOKER[x] plus position in original string
    if ( LIKE( input, "vector '*' is * as '*'." ) ) {
        cout << JOKER[1].TEXT << " is " << JOKER[3].TEXT << "\n\n";
        cout << JOKER[1].TEXT << " is at position: " << JOKER[1].POSITION << endl;
        cout << JOKER[3].TEXT << " is at position: " << JOKER[3].POSITION << "\n\n";

    }
    return 0;
}

AIR.
Title: Re: LIKE +
Post by: John on November 02, 2018, 02:10:38 PM
You're already in extra points territory this soon. WOW!

I think this challenge will result in a generic function we all will LIKE.
Title: Re: LIKE +
Post by: AIR on November 02, 2018, 04:01:13 PM
Extra Extra points:  JOKER now CONTAINS the text, position, and length for the match.

Code: C++
/*
 *   "LIKE" keyword challenge submission by AIR
 *
 *   Using C++
 *
 *   Emulates the "*" option used in ScriptBasic's LIKE function
 *
 *   Only tested with the string below, YMMV
 *
 *   Compile with: g++ --std=c++11 like.cpp -o like
 */

#include <iostream>
#include <string>
#include <regex>
#include <vector>

using namespace std;

struct JOKERINFO {
    string text;
    int length;
    int position;
};

vector< JOKERINFO > JOKER;

int LIKE(string input, string expr) {
    size_t pos = 0;
    smatch match;

    // Turn every "*" into a regex capture group
    while( ( pos = expr.find("*", pos) ) != string::npos) {
        expr.replace(pos, 1, "(.+)");
        pos += 1;
    }

    regex term(expr);

    JOKER.clear();  // drop results left over from an earlier call
    int i = regex_search(input, match, term, regex_constants::match_any);
    for (size_t x = 0; x < match.size(); x++) {
        JOKERINFO info;
        info.text = match[x].str();
        info.length = static_cast<int>(match[x].length());
        info.position = static_cast<int>(match.position(x));
        JOKER.push_back(info);
    }
    return i;
}

int main(int argc, char **argv) {

    string input = "You can see in the program that the vector 'JOKER' is declared as 'global'.";

    // SHOULD OUTPUT "JOKER is global"
    // and value of JOKER[x] plus position in original string and length of match
    if ( LIKE( input, "vector '*' is * as '*'." ) ) {
        cout << JOKER[1].text << " is " << JOKER[3].text << "\n\n";
        cout << JOKER[1].text << " starts at position: " << JOKER[1].position << " and is " << JOKER[1].length << " Characters Long." <<endl;
        cout << JOKER[3].text << " starts at position: " << JOKER[3].position << " and is " << JOKER[3].length << " Characters Long.\n\n";

    }
    return 0;
}

Still waiting on your code, this took me a couple of hours in between work duties.... 8)

AIR.
Title: Re: LIKE +
Post by: John on November 02, 2018, 08:01:32 PM
Looks like you have my JOKERS function working in C++.

I'm trying to address all the items in the challenge spec. before posting a SB submission.

It would be useful if compiled submissions could be offered as a shared object. (DLL, SO, ...)

It's not a race or competition but a sharing of concepts in multiple languages. There are no extra points for finishing first.

Title: Re: LIKE +
Post by: AIR on November 02, 2018, 09:04:37 PM
Quote from: John
There are no extra points for finishing first.

Says the guy who's gonna finish second.... ;D ;D ;D ;D
Title: Re: LIKE +
Post by: John on November 02, 2018, 09:11:32 PM
Quote
Says the guy who's gonna finish second...

The guy who finishes last if others don't jump in.  :'(

You're not finished YET!

First to post half baked code will only get you admired.   8)

On a positive note, these challenges prove Script BASIC is finished and stable enough to submit entries without being embarrassed.
Title: Re: LIKE +
Post by: AIR on November 03, 2018, 12:06:32 AM
Thinking about this some more, I have to revisit the regex portion that I'm using.

While it works with the simple string I tested with, it gets confused when I duplicate the string to find additional matches. I have to change that part into a loop to process additional "hits".
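
One way such a loop could look, sketched with `std::sregex_iterator`. This is an illustration, not the submission's code: `Capture` and `find_all` are made-up names, and the caller is assumed to pass a regex whose groups are lazy, such as `(.+?)`, because a greedy `(.+)` can stretch across repeated copies of the text.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Capture {
    std::string text;
    std::size_t position;  // zero-based offset in the original input
    std::size_t length;
};

// Hypothetical helper: visits every non-overlapping match of `re` in `input`
// and returns one inner vector per occurrence, holding capture groups 1..n.
std::vector<std::vector<Capture>> find_all(const std::string& input,
                                           const std::regex& re) {
    std::vector<std::vector<Capture>> occurrences;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        std::vector<Capture> groups;
        for (std::size_t g = 1; g < m.size(); ++g) {   // group 0 is the whole match
            groups.push_back({m[g].str(),
                              static_cast<std::size_t>(m.position(g)),
                              static_cast<std::size_t>(m.length(g))});
        }
        occurrences.push_back(groups);
    }
    return occurrences;
}
```

Because the iterator walks the original string, the reported positions stay relative to the start of the input rather than to whatever remains after the previous hit.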


AIR.

Title: Re: LIKE +
Post by: John on November 03, 2018, 12:52:19 AM
I'm not good enough with regex to know its limitations.

My approach is much different. You may end up last to release a working LIKE based on this latest news.   :o

I will post what I have soon that will not have support for multiple occurrences and code for failure if a match isn't found. That may change depending if my concept pans out or not.

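LIKE Tips

  • For every JOKER there is a pair of patterns (except the first and last).
  • Pattern strings always start and end with a JOKER. Normally the associated data for these is ignored.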

Title: Re: LIKE +
Post by: AIR on November 03, 2018, 05:30:59 AM
Quote from: John
I'm not good enough with regex to know its limitations.

Turns out I needed to loop in order to meet the first requirement, grabbing additional occurrences/hits. 

Quote from: John
My approach is much different. You may end up last to release a working LIKE based on this latest news.   :o

I'm almost done, will post an update shortly....

Quote from: John
I will post what I have soon that will not have support for multiple occurrences and code for failure if a match isn't found. That may change depending if my concept pans out or not.

LIKE Tips

  • For every JOKER there is a pair of patterns (except the first and last).
  • Pattern strings always start and end with a JOKER. Normally the associated data for these is ignored.

This is LIKE++, things can change!!!!  Right now, putting a JOKER at the beginning or end in my implementation causes nothing to be found.  I don't see the need to include that, since you still have access to the line/text that includes what you're looking to extract (position zero).

AIR.
Title: Re: LIKE +
Post by: AIR on November 03, 2018, 09:13:31 AM
Updated version:  Returns multiple occurrences, and allows joker front and back...still a WIP.

Code: C++
/*
 *   "LIKE" keyword challenge submission by AIR
 *
 *   Using C++
 *
 *   Emulates the "*" option used in ScriptBasic's LIKE function
 *
 *   Only tested with the string below, YMMV
 *
 *   Compile with: g++ --std=c++11 like.cpp -o like
 */

#include <iostream>
#include <string>
#include <regex>
#include <vector>

using namespace std;

struct JOKERINFO {
    string text;
    int length;
    int position;
    int object_num;
    int prev_pos;
    // JOKERINFO(const string text,const int length,const int position): text(text),length(length),position(position){}
};

vector< JOKERINFO > JOKER;

int LIKE(string input, string expr) {
    size_t pos = 0;
    smatch match;
    string src(input);

    while( ( pos = expr.find("*", pos) ) != string::npos) {
        if (pos == 0) {
            // Leading joker: the search is unanchored, so just drop it
            expr.erase(pos, 1);
            continue;
        }
        // Every other joker captures one word
        expr.replace(pos, 1, "\\b(\\w+)");
        pos += 1;
    }

    regex term(expr);

    JOKER.clear();      // drop results left over from an earlier call
    int count = 0;
    size_t offset = 0;  // where src begins within the original input
    while ( std::regex_search (src, match, term) ) {

        for (size_t x = 0; x < match.size(); x++) {
            JOKERINFO info;
            info.text = match[x].str();
            info.length = static_cast<int>(match[x].length());
            // match.position() is relative to src, so add the offset
            info.position = static_cast<int>(offset + match.position(x));
            info.object_num = count+1;
            info.prev_pos = 0;  // not used yet
            JOKER.push_back(info);
        }

        count++;

        // Continue searching in the text after this match
        offset += match.position(0) + match.length(0);
        src = match.suffix().str();
    }

    return count;
}

int main(int argc, char **argv) {

    string input = "You can see in the program that the vector 'JOKER' is declared as 'global'.";
    input.append(input);
    // cout << input << "\n\n";

    // SHOULD OUTPUT "JOKER is global"
    // and value of JOKER[x] plus position in original string and length of match
    if ( LIKE( input, "*vector '*' is * as '*'" ) ) {
        // for (int y =0; y < JOKER.size(); y++){
        //     cout << JOKER[y].text << endl;
        // }
        cout << "** FIRST OBJECT **" << endl;
        cout << "OBJECT Number: " << JOKER[3].object_num << "\n\n";
        cout << JOKER[1].text << " is " << JOKER[3].text << "\n";
        cout << JOKER[1].text << " starts at position: " << JOKER[1].position << " and is " << JOKER[1].length << " Characters Long." <<endl;
        cout << JOKER[3].text << " starts at position: " << JOKER[3].position << " and is " << JOKER[3].length << " Characters Long.\n\n";

        cout << "\n\n** SECOND OBJECT **" << endl;
        cout << "OBJECT Number: " << JOKER[5].object_num << "\n\n";
        cout << JOKER[5].text << " is " << JOKER[7].text << "\n";
        cout << JOKER[5].text << " starts at position: " << JOKER[5].position << " and is " << JOKER[5].length << " Characters Long." <<endl;
        cout << JOKER[7].text << " starts at position: " << JOKER[7].position << " and is " << JOKER[7].length << " Characters Long.\n\n";


    }
    return 0;
}

AIR.
Title: Re: LIKE +
Post by: John on November 03, 2018, 11:16:49 AM
I think your original comment about this challenge may be accurate. This is well beyond Floyd or the word count challenge.

If anyone else decides to make a go for it, I highly recommend having Script BASIC installed on your system.
Title: Re: LIKE +
Post by: AIR on November 03, 2018, 11:28:59 AM
A rudimentary BACON version, just to show it's capable, but I'm not finishing this....

Code: Text
RECORD JOKERINFO
    LOCAL text$
    LOCAL length TYPE int
    LOCAL position TYPE int
END RECORD

GLOBAL JOKER[30] TYPE JOKERINFO_type

a$="You can see in the program that the vector 'JOKER' is declared as 'global'."
like$="*vector '*' is * as '*'"

SPLIT like$ BY "*" TO array$ SIZE size

' Each joker sits between array$[x] and array$[x+1], so stop one piece early
FOR x =0 TO size -2
    int i = INSTR(a$,array$[x])
    IF i THEN
        hit$ = INBETWEEN$(a$,array$[x],array$[x+1])
        JOKER[x].text$ = hit$
        JOKER[x].length = LEN(hit$)
        ' The joker starts right after the literal that precedes it
        JOKER[x].position = i+LEN(array$[x])
    END IF
NEXT

PRINT JOKER[1].text$," is ",JOKER[3].text$
PRINT JOKER[1].text$," starts at position: ", JOKER[1].position, " and is ", JOKER[1].length, " Characters in length."
PRINT JOKER[3].text$," starts at position: ", JOKER[3].position, " and is ", JOKER[3].length, " Characters in length."
Title: Re: LIKE +
Post by: John on November 03, 2018, 11:38:07 AM
If my SB LIKEX and JOKERS direction works out, this challenge may turn out to be relatively simple. Patterns are delimiters in full dress.

When this challenge is over, LIKEX is joining the T (Tools) extension module along with the array sort I did in BASIC.
Title: Re: LIKE +
Post by: John on November 03, 2018, 11:43:00 AM
Quote from: AIR
A rudimentary BACON version, just to show it's capable, but I'm not finishing this....

Can you ping Peter and see if he is interested?

I'm surprised Mike hasn't made an appearance with FBSL or Oxygen Basic.
Title: Re: LIKE +
Post by: AIR on November 03, 2018, 06:50:07 PM
Quote from: John
If my SB LIKEX and JOKERS direction works out, this challenge may turn out to be relatively simple.

When this challenge is over, LIKEX is joining the T (Tools) extension module along with the array sort I did in BASIC.

Question:  Will you be re-implementing LIKE/JOKER, or will you be enhancing them with wrapper code/functions?  Either is a valid approach, just wondering....

AIR.
Title: Re: LIKE +
Post by: John on November 03, 2018, 09:05:12 PM
LIKEX is an enhanced version of LIKE but written from scratch in SB and not using native LIKE.

Sorry about the delay in getting something posted. My concept is sound and I'm working on optimizing it. I may change the name of the new LIKE function.

Quote
EXTRACT
remove or take out, especially by effort or force.

MATCH (array) will probably replace JOKER

You know I'm a BASIC guy who is driven to present code in the simplest (most readable) form I'm able to deliver. The Firefox and Floyd challenges should be good enough examples.


Trust me, it will be worth the wait.

Title: Re: LIKE +
Post by: AIR on November 04, 2018, 12:31:21 PM
I like those two keywords, good choices!!

(BTW, BCX/MBC had an EXTRACT$ function, but it acted more like a LEFT$ where instead of a position you provided a string, and would have everything to the left of that returned.)
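
A rough C++ sketch of the behaviour described above, for comparison. The name `extract_left` is made up, and returning the whole string when the marker is absent is an assumption of this sketch, not documented BCX behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns everything to the left of the first
// occurrence of `marker`. Assumption: if `marker` is not found, the whole
// string is returned.
std::string extract_left(const std::string& source, const std::string& marker) {
    std::size_t at = source.find(marker);
    return at == std::string::npos ? source : source.substr(0, at);
}
```

For instance, `extract_left("key=value", "=")` yields `key`.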

Anyway, I was tired of looking at C++, so I decided to play around with the Bacon version I submitted earlier.

One of the things this does, which I don't know if you've considered, is return the number of matches as an integer.  So you can loop over the MATCH array if you want, or use it as a check.

Code: Text
RECORD MATCHINFO
    LOCAL text$ TYPE STRING
    LOCAL capture$ TYPE STRING
    LOCAL length TYPE int
    LOCAL position TYPE int
END RECORD

GLOBAL MATCH[30] TYPE MATCHINFO_type

FUNCTION EXTRACT (source$,pattern$) TYPE int
    LOCAL i TYPE int
    LOCAL res TYPE int
    LOCAL capture$ TYPE STRING

    SPLIT pattern$ BY "*" TO array$ SIZE size
    ' Each capture sits between array$[x] and array$[x+1], so stop one piece early
    FOR x = 0 TO size -2
        i = INSTR(source$,array$[x])
        ' Note: INSTR returns 0 when nothing is found, so this test always passes
        IF i >= 0 THEN
            local info = {0} TYPE MATCHINFO_type
            IF x = 0 THEN
                ' MATCH[0] holds the whole source text
                capture$ = source$
            ELSE
                capture$ = INBETWEEN$(source$,array$[x],array$[x+1])
            END IF
            IF LEN(capture$) THEN
                res = x
                info.text$ = capture$
                info.length = LEN(capture$)
                ' Position of the first occurrence of the captured text
                info.position = INSTR(source$,capture$)
                MATCH[x]=info
            END IF
        END IF
    NEXT
    RETURN res
END FUNCTION

a$="You can see in the program that the array 'MATCH' is declared as 'global'."
like$="*array '*' is * as '*'."

ret = EXTRACT(a$,like$)
IF ret THEN
    PRINT "The number of matched substrings is: ",ret
    PRINT MATCH[1].text$," is a ",MATCH[3].text$, " array of structs/records."
    PRINT MATCH[1].text$," starts at position: ", MATCH[1].position, " and is ", MATCH[1].length, " Characters in length."
    PRINT MATCH[3].text$," starts at position: ", MATCH[3].position, " and is ", MATCH[3].length, " Characters in length."
END IF

AIR.
Title: Re: LIKE +
Post by: John on November 04, 2018, 12:44:49 PM
Quote from: AIR
Anyway, I was tired of looking at C++, so I decided to play around with the Bacon version I submitted earlier.

Great approach!  :D

I wish SB had an INBETWEEN$ function. It would save me a second pass at finalizing the MATCH array. I'm not done optimizing yet so I may get close to your BaCon submission.

Quote from: AIR
One of the things this does, which I don't know if you've considered, is return the number of matches as an integer.

MATCH[0,0] contains the total match occurrences. JOKER(1...n)

Quote from: JRS Post #4
LIKEX will return the number of occurrences, or undef if no matches of the pattern string are found.

I have learned to appreciate a returned value of undef. The refusal of assignment result.
Title: Re: LIKE +
Post by: AIR on November 04, 2018, 07:09:20 PM
Updated BaCon version (tested with attached text file):

Code: Text
' /*
'  *   "EXTRACT" (was "LIKE") keyword challenge submission by AIR
'  *
'  *   Using BaCon version 3.8 on Darwin x86_64
'  *
'  *   Emulates the "*" option used in ScriptBasic's LIKE function
'  *
'  *   This version will return multiple matches, it is up to the
'  *   programmer to decide how to use them. :)
'  *
'  *   Only tested with the attached file, YMMV
'  *
'  *
'  *   TODO:
'  *            Return the entire line of the matched query
'  *            in addition to the match
'  */

RECORD MATCHINFO
    LOCAL text$ TYPE STRING
    LOCAL fulltext$ TYPE STRING
    LOCAL length TYPE int
    LOCAL position TYPE int
END RECORD

GLOBAL MATCH[100] TYPE MATCHINFO_type


FUNCTION EXTRACT (source$,pattern$) TYPE int
    LOCAL i,index TYPE int
    LOCAL res TYPE int
    LOCAL capture$ TYPE STRING

    ' Work line by line, and break the pattern into its literal pieces
    SPLIT source$ BY NL$ TO list$ SIZE list_size
    SPLIT pattern$ BY "*" TO array$ SIZE size

    index = 0
    FOR y = 0 TO list_size-1
        FOR x = 0 TO size -1
            ' PRINT LEN(list$[y])
            ' Skip empty lines
            IF LEN(list$[y]) > 0 THEN
                i = INSTR(list$[y],array$[x])
                IF i >= 0 THEN
                    local info = {0} TYPE MATCHINFO_type
                    IF x = 0 THEN
                        capture$ = list$[y]
                    ELSE
                        IF LEN(capture$) THEN
                            capture$ = INBETWEEN$(list$[y],array$[x],array$[x+1])
                        END IF
                    END IF
                    IF LEN(capture$) and x THEN
                        info.text$ = capture$
                        info.length = LEN(capture$)
                        info.position = INSTR(source$,capture$)
                        MATCH[index]=info
                        INCR index
                        res = index
                    END IF
                END IF
            END IF
        NEXT
    NEXT
    RETURN res
END FUNCTION

a$=LOAD$("anchors.txt")
pattern$="*<a href=*>*"


ret = EXTRACT(a$,pattern$)

IF ret THEN
    PRINT NL$, "Using EXTRACT pattern: ", pattern$, NL$
    FOR x = 0 TO ret -1
        PRINT MATCH[x].text$," starts at position: ", MATCH[x].position, " and is ", MATCH[x].length, " Characters in length.",NL$
    NEXT
ELSE
    PRINT "No Matches Found."
END IF

AIR.
Title: Re: LIKE +
Post by: John on November 04, 2018, 11:10:14 PM
Maybe the next challenge could be SPLITA.  :)
Title: Re: LIKE +
Post by: jalih on November 05, 2018, 05:19:19 AM
Quote from: John
Maybe the next challenge could be SPLITA.  :)

On that challenge 8th programming language would be really hard to beat...
Title: Re: LIKE +
Post by: John on November 05, 2018, 11:35:33 AM
I think I should be effective with regex before trying another cryptic tool like 8th.
Title: Re: LIKE +
Post by: AIR on November 05, 2018, 01:35:40 PM
Well, it's not so much about if the language can do it, but about how would you implement it.

Python makes it simple too....

AIR.
Title: Re: LIKE +
Post by: AIR on November 05, 2018, 01:49:49 PM
Quote from: John
I think I should be effective with regex before trying another cryptic tool like 8th.

Regex is one of those things that will drive you crazy until you have an "AHA!" moment and it makes sense.

Perl is a good way to practice; PCRE was created with that syntax in mind. If I recall, the SB "RE" module is POSIX-based, meaning it uses the POSIX syntax, which is an entirely different way of doing stuff.
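
To make the difference concrete, here is a small C++ sketch. The standard `<regex>` library accepts both an ECMAScript grammar (Perl-like, its default) and POSIX extended syntax; `first_match` is a made-up helper, and nothing here says anything about how the SB RE module itself behaves.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `text` under
// the given regex grammar, or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern,
                        std::regex::flag_type grammar) {
    std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With the text `Firefox 63.0.1`, the Perl-style pattern `\d+` under `std::regex::ECMAScript` and the POSIX bracket expression `[[:digit:]]+` under `std::regex::extended` both pick out `63`. Shorthand classes such as `\d` and lazy quantifiers such as `.+?` belong to the Perl-style family and are not part of POSIX extended syntax.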
Title: Re: LIKE +
Post by: John on November 05, 2018, 01:51:33 PM
Keep in mind that the split BY can consist of more than one character.
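
A minimal C++ sketch of splitting on a delimiter that is longer than one character. The name `split_by` is made up; keeping empty pieces and returning the text unsplit for an empty delimiter are assumptions of this sketch, and SPLITA may treat those cases differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on every occurrence of `delim`, which
// may be any length. Empty pieces are kept. An empty delimiter returns the
// text as a single piece.
std::vector<std::string> split_by(const std::string& text,
                                  const std::string& delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {
        parts.push_back(text);
        return parts;
    }
    std::size_t from = 0, at;
    while ((at = text.find(delim, from)) != std::string::npos) {
        parts.push_back(text.substr(from, at - from));
        from = at + delim.size();   // skip the whole delimiter, not one char
    }
    parts.push_back(text.substr(from));  // remainder after the last delimiter
    return parts;
}
```

Splitting `a<>b<>c` by `<>` gives the three pieces `a`, `b` and `c`.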

Quote
If I recall, the SB "RE" module is POSIX-based, meaning it uses the POSIX syntax, which is an entirely different way of doing stuff.

It now makes sense why none of my testing of the RE module worked.
Title: Re: LIKE +
Post by: John on November 05, 2018, 09:14:00 PM
Another feature I'm planning on adding to the Script BASIC EXTRACT: if the user passes a negative value in the optional third argument, EXTRACT will act LIKE a function and return the MATCH (just the value MATCH[-x,0]) for that placeholder position rather than the default occurrence count. If no optional argument is passed, EXTRACT emulates SB native LIKE, returning TRUE / FALSE.

Curious if BaCon can return various types? (numeric or a string)

Title: Re: LIKE +
Post by: AIR on November 05, 2018, 09:58:26 PM
I don't think you can do that in BaCon with an array/associative array, but you should be able to do it using an array of RECORDs (struct, in C parlance).  But the number of entries per RECORD is fixed and typed, so if you set up a string as a field, you can't directly place a number in there.  You may be able to just set up longs and pass the address of the variables, but then you get into some C-pointer-like territory...

SB finagles this by essentially treating everything as a string, so you're able to do this in your arrays....
Title: Re: LIKE +
Post by: John on November 05, 2018, 11:25:08 PM
SB doesn't allow returning arrays. You have to pass them back and forth as byref (default) arguments. SB variables are in the form of a variant.

The MATCH array is created in the EXTRACT function but it's a global variable not a local array.

Title: Re: LIKE +
Post by: AIR on November 09, 2018, 10:06:06 PM
Hey John, did you ever finish the SB version?
Title: Re: LIKE +
Post by: John on November 10, 2018, 07:53:00 AM
Close but got sidetracked playing in the sandbox. I hope to have some time this weekend to finish it.
 
Title: Re: LIKE +
Post by: AIR on November 10, 2018, 11:43:49 AM
I figured as much.  I'm really interested in what you come up with, especially the lower level module bits. 

Title: Re: LIKE +
Post by: John on November 10, 2018, 08:18:26 PM
This is a very early release that gets the pattern's MATCH items but not the other data that surrounds them (the irrelevant jokers).

Code: ScriptBasic
' EXTRACT (LIKE +) Code Challenge - Script BASIC / JRS

html = """<!DOCTYPE html>
<html>
 <head>
   <title>AllBASIC.INFO Forum LIKE Code Challenge</title>
 </head>
 <body>
LIKE it or don't.
 </body>
</html>"""

FUNCTION EXTRACT(base, mask)
  UNDEF MATCH
  start = 1
  match_idx = 2
  SPLITA mask BY "*" TO patterns
  FOR idx = 0 TO UBOUND(patterns) STEP 2
    MATCH[match_idx, 1] = INSTR(base, patterns[idx], start)
    start = MATCH[match_idx, 1]
    MATCH[match_idx, 2] = INSTR(base, patterns[idx + 1], start)
    start = MATCH[match_idx, 2]
    MATCH[match_idx, 1] = MATCH[match_idx, 1] + LEN(patterns[idx])
    MATCH[match_idx, 2] = MATCH[match_idx, 2] - MATCH[match_idx, 1]
    MATCH[match_idx, 0] = MID(base, MATCH[match_idx, 1], MATCH[match_idx, 2])
    match_idx += 2
  NEXT
END FUNCTION

EXTRACT html, "*<title>*</title>*<body>*</body>*"

PRINT "Title:  ", MATCH[2, 0],"\n"
PRINT "Start:  ", MATCH[2, 1],"\n"
PRINT "Length: ", MATCH[2, 2],"\n"
PRINTNL
PRINT "Body:   ", MATCH[4, 0],"\n"
PRINT "Start:  ", MATCH[4, 1],"\n"
PRINT "Length: ", MATCH[4, 2],"\n"


$ time scriba extract.sb
Title:  AllBASIC.INFO Forum LIKE Code Challenge
Start:  44
Length: 39

Body:   
LIKE it or don't.
 
Start:  110
Length: 21

real   0m0.007s
user   0m0.008s
sys   0m0.000s
$


BTW: Firefox for Windows is now at 63.0.1.
Title: Re: LIKE +
Post by: John on November 11, 2018, 08:47:19 AM
The EXTRACT function seems to work fine in the Firefox download challenge as a replacement for LIKE. One issue I have discovered with this version of EXTRACT is that a pattern needs to be more than one character.

Code: ScriptBasic
' Firefox Download Challenge (Script BASIC Version) by JRS.

INCLUDE curl.bas

FUNCTION EXTRACT(base, mask)
  UNDEF MATCH
  start = 1
  match_idx = 2
  SPLITA mask BY "*" TO patterns
  FOR idx = 0 TO UBOUND(patterns) STEP 2
    MATCH[match_idx, 1] = INSTR(base, patterns[idx], start)
    start = MATCH[match_idx, 1]
    MATCH[match_idx, 2] = INSTR(base, patterns[idx + 1], start)
    start = MATCH[match_idx, 2]
    MATCH[match_idx, 1] = MATCH[match_idx, 1] + LEN(patterns[idx])
    MATCH[match_idx, 2] = MATCH[match_idx, 2] - MATCH[match_idx, 1]
    MATCH[match_idx, 0] = MID(base, MATCH[match_idx, 1], MATCH[match_idx, 2])
    match_idx += 2
  NEXT
  ' Number of pattern pairs; the parentheses make the addition happen before the division
  EXTRACT = (UBOUND(patterns) + 1) / 2
END FUNCTION


ch = curl::init()
curl::option(ch,"URL","https://www.mozilla.org/en-US/firefox/new/")
wp = curl::perform(ch)

IF EXTRACT(wp, """*data-latest-firefox="*" data-esr-versions*<div id="other-platforms">*<li class="os_win64">*<a href="*"
           class*""") THEN
  PRINT "Downloading Latest 64Bit Firefox (",MATCH[2, 0],") for Windows.\n"
  curl::option(ch,"FOLLOWLOCATION", 1)
  curl::option(ch,"NOPROGRESS",0)
  curl::option(ch,"FILE","Firefox_Setup-" & MATCH[2, 0] & ".exe")
  curl::option(ch,"URL", MATCH[6, 0])
  curl::perform(ch)
  PRINTNL
  PRINT "Firefox_Setup-" & MATCH[2, 0] & ".exe Downloaded ",FORMAT("%~##,###,###~ Bytes",curl::info(ch,"SIZE_DOWNLOAD")), _
  " at ",FORMAT("%~##,###,###~ Bytes/Second",curl::info(ch,"SPEED_DOWNLOAD")),".\n"
ELSE
  PRINT "<< ERROR >>\n"
END IF

curl::finish(ch)


$ time scriba ff_extract.sb
Downloading Latest 64Bit Firefox (63.0.1) for Windows.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   134  100   134    0     0    611      0 --:--:-- --:--:-- --:--:--   611
100 42.3M  100 42.3M    0     0  22.1M      0  0:00:01  0:00:01 --:--:-- 34.3M

Firefox_Setup-63.0.1.exe Downloaded 44,396,144 Bytes at 23,219,740 Bytes/Second.

real   0m2.201s
user   0m1.075s
sys   0m0.214s
$

Title: Re: LIKE +
Post by: John on November 12, 2018, 05:07:15 PM
I will be posting another update to EXTRACT with better error handling and an optional argument to specify which MATCH item you want returned by EXTRACT rather than the total MATCH count. With multiple occurrences, deciphering which MATCH index to select for a given occurrence gets unmanageable.
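
As a rough illustration of the optional-argument idea, here is a minimal Python sketch. It is not the Script BASIC implementation: the function name extract, the select parameter, the 1-based select numbering, and returning 0 on failure are all assumptions made for this example. Only text enclosed by a pair of literal patterns is captured.

```python
def extract(source, mask, select=None):
    """Return the number of captured jokers, or one joker's text.

    Hypothetical helper for illustration only. Jokers are captured only
    when enclosed by a pair of literal patterns. Returns 0 if any
    pattern is missing or if select is out of range (an assumption).
    """
    # Literal patterns are whatever sits between the '*' jokers.
    patterns = [p for p in mask.split("*") if p]
    matches = []
    pos = 0
    # Consume the patterns in pairs: opening text, then closing text.
    for opener, closer in zip(patterns[0::2], patterns[1::2]):
        begin = source.find(opener, pos)
        if begin < 0:
            return 0
        begin += len(opener)
        end = source.find(closer, begin)
        if end < 0:
            return 0
        # Store text, 1-based start position, and length.
        matches.append((source[begin:end], begin + 1, end - begin))
        pos = end
    if select is not None:
        if 1 <= select <= len(matches):
            return matches[select - 1][0]
        return 0
    return len(matches)


doc = "<title>Hello</title><body>World</body>"
print(extract(doc, "*<title>*</title>*<body>*</body>*"))
print(extract(doc, "*<title>*</title>*<body>*</body>*", 2))
```

With select omitted the sketch returns the match count; with select given it returns that match's text directly, so the caller does not have to work out an index afterwards.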

Title: Re: LIKE +
Post by: John on November 12, 2018, 06:08:37 PM
Script BASIC 

Note: Only JOKERS surrounded by match patterns are returned. Others are ignored. If you need them, use LIKE. I have REM'ed the UNDEF MATCH in this example to show both options EXTRACT offers. I tested the EXTRACT function with the Firefox download page and it returned the correct number of JOKERS and returned the data requested using the optional argument.

Code: ScriptBasic
'' EXTRACT (LIKE +) Code Challenge - Script BASIC / JRS

html = """<!DOCTYPE html>
<html>
 <head>
   <title>AllBASIC.INFO Forum LIKE Code Challenge</title>
 </head>
 <body>
LIKE it or don't.
 </body>
</html>"""

FUNCTION EXTRACT(base, mask, select)
  UNDEF MATCH
  start = 1
  match_idx = 2
  SPLITA mask BY "*" TO patterns
  FOR idx = 0 TO UBOUND(patterns) STEP 2
    MATCH[match_idx, 1] = INSTR(base, patterns[idx], start)
    IF MATCH[match_idx, 1] <> undef THEN
      start = MATCH[match_idx, 1]
    ELSE
      GOTO MATCH_ERROR
    END IF
    MATCH[match_idx, 2] = INSTR(base, patterns[idx + 1], start)
    IF MATCH[match_idx, 2] <> undef THEN
      start = MATCH[match_idx, 2]
    ELSE
      GOTO MATCH_ERROR
    END IF
    MATCH[match_idx, 1] = MATCH[match_idx, 1] + LEN(patterns[idx])
    MATCH[match_idx, 2] = MATCH[match_idx, 2] - MATCH[match_idx, 1]
    MATCH[match_idx, 0] = MID(base, MATCH[match_idx, 1], MATCH[match_idx, 2])
    match_idx += 2
  NEXT
  IF select <> undef AND select <= UBOUND(patterns) + 1 THEN
    EXTRACT = MATCH[select, 0]
    ' UNDEF MATCH
  ELSE
    EXTRACT = UBOUND(patterns) + 2
  END IF
  GOTO DONE

MATCH_ERROR:
  EXTRACT = 0

DONE:
END FUNCTION

PRINT EXTRACT(html, "*<title>*</title>*<body>*</body>*", 2), "\n"
PRINTNL
PRINT "Title:  ", MATCH[2, 0],"\n"
PRINT "Start:  ", MATCH[2, 1],"\n"
PRINT "Length: ", MATCH[2, 2],"\n"
PRINTNL
PRINT "Body:   ", MATCH[4, 0],"\n"
PRINT "Start:  ", MATCH[4, 1],"\n"
PRINT "Length: ", MATCH[4, 2],"\n"


$ time scriba extract.sb
AllBASIC.INFO Forum LIKE Code Challenge

Title:  AllBASIC.INFO Forum LIKE Code Challenge
Start:  44
Length: 39

Body:   
LIKE it or don't.
 
Start:  110
Length: 21

real   0m0.009s
user   0m0.005s
sys   0m0.004s
$
Title: Re: LIKE +
Post by: John on November 13, 2018, 02:59:58 PM
Is there anyone besides me willing to post a working EXTRACT / MATCH submission for this code challenge?
Title: Re: LIKE +
Post by: AIR on November 13, 2018, 04:34:42 PM
Quote from: John
Is there anyone besides me willing to post a working EXTRACT / MATCH submission for this code challenge?

I think they got tired of waiting for yours... ;D ;D ;D ;D

Seriously, nice work!!

AIR.
Title: Re: LIKE +
Post by: John on November 13, 2018, 06:02:54 PM
I had the concept from the start. The sandbox project took more of my time than I anticipated.

Thanks! I think it turned out nice as well.
Title: Re: LIKE +
Post by: AIR on November 13, 2018, 06:16:48 PM
Here's a PYTHONIC take on your approach:

Code: Python
#!/usr/bin/env python3

html = """<!DOCTYPE html>
<html>
 <head>
   <title>AllBASIC.INFO Forum LIKE Code Challenge</title>
 </head>
 <body>
LIKE it or don't.
 </body>
</html>"""

MATCH = {}

def extract(source, mask):
    MATCH.clear()
    try:
        d = mask.split('*')
        for i, x in enumerate(d):
            # Odd indexes hold the opening pattern of each pair.
            if x and i % 2 != 0:
                start = source.index(d[i]) + len(d[i])
                end = source.index(d[i+1], start)
                if i == 1: MATCH[0] = source[start:end]
                MATCH[i] = (source[start:end].strip(), start, end - start)
        return MATCH[0]
    except (ValueError, IndexError, KeyError):
        # A pattern was not found, or the mask had no complete pair.
        return ""

print(extract(html, "*<title>*</title>*<body>*</body>*"), "\n")

print("Title: ", MATCH[1][0])
print("Start: ", MATCH[1][1])
print("Length: ", MATCH[1][2], "\n")

print("Body: ", MATCH[3][0])
print("Start: ", MATCH[3][1])
print("Length: ", MATCH[3][2])

[riveraa@MacDev ~/Projects/Python/Extract] $ ./extract.py
AllBASIC.INFO Forum LIKE Code Challenge

Title:  AllBASIC.INFO Forum LIKE Code Challenge
Start:  43
Length:  39

Body:  LIKE it or don't.
Start:  109
Length:  21


AIR.
Title: Re: LIKE +
Post by: John on November 13, 2018, 06:33:34 PM
That looks great!

Python is definitely my second choice in scripting languages. Perl comes in third.

Peter was a fan of Python and Perl. You can see traces of these languages in Script BASIC.

I think your start positions are off by one. Do Python strings start at zero?

Title: Re: LIKE +
Post by: John on November 14, 2018, 09:04:33 PM
AIR,

This may or may not be of interest to you.

https://winpython.github.io/

Title: Re: LIKE +
Post by: AIR on November 14, 2018, 10:19:27 PM
Quote from: John
I think your start positions are off by one. Do Python strings start at zero?

Yes.
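
That accounts for the 43 versus 44 and 109 versus 110 in the two outputs above: Python's str.index returns a 0-based offset, while the Script BASIC listing reports positions counted from 1. A small sketch of the conversion follows; the helper name one_based and the sample string are made up for this example.

```python
def one_based(offset):
    """Convert a 0-based Python offset into a 1-based position."""
    return offset + 1


text = "<title>Hi</title>"
# 0-based offset of the first character after the opening pattern.
start = text.index("<title>") + len("<title>")
print(start, one_based(start))  # prints: 7 8
```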
Title: Re: LIKE +
Post by: John on November 14, 2018, 10:50:52 PM
That sucks!
Title: Re: LIKE +
Post by: AIR on November 15, 2018, 06:40:13 AM
Zero-based indexing is pretty much the standard.
Title: Re: LIKE +
Post by: John on November 15, 2018, 07:19:14 AM
For arrays, not string starting position.
Title: Re: LIKE +
Post by: AIR on November 15, 2018, 11:44:36 AM
strings = char arrays = zero-based

Code: C
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv) {

    char *string = "Hello, World!";

    printf("string = %s\n",string);
    printf("string[0] = %c\n", string[0]);
    printf("string[1] = %c\n", string[1]);

    return 0;

}

[arivera1271@arivera1271 ~/Projects/C] $ gcc string.c -o string
[arivera1271@arivera1271 ~/Projects/C] $ ./string
string = Hello, World!
string[0] = H
string[1] = e
[arivera1271@arivera1271 ~/Projects/C] $


AIR.

Edit: removed redundant strdup for this example...
Title: Re: LIKE +
Post by: John on November 15, 2018, 12:39:41 PM
I'm good with zero based character arrays emulating strings in various languages.
Title: Re: LIKE +
Post by: AIR on November 15, 2018, 07:55:18 PM
Quote from: John
AIR,

This may or may not be of interest to you.

https://winpython.github.io/

Way too much bloat for my taste, but thanks!

AIR.
Title: Re: LIKE +
Post by: John on November 15, 2018, 08:35:26 PM
The only reason I thought they might have a chance with you is their efforts to containerize the environment. I didn't go far enough to see if there was a weight problem.

SB spoils me again only having < 800KB footprint. If I don't care about scriba's command line utility extensions, SB would have a 4KB footprint with the interpreter running out of a DLL (.so)

Where SB shines is running multi-threaded using a shared object already resident in memory.