The STREGEX function performs regular expression matching against the strings contained in StringExpression. STREGEX can either perform a simple Boolean True/False evaluation of whether a match occurred, or return the position and length of each match within the strings. The regular expressions accepted by this routine correspond to POSIX Extended Regular Expressions and are similar to those used by such UNIX tools as egrep, lex, awk, and Perl.
For more information about regular expressions, see Learning About Regular Expressions.
STREGEX is based on the regex package written by Henry Spencer, modified by RSI only to the extent required to integrate it into IDL. This package is freely available at ftp://zoo.toronto.edu/pub/regex.shar.
Result = STREGEX( StringExpression, RegularExpression [, /BOOLEAN | , /EXTRACT | , LENGTH=variable [, /SUBEXPR]] [, /FOLD_CASE] )
By default, STREGEX returns the position and length of the matched string within StringExpression. If no match is found, -1 is returned for both of these. Optionally, it can return a Boolean True/False result of the match, or the matched strings.
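As a minimal sketch of the default behavior described above, the following compares a string that matches with one that does not. The input array is made up for illustration, and the expected values in the comments follow from the zero-based positions that STRMID uses.

```idl
; Illustrative input: the first string contains 'abccc', the second has no match.
str = ['aaabccc', 'xyz']

; With no keywords set, the result holds the starting position of each match.
pos = STREGEX(str, 'abc+')

; Expected values: 2 for 'aaabccc' (the match starts at the third character)
; and -1 for 'xyz' (no match).
PRINT, pos
```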
StringExpression is the string, or array of strings, to be matched.
RegularExpression is a scalar string containing the regular expression to match. See Learning About Regular Expressions for a description of the meta characters that can be used in a regular expression.
Normally, STREGEX returns the position of the first character in StringExpression that matches RegularExpression. Setting BOOLEAN modifies this behavior to simply return a True/False value indicating if a match occurred or not.
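One common use of BOOLEAN is to build an index of matching elements with WHERE. The sketch below assumes a small made-up string array; the variable names are illustrative only.

```idl
; Illustrative input array.
str = ['foot', 'Feet', 'fate']

; BOOLEAN yields one True/False value per input string.
; Expected values: 1 0 1 (the match is case-sensitive, so 'Feet' fails).
PRINT, STREGEX(str, '^f', /BOOLEAN)

; Select the elements that contain 'ee'; count guards against an empty result.
idx = WHERE(STREGEX(str, 'ee', /BOOLEAN), count)
IF count GT 0 THEN PRINT, str[idx]
```

Checking count before indexing avoids subscripting with the -1 that WHERE returns when nothing matches.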
Normally, STREGEX returns the position of the first character in StringExpression that matches RegularExpression. Setting EXTRACT modifies this behavior to simply return the matched substrings. The EXTRACT keyword cannot be used with either BOOLEAN or LENGTH.
Regular expression matching is normally a case-sensitive operation. Set FOLD_CASE to perform case-insensitive matching instead.
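The effect of FOLD_CASE can be seen by running the same pattern with and without the keyword. This is a small sketch with made-up input; BOOLEAN is used only to keep the output short.

```idl
; Illustrative input: one lowercase and one capitalized word.
str = ['foot', 'Feet']

; Case-sensitive by default. Expected values: 1 0
PRINT, STREGEX(str, '^f', /BOOLEAN)

; Case-insensitive with FOLD_CASE. Expected values: 1 1
PRINT, STREGEX(str, '^f', /BOOLEAN, /FOLD_CASE)
```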
If present, specifies a variable to receive the lengths of the matches. Together with the result of this function, which contains the starting points of the matches in StringExpression, LENGTH can be used with the STRMID function to extract the matched substrings. The LENGTH keyword cannot be used with either BOOLEAN or EXTRACT.
By default, STREGEX reports only the overall match. Setting SUBEXPR causes it to report the overall match as well as any subexpression matches. A subexpression is any part of a regular expression written within parentheses. For example, the regular expression '(a)(b)(c+)' has 3 subexpressions, whereas the functionally equivalent 'abc+' has none. The SUBEXPR keyword cannot be used with BOOLEAN.
If a subexpression participated in the match several times, the reported substring is the last one it matched. In particular, when the regular expression '(b*)+' matches 'bbb', the parenthesized subexpression matches the three 'b' characters and then an infinite number of empty strings following the last 'b', so the reported substring is one of the empty strings. This occurs because '*' matches zero or more instances of the character that precedes it.
To return multiple positions and lengths for each input string, the result has one more dimension than StringExpression when SUBEXPR is set: a new first dimension is added, holding the overall match followed by each subexpression match.
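The added first dimension can be inspected with HELP. The following is a sketch using a made-up two-element input; the expected shape and values in the comments assume that the overall match comes first, followed by the three subexpressions.

```idl
; Illustrative input: two strings, each containing a match.
str = ['aaabccc', 'xabcx']

pos = STREGEX(str, '(a)(b)(c+)', LENGTH=len, /SUBEXPR)

; Expected shape: 4 x 2 (overall match plus 3 subexpressions, for each of 2 strings).
HELP, pos

; Positions and lengths for the first string.
; Expected values: positions 2 2 3 4 and lengths 5 1 1 3.
PRINT, pos[*, 0]
PRINT, len[*, 0]
```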
To match a string starting with an "a", followed by a "b", followed by 1 or more "c":
pos = STREGEX('aaabccc', 'abc+', LENGTH=len)
PRINT, STRMID('aaabccc', pos, len)
IDL prints:
abccc
To perform the same match, and also find the locations of the three parts:
pos = STREGEX('aaabccc', '(a)(b)(c+)', LENGTH=len, /SUBEXPR)
PRINT, STRMID('aaabccc', pos, len)
IDL prints:
abccc a b ccc
Or more simply:
PRINT, STREGEX('aaabccc', '(a)(b)(c+)', /SUBEXPR, /EXTRACT)
IDL prints:
abccc a b ccc
This example searches a string array for words of any length beginning with "f" and ending with "t" without the letter "o" in between:
str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'affluent']
PRINT, STREGEX(str, '^f[^o]*t$', /EXTRACT, /FOLD_CASE)
This statement results in:
Feet FAST ferret
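Note the following about this example:
Unlike the * wildcard character used by STRMATCH, the * meta character used by STREGEX applies to the item directly on its left, which in this case is [^o], meaning "any character except the letter o". Therefore, [^o]* means "zero or more characters that are not o".
The anchors (^ and $) restrict the match to whole words that begin with "f" and end with "t". Without the ^ anchor, STREGEX would also return "ffluent" (a substring of "affluent"); without the $ anchor, it would also return "fat" (a substring of "fate").
By comparison, the following STRMATCH statement selects words that begin with "f", end with "t", and have a second character other than "o":
PRINT, str[WHERE(STRMATCH(str, 'f[!o]*t', /FOLD_CASE) EQ 1)]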
See also: STRCMP, STRJOIN, STRMATCH, STRMID, STRPOS, STRSPLIT