Core Function Regex Match
<Expression> =~ m/pattern/flags
m/pattern/flags
Description
Match a string against a regular expression pattern, report whether it matches, and optionally return the captured groups.
Parameters
Expression
Any valid expression that is a string or an array.
If you omit this part, $_ is used instead.
Note - With an array, the match can only check whether an item in the array matches and return its groups in $1, $2, and so on; returning all matches is not supported at this time.
pattern
The regular expression pattern to match.
flags
Optional; The flags to use in the pattern.
i = Ignore case.
m = Treat the string as multiple lines.
s = Treat the string as a single line.
o = Do not recompile the regular expression after the first compile (improves the speed of your matches if you run the pattern many times).
g = Match all occurrences of the pattern in the string (default is to match only the first).
d = Return a single dimension array when using flag "g" (default is to return a multidimensional array).
k = Leave out group 0 (the whole match) when creating the Regexp group array.
a = Instead of returning the matched item as a string, return an array containing the matched item, its index (where it was found) and its length (the length of the match).
z = Same as flag "a", but returns an array of the index and length of the whole match if it cannot find any group matches.
t = By default, the regular expression engine searches from left to right. You can reverse the search direction by using this flag.
n = Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name> subexpression).
b = Require all elements in an array to match instead of any.
v = Match against array keys instead of values (it will only use string keys, not numeric keys).
c = Ignore cultural differences in language.
p = Do not parse the Regexp pattern for variables etc.
x = Allows newlines and comments and ignores whitespace in the Regexp.
Note - If flag "g" is used, the $_rg array contains all matches in order from first to last; it does not start with the overall matched text, it only contains the matches. If "g" is not used, the first element is the matched text, followed by the matched groups 1, 2, 3, 4, and so on.
Default: None of the flags are used by default.
Return Value
Success: Returns 1 if the match was successful.
Failure: Returns 0.
Remarks
To iterate with an array, you must use 'while()' or 'foreach'; other constructs will not work.
Note - The following applies only to single matches, not to flag "g" matches.
If a regular expression pattern matches and it contains capture groups, the groups are assigned to local variables, for example:
Group 0 will be $0.
Group 1 will be $1.
Group 2 will be $2.
And so on.
Of course captured NAMED groups will also be returned as follows:
Named group "Test" will be $_rg["Test"]
Named group "Moo" will be $_rg["Moo"]
Named group "Cat" will be $_rg["Cat"]
And so on.
After each regular expression match, all capture groups from the previous match are deleted, so it is best to copy them if you intend to keep using them.
Regular Expressions
Regular expression notation is a compact way of specifying a pattern for strings that can be searched. Regular expressions are character strings in which plain text characters indicate what text should exist in the target string, and some characters are given special meanings to indicate what variability is allowed in the target string. Regular expressions are normally case-sensitive.
Regular expressions are constructed of one or more of the following simple regular expression specifiers. If the character is not in the following table, then it will match only itself.
Repeating characters (*, +, ?, {...} ) will try to match the largest set possible, which allows the following characters to match as well, unless followed immediately by a question mark; then it will find the smallest pattern that allows the following characters to match as well.
Nested groups are allowed, but keep in mind that all the groups, except non-capturing groups, assign to the returned array, with the outer groups assigning after the inner groups.
Character Escapes
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.
Escaped character    Description    Pattern    Matches
\a    Matches a bell character, \u0007.    \a    "\u0007" in "Error!" + '\u0007'
\b    In a character class, matches a backspace, \u0008.    [\b]{3,}    "\b\b\b\b" in "\b\b\b\b"
\t    Matches a tab, \u0009.    (\w+)\t    "item1\t", "item2\t" in "item1\titem2\t"
\r    Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.)    \r\n(\w+)    "\r\nThese" in "\r\nThese are\ntwo lines."
\v    Matches a vertical tab, \u000B.    [\v]{2,}    "\v\v\v" in "\v\v\v"
\f    Matches a form feed, \u000C.    [\f]{2,}    "\f\f\f" in "\f\f\f"
\n    Matches a new line, \u000A.    \r\n(\w+)    "\r\nThese" in "\r\nThese are\ntwo lines."
\e    Matches an escape, \u001B.    \e    "\x001B" in "\x001B"
\nnn    Uses octal representation to specify a character (nnn consists of two or three digits).    \w\040\w    "a b", "c d" in "a bc d"
\xnn    Uses hexadecimal representation to specify a character (nn consists of exactly two digits).    \w\x20\w    "a b", "c d" in "a bc d"
\cX or \cx    Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character.    \cC    "\x0003" in "\x0003" (Ctrl-C)
\unnnn    Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn).    \w\u0020\w    "a b", "c d" in "a bc d"
\    When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A, and \. is the same as \x2E. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by \* or \?).    \d+[\+-x\*]\d+    "2+2" and "3*9" in "(2+2) * 3*9"
Repeating Characters
Character classes
A character class matches any one of a set of characters. Character classes include the language elements listed in the following table.
Character class    Description    Pattern    Matches
[character_group]    Matches any single character in character_group. By default, the match is case-sensitive.    [ae]    "a" in "gray"; "a", "e" in "lane"
[^character_group]    Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive.    [^aei]    "r", "g", "n" in "reign"
[first-last]    Character range: Matches any single character in the range from first to last.    [A-Z]    "A", "B" in "AB123"
.    Wildcard: Matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.).    a.e    "ave" in "nave"; "ate" in "water"
\p{name}    Matches any single character in the Unicode general category or named block specified by name.    \p{Lu}; \p{IsCyrillic}    "C", "L" in "City Lights"; "Д", "Ж" in "ДЖem"
\P{name}    Matches any single character that is not in the Unicode general category or named block specified by name.    \P{Lu}; \P{IsCyrillic}    "i", "t", "y" in "City"; "e", "m" in "ДЖem"
\w    Matches any word character.    \w    "I", "D", "A", "1", "3" in "ID A1.3"
\W    Matches any non-word character.    \W    " ", "." in "ID A1.3"
\s    Matches any white-space character.    \w\s    "D " in "ID A1.3"
\S    Matches any non-white-space character.    \s\S    " _" in "int __ctr"
\d    Matches any decimal digit.    \d    "4" in "4 = IV"
\D    Matches any character other than a decimal digit.    \D    " ", "=", " ", "I", "V" in "4 = IV"
Anchors
Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors.
Assertion    Description    Pattern    Matches
^    The match must start at the beginning of the string or line.    ^\d{3}    "901" in "901-333-"
$    The match must occur at the end of the string or before \n at the end of the line or string.    -\d{3}$    "-333" in "-901-333"
\A    The match must occur at the start of the string.    \A\d{3}    "901" in "901-333-"
\Z    The match must occur at the end of the string or before \n at the end of the string.    -\d{3}\Z    "-333" in "-901-333"
\z    The match must occur at the end of the string.    -\d{3}\z    "-333" in "-901-333"
\G    The match must occur at the point where the previous match ended.    \G\(\d\)    "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)"
\b    The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character.    \b\w+\s\w+\b    "them theme", "them them" in "them theme them them"
\B    The match must not occur on a \b boundary.    \Bend\w*\b    "ends", "ender" in "end sends endure lender"
Grouping Constructs
Grouping constructs delineate subexpressions of a regular expression and typically capture substrings of an input string. Grouping constructs include the language elements listed in the following table.
Grouping construct    Description    Pattern    Matches
(subexpression)    Captures the matched subexpression and assigns it a one-based ordinal number (group 0 is the whole match).    (\w)\1    "ee" in "deep"
(?<name> subexpression)    Captures the matched subexpression into a named group.    (?<double>\w)\k<double>    "ee" in "deep"
(?<name1-name2> subexpression)    Defines a balancing group definition.    (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$    "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))"
(?: subexpression)    Defines a noncapturing group.    Write(?:Line)?    "WriteLine" in "Console.WriteLine()"
(?imnsx-imnsx: subexpression)    Applies or disables the specified options within subexpression.    A\d{2}(?i:\w+)\b    "A12xl", "A12XL" in "A12xl A12XL a12xl"
(?= subexpression)    Zero-width positive lookahead assertion.    \w+(?=\.)    "is", "ran", and "out" in "He is. The dog ran. The sun is out."
(?! subexpression)    Zero-width negative lookahead assertion.    \b(?!un)\w+\b    "sure", "used" in "unsure sure unity used"
(?<= subexpression)    Zero-width positive lookbehind assertion.    (?<=19)\d{2}\b    "99", "50", "05" in "1851 1999 1950 1905 2003"
(?<! subexpression)    Zero-width negative lookbehind assertion.    (?<!19)\d{2}\b    "51", "03" in "1851 1999 1950 1905 2003"
(?> subexpression)    Nonbacktracking (or "greedy") subexpression.    [13579](?>A+B+)    "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC"
Quantifiers
A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. Quantifiers include the language elements listed in the following table.
Quantifier    Description    Pattern    Matches
*    Matches the previous element zero or more times.    \d*\.\d    ".0", "19.9", "219.9"
+    Matches the previous element one or more times.    be+    "bee" in "been", "be" in "bent"
?    Matches the previous element zero or one time.    rai?n    "ran", "rain"
{n}    Matches the previous element exactly n times.    ,\d{3}    ",043" in "1,043.6", ",876", ",543", and ",210" in "9,876,543,210"
{n,}    Matches the previous element at least n times.    \d{2,}    "166", "29", "1930"
{n,m}    Matches the previous element at least n times, but no more than m times.    \d{3,5}    "166", "17668", "19302" in "193024"
*?    Matches the previous element zero or more times, but as few times as possible.    \d*?\.\d    ".0", "19.9", "219.9"
+?    Matches the previous element one or more times, but as few times as possible.    be+?    "be" in "been", "be" in "bent"
??    Matches the previous element zero or one time, but as few times as possible.    rai??n    "ran", "rain"
{n}?    Matches the preceding element exactly n times.    ,\d{3}?    ",043" in "1,043.6", ",876", ",543", and ",210" in "9,876,543,210"
{n,}?    Matches the previous element at least n times, but as few times as possible.    \d{2,}?    "166", "29", "1930"
{n,m}?    Matches the previous element between n and m times, but as few times as possible.    \d{3,5}?    "166", "17668", "193", "024" in "193024"
Backreference Constructs
A backreference allows a previously matched subexpression to be identified subsequently in the same regular expression. The following table lists the backreference constructs supported by regular expressions in Sputnik.
Backreference construct    Description    Pattern    Matches
\number    Backreference. Matches the value of a numbered subexpression.    (\w)\1    "ee" in "seek"
\k<name>    Named backreference. Matches the value of a named expression.    (?<char>\w)\k<char>    "ee" in "seek"
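Both backreference forms can be sketched as follows. The sketch assumes Python's re module as a stand-in engine, not Sputnik itself. Note that Python spells the named forms (?P<name>...) and (?P=name), whereas the table above uses (?<name>...) and \k<name>.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured.
numbered = re.search(r"(\w)\1", "seek")

# Named backreference (Python spelling of the named forms).
named = re.search(r"(?P<char>\w)(?P=char)", "seek")

print(numbered.group(0), numbered.group(1))   # ee e
print(named.group(0), named.group("char"))    # ee e
```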
Alternation Constructs
Alternation constructs modify a regular expression to enable either/or matching. These constructs include the language elements listed in the following table.
Alternation construct    Description    Pattern    Matches
|    Matches any one element separated by the vertical bar (|) character.    th(e|is|at)    "the", "this" in "this is the day. "
(?(expression) yes | no)    Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion.    (?(A)A\d{2}\b|\b\d{3}\b)    "A10", "910" in "A10 C103 910"
(?(name) yes | no)    Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no.    (?<quoted>")?(?(quoted).+?"|\S+\s)    Dogs.jpg, "Yiska playing.jpg" in "Dogs.jpg "Yiska playing.jpg""
Substitutions
Substitutions are regular expression language elements that are supported in replacement patterns.
The following table lists the substitution elements.
Character    Description    Pattern    Replacement pattern    Input string    Result string
$number    Substitutes the substring matched by group number.    \b(\w+)(\s)(\w+)\b    $3$2$1    "one two"    "two one"
${name}    Substitutes the substring matched by the named group name.    \b(?<word1>\w+)(\s)(?<word2>\w+)\b    ${word2} ${word1}    "one two"    "two one"
$$    Substitutes a literal "$".    \b(\d+)\s?USD    $$$1    "103 USD"    "$103"
$&    Substitutes a copy of the whole match.    (\$*(\d*(\.+\d+)?){1})    **$&    "$1.30"    "**$1.30**"
$`    Substitutes all the text of the input string before the match.    B+    $`    "AABBCC"    "AAAACC"
$'    Substitutes all the text of the input string after the match.    B+    $'    "AABBCC"    "AACCCC"
$+    Substitutes the last group that was captured.    B+(C+)    $+    "AABBCCDD"    "AACCDD"
$_    Substitutes the entire input string.    B+    $_    "AABBCC"    "AAAABBCCCC"
Miscellaneous Constructs
Miscellaneous constructs either modify a regular expression pattern or provide information about it.
The following table lists the miscellaneous constructs supported by Sputnik.
Construct    Definition    Example
(?imnsx-imnsx)    Sets or disables options such as case insensitivity in the middle of a pattern.    \bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act"
(?# comment)    Inline comment. The comment ends at the first closing parenthesis.    \bA(?#Matches words starting with A)\w+\b
# [to end of line]    X-mode comment. The comment starts at an unescaped # and continues to the end of the line.    (?x)\bA\w+\b#Matches words starting with A
Supported Named Blocks
Sputnik provides the named blocks listed in the following tables.
The set of supported named blocks is based on Unicode 4.0 and Perl 5.6.
// Default and standard
alnum letters and digits
alpha letters
ascii character codes 0 - 127
blank space or tab only
cntrl control characters
digit decimal digits (same as \d)
graph printing characters, excluding space
lower lower case letters
print printing characters, including space
punct printing characters, excluding letters and digits
space white space (not quite the same as \s)
upper upper case letters
word "word" characters (same as \w)
xdigit hexadecimal digits
// Additional
IsBasicLatin ---> Unicode Range:0000 - 007F
IsLatin-1Supplement ---> Unicode Range:0080 - 00FF
IsLatinExtended-A ---> Unicode Range:0100 - 017F
IsLatinExtended-B ---> Unicode Range:0180 - 024F
IsIPAExtensions ---> Unicode Range:0250 - 02AF
IsSpacingModifierLetters ---> Unicode Range:02B0 - 02FF
IsCombiningDiacriticalMarks ---> Unicode Range:0300 - 036F
IsGreek ---> Unicode Range:0370 - 03FF
IsCyrillic ---> Unicode Range:0400 - 04FF
IsCyrillicSupplement ---> Unicode Range:0500 - 052F
IsArmenian ---> Unicode Range:0530 - 058F
IsHebrew ---> Unicode Range:0590 - 05FF
IsArabic ---> Unicode Range:0600 - 06FF
IsSyriac ---> Unicode Range:0700 - 074F
IsThaana ---> Unicode Range:0780 - 07BF
IsDevanagari ---> Unicode Range:0900 - 097F
IsBengali ---> Unicode Range:0980 - 09FF
IsGurmukhi ---> Unicode Range:0A00 - 0A7F
IsGujarati ---> Unicode Range:0A80 - 0AFF
IsOriya ---> Unicode Range:0B00 - 0B7F
IsTamil ---> Unicode Range:0B80 - 0BFF
IsTelugu ---> Unicode Range:0C00 - 0C7F
IsKannada ---> Unicode Range:0C80 - 0CFF
IsMalayalam ---> Unicode Range:0D00 - 0D7F
IsSinhala ---> Unicode Range:0D80 - 0DFF
IsThai ---> Unicode Range:0E00 - 0E7F
IsLao ---> Unicode Range:0E80 - 0EFF
IsTibetan ---> Unicode Range:0F00 - 0FFF
IsMyanmar ---> Unicode Range:1000 - 109F
IsGeorgian ---> Unicode Range:10A0 - 10FF
IsHangulJamo ---> Unicode Range:1100 - 11FF
IsEthiopic ---> Unicode Range:1200 - 137F
IsCherokee ---> Unicode Range:13A0 - 13FF
IsUnifiedCanadianAboriginalSyllabics ---> Unicode Range:1400 - 167F
IsOgham ---> Unicode Range:1680 - 169F
IsRunic ---> Unicode Range:16A0 - 16FF
IsTagalog ---> Unicode Range:1700 - 171F
IsHanunoo ---> Unicode Range:1720 - 173F
IsBuhid ---> Unicode Range:1740 - 175F
IsTagbanwa ---> Unicode Range:1760 - 177F
IsKhmer ---> Unicode Range:1780 - 17FF
IsMongolian ---> Unicode Range:1800 - 18AF
IsLimbu ---> Unicode Range:1900 - 194F
IsTaiLe ---> Unicode Range:1950 - 197F
IsKhmerSymbols ---> Unicode Range:19E0 - 19FF
IsPhoneticExtensions ---> Unicode Range:1D00 - 1D7F
IsLatinExtendedAdditional ---> Unicode Range:1E00 - 1EFF
IsGreekExtended ---> Unicode Range:1F00 - 1FFF
IsGeneralPunctuation ---> Unicode Range:2000 - 206F
IsSuperscriptsandSubscripts ---> Unicode Range:2070 - 209F
IsCurrencySymbols ---> Unicode Range:20A0 - 20CF
IsCombiningDiacriticalMarksforSymbols ---> Unicode Range:20D0 - 20FF
IsLetterlikeSymbols ---> Unicode Range:2100 - 214F
IsNumberForms ---> Unicode Range:2150 - 218F
IsArrows ---> Unicode Range:2190 - 21FF
IsMathematicalOperators ---> Unicode Range:2200 - 22FF
IsMiscellaneousTechnical ---> Unicode Range:2300 - 23FF
IsControlPictures ---> Unicode Range:2400 - 243F
IsOpticalCharacterRecognition ---> Unicode Range:2440 - 245F
IsEnclosedAlphanumerics ---> Unicode Range:2460 - 24FF
IsBoxDrawing ---> Unicode Range:2500 - 257F
IsBlockElements ---> Unicode Range:2580 - 259F
IsGeometricShapes ---> Unicode Range:25A0 - 25FF
IsMiscellaneousSymbols ---> Unicode Range:2600 - 26FF
IsDingbats ---> Unicode Range:2700 - 27BF
IsMiscellaneousMathematicalSymbols-A ---> Unicode Range:27C0 - 27EF
IsSupplementalArrows-A ---> Unicode Range:27F0 - 27FF
IsBraillePatterns ---> Unicode Range:2800 - 28FF
IsSupplementalArrows-B ---> Unicode Range:2900 - 297F
IsMiscellaneousMathematicalSymbols-B ---> Unicode Range:2980 - 29FF
IsSupplementalMathematicalOperators ---> Unicode Range:2A00 - 2AFF
IsMiscellaneousSymbolsandArrows ---> Unicode Range:2B00 - 2BFF
IsCJKRadicalsSupplement ---> Unicode Range:2E80 - 2EFF
IsKangxiRadicals ---> Unicode Range:2F00 - 2FDF
IsIdeographicDescriptionCharacters ---> Unicode Range:2FF0 - 2FFF
IsCJKSymbolsandPunctuation ---> Unicode Range:3000 - 303F
IsHiragana ---> Unicode Range:3040 - 309F
IsKatakana ---> Unicode Range:30A0 - 30FF
IsBopomofo ---> Unicode Range:3100 - 312F
IsHangulCompatibilityJamo ---> Unicode Range:3130 - 318F
IsKanbun ---> Unicode Range:3190 - 319F
IsBopomofoExtended ---> Unicode Range:31A0 - 31BF
IsKatakanaPhoneticExtensions ---> Unicode Range:31F0 - 31FF
IsEnclosedCJKLettersandMonths ---> Unicode Range:3200 - 32FF
IsCJKCompatibility ---> Unicode Range:3300 - 33FF
IsCJKUnifiedIdeographsExtensionA ---> Unicode Range:3400 - 4DBF
IsYijingHexagramSymbols ---> Unicode Range:4DC0 - 4DFF
IsCJKUnifiedIdeographs ---> Unicode Range:4E00 - 9FFF
IsYiSyllables ---> Unicode Range:A000 - A48F
IsYiRadicals ---> Unicode Range:A490 - A4CF
IsHangulSyllables ---> Unicode Range:AC00 - D7AF
IsHighSurrogates ---> Unicode Range:D800 - DB7F
IsHighPrivateUseSurrogates ---> Unicode Range:DB80 - DBFF
IsLowSurrogates ---> Unicode Range:DC00 - DFFF
IsPrivateUse or IsPrivateUseArea ---> Unicode Range:E000 - F8FF
IsCJKCompatibilityIdeographs ---> Unicode Range:F900 - FAFF
IsAlphabeticPresentationForms ---> Unicode Range:FB00 - FB4F
IsArabicPresentationForms-A ---> Unicode Range:FB50 - FDFF
IsVariationSelectors ---> Unicode Range:FE00 - FE0F
IsCombiningHalfMarks ---> Unicode Range:FE20 - FE2F
IsCJKCompatibilityForms ---> Unicode Range:FE30 - FE4F
IsSmallFormVariants ---> Unicode Range:FE50 - FE6F
IsArabicPresentationForms-B ---> Unicode Range:FE70 - FEFF
IsHalfwidthandFullwidthForms ---> Unicode Range:FF00 - FFEF
IsSpecials ---> Unicode Range:FFF0 - FFFF
// As usual you use these like
// \p{IsBasicLatin}
// $var =~ m/\p{IsBasicLatin}/;
Example
Check if any item within an array matches the given pattern:
$test = array("Cat", "Dog222", "Fox");
if($test =~ m/\d+/) {
    say "Found";
} else {
    say "Not found";
}
Check if all the items within an array match the given pattern:
$test = array("Cat11", "Dog222", "Fox22");
if($test =~ m/\d+/b) {
    say "Found";
} else {
    say "Not found";
}
Check if any key within an array matches the given pattern:
$test = array("Cat" => "Meow", "Dog222" => "Woof");
if($test =~ m/\d+/v) {
    say "Found $1";
} else {
    say "Not found";
}
Check if all the keys within an array match the given pattern:
$test = array("Cat33" => "Meow", "Dog222" => "Woof");
if($test =~ m/\d+/vb) {
    say "Found $1";
} else {
    say "Not found";
}
Check if a string matches a given pattern :
// Set a string to parse
$str = "Hello, World!";
if( $str =~ m/\w+,\s+\w+!/ ) {
    println("True");
} else {
    println("False");
}
Check if a string matches a given pattern case insensitive :
// Set a string to parse
$str = "Hello, World!";
if( $str =~ m/hello,\s+WORLD!/i ) {
    println("True");
} else {
    println("False");
}
Simple matching a string and returning 2 captured groups:
// Set a string to parse
$str = 'Account Test Credits 777';
// Do the regex match
$str =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/i;
println("Account '$1' Credits '$2'");
// Prints
// Account 'Test' Credits '777'
Same as above but with an array:
// Set an array to parse
$arr = array('Cat', 'Account Test Credits 777', 'Foo');
// Do the regex match
$arr =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/i;
println("Account '$1' Credits '$2'");
// Prints
// Account 'Test' Credits '777'
Simple matching a string and returning 2 captured groups and saving the variables:
// Set a string to parse
$str = 'Account Test Credits 777';
// Do the regex match
$str =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/i;
$Account = $1;
$Credits = $2;
println("Account '$Account' Credits '$Credits'");
// Prints
// Account 'Test' Credits '777'
Same as above but with an array:
// Set an array to parse
$arr = array('Cat', 'Account Test Credits 777', 'Foo');
// Do the regex match
$arr =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/i;
$Account = $1;
$Credits = $2;
println("Account '$Account' Credits '$Credits'");
// Prints
// Account 'Test' Credits '777'
Same thing but this time parsing multiple lines of accounts :
// Set a string to parse
$str = 'Account Test Credits 777' . @CRLF;
$str .= 'Account FoX Credits 1337' . @CRLF;
$str .= 'Account Cat Credits 100' . @CRLF;
$str .= 'Account Dog Credits 50' . @CRLF;
// Do the regex match
$str =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/ig;
// Print them all
for($i = 0; $i < @Groups; $i++) {
    $Account = $_rg[$i][1];
    $Credits = $_rg[$i][2];
    println("Match ($i) | Account '" . $Account . "' | Credits '" . $Credits . "'" );
}
// Prints
// Match (0) | Account 'Test' | Credits '777'
// Match (1) | Account 'FoX' | Credits '1337'
// Match (2) | Account 'Cat' | Credits '100'
// Match (3) | Account 'Dog' | Credits '50'
Case insensitive match on a string to capture all possible matches and return them as a multi-dimensional array :
// Set a string to parse
$str = '<test>a</test> <test>b</test> <test>c</Test>';
// Do the regex match
$str =~ m/<(?i)test>(.*?)<\/(?i)test>/ig;
// How many groups did we find?
println("Found groups: " . @Groups);
// Print them all
for($i = 0; $i < @Groups; $i++) {
    $match = $_rg[$i];
    println("Match ($i) | Text '" . $match[0] . "' | Group text '" . $match[1] . "'" );
}
// Prints
// Found groups: 3
// Match (0) | Text '<test>a</test>' | Group text 'a'
// Match (1) | Text '<test>b</test>' | Group text 'b'
// Match (2) | Text '<test>c</Test>' | Group text 'c'
Case insensitive match on a string to capture all possible matches and return them as a single dimension array :
// Set a string to parse
$str = '<test>a</test> <test>b</test> <test>c</Test>';
// Do the regex match
$str =~ m/<(?i)test>(.*?)<\/(?i)test>/igd;
// How many groups did we find?
println("Found groups: " . @Groups);
// Print them all
for($i = 0; $i < @Groups; $i++) {
    println("Match ($i) | Group text '" . $_rg[$i] . "'" );
}
// Prints
// Found groups: 3
// Match (0) | Group text 'a'
// Match (1) | Group text 'b'
// Match (2) | Group text 'c'
Example of using the /x flag
my $a = "xor eax, edx";
$a =~ m/
    (\w+)   # You can add comments
    \s*
    (\w+)
    \s*     # Yup comments all over
    ,
    \s*
    (\w+)
    /x;
print( "'$1' -> '$2' -> '$3'" );
Example of using a while loop (while loops with regexp won't work properly without the /g flag):
// Set a string to parse
$str = 'Account Test Credits 777' . @CRLF;
$str .= 'Account FoX Credits 1337' . @CRLF;
$str .= 'Account Cat Credits 100' . @CRLF;
$str .= 'Account Dog Credits 50' . @CRLF;
while( $str =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/ig ) {
    $Account = $_rg[$_][1];
    $Credits = $_rg[$_][2];
    println("Match ($_) | Account '" . $Account . "' | Credits '" . $Credits . "'" );
}
Same as above but this time using an Array
$str = array(
    'Account Test Credits 777',
    'Account FoX Credits 1337',
    'Account Cat Credits 100',
    'Account Dog Credits 50'
);
while( $str =~ m/Account\s+(\w+)\s+\w+\s+(\d+)/ig ) {
    $Account = $_rg[$_][1];
    $Credits = $_rg[$_][2];
    println("Match ($_) | Account '" . $Account . "' | Credits '" . $Credits . "'" );
}
Example of named capture groups
$str = "xor eax, edx";
if( $str =~ m/xor\s*(?<first>\w*),\s*(?<second>\w*)/ ) {
    println("True: " . $_rg["first"] . " | " . $_rg["second"]);
} else {
    println("False");
}
Example of using Regexp with a foreach loop
my $delimited = @"\G(.+)[\t\u007c](.+)\r?\n";
my $input = "Mumbai, India|13,922,125\t\n" .
            "Shanghai, China\t13,831,900\n" .
            "Karachi, Pakistan|12,991,000\n" .
            "Dehli, India\t12,259,230\n" .
            "Istanbul, Turkey|11,372,613\n";
printf("Population of the World's Largest Cities, 2009\n");
printf("\n");
printf(@"%-30s %s" . "\n", "City", "Population");
foreach( $input =~ m/$delimited/gk as my $m ) {
    my List( $City, $Pop ) = *$m; // Note A
    printf(@"%-30s %s" . "\n", $City, $Pop);
}
// As shown in Note A it uses *$m rather than $m this is because
// the $m is actually a reference and needs to be resolved
// Prints
// Population of the World's Largest Cities, 2009
//
// City                           Population
// Mumbai, India                  13,922,125
// Shanghai, China                13,831,900
// Karachi, Pakistan              12,991,000
// Dehli, India                   12,259,230
// Istanbul, Turkey               11,372,613
Same as above but using an array
my $delimited = @"\G(.+)[\t\u007c](.+)";
my $input = array(
    "Mumbai, India|13,922,125",
    "Shanghai, China\t13,831,900",
    "Karachi, Pakistan|12,991,000",
    "Dehli, India\t12,259,230",
    "Istanbul, Turkey|11,372,613"
);
printf("Population of the World's Largest Cities, 2009\n");
printf("\n");
printf(@"%-30s %s" . "\n", "City", "Population");
foreach( $input =~ m/$delimited/gk as my $m ) {
    my List( $City, $Pop ) = *$m; // Note A
    printf("%-30s %s\n", $City, $Pop);
}
// As shown in Note A it uses *$m rather than $m this is because
// the $m is actually a reference and needs to be resolved
// Prints
// Population of the World's Largest Cities, 2009
//
// City                           Population
// Mumbai, India                  13,922,125
// Shanghai, China                13,831,900
// Karachi, Pakistan              12,991,000
// Dehli, India                   12,259,230
// Istanbul, Turkey               11,372,613
The following example illustrates a regular expression that identifies duplicated words in text. The regular expression pattern's two capturing groups represent the two instances of the duplicated word. The second instance is captured to report its starting position in the input string.
my $delimited = @"(\w+)\s(\1)";
my $input = "He said that that was the the correct answer.";
foreach( $input =~ m/$delimited/iga as my $m ) {
    my List( $Value1, $Index1, $Length1 ) = *$m[1];
    my List( $Value2, $Index2, $Length2 ) = *$m[2];
    printf("Duplicate '%s' found at positions %s and %s.\n", $Value1, $Index1, $Index2 );
}
// Prints
// Duplicate 'that' found at positions 8 and 13.
// Duplicate 'the' found at positions 22 and 26.
Example of using the /a flag to get more information about a match
my $input = "The quick 777 fox";
$input =~ m/(\d+)/a;
my $Text = $_rg[1][0];
my $Index = $_rg[1][1];
my $Length = $_rg[1][2];
printf("Searching for digits in: %s\n", $input);
printf("Matched text '%s' index '%s' length '%s'\n", $Text, $Index, $Length);
printf("Text before match: %s\n", substr($input, 0, $Index));
printf("Text after match: %s\n", substr($input, $Index + $Length));
// Prints
// Searching for digits in: The quick 777 fox
// Matched text '777' index '10' length '3'
// Text before match: The quick
// Text after match:  fox
Example of how to read the QUERY_STRING and produce an array the same as PHP's $_REQUEST array:
Global $Request = array();
my $QueryString = EnvGet('QUERY_STRING');
my $QueryList = Split($QueryString, '&');
Foreach($QueryList as my $i) {
    my List ( $Key, $Value ) = Split($i, '=');
    // Decode %XX hex escapes into characters
    $Value =~ s/%([a-fA-F0-9][a-fA-F0-9])/ChrW(Dec($1))/ego;
    // A plus sign encodes a space
    $Value =~ s/\+/ /gi;
    // Escape angle brackets as HTML entities
    $Value =~ s/\</&lt;/gi;
    $Value =~ s/\>/&gt;/gi;
    $Request[$Key] = $Value;
}
If you omit the "$var =~" part and use only the regex part, it matches against the $_ variable, for example:
my $_ = "cat";
if (m/cat/) {
    say "True";
} else {
    say "False";
}
// PRINTS
// True