Wilson WindowWare Tech Support


Regular Expressions with OLE and Regular Expression Object

Keywords:   regular expressions regular expression object OLE

This script uses a regular expression pattern to find the first match for the pattern in a WIL variable. It returns the matched text, its position, and its length.
It is an example that shows how the Windows regular expression object (VBScript.RegExp) can be used from WinBatch through OLE.
; Author: Jim Stiles 20020628

GoSub UserDefinedFunctions  
; Initialize the regular expression function.
; The function is SimpleRegEx(myString,myExpression,ignoreCase).
; myString can be any variable. myExpression is a regular expression pattern. 
; ignoreCase is logical @TRUE or @FALSE.

myString = "myabCthis is the second myabcthing" ; String with several possible matches for the expression pattern.
myExpression = "a.*?c"
; A regular expression pattern. It matches from the first a up to and including the next c.
; To exclude the c from the match, use a[^c]* instead (with no trailing c).
; .* matches any number of characters. The ? after it makes the match "lazy": it stops at the
; first c that completes the pattern. Without the ?, the pattern is "greedy" and runs to the last c.
; The greedy match here is "abCthis is the second myabc". This is usually not what is wanted.
; A list of common expressions is at the end of this script.
ignoreCase = @FALSE ; Default is to test for case in string text. Set to @TRUE to ignore case.

; With the ?, the match runs from the first a to the first lowercase c. The capital C is skipped
; because ignoreCase is @FALSE.
; Without the ?, the match runs from the first a to the final c (the third, counting the capital C).

myReturn = SimpleRegEx(myString,myExpression,ignoreCase)
foundStr = ItemExtract(1,myReturn,@TAB)
foundPos = ItemExtract(2,myReturn,@TAB)
foundLen = ItemExtract(3,myReturn,@TAB)
myText = StrCat("Source string => ",myString,@CR,"Search pattern => ",myExpression,"   (note the case of the c)",@CR)
myText = StrCat(myText,"Found string => ",foundStr,@CR,"String Start => ",foundPos,@CR)
myText = StrCat(myText,"String Length => ", foundLen)


Message("Results", myText)
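
; Optional illustration, not part of the original script: a minimal sketch that repeats the
; search with the greedy form of the pattern, so the lazy and greedy results can be compared
; side by side. The names greedyReturn and greedyText are hypothetical. Everything else used
; here is defined earlier in the script.
greedyReturn = SimpleRegEx(myString,"a.*c",ignoreCase)
greedyText = StrCat("Lazy a.*?c => ",foundStr,@CR,"Greedy a.*c => ",ItemExtract(1,greedyReturn,@TAB))
Message("Lazy vs. greedy", greedyText) ; should show "abCthis is the sec" and "abCthis is the second myabc"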

:cancel	 ; Used in all WIL scripts to handle dialog box cancel actions by a user.
Exit		 ; End of the script.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;; END OF SCRIPT ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;                 User Defined Function Section
:UserDefinedFunctions

#DefineFunction SimpleRegEx(myString,myExpression,ignoreCase)

regex = ObjectOpen("VBScript.RegExp") ; Creates a regular expression object for use by WinBatch.

regex.Pattern = myExpression  ; Required to establish a regular expression pattern to use in searching.
regex.IgnoreCase = ignoreCase ; Optional, but required by this user defined function.

	If regex.Test(myString) == 0 ; Set default return values if the pattern was not found.
		position = 0
		value = ""
		length = -1
	Else
		objCollection = regex.Execute(myString) ; Run the search. Returns a collection of match objects.
		henum = ObjectCollectionOpen(objCollection) ; Open the collection so its items can be read.
		match = ObjectCollectionNext(henum) ; The first match. Without regex.Global there is only one.
		index = match.FirstIndex  ; 0-based index. WIL string positions start at 1.
		position = index + 1
		value = match.Value
		length = match.Length
		ObjectCollectionClose(henum) ; Close the collection only when it was actually opened.
	EndIf

list = StrCat(value,@TAB,position,@TAB,length) ; Position is the normal WIL use based on 1 as the first character.

ObjectClose(regex)

Return list
#EndFunction

Return
;------------------------  Common Regular Expressions ----------------------
;||||||||||||||||||  Text Spaces Numbers
;Patterns can include any letters, spaces, and numbers. Punctuation characters, however, are used differently.
;
;||||||||||||||||||  Punctuation and Special Characters
;Regular expressions reserve punctuation characters for special uses. Several look like the
;   wildcards used in DOS and in WIL functions such as StrReplaceWild(), but they work differently:
;   in a regular expression, * ? + and {n} apply to the item that comes just before them.
;   * means any number of something. -* means any number of hyphens. {n} means n times. {n,m} is n to m times.
;   ? means 0 or 1 instance. 
;   + means one or more instances. 
;   . means any character except a line end or new line.
;   .* stands for any number of any character. To keep a search from running to the end of the data and
;    backing up from there, follow .* and .+ with a ? character. Example: use .*? instead of .* .
;    An alternative method: say you want to capture the string between two consecutive m characters,
;    but there are 20 m characters in the source string. Use m[^m]*m to do this. The regular expression
;    engine begins with the first m, then matches any non-m characters up to the next m and, this is
;    important, stops there. The regular expression function returns the position and length of the
;    found string. Using these, WIL can continue searching through the rest of the data.
;    
;
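; Example sketch (not part of the original script): an alternative to restarting the search
; by hand. Setting the Global property of the VBScript.RegExp object makes Execute return
; every match, so one pass through the collection visits them all. The sample data and the
; names matches and report are hypothetical. This code sits after the final Return and never
; runs here. Copy it into the main section to try it.
regex = ObjectOpen("VBScript.RegExp")
regex.Pattern = "m[^m]*m"   ; an m, any run of non-m characters, then the next m
regex.Global = @TRUE        ; collect all matches, not just the first
matches = regex.Execute("mam mbm mcm")
henum = ObjectCollectionOpen(matches)
report = ""
While @TRUE
   match = ObjectCollectionNext(henum)
   If match == 0 Then Break  ; ObjectCollectionNext returns 0 after the last item
   value = match.Value
   index = match.FirstIndex  ; 0-based index, as in SimpleRegEx
   position = index + 1      ; convert to a 1-based WIL position
   report = StrCat(report,value," at position ",position,@CR)
   ObjectClose(match)
EndWhile
ObjectCollectionClose(henum)
ObjectClose(regex)
Message("All matches", report)
;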
;|||||||||||||||||| Escape Character and Special Character Codes
; Special character escape codes. Use a backslash to escape the special meaning of these punctuation characters:
;     . * + $ ^ ? \ | ( ) [ ] { }   Examples: \. and \\ where the double backslash stands for one backslash. 
;     The \\ is used constantly in DOS/Windows file paths. 
;
; Tabs and other special characters also have regular expression abbreviations. 
; \t is a tab. \r is a carriage return and \n is a line feed, so a DOS/Windows line end (CRLF) is \r\n.
; Hexadecimal equivalents can represent any byte. \x1A is the old DOS end of file byte. A tab is \x09.
; For others, see WinBatch.com tech support article W13468.
;
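; Example sketch (not part of the original script): escaping in practice. WIL does not treat
; the backslash specially inside a quoted string, so the pattern is typed exactly as the
; regular expression engine should see it. The path and the names pathName and result are
; hypothetical. SimpleRegEx() is the function defined earlier in this script. This code sits
; after the final Return and never runs here. Copy it into the main section to try it.
pathName = "C:\TEMP\report.txt"
result = SimpleRegEx(pathName,"\\[^\\]*$",@FALSE) ; the last backslash plus everything after it
Message("File name part", ItemExtract(1,result,@TAB)) ; should show \report.txt
;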
;||||||||||||||||||    Position Markers
;
; ^ start of input string. $ end of input string. \b boundary between a word character and a
; non-word character such as a space, period, or hyphen. An underscore counts as a word character.
; \B any position that is not a word boundary.
; \d any digit. \D not a digit. \n new line (line feed). \r carriage return. \s white space. \t tab.
; \w word character, the same as [A-Za-z0-9_]. \W non-word character, the same as [^A-Za-z0-9_].
; \1, \2 and so on (a backslash and a digit) match again the text captured by that numbered pair of parentheses.
;
;||||||||||||||||||    Ranges
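; [a@ge7_&0925k O] Matches any one of these characters.
; [^13579]  Matches any one character that is not listed. The example excludes the odd digits.
; [^@] matches any character other than @, which helps rule out email addresses.
; Note the two uses of the ^ character: inside brackets it negates the set, and outside brackets
; (see Position Markers above) it signifies the start of data.
; [A-Za-z] upper and lower case letters. [aeiouAEIOU] all vowels. [0-9] single digit integers. 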
;
;||||||||||||||||||    Logical
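; x|y matches either x or y. Use parentheses to limit the alternatives, as in gr(a|e)y.
; Inside square brackets the | is an ordinary character, so [x|y] matches x, |, or y.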
;
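; Example sketch (not part of the original script): a range and an alternation passed to the
; SimpleRegEx() function defined earlier in this script. The sample text and the names sample,
; digits, and dayName are hypothetical. This code sits after the final Return and never runs
; here. Copy it into the main section to try it.
sample = "Order 66 shipped on Tuesday"
digits = SimpleRegEx(sample,"[0-9]+",@FALSE)                 ; the first run of digits
dayName = SimpleRegEx(sample,"(Mon|Tues|Wednes)day",@FALSE)  ; one of three alternatives
Message("Ranges and alternation", StrCat(ItemExtract(1,digits,@TAB),@CR,ItemExtract(1,dayName,@TAB))) ; should show 66 and Tuesday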