Keywords: Parsing extract grab get text HTM strip HTML MSIE IE Internet Explorer COM OLE Regular Expressions httpStripHTML
URL
Browser = ObjectCreate("InternetExplorer.Application")
Browser.Visible = @TRUE
Browser.Navigate("http://www.winbatch.com") ;to access an internet file
While Browser.ReadyState <> 4
   TimeDelay(0.5)
EndWhile
BrowserDoc = Browser.Document
BrowserBody = BrowserDoc.Body
BrowserPage = BrowserBody.CreateTextRange
BrowserText = BrowserPage.Text
Message("Text", BrowserText)
Exit

HTML String
;***************************************************************************
;**
;**   Internet Explorer Strip HTML
;**
;***************************************************************************
#DefineFunction udfStripHTML(strHTML)
   ;This code doesn't include TITLE in results
   objIE = ObjectCreate("InternetExplorer.Application")
   objIE.Visible = @FALSE
   objIE.Navigate("about:blank")
   ;Wait for the blank page to finish loading before touching its Document
   While objIE.ReadyState <> 4
      TimeDelay(0.5)
   EndWhile
   ;Let IE parse the HTML inside a detached DIV, then read back the plain text
   objDiv = objIE.Document.createElement("div")
   objDiv.innerHTML = strHTML
   strTxt = objDiv.InnerText
   ;Pause('InnerText',strTxt)
   objIE.Quit
   objDiv = 0
   objIE = 0
   Return strTxt
#EndFunction

strHTML = `<HTML><HEAD><TITLE>Basic HTML Sample Page</TITLE></HEAD><BODY><CENTER><H1>A Simple Sample Web Page</H1></CENTER></BODY></HTML> `
strText = udfStripHTML(strHTML)
Pause('udfStripHTML',strText)
AddExtender("wwwsk34i.dll")
URL = "http://www.winbatch.com/"
serv = httpGetServer(URL, "")
path = httpGetPath(URL, "")
a = httpRecvText(serv, path, 30000, 0)
a = httpStripHTML(a)
Message("File Contents are", a)
Regular expressions can also be used to strip the tags. More regular expression UDFs are available at Regular Expression UDFs.
;***************************************************************************
;**
;**   RegEx Strip HTML
;**
;***************************************************************************
#DefineFunction udfRegExStripHTML(strHTML)
   objRegExp = ObjectCreate("VBScript.RegExp") ; Creates a regular expression object for use by WinBatch.
   objRegExp.IgnoreCase = @TRUE ; Set case insensitivity. Default is @false.
   objRegExp.Global = @TRUE ; Set global applicability. Default is @false.
   objRegExp.Pattern = `<(.|\n)*?>`
   ; Thanks to Roy Osherove, Oisin and Hugh Brown : http://osherove.com/blog/2003/5/13/strip-html-tags-from-a-string-using-regular-expressions.html
   ;objRegExp.Pattern = `<[^>]*>`
   strTxt = objRegExp.Replace(strHTML,"")
   objRegExp = 0
   Return strTxt
#EndFunction

strHTML = `<HTML><HEAD><TITLE>Basic HTML Sample Page</TITLE></HEAD><BODY><CENTER><H1>A Simple Sample Web Page</H1></CENTER></BODY></HTML> `
strText = udfRegExStripHTML(strHTML)
Pause('udfRegExStripHTML',strText)
Exit
It rambles on about "template" files, but the section on BinaryTags is relevant...
Article ID: W15522
File Created: 2014:07:18:09:51:38
Last Updated: 2014:07:18:09:51:38