Parse HTML
Keywords: Parsing extract grab get text HTM strip HTML
Question:
I am trying to parse some HTML files for some specific values. What is the preferred method in WinBatch for parsing an HTML file? Should I do it using regular string functions? Any tricks or EXTENDERS? A regular expression extender perhaps?
Answer:
There are quite a few different ways to parse HTML using WinBatch:
Via OLE:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~ADO~CDO~ADSI~LDAP/OLE~and~Outlook+Get~text~from~HTML~pages.txt

Browser = ObjectOpen("InternetExplorer.Application")
Browser.visible = @true
Browser.navigate("http://www.winbatch.com") ;to access an internet file
while Browser.readystate <> 4
   timedelay(0.5)
endwhile
BrowserDoc = Browser.Document
BrowserBody = BrowserDoc.Body
BrowserPage = BrowserBody.CreateTextRange
BrowserText = BrowserPage.Text
message("Text", BrowserText)
exit
Using WinSock Extender:
See the functions httpRecvText and httpStripHTML.

AddExtender("wwwsk34i.dll")
URL="http://www.winbatch.com/"
serv=httpGetServer(URL, "")
path=httpGetPath(URL, "")
a=httpRecvText(serv, path, 30000, 0)
a=httpStripHTML(a)
message("File Contents are", a)
Using Regular Expressions:
Here are some Regular Expression UDFs at http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/UDF~-~UDS~Library+Regular~Expressions~UDFs.txt
Using BinaryTag Functions:
http://techsupt.winbatch.com/Newsletters/September2002.html
It rambles on about "template" files, but the section on BinaryTags is relevant...
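
As a rough sketch of the BinaryTag approach, something like the following should pull out the text between each pair of <b> and </b> tags. The sample HTML string, the choice of tags, and the variable names are made up for illustration. In practice you would probably load the file into the buffer with BinaryRead instead of poking a string into it, and the exact function details are worth checking in the WIL help.

```winbatch
; Hypothetical sample data, standing in for the contents of an HTML file
html = "<html><body><b>first</b> and <b>second</b></body></html>"

; Put the HTML into a binary buffer (a little extra room does no harm)
buf = BinaryAlloc(StrLen(html) + 100)
BinaryPokeStr(buf, 0, html)

; Set up a search for everything between <b> and </b>
tags = BinaryTagInit(buf, "<b>", "</b>")
found = ""
while @TRUE
   tags = BinaryTagFind(tags)              ; returns "" when no more tags are found
   if tags == "" then break
   found = StrCat(found, BinaryTagExtr(tags, 0), @CRLF)
endwhile

BinaryFree(buf)
Message("Text between tags", found)
```

The same loop can look for any other pair of start and end strings, which is handy when the value you want always sits between two known pieces of markup.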