Wilson WindowWare Tech Support

Parse HTML

 Keywords: Parsing extract grab get text HTM strip HTML

Question:

I am trying to parse some HTML files for specific values. What is the preferred method in WinBatch for parsing an HTML file? Should I use the regular string functions? Are there any tricks or extenders, perhaps a regular expression extender?

Answer:

There are several ways to parse HTML with WinBatch:

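Using OLE:

Automate Internet Explorer, wait for it to finish loading the page, then read the text of the document body. See also the Tech Database article "Get text from HTML pages":
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~ADO~CDO~ADSI~LDAP/OLE~and~Outlook+Get~text~from~HTML~pages.txt
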
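Browser = ObjectOpen("InternetExplorer.Application")
Browser.Visible = @TRUE
Browser.Navigate("http://www.winbatch.com") ; page to parse (a local file path also works)

; wait until the page has finished loading (ReadyState 4 = complete)
While Browser.ReadyState <> 4
   TimeDelay(0.5)
EndWhile

; get the body of the loaded document and read its text without the markup
BrowserDoc = Browser.Document
BrowserBody = BrowserDoc.Body
BrowserPage = BrowserBody.CreateTextRange
BrowserText = BrowserPage.Text

Message("Text", BrowserText)
Browser.Quit()   ; close the Internet Explorer instance
Exit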


Using the WinSock Extender:

Download the page with httpRecvText, then remove the HTML tags with httpStripHTML:
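AddExtender("wwwsk34i.dll")
URL = "http://www.winbatch.com/"
serv = httpGetServer(URL, "")            ; host name part of the URL
path = httpGetPath(URL, "")              ; path part of the URL
a = httpRecvText(serv, path, 30000, 0)   ; download the page as raw HTML
a = httpStripHTML(a)                     ; remove the HTML tags, keeping the text
Message("File Contents are", a)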

Using Regular Expressions:

A set of regular expression UDFs is available in the Tech Database:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/UDF~-~UDS~Library+Regular~Expressions~UDFs.txt
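
If you would rather not use the UDFs, one alternative (a different technique, not that UDF library) is the VBScript regular expression engine, driven through OLE. This is a sketch only: it assumes the "VBScript.RegExp" COM object is registered on the machine, the sample HTML string is hypothetical, and a pattern like this only copes with simple, well-formed markup.

```winbatch
; Sketch: extract the page title with the VBScript.RegExp COM object via OLE.
; Assumption: VBScript.RegExp is registered on this machine. This is a swapped-in
; technique, not the regular expression UDFs from the Tech Database.
html = "<html><head><title>WinBatch Home</title></head><body>Hello</body></html>" ; hypothetical sample input

regex = ObjectOpen("VBScript.RegExp")
regex.Pattern = "<title>([^<]*)</title>"   ; group 1 captures the title text
regex.IgnoreCase = @TRUE                  ; also match <TITLE>...</TITLE>
regex.Global = @FALSE                     ; stop after the first match

matches = regex.Execute(html)
If matches.Count > 0
   firstMatch = matches.Item(0)
   subs = firstMatch.SubMatches
   title = subs.Item(0)                   ; text captured by group 1
Else
   title = ""                             ; no title tag found
EndIf

Message("Title", title)
```

To collect every match (for example every href value), set regex.Global = @TRUE and loop from 0 to matches.Count - 1.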


Using BinaryTag Functions:

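See the September 2002 newsletter:
http://techsupt.winbatch.com/Newsletters/September2002.html

The newsletter article is mostly about "template" files, but its section on the BinaryTag functions is relevant here: they find and extract the text between a given start tag and end tag in a binary buffer.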
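
A minimal sketch of the BinaryTag approach, assuming the start tag href=" and the end tag " match the markup exactly as written (for example, a lowercase href). It writes a small hypothetical sample file (sample.htm) to the TEMP folder, loads it into a binary buffer, and collects every link target. For a real page, point htmlfile at a file you have already saved.

```winbatch
; Sketch: list every href="..." value in an HTML file with the BinaryTag functions.
; Assumptions: the sample file name and contents are hypothetical, and the tag
; strings must match the markup exactly as written.
htmlfile = StrCat(Environment("TEMP"), "\sample.htm")
fh = FileOpen(htmlfile, "WRITE")
FileWrite(fh, '<a href="page1.htm">One</a> <a href="page2.htm">Two</a>')
FileClose(fh)

; load the whole file into a binary buffer
fsize = FileSize(htmlfile)
buf = BinaryAlloc(fsize + 100)
BinaryRead(buf, htmlfile)

; each "tag" runs from the start string href=" to the next double quote
links = ""
tag = BinaryTagInit(buf, 'href="', '"')
While 1
   tag = BinaryTagFind(tag)
   If tag == "" Then Break          ; no more tags in the buffer
   links = StrCat(links, BinaryTagExtr(tag, 0), @CRLF)
EndWhile

BinaryFree(buf)
Message("Links found", links)
```

BinaryTagExtr is called with flags 0 on the assumption that this returns the text between the tags unmodified; check the BinaryTagExtr help topic for the other flag values.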