Working With Web Pages Tutorial

Jay Alverson

Introduction

This is an attempt to cover the basic questions I've seen on the Winbatch BBS dealing with accessing the web, MSIE and XML.

Working with web pages, XML and various products is usually an ongoing process: it may take a while, not to mention some study and research. This tutorial should give you a leg up when wandering into that world with Winbatch. The examples assume that you're familiar with Winbatch and have a working knowledge of the areas they deal with. Moving beyond the information provided here, into more complex and detailed coding scenarios, will be up to you as the programmer.

Of course before we get into specifics we'll need to make sure your computer has the correct tools. The introductory script entitled Q0.WBT will assist you in finding out if you have the components needed to run the examples in this tutorial.

Q0.WBT

;   Question:  How do I find out if I can use these scripts ?

#DefineFunction QueryRegistry(srchstring)
   ; Search the HKEY_CLASSES_ROOT branch, where COM ProgIDs are registered
   topkey = @REGCLASSES
   topsub = "\"
   looktype = 0
   lookat = 7
   dosubtree = @FALSE
   retall = rRegSearch(topkey,topsub,srchstring,looktype,lookat,dosubtree)
   ; The matches come back tab-delimited: drop each entry's leading "\"
   ; and skip any entry that contains "[]"
   cleaned = ""
   For x = 1 To ItemCount(retall, @TAB)
      ThisLine = ItemExtract(x, retall, @TAB)
      If StrSub(ThisLine, 1,1) == "\" Then ThisLine = StrSub(ThisLine, 2, -1)
      If !StrIndexNC(ThisLine, "[]", 1, @FWDSCAN) Then cleaned = ItemInsert(ThisLine, -1, cleaned, @TAB)
   Next
   retall = cleaned
   Drop(cleaned)
   ; Put one entry per line for display
   retall = StrReplace(retall,@TAB,@LF)
   Return(retall)
#EndFunction

AddExtender("WWREG34I.DLL")
reglist = "InternetExplorer.Application|Msxml2.MXXMLWriter.3.0|Msxml2.SAXXMLReader.3.0|Msxml2.XMLHTTP|WinHttp.WinHttpRequest.5|XStandard.HTTP"
;srchstring = askline("Registry Search", "Enter Your String", "InternetExplorer.Application")  ; string to find...
srchstring = AskItemlist("Registry Search - Choose Component", reglist, "|", @SORTED, @SINGLE, @TRUE)
BoxOpen("Search for Component", "Please Wait...")
retall = QueryRegistry(srchstring)
BoxShut()
count = ItemCount(retall,@LF)
Message("Reggie Found %count% Entries",retall)
Exit

First you'll need to know if you've got the required Winbatch Extenders installed on your PC. If you know already then you can skip ahead. If you're not sure, then you'll need to check your ..\Winbatch\System folder to see if the following files exist:

Registry Extender -- WWREG34I.DLL
WinInet Extender -- WWINT44I.DLL
WinSock Extender -- WWWSK44I.DLL

If you don't have them installed, either install them from your Winbatch CD-ROM, or visit http://www.winbatch.com, follow the Downloads link, and download and install the latest version.

In addition, we'll show you two non-Winbatch tools that can be controlled from Winbatch and that help the programmer conduct web and XML operations behind the scenes.

XMLStarlet -- is a command-line tool designed to help programmers who need to retrieve pages, format output, and pull data from web pages and XML files. The tool is free and available at the following link:

http://xmlstar.sourceforge.net/

It's a powerful program that comes with a user's guide and examples. At this point it's a "niche tool" for Winbatch, but in certain circumstances you'll find it invaluable once you get the hang of it. Since it's a standalone EXE, it doesn't really require any installation: simply place the executable in a folder on the PC's PATH and it'll be immediately available. I placed mine in my C:\WINNT\System32\ folder.

XStandard's XHTTP -- is a COM component that will let you retrieve HTML pages from the web, similar to XMLStarlet. Downloading is a bit more complex though:

http://xstandard.com/download.asp

You'll have to provide your email address, and XStandard will email you a download link (don't worry about spam from the company; I haven't seen any). Then you can download XHTTP.DLL and register it. On a side note, XStandard has some other COM components that work with Winbatch, so look them over. Registering the DLL is simple:

regsvr32.exe /s ..\XHTTP.dll

and using

regsvr32.exe /u ..\XHTTP.dll

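to unregister it. Here ".." is just a placeholder for the folder where the DLL resides; substitute the actual path. The /s switch runs regsvr32 silently, without its confirmation message box.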
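If you'd rather register the DLL from a Winbatch script, a minimal sketch follows. The path in dllpath is a hypothetical location, so point it at wherever you saved XHTTP.dll; registering a COM DLL typically also requires administrator rights.

```winbatch
; Sketch: register XHTTP.dll from Winbatch.
; Assumption: dllpath is a hypothetical location; change it to match your system.
dllpath = "C:\Tools\XStandard\XHTTP.dll"
If !FileExist(dllpath)
   Message("Register XHTTP", "Cannot find %dllpath%")
   Exit
EndIf
; /s = silent (no confirmation box); use `/u /s "%dllpath%"` to unregister instead
RunWait("regsvr32.exe", `/s "%dllpath%"`)
Exit
```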

THE MICROSOFT COMPONENTS

In addition to the WinInet and WinSock extenders provided by Winbatch, and the third-party products XMLStarlet and XStandard, we'll be showing you how to use several very handy Microsoft components. They're likely to be installed on most recent machines, but you can use Q0.WBT to search the registry for them.

WINHTTP

Microsoft Windows HTTP Services (WinHTTP) provides developers with an HTTP client application programming interface (API) to send requests through the HTTP protocol to other HTTP servers.

The current version is WinHTTP 5.1, which is an operating-system component of the following systems:

Windows Server 2003 family
Windows XP SP1
Windows 2000 SP3 (except Datacenter Server)

The nice thing about WinHTTP is that it also handles https:// and can send credentials to an https:// site. While that goes beyond the scope of this tutorial, you can get more information on the Microsoft web site, http://www.microsoft.com.

XMLHTTP

XMLHTTP is very similar to WinHTTP in use. In fact, you'll notice the similarities in the first sample script.

XMLHTTP comes with Microsoft Internet Explorer (MSIE) and should be installed on most systems that have MSIE 5 or later. You can use the registry script in Q0.WBT to double-check. We cover XMLHTTP in case you're using a PC that doesn't have WinHTTP (or any of the other tools) available; the idea is to give you an alternative. You can get complete documentation on XMLHTTP (along with all the MS XML tools) by visiting the Microsoft web site, http://www.microsoft.com, and searching for "MS XML SDK Documentation."

MSIE

You should be familiar with this product even if you don't use it, since it's installed on so many systems. While many people use other browsers, MSIE has a COM interface that lets you control it programmatically. That interface also lets you access the objects on a web page (something WinHTTP and XMLHTTP can't do), which is a major point when you need not only to retrieve information but to interact with it. You can use the registry script in Q0.WBT to make sure MSIE is installed. Otherwise, you can download it from the Microsoft web site, http://www.microsoft.com, or grab one of those free "AOL Free Hours" CD-ROMs you see at the grocery store. They include various versions of MSIE, which you can install separately from AOL.

So take the time to execute the Q0.WBT script and see what you have to work with. If you don't feel like installing the 3rd party products, you can ignore those samples and just work with what you have.

NOTE ABOUT THE SCRIPTS

Naturally, these were written on my PC and tested on another to make sure they worked (both were XP systems: XP Home and XP Pro). If you get an error, post it on the Winbatch BBS so that I and others can double-check it. If this is the first time you've used any of the extenders or components, spend some time with the appropriate documentation and familiarize yourself with it. While these scripts should work "out of the box," your particular internet connection may require some tweaking, such as specifying a proxy for firewall access. You should find the online community at the Winbatch BBS invaluable in helping you solve any issues you encounter.

Also, the "intro portion" of each script has a temporary directory and editor (among other settings) pre-coded into it, which you'll need to change to fit your system. If you're using the HTML version to view the scripts, you can cut & paste the top portion of a script into a WBT file and edit it once, then cut & paste each portion beginning with the label (e.g., ":MSIE") through the Return keyword, so you don't have to keep editing the intro.

Okay, now let's get to the basics.

Question: How do I retrieve the HTML of a web page?

Q1.WBT



;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:WinInet
   AddExtender("WWINT44I.DLL")
   tophandle  = iBegin(0,"","")
   datahandle = IUrlOpen(tophandle, url)
   xx = iReadData(datahandle, hfile)
   iClose(datahandle)
   iClose(tophandle)
   Message("All","Done")
   Run(editor, hfile)
Return


;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:WinSock
   AddExtender("WWWSK44I.dll")
   serv=httpGetServer(url, "")
   path=httpGetPath(url, "")
   FilePut(hfile, httpRecvText(serv, path, 30000, 0))
   Message("All","Done")
   Run(editor, hfile)
Return


;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:MSIE
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   FilePut(hfile, msie.document.GetElementsByTagName("HTML").item(0).outerHTML)
   msie.quit
   msie = 0
   Message("All","Done")
   Run(editor, hfile)
Return


;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:WinHTTP
   WinHTTP = ObjectCreate("WinHttp.WinHttpRequest.5")
   WinHTTP.Open("GET", url, @FALSE)
   WinHTTP.Send()
   FilePut(hfile, WinHTTP.ResponseText)
   WinHTTP = 0
   Message("All","Done")
   Run(editor, hfile)
Return


;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:XMLHTTP
   xmlHTTP = ObjectCreate("Msxml2.XMLHTTP")
   xmlHTTP.Open("GET", url, @FALSE)
   xmlHTTP.Send()
   FilePut(hfile, xmlHTTP.ResponseText)
   xmlHTTP = 0
   Message("All","Done")
   Run(editor, hfile)   
Return


;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:XMLStarlet
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml fo -H "%url%" > "%hfile%"`)
   Message("All","Done")
   Run(editor, hfile)
Return


;   Question: How do I retrieve the HTML of a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q1.html")

:XStandard
   xHTTP = ObjectCreate("XStandard.HTTP")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(hfile)
   xHTTP = 0
   Message("All","Done")
   Run(editor, hfile)
Return

Good question. It's one I asked myself, and you'll be happy to know there are several ways to go about it. The other good news is that each approach takes a minimum of code.

WININET

The WinInet extender starts by opening a "tophandle," which you can think of as a link or handle to the Internet session:

   tophandle  = iBegin(0,"","")

Once the link has been established, you can use it to open the connection to the web site, which is a link to the data:

   datahandle = IUrlOpen(tophandle, url)

Once you have a data handle, you can then read data from the data link into a file:

   xx = iReadData(datahandle, hfile)

Once that's done, close the handles in the reverse of the order in which you opened them, and view the file:

   iClose(datahandle)
   iClose(tophandle)
   Message("All","Done")
   Run(editor, hfile)

Pretty easy, huh? You now have a simple way to retrieve a web page from a web site without any user interaction.

WINSOCK

The Winsock extender is similar.

httpGetServer parses the URL we provided and extracts the server name from it:

   serv=httpGetServer(url, "")

Next, httpGetPath parses the URL again and extracts the path on the server:

   path=httpGetPath(url, "")

This means you can supply a long URL that you got from someplace and Winbatch will do the grunt work of breaking it up. Next we want to take the text we get from the web site and place it into a file:

   FilePut(hfile, httpRecvText(serv, path, 30000, 0))

Since we provided the server and path, the extender connects to the site and retrieves up to the first 30,000 characters of the page (the third parameter sets that limit). Winbatch's FilePut() function then writes the text to a file, and we're done.

Alternatively, you can use httpRecvFile(server, path, hfile, flag) to download a page directly to a local file. You can try it yourself.
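A minimal sketch of the httpRecvFile() alternative follows, assuming the WinSock extender is installed. The argument order follows the description above; the flag value of 0 is an assumption, so check the extender's help file for the flags your version supports.

```winbatch
; Sketch: download a page straight to disk with the WinSock extender.
; Assumption: a flag value of 0 requests a plain download (check the extender help).
AddExtender("WWWSK44I.DLL")
url   = "http://www.google.com"
hfile = StrCat(DirScript(), "Q1.html")
serv  = httpGetServer(url, "")    ; server name parsed from the URL
path  = httpGetPath(url, "")      ; path on the server parsed from the URL

If FileExist(hfile) Then FileDelete(hfile)   ; remove any copy left by an earlier run
httpRecvFile(serv, path, hfile, 0)

; Confirm that something was actually written before opening it
saved = @FALSE
If FileExist(hfile) Then saved = (FileSize(hfile) > 0)
If saved
   Run("notepad.exe", hfile)
Else
   Message("httpRecvFile", "No data was saved to %hfile%")
EndIf
Exit
```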

MSIE

While the previous two examples show you how to do this behind the scenes without user intervention, the next one takes a different approach: it starts MSIE, has it .navigate to a web site, and saves the source. This might be the only avenue available to you if you're on a PC that doesn't have the WinInet or WinSock extenders. Either way, the result is essentially the same; this approach simply has the programmer use MSIE to perform the necessary steps.

The first step we do is create an instance of MSIE via OLE, using Winbatch's ObjectCreate() function:

   msie = ObjectCreate("InternetExplorer.Application")

Next we set the properties of the MSIE window. I've set everything but .visible to @False, so that there's plenty of display room onscreen:

   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE

Note: you can set the .visible property to @False too, so that the user doesn't see the window. The trouble is that even while invisible, MSIE will grab the input focus while it's working, which can be annoying to a user who is trying to type something while the script executes.

Next we tell MSIE to .navigate to the URL we provided. The script then checks every half-second until MSIE is no longer .busy and its .readystate is 4 (READYSTATE_COMPLETE), meaning the page is fully loaded:

   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
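A page that never finishes loading would leave the loop above spinning forever. The following sketch adds a time limit; WaitForMSIE and its 30-second limit are hypothetical choices for illustration, not part of the tutorial's scripts.

```winbatch
; Sketch: the same wait loop, but with a time limit so a page that never
; finishes loading cannot hang the script. WaitForMSIE is a hypothetical helper.
#DefineFunction WaitForMSIE(browser, maxsecs)
   loaded = @TRUE
   waited = 0
   While browser.busy || browser.readystate <> 4   ; 4 = READYSTATE_COMPLETE
      If waited >= maxsecs
         loaded = @FALSE                            ; give up: the page took too long
         Break
      EndIf
      TimeDelay(0.5)
      waited = waited + 0.5
   EndWhile
   Return(loaded)
#EndFunction

url  = "http://www.google.com"
msie = ObjectCreate("InternetExplorer.Application")
msie.visible = @TRUE
msie.navigate(url)
If !WaitForMSIE(msie, 30)
   Message("MSIE", "The page did not finish loading within 30 seconds")
EndIf
msie.quit
msie = 0
Exit
```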

Now we place the contents of the page into a file. This is done by accessing the document object: the page inside the web browser is considered a "document," and the document has many properties and methods available to it. In this case we ask MSIE to retrieve an element on the page by its HTML tag name. If you've ever looked at an HTML document, you'll know most start with <HTML>. Since .GetElementsByTagName retrieves a collection (note that "elements" is plural), the result behaves like an array. MSIE collections are zero-based, which means the first element has an index of zero. Since a page has only one <HTML> element, we specify it like this:

   FilePut(hfile, msie.document.GetElementsByTagName("HTML").item(0).outerHTML)

So .GetElementsByTagName("HTML").item(0) evaluates to the actual <HTML> element on the page, and we then ask for its .outerHTML property, which is the element's HTML including its own tags. This has the effect of taking the HTML of the whole page and placing it into the file.

We then tell MSIE to quit and display the result:

   msie.quit
   msie = 0
   Message("All","Done")
   Run(editor, hfile)

By setting msie = 0 we tell Winbatch that we're through using the OLE object created by ObjectCreate(). If you're using an older version of Winbatch, you'll need to change this statement to:

    ObjectClose(msie)

If you're familiar with VB or VBScript, you know that they use

    Set msie = Nothing

This has the effect of closing the object and saving memory, which is important in older versions of Winbatch.

WINHTTP

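WinHTTP also creates an OLE object, and we access its properties and methods in a similar manner. First we create the WinHTTP object and tell it we want to GET the URL specified (the @FALSE argument makes the request synchronous, so .Send() won't return until the response has arrived):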

   WinHTTP = ObjectCreate("WinHttp.WinHttpRequest.5")
   WinHTTP.Open("GET", url, @FALSE)

Then we .Send() the request to the URL:

   WinHTTP.Send()

The WinHTTP object can return data in several formats, but we're only interested in the text. So, just as the WinSock extender returned text, we take the .ResponseText property of the WinHTTP object and place it into a file:

   FilePut(hfile, WinHTTP.ResponseText)

As you can see, none of the window settings or waiting for the page to load is necessary (the same is true of WinInet and WinSock); the WinHTTP object handles all of that. We then close up and view the results:

   WinHTTP = 0
   Message("All","Done")
   Run(editor, hfile)
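The script above assumes the request succeeds. A slightly more defensive sketch checks the HTTP status code and sets timeouts with .SetTimeouts() (resolve, connect, send and receive, in milliseconds). It assumes the "WinHttp.WinHttpRequest.5.1" ProgID that WinHTTP 5.1 registers; if ObjectCreate() fails on your machine, try the ProgID used in Q1.WBT instead. The timeout values are arbitrary, and a timeout raises an OLE error that will stop the script unless you trap it.

```winbatch
; Sketch: WinHTTP with explicit timeouts and an HTTP status check.
; Assumption: the 5.1 ProgID is registered on this PC.
url   = "http://www.google.com"
hfile = StrCat(DirScript(), "Q1.html")
WinHTTP = ObjectCreate("WinHttp.WinHttpRequest.5.1")
; resolve, connect, send and receive timeouts, in milliseconds (arbitrary values)
WinHTTP.SetTimeouts(5000, 5000, 10000, 30000)
WinHTTP.Open("GET", url, @FALSE)
WinHTTP.Send()
code  = WinHTTP.Status        ; e.g. 200 = OK, 404 = Not Found
stext = WinHTTP.StatusText
If code == 200
   FilePut(hfile, WinHTTP.ResponseText)
   Run("notepad.exe", hfile)
Else
   Message("WinHTTP", "Server returned %code% %stext%")
EndIf
WinHTTP = 0
Exit
```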

XMLHTTP

XMLHTTP is almost identical; the only difference is the object we create. The XMLHTTP object shares most of its methods and properties with the WinHTTP object:

   xmlHTTP = ObjectCreate("Msxml2.XMLHTTP")
   xmlHTTP.Open("GET", url, @FALSE)
   xmlHTTP.Send()
   FilePut(hfile, xmlHTTP.ResponseText)
   xmlHTTP = 0
   Message("All","Done")
   Run(editor, hfile)

Why? Well, there's no sense in reinventing the wheel. Again, we show you this to give you an alternative should the WinHTTP object be unavailable.
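The XMLHTTP object also exposes .status and .getResponseHeader(), so the same kind of check works there. This sketch only saves the response if the server returned 200 and a Content-Type containing "text/html"; that filter is an illustrative choice, not a requirement.

```winbatch
; Sketch: check the XMLHTTP status and Content-Type before saving the page.
url   = "http://www.google.com"
hfile = StrCat(DirScript(), "Q1.html")
xmlHTTP = ObjectCreate("Msxml2.XMLHTTP")
xmlHTTP.Open("GET", url, @FALSE)      ; @FALSE = synchronous request
xmlHTTP.Send()
code  = xmlHTTP.status
ctype = xmlHTTP.getResponseHeader("Content-Type")
If code == 200 && StrIndexNC(ctype, "text/html", 1, @FWDSCAN) > 0
   FilePut(hfile, xmlHTTP.responseText)
   Run("notepad.exe", hfile)
Else
   Message("XMLHTTP", StrCat("Status: ", code, @CRLF, "Content-Type: ", ctype))
EndIf
xmlHTTP = 0
Exit
```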

XMLSTARLET

XMLStarlet is a command-line executable that you configure with switches.

First we get the location of the command processor from the COMSPEC environment variable:

   comspec = Environment("COMSPEC")

Next we run the executable. The "/c" option tells the command processor to carry out the command and then exit, closing its window. Winbatch's RunHideWait() runs the tool invisibly, and the script waits until it has finished before continuing.

XMLStarlet's command line uses the "fo" (format) command. The "-H" switch specifies that the input is an HTML file (remember, it's called XMLStarlet), so it will take the HTML page it finds and convert it to well-formed XML. Finally, we use DOS redirection (">") to send the output to a file:

   RunHideWait(comspec, `/c xml fo -H "%url%" > "%hfile%"`)

Then we view it. If you inspect the output file, you'll notice it's different from the output of the other tools. That's because XMLStarlet converts the HTML to XHTML (an XML-based version of HTML) in order to carry out its job. While this may seem "nice but not needed," it might come in handy down the road. We'll touch on XML in later scripts.

XSTANDARD

XStandard uses a familiar startup sequence, like the WinHTTP and XMLHTTP objects. Of course, it has its own methods; in this case .Get() retrieves the URL and .SaveResponseToFile() saves the returned data to disk:

   xHTTP = ObjectCreate("XStandard.HTTP")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(hfile)

Closing up is the same as the others.

"What good is this?" you ask. Well, you may need to view a web page offline. I had a particular site whose webmaster changed it periodically, so I had WinInet download the main page every day and store it to disk. I'd use Winbatch's FileCompare() function to see if yesterday's page was the same as today's, and if not, I'd pop up a Message() to remind me to check the site. Your requirements may be more rigorous than that, so be patient.

So there you have it: seven relatively easy ways to download an HTML page with Winbatch. There are others, but we'll leave those for you to explore.
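The page-watch idea above can be sketched like this with the WinInet extender. The file names today.html and yesterday.html are hypothetical, and keep in mind that a dynamic page such as Google's home page is likely to differ on every download, so point url at a page that only changes when its content does.

```winbatch
; Sketch of the "has the page changed?" check described above.
; Assumption: today.html / yesterday.html are hypothetical names in the script folder.
AddExtender("WWINT44I.DLL")
url   = "http://www.google.com"
tPath = DirScript()
today = StrCat(tPath, "today.html")
prior = StrCat(tPath, "yesterday.html")

; Keep the previous download (if any) for comparison
If FileExist(today) Then FileCopy(today, prior, @FALSE)

tophandle  = iBegin(0, "", "")
datahandle = iUrlOpen(tophandle, url)
xx = iReadData(datahandle, today)
iClose(datahandle)
iClose(tophandle)

If FileExist(prior)
   ; FileCompare() returns 0 when the two files are identical
   If FileCompare(today, prior) != 0 Then Message("Page Watch", "%url% has changed; time to check the site.")
EndIf
Exit
```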

Question: How do I read the text of an HTML page?

Q2.WBT



;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:WinInet
   AddExtender("WWINT44I.DLL")
   AddExtender("WWWSK44I.dll")
   tophandle  = iBegin(0,"","")
   datahandle = IUrlOpen(tophandle, url)
   xx = iReadData(datahandle, hfile)
   iClose(datahandle)
   iClose(tophandle)
   FilePut(tfile, httpStripHTML(FileGet(hfile)))
   Message("All","Done")
   Run(editor, tfile)
Return


;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:WinSock
   AddExtender("WWWSK44I.dll")
   serv=httpGetServer(url, "")
   path=httpGetPath(url, "")
   FilePut(hfile, httpRecvText(serv, path, 30000, 0))
   FilePut(tfile, httpStripHTML(FileGet(hfile)))
   Message("All","Done")
   Run(editor, tfile)
Return


;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:MSIE
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   FilePut(tfile, msie.document.GetElementsByTagName("HTML").item(0).outerText)
   msie.quit
   msie = 0
   Message("All","Done")
   Run(editor, tfile)
Return


;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:WinHTTP
   AddExtender("WWWSK44I.dll")
   WinHTTP = ObjectCreate("WinHttp.WinHttpRequest.5")
   WinHTTP.Open("GET", url, @FALSE)
   WinHTTP.Send()
   FilePut(tfile, httpStripHTML(WinHTTP.ResponseText))
   WinHTTP = 0
   Message("All","Done")
   Run(editor, tfile)
Return


;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:XMLHTTP
   AddExtender("WWWSK44I.dll")
   xmlHTTP = ObjectCreate("Msxml2.XMLHTTP")
   xmlHTTP.Open("GET", url, @FALSE)
   xmlHTTP.Send()
   FilePut(tfile, httpStripHTML(xmlHTTP.ResponseText))
   xmlHTTP = 0
   Message("All","Done")
   Run(editor, tfile)   
Return


;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:XMLStarlet
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml fo -H "%url%" > "%hfile%"`)
;   runhidewait(comspec, `/c xml sel -T -t -m "//text()" -v "." -n "%hfile%" > "%tfile%"`)
   RunHideWait(comspec, `/c xml sel -T -t -m "//html" -v "." "%hfile%" > "%tfile%"`)
   Message("All","Done")
   Run(editor, tfile)
Return


;   Question: How do I read the text of an HTML page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com"
hfile  = StrCat(tPath, "Q2.html")
tfile  = StrCat(tPath, "Q2.txt")

:XStandard
   AddExtender("WWWSK44I.dll")
   xHTTP = ObjectCreate("XStandard.HTTP")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(hfile)
   xHTTP = 0
   FilePut(tfile, httpStripHTML(FileGet(hfile)))
   Message("All","Done")
   Run(editor, tfile)
Return

Where the scripts in Q1.WBT showed you how to get the HTML from a URL, you might instead need the text. For example, you may need to know when to upload a new release of a software program, or you may need to get information from a business partner. Having the text means you can parse it, or use Winbatch's StrCnt() or StrIndex() functions to locate a keyword or phrase within the page. Whatever your reason, you can do it using the methods shown in the previous examples.

You'll also find the WinSock extender is crucial in this task.
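As an example of searching the stripped text, the sketch below downloads a page with the WinSock extender, strips the tags, and then uses StrIndexNC() (case-insensitive) to find a phrase and count its occurrences. The phrase is only an example, and the sketch counts with a StrIndexNC() loop rather than StrCnt() so it doesn't depend on StrCnt's flag settings.

```winbatch
; Sketch: strip the tags, then look for a phrase in the remaining text.
; The phrase "Google" is only an example.
AddExtender("WWWSK44I.DLL")
url    = "http://www.google.com"
phrase = "Google"
serv   = httpGetServer(url, "")
path   = httpGetPath(url, "")
pagetext = httpStripHTML(httpRecvText(serv, path, 30000, 0))

hits  = 0
first = 0
start = 1
While @TRUE
   If start > StrLen(pagetext) Then Break   ; searched the whole text
   pos = StrIndexNC(pagetext, phrase, start, @FWDSCAN)
   If pos == 0 Then Break                   ; no more matches
   If first == 0 Then first = pos           ; remember where it first appears
   hits  = hits + 1
   start = pos + StrLen(phrase)             ; resume searching after this match
EndWhile

If hits == 0
   Message("Search", "'%phrase%' was not found on %url%")
Else
   Message("Search", "'%phrase%' first appears at position %first% (%hits% occurrences)")
EndIf
Exit
```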

WININET

The code for retrieving text using the WinInet extender is almost identical to the example where we retrieved the HTML. The only difference is that we also need the WinSock extender (note the extra AddExtender() line), which has a built-in function, httpStripHTML(), that strips the HTML tags from a string and leaves the resulting text.

So aside from the extra AddExtender() line, we add:

   FilePut(tfile, httpStripHTML(FileGet(hfile)))

This places the text into the text file. Now you can inspect the text with your Winbatch functions as needed.

WINSOCK

Obviously, this is almost identical to the first example. We've only added the following extra line:

   FilePut(tfile, httpStripHTML(FileGet(hfile)))

This strips the HTML. It's included to show you how easy it is: no matter how you get an HTML file, you can eliminate the unneeded HTML and get at the text of the page.

Note: httpStripHTML() will also strip the elements out of an XML file, but be careful, as it may not work correctly with certain types of XML files, especially those saved using ADO's Persist to XML. You can try it later, when we work with one of the Winbatch PAD files that come with each Winbatch download. For now, hold that thought.

MSIE

Next is MSIE and the only difference in retrieving text is very obvious, replacing the .outerHTML property with the .outerText property:

    FilePut(tfile, msie.document.GetElementsByTagName("HTML").item(0).outerText)

Built into the MSIE Document Object Model (DOM) is the ability to request either. That's nice flexibility for programmers, especially if you're stuck on a PC that doesn't have access to the WinSock extender. Another nice feature is that MSIE's .navigate method doesn't have to point to a web site; you can give it a local file. This is crucial if the PC you're working on doesn't have an internet connection: you can bring the files to the PC and have it process them separately.
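Here's a minimal sketch of that offline approach. It assumes Q2.html was already saved by one of the Q2.WBT scripts, and it reads document.body.innerText, which holds only the text of the page body, rather than the outerText of the whole HTML element.

```winbatch
; Sketch: let MSIE render a local HTML file (no network needed) and save its text.
; Assumption: Q2.html was saved earlier by one of the Q2.WBT scripts.
tPath = DirScript()
hfile = StrCat(tPath, "Q2.html")
tfile = StrCat(tPath, "Q2.txt")

msie = ObjectCreate("InternetExplorer.Application")
msie.visible = @TRUE
msie.navigate(hfile)            ; a local file path works as well as a URL
While msie.busy || msie.readystate <> 4
   TimeDelay(0.5)
EndWhile
; body.innerText holds just the text of the page body
FilePut(tfile, msie.document.body.innerText)
msie.quit
msie = 0
Run("notepad.exe", tfile)
Exit
```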

WINHTTP

WinHTTP uses the same approach as WinInet: it does its thing normally, then uses the WinSock extender to turn the HTML into plain text.

Aside from the AddExtender() line, the only thing we've added is:

    FilePut(tfile, httpStripHTML(WinHTTP.ResponseText))

and the rest is the same.

XMLHTTP

XMLHTTP is deja vu all over again. It uses the same approach as WinInet: it does its thing normally, then uses the WinSock extender to turn the HTML into plain text.

Aside from the AddExtender() line, the only thing we've added is:

    FilePut(tfile, httpStripHTML(xmlHTTP.ResponseText))

and the rest is the same.

XMLSTARLET

XMLStarlet is slightly different. This time we not only have to retrieve the file, but then process it into text.

So this line of code:

   RunHideWait(comspec, `/c xml fo -H "%url%" > "%hfile%"`)

Retrieves the page and saves it locally... and this one:

   RunHideWait(comspec, `/c xml sel -T -t -m "//html" -v "." "%hfile%" > "%tfile%"`)

Grabs the html element and writes it out as text. Note the difference in the options. The first used "fo" (format), and the second used "sel" (select), which means we "select" data from the resulting XHTML file (remember, XMLStarlet formats its target). The -T tells XMLStarlet that we want the output as text. The -t means we're going to specify a "template," meaning an XSL template, the kind of thing you sometimes see in .XSLT files. Don't worry about all that for the moment. The -m "//html" means "match all of the html elements," and the -v "." means "print the value of the current match." Since there's only one html element and it's the topmost element, the tool grabs it (and all its contents) and prints its value, which has the effect of turning the entire page into text.

You'll also note that in MSIE we specified "HTML" in uppercase, but XMLStarlet used "//html" in lowercase. That's because XMLStarlet converts the HTML page into XHTML, where element names are lowercase, and XPath matching is case-sensitive (MSIE's .GetElementsByTagName is not). XHTML is different from HTML. Rather than launch into a long discourse here, we'll simply point you to an excellent beginner's reference:

http://www.w3schools.com/xhtml/

When you have time, visit the site and read the XHTML Introduction, which summarizes things nicely, then take their fast, informative tutorial to get a good perspective on XHTML. Now back to work.
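Once you have the XHTML, the same sel command can pull out a single value instead of the whole page. This sketch prints the page title; it assumes xml.exe is on the PATH and that Q2.html was produced by the XMLStarlet script above. If the formatted file declares an XHTML namespace (xmlns="...") on its root element, a plain "//title" won't match, so check the output first.

```winbatch
; Sketch: pull one value (the page title) out of the XHTML made by "xml fo -H".
; Assumptions: xml.exe is on the PATH; Q2.html came from the XMLStarlet script above.
comspec = Environment("COMSPEC")
tPath   = DirScript()
hfile   = StrCat(tPath, "Q2.html")
tfile   = StrCat(tPath, "Q2_title.txt")    ; hypothetical output file name
; -T = text output, -t = start a template, -v = print the value of the first
; matching title element, -n = add a newline
RunHideWait(comspec, `/c xml sel -T -t -v "//title" -n "%hfile%" > "%tfile%"`)
Message("Page Title", FileGet(tfile))
Exit
```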

XSTANDARD

Last but certainly not least is the XStandard entry, which takes the familiar approach of downloading the page itself and then using the WinSock extender to strip the tags:

    FilePut(tfile, httpStripHTML(FileGet(hfile)))

As you can guess, there's really no limit to mixing and matching the tools above. In most cases you'll find what you like best and stick with it, but nothing keeps you from, say, downloading the page with XMLStarlet and then stripping the text with the WinSock extender, especially if you're not interested in XSLT or XHTML.
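That particular combination might look like the following sketch, which reuses the file names from Q2.WBT and assumes xml.exe is on the PATH.

```winbatch
; Sketch of one mix-and-match combination: XMLStarlet downloads the page,
; then the WinSock extender's httpStripHTML() reduces it to plain text.
AddExtender("WWWSK44I.DLL")
comspec = Environment("COMSPEC")
tPath   = DirScript()
url     = "http://www.google.com"
hfile   = StrCat(tPath, "Q2.html")
tfile   = StrCat(tPath, "Q2.txt")
RunHideWait(comspec, `/c xml fo -H "%url%" > "%hfile%"`)
FilePut(tfile, httpStripHTML(FileGet(hfile)))
Run("notepad.exe", tfile)
Exit
```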

Question: How do I find the correct control on an HTML page?

Q3.WBT


;   Question: How do I find the correct control on an HTML page?

#DefineFunction startMSIE(url)
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Return(msie)
#EndFunction   

url    = StrCat(DirScript(), "Q3.html")

OptionList = "FindTables|FindInputs|FindCheckBoxes|FindLinks|FindTableCells"
label      = AskItemlist("Select Code Option", OptionList, "|", @UNSORTED, @SINGLE, @TRUE)
GoSub %label%

Exit

:FindTables
   GoSub BuildHTML
   br = startMSIE(url)
   TableCollection = br.document.GetElementsByTagName("TABLE")
   Message("How Many Tables?", TableCollection.length)
   ForEach Table In TableCollection
      Table.style.border = ".25mm solid red"
      Display(2, "Table", "Highlight")
      Table.style.border = ""
   Next
   br.quit
Return

:FindInputs
   GoSub BuildHTML
   br = startMSIE(url)
   InputCollection = br.document.GetElementsByTagName("INPUT")
   Message("How Many Inputs?", InputCollection.length)
   ForEach Input In InputCollection
      Input.style.border = ".25mm solid red"
      Display(1, "Input", "Highlight")
      Input.style.border = ".5mm solid powderblue"
   Next
   br.quit
Return

:FindCheckBoxes
   GoSub BuildHTML
   br = startMSIE(url)
   InputCollection = br.document.GetElementsByTagName("INPUT")
   ForEach Input In InputCollection
      If Input.type == "checkbox"
         Input.style.border = ".25mm solid red"
         Display(1, "Input", "Highlight")
         Input.style.border = ".5mm solid powderblue"         
      EndIf
   Next
   br.quit
Return

:FindLinks
   GoSub BuildHTML
   br = startMSIE(url)
   LinksCollection = br.document.links
   Message("How Many Links?", LinksCollection.length)
   ForEach Link In LinksCollection
      Link.style.border = ".25mm solid red"
      Display(2, "Link", "Highlight")
      Link.style.border = ""
   Next
   br.quit
Return

:FindTableCells
   GoSub BuildHTML
   br = startMSIE(url)
   ThisTable = br.document.GetElementsByTagName("TABLE").item(0)
   nRows = ThisTable.GetElementsByTagName("TR").length
   For r = 0 To nRows-1
      ThisRow = ThisTable.rows(r)
      ThisRow.bgColor = "red"
      nCells = ThisRow.cells.length
      For c = 0 To nCells-1
         ThisCell = ThisRow.cells(c)
         ThisCell.style.border = ".5mm solid black"
         TimeDelay(1)
         ThisCell.style.border = ""
      Next
      ThisRow.bgColor = ""
   Next
   br.quit
Return

:BuildHTML
   html = ""
   html = StrCat(html, `<html>`,@CRLF)
   html = StrCat(html, `<head>`,@CRLF)
   html = StrCat(html, `    <title>How to find the right control via Winbatch</title>`,@CRLF)
   html = StrCat(html, `</head>`,@CRLF)
   html = StrCat(html, `<body>`,@CRLF)
   html = StrCat(html, `<h2>Test Page</h2>`,@CRLF)
   html = StrCat(html, `<a href="" target="_blank">Link Number 1</a>&#160;&#160;&#160;`,@CRLF)
   html = StrCat(html, `<a href="" target="_blank">Link Number 2</a>&#160;&#160;&#160;`,@CRLF)
   html = StrCat(html, `<a href="" target="_blank">Link Number 3</a>&#160;&#160;&#160;`,@CRLF)
   html = StrCat(html, `<br/>`,@CRLF)
   html = StrCat(html, `<br/>`,@CRLF)
   html = StrCat(html, `<table border="1">`,@CRLF)
   html = StrCat(html, `<caption>Table 1</caption>`,@CRLF)
   html = StrCat(html, `<tr><td>First Name</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Last Name</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Age</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Title</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `</table>`,@CRLF)
   html = StrCat(html, `<br/>`,@CRLF)
   html = StrCat(html, `<table border="1">`,@CRLF)
   html = StrCat(html, `<caption>Table 2</caption>`,@CRLF)
   html = StrCat(html, `<tr><td>First Name</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Last Name</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Age</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Title</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `</table>`,@CRLF)
   html = StrCat(html, `<br/>`,@CRLF)
   html = StrCat(html, `<table border="1">`,@CRLF)
   html = StrCat(html, `<caption>Table 3</caption>`,@CRLF)
   html = StrCat(html, `<tr><td>First Name</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Last Name</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Age</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `<tr><td>Title</td><td><input type="text" size="20" /></td><td><input type="checkbox" /></td></tr>`,@CRLF)
   html = StrCat(html, `</table>`,@CRLF)
   html = StrCat(html, `<br/>`,@CRLF)
   html = StrCat(html, `</body>`,@CRLF)
   html = StrCat(html, `</html>`,@CRLF)
   FilePut(url, html)
Return

These scripts differ from the earlier ones: they use only one of the previous tools, MSIE, and show only how to find controls on a web page interactively. There are other methods, which require some additional learning, that we won't go into here.

No matter which script you run, it will build a fresh HTML page each time.

Now, when dealing with HTML pages, half the battle is finding what you need among the often busy HTML tags on the page. In most cases you'll be dealing with INPUT tags or links: entering or retrieving data, reading their settings, and clicking buttons and links.

First off, you'll notice that each example uses a User Defined Function to start MSIE. This saves us from repeating the same code over and over. In the previous examples we called the object "msie"; in these we'll use "br", which is short for "browser". There's no special significance to the name, and you could use anything you want.

FIND TABLES

This code snippet will start MSIE, navigate to the URL and display the page onscreen. Once there, we use the familiar .GetElementsByTagName() method, but this time we request the TABLE collection. You'll notice the Message() statement displays the number of tables found as TableCollection.length. Once again, the name "TableCollection" is made up and you could substitute something else for it. The .length, however, is not: it's a property of every collection in the MS HTML object model. Whenever you retrieve a collection using MS HTML, you can see how many items it holds by inspecting its .length property. In our case there are 3 tables.

Next the script loops through the collection and visits each table in turn. Items in the collection appear in the same order as they occur in the page's HTML, so the first table is the top-most one, and so on.

The ForEach statement built into Winbatch does most of the work for us, leaving us to concentrate on working with the objects in the table collection. As mentioned before, the "Table" keyword is made up for this exercise and you could use another term. It seems logical to refer to each item in the TableCollection as a "Table", so:

   ForEach Table In TableCollection
      Table.style.border = ".25mm solid red"
      Display(2, "Table", "Highlight")
      Table.style.border = ""
   Next

So, the ForEach statement steps through each item in the TableCollection and lets us refer to it as Table. Within the MS HTML object model, tables have a .style property, and the .style object has a .border property, which we set to a .25 millimeter solid red line. This highlights each item in the collection in turn. The Display() statement keeps the item on screen briefly before the last statement removes the highlighting. Since the objects in the collection occur in the same order as they appear on the page, it's easy to follow. Why do this? Often you'll have to work with a web site that has dozens of tables (you'll see this later when we work more with Google pages), and knowing which table you need to deal with is important.

The only problem with this code is that if you have many, many tables, the one you want may not be easy to spot. The code can easily be changed to the following:

   Count = 0
   ForEach Table In TableCollection
      Table.style.border = ".25mm solid red"
      Display(2, "Table %Count%", "Highlight")
      Table.style.border = ""
      Count = Count + 1
   Next

Now each loop displays the number of the table, so you don't have to count along. Remember, collections in MS HTML are zero-based, so the counter starts at zero and is incremented after each iteration.
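Once the numbered loop has told you which table you want, you can skip the loop and go straight to it with the collection's .item() method. This is a minimal sketch, assuming br was returned by startMSIE() as in the scripts above; TargetIndex = 1 (the second table) is only an example value.

```winbatch
; Jump straight to one table once you know its zero-based number.
; Assumes br came from startMSIE(); TargetIndex = 1 is an example value.
TableCollection = br.document.GetElementsByTagName("TABLE")
TargetIndex = 1
If TargetIndex < TableCollection.length
   Target = TableCollection.item(TargetIndex)
   Target.style.border = ".25mm solid red"
   Message("Table %TargetIndex%", Target.innerText)
   Target.style.border = ""
Else
   Message("Not found", StrCat("The page only has ", TableCollection.length, " table(s)"))
EndIf
```

The .length check guards against asking for a table that isn't there, which matters once you point the script at a page you don't control.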

FIND INPUTS

Finding the INPUTs on a page uses a similar process. Some of the variable names have changed to keep the script readable, but it's essentially the same: instead of TableCollection we have InputCollection, and instead of Table we use Input.

   ForEach Input In InputCollection
      Input.style.border = ".25mm solid red"
      Display(1, "Input", "Hilight")
      Input.style.border = ".5mm solid powderblue"
   Next

You'll notice that each input changes size slightly after it's highlighted, because the script finishes by giving it a .5mm powderblue border in place of its original one. Almost every control on an HTML page can be restyled this way through its .style property.
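If you'd rather leave each input exactly as it was, one variation is to save the original .style.border value before highlighting and put it back afterwards. A sketch, assuming br came from startMSIE() and the page was built by BuildHTML:

```winbatch
; Highlight each INPUT, then restore whatever inline border it had.
; Assumes br came from startMSIE() and BuildHTML created the page.
InputCollection = br.document.GetElementsByTagName("INPUT")
ForEach Input In InputCollection
   OldBorder = Input.style.border      ; "" when no inline border was set
   Input.style.border = ".25mm solid red"
   Display(1, "Input", "Highlight")
   Input.style.border = OldBorder      ; put the original value back
Next
```

Because the inline border is restored rather than replaced, the inputs keep their original size once the loop is done.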

FIND CHECK BOXES

Just as we found the collection of INPUTs, you may need to find the CHECKBOXES on a page. The first thing you'll notice about a checkbox in HTML is that it's an INPUT tag. So how do you distinguish a checkbox from the other INPUTs? Luckily, each INPUT has a TYPE attribute.

The basic script should look very familiar by now. We add an extra line of code that checks the .type property of the INPUT on each loop:

      If Input.type == "checkbox"

So now, the script will only highlight INPUTs that have their TYPE="checkbox" attribute set.
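The same .type check can sort every INPUT on the page instead of filtering for one kind. A minimal sketch, assuming br came from startMSIE() and BuildHTML created the page; it uses the Select @TRUE pattern that appears later in Q6.WBT:

```winbatch
; Tally INPUTs by their TYPE attribute.
; Assumes br came from startMSIE() and BuildHTML created the page.
InputCollection = br.document.GetElementsByTagName("INPUT")
nText  = 0
nCheck = 0
ForEach Input In InputCollection
   Select @TRUE
      Case Input.type == "text"
         nText = nText + 1
         Break
      Case Input.type == "checkbox"
         nCheck = nCheck + 1
         Break
   EndSelect
Next
Message("INPUT types", StrCat("Text: ", nText, @LF, "Checkbox: ", nCheck))
```

For the BuildHTML page (three tables of four rows, each row holding one text box and one checkbox) this should report 12 of each.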

FIND LINKS

Links are even easier to retrieve since they have their own separate collection.

In this case, you specify the .links collection like so:

    LinksCollection = br.document.links

Your script can deal with it easily enough: cycle through the .links collection and find the link(s) you're interested in.
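A quick way to see what's in the .links collection is to list each link's visible text and target. A sketch, assuming br came from startMSIE(); it indexes the collection with .item() rather than using ForEach:

```winbatch
; List the text and target of every link on the page.
; Assumes br came from startMSIE().
LinksCollection = br.document.links
LinkList = ""
For n = 0 To LinksCollection.length - 1
   ThisLink = LinksCollection.item(n)
   LinkList = StrCat(LinkList, n, ": ", ThisLink.innerText, " -> ", ThisLink.href, @LF)
Next
Message("Links on the page", LinkList)
```

The numbers in the list are the zero-based positions you would pass to .item() to pick a link later.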

FIND TABLE CELLS

Finally we'll deal with finding table cells. Tables are very common on HTML pages and often a programmer needs to retrieve the data from one or more cells.

As in the first example, once you identify the table you're looking for, you need to access its cells. I generally retrieve table cells row-by-row, which keeps them in order; if you retrieve the TD collection instead, you get every cell in a single collection, which may or may not be what you need. Most of the exporting I've done is similar to CSV files, where the data is arranged in columns by rows. This example follows that logic, grabbing one row of data at a time and then accessing each cell in the row. The logic is simple:

Access the correct table.
Access the collection of rows in the table.
Access the collection of cells in each row.

This statement retrieves the rows in the table. Notice that the first object reference in the statement is "ThisTable": this tells MSIE and MS HTML that we're only interested in the rows inside this table. Calling .GetElementsByTagName() on br.document instead would find all the rows on the page:

   nRows = ThisTable.GetElementsByTagName("TR").length

If you wanted to, you could access all the rows on the page, but then you'd have a mess on your hands. This way is much simpler.

Next you'll notice that the coding convention has changed slightly. Instead of ForEach, we use For/To. This was done purposely to show you how to loop through collections by index, something you'll do over and over within the MSIE and MS HTML object model; and if you need to work with MS XML, you'll find the same coding conventions work there as well.

So let's loop through the table's rows, using nRows as the count:

   For r = 0 To nRows-1
      ThisRow = ThisTable.rows(r)
      ThisRow.bgColor = "red"
      nCells = ThisRow.cells.length

Each time we find a row, we'll identify it by the variable "ThisRow" and set its background color to red so that we can easily see it on screen. Finally, we'll loop through its collection of cells, highlighting each one with a black border:

      For c = 0 To nCells-1
         ThisCell = ThisRow.cells(c)
         ThisCell.style.border = ".5mm solid black"
         TimeDelay(1)
         ThisCell.style.border = ""
      Next

Lastly, when we're done, we'll remove the background red and move on to the next row in the collection:

      ThisRow.bgColor = ""
   Next
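The same row-then-cell loop is the skeleton of a CSV export. The sketch below assumes ThisTable was set as in FindTableCells, and the output name "table.csv" is a hypothetical choice. It reads each cell's .innerText, so for the BuildHTML tables only the first column has text; the cells holding INPUTs come out empty, because an input's contents live in its .value, not in the cell's text.

```winbatch
; Export one table to CSV-style text, one line per row.
; Assumes ThisTable was set as in FindTableCells; "table.csv" is a
; hypothetical output name.
CsvText = ""
nRows = ThisTable.rows.length
For r = 0 To nRows-1
   ThisRow = ThisTable.rows(r)
   RowText = ""
   Sep = ""
   For c = 0 To ThisRow.cells.length - 1
      CellText = ThisRow.cells(c).innerText
      CellText = StrReplace(CellText, '"', '""')   ; CSV-escape embedded quotes
      RowText = StrCat(RowText, Sep, '"', CellText, '"')
      Sep = ","
   Next
   CsvText = StrCat(CsvText, RowText, @CRLF)
Next
FilePut(StrCat(DirScript(), "table.csv"), CsvText)
```

Quoting every field keeps commas inside cell text from splitting a column.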

So now you know how to download and save an HTML page, get its text, and access the objects on it.

Question: How do I access controls/documents within a FRAME element?

Q4.WBT


;   Question: How do I access controls/documents within a FRAME element?

#DefineFunction startMSIE(url)
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Return(msie)
#EndFunction   

editor = "notepad.exe"
tPath  = dirscript()

OptionList = "CreateFrameSet|GetFrameData|SetFrameData"
label      = AskItemlist("Select Code Option", OptionList, "|", @UNSORTED, @SINGLE, @TRUE)
GoSub %label%

Exit

:CreateFrameSet
   ;   build main frame...
   frameHTML = ""
   frameHTML = StrCat(frameHTML, `<HTML>`,@CRLF)
   frameHTML = StrCat(frameHTML, `<HEAD>`,@CRLF)
   frameHTML = StrCat(frameHTML, ` <TITLE>`,@CRLF)
   frameHTML = StrCat(frameHTML, `  Accessing controls/documents within FRAMES`,@CRLF)
   frameHTML = StrCat(frameHTML, ` </TITLE>`,@CRLF)
   frameHTML = StrCat(frameHTML, `</HEAD>`,@CRLF)
   frameHTML = StrCat(frameHTML, ` <FRAMESET COLS="50%%,50%%">`,@CRLF)
   frameHTML = StrCat(frameHTML, `  <FRAME NAME="LeftFrame" SRC="fr1.html">`,@CRLF)
   frameHTML = StrCat(frameHTML, `  <FRAME NAME="RightFrame" SRC="fr2.html">`,@CRLF)
   frameHTML = StrCat(frameHTML, ` </FRAMESET>`,@CRLF)
   frameHTML = StrCat(frameHTML, `</HTML>`,@CRLF)
   ;
   frameMain = StrCat(tPath, "frameMain.html")
   FilePut(frameMain, frameHTML)
   ;   build first frame (FR1.HTML)
   frameHTML = ""
   frameHTML = StrCat(frameHTML, `<HTML>`,@CRLF)
   frameHTML = StrCat(frameHTML, ` <BODY BGCOLOR="WHITE">`,@CRLF)
   frameHTML = StrCat(frameHTML, `  <FONT FACE=ARIAL>`,@CRLF)
   frameHTML = StrCat(frameHTML, `  <H3>Left Page</H3>`,@CRLF)
   frameHTML = StrCat(frameHTML, `   <CENTER>`,@CRLF)
   frameHTML = StrCat(frameHTML, `   <FORM name="LeftInfoForm">`,@CRLF)
   frameHTML = StrCat(frameHTML, `    First Name <INPUT NAME="LeftFirst" TYPE="text" Value="Joe"><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    Last Name <INPUT NAME="LeftLast" TYPE="text" Value="Smith"><BR/><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    Sex: M <INPUT NAME="LeftSex" TYPE="radio" value="1" checked="checked"> F <INPUT NAME="LeftSex" TYPE="radio" value="2"><BR/><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    I Enjoy Working with Winbatch <INPUT NAME="LeftWinbatch" TYPE="checkbox" checked="checked"><BR/><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    <INPUT NAME="LeftSubmit" TYPE="button" Value="Submit Data" onclick="alert('Left Submit clicked')">   `,@CRLF)
   frameHTML = StrCat(frameHTML, `   </FORM>`,@CRLF)
   frameHTML = StrCat(frameHTML, `   </CENTER>`,@CRLF)
   frameHTML = StrCat(frameHTML, ` </BODY>`,@CRLF)
   frameHTML = StrCat(frameHTML, `</HTML> `,@CRLF)
   ;
   fr1 = StrCat(tPath, "fr1.html")
   FilePut(fr1, frameHTML)
   ;   build second frame (FR2.HTML)
   frameHTML = ""
   frameHTML = StrCat(frameHTML, `<HTML>`,@CRLF)
   frameHTML = StrCat(frameHTML, ` <BODY BGCOLOR="WHITE">`,@CRLF)
   frameHTML = StrCat(frameHTML, `  <FONT FACE=ARIAL>`,@CRLF)
   frameHTML = StrCat(frameHTML, `  <H3>Right Page</H3>`,@CRLF)
   frameHTML = StrCat(frameHTML, `   <CENTER>`,@CRLF)
   frameHTML = StrCat(frameHTML, `   <FORM name="RightInfoForm">`,@CRLF)
   frameHTML = StrCat(frameHTML, `    First Name <INPUT NAME="RightFirst" TYPE="text" Value="Donna"><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    Last Name <INPUT NAME="RightLast" TYPE="text" Value="Jones"><BR/><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    Sex: M <INPUT NAME="RightSex" TYPE="radio"> F <INPUT NAME="RightSex" TYPE="radio" checked="checked"><BR/><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    I Enjoy Working with Winbatch <INPUT NAME="RightWinbatch" TYPE="checkbox"><BR/><BR/>`,@CRLF)
   frameHTML = StrCat(frameHTML, `    <INPUT NAME="RightSubmit" TYPE="button" Value="Submit Data" onclick="alert('Right Submit clicked')">   `,@CRLF)
   frameHTML = StrCat(frameHTML, `   </FORM>`,@CRLF)
   frameHTML = StrCat(frameHTML, `   </CENTER>`,@CRLF)
   frameHTML = StrCat(frameHTML, ` </BODY>`,@CRLF)
   frameHTML = StrCat(frameHTML, `</HTML> `,@CRLF)
   ;
   fr2 = StrCat(tPath, "fr2.html")
   FilePut(fr2, frameHTML)
   Run(framemain, "")
Return

:GetFrameData
   url = StrCat(tPath, "frameMain.html")
   msie = startMSIE(url)
   LeftInfo  = ""
   RightInfo = ""
   LFirst = msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftFirst.value
   LLast  = msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftLast.value
   LSexM  = Abs(msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftSex.item(0).checked)
   LSexF  = Abs(msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftSex.item(1).checked)
   If LSexM
      LSex = "Male"
   Else
      LSex = "Female"
   EndIf
   LWinbatch = Abs(msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftWinbatch.checked)
   If LWinbatch
      LWBT = "Yes"
   Else
      LWBT = "No"
   EndIf
   LeftInfo = StrCat("Name: ", LFirst, " ", LLast, @LF, "Sex: ", LSex, @LF, "Likes Winbatch?: ", LWBT)
   RFirst = msie.document.script.top.RightFrame.document.RightInfoForm.RightFirst.value
   RLast  = msie.document.script.top.RightFrame.document.RightInfoForm.RightLast.value
   RSexM  = Abs(msie.document.script.top.RightFrame.document.RightInfoForm.RightSex.item(0).checked)
   RSexF  = Abs(msie.document.script.top.RightFrame.document.RightInfoForm.RightSex.item(1).checked)
   If RSexM
      RSex = "Male"
   Else
      RSex = "Female"
   EndIf
   RWinbatch = Abs(msie.document.script.top.RightFrame.document.RightInfoForm.RightWinbatch.checked)
   If RWinbatch
      RWBT = "Yes"
   Else
      RWBT = "No"
   EndIf
   RightInfo = StrCat("Name: ", RFirst, " ", RLast, @LF, "Sex: ", RSex, @LF, "Likes Winbatch?: ", RWBT)
   Message("Form Info", StrCat(LeftInfo, @LF, @LF, RightInfo))
   msie.quit
Return

:SetFrameData
   url = StrCat(tPath, "frameMain.html")
   msie = startMSIE(url)
   Message("Form Info", "Before Changes...")
   msie.document.script.top.RightFrame.document.RightInfoForm.RightLast.value = "Dawson"
   msie.document.script.top.RightFrame.document.RightInfoForm.RightWinbatch.checked = @TRUE
   Message("Form Info", "Check changes")
   msie.document.script.top.RightFrame.document.RightInfoForm.RightSubmit.click
   msie.quit
Return

Q5.WBT


;   Question: How do I access controls/documents on a page or within a FORM element?

#DefineFunction startMSIE(url)
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Return(msie)
#EndFunction   

editor = "notepad.exe"
tPath  = DirScript()

OptionList = "GetFormData|SetFormData"
label      = AskItemlist("Select Code Option", OptionList, "|", @UNSORTED, @SINGLE, @TRUE)
GoSub %label%

Exit


:GetFormData
   url = StrCat(tPath, "fr1.html")
   msie = startMSIE(url)
   LeftInfo  = ""
   LFirst = msie.document.LeftInfoForm.LeftFirst.value
   LLast  = msie.document.LeftInfoForm.LeftLast.value
   LSexM  = Abs(msie.document.LeftInfoForm.LeftSex.item(0).checked)
   LSexF  = Abs(msie.document.LeftInfoForm.LeftSex.item(1).checked)
   If LSexM
      LSex = "Male"
   Else
      LSex = "Female"
   EndIf
   LWinbatch = Abs(msie.document.LeftInfoForm.LeftWinbatch.checked)
   If LWinbatch
      LWBT = "Yes"
   Else
      LWBT = "No"
   EndIf
   LeftInfo = StrCat("Name: ", LFirst, " ", LLast, @LF, "Sex: ", LSex, @LF, "Likes Winbatch?: ", LWBT)
   Message("Form Info", LeftInfo)
   msie.quit
Return

:SetFormData
   url = StrCat(tPath, "fr2.html")
   msie = startMSIE(url)
   Message("Form Info", "Before Changes...")
   msie.document.RightInfoForm.RightLast.value = "Dawson"
   msie.document.RightInfoForm.RightWinbatch.checked = @TRUE
   Message("Form Info", "Check changes")
   msie.document.RightInfoForm.RightSubmit.click
   msie.quit
Return

It's quite common to come across web pages that use FRAMEs and FRAMESETs. However, accessing the controls on these pages is different from before. Previously we accessed controls via:

browser.document.GetElements

method, and you'd probably expect to do the same with frames. Unfortunately that doesn't work. Most FRAMESETs you'll run into are a single page whose FRAME tags use the SRC (source) attribute to load one or more separate pages, each with its own document. We can still get at each frame's objects, but the route is slightly different. At this point we need to introduce the TOP object.

The TOP object is the top-most window in the browser (the one holding the FRAMESET), and you get it with the following code:

    msie.document.script.top

The .script property of the DOCUMENT object gives you the page's scripting (window) object, and TOP is a property of that object. If we were writing VBScript inside the page itself, we could simply use top.whatever, but from Winbatch we have to take this longer route.
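If you don't know a frame's name, you can ask the top window for its frames collection. This is a sketch, assuming msie came from startMSIE() pointed at frameMain.html; .frames, .length, .item() and .name are MS HTML window properties, but treat this as something to test on your own system.

```winbatch
; List the frames in the top window by zero-based index and NAME.
; Assumes msie came from startMSIE() pointed at frameMain.html.
TopWin  = msie.document.script.top
nFrames = TopWin.frames.length
FrameNames = ""
For n = 0 To nFrames - 1
   FrameNames = StrCat(FrameNames, n, ": ", TopWin.frames.item(n).name, @LF)
Next
Message("Frames", FrameNames)
```

For the Q4.WBT frameset you'd expect two entries, LeftFrame and RightFrame.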

GET FRAME DATA

Run Q4.WBT and choose CreateFrameSet first to build the pages, then run it again and choose GetFrameData. You'll see the screen divided into LEFT and RIGHT halves. You'll also notice that when we built the frameset HTML, each FRAME was given its own NAME:

"LeftFrame" and "RightFrame"

So now, when we want to get at data within one of the frames we'll use:

    msie.document.script.top.LeftFrame

This refers to the LeftFrame. Then we drill down inside the LeftFrame to get at its objects using a familiar method:

    msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftFirst.value

You'll also notice we used "LeftInfoForm". Forms are a big part of web pages and, much like links, have their own collection (document.forms). If you know a form's name, you can address it directly on the page. After that comes "LeftFirst", which is the name of one of the elements within "LeftInfoForm". So rather than having to use .GetElementsByTagName() and loop through a collection of controls, you can go straight to the one you want; Microsoft has simplified things for the programmer.

The entire line looks like:

LFirst = msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftFirst.value

This allows us to retrieve the value of the control named "LeftFirst" in a single line of code (albeit a long one). This makes working with complex pages, holding multiple frames and multiple controls much easier.

If you examine the HTML of the page, you'll see that the INPUT named "LeftFirst" is a text INPUT, and you get at its data through the .value property. LeftLast works the same way.

Next is the radiobutton with which the user specifies their sex. Unlike text boxes, we're not interested in the .value property; instead we look at each button and see whether it's checked. The code to access this element is very similar:

LSexM = Abs(msie.document.script.top.LeftFrame.document.LeftInfoForm.LeftSex.item(0).checked)

The main difference is that we need to retrieve the collection of radiobuttons, because a radiobutton group is used to pick a single item out of several. When your user clicks the MALE radiobutton, the FEMALE one unchecks, and vice versa. If your page asks the user for RACE or some other item with many choices, you don't want them clicking and unclicking all day. So whether you have two radiobuttons or twenty, you give them all the same name; when one is checked the others uncheck automatically, and the person coding the HTML doesn't have to write an elaborate script to handle all the possibilities. Because the buttons share one name, LeftSex refers to the whole group, and .item(0) is the first button (MALE).

In Winbatch, "yes" or "true" is 1, as seen in the Winbatch constants @TRUE and @FALSE. Microsoft's COM objects are slightly different: "false" is 0 and "true" is -1 (more generally, any non-zero value counts as true). We can convert this to Winbatch easily enough with the Abs() function, which turns -1 into 1 and has no effect on zero.
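The conversion is easy to check without a browser at all. This small sketch stands in literal values for what a .checked property would return:

```winbatch
; COM booleans come back as -1 (true) or 0 (false).
ComTrue  = -1
ComFalse = 0
WbTrue  = Abs(ComTrue)     ; 1, the same as @TRUE
WbFalse = Abs(ComFalse)    ; 0, the same as @FALSE
; Any non-zero value already tests as true in an If statement,
; so Abs() matters mainly when you compare against @TRUE.
If ComTrue == @TRUE
   Message("Compare", "-1 equals @TRUE")
Else
   Message("Compare", "-1 does not equal @TRUE; use Abs() first")
EndIf
```

Running it shows the second message, which is exactly the trap Abs() avoids.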

Lastly is the "LeftWinbatch" checkbox, which is very similar to a radiobutton and uses almost the same code to retrieve its state. Since there's only one control with that name, you don't need .item(0) to get at it.

If you examine the rest of the script, you'll see the right frame is the mirror of the left. The script will retrieve information from both sides.
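When you don't know the control names in advance, you can walk a frame's forms and their elements instead of naming each one. This is a sketch, assuming msie came from startMSIE() pointed at frameMain.html; it reads .checked for checkboxes and radiobuttons and .value for everything else.

```winbatch
; List every control on every form in the left frame.
; Assumes msie came from startMSIE() pointed at frameMain.html.
FrameDoc = msie.document.script.top.LeftFrame.document
FormCollection = FrameDoc.forms
Report = ""
For f = 0 To FormCollection.length - 1
   ThisForm = FormCollection.item(f)
   For e = 0 To ThisForm.elements.length - 1
      El = ThisForm.elements.item(e)
      If El.type == "checkbox" || El.type == "radio"
         State = StrCat("checked=", Abs(El.checked))
      Else
         State = StrCat("value=", El.value)
      EndIf
      Report = StrCat(Report, ThisForm.name, ".", El.name, " (", El.type, ") ", State, @LF)
   Next
Next
Message("Left frame controls", Report)
```

For fr1.html this should list six controls: the two text boxes, both LeftSex radiobuttons, the LeftWinbatch checkbox and the LeftSubmit button.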

SET FRAME DATA

The previous script showed you how to retrieve data from a web page; now we'll show you how to set it. The code is almost identical, but it's good to know how it's done.

This line of code sets the right last name text box to a new value:

msie.document.script.top.RightFrame.document.RightInfoForm.RightLast.value = "Dawson"

And this one checks the Winbatch checkbox:

msie.document.script.top.RightFrame.document.RightInfoForm.RightWinbatch.checked = @TRUE

In most cases when you're dealing with inputting data into a form, you'll also need to know how to click the form's submit button to submit the data. If you check the right frame's HTML you'll find the button has the name "RightSubmit" so:

msie.document.script.top.RightFrame.document.RightInfoForm.RightSubmit.click

Enables you to click the button, as if you had filled out the form interactively.
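Those long object paths can be shortened by keeping a reference to the form in a variable and working from there. A sketch of SetFrameData written that way, assuming msie came from startMSIE() pointed at frameMain.html:

```winbatch
; Keep a reference to the form so each line stays short.
; Assumes msie came from startMSIE() pointed at frameMain.html.
RightForm = msie.document.script.top.RightFrame.document.RightInfoForm
RightForm.RightLast.value = "Dawson"
RightForm.RightWinbatch.checked = @TRUE
Message("Form Info", "Check changes")
RightForm.RightSubmit.click
RightForm = 0     ; release the reference when done
```

The result is the same as the full-length statements; it just resolves the frame and form once instead of on every line.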

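Q5.WBT deals with forms on a single page, minus the frames. It loads fr1.html and fr2.html directly from the script's folder, so run Q4.WBT's CreateFrameSet option first to create them. If you understand these examples, you shouldn't have any problems with Q5.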

Question: How do I navigate between several pages after clicking a link?
How can I do it without getting lost or out of synch?

Q6.WBT


;   Question: How do I navigate between several pages after clicking a link? 
;             How can I do it without getting lost or out of synch?

#DefineFunction startMSIE(url)
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Return(msie)
#EndFunction   

#DefineFunction WaitForMSIE(msie)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Return
#EndFunction

editor = "notepad.exe"
tPath  = DirScript()

url = "http://www.google.com"
msie = startMSIE(url)
GoSub GoogleViaMSIE

Exit

:GoogleViaMSIE
   ;              <input maxlength="256" size="55" name="q" value=""/>
   ;              <br/>
   ;              <input type="submit" value="Google Search" name="btnG"/>
   
   InputCollection = msie.document.GetElementsByTagName("INPUT")
   ForEach i In InputCollection
      Select @TRUE
         Case i.name == "q"
         i.value = "Microsoft Windows"
         Break
         Case i.name == "btnG"
         i.click
         Break
      EndSelect
   Next
   WaitForMSIE(msie)
   Table = msie.document.GetElementsByTagName("TABLE")
   Table.item(1).GetElementsByTagName("A").item(1).style.border = ".25mm solid red"
   TimeDelay(2)
   Table.item(1).GetElementsByTagName("A").item(1).click
   WaitForMSIE(msie)
   Message("Debug", StrSub(msie.document.GetElementsByTagName("HTML").item(0).outerText, 1, 500))
   Table = msie.document.GetElementsByTagName("TABLE")
   Table.item(1).GetElementsByTagName("A").item(2).style.border = ".25mm solid red"
   TimeDelay(2)
   Table.item(1).GetElementsByTagName("A").item(2).click
   WaitForMSIE(msie)
   Message("Debug", StrSub(msie.document.GetElementsByTagName("HTML").item(0).outerText, 1, 500))
   TimeDelay(2)
   msie.quit
Return

Many times you'll need to go to one page, click a link, and get to another. In some cases you could find out what the last link in the chain is and simply navigate straight to it, but I've seen sites that won't let a browser jump to a page just because it knows the URL. Some sites check the "referrer" (the HTTP Referer header), which means that to reach a given page, the browser must be referred from another page. You could find all this information out, but what if you don't have the time and need something done quickly? The solution is to use MSIE once again: you can automate the browser to move from link to link until you arrive at the destination you want.

If you examine Q6.WBT you'll find that there are two User Defined Functions: startMSIE() and WaitForMSIE(). As seen previously, startMSIE() takes a URL as an argument, starts Internet Explorer, .navigates to the URL, waits for the page to load, then returns the handle to the browser object. WaitForMSIE() takes the handle to the browser as an argument, waits until the browser has finished loading, then returns to the script. With these two functions we can start the session and move between pages without trying to access controls that haven't appeared yet.
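WaitForMSIE() will wait forever if a page never finishes loading. One option is a variant with a time limit. The function name WaitForMSIETimeout() and its MaxSeconds parameter are hypothetical, and 30 seconds in the usage lines is an arbitrary example.

```winbatch
; Hypothetical variant of WaitForMSIE() that gives up after MaxSeconds.
; Returns @TRUE when the page has loaded, @FALSE on timeout.
#DefineFunction WaitForMSIETimeout(msie, MaxSeconds)
   Waited = 0
   While msie.busy || msie.readystate <> 4
      If Waited >= MaxSeconds
         Return(@FALSE)
      EndIf
      TimeDelay(0.5)
      Waited = Waited + 0.5
   EndWhile
   Return(@TRUE)
#EndFunction

; Example use after clicking a link (msie from startMSIE()):
If !WaitForMSIETimeout(msie, 30)
   Message("Timeout", "The page did not finish loading within 30 seconds")
EndIf
```

Returning a flag instead of stopping the script lets the caller decide whether to retry, skip the page, or quit.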

The GoogleViaMSIE section has some HTML in its comments. I downloaded the page using WinInet and then loaded it into WinStudio to examine it. From this I determined that to automate a Google search we need to access two INPUTs: the one named "q" and the one named "btnG". The "q" I assume is for "query" or "querystring" (okay, I'm guessing) and "btnG" is for "button Google" (another stellar inference). Naming issues aside, what we need to do is place our search string in "q", click "btnG", then wait.

We retrieve access to the INPUTs on the page in the standard way:

   InputCollection = msie.document.GetElementsByTagName("INPUT")

Once we have access to the collection, we loop through it using ForEach. The Select statement examines the .name of each item i (for "input") and acts accordingly. Because the textbox comes before the button on the page, the script finds it first, so our string "Microsoft Windows" is already in the textbox by the time the script finds the submit button and clicks it.

   ForEach i In InputCollection
      Select @TRUE
         Case i.name == "q"
         i.value = "Microsoft Windows"
         Break
         Case i.name == "btnG"
         i.click
         Break
      EndSelect
   Next

Note that Break only leaves the Select block, not the ForEach loop, so after clicking the button the loop carries on through any INPUTs that follow it before the script continues with the rest of its code.
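Because Break inside a Select only ends that Select block, the ForEach keeps walking the INPUT collection after the click, while the old page is being replaced. If you'd rather avoid that, one option is to remember the button during the loop and click it afterwards. A sketch; the INPUT names "q" and "btnG" are the ones from the page the author examined and may have changed since.

```winbatch
; Fill in the search box, remember the button, and click it only
; after the loop has finished with the INPUT collection.
InputCollection = msie.document.GetElementsByTagName("INPUT")
FoundButton = @FALSE
ForEach i In InputCollection
   Select @TRUE
      Case i.name == "q"
         i.value = "Microsoft Windows"
         Break
      Case i.name == "btnG"
         SearchButton = i
         FoundButton = @TRUE
         Break
   EndSelect
Next
If FoundButton
   SearchButton.click
   WaitForMSIE(msie)
Else
   Message("Error", "No INPUT named btnG was found on the page")
EndIf
```

The FoundButton flag also gives you a clear error message when the page layout changes, instead of a silent no-op.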

Then comes:

   WaitForMSIE(msie)

And this is where we pass the browser object handle to the function, and it waits until the new page has loaded.

Google places the results of its search inside a table; however, the first thing you'll discover about Google pages is that there are several tables on each page. In this example we're not interested in the search results; rather, we're interested in the 2nd table on the page, which contains the links to other Google pages. Referencing Table.item(1) gets the 2nd table, and .GetElementsByTagName("A").item(1) gets the 2nd anchor (link) inside that table:

   Table = msie.document.GetElementsByTagName("TABLE")
   Table.item(1).GetElementsByTagName("A").item(1).style.border = ".25mm solid red"
   TimeDelay(2)

We set its border to red, then pause two seconds so the user can see it.

Next we click the link and then wait for the next page to load:

   Table.item(1).GetElementsByTagName("A").item(1).click
   WaitForMSIE(msie)

Now we're at the Google Groups page. At this point, we have the script grab the entire page and display the first 500 characters of the page's text to show we're in synch.

Now we again get the 2nd table, but this time the 3rd anchor; we highlight it in red, click it, then wait for the next page to load:

   Table = msie.document.GetElementsByTagName("TABLE")
   Table.item(1).GetElementsByTagName("A").item(2).style.border = ".25mm solid red"
   TimeDelay(2)
   Table.item(1).GetElementsByTagName("A").item(2).click
   WaitForMSIE(msie)

Then once the new page has loaded, we have the script get the page's text, and display the first 500 characters of it, so the user can see we're in synch, before we wait a few seconds then close the browser.

   Message("Debug", StrSub(msie.document.GetElementsByTagName("HTML").item(0).outerText, 1, 500))
   TimeDelay(2)
   msie.quit

As you can see, as long as the links on a page remain static, you can navigate between many pages and stay in synch. On pages whose anchors have unique IDs or distinctive text, you can find links by those instead; we've shown you the index-based method in case the site you're accessing isn't so obliging. It'll be incumbent on you to get to know the MS HTML object model better. The more you know, the more you can do.
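Finding a link by its visible text survives layout changes better than counting tables and anchors. A sketch of a hypothetical helper, ClickLinkByText(); it relies on WaitForMSIE() from Q6.WBT being defined earlier in the script, and the link text "Groups" in the usage lines is only an example.

```winbatch
; Hypothetical helper: click the first link whose visible text matches
; LinkText (case-insensitive), then wait for the new page.
; Relies on WaitForMSIE() from Q6.WBT. Returns @TRUE if a link was clicked.
#DefineFunction ClickLinkByText(msie, LinkText)
   LinksCollection = msie.document.links
   For n = 0 To LinksCollection.length - 1
      ThisLink = LinksCollection.item(n)
      If StriCmp(StrTrim(ThisLink.innerText), LinkText) == 0
         ThisLink.click
         WaitForMSIE(msie)
         Return(@TRUE)
      EndIf
   Next
   Return(@FALSE)
#EndFunction

; Example use ("Groups" is just an illustrative link text):
If !ClickLinkByText(msie, "Groups")
   Message("Not found", "No link with that text on this page")
EndIf
```

StrTrim() strips stray spaces around the link text, and StriCmp() makes the match case-insensitive.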

Question: How do I download a file from a web page?

Q7.WBT



;   Question: How do I download a file from a web page.

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com/intl/en/images/logo.gif"
file  = StrCat(tPath, "Q7.gif")

:WinInet
   AddExtender("WWINT44I.DLL")
   tophandle  = iBegin(0,"","")
   datahandle = IUrlOpen(tophandle, url)
   xx = iReadData(datahandle, file)
   iClose(datahandle)
   iClose(tophandle)
   Message("All","Done")
   Run(file, "")
Return


;   Question: How do I download a file from a web page.

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com/intl/en/images/logo.gif"
file  = StrCat(tPath, "Q7.gif")

:WinSock
   AddExtender("WWWSK44I.dll")
   serv=httpGetServer(url, "")
   path=httpGetPath(url, "")
   rsp = httpRecvFile(serv, path, file, 0)
   Message(rsp,"All Done")
   Run(file, "")
Return


;   Question: How do I download a file from a web page.

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com/intl/en/images/logo.gif"
file  = StrCat(tPath, "Q7.gif")

:WinHTTP
   WinHTTP = ObjectCreate("WinHttp.WinHttpRequest.5")
   WinHTTP.Open("GET", url, @FALSE)
   WinHTTP.Send()
   ADOStream = ObjectCreate("ADODB.Stream")
   ADOStream.Type = 1                   ; 1 = adTypeBinary
   ADOStream.Open()
   ADOStream.Write(WinHTTP.responseBody)
   ADOStream.SaveToFile(file,2)         ; 2 = adSaveCreateOverWrite
   ADOStream.close()
   ADOStream=0
   WinHTTP = 0
   Message("All","Done")
   Run(file, "")
Return


;   Question: How do I download a file from a web page.

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com/intl/en/images/logo.gif"
file  = StrCat(tPath, "Q7.gif")

:XMLHTTP
   xmlHTTP = ObjectCreate("Msxml2.XMLHTTP")
   xmlHTTP.Open("GET", url, @FALSE)
   xmlHTTP.Send()
   ADOStream = ObjectCreate("ADODB.Stream")
   ADOStream.Type = 1                   ; 1 = adTypeBinary
   ADOStream.Open()
   ADOStream.Write(xmlHTTP.responseBody)
   ADOStream.SaveToFile(file,2)         ; 2 = adSaveCreateOverWrite
   ADOStream.close()
   ADOStream=0
   xmlHTTP = 0
   Message("All","Done")
   Run(file, "")
Return


;   Question: How do I download a file from a web page?

editor = "notepad.exe"
tPath  = DirScript()
url    = "http://www.google.com/intl/en/images/logo.gif"
file  = StrCat(tPath, "Q7.gif")

:XStandard
   xHTTP = ObjectCreate("XStandard.HTTP")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(file)
   xHTTP = 0
   Message("All","Done")
   Run(file, "")
Return

Okay, now that you can work with HTML and objects on the page, you may need to download files from websites: data files, text files and zip files, as well as GIF and JPEG images. As long as you know the URL where the file resides, you have several options for downloading it.
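Every method below boils down to the same two steps: fetch the bytes at a URL, then write them unchanged to a local file. As a cross-check outside WinBatch, here is a minimal sketch of those two steps using only Python's standard library. It is an illustration, not one of the tutorial's scripts, and the function name `download_file` is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str) -> int:
    """Fetch `url` and write the raw bytes to `dest`; return the saved size in bytes."""
    # urlopen understands http://, https:// and file:// URLs
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so a large download is never held in memory all at once
        shutil.copyfileobj(response, out)
    return Path(dest).stat().st_size
```

Calling `download_file("http://www.google.com/intl/en/images/logo.gif", "Q7.gif")` would mirror what the Q7 scripts do, assuming that URL still serves the image. Writing in binary mode ("wb") matters for the same reason the WinHTTP example sets its stream to binary: images and zip files must not be altered in transit.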

WININET

The WinInet extender uses exactly the same method we used in Q1.wbt to download the HTML from a page. You simply specify a path for the local file and pass it a URL, and it locates the file and downloads it. Simple.

:WinInet
   AddExtender("WWINT44I.DLL")
   tophandle  = iBegin(0,"","")
   datahandle = IUrlOpen(tophandle, url)
   xx = iReadData(datahandle, file)
   iClose(datahandle)
   iClose(tophandle)
   Message("All","Done")
   Run(file, "")
Return

WINSOCK

This is almost the same as the example in Q1.wbt, except that we introduce the httpRecvFile() function.

:WinSock
   AddExtender("WWWSK44I.dll")
   serv=httpGetServer(url, "")
   path=httpGetPath(url, "")
   rsp = httpRecvFile(serv, path, file, 0)
   Message(rsp,"All Done")
   Run(file, "")
Return

Much like WinInet, you simply specify the path to the local file and the file is downloaded, nice and neat. Don't you love Winbatch?!
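httpRecvFile() takes the server and the path as separate arguments, which is why the script calls httpGetServer() and httpGetPath() first. The sketch below shows the same split in Python with `urllib.parse.urlsplit`. It is only an analogy: the function name is hypothetical, and keeping the query string on the path is an assumption about what you'd want, not a statement about how httpGetPath() behaves.

```python
from urllib.parse import urlsplit


def split_server_and_path(url: str) -> tuple[str, str]:
    """Return (server, path) for a URL, keeping any query string on the path."""
    parts = urlsplit(url)
    path = parts.path or "/"          # a bare host means the root document
    if parts.query:
        path = f"{path}?{parts.query}"
    return parts.netloc, path
```

For the tutorial's URL this returns `("www.google.com", "/intl/en/images/logo.gif")`.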

WINHTTP

With the WinHTTP object you're introduced to a new technique: the ADODB.Stream object. Opening the WinHTTP object is the same as in Q1.wbt, but once you .Send the request, you need to open a Stream object:


:WinHTTP
   WinHTTP = ObjectCreate("WinHttp.WinHttpRequest.5")
   WinHTTP.Open("GET", url, @FALSE)
   WinHTTP.Send()
   ADOStream = ObjectCreate("ADODB.Stream")
   ADOStream.Type = 1                       ; 1 = binary data (adTypeBinary)
   ADOStream.Open()

Once the Stream object is created, you need to specify the .Type of data it'll be handling (in this case .Type = 1, or binary). Then you write the data to the stream, save it to a file (the second argument, 2, tells .SaveToFile() to overwrite the file if it already exists), and close the stream:

   ADOStream.Write(WinHTTP.responseBody)
   ADOStream.SaveToFile(file,2)
   ADOStream.close()

After that, the cleanup is standard:

   ADOStream=0
   WinHTTP = 0
   Message("All","Done")
   Run(file, "")
Return
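To make the two stream settings concrete, here is a Python sketch that mimics them: an in-memory binary buffer stands in for a Stream with .Type = 1, and the save option chooses between overwriting and refusing to overwrite. The constant values 1 and 2 are ADO's adSaveCreateNotExist and adSaveCreateOverWrite; the function itself is hypothetical and only illustrates the semantics.

```python
import io

AD_SAVE_CREATE_NOT_EXIST = 1   # fail if the file already exists
AD_SAVE_CREATE_OVERWRITE = 2   # replace an existing file, as the script does


def save_binary(data: bytes, path: str, option: int = AD_SAVE_CREATE_OVERWRITE) -> None:
    """Buffer bytes in memory, then write them to disk like SaveToFile(path, option)."""
    stream = io.BytesIO()      # binary buffer, like ADOStream.Type = 1
    stream.write(data)         # like ADOStream.Write(responseBody)
    # "wb" truncates an existing file; "xb" raises FileExistsError instead
    mode = "wb" if option == AD_SAVE_CREATE_OVERWRITE else "xb"
    with open(path, mode) as f:
        f.write(stream.getvalue())
    stream.close()             # like ADOStream.close()
```

Running a Q7 script twice only works because of option 2; with option 1 the second run would fail on the existing Q7.gif.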

XMLHTTP

As you've probably guessed by now, the XMLHTTP code is almost identical to the WinHTTP code:

:XMLHTTP
   xmlHTTP = ObjectCreate("Msxml2.XMLHTTP")
   xmlHTTP.Open("GET", url, @FALSE)
   xmlHTTP.Send()
   ADOStream = ObjectCreate("ADODB.Stream")
   ADOStream.Type = 1                       ; 1 = binary data (adTypeBinary)
   ADOStream.Open()
   ADOStream.Write(xmlHTTP.responseBody)
   ADOStream.SaveToFile(file,2)
   ADOStream.close()
   ADOStream=0
   xmlHTTP = 0
   Message("All","Done")
   Run(file, "")
Return

XSTANDARD

This object is a nice surprise. Built into the XStandard.HTTP object is the .SaveResponseToFile() method, and you don't need to supply it with anything other than the local file name. No streams to open and close, no .Write, no .Save; it's handled elegantly in a single command:

:XStandard
   xHTTP = ObjectCreate("XStandard.HTTP")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(file)
   xHTTP = 0
   Message("All","Done")
   Run(file, "")
Return

No matter which method you choose, you'll find they're all pretty easy to work with.

Question: How do I export an HTML table to a CSV file?

Q8.WBT


;   Question: How do I export an HTML table to a CSV file?

editor = "notepad.exe"
tPath  = DirScript()
url    = StrCat(tPath, "Q8.html")
file   = url

OptionList = "CreateDataTables|MSIE"
label      = AskItemlist("Select Code Option", OptionList, "|", @UNSORTED, @SINGLE, @TRUE)
GoSub %label%

Exit

:CreateDataTables
   html = ""
   html = StrCat(html, `<html>`, @CRLF)
   html = StrCat(html, `<head>`, @CRLF)
   html = StrCat(html, `<title>Export Data to CSV</title>`, @CRLF)
   html = StrCat(html, `<style>`, @CRLF)
   html = StrCat(html, `caption, .tdbold {font-weight: bold}`, @CRLF)
   html = StrCat(html, `td, th, table, caption {border: .25mm solid black;border-collapse: collapse;}`, @CRLF)
   html = StrCat(html, `td {background: whitesmoke;text-align: center}`, @CRLF)
   html = StrCat(html, `table {font-size: 15pt;padding: 3;spacing: 3;width: 50%%}`, @CRLF)
   html = StrCat(html, `caption, th {color: yellow; background: black}`, @CRLF)
   html = StrCat(html, `</style>`, @CRLF)
   html = StrCat(html, `</head>`, @CRLF)
   html = StrCat(html, `<body><center><br/><br/>`, @CRLF)
   html = StrCat(html, `<table border="1">`, @CRLF)
   html = StrCat(html, `<caption>Table #1</caption>`, @CRLF)
   html = StrCat(html, `<tr><td>1</td><td>2</td><td>3</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>4</td><td>5</td><td>6</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>7</td><td>8</td><td>9</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>10</td><td>11</td><td>12</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>13</td><td>14</td><td>15</td></tr>`, @CRLF)
   html = StrCat(html, `</table>`, @CRLF)
   html = StrCat(html, `<br/><br/>`, @CRLF)
   html = StrCat(html, `<table border="1">`, @CRLF)
   html = StrCat(html, `<caption>Table #2</caption>`, @CRLF)
   html = StrCat(html, `<tr><td>20</td><td>21</td><td>22</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>23</td><td>24</td><td>25</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>26</td><td>27</td><td>28</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>29</td><td>30</td><td>31</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>32</td><td>33</td><td>34</td></tr>`, @CRLF)
   html = StrCat(html, `</table>`, @CRLF)
   html = StrCat(html, `<br/><br/>`, @CRLF)
   html = StrCat(html, `<table border="1">`, @CRLF)
   html = StrCat(html, `<caption>Table #3</caption>`, @CRLF)
   html = StrCat(html, `<tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>70</td><td>80</td><td>90</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>100</td><td>110</td><td>120</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>130</td><td>140</td><td>150</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>160</td><td>170</td><td>180</td></tr>`, @CRLF)
   html = StrCat(html, `<tr><td>190</td><td>200</td><td>210</td></tr>`, @CRLF)
   html = StrCat(html, `</table>`, @CRLF)
   html = StrCat(html, `</center></body>`, @CRLF)
   html = StrCat(html, `</html>`, @CRLF)
   FilePut(file, html)
   Message("All", "Done")   
Return

:MSIE
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Table2 = msie.document.GetElementsByTagName("TABLE").item(1)
   RowCollection = Table2.GetElementsByTagName("TR")
   Message("First Row: Table 2", RowCollection.item(0).innerHTML)

   Table3 = msie.document.GetElementsByTagName("TABLE").item(2)
   RowCollection = Table3.GetElementsByTagName("TR")
   Message("First Row: Table 3", RowCollection.item(0).innerHTML)
   CSVText = ""
   Table = msie.document.GetElementsByTagName("TABLE").item(2)
   RowCollection = Table.GetElementsByTagName("TR")
   ForEach Row In RowCollection
      If !StrIndexNC(Row.innerHTML, "<TH>", 1, @FWDSCAN)
         RowText = ""
         ForEach cell In Row.cells
            RowText = ItemInsert(cell.innerText, -1, RowText, ",")
         Next
         CSVText = ItemInsert(RowText, -1, CSVText, @LF)
      EndIf
   Next
   Message("CSV", CSVText)
   msie.quit
   msie = 0
Return

Many web sites display data that you can use in your work, so being able to grab a table off a website and put it into another program, like MS Access or MS Excel, can save a lot of time.

This example shows you how to do it with MSIE. There are other methods, such as using MS Excel or converting the HTML into XML, but those are for another day. We'll try to stay on track here and show you how to do it using MSIE. You can decide what's best for you.

Once we start MSIE and navigate to the page, we see 3 tables. As in previous examples, we pick a certain table and display its first row to the user. We do this because Table 3 has header cells (TH) that contain the column names. In the MS HTML object model, when we ask for all the rows in a table, we get all of them, regardless of what we're really interested in. If you need the column names, that's great; if you're only interested in the data, you may end up with rows you don't need. The same goes for the occasional table footer (a TFOOT section), which could contain column totals or other data. Our little example makes you aware of these things so you can plan accordingly.
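The same "pick a table, skip the header rows" logic can be sketched without a browser using Python's built-in `html.parser`. This is not how MSIE does it, only an illustration under stated assumptions: the class and function names are hypothetical, tables are numbered from zero like .item(2), and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the rows of every <table> as (is_header, [cell texts]) pairs."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of rows per table, in document order
        self._row = None       # cells of the row being read
        self._has_th = False   # True once the current row contains a <th>
        self._cell = None      # text pieces of the cell being read

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row, self._has_th = [], False
        elif tag in ("td", "th"):
            self._cell = []
            if tag == "th":
                self._has_th = True

    def handle_data(self, data):
        if self._cell is not None:   # ignore captions and whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append((self._has_th, self._row))
            self._row = None


def data_rows(html: str, table_index: int) -> list[list[str]]:
    """Return the non-header rows of the table at `table_index` (zero-based)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [cells for is_header, cells in parser.tables[table_index] if not is_header]
```

Applied to the Q8.html page, `data_rows(html, 2)` would return the five data rows of Table #3 without its "Column 1/2/3" header row.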

After we find the 3rd table, we set up a variable to hold our CSV data, then grab the rows inside the table:

   CSVText = ""
   Table = msie.document.GetElementsByTagName("TABLE").item(2)
   RowCollection = Table.GetElementsByTagName("TR")

We then use ForEach to loop through the collection of rows. Since we can access the .innerHTML of each row (as the first Message() statements showed), we can look for header cells (<TH>) and skip those rows:

   ForEach Row In RowCollection
      If !StrIndexNC(Row.innerHTML, "<TH>", 1, @FWDSCAN)

We then set up a new variable to hold the data from each of the cells and insert the cells into it one at a time:

         RowText = ""
         ForEach cell In Row.cells
            RowText = ItemInsert(cell.innerText, -1, RowText, ",")
         Next

Finally, once we've done all the cells in that row, we insert the entire row into the CSV variable and then continue until all the rows are processed:

         CSVText = ItemInsert(RowText, -1, CSVText, @LF)
      EndIf
   Next
   Message("CSV", CSVText)
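One caveat: ItemInsert() simply puts a comma between cells, so a cell that itself contains a comma (say "1,000") would be split into two columns when the CSV is opened. The numbers in Q8.html don't have that problem, but real pages might. For comparison, here is a short Python sketch using the standard `csv` module, which quotes such cells automatically; the function name is hypothetical.

```python
import csv
import io


def rows_to_csv(rows: list[list[str]]) -> str:
    """Join rows into CSV text, quoting any cell that contains a comma, quote or newline."""
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL only quotes cells that need it; quotes are doubled
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, the row `["1,000", 'say "hi"', "x"]` comes out as `"1,000","say ""hi""",x`, which Excel and Access read back as three cells.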

Cleanup is simple:

   msie.quit
   msie = 0

And we're done. Fast and easy. There's a way to export an HTML table to CSV using XMLStarlet, but it involves some discussion of XML and XSL, which we'll get to later. We'll show you the export routine for that tool at that time.

Question: How can I change a page that has bad or unreadable colors?

Q9.WBT


;   Question: How can I change the colors on a page that has bad or unreadable colors?

editor = "notepad.exe"
tPath  = DirScript()
url    = StrCat(tPath, "Q9.html")
file   = url

OptionList = "CreatePage|ChangePageViaMSIE"
label      = AskItemlist("Select Code Option", OptionList, "|", @UNSORTED, @SINGLE, @TRUE)
GoSub %label%

Exit

:CreatePage
   html = ""
   html = StrCat(html, `<html>`, @CRLF)
   html = StrCat(html, `<head>`, @CRLF)
   html = StrCat(html, `<style>`, @CRLF)
   html = StrCat(html, `    body {background: powderblue;color: silver}`, @CRLF)
   html = StrCat(html, `</style>`, @CRLF)
   html = StrCat(html, `</head>`, @CRLF)
   html = StrCat(html, `<body>`, @CRLF)
   html = StrCat(html, `<h2>This is really hard to read</h2>`, @CRLF)
   html = StrCat(html, `<p/>`, @CRLF)
   html = StrCat(html, `<p>Isn't it ?</p>`, @CRLF)
   html = StrCat(html, `</body>`, @CRLF)
   html = StrCat(html, `</html>`, @CRLF)
   FilePut(file, html)
   Run(file, "")
Return

:ChangePageViaMSIE
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   Message("WBT", "Before")
   msie.document.body.style.color = "white"
   msie.document.body.style.backgroundColor = "darkblue"
   Message("WBT", "After")
   msie.quit
   msie = 0
Return

It's quite common to have to do research on the web for your work. While the information may be helpful, have you ever been to a site that hurts your eyes or is impossible to read? Join the club; we'll show you how to do something about it.

After you build the page, you'll start MSIE and display it. Silver text on a powder-blue page plays havoc with my eyes.

Just as we did when placing red borders around tables and controls, you can also switch the colors of the page:

   msie.document.body.style.color = "white"
   msie.document.body.style.backgroundColor = "darkblue"

These two lines set the text to white on a nice dark blue background, making it a little easier to read. MSIE already lets you change the size of the displayed font: hold down the CONTROL key and roll the mouse wheel towards you or away from you until you find a setting you like. Or use the menus: click the VIEW menu, then TEXT SIZE, and change it accordingly.

You can also set the font face, through the body's style.fontFamily property. We'll show you how to use the Object Browser later on, so you can give properties like this a try, especially if you like a site but can't stand the font it displays in. The alternative is to download the page(s) using the methods in Q1.wbt, change them by hand, and read them on your own computer.
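If you go the "download and change it yourself" route, you don't have to edit each page by hand. A minimal Python sketch, assuming the page has a normal </head> tag: it inserts an override <style> block just before </head>, after the page's own styles, and marks the rules !important so they win the cascade. The function and constant names are hypothetical.

```python
import re

OVERRIDE_CSS = "body { color: white !important; background-color: darkblue !important; }"


def add_color_override(html: str, css: str = OVERRIDE_CSS) -> str:
    """Insert a <style> block just before </head>, or at the very start if there is no head."""
    style = f"<style>{css}</style>\n"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match:
        return html[: match.start()] + style + html[match.start():]
    return style + html
```

Saving the result of `add_color_override(open("Q9.html").read())` to a new file and opening it in MSIE should show the same white-on-dark-blue result as the Q9 script.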

Question: How do I attach to an already running instance of MSIE?

Q10.WBT


;      Question: How do I attach to an already running instance of MSIE?

#DefineFunction GetMSIE()
   retval = 0
   objShell = ObjectCreate("Shell.Application")
   For x = 0 To objShell.Windows.count-1
      objWindow = objShell.Windows.item(x)
      If objWindow.TopLevelContainer
         If StrIndexNC(objWindow.fullname, "iexplore.exe", 1, @FWDSCAN) <> 0
            If AskYesNo("GetMSIE()", StrCat("Attach to ", objWindow.locationURL))
               objShell = 0
               Return(objWindow)   ; hand the browser object back to the caller
            EndIf
         EndIf
      EndIf
      objWindow = 0
   Next
   objShell = 0
   Message("Debug", "Can't find the browser!")
   Return(0)
#EndFunction

tPath  = DirScript()
url    = StrCat(tPath, "Q9.html")

Run("iexplore.exe", url)
TimeDelay(1)

msie = GetMSIE()

If msie
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   Message("WBT", "Before")
   msie.document.body.style.color = "white"
   msie.document.body.style.backgroundColor = "darkblue"
   Message("WBT", "After")
   msie.quit
   msie = 0
EndIf

Exit

The last script was handy, but what if you're already using MSIE, happen onto a hard-to-read website by accident, and don't have time to code a script? Or what if you're forced to move between pages (remember "referred") just to change the font?

This script will show you how to attach to an existing instance of MSIE and work with it. Obviously you may have more important reasons for attaching to a pre-existing instance of MSIE and automating it, but this will show you how to do so.

First we start the page from the last example in MSIE:

    url    = StrCat(tPath, "Q9.html")
    Run("iexplore.exe", url)
    TimeDelay(1)

We wait a second so the browser has time to open the page and the user can see it, then we connect to it using the User Defined Function GetMSIE():

    msie = GetMSIE()

The UDF is designed to return an object handle to MSIE, or zero if it couldn't find one. It takes no arguments and connects to the shell on its own.

#DefineFunction GetMSIE()
   retval = 0
   objShell = ObjectCreate("Shell.Application")

It then loops through the Shell's Windows collection. Unlike the MS HTML collections, this collection uses .count instead of .length, but it is likewise zero-based.

It then examines each window it finds in the collection, checking the window's .fullname property for "iexplore.exe". This is necessary because Shell.Windows also contains any Windows Explorer windows, since they're all in the same collection. In addition, it lists Explorer and MSIE windows in reverse order, meaning it finds the most recently opened windows first and the oldest last:

   For x = 0 To objShell.Windows.count-1
      objWindow = objShell.Windows.item(x)
      If objWindow.TopLevelContainer
         If StrIndexNC(objWindow.fullname, "iexplore.exe", 1, @FWDSCAN) <> 0
            If AskYesNo("GetMSIE()", StrCat("Attach to ", objWindow.locationURL))
               objShell = 0
               Return(objWindow)
            EndIf
         EndIf
      EndIf
      objWindow = 0
   Next

When it finds a match it'll display the .locationURL property of the object to the user so they can confirm. Then it returns a handle to the browser object.
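The filtering step is easy to get wrong, so here it is in isolation as a Python sketch. The list of (fullname, locationURL) pairs is hypothetical data standing in for the Shell.Windows collection, and unlike the WinBatch substring test this version compares just the file name, case-insensitively, so a path that merely contains "iexplore.exe" somewhere would not match.

```python
import ntpath


def pick_msie_windows(windows: list[tuple[str, str]]) -> list[str]:
    """Return the URLs of the windows whose executable is iexplore.exe.

    `windows` holds (fullname, location_url) pairs, in collection order.
    """
    urls = []
    for fullname, location_url in windows:
        # ntpath splits Windows-style paths on any OS; compare the file name only
        if ntpath.basename(fullname).lower() == "iexplore.exe":
            urls.append(location_url)
    return urls
```

A Windows Explorer entry such as `C:\WINDOWS\explorer.exe` is skipped, while `...\Internet Explorer\IEXPLORE.EXE` is kept regardless of letter case.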

Back in the main script, we first make sure the value of "msie" isn't zero. If it isn't, we continue, and from there the script takes over just like our previous examples:

    msie.addressbar = @FALSE
    msie.statusbar = @FALSE
    msie.menubar = @FALSE
    msie.toolbar = @FALSE
    Message("WBT", "Before")
    msie.document.body.style.color = "white"
    msie.document.body.style.backgroundColor = "darkblue"

We hide the menu bar, address bar, status bar and toolbar, and change the text color and background color of the page.

Once you have the browser handle, you can perform any other operation we've done previously using the .document property. Try pointing the script at the FRAMESET example page, or at the page with 3 tables on it, and pasting in the code from those examples. It should perform the same.

Note: This script may not work on all versions of Microsoft Windows. Over the years on the Winbatch BBS we've seen a few different methods; however, I've had people tell me that it won't work on their PC (despite having the same version of Windows), so if you get an error trying this script, you'll know why. We do have another alternative that involves writing entries into the Windows Registry, but I'd rather not rely on that method for this tutorial. If this script doesn't work for you, try posting a message on the Winbatch BBS and we can send you the "AAA.Test" script that Marty posted a few years back.

Question: How do I work with values in an XML file?

Q11.WBT

Methods: Create XML, Get XML Data, Get XML Data from Attributes, Get XML Data via HTTP, Get XML Data via XMLStarlet, Get XML Data via XMLStarlet via HTTP, Get XML via XStandard


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:CreateXML
   xml = ""
   xml = StrCat(xml, `<?xml version="1.0" ?>`, @CRLF)
   xml = StrCat(xml, `<XML_DIZ_INFO>`, @CRLF)
   xml = StrCat(xml, `   <Program_Info>`, @CRLF)
   xml = StrCat(xml, `      <Program_Name>Control Manager Extender</Program_Name>`, @CRLF)
   xml = StrCat(xml, `      <Program_Version>20035</Program_Version>`, @CRLF)
   xml = StrCat(xml, `      <File_Info>`, @CRLF)
   xml = StrCat(xml, `         <Filename_Versioned>wwctl44i.zip</Filename_Versioned>`, @CRLF)
   xml = StrCat(xml, `         <Filename_Previous>wwctl44i.zip</Filename_Previous>`, @CRLF)
   xml = StrCat(xml, `         <Filename_Generic>wwctl44i.zip</Filename_Generic>`, @CRLF)
   xml = StrCat(xml, `         <Filename_Long>wwctl44i.zip</Filename_Long>`, @CRLF)
   xml = StrCat(xml, `         <File_Size_Bytes>xxsizekxx</File_Size_Bytes>`, @CRLF)
   xml = StrCat(xml, `         <File_Size_K>xxsizekxx</File_Size_K>`, @CRLF)
   xml = StrCat(xml, `         <File_Size_MB>xxsizemxx</File_Size_MB>`, @CRLF)
   xml = StrCat(xml, `      </File_Info>`, @CRLF)
   xml = StrCat(xml, `      <Expire_Info>`, @CRLF)
   xml = StrCat(xml, `         <Has_Expire_Info>N</Has_Expire_Info>`, @CRLF)
   xml = StrCat(xml, `         <Expire_Count />`, @CRLF)
   xml = StrCat(xml, `         <Expire_Based_On>Days</Expire_Based_On>`, @CRLF)
   xml = StrCat(xml, `         <Expire_Other_Info />`, @CRLF)
   xml = StrCat(xml, `         <Expire_Month />`, @CRLF)
   xml = StrCat(xml, `         <Expire_Day />`, @CRLF)
   xml = StrCat(xml, `         <Expire_Year />`, @CRLF)
   xml = StrCat(xml, `      </Expire_Info>`, @CRLF)
   xml = StrCat(xml, `   </Program_Info>`, @CRLF)

   xml = StrCat(xml, `   <Attribute_Section>`, @CRLF)
   xml = StrCat(xml, `      <Element Name="One" Number="1" />`, @CRLF)
   xml = StrCat(xml, `      <Element Name="Two" Number="2" />`, @CRLF)
   xml = StrCat(xml, `      <Element Name="Three" Number="3" />`, @CRLF)
   xml = StrCat(xml, `   </Attribute_Section>`, @CRLF)

   xml = StrCat(xml, `   <Web_Info>`, @CRLF)
   xml = StrCat(xml, `      <Application_URLs>`, @CRLF)
   xml = StrCat(xml, `         <Application_Info_URL>http://www.winbatch.com/download.html</Application_Info_URL>`, @CRLF)
   xml = StrCat(xml, `         <Application_Order_URL>http://commerce.winbatch.com</Application_Order_URL>`, @CRLF)
   xml = StrCat(xml, `         <Application_Screenshot_URL>http://www.winbatch.com/art/wbscreen.gif</Application_Screenshot_URL>`, @CRLF)
   xml = StrCat(xml, `         <Application_Icon_URL>http://www.winbatch.com/art/wbicon.gif</Application_Icon_URL>`, @CRLF)
   xml = StrCat(xml, `         <Application_XML_File_URL>http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml</Application_XML_File_URL>`, @CRLF)
   xml = StrCat(xml, `      </Application_URLs>`, @CRLF)
   xml = StrCat(xml, `      <Download_URLs>`, @CRLF)
   xml = StrCat(xml, `         <Primary_Download_URL>http://files.winbatch.com/wwwftp/wb01/wwctl44i.zip</Primary_Download_URL>`, @CRLF)
   xml = StrCat(xml, `         <Secondary_Download_URL />`, @CRLF)
   xml = StrCat(xml, `         <Additional_Download_URL_1 />`, @CRLF)
   xml = StrCat(xml, `         <Additional_Download_URL_2 />`, @CRLF)
   xml = StrCat(xml, `      </Download_URLs>`, @CRLF)
   xml = StrCat(xml, `   </Web_Info>`, @CRLF)
   xml = StrCat(xml, `</XML_DIZ_INFO>`, @CRLF)

   FilePut(file, xml)
   Run(file, "")
Return


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLData
   xpath = "/XML_DIZ_INFO/Program_Info"
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.setProperty("SelectionLanguage", "XPath")   ; MSXML 3.0 defaults to XSLPattern
   xmlDoc.load(file)
   err = xmlDoc.parseerror
   If err.errorCode
      ec = err.errorCode
      er = err.reason
      el = err.line
      elp = err.linepos
      es  = err.srcText
      Message("XML Parse Error", StrCat("Error Code: ", ec, @LF, "Reason: ", er, @LF, "Line: ", el, " ", "Column: ", elp, @LF, "Text: ", es))
      Exit
   EndIf
   Message("Debug", xmlDoc.xml)
   ProgramInfoElement = xmlDoc.selectNodes(xpath)
   Message("Debug", ProgramInfoElement.item(0).xml)
   ProgramName    = ProgramInfoElement.item(0).selectSingleNode("./Program_Name").text
   ProgramVersion = ProgramInfoElement.item(0).selectSingleNode("./Program_Version").text
   Message("Debug", StrCat(ProgramName, @CRLF, ProgramVersion))
   xmlDoc = 0
Return


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLDataAttributes
   xpath = "/XML_DIZ_INFO/Attribute_Section/Element"
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.setProperty("SelectionLanguage", "XPath")   ; MSXML 3.0 defaults to XSLPattern
   xmlDoc.load(file)
   ElementCollection = xmlDoc.selectNodes(xpath)
   Message("Debug", ElementCollection.length)
   ForEach Element In ElementCollection
      Display(2, "Debug", StrCat(Element.getAttribute("Name"), "|", Element.getAttribute("Number")))
   Next
   number = 3
   xpath = StrCat("/XML_DIZ_INFO/Attribute_Section/Element[@Number=", number, "]/@Name")
   Message("Debug", xmlDoc.selectSingleNode(xpath).text)
   xmlDoc = 0
Return


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLDataHTTP
   url = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
   ofile = StrCat(tPath,"Q11Http.xml")
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.load(url)
   xmlDoc.save(ofile)
   xmlDoc = 0
   Run(editor, ofile)
Return


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLDataViaXMLStarlet
   url   = file
   ofile = StrCat(tpath, "Q11.txt")
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml sel -T -t -m "/XML_DIZ_INFO/Program_Info" -v "concat(./Program_Name,'|',./Program_Version)" "%url%" > "%ofile%"`)
   RunWait(editor, ofile)
   number = 3
   xpath = StrCat("/XML_DIZ_INFO/Attribute_Section/Element[@Number=", number, "]/@Name")
   RunHideWait(comspec, `/c xml sel -T -t -m "%xpath%" -v "." "%url%" > "%ofile%"`)
   RunWait(editor, ofile)
Return


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLDataViaXMLStarletHTTP
   url = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
   ofile = StrCat(tPath,"Q11Http.xml")
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml fo "%url%" > "%ofile%"`)
   RunWait(editor, ofile)
Return


;   Question: How do I work with values in an XML file?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLViaXStandard
   xHTTP = ObjectCreate("XStandard.HTTP")
   url = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
   ofile = StrCat(tPath,"Q11Http.xml")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(ofile)
   xHTTP = 0
   Run(editor, ofile)
Return

At some point, you may need to work with an XML file. This tutorial can't show you everything, but we can give you an introductory lesson in working with XML and Winbatch. We'll touch on a few topics here and leave you to learn more about XML on your own.

I've taken one of the XML PAD files that come with each Winbatch Extender zip file and created a smaller version of it. This way you can download the source file and experiment with it later on.

The first example gets XML data from the document's elements by using the XML DOM. Once you learn how to use the XML DOM, you'll be able to work with XML on almost any Windows machine that has MSXML 3.0 (the version these scripts ask for) installed; it ships with MSIE 6 and later.

The first thing we do is create a DOM object:

   xpath = "/XML_DIZ_INFO/Program_Info"
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")

We then tell the document we don't want the file loaded asynchronously. Why? An asynchronous load returns immediately, leaving you to wait for the document to finish, much as we had to wait for MSIE with the UDF WaitForBrowser(); with a large XML document loaded from a remote web site, that could take some time. Even though we're loading directly from a local file here, we want the document to wait until everything is loaded before the script continues.

Then we issue the .load() command:

   xmlDoc.async = @FALSE
   xmlDoc.load(file)

This loads a file. There's another method called .loadXML(), which you'll see soon enough, that lets you load XML text from a variable when you don't have a file.

The next part of the script does some error checking. This is important, especially if you're new to XML. The script sets up a reference to the document's .parseError object, then checks it to see if an error occurred during loading:

   err = xmlDoc.parseerror
   If err.errorCode
      ec = err.errorCode
      er = err.reason
      el = err.line
      elp = err.linepos
      es  = err.srcText
      Message("XML Parse Error", StrCat("Error Code: ", ec, @LF, "Reason: ", er, @LF, "Line: ", el, " ", "Column: ", elp, @LF, "Text: ", es))
      Exit
   EndIf

If there's no error (that is, err.errorCode equals zero), the script continues. Otherwise the script gathers all the error information, shows it to the user, and exits. This can be invaluable in sorting out XML formatting errors in the future, as it tells you where in the file the error occurred and why.
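Most XML parsers report errors the same way: a code, a reason, and a line/column position. As a comparison, this Python sketch uses `xml.etree.ElementTree`, whose `ParseError` carries a numeric `code` and a `(line, column)` `position`, to build a message shaped like the one the WinBatch script shows. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_or_report(xml_text: str):
    """Return (root, None) on success, or (None, message) describing the parse error."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        # position is (line, column) as reported by the expat parser; lines start at 1
        line, column = err.position
        return None, f"Error Code: {err.code}\nReason: {err}\nLine: {line} Column: {column}"
```

Feeding it `"<a>\n<b></a>"` (a mismatched closing tag on the second line) returns no document and a message pointing at line 2.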

Next we show the document's .xml property, or the entire xml of the file. This is also important and can be used as an error-checking device. If there's an error loading the file or XML you've supplied, the document's .xml will be blank.

   Message("Debug", xmlDoc.xml)

Whenever I get results I'm not expecting, like no data, the first thing I do is display the document's .xml to see if it's blank. Then if it is, I check the error information carefully.

Next, much as we did with MSIE, we query the document for a collection of nodes. You'll see two types of queries in this section: one for a collection of nodes and one for a single node:

   ProgramInfoElement = xmlDoc.selectNodes(xpath)
   Message("Debug", ProgramInfoElement.item(0).xml)
   ProgramName    = ProgramInfoElement.item(0).selectSingleNode("./Program_Name").text
   ProgramVersion = ProgramInfoElement.item(0).selectSingleNode("./Program_Version").text
   Message("Debug", StrCat(ProgramName, @CRLF, ProgramVersion))
   xmlDoc = 0

As you can see, when you return a collection, just like the collections in MSIE you can access the individual members of the collection by using the .item() convention. ForEach will work too, but we don't need it in this example.

The other important aspect of this code is the variable called "xpath" which is at the top of the script. If you examine it you'll see:

   xpath = "/XML_DIZ_INFO/Program_Info"

XPath stands for XML Path Language; it provides a way of addressing parts of an XML document.

An XML document is made up of a root element (the topmost element) and its children. Children may have children and siblings and so on. A good analogy for the hierarchy is the folder structure in Windows. Doesn't the xpath string remind you of a file path? In essence, we've told the XML DOM to start at the "/XML_DIZ_INFO" folder, go to the "Program_Info" child folder, and give us a collection of nodes, which it does.

Then, to get the "ProgramName", we tell the XML DOM to take the first node in the collection and select its child node called "Program_Name". You'll notice that in the selection string we put "./Program_Name". The dot "." means "the current node context". If you open WinStudio, go up to the FILE menu and choose OPEN, you'll get a dialog box. If, instead of typing a file name, you type a single period and hit the OPEN button, nothing happens, because in the Windows hierarchy "." means the current directory. So in effect we've told the XML DOM: start at the current node and bring back the child node with the tag "Program_Name". It turns out the current node also has another child we're interested in, called "Program_Version". To get the value of each node we use the .text property, and the script brings back the values contained in those nodes.
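The same "folder path, then ./child" idea carries over to other XML libraries. Here is a Python sketch with `xml.etree.ElementTree`, using a trimmed copy of the Q11.xml data. Note that ElementTree supports only a subset of XPath and its paths are relative to an element, so the query starts from the root element (XML_DIZ_INFO) instead of naming it with a leading "/". The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<XML_DIZ_INFO>
  <Program_Info>
    <Program_Name>Control Manager Extender</Program_Name>
    <Program_Version>20035</Program_Version>
  </Program_Info>
</XML_DIZ_INFO>"""


def program_name_and_version(xml_text: str) -> tuple[str, str]:
    root = ET.fromstring(xml_text)          # root is the XML_DIZ_INFO element
    info = root.find("./Program_Info")      # like selecting /XML_DIZ_INFO/Program_Info
    # "./" means "start at the current node", just as "." is the current folder
    return info.find("./Program_Name").text, info.find("./Program_Version").text
```

`program_name_and_version(SAMPLE)` returns `("Control Manager Extender", "20035")`, the same two values the WinBatch script displays.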

Closing up should be old hat by now and we're done. We've just accessed the values inside an XML file.

Sometimes there are values inside attributes of an element. By the way, I've changed the PAD file slightly and put in some elements that have data inside their attributes which you won't find in the original PAD file.

So this time we're telling the XML DOM to start at "/XML_DIZ_INFO", drop down into the "Attribute_Section" and find the child nodes named "Element":

   xpath = "/XML_DIZ_INFO/Attribute_Section/Element"

We open the object and load the file as before:

   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.load(file)

And then we set up a collection to hold the nodes:

   ElementCollection = xmlDoc.selectNodes(xpath)
   Message("Debug", ElementCollection.length)

Notice that, like the MS HTML collections, the XML DOM uses the .length property to tell you how many nodes are in the collection you've requested.

Next we use ForEach to loop through the nodes. Since each node has an attribute called "Name" and "Number" you can get at the values by using the .getAttribute() method:

   ForEach Element In ElementCollection
      Display(2, "Debug", StrCat(Element.getAttribute("Name"), "|", Element.getAttribute("Number")))
   Next

If for some reason the nodes contained no attributes, .getAttribute() would just return a blank string. Unlike HTML, XML is case-sensitive, so make sure you specify node tags and attribute names exactly as they appear in the file.
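For comparison, here is the attribute loop as a Python sketch with `xml.etree.ElementTree`, where `.get()` plays the role of .getAttribute(). The extra `<Element Name="Four" />` with no Number attribute is hypothetical, added only to show the missing-attribute case; in ElementTree a missing attribute gives None unless you pass a default, and the lookup is case-sensitive just as in MSXML.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<XML_DIZ_INFO><Attribute_Section>
  <Element Name="One" Number="1" />
  <Element Name="Two" Number="2" />
  <Element Name="Three" Number="3" />
  <Element Name="Four" />
</Attribute_Section></XML_DIZ_INFO>"""


def name_number_pairs(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.findall("./Attribute_Section/Element"):
        # .get() returns None for a missing attribute; the "" default keeps it blank
        pairs.append(f'{element.get("Name", "")}|{element.get("Number", "")}')
    return pairs
```

This yields `["One|1", "Two|2", "Three|3", "Four|"]`, while asking for `"name"` (lowercase) on any of these elements returns None.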

Now we'll show you how to get at a particular attribute. If the .length of the node collection above was in the thousands, the user would be sitting for a considerable length of time before spotting what they needed. If you know what you're looking for, you can go right to it:

   number = 3
   xpath = StrCat("/XML_DIZ_INFO/Attribute_Section/Element[@Number=", number, "]/@Name")
   Message("Debug", xmlDoc.selectSingleNode(xpath).text)

You can change the value of number to 1 or 2 and see how it works for yourself.

What we've done is tell the XML DOM to start at "/XML_DIZ_INFO", drop down into the "Attribute_Section", find the child nodes named "Element" whose Number attribute (attributes are prefixed with the @-sign) equals 3, and then return that element's Name attribute ("/@Name"). The square brackets after Element resemble a Winbatch array subscript; inside them you specify the attribute name and the value it must have, and the XML DOM finds the node you want.

As you can see, if you've got a huge XML file and need access to certain nodes you can get at them quickly and efficiently by specifying an xpath query string.
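One thing worth knowing: MSXML 3.0's selection methods default to an older query syntax called XSLPattern, and setting the SelectionLanguage property to "XPath" makes them follow the standard XPath 1.0 rules. Here's a sketch under that assumption that looks an element up by a text attribute instead of a number, and counts the matches before you decide to loop (the sample data is hypothetical). Note that string values in a predicate go inside single quotes:

```winbatch
; Hypothetical sample data standing in for the Attribute_Section of the PAD file
xml = `<Attribute_Section><Element Name="Alpha" Number="1"/><Element Name="Beta" Number="2"/><Element Name="Gamma" Number="3"/></Attribute_Section>`

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(xml)
xmlDoc.setProperty("SelectionLanguage", "XPath")

; Find the element by its Name, then return its Number attribute
wanted = "Beta"
xpath  = StrCat("/Attribute_Section/Element[@Name='", wanted, "']/@Number")
found  = xmlDoc.selectSingleNode(xpath).text

; .length tells you how big the collection is before you loop through it
total = xmlDoc.selectNodes("/Attribute_Section/Element").length

xmlDoc = 0
Display(3, "Debug", StrCat(wanted, " is Number ", found, " of ", total))
```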

Next we show you how to retrieve an XML file via HTTP. Since XML is designed for web work, it makes sense that you should be able to access an XML file by providing a URL to it:

   url = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
   ofile = StrCat(tPath,"Q11Http.xml")
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.load(url)

Everything is the same as before except the XML file is on the Winbatch Web Site, right where the PAD file said it would be.

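This time we use the document's .save() method to write it to disk, which gives you another way to "download" a file from a web server, this time using the XML DOM:

   xmlDoc.save(ofile)

Keep in mind that .save() writes out the parsed document rather than the raw bytes from the server, so the saved copy may differ cosmetically from the original (whitespace, for example).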
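Before saving, it's worth checking that the load actually worked. The sketch below reuses the URL above and checks .parseError, which is where MSXML reports a document it couldn't load or parse. I'd expect an unreachable URL to show up there too, the same way a missing local file does, but treat that as an assumption:

```winbatch
; Sketch: only save the remote file if MSXML could load and parse it
url   = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
ofile = StrCat(DirScript(), "Q11Http.xml")

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.load(url)

pe = xmlDoc.parseError
If pe.errorCode == 0
   xmlDoc.save(ofile)
   status = StrCat("Saved ", ofile)
Else
   ; Malformed XML, and presumably a bad URL, end up here
   status = StrCat("Load failed: ", pe.reason, " (", pe.url, ")")
EndIf
pe = 0
xmlDoc = 0

Display(3, "Debug", status)
```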

XMLSTARLET

You can also use the XMLStarlet tool to download and carry out operations on an XML file on a remote server. In this script we'll have XMLStarlet grab the file and retrieve the information from it.

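First we set the input location, then provide an "output file" to hold the data, since XMLStarlet doesn't have a DOM interface. Here the input is the local PAD file held in the variable file, though a URL works in the same place: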

   url   = file
   ofile = StrCat(tpath, "Q11.txt")

Then we tell XMLStarlet to retrieve the specified data and write it to a file:

   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml sel -T -t -m "/XML_DIZ_INFO/Program_Info" -v "concat(./Program_Name,'|',./Program_Version)" "%url%" > "%ofile%"`)
   RunWait(editor, ofile)

Winbatch will wait until you CLOSE your editor before continuing on.

Next we tell XMLStarlet to retrieve the Name of Element number 3, just as we did using the XML DOM:

   number = 3
   xpath = StrCat("/XML_DIZ_INFO/Attribute_Section/Element[@Number=", number, "]/@Name")
   RunHideWait(comspec, `/c xml sel -T -t -m "%xpath%" -v "." "%url%" > "%ofile%"`)
   RunWait(editor, ofile)

And it does.

Here's a more detailed explanation of the XMLStarlet command line:

xml sel -T means "we want XMLStarlet to use the select option, and the output will be text."

-t -m "/XML_DIZ_INFO/Program_Info" means "using a template, match the node specified."

-v "concat(./Program_Name,'|',./Program_Version)" simply means "print the value of Program_Name|Program_Version."

And the second command line:

We specify the xpath the same way as we did using the XML DOM, then tell it to -m match the node and -v print the value of the match ("." here is the matched Name attribute itself).
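The ForEach loop we wrote against the XML DOM has a one-line counterpart in XMLStarlet, too. This sketch assumes, like the other examples, that xml.exe is on your PATH and that the modified Q11.xml sits next to the script; the output file name Q11List.txt is just a placeholder. The -o option prints literal text and -n ends each line:

```winbatch
; Sketch: list every Element's Name and Number, one per line
tPath   = DirScript()
infile  = StrCat(tPath, "Q11.xml")
ofile   = StrCat(tPath, "Q11List.txt")
comspec = Environment("COMSPEC")

; -m loops over the Elements, -v prints a value, -o prints "|", -n adds a newline
RunHideWait(comspec, `/c xml sel -T -t -m "/XML_DIZ_INFO/Attribute_Section/Element" -v "@Name" -o "|" -v "@Number" -n "%infile%" > "%ofile%"`)
Run("notepad.exe", ofile)
```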

As you know, XMLStarlet can also download files in a snap:

:GetXMLDataViaXMLStarletHTTP
   url = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
   ofile = StrCat(tPath,"Q11Http.xml")
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml fo "%url%" > "%ofile%"`)
   RunWait(editor, ofile)
Return

It doesn't matter whether they're local files or remote files.

XSTANDARD

And not to be forgotten, XStandard can do it as well.

:GetXMLViaXStandard
   xHTTP = ObjectCreate("XStandard.HTTP")
   url = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
   ofile = StrCat(tPath,"Q11Http.xml")
   xHTTP.Get(url)
   xHTTP.SaveResponseToFile(ofile)
   xHTTP = 0
   Run(editor, ofile)
Return

At this point we'll leave it as an exercise to the reader to use the WinInet and WinSock extenders to download an xml file via a URL. You can use the code in the Q7.wbt file as a template.
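If you'd like a head start without the extenders, the Msxml2.XMLHTTP component that Q0.WBT checks for can fetch the file as well. A minimal sketch, with a made-up output file name. Keep in mind that if the server can't be reached at all, .send() raises a COM error instead of returning a status code, and FilePut() may not preserve non-ASCII characters exactly:

```winbatch
; Sketch: download the PAD file with Msxml2.XMLHTTP
url   = "http://www.winbatch.com/wwwftp/padxml/wwctl44i.xml"
ofile = StrCat(DirScript(), "Q11XmlHttp.xml")

http = ObjectCreate("Msxml2.XMLHTTP")
http.open("GET", url, @FALSE)    ; @FALSE = synchronous, so send() waits for the reply
http.send()

If http.status == 200
   FilePut(ofile, http.responseText)
   status = StrCat("Saved ", ofile)
Else
   status = StrCat("HTTP status ", http.status, " ", http.statusText)
EndIf
http = 0

Display(3, "Debug", status)
```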

Question: How do I work with values from XML text?

Q12.WBT



;   Question: How do I work with values from XML text ?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLText
   xml = `<Root><FirstElement>1</FirstElement><SecondElement>2</SecondElement></Root>`
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.loadXML(xml)
   Message("Debug", xmlDoc.xml)
   Message("Debug", xmlDoc.selectSingleNode("/Root/FirstElement").text)
   xmlDoc = 0
Return


;   Question: How do I work with values from XML text ?

editor = "notepad.exe"
tPath  = DirScript()
file    = StrCat(tPath, "Q11.xml")

:GetXMLTextAttributes
   xml = `<Root><FirstElement Value="1"/><SecondElement Value="2"/></Root>`
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.loadXML(xml)
   Message("Debug", xmlDoc.xml)
   Message("Debug", xmlDoc.selectSingleNode("/Root/FirstElement").getAttribute("Value"))
   xmlDoc = 0
Return

Sometimes you won't have an XML file to work with. An application might send you XML as a string of text, or you may want to work with a subset of a larger XML file without loading the entire file, due to size or other issues.

The good news is that you can work with XML as a text string.

Using the XML DOM it's very similar: you specify the string, but this time you load the XML via the document's .loadXML() method:

   xml = `<Root><FirstElement>1</FirstElement><SecondElement>2</SecondElement></Root>`
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.loadXML(xml)

Then, to make sure the XML loaded without errors, you can peek at the document's .xml property; if the string failed to parse, it comes back empty:

   Message("Debug", xmlDoc.xml)

Then xpath queries are done normally:

   Message("Debug", xmlDoc.selectSingleNode("/Root/FirstElement").text)

Then close up as usual.
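Peeking at .xml tells you that something went wrong, but .parseError tells you what. Here's a sketch with a deliberately malformed string (the closing tag doesn't match), reporting the line, column and reason the same way Q13.WBT's ShowParseErrors routine does:

```winbatch
; A deliberately broken string: </First> doesn't match <FirstElement>
xml = `<Root><FirstElement>1</First></Root>`

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(xml)

pe = xmlDoc.parseError
If pe.errorCode == 0
   result = xmlDoc.selectSingleNode("/Root/FirstElement").text
Else
   ; The same details Q13's ShowParseErrors routine reports
   result = StrCat("Parse error at line ", pe.line, ", column ", pe.linepos, ": ", pe.reason)
EndIf
pe = 0
xmlDoc = 0

Display(3, "Debug", result)
```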

Attributes work virtually the same way: specify the XML string, load it, view the document, then query it:

   xml = `<Root><FirstElement Value="1"/><SecondElement Value="2"/></Root>`
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.loadXML(xml)
   Message("Debug", xmlDoc.xml)
   Message("Debug", xmlDoc.selectSingleNode("/Root/FirstElement").getAttribute("Value"))

You'll notice in the last line we placed the .getAttribute() method on the end:

    xmlDoc.selectSingleNode("/Root/FirstElement").getAttribute("Value")

Since the expression xmlDoc.selectSingleNode("/Root/FirstElement") evaluates to a single node, you can then get at its attributes. The same goes for the .text property.

In most XML operations, you'll need to ask yourself:

Am I trying to return a collection or a single node?

If it's a collection, then you'll need to loop through the collection to get at each individual node's .text or .getAttribute().

If it's a single node, then you can get at the data right away.
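Here's a small sketch of both cases side by side, using made-up data: .selectNodes() returns a collection you loop through, while .selectSingleNode() lets you read .text straight away. The SelectionLanguage line is an assumption on my part, so that "[2]" is read with XPath's 1-based positions (MSXML 3.0's default XSLPattern syntax counts from 0):

```winbatch
xml = `<Root><Item>Apple</Item><Item>Banana</Item><Item>Cherry</Item></Root>`

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(xml)
xmlDoc.setProperty("SelectionLanguage", "XPath")

; A collection: loop through it to reach each node's .text
allitems = ""
Items = xmlDoc.selectNodes("/Root/Item")
ForEach Item In Items
   allitems = ItemInsert(Item.text, -1, allitems, ",")
Next
Items = 0

; A single node: read the value directly ([2] is the second Item)
seconditem = xmlDoc.selectSingleNode("/Root/Item[2]").text

xmlDoc = 0
Display(3, "Debug", StrCat("All: ", allitems, @LF, "Second: ", seconditem))
```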

Question: How do I create an XML file ?

Q13.WBT


;   Question: How do I create an XML file ?

editor = "notepad.exe"
tPath  = DirScript()
xfile    = StrCat(tPath, "Q13.html")

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(`<HTML/>`)

err = xmlDoc.parseerror
If err.errorCode Then GoSub ShowParseErrors

HeadElement  = xmlDoc.createElement("HEAD")
TitleElement = xmlDoc.createElement("TITLE")
TitleElement.text = "Test XML/HTML File"
StyleElement = xmlDoc.createElement("STYLE")
StyleElement.text = StrCat("caption {font-size: 12pt; font-weight: bold; color: blue}", @CRLF, "td {font-size: 8pt; color: green}")

HeadElement.appendChild(TitleElement)
HeadElement.appendChild(StyleElement)

xmlDoc.documentElement.appendChild(HeadElement)

BodyElement  = xmlDoc.createElement("BODY")

TableElement = xmlDoc.createElement("TABLE")
TableElement.setAttribute("border", "1")
TableElement.setAttribute("style",  "border-collapse:collapse;border: .25mm solid black;")
CaptionElement = xmlDoc.createElement("CAPTION")
CaptionElement.text = "Sample Table"
TableElement.appendChild(CaptionElement)
For r = 1 To 3
   TRElement = xmlDoc.createElement("TR")
   For c = 1 To 10
      TDElement = xmlDoc.createElement("TD")
      TDElement.text = Random(999)+1
      TRElement.appendChild(TDElement)
   Next
   TableElement.appendChild(TRElement)
Next

BodyElement.appendChild(TableElement)
xmlDoc.documentElement.appendChild(BodyElement)

HeadElement    = 0
TitleElement   = 0
StyleElement   = 0
BodyElement    = 0
TableElement   = 0
CaptionElement = 0
TRElement      = 0
TDElement      = 0

xmlDoc.save(xfile)
Run(xfile, "")

Exit

:ShowParseErrors
ec = err.errorCode
er = err.reason
el = err.line
elp = err.linepos
es  = err.srcText
Message("XML Parse Error", StrCat("Error Code: ", ec, @LF, "Reason: ", er, @LF, "Line: ", el, " ", "Column: ", elp, @LF, "Text: ", es))
Return

At some point you may need to create an XML file. There's nothing preventing you from using the method used in the script Q11.wbt and just using StrCat() to put together XML on the fly. Hopefully you're good at this sort of thing and won't have any problems.

Of course there's no guarantee you'll have done it right until you .load it and check it using the XML DOM.

You can also build XML files using the XML DOM, which means you can add elements and values and then test it right away.

When building an XML file you need to know the following:

the document's "root" element (top-most element)
the .createElement() method
the .setAttribute() method
the .appendChild() method

ROOT ELEMENT

This is easily accessible via the XML DOM. Once you have your document object created:

    xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
    xmlDoc.async = @FALSE
    xmlDoc.loadXML(`<ROOT/>`)

You can specify the root element in the above .loadXML() method. In this example the document consists of only 1 element, the root element. To get at the root element using XML DOM, you now simply refer to it as:

    xmlDoc.documentElement

Next, whenever you create an element, no matter where it's going to go, you simply reference the document and use the .createElement() method like so:

    NewElement = xmlDoc.createElement("Program_Name")

Setting a value for the new element is similar to retrieving a value from an xpath query; you use the .text property:

    NewElement.text = "Working with XML"

And now your new element will look like:

    <Program_Name>Working with XML</Program_Name>

If you need to add an attribute to your element, use .setAttribute() method like so:

    NewElement.setAttribute("Name", "MyName")

And your new element will look like:

    <Program_Name Name="MyName">Working with XML</Program_Name>

And then to place it into the document you use the .appendChild() method. It's called on the "parent node", which is the node the new element is going under; here that's the root element:

    xmlDoc.documentElement.appendChild(NewElement)

Will give you:

    <ROOT><Program_Name Name="MyName">Working with XML</Program_Name></ROOT>

With just those tools you can build most XML files you'll need to work with.
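Put together, those steps make a complete little script. This sketch builds exactly the ROOT and Program_Name example and reads back the root element's .xml so you can check the result:

```winbatch
xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(`<ROOT/>`)

; Create the element, give it a value and an attribute...
NewElement = xmlDoc.createElement("Program_Name")
NewElement.text = "Working with XML"
NewElement.setAttribute("Name", "MyName")

; ...then append it to its parent, here the root element
xmlDoc.documentElement.appendChild(NewElement)
NewElement = 0

; documentElement.xml holds the root element and everything under it
result = xmlDoc.documentElement.xml
xmlDoc = 0
Display(3, "Debug", result)
```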

For practice we'll build an HTML file to see what it looks like. You sort of have to "think backwards" when building an XML file by this method. Most of the time you'll build a parent node, then all the child nodes for that parent, put data into them (including attributes) then append them to the parent. Once the parent is finished you'll append that to the root. Since XML files can be very simple or very complex, each attempt to build a file might be a bit different. Naturally we'll stick to a simple model for this exercise. We'll build an HTML file, because we can view it in a browser and see how it works, since most programmers are familiar with HTML in some capacity.

The .getAttribute() and .setAttribute() methods are both used in the MS HTML DOM. The .createElement() and .appendChild() methods are also used in the MS HTML DOM, so after you learn to build an XML file here, you can probably use the MS HTML DOM to build or edit an HTML file. The consistency between the two DOMs is useful.

Okay, most HTML files have a "bare bones" look like:

    <HTML>
        <HEAD></HEAD>
        <BODY>
        </BODY>
    </HTML>

But of course they have more data inside the main elements. So to get started our script opens a document and places a root element into it:

    xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
    xmlDoc.async = @FALSE
    xmlDoc.loadXML(`<HTML/>`)

Then we start building the page:

    HeadElement  = xmlDoc.createElement("HEAD")
    TitleElement = xmlDoc.createElement("TITLE")
    TitleElement.text = "Test XML/HTML File"

Since most HTML files have a STYLE element, we'll make one too and place some specifications in there to show how the contents of a STYLE node are just text like any other XML document:

    StyleElement = xmlDoc.createElement("STYLE")
    StyleElement.text = StrCat("caption {font-size: 12pt; font-weight: bold; color: blue}", @CRLF, "td {font-size: 8pt; color: green}")

Then we append the TitleElement to the HeadElement and follow that by appending the StyleElement to the HeadElement:

    HeadElement.appendChild(TitleElement)
    HeadElement.appendChild(StyleElement)

We follow that by appending the Head to the root:

    xmlDoc.documentElement.appendChild(HeadElement)

Now the top part of our HTML document...er XML document is done. We need to make the BODY:

    BodyElement  = xmlDoc.createElement("BODY")

Now we're going to place a table inside the body, so we'll need to create it. If you've seen tables in HTML they often look strange if left unformatted, so in this script we'll specify some attributes to take advantage of the STYLE element we created earlier:

    TableElement = xmlDoc.createElement("TABLE")
    TableElement.setAttribute("border", "1")
    TableElement.setAttribute("style",  "border-collapse:collapse;border: .25mm solid black;")

The above creates a table and gives it a border by setting border to 1, then adds a style attribute: border-collapse merges the table and cell borders into single lines, and the border rule draws a thin, solid, black outline.

Next we give the table a caption, so that it can be displayed above the table when it's rendered on the page and then append the caption to the Table element:

    CaptionElement = xmlDoc.createElement("CAPTION")
    CaptionElement.text = "Sample Table"
    TableElement.appendChild(CaptionElement)

Next we need to create some rows and cells. A simple For loop will build 3 rows and as each row is created, we'll put in 10 cells, each containing a random number between 1 and 1000:

    For r = 1 To 3
       TRElement = xmlDoc.createElement("TR")
       For c = 1 To 10
          TDElement = xmlDoc.createElement("TD")
          TDElement.text = Random(999)+1
          TRElement.appendChild(TDElement)
       Next

Since a new row is built with each loop, it won't be the same data over and over. Then once all the cells have been appended to the row, we append the row to the Table element:

       TableElement.appendChild(TRElement)
    Next

Once the table is built, we append it to the Body element and then append the Body to the root:

    BodyElement.appendChild(TableElement)
    xmlDoc.documentElement.appendChild(BodyElement)

Cleanup is important with so many objects so:

    HeadElement    = 0
    TitleElement   = 0
    StyleElement   = 0
    BodyElement    = 0
    TableElement   = 0
    CaptionElement = 0
    TRElement      = 0
    TDElement      = 0

And then we save the file and display it to the user:

    xmlDoc.save(xfile)
    Run(xfile, "")

As you can see, xfile = "Q13.html", which doesn't have an .xml extension. That's one of the nice things about using XML: the extension of the file is irrelevant. While XML is not a replacement for a database, there's nothing preventing you from using XML as a "mini-database" and creating and editing files like "California.Vendors" or "Corporate.Clients" or "Company.Customers" as you need them.

You'll also notice that the file doesn't contain anything like:

    <?xml version="1.0"?>
    <?xml-stylesheet href="/style.css" type="text/css" title="default stylesheet"?>

That's because we didn't tell it to. You can add one if you want, but we're creating an HTML page so we didn't need to. If you feel you must, look into the XML DOM's .createProcessingInstruction() method; you'll then have to use .insertBefore() to place the new node ahead of the root element.
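If you do want the declaration, a minimal sketch looks like this: the processing instruction's target is "xml", the rest of the declaration goes in as its data, and .insertBefore() places it ahead of the root element (xmlDoc.documentElement):

```winbatch
xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(`<HTML/>`)

; Target "xml", data 'version="1.0"' gives <?xml version="1.0"?>
piNode = xmlDoc.createProcessingInstruction("xml", `version="1.0"`)
; Insert it before the root element, making it the document's first node
xmlDoc.insertBefore(piNode, xmlDoc.documentElement)
piNode = 0

result = xmlDoc.xml
xmlDoc = 0
Display(3, "Debug", result)
```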

Q14.WBT


;   Question: How do I transform an XML file using XSL ?

#DefineFunction TransformViaXSL(xml, xslfile)
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.loadXML(xml)
   styleDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   styleDoc.async = @FALSE      
   styleDoc.load(xslfile)
   pe = styleDoc.parseError
   If pe.errorCode <> 0
      efile = StrCat(DirWindows(0), "wwwbatch.ini")
      oh = FileOpen(efile, "append")
      FileWrite(oh, "MSXML 3.0 Error ------------------------------------>")
      FileWrite(oh, StrCat("Error#: ", pe.errorCode))
      FileWrite(oh, StrCat("Line:   ", pe.line, " Column: ", pe.linepos))
      FileWrite(oh, StrCat("Source: ", pe.srcText))
      FileWrite(oh, StrCat("Reason: ", pe.reason))
      FileWrite(oh, StrCat("URL:    ", pe.url, @CRLF))
      FileClose(oh)
      Display(2, "Error", pe.reason)
      Run("notepad.exe", efile)
   EndIf
   pe = 0
   html = xmlDoc.transformNode(styleDoc)
   xmlDoc = 0
   styleDoc = 0
   Return(html)
#EndFunction

editor   = "notepad.exe"
tPath    = DirScript()
xfile    = StrCat(tPath, "Q11http.xml")
xslfile  = StrCat(tPath, "Q14.xsl")

xsl = ""
xsl = StrCat(xsl, `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">`,@CRLF)
xsl = StrCat(xsl, `   <xsl:output method="html"/>`,@CRLF)
xsl = StrCat(xsl, `   <xsl:template match="/*">`,@CRLF)
xsl = StrCat(xsl, `      <table border="1" style="border-collapse: collapse; border: .25mm solid blue;">`,@CRLF)
xsl = StrCat(xsl, `      <caption>XSL Transformation Test</caption>`,@CRLF)
xsl = StrCat(xsl, `      <xsl:for-each select="Program_Info/File_Info/*">`,@CRLF)
xsl = StrCat(xsl, `         <tr><td><b><xsl:value-of select="name()"/></b></td><td><xsl:value-of select="."/></td></tr>`,@CRLF)
xsl = StrCat(xsl, `      </xsl:for-each>`,@CRLF)
xsl = StrCat(xsl, `      </table>`,@CRLF)
xsl = StrCat(xsl, `   </xsl:template>`,@CRLF)
xsl = StrCat(xsl, `</xsl:stylesheet>`,@CRLF)
FilePut(xslfile, xsl)

OptionList = "UseXMLDOMandMSIE|UseXMLStarlet"
label      = AskItemlist("Select Code Option", OptionList, "|", @UNSORTED, @SINGLE, @TRUE)
GoSub %label%

Exit

:UseXMLDOMandMSIE
   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.load(xfile)
   html = TransformViaXSL(xmlDoc.xml, xslfile)
   url = "about:blank"
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile
   msie.document.writeln(html)
   Message("Debug", "Done")
   msie.quit
Return

:UseXMLStarlet
   tPath    = DirScript()
   hfile = StrCat(tPath, "Q14.html")
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml tr "%xslfile%" "%xfile%"  > "%hfile%"`)
   Run(hfile, "")
Return

Q15.WBT



;   Question: How do I make my XML file "look pretty" ?

editor   = "notepad.exe"
tPath    = DirScript()
xml      = "<Root><FirstElement>1</FirstElement><SecondElement>2</SecondElement></Root>"

:XMLDOM
   XReader = ObjectCreate("Msxml2.SAXXMLReader.3.0")
   XWriter = ObjectCreate("Msxml2.MXXMLWriter.3.0")
   XWriter.byteOrderMark = @FALSE
   XWriter.omitXMLDeclaration = @TRUE
   XWriter.indent = @TRUE
   XReader.contentHandler = XWriter
   XReader.dtdHandler = XWriter
   XReader.PutProperty("http://xml.org/sax/properties/lexical-handler", XWriter)
   XReader.PutProperty("http://xml.org/sax/properties/declaration-handler", XWriter)
   XReader.Parse(xml)
   newxml  = XWriter.output
   XWriter = 0
   XReader = 0
   Message("Debug", newxml)
Return


;   Question: How do I make my XML file "look pretty" ?

editor   = "notepad.exe"
tPath    = DirScript()
xml      = "<Root><FirstElement>1</FirstElement><SecondElement>2</SecondElement></Root>"

:XMLStarlet
   tPath  = DirScript()
   xfileA = StrCat(tPath, "Q15A.xml")
   xfileB = StrCat(tPath, "Q15B.xml")
   FilePut(xfileA, xml)
   comspec = Environment("COMSPEC")
   RunHideWait(comspec, `/c xml fo -t -o "%xfileA%" > "%xfileB%"`)
   Run(editor, xfileB)
Return

Question: How do I transform an XML file using XSL ?

At some point in time, you'll run into XSL or XSLT files. It may be curiosity or a programming need to use them.

XSL files (I'll use XSL to cover both XSL and XSLT) are a bit complex and are generally used by people who already have a good understanding of XML's basic features. In some respects they're like a program for XML files, but their behavior is a bit different. For instance, you can have variables in XSL files, but once set they can't be updated the way a Winbatch variable can change on each pass through a loop. You can work around this, but it requires some work.

Anyway, XSL files are XML files. If you examine one you'll notice it has a root element (xsl:stylesheet) that must declare the XSL namespace, plus child elements and so on; an XML declaration at the top is common but optional, as the XSL built by Q14.wbt shows. XSL files also have a strict order to them, in that certain elements must occur in certain places, and not just appear anywhere in the file.

The idea is not to intimidate you, but rather be aware that if you're working with an XSL file, there's more complexity involved. One of the nice things about Microsoft Office is that often when you export a document or data to XML, the program will automatically generate an XSL document along with the XML.

In most cases you'll see an XML file that contains a processing instruction like the following:

    <?xml version='1.0'?>
    <?xml-stylesheet type="text/xsl" href="book.xsl"?>
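A script can also find out which stylesheet a document points to, since the xml-stylesheet line is just a processing instruction node. Here's a sketch with a made-up document. It assumes the instruction exists, because .selectSingleNode() returns nothing when there's no match and reading .data would then fail:

```winbatch
; Hypothetical sample: a tiny document that names its stylesheet
xml = StrCat(`<?xml-stylesheet type="text/xsl" href="book.xsl"?>`, `<book><title>Sample</title></book>`)

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(xml)
xmlDoc.setProperty("SelectionLanguage", "XPath")

; processing-instruction('name') is XPath's test for a PI with that target
piNode = xmlDoc.selectSingleNode("/processing-instruction('xml-stylesheet')")
piData = piNode.data    ; the text after the target, e.g. type="..." href="..."
piNode = 0
xmlDoc = 0

Display(3, "Debug", piData)
```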

UseXMLDOMandMSIE

This script will take the XML and XSL files, load them and transform the XML into HTML via the TransformViaXSL UDF.

Let's examine the UDF first. The function takes two arguments: the XML (in text form) and the full path to the XSL file. I've done it this way because in most circumstances you'll be working with the XML already and can pass the document's .xml property to it. I've rarely had to work with XSL "text", so that's why it's done like this:

    #DefineFunction TransformViaXSL(xml, xslfile)
       xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
       xmlDoc.async = @FALSE
       xmlDoc.loadXML(xml)

After the XML is loaded, we then have to load the XSL file, which as you can see is done just like any other XML document. We give it a unique name so that we can tell them apart later:

       styleDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
       styleDoc.async = @FALSE
       styleDoc.load(xslfile)

At this point we see the familiar .parseError routine. I keep it in my UDF because creating and editing XSL files is tricky: I write all of my XSL files by hand and it's time-consuming. Since I use Winbatch to test for XSL errors, this suits my own learning curve. Any XSL parse errors are appended, along with their details, to the wwwbatch.ini file in the Windows directory:

       pe = styleDoc.parseError
       If pe.errorCode <> 0
          efile = StrCat(DirWindows(0), "wwwbatch.ini")
          oh = FileOpen(efile, "append")
          FileWrite(oh, "MSXML 3.0 Error ------------------------------------>")
          FileWrite(oh, StrCat("Error#: ", pe.errorCode))
          FileWrite(oh, StrCat("Line:   ", pe.line, " Column: ", pe.linepos))
          FileWrite(oh, StrCat("Source: ", pe.srcText))
          FileWrite(oh, StrCat("Reason: ", pe.reason))
          FileWrite(oh, StrCat("URL:    ", pe.url, @CRLF))
          FileClose(oh)
          Display(2, "Error", pe.reason)
          Run("notepad.exe", efile)
       EndIf

If I need to troubleshoot, I get a complete message during my session and can debug it further from there.

Then, just in case, the parseError object is released by setting it to zero, and the transform takes place:

       pe = 0
       html = xmlDoc.transformNode(styleDoc)
       xmlDoc = 0
       styleDoc = 0
       Return(html)
    #EndFunction

What's nice is that the output or the HTML from the transform is captured into a variable, which means you can place it on the clipboard, or write it to a file if further debugging is required: your XSL file might not have errors, but may not display correctly, for instance.
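As an example, here's a self-contained sketch that does the same kind of transform with the XSL held in a string instead of a file, then writes the HTML to a file and puts it on the clipboard for inspection. The sample data and the TransformDebug.html name are placeholders:

```winbatch
; Sketch: transform XML with an XSL held in a string, then keep the HTML for debugging
xml = `<Root><Item>Apple</Item><Item>Banana</Item></Root>`

xsl = ""
xsl = StrCat(xsl, `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">`, @CRLF)
xsl = StrCat(xsl, `   <xsl:output method="html"/>`, @CRLF)
xsl = StrCat(xsl, `   <xsl:template match="/Root">`, @CRLF)
xsl = StrCat(xsl, `      <ul><xsl:for-each select="Item"><li><xsl:value-of select="."/></li></xsl:for-each></ul>`, @CRLF)
xsl = StrCat(xsl, `   </xsl:template>`, @CRLF)
xsl = StrCat(xsl, `</xsl:stylesheet>`, @CRLF)

xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
xmlDoc.async = @FALSE
xmlDoc.loadXML(xml)
styleDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
styleDoc.async = @FALSE
styleDoc.loadXML(xsl)          ; loadXML() instead of load(): the XSL is text, not a file

html = xmlDoc.transformNode(styleDoc)
styleDoc = 0
xmlDoc = 0

hfile = StrCat(DirScript(), "TransformDebug.html")
FilePut(hfile, html)           ; a copy on disk...
ClipPut(html)                  ; ...and one on the clipboard
```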

In the actual script we open the document containing the XML as usual:

   xmlDoc = ObjectCreate("Msxml2.DOMDocument.3.0")
   xmlDoc.async = @FALSE
   xmlDoc.load(xfile)

We then pass the correct parameters to the UDF and get back the HTML:

   html = TransformViaXSL(xmlDoc.xml, xslfile)

Then we start MSIE. The URL of "about:blank" creates a blank page:

   url = "about:blank"
   msie = ObjectCreate("InternetExplorer.Application")
   msie.addressbar = @FALSE
   msie.statusbar = @FALSE
   msie.menubar = @FALSE
   msie.toolbar = @FALSE
   msie.visible = @TRUE
   msie.navigate(url)
   While msie.busy || msie.readystate <> 4
      TimeDelay(0.5)
   EndWhile

Then we write to the blank document using the document.writeln() method:

   msie.document.writeln(html)

The document displays the HTML; once you dismiss the "Done" message, the script closes MSIE:

   Message("Debug", "Done")
   msie.quit

Very handy for programmers.

UseXMLStarlet

In addition, XMLStarlet can do transforms too, all behind the scenes:

    :UseXMLStarlet
       tPath    = DirScript()
       hfile = StrCat(tPath, "Q14.html")
       comspec = Environment("COMSPEC")
       RunHideWait(comspec, `/c xml tr "%xslfile%" "%xfile%"  > "%hfile%"`)
       Run(hfile, "")
    Return

Not bad for 5 lines of code. Of course the debugging features don't compare to what you can achieve with the XML DOM and Winbatch, but if you have everything working and want to keep clutter to a minimum, XMLStarlet is a viable option.

All right, since we've delved into XSL files and you know a bit more about them I'll give you a little bonus. Remember in Q8.wbt we were able to export an HTML table to a CSV file?

It's also possible using XMLStarlet. Grab the file called

CSV_Export_via_XMLStarlet.wbt.htm

and examine it. A relatively small script of 9 lines with a huge command line.

Run it. You should see the 2nd table exported as CSV.

Let's examine the command line for XMLStarlet:

xml sel -T -t -m "//table [2]" -m ".//caption" -v "." -n -m "//table [2]" -m ".//tr" -m "./td" -i "not(position()=3)" -v "concat(.,',')" -b -i "position()=3" -v "concat(.,'&#10;')"

What this does is the equivalent of an XSL or XSLT file in MS XML. However, instead of having to go through the trouble of creating an XSL file and then doing a transform, the command line does everything in one step. Remember that XMLStarlet formats HTML documents into XHTML, where all tags are in lower case.

Command Line explanation:


sel means we want XMLStarlet to do a selection

-T means the output is text (html or xml are other options)

-t means we'll be supplying an XSL template (what follows)

-m "//table [2]" means "match the 2nd table tag in the document". Strictly speaking, XPath reads "//table[2]" as "every table that is the 2nd table inside its parent"; when all of a page's tables sit side by side in the body that amounts to the same thing, and "(//table)[2]" is the unambiguous way to ask for the 2nd table in the document.

-m ".//caption" -v "." -n means "starting with the current context node (in this case the 2nd table), match the caption, then print its value followed by a newline" (a line feed, @LF in Winbatch)

-m "//table [2]" means "match the 2nd table tag in the document" again

-m ".//tr" means "starting with the current context node, match all the descendant nodes that have tr tags" (in other words, only find the rows in the 2nd table)

-m "./td" means "starting with the current context node (which is the current row), match all the child nodes that have td tags" (again this stays within the 2nd table)

-i "not(position()=3)" -v "concat(.,',')" means "if the current td isn't the 3rd one in the collection, print the value of the cell followed by a comma" (-i is "if", -v is "value of", and concat() is a built-in XPath function)

-b means "break nesting": it closes the most recent -i or -m block, here the first "if", so the next -i is tested on the same cell rather than being nested inside the first one

-i "position()=3" -v "concat(.,'&#10;')" means "if the current context is the 3rd cell of the row, print the value of the cell followed by a LF character" (&#10; is how a line feed is represented in XML; it's similar to using Num2Char(10) in Winbatch)

Then control is returned to the previous loop of matching table rows (tr) and starts over with the next row.

Probably confusing unless you've dealt with XML and XSL before, but it helps to illustrate the power of XMLStarlet.

In addition if you change the [2] to [3] and re-run the script you'll notice the exported table doesn't have the TH column headings in it. That's because in our command line template, we specifically told XMLStarlet to only grab the values inside the TD cells, and it would skip any TH cells it finds.

One last item: if you run the command line with "-C" after "sel", XMLStarlet will output the XSLT template it generates internally. You can save it and study it later.

Here it is:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:exslt="http://exslt.org/common"
         xmlns:math="http://exslt.org/math"
         xmlns:date="http://exslt.org/dates-and-times"
         xmlns:func="http://exslt.org/functions"
         xmlns:set="http://exslt.org/sets"
         xmlns:str="http://exslt.org/strings"
         xmlns:dyn="http://exslt.org/dynamic"
         xmlns:saxon="http://icl.com/saxon"
         xmlns:xalanredirect="org.apache.xalan.xslt.extensions.Redirect"
         xmlns:xt="http://www.jclark.com/xt"
         xmlns:libxslt="http://xmlsoft.org/XSLT/namespace"
         xmlns:test="http://xmlsoft.org/XSLT/"
         extension-element-prefixes="exslt math date func set str dyn saxon xalanredirect xt libxslt test"
         exclude-result-prefixes="math str">

    <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>

    <xsl:param name="inputFile">-</xsl:param>

    <xsl:template match="/">
      <xsl:call-template name="t1"/>
    </xsl:template>

    <xsl:template name="t1">
      <xsl:for-each select="//table [2]">
        <xsl:for-each select=".//caption">
          <xsl:value-of select="."/>
          <xsl:value-of select="'&#10;'"/>
          <xsl:for-each select="//table [2]">
            <xsl:for-each select=".//tr">
              <xsl:for-each select="./td">
                <xsl:if test="not(position()=3)">
                  <xsl:value-of select="concat(.,',')"/>
                </xsl:if>
                <xsl:if test="position()=3">
                  <xsl:value-of select="concat(.,'&#10;')"/>
                </xsl:if>
              </xsl:for-each>
            </xsl:for-each>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
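You can also put that generated template to work. The sketch below captures the -C output for the simpler Program_Info query from Q11 and then runs it with "xml tr", just like the Q14 transform. It assumes xml.exe is on your PATH and Q11.xml is next to the script; the .xsl and .txt file names are made up:

```winbatch
; Sketch: save the XSLT that "xml sel" builds, then reuse it with "xml tr"
tPath   = DirScript()
infile  = StrCat(tPath, "Q11.xml")
xslfile = StrCat(tPath, "Q11Generated.xsl")
ofile   = StrCat(tPath, "Q11Generated.txt")
comspec = Environment("COMSPEC")

; -C prints the generated stylesheet instead of running it, so no input file is given
RunHideWait(comspec, `/c xml sel -C -T -t -m "/XML_DIZ_INFO/Program_Info" -v "concat(./Program_Name,'|',./Program_Version)" > "%xslfile%"`)

; Apply the saved stylesheet; the output should match running the sel command directly
RunHideWait(comspec, `/c xml tr "%xslfile%" "%infile%" > "%ofile%"`)
Run("notepad.exe", ofile)
```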

Sample script Q15.WBT

Question: How do I make my XML file "look pretty" ?

Finally, the culmination of our work with the web and XML is making your hard work readable. This may seem funny, but most of the work I do with XML involves staring at it, so having the file in a readable format is essential. It becomes much easier to see which nodes are parents of others and how nodes relate to each other.

XMLDOM

This is actually a bit of a misnomer. MS XML comes with a SAX XML Reader and an MX XML Writer, which actually don't use the DOM.

The MXXMLWriter CoClass generates XML output from Simple API for XML (SAX) events. When connected to SAXXMLReader and set as a ContentHandler, MXXMLWriter accumulates the content passed by events thrown by the reader.

In simpler terms, the SAX reader reads the document and fires events, and the writer turns those events back into XML text, formatting it along the way thanks to the .indent property. I'm still a babe in the woods when it comes to SAX (I simply haven't had much use for it), but this particular function I use constantly.

I call this section XML DOM because I often pass xmlDoc.xml to the UDF I have defined and then reload the nice, pretty output into the DOM.

Creating the objects is straightforward:

    :XMLDOM
       XReader = ObjectCreate("Msxml2.SAXXMLReader.3.0")
       XWriter = ObjectCreate("Msxml2.MXXMLWriter.3.0")
       XWriter.byteOrderMark = @FALSE

Next comes an important line of code:

       XWriter.omitXMLDeclaration = @TRUE

This tells the writer not to put the XML declaration (the <?xml ...?> line) at the top of the output. Next is the .indent property, which does the actual formatting by indenting nodes in relation to the others:

       XWriter.indent = @TRUE

The writer is set as the reader's content handler, which receives the elements and text. It is also set as the reader's document type definition (DTD) handler. The two PutProperty() calls also make the writer the lexical handler (comments, CDATA sections and the DTD itself) and the declaration handler (element and attribute declarations inside a DTD). With all four in place, nothing in the source is dropped from the output. These settings are mandatory for a faithful copy:

       XReader.contentHandler = XWriter
       XReader.dtdHandler = XWriter
       XReader.PutProperty("http://xml.org/sax/properties/lexical-handler", XWriter)
       XReader.PutProperty("http://xml.org/sax/properties/declaration-handler", XWriter)

Finally, the reader parses the XML (raising an error if it isn't well-formed) and the writer builds the formatted output:

       XReader.Parse(xml)
       newxml  = XWriter.output

Once that's done, the XML will be easily readable. You can also try it with .omitXMLDeclaration = @FALSE and compare the output. You might want to leave it as @FALSE, then place your own processing instructions on the document afterwards.
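Python has no MXXMLWriter, but the same round trip (parse, re-serialize with indentation, reload) can be sketched with xml.etree.ElementTree. This is an illustration, not the SAX method above. ET.indent (Python 3.9 or later) plays the role of .indent. ET.tostring(..., encoding="unicode") writes no XML declaration, much like .omitXMLDeclaration = @TRUE. The name pretty_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

def pretty_xml(xml_text: str, indent: str = "  ") -> str:
    """Parse xml_text and return it re-serialized with one indent step per nesting level.

    No <?xml ...?> line is written, similar to omitXMLDeclaration = @TRUE.
    """
    root = ET.fromstring(xml_text)
    ET.indent(root, space=indent)        # Python 3.9+: rewrites whitespace-only text/tails
    return ET.tostring(root, encoding="unicode")

newxml = pretty_xml("<root><a><b>x</b></a><c/></root>")
print(newxml)

# The equivalent of xmlDoc.loadXML(newxml): reload the formatted text
reloaded = ET.fromstring(newxml)
```

Reloading the formatted string corresponds to the xmlDoc.loadXML(newxml) step: the tree you keep working with is the pretty one.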

One last point: once you have newxml, you have to load it into your document, which replaces the existing XML:

    xmlDoc.loadXML(newxml)
    Message("Debug", xmlDoc.xml)

Then you can .save() it.

XMLSTARLET

XMLStarlet can also format an XML file. We simply need to set up "before" and "after" files to show it.


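    :XMLStarlet
       ; tPath is the author's working folder: change it to a folder of your own
       tPath  = "J:\Temp Folders\CodingTemp\Winbatch BBS\"
       xfileA = StrCat(tPath, "Q15A.xml")   ; "before" file
       xfileB = StrCat(tPath, "Q15B.xml")   ; "after" (formatted) file
       FilePut(xfileA, xml)
       comspec = Environment("COMSPEC")
       ; fo = format, -t = indent with tabs, -o = omit the XML declaration
       RunHideWait(comspec, `/c xml fo -t -o "%xfileA%" > "%xfileB%"`)
       ; editor must already hold the path of your text editor, e.g. "notepad.exe"
       Run(editor, xfileB)
    Return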

In this example the command line is short and sweet: xml fo -t -o

Here "fo" tells XMLStarlet to format the input file, "-t" means "indent with tabs" and "-o" means "omit the XML declaration", just as we did with the SAX script.

Just remember that since XMLStarlet works with files here, you'll have to .load() the new file into your XML document in order to work with it.
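The "before and after" file workflow can also be sketched in Python without XMLStarlet. The format_file function below is hypothetical and only an analogue of xml fo -t -o, not a wrapper around it. It indents with tabs, writes the copy without an XML declaration, and then reloads the new file, which is the step the paragraph above reminds you about. It assumes Python 3.9+ for ET.indent, and it writes to a temporary folder instead of a real working directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def format_file(src_path: str, dst_path: str) -> ET.ElementTree:
    """Write a tab-indented copy of src_path to dst_path (no XML declaration),
    then reload and return the formatted file."""
    tree = ET.parse(src_path)
    ET.indent(tree, space="\t")                                   # like "fo -t"
    tree.write(dst_path, encoding="utf-8", xml_declaration=False)  # like "-o"
    return ET.parse(dst_path)          # reload the new file before working with it

# Demonstration with throwaway "before" and "after" files
work_dir = tempfile.mkdtemp()
before = os.path.join(work_dir, "Q15A.xml")
after = os.path.join(work_dir, "Q15B.xml")
with open(before, "w", encoding="utf-8") as f:
    f.write("<root><item>1</item><item>2</item></root>")

reloaded = format_file(before, after)
with open(after, encoding="utf-8") as f:
    formatted_text = f.read()
print(formatted_text)
```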

Using the Object Browser

Throughout the tutorial you've seen methods and properties like .GetElementsByTagName(), .outerHTML, .selectSingleNode() and .getAttribute(), and you're probably wondering where we came up with all of them.

Almost every COM or OLE application has documentation built into it, usually in the form of a type library, if you know where to find it. One place to read it is the VBA editor found inside each of the MS Office applications (Access, Word, Excel and even Outlook). Users normally use that editor to write Visual Basic for Applications code and work with recorded macros.

Using the Object Browser isn't for the faint of heart. It's something that may take some working with to understand.

In all of the applications mentioned (except Access) you can get to the VBA editor by pressing ALT-F11 from the main window. Once inside the VBA editor, press F2 to open the Object Browser. You should see an explorer-like window with a pair of boxes just under the menu.

Now to look at each of the items we dealt with, you'll need to do the following:

Go up to the TOOLS menu and click REFERENCES. A smaller dialog will appear with a long list of Object Libraries to work with. Since the VBA editor has no idea what you'll be working with, you have to tell it by going down the list and checking the box next to each entry you need. Our list is a bit lengthy, but the editor can handle it.

You want to find and check each of the following:

Microsoft ActiveX Data Objects 2.5 Library (or a later version such as 2.6 or 2.7; this is for ADODB.Stream)

Microsoft Internet Controls (this is for MSIE)

Microsoft HTML Object Library (for the .document and .GetElementsByTagName, etc)

Microsoft WinHTTP Services version 5.0 or 5.1 (WinHTTP)

Microsoft XML v3.0 (XMLHTTP, .selectSingleNode -- all of XML)

XStandard - HTTP 4.1 (for XStandard)

Then click OK. The top drop-down box should have <All Libraries> in it. Open that list and you can pick each library and look at how it's laid out. You should notice that SHDocVw (SHell DOCument VieWer) is the library for MSIE. Most of the others have names close to what you'd expect.

In general, items are grouped by the library and class they belong to. So under MSXML2 you'll see DOMDocument30; click on it. Once you do, you'll see all the methods, properties and objects associated with the XML DOM version we use in the scripts. For instance, you'll see .createElement, .appendChild and all the others we used. You can investigate each one by clicking on it.

The pane at the bottom of the window tells you more about the selected item, and some entries even have "clickable" links that take you elsewhere in the browser. Up at the top of the Object Browser, near the search box, is a button with a golden question mark on it. Click it and it should load the related help file for that item. Not all Object Libraries provide help files, so you might find yourself lost.

In that case, you can go to the vendor's site and get information from them. For instance, if you wanted information on .setAttribute, you'd use your browser to go to http://www.microsoft.com, type "MS XML Documentation" into the search box and find the link for it. It should have extensive documentation. Occasionally you may not find something there. When that happens I simply go to http://www.google.com, type ".setAttribute" into the search box and see what it brings back. What you want to look for are VB examples of how the method, object or property is used. In most instances you can download sample programs and use them to help you.

In any case, if you examine each of the scripts, any time ObjectCreate() is used you are creating the top-most object. As the script progresses, you use more and more of its component parts. Follow the path of the object from when it's created to its next use and so on, and you'll soon get a feel for how the Object Browser is laid out. The more scripts you view, the better you'll get at it.

To close the VBA editor and return to the application, press ALT-Q. If you're using MS Excel, I suggest you save the blank workbook with the Object References you set. That way you can come back and view them later without repeating the above process.

Lastly, if you don't have MS Office you obviously can't use its Object Browser. However, you should be able to find a standalone one on the web.

XML Starlet Build XML

We've already seen how to build an XML document using the XML DOM, but you can also build XML files with XMLStarlet. In essence the process is the same: you still have to know how XML files are put together and how to query for nodes via xpath. But should you find the XML DOM isn't available, it's good to know that aside from building XML files with StrCat() you can have XMLStarlet do it for you.

We'll use the same premise for this exercise, building an HTML file as XML, but we'll change a few things.

The first thing we need to do is set up variables for our XML and HTML files:

xfile  = StrCat(DirScript(), "Test.xml")
tfile  = StrCat(DirScript(), "Test2.html")

Then we place the root element into the XML file:

FilePut(xfile, "<HTML/>")

All right, now we start using XMLStarlet. Each command that adds a subnode to the HTML root element must be issued separately. Since XMLStarlet runs from the DOS command line, the command has the potential to become outrageously long. So we'll do something similar to creating HTML with StrCat(), but place each command that edits our XML file on a separate line of the script. A @CRLF would end the command line, while extra whitespace between arguments is ignored. So instead of separating the commands with @CRLF, we'll use @TAB.

XMLStarlet processes the commands in order. So in some places in the example below, we create a new node on one line and then use it on the next.

In addition, we'll create every type of "common" node used in XML:

Subnodes and their text

Sub-subnodes and their text

Attributes and their values

The result will be an HTML page with a TITLE, an H2 header, and a 10-row table with a colored caption. The middle cell of each row will alternate text color. This should be enough to demonstrate the power and flexibility of XMLStarlet.

Here's the code:

:XMLStarlet
   comspec = Environment("COMSPEC")
   ecode = ""
   ecode = StrCat(ecode, `-s "/HTML"   -t elem -n "HEAD"    -v ""`, @TAB)
   ecode = StrCat(ecode, `-s "//HEAD"  -t elem -n "TITLE"   -v "XML/HTML by XMLStarlet"`, @TAB)
   ecode = StrCat(ecode, `-s "//HEAD"  -t elem -n "STYLE"   -v "table {border-collapse:collapse; border:.25mm solid blue; cursor:hand;} td {border:.25mm solid blue; padding: 2mm} caption {font-weight:bold; color: green}"`, @TAB)
   ecode = StrCat(ecode, `-s "/HTML"   -t elem -n "BODY"    -v ""`, @TAB)
   ecode = StrCat(ecode, `-s "//BODY"  -t elem -n "H2"      -v "Building XML/HTML Files via XMLStarlet"`, @TAB)
   ecode = StrCat(ecode, `-s "//BODY"  -t elem -n "TABLE"   -v ""`, @TAB)
   ecode = StrCat(ecode, `-i "//TABLE" -t attr -n "BORDER"  -v "1"`, @TAB)
   ecode = StrCat(ecode, `-s "//TABLE" -t elem -n "CAPTION" -v "XMLStarlet Table Test"`, @TAB)
   For x = 1 To 10
      If x mod 2 == 0
         color = "Blue"
      Else
         color = "Red"
      EndIf
      ecode = StrCat(ecode, `-s "//TABLE"                      -t elem -n "TR"     -v ""`, @TAB)
      ecode = StrCat(ecode, `-s "//TR[last()]"                 -t elem -n "TD"     -v "Row %x%"`, @TAB)
      ecode = StrCat(ecode, `-s "//TR[last()]"                 -t elem -n "TD"     -v ""`, @TAB)
      ecode = StrCat(ecode, `-s "//TR[last()]/TD[last()]"      -t elem -n "FONT"   -v "This Text is %color%"`, @TAB)
      ecode = StrCat(ecode, `-i "//TR[last()]/TD[last()]/FONT" -t attr -n "color"  -v "%color%"`, @TAB)
      ecode = StrCat(ecode, `-s "//TR[last()]"                 -t elem -n "TD"     -v "Cell 3"`, @TAB)
   Next
   RunHideWait(comspec, StrCat(`/c xml ed `, ecode, ` "%xfile%" | xml fo -o > "%tfile%"`))
   Run(tfile, "")
Return

Exit
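For comparison, here's a rough Python sketch that builds the same page in memory with xml.etree.ElementTree instead of shelling out to XMLStarlet. It's an illustration, not a translation of XMLStarlet itself. build_page and page_text are hypothetical names, and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

STYLE = ("table {border-collapse:collapse; border:.25mm solid blue; cursor:hand;} "
         "td {border:.25mm solid blue; padding: 2mm} "
         "caption {font-weight:bold; color: green}")

def build_page(rows: int = 10) -> ET.Element:
    html = ET.Element("HTML")                                  # the FilePut() root element
    head = ET.SubElement(html, "HEAD")
    ET.SubElement(head, "TITLE").text = "XML/HTML by XMLStarlet"
    ET.SubElement(head, "STYLE").text = STYLE
    body = ET.SubElement(html, "BODY")
    ET.SubElement(body, "H2").text = "Building XML/HTML Files via XMLStarlet"
    table = ET.SubElement(body, "TABLE", BORDER="1")           # element plus attribute
    ET.SubElement(table, "CAPTION").text = "XMLStarlet Table Test"
    for x in range(1, rows + 1):
        color = "Blue" if x % 2 == 0 else "Red"                # even rows blue, odd rows red
        tr = ET.SubElement(table, "TR")
        ET.SubElement(tr, "TD").text = f"Row {x}"
        middle = ET.SubElement(tr, "TD")                       # empty cell that holds the FONT
        ET.SubElement(middle, "FONT", color=color).text = f"This Text is {color}"
        ET.SubElement(tr, "TD").text = "Cell 3"
    return html

def page_text(rows: int = 10) -> str:
    page = build_page(rows)
    ET.indent(page)                                            # roughly what "xml fo" does
    return ET.tostring(page, encoding="unicode")               # no XML declaration, like "-o"
```

The main design difference is that each new node is held in a variable. XMLStarlet only sees the file, so every command has to find its target again with xpath such as "//TR[last()]". To save the result, write page_text() to the HTML file with an ordinary open(..., "w") call.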

As you can see, nothing special really. You've probably seen Winbatch scripts like this before. The code has been formatted with whitespace so it's easy to read.

The key to all of this is xpath. If you know xpath, and know how to select nodes and their attributes, you'll find working with XMLStarlet is a snap.

Let's examine the commands:

Since we used FilePut() to make the root element, the rest of the file will be built off it. The first line selects the root element and adds a subnode to it called HEAD, but leaves it empty:

ecode = StrCat(ecode, `-s "/HTML"   -t elem -n "HEAD"    -v ""`, @TAB)

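-s is shorthand for "subnode": XMLStarlet adds a new child node to every node that matches the xpath. Since only one node matches here, it works like selecting the node with the XML DOM's .selectSingleNode() method and appending a child to it. -t is shorthand for "type", and following it with "elem" tells XMLStarlet we're adding an ELEMENT. -n gives the name of the new node, and -v is shorthand for "value", the .text property of the new element.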

Using the XML DOM we would have done this:

HeadElement = xmlDoc.createElement("HEAD")
xmlDoc.documentElement.appendChild(HeadElement)
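A document can hold only one root element. So new sections such as HEAD have to be appended to documentElement (the HTML element) rather than to the document itself. Python's xml.dom.minidom follows the same DOM rules, so it makes a handy sketch of the point. The variable names are illustrative only.

```python
import xml.dom
from xml.dom import minidom

# Start from the same root element the XMLStarlet script writes with FilePut()
doc = minidom.parseString("<HTML/>")

# Append HEAD under the root element, not under the document node
head = doc.createElement("HEAD")
doc.documentElement.appendChild(head)
print(doc.documentElement.toxml())       # <HTML><HEAD/></HTML>

# Appending a second element directly to the document is refused by the DOM
try:
    doc.appendChild(doc.createElement("BODY"))
    second_root_refused = False
except xml.dom.HierarchyRequestErr:
    second_root_refused = True
print(second_root_refused)                # True
```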

Next, we want to add a TITLE element so that when the page displays in the browser, the title bar shows our title. The TITLE element is a subnode of the HEAD element, so we need to select the HEAD element:

ecode = StrCat(ecode, `-s "//HEAD"  -t elem -n "TITLE"   -v "XML/HTML by XMLStarlet"`, @TAB)

So XMLStarlet will select the HEAD element, create a subnode element called TITLE and place the text "XML/HTML by XMLStarlet" inside it.

The XML DOM code would have looked like this:

TitleElement = xmlDoc.createElement("TITLE")
TitleElement.text = "XML/HTML by XMLStarlet"
xmlDoc.selectSingleNode("/HTML/HEAD").appendChild(TitleElement)

Next we want to add a STYLE element to the HEAD. The style element will contain formatting instructions for the TABLE.

ecode = StrCat(ecode, `-s "//HEAD"  -t elem -n "STYLE"   -v "table {border-collapse:collapse; border:.25mm solid blue; cursor:hand;} td {border:.25mm solid blue; padding: 2mm} caption {font-weight:bold; color: green}"`, @TAB)

So once again we select the HEAD element, add a subnode element called STYLE and insert the CSS formatting instructions inside it. The XML DOM equivalent is:

StyleElement = xmlDoc.createElement("STYLE")
StyleElement.text = "table {border-collapse:collapse; border:.25mm solid blue; cursor:hand;} td {border:.25mm solid blue; padding: 2mm} caption {font-weight:bold; color: green}"
xmlDoc.selectSingleNode("/HTML/HEAD").appendChild(StyleElement)

Now we're finished with adding subnodes to the HEAD element. The next element in our HTML file is the BODY element and it's a child of the root element.

So we'll select the root element, insert the BODY element, but leave it blank since it will contain other nodes:

ecode = StrCat(ecode, `-s "/HTML"   -t elem -n "BODY"    -v ""`, @TAB)

Which is the equivalent of:

BodyElement = xmlDoc.createElement("BODY")
xmlDoc.documentElement.appendChild(BodyElement)

You might be wondering why the XMLStarlet commands use "//" before the tags and the XML DOM code doesn't. For this particular file they amount to the same thing. The double forward slash "//" works much like the HTML DOM's .GetElementsByTagName(): it finds ALL elements in the document with that tag, at any depth. Since an HTML file has only one HEAD, the effect is identical. Of course, in an XML file of your own you could have as many HEAD tags as you wish, so be wary of what you're selecting. When an xpath matches several nodes, XMLStarlet's -s adds a subnode to every one of them. You could specify a full path such as "/root/node" instead if you wish. Do what you feel comfortable with.
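Because "//" matches every element with that tag, an XMLStarlet -s command aimed at "//HEAD" adds a subnode to each match. Here is a small Python sketch using xml.etree.ElementTree that shows the difference. The helpers subnode and insert_attr are hypothetical stand-ins. They only imitate what ed -s ... -t elem and ed -i ... -t attr do, and they are not part of XMLStarlet or any library. ElementTree wants relative paths, so "//HEAD" becomes ".//HEAD" here.

```python
import xml.etree.ElementTree as ET

def subnode(root, path, name, value=""):
    """Imitate 'xml ed -s PATH -t elem -n NAME -v VALUE':
    append a new child element to EVERY element matching PATH."""
    added = []
    for parent in root.findall(path):    # findall returns a list, so modifying the tree is safe
        child = ET.SubElement(parent, name)
        child.text = value or None       # -v "" leaves the element empty
        added.append(child)
    return added

def insert_attr(root, path, name, value):
    """Imitate 'xml ed -i PATH -t attr -n NAME -v VALUE' on every match."""
    for elem in root.findall(path):
        elem.set(name, value)

root = ET.fromstring("<HTML><HEAD/><BODY><HEAD/></BODY></HTML>")

# ".//HEAD" (like "//HEAD") matches both HEAD elements, so two TITLEs appear
print(len(subnode(root, ".//HEAD", "TITLE", "Hello")))    # 2

# "HEAD" relative to the root (like "/HTML/HEAD") matches only the top-level one
print(len(subnode(root, "HEAD", "META")))                 # 1
```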

Next we want to add an H2 to the BODY. This will display the text inside it in a large font. The H2 (or any heading tag) is a block element, so the browser automatically puts a line break after the text it displays. That means we don't have to worry about the header and the table's caption getting smashed together when the HTML is rendered.

So, by now you should be familiar with the command sequence:

ecode = StrCat(ecode, `-s "//BODY"  -t elem -n "H2"      -v "Building XML/HTML Files via XMLStarlet"`, @TAB)

Which finds the BODY element and creates an H2 element and places the text inside it.

The corresponding code in XML DOM is:

Head2Element = xmlDoc.createElement("H2")
Head2Element.text = "Building XML/HTML Files via XMLStarlet"
xmlDoc.selectSingleNode("/HTML/BODY").appendChild(Head2Element)

Next we need to create a TABLE element. In addition, we need to add a BORDER attribute to the TABLE element and give it a value of 1. XMLStarlet can easily accommodate us:

   ecode = StrCat(ecode, `-s "//BODY"  -t elem -n "TABLE"   -v ""`, @TAB)
   ecode = StrCat(ecode, `-i "//TABLE" -t attr -n "BORDER"  -v "1"`, @TAB)

This appends the subnode TABLE to the BODY, then queries the document for the TABLE node. -i is "insert", -t attr makes the new node an attribute, -n "BORDER" names it, and -v "1" gives it the value 1.

XML DOM would be:

TableElement = xmlDoc.createElement("TABLE")
TableElement.setAttribute("BORDER", "1")
xmlDoc.selectSingleNode("/HTML/BODY").appendChild(TableElement)

Next we need to place a CAPTION on the table:

   ecode = StrCat(ecode, `-s "//TABLE" -t elem -n "CAPTION" -v "XMLStarlet Table Test"`, @TAB)

The document is queried to find the TABLE node, a subnode element called CAPTION is created and given the value of "XMLStarlet Table Test".

The XML DOM would look like:

CaptionElement = xmlDoc.createElement("CAPTION")
CaptionElement.text = "XMLStarlet Table Test"
xmlDoc.selectSingleNode("/HTML/BODY/TABLE").appendChild(CaptionElement)

Now comes adding the rows and cells, which can be automated with a FOR/NEXT loop that controls how many rows the table gets. We need to append a TR to the TABLE element 10 times. After each TR is appended, we also have to add 3 TD subnodes to it, since we want a TABLE with 10 rows and 3 cells in each row. We're going to add to the complexity by placing a FONT tag in the middle cell of each row. It'll alternate colors and contain a short message that reflects the color scheme. Seem daunting? It isn't, since our knowledge of XML DOM and xpath will take care of most of it.

So after setting up the loop, we add an IF statement to set the color for the odd and even rows:

   For x = 1 To 10
      If x mod 2 == 0
         color = "Blue"
      Else
         color = "Red"
      EndIf

Once that's done we need to append the new row first. So we query the document, find the TABLE element, append the TR subnode and leave it empty:

ecode = StrCat(ecode, `-s "//TABLE"                      -t elem -n "TR"     -v ""`, @TAB)

Now to insert cells into the new row, we query the document for the last row tag, insert a subnode cell and place text in it.

ecode = StrCat(ecode, `-s "//TR[last()]"                 -t elem -n "TD"     -v "Row %x%"`, @TAB)

If you're not familiar with xpath, the "//TR[last()]" might throw you. Remember how in the HTML DOM you could tell how many items were in a collection by looking at its .length property? This is similar. Since XML files can be complex, xpath includes functions such as last(), position() and count(). They let you pick out the last item in a collection, an item at a given position, or the number of items (the first item is simply [1]). Both the XML DOM and XMLStarlet accept them in xpath expressions. Because our table follows a simple, repetitive plan, it's easy to duplicate using xpath.

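So "//TR[last()]" tells XMLStarlet to query the document for the last TR tag. Strictly speaking, it selects the last TR within each parent. Since all of our rows live in one TABLE, that's simply the last row. In the XML DOM we would have done it almost exactly the same way: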

TableCell = xmlDoc.createElement("TD")
TableCell.text = "Row %x%"
xmldoc.selectSingleNode("//TR[last()]").appendChild(TableCell)
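The predicate in "//TR[last()]" is applied to each parent's own children. As a result, it picks the last row of every table, not the last row in the whole document. This short Python sketch shows the difference with two tables. It uses xml.etree.ElementTree, whose limited xpath support includes [last()], and the sample markup is made up for the demonstration. In our script there's only one TABLE, so the two readings agree.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<BODY>"
    "<TABLE><TR><TD>a1</TD></TR><TR><TD>a2</TD></TR></TABLE>"
    "<TABLE><TR><TD>b1</TD></TR></TABLE>"
    "</BODY>"
)

# ".//TR[last()]" = the last TR child of EACH parent, not the last TR in the document
last_rows = doc.findall(".//TR[last()]")
print([row.find("TD").text for row in last_rows])    # ['a2', 'b1']

# The single very last row in document order needs ordinary indexing instead
very_last = list(doc.iter("TR"))[-1]
print(very_last.find("TD").text)                      # b1
```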

Next we need to add the middle cell. This is accomplished with the same command, the only difference being we leave the contents of the cell empty for the time being:

ecode = StrCat(ecode, `-s "//TR[last()]"                 -t elem -n "TD"     -v ""`, @TAB)

Now inside that last cell we need to add a subnode FONT element. So this time, we need the last cell in the last row:

ecode = StrCat(ecode, `-s "//TR[last()]/TD[last()]"      -t elem -n "FONT"   -v "This Text is %color%"`, @TAB)

Then we need to insert an attribute into the FONT tag so that it colors its text correctly. So we tell XMLStarlet to query the document for the FONT tag inside the last cell of the last row:

ecode = StrCat(ecode, `-i "//TR[last()]/TD[last()]/FONT" -t attr -n "color"  -v "%color%"`, @TAB)

And it does, inserting the attribute "color" and giving it the value of the color variable.

Lastly, we add the 3rd and last cell to the row and then the loop is repeated:


      ecode = StrCat(ecode, `-s "//TR[last()]"                 -t elem -n "TD"     -v "Cell 3"`, @TAB)
   Next

Notice that for the last cell we didn't specify "TR[last()]/TD[last()]" but simply the last row. We can do this because the -s option finds the last row as directed and then appends the new subnode element after that row's existing children.

Unfortunately, you can't take this shortcut when the new node belongs inside a cell. If you remove the "TD[last()]" from the command that inserts the FONT tag, the FONT becomes a child of the row rather than of the middle cell, and you'll find your middle cells aren't colored. The good thing is you can experiment until it's right. Your xpath knowledge is the key. It took me about 10-15 minutes to write the above code, thanks largely to a good working knowledge of xpath.

Now one last thing: let's examine the command line. We have to add the ecode variable with StrCat() rather than %ecode% substitution; otherwise WinBatch will complain that the substitution length is greater than 2048 characters. Unlike the file names, we don't place double quotes around ecode. Just leave it as-is, since the commands inside it already contain their own double quotes.

The first half of the command line carries out the editing commands, as indicated by the "ed" command:

RunHideWait(comspec, StrCat(`/c xml ed `, ecode, ` "%xfile%" | xml fo -o > "%tfile%"`))

The pipe character "|" sends the output of the editing session to a second instance of XMLStarlet, which formats it ("fo"). The "-o" switch omits the XML declaration, since our HTML doesn't need it. Finally, the formatted output is redirected into the file named by the tfile variable, which is our HTML file.
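If you drive XMLStarlet from a language other than Winbatch, you can skip cmd.exe and the quoting headaches. Pass each argument separately and connect the two processes with a pipe yourself. Here's a Python sketch using subprocess. It assumes the executable is called xml, as in the tutorial; many Linux packages install it as xmlstarlet instead, so adjust exe. The helper names ed_args and run_ed_then_fo are hypothetical.

```python
import subprocess

def ed_args(xfile, edits, exe="xml"):
    """Build argv for: xml ed <edit options...> xfile
    Each edit is a list such as ["-s", "/HTML", "-t", "elem", "-n", "HEAD", "-v", ""]."""
    args = [exe, "ed"]
    for edit in edits:
        args.extend(edit)
    args.append(xfile)
    return args

def run_ed_then_fo(xfile, tfile, edits, exe="xml"):
    """Rough equivalent of: xml ed <edits> "xfile" | xml fo -o > "tfile"
    Returns the exit codes of (ed, fo)."""
    with open(tfile, "wb") as out:
        ed = subprocess.Popen(ed_args(xfile, edits, exe), stdout=subprocess.PIPE)
        fo = subprocess.Popen([exe, "fo", "-o"], stdin=ed.stdout, stdout=out)
        ed.stdout.close()          # let ed see a broken pipe if fo exits early
        fo_code = fo.wait()
        ed_code = ed.wait()
    return ed_code, fo_code
```

Each argument is passed as its own list item, so values containing spaces or quotes need no extra quoting. Also, cmd.exe's own command-line length limit no longer applies, although the operating system still imposes a larger limit of its own.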

It's a bit complex for a first-timer, but once you understand XMLStarlet's syntax and couple it with a good working knowledge of xpath, you can really do a lot with XMLStarlet.

Article ID:   W16783

File Created: 2011:07:15:09:19:36

Last Updated: 2011:07:15:09:19:36
