close

Automation

AutomationDevOpsPowershellSQL

SQL Scripts with Powershell

PS-SQL

Needing to run SQL Scripts can be a day to day task for System Administration and DevOps. If you have dozens of SQL Scripts to run this can quickly become a tedious job. Automating this with a Powershell script is an obvious solution. But running SQL Scripts in Powershell is not as easy as it would seem. There’s 3 options available:

read more
.NETAutomationDevOpsPowershell

Powershell One Liner – Easy Directory Zipping

no thumb

Powershell One Liner – Easy Directory Zipping

Powershell One Liner - Easy Directory ZippingFor me one of most common tasks in Powershell is zipping or extracting files. Whether you’re backing up a few config files before pushing an updated version, or unzipping gigs of log files, dealing with zip files is an ever present task. In the early days of Powershell what I and many others used was the Shell.Application COM Object. For very basic functions, it worked fine.

read more
.NETAutomationDevOpsPowershell

Octopus Deploy 2.0 Review

KRAKEN-sea-monster

Octopus Deploy 2.0 Review

One of the tools that I got involved with this year was Octopus Deploy. Initially, I had my doubts about the product, but I quickly developed a respect for it. A few reasons I like Octopus, it simplifies the whole code deployment process, you can have migrations running in less than an hour. It also has built in Powershell support, so you can add custom functionality to do just about anything else using Powershell. Since it relies on a Nuget Repository to version Releases it makes rolling back code really easy and accurate.

read more
.NETAutomationPowershell

Modern Screen Scraping With HtmlAgilityPack

no thumb

Modern Screen Scraping With HtmlAgilityPack

Modern Screen Scraping With HtmlAgilityPack

In the early days of the Internet, before web services were as common as Starbucks, one of the few ways to pull data from other systems was through screen scraping web pages. I helped with at least one of these apps to scrape stock prices from Yahoo. To say it was clunky and hard to maintain is an understatement. Now days every system exposes an API through a web service, or even data brokers like Biztalk, and we’re able to orchestrate beautiful data flows….

But what happens when we don’t have any clean API to extract data from a remote system. Sometimes the best route is the most direct route, it may not pretty but screen scraping still gets the job done. And as Alec said, “Always Be Closing”. Luckily, there’s a lot better options now to accomplish HTML screen scraping.

Today there’s a great framework available on Codeplex called HTML Agility Pack. You can also find updates on their Twitter Feed. This .NET library offers a simple method for parsing and even modifying HTML files. But lets focus on how we can use it to extract data from a web page. You can also use this library to parse and then update the HTML files.

Lets assume we have a page with the following HTML.

<TABLE ALIGN=center border=1 bgcolor=lightblue width=80%>
<tr><th>ATTRIBUTE</th><th>VALUE</th><th>ATTRIBUTE DESCRIPTION</th></tr>
<tr><td>Name</td><td>HR System</td><td>Name of Application</td></tr>
<tr> <td>Version</td><td>4.2.101</td><td>Code Version</td> </tr>
<tr> <td>HOST</td><td>ATL0WAPP001</a></td><td>Name of machine</td> </tr>
</TABLE>

So lets grab the latest Nuget package, you can search for it  in Visual Studio, or find it here HTMLAgilityPack 1.4.6. Some VB.NET code that will parse the table cell data using HTML Agility Pack would look like this:

Using client As New Net.WebClient

    Dim filename As String = IO.Path.GetTempFileName

    client.Credentials = CredentialCache.DefaultNetworkCredentials
    client.DownloadFile(_URL, filename)
    
    Dim doc = New HtmlAgilityPack.HtmlDocument

    doc.Load(filename)

    Dim root As HtmlAgilityPack.HtmlNode = doc.DocumentNode

    Dim nodes As List(Of HtmlAgilityPack.HtmlNode) = root.Descendants("tr").ToList

    For Each node In nodes
        Dim tdlist As List(Of HtmlAgilityPack.HtmlNode)

        tdlist = node.Descendants("td").ToList

        Console.WriteLine(tdlist(0).InnerText & ": " & tdlist(1).InnerText)
    Next
End Using

Or if you’d rather go the Powershell route, you can write something like this:

$HAPDllPath = "C:\Users\...\packages\HtmlAgilityPack.1.4.6\lib\Net45\HtmlAgilityPack.dll"

$a = [Reflection.Assembly]::LoadFile($HAPDllPath)

$source = "http:/localhost/mytable.html"
$destination = "c:\temp\myfile.htm"

$client = New-Object System.Net.WebClient
$client.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials 
$client.DownloadFile($source, $destination)

$doc = New-Object HtmlAgilityPack.HtmlDocument
$root = New-Object HtmlAgilityPack.HtmlNode

$doc.Load($destination)

$root = $doc.DocumentNode

$rows = $root.Descendants("tr")

foreach ($row in $rows){
    $cells = $row.Descendants("td")
	
	$cell0 = $cells[0].InnerText
	$cell1 = $cells[1].InnerText
		
	Write-Host "$cell0 - $cell1"

}

This is a very robust parsing engine and I’ve only scratched the surface, there’s several ways to interact with HTML content using the HTML Agility Pack. Its definitely a great addition to anyone’s tool box.

read more
1 2 3
Page 2 of 3