Simple Text-Only WebBrowser

This is a continuation of my previous article:
Add Undo/Redo or Back/Forward functionality to your application

In the last article we built a class that helps add Undo/Redo or Back/Forward functionality to an application. We covered what the class can do, but a working example was missing.
In this article we will create a text-only web browser. Such browsers are useful in various situations, but I won’t go into the details here.
The link to the source code and the executable file is given at the end of this article.
First we will build the basics of the web browser, and then we will add the Back/Forward capabilities to it.

Creating the Basic Text-Only WebBrowser Layout

1. Create a new VB.NET Windows project. A form named Form1 will be added by default. Rename it to TextOnlyWebBrowserForm.
2. Add a ToolStrip control, a TextBox and a StatusStrip control to it.
3. Rearrange the controls so that the form looks like the screenshot below. This shouldn’t need much explanation; I’ve just highlighted the important properties to set.

[Screenshot: form layout with the important properties highlighted]

4. To make the buttons look pretty I added images to them. I downloaded the button images for free from www.freedesign4.me (all credits to the creators). You may use different images if you want.

Adding the Basic Browsing Functions to our TextOnly WebBrowser

Now we are ready to write some code and see our browser in action. We will start with the basic browsing functions and later enhance the browser with Back/Forward functionality. To keep things simple, this demo uses the System.Net.WebClient class.

1. Open the form’s code window and add the following code:

Option Strict On

Imports System.Net

Public Class TextOnlyWebBrowserForm

    Dim WithEvents MyWebClient As New WebClient

    Private Sub TextOnlyWebBrowserForm_Load(sender As System.Object, e As System.EventArgs) Handles Me.Load
        EnableOrDisableButtons()
        StatusLabel.Text = "Ready"
    End Sub

    Private Sub GoButton_Click(sender As System.Object, e As System.EventArgs) Handles GoButton.Click
        Dim url As String = UrlTextBox.Text
        If Not Uri.IsWellFormedUriString(url, UriKind.Absolute) AndAlso Not url.Contains("://") Then url = "http://" & url
        If Uri.IsWellFormedUriString(url, UriKind.Absolute) Then
            Navigate(New Uri(url))
        Else
            MessageBox.Show("Invalid Address!", "Error", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
        End If
    End Sub

    Private Sub StopButton_Click(sender As System.Object, e As System.EventArgs) Handles StopButton.Click
        MyWebClient.CancelAsync()
    End Sub

    Private Sub UrlTextBox_KeyPress(sender As Object, e As System.Windows.Forms.KeyPressEventArgs) Handles UrlTextBox.KeyPress
        If e.KeyChar = Chr(13) Then GoButton.PerformClick() 'enter key
    End Sub

    Private Sub UrlTextBox_TextChanged(sender As Object, e As System.EventArgs) Handles UrlTextBox.TextChanged
        GoButton.Enabled = UrlTextBox.Text.Length > 0
    End Sub

    Private Sub MyWebClient_DownloadProgressChanged(sender As Object, e As DownloadProgressChangedEventArgs) Handles MyWebClient.DownloadProgressChanged
        StatusLabel.Text = e.BytesReceived \ 1024 & "KB of data received... "
    End Sub

    Private Sub MyWebClient_DownloadStringCompleted(sender As Object, e As DownloadStringCompletedEventArgs) Handles MyWebClient.DownloadStringCompleted
        If e.Error IsNot Nothing Then
            WebDocumentTextBox.Text = e.Error.Message
        Else
            WebDocumentTextBox.Text = e.Result
            WebDocumentTextBox.SelectionLength = 0
        End If
        WebDocumentTextBox.Select()
        EnableOrDisableButtons()
        StatusLabel.Text = "Ready"
    End Sub

    Private Sub Navigate(uri As Uri)
        If MyWebClient.IsBusy Then MyWebClient.CancelAsync()
        While MyWebClient.IsBusy
            Threading.Thread.Sleep(1000)
        End While
        WebDocumentTextBox.Text = ""
        UrlTextBox.Text = uri.AbsoluteUri
        MyWebClient.DownloadStringAsync(uri)
        StatusLabel.Text = "Looking up " & uri.AbsoluteUri
        EnableOrDisableButtons()
    End Sub

    Private Sub EnableOrDisableButtons()
        GoButton.Enabled = UrlTextBox.Text.Length > 0
        StopButton.Enabled = MyWebClient.IsBusy
    End Sub
End Class

2. Run the code. Type a URL in the UrlTextBox and press the Enter key or click the Go button.

3. If it gets the requested web page, we are successful. The WebDocumentTextBox shows raw HTML at present. This is OK for now. We will update our code later to show plain human-readable text instead of raw HTML.
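
To see the address handling from GoButton_Click in isolation, here is a minimal console sketch of the same check. The module name UrlNormalizationSketch and the function NormalizeUrl are hypothetical names used only for this illustration, and the Trim call is an extra step that the form code does not perform.

```vbnet
Imports System

Module UrlNormalizationSketch
    '' Returns an absolute URI string, or Nothing when the input cannot be turned into one.
    '' NormalizeUrl is a hypothetical helper; the form does the same check inline.
    Function NormalizeUrl(input As String) As String
        Dim url As String = input.Trim() '' assumption: trimming is not done in the form code

        '' No scheme typed by the user: assume plain HTTP, as the GoButton handler does
        If Not Uri.IsWellFormedUriString(url, UriKind.Absolute) AndAlso Not url.Contains("://") Then
            url = "http://" & url
        End If

        If Uri.IsWellFormedUriString(url, UriKind.Absolute) Then Return New Uri(url).AbsoluteUri
        Return Nothing
    End Function

    Sub Main()
        For Each candidate As String In {"example.com", "https://example.com/a", "not a url"}
            Dim result As String = NormalizeUrl(candidate)
            Console.WriteLine("{0} -> {1}", candidate, If(result, "(invalid)"))
        Next
    End Sub
End Module
```

An address without a scheme gets "http://" prepended, an address that already has a scheme is left alone, and anything that still is not a well-formed absolute URI is rejected, which is the case where the form shows the "Invalid Address!" message.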

Converting Raw HTML to Plain Text

We will now use regular expressions to strip the HTML tags and show the web page as plain text. Regular expressions are convenient here because a single pattern replaces what would otherwise be a lot of hand-written string search/replace logic.

1. Add a new class to your project and name it HtmlToPlainTextConverter.

2. Add the following code:

Option Explicit On

Imports System.Text.RegularExpressions

Public Class HtmlToPlainTextConverter
    Public Shared Function Convert(ByVal html As String) As String
        '' first remove the unwanted white-spaces
        html = Regex.Replace(html, "\s+", " ", RegexOptions.IgnoreCase)
        html = Regex.Replace(html, ">\s+<", "><", RegexOptions.IgnoreCase)

        '' remove anything inside the <head> tags
        html = Regex.Replace(html, "<head\b[^>]*>(.*?)</head>", "", RegexOptions.IgnoreCase)

        '' remove all <style> tags
        html = Regex.Replace(html, "<style\b[^>]*>(.*?)</style>", "", RegexOptions.IgnoreCase)

        '' remove all <script> tags
        html = Regex.Replace(html, "<script\b[^>]*>(.*?)</script>", "", RegexOptions.IgnoreCase)

        '' double line breaks - <p>, <div>
        html = Regex.Replace(html, "<div\b[^>]*>(.*?)</div>", "$1" & vbCr & vbCr, RegexOptions.IgnoreCase)
        html = Regex.Replace(html, "<p\b[^>]*>(.*?)</p>", "$1" & vbCr & vbCr, RegexOptions.IgnoreCase)

        '' <br> tags
        html = Regex.Replace(html, "<br\b[^>]*>", vbCr, RegexOptions.IgnoreCase)

        '' table formatting
        html = Regex.Replace(html, "<table\b[^>]*>(.*?)</table>", "$1" & vbCr, RegexOptions.IgnoreCase)
        html = Regex.Replace(html, "<tr\b[^>]*>(.*?)</tr>", "$1" & vbCr & vbCr, RegexOptions.IgnoreCase)
        html = Regex.Replace(html, "<td\b[^>]*>(.*?)</td>", "$1" & vbTab, RegexOptions.IgnoreCase)

        '' <ul> and <li>
        html = Regex.Replace(html, "<ul\b[^>]*>(.*?)</ul>", "$1" & vbCr, RegexOptions.IgnoreCase)
        html = Regex.Replace(html, "<li\b[^>]*>(.*?)</li>", " * $1" & vbCr, RegexOptions.IgnoreCase)

        '' any other tag
        html = Regex.Replace(html, "<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", " $2 ", RegexOptions.IgnoreCase)

        '' finally anything that looks like a tag <...>
        html = Regex.Replace(html, "<.+?>", " ", RegexOptions.IgnoreCase)

        '' reduce too much of unwanted line breaks
        html = Regex.Replace(html, "\r\r+", vbCr & vbCr, RegexOptions.IgnoreCase)

        '' replace frequently used html special characters
        Dim htmlChars() As String = Split("&Aacute; &aacute; &Acirc; &acirc; &acute; &AElig; &aelig; &Agrave; &agrave; &amp; &Aring; &aring; &Atilde; &atilde; &Auml; &auml; &brvbar; &Ccedil; &ccedil; &cedil; &cent; &copy; &curren; &deg; &divide; &Eacute; &eacute; &Ecirc; &ecirc; &Egrave; &egrave; &ETH; &eth; &Euml; &euml; &frac12; &frac14; &frac34; &gt; &Iacute; &iacute; &Icirc; &icirc; &iexcl; &Igrave; &igrave; &iquest; &Iuml; &iuml; &laquo; &lt; &macr; &micro; &middot; &nbsp; &not; &Ntilde; &ntilde; &Oacute; &oacute; &Ocirc; &ocirc; &Ograve; &ograve; &ordf; &ordm; &Oslash; &oslash; &Otilde; &otilde; &Ouml; &ouml; &para; &plusmn; &pound; &quot; &raquo; &reg; &sect; &shy; &sup1; &sup2; &sup3; &szlig; &THORN; &thorn; &times; &Uacute; &uacute; &Ucirc; &ucirc; &Ugrave; &ugrave; &uml; &Uuml; &uuml; &Yacute; &yacute; &yen; &yuml;")
        Dim plainChars() As String = Split("Á á Â â ´ Æ æ À à & Å å Ã ã Ä ä ¦ Ç ç ¸ ¢ © ¤ ° ÷ É é Ê ê È è Ð ð Ë ë ½ ¼ ¾ > Í í Î î ¡ Ì ì ¿ Ï ï « < ¯ µ •  ¬ Ñ ñ Ó ó Ô ô Ò ò ª º Ø ø Õ õ Ö ö ¶ ± £ "" » ® § ¬ ¹ ² ³ ß Þ þ × Ú ú Û û Ù ù ¨ Ü ü Ý ý ¥ ÿ")
        For i = 0 To htmlChars.Length - 1
            html = html.Replace(htmlChars(i), plainChars(i))
        Next i

        '' VbCr with vbCrlf
        html = html.Replace(vbCr, vbCrLf)
        Return html
    End Function
End Class

This class is neither complete nor perfect, but it is sufficient for this demo. I won’t go into much detail about how it converts HTML into plain text, since that is out of the scope of this article. Feel free to ask anything you wish to know in the comments and I will be glad to answer.
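
One place where the class could be shortened is the special-character table, which only covers a fixed list of named entities. As a sketch, assuming the project targets .NET Framework 4.0 or later, the built-in System.Net.WebUtility.HtmlDecode method decodes named and numeric entities in one call. The module name EntityDecodingSketch and the sample string are made up for this illustration, and this is an alternative to the table, not what the class above does.

```vbnet
Imports System
Imports System.Net

Module EntityDecodingSketch
    Sub Main()
        '' Sample text with two named entities and one numeric entity
        Dim encoded As String = "Caf&eacute; &amp; bar &#169; 2012"

        '' WebUtility.HtmlDecode handles named and numeric entities in a single pass,
        '' so "&amp;lt;" becomes "&lt;" and is not decoded a second time
        Dim decoded As String = WebUtility.HtmlDecode(encoded)

        Console.WriteLine(decoded) '' decodes to: Café & bar © 2012
    End Sub
End Module
```

If you try this in the converter, the call would take the place of the For loop over htmlChars, after all the tags have been stripped.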

3. Next, we modify our code to use this class instead of outputting raw HTML. It’s just a one-line change in our MyWebClient_DownloadStringCompleted event handler.
 
    Private Sub MyWebClient_DownloadStringCompleted(sender As Object, e As DownloadStringCompletedEventArgs) Handles MyWebClient.DownloadStringCompleted
        If e.Error IsNot Nothing Then
            WebDocumentTextBox.Text = e.Error.Message
        Else
            WebDocumentTextBox.Text = HtmlToPlainTextConverter.Convert(e.Result)
        End If
        WebDocumentTextBox.Select()
        EnableOrDisableButtons()
        StatusLabel.Text = "Ready"
    End Sub

Run the code. See what happens. It should now show plain text instead of raw html.

Adding Back/Forward Functionality to Our WebBrowser


Our basic WebBrowser is now complete. It can navigate to any URL we give it. Now let’s see what it takes to add Back/Forward/Refresh functionality to it.

1. We begin by adding our UndoRedoClass to the application. Copy the code from the previous part of the article.

2. Then we add a class to our application that stores the URLs we have navigated. These URLs will be used to go back and forth when the user clicks the Back or Forward button. We create a class called UrlItem that has only one property named Uri.
 
Class UrlItem
    Private _Uri As Uri
    Public Property Uri() As Uri
        Get
            Return _Uri
        End Get
        Set(ByVal value As Uri)
            _Uri = value
        End Set
    End Property

    Public Sub New(url As String)
        Me.Uri = New Uri(url)
    End Sub
End Class

3. Next, we modify our form code to add support for our UndoRedoClass object. Add the following code:

    '' This declaration goes in the form declaration section
    Dim WithEvents UndoRedoHandler As New UndoRedoClass(Of UrlItem)

    Private Sub BackButton_Click(sender As System.Object, e As System.EventArgs) Handles BackButton.Click
        UndoRedoHandler.Undo()
    End Sub

    Private Sub ForwardButton_Click(sender As System.Object, e As System.EventArgs) Handles ForwardButton.Click
        UndoRedoHandler.Redo()
    End Sub

    Private Sub RefreshButton_Click(sender As System.Object, e As System.EventArgs) Handles RefreshButton.Click
        Navigate(UndoRedoHandler.CurrentItem.Uri)
    End Sub

    Private Sub UndoRedoHandler_RedoHappened(sender As Object, e As UndoRedoEventArgs) Handles UndoRedoHandler.RedoHappened
        Dim item As UrlItem = CType(e.CurrentItem, UrlItem)
        Navigate(item.Uri)
    End Sub

    Private Sub UndoRedoHandler_UndoHappened(sender As Object, e As UndoRedoEventArgs) Handles UndoRedoHandler.UndoHappened
        Dim item As UrlItem = CType(e.CurrentItem, UrlItem)
        Navigate(item.Uri)
    End Sub

We also modify the following existing procedures to fit our UndoRedoHandler object:

    Private Sub GoButton_Click(sender As System.Object, e As System.EventArgs) Handles GoButton.Click
        Dim url As String = UrlTextBox.Text
        If Not Uri.IsWellFormedUriString(url, UriKind.Absolute) AndAlso Not url.Contains("://") Then url = "http://" & url
        If Uri.IsWellFormedUriString(url, UriKind.Absolute) Then
            UndoRedoHandler.AddItem(New UrlItem(url))
            Navigate(UndoRedoHandler.CurrentItem.Uri)
        Else
            MessageBox.Show("Invalid Address!", "Error", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
        End If
    End Sub

    Private Sub EnableOrDisableButtons()
        BackButton.Enabled = UndoRedoHandler.CanUndo
        ForwardButton.Enabled = UndoRedoHandler.CanRedo
        RefreshButton.Enabled = UndoRedoHandler.CurrentItem IsNot Nothing
        GoButton.Enabled = UrlTextBox.Text.Length > 0
        StopButton.Enabled = MyWebClient.IsBusy
    End Sub

Note: We named our object UndoRedoHandler, which might suggest that it performs some undo/redo operation. I kept this name so that it stays consistent with the class and its members and avoids confusion. In reality it performs a back/forward function: read “undo” as “back” and “redo” as “forward”.
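
If you don’t have the previous article at hand, the back/forward idea can be pictured as a list of visited items plus an index pointing at the current one. The class below is a hypothetical stand-in written only for illustration; it is not the UndoRedoClass, and the names SimpleHistory, Add, GoBack and GoForward are assumptions. It also assumes the usual browser rule that visiting a new page discards the "forward" entries.

```vbnet
Imports System
Imports System.Collections.Generic

'' Hypothetical stand-in for illustration only; not the UndoRedoClass from the previous article
Public Class SimpleHistory(Of T)
    Private ReadOnly _items As New List(Of T)
    Private _index As Integer = -1 '' -1 means nothing has been visited yet

    Public ReadOnly Property CanGoBack() As Boolean
        Get
            Return _index > 0
        End Get
    End Property

    Public ReadOnly Property CanGoForward() As Boolean
        Get
            Return _index < _items.Count - 1
        End Get
    End Property

    Public ReadOnly Property Current() As T
        Get
            If _index < 0 Then Return Nothing
            Return _items(_index)
        End Get
    End Property

    '' Assumption: visiting a new item discards everything "forward" of the current position
    Public Sub Add(item As T)
        If _index < _items.Count - 1 Then
            _items.RemoveRange(_index + 1, _items.Count - _index - 1)
        End If
        _items.Add(item)
        _index = _items.Count - 1
    End Sub

    Public Sub GoBack()
        If CanGoBack Then _index -= 1
    End Sub

    Public Sub GoForward()
        If CanGoForward Then _index += 1
    End Sub
End Class
```

In this picture CanGoBack and CanGoForward play the role that CanUndo and CanRedo play in EnableOrDisableButtons, and Current corresponds to CurrentItem.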

By now you might be wondering why we took the extra trouble of creating a new class (UrlItem) when we could have simply declared our UndoRedoHandler as New UndoRedoClass(Of Uri) or New UndoRedoClass(Of String). We could have done that, and it would have worked fine too. The reason for the extra class will become clear as we progress. For now, just run the code and check that the Back/Forward and Refresh buttons work as expected.

Adding More Features with the Help of Our UndoRedoClass

We have now achieved our goal of creating a text-only WebBrowser. It navigates to the URL we type in the address box, and it goes back and forward when we click the Back/Forward buttons. So far everything looks good! Now let’s think about optimizing it.

Notice that every time we move Back/Forward, a fresh web page request is sent, even though we browsed that page only a few minutes ago. We should not need to fetch the page from the internet again each time we move Back. So let’s see what it takes to add document caching to our application.

1. Add a property to the UrlItem class named CachedDocument. In this property we will store our cached document.

So after the change our class will look like this:

Class UrlItem
    Private _Uri As Uri
    Public Property Uri() As Uri
        Get
            Return _Uri
        End Get
        Set(ByVal value As Uri)
            _Uri = value
        End Set
    End Property

    Private _CachedDocument As String
    Public Property CachedDocument() As String
        Get
            Return _CachedDocument
        End Get
        Set(ByVal value As String)
            _CachedDocument = value
        End Set
    End Property

    Public Sub New(url As String)
        Me.Uri = New Uri(url)
    End Sub
End Class

2. We modify our MyWebClient_DownloadStringCompleted event handler to fill this cache whenever the document is available. We avoid saving the document when navigation is cancelled, since the document might be incomplete.

    Private Sub MyWebClient_DownloadStringCompleted(sender As Object, e As DownloadStringCompletedEventArgs) Handles MyWebClient.DownloadStringCompleted
        If e.Cancelled Then
            '' Reading e.Result throws when the download was cancelled, so check this first.
            '' Nothing is cached here because the document may be incomplete.
            WebDocumentTextBox.Text = "Navigation cancelled."
        ElseIf e.Error IsNot Nothing Then
            WebDocumentTextBox.Text = e.Error.Message
        Else
            WebDocumentTextBox.Text = HtmlToPlainTextConverter.Convert(e.Result)
            UndoRedoHandler.CurrentItem.CachedDocument = WebDocumentTextBox.Text
        End If
        WebDocumentTextBox.Select()
        EnableOrDisableButtons()
        StatusLabel.Text = "Ready"
    End Sub

3. Next, we change our Navigate method to use this cached document. We add an optional parameter to it so that we can force loading a fresh document when we want to.

    Private Sub Navigate(uri As Uri, Optional allowCache As Boolean = False)
        If MyWebClient.IsBusy Then MyWebClient.CancelAsync()
        While MyWebClient.IsBusy
            Threading.Thread.Sleep(1000)
        End While
        WebDocumentTextBox.Text = ""
        UrlTextBox.Text = uri.AbsoluteUri

        If allowCache AndAlso UndoRedoHandler.CurrentItem.CachedDocument <> "" Then
            WebDocumentTextBox.Text = UndoRedoHandler.CurrentItem.CachedDocument
            StatusLabel.Text = "Ready"
        Else
            MyWebClient.DownloadStringAsync(uri)
            StatusLabel.Text = "Looking up " & uri.AbsoluteUri
        End If
        EnableOrDisableButtons()
    End Sub

4. Finally we start using this new Navigate method to support caching wherever possible, i.e. on the Back and Forward buttons. All we need to do is pass True for the optional parameter. If we pass False, or don’t pass anything, it behaves exactly as it did before.

    Private Sub UndoRedoHandler_RedoHappened(sender As Object, e As UndoRedoEventArgs) Handles UndoRedoHandler.RedoHappened
        Dim item As UrlItem = CType(e.CurrentItem, UrlItem)
        Navigate(item.Uri, True)
    End Sub

    Private Sub UndoRedoHandler_UndoHappened(sender As Object, e As UndoRedoEventArgs) Handles UndoRedoHandler.UndoHappened
        Dim item As UrlItem = CType(e.CurrentItem, UrlItem)
        Navigate(item.Uri, True)
    End Sub

Actually we don’t need two separate procedures, since both have the same signature and body. We could combine them into one procedure and attach both events to it. Do that if you want to.
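
Combining the two handlers relies on the fact that a VB.NET Handles clause can list more than one event. The console sketch below shows the mechanism on its own; the Navigator class and its WentBack/WentForward events are hypothetical stand-ins for the UndoRedoClass and its UndoHappened/RedoHappened events.

```vbnet
Imports System

Module CombinedHandlerSketch
    '' Hypothetical event source standing in for the UndoRedoClass
    Class Navigator
        Public Event WentBack(sender As Object, e As EventArgs)
        Public Event WentForward(sender As Object, e As EventArgs)

        Public Sub Back()
            RaiseEvent WentBack(Me, EventArgs.Empty)
        End Sub

        Public Sub Forward()
            RaiseEvent WentForward(Me, EventArgs.Empty)
        End Sub
    End Class

    Dim WithEvents Nav As New Navigator

    '' One procedure, two events: the Handles clause takes a comma-separated list
    Private Sub Nav_Moved(sender As Object, e As EventArgs) Handles Nav.WentBack, Nav.WentForward
        Console.WriteLine("Navigating to the current item")
    End Sub

    Sub Main()
        Nav.Back()    '' runs Nav_Moved
        Nav.Forward() '' runs Nav_Moved again
    End Sub
End Module
```

In the form, the same idea would mean keeping one of the two procedures and ending its declaration with Handles UndoRedoHandler.UndoHappened, UndoRedoHandler.RedoHappened.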

5. Our application is now ready with cache support. Run the application and see how it goes.

Click the Back/Forward/Refresh buttons to try them out. The Back/Forward buttons should load the document immediately, while the Refresh button gets a fresh copy of the document from the website. So far everything looks excellent! But think it through: there’s a big pitfall in this approach. Documents can be large, and we have no control over their size. If we keep many of these documents in memory, we will consume a lot of memory unnecessarily and start seeing performance problems when the application is used for long hours.

So what do we do now?

We will use the same technique other web browsers use: save the document to a disk file and load it whenever we need it. This way the documents do not stay in memory forever. So we modify the CachedDocument property to store the file name instead of the file contents, and we write the contents to disk.

    Private _CachedDocument As String
    Public Property CachedDocument() As String
        Get
            If IO.File.Exists(_CachedDocument) Then
                Return IO.File.ReadAllText(_CachedDocument)
            Else
                Return ""
            End If
        End Get
        Set(ByVal value As String)
            '' HH (24-hour clock) keeps file names distinct between a.m. and p.m.
            Dim fileName As String = String.Format("~{0:yyMMddHHmmssffff}.tmp", Now)
            fileName = IO.Path.Combine(Application.UserAppDataPath, "Cache", fileName)
            IO.File.WriteAllText(fileName, value)
            _CachedDocument = fileName
        End Set
    End Property

That’s all we need to do. The application now uses a disk file instead of memory to store each document.
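
The write-then-read round trip behind the property can be tried outside the form with a small console sketch. It makes two assumptions that differ from the code above: a folder under the system temp directory stands in for Application.UserAppDataPath, and the file name is built from a GUID rather than a timestamp. The module and folder names are made up for this illustration.

```vbnet
Imports System
Imports System.IO

Module DiskCacheSketch
    Sub Main()
        '' Assumption: a folder under the temp directory replaces Application.UserAppDataPath\Cache
        Dim cacheDir As String = Path.Combine(Path.GetTempPath(), "TextOnlyBrowserCacheDemo")
        Directory.CreateDirectory(cacheDir) '' does nothing if the folder already exists

        '' Assumption: a GUID-based name, so two documents cached at the same instant cannot collide
        Dim fileName As String = Path.Combine(cacheDir, "~" & Guid.NewGuid().ToString("N") & ".tmp")

        '' This is what the property setter does: write the text and remember only the path
        File.WriteAllText(fileName, "plain text of the page")

        '' This is what the property getter does on Back/Forward: read the text back from disk
        If File.Exists(fileName) Then Console.WriteLine(File.ReadAllText(fileName))

        File.Delete(fileName) '' tidy up after the demo
    End Sub
End Module
```

Only the file path stays in memory between the write and the read, which is the point of moving the cache to disk.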

This is a very simple cache technique where we never reuse old files. As such, old cached files are a waste for us. So we clear our cache directory on our application startup, to avoid unnecessary files in the cache.

    Private Sub TextOnlyWebBrowserForm_Load(sender As System.Object, e As System.EventArgs) Handles Me.Load
        EnableOrDisableButtons()
        StatusLabel.Text = "Ready"

        '' Clean old cached files
        Dim cacheDir As String = IO.Path.Combine(Application.UserAppDataPath, "Cache")
        If Not IO.Directory.Exists(cacheDir) Then IO.Directory.CreateDirectory(cacheDir)
        Dim cutOffDate As Date = Now.AddDays(-1)
        Try
            For Each file In IO.Directory.GetFiles(cacheDir, "~*.tmp")
                If IO.File.GetLastAccessTime(file) < cutOffDate Then IO.File.Delete(file)
            Next
        Catch ex As Exception
            '' Best effort: a file that cannot be deleted now is left for the next cleanup
        End Try
    End Sub

If you look at the code above, you will find that I’m not deleting all the files from the cache. This is because other instances of our application might be using some of those files. A gap of one day seems like a safe period. Even if we delete a file that another instance is still using, the worst that happens is that the instance has to fetch a fresh page from the website when it doesn’t find the file in the cache. So it is harmless anyway.


We now have a WebBrowser that has Back/Forward/Refresh functions and supports cache. And that brings us to the end of this tutorial. I hope you enjoyed reading it as much as I enjoyed bringing it to you.

Enjoy!

Source Code and Executable File

Link to source code and executable file is given below.

 
