
C#/ASP.NET 2.0 - Convert HTML Table to DataGrid/Source

Author
20 Feb 2006 8:38 PM
David R. Longnecker
We have a third party web-based application that we need to make searchable
by other applications; unfortunately, the data is stored within the HTML of
the pages rather than a database.  The manufacturer, and other developers,
suggested using the WebControl to "screen scrape" the data off the page.
I've worked with another developer to accomplish this; however, they're a
PHP shop, not a C#.NET shop.  The handy "explode" into array function just
isn't something that exists in C#.
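For reference, the closest C# counterpart to PHP's explode() is String.Split, which takes one or more delimiter strings and returns a string array. A minimal sketch follows. ExplodeDemo and Explode are hypothetical names used only for illustration, and the string[] overload with StringSplitOptions requires .NET 2.0 or later.

```csharp
using System;

public static class ExplodeDemo
{
    // Rough equivalent of PHP's explode($delimiter, $text):
    // splits text on every occurrence of the delimiter string.
    public static string[] Explode(string delimiter, string text)
    {
        // StringSplitOptions.None keeps empty fields, as explode() does.
        return text.Split(new string[] { delimiter }, StringSplitOptions.None);
    }
}
```

For example, ExplodeDemo.Explode("|", "a|b|c") yields { "a", "b", "c" }, and "a,,b" split on "," yields three elements, the middle one empty. Pass StringSplitOptions.RemoveEmptyEntries instead if empty fields should be dropped.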

So, I can render the data to the screen; however, I need to use it as a
data source for the web application. Rather than keep it in its HTML
format, is it possible to convert this HTML information into a data source
that can then be queried against a SQL instance?

I've attempted to load it directly into a DataGrid, but cannot find the
correct (if it exists) syntax to do so.  Ideas?

Data example:

<table width="100%" border="0">
  <tr>
   <td>Header Field 1</td>
   <td>Header Field 2</td>
   <td>Header Field 3</td>
   <td>Header Field 4</td>
   <td>Header Field 5</td>
   <td>Header Field 6</td>
   <td>Header Field 7</td>
   <td>Header Field 8</td>
   <td>Header Field 9</td>
   <td>Header Field 9</td>
  </tr>
<tr>
<td>Data Field 1</td>
<td>Data Field 2</td>
<td>Data Field 3</td>
<td>Data Field 4</td>
<td>Data Field 5</td>
<td>Data Field 6</td>
<td>Data Field 7</td>
<td>Data Field 8</td>
<td>Data Field 9</td>
<td>Data Field 9</td>
</tr>

....etc...

</table>
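A rough sketch of the conversion being asked about, assuming the markup is well-formed XML, the <tr> elements are direct children of <table> (no <tbody>), and the first row holds the column names. HtmlTableConverter and ToDataTable are hypothetical names. Note that the sample above repeats "Header Field 9", while a DataTable requires unique column names, so the sketch appends a numeric suffix to repeated headers.

```csharp
using System;
using System.Data;
using System.Xml;

public static class HtmlTableConverter
{
    // Converts a well-formed (X)HTML <table> string into a DataTable.
    // Assumes no XML namespace on the markup, <tr> directly under <table>,
    // and that the first <tr> contains the column headers (<td> or <th>).
    public static DataTable ToDataTable(string tableXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(tableXml);

        DataTable table = new DataTable();
        XmlNodeList rows = doc.DocumentElement.SelectNodes("tr");
        if (rows.Count == 0)
            return table;

        // Header row: DataTable column names must be unique, so a repeated
        // header becomes e.g. "Header Field 9 (2)".
        foreach (XmlNode cell in rows[0].SelectNodes("td|th"))
        {
            string name = cell.InnerText.Trim();
            string unique = name;
            for (int n = 2; table.Columns.Contains(unique); n++)
                unique = name + " (" + n + ")";
            table.Columns.Add(unique, typeof(string));
        }

        // Data rows: copy cell text by position, ignoring surplus cells.
        for (int i = 1; i < rows.Count; i++)
        {
            XmlNodeList cells = rows[i].SelectNodes("td|th");
            DataRow row = table.NewRow();
            for (int c = 0; c < cells.Count && c < table.Columns.Count; c++)
                row[c] = cells[c].InnerText.Trim();
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The resulting DataTable can then be bound like any other data source, e.g. DataGrid1.DataSource = table; DataGrid1.DataBind(); where DataGrid1 is a hypothetical DataGrid on the page.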

Any ideas or suggestions would be appreciated!

-David

--

David R. Longnecker
CCNA, MCSA, Network+, A+
Management Information Services
Wichita Public Schools, USD 259

Author
21 Feb 2006 5:42 AM
Steven Cheng[MSFT]
Hi David,

Welcome to the MSDN newsgroup.

ASP.NET web controls don't work directly with raw HTML content; they are
populated either by creating control instances in code or through data
binding. Since what you have is raw HTML, I'm afraid it can't be consumed
directly by ASP.NET web controls. Is the HTML content fully
XHTML-compliant? If so, we could consider storing it as XML content in the
database. Also, in .NET we can load it through the XML API (under the
System.Xml namespace) and manipulate it there. This is one possible
approach.
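One quick way to answer the "is it XHTML-compatible?" question is simply to try loading the markup as XML: XmlDocument.LoadXml throws an XmlException, whose message includes the line and position, at the first well-formedness error (for example an unclosed <br>, an unquoted attribute, or mismatched tags). A minimal sketch; XhtmlCheck and IsWellFormed are hypothetical names, and this checks well-formedness only, not validity against the XHTML DTD.

```csharp
using System;
using System.Xml;

public static class XhtmlCheck
{
    // Returns true if the markup parses as well-formed XML; otherwise
    // returns false and hands back the parser's error message.
    public static bool IsWellFormed(string markup, out string error)
    {
        XmlDocument doc = new XmlDocument();
        doc.XmlResolver = null; // do not try to download any referenced DTD
        try
        {
            doc.LoadXml(markup);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```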

Regards,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)
Author
23 Feb 2006 12:49 PM
David R. Longnecker
Steven-

The format, I believe, is valid XHTML (or can be made so); could you
provide any sources or examples of parsing XHTML using the XML components?

Thanks!

-David

Author
24 Feb 2006 9:48 AM
Steven Cheng[MSFT]
Hi David,

If the content is XHTML, we can use classes in the .NET System.Xml
namespace, such as XmlDocument, to load the string data and parse it
through the XML DOM API. For example, suppose we have the following HTML
content (rendered by an ASP.NET page that contains a GridView):

==========htmlcontent.xml======

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
    <head>
        <title>
            Untitled Page
        </title>
    </head>
    <body>
        <form name="form1" method="post" action="XMLTestPate.aspx" id="form1">
            <div>
                <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
                <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
                <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"
value="/wEPDwULLTEzMzQ0MTcyNzAPZBYCAgMPZBYCAhkPPCsADQEADxYGHgtfIURhdGFCb3VuZ
GceCVBhZ2VDb3VudAIBHgtfIUl0ZW1Db3VudAIIZBYCZg9kFhICAQ9kFgQCAQ8PFgIeBFRleHQFA
TFkZAICDw8WAh8DBQlCZXZlcmFnZXNkZAICD2QWBAIBDw8WAh8DBQEyZGQCAg8PFgIfAwUKQ29uZ
GltZW50c2RkAgMPZBYEAgEPDxYCHwMFATNkZAICDw8WAh8DBQtDb25mZWN0aW9uc2RkAgQPZBYEA
gEPDxYCHwMFATRkZAICDw8WAh8DBQ5EYWlyeSBQcm9kdWN0c2RkAgUPZBYEAgEPDxYCHwMFATVkZ
AICDw8WAh8DBQ5HcmFpbnMvQ2VyZWFsc2RkAgYPZBYEAgEPDxYCHwMFATZkZAICDw8WAh8DBQxNZ
WF0L1BvdWx0cnlkZAIHD2QWBAIBDw8WAh8DBQE3ZGQCAg8PFgIfAwUHUHJvZHVjZWRkAggPZBYEA
gEPDxYCHwMFAThkZAICDw8WAh8DBQdTZWFmb29kZGQCCQ8PFgIeB1Zpc2libGVoZGQYAQUJR3JpZ
FZpZXcxD2dknkrQQmgpUotTngcD2Aq25tCstEw=" />
            </div>

            <script type="text/javascript">
                <!--
var theForm = document.forms['form1'];
if (!theForm) {
    theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
// -->
            </script>


            <div>
                <span id="Label1">Label</span>
                <input name="TextBox1" type="text" id="TextBox1" />
                <input type="submit" name="Button1" value="Button" id="Button1" />
                <a id="LinkButton1" href="javascript:__doPostBack('LinkButton1','')">LinkButton</a>

                <table width="100%">
                    <tr>
                        <td>
                            <input name="TextBox2" type="text" id="TextBox2" />
                            <input type="submit" name="Button2" value="Button" id="Button2" />
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <input name="TextBox3" type="text" id="TextBox3" />
                            <input type="submit" name="Button3" value="Button" id="Button3" />
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <a id="LinkButton2" href="javascript:__doPostBack('LinkButton2','')">LinkButton</a>
                            <span id="Label2">Label</span>
                            <span id="Label3">Label</span>
                        </td>
                    </tr>
                </table>
            </div>

            <div>
                <table cellspacing="0" rules="all" border="1" id="GridView1" style="border-collapse:collapse;">
                    <tr>
                        <th scope="col">&nbsp;</th>
                        <th scope="col">CategoryID</th>
                        <th scope="col">CategoryName</th>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$0')">Select</a>
                        </td>
                        <td>1</td>
                        <td>Beverages</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$1')">Select</a>
                        </td>
                        <td>2</td>
                        <td>Condiments</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$2')">Select</a>
                        </td>
                        <td>3</td>
                        <td>Confections</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$3')">Select</a>
                        </td>
                        <td>4</td>
                        <td>Dairy Products</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$4')">Select</a>
                        </td>
                        <td>5</td>
                        <td>Grains/Cereals</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$5')">Select</a>
                        </td>
                        <td>6</td>
                        <td>Meat/Poultry</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$6')">Select</a>
                        </td>
                        <td>7</td>
                        <td>Produce</td>
                    </tr>
                    <tr>
                        <td>
                            <a href="javascript:__doPostBack('GridView1','Select$7')">Select</a>
                        </td>
                        <td>8</td>
                        <td>Seafood</td>
                    </tr>
                </table>
            </div>
            <br />


            <div>

                <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION"
value="/wEWEQLzm5qyAwLs0bLrBgKM54rGBgLM9PumDwLs0fbZDAK7q7GGCALs0Yq1BQLWlM+bA
gKxi96RBQKGkch3AoaR3JoIAoaR4K0CAoaR9NALAoaRuOMNAoaRjIYFAoaR0JgPAoaRpLwERozjF
BdHC34FE0yAFB8appHcSgc=" />
            </div>
        </form>
    </body>
</html>
=================================
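One caveat with loading this file as-is: &nbsp; (in the GridView header row) is not one of XML's five predefined entities, so it only resolves if the parser actually reads the XHTML DTD named in the DOCTYPE. That means a network request to w3.org and depends on the runtime's resolver settings; otherwise the load may fail with an undeclared-entity error. A hedged workaround, sketched below, is to swap the entity for its numeric form &#160; and disable DTD downloads before loading. XhtmlLoader.Load is a hypothetical helper, and the string replace only handles &nbsp;; other named HTML entities would need the same treatment.

```csharp
using System;
using System.Xml;

public static class XhtmlLoader
{
    // Loads XHTML markup without fetching the external DTD.
    // &nbsp; is rewritten as the numeric reference &#160; (U+00A0),
    // since XML itself predefines only &lt; &gt; &amp; &quot; &apos;.
    public static XmlDocument Load(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        doc.XmlResolver = null; // skip downloading the DTD from w3.org
        doc.LoadXml(xhtml.Replace("&nbsp;", "&#160;"));
        return doc;
    }
}
```

In a page, the file contents could be read with System.IO.File.ReadAllText(Server.MapPath("~/htmlcontent.xml")) and passed to Load.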

We can use the following code to load the file into an XmlDocument and
query nodes from the document with XPath. Here I select the <table>
element rendered by GridView1 and print out the XML content of each of its
table rows (<tr>).

==========================
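// Requires "using System.Xml;" in the page's code-behind file.
protected void Button1_Click(object sender, EventArgs e)
{
    // Load the saved XHTML file into an XML DOM.
    XmlDocument doc = new XmlDocument();
    doc.Load(Server.MapPath("~/htmlcontent.xml"));

    // Root element name, as a quick sanity check ("html").
    Response.Write("<br/>" + doc.DocumentElement.Name);

    // XHTML elements live in the XHTML namespace, so XPath queries
    // need a prefix mapped to that namespace URI.
    XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
    manager.AddNamespace("x", "http://www.w3.org/1999/xhtml");

    // Locate the table rendered by the GridView.
    XmlNode node = doc.SelectSingleNode("//x:table[@id='GridView1']", manager);

    if (node != null)
    {
        // "x:tr" is relative to the table node; "//x:tr" would search the
        // whole document and also return the rows of the layout table.
        XmlNodeList nodes = node.SelectNodes("x:tr", manager);

        foreach (XmlNode tr in nodes)
        {
            // HtmlEncode so the row markup is displayed rather than rendered.
            Response.Write("<br/>" + Server.HtmlEncode(tr.OuterXml));
        }
    }
}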
===================================
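Since the original goal was a data source rather than printed markup, the same XPath approach can fill a DataTable from the GridView1 rows. A sketch under these assumptions: the document is already loaded into an XmlDocument (for example as in the code above), the header row uses <th> cells and the data rows use <td>, and the leading "Select" link column, whose header is blank, is skipped. GridViewScraper.Extract is a hypothetical helper.

```csharp
using System;
using System.Data;
using System.Xml;

public static class GridViewScraper
{
    const string XhtmlNs = "http://www.w3.org/1999/xhtml";

    // Copies the rows of <table id="tableId"> from an XHTML document into a
    // DataTable. Columns whose <th> header is blank are skipped.
    public static DataTable Extract(XmlDocument doc, string tableId)
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", XhtmlNs);

        DataTable result = new DataTable(tableId);
        XmlNode table = doc.SelectSingleNode("//x:table[@id='" + tableId + "']", ns);
        if (table == null)
            return result;

        // Only rows that are direct children of this table.
        XmlNodeList rows = table.SelectNodes("x:tr", ns);
        if (rows.Count == 0)
            return result;

        // Header row: map each non-blank <th> position to a column.
        XmlNodeList headers = rows[0].SelectNodes("x:th", ns);
        int[] positions = new int[headers.Count];
        int used = 0;
        for (int i = 0; i < headers.Count; i++)
        {
            // Trim() also strips the U+00A0 that &nbsp; expands to.
            string name = headers[i].InnerText.Trim();
            if (name.Length == 0)
                continue;
            result.Columns.Add(name, typeof(string));
            positions[used++] = i;
        }

        // Data rows: pick the <td> cells at the remembered positions.
        for (int r = 1; r < rows.Count; r++)
        {
            XmlNodeList cells = rows[r].SelectNodes("x:td", ns);
            DataRow row = result.NewRow();
            for (int c = 0; c < used && positions[c] < cells.Count; c++)
                row[c] = cells[positions[c]].InnerText.Trim();
            result.Rows.Add(row);
        }
        return result;
    }
}
```

In the page, something like DataGrid1.DataSource = GridViewScraper.Extract(doc, "GridView1"); DataGrid1.DataBind(); would bind the result (DataGrid1 being a hypothetical control). To get the rows into SQL Server so they can be queried there, SqlBulkCopy.WriteToServer(DataTable) from System.Data.SqlClient (new in .NET 2.0) is one option; the destination table would be one you create yourself.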

You can find more reference material on .NET Framework XML processing on
the MSDN site or in your locally installed MSDN Library:

#XML Documents and Data 
http://msdn2.microsoft.com/en-us/library/2bcctyt8.aspx

Hope this helps.

Regards,

Steven Cheng
Microsoft Online Support
