.NET Forum / Languages / C# / September 2007

load all HTML into string....

Rogelio - 09 Sep 2007 17:56 GMT
Hey, I want to get the entire contents of an HTML page and put all the HTML
code it returns in a string, so that I can parse that string for data.

How would I go about doing this? Would I need to use a web browser control?
Any help/advice? Thanks.
Nicholas Paldino [.NET/C# MVP] - 09 Sep 2007 18:02 GMT
Rogelio,

   You can use the WebClient class in the System.Net namespace, or, if you
need more control over the request, you can use the
HttpWebRequest/HttpWebResponse classes.
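
A minimal sketch of the WebClient route mentioned above. The class and method names (PageDownloader, Download) are made up for illustration, and UTF-8 is only an assumption about how the page is encoded; adjust it to the page you are actually fetching.

```csharp
using System;
using System.Net;
using System.Text;

static class PageDownloader
{
    // Downloads the resource at the given address and returns it as one string.
    // PageDownloader and Download are hypothetical names, not part of .NET.
    public static string Download(string address)
    {
        using (WebClient client = new WebClient())
        {
            // DownloadString converts the downloaded bytes using this property.
            // UTF-8 is an assumption about the page, not something detected here.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(address);
        }
    }
}
```

Calling PageDownloader.Download("http://example.com/") (a placeholder URL) would hand back the page source, ready to pass to whatever parser you choose.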

         - Nicholas Paldino [.NET/C# MVP]
         - mvp@spam.guard.caspershouse.com

Martin Honnen - 09 Sep 2007 18:10 GMT
> hey, I want to get the entire contents of an HTML page, and put all the html
> code it returns in a string. so that I can parse that string for data,
>
> how would I go about doing this? would I need to use a web browser control ?
> any help/advise ? thanks.

Well, you can read from a file or a URL with various .NET APIs;
WebRequest/HttpWebRequest is useful for reading data from a URL. It
depends on where you need to access that "HTML page": on the local file
system, an HTTP server, or an FTP server. The main problem in getting a
string is finding out the encoding of the HTML document. HTML browsers
go through complicated attempts to identify it by looking at HTTP
headers and at meta elements in the HTML document. Looking at HTTP
headers is easy with HttpWebRequest/HttpWebResponse; looking at meta
elements is more work.

However, there are tools to parse HTML documents. One is SgmlReader
<URL:http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>
so I wouldn't try to parse a string of HTML with string functions or
regular expressions, if that is your aim. With SgmlReader you get an
XmlReader API over the HTML document, so the reader recognizes the
different node types: element nodes, attribute nodes, text nodes, and
comment nodes. You can also pass the SgmlReader to other .NET APIs like
XmlDocument or XPathDocument to make use of the DOM and/or XPath and
XSLT support in the .NET Framework. That is much better than string
parsing of an HTML document.
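
As a rough illustration of the "look at the HTTP headers" step, the sketch below reads the charset from the response's Content-Type header and uses it to decode the body. HtmlFetcher, PickEncoding and Fetch are hypothetical names, and falling back to UTF-8 is an assumption; it does not inspect meta elements, which is the harder part described above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class HtmlFetcher
{
    // Maps a charset name (as sent in the Content-Type header) to an Encoding.
    // Returns the fallback when the name is missing or not recognized.
    public static Encoding PickEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrEmpty(charset))
            return fallback;
        try
        {
            // Some servers wrap the charset value in quotes; strip them first.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }

    // Fetches a page over HTTP and decodes it with the header-declared charset.
    public static string Fetch(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            // CharacterSet exposes the charset part of the Content-Type header.
            // UTF-8 as the fallback is an assumption, not a rule.
            Encoding encoding = PickEncoding(response.CharacterSet, Encoding.UTF8);
            using (StreamReader reader = new StreamReader(body, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Keeping PickEncoding separate from Fetch means the charset handling can be checked without touching the network.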

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/

Arne Vajhøj - 09 Sep 2007 18:37 GMT
> hey, I want to get the entire contents of an HTML page, and put all the html
> code it returns in a string. so that I can parse that string for data,
>
> how would I go about doing this? would I need to use a web browser control ?
> any help/advise ? thanks.

Call (Http)WebRequest.Create, get the response, wrap the resulting Stream
in a StreamReader and use ReadToEnd.
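
Those steps could be sketched roughly as follows. PageReader, ReadAll and Fetch are illustrative names only, and the reader is left on its default decoding, which is an assumption that suits UTF-8 pages.

```csharp
using System;
using System.IO;
using System.Net;

static class PageReader
{
    // Reads everything from the stream as text.
    // With this constructor, StreamReader assumes UTF-8 unless it finds
    // a byte order mark at the start of the stream.
    public static string ReadAll(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Creates a request for the URL, gets the response and reads its body.
    public static string Fetch(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return ReadAll(body);
        }
    }
}
```

For example, string html = PageReader.Fetch("http://example.com/"); (a placeholder URL) would leave the whole page source in html.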

Arne
