.NET Forum / ASP.NET / General / August 2007

Retrieving Hyperlinks from a web page in code

Enigma Boy - 14 Aug 2007 07:01 GMT
Hi folks,

I am retrieving a web page from a site using HttpWebRequest.  What I want to
do with the retrieved page is list all the hyperlinks in it.  If I do a
simple regex search for <a then I also get links that are commented out in
the HTML, and I don't want those.  I only want links that are actually
active.  This is for a reciprocal link check.

Can someone please point me in the right direction?

Thanks.

Alexey Smirnov - 14 Aug 2007 09:17 GMT
> Hi folks,
>
[quoted text clipped - 3 lines]
> code and I don't want that.  I want links that are actually active.  This is
> to do with reciprocal link check.

Hi, I think you can try to clean the text before you get the links.
For example:

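// RegexOptions.Singleline lets "." match newlines, so multi-line comments are stripped too
html_code = Regex.Replace(html_code, "<!--.*?-->", "", RegexOptions.Singleline);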

This will replace every HTML comment with an empty string, and then you
can extract the links from what is left.
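
Putting the two steps together, a minimal sketch might look like the following. The class name LinkExtractor, the method name GetActiveLinks, and the anchor pattern are assumptions made for illustration and are not from the thread. The pattern is deliberately simple: it only picks up quoted href values, and it can be fooled by comment-like text inside script blocks or attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical helper: strips HTML comments, then collects the href
    // value of every <a> tag that remains.
    public static List<string> GetActiveLinks(string html)
    {
        // Singleline lets "." match newlines, so multi-line comments go too.
        string cleaned = Regex.Replace(html, "<!--.*?-->", "", RegexOptions.Singleline);

        // Simplified pattern (an assumption): matches <a ... href="..."> or
        // <a ... href='...'>; unquoted href values are not handled.
        var anchor = new Regex(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var links = new List<string>();
        foreach (Match m in anchor.Matches(cleaned))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

Calling LinkExtractor.GetActiveLinks(html_code) on the downloaded page text would then return only the links that sit outside comments.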

Jesse Houwing - 14 Aug 2007 11:56 GMT
Hello Enigma,

Have a look at the HTML Agility Pack. It allows you to treat the HTML as
if it were XML.

http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack
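
As a rough sketch of that approach, assuming the HtmlAgilityPack assembly is referenced by the project: the code below loads the page text into an HtmlDocument and selects anchors with an XPath query. The class name AgilityLinkExtractor and the method name GetLinks are made up for illustration. Because the parser reads commented-out markup as comment nodes rather than elements, anchors inside comments should not be returned by the query.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, assumed to be referenced

public static class AgilityLinkExtractor
{
    // Hypothetical helper: returns the href of every <a> element in the page.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode a in anchors)
        {
            links.Add(a.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

For a reciprocal link check, the returned list could then be searched for the URL that is expected to appear on the page.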

--
Jesse Houwing
jesse.houwing at sogeti.n
