.NET Forum / Languages / C# / March 2008

HTTPWebRequest not working with Wikipedia

Mugunth - 10 Mar 2008 12:49 GMT
I'm trying to use a HTTPWebRequest class to retrieve a webpage. Below
is the following code....

           string google = "http://www.google.com.sg/search?hl=en&btnI=I'm+feeling+Lucky&q=";
           string wikipedia = "http://en.wikipedia.org/wiki/Special:Search?fulltext=Search&search=";

           // wikipedia does not work, google works....
           string website = wikipedia;

           string query = textBoxUserQuery.Text;

           // prepare the web page we will be asking for
           HttpWebRequest request = (HttpWebRequest)WebRequest.Create(website + query);

           // execute the request
           HttpWebResponse response = (HttpWebResponse)request.GetResponse();

           // we will read data via the response stream
           Stream resStream = response.GetResponseStream();

Somehow, when I use Google, I get a response, whereas if I use
Wikipedia, I get an HTTP error stating:
The remote server returned an error: (403) Forbidden.

The status says "System.Net.WebExceptionStatus.ProtocolError"

However I'm able to query for a page like http://en.wikipedia.org/wiki/Main_Page,
but cannot access the search page.

Am I missing something? Please help.

Mugunth
Jon Skeet [C# MVP] - 10 Mar 2008 12:53 GMT
> I'm trying to use a HTTPWebRequest class to retrieve a webpage. Below
> is the following code....

Could you produce a short but complete (preferably console)
application which demonstrates the problem? See http://pobox.com/~skeet/csharp/complete.html
for what I mean by that.

Jon
Mugunth - 10 Mar 2008 13:03 GMT
I've posted the complete code in my prev post.
It's a console app.

           string google = "http://www.google.com.sg/search?hl=en&btnI=I'm+feeling+Lucky&q=";
           string wikipedia = "http://en.wikipedia.org/wiki/Special:Search?fulltext=Search&search=";

           // wikipedia does not work, google works....
           string website = wikipedia;

           string query = "Microsoft";

           // prepare the web page we will be asking for
           HttpWebRequest request = (HttpWebRequest)WebRequest.Create(website + query);

           // execute the request
           HttpWebResponse response = (HttpWebResponse)request.GetResponse();

           // we will read data via the response stream
           Stream resStream = response.GetResponseStream();

The request.GetResponse() call throws an exception when I use the
Wikipedia search, but runs fine and returns an HTML page when I use
Google.

Any help is appreciated,
Mugunth
Jon Skeet [C# MVP] - 10 Mar 2008 13:07 GMT
> I've posted the complete code in my prev post.
> It's a console app.

Your previous post contained a reference to "textBoxUserQuery.Text"
which doesn't sound like a console app.

See http://pobox.com/~skeet/csharp/incomplete.html

If it doesn't start with using directives and a class declaration, it's
unlikely to be complete.

Try cutting and pasting what you've posted into a brand new text file
and compile it. It won't work.

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
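
For illustration, a short but complete console version of the snippet under discussion might look like the sketch below. It starts with using directives and a class declaration, so it can be pasted into an empty file and compiled. The class name and the BuildUrl helper are hypothetical additions for this example, not part of the original post; the catch block only prints what the server returned and does not assume any particular outcome.

```csharp
using System;
using System.IO;
using System.Net;

class WikipediaSearchRepro
{
    // Hypothetical helper: appends the escaped query to the base URL
    // so that spaces and '&' in the query do not break the query string.
    public static string BuildUrl(string baseUrl, string query)
    {
        return baseUrl + Uri.EscapeDataString(query);
    }

    static void Main()
    {
        string wikipedia =
            "http://en.wikipedia.org/wiki/Special:Search?fulltext=Search&search=";
        string url = BuildUrl(wikipedia, "Microsoft");

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        try
        {
            // execute the request and read the body via the response stream
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream resStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(resStream))
            {
                string html = reader.ReadToEnd();
                Console.WriteLine("OK: read {0} characters", html.Length);
            }
        }
        catch (WebException ex)
        {
            // With ProtocolError the server did answer; its reply is on ex.Response.
            Console.WriteLine("Status: {0}", ex.Status);
            HttpWebResponse error = ex.Response as HttpWebResponse;
            if (error != null)
            {
                Console.WriteLine("HTTP {0} {1}",
                    (int)error.StatusCode, error.StatusDescription);
            }
        }
    }
}
```

Swapping the wikipedia string for the google one from the original post is enough to compare the two sites with the same program.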

Marc Gravell - 10 Mar 2008 13:18 GMT
I can reproduce the 403... at the end of the day, if they want to
prevent this type of access that is their prerogative?

You could probably go to town trying to spoof a standard request, but
I suspect you might be in violation of their policies (I haven't
checked).

Alternatively, host in a WebBrowser (which is shdocvw), or search for
a *supported* search API / web-service

Marc
Nicholas Paldino [.NET/C# MVP] - 10 Mar 2008 15:09 GMT
This action is disallowed by Wikipedia.  If you check the robots.txt
file:

http://en.wikipedia.org/robots.txt

   You will see this in it:

User-agent: *
Disallow: /wiki/Special:Search

   So your response of 403 - Forbidden is expected.  They don't want you
doing this.
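
As a rough sketch of how a client could check a rule like the one quoted above before sending a request, the following tests a path against the Disallow lines of the "User-agent: *" group. It is deliberately simplified: it ignores Allow lines, wildcards, and agent-specific groups, and the class and method names are made up for this example. Note that robots.txt is an advisory file that well-behaved clients consult; this sketch only reads the published rule and says nothing about how the server itself decides to answer 403.

```csharp
using System;

class RobotsCheck
{
    // Returns true if a "User-agent: *" group has a Disallow prefix matching path.
    // Simplified on purpose: no Allow handling, no wildcards, no per-agent groups.
    public static bool IsDisallowedForAll(string robotsTxt, string path)
    {
        bool inStarGroup = false;
        bool lastWasAgent = false;

        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            // Strip comments and surrounding whitespace (including '\r').
            string line = rawLine;
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            line = line.Trim();

            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasAgent) inStarGroup = false;
                if (value == "*") inStarGroup = true;
                lastWasAgent = true;
                continue;
            }

            lastWasAgent = false;
            if (field == "disallow" && inStarGroup && value.Length > 0
                && path.StartsWith(value, StringComparison.Ordinal))
            {
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        string robots = "User-agent: *\nDisallow: /wiki/Special:Search\n";
        Console.WriteLine(IsDisallowedForAll(robots, "/wiki/Special:Search")); // True
        Console.WriteLine(IsDisallowedForAll(robots, "/wiki/Main_Page"));      // False
    }
}
```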

         - Nicholas Paldino [.NET/C# MVP]
         - mvp@spam.guard.caspershouse.com

Peter Bromberg [C# MVP] - 10 Mar 2008 16:02 GMT
Wikipedia exposes its content for search via several APIs, including a few
that have been written and are managed by third parties. There is an XML
version that returns the MediaWiki markup for a result page inside an XML
element. You would still have to convert the wiki markup to formatted HTML, a
process which is not trivial.  As Nicholas indicated, Wikipedia doesn't want
people "faking" their search box and redisplaying the scraped content.
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short Urls & more: http://ittyurl.net
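
One supported route is the MediaWiki action API (api.php), which can return search results as XML. The sketch below is an assumption about how that might be called from the same HttpWebRequest code used in this thread; it is not necessarily the API Peter had in mind. The class name, the BuildSearchUrl helper, and the User-Agent string are placeholders for this example, and the code assumes the runtime can negotiate a TLS version the server accepts.

```csharp
using System;
using System.IO;
using System.Net;

class WikipediaApiSearch
{
    // Hypothetical helper: builds a MediaWiki action API search URL that
    // asks for XML output; the search term is escaped for the query string.
    public static string BuildSearchUrl(string query)
    {
        return "https://en.wikipedia.org/w/api.php"
             + "?action=query&list=search&format=xml&srsearch="
             + Uri.EscapeDataString(query);
    }

    static void Main()
    {
        HttpWebRequest request =
            (HttpWebRequest)WebRequest.Create(BuildSearchUrl("Microsoft"));

        // Placeholder identification: replace with your own tool name and contact.
        request.UserAgent = "ExampleSearchTool/0.1 (contact: you@example.com)";

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                // Raw XML listing the matching page titles and snippets.
                Console.WriteLine(reader.ReadToEnd());
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: {0}", ex.Message);
        }
    }
}
```

The result is a list of matches rather than a rendered page, so turning a hit into formatted HTML is still a separate step, as noted above.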


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.