.NET Forum / .NET Framework / Internationalization / November 2004
Store a web page written in Chinese in a file
Antonio - 25 Oct 2004 09:10 GMT
Hi, I want to read an HTML page written in Chinese and store it in a file with the extension .aspx. I'm not sure where the problem is. I use the following lines of code:

String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8);
String sHtmlTradotto = reader.ReadToEnd();

StreamWriter writer = new StreamWriter("prova.aspx", false, System.Text.Encoding.UTF8);
writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

But the file produced doesn't contain the Chinese characters. How can I solve the problem?

Many thanks in advance,
Ing. Antonio D'Ottavio
Nitin - 29 Oct 2004 09:26 GMT
The file you have may well contain the Chinese characters, but you may not be seeing them because of an improper font setting. Select a font that supports Chinese and then check again.
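One way to tell these two cases apart (characters missing from the file, versus present but not displayed) is to inspect the saved text itself rather than rely on a viewer. The following is only a minimal sketch: `CjkCheck`, `CountCjk` and `CountCjkInFile` are hypothetical names that do not appear in the original code, the file is assumed to be saved as UTF-8, and the range U+4E00 to U+9FFF covers only the main CJK Unified Ideographs block.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): counts Chinese
// characters in a string or in a UTF-8 file.
public static class CjkCheck
{
    // Counts characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).
    public static int CountCjk(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c >= '\u4E00' && c <= '\u9FFF')
            {
                count++;
            }
        }
        return count;
    }

    // Reads the file as UTF-8 (an assumption about how it was saved)
    // and counts the CJK characters it contains.
    public static int CountCjkInFile(string path)
    {
        string text = File.ReadAllText(path, Encoding.UTF8);
        return CountCjk(text);
    }
}
```

Calling `CjkCheck.CountCjkInFile("prova.aspx")` and getting zero would suggest that the characters never reached the file, so the font setting would not be the cause.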
Antonio - 02 Nov 2004 15:57 GMT
With the following ASPX page I try to translate one of my pages from English to Chinese, using UTF-8. The result is that the Chinese characters are not read correctly. If instead I enter the address http://babelfish.altavista.com/babelfish/trurl_pagecontent?url=http://www.etantonio.it/en/index.aspx&lp=en_zh directly into the browser, the page is shown correctly in Chinese. If I save that page, put it on my site, and then read and save it with the same script below, again with UTF-8, the Chinese characters are saved normally. What do you think the problem is? My goal is to automatically save the content of the page http://babelfish.altavista.com/babelfish/trurl_pagecontent?url=http://www.etantonio.it/en/index.aspx&lp=en_zh to a file with the extension .aspx.

Thanks,
Antonio D'Ottavio
www.etantonio.it
<%@ Page Language="c#" debug="true" trace="true"%>
<%@ import Namespace="System" %>
<%@ import Namespace="System.IO" %>
<%@ import Namespace="System.Net" %>

<script runat="server">
static string sLanguageSrc = "EN";
static string sLanguageDest = "ZH";
string PathDirectory;
static FileInfo[] fi;

void Page_Load(Object Src, EventArgs E)
{
    String sAddressEncoded = HttpUtility.UrlEncode("http://www.etantonio.it/en/index.aspx");
    String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?url=" + sAddressEncoded + "&lp=" + sLanguageSrc + "_" + sLanguageDest;

    WebRequest req = WebRequest.Create(sAddress);
    WebResponse result = req.GetResponse();
    Stream ReceiveStream = result.GetResponseStream();
    StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8);
    String sHtmlTradotto = reader.ReadToEnd();

    String RegStringSymError = "(?i)\\<script\\slanguage=\"JavaScript\"\\>(\\s\\n)*\\<!--(\\s\\n)*function\\sSymError\\(\\)(\\s|\\n)*{(\\s|\\n)*return\\strue;(\\s|\\n)*}(\\s|\\n)*window.onerror\\s=\\sSymError;(\\s\\n)*//--\\>(\\s\\n)*\\</script\\>";
    sHtmlTradotto = Regex.Replace(sHtmlTradotto, RegStringSymError, "");
    Trace.Write("sHtmlTradotto", sHtmlTradotto);

    StreamWriter writer = new StreamWriter(Server.MapPath("/Etantonio/EN/ZH_Tradotta.aspx"), false, System.Text.Encoding.UTF8);
    writer.Write(sHtmlTradotto);
    writer.Flush();
    writer.Close();
}
</script>

<html>
<head>
<title>Traduttore Cinese</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META name="author" content="Antonio DOttavio">
<META name="keywords" content="Motore Ricerca Gif Animate, Animated Gif, Gif Animate, Gif, Animated, WebMaster, Web, Azioni, Borsa, Grafici, Criteri, Elettronica, Telecomunicazioni, Informatica, Università, Economia, Finanza">
<meta name="description" content="Motore Ricerca Gif Animate, Animated Gif">
<link href="../../Stili.css" rel="stylesheet" type="text/css">
</head>
<body>
</body>
</html>
Sylvain Lafontaine - 02 Nov 2004 18:44 GMT
Trying to display Chinese with the charset iso-8859-1? If you want to display Chinese, the whole page must be in Unicode, not just one part of it with the other part in Italian.

Replace iso-8859-1 with utf-8 and take a look at the following two articles (especially the end of the first one). The second one is there in case you need to know the code page for UTF-8 (65001: Response.CodePage = 65001 or Session.CodePage = 65001, but Response.Charset = "UTF-8").

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_dataencoding.asp

http://support.microsoft.com/?kbid=232580

S. L.
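The same consistency applies when reading the remote page: the bytes should be decoded with the charset the server declares, not a hard-coded one. The sketch below is an illustration under assumptions, not the code from this thread. `CharsetHelper`, `ResolveEncoding`, `Download` and `SaveUtf8` are hypothetical names, and falling back to UTF-8 when the declared charset is missing or unknown is an arbitrary choice. It does not by itself explain why the characters are missing.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper (names are assumptions, not from the thread).
public static class CharsetHelper
{
    // Maps a declared charset name to an Encoding, falling back to UTF-8
    // when the name is empty or not recognized by the runtime.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrEmpty(charset))
        {
            return Encoding.UTF8;
        }
        try
        {
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Downloads a page and decodes it with the charset taken from the
    // Content-Type header of the response.
    public static string Download(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, ResolveEncoding(response.CharacterSet)))
        {
            return reader.ReadToEnd();
        }
    }

    // Writes the text as UTF-8, whatever encoding it was read with.
    public static void SaveUtf8(string path, string html)
    {
        using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            writer.Write(html);
        }
    }
}
```

The file is always written as UTF-8, so it matches a `charset=utf-8` declaration in the saved page, whatever encoding the source used.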
Antonio - 04 Nov 2004 08:58 GMT
Hi Sylvain, I did what you suggested. In my page, named TraduttoreCinese, I changed the charset to utf-8, so now I have "charset=utf-8" and System.Text.Encoding.UTF8 both for reading the page from the web and for writing it to a file. This is the code:
////////////////////////////////////////////////////////////////////////
<%@ Page Language="c#" debug="true" trace="true"%>
<%@ import Namespace="System" %>
<%@ import Namespace="System.IO" %>
<%@ import Namespace="System.Net" %>

<script runat="server">
static string sLanguageSrc = "EN";
static string sLanguageDest = "ZH";
string PathDirectory;
static FileInfo[] fi;

void Page_Load(Object Src, EventArgs E)
{
    String sAddressEncoded = HttpUtility.UrlEncode("http://www.etantonio.it/en/index.aspx");
    String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?url=" + sAddressEncoded + "&lp=" + sLanguageSrc + "_" + sLanguageDest;

    WebRequest req = WebRequest.Create(sAddress);
    WebResponse result = req.GetResponse();
    Stream ReceiveStream = result.GetResponseStream();
    StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8);
    String sHtmlTradotto = reader.ReadToEnd();
    Trace.Write("sHtmlTradotto", sHtmlTradotto);

    StreamWriter writer = new StreamWriter(Server.MapPath("/Etantonio/EN/ZH_Tradotta.aspx"), false, System.Text.Encoding.UTF8);
    writer.Write(sHtmlTradotto);
    writer.Flush();
    writer.Close();
}
</script>

<html>
<head>
<title>Traduttore Cinese</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
</body>
</html>
///////////////////////////////////////////////////////////////////////////
The result is still not good: the output below shows no Chinese characters, unlike what I see when I open this URL directly in the browser:

http://babelfish.altavista.com/babelfish/trurl_pagecontent?url=http://www.etantonio.it/en/index.aspx&lp=en_zh
This, instead, is my ugly result:
///////////////////////////////////////////////////////////////////////////
sHtmlTradotto
<html><meta http-equiv="content-type" content="text/html; charset=UTF-8"><base href="http://www.etantonio.it/en/index.aspx">
<!-- removed --><meta http-equiv="Content-Type" content="text/html ; CHARSET=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head> <title>Etantonio</title> <meta name="author" content="Antonio DOttavio"> <meta name="description" content="Etantonio Index"> <link href="Stili.css" rel="stylesheet" type="text/css"> </head> <body>
<script language=JavaScript src="menu_array.js" type=text/javascript></script> <script language=JavaScript src="mmenu.js" type=text/javascript></script>
<table width="750" height="430" border="0" cellpadding="0" cellspacing="0" background="/images/EsserSpettatoriNonEstSerioElefante.jpg"> <tr> <td valign="top">
<table width="90%" border="0" align="center" cellspacing="12"> <tr height="70" valign="top"> <td> </td> <td width="25%" rowspan="2"> <p align="center"><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fUniversita%2findex.aspx" class="testoMedioVerde"></a></p> <p align="center" class="testoPiccolissimoVerde">, </p> </td> <td width="25%" rowspan="2"> <p align="center"><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx" class="testoMedioVerde"></a> </p> <p align="center" class="testoPiccolissimoVerde">, , , 1994 </p></td> <td width="25%"> </td> </tr> <tr height="140" valign="top"> <td width="25%"> <p align="center"><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx" class="testoMedioVerde"></a> </p> <p align="center" class="testoPiccolissimoVerde">, , </p> </td> <td width="25%"> <p align="center" ><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx" class="testoMedioVerde"></a><a 
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx" class="testoMedioVerde"></a> </p> <p align="center" class="testoPiccolissimoVerde">GIF , </p> </td> </tr> <tr valign="top"> <td width="25%"> <p align="center"><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx" class="testoMedioVerde"></a> </p> <p align="center" class="testoPiccolissimoVerde">, , , </p> </td> <td width="25%"> <div align="center"></div></td> <td width="25%"> <div align="center"></div></td> <td width="25%"> <p align="center"><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx" class="testoMedioVerde"></a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3 a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx" class="testoMedioVerde"></a></p> <p align="center" class="testoPiccolissimoVerde">nel delle </p> </td> </tr> </table>
</td> </tr> </table> <script>InserisciFooter();</script> <br>
</body> </html> ///////////////////////////////////////////////////////////////////////////
Sylvain Lafontaine - 04 Nov 2004 19:03 GMT
Hi,

I don't have the time to set up a full test on my system right now. However, I can see this duplicate header:

sHtmlTradotto
<html><meta http-equiv="content-type" content="text/html; charset=UTF-8"><base href="http://www.etantonio.it/en/index.aspx">
<!-- removed --><meta http-equiv="Content-Type" content="text/html ; CHARSET=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Maybe IE is unable to see that the charset is indeed UTF-8. Have you tried setting the encoding directly to Unicode (UTF-8) in IE's options?

You should also test your code by first writing out only the Chinese page, without your own markup, and try using an IFrame as well.
S. L.
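If the duplicate Content-Type header turns out to matter, it could be stripped before the page is saved. This is only a sketch under assumptions: `MetaCleaner` and `KeepFirstContentTypeMeta` are hypothetical names, the regular expression handles only simple `<meta http-equiv="content-type" ...>` tags like the ones shown above, and the duplicate `<base>` tags are left untouched.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): keeps the first
// Content-Type <meta> tag and removes any later ones.
public static class MetaCleaner
{
    // Matches <meta http-equiv="content-type" ...> in any letter case,
    // with or without quotes around the http-equiv value.
    private static readonly Regex ContentTypeMeta = new Regex(
        "<meta\\s+http-equiv\\s*=\\s*[\"']?content-type[\"']?[^>]*>",
        RegexOptions.IgnoreCase);

    public static string KeepFirstContentTypeMeta(string html)
    {
        bool seen = false;
        return ContentTypeMeta.Replace(html, delegate(Match m)
        {
            if (seen)
            {
                return string.Empty; // drop every repeat
            }
            seen = true;
            return m.Value;          // keep the first occurrence as-is
        });
    }
}
```

In the page from this thread, the call `sHtmlTradotto = MetaCleaner.KeepFirstContentTypeMeta(sHtmlTradotto);` would go just before the `StreamWriter` is created.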