Convert string to "best possible" ascii representation

Achim Domma - 30 Aug 2006 11:58 GMT
Hi,

I have to convert a string to its "best possible" ASCII representation.
It's clear to me that this is not possible or sensible for all Unicode
characters, but for most European characters it should be possible.

For example:

"Müller" should become "Muller" and "é" should become "e".

Does some functionality like this already exist?

Achim
Peter Bromberg [C# MVP] - 30 Aug 2006 12:39 GMT
"Best possible"? Who, pray tell, is the arbiter of that? You are the one that
chooses the encoding, and there are many to choose from.  If you use strict
ASCII encoding, you may have characters that render as question marks (?).
Peter

Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com

Morten Wennevik - 30 Aug 2006 12:40 GMT
Hi Achim,

There is nothing out of the box that will do this for you.
You are probably best served using a lookup table to convert the  
characters, but there is a method that will approximate most of the  
characters.  This is not guaranteed to work!

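using System.Text;

string s = "éëæñúüøå";
// Encode to a code page that lacks these letters. The encoder may substitute
// a close match (best-fit fallback) or '?' for each character it cannot represent.
byte[] data = Encoding.GetEncoding("ISO-8859-6").GetBytes(s);
// Decode the resulting bytes as Latin-1 to get a string back.
s = Encoding.GetEncoding("ISO-8859-1").GetString(data);

// s == "eeanuuoa"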
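
For comparison, here is a minimal sketch of the lookup-table idea combined with Unicode normalization. It is a different technique from the encoding round-trip above, not a drop-in equivalent. The class name AsciiFolder, the method Fold, and the contents of the Special table are assumptions made up for illustration; which replacement is "best" for letters such as ø is a policy choice.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolder
{
    // Hypothetical hand-picked table for letters that have no canonical
    // decomposition. The replacements are a policy choice, not a standard.
    private static readonly Dictionary<char, string> Special =
        new Dictionary<char, string>
        {
            { 'æ', "ae" }, { 'Æ', "AE" },
            { 'ø', "o" },  { 'Ø', "O" },
            { 'ß', "ss" },
        };

    public static string Fold(string input)
    {
        StringBuilder result = new StringBuilder(input.Length);

        // FormD splits a letter such as "é" into "e" plus a combining accent.
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            string replacement;
            if (c < 128)
            {
                result.Append(c); // already ASCII
            }
            else if (Special.TryGetValue(c, out replacement))
            {
                result.Append(replacement);
            }
            else if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                result.Append('?'); // no known ASCII approximation
            }
            // Combining marks fall through and are dropped.
        }
        return result.ToString();
    }
}
```

With this sketch, AsciiFolder.Fold("Müller") gives "Muller" and AsciiFolder.Fold("é") gives "e". Characters outside the table that do not decompose come out as '?', so the table would need extending for real data.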

Happy Coding!
Morten Wennevik [C# MVP]

joachim@yamagata-europe.com - 30 Aug 2006 16:55 GMT
> there is a method that will approximate most of the
> characters.  This is not guaranteed to work!

I once needed a converter from any code page to any code page (as a
matter of fact, all Windows code pages to all Macintosh code pages). At
this link you can get all the
mappings you'll need for ASCII to Unicode:

http://www.unicode.org/Public/MAPPINGS/VENDORS/

I wrote a parser that built a substitution matrix from two files, so that
it only switched the characters that had different ASCII codes for the same
Unicode value. In your case, I'd suggest
you build your matrix from one single file (don't hard-code it, so that
your solution stays flexible).

To make the substitutions I implemented an Aho-Corasick engine with
callbacks
(you'll definitely want to use this if you want your replacement to be
efficient when processing large files, let's say 1 GB):

http://en.wikipedia.org/wiki/Aho-Corasick_algorithm

With this method you are in complete control of what you want to
change. It is also flexible, because you only need to change the file
which holds your substitutions.
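
To illustrate the file-driven idea, here is a rough sketch. It is not the code offered in this post. It assumes a hypothetical substitution file with one "character<TAB>replacement" pair per line, which is not the format of the unicode.org mapping files, and it uses a plain dictionary lookup instead of Aho-Corasick, which is enough while every key is a single character. The names SubstitutionTable, Load and Apply are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SubstitutionTable
{
    // Reads lines of the assumed form "<character><TAB><replacement>".
    // Empty lines, lines starting with '#', and malformed lines are skipped.
    public static Dictionary<char, string> Load(TextReader reader)
    {
        Dictionary<char, string> table = new Dictionary<char, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0 || line[0] == '#') continue;
            string[] parts = line.Split('\t');
            if (parts.Length != 2 || parts[0].Length != 1) continue;
            table[parts[0][0]] = parts[1];
        }
        return table;
    }

    // Replaces every character found in the table; others pass through unchanged.
    public static string Apply(string input, Dictionary<char, string> table)
    {
        StringBuilder result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (table.TryGetValue(c, out replacement))
                result.Append(replacement);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
```

Usage would look like SubstitutionTable.Apply("Müller", SubstitutionTable.Load(File.OpenText("subst.txt"))), where subst.txt is a placeholder file name. Changing the behaviour then means editing the file, not the program.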

Drop me a line and I'll send you some code,

Best Regards,
Joachim
Larry Lard - 30 Aug 2006 14:37 GMT
> Does some functionality like this already exist?

Would you say this is something that's commonly done? Because that's
what gets in the Framework.

By the way, what are you going to do with the Scandinavian å and ø ?
Replacing them with a and o would be wrong at best.
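
As a small illustration of that point, the sketch below uses multi-letter replacements. The mappings å -> aa, ø -> oe and æ -> ae are assumed here as a common Danish/Norwegian convention; other languages and other applications may want different rules, so treat the table as a placeholder. NordicTransliteration and ToAscii are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NordicTransliteration
{
    // Assumed convention for illustration only: å -> aa, ø -> oe, æ -> ae.
    private static readonly Dictionary<char, string> Map =
        new Dictionary<char, string>
        {
            { 'å', "aa" }, { 'Å', "Aa" },
            { 'ø', "oe" }, { 'Ø', "Oe" },
            { 'æ', "ae" }, { 'Æ', "Ae" },
        };

    public static string ToAscii(string input)
    {
        StringBuilder result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (Map.TryGetValue(c, out replacement))
                result.Append(replacement); // one letter may become two
            else
                result.Append(c);           // everything else is left alone
        }
        return result.ToString();
    }
}
```

Here NordicTransliteration.ToAscii("Jørgen Ågård") gives "Joergen Aagaard", whereas a plain accent-stripping approach would give "a" for å and has no obvious answer for ø.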

Larry Lard
larrylard@googlemail.com
The address is real, but unread - please reply to the group
For VB and C# questions - tell us which version

Cor Ligthert [MVP] - 30 Aug 2006 16:07 GMT
Achim,

Maybe the two links on this page can help you, in addition to the other
information you have received.

http://www.vb-tips.com/dbPages.aspx?ID=cca7e08a-9580-42b3-beff-76c81839e6c9

I hope this helps,

Cor
