Hi,
I have to convert a string to its "best possible" ASCII representation.
It's clear to me that this is not possible or sensible for all Unicode
characters, but for most European characters it should be.
For example:
"Müller" should become "Muller" and "é" should become "e".
Does some functionality like this already exist?
Achim
Peter Bromberg [C# MVP] - 30 Aug 2006 12:39 GMT
"Best possible"? Who, pray tell, is the arbiter of that? You are the one who
chooses the encoding, and there are many to choose from. If you use strict
ASCII encoding, some characters may render as question marks (?).
Peter

Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com
Morten Wennevik - 30 Aug 2006 12:40 GMT
Hi Achim,
There is nothing out of the box that will do this for you.
You are probably best served using a lookup table to convert the
characters, but there is a method that will approximate most of the
characters. This is not guaranteed to work!
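// Requires: using System.Text;
string s = "éëæñúüøå";
// Encode to a code page that lacks these letters; this relies on the
// encoder's best-fit fallback, which varies by code page and platform.
byte[] data = Encoding.GetEncoding("ISO-8859-6").GetBytes(s);
// Decode the resulting ASCII-range bytes back into a string.
s = Encoding.GetEncoding("ISO-8859-1").GetString(data);
// s == "eeanuuoa"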
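The lookup-table approach could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the class name AsciiFolder and the table entries are made up for illustration, and it additionally uses String.Normalize with NormalizationForm.FormD (not mentioned above) to strip accents, so that the table only has to cover letters that do not decompose into a base letter plus a combining mark.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper; the name and the table entries are illustrative only.
public static class AsciiFolder
{
    // Letters without a canonical decomposition need explicit entries.
    // The replacements are a matter of convention; adjust them as needed.
    private static readonly Dictionary<char, string> Table =
        new Dictionary<char, string>
        {
            { '\u00DF', "ss" }, // ß
            { '\u00E6', "ae" }, // æ
            { '\u00C6', "AE" }, // Æ
            { '\u00F8', "o" },  // ø (some would prefer "oe")
            { '\u00D8', "O" },  // Ø
        };

    public static string Fold(string input)
    {
        // FormD splits e.g. "ü" into "u" followed by a combining diaeresis.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop the combining accents left behind by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            string replacement;
            if (c < 128)
                sb.Append(c);               // already ASCII
            else if (Table.TryGetValue(c, out replacement))
                sb.Append(replacement);     // explicit table entry
            else
                sb.Append('?');             // no known approximation
        }
        return sb.ToString();
    }
}
```

With this sketch, AsciiFolder.Fold("Müller") gives "Muller" and AsciiFolder.Fold("é") gives "e". Anything that neither decomposes nor appears in the table becomes "?", so the table decides what "best possible" means for your data.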

Happy Coding!
Morten Wennevik [C# MVP]
joachim@yamagata-europe.com - 30 Aug 2006 16:55 GMT
> there is a method that will approximate most of the
> characters. This is not guaranteed to work!
I once needed a converter from any code page to any other code page (in
fact, from all Windows code pages to all Macintosh code pages). At this
link you can get all the mappings you'll need from code pages to Unicode:
http://www.unicode.org/Public/MAPPINGS/VENDORS/
I wrote a parser that built a substitution matrix from two mapping files,
so that only the characters whose byte codes differed for the same
Unicode value were switched. In your case, I'd suggest you build your
matrix from a single file (don't hard-code it, so that your solution
stays flexible).
To make the substitutions, I implemented an Aho-Corasick engine with
callbacks (you'll definitely want this if the replacement has to be
efficient on large files, say 1 GB):
http://en.wikipedia.org/wiki/Aho-Corasick_algorithm
With this method you are in complete control of what you want to
change. It is also flexible, because you only need to change the file
which holds your substitutions.
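A file-driven table along these lines might look like the sketch below. It is not the code offered in this post, and it makes several assumptions: the class name SubstitutionTable is made up, the file format (one source character, a TAB, then the replacement text, with "#" starting a comment line) is invented for illustration, and it uses a plain per-character dictionary lookup rather than an Aho-Corasick engine, which is enough as long as every key is a single character.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper; the name and the file format are assumptions.
public static class SubstitutionTable
{
    // Reads lines of the form "<char><TAB><replacement>".
    // Empty lines, lines starting with '#', and malformed lines are skipped.
    public static Dictionary<char, string> Load(TextReader reader)
    {
        Dictionary<char, string> map = new Dictionary<char, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0 || line[0] == '#')
                continue;
            string[] parts = line.Split('\t');
            if (parts.Length != 2 || parts[0].Length != 1)
                continue;
            map[parts[0][0]] = parts[1];
        }
        return map;
    }

    // Replaces every character that has an entry; all others pass through.
    public static string Apply(string input, Dictionary<char, string> map)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (map.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Load takes a TextReader, so it can be fed from a StreamReader over a substitutions file (opened with the right encoding) or from a StringReader in a test. Changing the behaviour then only means editing the file.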
Drop me a line and I'll send you some code,
Best Regards,
Joachim
Larry Lard - 30 Aug 2006 14:37 GMT
> Hi,
>
[quoted text clipped - 7 lines]
>
> Does some functionality like this already exist?
Would you say this is something that's commonly done? Because that's
what gets into the Framework.
By the way, what are you going to do with the Scandinavian å and ø?
Replacing them with a and o would be wrong at best.

Larry Lard
larrylard@googlemail.com
The address is real, but unread - please reply to the group
For VB and C# questions - tell us which version
Cor Ligthert [MVP] - 30 Aug 2006 16:07 GMT
Achim,
Maybe the two links on this page can help you, in addition to the other
information you have already received.
http://www.vb-tips.com/dbPages.aspx?ID=cca7e08a-9580-42b3-beff-76c81839e6c9
I hope this helps,
Cor