My database contains text in multiple languages such as English, French, Chinese,
Japanese... (16 languages). Can anyone tell me how I can programmatically detect
which language is stored in a column, without any additional flag?
Nicholas Paldino [.NET/C# MVP] - 12 Oct 2007 17:35 GMT
Zhangming,
There is really no reliable way to do that. Assuming that you are storing
everything in Unicode columns, you can try scanning every character to see
whether they all fall within some subset of Unicode that corresponds to a
particular language, but at best that is a heuristic. You really need to
store the language along with the information.
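
A minimal sketch of the character-scanning heuristic described above, in Python for brevity. The function names `script_of` and `dominant_script` are made up for illustration, and the script labels are derived from Unicode character names via the standard `unicodedata` module, which is only a rough stand-in for a proper script lookup. It can separate scripts (Latin, Han, Kana, ...) but not languages that share a script.

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Return a coarse script label based on the character's Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "Kana"
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"

def dominant_script(text):
    """Return the most frequent script among the letters in text (hypothetical helper)."""
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]

print(dominant_script("Hello world"))      # English text
print(dominant_script("Bonjour à tous"))   # French text, same script as English
print(dominant_script("中文文本"))           # Chinese text
```

English and French both come back as the same script, and Japanese text written mostly in kanji would look like Chinese, which is why this can only ever be a heuristic.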

Signature
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com
deerchao - 12 Oct 2007 20:25 GMT
On Oct 12, 10:25 pm, "Zhangming Su" <s...@ExportTradingNetwork.com>
wrote:
> My database contains multiple languages such as English, French, Chinese,
> Japanese...(16 languages). Can anyone tell me how I can detect which kind
> of language is saved in a column without any additional flag by program?
You need an NLP language identification tool or algorithm.
Check this:
http://www.let.rug.nl/~vannoord/TextCat/competitors.html
I'm not an expert myself. I saw that months ago and still haven't gotten
my hands dirty with it.
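
To give a rough idea of how one family of language identification techniques works, here is a toy sketch of character n-gram profile matching with an "out-of-place" rank distance. It is not the code of any tool on the linked page. The names `ngram_profile`, `out_of_place`, `identify`, `SAMPLES` and `PROFILES` are invented for this example, and the two training sentences are far too small for real use; a real profile would be built from a large corpus per language.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top=300):
    """Return the text's most frequent character n-grams (n = 1..max_n), most frequent first."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    return [gram for gram, _ in counts.most_common(top)]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from lang_profile get a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )

def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Toy training data (assumption: stands in for a real per-language corpus).
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "French": "le renard brun saute par-dessus le chien paresseux et le chat est sur le tapis",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(identify("the dog sat on the mat", PROFILES))
```

With profiles this small the result on unseen or very short input should not be trusted; the approach depends on having enough text on both the training and the input side.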
Rad [Visual C# MVP] - 15 Oct 2007 18:17 GMT
>On Oct 12, 10:25 pm, "Zhangming Su" <s...@ExportTradingNetwork.com>
>wrote:
[quoted text clipped - 8 lines]
>I'm not an expert myself. I saw that months ago and still haven't gotten
>my hands dirty with it.
Certainly an interesting solution ... however, I wonder if there are
some limitations in terms of
a) Accuracy -- how will the tool cope with just a few words that
could belong to more than one language?
b) Performance -- how long do the tools take to evaluate each input?
I wonder if it wouldn't be simpler to have a column that stores the
language ...
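
As a small illustration of the "store the language" option, here is a sketch using Python's built-in `sqlite3` module so that it runs on its own. The table `product_text`, the column `lang`, the two-letter tags and the helper `texts_in` are all hypothetical; the original poster's schema and database engine are not known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_text (
        id   INTEGER PRIMARY KEY,
        lang TEXT NOT NULL,   -- hypothetical language tag such as 'en' or 'zh'
        body TEXT NOT NULL
    )
""")

# Sample rows; the language is recorded when the text is written.
rows = [("en", "Hello"), ("fr", "Bonjour"), ("zh", "你好"), ("ja", "こんにちは")]
conn.executemany("INSERT INTO product_text (lang, body) VALUES (?, ?)", rows)

def texts_in(conn, lang):
    """Return all stored texts tagged with the given language."""
    cur = conn.execute(
        "SELECT body FROM product_text WHERE lang = ? ORDER BY id", (lang,)
    )
    return [row[0] for row in cur]

print(texts_in(conn, "fr"))
```

The lookup is then an ordinary indexed query rather than a guess, which sidesteps both the accuracy and the performance question.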
--
http://bytes.thinkersroom.com