Determining whether the text is RTL

Jan Kucera - 11 Sep 2007 11:29 GMT
Hello,
 I ran into a small problem with automatic text alignment in WPF,
described at http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2123352
and it seems I'll have to implement the workaround myself, but this group seems
a more appropriate place to look for the answer.

 The application gets some text (from XML) and is supposed to display it.
However, this XML contains data from several cultures, and some of it comes from
RTL ones (e.g. the text is Hebrew). Now I need to find out whether I should
align the text to the left or to the right. Is there any function, either
in .NET or in Win32, that would determine this for me? I could take the first
character and test whether it is Arabic, Hebrew and so on, but I'd likely
miss some case (or a future one), so I'm looking for a more general way of doing
that.

     Thank you for any hints,
           Jan
Mihai N. - 12 Sep 2007 06:25 GMT
>   The application gets some text (from XML) and is supposed to display it.
> However, this XML contains data from several cultures, and some of it comes from
[quoted text clipped - 4 lines]
> miss some case (or a future one), so I'm looking for a more general way of
> doing that.

This is how you determine if some culture needs RTL rendering:
  http://blogs.msdn.com/michkap/archive/2006/07/12/663013.aspx

But you need to have a way in the XML itself to tag data with a culture.

There is no 100% safe way to determine if the text is RTL based on the text
content only. Imagine you have a mixture like this: "XXXXX YYYYY"
with XXXXX some English text, and YYYYY some Arabic text.
Is that English with an Arabic inset, or Arabic with an English inset?
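
To make the ambiguity concrete, here is a minimal sketch of a "first strong character" check. It is written in Python rather than .NET, using the standard unicodedata module as a stand-in for a platform call, and the function name first_strong_direction is made up for illustration. The same two words give opposite answers depending only on their order, which is why content alone cannot settle the base direction.

```python
import unicodedata

# Unicode bidi classes with a "strong" direction:
# L is left-to-right; R and AL (Arabic letter) are right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}

def first_strong_direction(text):
    """Return 'rtl' or 'ltr' for the first strongly directional character,
    or None if the text has no such character (digits, spaces, ...)."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in STRONG_RTL:
            return "rtl"
        if bidi in STRONG_LTR:
            return "ltr"
    return None

english = "Hello"
arabic = "\u0645\u0631\u062d\u0628\u0627"  # an Arabic word, written with escapes

print(first_strong_direction(english + " " + arabic))  # ltr
print(first_strong_direction(arabic + " " + english))  # rtl
```

Both results are defensible guesses; neither tells you which language is the "host" and which is the inset.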

Signature

Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Jan Kucera - 12 Sep 2007 06:36 GMT
Hi Mihai,
 thank you for the answer. However, Michael's post expects you to have a
CultureInfo. In that case, since I'm targeting a newer .NET Framework, I could use
CultureInfo.TextInfo.IsRightToLeft.

 Okay, I know your sample would be a problem. So how would I check a
single character? Is there any way to test for all RTL cases?

 Actually I think I do have an ISO-639-2 tag for the text, but I'm not sure
whether it is worth creating separate text-flow info from them.

 Jan

Mihai N. - 13 Sep 2007 04:45 GMT
>   thank you for the answer. However, Michael's post expects you to have a
> CultureInfo. In that case, since I'm targeting a newer .NET Framework, I could use
> CultureInfo.TextInfo.IsRightToLeft.

>   Okay, I know your sample would be a problem. So how would I check a
> single character? Is there any way to test for all RTL cases?

Without a CultureInfo you can try calling (the native) GetStringTypeEx.
It takes a locale ID, but you can pass whatever you want:
the strong attributes in CT_CTYPE2 (C2_RIGHTTOLEFT/C2_LEFTTORIGHT) are
not affected by locale.

But there is still no reliable way to test for all RTL cases.
Sometimes not even a human can do it.

>   Actually I think I do have an ISO-639-2 tag for the text, but I'm not sure
> whether it is worth creating separate text-flow info from them.

I think most of the time text content is in a single language.
A document is mostly in language A, with small chunks of other languages.
But those areas have to be tagged.
Designing a document where all the languages are mixed, without properly
tagging them, is not very useful.
Think MS Word, where you can mark text sections with a different language
for spell-checking.

If possible it would be a good idea to tag the documents
(if not paragraphs, or records, or whatever) with a full locale ID,
RFC 4646 style.

There are quite a few things that cannot be done properly without
locale info. For example, sorting and case conversion are culture-sensitive.
So is font selection (you cannot use a Chinese Traditional font for
Chinese Simplified text, even when the text is identical).
In fact, unless all you do is move text around (no processing, no display),
it is best to know the locale of that text.


Jan Kucera - 13 Sep 2007 11:21 GMT
> Without a CultureInfo you can try calling (the native) GetStringTypeEx.
> It takes a locale ID, but you can use whatever you want,
[quoted text clipped - 3 lines]
> But there is still no reliable way to test for all RTL cases.
> Sometimes not even a human can do it.

I will give it a try. I just want to avoid writing (not to mention that I did not
find any way of doing such a check in .NET)
 if (char is Arabic || char is Hebrew || char is Urdu || char is Persian ||
char is Syriac)
and forgetting the Divehi case, or any new culture that comes along. I thought
that the approach 'if whatever version of Windows (or .NET) I am running
thinks it is RTL, I should think so as well' would do the trick.
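
As a rough illustration of "let the platform's character tables decide" instead of enumerating scripts by hand, the sketch below classifies single characters by their Unicode bidi class. It uses Python's unicodedata module as a portable stand-in; it is not GetStringTypeEx, and treating the classes R/AL and L as loosely analogous to C2_RIGHTTOLEFT and C2_LEFTTORIGHT is an assumption made here for illustration. The helper name strong_direction is hypothetical.

```python
import unicodedata

def strong_direction(ch):
    """Classify one character as 'rtl', 'ltr' or 'neutral' by its bidi class."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):   # strong right-to-left classes
        return "rtl"
    if bidi == "L":           # strong left-to-right class
        return "ltr"
    return "neutral"          # digits, whitespace, punctuation, ...

# No script is named in the function itself; Thaana (used for Divehi)
# and Syriac are picked up by the table lookup like Hebrew and Arabic.
samples = {
    "Latin A": "A",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "Syriac alaph": "\u0710",
    "Thaana haa": "\u0780",
    "digit 1": "1",
    "space": " ",
}

for name, ch in samples.items():
    print(name, "->", strong_direction(ch))
```

The answer is only as current as the character database shipped with the runtime, which matches the 'whatever version I am running thinks' idea.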

> I think most of the time text content is in a single language.
> A document is mostly in language A, with small chunks of other languages.
[quoted text clipped - 3 lines]
> Think MS Word, where you can mark text sections with a different language
> for spell-checking.

Yes, I agree; I wanted to mention it with your example too. I know the text
I'm displaying will always be entirely (or, rarely, except for a word or two) in
the same language. So I can afford to just check the first character in a
title, for example.

> If possible it would be a good idea to tag the documents
> (if not paragraphs, or records, or whatever) with a full locale ID,
[quoted text clipped - 7 lines]
> display),
> it is best to know what is the locale of that text.

Well, fortunately enough, I define the schema here and I could make some
changes or improvements. I have a set of data coming from different cultures,
and as Michael has written in his blog and suggested to me as well, the user
most likely expects behaviour based on his own culture. So I sort
this data and do case-insensitive searching in the context of the user's culture.
All I do with the data themselves is display them. For that reason, and
because of WPF, I need to know whether I should mark the document as
RTL. The only other reason for knowing the CultureInfo I could come up with is
the ToTitleCase method, but I expect the titles of documents are already
properly cased.

The problem here is that I have data in languages which do not match
any existing culture, like Latin, Old or Middle English and so on, and artificial
languages are not ruled out either. Filtering data to show only the items in Middle
English (enm) is far more important to my application than having a
CultureInfo for the language, since I only need to display it. This is the
reason I chose the ISO-639-2 table instead of the .NET supported cultures.

If there were a table mapping ISO-639-2 or -3 languages to appropriate
CultureInfo classes, even if not accurate, my problems would be
solved. The document could keep the ISO tags and the application
would get the corresponding CultureInfo for displaying it properly. Until then,
GetStringTypeEx will do the job, I think.
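
For display purposes alone, such a table could be much smaller than a full culture mapping. The sketch below (Python, for illustration only) maps ISO 639-2 codes straight to a flow direction. The names RTL_LANGUAGES and flow_direction are hypothetical, and the set is a deliberately incomplete sample, not an authoritative list.

```python
# ISO 639-2 codes of some languages normally written right-to-left.
# Illustrative and incomplete: Arabic, Hebrew, Persian (both the
# bibliographic and terminology codes), Urdu, Syriac, Divehi,
# Yiddish, Pashto.
RTL_LANGUAGES = frozenset(
    {"ara", "heb", "per", "fas", "urd", "syr", "div", "yid", "pus"}
)

def flow_direction(iso639_2_code):
    """Return 'rtl' for codes in the table and 'ltr' for everything else."""
    return "rtl" if iso639_2_code.strip().lower() in RTL_LANGUAGES else "ltr"

print(flow_direction("heb"))  # rtl
print(flow_direction("enm"))  # ltr (Middle English)
print(flow_direction("lat"))  # ltr (Latin)
```

A language code alone cannot decide the direction for languages written in more than one script, so a character-based check is still a useful fallback.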

 Thank you for your hints and thoughts.
        Jan
Mihai N. - 14 Sep 2007 05:24 GMT
>  ... will always be entirely (or, rarely, except for a word or two) in
> the same language. So I can afford to just check the first character in a
> title for example.
If you don't notice any performance hit, try going beyond the first
character, exactly for the rare "word or two," or digits, or other
characters.
Maybe calculate a percentage (72% rtl, 12% ltr, 6% others), establish
a threshold, and go from there.
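
A minimal sketch of that idea, in Python with the standard unicodedata module standing in for the platform call. The function names and the 0.5 threshold are assumptions chosen for illustration, and here the threshold is applied to the strongly directional characters only.

```python
import unicodedata

def direction_shares(text):
    """Return the fraction of characters that are strong RTL, strong LTR, or other."""
    counts = {"rtl": 0, "ltr": 0, "other": 0}
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):      # Hebrew, Arabic, Syriac, Thaana, ...
            counts["rtl"] += 1
        elif bidi == "L":            # Latin, Greek, Cyrillic, CJK, ...
            counts["ltr"] += 1
        else:                        # digits, spaces, punctuation, ...
            counts["other"] += 1
    total = len(text) or 1           # avoid dividing by zero on empty input
    return {kind: n / total for kind, n in counts.items()}

def guess_direction(text, threshold=0.5):
    """Return 'rtl' if the RTL share of the strong characters reaches the
    threshold, 'ltr' if it does not, and None if there are no strong characters."""
    shares = direction_shares(text)
    strong = shares["rtl"] + shares["ltr"]
    if strong == 0:
        return None
    return "rtl" if shares["rtl"] / strong >= threshold else "ltr"

# A mostly Hebrew title that happens to start with a Latin-script word.
title = "WPF \u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"
print(direction_shares(title))
print(guess_direction(title))        # rtl
```

A first-character test would call this title LTR; counting lets the two Hebrew words outweigh the leading "WPF".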

> The problem here is that I have data in languages which do not match
> any existing culture, like Latin, Old or Middle English and so on, and
> artificial languages are not ruled out either.

Yes, I understand how this can be a problem :-)

If you can control the environment (and it is Vista) you can create your
own custom locales.

See:
http://blogs.msdn.com/shawnste/archive/2005/11/23/496440.aspx
http://msdn.microsoft.com/msdnmag/issues/06/12/LocaleHero/
http://msdn.microsoft.com/msdnmag/issues/06/06/CLRInsideOut/
http://windowsvistablog.com/blogs/windowsvista/archive/2006/07/19/442572.aspx

And the tools:
- Microsoft Locale Builder (Beta 2)
 http://www.microsoft.com/downloads/details.aspx?FamilyID=e4588c5e-8f21-45cc-b862-38df8d9bd528&DisplayLang=en
- Microsoft Keyboard Layout Creator
 http://www.microsoft.com/globaldev/tools/msklc.mspx


Jan Kucera - 14 Sep 2007 08:28 GMT
> Maybe calculate a percentage (72% rtl, 12% ltr, 6% others), establish
> a threshold, and go from there.

Yes, I thought about this already. It should not cost much performance, since
I'm checking only the title of the document. But I think I'll try to keep it simple
for the moment (GetStringTypeEx works as expected, thanks!) until I find
any problematic data or solve the problem another way.

> If you can control the environment (and it is Vista) you can create your
> own custom locales.

Thanks for the links. Regardless of whether I could afford to support only
Vista... well... there are 500 items in ISO-639-2 and 7500 in ISO-639-3...
Uh... :-)) I've never even heard of most of them, let alone know the
culture/language deeply enough to be able to create a corresponding CultureInfo.

Jan
Michael S. Kaplan [MSFT] - 14 Sep 2007 08:13 GMT
Jan,

You can use code like in this post:

http://blogs.msdn.com/michkap/archive/2007/01/06/1421178.aspx

or use GetStringTypeW to get the info back.

Signature

MichKa [Microsoft]
Fundamentals Technical Lead
Windows International
Blog: http://blogs.msdn.com/michkap

This posting is provided "AS IS" with
no warranties, and confers no rights.

Jan Kucera - 14 Sep 2007 08:39 GMT
> Jan,
>
> You can use code like in this post:
> http://blogs.msdn.com/michkap/archive/2007/01/06/1421178.aspx
> or use GetStringTypeW to get the info back.

Hmmm... thanks for the managed way, Michael!
Although I'd have to find a very good reason to leave PInvoke and move to
Reflection... ;-)

Any improvements in .NET 3.0 or 3.5?
Jan
Michael S. Kaplan [MSFT] - 14 Sep 2007 14:20 GMT
Unfortunately, no -- red bits/green bits rules, you see. :-(

