Char.IsPunctuation vs. CRT is(w)punct

Jeff Pek (Autodesk) - 06 Nov 2006 12:40 GMT
Hi all -

A KB article indicates that Char.IsPunctuation is the "equivalent" of the
CRT's isXpunct (e.g., iswpunct) function in .NET. However, I've found
significant differences in their behaviors. As a test, I ran each function
over the first 1000 or so Unicode characters and got the results that
follow. The list identifies the characters for which the two functions
returned different results, and shows what the .NET method said. I'm sure
there are other differences later on in the character set.

So far, I haven't seen any documentation regarding the specific differences.
I wonder if anything exists. Thanks for any pointers.

Regards,
 Jeff

--------

IsPunctuation mismatch: ! (33). .NET says: True
IsPunctuation mismatch: " (34). .NET says: True
IsPunctuation mismatch: # (35). .NET says: True
IsPunctuation mismatch: $ (36). .NET says: False
IsPunctuation mismatch: % (37). .NET says: True
IsPunctuation mismatch: & (38). .NET says: True
IsPunctuation mismatch: ' (39). .NET says: True
IsPunctuation mismatch: ( (40). .NET says: True
IsPunctuation mismatch: ) (41). .NET says: True
IsPunctuation mismatch: * (42). .NET says: True
IsPunctuation mismatch: + (43). .NET says: False
IsPunctuation mismatch: , (44). .NET says: True
IsPunctuation mismatch: - (45). .NET says: True
IsPunctuation mismatch: . (46). .NET says: True
IsPunctuation mismatch: / (47). .NET says: True
IsPunctuation mismatch: : (58). .NET says: True
IsPunctuation mismatch: ; (59). .NET says: True
IsPunctuation mismatch: < (60). .NET says: False
IsPunctuation mismatch: = (61). .NET says: False
IsPunctuation mismatch: > (62). .NET says: False
IsPunctuation mismatch: ? (63). .NET says: True
IsPunctuation mismatch: @ (64). .NET says: True
IsPunctuation mismatch: [ (91). .NET says: True
IsPunctuation mismatch: \ (92). .NET says: True
IsPunctuation mismatch: ] (93). .NET says: True
IsPunctuation mismatch: ^ (94). .NET says: False
IsPunctuation mismatch: _ (95). .NET says: True
IsPunctuation mismatch: ` (96). .NET says: False
IsPunctuation mismatch: { (123). .NET says: True
IsPunctuation mismatch: | (124). .NET says: False
IsPunctuation mismatch: } (125). .NET says: True
IsPunctuation mismatch: ~ (126). .NET says: False
IsPunctuation mismatch: ¡ (161). .NET says: True
IsPunctuation mismatch: ¢ (162). .NET says: False
IsPunctuation mismatch: £ (163). .NET says: False
IsPunctuation mismatch: ¤ (164). .NET says: False
IsPunctuation mismatch: ¥ (165). .NET says: False
IsPunctuation mismatch: ¦ (166). .NET says: False
IsPunctuation mismatch: § (167). .NET says: False
IsPunctuation mismatch: ¨ (168). .NET says: False
IsPunctuation mismatch: © (169). .NET says: False
IsPunctuation mismatch: ª (170). .NET says: False
IsPunctuation mismatch: « (171). .NET says: True
IsPunctuation mismatch: ¬ (172). .NET says: False
IsPunctuation mismatch: - (173). .NET says: True
IsPunctuation mismatch: ® (174). .NET says: False
IsPunctuation mismatch: ¯ (175). .NET says: False
IsPunctuation mismatch: ° (176). .NET says: False
IsPunctuation mismatch: ± (177). .NET says: False
IsPunctuation mismatch: ² (178). .NET says: False
IsPunctuation mismatch: ³ (179). .NET says: False
IsPunctuation mismatch: ´ (180). .NET says: False
IsPunctuation mismatch: µ (181). .NET says: False
IsPunctuation mismatch: ¶ (182). .NET says: False
IsPunctuation mismatch: · (183). .NET says: True
IsPunctuation mismatch: ¸ (184). .NET says: False
IsPunctuation mismatch: ¹ (185). .NET says: False
IsPunctuation mismatch: º (186). .NET says: False
IsPunctuation mismatch: » (187). .NET says: True
IsPunctuation mismatch: ¼ (188). .NET says: False
IsPunctuation mismatch: ½ (189). .NET says: False
IsPunctuation mismatch: ¾ (190). .NET says: False
IsPunctuation mismatch: ¿ (191). .NET says: True
IsPunctuation mismatch: × (215). .NET says: False
IsPunctuation mismatch: ÷ (247). .NET says: False
IsPunctuation mismatch: ; (894). .NET says: True
IsPunctuation mismatch: · (903). .NET says: True
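A note on reading this list: for the ASCII entries, only the lines marked False are real disagreements. Characters such as ! (33) or ( (40) are punctuation to both iswpunct and Char.IsPunctuation, yet they still appear. One plausible cause, assumed here because the test code was not posted, is comparing iswpunct's return value, which C only guarantees to be nonzero (not necessarily 1), directly against a boolean. A minimal C sketch of a comparison loop that normalizes the CRT result first; report_mismatches, bool_pred and same_as_crt are hypothetical names, and the second classifier stands in for the .NET call:

```c
#include <stdbool.h>
#include <stdio.h>
#include <wctype.h>

/* A classifier that already returns a proper bool, standing in for
   Char.IsPunctuation (hypothetical; the real call lives in .NET). */
typedef bool (*bool_pred)(wint_t c);

/* Walks code points [0, limit) and counts those on which iswpunct and
   `other` disagree. iswpunct only promises a nonzero value for "true",
   so its result is normalized with != 0 before comparing.
   Each mismatch is printed to `out` unless `out` is NULL. */
static int report_mismatches(bool_pred other, wint_t limit, FILE *out)
{
    int count = 0;
    for (wint_t c = 0; c < limit; ++c) {
        bool crt = iswpunct(c) != 0;   /* any nonzero value means true */
        bool net = other(c);
        if (crt != net) {
            ++count;
            if (out != NULL)
                fprintf(out, "IsPunctuation mismatch: U+%04X. CRT says: %s, other says: %s\n",
                        (unsigned)c, crt ? "True" : "False", net ? "True" : "False");
        }
    }
    return count;
}

/* Example classifier that agrees with the CRT by construction. */
static bool same_as_crt(wint_t c)
{
    return iswpunct(c) != 0;
}
```

Called as report_mismatches(my_classifier, 1000, stdout), it prints one line per genuine disagreement, so characters on which both sides agree no longer show up.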
RobinS - 06 Nov 2006 17:28 GMT
Well, technically, the ones that .NET is not marking
as punctuation are NOT punctuation. In what sentence
do you use > or = or < or << or $ as punctuation?
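RobinS's reading matches how Char.IsPunctuation is documented: it returns true only for the Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po). Every ASCII character it rejects in the list above is a symbol instead: $ is Sc (currency), + < = > | ~ are Sm (math), and ^ and ` are Sk (modifier). A minimal C sketch that hard-codes those categories for printable ASCII only; ascii_category and unicode_is_punct_ascii are hypothetical helpers, not part of any library:

```c
#include <stdbool.h>
#include <stddef.h>

/* Unicode general category of a printable ASCII character that is
   neither a letter, a digit, nor a space; NULL for anything else.
   Values transcribed from the Unicode Character Database (U+0021..U+007E). */
static const char *ascii_category(int c)
{
    switch (c) {
    case '$':                                   return "Sc"; /* currency symbol */
    case '+': case '<': case '=': case '>':
    case '|': case '~':                         return "Sm"; /* math symbol */
    case '^': case '`':                         return "Sk"; /* modifier symbol */
    case '(': case '[': case '{':               return "Ps"; /* open punctuation */
    case ')': case ']': case '}':               return "Pe"; /* close punctuation */
    case '-':                                   return "Pd"; /* dash punctuation */
    case '_':                                   return "Pc"; /* connector punctuation */
    case '!': case '"': case '#': case '%': case '&': case '\'':
    case '*': case ',': case '.': case '/': case ':': case ';':
    case '?': case '@': case '\\':              return "Po"; /* other punctuation */
    default:                                    return NULL;
    }
}

/* Mimics Char.IsPunctuation for ASCII: true only for the P* categories. */
static bool unicode_is_punct_ascii(int c)
{
    const char *cat = ascii_category(c);
    return cat != NULL && cat[0] == 'P';
}
```

Of the 32 ASCII characters the CRT calls punctuation, this leaves 23 in the P* categories and 9 symbols, which are exactly the nine ASCII lines marked False in the list.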

You might check out Char.IsWhiteSpace to take out
some of the weird control characters.

What exactly are you trying to accomplish?

Robin S.
Jeff Pek (Autodesk) - 06 Nov 2006 18:25 GMT
I agree. The issue here is that there is some existing C++ code that I'm
trying to refactor and use within a C# library. I'd like to have equivalent
functionality; this is one important aspect of accomplishing that.

I could use a C++/CLI module to ensure equivalent behavior, but I'd like to
avoid that.
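For the ASCII range, the CRT behavior can be reproduced without any native code. In the "C" locale, the C standard defines ispunct as true for every printing character that is neither a space nor alphanumeric, which leaves exactly the characters from 0x21 to 0x7E that are not letters or digits. A minimal C sketch of that rule; crt_like_ispunct and count_disagreements_with_ispunct are hypothetical names. The rule itself uses no library calls, so it ports line for line to C#. It deliberately returns false above 0x7F, where iswpunct depends on the locale and the platform's tables, so it does not by itself reproduce the Latin-1 and Greek results above:

```c
#include <ctype.h>
#include <stdbool.h>

/* "C"-locale punctuation rule for 7-bit ASCII, without calling the CRT:
   graphic (0x21..0x7E; 0x20 is the only printing space) and not a
   letter or digit. Returns false for everything outside ASCII. */
static bool crt_like_ispunct(int c)
{
    bool graphic = c >= 0x21 && c <= 0x7E;
    bool digit   = c >= '0' && c <= '9';
    bool letter  = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    return graphic && !digit && !letter;
}

/* Cross-check against the real ispunct in the "C" locale (the default
   until setlocale is called): number of ASCII values that disagree. */
static int count_disagreements_with_ispunct(void)
{
    int n = 0;
    for (int c = 0; c < 128; ++c)
        if (crt_like_ispunct(c) != (ispunct(c) != 0))
            ++n;
    return n;
}
```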

Thanks for the response.

Jeff

RobinS - 07 Nov 2006 03:40 GMT
So are you just trying to clear all the junk out of a string,
or do you want to know if there's junk in the string, or what?

If that's the case, you could write a function to do that, and
just call it. Would that work?
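A minimal C sketch of such a function, assuming the goal is to delete every character a classifier flags; strip_if is a hypothetical name. Passing ispunct gives the CRT's notion of punctuation for single-byte text in the default "C" locale, and any other int (*)(int) predicate can be swapped in:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Removes, in place, every character of the NUL-terminated string `s`
   for which `drop` returns nonzero, and returns the new length.
   Characters are passed as unsigned char, as <ctype.h> requires. */
static size_t strip_if(char *s, int (*drop)(int))
{
    assert(s != NULL && drop != NULL);
    char *out = s;
    for (const char *in = s; *in != '\0'; ++in)
        if (!drop((unsigned char)*in))
            *out++ = *in;          /* keep this character */
    *out = '\0';
    return (size_t)(out - s);
}
```

For example, stripping "Hello, world! (v1.0)" with ispunct leaves "Hello world v10".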

Robin S.

Ben Voigt - 07 Nov 2006 21:05 GMT
>I agree. The issue here is that there is some existing C++ code that I'm
>trying to refactor and use within a C# library. I'd like to have equivalent
>functionality; this is one important aspect of accomplishing that.
>
> I could use a C++/CLI module to ensure equivalent behavior, but I'd like
> to avoid that.

I think you could just P/Invoke iswpunct from MSVCR80.DLL, if there's a
function definition and not just a macro.
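On the macro question: the C standard requires every function declared in a standard header to exist as an actual function even when the header also defines a macro of the same name, and a function-like macro is only expanded when the name is directly followed by (. So writing (iswpunct)(c), or taking the function's address, always reaches the real function. Whether a particular DLL exports it is a separate question that dumpbin /exports can answer. A small illustration; wide_classifier, punct_fn and is_punct_via_function are hypothetical names:

```c
#include <wctype.h>

/* Pointer type matching the standard iswpunct signature. */
typedef int (*wide_classifier)(wint_t);

/* Taking the address yields the real library function: a function-like
   macro is only expanded when the name is followed by '('. */
static const wide_classifier punct_fn = iswpunct;

/* Parenthesizing the name also suppresses macro expansion, so this calls
   the function even if <wctype.h> defines iswpunct as a macro. */
static int is_punct_via_function(wint_t c)
{
    return (iswpunct)(c) != 0;
}
```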

Chris Mullins - 08 Nov 2006 01:23 GMT
Once you hit Unicode land, I think determining punctuation is difficult.
There is a good answer though:
Stringprep - http://www.ietf.org/rfc/rfc3454.txt

Stringprep addresses case folding, whitespace, prohibited characters,
bidirectional validity, and normalization form.

An example profile is nameprep, which is how Internationalized Domain Names
work:
http://tools.ietf.org/html/rfc3491

Another example profile is "resourceprep" which is part of the XMPP
standard:
http://www.xmpp.org/internet-drafts/attic/draft-ietf-xmpp-resourceprep-03.html

For example, this profile prohibits all characters in:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9

It specifies Unicode normalization form KC, and that bidirectional checking
must be performed.
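Checking a profile's prohibited tables comes down to a range lookup per code point. A minimal C sketch, assuming a hand-transcribed subset of the RFC 3454 tables (C.1.2 non-ASCII spaces, C.2.1 ASCII controls, C.3 private use, C.5 surrogates, C.9 tagging characters); the remaining tables in the list above are left out, so verify against the RFC before relying on it. cp_range, prohibited and is_prohibited are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point range. */
typedef struct { uint32_t lo, hi; } cp_range;

/* Hand-transcribed subset of the RFC 3454 prohibition tables. */
static const cp_range prohibited[] = {
    /* C.1.2 Non-ASCII space characters */
    {0x00A0, 0x00A0}, {0x1680, 0x1680}, {0x2000, 0x200B},
    {0x202F, 0x202F}, {0x205F, 0x205F}, {0x3000, 0x3000},
    /* C.2.1 ASCII control characters */
    {0x0000, 0x001F}, {0x007F, 0x007F},
    /* C.3 Private use */
    {0xE000, 0xF8FF}, {0xF0000, 0xFFFFD}, {0x100000, 0x10FFFD},
    /* C.5 Surrogate codes */
    {0xD800, 0xDFFF},
    /* C.9 Tagging characters */
    {0xE0001, 0xE0001}, {0xE0020, 0xE007F},
};

/* Linear scan: adequate for a table this small; a full implementation
   would sort the ranges and binary-search them. */
static bool is_prohibited(uint32_t cp)
{
    for (size_t i = 0; i < sizeof prohibited / sizeof prohibited[0]; ++i)
        if (cp >= prohibited[i].lo && cp <= prohibited[i].hi)
            return true;
    return false;
}
```

The plain ASCII space U+0020 passes this check, since it sits in table C.1.1, which the profile does not list.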

--
Chris Mullins

Chris Mullins - 08 Nov 2006 01:33 GMT
I should add there is an open-source C# implementation of stringprep that is
part of libidn. This implementation is a bit memory hungry, and not exactly
tuned for optimal performance, but it works.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins

Jeff Pek (Autodesk) - 09 Nov 2006 22:34 GMT
Thanks, all. This is all good stuff. What I was trying to do was to mimic
the behavior of iswpunct (and therefore the existing code). P/Invoking
iswpunct seems reasonable, provided that I know that that DLL is going to be
there.

- jp

Ben Voigt - 10 Nov 2006 16:41 GMT
> Thanks, all. This is all good stuff. What I was trying to do was to mimic
> the behavior of iswpunct (and therefore the existing code). PInvoking
> iswpunct seems reasonable, provided that I know that that DLL is going to
> be there.

MSVCRT.DLL has been distributed with recent versions of Windows, and with
service packs for not-so-recent versions, and it exports all the character
classification functions.

