.NET Forum / .NET Framework / New Users / August 2005

Extracting words from a sentence.

David - 23 Aug 2005 22:28 GMT
Given a string that represents a sentence, is there a quick way of extracting
the words that make up the sentence?  For example, given the string "See Dick
run.", it would return an array of strings {"See","Dick","run"}.  One catch is
that it has to be locale sensitive.

I thought about just removing all the punctuation, but that leaves problems
like the word "don't".  I don't want to end up with two words, "don" and "t".
Bill Priess (MCP) - 23 Aug 2005 22:47 GMT
Hi David,

Use Regular expressions:

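// Requires: using System; using System.Text.RegularExpressions;
// \w+ rather than \w*: the * form also produces empty matches at word boundaries.
Regex re = new Regex(@"\b\w+\b");
MatchCollection mc = re.Matches("See Dick Run");
foreach (Match m in mc)
{
   Console.WriteLine(m.Value);
}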

(I didn't test the code, but it *should* be mostly correct.)

HTH,
Bill Priess MCP

Nick Hertl - 23 Aug 2005 23:06 GMT
You could also use System.String.Split(), passing the ' ' char as the
argument.
Something like this:
string s = "See Dick Run";
string[] a = s.Split(' ');
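Note that Split(' ') leaves punctuation attached to the tokens, so "See Dick run." yields "run." as the last element. A minimal sketch of one way around that, assuming it is acceptable to strip punctuation only from the two ends of each token (so an inner apostrophe as in "don't" survives). The class and method names here (SplitWords, Extract, TrimPunctuation) are made up for illustration, and this is not culture-aware:

```csharp
using System;
using System.Collections.Generic;

public static class SplitWords
{
    // Hypothetical helper: split on white space, then trim punctuation
    // from both ends of each token.
    public static string[] Extract(string sentence)
    {
        var words = new List<string>();
        // A null separator array makes Split use white-space characters as delimiters.
        foreach (string token in sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = TrimPunctuation(token);
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
        return words.ToArray();
    }

    // Removes leading and trailing punctuation; inner characters are left untouched.
    private static string TrimPunctuation(string token)
    {
        int start = 0;
        int end = token.Length;
        while (start < end && char.IsPunctuation(token[start])) start++;
        while (end > start && char.IsPunctuation(token[end - 1])) end--;
        return token.Substring(start, end - start);
    }
}
```

Calling SplitWords.Extract("See Dick run.") should give the three words without the period. Tokens that consist only of punctuation are dropped.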
David - 24 Aug 2005 14:05 GMT
Thanks for the help, Bill and Nick.

I tried Split(), but it had some issues.  In particular, given "See Dick
run." the third word that it returned would be "run.", where what I wanted
was "run".  (The period isn't part of the word.)  

The function I'm looking for would actually have to be quite sophisticated,
because the application allows the user to select a language, and so the
function would have to work with whatever rules were appropriate for the
culture of the string.  I can come up with the correct rules for English and
French, but after that, I get a bit lost.  

I was hoping that maybe the wizards who did all this .NET Framework
globalization stuff had already tackled the problem.  It was a long shot, but
sometimes I am amazed at what is buried in there, so I was hoping.

I think I'll have to use the regular expression route, which I hadn't
thought of before, and count on user feedback to improve the situation for
languages that don't seem to work.
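For the regular expression route, here is a rough sketch of a pattern that keeps "don't" in one piece and drops the trailing period from "run.". It relies on the Unicode categories that .NET regular expressions support (\p{L} letters, \p{Mn} combining marks, \p{Nd} decimal digits), so accented letters count as word characters. The names WordExtractor and ExtractWords are hypothetical, and treating an apostrophe between letters as part of the word is an assumption that suits English better than French (where "l'école" would come back as a single word). It is not locale-aware, and it will not segment languages that are written without spaces between words.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // One or more letters, marks or digits, optionally followed by groups of
    // "apostrophe + letters" (ASCII ' or the typographic U+2019).
    private static readonly Regex WordPattern = new Regex(
        @"[\p{L}\p{Mn}\p{Nd}]+(?:['\u2019][\p{L}\p{Mn}]+)*");

    // Hypothetical helper: returns the matched words in order of appearance.
    public static string[] ExtractWords(string sentence)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(sentence))
        {
            words.Add(m.Value);
        }
        return words.ToArray();
    }
}
```

With this pattern, WordExtractor.ExtractWords("See Dick run.") should return See, Dick and run, and "don't" stays a single word because the apostrophe sits between letters. Per-language rules could be handled by swapping in a different pattern for each culture.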

