.NET Forum / .NET Framework / XML / March 2008

Any way to parse unstructured data with XML? Example included

rajc_144@msn.com - 11 Mar 2008 00:58 GMT
I am doing some research where I need to parse some data from the SEC web site.
The data is not in XML format and is sort of unstructured.
Can someone recommend a way to parse this data?

I need to gather a lot of filings of this sort, which I would rather not do
manually.
How can I programmatically parse these sorts of text files?

http://www.sec.gov/Archives/edgar/data/1074272/0001074272-08-000001.txt

http://www.sec.gov/Archives/edgar/data/1428793/000121465908000555/0001214659-08-000555.txt


http://www.sec.gov/Archives/edgar/data/791191/0000791191-08-000001.txt

Thank you,
Martin Honnen - 11 Mar 2008 13:33 GMT
> i am doing some research where i need to parse some data from SEC web site.
> the data is not in xml format and sort of unstructured.
[quoted text clipped - 5 lines]
>
> http://www.sec.gov/Archives/edgar/data/1074272/0001074272-08-000001.txt

As that document seems to be a mixture of XML and plain text, I would
consider a mixed approach: use an XML parser to parse the XML, then use
regular-expression-based parsing for the plain text.
XSLT 1.0 is certainly not a language suitable for that task. If
you use XSLT 2.0, however, you have support for regular expressions.
There are currently three XSLT 2.0 processors: Saxon 9 has a Java and a .NET
version (http://saxon.sourceforge.net/), AltovaXML is a COM solution
(http://www.altova.com/altovaxml.html), and Gestalt is an Eiffel
implementation (http://gestalt.sourceforge.net/).
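A minimal sketch of the mixed approach in Python, assuming a filing laid out as plain-text "KEY: value" header lines plus an XML payload wrapped in <XML> ... </XML> marker lines. The sample text, the element names and both regular expressions are illustrative assumptions, not a description of the actual EDGAR format, so check them against the real files before relying on them.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout: plain-text "KEY: value" header lines
# followed by an XML payload wrapped in <XML> ... </XML> marker lines.
SAMPLE = """\
<SEC-HEADER>
ACCESSION NUMBER:\t\t0000000000-08-000001
CONFORMED SUBMISSION TYPE:\t4
FILED AS OF DATE:\t\t20080104
</SEC-HEADER>
<DOCUMENT>
<TYPE>4
<TEXT>
<XML>
<?xml version="1.0"?>
<filingData>
  <issuer><issuerName>Example Corp</issuerName></issuer>
</filingData>
</XML>
</TEXT>
</DOCUMENT>
"""

# Everything between an <XML> marker and the next </XML> marker
# (surrounding whitespace is excluded so the XML declaration comes first).
XML_BLOCK = re.compile(r"<XML>\s*(.*?)\s*</XML>", re.DOTALL)
# A plain-text header line: upper-case key, colon, tabs/spaces, then a value.
HEADER_LINE = re.compile(r"^([A-Z][A-Z /-]*):[ \t]+(\S.*?)[ \t]*$", re.MULTILINE)


def parse_filing(text):
    """Split a mixed filing into header fields and parsed XML root elements."""
    # 1. Hand each embedded XML payload to a real XML parser.
    xml_roots = [ET.fromstring(block) for block in XML_BLOCK.findall(text)]
    # 2. Cut the XML out so the regex only sees the plain-text part.
    plain = XML_BLOCK.sub("", text)
    header = dict(HEADER_LINE.findall(plain))
    return header, xml_roots


if __name__ == "__main__":
    header, roots = parse_filing(SAMPLE)
    print(header["FILED AS OF DATE"])              # 20080104
    print(roots[0].findtext("issuer/issuerName"))  # Example Corp
```

Removing the XML blocks before running the header regex keeps the two parsers from stepping on each other: the XML parser never sees the loose text, and the regex never has to cope with markup.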

If you want to do it with the tools available in the .NET Framework class
library, then combine XmlReader or XPathDocument/XPathNavigator with the
regular expression support in the .NET Framework (System.Text.RegularExpressions).
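The same pairing of a forward-only XML reader with regular expressions can be sketched in Python, with ElementTree's iterparse standing in for XmlReader (this is a rough analog, not the .NET API). The <footnote> element name, the sample document and the dollar-amount pattern are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Dollar amounts such as "$12.50" or "$1,234.00" inside free text.
DOLLARS = re.compile(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d+)?)")

# Made-up XML whose free-text elements hide the values we want.
SAMPLE = b"""<?xml version="1.0"?>
<footnotes>
  <footnote id="F1">Weighted average price of $12.50 per share.</footnote>
  <footnote id="F2">Prices ranged from $1,234.00 to $1,300.75.</footnote>
</footnotes>
"""


def dollar_amounts(xml_bytes, tag):
    """Stream through the XML and collect dollar amounts found in <tag> text."""
    amounts = []
    # iterparse reports each element as soon as its end tag has been read;
    # clearing handled elements keeps memory use low on large documents.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag and elem.text:
            for match in DOLLARS.findall(elem.text):
                amounts.append(float(match.replace(",", "")))
        elem.clear()  # drop the element's text and children once handled
    return amounts


if __name__ == "__main__":
    print(dollar_amounts(SAMPLE, "footnote"))  # [12.5, 1234.0, 1300.75]
```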

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/

raj@aol.com - 12 Mar 2008 01:11 GMT
Thank you very much!

©2008 Advenet LLC
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.