.NET Forum / .NET Framework / ADO.NET / February 2008
Incredibly slow writing by CSV driver
|
|
Thread rating:  |
Norman Diamond - 08 Feb 2008 06:05 GMT Since my application uses the Excel driver to read some files, I used the same Excel driver to write some CSV files. I also tried using the Text driver to write CSV files. The results are correct but they take enormous amounts of CPU time.
Of course I know how to write a CSV file using plain old Shift-JIS character strings and quotation marks and commas, and probably I'll do that in order to speed up this operation, but the question still remains.
Why is this so slow? The sample data table had 213 rows (header plus 212 data rows) and 25 columns, all strings (some of them empty strings). The total file size on disk is 34KB. The computations and database operations in memory take a few milliseconds, not even noticeable when running under a debugger. But the call to dataAdapter.Update(dataTable); takes 55 SECONDS OF CPU TIME on a Pentium 4 running at 3 GHz. It occupies 100% of one CPU core for 55 seconds.
Actual time to write the file might be a few hundred milliseconds since 34KB occupies several NTFS structures. Anyway, this thing isn't disk bound, and it's not CPU bound in my code, it's CPU bound in the Update method.
What is going on here?
#undef UseOdbc // Use OleDb string fileName = @"C:\test.csv"; // (not really) string Headers[] = new string[25] { "1", "2", "3", /* ... */ "25" }; int columnCount = 25; // (not really) FileInfo fileInfo = new System.IO.FileInfo(fileName); string dirName = fileInfo.DirectoryName; string tableName = fileInfo.Name; #if UseOdbc OdbcConnection connection = new OdbcConnection( "Provider=MSDASQL;" + "DRIVER={Microsoft Text Driver (*.txt; *.csv)};DBQ=" + dirName + ";Extended Properties='Text;Extensions=asc,csv,tab,txt;" + "HDR=Yes;FMT=Delimited'"); connection.Open(); OdbcDataAdapter dataAdapter = new OdbcDataAdapter( "SELECT * FROM [" + tableName + "]", connection); OdbcCommand insertCmd = new OdbcCommand(); OdbcType varcharType = OdbcType.VarChar; #else // OleDb OleDbConnection connection = new OleDbConnection( "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dirName + ";Extended Properties='Text;Extensions=asc,csv,tab,txt;" + "HDR=Yes;FMT=Delimited'"); connection.Open(); OleDbDataAdapter dataAdapter = new OleDbDataAdapter( "SELECT * FROM [" + tableName + "]", connection); OleDbCommand insertCmd = new OleDbCommand(); OleDbType varcharType = OleDbType.VarChar; #endif // OleDb DataTable dataTable = new DataTable(tableName); DataColumnCollection dataColumns = dataTable.Columns; int columnNum; string dataColumnName; DataColumn dataColumn; StringBuilder insertCmdHead = new StringBuilder( "INSERT INTO [" + tableName + "] ("); StringBuilder insertCmdTail = new StringBuilder("VALUES ("); StringBuilder fileHeader = new StringBuilder(); for (columnNum = 0; columnNum < columnCount - 1; columnNum++) { dataColumnName = Headers[columnNum]; dataColumnName = dataColumnName.Replace('\n', ' '); dataColumn = dataColumns.Add(dataColumnName); dataColumn.DataType = typeof(string); dataColumn.DefaultValue = ""; insertCmdHead.Append("[" + dataColumnName + "], "); insertCmdTail.Append("?, "); insertCmd.Parameters.Add(dataColumnName, varcharType, 255, dataColumnName); fileHeader.Append("\"" + dataColumnName + "\","); } dataColumnName = Headers[columnCount - 1]; dataColumn = dataColumns.Add(dataColumnName); dataColumn.DataType = typeof(string); dataColumn.DefaultValue = ""; insertCmdHead.Append("[" + dataColumnName + "]) "); insertCmdTail.Append("?)"); insertCmd.Parameters.Add(dataColumnName, varcharType, 255, dataColumnName); insertCmd.CommandText = insertCmdHead.ToString() + insertCmdTail.ToString(); insertCmd.Connection = connection; dataAdapter.InsertCommand = insertCmd; fileHeader.Append("\"" + dataColumnName + "\""); for (int rowNum = 0; rowNum < 212; rowNum++) // (not really) { DataRow dataRow = dataTable.NewRow(); dataRow[2] = "hi i'm 2"; // (not really) dataRow[18] = "18"; // (not really) dataRow[19] = "19"; // (not really) // (around half the cells default to empty strings) dataTable.Rows.Add(dataRow); } StreamWriter fileWriter = new StreamWriter(fileInfo.Create(), Encoding.Default); fileWriter.WriteLine(fileHeader.ToString()); fileWriter.Close(); fileWriter.Dispose(); // UP TO THIS POINT TAKES A FEW MILLISECONDS, OK // // 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES dataAdapter.Update(dataTable); // 55 SECONDS TO WRITE 34 KILOBYTES // 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES // // OK AFTER THIS dataAdapter.Dispose(); dataTable.Dispose(); connection.Close();
Steven Cheng[MSFT] - 08 Feb 2008 10:30 GMT Hi Norman,
From your description, you're encountering high CPU issue when loading a csv file via the OLEDB provider, correct?
As for the high cpu behavior when loading such data file, I think the following things maybe the potential cause:
** the number of records in the csv file ** the data content that is contained in each record.
As for number of record, I think you can try reduce the number of record and columns and test again. As for the content, sometimes the provider may run into poor performance when some particular data(characters) in the file cause the provider spend much time parsing it. Therefore, you can also try test by isolating the data in the csv files(check whether there are some records contains particular characters ).
Sincerely,
Steven Cheng
Microsoft MSDN Online Support Lead
==================================================
Get notification to my posts through email? Please refer to http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif ications.
Note: The MSDN Managed Newsgroup support offering is for non-urgent issues where an initial response from the community or a Microsoft Support Engineer within 1 business day is acceptable. Please note that each follow up response may take approximately 2 business days as the support professional working with you may need further investigation to reach the most efficient resolution. The offering is not appropriate for situations that require urgent, real-time or phone-based interactions or complex project analysis and dump analysis issues. Issues of this nature are best handled working with a dedicated Microsoft Support Engineer by contacting Microsoft Customer Support Services (CSS) at http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.--------------------
>From: "Norman Diamond" <ndiamond@newsgroup.nospam> >Subject: Incredibly slow writing by CSV driver >Date: Fri, 8 Feb 2008 15:05:09 +0900
>Since my application uses the Excel driver to read some files, I used the >same Excel driver to write some CSV files. I also tried using the Text [quoted text clipped - 108 lines] > dataTable.Dispose(); > connection.Close(); Norman Diamond - 08 Feb 2008 10:55 GMT The high CPU issue is when WRITING a CSV file via either the OleDb or Odbc driver.
READING is no problem, for these particular CSV files.
I already told you the number of records in the CSV file: 213, consisting of a header row and 212 data rows. Each row contains 25 columns, about half of which are text and half are null, and the nulls convert to "" because all of the column definitions are set to be text. The total file size is around 34,000 bytes. Don't you think that 55 seconds of CPU time to format 55,000 bytes is pretty slow? Your posting includes my message where I showed all this.
I wrote C# code to write the strings to the file myself. It takes a few milliseconds now. I tested reading it back using the OleDb text driver, and it reads correctly in a few milliseconds.
> Hi Norman, > [quoted text clipped - 163 lines] >> dataTable.Dispose(); >> connection.Close(); Patrice - 08 Feb 2008 12:28 GMT I would likely try to see if some tools from http://technet.microsoft.com/fr-fr/sysinternals/default.aspx such as "filemon" would allow to track a bit what could happen behind the scene (I suspect the driver could do a bit more than what you expect such as rewriting the whole file on each insert ?)
Generally my personal preference is to handle these kind of files by myself -- Patrice
> The high CPU issue is when WRITING a CSV file via either the OleDb or Odbc > driver. [quoted text clipped - 188 lines] >>> dataTable.Dispose(); >>> connection.Close(); Norman Diamond - 12 Feb 2008 00:42 GMT Of course filemon is a very useful tool, but how would it solve the problem of 55 seconds of CPU time? The 34000 bytes of file contents get written correctly to the file. The CPU is not waiting for the disk, the disk is waiting for the CPU.
I worked around it by writing this file myself too. It takes a few milliseconds now.
>I would likely try to see if some tools from >http://technet.microsoft.com/fr-fr/sysinternals/default.aspx such as [quoted text clipped - 206 lines] >>>> dataTable.Dispose(); >>>> connection.Close(); Steven Cheng[MSFT] - 12 Feb 2008 05:14 GMT Hi Norman,
For such high cpu issue, it is likely to be issue specific. I've checked some former issue and there is no definite problem of the provider and most of such issues are due to the data that is read or something related to the running environment. To troubleshoot on such issue and get the root cause, it may require much more work like dump analysis which is complex and time costing.
If this problem is urgent and must be fixed in short time, I suggest you contact product support service for further assistance: http://msdn.microsoft.com/subscriptions/support/default.aspx.
Sincerely,
Steven Cheng
Microsoft MSDN Online Support Lead
This posting is provided "AS IS" with no warranties, and confers no rights. --------------------
>From: "Norman Diamond" <ndiamond@newsgroup.nospam> >References: <ujmcTihaIHA.5128@TK2MSFTNGP05.phx.gbl> <0wyGz2jaIHA.360@TK2MSFTNGHUB02.phx.gbl> <#wr6XEkaIHA.5976@TK2MSFTNGP05.phx.gbl> <#e3kr4kaIHA.4880@TK2MSFTNGP03.phx.gbl>
>Subject: Re: Incredibly slow writing by CSV driver >Date: Tue, 12 Feb 2008 09:42:48 +0900
>Of course filemon is a very useful tool, but how would it solve the problem >of 55 seconds of CPU time? The 34000 bytes of file contents get written [quoted text clipped - 62 lines] >>>> >>>> Get notification to my posts through email? Please refer to http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
>>>> ications. >>>> [quoted text clipped - 147 lines] >>>>> dataTable.Dispose(); >>>>> connection.Close(); Norman Diamond - 12 Feb 2008 08:07 GMT Mr. Cheng, again, data are being WRITTEN not read. Reading is OK and only takes a few milliseconds. WRITING takes an enormously long time.
I can't imagine what needs dumping. My first posting in this thread gave sample code.
As mentioned, I worked around it the same way as Patrice, by writing my own code in C# (or other language as might be used in any project) instead of using the text driver, to WRITE CSV files.
> Hi Norman, > [quoted text clipped - 262 lines] >>>>>> dataTable.Dispose(); >>>>>> connection.Close(); Steven Cheng[MSFT] - 13 Feb 2008 03:15 GMT Hi Norman,
Yes, I saw that you got the workaround. The further troubleshooting I mentioned means the work that required to identify what is the exact things cause the poor performance when writing the data into the csv file through OLE db provider. Normally, we are not abvle to give a rapid answer for such issue.
Sincerely,
Steven Cheng
Microsoft MSDN Online Support Lead
This posting is provided "AS IS" with no warranties, and confers no rights. --------------------
>From: "Norman Diamond" <ndiamond@newsgroup.nospam> >Subject: Re: Incredibly slow writing by CSV driver >Date: Tue, 12 Feb 2008 17:07:31 +0900
>Mr. Cheng, again, data are being WRITTEN not read. Reading is OK and only >takes a few milliseconds. WRITING takes an enormously long time. [quoted text clipped - 113 lines] >>>>>> >>>>>> Get notification to my posts through email? Please refer to http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
>>>>>> ications. >>>>>> [quoted text clipped - 153 lines] >>>>>>> dataTable.Dispose(); >>>>>>> connection.Close(); Norman Diamond - 08 Feb 2008 11:04 GMT Correcting a typo (34,000 bytes not 55,000)
The high CPU issue is when WRITING a CSV file via either the OleDb or Odbc driver.
READING is no problem, for these particular CSV files.
I already told you the number of records in the CSV file: 213, consisting of a header row and 212 data rows. Each row contains 25 columns, about half of which are text and half are null, and the nulls convert to "" because all of the column definitions are set to be text. The total file size is around 34,000 bytes. Don't you think that 55 seconds of CPU time to format 34,000 bytes is pretty slow? Your posting includes my message where I showed all this.
I wrote C# code to write the strings to the file myself. It takes a few milliseconds now. I tested reading it back using the OleDb text driver, and it reads correctly in a few milliseconds.
> Hi Norman, > [quoted text clipped - 163 lines] >> dataTable.Dispose(); >> connection.Close();
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|