This past week I revisited some of the techniques for moving Excel data into a .NET DataTable in an effort to improve the speed and reliability of my code. In a perfect world Excel would only be a destination for data, and I think there is a certain amount of futility in using Excel as a data source. In practice, though, Excel often seems like the only available option for collecting a wide variety of data (reports to Excel seem common; end-user direct database access, on the other hand…). This post does not contain any information that cannot be found in other articles, posts, and blogs, but I thought an overview with links might be useful.
The most useful list I have ever seen of links relevant to this topic: http://blogs.msdn.com/pranavwagh/articles/excel_ado.aspx
The title of the post is ‘USING ADO AND ADO.NET WITH EXCEL: Resources and Known Issues’ (it also covers automation and general Excel information).
Methods-Notes-Ideas
Jet/ADO.NET
There are many examples online using Jet (I think the two links below give enough examples for a good overview). Jet is fast and can query saved, closed files. If your data is very consistent (no column mixes values that could lead Jet to guess the wrong type) and you are interested in saved files rather than information in a running instance of Excel, this may be a good choice; it certainly seems to retrieve data quite quickly.
The chance that Jet will incorrectly identify the data type of a column and introduce errors into the data, especially when the exact data is unpredictable (dynamic!), makes me reluctant to use this solution. It may also be a poor choice for the end user if you want to import what they see on screen: the on-screen file must be saved before changes can be read, which does not seem like a good user-interface choice to me, and automating the save seems problematic and even dangerous. These links have good information: http://blog.lab49.com/?p=196 http://support.microsoft.com/?scid=kb;en-us;316934&spid=1249&sid=global
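As a rough illustration of the Jet approach, here is a minimal sketch that fills a DataTable from one worksheet of a saved .xls file. The module and function names, the workbook path and the sheet name are all placeholders of mine, not taken from the linked articles. IMEX=1 is commonly described as asking the driver to read mixed-type columns as text; its effect can depend on registry settings, so treat it as something to test rather than a cure for the type-guessing problem.

```vbnet
Imports System.Data
Imports System.Data.OleDb

Module JetExcelSketch

    'Builds a Jet connection string for an .xls workbook.
    'HDR=Yes treats the first row as column names; IMEX=1 requests
    'that columns with mixed data be read as text (assumption: test it).
    Public Function BuildExcelConnectionString( _
            ByVal workbookPath As String) As String
        Return "Provider=Microsoft.Jet.OLEDB.4.0;" & _
               "Data Source=" & workbookPath & ";" & _
               "Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"""
    End Function

    'Reads one worksheet of a saved workbook into a DataTable.
    'sheetName is the tab name without the trailing "$".
    Public Function ReadSheet(ByVal workbookPath As String, _
                              ByVal sheetName As String) As DataTable
        Dim result As New DataTable(sheetName)
        Using connection As New OleDbConnection( _
                BuildExcelConnectionString(workbookPath))
            Using adapter As New OleDbDataAdapter( _
                    "SELECT * FROM [" & sheetName & "$]", connection)
                'Fill opens and closes the connection itself.
                adapter.Fill(result)
            End Using
        End Using
        Return result
    End Function

End Module
```

A call would look something like ReadSheet("C:\data\report.xls", "Sheet1"), where both arguments are hypothetical.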
Range.Value/Arrays
Excel can return a range of values as a 2-dimensional Object array. This method is fairly fast (although it does not seem quite as fast as Jet), and the values are far more predictable, in my opinion. This solution is interesting when you want to extract the information a user is seeing on screen. For the most part it is direct and simple (although beware the confusing differences that formatting can cause between .Value and the displayed value). I like this solution and have been using it frequently. For small sets of cells it may be just as effective to loop through each cell in the range and retrieve its .Value. This is slower than reading the whole array at once (and less reliable, see: http://support.microsoft.com/kb/216400/), but it may allow you to bypass the Object array and go directly into your DataTable or list. This link has code examples: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q302094
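To show the second half of this approach, here is a small sketch that copies a 2-dimensional Object array into a DataTable. The module and function names are mine, and the array is assumed to have been obtained with something like CType(tableRange.Value, Object(,)). Note that a single-cell range returns a scalar rather than an array, so that case would need separate handling.

```vbnet
Imports System.Data

Module RangeArraySketch

    'Copies a 2-dimensional Object array into a DataTable.
    'The bounds are read from the array itself, because arrays that
    'come from Excel are 1-based while ordinary .NET arrays are 0-based.
    Public Function ArrayToDataTable(ByVal values As Object(,), _
            ByVal firstRowIsHeader As Boolean) As DataTable
        Dim table As New DataTable()
        Dim firstRow As Integer = values.GetLowerBound(0)
        Dim lastRow As Integer = values.GetUpperBound(0)
        Dim firstCol As Integer = values.GetLowerBound(1)
        Dim lastCol As Integer = values.GetUpperBound(1)

        'Every column is typed Object so that no cell value is coerced.
        'Duplicate header names would make Columns.Add throw.
        For col As Integer = firstCol To lastCol
            Dim columnName As String = _
                "Column" & (col - firstCol + 1).ToString()
            If firstRowIsHeader AndAlso _
                    values(firstRow, col) IsNot Nothing Then
                columnName = values(firstRow, col).ToString()
            End If
            table.Columns.Add(columnName, GetType(Object))
        Next

        Dim startRow As Integer = firstRow
        If firstRowIsHeader Then startRow += 1

        For row As Integer = startRow To lastRow
            Dim newRow As DataRow = table.NewRow()
            For col As Integer = firstCol To lastCol
                'Empty cells arrive as Nothing; a DataRow expects DBNull.
                If values(row, col) Is Nothing Then
                    newRow(col - firstCol) = DBNull.Value
                Else
                    newRow(col - firstCol) = values(row, col)
                End If
            Next
            table.Rows.Add(newRow)
        Next
        Return table
    End Function

End Module
```

Typing every column as Object keeps whatever Excel handed back (Double, String, Date and so on), which is the predictability this method offers over Jet's type guessing.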
Taking Information from the Clipboard
M.B. mentioned this solution to me before I saw an online code example. The idea is simple: copy the data to the clipboard in Excel and then pull it off the clipboard in .NET. What makes this interesting is that the .NET application does not need an Excel reference (no version conflicts and PIAs!), and you could write your program to take information from a wide variety of programs. I like the user-interface options, since most users are comfortable with copy/paste. The link below will get you started:
http://www.codeguru.com/vb/controls/vbnet_controls/datagridcontrol/article.php/c6393/
The problems I have had with this technique are related to the choice of formats available on the clipboard. CSV seems to be the best choice (other opinions?), which means that commas in your data will be a problem. Replacing commas in Excel with a substitute string (and then reversing the replace in .NET) is a possibility, but it detracts from the simplicity of the solution. I think that cell formats can be more problematic here than in the Range.Value methods (maybe my prejudice from seeing monetary values so often: the copy to clipboard can pick up the dollar sign, which is very likely not what you want, although removing it is probably a pretty safe string manipulation).
Here is the code that I use to go from an Excel range (_tableRange in the example, which is declared at the class level in my code) to a StreamReader (for use in .NET) via the clipboard. It relies on Imports System.IO and Imports System.Windows.Forms, plus an Excel alias for the Excel interop namespace (for example, Imports Excel = Microsoft.Office.Interop.Excel).
Public Function WftToStreamreaderViaClipboardCsv( _
ByVal commaSubstituteString As String) _
As StreamReader
'Because the .Find inherits the current settings I set everything each time.
'Later I will transform the stream with:
'
' Dim commaSepChar As Char = ","c
' splitArray = rowFromStream.Split(commaSepChar)
'
' For loopElements As Integer = 0 To ColumnCount - 1
' splitArray(loopElements) = _
' splitArray(loopElements).Replace(commaSubstituteString, ",")
' Next
'
'Because of the coding choice I throw an exception if the range contains the
'comma substitution string even if commas are not present in the data.
Dim foundPreExistingSubstitutionString As Excel.Range = _
_tableRange.Find(What:=commaSubstituteString, _
After:=_tableRange.Cells(1, 1), _
LookIn:=Excel.XlFindLookIn.xlValues, _
LookAt:=Excel.XlLookAt.xlPart, _
SearchOrder:=Excel.XlSearchOrder.xlByRows, _
SearchDirection:=Excel.XlSearchDirection.xlNext, _
MatchCase:=False)
If foundPreExistingSubstitutionString IsNot Nothing Then
Throw New ArgumentException("The comma substitution string " _
& "was found in the original text and is not valid.")
End If
Dim rangeWithCommas As Excel.Range = _tableRange.Find(What:=",", _
After:=_tableRange.Cells(1, 1), _
LookIn:=Excel.XlFindLookIn.xlValues, _
LookAt:=Excel.XlLookAt.xlPart, _
SearchOrder:=Excel.XlSearchOrder.xlByRows, _
SearchDirection:=Excel.XlSearchDirection.xlNext, _
MatchCase:=False)
Dim reverseCommaSubstitutionNeeded As Boolean = False
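'Address of the first found cell that had to be left unchanged (see below).
Dim firstSkippedAddress As String = Nothing
Do While rangeWithCommas IsNot Nothing
'Although the Find should be limited to xlValues, this is a double
'check that a formula is not being modified. CType throws if the value
'cannot be converted to a String, which is better than false results.
Dim cellChanged As Boolean = False
If CType(rangeWithCommas.HasFormula, Boolean) = False Then
Dim stringToModify As String = CType(rangeWithCommas.Value, String)
If stringToModify.Contains(",") Then
rangeWithCommas.Value = stringToModify.Replace(",", _
commaSubstituteString)
reverseCommaSubstitutionNeeded = True
cellChanged = True
End If
End If
If Not cellChanged Then
'Find wraps around, so a cell that keeps its comma (a formula, for
'example) would be returned forever. Stop once the search is back at
'the first such cell; its comma remains in the copied data.
Dim skippedAddress As String = CType(rangeWithCommas.Address, String)
If firstSkippedAddress Is Nothing Then
firstSkippedAddress = skippedAddress
ElseIf skippedAddress = firstSkippedAddress Then
Exit Do
End If
End If
rangeWithCommas = _tableRange.Find(What:=",", After:=rangeWithCommas, _
LookIn:=Excel.XlFindLookIn.xlValues, LookAt:=Excel.XlLookAt.xlPart, _
SearchOrder:=Excel.XlSearchOrder.xlByRows, _
SearchDirection:=Excel.XlSearchDirection.xlNext, _
MatchCase:=False)
Loop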
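_tableRange.Copy()
'Do not return early when no CSV data is available: the worksheet
'still has to be restored below. Nothing is returned in that case.
Dim streamFromClipboard As StreamReader = Nothing
If Clipboard.ContainsData(DataFormats.CommaSeparatedValue) Then
streamFromClipboard = New StreamReader( _
CType(Clipboard.GetData(DataFormats.CommaSeparatedValue), Stream))
End If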
If reverseCommaSubstitutionNeeded Then
'Returns the Excel values to their original with comma values
rangeWithCommas = _tableRange.Find( _
What:=commaSubstituteString, _
After:=_tableRange.Cells(1, 1), _
LookIn:=Excel.XlFindLookIn.xlValues, _
LookAt:=Excel.XlLookAt.xlPart, _
SearchOrder:=Excel.XlSearchOrder.xlByRows, _
SearchDirection:=Excel.XlSearchDirection.xlNext, _
MatchCase:=False)
Do While rangeWithCommas IsNot Nothing
If CType(rangeWithCommas.HasFormula, Boolean) = False Then
Dim stringToModify As String = ""
Try
stringToModify = CType(rangeWithCommas.Value, String)
Catch ex As Exception
Throw
End Try
rangeWithCommas.Value = _
stringToModify.Replace(commaSubstituteString, ",")
End If
rangeWithCommas = _tableRange.Find(What:=commaSubstituteString, _
After:=rangeWithCommas, _
LookIn:=Excel.XlFindLookIn.xlValues, _
LookAt:=Excel.XlLookAt.xlPart, _
SearchOrder:=Excel.XlSearchOrder.xlByRows, _
SearchDirection:=Excel.XlSearchDirection.xlNext, _
MatchCase:=False)
Loop
End If
Return streamFromClipboard
End Function
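The comments at the top of the function describe how the stream is transformed afterwards. A minimal sketch of that reading side might look like the following. The module and function names are mine, the column names are generated, and it assumes the rows contain no quoted fields (which is the point of the substitution). It takes a TextReader so that the StreamReader returned above can be passed in directly.

```vbnet
Imports System.Data
Imports System.IO

Module ClipboardCsvSketch

    'Reads comma-separated rows into a DataTable and restores the
    'commas that were swapped for commaSubstituteString before the copy.
    Public Function CsvToDataTable(ByVal reader As TextReader, _
            ByVal commaSubstituteString As String) As DataTable
        Dim table As New DataTable()
        Dim rowFromStream As String = reader.ReadLine()

        Do While rowFromStream IsNot Nothing
            'Clipboard text may end with a null terminator; a line
            'made up only of that terminator is not a data row.
            Dim trimmed As String = _
                rowFromStream.TrimEnd(ControlChars.NullChar)
            Dim onlyTerminator As Boolean = _
                (rowFromStream.Length > 0 AndAlso trimmed.Length = 0)

            If Not onlyTerminator Then
                Dim splitArray As String() = trimmed.Split(","c)

                'Grow the table if this row is wider than earlier rows.
                Do While table.Columns.Count < splitArray.Length
                    table.Columns.Add("Column" & _
                        (table.Columns.Count + 1).ToString(), _
                        GetType(String))
                Loop

                Dim newRow As DataRow = table.NewRow()
                For i As Integer = 0 To splitArray.Length - 1
                    newRow(i) = splitArray(i).Replace( _
                        commaSubstituteString, ",")
                Next
                table.Rows.Add(newRow)
            End If
            rowFromStream = reader.ReadLine()
        Loop
        Return table
    End Function

End Module
```

Splitting first and replacing afterwards matters: once the substitute string has been turned back into a comma, a split would break the value apart again.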
Other Options
I am not familiar with VSTO but I believe that the databinding of named ranges might be worth looking into if you/your target machines are all running Excel 2003(+) and the cost of VSTO is not an issue.
Save as – Excel has a number of save as options (such as tab-delimited) that can be useful to work with.
XML – XML is becoming more important in Office – there may be some possibilities here although I would guess solutions are going to be pretty specific to various office versions for the time being.
DTS/SQL Server – this steps a little outside .NET DataTables and overlaps with the Jet discussion, but these may also be options depending on the project.
All comments welcome,
CM