cmiles - info

Life, Tech and Unimportant Minutiae

Moving Excel data into .net

Created by cmiles on 7/22/2006.

This past week I revisited some of the different techniques for moving Excel data into a .NET DataTable in an effort to improve the speed and reliability of my code. In a perfect world Excel would only be a destination for data, and I think there is a certain amount of futility in using Excel as a data source. In practice, though, Excel often seems like the only available option for collecting a wide variety of data (reports to Excel seem common; end-user direct database access, on the other hand…). This post does not contain any information that cannot be found in other articles, posts, and blogs, but I thought an overview with links might be useful.

The most useful list I have ever seen of links relevant to this topic: http://blogs.msdn.com/pranavwagh/articles/excel_ado.aspx

The title of the post is ‘USING ADO AND ADO.NET WITH EXCEL: Resources and Known Issues’ (it also covers automation and general Excel information).

Methods-Notes-Ideas

Jet/ADO.NET

There are many examples online using Jet (I think the two links below give enough examples for a good overview). Jet is fast and can query saved, closed files. If your data is very consistent (no column mixes values that would lead Jet to guess the wrong type) and you are interested in saved files rather than information in a running instance of Excel, this may be a good choice. It certainly seems to retrieve data quite quickly.

The chance that Jet will incorrectly identify the data type of a column and introduce errors into the data, especially when the exact data is unpredictable (dynamic!), makes me reluctant to use this solution. It may also be a poor choice if you want to import what the end user sees on screen: the on-screen file must be saved before changes can be read, which does not seem like a good user-interface choice to me, and automating the save seems problematic and dangerous. These links have good information: http://blog.lab49.com/?p=196 http://support.microsoft.com/?scid=kb;en-us;316934&spid=1249&sid=global
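
As a rough sketch of the Jet approach (not code from the linked articles), the function below fills a DataTable from one worksheet of a saved workbook. The module name, function name, and both parameters are hypothetical. It assumes an .xls file, the Jet 4.0 OLE DB provider, and a process running as 32-bit, since that provider is not available to 64-bit processes.

```vbnet
Imports System.Data
Imports System.Data.OleDb

Public Module JetExcelReader

  'Reads one worksheet of a saved, closed workbook into a DataTable.
  'workbookPath and sheetName are hypothetical parameters for illustration.
  Public Function ReadSheetViaJet(ByVal workbookPath As String, _
    ByVal sheetName As String) As DataTable

    'HDR=Yes treats the first row as column names. IMEX=1 asks Jet to
    'treat mixed-type columns as text, which can reduce (but does not
    'eliminate) the type-guessing problem described above.
    Dim connectionString As String = _
      "Provider=Microsoft.Jet.OLEDB.4.0;" & _
      "Data Source=" & workbookPath & ";" & _
      "Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"""

    Dim result As New DataTable(sheetName)

    Using connection As New OleDbConnection(connectionString)
      'A worksheet is addressed as a table named [SheetName$].
      Using adapter As New OleDbDataAdapter( _
        "SELECT * FROM [" & sheetName & "$]", connection)
        'Fill opens and closes the connection itself when needed.
        adapter.Fill(result)
      End Using
    End Using

    Return result
  End Function

End Module
```

Because the query runs against the file on disk, any edits the user has not yet saved will not appear in the result.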

Range.Value/Arrays

Excel can return a range of values as a 2-dimensional Object array. This method is fairly fast (although maybe not quite as fast as Jet), and the values are far more predictable, in my opinion. This solution is interesting when you want to extract the information a user is seeing on screen. For the most part it is direct and simple, although beware the confusing differences that formatting can cause between .Value and the displayed value. I like this solution and have been using it frequently. For small sets of cells it may be just as effective to loop through each cell in the range and retrieve its Range.Value. This is slower and less reliable (see: http://support.microsoft.com/kb/216400/), but it may allow you to bypass the Object array and go directly into your DataTable or list. This link has code examples: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q302094
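
A minimal sketch of the second half of this approach, turning the Object array into a DataTable, might look like the following. The module and function names are hypothetical, and every column is typed as Object so that no values are coerced. The function itself does not need an Excel reference; only the caller that obtains the array does.

```vbnet
Imports System
Imports System.Data

Public Module RangeArrayConverter

  'Converts a 2-dimensional Object array, such as the one Range.Value
  'returns for a multi-cell range, into a DataTable of Object columns.
  Public Function ArrayToDataTable(ByVal values As Object(,)) As DataTable

    Dim result As New DataTable()

    'Arrays returned by Excel are 1-based, so read the bounds rather
    'than assuming they start at 0.
    Dim firstRow As Integer = values.GetLowerBound(0)
    Dim lastRow As Integer = values.GetUpperBound(0)
    Dim firstColumn As Integer = values.GetLowerBound(1)
    Dim lastColumn As Integer = values.GetUpperBound(1)

    For columnIndex As Integer = firstColumn To lastColumn
      result.Columns.Add( _
        "Column" & (columnIndex - firstColumn + 1).ToString(), _
        GetType(Object))
    Next

    For rowIndex As Integer = firstRow To lastRow
      Dim newRow As DataRow = result.NewRow()
      For columnIndex As Integer = firstColumn To lastColumn
        'Empty cells come back as Nothing; store them as DBNull.
        Dim cellValue As Object = values(rowIndex, columnIndex)
        If cellValue Is Nothing Then
          newRow(columnIndex - firstColumn) = DBNull.Value
        Else
          newRow(columnIndex - firstColumn) = cellValue
        End If
      Next
      result.Rows.Add(newRow)
    Next

    Return result
  End Function

End Module
```

A caller would typically pass something like CType(someRange.Value, Object(,)), where someRange is a hypothetical multi-cell Excel.Range. A single-cell range returns a scalar rather than an array, so that case needs separate handling.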

Taking Information from the Clipboard

M.B. mentioned this solution to me before I saw an online code example. The idea is simple: copy the data to the clipboard in Excel and then pull it off the clipboard in .NET. This is an interesting solution because the .NET application does not need an Excel reference (no version conflicts and PIAs!) and you could write your program to take information from a wide variety of programs. I like the user-interface options, since most users are comfortable with copy/paste. The link below will get you started:

http://www.codeguru.com/vb/controls/vbnet_controls/datagridcontrol/article.php/c6393/

The problems I have had with this technique are related to the choice of formats available on the clipboard. Csv seems to be the best choice (other opinions?), which means that commas in your data will be a problem. Replacing commas in Excel with a substitute string (and then reversing the replace in .NET) is a possibility, but it detracts from the simplicity of the solution. I think that cell formats can be more problematic here than in the Range.Value methods. Maybe this is my prejudice from seeing monetary values so often: the copy to the clipboard can pick up the dollar sign, which is very likely not what you want, although it is likely a pretty safe string manipulation to remove.

Here is the code that I use to go from an Excel range (_tableRange in the example, which is declared at the class level in my code) to a StreamReader (for use in .NET) via the clipboard.


  Public Function WftToStreamreaderViaClipboardCsv( _
    ByVal commaSubstituteString As String) _
    As StreamReader
 
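    'Assumes these file-level imports: System.IO, System.Windows.Forms, and
    'Excel = Microsoft.Office.Interop.Excel.
    '
    'Because the .Find inherits the current settings I set everything each time.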
    'Later I will transform the stream with:
    '
    '  Dim commaSepChar As Char = ","c
    '  splitArray = rowFromStream.Split(commaSepChar)
    '
    '  For loopElements As Integer = 0 To ColumnCount - 1
    '    splitArray(loopElements) = _
    '      splitArray(loopElements).Replace(commaSubstituteString, ",")
    '  Next
    '
    'Because of the coding choice I throw an exception if the range contains the
    'comma substitution string even if commas are not present in the data.
 
    Dim foundPreExistingSubstitutionString As Excel.Range = _
      _tableRange.Find(What:=commaSubstituteString, _
        After:=_tableRange.Cells(1, 1), _
        LookIn:=Excel.XlFindLookIn.xlValues, _
        LookAt:=Excel.XlLookAt.xlPart, _
        SearchOrder:=Excel.XlSearchOrder.xlByRows, _
        SearchDirection:=Excel.XlSearchDirection.xlNext, _
        MatchCase:=False)
 
    If foundPreExistingSubstitutionString IsNot Nothing Then
      Throw New ArgumentException("The comma substitution string " _
        & "is found in the original text and is not valid.")
    End If
 
    Dim rangeWithCommas As Excel.Range = _tableRange.Find(What:=",", _
        After:=_tableRange.Cells(1, 1), _
        LookIn:=Excel.XlFindLookIn.xlValues, _
        LookAt:=Excel.XlLookAt.xlPart, _
        SearchOrder:=Excel.XlSearchOrder.xlByRows, _
        SearchDirection:=Excel.XlSearchDirection.xlNext, _
        MatchCase:=False)
 
    Dim reverseCommaSubstitutionNeeded As Boolean = False
 
    Do While rangeWithCommas IsNot Nothing
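      'Although the find should be limited to finding xlValues, this is
      'a double check that a formula is not being modified. The Try...Catch
      'block is cautious but better than false results.
      'Caution: a formula cell whose value contains a comma is skipped here
      'but still matches the next Find, so the loop assumes none exist.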
 
      If CType(rangeWithCommas.HasFormula, Boolean) = False Then
        Dim stringToModify As String = ""
        Try
          stringToModify = CType(rangeWithCommas.Value, String)
        Catch ex As Exception
          Throw
        End Try
 
        rangeWithCommas.Value = stringToModify.Replace(",", _
          commaSubstituteString)
        reverseCommaSubstitutionNeeded = True
      End If
 
      rangeWithCommas = _tableRange.Find(What:=",", After:=rangeWithCommas, _
        LookIn:=Excel.XlFindLookIn.xlValues, LookAt:=Excel.XlLookAt.xlPart, _
        SearchOrder:=Excel.XlSearchOrder.xlByRows, _
        SearchDirection:=Excel.XlSearchDirection.xlNext, _
        MatchCase:=False)
 
    Loop
 
    _tableRange.Copy()
 
    If Clipboard.ContainsData(DataFormats.CommaSeparatedValue) = False Then
      Return Nothing
    End If
 
    Dim streamFromClipboard As New StreamReader _
        (CType(Clipboard.GetData(DataFormats.CommaSeparatedValue), Stream))
 
    If reverseCommaSubstitutionNeeded Then
      'Restores the original comma values in the Excel cells.
 
      rangeWithCommas = _tableRange.Find( _
        What:=commaSubstituteString, _
        After:=_tableRange.Cells(1, 1), _
        LookIn:=Excel.XlFindLookIn.xlValues, _
        LookAt:=Excel.XlLookAt.xlPart, _
        SearchOrder:=Excel.XlSearchOrder.xlByRows, _
        SearchDirection:=Excel.XlSearchDirection.xlNext, _
        MatchCase:=False)
 
      Do While rangeWithCommas IsNot Nothing
        If CType(rangeWithCommas.HasFormula, Boolean) = False Then
          Dim stringToModify As String = ""
          Try
            stringToModify = CType(rangeWithCommas.Value, String)
          Catch ex As Exception
            Throw
          End Try
          rangeWithCommas.Value = _
            stringToModify.Replace(commaSubstituteString, ",")
        End If
 
        rangeWithCommas = _tableRange.Find(What:=commaSubstituteString, _
          After:=rangeWithCommas, _
          LookIn:=Excel.XlFindLookIn.xlValues, _
          LookAt:=Excel.XlLookAt.xlPart, _
          SearchOrder:=Excel.XlSearchOrder.xlByRows, _
          SearchDirection:=Excel.XlSearchDirection.xlNext, _
          MatchCase:=False)
      Loop
 
    End If
 
    Return streamFromClipboard
 
  End Function
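
The comment block at the top of the function above outlines how the stream is consumed later. A fuller sketch of that step, under some assumptions, could look like this. The module and function names are hypothetical, commaSubstituteString is assumed to be non-empty, and the parsing is a plain split on commas, so it does not handle fields that Excel wraps in quotes (for example, values that themselves contain quote characters).

```vbnet
Imports System
Imports System.Data
Imports System.IO
Imports Microsoft.VisualBasic

Public Module ClipboardCsvParser

  'Reads comma-separated rows from the reader into a DataTable of String
  'columns, reversing the comma substitution in every field.
  Public Function CsvStreamToDataTable(ByVal reader As StreamReader, _
    ByVal commaSubstituteString As String) As DataTable

    Dim result As New DataTable()
    Dim commaSepChar As Char = ","c

    Dim rowFromStream As String = reader.ReadLine()
    Do While rowFromStream IsNot Nothing
      'Clipboard data may end with a null terminator; strip it if present.
      rowFromStream = rowFromStream.TrimEnd(ControlChars.NullChar)

      'Empty lines are skipped, which also drops empty rows from a
      'single-column range.
      If rowFromStream.Length > 0 Then
        Dim splitArray As String() = rowFromStream.Split(commaSepChar)

        'Grow the column list to fit the widest row seen so far.
        Do While result.Columns.Count < splitArray.Length
          result.Columns.Add( _
            "Column" & (result.Columns.Count + 1).ToString(), _
            GetType(String))
        Loop

        Dim newRow As DataRow = result.NewRow()
        For loopElements As Integer = 0 To splitArray.Length - 1
          newRow(loopElements) = _
            splitArray(loopElements).Replace(commaSubstituteString, ",")
        Next
        result.Rows.Add(newRow)
      End If

      rowFromStream = reader.ReadLine()
    Loop

    Return result
  End Function

End Module
```

Keeping this step separate from the Excel code means it depends only on a StreamReader, so it can be exercised with a MemoryStream in a test without Excel or the clipboard.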

Other Options

I am not familiar with VSTO, but I believe that the data binding of named ranges might be worth looking into if your target machines are all running Excel 2003 or later and the cost of VSTO is not an issue.

Save as – Excel has a number of save as options (such as tab-delimited) that can be useful to work with.

XML – XML is becoming more important in Office. There may be some possibilities here, although I would guess solutions are going to be pretty specific to various Office versions for the time being.

DTS/SQL Server – these step a little outside .NET DataTables and overlap with the Jet discussion, but they may also be options depending on the project.

All comments welcome,

CM

