The other day I was trying to read some data from and Excel worksheet using the OLEDB .NET provider. The data was something like this:

COLUMN1

Sarang-Datye

22951

3244

343556

CXYDS

When I would do a select on this column, the reader would return a null value. After a lot of playing around with the data, different OLEDB .NET options and searching on windows live, here was what I arrived at:

Observation: The data in the column is text and integers. So we can call this mixed data however this needs to be interpreted as "text".

Discovery 1: The OLEDB .NET provider dynamically tries to determine the data type of the column by looking at the first few rows of the excel sheet. The moment it finds that there is mixed data, it doesn't understand what to do and hence, totally ignores that column altogether during the read operation.

Discovery 2: The connection string for the OLEDB .NET provider has a property called "Extended Properties" where additional properties can be set to achieve granular control on the connection and operations on that connection. One of the extended properties is "IMEX" which stands for "Intermixed" or represents the property called "Import Mixed Types". This needs to be set/enabled for the OLEDB .NET provider to know that, the excel sheet will have columns with mixed data types.

Discovery 3: The question after that was, how does OLEDB .NET provider know that mixed types have to be read as "text"? How does the OLEDB .NET provider know how many rows in a column to look at before assuming a particular data type for that column? This led to another discovery. The OLEDB .NET provider has a set of pre-configured settings in the windows registry (HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\) where 2 important following things can be configured apart from other settings:

A] ImportMixedTypes - I talked about this earlier. This is where the default setting is picked up from, when OLE DB .NET provider finds mixed types in a column.

B] TypeGuessRows: This setting indicated to the OLEDB.NET provider the number of rows to read to determine the column type. Default is 8.

Here's a snap-shot of the registry for reference:

 reg1

That's about it for the day. I can spend the rest of the day happily now :D!