Sunday, December 16, 2007

OPENROWSET and not Text files but Excel files


The other day, I was wondering if you can query against a database that is on a remote server without first having to create a linked server in SQL Server 2005. A co-worker told me that I could use OPENROWSET and then I read the following page on MSDN, which gives a very good overview of what it is and how to use it:

OPENROWSET Help Page on MSDN

Most of the SQL Experts out there already knows about this functionality, but I was very excited to learn about it. I spent quite a bit of time learning about it as it seems very powerful in what it can do, I mean, you can use it to not only query databases on remote server, you could also use it with your text files, Excel spreadsheets etc. Suppose you have textual data that relates to some tables in your database and using OPENROWSET you could turn that data table and use it like any other table.

For example, just let's say you have a business and you have a database similar to Northwind and you received an Excel spreadsheet from one of your consultants that you hired to do a customer satisfaction survey for all your customers in Canada. And your consultant is one of those people, who only works with Excel to do all her tracking. Since your consultant only works with Excel, you exported few columns (CompanyName, ContactTitle, ContactName, Phone) from the Customers table to an Excel file and gave her that file.

You ran the following query to create the Excel file:

SELECT CompanyName, ContactTitle, ContactName, Phone
FROM Customers
WHERE Country = 'Canada'


You got the following rows in the Excel file:



She filled in the satisfaction on a scale of 1 to 10 for last year on quartely basis and optional comments associated with each score. She sent you back the Excel sheet with the following extra columns:




So, now you can join this data as a table in your queries or import into a temporary table using something like this:

SELECT A.* INTO #temp1 FROM OPENROWSET( 'Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=C:\temp\CanadaCustomersWithScores.xls','SELECT * FROM [Sheet1$]' ) AS A
WHERE A.CompanyName IS NOT NULL


Assuming the data is in Sheet1 and by saying CompanyName IS NOT NULL we will only include rows that are valid.

Now, you might ask, can I do this with a text file instead of using Excel. Theortically yes you can, at least, according to the help page listed above. However, I was not able to get it working using the samples given there plus another annoying thing about using a text file is that you need to have a format file. It seems like you would have better flexibility using an Excel file, plus you can easily edit the data in Excel and the OPENROWSET seems to work without must fuss.

Another thing to note here is that, I have used xls files not xlsx file because I could not get to link to xlsx (Excel 2007) files with this provider. My later research revealed to me that I need to use a different provider to open xlsx files, which I listed later on this post. I also wonder if I can use OPENXML on xlsx sheets, which I am going to try it out later.

Speaking of OPENXML, I am planning to explore other OPENSET functions and will post my findings in a future blog:

CONTAINSTABLE
FREETEXTTABLE
OPENDATASOURCE
OPENQUERY
OPENXML

Another thing I had a hardtime trying to find is the typical list of providers as it does not list them in the help page and few searches on the web did not turn up anything. Then later on I stumbled onto the node in Object Explorer in SQL Server Management Studio that lists all the providers that are registered on my system. To see the list on your SQL Server instance:

Expand the Server Objects node > Linked Server > Providers.

You don't have SQL Server installed, but you still would like to know all the OLE DB Providers registered on your system, you could try the following C# code:

Guid guidOleDbEnum = new Guid("{C8B522D0-5CF3-11ce-ADE5-00AA0044773D}");

Type type = Type.GetTypeFromCLSID(guidOleDbEnum);

using (OleDbDataReader rdr = OleDbEnumerator.GetEnumerator(type))
{

while (rdr.Read())

Console.WriteLine("{0} - {1}", rdr["SOURCES_NAME"], rdr["SOURCES_DESCRIPTION"]);

}


Here is the list of providers that I see on my SQL Server instance:

ADsDSOObject - OLE DB Provider for Active Directory Services
DTSPackageDSO - OLE DB Provider for DTS Package Decision Support Object
Microsoft.ACE.OLDDB.12.0 - OLE DB Provider for Microsoft Access/Excel 2007
Microsoft.Jet.OLDDB.4.0 - Microsoft Jet OLE DB Provider for ISAM Database, e.g. Access/Excel 2003, dBase III, Lotus WK3, Paradox 5.x etc.
MSDAIPP.DSO - OLE DB Provider for Microsoft Internet Publishing Provider
MSDAORA - Microsoft OLE DB Provider for Oracle
MSDAOSP - Microsoft OLE DB Simple Provider
MSDASQL - Microsoft OLE DB Provider for ODBC
MSDMine - Microsoft OLE DB Provider for Data Mining Services
MSIDXS - Microsoft OLE DB Provider for Index Server
MSOLAP - Microsoft OLE DB Provider for OLAP Services (or Analysis Services)
SQLNCLI - Microsoft SQL Native Client OLE DB provider
SQLOLEDB - Microsoft OLE DB Provider for SQL Server 2000
SQLReplication.OLEDB - Microsoft SQL Server Replication OLE DB Provider for DTS
SQLXMLOLEDB.4.0 - OLE DB provider that exposes Microsoft SQLXML functionality through ActiveX Data Objects (ADO)


I do not think we can use all of these with OPENROWSET function, I would have to investigate as to which ones can be used and which one can't be used. I kind of feel bad to leave so many questions unanswered in this post, but at the same time, it is giving me an opportunity to learn more about this area and share it with my blog readers.

I feel like I am just scratching the surface and there is a lot to learn in this area.

I would love to hear from you what you think about the OPENROWSET function and the related information and any of your experiences in using it.

Thanks for reading.