If you are preparing to do some serious analysis of your web server logs, it makes sense to transfer the logs to a database, so that you can easily extract information by running SQL queries. This article shows how to easily import IIS logs into MS SQL Server.
IIS logs each web server hit as a separate line in a log file. Information is logged in W3C Extended Log format, but the individual fields that are logged is configurable.
At the top of your log file will be a section that specifies the fields which are being logged:
#Software: Microsoft Internet Information Services 6.0
#Date: 2008-09-11 14:00:00
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query cs-username c-ip cs(User-Agent) cs(Cookie) cs(Referer) sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
First, let’s create a database to store our logs.
1 2 3 4 5 6 7 8 9
-- Create a database called "WebLogs" IF EXISTS (SELECT * FROM master..sysdatabases WHERE name = N'WebLogs') DROP DATABASE WebLogs GO CREATE DATABASE WebLogs GO
Now let’s create a table with columns that match every field of the log file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
-- Create a table for logs. USE WebLogs; CREATE TABLE [JDSWEB01] ( [DATE] [DATE] NULL, [TIME] [TIME] NULL, [s-sitename] [VARCHAR] (32) NULL, [s-ip] [VARCHAR] (16) NULL, [cs-method] [VARCHAR] (8) NULL, [cs-uri-stem] [VARCHAR] (255) NULL, [cs-uri-query] [VARCHAR] (2048) NULL, [cs-username] [VARCHAR] (16) NULL, [c-ip] [VARCHAR] (16) NULL, [cs(User-Agent)] [VARCHAR] (1024) NULL, [cs(Cookie)] [VARCHAR] (2048) NULL, [cs(Referer)] [VARCHAR] (4096) NULL, [sc-STATUS] [INT] NULL, [sc-substatus] [INT] NULL, [sc-win32-STATUS] [INT] NULL, [sc-bytes] [BIGINT] NULL, [cs-bytes] [BIGINT] NULL, [time-taken] [INT] NULL )
SQL Server has a neat feature where you can do a bulk import of data from a character-delimited text file (like a CSV file). The fields in the log file are separated with a single space character, so the import query becomes:
1 2 3 4 5 6 7
-- Import data from web server log into table USE WebLogs; BULK INSERT [JDSWEB01] FROM 'c:\temp\ex080912.log' WITH ( FIELDTERMINATOR = ' ', ROWTERMINATOR = '\n' )
Now that you have imported all of your log data into SQL Server, you should read my Tech Tip on extracting useful information from it with SQL queries.
As a bonus for reading this far, here are some useful tips:
- Web server logs quickly become quite large. A popular website will generate gigabytes of logs per day.
- Fortunately, these log files contain a lot of redundant information, so they compress really well. 5 GB of log files may compress to 200 MB (a 25:1 ratio).
- Don’t ever open a large log file in Notepad. Notepad is written in an inefficient way that will cause it to become unresponsive and use an entire CPU while the file is opening (which takes a long time with Notepad). Use a good text editor like TextPad instead.
- The time required for a BULK INSERT varies linearly with the number of lines in the log file. Expect a ballpark figure of 1 minute per gigabyte of log file.
- Remove the headers at the top of the log file to avoid errors like: Bulk insert data conversion error (type mismatch) for row 1, column 1 (date).
- The headers appear every time the web server is restarted, or when a new log file is created, so the headers may appear in the middle of a log file too.
- You may need to tweak the size of the VARCHARS for some fields if you get an error message like: Bulk insert data conversion error (truncation) for row 8394, column 10 (cs(User-Agent)).
- Don’t make your VARCHARs too big because SQL Server has a limitation of 0860 bytes. If the total size of all your columns exceeds this, you will have problems: INSERT or UPDATE of a row in this table will fail if the resulting row length exceeds 8060 bytes.
- If you cannot get rid of truncation errors entirely (and still stay within the 8060 byte limit for a row), you can add MAXERRORS=somebignumber to the WITH clause of the BULK INSERT.
- Double check that your BULK INSERT worked correctly by counting the number of records in the table, and comparing with the number of lines in the log file.
- Use a datatype bigger than an int for any field that you will SUM() (such as [sc-bytes]) to avoid overflow errors like “Arithmetic overflow error converting expression to data type int.”
- Don’t bother using Log2SQL (a commercial software product), it tries to make all fields 255 characters (obvious truncation problems), and it is completely unnecessary because of SQL Server’s BULK INSERT feature.
- Queries may take a long time if there is lots of data in the database. You can speed things up considerably if you remote any log entries for static content (*.gif, *.js etc). Note that this will make your hits per second and sent/received bytes incorrect, but (for me anyway) my key metric is pages rather than raw hits.