Using Analog C:Amie Edition to generate live statistics for inclusion on web pages

System Requirements:

Analog C:Amie Edition

The Problem:

While it isn’t very common these days, and is generally considered somewhat crass, there are occasions where you might want to display page views or hit counts directly on a web page. For example, you may have an administrative view for your website, be running a competition and need to track hits on specific files, or just want to tell the world how popular you are.

This article outlines how to use Analog C:Amie Edition’s XML mode to do this.

More Info

By default, most users configure Analog to generate either HTML output for display or Computer output for piping into ReportMagic. There are, however, other output formats, as originally implemented by Stephen Turner. One of these exports the statistics as XML, which you can parse for live statistical data and ultimately include on a web page.

How-to

The how-to is split into the following sections:

What is not covered here
Pre-requisites
File System Considerations
Configure the Analog Server Settings
Optimisation
Running Analog on a Schedule
Displaying the Page Statistics
Final Considerations

What is not covered here

This guide is specifically written to demonstrate the configuration of Analog and associated web scripts running under IIS on Windows when used in conjunction with ASP 3. The process is equally possible in PHP, JSP, Ruby or .NET; however, only ASP 3 VBScript examples are given.

Pre-requisites

Before you begin, you will need to be running Analog C:Amie Edition 6.04 or higher, as there is a bug in the XML DTD of earlier versions of both Analog and Analog C:Amie Edition.

File System Considerations

You need to consider where you place the XML output from Analog. While placing it in the public web root is entirely possible, this will make it potentially available to anyone who happens to become aware of its existence. Visibility of its existence is highly possible when debugging or in the event that there is an error on the part of the parser, file system or file system permissions.

It is therefore suggested that you keep the output XML file outside of the publicly accessible web root. A good candidate for this would be in the same location as your analog.cfg file.

For the rest of this example the following structures will be assumed

Public Web Root	D:\sites\www.domain.com\web\
Log Files	D:\sites\www.domain.com\logs\W3SVC1\
Configuration (.cfg) and Output Destination (.xml)	D:\sites\www.domain.com\
Analog Executable	D:\parser\Analog\Analog.exe

Configure the Analog Server Settings

In most cases, you will probably want to maintain your existing statistics output in HTML format for casual browsing. Therefore it is necessary to create a second Analog configuration file which will generate the required output.

The configuration file for Analog’s XML process can be simpler than the main one, although some customisation to reduce parsing time and minimise the size of the XML output should be considered essential.

Assuming that an existing Analog .cfg file exists at D:\sites\www.domain.com\analog.cfg, we will create a new .cfg file at D:\sites\www.domain.com\analog-xml.cfg.

The .cfg file shown below is a sample XML output configuration for our example site. Adjust the LOGFILE, OUTFILE and HOSTURL paths to suit your own server.

# Analog C:Amie Edition XML Statistics Configuration
# Version 1.0.2
# See http://www.c-amie.co.uk/ for updates and more

LOGFILE d:\sites\www.domain.com\logs\w3svc1\*.log
OUTPUT XML
OUTFILE d:\sites\www.domain.com\analog.xml
HOSTNAME "Analog Test Site"
HOSTURL http://www.domain.com/
IMAGEDIR "images/"

# Reports Enabled/Disabled List
GENERAL OFF #General Summary
YEARLY OFF #Yearly Report
QUARTERLY OFF #Quarterly Report
MONTHLY OFF #Monthly Report
WEEKLY OFF #Weekly Report
DAILYREP OFF #Daily Report
DAILYSUM OFF #Daily Summary
HOURLYREP OFF #Hourly Report
HOURLYSUM OFF #Hourly Summary
WEEKHOUR OFF #Hour of the Week Summary
QUARTERREP OFF #Quarter-Hour Report
QUARTERSUM OFF #Quarter-Hour Summary
FIVEREP OFF #Five-Minute Report
FIVESUM OFF #Five-Minute Summary
HOST OFF #Host Report
REDIRHOST OFF #Host Redirection Report
FAILHOST OFF #Host Failure Report
ORGANISATION OFF #Organisation Report
DOMAIN OFF #Domain Report
REQUEST ON #Request Report
DIRECTORY OFF #Directory Report
FILETYPE OFF #File Type Report
SIZE OFF #File Size Report
PROCTIME OFF #Processing Time Report
REDIR OFF #Redirection Report
FAILURE OFF #Failure Report
REFERRER OFF #Referrer Report
REFSITE OFF #Referring Site Report
SEARCHQUERY OFF #Search Query Report
SEARCHWORD OFF #Search Word Report
INTSEARCHQUERY OFF #Internal Search Query Report
INTSEARCHWORD OFF #Internal Search Word Report
REDIRREF OFF #Redirected Referrer Report
FAILREF OFF #Failed Referrer Report
BROWSERREP OFF #Browser Report
BROWSERSUM OFF #Browser Summary
OSREP OFF #Operating System Report
VHOST OFF #Virtual Host Report
REDIRVHOST OFF #Virtual Host Redirection Report
FAILVHOST OFF #Virtual Host Failure Report
USER OFF #User Report
REDIRUSER OFF #User Redirection Report
FAILUSER OFF #User Failure Report
STATUS OFF #Status Code Report

# Referring URL Report
REFLINKINCLUDE *
REFREPEXCLUDE http://domain.com/*
REFREPEXCLUDE http://www.domain.com/*
REFFLOOR 10r

# Referring Site
REFSITEEXCLUDE http://domain.com/

# Request Report
REQFLOOR 1r
REQEXCLUDE *.jpg
REQEXCLUDE *.gif
REQEXCLUDE *.png
REQEXCLUDE *.bmp
REQEXCLUDE *.class
REQEXCLUDE *.css
REQEXCLUDE *.js
REQEXCLUDE *.ico

# Status Code Report
304ISSUCCESS ON # Counts 304 (Not Modified) responses as successes in the Request Report

# Custom Exclusions
FILEEXCLUDE /stats/*

Optimisation

It is important to disable any and all unwanted reports. This will speed up processing of the log files, reduce the size of the output file and reduce the cost of reading the data into your website later on. If you are following this example and are only interested in the hit count, the only report that you need to turn on is the REQUEST report.

Additionally, you should configure the REQFLOOR and REQEXCLUDE options to optimise the scope of the output. In most cases REQFLOOR will be set to 1r (1 request) while exclusions should include files that you have no intention or ability to measure. For example, if you are only going to display the hit count of the currently visible .asp file then everything apart from .asp can be excluded from the report.

Running Analog on a Schedule

Now that Analog has been configured, you need to run it. You can run the process manually, or from Task Scheduler, with the command:

"d:\parser\analog\Analog.exe" +gd:\sites\www.domain.com\analog-xml.cfg

This merges any master configuration file with the custom XML .cfg file and generates the output as instructed in analog-xml.cfg.

If you wish to schedule it as part of a larger stats run process, then following the example from my “Using Analog C:Amie Edition to provide automatic statistics on a multi-site production IIS 4.0, 5.0, 5.1, 6.0, 7.0, 7.5 or 8.0 web server” guide, the following script can be used to automate both the parsing of the HTML and XML output

cls
@echo off
SET ALGROOT=d:\parser\Analog\
SET WEBROOT=d:\sites

FOR /f "tokens=*" %%A IN ('dir /b d:\sites') DO (
    echo Looking for: analog.cfg in %%A

    IF EXIST "%WEBROOT%\%%A\analog.cfg" (
        echo Found analog.cfg for %%A
        md "%WEBROOT%\%%A\web\stats"
        echo.
        "%ALGROOT%analog.exe" +g%WEBROOT%\%%A\analog.cfg > %WEBROOT%\%%A\analog.log
        echo.

        IF EXIST "%WEBROOT%\%%A\analog-xml.cfg" (
            "%ALGROOT%analog.exe" +g%WEBROOT%\%%A\analog-xml.cfg > %WEBROOT%\%%A\analog.log
        ) ELSE (
            echo ANALOG XML NOT FOUND
        )

    ) ELSE (
        echo ANALOG CONFIG FILE NOT FOUND IN "%WEBROOT%\%%A\"
        echo.
    )
)

This script searches all sub-folders of d:\sites for the presence of an analog.cfg file e.g. d:\sites\www.mydomain.com\analog.cfg. If it finds one it ensures that there is an appropriate output directory (\web\stats) and then runs the Analog C:Amie Edition executable using the baseline configuration (this is implicit) and the local site-level analog.cfg configuration file (explicit) to produce the report.

It will then repeat the process, looking for analog-xml.cfg and if present will generate the associated XML output.

Displaying the Page Statistics

Moving forward with our example, we will now have an XML file ready to re-parse, located at d:\sites\www.domain.com\analog.xml. The next step is to create a piece of code that will match up the currently displayed web page with its entry in the XML output. The steps to do this are:

Normalise the current page URL
Filter the current page URL
Load the XML file
Query the XML file
Display the output

Normalise the current page URL

Your web server’s log file will record whatever it is sent by the browser, and the structure of this is often entirely at the mercy of the web user or web service sending the request. You should therefore take precautions to protect the parser and increase the likelihood of finding a match in the XML file.

The following obtains the current URL from ASP’s ServerVariables collection, trims leading and trailing spaces, replaces any remaining spaces with “+” and ensures that we only search the XML file using lower-case URLs.

Dim strLocalPath
strLocalPath = Request.ServerVariables("URL")
strLocalPath = Trim(strLocalPath)
strLocalPath = Replace(strLocalPath," ", "+")
strLocalPath = LCase(strLocalPath)
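One further precaution is worth considering, because the normalised path is later concatenated into a double-quoted XPath string literal. XPath 1.0 has no escape sequence for a quote inside a literal, so a path containing a double quote would break the query. The following is a minimal sketch of one possible guard, not part of the original sample: the function name isSafeForXPath is hypothetical, and it assumes that simply skipping the lookup for such paths is acceptable on your site.

```vbscript
' Hypothetical helper (an assumption, not part of the original sample).
' Returns True when strPath can be embedded inside a double-quoted
' XPath string literal without terminating the literal early.
Function isSafeForXPath(ByVal strPath)
    ' """" is a VBScript string containing a single double quote character
    isSafeForXPath = (InStr(strPath, """") = 0)
End Function

' Usage, continuing with the strLocalPath variable prepared above:
If Not isSafeForXPath(strLocalPath) Then
    ' Assumption: treat the page as having no statistics and stop here
    Response.Write "0"
    Response.End
End If
```

Rejecting the path outright is the simplest choice. A legitimate page on most sites will not contain a double quote in its path, so little is lost.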

Filter the current page URL

Assuming that you are using ASP 3, your index page will by default be named “default.asp” (for example www.domain.com/folder/default.asp). A client can, however, reach the same page using www.domain.com/folder/ or www.domain.com/folder. Which of these is logged depends entirely on which of the three (equally valid) forms the client used. In theory these should be normalised, but Analog may record requests for www.domain.com/folder/default.asp, www.domain.com/folder/ and www.domain.com/folder separately. We therefore need to ensure that when we query the XML file later on, we query for all valid combinations on your server.

The simplest way to achieve this is to start building a query that assumes we will need all permissible options in order to obtain a valid result.

Dim strSearchSelector
' Normalise the Path String, generate the XPath OR Statement
if ((strLocalPath = "/") OR (strLocalPath = "/default.asp")) then
strSearchSelector = "col=""/"" or col=""/default.asp"""
else
if (Right(strLocalPath, 12) = "/default.asp") then
strLocalPath = Left(strLocalPath, (Len(strLocalPath) - 12))
end if

if (Right(strLocalPath, 1) = "/") then
strLocalPath = Left(strLocalPath, (Len(strLocalPath) - 1))
end if

strSearchSelector = "col=""" & strLocalPath & """ or col=""" & strLocalPath & "/"" or col=""" & strLocalPath & "/default.asp"""

end if

In the above code we first check to see if the user is querying the absolute website home page (either www.domain.com/ or www.domain.com/default.asp) and create a select statement accordingly.

If we are not on the absolute home page, we remove the trailing “/default.asp” from the URL and we remove the trailing “/” from the URL and then construct a select statement that will query for all three possible combinations:

www.domain.com/folder
www.domain.com/folder/
www.domain.com/folder/default.asp
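To see the effect of the filtering, the same logic can be wrapped in a function and called with each spelling of the URL. This is only an illustrative sketch; the function name buildSearchSelector is hypothetical and does not appear in the original sample.

```vbscript
' Hypothetical wrapper around the selector logic described above.
Function buildSearchSelector(ByVal strPath)
    ' The site home page only has two possible spellings
    If (strPath = "/") Or (strPath = "/default.asp") Then
        buildSearchSelector = "col=""/"" or col=""/default.asp"""
        Exit Function
    End If

    ' Reduce "/folder/default.asp" and "/folder/" to "/folder"
    If Right(strPath, 12) = "/default.asp" Then
        strPath = Left(strPath, Len(strPath) - 12)
    End If
    If Right(strPath, 1) = "/" Then
        strPath = Left(strPath, Len(strPath) - 1)
    End If

    ' Query for all three spellings of the reduced path
    buildSearchSelector = "col=""" & strPath & """ or col=""" & strPath & _
        "/"" or col=""" & strPath & "/default.asp"""
End Function

' Each of the three calls below writes the same selector:
'   col="/folder" or col="/folder/" or col="/folder/default.asp"
Response.Write Server.HTMLEncode(buildSearchSelector("/folder")) & "<br>"
Response.Write Server.HTMLEncode(buildSearchSelector("/folder/")) & "<br>"
Response.Write Server.HTMLEncode(buildSearchSelector("/folder/default.asp")) & "<br>"
```

For a page that is not an index page, such as /folder/page.asp, the two extra alternatives (/folder/page.asp/ and /folder/page.asp/default.asp) will simply never match anything in the report, so they are harmless.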

Load the XML file

Now that we have a search query, it is time to load the XML file and configure Microsoft XML to perform the leg work required to find our up to three possible URL combinations for the current page.

Dim xmlDoc
set xmlDoc = Server.CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.async = False
xmlDoc.setProperty "ServerHTTPRequest", true
xmlDoc.setProperty "ProhibitDTD", false ' Required for ~MSXML 6 (default in 6 = true; <6 = false)
xmlDoc.setProperty "SelectionLanguage","XPath"
xmlDoc.validateOnParse = false
' Load the specified XML file (returns XML output)
xmlDoc.load("d:\sites\www.domain.com\analog.xml")

In the above example, a Microsoft XML 6.0 object is created with configuration for the use of XPath to execute our query and permission to use the Analog DTD. ServerHTTPRequest is required for processing of synchronous XML documents within ASP.

xmlDoc.load() copies the XML file from the file system, parses it into memory and permits us to progress to the data extraction phase.
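The load can fail, for example if the file is missing, locked while Analog is rewriting it, or not well formed. A hedged sketch of how the load step might be wrapped so that a failure is detected before any query is attempted is shown here. The function name loadAnalogXml and the variable xmlStats are hypothetical; load() returning a Boolean and the parseError object are standard MSXML behaviour.

```vbscript
' Hypothetical helper: returns a loaded DOMDocument, or Nothing on failure.
Function loadAnalogXml(ByVal strFile)
    Dim doc
    Set doc = Server.CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    doc.setProperty "ServerHTTPRequest", True
    doc.setProperty "ProhibitDTD", False
    doc.setProperty "SelectionLanguage", "XPath"
    doc.validateOnParse = False

    ' load() returns True on success and False on failure
    If doc.load(strFile) Then
        Set loadAnalogXml = doc
    Else
        ' doc.parseError.reason and doc.parseError.line describe the problem.
        ' They are deliberately not written to the page here, because they
        ' can reveal the location of the XML file to visitors.
        Set loadAnalogXml = Nothing
    End If
End Function

' Usage
Dim xmlStats
Set xmlStats = loadAnalogXml("d:\sites\www.domain.com\analog.xml")
If xmlStats Is Nothing Then
    Response.Write "Statistics are currently unavailable"
End If
```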

Query the XML file

As mentioned above, this example queries the XML file using XPath syntax. XPath allows us, in a somewhat complicated way, to walk the DOM hierarchy of an XML file, applying selection filters as we traverse the file towards our desired result.

Without wishing to overcomplicate analysis on how this works or the processes involved, the XPath query required to extract a hit count from the Analog C:Amie Edition 6.04+ DTD is:

/analog-data/report[@name='rep_req']/row[@level='1'][col='/folder/default.asp']/col[@name='col_reqs']

Or in a more logical format:

Select the analog-data parent
Select the report sub tree
Expand all elements named row where the attribute level = 1
Then find sub elements named col where the data (value) equals /folder/default.asp
If all of the above matches return the value of the element col under row under report under analog-data where the attribute name = col_reqs
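The walk described above can be tried against a small in-memory document. The XML fragment below is hypothetical and heavily simplified: it contains only the elements and attributes that the query refers to, and the column name col_name is an assumption made for illustration. Real Analog output contains considerably more.

```vbscript
Dim xmlDemo, nodeDemo
Set xmlDemo = Server.CreateObject("MSXML2.DOMDocument.6.0")
xmlDemo.async = False
xmlDemo.setProperty "SelectionLanguage", "XPath"

' Hypothetical, simplified fragment (not real Analog output)
xmlDemo.loadXML "<analog-data><report name=""rep_req"">" & _
    "<row level=""1""><col name=""col_reqs"">42</col>" & _
    "<col name=""col_name"">/folder/default.asp</col></row>" & _
    "<row level=""1""><col name=""col_reqs"">7</col>" & _
    "<col name=""col_name"">/other.asp</col></row>" & _
    "</report></analog-data>"

' The predicate [col='...'] keeps only rows that have a col child whose text
' equals the path; the final step then picks that row's request count.
Set nodeDemo = xmlDemo.selectSingleNode( _
    "/analog-data/report[@name='rep_req']/row[@level='1']" & _
    "[col='/folder/default.asp']/col[@name='col_reqs']")

If Not nodeDemo Is Nothing Then
    Response.Write nodeDemo.text   ' writes 42 for the fragment above
End If
```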

In our code, we substitute the value of strSearchSelector into the query so that all three valid path combinations are returned

Dim xmlResult

set xmlResult = xmlDoc.selectNodes("/analog-data/report[@name='rep_req']/row[@level='1'][" & strSearchSelector & "]/col[@name='col_reqs']")

Display the Output

The final task is to display the request count. Our xmlResult object does not contain a single value for this, but may contain 3 (or more) values, one for each of our valid paths – more if there is an anomaly in the data.

In order to obtain the actual page request count, we need to add the returned values together:

Dim i
Dim lngOut
Dim strOut
lngOut = 0
if (xmlResult.length > 0) then
for i = 0 to (xmlResult.length - 1)
if (IsNumeric(xmlResult(i).text)) then
lngOut = (lngOut + CLng(xmlResult(i).text))
end if
next
strOut = CStr(FormatNumber(lngOut,0))
else
strOut = "0"
end if

In this code segment, if there are no valid results in the XML file, we return “0”. Otherwise, we add the value of each result to the variable lngOut. We then convert this number into a formatted string (adding commas as the thousands separator) and can return the actual page request count as reported by Analog.

The complete code sample

Bringing the above segments together, the final algorithm should look something like this:

' Not for resale or use in commercial profit making activities. Use of this script sample is permitted as long as attribution is maintained.

' Call the function and write the value
Response.Write("The Request Count Is: " & getPageHits(Request.ServerVariables("URL")) )

Function getPageHits(ByVal strLocalPath)
Dim i
Dim xmlDoc
Dim xmlResult
Dim lngOut
Dim strSearchSelector
Dim strOut

strLocalPath = Trim(strLocalPath)
strLocalPath = Replace(strLocalPath," ", "+")
strLocalPath = LCase(strLocalPath)

' Normalise the Path String, generate the XPath OR Statement
if ((strLocalPath = "/") OR (strLocalPath = "/default.asp")) then

strSearchSelector = "col=""/"" or col=""/default.asp"""

else

if (Right(strLocalPath, 12) = "/default.asp") then
strLocalPath = Left(strLocalPath, (Len(strLocalPath) - 12))
end if

if (Right(strLocalPath, 1) = "/") then
strLocalPath = Left(strLocalPath, (Len(strLocalPath) - 1))
end if
strSearchSelector = "col=""" & strLocalPath & """ or col=""" & strLocalPath & "/"" or col=""" & strLocalPath & "/default.asp"""
end if

set xmlDoc = Server.CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.async = False
xmlDoc.setProperty "ServerHTTPRequest", true
xmlDoc.setProperty "ProhibitDTD", false ' Required for ~MSXML 6 (default in 6 = true; <6 = false)
xmlDoc.setProperty "SelectionLanguage","XPath"
xmlDoc.validateOnParse = false
' Load the specified XML file (returns XML output)
xmlDoc.load("d:\sites\www.domain.com\analog.xml")
if (xmlDoc.parseError.errorCode <> 0) then
strOut = "Error in parsing XML file"
else
set xmlResult = xmlDoc.selectNodes("/analog-data/report[@name='rep_req']/row[@level='1'][" & strSearchSelector & "]/col[@name='col_reqs']")
if (xmlResult.length > 0) then
for i = 0 to (xmlResult.length - 1)
if (IsNumeric(xmlResult(i).text)) then
lngOut = (lngOut + CLng(xmlResult(i).text))
end if
next
strOut = CStr(FormatNumber(lngOut,0))
else
strOut = "0"
end if
set xmlResult = nothing
end if
set xmlDoc = nothing
getPageHits = strOut
End Function
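Note that getPageHits() loads and parses the XML file every time it is called. On a busy page you may prefer to keep the result for a few minutes. The following is a rough sketch of one way to do that using ASP Application state; the function name getCachedPageHits, the key format and the CACHE_MINUTES value are all assumptions made for illustration.

```vbscript
' Hypothetical wrapper: caches getPageHits() results in Application state.
' Assumes getPageHits() from the complete sample is defined on this page.
Const CACHE_MINUTES = 10   ' Assumption: choose a value to suit your Analog schedule

Function getCachedPageHits(ByVal strPath)
    Dim strKey, dtStamp, strHits
    strKey = "hits:" & LCase(strPath)
    dtStamp = Application(strKey & ":time")   ' Empty if never cached

    ' Reuse the cached value while it is younger than CACHE_MINUTES
    If IsDate(dtStamp) Then
        If DateDiff("n", dtStamp, Now()) < CACHE_MINUTES Then
            getCachedPageHits = Application(strKey)
            Exit Function
        End If
    End If

    ' Otherwise query the XML file and store the fresh value
    strHits = getPageHits(strPath)
    Application.Lock
    Application(strKey) = strHits
    Application(strKey & ":time") = Now()
    Application.Unlock

    getCachedPageHits = strHits
End Function
```

This sketch stores one entry per distinct path and never removes them, and it will also cache the error string returned when the XML cannot be parsed, so treat it as a starting point rather than a finished design.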

Final Considerations

Now that you have the ability to extract statistics from Analog, you can get creative. You can pass a URL into the function, for example getPageHits("/downloads/myfile.exe"), to extract the request count for other content that is not directly attached to the current URL of the page.

You should also consider how frequently you want to generate updated statistics. You could run Analog continually in a loop to generate near real-time data, every 2 hours, daily, or anything in between. Windows Task Scheduler (or cron under Unix) is the easiest way to schedule updates in most cases. Don’t forget to balance the processing load on the server against the size of the log file store that you are scanning. You may also want to consider limiting the logs included in the statistics to the “last year” or “last month”. The logs for C:Amie (not) Com go back to January 2002 and HPC:Factor’s repository is even older than that, so it would be utterly impractical for a production web server to scan them in real time.
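As a small illustration of passing explicit URLs, the sketch below lists the request counts for a handful of downloads. It assumes that getPageHits() from the complete sample is defined on the same page; the second file name is a placeholder invented for the example.

```vbscript
' Assumes getPageHits() from the complete sample is available on this page.
Dim arrFiles, strFile
arrFiles = Array("/downloads/myfile.exe", "/downloads/manual.pdf")   ' placeholder paths

Response.Write "<ul>"
For Each strFile In arrFiles
    Response.Write "<li>" & Server.HTMLEncode(strFile) & ": " & _
        getPageHits(strFile) & " requests</li>"
Next
Response.Write "</ul>"
```

Each call reloads the XML file, so for long lists it would be worth restructuring the function to load the document once and run several queries against it.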

With a dedicated log parser server however…?