System Requirements:
- Analog C:Amie Edition
The Problem:
While it isn’t very common these days, and is generally considered to be somewhat crass, there are occasions where you might want to display page views or hit counts directly on a web page. For example, you may have an administrative view for your website, be running a competition and need to track hits on specific files, or just want to tell the world how popular you are.
This article outlines how to use Analog C:Amie Edition’s XML mode to do this.
More Info
By default, most users configuring Analog will generate either the HTML output for display or the Computer output for piping into ReportMagic. There are, however, other options for the structure of the data export, as originally implemented by Stephen Turner. One of these exports the statistics as XML, which you can parse for live statistical data and ultimately include on a web page.
How-to
The how-to is split into the following sections:
- What is not covered here
- Pre-requisites
- File System Considerations
- Prepare Analog
- Configure the Analog Server Settings
- Running Analog on a Schedule
- Displaying the Page Statistics
- Final Considerations
What is not covered here
This guide is specifically written to demonstrate the configuration of Analog and associated web scripts running under IIS on Windows when used in conjunction with ASP 3. The process is equally possible in PHP, JSP, Ruby or .NET; however, the walkthrough itself is written around ASP 3 VBScript.
Pre-requisites
Before you begin, you will need to be running Analog C:Amie Edition 6.04 or higher, as there is a bug in the XML DTD of lower versions of both Analog and Analog C:Amie Edition.
File System Considerations
You need to consider where you place the XML output from Analog. While placing it in the public web root is entirely possible, this will make it potentially available to anyone who happens to become aware of its existence. Its location can easily be exposed while debugging, or through an error message raised by the parser, the file system or file system permissions.
It is therefore suggested that you keep the output XML file outside of the publicly accessible web root. A good candidate for this would be in the same location as your analog.cfg file.
For the rest of this example the following structures will be assumed:
Item | Location |
---|---|
Public Web Root | D:\sites\www.domain.com\web\ |
Log Files | D:\sites\www.domain.com\logs\W3SVC1\ |
Configuration (.cfg) and Output Destination (.xml) | D:\sites\www.domain.com\ |
Analog Executable | D:\parser\Analog\Analog.exe |
Configure the Analog Server Settings
In most cases, you will probably want to maintain your existing statistics output in HTML format for casual browsing. Therefore it is necessary to create a second Analog configuration file which will generate the required output.
The configuration file for Analog’s XML process can be less complex compared to that of the main one, although some customisation to improve parser time and minimise the size of the XML output should be considered essential.
Assuming that an existing Analog .cfg file exists at D:\sites\www.domain.com\analog.cfg, we will create a new .cfg file at D:\sites\www.domain.com\analog-xml.cfg.
The .cfg file shown below outlines an XML output configuration for our example, with paths highlighted for completeness.
Optimisation
It is important to disable any and all unwanted reports. This will speed up processing of the log files, reduce the size of the output file and reduce the expense of processing and reading data into your website later on. If you are going to follow this example and are only interested in accessing the hit count, then the only report that you need to turn on is the REQUEST report.
Additionally, you should configure the REQFLOOR and REQEXCLUDE options to optimise the scope of the output. In most cases REQFLOOR will be set to 1r (1 request) while exclusions should include files that you have no intention or ability to measure. For example, if you are only going to display the hit count of the currently visible .asp file then everything apart from .asp can be excluded from the report.
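As a rough illustration of the points above, a trimmed-down analog-xml.cfg might look something like the fragment below. This is only a sketch built from the paths assumed in the table earlier; the log file wildcard and the particular extensions excluded are assumptions for illustration, so adjust them to suit your own site.

```
# analog-xml.cfg (sketch only; paths follow the example structure above)
LOGFILE D:\sites\www.domain.com\logs\W3SVC1\*.log
OUTFILE D:\sites\www.domain.com\analog.xml
OUTPUT XML

# Switch every report off, then enable only the Request Report
ALL OFF
REQUEST ON

# List anything with at least one request
REQFLOOR 1r

# Leave out files we have no intention of measuring (example extensions)
REQEXCLUDE *.gif
REQEXCLUDE *.jpg
REQEXCLUDE *.css
REQEXCLUDE *.js
```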
Running Analog on a Schedule
Now that Analog has been configured, you need to run it. You can run the process manually, or through Task Scheduler, using the command
This will merge any master configuration file with the custom XML .cfg file to generate the output as instructed in analog-xml.cfg.
If you wish to schedule it as part of a larger stats run process, then following the example from my “Using Analog C:Amie Edition to provide automatic statistics on a multi-site production IIS 4.0, 5.0, 5.1, 6.0, 7.0, 7.5 or 8.0 web server” guide, the following script can be used to automate both the parsing of the HTML and XML output
This script searches all sub-folders of the sites root folder for the presence of an analog.cfg file, e.g. d:\sites\www.mydomain.com\analog.cfg. If it finds one, it ensures that there is an appropriate output directory (\web\stats) and then runs the Analog C:Amie Edition executable using the baseline configuration (this is implicit) and the local site-level analog.cfg configuration file (explicit) to produce the report. It will then repeat the process, looking for analog-xml.cfg, and if present will generate the associated XML output.
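The logic described above can be sketched in a language-neutral way. The Python below is not the script from the guide, just an illustration of the same walk: the function name build_analog_commands is hypothetical, and the use of Analog's +g switch to add a site-level configuration file on top of the default one is my assumption about how the executable is invoked.

```python
from pathlib import Path

ANALOG_EXE = r"D:\parser\Analog\Analog.exe"  # location from the example table


def build_analog_commands(sites_root, analog_exe=ANALOG_EXE):
    """Return one argument list per configuration file found under sites_root.

    Side effect: creates <site>\\web\\stats when a site has an analog.cfg,
    so the HTML report has somewhere to go.
    """
    commands = []
    for site_dir in sorted(Path(sites_root).iterdir()):
        if not site_dir.is_dir():
            continue
        for cfg_name in ("analog.cfg", "analog-xml.cfg"):
            cfg = site_dir / cfg_name
            if not cfg.is_file():
                continue
            if cfg_name == "analog.cfg":
                (site_dir / "web" / "stats").mkdir(parents=True, exist_ok=True)
            # Assumption: "+g<file>" reads <file> in addition to the
            # default (baseline) configuration.
            commands.append([analog_exe, f"+g{cfg}"])
    return commands
```

Each returned list could then be handed to something like subprocess.run(cmd, check=True) from a scheduled task. Note that this sketch handles both configuration files per site in a single pass, rather than in two separate passes.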
Displaying the Page Statistics
Moving forward with our example, we will now have an XML file ready to re-parse located at d:\sites\www.domain.com\analog.xml. The next step is to create a piece of code that will match up the currently displayed web page with its entry in the XML output. The steps to do this are:
- Normalise the current page URL
- Filter the current page URL
- Load the XML file
- Query the XML file
- Display the output
Normalise the current page URL
Your web server's log file will record whatever the browser sends, and the structure of this is often entirely at the mercy of the web user or web service sending the request. You should therefore take precautions to protect the parser and increase the likelihood of finding a match in the XML file.
The following obtains the current URL from ASP’s ServerVariables object, removes unnecessary spaces and enforces that we will only search the XML file using lower-case URLs.
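For readers following along outside ASP, the same normalisation step can be sketched in a couple of lines of Python. The function name normalise_url is hypothetical, and where the raw path comes from (a framework's request object, for instance) is left to you.

```python
def normalise_url(raw_url):
    """Trim surrounding whitespace and lower-case the requested path.

    A missing value (None) is treated as an empty string.
    """
    return (raw_url or "").strip().lower()
```

For example, normalise_url("  /Folder/Default.asp ") gives "/folder/default.asp".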
Filter the current page URL
Assuming that you are using ASP 3, by default your index page will be identified as “default.asp” (for example www.domain.com/folder/default.asp). It is however possible for a client to access the same page using “/” (for example www.domain.com/folder/ or www.domain.com/folder). Which of these is logged will depend entirely on which of the three (equally valid) options the client chose to use. In theory this should be normalised, however Analog may record impressions for www.domain.com/folder/default.asp, www.domain.com/folder/ and www.domain.com/folder separately, therefore what we need to do is ensure that when we query the XML file later on that we are querying for all valid combinations on your server.
The simplest way to achieve this is to start building a query that assumes we will need all permissible options in order to obtain a valid result.
In the above code we first check to see if the user is requesting the absolute website home page (either www.domain.com/ or www.domain.com/default.asp) and create a select statement accordingly.
If we are not on the absolute home page, we remove any trailing “/default.asp” and any trailing “/” from the URL, then construct a select statement that will query for all three possible combinations:
- www.domain.com/folder
- www.domain.com/folder/
- www.domain.com/folder/default.asp
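As a sketch of that filtering step, the Python below builds the list of candidate paths rather than a select statement. The name candidate_paths is hypothetical and the input is assumed to be already trimmed and lower-cased.

```python
INDEX_PAGE = "/default.asp"  # default document name used in this article


def candidate_paths(path):
    """Return every form under which the same page may have been logged."""
    # Strip a trailing "/default.asp", then any trailing "/".
    if path.endswith(INDEX_PAGE):
        path = path[: -len(INDEX_PAGE)]
    path = path.rstrip("/")
    if not path:
        # Absolute home page: only two forms exist.
        return ["/", INDEX_PAGE]
    return [path, path + "/", path + INDEX_PAGE]
```

Calling candidate_paths("/folder/") yields "/folder", "/folder/" and "/folder/default.asp". For a named page such as "/page.asp" the two extra candidates are harmless, since they will simply never match a row in the report.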
Load the XML file
Now that we have a search query, it is time to load the XML file and configure Microsoft XML to perform the leg work required to find our up to three possible URL combinations for the current page.
In the above example, a Microsoft XML 6.0 object is created with configuration for the use of XPath to execute our query and permission to use the Analog DTD.
Disabling asynchronous loading on the document is required for synchronous processing of XML documents within ASP. xmlDoc.load() copies the XML file from the file system, parses it into memory and permits us to progress to the data extraction phase.
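Outside of ASP and Microsoft XML, the load step is much the same idea. The following Python sketch uses the standard library's ElementTree parser; load_report is a hypothetical name, and returning None on failure is simply one way of making sure a missing or broken file never surfaces as an error on the public page.

```python
import xml.etree.ElementTree as ET


def load_report(xml_path):
    """Parse the Analog XML output and return its root element.

    Returns None if the file cannot be read or is not well-formed,
    so the caller can fall back to showing no count at all.
    """
    try:
        return ET.parse(xml_path).getroot()
    except (OSError, ET.ParseError):
        return None
```

ElementTree parses synchronously, so there is no equivalent of the asynchronous setting to worry about here.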
Query the XML file
As mentioned above, the weapon of choice in this example will be to query the XML file using XPath syntax. XPath, in a somewhat complicated way, allows us to walk the DOM hierarchy of an XML file by applying selection filters as we traverse the file towards our desired result.
Without wishing to overcomplicate analysis on how this works or the processes involved, the XPath query required to extract a hit count from the Analog C:Amie Edition 6.04+ DTD is:
Or in a more logical format:
- Select the analog-data parent
- Select the report sub tree
- Expand all elements named row where the attribute level = 1
- Then find sub elements named col where the data (value) equals /folder/default.asp
- If all of the above matches return the value of the element col under row under report under analog-data where the attribute name = col_reqs
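The walk described in the steps above can be approximated in Python as follows. This is a sketch only: it assumes that analog-data is the root element, that row elements sit directly beneath report, and that the request count lives in a col element whose name attribute is col_reqs, all inferred from the description rather than checked against the DTD. It uses a simple XPath expression to pick out the rows and plain Python for the matching, rather than one combined XPath query.

```python
import xml.etree.ElementTree as ET


def find_request_counts(root, paths):
    """Return the request count text for each level 1 row matching a path.

    root  -- the parsed <analog-data> element (assumed document root)
    paths -- the candidate paths for the current page
    """
    wanted = set(paths)
    counts = []
    for row in root.iterfind("./report/row[@level='1']"):
        cols = row.findall("col")
        # A row matches when any of its columns holds one of our paths.
        if any((col.text or "").strip() in wanted for col in cols):
            for col in cols:
                if col.get("name") == "col_reqs":  # assumed attribute value
                    counts.append((col.text or "").strip())
    return counts
```

Doing the comparison in Python instead of inside the XPath string sidesteps any quoting problems if a requested path happens to contain an apostrophe.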
In our code, we substitute the select statement built earlier into the query so that all three valid path combinations are returned.
Display the Output
The final task is to display the request count. Our xmlResult object does not contain a single value for this, but may contain 3 (or more) values, one for each of our valid paths – more if there is an anomaly in the data.
In order to obtain the actual page request count, we need to aggregate the three values together
In this code segment, if there are no valid results in the XML file, we return “0”. Otherwise, we read the value of each of the results into a variable lngOut, adding each time. We then convert this number into a formatted string (adding commas as the thousands separator) and can return the actual page request count as defined by Analog.
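The aggregation itself is simple enough to sketch in a few lines of Python. The name format_request_count is hypothetical; the input is assumed to be the list of count strings pulled from the XML, and anything that is not a whole number is skipped rather than treated as an error.

```python
def format_request_count(values):
    """Sum the matched request counts and format with thousands separators.

    Returns "0" when there are no usable values.
    """
    total = 0
    for value in values:
        try:
            total += int(value)
        except (TypeError, ValueError):
            continue  # ignore anomalies in the data
    return f"{total:,}"
```

With the three values "12", "30" and "1200" this returns "1,242"; with an empty list it returns "0".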
The complete code sample
Bringing the above segments together, the final algorithm should look something like this:
Final Considerations
Now that you have the ability to extract statistics from Analog, you can get creative. You can pass a URL into the function to extract the request count for other content that is not directly attached to the current URL of the page.
You should also consider how frequently you want to generate updated statistics. You could run Analog continually in a loop to generate near real-time statistical data, or run it every 2 hours, daily, or anything in between. Windows Task Scheduler, or cron under Unix, is the easiest way to schedule updates in most cases. Don’t forget to balance the processing load on the server against the size of the log file store that you are scanning. You may also want to consider limiting the logs included in the statistics to the “last year” or “last month”. The logs for C:Amie (not) Com go back to January 2002 and HPC:Factor’s repository is even older than that, so it would be utterly impractical for a production web server to undertake real-time scanning activities.
With a dedicated log parser server however…?