Creating a Link Anonymiser Service for Analog CE’s ANONYMIZERURL setting

This article discusses how to create an link anonymiser service redirector to make use of the Analog CE 6.0.16+ ANONYMIZERURL setting.

 

Why use an anonymiser?

If a user clicks a link on an Analog CE “Requesting Site” or “Requesting URL” report. The users web browser will send a HTTP Referrer header with the request to download the web page; this request will include the full URL or your Analog CE report. The receiving server will likely log the request, allowing its owner to see where the request originated.

This may expose the Internet or Intranet URL or your stats page to the target website owner. They may in-turn inadvertantly publicise it via their own statistics page and/or link-back tracker service. This makes it possible for other agents, including competitors, search engines and malicious users to discover information about your website. Worse your web server may become the target of SEO spammers.

 

What is SEO spam?

SEO spam is the practice of attempting to improve a website/page position on a search engine by creating ‘false’ links into that website. If your referring site/URL report is public it is possible for a malicious actor to artificially position one or more URLs on the report. This is achieved through a manipulated HTTP GET request containing a HTTP Referrer header with the URL/site that they want to inject onto your report. After making several hundred requests in this fashion the spammer will wait for the report to be updated. After confirming that their site has appeared in the report, they submit your statistics page(s) to search engines.

Once compromised, it is likely that your exploitability will be recorded in one or more botnets and will see wider exploitation.

 

Why create your own anonymiser?

You can use public anonymiser services such as anonymizer.info or anon.to with Analog CE using one of the code samples below.

ANONYMIZERURL https://anon.to/?

ANONYMIZERURL https://anonymizer.info/?

This may not be acceptable to you, or your organisational security policy. Firstly because while the owner of the resultant web server will not discover the true origin of the request, the public anonymiser service will. Secondly, there is no contract assuring service availability or the privacy of its log files. Finally, it is inevitable that the service is going to profit from your transaction. Advertising placement is likely, creating a delay in the redirect.

 

Code your own Anonymous Link Redirector

The following code snippets can be used to program your own basic redirector using service side scripting technology.

 

ASP 3 / Classic ASP

<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001" EnableSessionState="False"%>
<% Option Explicit %>
<%
  Response.Status = "302 Found"
  call Response.AddHeader("Location", Request.QueryString)
  Response.End()
%>

Save the file as redirector.asp and add the following to your Analog CE global configuration file:

ANONYMIZERURL http://my-server.domain.com/redirector.asp?

 

ASP.net

<%@ Page Language="C#" %>
<script runat="server">
  private void Page_Load(object sender, EventArgs e)
  {
    Response.Redirect(HttpContext.Current.Request.ServerVariables["QUERY_STRING"], true);
  }
</script>

Save the file as redirector.aspx and add the following to your Analog CE global configuration file:

ANONYMIZERURL http://my-server.domain.com/redirector.aspx?

 

PHP

<?php
  header('Location: ' . $_SERVER['QUERY_STRING'], true, 302);
  exit;
?>

Save the file as redirector.php and add the following to your Analog CE global configuration file:

ANONYMIZERURL http://my-server.domain.com/redirector.php?

 

Conclusion

The above code samples illustrate how to create a redirector in several different languages. The redirector URL will be sent to the destination server however the originating statistics page will now be protected. This protects your Analog CE stats pages from prying eyes while reducing the risk of SEO spamming.

Analog’s Search Query Report is empty or sparsely populated

If you are an Analog CE user, you may have noticed that the Search Query and Search Word reports suggest that your site has been receiving little to no search engine traffic. The Search reports may be empty or sparsely populated with results. This article discusses the cause of the problem.

 

What are the External Search Reports?

The premise of the external (rather than internal) search report is to show you what a visitor typed into a search engine before landing on your site. The Search Query report shows the entire search phrase used, while the Search Word report creates a word count report for anything de-marked by a space.

They are intended to help you identify what your users are searching for as part of your SEO activities. Unfortunately, over the last couple of years the data available on the report has been declining. On some sites, the report may have disappeared altogether as no data is available to Analog CE.

 

Fixing the empty or sparsely populated report

There are three reasons for the decline in report quality. They are outlined below in order of lest to most severe.

 

Missing Search Engines

The default config file in the Analog CE source release is a clone of the original Analog 6 config file. It (and your own) config files have not been updated with modern search engines. Consequently traffic from Bing and DuckDuck Go (and others) will be wholly absent.

I will be including an updated config file sample in Analog CE starting with Analog CE 6.0.16. Analog CE users will need to replace or merge this into your running configuration files. To perform this manually, you must add relevant SEARCHENGINE entries to your global/site config files.

 

HTTP > HTTPS

As a result of security issues and the gross mismanagement of private data transiting public networks by the CIA. The world made a rapid switch from HTTP to HTTPS in 2015. The default Analog 6 config file only included Search Engine detection definitions for unencrypted HTTP connections. Consequently Analog CE has stopped reporting traffic from any HTTPS search engine source.

Starting with Analog CE 6.0.16, I will fix this in the bundled sample file. You will need to merge the SEARCHENGINE changes into you respective config files.

 

Privacy & Commercialisation of Analytics

The crux of the problem stems from Search Engines no-longer send the search query in their referrer header. Some see this as being for legitimate privacy reasons, others for commercial benefit. DuckDuck Go would certainly argue the former. For Google, the reason is less clear. With up to 98% market share, Google is effectively forcing webmasters to use Google Search Console and/or Google Analytics to understand their user needs. In turn, this feeds Google with more accurate data on user activity for ad-profiling. It also ensures that you as a webmaster are creating the web that Google wants, above any other concern. Any webmaster threatened with de-listing due to mobile device incompatibility will understand this.

The following sample log line is from the 5th April 2015

2015-04-05 15:38:29 W3SVC5 web02 xxx.xxx.xxx.xxx GET /downloads/msie/ie60sp1/ - 80 - yyy.yyy.yyy.yyy HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+98) - http://www.google.nl/search?q=internet+explorer+download+windows+98&ie=ISO-8859-1&hl=nl&source=hp&gbv=1 www.hpcfactor.com 200 0 0 10747 331 406

This is a similar one from 4th June 2019

2019-06-04 00:17:31 W3SVC5 web01 xxx.xxx.xxx.xxx GET /hardware/devices/specification.asp d=142 443 - yyy.yyy.yyy.yyyy HTTP/2 Mozilla/5.0+(Linux;+Android+6.0;+HM-G552-FL+Build/MRA58K)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/56.0.2924.87+Mobile+Safari/537.36 - https://www.google.com/ www.hpcfactor.com 200 0 0 12534 491 495

If you scroll to the right, you will notice that in the earlier file, the user searched for “Internet Explorer Download Windows 98”. In the latter file, Google only informs the server that the request came from google.com. The context of the search is no longer offered.

As a consequence, the data required to create the Search Query report is not available to Analog CE. This is the reason why the Analog Search Query Report is empty or sparsely populated. It is not a bug. Rightly or wrongly, it is a symptom of the modern web.

How to compact VHDX files in the most efficient way

Have you ever run the Hyper-V “Edit Disk…” tool’s ‘compact’ action, to discover that your VHD or VHDX disk file didn’t shrink? This article discusses how to optimise the chances for success when compacting your virtual hard drives on Windows Server.

 

Why Compact VHDX files?

Compacting virtual disk files is an activity that most Hyper-V administrators seldom ever do. In most cases, the space savings are going to be nominal and outweigh the benefits. A well designed Hyper-V deployment will ensure accounting for future storage optimisation/loading during the capacity planning phase; with scale-out design considerations being made during this stage too.

Compaction is not a fix for poor design.

If you are unfamiliar with what compaction does at the technical level, Alataro has a good introduction guide.

View: Altaro “Why You Should Be Compacting Your Hyper-V Virtual Disks”

In practice, compaction can help in two scenarios:

  1. Improving the speed and disk space use for non-VSS block-based backups
  2. Reducing the amount of data transferred during live migration

 

Why doesn’t Edit Disk… work?

It is common for the Hyper-V Manager Compact tool to not achieve any reduction in the size of the VHDX file, putting many administrators off from spending time running it.

Two problems exist with the GUI compaction wizard:

  1. The VHDX file is not in a pre-optimised state prior to compaction, reducing the success rate
  2. For performance reasons. Execution through the wizard does not use the full range of compaction tools available to the system. These can only be accessed (currently) via PowerShell.

So how do you properly shrink a virtual disk?

 

Optimise your VM

Inevitably there will be some data inside the VM which is not necessary to retain. Removing this prior to compaction, will automatically increase the amount of space that you save.

Optimising Windows

Under Windows, running the Disk Clean-up tool is a good place to start. Additional places to clear-out include:

  1. C:\Windows\Temp
  2. C:\Windows\SoftwareDistribution\Download
    The SoftwareDistribution\Download folder will usually save over 1GB of space as it contains the installation sources for Windows Updates. Microsoft Update will re-download the files again if it needs them on the next update scan.
  3. C:\Users\<user>\AppData\Local\Temp

You can run defrag from within the VM, however if you are in a position where you can off-line compact the virtual disk, it is more time efficient to defrag the VHDX while offline.

Optimising Linux

Optimisation should also be performed on Linux systems. Unlike with Windows, Linux self-clears its own Temp space on restart. You can also free space from the updates process using your package manager. The below example can be used to free space using apt:

sudo apt-get update
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get clean

Under native Linux file systems, there is no concept of running a defragmentation. ext4 performs this automatically on behalf of the system. This optimisation does not release free space in a “physical” fashion on the hard drive. Unless this “physical” space is released (zeroed out), Hyper-V will be unable to compact the disk.

Fortunately it is possible to force Linux to zero-out unused disk space in the VM. It can be performed using the following command

su
cd /
cat /dev/zero > zero.dat ; sync ; sleep 1 ; sync ; rm -f zero.dat

This command creates a file on the drive mounted in the “cd /” line. It then writes 0’s into this file until the system literally runs out of disk space. The new 0’s file is then deleted. If you are compacting a VHDX that is not the root partition, you must change the “cd /” line to represent the correct drive e.g. “cd /mnt/myDisk“.

Note: You will completely fill the volume for a few seconds. This will impact other write activities occurring on the disk and so can be considered to be a dangerous process.

 

Shrinking an Online VHDX

It is possible to perform an online compaction of a virtual disk. Hyper-V itself performs light-touch background optimisation automatically. If you cannot shutdown a VM and can only optimise an online VHDX then you must:

  1. Delete temp files
  2. Internally defragment the disk from within the VM itself
  3. Use Disk Management or diskpart to shrink the size of the mounted partition currently in use on the VHDX
diskpart
list vol
sel vol #
shrink
  1. Perform the compact using Hyper-V Manager or PowerShell
    Optimize-VHD <path to VHDX> -Mode Full
  2. Reverse the reduction in the size of the partition using Disk Management or diskpart
diskpart

list vol
sel vol #
extend

 

Shrinking an Offline VHDX

The most efficient method to free space from a VHDX file is to perform the compact wile it is offline. The disadvantage of an offline compaction is that the VM will need to be shutdown prior to the operation. The amount of time that the VM will be down for is directly proportional to the amount of work required to complete the optimisation process. You can improve this through online defragmentation prior to starting, however this takes significant additional administrative effort.

The following steps outline the process that I use to greatest effect. I use X:\ as the example drive letter in this scenario and the use of PowerShell is expected.

  1. Get the initial VHDX size
    $sizeBefore = (Get-Item <path to VHDX file>).length
  2. {optionally} Defrag the VM while online
  3. Shutdown the VM
  4. Mount the VHDX in read/write mode (not read only mode)
    Mount-VHD <path to VHDX file>
  5. Purge Temp files and the contents of X:\Windows\SoftwareDistribution\Download
  6. Defragment the VHDX while mounted to the management system
    1. If the VHDX is stored on SSD drives
      defrag x: /x
      defrag x: /k /l
      defrag x: /x
      defrag x: /k
    2. If the VHDX is stored on Hard Drives (/x /k /l are retained for Trim capable SANs)
      defrag x: /d
      defrag x: /x
      defrag x: /k /l
      defrag x: /x
      defrag x: /k /l

      Note: Defrag /d  can be particularly time consuming
  7. Dismount the VHDX
    Dismount-VHD <path to VHDX file>
  8. Perform a full optimisation of the VHD file
    Optimize-VHD <path to VHDX file> -Mode Full
  9. Get the new size of the VHDX
    $sizeAfter = (Get-Item <path to VHDX file>).length
  10. Start the VM
  11. Find out how much disk space you saved (if any)
    Write-Host "Total disk space Saved: $(($sizeBefore - $sizeAfter) /1Mb) MB" -Foreground Yellow

 

You will note that I repeat the steps of running defrag /x and defrag /k /l. In experimenting, this is because the repitition appears to allow a small amount of additiona space to be freed in some situations as show in the table below.

 

Results

Efficiency Purgable slabs
Size at start 73,991,716,864 100% 11
/k Slab consolidation 69,965,185,024 100% 11
/l Retrim 69,965,185,024 100% 11
/x Free space consolidation 69,965,185,024 100% 9
/x /l /k (repeat) 69,898,076,160 100% 9

The table shows the first /l retrim operation reducing the number of purgable slabs, after which a further 67,108,864 bytes (64MB) of space is freed – the two 32MB slabs.

 

Why not run Defrag /d on an SSD?

Defrag /d aka “traditional defrag” physically moves the file data to the end of its home partition, reconstructs the file in a contiguous series of blocks and moves the file back to the start of the disk. This process is unnecessary on an SSD. Here, there is virtually no likelihood that the data is stored in a contiguous fashion on the SSD NAND flash. There is also no performance benefit for the file being stored contiguously. While you can perform defrag /d on an SSD. In reality, you are needlessly shortening its cell-write life and the step should be skipped.

 

Conclusion

It is unfortunate that the process of compacting a VHDX file is not a seamless one. To realise the highest returns, it is necessary to shutdown the VM; which may not be practical in many scenarios. Equally, the amount of time required to perform the offline compact scales with the utilisation size of the VHDX, number of files and number of tasks performed as part of the maintenance.

Done right, and with the help of script automation, it can be a valuable task – especially before planned VM moves. I regularly save over 130GB in total when draining a hypervisor for maintenance in my home lab – around 25-30 minutes less file copy time over 1Gbps Ethernet. A worthwhile saving as it only takes 20 seconds to execute the automation script that does the work for me.

Sniffing the Parent Partitions Network Traffic in a Hyper-V Virtual Machine

This article discusses a situation whereby you want to monitor/mirror/sniff network port traffic on a Hyper-V Parent Partition inside on of its own child VM’s.

Why would you need to do this?

Under a traditional architecture you have the flexibility to tell your switch to mirror all traffic into or out of Port 6 onto Port 21. You then connect a laptop to Port 21 and promiscuously monitor the traffic coming into that port. Under a modern Converged/Software Defined Network architecture, this will not work.

In a modern Converged Fabric design, physical NICs are teamed. The parent partition on the hypervisor no-longer uses the physical NICs, but logically uses its own synthetic NICs for data transfers.

  1. Link Aggregation/LCAP/EtherChannel will split the traffic at the switch
  2. Teaming/LBFO will split the traffic at the hypervisor
  3. Data security will fire a red flag as you will be monitoring too much unrelated traffic
  4. If you combine them, you will overload the monitoring Port with aggregated traffic, causing performance issues and packet loss
  5. You may impact the performance of tenant VM’s and mission critical services

Fortunately the Parent Partitions own Virtual NICs are identical to the vNICs in any Hyper-V virtual machine. Consequently, you can use the same Hyper-V functionality on the Parent Partition as you would any VM.

 

Requirements

In order to sniff traffic on the Parent Partition you must ensure the following:

  1. The Parent Partition and the VM must be connected to the same Virtual Switch
  2. The “Microsoft NDIS Capture” extension must be enabled on the Virtual Switch (this is enabled by default)
    Enable the Microsoft NDIS Capture Extensions
  3. The monitoring VM should have 2 vNICs. The vNIC used to monitor traffic should be configured onto the same VLAN as the vNIC on the Parent Partition. The monitoring NIC should have all of its service and protocol bindings disabled to ensure that only port mirrored traffic is appearing in the WireShark logs
    Disabling service and protocol bindings on the vNIC
  4. Wireshark, Microsoft NetMonitor or another promiscuous network traffic monitor
  5. If you are in a corporate environment, ensure that you have approvals from your Information Security team. In some jurisdictions port sniffing can be considered an offence

 

Enabling Port Sniffing

You cannot enable Port Sniffing on the Parent Partition using the Hyper-V Manager GUI. Open PowerShell on/to the Parent Partition

Execute Get-NetAdapter

Identify the name of vNIC that you will sniff traffic to/from e.g. vEthernet (Management)

Taking only the value inside the parenthesis "Management" enter the following command

Get-VMNetworkAdapter -ManagementOS 'Management' | Set-VMNetworkAdapter -PortMirroring Source

Substituting WireSharkVm for the name of your monitoring VM. Execute Get-VMNetworkAdapter 'WireSharkVm'

Identify the MAC Address of the vNIC’s that you will use to receive the Port Mirror from the Hyper-V host and enable it as the recipient for the mirror

Get-VMNetworkAdapter 'WireSharkVm' | ?{$_.MacAddress -eq '001512AB34CD'} | Set-VMNetworkAdapter -PortMirroring Destination

If the Parent Partition and VM vNICs are in the same VLAN. You should now be able to sniff traffic inbound to / outbound from the Parent Partition.

 

Disabling Port Sniffing

When using Port Mirroring, remember that it consumes CPU time and network resources on the hypervisor. To disable the port mirror, repeat the above commands substituting ‘None’ as the key-word for the PortMirroring parameter e.g.

Get-VMNetworkAdapter -ManagementOS 'Management' | Set-VMNetworkAdapter -PortMirroring None
Get-VMNetworkAdapter 'WireSharkVm' | ?{$_.MacAddress -eq '001512AB34CD'} | Set-VMNetworkAdapter -PortMirroring None