SQL Server combining LIKE with IN (SELECT LIKE IN query)

System Requirements

  • SQL Server

 

The Problem

This was one of those “it should be obvious” moments.

Dating back to 2004 in its current iteration, the hpcfactor.com membership database has been blighted by SPAM. We are no different to any web community in this regard. A can be a particular problem as many accounts that we may miss are registered and (if they survive) become a valuable commodity to sell as trojan accounts for use in mischief at a later date.

The moderation team at HPC:Factor does a great job of ensuring that these ‘pay per post’ (usually from Indian IP’s) urchins are wasting their time, but it does not stop them from trying. At one point we were having to content with several hundred a day, and had to shutdown registration completely. A custom layer that I wrote to sit on-top of the BBS registration system has done a good job to convert that SPAM torrent into a trickle of a “couple of times a month”.

Over the years, I have added more and more banned email domains to a ban list that is maintained in the user registration database. It occurred to me that as that list has increased, the historic pool of user account has never been re-parsed against the updated banned domains list.

Sounds simple right? Fire up SQL Server Management Studio and create a SQL SELECT LIKE IN query… is it that simple?

 

The Solution

Yes actually it is.

The obvious logic that you want to follow is to take this syntax

SELECT id, username, emailaddress
FROM members
WHERE emailaddress IN (
   SELECT emailaddress FROM bannedemails
)

… and merge it with a LIKE statement to create a SELECT LIKE IN query, something like

SELECT id, username, emailaddress
FROM members
WHERE emailaddress LIKE IN (
   SELECT '%' + emailaddress FROM bannedemails
)

Unfortunately however this syntax is completely illegal.

The answer is to perform the LIKE/IN as part of an INNER JOIN

SELECT members.id, members.username, members.emailaddress
FROM members
INNER JOIN bannedemails ON members.emailaddress LIKE '%' + bannedemails.emailaddress

 

Performance

This is a not a “real time” friendly SQL expression. The inefficiency of using LIKE evaluation becomes very apparent as the record set scales. The HPC:Factor database had 2,271 banned email domains and 30,336 user. Giving a total record iteration and evaluation count of 68,893,056 rows.

The total query execution time for this on was 2 minutes 13 seconds (ouch). Having said that, on reviewing why it took that long, I realised that at some point in time I was being extremely stingy with the SQL Server VM. It only had access to 2 CPU cores and 12 GB of RAM. The hypervisor server itself has access to a 10 core hyper-threaded Xeon.

 

The Result

I have a lot of respect and sympathy for people who use VPNs and temporary email addresses and I genuinely understand why they feel the need (or in some cases must, depending on what part of the world you are in).

I saying that, I also have to deal with the issue from the other side of the equation. If the website is having a bad day, it’s usually because of something coming from one or both of these camps. As ever in life, a minority will insist on spoiling a good thing for everyone else.

After running the query, we had only missed 8 user accounts – credit to the moderation team. The oldest account was 26/09/2006 and my favourite that we probably should not have missed was “mrputin” from January 2013.

Returning multiple rows from a SQL generated record set as a concatenated string value using SQL in Microsoft Access

System Requirements:

  • Microsoft Access 97, 2000, 2002 (XP), 2003, 2007

The Problem:

If you have a one to many relationship in your database and you want to query the following (rudimentary and fairly poor) information:

table_Year

ID YearCode
1 2008
2 2009
3 2010

table_Semster

YearCode Semester
2008 A
2009 A
2009 B
2010 A
2010 B
2010 C

The query of:
SELECT table_Semester.YearCode, table_Semester.Semester FROM table_Semester WHERE table_Semester.YearCode = table_Year.YearCode AND table_Year.ID = 2;

Returns:

YearCode Semester
2009 A
B

But what if you (for some reason, the why here is your own business) want to return:

YearCode Semester
2009 A,B

or even just

2009 AB

 

More Information:

Larger database management systems such as MySQL, Oracle and Microsoft SQL Server can utilise built-in functions, stored procedures or cursor based logic to create a string from multiple rows, however Access is not powerful enough to support this.

However, it is possible (although not necessarily ideal) to retrieve a single string value from a one-to-many record set in access under one condition:

It only works IF there is a finite and definable number of “many” values.

Take the example above. An academic year has three semesters. Semester A, B and C. We have three values which we can enumerate as being true. As a consequence we can create a concatenated string output using a SQL SELECT statement similar to the following:

SELECT DISTINCT (

SELECT tblYear.YearCode, table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’A’)

) & (

SELECT table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’B’)

) & (

SELECT table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’C’)

) AS strSemester
FROM table_Year tblYear
WHERE tblYear.ID = 1 ;

What we are doing here is defining that there are three known options, searching for each option against the record sets join. Note the redefinition of table_Year as tblYear.

Each sub query returns a single permissible value from the many relationship and concatenates it to a string named “strSemester” (using the & character). The result of executing the query above would be:

YearCode Semester
2009 AB

If you wanted comma’s to separate the Semesters you would have to use

) & “, ” & (

This on its own would of course return excess commas on the example above, i.e.

YearCode Semester
2009 A,B,

Now, I can hear database administrators across the planet groaning right now, and yes I concur… but like I said above, the “why” that prompted you to do this is your own business. So I have no qualms in making this worse than it already is.

If you wanted to return the above without the stray comma’s at the end you would have to augment the SQL statement thusly:

SELECT DISTINCT (

SELECT tblYear.YearCode, table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’A’)

) & (SELECT “, ” as strComma FROM Faculty WHERE (
SELECT table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’A’)) <> NULL;
) & (

SELECT table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’B’)

) & (SELECT “, ” as strComma FROM Faculty WHERE (
SELECT table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’C’)) <> NULL;
) & (

SELECT table_Semester.SemesterCode
FROM table_Year, table_Semester
WHERE (((table_Year.YearCode = table_Semester.Yearcode AND table_Year.ID = tblYear.ID AND SemesterCode=’C’)

) AS strSemester
FROM table_Year tblYear
WHERE tblYear.ID = 1 ;

The horror of the above isn’t as complicated as it looks. We test for Semester A where the ID = 1 and send the result OR Null to the string. We then test it again, but this time we only send a result – the comma – back when the result is not Null, that means that if there is an instance of A the result is written out as:

A,

Repeat for B, but this time only if there is an instance of C the result is written out as:

B,

Giving us a not quite flawless answer (more tests are needed to cover every combination).For the example however the result would be:

YearCode Semester
2009 A,B

With no trailing comma!