Making MySQL Searches Relevant
Posted by Duxter
26 January 2013 - 01:45 AM
Posted by Duxter
26 January 2013 - 01:45 AM
no tagsOne thing that annoys me about some websites is search functionality, and the lack of relevant results that they return. In theory a search feature is a relatively simple concept, you type a keyword for what you are looking for, and a list of matching results are provided. While this looks pretty simple on the front end, the back-end of search queries is a complicated process.
For me, when creating a new search system, it’s always best to start it out pretty simple. A simple database query to find a simple match in a simple database field. Let’s use a recent project I’ve been working on as an example.
Let’s assume our table looks like this.
+-------+------+--------+----------------+-------+---------------+--------------+------------+ | entry | item | system | title | extra | description | tags | date | +-------+------+--------+----------------+-------+---------------+--------------+------------+ | 11 | 47 | 2 | Anthonys Video | 23412 | anthony sho.. |anthony,kinson| 1346172183 |
SELECT * FROM tag_items WHERE title='Anthonys Video'
To simple right? The reason for this is that I want to just return something, anything at all, so that I can start programming and styling the front end to start receiving data for output. At this point there is no sense in me creating complex queries because I’m not absolutely certain that the table or front end will continue to work the way it is at this point. However, once I have completed my front end and have it all functioning we knock things up a step, and this next step is usually where most websites leave it.
SELECT * FROM tag_items WHERE title LIKE '%Ant%' ORDER BY date DESC LIMIT 0,10
So as you can tell from this query, we’re searching the table for items which have a “Title” like “Ant” (where Ant is the search term) and orders them so the most recent entries will show first. Some websites will give various options for this type of search, such as a drop-down offering to search by Title, Description or tag words. This is all very well for a lot of sites as they don’t need anything more expansive than this, especially when the user knows exactly what they are looking for.
The problem with this is that with today’s internet, search is used a lot as a discovery portal, a way for users to find and discover new things which are of more relevance to what they are looking for. To do this, we need to offer a search that doesn't just deliver content to a user based on a single factor such as a title, but instead, be smart and offer up results which are all round relevant to the key word you provided.
At Duxter, to tackle this problem I devised a simple relevance and weighting system, we take each possible search field, and we assign it a value. In this case we’ll use the following where a higher number is more relevant.
- Exact “title” Match = 100 points
- Partial “title” Match = 10 points
- Partial “tags” Match = 7 points
- Partial “extra” match = 5 points
- Partial “description” match = 2 points
Tags are usually a good way to find content based on keywords, but tags get reused a lot and often assigned to things with little relevance, so we consider these less important than the title of the item. Extra, for us, stores extra identifying information, such as a video ID’s and object keys, we only use this in our search queries so that you can search for items based off of an exact id of a certain object. Lastly is our lowest priority, the description. The reason for this is because descriptions often contain a lot of words so there’s more chance of matching something irrelevant.
The way we handle this with an SQL statement is as follows.
SELECT *, sum(relevance) FROM ( SELECT *, 100 AS relevance FROM tag_items WHERE title='Anthony' UNION SELECT *, 10 AS relevance FROM tag_items WHERE title like '%Anthony%' UNION SELECT *, 7 AS relevance FROM tag_items WHERE tags like '%Anthony%' UNION SELECT *, 5 AS relevance FROM tag_items WHERE extra like '%Anthony%' UNION SELECT *, 2 AS relevance FROM tag_items WHERE description like '%Anthony%' ) results GROUP BY entry ORDER BY sum(relevance) desc
So here I have created a number of select statements that will grab items where a field matches the search term. Depending on the match that is found a relevance value is given and all the results get grouped together by their entry id. We then sort them by which item has the highest relevance count.
an example of a returned result would be
+-------+------+--------+----------------+-------+---------------+--------------+----------------+ | entry | item | system | title | extra | description | tags | sum(relevance) | +-------+------+--------+----------------+-------+---------------+--------------+----------------+ | 11 | 47 | 1 | Anthony | 23412 | Anthony sho.. |anthony,kinson| 109 | | 25 | 22 | 4 | Anthonys video | 23876 | This is ant.. |anthony,video,| 19 | | 83 | 63 | 4 | Last Nights g. | 23747 | Anthony loo.. |anthony,game,l| 9 | | 22 | 13 | 3 | Seattle even.. | 92834 | Our event in. |seatle,anthony| 7 |
So as you can see above, each item was given a combined relevance score, the higher the score the more relevant the item was to the search query. Searching for Anthony in this case we found an exact match in system 1 (which is a user). If you searched Anthony, chances are that the user with the name Anthony is who you are looking for and will be the most relevant result to show you, we then found a video (system 4) which had a partial title match, a tag and a description match, and so on and so forth.
Now, the system I have set up for Duxter is much more complicated than this, I've tried to simplify it as much as possible to show how it works. We in fact have a separate table with indexed tag words and keys and we return the top 5 results from each “system” in our quick ajax search feature. For our explorer page (Duxplorer) We also now search various systems with vast amounts of data, Communities, Gamers, Videos, Posts, Products and Games. Things start getting real complicated at this point and some of the queries we use are huge but are still able to provide relevant results in less than half a second. We’ll save all that for another day. for now though, I hope this look into making searches a bit more relevant comes in handy for you. If you would like to check out our new Duxplorer, visit http://duxter.com/explore/halo/ and experiment with the search bar in right side the sub-header.
Until next time Duxtonians.