Katrina Soderquest studied a range of sciences at university and followed a path into Data. Here Katrina reflects on ‘Search and Resolve’ and popularity models that help Hotels.com® give customers the best results.
‘Data On the Ground’ blog for Women In Data UK
Background to the role
I work as a data scientist for the Hotels.com® brand. This is my second data science role following a transition from academic biology earlier in my career. Given the breadth and buzzwords of the data science field, it's easy to become overwhelmed as someone looking from the outside in. When I was making my career change, I found that focusing on the concrete reality of how data science projects work 'on the ground' helped me see through some of the hype to an area that was challenging yet comprehensible. I hope this post helps others achieve the same.
Applying data science to search
When customers start searching for a destination or property on Hotels.com, the Search and Resolve Service (SRS) suggests possible final search destinations or hotels in a drop-down list, based on the initial letters typed. To make these suggestions, the system must score each destination by how likely it is to be the one the customer is searching for. Scoring is based on a mix of string matching against the letters the customer has typed, the popularity of a destination, and a few other business rules, based for example on customer language. Given that a customer who cannot find what they are looking for is likely to become frustrated and try a different website, getting SRS wrong really means failing at the first hurdle in helping a customer choose to make a booking with Hotels.com over a competitor.
However, geography is messy: a single name can correspond to multiple places (London, UK; London, Ontario), and we can refer to one place in multiple ways (New York, the Big Apple). Add a variety of languages (London, Londres, etc.) and things get quite complicated. As such, determining a customer's intention quickly when they start searching for a hotel or location is somewhat challenging. Historically, SRS has been driven by business logic and heuristics, along with crucial software infrastructure for serving customers well when, for example, a search string is misspelled. As with any tech company, Hotels.com has always tested these changes with online A/B testing, but there is an increasing feeling that testing isolated business-rule changes and picking the winner is no longer sufficient in the increasingly competitive travel marketplace.
Since starting at Hotels.com less than a year ago, I have worked on the algorithmic development of the SRS system. In short: can we make the system better for customers using machine learning and other data science techniques? Ultimately, we want suggestions that are highly relevant and updated in real time. Of note, despite the data science focus, such development requires input from a wide range of colleagues within the company. One set of skills crucial to any data scientist role, but often overlooked relative to statistical thinking and programming ability, is the set needed to work well with other people, communicate effectively and understand the business problem. At Hotels.com, we are very lucky to have a range of highly competent software and data engineering teams as well as strong product and technical product owners, all of whom contribute to the range of facilities that comprise Hotels.com. Working across different disciplines, offices, languages and even time zones requires an ability to anticipate issues, build consensus and work transparently before you even get to building your first model. But it brings the excitement of understanding new perspectives and making a tangible impact in the trillion-pound global travel industry.
My first major project on joining was to find a better way of scoring the popularity of our destinations. In many respects this project followed a fairly typical data science workflow, highlighting many common themes of working as a data scientist: the everyday thoughts and challenges on the ground, the satisfaction of producing a working model that goes on to the live site, and the excitement of upcoming possibilities.
Getting the data right
Fundamental to any data science project is having good-quality data, understanding that data and understanding its caveats. In the case of destination popularity for SRS, our primary data source is our clickstream data: a table containing hundreds of columns of anonymized information on customer interaction with the website, including the search string typed and the destination of the page visited. Each row records an interaction with the site and, from here, we can get an idea of how popular a destination is by counting how many times its page is visited. Indeed, this was our baseline in starting the project. However, this fairly basic method raises two questions seen over and over again in data science:
- How do we avoid skewed feedback loops? For example, if we make a suggestion that is not relevant, but a customer clicks on it because they were confused by it being suggested, then this destination gets credit it shouldn't. Let this happen a few times and you have a skewed dataset. On the flip side, if we add a new destination to our site, it won't have any clicks, so it might not get suggested, so it can't get any clicks, and the cycle continues.
- How do we do better when we have small amounts of data? Where we have new / unique combinations of customer and search string, how do we make the best suggestions?
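To make the baseline concrete, here is a minimal sketch in plain Scala of counting page visits per destination from clickstream-style rows. The production work used Scala Spark over much richer data; the record shape and names here are illustrative only.

```scala
// Hypothetical miniature of the baseline popularity score:
// each row is one customer interaction, popularity = visit count.
object PopularityBaseline {
  case class Interaction(customerId: String, destination: String)

  // Count page visits per destination, most visited first.
  def popularity(rows: Seq[Interaction]): Seq[(String, Long)] =
    rows.groupBy(_.destination)
      .map { case (dest, hits) => (dest, hits.size.toLong) }
      .toSeq
      .sortBy(-_._2)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Interaction("a", "London"), Interaction("b", "London"),
      Interaction("c", "Paris"))
    println(popularity(rows)) // "London" ranks first with 2 visits
  }
}
```

Both questions above show up directly in this sketch: a destination with zero rows never appears, and every confused click inflates its count.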
In attempting to solve these problems, we also turned to a newly available dataset containing further information on how SRS operates on our website: what strings customers have typed, and what destinations were shown as a result. This level of granularity is very helpful in understanding what strings customers are typing, what problems customers might be seeing as a result of typing certain strings (e.g. are there common misspellings that we should resolve for our customers?), and also in evaluating our new popularity models (more on this later).
In terms of modeling, to better capture popularity without the problems of skew and sparse data, we turned this into a regression problem. Can we take our clickstream data from a set period of time, supplement it with other internal and external data sources, and use it to model the popularity of a destination beyond that period as a smoothed score with fewer biases and data-sparsity issues? The task then becomes finding the best regression model using standard techniques such as cross-validation and scoring on a held-out test set. We did this work using the Databricks platform on AWS; data transformation and modeling were done in Scala Spark using the ML pipeline libraries.
Framing a destination's popularity as a regression problem immediately suggests a first-line evaluation metric of root-mean-square error (RMSE) on our held-out test set. However, unlike more 'textbook' examples of linear regression, we are ultimately not interested in how accurately our model predicts future popularity in absolute terms; instead we care about how well the destination ranking fits customer expectations. Indeed, in some cases (where our data is skewed or sparse, as discussed above), we actually want our model to be less than perfectly accurate.
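For readers unfamiliar with it, RMSE over a held-out test set is just the square root of the mean squared difference between predicted and observed scores. A minimal plain-Scala sketch (the real evaluation used the Spark ML pipeline tooling):

```scala
object Rmse {
  // Root-mean-square error between predicted and observed
  // popularity scores on a held-out test set.
  def rmse(predicted: Seq[Double], actual: Seq[Double]): Double = {
    require(predicted.length == actual.length && predicted.nonEmpty)
    val meanSquaredError = predicted.zip(actual)
      .map { case (p, a) => (p - a) * (p - a) }
      .sum / predicted.length
    math.sqrt(meanSquaredError)
  }
}
```

Lower is better, but as noted above, the lowest RMSE is not automatically the model we want.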
So, what do we want to measure? Ultimately, we are interested in two things about our model. When the customer searches for their desired destination:
- Does the customer’s desired destination appear on our suggested list?
- Where does the desired destination appear on our suggested list?
And, remember, the desired destination can differ for the same search string. As a striking example, customers in Wisconsin, Illinois and surrounding US states regularly book holidays to the local resort of 'Geneva Lake', while customers typing 'Geneva Lake' from Europe are generally looking for the landmark on the Swiss-French border.
And remember, the SRS system does not score destinations based on popularity alone: string matching and other business rules also play a part. To assess our model more fully, we used the previously mentioned dataset on how SRS operates on our website. This table captures the destinations shown and the order in which they are shown for every search query string typed, as well as the final score assigned to each destination. This allows us to use the Normalised Discounted Cumulative Gain (NDCG) metric, which assesses ranking quality by scoring relevant destinations near the top of a ranked list more highly than relevant hits further down. If we take the current scores of destinations for any typed string, recalculate those scores using our model, and re-rank the destinations in the order the customer would see them, we can compare how well the current ranking performs against the ranking with our new popularity scores, with respect to how often a destination was clicked.
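The NDCG calculation itself is compact. A sketch in plain Scala, using the common log2 discount and treating each item's gain as its relevance (e.g. how often it was clicked); our production evaluation ran this at scale over the SRS dataset rather than over in-memory sequences:

```scala
object Ndcg {
  // Discounted cumulative gain: relevances are given in ranked order,
  // and each is discounted by log2(rank + 1).
  def dcg(rels: Seq[Double]): Double =
    rels.zipWithIndex.map { case (rel, i) =>
      rel / (math.log(i + 2) / math.log(2)) // rank = i + 1
    }.sum

  // NDCG: DCG of this ranking divided by DCG of the ideal
  // (best-first) ranking, so a perfect ranking scores 1.0.
  def ndcg(rels: Seq[Double]): Double = {
    val ideal = dcg(rels.sortBy(-_))
    if (ideal == 0.0) 0.0 else dcg(rels) / ideal
  }
}
```

A ranking that already lists the most-clicked destinations first scores 1.0; any mis-ordering pushes the score below 1.0, which is exactly the property we want when comparing the current ranking with the re-ranked one.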
Deployment and A/B Testing
Having produced a model and collected preliminary evidence that it might be effective on the website, it was time to test it in the real world. As at many other tech companies, this means A/B testing to find out how the new model fares against the current system on two key metrics: conversion and gross profit. This part was run by a separate team who specialize in web analytics and who can run tests impartially, without the conflict of interest the team proposing the test would have. In this case, after a few weeks' wait, we were informed of a test win and had the excitement of rolling the model out live to the website. This also gives us the opportunity to examine how closely our offline evaluation correlated with online test results: in essence, how well can we predict success before going to test?
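For intuition, a common way to compare a binary metric like conversion between the two arms of an A/B test is a two-proportion z-test. This is a generic textbook sketch with made-up numbers, not the web analytics team's actual methodology:

```scala
object AbTest {
  // Two-proportion z-statistic comparing conversion rate in the
  // variant (B) against the control (A), using the pooled rate.
  def zStat(convA: Int, totalA: Int, convB: Int, totalB: Int): Double = {
    val pA = convA.toDouble / totalA
    val pB = convB.toDouble / totalB
    val pooled = (convA + convB).toDouble / (totalA + totalB)
    val se = math.sqrt(pooled * (1 - pooled) * (1.0 / totalA + 1.0 / totalB))
    (pB - pA) / se
  }
}
```

A large positive z (conventionally beyond about 1.96 for 95% confidence) would indicate the variant genuinely converts better, rather than the difference being noise.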
Next steps and final thoughts
In many ways, this work demonstrates some of the core techniques and methodologies that are a standard part of day-to-day data science. But having demonstrated the utility of a more algorithmic approach in SRS, we now have the opportunity to go further, incorporating additional features and ideas and drilling down to better and better suggestions at the customer level.
About me – Katrina Soderquest
Like many, I did not start out as a data scientist. I studied a range of sciences at university but specialized in biochemistry and went on to take a PhD in immunology and genetics. As part of this, I was analyzing large genetic datasets and had my first taste of programming as I tried to wrangle data and run statistics in R. At the end of my PhD, I took a short break from research to work in public engagement and science communication. This also gave me the opportunity to reflect on my career and work out what I wanted to do next. I had really enjoyed the programming and analytical aspects of my PhD and wanted to develop these skills further. It was also becoming increasingly apparent that the roles requiring those skills were both growing in number and saner in terms of work-life balance than some of the more lab-based projects available (no more cell culture in the lab at midnight for me…!). Taken together, a bioinformatics post-doc studying genetic variants in a rare type of kidney disease was a very appealing next step. During this, I learnt the basics of Python, to which I became a fairly complete convert after discovering the Pandas library. I also developed skills in machine learning, with the help of multiple online courses, some playing around on Kaggle and the realization that one of my projects at work could actually benefit from a previously untried machine learning approach. After nearly two years, having developed an even stronger taste for those problems typically seen in the data science world and a desire to see what the fuss around 'big data' was about (and with half an eye on how an academic salary might struggle to pay a mortgage in London), I joined the ranks of the private sector. My first official data science job was with the Tesco Clubcard Analytics team.
It was here that I first learnt Spark and the basics of Scala to go with it and built my first ever commercial data science product – a recommender to help customers find the most relevant boost partner with which to spend their Clubcard vouchers. From here, I moved to my current role at Hotels.com, complete with new challenges and new tech but a lovely team and exciting opportunities. We will see what the next few years bring…