A machine’s touch

Turn your search engine into your best salesperson

Doug Turnbull

No matter how bright your salesman’s smile is, no matter how pretty the pattern on his clip-on tie, a search engine can still do a better job of understanding what your customer is looking for. When it comes to selling stuff, online search systems can do better than humans, demonstrates Doug Turnbull.

Here come the holidays! Stores are training new sales staff like mad. Unfortunately the search bar, by far the busiest sales “person,” is left out of all this expertise. Customers try hard to explain what they want in their searches, but end up leaving the online store, frustrated at the random set of unrelated product offerings returned by the dumb search engine.

If you’ve ever talked to a rushed salesperson who can’t or won’t understand you, you know this frustration. Sadly, this feeling dominates the search world. Few enterprises care or try to get search up to snuff at understanding customers. And it’s not just sales: search fills expert roles including doctor’s assistant, medical librarian, matchmaker and even hospital information desk employee. A good search application understands its audience’s language and needs; it hunts through a set of content and ranks that content based on what these users consider valuable or important.

What do we mean? It’s easier to show than tell. Let’s take a brief tour through a simple problem using the Solr search engine. Here we’ve got a simple Solr instance loaded with data from The Movie Database. We’ve hosted this Solr for you to play with too – so check it out. You’ll turn the search engine into a movie salesperson. Let’s program the search engine to find and prioritize what our shoppers deem important when purchasing movies.

You’d like a robot movie? Here’s our selection …

The first step in any sales engagement is to lay out all the options. A customer asks you a question – movies about “robots” – and you lay out your store’s broad range of options. “Over here, you’ll see we have a broad range of movies featuring robots …”

How could a sales-driven search application use Solr to get this initial list of offerings? Here, we’ll start with a basic search that returns just about any marginally relevant search result. Getting your hands dirty, you’ll see that Solr works over HTTP. It takes search parameters in the URL like so: q=robots&qf=title overview&tie=1.0

Here the search engine answers the user’s query (in the q parameter) to find movies with robots. The search engine is asked to run this search string over a number of important fields (specified in the qf – “query fields” – parameter). We tell Solr that every matching field should contribute to the relevance score, not just the best-matching one (tie=1.0).
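A minimal sketch of assembling that request in Python. The host, port and collection name here are assumptions – substitute your own Solr instance – and qf/tie belong to Solr’s edismax query parser, selected here with defType:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint -- adjust host/port/collection for your setup.
SOLR_SELECT = "http://localhost:8983/solr/tmdb/select"

params = {
    "q": "robots",            # the user's query
    "qf": "title overview",   # "query fields": search both title and overview
    "tie": 1.0,               # every matching field contributes to the score
    "defType": "edismax",     # qf/tie are edismax query parser parameters
    "wt": "json",             # ask Solr for JSON back
}

url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Paste the printed URL into a browser (against your own Solr) and you get back the ranked results as JSON.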

Our search results show the range of films possibly interesting to the customer. With this basic query, the results include:

  1. Robots
  2. I, Robot
  3. Robot & Frank
  4. Automata
  5. Robocop
    … continuing through a long tail of additional results, including …
  Pacific Rim

But which robot movie should I buy?

When browsing a big selection of movies, shoppers can get stuck choosing. They begin to weigh the many factors that come into play during the buying decision. A good salesperson knows this and steers customers toward movies the customer would probably enjoy. Where the salesperson highlights a set of movies they know would likely satisfy customers, search can use metrics that measure these same factors.

What does our search engine have to work with? Let’s start with how well rated a movie is (a field called vote_average). This 1-10 rating captures how users have rated the movie (1 being terrible, 10 being excellent). Let’s boost these highly rated results to highlight them as possibly better options for our customer, just like a salesperson would.

Now the creative space here is huge. You can tell the search engine to execute whatever math you’d like. But we’ll keep it relatively simple and admittedly naive. After some basic tuning of Solr, you arrive at the idea of multiplying the base text relevance score by 3.2 ^ vote_average. Taking vote_average as an exponent makes a 7.0-rated movie mean significantly more than a 6.0-rated movie. In the end, this computes a blend of text relevance (“robot” matched in the title/overview) with a preference for well-reviewed films. To tell Solr about this math, we add a multiplicative boost to the URL query, along these lines:

    boost=pow(3.2,vote_average)
Which now gives back these more highly rated robot films:

  1. Robot & Frank (vote_average: 6.7)
  2. I, Robot (vote_average: 6.5)
  3. BURN-E (vote_average: 8.0)
  4. Big Hero 6 (vote_average: 6)
  5. Robots (vote_average: 5.8)
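A quick sanity check in plain Python (not Solr) of what the exponent buys us – each extra rating point multiplies a film’s score by 3.2, which is why the well-rated films jump up the list:

```python
# Multiplicative boost from the rating, as described above: 3.2 ** vote_average.
def rating_boost(vote_average):
    return 3.2 ** vote_average

# Each extra rating point multiplies the boost by 3.2,
# so a 7.0-rated film counts 3.2x more than a 6.0-rated one.
ratio = rating_boost(7.0) / rating_boost(6.0)
print(round(ratio, 1))  # 3.2
```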

Noticeably missing from our new list are some lower-rated robot movies from the previous one:

  • Robocop (vote_average: 5.8)
  • Automata (vote_average: 5.6)
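In plain Python, a toy rerun of this reordering – the text relevance numbers here are invented for illustration, not real Solr scores:

```python
# Hypothetical (title, text_relevance, vote_average) results.
results = [
    ("Robocop",       1.0, 5.8),
    ("Robot & Frank", 0.9, 6.7),
    ("I, Robot",      1.0, 6.5),
    ("Automata",      0.8, 5.6),
]

# Final score = text relevance * 3.2 ** vote_average (the multiplicative boost).
def score(movie):
    _, text_relevance, vote_average = movie
    return text_relevance * 3.2 ** vote_average

ranked = sorted(results, key=score, reverse=True)
print([title for title, _, _ in ranked])
```

Even with a slightly weaker text match, the better-reviewed Robot & Frank rises to the top, while the lower-rated Robocop and Automata sink.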

In other words, the search experience says to users, “Here are some movies, definitely about robots, that appear to be reasonably well regarded.”

As a developer, you’re likely starting to question and ponder. Have the right decisions been made? Is this sales experience going to please the customer? It’s easy to start seeing places it could use further tweaking. Maybe, much like a keen salesperson, the search criteria should be personalized to each shopper. The 18-year-old student and the 80-year-old grandparent probably have different tastes. You might also wonder to what extent text relevance should come into play. Maybe the text matching should be enhanced by teaching the search engine about common synonyms for terms like “robot”? Your first step into any search application begins an endless process of experimentation and tweaking.

You got anything more recent?

What other signals might factor into our customer’s buying decision? For many, the recency of the content matters. And here’s where a good salesperson understands their audience. Is this a store for aficionados? Or would the average customer hunt for something released more recently? Let’s presume it is the latter – the recent hot movies are what our customers probably want.

After some additional consideration, you decide the best course of action is to slightly down-boost any content older than ten years. After much work, you arrive at a formula along these lines:

    boost=if(min(0,sub(315360000000,ms(NOW,release_date))),0.8,1)

Hopefully such gobbledygook makes you long for the days of working with Excel spreadsheets. Long story short, this complex expression simply evaluates whether the film is more than ten years old (that big number is ten years in milliseconds). Unfortunately there’s currently no “less than” in Solr’s function queries, so you play tricks like taking the minimum of a large subtraction, letting movies older than ten years take negative values:

    min(0,sub(315360000000,ms(NOW,release_date)))

The rest of the “if” says what to do when movies are older than ten years:

    … 0.8,1)

If the film evaluates as more than ten years old, then we multiply the score by 0.8; otherwise we multiply by 1, not impacting relevance at all. OK, how does this help our robot search? With this additional factor, now we’ve got:

  1. Robot & Frank (vote_average: 6.7, release_date: 2012-08-16)
  2. I, Robot (vote_average: 6.4, release_date: 2004-07-15)
  3. BURN-E (vote_average: 8.0, release_date: 2008-11-17)
  4. Big Hero 6 (vote_average: 8.0, release_date: 2014-11-07)
  5. Ex Machina (vote_average: 8.1, release_date: 2015-01-21)

Bumped from our candidate list is the one-time leader: Robots (vote_average: 5.8, release_date: 2005-03-21).

Also interesting: Ex Machina, a highly rated movie certainly about robots, has only now come into the top five. Further, the older I, Robot remains a contender: it’s certainly about robots and reasonably highly rated, but was nevertheless released more than ten years ago.
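To make the ten-year trick concrete, here is the same logic sketched in plain Python (with a fixed “now” of 2016-01-01 so the example is deterministic; in Solr, ms(NOW,release_date) is evaluated server-side at query time):

```python
from datetime import datetime, timezone

TEN_YEARS_MS = 315_360_000_000  # 10 * 365 * 24 * 3600 * 1000 -- the "big number"

def recency_boost(release_date, now):
    """Down-boost films older than ten years, mirroring the Solr trick:
    min(0, ...) goes negative ("true" to Solr's if()) only for old films."""
    age_ms = (now - release_date).total_seconds() * 1000
    return 0.8 if min(0, TEN_YEARS_MS - age_ms) != 0 else 1.0

now = datetime(2016, 1, 1, tzinfo=timezone.utc)
i_robot = datetime(2004, 7, 15, tzinfo=timezone.utc)     # over ten years old
ex_machina = datetime(2015, 1, 21, tzinfo=timezone.utc)  # recent release

print(recency_boost(i_robot, now))      # 0.8
print(recency_boost(ex_machina, now))   # 1.0
```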

Robotic salespeople

The search engine is getting better. It’s prioritizing factors important to the sales experience. But these solutions are only scratching the surface of what’s possible. For an e-commerce search product, this would be just the beginning of a lengthy process to build the right sales expertise into the search engine. You need to ask:

  • Do the factors we’ve uncovered here work for more than the “robot” query? How do you test search to account for all the important search queries?
  • How do you tune text-based relevance factors for short snippets of text like titles or short, terse description fields?
  • Should synonyms be included (such as equating “robot” with “android”)? Should these matches be treated equally, or weighted less?
  • Should users in different segments receive different weightings (the film aficionado vs the casual moviegoer? The parent shopping for their kids vs the elderly couple?)
  • How do you satisfy different use cases, such as highly targeted search (going straight to the movie the shopper wants) vs more discovery-oriented search (the shopper who says “I’m just browsing, thanks”)?
  • What about the entities being searched: how does searching for movies by cast member differ from searching for movies by description?
  • How do you track, in production, which searches matter most to this shopping experience? How can you tell whether search is answering users’ questions?

The number of factors can be overwhelming. Every question could be a lifelong project. Google spends billions on their search for a reason.

You too can build smart search!

But taking the first step is the most important. You CAN build search that seems like it “gets” users. All it takes is figuring out what users expect: recent movies? Highly rated ones? Ones matching preferences they’ve expressed in the past? Ones that mention concepts related to the words searched for? All this information can be brought into the search engine. You can incorporate it using the search engine’s query API.
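Pulling the pieces of this article together, here is a hedged sketch of the full parameter set as one request – the endpoint is an assumption about your setup, and this relies on the edismax parser multiplying each boost parameter into the score:

```python
from urllib.parse import urlencode

# All the relevance factors explored above, combined into one query string.
# A list of tuples (not a dict) lets us repeat the "boost" parameter.
params = [
    ("q", "robots"),
    ("qf", "title overview"),
    ("tie", "1.0"),
    ("defType", "edismax"),
    ("boost", "pow(3.2,vote_average)"),  # favor well-rated films
    ("boost", "if(min(0,sub(315360000000,ms(NOW,release_date))),0.8,1)"),  # down-boost old films
]

query_string = urlencode(params)
print(query_string)
```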

The future of user experience won’t be dictated by those who know just how to put buttons and menus on a screen. Instead it will be led by those who can answer questions. Let this article be your first step in exploring how you can expertly answer your search users’ questions. Don’t let the sales (or other) interaction occur only between humans. You CAN do this!

Doug Turnbull
Doug Turnbull, Solr and Elasticsearch expert and author of Relevant Search, lives and breathes smart, relevant search. Doug moves clients away from basic text matching to search with domain and business intelligence built in. To help bridge the gap, Doug created Quepid, a search relevancy collaboration canvas used extensively in OpenSource Connections’ search work, and Splainer, a Solr search results explanation tool and sandbox.
