What is Query Understanding?
Query understanding is the process of dissecting search queries to create an improved query that can return more relevant search results. It’s one of the most crucial components of excellent search experiences.
Examples of Query Understanding
Examples include query rewriting, synonyms, spelling correction, classification, NLP, vectorization, bigram and trigram detection for query segmentation, semantic query understanding, personalization, localization, and query scoping (attribute mapping).
How does Query Understanding work?
Many helpful query understanding techniques, such as synonyms, spelling correction, and semantics, are part of the query rewriting family, which modifies the query to increase precision (the share of returned results that are relevant), recall (the share of relevant results that are returned), or both. Below are some of the most commonly used techniques and how they work.
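To make the two measures concrete, here is a minimal sketch that computes precision and recall for a result list. The document IDs and the "before" and "after" result lists are invented for illustration; they are not measurements from any real system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four documents are relevant to the user's need.
relevant = {"doc1", "doc2", "doc3", "doc4"}
before = ["doc1", "doc9"]                  # results for the original query
after = ["doc1", "doc2", "doc3", "doc9"]   # results after a rewrite

print(precision_recall(before, relevant))
print(precision_recall(after, relevant))
```

In this made-up case the rewrite returns more of the relevant documents without adding irrelevant ones, so both numbers go up. In practice a rewrite often trades one measure against the other.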
Synonyms
Synonyms are usually simple. They are typically treated as alternatives, so a search for “gigantic shoes” can expand to “(gigantic OR enormous) shoes.” It gets harder when you have to decide which alternative to prioritize, for example whether a match on the original term should rank above a match on its synonym. If synonyms are generated with machine learning, such as through word embeddings, the situation can become more complicated still. More modern semantic search tools, like vector search, already capture many related, everyday phrases, which can reduce the need to build separate synonym libraries.
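The expansion described above can be sketched in a few lines. This is an illustration only: the `SYNONYMS` table and the `expand_synonyms` function are hypothetical names, and the OR syntax is a generic boolean form rather than the query language of any particular search engine.

```python
# Hypothetical synonym table; a real system would load this from configuration.
SYNONYMS = {
    "gigantic": ["enormous"],
}

def expand_synonyms(query, synonyms=SYNONYMS):
    """Rewrite each term that has synonyms into an OR group."""
    parts = []
    for term in query.lower().split():
        alternatives = synonyms.get(term, [])
        if alternatives:
            # Keep the original term first so it can be prioritized later.
            parts.append("(" + " OR ".join([term, *alternatives]) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_synonyms("gigantic shoes"))
```

Keeping the original term first in each group is one simple way to leave room for ranking exact matches above synonym matches.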
Spelling correction
Spelling correction can work in several ways. One method is to identify the most likely replacement, apply it, and give users the option to override it. Another is to be forgiving and accept matches that differ from the query by up to two characters; this is known as “typo tolerance.” Correcting isn’t always the wisest course: language is ambiguous, and a wrong correction is costly because people may give up on the search. As with synonyms, vector search tools can handle many frequent misspellings.
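One common way to implement typo tolerance is to measure edit distance (Levenshtein distance) between the query term and known words. The sketch below assumes that approach; the vocabulary and the `tolerant_matches` helper are made up for illustration, and production engines typically use faster index structures than a linear scan.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

def tolerant_matches(term, vocabulary, max_typos=2):
    """Return vocabulary words within max_typos edits, closest first."""
    scored = sorted((edit_distance(term, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_typos]

# Hypothetical vocabulary taken from an index of product names.
VOCABULARY = ["shoes", "shirts", "shorts", "socks"]
print(tolerant_matches("shoos", VOCABULARY))
```

Returning the candidates in order of distance, instead of silently picking one, leaves the decision about whether to correct or merely suggest to the caller.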
Semantic query understanding
Semantic query understanding is the technique of attempting to grasp the intent behind a query. Language is inherently ambiguous, and polysemy is a useful concept for explaining why this is a problem for search: “poly” denotes many, and “semy” refers to senses or meanings.
A prime example is the word “bank,” which can refer to either a financial institution or the side of a river. Without additional context, it is hard to know which is meant. English in particular has many such words. Fortunately, there are ways to deal with this.
Context is usually clearer for queries that contain several terms. Single-term queries are more challenging, but the sequence of previous queries can help. For instance, if someone searched for “atm” and then “bank,” the second query is unlikely to be about the side of a river.
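The “atm” then “bank” example can be sketched as a simple overlap count between earlier queries and hint words for each sense. Everything here is an assumption made for illustration: the `SENSES` inventory, its hint words, and the `guess_sense` function are hypothetical, and real systems usually rely on learned models instead of hand-written lists.

```python
# Hypothetical sense inventory: each sense lists context words that hint at it.
SENSES = {
    "bank": {
        "financial institution": {"atm", "loan", "account", "mortgage"},
        "river bank": {"river", "fishing", "kayak", "stream"},
    },
}

def guess_sense(query, previous_queries, senses=SENSES):
    """Pick the sense whose hint words overlap most with earlier queries."""
    candidates = senses.get(query.lower())
    if not candidates:
        return None  # not a known ambiguous term
    history_terms = {
        term for previous in previous_queries for term in previous.lower().split()
    }
    scores = {
        sense: len(hints & history_terms) for sense, hints in candidates.items()
    }
    best = max(scores, key=scores.get)
    # With no supporting evidence, stay undecided instead of guessing.
    return best if scores[best] > 0 else None

print(guess_sense("bank", ["atm"]))
```

Returning `None` when the history offers no evidence reflects the point made earlier about corrections: a wrong guess can cost more than no guess.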
The importance of Query Understanding
People frequently misspell words, and English is notoriously ambiguous: words like “test,” “mobile,” “apps,” “summer,” and “north” are all nouns, while “Jaguar” might refer to a vehicle, an operating system, or an animal. People also use jargon and refer to things that aren’t always stated in the text of the result item (such as “size 14 shoes,” “near a park,” or “next Thursday evening”). All of this must be transformed into something that can actually be queried against the underlying data structure.
Query understanding can also be used to prioritize results. For instance, people who searched for “license” on a government website in the US or Australia were historically looking for a page about renewing their driver’s license rather than one of the hundreds of other pages that mention “license.” Even if nothing in the query itself signals which destination matters most, historical performance data can automatically boost that page.
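A minimal sketch of this kind of boosting, assuming the historical signal is a log of (query, clicked page) pairs. The page paths, the click log, and the `rerank_by_clicks` function are invented for the example; real systems usually blend such signals with textual relevance instead of sorting on clicks alone.

```python
from collections import Counter

def rerank_by_clicks(query, results, click_log):
    """Order results by how often each was clicked for this query in the past."""
    clicks = Counter(page for logged_query, page in click_log if logged_query == query)
    # sorted() is stable, so pages with equal click counts keep their original order.
    return sorted(results, key=lambda page: -clicks[page])

# Hypothetical click log: (query, clicked page) pairs.
CLICK_LOG = [
    ("license", "/renew-drivers-license"),
    ("license", "/renew-drivers-license"),
    ("license", "/fishing-license"),
]
results = ["/business-license", "/fishing-license", "/renew-drivers-license"]
print(rerank_by_clicks("license", results, CLICK_LOG))
```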
Query understanding is often an excellent test of search technology. What if you needed to spell check queries automatically, or map “size 14” to a size attribute? What about mapping “next Thursday evening” to a time filter? Can you achieve this in your production pipeline? If not, you are frustrating your users and leaving them behind.
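As an illustration of query scoping, the sketch below pulls a “size N” phrase out of the query and turns it into a structured filter. It is a deliberately narrow example: the regular expression, the `size` attribute name, and the `scope_query` function are assumptions, and phrases such as “next Thursday evening” would need a date parser that is not shown here.

```python
import re

# Matches phrases such as "size 14" or "Size 9.5" (hypothetical pattern).
SIZE_PATTERN = re.compile(r"\bsize\s+(\d+(?:\.\d+)?)\b", re.IGNORECASE)

def scope_query(query):
    """Split a query into remaining free text and structured filters."""
    filters = {}
    match = SIZE_PATTERN.search(query)
    if match:
        filters["size"] = float(match.group(1))
        # Remove the matched phrase so it is not also searched as plain text.
        query = query[:match.start()] + query[match.end():]
    text = " ".join(query.split())  # normalize leftover whitespace
    return text, filters

print(scope_query("size 14 shoes"))
```

The free text and the filters can then be sent to the search backend separately, so “14” constrains the size attribute instead of matching any document that happens to contain the number.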
Best practices to implement Query Understanding
Query logs should be your guide when setting priorities for query understanding. For instance:
- High-volume, zero-result searches indicate opportunities. What causes them to fail?
- Query scoping may not be particularly helpful if no one uses natural language or if the content is not well structured.
- Synonyms may be helpful if business terminology does not match the language customers use.
- How often do your users misspell words? If they do so frequently, spelling correction will pay off.
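The first check in the list above can be sketched as a small log analysis. The log format here (a list of dictionaries with `query` and `result_count` keys) is an assumption made for the example; adapt the field names to whatever your search logs actually record.

```python
from collections import Counter

def top_zero_result_queries(log, limit=10):
    """Count queries that returned nothing, most frequent first."""
    counts = Counter(
        entry["query"].strip().lower()  # fold case so variants count together
        for entry in log
        if entry["result_count"] == 0
    )
    return counts.most_common(limit)

# Hypothetical log entries.
LOG = [
    {"query": "sneekers", "result_count": 0},
    {"query": "Sneekers", "result_count": 0},
    {"query": "trainers", "result_count": 0},
    {"query": "shoes", "result_count": 120},
]
print(top_zero_result_queries(LOG))
```

In this invented log, “sneekers” would point toward spelling correction and “trainers” toward a missing synonym, which is the kind of evidence that helps decide what to build first.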
Advanced methods like vectorization and personalization should come last because they demand much more work. Getting the fundamentals right can yield considerably more value than personalized search, which is frequently seen as the holy grail. After all, personalizing poor results won’t help.