Follow a single word as it travels from the search bar all the way to ranked results, through every detection, correction, dictionary lookup and learning step in between. The journey splits into two connected engines: Phase A, Search Suggest (turning raw keystrokes into a clean suggestion), and Phase B, Search Result (turning that suggestion into matched inventory).
A character (or partial word) lands in the search bar. The raw, unprocessed string enters the pipeline exactly as typed — including typos, casing and stray characters.
Identify the script and the language so the right rules apply. Disambiguates languages whose writing systems overlap or look alike (Japanese vs Chinese, Hindi vs Bengali, Urdu vs Persian) and handles cross-language intent (search in CJK, results in English).
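A minimal sketch of the script-detection half of this step, using only Python's standard `unicodedata` module. The function names, the returned language codes and the tie-breaking heuristics are assumptions made for illustration; a production detector would also use word statistics, which this sketch does not attempt.

```python
import unicodedata
from collections import Counter

# Letters used in Urdu but not in Persian (assumed heuristic, not exhaustive).
URDU_ONLY = set("\u0679\u0688\u0691\u06BA\u06D2")

def script_counts(text: str) -> Counter:
    """Count letters per script, taking the first word of each Unicode name."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1  # e.g. "HIRAGANA", "CJK", "ARABIC"
    return counts

def guess_language(text: str) -> str:
    """Return a rough language guess; the codes are illustrative only."""
    scripts = script_counts(text)
    if not scripts:
        return "unknown"
    # Any kana means Japanese, even when Han characters dominate.
    if scripts["HIRAGANA"] or scripts["KATAKANA"]:
        return "ja"
    dominant = scripts.most_common(1)[0][0]
    if dominant == "CJK":
        return "zh"  # Han only: a guess, kanji-only Japanese would land here too
    if dominant == "ARABIC":
        # Guess Urdu when an Urdu-specific letter appears, otherwise Persian.
        return "ur" if any(ch in URDU_ONLY for ch in text) else "fa"
    return {"DEVANAGARI": "hi", "BENGALI": "bn"}.get(dominant, dominant.lower())
```

Under these assumptions, `guess_language("シャツ")` returns `"ja"` and `guess_language("शर्ट")` returns `"hi"`. Shared-script pairs such as Urdu and Persian cannot be separated by script alone, which is why the sketch falls back to a letter-level hint.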
Fix misspellings using N-Gram / Shingle similarity and Edit-Distance scoring. The four classic error types (transposition, insertion, deletion and substitution) are each repaired by applying the inverse edit.
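One way the two scores could work together, sketched under assumptions: character bigrams (shingles) with Jaccard similarity act as a cheap candidate filter, and the optimal string alignment distance (a restricted Damerau-Levenshtein variant that counts a transposition as one edit) picks the winner. The function names, the `$` padding and both thresholds are hypothetical.

```python
def shingles(word: str, n: int = 2) -> set:
    """Character n-grams of the word, padded so edges count too."""
    padded = f"${word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Shingle overlap between two words, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def osa_distance(a: str, b: str) -> int:
    """Edit distance over insertion, deletion, substitution, transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def correct(term: str, vocabulary, min_similarity: float = 0.2, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the term itself if none is close."""
    scored = [(osa_distance(term, w), -jaccard(term, w), w)
              for w in vocabulary if jaccard(term, w) >= min_similarity]
    scored = [s for s in scored if s[0] <= max_distance]
    return min(scored)[2] if scored else term
```

With the vocabulary `["shirt", "short", "skirt"]`, `correct("shrit", ...)` returns `"shirt"`, since the swapped letters cost a single transposition while the other two candidates need two edits.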
Clean and reduce the term: strip stop words (the, to, of, and), flag slang / negative / abusive tokens, then stem to the root word so variants collapse to one canonical form.
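The cleaning step could look roughly like the sketch below. The stop-word set comes from the examples above; the blocklist entry, the suffix list and the choice to set flagged tokens aside rather than keep them are assumptions. The suffix stripper is deliberately naive and stands in for a real stemmer such as Porter's.

```python
import re

STOP_WORDS = {"the", "to", "of", "and"}
BLOCKLIST = {"darn"}             # hypothetical slang / abusive tokens
SUFFIXES = ("ing", "es", "s")    # naive suffix list, for illustration only

def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def normalize(query: str):
    """Return (stemmed tokens, flagged tokens) for a raw query string."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    kept, flagged = [], []
    for token in tokens:
        if token in STOP_WORDS:
            continue                 # drop stop words entirely
        if token in BLOCKLIST:
            flagged.append(token)    # set aside for moderation (assumption)
            continue
        kept.append(stem(token))
    return kept, flagged
```

For example, `normalize("The Shirts of the darn men")` yields `(["shirt", "men"], ["darn"])`, so "shirt" and "shirts" collapse to one canonical form.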
The cleaned term is checked against three suggestion dictionaries in priority order. The first tier that has a match wins and returns immediately — personalised history beats regional, which beats global.
What this person has searched & picked before.
scope · individual
Popular searches within the user’s region.
scope · regional
System-wide demand, ordered by rank & word frequency.
scope · everyone
Return the suggestion immediately, ranked by frequency & popularity. Pipeline ends here (fast path). → flows into Phase B.
No history match anywhere. Fall through to the authoritative Words Dictionary check below.
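The priority order of the three suggestion dictionaries can be sketched as a simple first-match-wins loop. Everything concrete here is an assumption: each dictionary is modelled as a plain mapping from word to frequency, matching is done by prefix, and the function name and `limit` parameter are hypothetical.

```python
def suggest(term: str, user_dict: dict, regional_dict: dict, global_dict: dict, limit: int = 5):
    """Return (scope, suggestions) from the first tier with a match, else None."""
    tiers = (("individual", user_dict),
             ("regional", regional_dict),
             ("everyone", global_dict))
    for scope, dictionary in tiers:
        matches = {w: count for w, count in dictionary.items() if w.startswith(term)}
        if matches:
            # Most frequent first; ties broken alphabetically for stable output.
            ranked = sorted(matches, key=lambda w: (-matches[w], w))[:limit]
            return scope, ranked
    return None  # no tier matched: caller falls through to the validity check
```

Given `user = {"shirt": 3}`, `regional = {"shirt": 10, "shorts": 50}` and `world = {"shoes": 100, "shirt": 80}`, the call `suggest("sh", user, regional, world)` returns `("individual", ["shirt"])` even though "shoes" is far more popular globally, because the personal tier answers first.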
Is the cleaned term a real, valid word at all? The Words Dictionary is the source of truth that decides whether this becomes a brand-new suggestion or gets rejected as noise.
Promote it: add the term to all three suggestion dictionaries (user, regional & global) so it’s instantly available next time, then return it as a suggestion.
No suggestion. The input is treated as a bug / garbage / nonsense string and the suggest pipeline stops cleanly.
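The validate-then-promote branch might be sketched as follows. The Words Dictionary is modelled as a set and the suggestion dictionaries as word-to-frequency mappings; the function name and the starting frequency of 1 are assumptions.

```python
def validate_and_promote(term: str, words_dictionary: set, suggestion_dicts: list):
    """Return the term if it is a valid word, promoting it on the way; else None."""
    if term not in words_dictionary:
        return None  # treated as noise: no suggestion, pipeline stops
    for dictionary in suggestion_dicts:
        # Seed the term without overwriting a count that already exists.
        dictionary.setdefault(term, 1)
    return term
```

Calling it with a valid word adds that word to every dictionary passed in, so the next lookup for the same term is answered by the fast path instead of reaching this check again.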
The clean, corrected, language-aware keyword arrives from Phase A as the trusted query seed.
Map the keyword onto the inventory’s tag vocabulary — the words products are labelled with, e.g. men, shirt, uniqlo.
Fetch tagged items
Pull every product carrying the tags men + shirt + uniqlo.
Establish context
The intersection of those tag sets defines the intent, here the “Uniqlo men’s shirts” inventory context.
Emit the matched list with concrete product codes / product links — the visible search results shown to the user.
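Phase B reduces to a set intersection over an inverted index from tag to product codes. The sketch below assumes such an index exists; the product codes and the function names are made up for illustration.

```python
# Hypothetical inverted index: tag -> product codes carrying that tag.
TAG_INDEX = {
    "men":    {"P100", "P101", "P200"},
    "shirt":  {"P100", "P101", "P300"},
    "uniqlo": {"P100", "P300", "P400"},
}

def match_tags(keywords, tag_index):
    """Keep only the keywords that exist in the inventory's tag vocabulary."""
    return [k for k in keywords if k in tag_index]

def search(keywords, tag_index):
    """Return the sorted product codes that carry every matched tag."""
    tags = match_tags(keywords, tag_index)
    if not tags:
        return []
    # The intersection across all matched tags is the inventory context.
    product_codes = set.intersection(*(tag_index[t] for t in tags))
    return sorted(product_codes)
```

Here `search(["men", "shirt", "uniqlo"], TAG_INDEX)` returns `["P100"]`, the only product carrying all three tags, while dropping "uniqlo" from the query widens the result to `["P100", "P101"]`.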
Search isn’t one-and-done. Each interaction quietly upgrades the dictionaries and tag graph, so the next query for everyone gets better. This is the engine behind the keyword → word → tag promotion ladder.
Every searched word that returns inventory gets its usage count bumped.
When frequency crosses a threshold, the word climbs to the next ranking tier.
On a product click, log the trail into user history & link it to that user’s other picks.
If a word’s rank reaches “tag” level, attach it to the product’s tags. The vocabulary grows itself.
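The four learning steps can be sketched as a small in-memory class. The tier names follow the keyword → word → tag ladder; the threshold values, the class and method names, and the choice to attach a tag-level word to the product that was clicked are all assumptions.

```python
from collections import defaultdict

TIERS = ("keyword", "word", "tag")
THRESHOLDS = {"keyword": 3, "word": 10}  # hypothetical counts needed to climb

class LearningLoop:
    def __init__(self):
        self.counts = defaultdict(int)                # word -> usage count
        self.tier = defaultdict(lambda: "keyword")    # word -> current tier
        self.history = defaultdict(list)              # user -> [(word, product)]
        self.product_tags = defaultdict(set)          # product -> attached tags

    def record_search(self, word: str, returned_inventory: bool) -> None:
        """Bump the count for productive searches and promote on threshold."""
        if not returned_inventory:
            return
        self.counts[word] += 1
        current = self.tier[word]
        threshold = THRESHOLDS.get(current)  # None once the word is a tag
        if threshold is not None and self.counts[word] >= threshold:
            self.tier[word] = TIERS[TIERS.index(current) + 1]

    def record_click(self, user: str, word: str, product: str) -> None:
        """Log the click trail; attach tag-level words to the clicked product."""
        self.history[user].append((word, product))
        if self.tier[word] == "tag":
            self.product_tags[product].add(word)
```

With these thresholds a word becomes a "word" on its third productive search and a "tag" on its tenth; a later click on product `P100` for that word would then add it to `product_tags["P100"]`.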