Search Ranking
When someone types into your search box, which results should appear first? This page explains the three options Datalumo gives you, and how to pick between them.
The three ordering modes
| Mode | What it does | Good for |
|---|---|---|
| Relevance (default) | Best matches for what the user typed appear first. | Most search boxes. Works out of the box. |
| Boost | Best matches first, with a nudge toward newer, cheaper, higher-rated, or some other signal you care about. | When both relevance and a meta signal should matter. |
| Sort | Results appear in a specific order you choose, like newest first. | A "Sort by" dropdown where the user picked an option. |
The API parameters for boost and sort are documented in the Search API reference. This page is the "when and why" guide.
How results get matched in the first place
Before ranking, Datalumo has to decide which entries even make it into the results. This step is the same for all three modes:
- The query is compared to every entry in your collection. Entries that are semantically related to what the user typed become candidates.
- The `threshold` parameter controls how loose or strict that match is. A low threshold lets in more entries; a high one keeps only the strongest matches.
- If you apply `meta` filters, only entries that pass them survive.
Once this candidate set is built, the ranking mode decides the order.
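The matching step above can be pictured with a small sketch. This is not Datalumo's implementation: the `Entry` shape, the precomputed `similarity` values, the inclusive threshold comparison, and the equality-only meta filters are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical shape: a title, a precomputed similarity to the query, and meta fields.
    title: str
    similarity: float
    meta: dict = field(default_factory=dict)


def build_candidates(entries, threshold, meta_filters=None):
    """Keep entries at or above the threshold that also pass every meta filter."""
    meta_filters = meta_filters or {}
    candidates = []
    for entry in entries:
        if entry.similarity < threshold:
            continue  # too weak a match for this threshold
        if any(entry.meta.get(key) != wanted for key, wanted in meta_filters.items()):
            continue  # fails at least one meta filter
        candidates.append(entry)
    return candidates


entries = [
    Entry("Reset your password", 0.81, {"category": "support"}),
    Entry("Billing FAQ", 0.42, {"category": "support"}),
    Entry("Release notes", 0.77, {"category": "changelog"}),
]

loose = build_candidates(entries, threshold=0.3)
strict = build_candidates(entries, threshold=0.6)
filtered = build_candidates(entries, threshold=0.3, meta_filters={"category": "support"})
print([e.title for e in filtered])
```

Raising the threshold from 0.3 to 0.6 drops the weakest match, and the meta filter removes the changelog entry no matter how well it matched.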
Boost: nudging the ranking
A boost is a gentle push. It tells Datalumo: "these results are all good matches, but prefer the ones that also score well on this meta field."
For example, imagine a search for "support" returns five equally relevant articles. Without a boost, they appear in whatever order the similarity math produced. With a recency boost, articles published in the last few weeks rise to the top and articles from three years ago fall toward the bottom. All five still appear; only the order changes.
Boost types
| Type | What it does |
|---|---|
| `recency` | Newer entries rank higher. |
| `prefer_low` | Entries with a lower numeric value rank higher. |
| `prefer_high` | Entries with a higher numeric value rank higher. |
| `near_value` | Entries whose value is closest to a target you choose rank higher. |
Strength: how strong is the nudge?
Every boost has an optional strength setting:
- `low`: a tiebreaker. If two entries are almost identically relevant, the boosted one wins. Otherwise it barely matters.
- `medium` (default): a meaningful nudge. A clearly fresher or cheaper entry will rise a few positions.
- `high`: the boost signal dominates among close matches. Use this when the meta field should almost always decide the order.
Start with medium. Move to high if the results feel too close to the un-boosted order. Move to low if the boost feels too aggressive.
Scale: what counts as "far from ideal"?
Scale defines how quickly the nudge fades:
- For `recency`, scale is a duration like `"7d"`, `"30d"`, or `"90d"` (the default). An article exactly `scale` old is about halfway between "fully boosted" and "no boost at all."
- For numeric boosts, scale is a number in the same units as the field. If your prices are in euros, a scale of `50` means the boost fades noticeably across a 50-euro range.
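One way to build intuition for scale is a half-life curve: full weight at zero distance, half weight at exactly one `scale` away. The exact curve Datalumo uses is not stated on this page, so the functions below (`parse_days`, `recency_weight`, `numeric_weight`) are hypothetical and only illustrate the "about halfway at `scale`" behaviour.

```python
def parse_days(scale: str) -> float:
    """Parse a duration such as "30d" into days. Only the "d" suffix is handled in this sketch."""
    if not scale.endswith("d"):
        raise ValueError(f"unsupported duration: {scale!r}")
    return float(scale[:-1])


def recency_weight(age_days: float, scale: str = "90d") -> float:
    """1.0 for a brand-new entry, 0.5 at exactly `scale` old, fading toward 0 after that."""
    return 0.5 ** (max(age_days, 0.0) / parse_days(scale))


def numeric_weight(value: float, target: float, scale: float) -> float:
    """Same half-life idea for numbers: 0.5 when the value is `scale` away from the target."""
    return 0.5 ** (abs(value - target) / scale)


print(recency_weight(0))            # brand-new article
print(recency_weight(90))           # exactly one default scale old
print(recency_weight(30, "30d"))    # exactly one scale old with a shorter scale
print(numeric_weight(100, 50, 50))  # a price 50 euros away from the target
```

A shorter scale makes the nudge fade faster: with `"30d"`, a month-old article is already at half weight, whereas with the default `"90d"` it takes three months.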
Missing data
If an entry does not have the meta field you are boosting on, the boost has no effect on it. The entry ranks as it would have without the boost. Missing data is never treated as a penalty.
Combining boosts
You can send up to three boosts in one request. They stack. An entry that scores well on all of them rises the most; an entry that scores well on none stays at its natural relevance position.
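The stacking and missing-data rules can be sketched as an additive bonus per boost. This is an assumption made for illustration, not the documented scoring formula: the `STRENGTH` weights are made-up numbers, and the boost dictionaries (`field`, `strength`, `weight`) are a hypothetical shape rather than the API's request format.

```python
# Made-up weights, chosen only so that low < medium < high.
STRENGTH = {"low": 0.02, "medium": 0.10, "high": 0.30}


def boosted_score(similarity, meta, boosts):
    """Add one bonus per boost. A missing meta field contributes nothing, so it is never a penalty."""
    if len(boosts) > 3:
        raise ValueError("at most three boosts per request")
    score = similarity
    for boost in boosts:
        value = meta.get(boost["field"])
        if value is None:
            continue  # no data for this field: the boost has no effect on this entry
        strength = STRENGTH[boost.get("strength", "medium")]
        score += strength * boost["weight"](value)  # weight() returns a number from 0 to 1
    return score


boosts = [
    {"field": "age_days", "weight": lambda days: 0.5 ** (days / 90)},
    {"field": "price", "weight": lambda price: 0.5 ** (price / 50), "strength": "low"},
]

fresh_and_cheap = boosted_score(0.70, {"age_days": 0, "price": 0}, boosts)
only_fresh = boosted_score(0.70, {"age_days": 0}, boosts)
no_meta = boosted_score(0.70, {}, boosts)
print(fresh_and_cheap, only_fresh, no_meta)
```

In this sketch the entry that does well on both boosts ends up highest, the entry with only one field sits in between, and the entry with no meta data keeps its plain relevance score.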
Sort: forcing a specific order
Sort is different. Instead of nudging the order, it replaces it. The match filter still decides which entries appear, but once they are in, they are arranged strictly by the field and direction you pick.
Use sort when the user has explicitly asked for a specific order. A "Sort by: Newest first" dropdown is the most common example.
Type matters
When you sort, tell Datalumo how to read the values:
- `date`: for timestamps. Entries with values like `"2026-04-21T..."` sort chronologically.
- `number`: for numeric values. Use this for anything that should sort like a number, including prices, ratings, and counts.
- `text` (default): for strings like categories or titles.
Important: using `text` on a numeric field sorts the values as strings, which is rarely what you want. For example, "10" sorts before "2" because text comparison works character by character. Always pick `number` for numeric fields.
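The difference is easy to reproduce in plain Python, independent of Datalumo. The values below are arbitrary sample data.

```python
values = ["10", "2", "33", "4"]

# Text comparison goes character by character, so "10" lands before "2".
as_text = sorted(values)

# Numeric comparison reads each value as a number first.
as_number = sorted(values, key=float)

print(as_text)    # ['10', '2', '33', '4']
print(as_number)  # ['2', '4', '10', '33']
```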
Missing data
Entries without the sort field end up at the bottom of the results, regardless of direction. This way, entries with data always appear before entries without.
Multi-key sort
You can pass up to three sort keys. The first one is the primary sort; the second breaks ties among equal values; the third breaks ties further. A common pattern is sorting by category first, then by date within each category.
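The two sorting rules above (missing values go last regardless of direction, and later keys only break ties) can be sketched with a comparator. This is a client-side illustration, not how the service is implemented; `multi_sort` and the `(field, descending)` tuples are hypothetical names and shapes.

```python
from functools import cmp_to_key


def compare_on(key, descending):
    """Build a comparator for one sort key. Entries missing the key always sort last."""
    def compare(a, b):
        va, vb = a.get(key), b.get(key)
        if va is None and vb is None:
            return 0
        if va is None:
            return 1   # a goes after b, whatever the direction
        if vb is None:
            return -1
        if va == vb:
            return 0
        result = -1 if va < vb else 1
        return -result if descending else result
    return compare


def multi_sort(entries, sort_keys):
    """Sort by up to three (field, descending) pairs; later keys only break ties."""
    if len(sort_keys) > 3:
        raise ValueError("at most three sort keys")
    comparators = [compare_on(key, descending) for key, descending in sort_keys]

    def compare(a, b):
        for comparator in comparators:
            result = comparator(a, b)
            if result:
                return result
        return 0

    return sorted(entries, key=cmp_to_key(compare))


entries = [
    {"title": "A", "category": "guides", "date": "2026-01-10"},
    {"title": "B", "category": "api", "date": "2026-03-02"},
    {"title": "C", "category": "guides", "date": "2026-04-21"},
    {"title": "D", "category": "api"},
    {"title": "E", "date": "2026-02-01"},
]

by_category_then_newest = multi_sort(entries, [("category", False), ("date", True)])
print([e["title"] for e in by_category_then_newest])  # ['B', 'D', 'C', 'A', 'E']
```

Entry D has no date, so it comes last within the "api" group even though the date key is descending, and entry E has no category, so it comes last overall.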
Boost and sort cannot be combined
A single search request can use boost or sort, but not both. They answer different questions, and combining them would make the result order hard to predict.
If you send both, the API returns 422 Unprocessable Entity with an explanation.
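If you build search requests in your own code, a small guard can catch this before the request is sent. The sketch below assumes the request options live in a dictionary with `boost` and `sort` keys; those key names and the example values are assumptions, so check the Search API reference for the real parameter shapes.

```python
def validate_ordering(params: dict) -> dict:
    """Reject option sets that ask for both boost and sort (the API would answer 422)."""
    if params.get("boost") and params.get("sort"):
        raise ValueError("use boost or sort in a single request, not both")
    return params


# Hypothetical option dictionaries, for illustration only.
ok = validate_ordering({"query": "support", "boost": [{"type": "recency"}]})

try:
    validate_ordering({"query": "support", "boost": [{"type": "recency"}], "sort": [{"type": "date"}]})
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```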
The score breakdown
Every result includes two numbers you can use for debugging or UI polish:
```json
"score_breakdown": {
  "similarity": 0.74,
  "final_score": 0.68
}
```
- `similarity`: how well this entry matches the query, from 0 to 1. Higher is better.
- `final_score`: the number actually used for ranking.
If the two numbers are equal, the boost or sort did not move this entry. If they differ, the gap tells you how much the ranking changed. You can use this to show "strong match" badges in your UI, or to figure out why a result is appearing where it is.
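A minimal sketch of how a client might read the breakdown. The `describe` helper and the 0.7 cut-off for a "strong match" badge are assumptions chosen for illustration; only the `score_breakdown`, `similarity`, and `final_score` field names come from the response shown above.

```python
import math


def describe(result: dict, strong_match: float = 0.7) -> dict:
    """Summarise one result's score breakdown for a badge or a debugging view."""
    breakdown = result["score_breakdown"]
    similarity = breakdown["similarity"]
    final_score = breakdown["final_score"]
    return {
        "strong_match": similarity >= strong_match,  # 0.7 is an arbitrary cut-off
        "moved_by_ranking": not math.isclose(similarity, final_score, abs_tol=1e-9),
        "shift": round(final_score - similarity, 4),
    }


result = {"score_breakdown": {"similarity": 0.74, "final_score": 0.68}}
summary = describe(result)
print(summary)
```

For the example values, the entry counts as a strong match and its ranking score sits 0.06 below its similarity.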
Languages
Datalumo search works across languages. The semantic match is powered by a multilingual embedding model, which handles every major language including Chinese, Japanese, Korean, Arabic, Hebrew, Hindi, Russian, and all widely spoken European languages. Queries can cross languages too: a query in English can surface relevant content written in French or Spanish, and vice versa.
The keyword match safety net (which catches exact-term queries the semantic layer might underrank) uses two paths:
- Word-level match for English content. This path handles stemming (a search for "running" also matches "run").
- Substring match for every language. If the user's query appears verbatim anywhere in the text, this path catches it. This is the path that guarantees Chinese, Japanese, Korean, and other non-Latin keyword searches return the entry when the exact term is present.
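The reason a substring path matters for languages written without spaces can be shown in a few lines. This is a simplified illustration, not Datalumo's matcher: `word_match` splits on whitespace and does no stemming, and case-insensitive comparison via `casefold` is an assumption.

```python
def word_match(query: str, text: str) -> bool:
    """Word-level match: the query must equal one whitespace-separated word (no stemming here)."""
    return query.casefold() in text.casefold().split()


def substring_match(query: str, text: str) -> bool:
    """Substring match: the query only has to appear verbatim somewhere in the text."""
    return query.casefold() in text.casefold()


english = "Our refund policy"
chinese = "我们支持退款政策"  # no spaces, so whitespace splitting yields a single token

print(word_match("refund", english))     # True
print(word_match("退款", chinese))         # False
print(substring_match("退款", chinese))    # True
```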
Tuning the threshold for non-English content
Cross-lingual similarity scores tend to run a little lower than same-language scores. If a non-English collection seems to be filtering out relevant results, try lowering the `threshold` parameter from the default to around 0.25. Start there, and lower it further only if you still see too many misses.
Known limitations
- For content written in one of the widely spoken European languages (French, German, Spanish, etc.), exact-term search still works via the substring match path, but without stemming. A query for "courir" will not auto-match "court" the way the English path auto-matches "running" and "run". The semantic layer typically covers this.
- Collections mixing multiple languages are fully supported. The search does not need to know what language an entry is in.
Performance
- Relevance-only searches are the fastest and scale to very large collections without extra work.
- Boost and sort are slightly more expensive because they look at meta data alongside the relevance score. At typical collection sizes this is not noticeable. For very large collections (roughly 1M content chunks or more), measure before relying on either in a latency-sensitive place.
- If you know you will always sort on the same field and your collection is very large, reach out. A dedicated index can make that specific field fast.
Which should I use?
- Only a search bar in your UI: use relevance. Add a boost if you notice that recency, price, or rating should clearly matter.
- A search bar plus a sort dropdown: use relevance when the dropdown says "Most relevant", and sort when the user picks anything else.
- A listing page with no search box, just filters: you probably do not need this endpoint. Use the entries list endpoint with filters instead.
When you are unsure, start with plain relevance. Look at the results. Only reach for boost when a specific ranking complaint comes up. Only reach for sort when your UI is explicitly offering the user a choice.
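For the "search bar plus sort dropdown" case, the decision can be kept in one place in your client code. The sketch below is hypothetical throughout: the dropdown values, the meta field names (`published_at`, `price`), and the `field` and `direction` keys are assumptions, so take the real parameter shapes from the Search API reference. Only the `date` and `number` types come from this page.

```python
# Hypothetical mapping from dropdown values to sort keys.
SORT_OPTIONS = {
    "newest": {"field": "published_at", "type": "date", "direction": "desc"},
    "price_low": {"field": "price", "type": "number", "direction": "asc"},
}


def ordering_for(selection: str) -> dict:
    """Return extra request options for a dropdown choice; plain relevance needs none."""
    if selection == "most_relevant":
        return {}  # default relevance ordering, no boost and no sort
    return {"sort": [SORT_OPTIONS[selection]]}


print(ordering_for("most_relevant"))
print(ordering_for("newest"))
```

Because the helper only ever returns a sort or nothing, it cannot produce a request that mixes boost and sort.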