Readers/Information Retrieval/Phase 1
Phase 1 builds directly on the findings and discussions from Phase 0: Idea Validation. In Phase 0, we learned that some readers struggle to find specific information on Wikipedia using internal search—especially when their queries are phrased as questions or natural language—and that a hybrid approach to search may be worth exploring further.
The purpose of Phase 1 is to move from conceptual exploration to a small, time-bound production experiment on a small number of pilot wikis, so that we can learn from real reader behavior at scale. This phase is not about launching a permanent feature. Instead, it is designed to help us understand whether hybrid search meaningfully improves information-finding on Wikipedia, and under what conditions.
The experiment will be shown to a subset of readers on pilot wikis, primarily logged-out readers, so that we can compare behavior between existing search and hybrid variants. At this stage, editors will not be included in the experiment group.
What we mean by “hybrid search”
In this project, hybrid search refers to using two approaches to information retrieval:
- Keyword-based (lexical) search, which works well when readers know the exact article title or name they are looking for.
- Meaning-based (semantic) retrieval, which can better support question-style or exploratory queries even if the words don't perfectly match the title or the exact words in an article.
All results shown during this phase will be drawn from existing, human-authored Wikipedia articles and sections. This work does not involve generating new answers, rewriting content, or summarizing articles. The focus is on improving how readers are directed to relevant parts of the encyclopedia that already exist by adding semantic retrieval to search results.
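As an illustration only, one common way to merge a keyword-based ranking with a meaning-based ranking is reciprocal rank fusion. This page does not say how Phase 1 combines the two result sets, so the fusion method, the constant `k=60`, and the result identifiers below are all assumptions made for the sketch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of result IDs into one ranked list.

    Each result earns 1 / (k + rank) from every list it appears in;
    the scores are summed and results are sorted by total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, result_id in enumerate(ranking, start=1):
            scores[result_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Hypothetical result IDs in "Article#Section" form, for illustration only.
lexical = ["Moon", "Moon landing", "Apollo 11"]
semantic = ["Apollo 11#Lunar landing", "Moon landing", "Moon"]

fused = reciprocal_rank_fusion([lexical, semantic])
```

Results that appear in both lists rise to the top, while results found by only one approach are still kept, which matches the idea of semantic retrieval being added to, rather than replacing, keyword results.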
At the end of Phase 1, we expect to be better equipped to decide whether this work should continue, change direction, or stop.
What Phase 1 will not include
Based on Phase 0 findings and community feedback, several ideas are intentionally out of scope for this phase.
Phase 1 of semantic search will not introduce chat-style interfaces or content summarization. It will also not include personalization, predictive queries, or cross-language retrieval from other Wikimedia projects such as Commons or Wikidata. Finally, Phase 1 will also not introduce new search entry points or replace existing search behavior; it builds on the current search experience with limited, testable changes.
These constraints help us avoid introducing additional technical, editorial, and governance questions that are not appropriate to address before foundational retrieval and trust questions are answered.
How we will learn and evaluate outcomes
Phase 1 will evaluate different retrieval and presentation approaches through an A/B test.
In this experiment, the control group will see the current Wikipedia search experience, while the experiment group will see the same experience with the addition of semantic results. This allows us to compare, under real conditions, how readers interact with existing lexical search versus lexical search supplemented by semantic retrieval.
In particular, we are interested in learning:
- Whether showing short excerpts from articles (for example, sentences or short paragraphs anchored to a section) helps readers decide more quickly whether a result is relevant.
- How clearly indicating the source of an excerpt—such as the article and section it comes from—affects comprehension and trust.
These questions emerged repeatedly in Phase 0 research and community discussions, especially around concerns of context loss, attribution, and reader confusion.
Additionally, we will look at a combination of behavioral signals, qualitative feedback, and trust indicators. We will also closely monitor guardrails around performance and overall retention. If results suggest harm to trust, clarity, or usability, the experiment will be adjusted or ended.
However, should indicators suggest this approach is one to continue with, we can learn from the data gathered in the experiment and use it as an input for deciding when search should rely on keyword search, when to incorporate semantic retrieval, and how the two can complement each other depending on the query.
This experiment is intentionally tightly constrained. Testing will run in short, time-bound iterations rather than as a long-running deployment, allowing us to compare variants, incorporate feedback, and adjust or stop as needed.
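To make the control/experiment split described in this section concrete, here is a minimal sketch of deterministic group assignment by hashing. It is not the project's actual bucketing mechanism: the function name, the use of a session identifier, the experiment name, and the 50/50 split are assumptions chosen for illustration.

```python
import hashlib


def assign_group(session_id, experiment_name, treatment_share=0.5):
    """Deterministically map a session to 'control' or 'experiment'.

    The same (experiment_name, session_id) pair always lands in the same
    group, so a reader does not flip between variants mid-experiment.
    """
    key = f"{experiment_name}:{session_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "experiment" if bucket < treatment_share else "control"


# Hypothetical identifiers, for illustration only.
group = assign_group("session-1234", "hybrid-search-phase1")
```

Including the experiment name in the hash means that separate short iterations can reshuffle readers independently of one another.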
Phase 1 status and timing
Phase 1 work is beginning, with the team focused on design exploration, technical feasibility, and incorporating early community feedback on the structure of the Phase 1 experiment. Any production testing with readers will occur only after this preparatory work is complete, and details will be shared here and via updates in advance and throughout the life of the experiment.
Because this is a deliberately small and iterative experiment, our aim is to test in production and have initial insights for discussion in March 2026. This timeline is flexible and may change as we learn more about technical effort and incorporate feedback from the community.
Readers will not be asked to choose between different search modes. Instead, the experiment may surface results from both keyword-based and semantic retrieval approaches within the same search experience, so we can better understand how each performs across different types of queries.
Success indicators
- Increase reader engagement with on-wiki search: achieve an increase in the number of search sessions initiated per unique user compared to baseline.
- Increase depth of engagement: achieve an increase in average session length (from first query to last interaction).
  - Supporting indicator: track queries per search session, segmented into:
    - Exploration queries: queries that occur after a successful click and normal dwell time (healthy curiosity / rabbitholing).
    - Reformulation queries: near-identical or quickly repeated queries after no click or a short dwell (friction).
- Increase efficiency of discovering content: reduce median time-to-click compared to lexical baselines.
  - Supporting indicators: lower reformulation rate and higher click-through on top-ranked results.
  - Track good abandonment separately as searches with no reformulation or exit within the observation window (indicating the user likely found what they needed in the snippet or preview).
- Increase perceived relevance and satisfaction: at least 80% of users agree that search results are relevant and satisfactory.
- Increase retention of logged-out users: raise the seven-day search return rate for logged-out readers by 5% versus baseline.
- Generate validated learning data.
- Understand performance implications in production: check whether median search latency increases by more than 15% compared to baseline.
- Maintain trust: at least 85% of surveyed/interviewed users correctly identify the source of a snippet; negative “untrustworthy or confusing” feedback remains ≤ 5%.
- Maintain overall app retention: we don’t see a statistically significant decrease in overall app retention when comparing test and control groups.
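The distinction between exploration and reformulation queries in the list above can be sketched as a small classifier. This is an illustration under stated assumptions, not the team's instrumentation: the `Query` fields, the similarity measure, and every threshold are placeholders, since this page does not define "normal dwell", "short dwell", or "near-identical" numerically.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# All thresholds below are placeholder assumptions, not values from the project.
MIN_DWELL_SECONDS = 10.0      # dwell at or above this counts as "normal"
QUICK_REPEAT_SECONDS = 15.0   # a follow-up sooner than this counts as "quick"
NEAR_IDENTICAL_RATIO = 0.8    # string similarity treated as "near-identical"


@dataclass
class Query:
    text: str
    seconds_since_previous: float = 0.0
    clicked: bool = False
    dwell_seconds: Optional[float] = None  # None when there was no click


def classify_follow_up(previous: Query, current: Query) -> str:
    """Label `current` relative to the query that preceded it in the session."""
    satisfied = (
        previous.clicked
        and previous.dwell_seconds is not None
        and previous.dwell_seconds >= MIN_DWELL_SECONDS
    )
    if satisfied:
        # The earlier query led to a click with normal dwell time.
        return "exploration"
    similarity = SequenceMatcher(
        None, previous.text.lower(), current.text.lower()
    ).ratio()
    if (
        similarity >= NEAR_IDENTICAL_RATIO
        or current.seconds_since_previous <= QUICK_REPEAT_SECONDS
    ):
        # No click or a short dwell, followed by a similar or rapid retry.
        return "reformulation"
    return "other"
```

Counting the share of follow-up queries labeled "reformulation" per session would give a reformulation rate of the kind listed as a supporting indicator.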
Though we will monitor all of the above success indicators, our main metric of success is an increase in total search engagement, defined as sessions initiated × average session length.
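As a worked illustration of the main metric, the sketch below computes search engagement for two groups and the relative change between them. Dividing sessions by unique users (so that groups of different sizes can be compared) is an assumption based on the "per unique user" wording in the first indicator, and all numbers are made up.

```python
from statistics import mean


def search_engagement_per_user(session_lengths_seconds, unique_users):
    """Sessions initiated per unique user multiplied by average session length."""
    if unique_users <= 0 or not session_lengths_seconds:
        return 0.0
    sessions_per_user = len(session_lengths_seconds) / unique_users
    return sessions_per_user * mean(session_lengths_seconds)


# Made-up session lengths, in seconds from first query to last interaction.
control_sessions = [30.0, 45.0, 60.0]           # 3 sessions from 2 users
experiment_sessions = [40.0, 50.0, 60.0, 90.0]  # 4 sessions from 2 users

control = search_engagement_per_user(control_sessions, unique_users=2)
experiment = search_engagement_per_user(experiment_sessions, unique_users=2)
relative_change = (experiment - control) / control
```

Because the metric is a product, it can rise either when readers start more search sessions or when their sessions last longer, which is why the supporting indicators are needed to tell healthy exploration apart from friction.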
What this phase helps us decide
By the end of Phase 1, we aim to have clearer answers to questions such as:
- Does hybrid search help readers find information on Wikipedia more effectively?
- Which retrieval and presentation approaches work best—and which do not?
- Are the technical and editorial tradeoffs acceptable within Wikimedia’s values?
- Is this a direction worth continuing to explore?
Phase 1 is a step toward informed decision-making, not a commitment to a specific outcome. With this in mind, if there are other areas you’d like us to evaluate, please let us know on the discussion page.
Early design concepts
As part of Phase 0, we explored a set of early design concepts to help us understand how readers might experience improved information discovery on Wikipedia. These concepts were intentionally non-production mockups, used in research and community conversations to surface reactions, questions, and concerns, rather than to propose finished features.
In particular, Phase 0 design work focused on how readers respond when search results surface short, article-derived excerpts (such as section-level snippets) instead of only full-article links. Research and feedback suggested that many readers value being able to quickly assess relevance and, when appropriate, jump directly to a specific part of an article. At the same time, this work also surfaced important questions around context, attribution, and the risk of confusing human-authored content with machine-generated output.
These early concepts helped us clarify what is worth exploring further and what requires additional care or should be deferred.
Phase 1 wireframes
The wireframes shared in Phase 1 build directly on the lessons from Phase 0. Rather than introducing new interaction models, they focus on a narrower set of questions that emerged from earlier research and community discussion:
- What is the best way to help users make an informed decision about trying the experiment?
- How do we visually work around the technical limitation that users, rather than the system, decide which retrieval method to use?
- How much context is helpful when showing excerpts in search results?
- How should provenance be displayed so readers clearly understand where information comes from?
- How do different visual layouts affect scanning, trust, and decision-making?
- How might we get feedback from users about search results, especially if they’re possibly getting the information they need without clicking through to a full article?
The wireframes are therefore exploratory artifacts, meant to support discussion and testing—not a final experience. They illustrate different ways existing, human-authored Wikipedia content might be presented during search, so that we can evaluate which approaches help readers without introducing confusion or misrepresentation.
- First option for the hybrid search Phase 1 wireframe
- Second option for the hybrid search Phase 1 wireframe
Phase 1 MVP designs
- This screenshot shows the onboarding message to participants included in the experiment.
- This screenshot shows an example of what the search experience looks like once someone types in a semantic query.
- This screenshot shows the main screen that will be shown to participants in the hybrid search experiment.
As with Phase 0, any discussion of these concepts will be weighed alongside technical feasibility, before determining a path forward.
We welcome thoughts on what feels clear, confusing, helpful, or concerning about these wireframes.
Community involvement
Community feedback has already shaped the scope and boundaries of Phase 1, particularly around concerns of mixing AI-generated and human-authored content, attribution, and the role of search versus recommendation. We’re happy to continue the conversation with the Wikimedia communities on the discussion page.
We talked to the English Wikipedia Discord community on December 3, 2025, and at WikiCon North America on October 19, 2025. Pilot communities at English, French, and Portuguese wikis have also been engaged on-wiki. We have also shared semantic search with the African admins community to get feedback from editors at smaller language wikis.
Below are some highlighted quotes:
- "Yesterday I was searching for some New Zealand-related topics. I typed NZ {phrase}, but it didn't come up with the correct article; although it works often enough for me to instinctively do it, stuff like this still fails for me a reasonable portion of the time. Having a search system that handles this sort of thing would be a better way of handling this rather than creating redirects for every conceivable topic."
- "Working on a better search system is a good idea, question based search has been a my staple on Google for 20 years and now I use LLMs."
- "Chat-style search is going to be a necessary part of Wikipedia as that becomes more and more how people expect to engage with the internet. Younger kids shout questions to Alexa or Siri without ever surfing the web. Early Wikipedia used a model that barely made use of search at all; this just feels like a logical step."
Additional ideas and recommendations from the community included:
- Editors adding anchors in-article to relevant sections to support jump-to-section search.
- Editors providing metadata or annotations that improve semantic search relevance.
- Search sending readers to specific headings in-article.
- Use of tech like scroll-to-text fragments to highlight relevant article text.
- Testing ways to show citations in search.
- Improved snippets on search results pages.
- Allowing mixing semantic and keyword search in the same query.
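For readers unfamiliar with the scroll-to-text fragments mentioned in the list above, the sketch below shows how such a link can be built. It appends a `#:~:text=` directive to an article URL so that supporting browsers scroll to and highlight the quoted text. The helper name and the example snippet are hypothetical, browser support varies, and the function assumes the URL does not already contain a fragment.

```python
from urllib.parse import quote


def text_fragment_url(page_url, snippet):
    """Append a text fragment directive asking the browser to scroll to `snippet`.

    Assumes `page_url` has no existing "#fragment" part.
    """
    # Percent-encode everything, then also encode "-" because the dash has
    # special meaning (prefix and suffix markers) in the directive syntax.
    encoded = quote(snippet, safe="").replace("-", "%2D")
    return f"{page_url}#:~:text={encoded}"


# Hypothetical snippet, for illustration only.
url = text_fragment_url(
    "https://en.wikipedia.org/wiki/Kiwi_(bird)",
    "flightless birds",
)
```

A search result that carries an excerpt could link this way so that the reader lands on the exact passage the excerpt came from, keeping the surrounding article context visible.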
Editors underscored the need, in any future explorations, to make readers aware if any writing or content has been generated by AI. Another noted risk was creating incentives for editors to "optimize" articles for search visibility, which could lead to SEO-style competition, promotional content, or result manipulation.
Another concern was "What happens when users ask questions Wikipedia cannot answer?" There needs to be a “Wikipedia-appropriate” way to let readers down gently and guide them toward understanding that Wikipedia isn’t the right source.
For readers exposed to the experiment, the Phase 1 experience will be clearly labeled as an experiment. Readers will have the opportunity to opt out of the experiment. Those who opt out will continue to see the current search experience unchanged. Readers who opt in will continue to see the existing keyword-based (lexical) results, with additional semantic results surfaced alongside them as part of the experiment. This will allow us to learn without disrupting readers' information retrieval.
Throughout Phase 1, we will continue to share updates, findings, and open questions on the project page and invite discussion. Any proposal to expand, continue, or make aspects of this work permanent would come back to the community for further conversation first. If you’d like the team working on this project to join any community spaces, please do not hesitate to let us know on the project discussion page.
Pilot wikis
At this time, we will be working with the Greek, French, English, and Portuguese Wikipedias for Phase 1. Although the pilot experiment will be available to a subset of readers on those language wikis, we welcome discussion with community members beyond these language editions. As a timeline update, the experiment will begin on the Android app for Greek Wikipedia in late February, followed by English, French, and Portuguese, likely in March.