This portal provides access to scientific publications and research materials related to SHARE (Survey of Health, Ageing and Retirement in Europe). Use natural language queries to discover relevant work on health, ageing, retirement, cognition, and socioeconomic conditions across European populations.
SARA — SHARE Research Assistant
About this service
The SHARE Research Portal is a publication-discovery and research-support environment for the SHARE (Survey of Health, Ageing and Retirement in Europe) community. It brings together semantic search, a browsable publication index, topic-based exploration, and an AI research assistant over the SHARE-related scientific literature.
What it does
- Research Assistant (SARA): conversational access to the publication corpus with citation support, summarisation, and methodology questions grounded in the SHARE literature. SARA runs in two modes: a research mode over the paper index, and a data analyst mode for questions about SHARE variables, cross-wave harmonisation, and analytical workflows.
- Search Publications: semantic and hybrid (BM25 + dense vector) search over the full text of indexed publications, with optional neural reranking.
- Browse by Topic: OpenAlex-taxonomy-based topic tree (domain → field → subfield → topic) with publication timelines.
- Publication Index: bibliographic inventory compiled from the SHARE repository crawler, including entries whose full text is available and those known by reference only.
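The topic tree and its publication timelines can be pictured as a nested count of papers per year. The sketch below is only an illustration of that idea: the record layout, the function names, and the topic labels are assumptions made for the example, not the portal's actual data model.

```python
from collections import Counter, defaultdict

# Hypothetical per-paper records: a (domain, field, subfield, topic) path and a year.
# The labels are made up for illustration and are not taken from the portal.
papers = [
    {"path": ("Health Sciences", "Medicine", "Geriatrics", "Frailty"), "year": 2019},
    {"path": ("Health Sciences", "Medicine", "Geriatrics", "Frailty"), "year": 2019},
    {"path": ("Health Sciences", "Medicine", "Geriatrics", "Frailty"), "year": 2021},
    {"path": ("Social Sciences", "Economics", "Labour", "Retirement"), "year": 2020},
]

def build_tree(records):
    """Nest papers as domain -> field -> subfield -> topic -> Counter of years."""
    tree = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(Counter)))
    )
    for record in records:
        domain, field, subfield, topic = record["path"]
        tree[domain][field][subfield][topic][record["year"]] += 1
    return tree

def timeline(tree, domain, field, subfield, topic):
    """Return (year, publication count) pairs for one topic, sorted by year."""
    return sorted(tree[domain][field][subfield][topic].items())

tree = build_tree(papers)
print(timeline(tree, "Health Sciences", "Medicine", "Geriatrics", "Frailty"))
```

Counting at the leaf level keeps the sketch small; totals for a subfield, field, or domain would be obtained by summing the counters beneath it.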
Semantic search — what it is, and why use it
Traditional keyword search only finds papers that contain the exact words you type. Semantic search goes a step further: queries and paper passages are both converted into high-dimensional numerical vectors (embeddings) that capture the meaning of the text rather than its surface form. Two passages that talk about the same concept end up close together in that vector space even if they share no vocabulary.
This has several concrete advantages for literature discovery:
- Paraphrase and synonym tolerance: a query for “cognitive decline in old age” also surfaces papers written as “neurocognitive ageing trajectories” or “memory loss in the elderly”.
- Natural-language queries: you can ask full questions (“how does early retirement affect depression risk?”) instead of guessing the right combination of keywords and Boolean operators.
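- Cross-discipline and cross-language tolerance: because the embedding model places semantically related terms close together, near-synonyms and closely related concepts from different disciplines tend to match. Matching across languages is more limited, since the embedding model named under “How it works” (BAAI/bge-large-en-v1.5) is an English-language model.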
- Better recall on conceptual questions: semantic search finds relevant material that keyword search would miss because the authors used different terminology.
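Closeness in the vector space is commonly measured with cosine similarity. The following minimal sketch uses tiny hand-made three-dimensional vectors in place of real embeddings; the numbers and the second passage label are invented for illustration, and the portal's own scoring may differ in detail.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not produced by any real model).
query = [0.9, 0.1, 0.0]  # stands in for "cognitive decline in old age"
passages = {
    "neurocognitive ageing trajectories": [0.8, 0.2, 0.1],
    "pension reform and labour supply": [0.1, 0.1, 0.9],
}

# Rank passages by similarity to the query, most similar first.
ranked = sorted(
    passages,
    key=lambda label: cosine_similarity(query, passages[label]),
    reverse=True,
)
print(ranked[0])
```

The two phrasings share no vocabulary with the query, yet the one whose vector points in nearly the same direction ranks first.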
The portal uses a hybrid retrieval strategy: semantic (dense-vector) scoring is combined with classical BM25 keyword scoring, so you keep the precision of exact term matches (for example specific variable names, author names, or waves) while also benefiting from meaning-based recall. An optional neural cross-encoder reranker then re-scores the top candidates by looking at the query and each passage together, which sharpens the ordering of the final results.
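How the two scores are combined is not documented here. One common approach, shown below purely as an assumption, is to rescale each score list to the range 0 to 1 and take a weighted sum. The function names, the weight alpha, and the example scores are all hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
    """Blend lexical and dense scores; alpha is the weight on the dense side."""
    bm25_n, dense_n = min_max(bm25_scores), min_max(dense_scores)
    docs = set(bm25_n) | set(dense_n)
    fused = {
        doc: (1 - alpha) * bm25_n.get(doc, 0.0) + alpha * dense_n.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores for three passages.
bm25 = {"p1": 12.0, "p2": 3.0, "p3": 0.0}     # p1 contains the exact term
dense = {"p1": 0.55, "p2": 0.80, "p3": 0.30}  # p2 is the closest paraphrase
print(hybrid_rank(bm25, dense, alpha=0.5))
```

With equal weights the exact-term match p1 stays on top, while setting alpha to 1.0 would rank purely by meaning and move p2 first. A cross-encoder reranker would then re-score only the top few candidates from this list.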
Data sources
- SHARE document repository (scientific publications, working papers, methodology reports, questionnaires, codebooks).
- Bibliographic inventory produced by the SHARE repository crawler (repository_inventory.bib), enriched with DOI and OpenAlex-topic assignments where available.
- Topic assignments derived from the OpenAlex topic taxonomy; papers without an OpenAlex match receive zero-shot assignments as a fallback.
How it works
- Text extraction: PDF content is extracted, cleaned, and chunked before indexing.
- Embeddings: chunks are embedded with BAAI/bge-large-en-v1.5 and stored in a FAISS vector index.
- Hybrid retrieval: queries are answered by combining BM25 lexical scoring with dense-vector similarity; an optional cross-encoder reranker (BAAI/bge-reranker-large) refines the top results.
- Research assistant: SARA runs a locally-hosted instruction-tuned LLM with retrieval-augmented generation against the publication index; it cites the source publications it draws from.
- Topic exploration: per-paper topic assignments are aggregated into a domain/field/subfield/topic tree and visualised as a publication-per-year timeline.
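The chunking step in the pipeline above can be sketched as a sliding window over words. The window size, the overlap, and the function name are assumptions chosen for illustration; the portal's actual chunking parameters are not stated here.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping word windows of at most `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

# Small demonstration with ten placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

The overlap means that a sentence falling on a chunk boundary still appears whole in at least one chunk, which tends to help retrieval at the cost of a somewhat larger index.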
Index refresh
The index is refreshed on a regular schedule. New publications added to the SHARE repository are picked up by the next indexing pass, which reconciles the docstore against the live repository, re-runs text extraction and embedding for new files, and rebuilds the topic assignments.
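The reconciliation step can be thought of as a comparison between what is already indexed and what the repository currently holds. The sketch below assumes each side is available as a mapping from file path to content hash; that representation, the function name, and the handling of changed or removed files are illustrative assumptions rather than a description of the actual indexer.

```python
def reconcile(indexed, repository):
    """Compare {path: content_hash} mappings and report what needs work."""
    to_add = sorted(p for p in repository if p not in indexed)
    to_remove = sorted(p for p in indexed if p not in repository)
    to_update = sorted(
        p for p in repository if p in indexed and indexed[p] != repository[p]
    )
    return {"add": to_add, "update": to_update, "remove": to_remove}

# Hypothetical state before an indexing pass.
indexed = {"a.pdf": "h1", "b.pdf": "h2"}
repository = {"a.pdf": "h1", "b.pdf": "h9", "c.pdf": "h3"}
plan = reconcile(indexed, repository)
print(plan)
```

Only the files listed under "add" (and, in this sketch, "update") would need text extraction and embedding again, so unchanged documents are not reprocessed.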
Access
Access to the research portal is limited to registered SHARE-ERIC data users in good standing. Use of the service is governed by the SHARE-ERIC Conditions of Use and is restricted to scientific research consistent with each user's registered SHARE project.
Technical stack
- FAISS vector store, BGE-large embeddings, BM25 lexical index, BGE cross-encoder reranker.
- FastAPI back-end (Python); static front-end (HTML/CSS/JS).
- Locally-hosted LLM inference for the research assistant.
- OpenAlex-based topic taxonomy with zero-shot fallback.
Using SARA as data analyst
SARA has a dedicated data-analysis mode for questions about the SHARE dataset itself — variables, coding, harmonisation across waves, analytical workflows, and exploratory statistics — rather than the scientific literature. It complements the research-mode assistant: where research mode reasons over published papers, data-analysis mode reasons over the SHARE data structure and your analytical question.
How to use it:
- Open the Research Assistant tab and type your question.
- If the question clearly looks analytical (for example “how do I compute a change score between waves 6 and 8 for the EURO-D depression scale?”), SARA routes it to the data-analysis mode automatically.
- If the question is ambiguous, SARA shows a small disambiguation card asking whether you want the Research & Literature assistant or the Data Analysis assistant — click the one that matches your intent.
- Once you pick a mode, the conversation stays locked to that mode until you start a new chat, so follow-up questions are answered consistently.
The data-analysis mode can execute small analytical snippets, draw plots, and reference SHARE questionnaires and codebooks that are also part of the publication index. No extra setup is required beyond being logged into the research portal.
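The routing and mode-locking behaviour described in the steps above can be sketched as a small state machine. The keyword lists are a deliberately crude stand-in for whatever classifier SARA actually uses, and the class, method, and mode names are hypothetical. Routing clearly literature-oriented questions straight to research mode is also an assumption.

```python
ANALYTICAL_HINTS = ("compute", "variable", "wave", "merge", "regression", "recode")
LITERATURE_HINTS = ("paper", "study", "literature", "publication", "findings", "cite")

class Chat:
    """One conversation; the mode is fixed once it has been chosen."""

    def __init__(self):
        self.mode = None  # None until routed, then "research" or "data_analysis"

    def route(self, question, user_choice=None):
        if self.mode is not None:
            return self.mode  # locked until a new chat is started
        text = question.lower()
        analytical = any(hint in text for hint in ANALYTICAL_HINTS)
        literature = any(hint in text for hint in LITERATURE_HINTS)
        if analytical and not literature:
            self.mode = "data_analysis"
        elif literature and not analytical:
            self.mode = "research"
        elif user_choice in ("research", "data_analysis"):
            self.mode = user_choice  # answer from the disambiguation card
        else:
            return "ask_user"  # ambiguous: show the disambiguation card
        return self.mode

chat = Chat()
print(chat.route("How do I compute a change score between waves 6 and 8?"))
print(chat.route("Which papers discuss this?"))  # still locked to the first mode
```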
Connecting external AI tools (MCP)
The publication index is also exposed as a Model Context Protocol server, so you can plug the SHARE paper search into external AI assistants (Claude.ai, ChatGPT, Claude Desktop, and any MCP-compatible client) and let them run searches against the SHARE literature during a conversation.
Connection details:
- MCP server URL: https://research.share-austria.at/mcp
- Authentication: OAuth 2.0 (Authorization Code + PKCE)
- OAuth Client ID: paper-search-client
- OAuth Client Secret: psc_Ti8oUlakh8pirXzmISY0tvgNkOJ1T2V4xU2xwvqQq5w
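MCP clients normally handle the PKCE part of the OAuth flow for you. For readers who want to see what happens under the hood, the sketch below generates a code verifier and its S256 code challenge as defined in RFC 7636. The function names are illustrative, and the portal's authorization and token endpoints are not listed here, so the sketch stops before any network request.

```python
import base64
import hashlib
import secrets

def challenge_for(verifier):
    """Derive the S256 code challenge for a given code verifier (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, as the specification requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def make_pkce_pair():
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 32 random bytes give a 43-character verifier; RFC 7636 allows 43 to 128.
    raw = secrets.token_bytes(32)
    verifier = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return verifier, challenge_for(verifier)

verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))
```

The client sends the challenge with the authorization request and reveals the verifier only when exchanging the authorization code for a token, so an intercepted code alone cannot be redeemed.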
Claude.ai (web):
- Open claude.ai/settings/connectors and choose “Add custom connector”.
- Paste the server URL, client ID, and client secret from above.
- Click Connect; you will be redirected through the OAuth flow and back to Claude.
- Ask, for example, “search the SHARE papers for work on retirement and cognition”, and Claude will call the MCP search tool.
ChatGPT: under Settings → Integrations (or Custom Actions), choose “Add MCP server”, paste the same credentials, and complete the OAuth authorisation.
Claude Desktop: add the following to your claude_desktop_config.json (~/Library/Application Support/Claude/ on macOS, ~/.config/Claude/ on Linux, %APPDATA%\Claude\ on Windows), then restart the app:
{
  "mcpServers": {
    "share-paper-search": {
      "url": "https://research.share-austria.at/mcp",
      "auth": {
        "type": "oauth2",
        "client_id": "paper-search-client",
        "client_secret": "psc_Ti8oUlakh8pirXzmISY0tvgNkOJ1T2V4xU2xwvqQq5w"
      }
    }
  }
}
External MCP clients are subject to the same SHARE-ERIC Conditions of Use as the portal itself — only registered SHARE users in good standing may connect, and all queries are logged with the same privacy-safe usage metrics as direct portal traffic.
Contact
Questions, bug reports, or feedback can be directed to the SHARE-AUSTRIA research support team.
Access and Use Conditions (SHARE-ERIC)
Access to the research portal and associated AI services is provided exclusively to registered SHARE-ERIC data users in good standing.
Use of this service and any AI outputs is subject to and governed by the SHARE-ERIC Conditions of Use and is strictly limited to scientific research purposes consistent with each user's registered SHARE project and user licence.
No additional rights are granted beyond those already conferred under the SHARE data access framework.
Note on Scientific Research Use
Under EU copyright law, in particular Article 3 of Directive (EU) 2019/790, research organisations are permitted to carry out text and data mining (TDM) of works to which they have lawful access for the purposes of scientific research.
The research portal and associated AI services form part of SHARE-ERIC's controlled research infrastructure. Access is therefore limited to registered SHARE users and may only be used for scientific research in accordance with the SHARE Conditions of Use.
Use of the research portal or AI services for commercial purposes, general public services, or other non-research activities is not permitted.
Usage Metrics and Data Collection
To operate and improve service reliability, SHARE-ERIC collects privacy-safe, aggregated usage metrics for the research portal and AI endpoints.
Collected metrics include:
- Service and endpoint name
- Time bucket (hour/day) and total request counts
- HTTP status classes (2xx/4xx/5xx) and latency percentiles
- For search endpoints only: query-length buckets and result-count buckets
Not collected in these metrics:
- Raw query text, prompts, or model responses
- Uploaded document contents
- API keys or authentication secrets
- Full IP addresses or direct personal identifiers
Public metrics views apply low-volume suppression (k-anonymity threshold) to reduce re-identification risk.
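Bucketing and low-volume suppression can be illustrated with a short sketch. The bucket boundaries, the threshold k, and the function names below are assumptions chosen for the example and do not reflect SHARE-ERIC's actual settings.

```python
from collections import Counter

def length_bucket(query):
    """Map a query to a coarse length bucket so the raw text is never stored."""
    n = len(query)
    if n <= 20:
        return "0-20"
    if n <= 80:
        return "21-80"
    return "81+"

def public_view(counts, k=5):
    """Hide any bucket with fewer than k requests (low-volume suppression)."""
    return {bucket: (n if n >= k else None) for bucket, n in counts.items()}

# Hypothetical traffic: six short queries and two medium-length ones.
queries = ["a" * 10] * 6 + ["b" * 50] * 2
counts = Counter(length_bucket(q) for q in queries)
print(public_view(counts, k=5))
```

Only the bucket label is counted, and a bucket seen fewer than k times is reported as suppressed rather than with its exact count, which makes it harder to single out an individual user's activity.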