Leveraging LLMs for Automated Extraction and Validation of STI Policy Data

Toqeer Ehsan
VTT Technical Research Centre of Finland Ltd.

Investigate how Large Language Models (LLMs) can automate the extraction and validation of structured science, technology, and innovation (STI) policy information from unstructured web data.

  • Reduce the manual effort required in large international policy surveys (e.g., the OECD STIP Compass) by creating AI-assisted tools that can pre-fill and verify policy indicators such as instruments, themes, and target groups.
  • Develop a data extraction pipeline that encodes task-specific policy schemas and uses generative LLMs to extract relevant content from web sources.
  • Validate the extracted responses automatically with LLMs to ensure the quality of the data extraction pipeline. Evaluation combines overlap ratios and label agreement.
  • Address social and economic challenges related to the efficiency, transparency, and comparability of national STI policies.
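
The schema encoding and the two evaluation measures named above can be illustrated with a small sketch. The project description does not define them, so everything here is an assumption made for illustration: the `PolicySchema` class, its field names and labels, the token-level definition of the overlap ratio, and the use of Jaccard similarity for label agreement are all hypothetical, and no LLM is actually called.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class PolicySchema:
    """Hypothetical schema: one closed label set per policy indicator."""
    fields: dict[str, list[str]] = field(default_factory=dict)

    def to_prompt(self, page_text: str) -> str:
        """Render the schema and the source text into an extraction prompt."""
        return (
            "Extract the policy indicators below from the text. "
            "Answer with JSON using only the allowed labels.\n"
            f"Allowed labels: {json.dumps(self.fields, sort_keys=True)}\n"
            f"Text: {page_text}"
        )


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def overlap_ratio(extracted: str, source: str) -> float:
    """Assumed definition: share of extracted tokens found in the source."""
    extracted_tokens = tokenize(extracted)
    if not extracted_tokens:
        return 0.0
    source_tokens = set(tokenize(source))
    hits = sum(token in source_tokens for token in extracted_tokens)
    return hits / len(extracted_tokens)


def label_agreement(predicted: set[str], reference: set[str]) -> float:
    """Assumed definition: Jaccard similarity between two label sets."""
    if not predicted and not reference:
        return 1.0
    return len(predicted & reference) / len(predicted | reference)


if __name__ == "__main__":
    schema = PolicySchema({
        "instrument": ["grant", "tax relief"],
        "target_group": ["SMEs", "universities"],
    })
    source = "The programme offers grants to SMEs."
    print(schema.to_prompt(source))
    print(overlap_ratio("grants to SMEs", source))
    print(label_agreement({"grant", "tax relief"}, {"grant"}))
```

An overlap ratio close to 1 suggests that an extracted passage is grounded in the source page rather than invented, while label agreement compares the categorical indicators against a reference answer, for example one supplied by a survey respondent or a second model.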

Tutkimuslaitosten yhteenliittymä Tulanet

Latokartanonkaari 9
00790 Helsinki

Sanna Marttinen

Executive Director
tel. 029 532 6356
sanna.marttinen@tulanet.fi
© 2026 Tulanet