Leveraging LLMs for Automated Extraction and Validation of STI Policy Data

Toqeer Ehsan
VTT Technical Research Centre of Finland Ltd.

Investigate how Large Language Models (LLMs) can automate the extraction and validation of structured science, technology, and innovation (STI) policy information from unstructured web data.

  • Reduce the manual effort required in large international policy surveys (e.g., the OECD STIP Compass) by creating AI-assisted tools that can pre-fill and verify policy indicators such as instruments, themes, and target groups.
  • Develop a data extraction pipeline that encodes task-specific policy schemas and uses generative LLMs to extract relevant content from web sources.
  • Validate responses automatically with LLMs to ensure the quality of the data extraction pipeline. Evaluation combines overlap ratios and label agreement.
  • Address social and economic challenges related to the efficiency, transparency, and comparability of national STI policies.
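
The evaluation idea mentioned above (overlap ratios combined with label agreement) can be illustrated with a minimal sketch. The project description does not define either metric or the schema fields, so everything here is an assumption for illustration: the `PolicyRecord` fields, the token-level definition of overlap, and the use of Jaccard similarity for label agreement are hypothetical choices, not the project's actual method.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRecord:
    """Hypothetical schema for one extracted policy initiative."""
    name: str
    description: str
    instruments: set[str] = field(default_factory=set)
    themes: set[str] = field(default_factory=set)
    target_groups: set[str] = field(default_factory=set)


def token_overlap_ratio(extracted: str, reference: str) -> float:
    """Share of reference tokens that also appear in the extracted text.

    Assumed definition: lowercase whitespace tokens, compared as sets.
    """
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    ext_tokens = set(extracted.lower().split())
    return len(ref_tokens & ext_tokens) / len(ref_tokens)


def label_agreement(predicted: set[str], reference: set[str]) -> float:
    """Jaccard agreement between predicted and reference label sets.

    Two empty sets count as full agreement (nothing to label, nothing labelled).
    """
    if not predicted and not reference:
        return 1.0
    return len(predicted & reference) / len(predicted | reference)


# Illustrative records; the values are made up for the example.
reference = PolicyRecord(
    name="Example grant scheme",
    description="grants for SME research projects",
    instruments={"grant"},
)
extracted = PolicyRecord(
    name="Example grant scheme",
    description="Grants for research projects in small firms",
    instruments={"grant", "loan"},
)

overlap = token_overlap_ratio(extracted.description, reference.description)
agreement = label_agreement(extracted.instruments, reference.instruments)
print(f"overlap={overlap:.2f} agreement={agreement:.2f}")
```

In this sketch the free-text fields are scored by overlap and the categorical fields (instruments, themes, target groups) by set agreement. A real pipeline would likely need more robust tokenisation and a decision on how to weight the two scores.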
