Leveraging LLMs for Automated Extraction and Validation of STI Policy Data
Toqeer Ehsan
VTT Technical Research Centre of Finland Ltd.
Investigate how Large Language Models (LLMs) can automate the extraction and validation of structured science, technology, and innovation (STI) policy information from unstructured web data.
Reduce the manual effort required in large international policy surveys (e.g., the OECD STIP Compass) by creating AI-assisted tools that can pre-fill and verify policy indicators such as instruments, themes, and target groups.
– Develop a data extraction pipeline that encodes task-specific policy schemas and uses generative LLMs to extract relevant content from web sources.
– Perform LLM-based automated response validation to ensure the quality of the data extraction pipeline. Evaluation combines overlap ratios and label agreement.
– Addresses social and economic challenges related to the efficiency, transparency, and comparability of national STI policies.
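The evaluation idea above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's actual method. It uses a hypothetical `PolicyRecord` schema with a free-text description, one instrument label, and sets of theme and target-group labels. An LLM-extracted record is compared against a reference record, such as an existing survey response. Both metrics are simple stand-ins: a token overlap ratio for free text, and Jaccard agreement for label sets. The field names, the whitespace tokenizer, and the unweighted averaging into a combined score are all illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


# Hypothetical schema for one STI policy initiative; the field names are
# illustrative and are not taken from the STIP Compass questionnaire.
@dataclass
class PolicyRecord:
    description: str                                      # free-text summary
    instrument: str                                       # single label
    themes: set[str] = field(default_factory=set)         # multi-label field
    target_groups: set[str] = field(default_factory=set)  # multi-label field


def tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def overlap_ratio(candidate: str, reference: str) -> float:
    """Share of reference tokens also found in the candidate (multiset overlap)."""
    ref = Counter(tokens(reference))
    cand = Counter(tokens(candidate))
    if not ref:
        return 0.0
    common = sum((ref & cand).values())  # '&' keeps the minimum count per token
    return common / sum(ref.values())


def label_agreement(predicted: set[str], gold: set[str]) -> float:
    """Jaccard agreement between two label sets; 1.0 when both are empty."""
    if not predicted and not gold:
        return 1.0
    return len(predicted & gold) / len(predicted | gold)


def validate(extracted: PolicyRecord, reference: PolicyRecord) -> dict[str, float]:
    """Score an extracted record against a reference record, field by field."""
    scores = {
        "description_overlap": overlap_ratio(extracted.description, reference.description),
        "instrument_match": float(extracted.instrument == reference.instrument),
        "themes_agreement": label_agreement(extracted.themes, reference.themes),
        "target_groups_agreement": label_agreement(extracted.target_groups, reference.target_groups),
    }
    # Assumed combination rule: unweighted mean of the per-field scores.
    scores["combined"] = sum(scores.values()) / len(scores)
    return scores


if __name__ == "__main__":
    reference = PolicyRecord(
        description="National grants for business R&D in clean energy",
        instrument="Grants for business R&D",
        themes={"Energy", "Business innovation"},
        target_groups={"Firms"},
    )
    extracted = PolicyRecord(
        description="Grants for business R&D in energy",
        instrument="Grants for business R&D",
        themes={"Energy"},
        target_groups={"Firms", "Researchers"},
    )
    print(validate(extracted, reference)["combined"])  # 0.6875
```

Scoring per field keeps weak spots visible. In this example, the themes and target groups each score 0.5 while the instrument matches exactly, so a reviewer can see which parts of a pre-filled entry need checking. A weighted combination could replace the plain mean if some indicators matter more than others.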
Great start for the Programme! The first 36 postdoctoral researchers have been recruited and started their work. Half of them relocated to Finland from abroad.