Webscraping Kursnet - Data Collection & Research Tool

Project Overview

This project built an automated web scraping system that systematically collects course data from Kursnet, the continuing education database of the German Federal Employment Agency (Bundesagentur für Arbeit). The scraped data formed the basis for a scientific, qualitative evaluation of continuing education offerings in Germany.

Technical Highlights

Adaptive Scraping: Handles dynamic, JavaScript-rendered pages
Data Quality Assurance: Built-in validation and error handling
Scalable Architecture: Designed to handle thousands of course listings
Research-Ready Output: Structured data export for qualitative analysis
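The validation and export steps above could look roughly like the following TypeScript sketch. The record fields (id, title, provider, region, startDate) and the helpers validateCourse, partitionRecords and toCsv are hypothetical. The page does not document the project's actual schema or export format. Records that fail validation are set aside with their error messages instead of being silently dropped. Valid records are written out as CSV.

```typescript
// Hypothetical shape of one scraped course listing (illustrative fields only).
interface CourseRecord {
  id: string;
  title: string;
  provider: string;
  region: string;
  startDate: string; // ISO 8601 date, e.g. "2021-03-01"
}

const FIELDS: (keyof CourseRecord)[] = ["id", "title", "provider", "region", "startDate"];

// Returns a list of problems with a raw record; an empty list means it passed.
function validateCourse(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const field of FIELDS) {
    const value = raw[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`missing or empty field: ${field}`);
    }
  }
  const start = raw["startDate"];
  if (typeof start === "string" && start.trim() !== "" && Number.isNaN(Date.parse(start))) {
    errors.push(`unparseable startDate: ${start}`);
  }
  return errors;
}

interface Rejected {
  raw: Record<string, unknown>;
  errors: string[];
}

// Splits raw scraper output into valid records and rejected ones (kept for review).
function partitionRecords(raws: Record<string, unknown>[]): { valid: CourseRecord[]; rejected: Rejected[] } {
  const valid: CourseRecord[] = [];
  const rejected: Rejected[] = [];
  for (const raw of raws) {
    const errors = validateCourse(raw);
    if (errors.length > 0) {
      rejected.push({ raw, errors });
      continue;
    }
    // Copy only the known fields, trimming whitespace left over from the HTML.
    const record = {} as CourseRecord;
    for (const field of FIELDS) {
      record[field] = (raw[field] as string).trim();
    }
    valid.push(record);
  }
  return { valid, rejected };
}

// Quotes a CSV field if it contains a comma, quote or line break (RFC 4180 style).
function csvEscape(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serializes records as CSV with a header row, one record per line.
function toCsv(records: CourseRecord[]): string {
  const lines = [FIELDS.join(",")];
  for (const record of records) {
    lines.push(FIELDS.map((field) => csvEscape(record[field])).join(","));
  }
  return lines.join("\n");
}
```

Keeping rejected records together with their errors makes it possible to check afterwards whether a failure came from the scraper or from an incomplete listing.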

Implementation Details

The scraper was built with:

Scraping Engine: Node.js with Puppeteer for handling JavaScript-rendered content
Data Processing: TypeScript for type-safe data transformation
Storage: MongoDB for flexible schema and easy querying
Batch Processing: Queue-based system for reliable large-scale scraping
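A queue-based batch runner of the kind described above can be sketched without external dependencies. The following are assumptions, not the project's documented design:

- the name runQueue
- its concurrency and maxAttempts parameters
- the policy of retrying at the back of the queue

The worker argument stands in for whatever function loads and parses one page. In the real scraper that is presumably done with Puppeteer. Here it is any async function, so the example runs without a browser.

```typescript
// A job is one URL to scrape; `attempts` counts how often it has been tried.
interface Job {
  url: string;
  attempts: number;
}

interface QueueResult<T> {
  done: Map<string, T>;
  failed: Map<string, string>; // url -> last error message
}

// Processes all URLs with at most `concurrency` jobs in flight, re-queueing a
// failed job until it has been tried `maxAttempts` times.
async function runQueue<T>(
  urls: string[],
  worker: (url: string) => Promise<T>,
  concurrency = 2,
  maxAttempts = 3,
): Promise<QueueResult<T>> {
  const queue: Job[] = urls.map((url) => ({ url, attempts: 0 }));
  const done = new Map<string, T>();
  const failed = new Map<string, string>();

  // Each runner keeps pulling jobs off the shared queue until it is empty.
  async function runner(): Promise<void> {
    while (queue.length > 0) {
      const job = queue.shift()!;
      job.attempts += 1;
      try {
        done.set(job.url, await worker(job.url));
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        if (job.attempts < maxAttempts) {
          queue.push(job); // retry later, after the other pending jobs
        } else {
          failed.set(job.url, message);
        }
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => runner()));
  return { done, failed };
}
```

Retried jobs go to the back of the queue. A temporarily failing page is therefore retried after the other pages rather than hammered immediately. Pages that keep failing end up in the failed map for later inspection.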

Research Contribution

This tool enabled:

Analysis of continuing education trends
Identification of skill gaps in the job market
Evaluation of regional differences in course offerings
Research published in a peer-reviewed journal

Publication

The findings from this project were published in a peer-reviewed article discussing the implications for adult education policy.

Tech Stack

Node.js
TypeScript
Puppeteer
MongoDB
Express
Queue Management
Data Validation

Security Features

Rate Limiting

Respectful scraping with built-in rate limiting and delays
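One way to implement these delays is a shared limiter that every worker awaits before loading a page. The RateLimiter class, its minIntervalMs and jitterMs parameters, and the use of random jitter are illustrative assumptions. The actual delay values used against Kursnet are not stated.

```typescript
// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Enforces a minimum gap (plus optional random jitter) between request starts,
// shared by every caller that awaits wait() on the same instance.
class RateLimiter {
  private nextSlot = 0; // earliest time (ms since epoch) the next request may start
  private readonly minIntervalMs: number;
  private readonly jitterMs: number;

  constructor(minIntervalMs: number, jitterMs = 0) {
    this.minIntervalMs = minIntervalMs;
    this.jitterMs = jitterMs;
  }

  // Reserves the next free slot synchronously, then sleeps until it arrives,
  // so concurrent callers are spaced out instead of firing together.
  async wait(): Promise<void> {
    const now = Date.now();
    const slot = Math.max(now, this.nextSlot);
    this.nextSlot = slot + this.minIntervalMs + Math.random() * this.jitterMs;
    await sleep(slot - now);
  }
}
```

wait() reserves its slot before sleeping. As a result, several concurrent workers sharing one instance still start their requests at least minIntervalMs apart. Each worker would call await limiter.wait() right before loading a page.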

Data Anonymization

Personal data automatically anonymized during collection
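The page does not say which fields counted as personal data or how they were anonymized. The sketch below assumes two illustrative techniques:

- regex-based redaction of e-mail addresses and German-style phone numbers in free text
- a salted SHA-256 pseudonym for contact names, so that listings from the same person can still be grouped

The patterns, placeholders and function names (redactText, pseudonymize) are hypothetical. Regex redaction is a heuristic that will miss some formats. A salted hash is also pseudonymization rather than full anonymization as long as the salt is kept.

```typescript
import { createHash } from "node:crypto";

// Illustrative pattern for e-mail addresses.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Rough pattern for German-style phone numbers such as "+49 30 1234567" or
// "030/1234-567"; the lookbehind stops a match from starting inside a word or number.
const PHONE_PATTERN = /(?<!\w)(?:\+49|0)[\d\s\/()-]{6,}\d/g;

// Replaces e-mail addresses and phone numbers in free text with placeholders.
function redactText(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL]").replace(PHONE_PATTERN, "[PHONE]");
}

// Maps a contact name to a stable pseudonym: the same normalized name and salt
// always give the same ID, but the name itself is not stored.
function pseudonymize(name: string, salt: string): string {
  const digest = createHash("sha256")
    .update(salt + name.trim().toLowerCase())
    .digest("hex");
  return `person-${digest.slice(0, 12)}`;
}
```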

Secure Storage

Encrypted database storage for collected data
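The page does not describe how encryption was set up. MongoDB itself offers options such as encryption at rest and client-side field-level encryption. As an application-level alternative, a field could be encrypted with AES-256-GCM from Node's built-in node:crypto module before it is written to the database. The EncryptedField shape and the encryptField/decryptField helpers are hypothetical. Key management (where the 32-byte key is stored) is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage shape for one encrypted value.
interface EncryptedField {
  iv: string; // base64, 12 random bytes, unique per encryption
  tag: string; // base64 GCM authentication tag
  data: string; // base64 ciphertext
}

// Encrypts a UTF-8 string with AES-256-GCM; `key` must be 32 bytes.
function encryptField(plaintext: string, key: Buffer): EncryptedField {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypts and authenticates; final() throws if the ciphertext or tag was altered.
function decryptField(field: EncryptedField, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(field.iv, "base64"));
  decipher.setAuthTag(Buffer.from(field.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(field.data, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```

GCM is chosen here because it authenticates as well as encrypts: a record modified in the database fails to decrypt instead of yielding silently corrupted data.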

Source Code

The source code is available on GitHub.