Datenhalle.de
Datenhalle.de is a B2B platform for companies seeking high-quality business data for marketing purposes. Our goal was to achieve a quality advantage over competitors: the data we sell is verified to be up to date, free of duplicates, accurate, and useful in real marketing activities. The key was applying AI, including LLMs, at various project stages, combined with advanced web crawling algorithms.
Responsibilities: I focused on data acquisition and processing, from web crawling through extraction to preparation for further analysis. I developed classification models that assigned data to nearly 1000 German market sectors. I also handled the integration and optimization of LLMs in the data processing pipeline.
Challenges: The biggest challenge was the project's scale: analyzing hundreds of millions of web pages. Extracting information such as proper names and personal data from unstructured content was particularly difficult. Additionally, creating precise classification models for highly specialized industries required preparing massive training datasets and continuously optimizing operational costs.