Choosing a model means measuring cost vs quality on your data
Source: DEV Community
I wanted to evaluate model-based extraction in a way that would tell me more than benchmarks alone. The scenario is building an AI recruiting agent to help match candidates to job postings. To do this, we need to ingest job postings from career pages, aggregators, social media posts, and other messy sources. Every posting needs to be parsed into structured JSON: title, company, salary range, requirements, benefits. I set up a comparison with a small dataset of 25 job postings across three model tiers to answer a practical question: does the quality difference between a more expensive model and a budget model justify the cost over time?

Setup

For this exploration, I used Baseten's Model APIs. You can use whatever model provider you like. I picked three models across the cost spectrum (priced March 2026):

| Tier | Model | Params (total / active) | ~Input $/1M tokens |
|------|-------|-------------------------|--------------------|
| Frontier | DeepSeek V3.1 | 671B / 37B active | $0.50 |
| Mid-tier | Nvidia Nemotron 3 Super | 120B / 12B active | $0.30 |
| Budget | OpenAI GPT-OSS-120B | 117B / 5.1B active | |
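To make "parsed into structured JSON" concrete, here is a minimal sketch of the target schema and a validator for model output, using the fields named above. The class and function names, and the optionality of `salary_range`, are my own illustration, not from the article:

```python
import json
from dataclasses import dataclass, field

# Target schema for one parsed posting. Field names follow the article;
# everything else here is an assumed illustration.
@dataclass
class JobPosting:
    title: str
    company: str
    salary_range: str | None  # postings often omit salary, so allow None
    requirements: list[str] = field(default_factory=list)
    benefits: list[str] = field(default_factory=list)

def parse_model_output(raw_json: str) -> JobPosting:
    """Parse and validate a model's JSON output against the schema above.

    Raises KeyError/JSONDecodeError on malformed output, which is itself
    a useful quality signal when comparing models.
    """
    data = json.loads(raw_json)
    return JobPosting(
        title=data["title"],
        company=data["company"],
        salary_range=data.get("salary_range"),
        requirements=list(data.get("requirements", [])),
        benefits=list(data.get("benefits", [])),
    )

sample = (
    '{"title": "Backend Engineer", "company": "Acme", '
    '"requirements": ["Python", "SQL"], "benefits": ["Remote"]}'
)
posting = parse_model_output(sample)
```

Counting how often each model's output fails this parse step is one cheap, objective axis to compare tiers on before looking at extraction quality.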
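The "cost over time" question reduces to simple arithmetic once you fix a volume. A rough sketch of monthly input-token cost per tier, using the per-1M-token prices from the table; the posting volume and average prompt size are assumptions for illustration, not figures from the article:

```python
# Per-1M input-token prices from the table above.
PRICES_PER_1M = {
    "DeepSeek V3.1": 0.50,
    "Nvidia Nemotron 3 Super": 0.30,
}

POSTINGS_PER_MONTH = 100_000  # assumed ingest volume
TOKENS_PER_POSTING = 2_000    # assumed average prompt size per posting

def monthly_input_cost(price_per_1m: float) -> float:
    """Monthly input-token spend in dollars at the assumed volume."""
    total_tokens = POSTINGS_PER_MONTH * TOKENS_PER_POSTING
    return total_tokens / 1_000_000 * price_per_1m

for model, price in PRICES_PER_1M.items():
    print(f"{model}: ${monthly_input_cost(price):.2f}/month")
```

At these assumptions the frontier tier costs $100/month of input tokens versus $60/month for the mid-tier: a $40/month gap, which frames the real question the 25-posting comparison tries to answer.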