Difficulty: Intermediate
Read Time: 8 min

How to Use Vercel's deepsec with Ollama

By Codcompass Team · 8 min read

Tiered AI Security Scanning: Optimizing LLM-Based Code Audits with Local-Cloud Routing

Current Situation Analysis

Traditional static application security testing (SAST) tools operate on rigid pattern matching. They flag every string concatenation, every environment variable reference, and every file system call regardless of execution context. The result is a high false-positive rate that forces engineering teams to either disable the scanner or train developers to ignore its output. Context-aware AI scanners solve the signal-to-noise problem by evaluating code intent, dependency graphs, and architectural patterns. However, they introduce a new operational bottleneck: cost scaling.

The misconception driving this bottleneck is the assumption that AI-powered security analysis requires frontier models for every file. In reality, security scanners process entire codebases linearly. A typical repository contains a heavy tail of low-risk files: static assets, configuration objects, test fixtures, and generated utilities. Routing these through a high-capacity cloud model like Claude Opus burns budget without improving detection accuracy. The economic model breaks down quickly. At approximately $0.30 per file for frontier reasoning, a 1,000-file codebase costs $300 per scan. When you enable the mandatory revalidation pass to suppress hallucinations, the cost doubles to $600. Running this nightly in CI becomes financially unsustainable for most organizations.
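The arithmetic above can be reproduced with a minimal sketch. The per-file price is the article's approximate $0.30 figure, stored in cents to avoid floating-point drift; the function name and the 30-night month are illustrative assumptions, not part of any tool's API.

```typescript
// Approximate frontier-model price per file, in cents (the article's ~$0.30).
const FRONTIER_CENTS_PER_FILE = 30;

/**
 * Cost in dollars of sending every file to a single model.
 * Revalidation is modeled as a second full pass over the same files.
 */
function uniformScanCostUsd(fileCount: number, revalidate: boolean): number {
  const passes = revalidate ? 2 : 1;
  return (fileCount * FRONTIER_CENTS_PER_FILE * passes) / 100;
}

const singlePass = uniformScanCostUsd(1000, false);
const withRevalidation = uniformScanCostUsd(1000, true);

// Assumption: one scan per night over a 30-night month.
const nightlyForMonth = withRevalidation * 30;

console.log({ singlePass, withRevalidation, nightlyForMonth });
```

Under these assumptions, a nightly scan with revalidation on a 1,000-file repository would come to roughly $18,000 per month.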

The industry has overlooked a simple architectural truth: not all code requires frontier reasoning. Security-critical files (auth middleware, payment handlers, cryptographic implementations) demand high-context models. Utility files and static configurations do not. By decoupling the scanner from the model provider and introducing a complexity-aware routing layer, teams can preserve detection fidelity while reducing operational costs by an order of magnitude. This approach also addresses data residency constraints, allowing sensitive code to be evaluated locally while reserving cloud inference for high-risk artifacts.
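One way to picture a complexity-aware routing decision is a path-based classifier. This is a simplified sketch, not how any particular proxy scores requests: the tier names, the regular expressions, and the `chooseTier` function are all hypothetical and would need tuning for a real repository.

```typescript
type Tier = "local" | "mid" | "frontier";

// Hypothetical patterns for security-critical code (auth, payments, crypto).
const HIGH_RISK: RegExp[] = [/auth/i, /payment/i, /crypto/i];

// Hypothetical patterns for low-risk files: static assets, config,
// test fixtures, generated code, and test files.
const LOW_RISK: RegExp[] = [
  /\.(png|svg|css|json|md)$/i,
  /(^|\/)(fixtures|generated)\//i,
  /\.(test|spec)\.[jt]sx?$/i,
];

function chooseTier(filePath: string): Tier {
  // High-risk patterns are checked first, so an ambiguous path
  // is sent to the stronger model rather than the weaker one.
  if (HIGH_RISK.some((re) => re.test(filePath))) return "frontier";
  if (LOW_RISK.some((re) => re.test(filePath))) return "local";
  // Everything else goes to the mid-tier model by default.
  return "mid";
}

for (const p of ["src/middleware/auth.ts", "public/logo.svg", "src/utils/format.ts"]) {
  console.log(p, "->", chooseTier(p));
}
```

Checking high-risk patterns first trades some cost for safety: a file such as an auth test would still reach the frontier model. A production router would likely inspect file contents as well as paths.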

WOW Moment: Key Findings

The economic and operational impact of tiered routing becomes immediately visible when comparing uniform cloud AI scanning against a hybrid local-cloud architecture. The following data reflects real-world scanning patterns across medium-sized TypeScript/Node.js repositories.

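Approach                | Cost per 1k Files          | False Positive Rate | Context Awareness | Data Residency
------------------------|----------------------------|---------------------|-------------------|---------------
Regex-Based SAST        | ~$0                        | 65-80%              | Low               | Local
Uniform Cloud AI (Opus) | ~$600 (with revalidation)  | 15-20%              | High              | Cloud
Hybrid Tiered Routing   | ~$20                       | 12-18%              | High              | Mixed/Local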

Why this matters: The hybrid approach collapses the cost barrier to continuous AI security auditing. By routing roughly 70% of files to a local inference engine, 25% to a mid-tier cloud model, and 5% to a frontier model, teams achieve near-parity in detection quality while spending less than 5% of the uniform cloud budget. This enables shift-left security practices where AI scans run on every pull request without triggering budget alerts or compliance violations.
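The effect of the 70/25/5 split can be sketched as a blended-cost calculation. Only the split and the roughly $0.30 frontier price come from the article. The mid-tier price of $0.02 per file and the zero marginal cost for local inference are placeholder assumptions chosen for illustration, and this sketch models a single pass with no revalidation.

```typescript
interface TierSpec {
  sharePct: number;     // percentage of files routed to this tier
  centsPerFile: number; // assumed price per file, in cents
}

// Split from the article; prices other than the frontier one are hypothetical.
const TIERS: Record<string, TierSpec> = {
  local: { sharePct: 70, centsPerFile: 0 },     // assumption: no marginal cost
  mid: { sharePct: 25, centsPerFile: 2 },       // assumption: placeholder price
  frontier: { sharePct: 5, centsPerFile: 30 },  // article's ~$0.30 per file
};

/** Single-pass cost in dollars for a repository of `fileCount` files. */
function tieredScanCostUsd(fileCount: number): number {
  let cents = 0;
  for (const tier of Object.values(TIERS)) {
    cents += ((fileCount * tier.sharePct) / 100) * tier.centsPerFile;
  }
  return cents / 100;
}

const tiered = tieredScanCostUsd(1000);
const uniformWithRevalidation = 600; // article's figure for 1,000 files
const pctOfUniform = (tiered / uniformWithRevalidation) * 100;

console.log({ tiered, pctOfUniform });
```

With these placeholder prices the blend comes to $20 for 1,000 files, in line with the table's approximate figure, though the article does not state how that figure was derived. Adding a revalidation pass on the cloud tiers would raise the total.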

Core Solution

The architecture relies on three components working in sequence: a context-aware scanner, a local proxy that evaluates request complexity, and a tiered model pool. We will implement this using Vercel's deepsec as the scanning engine, Lynkr as the routing proxy, and Ollama for local inference.
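To make the local tier concrete, here is a minimal sketch of the request a client might send to Ollama's `/api/generate` endpoint on its default port, 11434. It does not show how deepsec or Lynkr are configured. The prompt wording, the `buildOllamaRequest` helper, and the model name are illustrative assumptions.

```typescript
interface OllamaRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

/**
 * Builds a non-streaming generate request for a locally running Ollama server.
 * The model must already be pulled locally; the prompt text is illustrative.
 */
function buildOllamaRequest(
  model: string,
  sourceCode: string,
  baseUrl: string = "http://localhost:11434",
): OllamaRequest {
  const prompt = `Review the following code for security issues:\n\n${sourceCode}`;
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks Ollama for one JSON object instead of a token stream.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// "llama3.1" is a placeholder model name.
const req = buildOllamaRequest("llama3.1", "const token = process.env.API_TOKEN;");
console.log(req.url);
// To send it on Node 18+: const res = await fetch(req.url, req.init);
// The generated text is in the `response` field of the returned JSON.
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without a running Ollama server.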

Architecture Decisions & Rationale

  1. Proxy Interception: Instead of configuring the scanner to call cloud APIs directly, we point it at a local proxy. The proxy inspects the payload, scores the file's security complexity, and routes th
