coding-router
Routes coding tasks locally and complex reasoning to cloud · listening on 0.0.0.0:8899
Backend models: 2 (1 local · 1 cloud)
Routing decisions: 3 (incl. default-route)
Signals: 2 (keyword signals)
Routing latency (P50): 40 ms (P99: 93 ms)
Local requests: 86% (stayed on local model)
Cache hit rate: 12% (HNSW semantic cache)
Routing flow
Agents (incoming requests) → coding-router (localhost:8899)
reasoning_keywords → gemini-2.5-pro
coding_keywords → Qwen3-Coder-Next
default → Qwen3-Coder-Next
local: Qwen3-Coder-Next-4bit (:8000)
cloud: gemini-2.5-pro (googleapis.com)
Router configuration
Listener: 0.0.0.0:8899 (timeout: 300s)
Last verified: 3 minutes ago
Description: Routes coding tasks locally and complex reasoning to cloud
Default model: mlx-community/Qwen3-Coder-Next-4bit
Advanced features: Semantic cache (HNSW) · PII detection
Created: May 20, 2026
Backend models
| Model name | Endpoint | Weight | Quality | Capabilities | Pricing (USD/1M, prompt / completion) |
|---|---|---|---|---|---|
| mlx-community/Qwen3-Coder-Next-4bit (80B · http · default) | host.containers.internal:8000 | 100 | 0.85 | coding, debugging, refactoring | Free (local) |
| gemini-2.5-pro (400B · https) | generativelanguage.googleapis.com/… | 100 | 0.95 | reasoning, math, analysis, coding, creative | $1.25 / $10.00 |
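To make the pricing column concrete, here is a small sketch of a per-request cost estimate. It assumes the two figures are USD per 1M prompt tokens and per 1M completion tokens, as the `prompt_per_1m` and `completion_per_1m` fields in config.yaml suggest. The helper name `request_cost` and the token counts are hypothetical, chosen only for illustration.

```python
# Rates copied from the table above: (prompt, completion) in USD per 1M tokens.
PRICING = {
    "mlx-community/Qwen3-Coder-Next-4bit": (0.0, 0.0),
    "gemini-2.5-pro": (1.25, 10.00),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    prompt_rate, completion_rate = PRICING[model]
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1_000_000

# Illustrative token counts, not measured values.
print(request_cost("gemini-2.5-pro", 2_000, 500))                       # 0.0075
print(request_cost("mlx-community/Qwen3-Coder-Next-4bit", 2_000, 500))  # 0.0
```

Under these assumptions, a request only incurs cost when it is routed to the cloud backend.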
Signals
Keyword signal: reasoning_keywords (operator: OR · case-insensitive)
Keywords: prove, derive, theorem, induction, research, formal verification, proof by contradiction
Keyword signal: coding_keywords (operator: OR · case-insensitive)
Keywords: implement, refactor, debug, function, class, import, build, code
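The signal definitions above can be read as "the signal fires if any listed keyword appears, ignoring case". A minimal sketch of that check follows. Whether the router matches whole words or substrings is not stated here, so the whole-word matching below is an assumption, and `matched_signals` is a hypothetical name.

```python
import re

# Keyword lists as shown in the Signals panel above.
SIGNALS = {
    "reasoning_keywords": ["prove", "derive", "theorem", "induction", "research",
                           "formal verification", "proof by contradiction"],
    "coding_keywords": ["implement", "refactor", "debug", "function", "class",
                        "import", "build", "code"],
}

def matched_signals(prompt: str) -> set:
    """Return the names of signals with at least one keyword in the prompt (OR)."""
    hits = set()
    for name, keywords in SIGNALS.items():
        # Assumption: whole-word, case-insensitive matching.
        if any(re.search(rf"\b{re.escape(kw)}\b", prompt, re.IGNORECASE)
               for kw in keywords):
            hits.add(name)
    return hits

print(sorted(matched_signals("Refactor this class and PROVE it terminates")))
# ['coding_keywords', 'reasoning_keywords']
```

With whole-word matching, a prompt such as "decode the message" does not trigger `coding_keywords`, whereas a substring match on "code" would.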
Routing decisions — evaluated highest priority first
Priority 100 · reasoning-route: Route complex reasoning tasks to Gemini 2.5 Pro
Priority 80 · coding-route: Route coding tasks to local Qwen3-Coder-Next
Priority 1 · default-route (auto-generated): Default route to local model for cost savings
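"Evaluated highest priority first" implies a first-match-wins loop over the decisions in descending priority. The sketch below illustrates that reading. The `Decision` class and `route` function are hypothetical, and treating a decision with no conditions as always matching is an assumption based on the default route's empty condition list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Decision:
    name: str
    priority: int
    signal: Optional[str]  # None: no conditions, assumed to always match
    model: str

DECISIONS = [
    Decision("reasoning-route", 100, "reasoning_keywords", "gemini-2.5-pro"),
    Decision("coding-route", 80, "coding_keywords", "mlx-community/Qwen3-Coder-Next-4bit"),
    Decision("default-route", 1, None, "mlx-community/Qwen3-Coder-Next-4bit"),
]

def route(active_signals: set) -> Decision:
    """Return the highest-priority decision whose signal is active."""
    for decision in sorted(DECISIONS, key=lambda d: d.priority, reverse=True):
        if decision.signal is None or decision.signal in active_signals:
            return decision
    raise LookupError("no decision matched")

# A prompt that triggers both signals goes to the cloud model,
# because reasoning-route (100) outranks coding-route (80).
print(route({"coding_keywords", "reasoning_keywords"}).model)  # gemini-2.5-pro
print(route(set()).name)                                       # default-route
```

One consequence of this ordering: a coding prompt that happens to contain a reasoning keyword such as "research" would leave the local model.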
Advanced features
Semantic cache (HNSW) (enabled): Cache semantically similar prompts to skip redundant LLM calls (12% hit rate)
Jailbreak detection (disabled): Block prompt injection and policy-violation attempts before they reach a backend
PII detection & redaction (enabled): Detect and strip personally identifiable information from requests before routing
Complexity scoring (disabled): Score request complexity and route high-complexity prompts to more capable models
Domain classifier (disabled): Classify requests by domain (code, math, language…) to improve routing accuracy
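As a rough illustration of what a semantic cache does, the sketch below returns a stored response when a new prompt's embedding is close enough to a cached one. It is not the router's implementation: a linear scan with cosine similarity stands in for the HNSW index, the embeddings are supplied by the caller, and the class name and the 0.9 threshold are hypothetical.

```python
import math

class SemanticCache:
    """Toy semantic cache: linear scan instead of an HNSW index."""

    def __init__(self, threshold: float = 0.9):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    @staticmethod
    def _cosine(a, b) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def put(self, embedding, response) -> None:
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the most similar cached response, or None below the threshold."""
        best_score, best_response = 0.0, None
        for cached, response in self.entries:
            score = self._cosine(cached, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
print(cache.get([0.99, 0.05]))  # cached answer (near-identical direction)
print(cache.get([0.0, 1.0]))    # None (orthogonal, so a cache miss)
```

An HNSW index serves the same lookup with an approximate nearest-neighbor search, which avoids comparing against every cached entry.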
config.yaml — coding-router
version: v0.1

listeners:
  - name: "http-8899"
    address: "0.0.0.0"
    port: 8899
    timeout: "300s"

providers:
  models:
    - name: "mlx-community/Qwen3-Coder-Next-4bit"
      param_size: "80b"
      endpoints:
        - name: "mlx-local"
          weight: 100
          endpoint: "host.containers.internal:8000"
          protocol: "http"
      capabilities: ["coding", "debugging", "refactoring"]
      quality_score: 0.85
      pricing:
        currency: "USD"
        prompt_per_1m: 0.0
        completion_per_1m: 0.0
    - name: "gemini-2.5-pro"
      param_size: "400b"
      endpoints:
        - name: "gemini-primary"
          weight: 100
          endpoint: "generativelanguage.googleapis.com/v1beta/openai"
          protocol: "https"
      access_key: "••••••••••••••••"
      capabilities: ["reasoning", "math", "analysis", "coding", "creative"]
      quality_score: 0.95
      pricing:
        currency: "USD"
        prompt_per_1m: 1.25
        completion_per_1m: 10.00
  default_model: "mlx-community/Qwen3-Coder-Next-4bit"

signals:
  keywords:
    - name: "reasoning_keywords"
      operator: "OR"
      keywords: ["prove", "derive", "theorem", "induction", "research", "formal verification"]
      case_sensitive: false
    - name: "coding_keywords"
      operator: "OR"
      keywords: ["implement", "refactor", "debug", "function", "class", "import", "build", "code"]
      case_sensitive: false

decisions:
  - name: "reasoning-route"
    description: "Route complex reasoning tasks to Gemini 2.5 Pro"
    priority: 100
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "reasoning_keywords"
    modelRefs:
      - model: "gemini-2.5-pro"
        use_reasoning: true
  - name: "coding-route"
    description: "Route coding tasks to local Qwen3-Coder-Next"
    priority: 80
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "coding_keywords"
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false
  - name: "default-route"
    description: "Default route to local model for cost savings"
    priority: 1
    rules:
      operator: "AND"
      conditions: []
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

advanced:
  semantic_cache: true
  jailbreak_detection: false
  pii_detection: true
  complexity_scoring: false
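Because decisions refer to signals and models by name, a quick consistency check can catch dangling references before a config is deployed. The sketch below runs such a check over a simplified, flattened stand-in for the config (a plain dict, not the real file layout). The function name `dangling_references` and the dict shape are hypothetical.

```python
# Simplified stand-in for config.yaml: only the names needed for cross-checking.
CONFIG = {
    "models": ["mlx-community/Qwen3-Coder-Next-4bit", "gemini-2.5-pro"],
    "default_model": "mlx-community/Qwen3-Coder-Next-4bit",
    "signals": ["reasoning_keywords", "coding_keywords"],
    "decisions": [
        {"name": "reasoning-route", "conditions": ["reasoning_keywords"],
         "models": ["gemini-2.5-pro"]},
        {"name": "coding-route", "conditions": ["coding_keywords"],
         "models": ["mlx-community/Qwen3-Coder-Next-4bit"]},
        {"name": "default-route", "conditions": [],
         "models": ["mlx-community/Qwen3-Coder-Next-4bit"]},
    ],
}

def dangling_references(cfg: dict) -> list:
    """List references to models or signals that the config never defines."""
    models, signals = set(cfg["models"]), set(cfg["signals"])
    problems = []
    if cfg["default_model"] not in models:
        problems.append(f"default_model: unknown model {cfg['default_model']!r}")
    for decision in cfg["decisions"]:
        for signal in decision["conditions"]:
            if signal not in signals:
                problems.append(f"{decision['name']}: unknown signal {signal!r}")
        for model in decision["models"]:
            if model not in models:
                problems.append(f"{decision['name']}: unknown model {model!r}")
    return problems

print(dangling_references(CONFIG))  # []
```

An empty list means every decision points at a defined signal and a defined backend model.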