
coding-router

Routes coding tasks locally and complex reasoning to cloud · listening on 0.0.0.0:8899

Backend models: 2 (1 local · 1 cloud)
Routing decisions: 3 (incl. default-route)
Signals: 2 (keyword signals)
Routing latency: P50 40 ms · P99 93 ms
Local requests: 86% (stayed on local model)
Cache hit rate: 12% (HNSW semantic cache)
Routing flow
Agents (incoming requests) → coding-router (localhost:8899)
reasoning_keywords → gemini-2.5-pro
coding_keywords → Qwen3-Coder-Next
default → Qwen3-Coder-Next
local: Qwen3-Coder-Next-4bit (:8000)
cloud: gemini-2.5-pro (googleapis.com)
Router configuration
Listener: 0.0.0.0:8899 (timeout: 300s)
Last verified: 3 minutes ago
Description: Routes coding tasks locally and complex reasoning to cloud
Default model: mlx-community/Qwen3-Coder-Next-4bit
Advanced features: Semantic cache (HNSW) · PII detection
Created: May 20, 2026
Backend models
Model name | Endpoint | Weight | Quality | Capabilities | Pricing (USD per 1M tokens, prompt / completion)
mlx-community/Qwen3-Coder-Next-4bit (80B · http · default) | host.containers.internal:8000 | 100 | 0.85 | coding, debugging, refactoring | Free (local)
gemini-2.5-pro (400B · https) | generativelanguage.googleapis.com/… | 100 | 0.95 | reasoning, math, analysis, coding, creative | $1.25 / $10.00
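To make the pricing column concrete, here is a minimal sketch of the per-request cost arithmetic. It assumes the rates are USD per one million tokens, with the first figure applying to prompt tokens and the second to completion tokens, as the `prompt_per_1m` and `completion_per_1m` fields suggest. The function name `request_cost_usd` is hypothetical and not part of the router.

```python
# Rates in USD per 1M tokens: (prompt, completion), copied from the table above.
PRICING = {
    "mlx-community/Qwen3-Coder-Next-4bit": (0.0, 0.0),
    "gemini-2.5-pro": (1.25, 10.00),
}


def request_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate the cost of one request (hypothetical helper, for illustration)."""
    prompt_rate, completion_rate = PRICING[model]
    total = prompt_tokens * prompt_rate + completion_tokens * completion_rate
    return total / 1_000_000
```

Under these assumptions, a request with 2,000 prompt tokens and 500 completion tokens sent to gemini-2.5-pro costs about $0.0075, while the same request on the local model costs nothing.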
Signals
reasoning_keywords (keyword signal · operator: OR · case-insensitive)
prove, derive, theorem, induction, research, formal verification, proof by contradiction
coding_keywords (keyword signal · operator: OR · case-insensitive)
implement, refactor, debug, function, class, import, build, code
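The two keyword signals can be approximated with a short sketch. It assumes whole-word, case-insensitive matching, where OR means any single keyword is enough. How the router actually tokenizes and matches prompts is not shown on this page, so `signal_matches` and `matched_signals` are hypothetical names used only for illustration.

```python
import re

# Keyword lists copied from the two signals listed above.
SIGNALS = {
    "reasoning_keywords": [
        "prove", "derive", "theorem", "induction", "research",
        "formal verification", "proof by contradiction",
    ],
    "coding_keywords": [
        "implement", "refactor", "debug", "function",
        "class", "import", "build", "code",
    ],
}


def signal_matches(prompt, keywords, operator="OR", case_sensitive=False):
    """Return True if the prompt satisfies the keyword signal.

    Assumption: keywords match as whole words or phrases, so "code"
    does not fire on "decode".
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    hits = [
        re.search(r"\b" + re.escape(kw) + r"\b", prompt, flags) is not None
        for kw in keywords
    ]
    return any(hits) if operator == "OR" else all(hits)


def matched_signals(prompt):
    """List the names of all signals that fire for a prompt."""
    return [name for name, kws in SIGNALS.items() if signal_matches(prompt, kws)]
```

Note that a prompt such as "Implement and prove this" fires both signals at once; which model handles it is decided by route priority, not by the signals themselves.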
Routing decisions — evaluated highest priority first
Priority 100 · reasoning-route
Route complex reasoning tasks to Gemini 2.5 Pro
Condition: reasoning_keywords (OR) → Model: gemini-2.5-pro · Reasoning mode: on
Priority 80 · coding-route
Route coding tasks to local Qwen3-Coder-Next
Condition: coding_keywords (OR) → Model: mlx-community/Qwen3-Coder-Next-4bit · Reasoning mode: off
Priority 1 · default-route (auto-generated)
Default route to local model for cost savings
Condition: none (catches all remaining requests) → Model: mlx-community/Qwen3-Coder-Next-4bit
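The priority ordering above can be sketched as a first-match loop over decisions sorted from highest to lowest priority. This is an illustrative approximation, not the router's implementation: the `Decision` class and `route` function are hypothetical, and a decision with no conditions is treated as a catch-all.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    name: str
    priority: int
    signals: list  # signal names combined with OR; empty means catch-all
    model: str
    use_reasoning: bool = False


LOCAL = "mlx-community/Qwen3-Coder-Next-4bit"

# The three decisions listed above, deliberately stored out of order.
DECISIONS = [
    Decision("default-route", 1, [], LOCAL),
    Decision("coding-route", 80, ["coding_keywords"], LOCAL),
    Decision("reasoning-route", 100, ["reasoning_keywords"], "gemini-2.5-pro", True),
]


def route(fired_signals):
    """Return the highest-priority decision whose condition holds."""
    for decision in sorted(DECISIONS, key=lambda d: d.priority, reverse=True):
        if not decision.signals or any(s in fired_signals for s in decision.signals):
            return decision
    raise LookupError("no decision matched and no catch-all route is defined")
```

With this ordering, a prompt that fires both signals goes to gemini-2.5-pro because reasoning-route (100) is checked before coding-route (80), and a prompt that fires neither falls through to default-route.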
Advanced features
Semantic cache (HNSW)
Cache semantically similar prompts to skip redundant LLM calls — 12% hit rate
Jailbreak detection
Block prompt injection and policy-violation attempts before they reach a backend
PII detection & redaction
Detect and strip personally identifiable information from requests before routing
Complexity scoring
Score request complexity and route high-complexity prompts to more capable models
Domain classifier
Classify requests by domain (code, math, language…) to improve routing accuracy
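As a rough illustration of the semantic cache idea, the sketch below stores prompt embeddings alongside responses and returns a cached response when a new embedding is similar enough. It deliberately uses a brute-force cosine scan rather than an HNSW index, takes embeddings as plain lists of floats, and uses an arbitrary 0.9 similarity threshold. All names and values here are assumptions, not the router's actual behavior.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache: linear scan stands in for an HNSW nearest-neighbor index."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed value, chosen for illustration
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the most similar cached response, or None on a miss."""
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A hit skips the backend call entirely, which is where the reported hit rate translates into saved requests. An HNSW index would replace the linear scan with an approximate nearest-neighbor search so that lookups stay fast as the cache grows.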
config.yaml — coding-router
version: v0.1

listeners:
  - name: "http-8899"
    address: "0.0.0.0"
    port: 8899
    timeout: "300s"

providers:
  models:
    - name: "mlx-community/Qwen3-Coder-Next-4bit"
      param_size: "80b"
      endpoints:
        - name: "mlx-local"
          weight: 100
          endpoint: "host.containers.internal:8000"
          protocol: "http"
      capabilities: ["coding", "debugging", "refactoring"]
      quality_score: 0.85
      pricing:
        currency: "USD"
        prompt_per_1m: 0.0
        completion_per_1m: 0.0

    - name: "gemini-2.5-pro"
      param_size: "400b"
      endpoints:
        - name: "gemini-primary"
          weight: 100
          endpoint: "generativelanguage.googleapis.com/v1beta/openai"
          protocol: "https"
      access_key: "••••••••••••••••"
      capabilities: ["reasoning", "math", "analysis", "coding", "creative"]
      quality_score: 0.95
      pricing:
        currency: "USD"
        prompt_per_1m: 1.25
        completion_per_1m: 10.00

  default_model: "mlx-community/Qwen3-Coder-Next-4bit"

signals:
  keywords:
    - name: "reasoning_keywords"
      operator: "OR"
      keywords: ["prove", "derive", "theorem", "induction", "research", "formal verification", "proof by contradiction"]
      case_sensitive: false
    - name: "coding_keywords"
      operator: "OR"
      keywords: ["implement", "refactor", "debug", "function", "class", "import", "build", "code"]
      case_sensitive: false

decisions:
  - name: "reasoning-route"
    description: "Route complex reasoning tasks to Gemini 2.5 Pro"
    priority: 100
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "reasoning_keywords"
    modelRefs:
      - model: "gemini-2.5-pro"
        use_reasoning: true

  - name: "coding-route"
    description: "Route coding tasks to local Qwen3-Coder-Next"
    priority: 80
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "coding_keywords"
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

  - name: "default-route"
    description: "Default route to local model for cost savings"
    priority: 1
    rules:
      operator: "AND"
      conditions: []
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

advanced:
  semantic_cache: true
  jailbreak_detection: false
  pii_detection: true
  complexity_scoring: false
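Because decisions refer to models and signals by name, a small consistency check can catch typos before the config is deployed. The sketch below works on an already-parsed config held in a plain dictionary (trimmed to the fields it needs) and reports dangling references. `check_references` is a hypothetical helper, not a feature of the router.

```python
LOCAL = "mlx-community/Qwen3-Coder-Next-4bit"

# Trimmed, already-parsed view of the config above (only the fields checked).
CONFIG = {
    "providers": {
        "models": [{"name": LOCAL}, {"name": "gemini-2.5-pro"}],
        "default_model": LOCAL,
    },
    "signals": {
        "keywords": [{"name": "reasoning_keywords"}, {"name": "coding_keywords"}],
    },
    "decisions": [
        {"name": "reasoning-route",
         "rules": {"conditions": [{"type": "keyword", "name": "reasoning_keywords"}]},
         "modelRefs": [{"model": "gemini-2.5-pro"}]},
        {"name": "coding-route",
         "rules": {"conditions": [{"type": "keyword", "name": "coding_keywords"}]},
         "modelRefs": [{"model": LOCAL}]},
        {"name": "default-route",
         "rules": {"conditions": []},
         "modelRefs": [{"model": LOCAL}]},
    ],
}


def check_references(config):
    """Return a list of human-readable problems; an empty list means consistent."""
    models = {m["name"] for m in config["providers"]["models"]}
    signals = {s["name"] for s in config["signals"]["keywords"]}
    problems = []
    default = config["providers"].get("default_model")
    if default not in models:
        problems.append(f"default_model {default!r} is not a defined model")
    for decision in config["decisions"]:
        for cond in decision["rules"]["conditions"]:
            if cond["type"] == "keyword" and cond["name"] not in signals:
                problems.append(f"{decision['name']}: unknown signal {cond['name']!r}")
        for ref in decision["modelRefs"]:
            if ref["model"] not in models:
                problems.append(f"{decision['name']}: unknown model {ref['model']!r}")
    return problems
```

For the config shown here the check returns an empty list. Renaming a signal without updating the decision that uses it would produce one problem entry per dangling reference.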