coding-router

Routes coding tasks locally and complex reasoning to cloud · listening on 0.0.0.0:8899

Backend models

1 local · 1 cloud

Routing decisions

3 incl. default-route

Signals

2 keyword signals

Routing latency (P50)

40 ms

P99: 93 ms

Local requests

86%

stayed on local model

Cache hit rate

12%

HNSW semantic cache

Routing flow

Agents (incoming requests) → coding-router (localhost:8899)

reasoning_keywords → gemini-2.5-pro

coding_keywords → Qwen3-Coder-Next

default → Qwen3-Coder-Next

local: Qwen3-Coder-Next-4bit (:8000)

cloud: gemini-2.5-pro (googleapis.com)

Router configuration

Listener 0.0.0.0:8899 (timeout: 300s)

Last verified 3 minutes ago

Description Routes coding tasks locally and complex reasoning to cloud

Default model mlx-community/Qwen3-Coder-Next-4bit

Advanced features Semantic cache (HNSW) · PII detection

Created May 20, 2026

Backend models

Model name	Endpoint	Weight	Quality	Capabilities	Pricing (USD per 1M tokens, prompt / completion)
mlx-community/Qwen3-Coder-Next-4bit (80B · http · default)	`host.containers.internal:8000`	100	0.85	coding, debugging, refactoring	Free (local)
gemini-2.5-pro (400B · https)	`generativelanguage.googleapis.com/…`	100	0.95	reasoning, math, analysis, coding, creative	$1.25 / $10.00
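
To make the pricing column concrete, here is a minimal sketch of a per-request cost estimate. The prices are copied from the table; the function names, the `PRICING` dictionary, and the token counts in the usage note are hypothetical and not part of the router.

```python
# Per-1M-token prices (prompt, completion) in USD, copied from the table above.
PRICING = {
    "mlx-community/Qwen3-Coder-Next-4bit": (0.0, 0.0),   # free, runs locally
    "gemini-2.5-pro": (1.25, 10.00),
}


def estimate_cost_usd(prompt_tokens, completion_tokens, prompt_per_1m, completion_per_1m):
    """Return the USD cost of one request given token counts and per-1M prices."""
    total = prompt_tokens * prompt_per_1m + completion_tokens * completion_per_1m
    return total / 1_000_000


def request_cost(model, prompt_tokens, completion_tokens):
    """Look up a model's prices (hypothetical helper) and estimate the request cost."""
    prompt_price, completion_price = PRICING[model]
    return estimate_cost_usd(prompt_tokens, completion_tokens, prompt_price, completion_price)
```

For example, `request_cost("gemini-2.5-pro", 2000, 500)` prices a request with 2,000 prompt tokens and 500 completion tokens, while any request sent to the local model costs nothing under these prices.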

Signals

Keyword reasoning_keywords operator: OR · case-insensitive

prove, derive, theorem, induction, research, formal verification, proof by contradiction

Keyword coding_keywords operator: OR · case-insensitive

implement, refactor, debug, function, class, import, build, code

Routing decisions — evaluated highest priority first

100 Priority

reasoning-route

Route complex reasoning tasks to Gemini 2.5 Pro

Condition: reasoning_keywords (OR) → Model: gemini-2.5-pro Reasoning mode: on

80 Priority

coding-route

Route coding tasks to local Qwen3-Coder-Next

Condition: coding_keywords (OR) → Model: mlx-community/Qwen3-Coder-Next-4bit Reasoning mode: off

1 Priority

default-route auto-generated

Default route to local model for cost savings

Condition: none (catches all remaining requests) → Model: mlx-community/Qwen3-Coder-Next-4bit
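
The evaluation order above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the router's actual implementation: the signal and decision values are copied from this page, but the whole-word matching rule, the data layout, and the `route` function are assumptions made for the example.

```python
import re

# Keyword signals as listed on this page (operator: OR, case-insensitive).
SIGNALS = {
    "reasoning_keywords": ["prove", "derive", "theorem", "induction", "research",
                           "formal verification", "proof by contradiction"],
    "coding_keywords": ["implement", "refactor", "debug", "function", "class",
                        "import", "build", "code"],
}

# Routing decisions; signal=None means "no condition" (catch-all).
DECISIONS = [
    {"name": "coding-route", "priority": 80, "signal": "coding_keywords",
     "model": "mlx-community/Qwen3-Coder-Next-4bit"},
    {"name": "default-route", "priority": 1, "signal": None,
     "model": "mlx-community/Qwen3-Coder-Next-4bit"},
    {"name": "reasoning-route", "priority": 100, "signal": "reasoning_keywords",
     "model": "gemini-2.5-pro"},
]


def signal_matches(signal_name, prompt):
    """OR semantics: true if any keyword appears as a whole word or phrase.

    Whole-word matching is an assumption; the real router may match differently.
    """
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", prompt, re.IGNORECASE)
        for keyword in SIGNALS[signal_name]
    )


def route(prompt):
    """Return (decision name, model) for the highest-priority matching decision."""
    for decision in sorted(DECISIONS, key=lambda d: d["priority"], reverse=True):
        if decision["signal"] is None or signal_matches(decision["signal"], prompt):
            return decision["name"], decision["model"]
    raise LookupError("no decision matched and no catch-all is configured")
```

Because reasoning-route has the highest priority, a prompt that contains both "prove" and "implement" goes to gemini-2.5-pro in this sketch; a prompt with no keywords falls through to default-route.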

Advanced features

Semantic cache (HNSW)

Cache semantically similar prompts to skip redundant LLM calls — 12% hit rate

Jailbreak detection

Block prompt injection and policy-violation attempts before they reach a backend

PII detection & redaction

Detect and strip personally identifiable information from requests before routing

Complexity scoring

Score request complexity and route high-complexity prompts to more capable models

Domain classifier

Classify requests by domain (code, math, language…) to improve routing accuracy
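
As a rough illustration of the semantic cache idea, the sketch below returns a stored response when a new prompt is similar enough to a cached one. It deliberately swaps in simpler parts: a bag-of-words vector stands in for a real embedding model, and a linear scan stands in for the HNSW index. The `SemanticCache` class and the 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Linear-scan similarity cache; a real deployment would use an ANN index such as HNSW."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.entries = []           # list of (embedding, response)

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def get(self, prompt):
        """Return the best cached response at or above the threshold, else None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A cache hit skips the backend call entirely, which is where the savings come from; a miss falls through to normal routing.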

config.yaml — coding-router

version: v0.1

listeners:
  - name: "http-8899"
    address: "0.0.0.0"
    port: 8899
    timeout: "300s"

providers:
  models:
    - name: "mlx-community/Qwen3-Coder-Next-4bit"
      param_size: "80b"
      endpoints:
        - name: "mlx-local"
          weight: 100
          endpoint: "host.containers.internal:8000"
          protocol: "http"
      capabilities: ["coding", "debugging", "refactoring"]
      quality_score: 0.85
      pricing:
        currency: "USD"
        prompt_per_1m: 0.0
        completion_per_1m: 0.0

    - name: "gemini-2.5-pro"
      param_size: "400b"
      endpoints:
        - name: "gemini-primary"
          weight: 100
          endpoint: "generativelanguage.googleapis.com/v1beta/openai"
          protocol: "https"
      access_key: "••••••••••••••••"
      capabilities: ["reasoning", "math", "analysis", "coding", "creative"]
      quality_score: 0.95
      pricing:
        currency: "USD"
        prompt_per_1m: 1.25
        completion_per_1m: 10.00

  default_model: "mlx-community/Qwen3-Coder-Next-4bit"

signals:
  keywords:
    - name: "reasoning_keywords"
      operator: "OR"
      keywords: ["prove", "derive", "theorem", "induction", "research", "formal verification", "proof by contradiction"]
      case_sensitive: false
    - name: "coding_keywords"
      operator: "OR"
      keywords: ["implement", "refactor", "debug", "function", "class", "import", "build", "code"]
      case_sensitive: false

decisions:
  - name: "reasoning-route"
    description: "Route complex reasoning tasks to Gemini 2.5 Pro"
    priority: 100
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "reasoning_keywords"
    modelRefs:
      - model: "gemini-2.5-pro"
        use_reasoning: true

  - name: "coding-route"
    description: "Route coding tasks to local Qwen3-Coder-Next"
    priority: 80
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "coding_keywords"
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

  - name: "default-route"
    description: "Default route to local model for cost savings"
    priority: 1
    rules:
      operator: "AND"
      conditions: []
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

advanced:
  semantic_cache: true
  jailbreak_detection: false
  pii_detection: true
  complexity_scoring: false
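
Since decisions refer to signals and models by name, a small consistency check can catch dangling references before a config is deployed. The sketch below is an assumption-laden illustration: `CONFIG` is a hand-written Python mirror of the relevant keys in the file above (no YAML parser is used), and `find_dangling_refs` is a hypothetical helper, not part of the router.

```python
LOCAL = "mlx-community/Qwen3-Coder-Next-4bit"

# Hand-written mirror of the name references in config.yaml (other keys omitted).
CONFIG = {
    "providers": {
        "models": [{"name": LOCAL}, {"name": "gemini-2.5-pro"}],
        "default_model": LOCAL,
    },
    "signals": {"keywords": [{"name": "reasoning_keywords"}, {"name": "coding_keywords"}]},
    "decisions": [
        {"name": "reasoning-route",
         "rules": {"conditions": [{"type": "keyword", "name": "reasoning_keywords"}]},
         "modelRefs": [{"model": "gemini-2.5-pro"}]},
        {"name": "coding-route",
         "rules": {"conditions": [{"type": "keyword", "name": "coding_keywords"}]},
         "modelRefs": [{"model": LOCAL}]},
        {"name": "default-route",
         "rules": {"conditions": []},
         "modelRefs": [{"model": LOCAL}]},
    ],
}


def find_dangling_refs(config):
    """Return a list of messages for names that are referenced but never defined."""
    models = {m["name"] for m in config["providers"]["models"]}
    signals = {s["name"] for s in config["signals"]["keywords"]}
    problems = []
    if config["providers"]["default_model"] not in models:
        problems.append("default_model is not a defined model")
    for decision in config["decisions"]:
        for condition in decision["rules"]["conditions"]:
            if condition["name"] not in signals:
                problems.append(f"{decision['name']}: unknown signal {condition['name']}")
        for ref in decision["modelRefs"]:
            if ref["model"] not in models:
                problems.append(f"{decision['name']}: unknown model {ref['model']}")
    return problems
```

For the configuration shown above, the check returns an empty list.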