New Semantic Router

Configure a Semantic Router

Define backend model pools, signal rules, and routing decisions. The router exposes a single /v1/chat/completions endpoint your agents use — no code changes needed.

Basic setup

Give the router a name and configure the listener. Agents connect to the listener endpoint; they never talk to backends directly.

Used as the identifier when selecting this router in workspace creation.
Use host.containers.internal when running inside a container to reach host services.
Default: 8899. Agents connect to http://host:8899/v1/chat/completions.
Maximum time before the request is cancelled.
Backend models

Add the model backends this router can forward to. At least one is required. The default model receives requests when no decision rule matches.

Backend model 1
Default
Must match exactly what the backend expects (Hugging Face model ID or alias).
Do not add /v1 — the router appends the path automatically.
coding debugging refactoring reasoning math analysis creative
Backend model 2
Stored encrypted on this device. For shared deployments, reference a Secret Vault entry instead.
reasoning math analysis coding creative debugging
Signals

Signals detect patterns in incoming requests. Decisions consume signals and pick a model. Start with keyword signals; add embedding or domain signals for more precision.

Keyword reasoning_keywords
prove derive theorem induction research formal verification
Keyword coding_keywords
implement refactor debug function class build
Decisions

Decisions are evaluated by priority (highest first). The first matching rule selects the backend model for the request.

Priority 100 reasoning-route

When on, the router adds reasoning parameters to the request before forwarding it to the target model.

Priority 80 coding-route
Priority 1 default-route auto-generated

Catches all unmatched requests and forwards to the default model: mlx-community/Qwen3-Coder-Next-4bit. Change the default model in Step 2.

Advanced features

Optional capabilities that layer on top of signal-decision routing. Each is independently toggleable.

Generated config

Read-only preview of the config.yaml this router configuration would produce. Deploy it with vllm-sr serve --config config.yaml.

config.yaml
version: v0.1

listeners:
  - name: "http-8899"
    address: "0.0.0.0"
    port: 8899
    timeout: "300s"

providers:
  models:
    - name: "mlx-community/Qwen3-Coder-Next-4bit"
      param_size: "80b"
      endpoints:
        - name: "mlx-local"
          weight: 100
          endpoint: "host.containers.internal:8000"
          protocol: "http"
      capabilities: ["coding", "debugging", "refactoring"]
      quality_score: 0.85
      pricing:
        currency: "USD"
        prompt_per_1m: 0.0
        completion_per_1m: 0.0

    - name: "gemini-2.5-pro"
      param_size: "400b"
      endpoints:
        - name: "gemini-primary"
          weight: 100
          endpoint: "generativelanguage.googleapis.com/v1beta/openai"
          protocol: "https"
      access_key: "••••••••••••••••"
      capabilities: ["reasoning", "math", "analysis", "coding", "creative"]
      quality_score: 0.95
      pricing:
        currency: "USD"
        prompt_per_1m: 1.25
        completion_per_1m: 10.00

  default_model: "mlx-community/Qwen3-Coder-Next-4bit"

signals:
  keywords:
    - name: "reasoning_keywords"
      operator: "OR"
      keywords: ["prove", "derive", "theorem", "induction", "research", "formal verification"]
      case_sensitive: false
    - name: "coding_keywords"
      operator: "OR"
      keywords: ["implement", "refactor", "debug", "function", "class", "build"]
      case_sensitive: false

decisions:
  - name: "reasoning-route"
    description: "Route complex reasoning tasks to Gemini 2.5 Pro"
    priority: 100
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "reasoning_keywords"
    modelRefs:
      - model: "gemini-2.5-pro"
        use_reasoning: true

  - name: "coding-route"
    description: "Route coding tasks to local Qwen3-Coder-Next"
    priority: 80
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "coding_keywords"
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

  - name: "default-route"
    description: "Default route to local model for cost savings"
    priority: 1
    rules:
      operator: "AND"
      conditions: []
    modelRefs:
      - model: "mlx-community/Qwen3-Coder-Next-4bit"
        use_reasoning: false

advanced:
  semantic_cache: true
  jailbreak_detection: false
  pii_detection: false
  complexity_scoring: false