enterprise-router

Running

Routes by cost, data residency & latency — local · OpenShift AI · Vertex AI · listening on 0.0.0.0:8901

Backend models

local · OpenShift AI · Vertex AI

Routing decisions

4, incl. default-route

Signals

keyword + semantic

Routing latency (P50)

55 ms

P99: 120 ms

Local requests

61%

stayed on qwen3-code

Est. cost savings

74%

vs. all-cloud baseline

Routing flow

Agents

incoming requests

enterprise-router

localhost:8901

sensitive_data → ibm-granite-3.3-8b (RHOAI)

high_complexity → claude-sonnet-4-5 (Vertex)

default → qwen3-code (local)

local

qwen3-code

:11434 (Ollama)

61% · Free

OpenShift AI

ibm-granite-3.3-8b

rhoai.corp.example/…

28% · $0.50/1M

Vertex AI

claude-sonnet-4-5

aiplatform.googleapis.com/…

11% · $3–15/1M

Cost savings — this week

With enterprise-router: $7.82

Local: $0.00 (61%) · OpenShift AI: $1.58 (28%) · Vertex AI: $6.24 (11%)

Without router (all Vertex AI): $29.78

$21.96 saved this week · 74% reduction

Routing 61% of requests to local and 28% to on-prem cuts weekly spend from $29.78 to $7.82, a net saving of $21.96 versus sending everything to Vertex AI
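The weekly figures above can be re-derived with a few lines of arithmetic. The sketch below uses only the dollar amounts shown on this page; the variable names are illustrative and not part of enterprise-router.

```python
from decimal import Decimal

# Weekly spend per backend, copied from the dashboard figures above.
spend_with_router = {
    "local": Decimal("0.00"),
    "openshift_ai": Decimal("1.58"),
    "vertex_ai": Decimal("6.24"),
}
# Dashboard estimate for the same traffic sent entirely to Vertex AI.
all_vertex_baseline = Decimal("29.78")

total_with_router = sum(spend_with_router.values(), Decimal("0"))
saved = all_vertex_baseline - total_with_router
reduction_pct = saved / all_vertex_baseline * 100

print(f"with router: ${total_with_router}")
print(f"saved:       ${saved}")
print(f"reduction:   {reduction_pct:.1f}%")
```

Decimal is used instead of float so that the cent values add up exactly.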

Router configuration

Listener: 0.0.0.0:8901 (timeout: 300s)

Last verified: 1 minute ago

Description: Enterprise hybrid router that optimizes for cost, data residency, and latency across local, on-prem, and cloud backends

Default model: qwen3-code (local · free)

Advanced features: Cost optimizer · Data residency enforcement · Latency scoring

Created: May 29, 2026

Backend models

Model name	Tier	Endpoint	Weight	Quality (0–1)	Capabilities	Pricing (USD per 1M tokens, prompt / completion)
qwen3-code (32B · http · default)	local	`localhost:11434`	100	0.82	coding, debugging	Free (local)
ibm-granite-3.3-8b-instruct (8B · https · on-prem)	OpenShift AI	`rhoai.corp.example/…`	100	0.88	enterprise, analysis, secure	$0.50 / $0.50
claude-sonnet-4-5@20250929 (Vertex · 200K ctx · https · GCP enterprise)	Vertex AI	`aiplatform.googleapis.com/…`	100	0.96	reasoning, coding, analysis, creative	$3.00 / $15.00

Signals

Keyword · sensitive_data · operator: OR · case-insensitive

confidential, internal only, GDPR, PII, personal data, classified, proprietary, trade secret, NDA, restricted

Semantic · high_complexity · threshold: 0.75 · embedding: nomic-embed-v2

Requests semantically similar to: multi-step reasoning, formal analysis, architectural design, complex refactoring, system design at scale
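As a rough illustration of how the two signal types could be evaluated, here is a minimal sketch. The keyword list and the 0.75 threshold come from this page. Everything else is an assumption: the whole-word matching, the "best match across reference prompts" rule, and the function names are not confirmed router behaviour. Embedding vectors are passed in directly because calling nomic-embed-v2 is outside the scope of the sketch.

```python
import math
import re

# Keywords copied from the sensitive_data signal above.
SENSITIVE_KEYWORDS = [
    "confidential", "internal only", "GDPR", "PII", "personal data",
    "classified", "proprietary", "trade secret", "NDA", "restricted",
]

# Assumption: whole-word matching, so "NDA" does not fire on "agenda".
_SENSITIVE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SENSITIVE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_signal(text: str) -> bool:
    """True if any keyword occurs in the text (OR, case-insensitive)."""
    return _SENSITIVE_RE.search(text) is not None


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_signal(query_vec, reference_vecs, threshold: float = 0.75) -> bool:
    """True if the query is similar enough to at least one reference vector.

    Assumption: the best match across reference prompts is compared
    with the threshold; the router may aggregate differently.
    """
    best = max(
        (cosine_similarity(query_vec, ref) for ref in reference_vecs),
        default=0.0,
    )
    return best >= threshold
```

For example, keyword_signal("This report is Confidential") fires, while keyword_signal("update the meeting agenda") does not.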

Routing decisions — evaluated highest priority first

Priority 100

residency-route

Sensitive or classified data must stay on-premise — routed to OpenShift AI

Condition: sensitive_data (OR) → Model: ibm-granite-3.3-8b-instruct · Data residency: on-prem enforced

Priority 80

complexity-route

High-complexity requests benefit from frontier reasoning — escalated to Vertex AI

Condition: high_complexity ≥ 0.75 → Model: claude-sonnet-4-5 (Vertex) · GCP enterprise contract

Priority 50

latency-route

P99 latency budget exceeded on local backend — overflow to OpenShift AI

Condition: latency_p99 > 400 ms → Model: ibm-granite-3.3-8b-instruct · SLA: 300 ms target

Priority 1

default-route (auto-generated)

All remaining requests — served locally at zero cost

Condition: none (catches all remaining requests) → Model: qwen3-code
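The "highest priority first" evaluation described above can be sketched as a first-match loop over the four decisions. Names, priorities, thresholds, and target models are taken from this page. The Decision class, the route function, and the shape of the signals dictionary are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Decision:
    name: str
    priority: int
    model: str
    matches: Callable[[Dict[str, float]], bool]


# Priorities, conditions, and models mirror the routing decisions above.
DECISIONS = [
    Decision("residency-route", 100, "ibm-granite-3.3-8b-instruct",
             lambda s: bool(s.get("sensitive_data"))),
    Decision("complexity-route", 80, "claude-sonnet-4-5@20250929",
             lambda s: s.get("high_complexity", 0.0) >= 0.75),
    Decision("latency-route", 50, "ibm-granite-3.3-8b-instruct",
             lambda s: s.get("latency_p99_ms", 0.0) > 400),
    Decision("default-route", 1, "qwen3-code", lambda s: True),
]


def route(signals: Dict[str, float]) -> Decision:
    """Return the first matching decision, highest priority first."""
    for decision in sorted(DECISIONS, key=lambda d: d.priority, reverse=True):
        if decision.matches(signals):
            return decision
    # Not reachable while default-route is present, kept as a guard.
    raise LookupError("no routing decision matched")


print(route({"sensitive_data": True, "high_complexity": 0.9}).name)
print(route({}).model)
```

Because residency-route has the highest priority, a request that is both sensitive and complex stays on-prem rather than escalating to Vertex AI.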

Advanced features

Cost optimizer

Prefers the cheapest backend that meets quality and latency requirements — estimated 74% cost saving vs. all-cloud baseline
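One way to read "cheapest backend that meets quality and latency requirements" is as a filter followed by a minimum-cost pick. The sketch below uses the quality scores and per-1M-token prices listed on this page. The function names, the min_quality and max_p99_ms parameters, and the rule that a backend with no latency data passes the latency check are assumptions, not the optimizer's documented behaviour.

```python
from typing import Dict, Optional

# Quality scores and prices (USD per 1M tokens) from the backend table.
BACKENDS = [
    {"name": "qwen3-code", "quality": 0.82,
     "prompt_per_1m": 0.0, "completion_per_1m": 0.0},
    {"name": "ibm-granite-3.3-8b-instruct", "quality": 0.88,
     "prompt_per_1m": 0.50, "completion_per_1m": 0.50},
    {"name": "claude-sonnet-4-5@20250929", "quality": 0.96,
     "prompt_per_1m": 3.00, "completion_per_1m": 15.00},
]


def request_cost(backend, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request on the given backend."""
    return (prompt_tokens * backend["prompt_per_1m"]
            + completion_tokens * backend["completion_per_1m"]) / 1_000_000


def cheapest_backend(min_quality: float,
                     prompt_tokens: int,
                     completion_tokens: int,
                     observed_p99_ms: Optional[Dict[str, float]] = None,
                     max_p99_ms: Optional[float] = None):
    """Pick the lowest-cost backend that passes both filters, or None."""
    observed = observed_p99_ms or {}
    eligible = []
    for backend in BACKENDS:
        if backend["quality"] < min_quality:
            continue
        p99 = observed.get(backend["name"])
        # Assumption: a backend without latency data is not excluded.
        if max_p99_ms is not None and p99 is not None and p99 > max_p99_ms:
            continue
        eligible.append(backend)
    if not eligible:
        return None
    return min(eligible,
               key=lambda b: request_cost(b, prompt_tokens, completion_tokens))
```

With min_quality=0.8 the free local model wins; raising it to 0.9 leaves only the Vertex AI model eligible.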

Data residency enforcement

Detects sensitive / classified content and hard-routes it to on-premise OpenShift AI — data never leaves the corporate network

Latency scoring

Tracks per-backend P50/P99 latency in a rolling window and overflows to a faster backend when SLA budgets are exceeded
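A rolling-window percentile tracker of the kind described here could look like the sketch below. The 60-second window and the 400 ms overflow threshold match the configuration on this page. The RollingLatency class, the nearest-rank percentile method, and the should_overflow helper are assumptions made for illustration. Timestamps are passed in explicitly so the behaviour is easy to check.

```python
import math
from collections import deque


class RollingLatency:
    """Keeps latency samples from the last window_s seconds."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self._samples = deque()  # (timestamp_s, latency_ms), oldest first

    def record(self, timestamp_s: float, latency_ms: float) -> None:
        # Assumes samples arrive in non-decreasing timestamp order.
        self._samples.append((timestamp_s, latency_ms))
        self._evict(timestamp_s)

    def _evict(self, now_s: float) -> None:
        cutoff = now_s - self.window_s
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def percentile(self, pct: float, now_s: float):
        """Nearest-rank percentile over the current window, or None if empty."""
        self._evict(now_s)
        if not self._samples:
            return None
        ordered = sorted(latency for _, latency in self._samples)
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]


def should_overflow(tracker: RollingLatency, now_s: float,
                    threshold_ms: float = 400.0) -> bool:
    """True when the window's P99 exceeds the overflow threshold."""
    p99 = tracker.percentile(99, now_s)
    return p99 is not None and p99 > threshold_ms
```

An empty window never triggers overflow in this sketch, so a backend that has received no recent traffic is not penalised.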

PII detection & redaction

Strip personally identifiable information before any request leaves the on-prem perimeter

Jailbreak detection

Block prompt injection and policy-violation attempts before they reach any backend

config.yaml — enterprise-router

version: v0.1

listeners:
  - name: "http-8901"
    address: "0.0.0.0"
    port: 8901
    timeout: "300s"

providers:
  models:
    - name: "qwen3-code"
      tier: "local"
      param_size: "32b"
      endpoints:
        - name: "ollama-local"
          weight: 100
          endpoint: "localhost:11434"
          protocol: "http"
      capabilities: ["coding", "debugging"]
      quality_score: 0.82
      pricing:
        prompt_per_1m: 0.0
        completion_per_1m: 0.0

    - name: "ibm-granite-3.3-8b-instruct"
      tier: "openshift"
      param_size: "8b"
      endpoints:
        - name: "rhoai-primary"
          weight: 100
          endpoint: "rhoai.corp.example/v1/chat/completions"
          protocol: "https"
      data_residency: "on-prem"
      capabilities: ["enterprise", "analysis", "secure"]
      quality_score: 0.88
      pricing:
        prompt_per_1m: 0.50
        completion_per_1m: 0.50

    - name: "claude-sonnet-4-5@20250929"
      tier: "vertex"
      endpoints:
        - name: "vertex-primary"
          weight: 100
          endpoint: "us-east5-aiplatform.googleapis.com/v1/projects/my-gcp-project/locations/us-east5/publishers/anthropic/models/claude-sonnet-4-5@20250929"
          protocol: "https"
      gcp_project: "my-gcp-project"
      gcp_region: "us-east5"
      capabilities: ["reasoning", "coding", "analysis", "creative"]
      quality_score: 0.96
      pricing:
        prompt_per_1m: 3.00
        completion_per_1m: 15.00

  default_model: "qwen3-code"

signals:
  keywords:
    - name: "sensitive_data"
      operator: "OR"
      keywords: ["confidential", "internal only", "GDPR", "PII", "personal data", "classified", "proprietary", "trade secret", "NDA", "restricted"]
      case_sensitive: false
  semantic:
    - name: "high_complexity"
      embedding_model: "nomic-embed-v2"
      threshold: 0.75
      reference_prompts:
        - "multi-step reasoning and formal analysis"
        - "architectural design at scale"
        - "complex system-level refactoring"

decisions:
  - name: "residency-route"
    description: "Sensitive data stays on-premise"
    priority: 100
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "sensitive_data"
    modelRefs:
      - model: "ibm-granite-3.3-8b-instruct"

  - name: "complexity-route"
    description: "High-complexity tasks escalated to Vertex AI"
    priority: 80
    rules:
      operator: "OR"
      conditions:
        - type: "semantic"
          name: "high_complexity"
    modelRefs:
      - model: "claude-sonnet-4-5@20250929"

  - name: "latency-route"
    description: "Overflow to OpenShift AI when local P99 exceeds 400ms"
    priority: 50
    rules:
      conditions:
        - type: "latency"
          backend: "qwen3-code"
          metric: "p99"
          threshold_ms: 400
    modelRefs:
      - model: "ibm-granite-3.3-8b-instruct"

  - name: "default-route"
    description: "All remaining requests — served locally at zero cost"
    priority: 1
    rules:
      conditions: []
    modelRefs:
      - model: "qwen3-code"

advanced:
  cost_optimizer: true
  data_residency: true
  latency_scoring: true
  latency_window_s: 60
  pii_detection: true
  jailbreak_detection: false
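
A quick consistency check that may be worth running against a file like this: every model named under modelRefs, and the default_model, should be defined under providers.models. The sketch below works on an abridged, already-parsed copy of the configuration. The undefined_model_refs helper is hypothetical and is not part of enterprise-router.

```python
def undefined_model_refs(config):
    """Return (source, model) pairs that point at models not defined in providers."""
    defined = {m["name"] for m in config["providers"]["models"]}
    missing = []
    for decision in config.get("decisions", []):
        for ref in decision.get("modelRefs", []):
            if ref["model"] not in defined:
                missing.append((decision["name"], ref["model"]))
    default = config["providers"].get("default_model")
    if default is not None and default not in defined:
        missing.append(("default_model", default))
    return missing


# Abridged copy of the configuration above, as it would look after parsing.
CONFIG = {
    "providers": {
        "models": [
            {"name": "qwen3-code"},
            {"name": "ibm-granite-3.3-8b-instruct"},
            {"name": "claude-sonnet-4-5@20250929"},
        ],
        "default_model": "qwen3-code",
    },
    "decisions": [
        {"name": "residency-route", "priority": 100,
         "modelRefs": [{"model": "ibm-granite-3.3-8b-instruct"}]},
        {"name": "complexity-route", "priority": 80,
         "modelRefs": [{"model": "claude-sonnet-4-5@20250929"}]},
        {"name": "latency-route", "priority": 50,
         "modelRefs": [{"model": "ibm-granite-3.3-8b-instruct"}]},
        {"name": "default-route", "priority": 1,
         "modelRefs": [{"model": "qwen3-code"}]},
    ],
}

print(undefined_model_refs(CONFIG))
```

An empty result means every reference resolves. A reference to the short name claude-sonnet-4-5 without the @20250929 suffix would be reported as undefined.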