enterprise-router
Running · Routes by cost, data residency & latency: local · OpenShift AI · Vertex AI · listening on 0.0.0.0:8901
Backend models
3
local · OpenShift AI · Vertex AI
Routing decisions
4
incl. default-route
Signals
2
keyword + semantic
Routing latency (P50)
55 ms
P99: 120 ms
Local requests
61%
stayed on qwen3-code
Est. cost savings
74%
vs. all-cloud baseline
Routing flow
Agents (incoming requests) → enterprise-router (localhost:8901)
sensitive_data → ibm-granite-3.3-8b (RHOAI)
high_complexity → claude-sonnet-4-5 (Vertex)
default → qwen3-code (local)
local
qwen3-code
:11434 (Ollama)
61%
Free
OpenShift AI
ibm-granite-3.3-8b
rhoai.corp.example/…
28%
$0.50/1M
Vertex AI
claude-sonnet-4-5
aiplatform.googleapis.com/…
11%
$3–15/1M
Cost savings — this week
With enterprise-router
$7.82
Local · $0.00 (61%)
OpenShift AI · $1.58 (28%)
Vertex AI · $6.24 (11%)
Without router — all Vertex AI
$29.78
$21.96 saved this week · 74% reduction
Routing 61% of requests to local and 28% to on-prem avoids $21.96 in Vertex AI charges
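The weekly figures above can be checked with a few lines of arithmetic. This is a minimal sketch that only re-adds the per-backend amounts shown on the dashboard; the variable names are illustrative and not part of the router.

```python
# Weekly spend per backend as shown on the dashboard (USD).
spend_with_router = {"local": 0.00, "openshift_ai": 1.58, "vertex_ai": 6.24}

# Dashboard figure for sending every request to Vertex AI (USD).
baseline_all_vertex = 29.78

total = round(sum(spend_with_router.values()), 2)
saved = round(baseline_all_vertex - total, 2)
reduction_pct = round(100 * saved / baseline_all_vertex)

print(f"with router: ${total:.2f}")
print(f"saved: ${saved:.2f} ({reduction_pct}% reduction)")
```

The result matches the dashboard: $7.82 with the router, $21.96 saved, a 74% reduction.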
Router configuration
Listener
0.0.0.0:8901 (timeout: 300s)
Last verified
1 minute ago
Description
Enterprise hybrid router — optimises for cost, data residency, and latency across local, on-prem, and cloud backends
Default model
qwen3-code (local · free)
Advanced features
Cost optimizer · Data residency enforcement · Latency scoring
Created
May 29, 2026
Backend models
| Model name | Tier | Endpoint | Weight | Quality | Capabilities | Pricing (USD/1M, prompt / completion) |
|---|---|---|---|---|---|---|
| qwen3-code (32B · http · default) | local | localhost:11434 | 100 | 0.82 | coding, debugging | Free (local) |
| ibm-granite-3.3-8b-instruct (8B · https · on-prem) | OpenShift AI | rhoai.corp.example/… | 100 | 0.88 | enterprise, analysis, secure | $0.50 / $0.50 |
| claude-sonnet-4-5@20250929 (Vertex) (200K ctx · https · GCP enterprise) | Vertex AI | aiplatform.googleapis.com/… | 100 | 0.96 | reasoning, coding, analysis, creative | $3.00 / $15.00 |
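The quality and pricing columns are what a cost optimizer would plausibly work from. The sketch below picks the cheapest backend whose quality score meets a minimum. The `min_quality` parameter and the blended price (prompt plus completion rate) are assumptions made for illustration; the router's actual selection logic is not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    quality: float
    prompt_per_1m: float       # USD per 1M, prompt side
    completion_per_1m: float   # USD per 1M, completion side


# Values copied from the backend table.
BACKENDS = [
    Backend("qwen3-code", 0.82, 0.0, 0.0),
    Backend("ibm-granite-3.3-8b-instruct", 0.88, 0.50, 0.50),
    Backend("claude-sonnet-4-5@20250929", 0.96, 3.00, 15.00),
]


def cheapest_meeting_quality(
    backends: Sequence[Backend], min_quality: float
) -> Optional[Backend]:
    """Return the cheapest backend with quality >= min_quality, or None."""
    eligible = [b for b in backends if b.quality >= min_quality]
    if not eligible:
        return None
    # Assumption: rank by the simple sum of prompt and completion rates.
    return min(eligible, key=lambda b: b.prompt_per_1m + b.completion_per_1m)
```

With these numbers, a minimum of 0.80 selects `qwen3-code`, 0.85 selects `ibm-granite-3.3-8b-instruct`, and 0.90 selects `claude-sonnet-4-5@20250929`.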
Signals
Keyword
sensitive_data
operator: OR · case-insensitive
confidential, internal only, GDPR, PII, personal data, classified, proprietary, trade secret, NDA, restricted
Semantic
high_complexity
threshold: 0.75 · embedding: nomic-embed-v2
Requests semantically similar to: multi-step reasoning, formal analysis, architectural design, complex refactoring, system design at scale
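The two signal types above could be evaluated roughly as follows. This is a sketch under stated assumptions: keyword matching uses whole-word, case-insensitive regex matching (the router may match differently), and the semantic check takes precomputed embedding vectors as input because the call to the embedding model is not described here. The function names are hypothetical.

```python
import math
import re

SENSITIVE_KEYWORDS = [
    "confidential", "internal only", "GDPR", "PII", "personal data",
    "classified", "proprietary", "trade secret", "NDA", "restricted",
]

# OR operator: a single alternation, matched case-insensitively.
# Word boundaries are an assumption; they keep "NDA" from matching "Monday".
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SENSITIVE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_signal(text: str) -> bool:
    """True if any sensitive keyword appears in the text."""
    return _KEYWORD_RE.search(text) is not None


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_signal(query_vec, reference_vecs, threshold: float = 0.75) -> bool:
    """True if the query is at least `threshold`-similar to any reference."""
    return any(cosine(query_vec, ref) >= threshold for ref in reference_vecs)
```

For example, `keyword_signal("Please review this confidential memo")` fires, while `keyword_signal("See you Monday")` does not.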
Routing decisions — evaluated highest priority first
Priority 100 · residency-route
Sensitive or classified data must stay on-premise — routed to OpenShift AI
Priority 80 · complexity-route
High-complexity requests benefit from frontier reasoning — escalated to Vertex AI
Priority 50 · latency-route
P99 latency budget exceeded on local backend — overflow to OpenShift AI
Priority 1 · default-route (auto-generated)
All remaining requests — served locally at zero cost
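Highest-priority-first evaluation can be sketched as a sorted scan that returns the first matching decision. The decision names, priorities, and target models come from this page. The flat `signal` field and the `local_p99_exceeded` signal name are simplifications invented for the example; they are not the router's rule schema.

```python
DECISIONS = [
    {"name": "residency-route", "priority": 100,
     "signal": "sensitive_data", "model": "ibm-granite-3.3-8b-instruct"},
    {"name": "complexity-route", "priority": 80,
     "signal": "high_complexity", "model": "claude-sonnet-4-5@20250929"},
    {"name": "latency-route", "priority": 50,
     "signal": "local_p99_exceeded", "model": "ibm-granite-3.3-8b-instruct"},
    # No condition: matches every request that reaches it.
    {"name": "default-route", "priority": 1,
     "signal": None, "model": "qwen3-code"},
]


def route(active_signals):
    """Return (decision name, model) for the highest-priority match."""
    for decision in sorted(DECISIONS, key=lambda d: d["priority"], reverse=True):
        if decision["signal"] is None or decision["signal"] in active_signals:
            return decision["name"], decision["model"]
    raise RuntimeError("no decision matched")
```

A request that triggers both `sensitive_data` and `high_complexity` goes to `residency-route`, since priority 100 is evaluated before priority 80, so sensitive content stays on-premise even when it is also complex.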
Advanced features
Cost optimizer
Prefers the cheapest backend that meets quality and latency requirements — estimated 74% cost saving vs. all-cloud baseline
Data residency enforcement
Detects sensitive / classified content and hard-routes it to on-premise OpenShift AI — data never leaves the corporate network
Latency scoring
Tracks per-backend P50/P99 latency in a rolling window and overflows to a faster backend when SLA budgets are exceeded
PII detection & redaction
Strip personally identifiable information before any request leaves the on-prem perimeter
Jailbreak detection
Block prompt injection and policy-violation attempts before they reach any backend
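Latency scoring, as described above, needs a rolling window of per-backend samples and a percentile over that window. The sketch below shows one way to do this, using a 60-second window and a 400 ms P99 budget to mirror `latency_window_s` and `threshold_ms` in the config. The class, the function, and the nearest-rank percentile method are assumptions, not the router's implementation.

```python
import math
import time
from collections import deque


class RollingLatency:
    """Keeps latency samples from the last `window_s` seconds."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, latency_ms), oldest first

    def _evict(self, now: float) -> None:
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def record(self, latency_ms: float, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        self.samples.append((now, latency_ms))
        self._evict(now)

    def percentile(self, p: int, now: float = None):
        """Nearest-rank percentile over the window, or None if empty."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self.samples:
            return None
        values = sorted(v for _, v in self.samples)
        rank = max(1, math.ceil(p * len(values) / 100))
        return values[rank - 1]


def should_overflow(tracker: RollingLatency, threshold_ms: float = 400,
                    now: float = None) -> bool:
    """True when the window's P99 exceeds the latency budget."""
    p99 = tracker.percentile(99, now)
    return p99 is not None and p99 > threshold_ms
```

An empty window never triggers overflow, so a backend that has been idle for longer than the window is treated as healthy rather than as over budget.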
config.yaml — enterprise-router
version: v0.1

listeners:
  - name: "http-8901"
    address: "0.0.0.0"
    port: 8901
    timeout: "300s"

providers:
  models:
    - name: "qwen3-code"
      tier: "local"
      param_size: "32b"
      endpoints:
        - name: "ollama-local"
          weight: 100
          endpoint: "localhost:11434"
          protocol: "http"
      capabilities: ["coding", "debugging"]
      quality_score: 0.82
      pricing:
        prompt_per_1m: 0.0
        completion_per_1m: 0.0
    - name: "ibm-granite-3.3-8b-instruct"
      tier: "openshift"
      param_size: "8b"
      endpoints:
        - name: "rhoai-primary"
          weight: 100
          endpoint: "rhoai.corp.example/v1/chat/completions"
          protocol: "https"
      data_residency: "on-prem"
      capabilities: ["enterprise", "analysis", "secure"]
      quality_score: 0.88
      pricing:
        prompt_per_1m: 0.50
        completion_per_1m: 0.50
    - name: "claude-sonnet-4-5@20250929"
      tier: "vertex"
      endpoints:
        - name: "vertex-primary"
          weight: 100
          endpoint: "us-east5-aiplatform.googleapis.com/v1/projects/my-gcp-project/locations/us-east5/publishers/anthropic/models/claude-sonnet-4-5@20250929"
          protocol: "https"
      gcp_project: "my-gcp-project"
      gcp_region: "us-east5"
      capabilities: ["reasoning", "coding", "analysis", "creative"]
      quality_score: 0.96
      pricing:
        prompt_per_1m: 3.00
        completion_per_1m: 15.00
  default_model: "qwen3-code"

signals:
  keywords:
    - name: "sensitive_data"
      operator: "OR"
      keywords: ["confidential", "internal only", "GDPR", "PII", "personal data", "classified", "proprietary", "trade secret", "NDA", "restricted"]
      case_sensitive: false
  semantic:
    - name: "high_complexity"
      embedding_model: "nomic-embed-v2"
      threshold: 0.75
      reference_prompts:
        - "multi-step reasoning and formal analysis"
        - "architectural design at scale"
        - "complex system-level refactoring"

decisions:
  - name: "residency-route"
    description: "Sensitive data stays on-premise"
    priority: 100
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "sensitive_data"
    modelRefs:
      - model: "ibm-granite-3.3-8b-instruct"
  - name: "complexity-route"
    description: "High-complexity tasks escalated to Vertex AI"
    priority: 80
    rules:
      operator: "OR"
      conditions:
        - type: "semantic"
          name: "high_complexity"
    modelRefs:
      - model: "claude-sonnet-4-5@20250929"
  - name: "latency-route"
    description: "Overflow to OpenShift AI when local P99 exceeds 400ms"
    priority: 50
    rules:
      conditions:
        - type: "latency"
          backend: "qwen3-code"
          metric: "p99"
          threshold_ms: 400
    modelRefs:
      - model: "ibm-granite-3.3-8b-instruct"
  - name: "default-route"
    description: "All remaining requests — served locally at zero cost"
    priority: 1
    rules:
      conditions: []
    modelRefs:
      - model: "qwen3-code"

advanced:
  cost_optimizer: true
  data_residency: true
  latency_scoring: true
  latency_window_s: 60
  pii_detection: true
  jailbreak_detection: false