Industry-Leading AI Security for MCP Ecosystems

Runlayer ToolGuard is an industry-leading suite of specialized machine learning models that protect your MCP environment from tool poisoning, prompt injection, and output manipulation attacks. With fast 50-100ms inference times, ToolGuard delivers real-time threat detection without compromising performance.
The suite currently includes three specialized threat classification models, with additional models in active development to address emerging attack vectors.

The Models

Tool List Guard

Scans tool definitions at registration to detect risky descriptions, prompt injection attempts, and hidden instructions before tools are made available to your environment.

Tool Call Guard

Scans tool execution outputs in real-time to detect risky responses, data exfiltration attempts, and prompt injection before they reach your LLMs.

Tool Intent Guard

Detects tool intent drift caused by prompt injections that lead to data exfiltration, privilege escalation, credential theft, and infrastructure damage. Unlike Tool Call Guard, which evaluates individual responses, Tool Intent Guard analyzes tool inputs and outputs together to detect semantic misalignment, catching cases where a tool’s actual behavior diverges from what was requested.

Skill File Scanning

ToolGuard can also scan skill files uploaded to the platform. Each file’s content is analyzed using the same threat classification models, with results cached per content hash and scanner version. Large files are automatically chunked for processing. Skill file scanning runs when skills are uploaded via the CLI (skills push) or the API, producing per-file risk scores and an overall skill-level classification.
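The caching and chunking behavior described above can be sketched roughly as follows. This is a minimal illustration, not Runlayer's implementation: the SHA-256 hash, the `CHUNK_SIZE` and `SCANNER_VERSION` values, the `scan_chunk` callback, and the choice to take the highest chunk score as the file score are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical values: the real chunk size and version format are not documented here.
CHUNK_SIZE = 4000
SCANNER_VERSION = "example-1"

# Results are cached per (content hash, scanner version), as the text describes.
_cache: Dict[Tuple[str, str], float] = {}


def chunk_text(content: str, size: int = CHUNK_SIZE) -> List[str]:
    """Split content into fixed-size pieces; an empty file yields one empty chunk."""
    return [content[i:i + size] for i in range(0, len(content), size)] or [""]


def scan_file(content: str, scan_chunk: Callable[[str], float],
              version: str = SCANNER_VERSION) -> float:
    """Return a risk score for one file, reusing a cached result when possible."""
    key = (hashlib.sha256(content.encode("utf-8")).hexdigest(), version)
    if key not in _cache:
        # Assumption: the file score is the highest score of any chunk.
        _cache[key] = max(scan_chunk(c) for c in chunk_text(content))
    return _cache[key]
```

Keying the cache on the scanner version as well as the content hash means that an unchanged file is rescanned once after a scanner update, rather than returning a stale score.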

Skill Risk Policy

Admins can configure how the platform responds when a skill scan detects elevated risk. Navigate to Settings → Security Scanners to set the action for each risk tier:
Risk tier | Default action | Options
High | Block | Block, Alert, Allow
Medium | Alert | Block, Alert, Allow
  • Block — the skill import is rejected
  • Alert — the skill is imported with a warning badge visible in the UI; acceptance is logged to the Audit Log
  • Allow — the skill is imported without restriction
Low and Minimal risk skills are always allowed. These settings apply globally to all skill imports (CLI, API, and web UI).
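As a rough sketch of how the policy above resolves to an action, the following assumes a simple dictionary of per-tier settings. The `Action` enum, `DEFAULT_POLICY`, and `skill_import_action` are hypothetical names for illustration and are not part of any Runlayer API.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    BLOCK = "block"
    ALERT = "alert"
    ALLOW = "allow"


# Defaults from the table above; only High and Medium are configurable.
DEFAULT_POLICY: Dict[str, Action] = {"high": Action.BLOCK, "medium": Action.ALERT}


def skill_import_action(tier: str, policy: Optional[Dict[str, Action]] = None) -> Action:
    """Return the action for a skill import at the given risk tier."""
    tier = tier.lower()
    if tier in ("low", "minimal"):
        # Low and Minimal risk skills are always allowed, regardless of settings.
        return Action.ALLOW
    merged = {**DEFAULT_POLICY, **(policy or {})}
    return merged[tier]
```

For example, `skill_import_action("medium", {"medium": Action.BLOCK})` models an admin who has tightened the Medium tier from Alert to Block.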

LLM Risk Categorization

When Tool List Guard flags a tool at Medium or High risk, an LLM-based categorizer automatically classifies the threat into specific attack categories. The full taxonomy includes:
  • Prompt Injection
  • Data Exfiltration
  • Privilege Escalation
  • Destructive Action
  • Unauthorized Communication
  • Resource Abuse
  • Shadow Persistence
  • Context Poisoning
  • Guardrail Bypass
  • Supply Chain Compromise
Categories are not mutually exclusive — a single tool can match multiple categories. These labels appear in security violation details and audit logs, replacing generic “risky tool” messages with actionable context so you know what kind of threat was detected.
LLM categorization requires the Bedrock integration to be enabled in your deployment. When disabled, violations retain the default ToolGuard reason.
Additional specialized models are in development to address new attack vectors as they emerge in the MCP ecosystem.

Why Industry-Leading?

  • Purpose-Built for MCP: Custom-trained threat classification models specifically designed for MCP ecosystem attacks.
  • High Performance: Fast inference with typical scan times of 50-100ms.
  • Continuously Evolving: Models are regularly refined based on emerging threat patterns.
  • Battle-Tested: Deployed in production environments protecting real-world MCP deployments.
  • Enterprise-Ready: Complete audit logging, flexible configuration, and Security Dashboard integration.

Configuration

Navigate to Settings → Security Scanners to enable Runlayer ToolGuard models.

Sensitivity Levels

Each scanner phase (Tool List Guard, Tool Call Guard, Tool Intent Guard) has a configurable sensitivity that controls how aggressively it flags findings:
Level | Behavior
Strict | Lowest tolerance: flags more items, fewer false negatives
Balanced (default) | Recommended for most environments
Moderate | Highest tolerance: fewer flags, useful for noisy connectors
Sensitivity is set globally in Settings → Security Scanners and can be overridden per connector in the connector’s security settings. Connectors without an explicit override inherit the global value.
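The inheritance rule above (connector override first, then the global value) can be sketched as below. The phase keys, the dictionary-based settings, and the `effective_sensitivity` function are assumptions for illustration; the actual settings are managed in the UI.

```python
from typing import Dict, Optional

SENSITIVITY_LEVELS = ("strict", "balanced", "moderate")
# Hypothetical identifiers for the three scanner phases.
PHASES = ("tool_list_guard", "tool_call_guard", "tool_intent_guard")


def effective_sensitivity(phase: str,
                          global_settings: Dict[str, str],
                          connector_overrides: Optional[Dict[str, str]] = None) -> str:
    """Resolve the sensitivity for one scanner phase on one connector."""
    if phase not in PHASES:
        raise ValueError(f"unknown scanner phase: {phase}")
    # A connector override wins; otherwise inherit the global value,
    # which itself falls back to the documented default, "balanced".
    level = (connector_overrides or {}).get(phase) or global_settings.get(phase, "balanced")
    if level not in SENSITIVITY_LEVELS:
        raise ValueError(f"unknown sensitivity level: {level}")
    return level
```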

Violation Actions

Each scanner has a configurable action that controls what happens when it detects a finding. The available actions depend on the scanner type:
Action | Behavior
Block | Reject the request; the tool call does not proceed
Block (Self-Approve) | Block the request but allow the caller to approve and retry (per-connector only, PII scanner)
Mask | Redact or strip the detected content, then let the request continue
Alert | Log the detection but take no blocking action; the request proceeds unchanged
Allow | Disable the scanner entirely
Not every scanner supports every action. The table below shows which actions are available for each scanner:
Scanner | Allow | Alert | Mask | Block
PII detection | Yes | Yes | Yes (redact PII values) | Yes
Invisible character detection | Yes | Yes | Yes (strip hidden chars) | Yes
Credential detection | Yes | Yes | Yes (mask credentials) | Yes
ToolGuard ML scanners (Tool List Guard, Tool Call Guard, Tool Intent Guard) | Yes | Yes | No | Yes
Skill risk policy | Yes | Yes (warn on import) | No | Yes
Defaults: PII detection defaults to Alert. Invisible character detection and credential detection default to Mask. ToolGuard ML scanners default to Block. Actions are set globally in Settings → Security Scanners and can be overridden per connector.
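One way to picture the support matrix and defaults above is as a lookup that rejects unsupported combinations. This is only a sketch: the scanner keys and `validate_action` are hypothetical, a blank Mask cell is assumed to mean "not available", and the per-connector Block (Self-Approve) action is left out for brevity.

```python
from typing import Dict, FrozenSet

# Supported actions per scanner, transcribed from the table above.
# Assumption: a blank Mask cell means the action is not available.
SUPPORTED_ACTIONS: Dict[str, FrozenSet[str]] = {
    "pii": frozenset({"allow", "alert", "mask", "block"}),
    "invisible_characters": frozenset({"allow", "alert", "mask", "block"}),
    "credentials": frozenset({"allow", "alert", "mask", "block"}),
    "toolguard_ml": frozenset({"allow", "alert", "block"}),
    "skill_risk_policy": frozenset({"allow", "alert", "block"}),
}

# Documented defaults; the skill risk policy is set per risk tier instead.
DEFAULT_ACTIONS: Dict[str, str] = {
    "pii": "alert",
    "invisible_characters": "mask",
    "credentials": "mask",
    "toolguard_ml": "block",
}


def validate_action(scanner: str, action: str) -> str:
    """Return the action if the scanner supports it, else raise ValueError."""
    if action not in SUPPORTED_ACTIONS[scanner]:
        raise ValueError(f"{scanner} does not support action {action!r}")
    return action
```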

PII Detection

PII scanning uses pattern-based detection with validators to identify sensitive data in MCP traffic. The following built-in PII types are detected:
Type | Description
SSN | US Social Security numbers
Credit Card | Card numbers (validated with Luhn check)
Phone | Phone numbers (with context validation)
Email | Email addresses
Passport | Passport numbers (with context validation)
Driver’s License | US driver’s license numbers
IBAN | International Bank Account Numbers
IP Address | IPv4 addresses
Date of Birth | Dates of birth (with context validation)
MRN | Medical Record Numbers
VIN | Vehicle Identification Numbers (with checksum validation)
Bitcoin | Bitcoin wallet addresses
Ethereum | Ethereum wallet addresses
Admins can also add custom PII rules with regex patterns in Settings → Security Scanners. Custom rules run alongside built-in patterns and appear in audit logs with a CUSTOM label. Individual built-in types can be disabled per organization.
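To illustrate what "pattern-based detection with validators" can look like, the sketch below pairs a card-number regex with the standard Luhn checksum and runs one custom regex rule alongside it. The patterns, the `EMPLOYEE_ID` rule, and the `CUSTOM:` label format are invented for the example and are not Runlayer's actual rules.

```python
import re
from typing import List, Tuple


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical patterns for illustration only.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")
CUSTOM_RULES = {"EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b")}


def find_pii(text: str) -> List[Tuple[str, str]]:
    """Return (label, match) pairs for Luhn-valid cards and custom rules."""
    hits = [("CREDIT_CARD", m.group().strip())
            for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]
    for name, pattern in CUSTOM_RULES.items():
        hits += [(f"CUSTOM:{name}", m.group()) for m in pattern.finditer(text)]
    return hits
```

The validator step is what keeps a pattern like this from flagging every 16-digit order number: a candidate is reported only if its checksum passes.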

PII Scan Direction

PII detection can be applied to tool inputs, outputs, or both. The direction controls which traffic the PII scanner inspects:
Direction | Behavior
Input (default) | Scans data sent to MCP tools
Output | Scans data returned from MCP tools
Both | Scans in both directions
Set the direction globally in Settings → Security Scanners and override per connector in the connector’s security settings. Connectors without an override inherit the global value.
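The direction setting amounts to a small gate in front of the PII scanner. A minimal sketch, assuming string values for the direction and a hypothetical `should_scan` helper:

```python
from typing import Optional

DIRECTIONS = ("input", "output", "both")


def should_scan(traffic: str, global_direction: str = "input",
                connector_direction: Optional[str] = None) -> bool:
    """Decide whether the PII scanner inspects this traffic ("input" or "output")."""
    # A connector override wins; otherwise the global value applies.
    direction = connector_direction or global_direction
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    if traffic not in ("input", "output"):
        raise ValueError(f"unknown traffic kind: {traffic}")
    return direction == "both" or direction == traffic
```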

Risk Tiers

Tool List Guard assigns a risk tier to each scanned tool based on its confidence score. The default thresholds are:
Tier | Score Range | Meaning
Minimal | < 0.6 | Clean scan, no concern
Low | 0.6 – 0.7 | Low-confidence flag
Medium | 0.7 – 0.9 | Elevated risk, review recommended
High | ≥ 0.9 | High-confidence detection
Self-hosted deployments can tune these thresholds via environment variables:
  • RUNLAYER_TOOL_GUARD_LIST_RISK_THRESHOLD_LOW (default 0.6)
  • RUNLAYER_TOOL_GUARD_LIST_RISK_THRESHOLD_MEDIUM (default 0.7)
  • RUNLAYER_TOOL_GUARD_LIST_RISK_THRESHOLD_HIGH (default 0.9)
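The mapping from confidence score to tier can be sketched as follows. The environment variable names and defaults come from the list above; the `risk_tier` function itself is hypothetical, and it assumes each threshold is an inclusive lower bound (consistent with "< 0.6" for Minimal and "≥ 0.9" for High).

```python
import os
from typing import Mapping


def _threshold(env: Mapping[str, str], name: str, default: float) -> float:
    """Read one threshold from the environment, falling back to the default."""
    return float(env.get(f"RUNLAYER_TOOL_GUARD_LIST_RISK_THRESHOLD_{name}", default))


def risk_tier(score: float, env: Mapping[str, str] = os.environ) -> str:
    """Map a Tool List Guard confidence score to a risk tier."""
    low = _threshold(env, "LOW", 0.6)
    medium = _threshold(env, "MEDIUM", 0.7)
    high = _threshold(env, "HIGH", 0.9)
    # Assumption: each threshold is an inclusive lower bound for its tier.
    if score >= high:
        return "High"
    if score >= medium:
        return "Medium"
    if score >= low:
        return "Low"
    return "Minimal"
```

Passing an explicit mapping, for example `risk_tier(0.85, {"RUNLAYER_TOOL_GUARD_LIST_RISK_THRESHOLD_HIGH": "0.8"})`, shows how lowering the High threshold moves a score from Medium into High.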
If you don’t see Runlayer ToolGuard options, you need to configure your Runlayer deployment to enable the GPU ToolGuard infrastructure. See deployment documentation for setup instructions.

Monitoring

  • Security Dashboard: View detection timelines, violation trends, and common threat types
  • Connector Pages: Tool List Guard warnings appear directly on connector detail pages when potentially risky tools are detected
  • Audit Logs: Full history of detections, blocks, and configuration changes with confidence scores
  • Sessions: Review scanner outcomes in full AI session timelines
  • AgentGuard: Session-level behavior monitoring across the agent trajectory

Best Practices

  • Use per-connector overrides for high-risk external servers
  • Combine with MCP access policies for layered security
  • Review flagged tools with your security team before blocking

Staying Ahead

Runlayer ToolGuard models are continuously refined based on emerging threat patterns in the MCP ecosystem. Our commitment to continuous innovation ensures you have industry-leading defenses as new attack techniques emerge.

Model Attribution

The Runlayer ToolGuard suite utilizes GA Guard Lite for threat classification embeddings and model inputs. GA Guard Lite is licensed under Apache 2.0.
