Overview
IronClaw implements defense-in-depth against prompt injection attacks that attempt to manipulate the AI’s behavior through malicious instructions embedded in external data sources (emails, web pages, API responses, etc.).Security Layers
Input Validation
Validator Checks
Before processing any input, the validator enforces basic constraints:Validation Rules
| Check | Action | Severity |
|---|---|---|
| Empty input | Reject | Error |
| Too long | Reject | Error |
| Null bytes | Reject | Error |
| Forbidden patterns | Reject | Error |
| Excessive whitespace (>90%) | Warn | Warning |
| Repeated characters (>20 in a row) | Warn | Warning |
Validation warnings don’t block processing but are logged for monitoring suspicious input patterns.
Policy Enforcement
Default Policy Rules
The safety layer includes pre-configured rules for common threats:Built-in Rules
| Rule ID | Pattern | Severity | Action |
|---|---|---|---|
system_file_access | /etc/passwd, .ssh/, .aws/credentials | Critical | Block |
crypto_private_key | Private key patterns (64-char hex after “private key”) | Critical | Block |
sql_pattern | DROP TABLE, DELETE FROM, etc. | Medium | Warn |
shell_injection | ; rm -rf, ; curl ... | sh | Critical | Block |
excessive_urls | 10+ URLs in one message | Low | Warn |
encoded_exploit | base64_decode, eval(base64, atob( | High | Sanitize |
obfuscated_string | 500+ non-whitespace characters | Medium | Warn |
Custom Rules
Add application-specific rules:Content Sanitization
Sanitizer Operations
Wheninjection_check_enabled=true or a policy rule triggers PolicyAction::Sanitize, the sanitizer:
- Removes dangerous patterns: Strips known injection markers
- Escapes special characters: Prevents markup interpretation
- Strips ANSI codes: Removes terminal control sequences
- Normalizes whitespace: Collapses excessive spacing
Injection Warnings
The sanitizer detects and logs suspicious patterns:Sanitized outputs include a
was_modified: bool flag so callers can decide whether to use the modified content or reject the input entirely.External Content Wrapping
Security Notice Wrapper
When injecting external data into the conversation, wrap it with explicit instructions for the LLM:Tool Output Wrapping
XML Delimiters
Tool outputs are wrapped in XML tags before being sent to the LLM:- Clear structural boundary: LLM knows this is data, not instructions
- Metadata tracking:
sanitizedattribute indicates processing - XML escaping:
<,>,&are escaped to prevent tag injection
Leak Detection
Secret Scanning
The safety layer includes a leak detector that scans content for secret patterns:“Your message appears to contain a secret (API key, token, or credential). For security, it was not sent to the AI. Please remove the secret and try again.”
The leak detector also scans tool outputs before they reach the LLM. See Credential Protection for details.
Safety Configuration
Configuration Options
Disabling Checks
For trusted environments or testing:Threat Models
Direct Injection
Attack: User includes instructions in their own messageTool Output Injection
Attack: Malicious API embeds instructions in response- Sanitizer removes
SYSTEM:markers - XML wrapper creates structural boundary
- Policy blocks dangerous patterns
Email/Webhook Injection
Attack: External email contains instructions- External content wrapper with security notice
- Policy blocks SQL patterns
- Context tracking shows source is external
Indirect Injection via Files
Attack: Malicious content in workspace file- HTML comment stripping during sanitization
- File read operations logged for audit
- Workspace isolation (WASM tools have limited access)
Best Practices
For Users
- Review external data: Don’t blindly trust content from emails, webhooks, or web scraping
- Use allowlists: Restrict which tools can process external data
- Monitor audit logs: Check for suspicious tool invocations
- Report false positives: Help improve detection patterns
For Developers
- Always sanitize external inputs: Use
safety_layer.sanitize_tool_output() - Wrap untrusted content: Use
wrap_external_content()for emails, webhooks, etc. - Implement tool allowlists: Don’t let tools call arbitrary other tools
- Log security events: Track blocked patterns and sanitization
- Test with malicious inputs: Include injection attacks in your test suite
For System Administrators
- Enable all safety layers: Don’t disable checks unless absolutely necessary
- Customize policies: Add rules for your specific threat model
- Monitor sanitization rates: High rates may indicate attack attempts
- Update patterns regularly: New injection techniques emerge constantly
- Audit external integrations: Review which tools access external data
Limitations
Not a Perfect Defense
Prompt injection defense is an arms race. The safety layer provides multiple barriers but cannot guarantee complete protection:- LLM behavior is unpredictable: Models may interpret instructions in unexpected ways
- Pattern evasion: Attackers can obfuscate malicious instructions
- Context overflow: Very long external content may dilute safety notices
- Model capabilities: Advanced models may be better at ignoring safeguards
Complementary Mitigations
- Human-in-the-loop: Require approval for sensitive operations
- Capability restrictions: Limit what tools can do even if compromised
- Audit logging: Track all actions for forensic analysis
- Rate limiting: Prevent automated attack attempts
- Network isolation: Restrict outbound connections from tools
Related Sections
- WASM Sandbox - Capability-based tool isolation
- Credential Protection - Secrets management
- Data Protection - Local storage encryption
