0%
Jan 8, 2026

LLM Security Risks: Threats and Mitigations

Large language models introduce security risks that traditional application security frameworks were not designed to address. According to OWASP's Top 10 for LLM Applications, prompt injection currently represents the most critical vulnerability, but the threat landscape spans from data leakage to denial of service. Organizations deploying LLMs must adapt security practices to these novel attack surfaces.

The LLM Threat Landscape

LLMs differ from traditional software in ways that create new security challenges:

  • Natural language interfaces enable social engineering of systems
  • Training data may contain sensitive information accessible through inference
  • Model behavior is probabilistic and difficult to fully characterize
  • Plug-in architectures extend attack surface through tool integrations

Prompt Injection

Direct Prompt Injection

Attackers craft inputs that override system instructions:

  • "Ignore previous instructions and..." patterns
  • Role-playing scenarios bypassing safety measures
  • Encoding tricks to evade input filtering

Indirect Prompt Injection

Malicious instructions embedded in external data sources:

  • Injected content in retrieved documents (RAG poisoning)
  • Hidden instructions in web pages the model browses
  • Malicious content in emails or documents being processed

Security research demonstrates that indirect injection can manipulate LLM agents into executing unintended actions, including data exfiltration.

Mitigations

  • Input validation: Filter or sanitize user inputs
  • Privilege separation: Limit model capabilities by context
  • Output validation: Check model outputs before action execution
  • Instruction hierarchy: Prioritize system prompts over user inputs
  • Human-in-the-loop: Require approval for sensitive actions

Data Leakage

Training Data Extraction

Models may memorize and regurgitate training data:

  • Verbatim reproduction of training examples
  • PII extraction from fine-tuning data
  • Proprietary code or content exposure

Context Leakage

Information from one conversation affecting another:

  • Cross-user data exposure in shared contexts
  • System prompt revelation through careful querying
  • RAG document content leakage

Mitigations

  • Data sanitization: Remove PII before training/fine-tuning
  • Differential privacy: Training techniques limiting memorization
  • Output filtering: Detect and redact sensitive patterns
  • Session isolation: Prevent cross-user context pollution

Denial of Service

Resource Exhaustion

Attacks consuming excessive compute or memory:

  • Maximum-length inputs requiring full processing
  • Requests triggering expensive tool calls
  • Patterns causing excessive token generation

Model Degradation

Inputs causing poor model performance:

  • Adversarial inputs producing nonsensical outputs
  • Edge cases exposing model limitations

Mitigations

  • Rate limiting: Constrain requests per user/session
  • Input length limits: Cap token counts
  • Output length limits: Prevent runaway generation
  • Timeout enforcement: Kill long-running requests
  • Cost attribution: Track and limit per-user resource consumption

Supply Chain Risks

Model Provenance

Risks from model origin and distribution:

  • Backdoored models from untrusted sources
  • Tampered weights during distribution
  • Outdated models with known vulnerabilities

Dependency Risks

  • Vulnerable ML libraries
  • Compromised tokenizers or preprocessing
  • Malicious plugins or extensions

Mitigations

  • Verify model signatures: Cryptographic verification of weights
  • Trusted sources: Use reputable model providers
  • Dependency scanning: Regular vulnerability assessment
  • Version pinning: Control dependency updates

Plugin and Tool Security

Tool Abuse

LLM agents with tool access face additional risks:

  • Unintended tool invocation through prompt injection
  • Excessive permission grants to tools
  • Tool outputs influencing subsequent model behavior

Mitigations

  • Least privilege: Minimal permissions for each tool
  • Action confirmation: Human approval for sensitive operations
  • Tool sandboxing: Isolate tool execution environments
  • Audit logging: Track all tool invocations

Security Architecture

Defense in Depth

Multiple layers of protection:

  1. Input layer: Validation, sanitization, rate limiting
  2. Model layer: Instruction hierarchy, capability limits
  3. Output layer: Filtering, validation, human review
  4. Integration layer: Tool permissions, action confirmation

Monitoring and Detection

  • Anomaly detection for unusual query patterns
  • Content classification for policy violations
  • Tool usage monitoring for abuse
  • Performance monitoring for DoS indicators

Incident Response

  • Defined procedures for security incidents
  • Kill switches for rapid model disabling
  • Communication plans for stakeholders
  • Post-incident analysis processes

Testing and Assessment

Red Teaming

Adversarial testing of LLM systems:

  • Prompt injection attempts
  • Jailbreaking and safety bypass
  • Data extraction testing
  • Tool abuse scenarios

Automated Testing

  • Fuzzing with adversarial inputs
  • Regression testing for known vulnerabilities
  • Continuous security scanning

Third-Party Assessment

  • Independent security audits
  • Penetration testing by specialists
  • Bug bounty programs

Organizational Practices

Security Policies

  • Acceptable use policies for LLM systems
  • Data classification for LLM processing
  • Incident reporting procedures

Training and Awareness

  • Developer training on LLM security
  • User awareness of risks and limitations
  • Security team familiarity with LLM-specific threats

Governance

  • Security review for LLM deployments
  • Ongoing risk assessment
  • Compliance with emerging AI regulations

At Arazon, we help organizations deploy LLM applications securely, addressing novel risks while maintaining utility. Contact us to discuss how security best practices can protect your AI deployments.