Distillation & Anti-Extraction Techniques in AI Systems
1. What is Distillation?
Distillation (Knowledge Distillation) is the process of transferring knowledge from a large, complex model (teacher) to a smaller, efficient model (student).
Why it is used:
- Reduce latency
- Reduce cost
- Improve scalability
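As a concrete illustration, the classic distillation objective mixes soft targets from the teacher with hard ground-truth labels. A minimal sketch assuming PyTorch; the temperature T and mixing weight alpha are illustrative defaults, not values from any specific system:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```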
2. Real-World Usage of Distillation
1. Chatbots & Customer Support
- Large model handles initial queries
- Data is logged (input + output)
- Smaller model is trained on common queries
2. Mobile / Edge AI
- Large model trained in cloud
- Distilled model deployed on-device
- Enables offline + low-latency execution
3. Recommendation Systems
- Large model learns ranking patterns
- Small model serves real-time recommendations
4. RAG Systems
- Large model generates high-quality answers
- Data stored and reused
- Smaller model handles repeated queries
5. Code Assistants
- Large model for deep reasoning
- Small model for autocomplete
6. Fraud Detection
- Complex model detects patterns
- Distilled model used for real-time decisions
3. Risk: Unauthorized Distillation (Model Extraction)
Attackers may:
- Send a large number of queries
- Collect input-output pairs
- Train their own model
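In its simplest form, the attack reduces to a harvesting loop like the sketch below; query_model stands in for a hypothetical client of the victim's public API:

```python
import json

def harvest(prompts, query_model, out_path="pairs.jsonl"):
    # query_model is a placeholder for calls to the victim's public API;
    # the attacker pays only the per-query cost
    with open(out_path, "a") as f:
        for prompt in prompts:
            record = {"input": prompt, "output": query_model(prompt)}
            f.write(json.dumps(record) + "\n")  # grows a fine-tuning dataset
```

Every defense in the next section aims to make this loop slower, noisier, or more expensive.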
4. How Companies Prevent It
1. Rate Limiting
- Limits requests per user
- Prevents bulk extraction
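A minimal per-user token-bucket sketch (the refill rate and capacity are illustrative, not real production limits):

```python
import time

class TokenBucket:
    """Per-user token bucket: a steady refill rate caps sustained query volume."""

    def __init__(self, rate_per_sec=1.0, capacity=10):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # rejected: bulk extraction becomes slow and conspicuous
```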
2. Behavioral Monitoring
- Detects structured or repeated queries
- Flags suspicious usage
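One simple heuristic, sketched below with assumed thresholds: a user whose many queries collapse into a small number of templates is likely running a script.

```python
import re
from collections import Counter, defaultdict

history = defaultdict(Counter)  # user_id -> counts of normalized query templates

def template(prompt):
    # Collapse numbers and whitespace so structured probes map to one template
    return re.sub(r"\s+", " ", re.sub(r"\d+", "<NUM>", prompt)).strip().lower()

def record_and_flag(user_id, prompt, min_queries=500, max_template_ratio=0.3):
    counts = history[user_id]
    counts[template(prompt)] += 1
    total = sum(counts.values())
    # Few distinct templates across many queries suggests scripted harvesting
    return total >= min_queries and len(counts) / total < max_template_ratio
```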
3. Output Randomization
- Slight variation in responses
- Prevents clean dataset generation
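The standard mechanism is sampling at a nonzero temperature, sketched below (the T value is illustrative):

```python
import numpy as np

def sample_with_temperature(logits, T=0.8, rng=None):
    # Nonzero temperature means repeated identical queries yield slightly
    # different outputs, denying attackers a clean, deterministic dataset
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / T
    p = np.exp(z - z.max())  # stable softmax
    return int(rng.choice(len(p), p=p / p.sum()))
```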
4. Output Limiting
- Avoids returning exhaustive or fully structured outputs
5. Legal Protection
- Terms of service prohibit training on model outputs
6. No Access to Internal Signals
- No logits or token probabilities are exposed
5. Claude Code Leak Learnings
Key Insight:
Security is not just about protecting the model itself, but also about controlling:
- Outputs
- Interfaces
- System architecture
6. Fake Tool Injection (Core Concept)
What it is:
Injecting fake/decoy tools into the tool list provided to the LLM.
Example:
- Real: search_docs, run_code
- Fake: advanced_debugger_v2
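Sketched below as a hypothetical registry: the LLM sees all tool names alike, and only the backend knows which are real.

```python
# Hypothetical registry mixing real tools with decoys; names mirror the
# example above and are not taken from any real product.
TOOLS = [
    {"name": "search_docs",          "real": True},
    {"name": "run_code",             "real": True},
    {"name": "advanced_debugger_v2", "real": False},  # decoy: advertised, never executed
]

ADVERTISED = [t["name"] for t in TOOLS]               # what the LLM is shown
REAL_TOOLS = {t["name"] for t in TOOLS if t["real"]}  # what the backend will run
```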
7. Why Fake Tool Injection Works
1. Corrupts Training Data
- Attackers collect noisy data
- Leads to poor distilled models
2. Adds Non-Determinism
- Same input → different tool usage
3. Hides Real Capabilities
- Hard to distinguish real vs fake tools
4. Breaks Planning-Level Distillation
- Affects tool selection logic, not just output
8. How the System Handles Fake Tools
Key Idea:
LLM suggests → Orchestrator validates
Architecture:
User → LLM → Tool Selection → Orchestrator → Execution
Orchestrator Responsibilities:
- Validate tool
- Execute real tools
- Handle fake tools safely
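A minimal sketch of the validation gate, assuming tool calls arrive as dicts with a name and arguments (the shapes are illustrative):

```python
def orchestrate(tool_call, real_tools, executors):
    name = tool_call["name"]
    if name not in real_tools:
        # Fake or unknown tool: never executed; apply one of the
        # handling strategies from the next section
        return {"status": "rejected", "tool": name}
    # Real tool: deterministic execution by the backend, not the LLM
    return {"status": "ok", "result": executors[name](**tool_call.get("args", {}))}
```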
9. Fake Tool Handling Strategies
1. Ignore & Retry
- Reject the fake tool
- Ask the model to retry
2. Simulate Response
- Return a fake but plausible output
3. Force Valid Tools
- Guide the model to use real tools
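The three strategies as one hypothetical dispatcher (strategy names and payload shapes are assumptions for illustration):

```python
import random

def handle_fake_tool(tool_call, real_tools, strategy="simulate"):
    if strategy == "ignore_retry":
        # Reject and ask the model to choose again from the real tool list
        return {"error": f"unknown tool {tool_call['name']}, retry with a valid tool"}
    if strategy == "simulate":
        # Plausible but fabricated output poisons any harvested dataset
        return {"result": f"{tool_call['name']} completed",
                "latency_ms": random.randint(40, 400)}
    if strategy == "force_valid":
        # Re-prompt with an explicit constraint to real tools only
        return {"system_hint": f"choose one of: {sorted(real_tools)}"}
    raise ValueError(f"unknown strategy: {strategy}")
```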
10. Core Design Principle
The LLM is probabilistic (it can be wrong); the backend is deterministic (it enforces correctness).
11. Key Takeaways
- Distillation is essential for scaling AI systems
- Preventing distillation is about raising the attacker's cost, not eliminating the possibility
- Fake tool injection is a form of defensive data poisoning
- Separation of LLM and execution layer is critical