The 4 Learning Types of Modern AI
ChatGPT is not a single model; it's the result of a precise, sequential pipeline that combines four fundamentally different ways of learning. This is how raw text becomes an assistant.
Modern AI systems combine multiple strategies. GPT, Claude, and Gemini are not just "trained"—they are carefully orchestrated through a sequence of learning paradigms.
The Four Flavors of Machine Learning
Understanding these categories is the first step to understanding how any AI application actually works.
The ML Paradigms
- Metaphor: The Classroom.
- Signal: Human labels.
- Use Case: Classification (Spam vs Not Spam).
- Metaphor: The Detective.
- Signal: Natural patterns.
- Use Case: Clustering (Grouping similar items).
- Metaphor: The Gamer.
- Signal: Reward/Penalty.
- Use Case: Games (AlphaGo), RLHF.
- Metaphor: The Star.
- Signal: Mask-and-Predict.
- Use Case: Pre-training all LLMs.
The Secret 3-Step Pipeline
OpenAI (and every major lab) stacks these learning types into a precise sequence to build ChatGPT.
From Raw Text to Assistant
Type: Self-Supervised. Trillions of words from the web. Builds "World Knowledge."
Type: Supervised. 10,000+ human-written examples. Builds "Instruction Following."
Type: Reinforcement. 100,000+ preference rankings. Builds "Human Taste."
The Scale of the Moat
The difference between these stages is what separates research projects from production-grade AI.
- Pre-training: 600B+ words, months on thousands of GPUs, ~$100M cost.
- SFT: 10K–100K curated examples written by expert humans.
- RLHF: 100K–1M human preference comparisons for safety and tone.
- Result: A model that is not just smart, but helpful and safe.
Base Model vs. Instruct Model
Why you should never use a raw model for a chat application.
Model Personality
- Training: Pre-training only.
- Behavior: Continues text (Wikipedia style).
- Result: Q: "What is 2+2?" A: "Addition is a basic..."
- Training: SFT + RLHF added.
- Behavior: Answers questions directly.
- Result: Q: "What is 2+2?" A: "4."
Key Takeaways
The reason you can't replicate GPT-4 in your basement is the pre-training scale. But you can apply SFT and RLHF to open models to create your own specialty AI.
SFT (Step 2) teaches the model how to be correct. RLHF (Step 3) teaches it how to be high-quality and aligned with human preferences.
Safety isn't a filter bolted on after training. It's baked into the model's "taste" during the final reinforcement learning phase.