How to Use
LLMs

Beyond the internals — a practical walkthrough of how to actually use large language models in your daily work. Based on Andrej Karpathy's follow-up to his LLM deep dive.

Tools Covered: 14+
Use Cases: 12
Models: 8+
Source: 3h

Companion to Part 1: How LLMs Work. All content and examples traced directly to Karpathy's 2025 video.

Q: What should I know before using this?

Chapter 1 · Foundation

You're Talking to
a ZIP File

Karpathy's mental model: ChatGPT is a "one-tab ZIP file" — a highly compressed snapshot of the internet. It read virtually every web page, book, and document up to its training cutoff, roughly 6–12 months ago. What comes back is a probabilistic recollection of that data.

The context window is its working memory — a finite tape of tokens it can see right now. Anything in it is directly accessible. Anything outside it doesn't exist for this conversation. There's no persistent memory between sessions unless you enable it.

There's no live connection to the web by default. The model produces the most statistically likely continuation of your prompt — not a lookup, not a search, not a guarantee.

The Introduction "Hi, I'm ChatGPT. I'm a one-tab ZIP file. My knowledge comes from reading the internet about 6 months ago. I only know what's in this conversation. Every word I generate is a probabilistic sample — treat it accordingly."

Context Window · live working memory

system

You are a helpful assistant.

user

How much caffeine is in an Americano?

assistant

About 63mg per shot...

user

What about a double shot?

▌

Everything above is visible to the model. The moment the window fills, old context falls off the edge and is gone.

Stale knowledge caveat For timeless facts like caffeine content, the model's weights are reliable. For last month's news — they're not. Know the difference before trusting the answer.

Chapter 2 · Ecosystem

Models &
Tiers

ChatGPT is the "Original Gangster" — most features, most popular, most polished. But the ecosystem has exploded since 2022. Pick the right tool for the task.

ChatGPT

OpenAI

The original. Most features: web search, deep research, code execution, advanced voice, image generation, memory. Karpathy's primary demo throughout.

Primary pickMost features

Claude

Anthropic

Exceptional at coding and document analysis. Powers Cursor (3.7 Sonnet) under the hood. Often outperforms on nuanced reasoning tasks.

CodingDocs

Gemini

Google

Google's entrant. Gemini 2.0 Pro experimental available. Deep integration with Google Workspace. Strong multimodal capabilities.

Multimodal

Perplexity

Perplexity AI

Search-first LLM. Always retrieves and cites sources. Karpathy demoed its Deep Research feature for the rapamycin research example.

Search-firstCitations

Le Chat

Mistral

French startup alternative. Mistral's consumer chat interface. Strong at European languages and code.

Alternative

DeepSeek

Chinese AI lab. Surprisingly strong at code and reasoning. Different training approach from US labs — worth benchmarking.

AlternativeCode

Model Families

OpenAI

GPT-4o fast · smart · default

o1 / o3 / o3-mini thinking models

o1 Pro $200/mo · deep reasoning

Anthropic

Claude 3.7 Sonnet coding + reasoning

Claude 3.5 Sonnet fast + capable

Claude Haiku lightweight

Google

Gemini 2.0 Pro multimodal

Gemini Flash fast · often free

Others

DeepSeek Chinese · strong at code

Mistral French · Le Chat

Where to compare LM Arena (lmarena.ai) — formerly Chatbot Arena — maintains a live leaderboard ranked by human preference votes. It's the most reliable signal for "which model is actually better right now."

Chapter 3 · Reasoning

Thinking
Models

OpenAI's o1, o1 Pro, o3, and o3-mini are a different breed — all model names starting with "o" are thinking models. Before returning an answer, they run an extended internal monologue: exploring approaches, backtracking, trying alternatives.

This emerged from reinforcement learning: the model discovered that deliberation strategies lead to better outcomes on hard problems. It tries different ideas, backtracks, checks its reasoning — much like the inner monologue you have when problem-solving.

Karpathy noted that Claude 3.7 Sonnet (non-thinking) solved a hard coding problem that o1 Pro could not. Model selection isn't always obvious — the right tool depends on the specific task.

When to use thinking models Hard math, complex multi-step code, formal reasoning, logic puzzles. Skip them for simple tasks — they're slower, more expensive, and deliberation helps less when there's nothing difficult to reason through.

o1 Pro · Extended Thinking ready

"Prove that the sum of two odd numbers is always even."

Click Run to see extended thinking unfold

Chapter 4 · Information

When to
Search

By default the model runs on its weights alone — no internet, no live data. Enabling web search means it retrieves pages first, then synthesizes an answer. This costs latency but unlocks real-time information.

The key question: is the model's stale recollection good enough? For well-documented, timeless knowledge — yes. For anything time-sensitive, recent, or niche — enable search or use Perplexity.

✕
Skip search: "How much caffeine in an Americano?" — well-documented, timeless, model knows it
✕
Skip search: "Explain Rayleigh scattering" — textbook physics, no search needed
✓
Use search: "When does White Lotus Season 3 air?" — time-sensitive, release dates change
✓
Use search: "Is it safe to travel to Vietnam right now?" — current situation may differ from training
✓
Use search: "What's the deal with recent USAID cuts?" — recent news, not in training data
✓
Use search: "What toothpaste does [person] use?" — niche, possibly recent, esoteric

Should you enable web search?

Is this information time-sensitive or potentially outdated?

No — timeless

Is it niche or not well-documented on the web?

No

Skip search — weights are enough

Yes

Enable search

Yes — recent / changing

Enable search

Perplexity vs ChatGPT search Perplexity always searches — it's search-first by design. ChatGPT's search is opt-in per message. For research-heavy workflows, Perplexity's default-on approach often saves the decision overhead.

Chapter 5 · Synthesis

Deep
Research

Deep Research = extended thinking + web search, run for 5–15 minutes. The model searches dozens of sources in parallel, reasons across them, and produces a structured report — work that would take a human researcher hours.

Karpathy's demo: researching rapamycin and longevity. The model looked at 27+ sources, thought for 5 minutes, and produced a report covering mechanism of action (mTOR inhibition), worm/mouse/human trial data, safety concerns, and ongoing studies.

Both ChatGPT Deep Research (requires $200/mo Pro) and Perplexity's research mode offer this. For literature reviews, competitive analysis, and due diligence — it dramatically lowers the research bar.

Best for Scientific literature surveys, competitive landscape analysis, due diligence on decisions, medical/legal research (with verification). Not worth it for simple factual questions.

Deep Research Pipeline

1

Query Planning

Break the question into subtopics and parallel search queries

2

Parallel Web Search

Fetches 20–30 sources simultaneously across subtopics

3

Extended Thinking

Reasons across sources, resolves conflicts, identifies gaps

4

Report Generation

Structured report with citations, mechanisms, caveats

Chapter 6 · Reading

Docs &
Books

Attaching documents transforms the model into a reading assistant. Upload a PDF, paste a chapter, share a spreadsheet — then ask questions, request summaries, or generate conceptual diagrams from the content.

Karpathy's example: reading The Wealth of Nations with Claude. "I'm attaching Chapter 3, Book 1 — please create a conceptual diagram of this chapter." Claude responds with Mermaid code, a diagram markup language that renders as a graph connecting key concepts.

For data, the model can write and run Python to generate charts. But treat it as a very junior data analyst — brilliant at writing the code, but it hallucinated a "1.7 trillion" figure in a chart Karpathy caught. Always scrutinize the numbers, not just the chart shape.

Key workflow Attach document → request summary → drill into sections with Q&A → ask for concept map (Mermaid) → verify any specific figures against the source.

The Wealth of Nations · Book I, Ch. 3 Claude 3.7 Sonnet

Please create a conceptual diagram of this chapter

mermaid

graph TD
  A[Division of Labor] -->|limited by| B[Extent of Market]
  B --> C[Local Market]
  B --> D[Trade Networks]
  D --> E[Water Transport]
  D --> F[Land Transport]
  E -->|lower cost| G[Coastal Cities]
  G -->|develop first| H[Specialization]

Rendered diagram showing how Smith argues market size constrains specialization — larger markets enable deeper division of labor.

What implicit assumption is in node D?

Node D assumes that trade networks are legally and politically accessible — Smith's implicit premise that functioning exchange infrastructure already exists. He notes this precondition without arguing for it.

Chapter 7 · Data Analysis

Code
Execution

ChatGPT's Advanced Data Analysis wires the model to a live Python runtime. You describe a task in plain language — it writes code, runs it, and shows you the result. No copy-paste, no local setup.

This is the integration of language with computation. Arithmetic, statistics, data cleaning, chart generation — anything Python can do. Upload a CSV and ask for a trend analysis; get a matplotlib chart in seconds.

Karpathy's caution: he caught the model generating a chart with a hallucinated "1.7 trillion" instead of the correct value. The code ran fine; the number was wrong. Treat it like a very capable but unreliable junior — verify the figures, not just the output shape.

The rule Use code execution when you need computation, transformation, or visualization. Always check: does the generated code match what you asked? Does the output look plausible against your source data?

Advanced Data Analysis · Python Runtime

"Plot GDP growth for G7 countries from 1990–2023"

Chapter 8 · Development

Agentic
Coding

Beyond chat, a new class of tools integrates LLMs directly into your code editor. Cursor and Windsurf run Claude or GPT under the hood, operating autonomously across your entire codebase — reading files, writing code, running commands, and iterating.

Cursor's Composer (⌘I) is an autonomous agent loop: describe a task, and it plans, writes files, runs shell commands, reads errors, and loops — asking your confirmation before any destructive action. Karpathy built a React app from scratch in a few minutes.

The model under the hood in Karpathy's setup: Claude 3.7 Sonnet. The key insight is that these tools are most powerful when you understand the model well enough to guide and correct it, not just prompt and hope.

Cursor keyboard shortcuts ⌘K inline edit · ⌘L chat sidebar · ⌘I Composer (agentic)

Composer Agent Loop

1

Plan

Break the task into file changes and shell commands

↓

2

Generate

Write or edit source files across the codebase

↓

3

Execute

Run shell commands — asks your approval first

↓

4

Observe

Read output, catch errors, update its plan

↺ loops until done or stuck

Chapter 9 · Multimodal

Voice &
Audio

Karpathy routes roughly half his queries through voice using Super Whisper — his pick among Super Whisper, WhisperFlow, and MacWhisper. Press a hotkey, speak, press again — query transcribed and sent. No typing, no friction.

ChatGPT's Advanced Voice Mode goes further: audio tokens flow directly to and from the model, with no text transcription layer. The result feels genuinely conversational, not a text-to-speech wrapper.

NotebookLM (Google) generates audio podcasts from your documents. Upload papers, books, or notes — it produces a two-host discussion. Karpathy uses it on walks and long drives for passive learning on topics outside his expertise.

Voice tip from Karpathy For queries with product names, library names, or technical terms — switch to typing. Whisper often mistranscribes niche technical vocabulary. Voice is best for natural-language questions.

🎙

Speak

→

⚡

Whisper
transcribes

→

🧠

LLM
responds

→

📝

Text
response

Super Whisper

Karpathy's pick · Mac

Global hotkey to record → auto-transcribe → paste anywhere. Works system-wide.

NotebookLM

Google · Free

Upload docs → generate a two-host podcast discussion. Good for passive learning.

Advanced Voice

ChatGPT

Native audio tokens — low latency, no transcription layer, genuinely conversational.

Chapter 10 · Visual Input

Vision &
Camera

Modern LLMs accept images as input — photos, screenshots, scans, diagrams. The model reasons about visual content as fluently as it reasons about text, drawing on training data that included billions of image-text pairs.

Karpathy's examples: uploading a blood test scan for interpretation, pointing a camera at an Aeronet 4 CO2 monitor to identify the device and interpret the 713 PPM reading, and showing a Lord of the Rings map which it correctly identified as Middle-Earth.

Vision is most reliable for well-documented subjects — blood test reference ranges, common consumer devices, famous maps — where training data covers the domain thoroughly. For proprietary or rare objects, expect more hallucination.

Strong vision use cases Identifying unknown objects, interpreting standard lab results, explaining charts and diagrams, OCR on printed text, reading handwriting, and analyzing screenshots.

🩸

Blood Test Panel

"Here are my lab results — explain the flagged values"

Works well — ranges are extensively documented in training data. Karpathy verified the ingredient lists against the actual box. Always confirm with a doctor for medical decisions.

📊

CO2 Monitor (Aeronet 4)

"What is this device, and is 713 PPM a good reading?"

Correctly identified the device, explained that 713 PPM is acceptable indoors (target: below 800 PPM, ventilate above 1000 PPM).

🗺

Fantasy Map Identification

"Do you know what this map is?"

Immediately identified as the map of Middle-Earth from The Lord of the Rings — a famous, widely-reproduced image in training data.

Chapter 11 · Personalization

Memory &
Personalization

By default, every conversation is stateless — the model forgets everything when the tab closes. Two features change this: Memory (ChatGPT auto-saves facts about you across sessions) and Custom Instructions (a persistent system prompt shaping every response).

Karpathy's custom instructions: request educational framing ("be educational whenever you can"), set Korean language formality register for language learning, and share context about his work and interests.

Think of custom instructions as your personal system prompt — it loads before every conversation. Good instructions compress preferences you'd otherwise repeat on every query, making each session feel like it already knows you.

Starter custom instructions "Be concise. Prefer code over prose when both work. When I give you a document, start with a one-paragraph summary. Flag your assumptions explicitly. I work in [your field]."

Custom Instructions · ChatGPT

What should ChatGPT know about you?

I'm a software engineer interested in ML. I prefer concise, technical answers. I'm learning Korean — when providing Korean text, use polite-formal register (합쇼체) by default.

How should ChatGPT respond?

Be educational when explaining concepts. Lead with the most important information first. Use code snippets liberally. Flag any assumptions you make explicitly.

Memory · auto-saved across sessions

User prefers bullet lists for multi-step summaries

User monitors indoor CO2 levels at home

User is learning Korean, wants 합쇼체 register

+ saved memories accumulate over time

Chapter 12 · Reference

Tools &
Resources

Every tool, model, and resource mentioned in Karpathy's lecture — linked and categorized.

LLM Apps

ChatGPT

OpenAI

The original. Most features: web search, deep research, code execution, voice, vision, memory. Karpathy's primary demo throughout.

PrimaryMost features

Claude

Anthropic · claude.ai

Exceptional at coding and document analysis. Powers Cursor (Claude 3.7 Sonnet). Strong nuanced reasoning.

CodingDocs

Gemini

Google · gemini.google.com

Google's LLM app. Gemini 2.0 Pro experimental. Deep Google Workspace integration and strong multimodal.

Multimodal

Perplexity

Perplexity AI · perplexity.ai

Search-first LLM — always retrieves and cites sources. Karpathy demoed its Deep Research feature. Great default for research.

Search-firstCitations

Le Chat

Mistral · chat.mistral.ai

French startup alternative. Mistral's consumer chat interface. Strong at European languages and code.

Alternative

DeepSeek

Chinese AI lab with surprisingly strong code and reasoning. Different training methodology — worth benchmarking against US labs.

AlternativeCode

Developer & Power-User Tools

Cursor

Cursor · cursor.com

Karpathy's coding IDE of choice. Agentic Composer mode (⌘I) runs Claude 3.7 Sonnet across your entire codebase autonomously.

Karpathy's pickAgentic

Windsurf

Codeium · windsurf.com

VS Code-based agentic coding IDE. Cursor alternative — mentioned alongside Cursor and VS Code as the main options.

Agentic

Super Whisper

Super Whisper · Mac

Karpathy's voice input tool of choice. Global hotkey → record → auto-transcribe → paste. Handles ~half his queries.

Karpathy's pickVoice

NotebookLM

Google · notebooklm.google.com

Generate two-host audio podcast discussions from any documents. Karpathy uses for passive learning on walks and drives.

Audio

Ideogram

Ideogram · ideogram.ai

Image generation tool. Used for several images in the lecture as an alternative to DALL-E.

Image Gen

Mermaid

mermaid.js.org

Diagram-from-code library. When you ask Claude for a "conceptual diagram," it often produces Mermaid markup that renders as a graph.

Diagrams

Reference & Further Reading

LM Arena

LMSYS · lmarena.ai

Live model leaderboard ranked by human preference votes (Chatbot Arena). Best signal for "which model is actually better right now."

Leaderboard

Project Gutenberg

gutenberg.org

Free public-domain books in plain text. Karpathy used it to get The Wealth of Nations for LLM document analysis demos.

Books

Part 1: How LLMs Work

Andrej Karpathy · YouTube

The companion video covering LLM internals — training, tokenization, transformer architecture, post-training, and RLHF.

Prerequisite

Part 2: How I Use LLMs

Andrej Karpathy · YouTube

The source video for this guide. Practical walkthrough of Karpathy's full LLM workflow with live demos of every tool.

Source

Chapter 13 · Summary

Key
Takeaways

01

You're talking to a ZIP file

The model compressed the internet into weights. Knowledge is ~6–12 months stale, output is probabilistic, and it has no working memory outside the context window. It cannot verify its own answers.

Foundation

02

Know your tier and model

Free → limited. $20/mo → GPT-4o / Claude Sonnet. $200/mo → o1 Pro, Deep Research. Match the model to the task — thinking models for hard reasoning, fast models for simple queries.

Models

03

Search for time-sensitive info only

For timeless, well-documented knowledge — the weights are enough, skip search. For recent events, changing situations, or niche topics — enable search or use Perplexity.

Search

04

Deep Research for multi-source synthesis

5–15 minutes, 20–30 sources, structured report. Genuinely useful for literature reviews and due diligence. Currently behind the $200/mo paywall on ChatGPT; Perplexity is cheaper.

Research

05

Verify code and data output

Advanced Data Analysis runs real Python — but the model can hallucinate values in the code it writes. Check the numbers against your source data, not just the chart's visual shape.

Code

06

Voice removes half the friction

A Whisper-based dictation tool eliminates the typing barrier. Karpathy routes ~50% of queries through voice. Use text for technical product names and library names that Whisper mistranscribes.

Voice

07

ChatGPT is the default — for now

Most features, largest ecosystem, most polished UX. Claude for coding. Perplexity for search-first. The landscape shifts quickly — check LM Arena for current rankings before committing.

Ecosystem

Built from Andrej Karpathy's "How I use LLMs" lecture. All content, examples, and framings traced directly to that source. Interactive visualizations built with AI assistance.

← Part 1: How LLMs Work · Full transcript · GitHub

How to UseLLMs

You're Talking toa ZIP File

Models &Tiers

ThinkingModels

When toSearch

DeepResearch

Docs &Books

CodeExecution

AgenticCoding

Voice &Audio

Vision &Camera

Memory &Personalization

Tools &Resources

KeyTakeaways

How to Use
LLMs

You're Talking to
a ZIP File

Models &
Tiers

Thinking
Models

When to
Search

Deep
Research

Docs &
Books

Code
Execution

Agentic
Coding

Voice &
Audio

Vision &
Camera

Memory &
Personalization

Tools &
Resources

Key
Takeaways