How Andrej Karpathy uses LLMs
https://www.youtube.com/watch?v=EWvNQjAaOHw
Personal takeaways
- A great overview of ways to use AI. Some thoughts and ideas stimulated by this:
- I should start reading books using a ChatAI as a research companion.
- I could use this with audiobooks -> mp3 -> txt -> LLM summaries, key info.
- Add summaries to reviews so I keep everything together.
- I need to try Cursor again, and VS Code CoPilot.
- Also experiment more with Projects in Claude and ChatGPT.
Tokens
- Tokenizer tools let you see what the tokens look like.
- You build a token sequence over time, known as the context window.
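A toy sketch of the idea above: each message appends tokens to a single growing sequence, and that sequence is the context window. Real LLMs use subword tokenizers (BPE etc.); the whitespace tokenizer here is a made-up stand-in just to show the mechanics.

```python
# Toy illustration: the context window is just a growing token sequence.
# Real models use subword tokenizers; this whitespace split is a
# hypothetical stand-in for illustration only.

def tokenize(text):
    return text.split()

context = []  # the "context window"
for turn in ["Hi there", "What is an LLM?", "Explain tokens"]:
    context.extend(tokenize(turn))  # each message appends more tokens
    print(len(context), context)
```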
- The LLM at base (pre-training) is like a huge lossy zip file of the web; it has the “gestalt” of the web. Pre-training is very expensive, takes months, and is only done occasionally, so models are often some months out of date.
- Post-training:
    - Much cheaper fine-tuning.
    - Supervised Fine-Tuning (SFT): the model is further trained on curated prompt/response pairs. This is where the persona is added 🙂
    - Reinforcement Learning (RL): feedback is given to the model for its outputs; can be automated with AI.
    - Reinforcement Learning from Human Feedback (RLHF): humans rank multiple model responses to the same prompt.
- The resulting system is a fully self-contained entity; there is no additional thing attached (like a calculator or web search). Roughly a 1 TB file on disk for GPT-3, ~10 TB for GPT-4.
- Pre-training gives the knowledge; post-training gives the style and form.
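The RLHF step above can be sketched in miniature: humans rank several responses to one prompt, and the ranking is expanded into pairwise (preferred, rejected) examples that could train a reward model. The function name and data shape here are assumptions for illustration, not any lab's actual pipeline.

```python
# Hypothetical sketch of the RLHF data step: a human ranking of
# responses (best first) becomes pairwise preference examples.

from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """ranked_responses: best first. Returns (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))

pairs = ranking_to_pairs(["answer A", "answer B", "answer C"])
print(pairs)  # every better response paired against each worse one
```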
“Hi I am ChatGPT. I am a 1 terabyte zip file. My knowledge comes from the internet, which I read ~6 months ago and remember only vaguely. My winning personality was programmed, by example, by human labelers at OpenAI :)”
Stuff that occurs frequently on the internet is remembered better. Ask a question whose answer doesn’t change and is well represented on the internet, and it will (probably) be answered accurately.
Always start a new chat when you start a new topic. Previous tokens are a distraction. Also the query gets gradually more expensive and slow. Tokens in the context are like working memory.
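The cost point above can be made concrete: each request reprocesses the whole context, so per-turn cost grows as the chat gets longer. The unit cost and message sizes below are made-up numbers purely for illustration.

```python
# Toy cost model: every request pays for the full context so far,
# so per-turn cost keeps climbing in a long chat. Numbers are invented.

COST_PER_TOKEN = 0.001  # hypothetical unit cost

def turn_cost(context_tokens, new_tokens):
    return (context_tokens + new_tokens) * COST_PER_TOKEN

context = 0
for new in [50, 50, 50, 50]:  # four equally sized messages
    print(round(turn_cost(context, new), 3))  # cost rises each turn
    context += new  # the context keeps growing
```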
Keep in mind what model you are using — is it right for the task? If you don’t have a paid account you may be using a dumber model. Ask all of them.
“Thinking” models
- Trained with additional RL, in such a way that the model can try out lots of ways to find a solution.
- Used for example with math problems.
- A recent method.
- A lot slower and more expensive during inference.
- For OpenAI, it is the models that start with o that are the thinking models, e.g. o1, o3-mini-high.
- DeepSeek R1 is a thinking model.
LLMs with tools
Web search
- The LLM will emit a special token with a search query, and it gets the results back, which then become part of its context window.
- Perplexity was one of the first.
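The search loop described above can be sketched as follows. The `<SEARCH>` tag, the stand-in model, and the canned search result are all invented for illustration; the point is only the shape of the loop: model emits a query, the runtime fetches results, and the results join the context before the model answers.

```python
# Hypothetical tool-use loop: the model asks for a search via a special
# token, the runtime runs it, and the results are appended to the context.

def fake_model(context):
    # stand-in for the LLM: requests a search, then answers from results
    if "SEARCH_RESULT:" not in context:
        return "<SEARCH> population of France"
    return "About 68 million (from search results)."

def fake_search(query):
    return "SEARCH_RESULT: France population ~68 million"

context = "User: What is the population of France?"
out = fake_model(context)
if out.startswith("<SEARCH>"):
    query = out[len("<SEARCH>"):].strip()
    context += "\n" + fake_search(query)  # results join the context window
answer = fake_model(context)
print(answer)
```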
Deep Research
- Combination of web search and thinking.
- Creates a huge context window.
- Grok has a good interface for this.
- ChatGPT the most thorough.
- [Me: Problem of companies filling the web with information just for AI, for example on medicines or products]
- There can still be hallucinations. Treat as first research but not as definitely true.
Adding documents to increase the context
- The PDF is converted to a text file that is loaded into the token window.
- Some of the information in a PDF may be lost (e.g. images), or complex diagrams may not be understood by the LLM.
- Useful when reading books.
- I could use this with audiobooks -> mp3 -> txt -> LLM summaries, key info.
- Copy-paste into LLM text box, by chapter.
- Read with the LLM, asking questions as you read.
Python interpreter
- LLM writes a computer program to answer a question.
- LLM writes code, then uses special tokens to get it executed.
- Without a code interpreter, LLMs will often give wrong results for large arithmetic.
- ChatGPT advanced data analysis
- Ask it to plot data; it will plot using Python.
- Careful: it can misreport and change figures.
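The interpreter idea above, in miniature: the model emits code wrapped in special tokens, and the runtime (not the model) executes it and captures the exact result. The `<CODE>` tags and the tiny extractor are invented for illustration.

```python
# Sketch: rather than "guessing" big arithmetic, the model emits code
# between made-up <CODE> tags; the runtime executes it for an exact answer.

import contextlib
import io

def extract_code(model_output):
    start = model_output.index("<CODE>") + len("<CODE>")
    end = model_output.index("</CODE>")
    return model_output[start:end]

model_output = "<CODE>print(123456789 * 987654321)</CODE>"
code = extract_code(model_output)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(code)  # the runtime does the math, not the model
result = buf.getvalue().strip()
print(result)
```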
Claude artifacts
- Create JS apps and see/edit them in the browser.
- Create diagrams (uses Mermaid).
Cursor
- VS Code, Windsurf, Cursor…
- I need to try Cursor again, and VS Code CoPilot.
Other modalities
Speech
- Talk to the LLM; much faster than typing.
- Basic mode: speech -> text -> tokens.
- Advanced voice mode: the voice is handled natively inside the language model, using audio tokens rather than text tokens.
- NotebookLM from Google can make podcasts from documents.
Images
- Possible to convert images into tokens.
- Model doesn’t know which tokens are text, which audio, which images.
- Multimodal.
- It does the same thing with all of the tokens.
- Transcribe from image of table into text table.
ChatGPT memory
- Memory is prepended to all conversations.
- Claude can save new memories as md files in a project.
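A minimal sketch of "memory prepended to all conversations": saved facts are injected as a system message before the user's new chat begins. The message shape loosely follows common chat APIs; the field names and memory strings are assumptions.

```python
# Sketch: stored memories are prepended to every new conversation as a
# system message. Roles/fields mimic common chat APIs; details assumed.

memories = ["User prefers concise answers", "User is learning Rust"]

def build_messages(user_prompt):
    system = "Known about the user:\n" + "\n".join(memories)
    return [
        {"role": "system", "content": system},  # memory goes first
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Explain borrowing.")
print(msgs[0]["content"])
```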
Custom instructions
- How you want ChatGPT to respond.
Custom GPTs
- Just special instructions/prompts.
- Give description and examples.