Improving LLM responses by cleaning the context

I have started to get the LLM to make a summary of a thread when it feels like the context is getting too large, and then give that as the first prompt in a new chat.
Published June 6, 2025

I have switched my go-to LLM from ChatGPT o3 to Claude Opus 4. I have found it to be a great model. I’ve been using it for my new Life and Business Coach LOS, which you can read about here: Using my AI Life and Business Coach

I made this switch because I don’t trust OpenAI much, and I don’t like that they are now using all prior chats as memory without allowing you to prune it.

One problem I’ve run into is that I have been fairly quickly using up my quota on the Pro plan. I think part of the problem is that I’ve been keeping threads going for a long time, so the context grows huge and fills up the context window. To get around this, I have started to ask the model to summarize a thread when the context seems to be getting too large. Here’s an example of a prompt I’ve used to do this:

I think our context window has become too long. Can you (the assistant, Claude) summarize our discussion so far with the key points that you will need to continue this discussion in a new thread? This will be for you, so add instructions for yourself. I won't read it, I will just give it to you at the start of the new discussion thread.

I start a new chat and paste the response to the above prompt in, with these instructions added at the top:

You (the assistant, Claude) generated this yourself. This is just a brief, no need to respond unless you want to ask clarifying questions.
---
(The response from the previous thread)
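The two-step handoff above is easy to script if you want to automate it. Here's a minimal sketch in Python; the prompt and header wording are taken from the post, but the function and constant names are my own invention:

```python
# Sketch of the context-handoff workflow: a reusable summary prompt,
# plus a helper that assembles the first message of the new thread.

# Prompt sent at the end of the old thread, asking the model to write
# a summary/briefing for its future self (wording from the post).
SUMMARY_PROMPT = (
    "I think our context window has become too long. Can you (the assistant, "
    "Claude) summarize our discussion so far with the key points that you "
    "will need to continue this discussion in a new thread? This will be for "
    "you, so add instructions for yourself. I won't read it, I will just "
    "give it to you at the start of the new discussion thread."
)

# Header pasted above the summary when starting the new thread.
HANDOFF_HEADER = (
    "You (the assistant, Claude) generated this yourself. This is just a "
    "brief, no need to respond unless you want to ask clarifying questions."
)


def build_handoff_message(summary: str) -> str:
    """Assemble the opening message of the new thread: header, separator,
    then the summary the model produced in the previous thread."""
    return f"{HANDOFF_HEADER}\n---\n{summary}"


# Example: paste the model's summary response in place of this string.
opening_message = build_handoff_message("(The response from the previous thread)")
print(opening_message)
```

The same two strings could of course be fed to an API client instead of pasted by hand, but the manual copy-paste version works fine in the chat UI.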

The response is actually quite interesting in itself, as the model identifies ongoing issues that still need to be resolved. This seems to work well. It keeps the context clean and short, making the LLM respond more quickly and allowing me to send more prompts before I reach the quota. Of course, I could always upgrade to the next plan tier (100 USD a month), but I think keeping the context clean is good practice anyway, and should enable better responses.