This video explains how to add memory and context management to large language models (LLMs) within agentic workflows. The presenter, Saurav Prateek, demonstrates two primary methods: passing the entire chat history as context and passing a summarized version of the chat history. The goal is to enable LLMs to retain information from previous interactions, leading to more accurate and context-aware responses. The video includes a code walkthrough illustrating these concepts using LangChain.
The two main methods discussed for adding memory to LLMs are:

1. Passing the entire chat history as context with each query.
2. Passing a condensed summary of the chat history instead.

The limitation of the first method is the LLM's context window, which is finite. The full conversation cannot grow indefinitely, so you cannot pass an arbitrarily large amount of information at once.

The second method addresses this. Instead of sending the entire conversation, the history is condensed into a shorter summary, which is then provided to the LLM. This approach is more token-efficient and keeps the prompt within the LLM's context window.
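The summarization idea can be sketched in plain Python. This is a minimal illustration, not the video's actual LangChain code: the `summarize` helper below is a hypothetical stand-in for a real LLM summarization call, and it simply condenses older turns into one synthetic "summary" message while keeping the most recent turns verbatim.

```python
def summarize(history, max_turns=2):
    """Stand-in for an LLM summarization call: condense older turns
    into a single system message, keep the latest turns verbatim."""
    if len(history) <= max_turns:
        return list(history)
    older, recent = history[:-max_turns], history[-max_turns:]
    summary = "Summary of earlier conversation: " + "; ".join(
        turn["content"] for turn in older
    )
    return [{"role": "system", "content": summary}] + recent

history = [
    {"role": "user", "content": "John is a software engineer."},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "He lives in Berlin."},
    {"role": "assistant", "content": "Got it."},
]

condensed = summarize(history)
print(len(condensed))  # 3 messages instead of 4
```

In a real pipeline, the join in `summarize` would be replaced by a call to the LLM itself (for example, a "summarize this conversation" prompt), so the summary stays short even as the raw history grows.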
The preserve_chat_history flag is a boolean that controls whether the chat history is included when querying the model.
- True: the chat history is passed along with the current message, allowing the model to retain context from previous interactions.
- False: only the current message is sent; the model has no access to past conversations and treats each interaction independently.

Essentially, this flag lets you control the memory behavior of the agent, i.e. whether it should remember past exchanges or not.
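A minimal sketch of how such a flag might work, assuming an illustrative `Agent` class (the names here are hypothetical, not the video's exact code):

```python
class Agent:
    """Toy agent that optionally prepends stored chat history to each prompt."""

    def __init__(self, preserve_chat_history=True):
        self.preserve_chat_history = preserve_chat_history
        self.history = []

    def build_prompt(self, message):
        # When the flag is True, prior turns are prepended to the prompt;
        # when False, only the current message is sent.
        messages = list(self.history) if self.preserve_chat_history else []
        messages.append({"role": "user", "content": message})
        return messages

    def record(self, message, reply):
        # Store a completed exchange for future turns.
        self.history.append({"role": "user", "content": message})
        self.history.append({"role": "assistant", "content": reply})

# With memory: the earlier exchange is included in the prompt.
agent = Agent(preserve_chat_history=True)
agent.record("John is a software engineer.", "Noted.")
prompt = agent.build_prompt("What is John's profession?")
print(len(prompt))  # 3: two remembered turns plus the new question

# Without memory: only the new question is sent.
stateless = Agent(preserve_chat_history=False)
stateless.record("John is a software engineer.", "Noted.")
stateless_prompt = stateless.build_prompt("What is John's profession?")
print(len(stateless_prompt))  # 1: just the new question
```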
The video demonstrates the difference through two scenarios:
Without Memory: A user asks, "What is John's profession?" If the LLM does not have memory (i.e., the chat history is not preserved), it responds with something like "I don't have context on this," because it has no record of previous discussions about John.
With Memory: In a scenario where the LLM does have memory, the previous chat history indicates that "John is a software engineer." When the user asks the same question, "What is John's profession?", the LLM correctly responds, "John is a software engineer," because it can recall the information from its stored context.
This contrast highlights how memory allows the LLM to provide accurate and contextually relevant answers, whereas a stateless model cannot.
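The contrast between the two scenarios can be shown with a toy stand-in for the model (the string lookup below substitutes for a real LLM; the behavior mirrors the video's example):

```python
def answer(question, context=""):
    """Toy 'model': answers only when the supplied context contains the fact."""
    if "profession" in question and "software engineer" in context:
        return "John is a software engineer."
    return "I don't have context on this."

# Stateless: no history is supplied, so the model cannot answer.
no_memory = answer("What is John's profession?")
print(no_memory)

# With memory: the preserved history supplies the needed fact.
with_memory = answer("What is John's profession?",
                     context="John is a software engineer.")
print(with_memory)
```

The same question produces two different answers purely because of what context accompanies it, which is the core point of preserving chat history.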