This video features Angel Poon interviewing Zachary Huang, a PhD student at Columbia University, about his open-source AI tool, the AI Codebase Knowledge Builder, and its underlying framework, PocketFlow. The video demonstrates how these tools can help users understand complex codebases quickly by generating step-by-step tutorials and visualizing code architecture.
[{"start_ms":"320","end_ms":"7120","text":"what if you could have an AI that could explain any codebase to you in just 5 minutes and what if you could turn any","start_time_text":"0:00"},{"start_ms":"7120","end_ms":"13519","text":"GitHub repo into simple easy to follow step-by-step tutorials well today you're","start_time_text":"0:07"},{"start_ms":"13519","end_ms":"18640","text":"in for a very special treat we have Zachary Huang who built just such a tool","start_time_text":"0:13"},{"start_ms":"18640","end_ms":"25119","text":"and it's an open-source tool that anybody can download and use right away this tool enables an AI to read through","start_time_text":"0:18"},{"start_ms":"25119","end_ms":"31439","text":"entire code bases and then transform complicated code into step-by-step easy to","start_time_text":"0:25"},{"start_ms":"31439","end_ms":"38320","text":"follow tutorials whether you're an engineer that inherited some messy code or you're a vibe coder who wants to","start_time_text":"0:31"},{"start_ms":"38320","end_ms":"44000","text":"build cool projects with AI this is a godsend gift when I reached out to Zach","start_time_text":"0:38"},{"start_ms":"44000","end_ms":"49440","text":"I wanted to learn more about this tool but actually I realized that he was building something more grand than just","start_time_text":"0:44"},{"start_ms":"49440","end_ms":"56160","text":"this super tool he was building an AI framework with just 100 lines of code","start_time_text":"0:49"},{"start_ms":"56160","end_ms":"63440","text":"and this simple yet powerful framework will enable agentic coding meaning now","start_time_text":"0:56"},{"start_ms":"63440","end_ms":"69680","text":"AI agents can build AI agents isn't that super cool now let's hear it from Zach","start_time_text":"1:03"},{"start_ms":"69680","end_ms":"75280","text":"hi Zach it's so great to meet you can you give a quick introduction about yourself","start_time_text":"1:09"},{"start_ms":"75280","end_ms":"81439","text":"yeah 
thank you for having me here i'm Zach i'm a PhD student at Columbia University i've been working on database","start_time_text":"1:15"},{"start_ms":"81439","end_ms":"87360","text":"systems for over four years but over the past two years I've been working on large language model systems and I'm about","start_time_text":"1:21"},{"start_ms":"87360","end_ms":"94159","text":"to graduate and join Microsoft Research in a bit over a month again thank you very much i'm very glad and happy to","start_time_text":"1:27"},{"start_ms":"94159","end_ms":"100640","text":"chat about my recent works so now I came across you because of this AI codebase","start_time_text":"1:34"},{"start_ms":"100640","end_ms":"106640","text":"knowledge builder and I'm sure the people who are watching this right now would love to learn more so can you kind","start_time_text":"1:40"},{"start_ms":"106640","end_ms":"112000","text":"of tell us more about it how does one use it and is it something that I could","start_time_text":"1:46"},{"start_ms":"112000","end_ms":"117920","text":"just give you a GitHub link and have you generate this knowledge builder for me","start_time_text":"1:52"},{"start_ms":"117920","end_ms":"124320","text":"yeah so the idea here is whenever you are joining a new team or whenever you are looking at a new open source","start_time_text":"1:57"},{"start_ms":"124320","end_ms":"129679","text":"projects it's pretty overwhelming to just read through this code bases and understanding figure out what's going on","start_time_text":"2:04"},{"start_ms":"129679","end_ms":"137360","text":"under the hood so we just use AI to help you make a pass and help you generate such tutorial so we have already applied","start_time_text":"2:09"},{"start_ms":"137360","end_ms":"143200","text":"this tutorial across multiple popular GitHub repository we can just give you some example which one do you are you","start_time_text":"2:17"},{"start_ms":"143200","end_ms":"148959","text":"interested in just looking at so Zach 
is kind enough to let me choose any repos","start_time_text":"2:23"},{"start_ms":"148959","end_ms":"154400","text":"so I shared one which is a YouTube summarizer cuz I like to learn a lot","start_time_text":"2:28"},{"start_ms":"154400","end_ms":"161120","text":"from YouTube and sometimes I just want to summarize the YouTube videos so I can have the notes so I just shared a link","start_time_text":"2:34"},{"start_ms":"161120","end_ms":"168160","text":"with Zach yeah in order to run this new repository what you're going to do here is you're going to just follow the get","start_time_text":"2:41"},{"start_ms":"168160","end_ms":"173680","text":"started instruction here so we're going to clone this repo which you already done and we're going to install the","start_time_text":"2:48"},{"start_ms":"173680","end_ms":"180000","text":"dependency which I've done in the past we're going to set up the large language model calls so essentially in this models","start_time_text":"2:53"},{"start_ms":"180000","end_ms":"186080","text":"you're going to set up by providing your own project or if you can or if you are using this AI Studio you need to provide","start_time_text":"3:00"},{"start_ms":"186080","end_ms":"191760","text":"your API keys but if you are using Claude 3.7 or OpenAI o1 you just implement your","start_time_text":"3:06"},{"start_ms":"191760","end_ms":"197840","text":"own function that takes the string input and call the large language model return the response but after you set everything up","start_time_text":"3:11"},{"start_ms":"197840","end_ms":"206319","text":"essentially you just call this Python function here so here let's just copy paste this new","start_time_text":"3:17"},{"start_ms":"206319","end_ms":"212120","text":"repository URL to the page let's give it a name maybe let's call","start_time_text":"3:26"},{"start_ms":"212120","end_ms":"217360","text":"it YouTube summarizer","start_time_text":"3:32"},{"start_ms":"217360","end_ms":"222640","text":"and it will run it will so what 
it is currently doing here is it is currently","start_time_text":"3:37"},{"start_ms":"222640","end_ms":"229280","text":"crawling all the files from the GitHub repository and then it is calling the large language model to understand hey what's","start_time_text":"3:42"},{"start_ms":"229280","end_ms":"235680","text":"the most important concepts inside of this repository and how are we going to present this concepts in an easy to read","start_time_text":"3:49"},{"start_ms":"235680","end_ms":"240879","text":"way for the audience can I ask a question so you know how there's like a","start_time_text":"3:55"},{"start_ms":"240879","end_ms":"246879","text":"limit in context window so why is this tool that you're building able to digest","start_time_text":"4:00"},{"start_ms":"246879","end_ms":"253920","text":"the whole code base ignoring that that limit well the limit of the current large","start_time_text":"4:06"},{"start_ms":"253920","end_ms":"260720","text":"language model is actually pretty large it's let's take Gemini 2.5 Pro as example it currently has 1 million","start_time_text":"4:13"},{"start_ms":"260720","end_ms":"266800","text":"tokens it's more than enough to understand most of the codebase it's not really the context limit issue here but","start_time_text":"4:20"},{"start_ms":"266800","end_ms":"273520","text":"I think the issue is mostly like how do you manage the context because you can just dump a lot of options into the 1","start_time_text":"4:26"},{"start_ms":"273520","end_ms":"280320","text":"million token but there is a phenomenon called lost in the middle which means that the model will just neglect the","start_time_text":"4:33"},{"start_ms":"280320","end_ms":"286320","text":"middle part and will focus only on the start and end part of the context which is pretty human like when we human","start_time_text":"4:40"},{"start_ms":"286320","end_ms":"292000","text":"listen to these you know messages reading to post we just look at the beginning and ending regarding the 
part","start_time_text":"4:46"},{"start_ms":"292000","end_ms":"297120","text":"uh so what do we do here is we have this a workflow so let me just show you the","start_time_text":"4:52"},{"start_ms":"297120","end_ms":"302720","text":"the design of the of products so we start by identifying the high level","start_time_text":"4:57"},{"start_ms":"302720","end_ms":"308080","text":"abstraction relations concepts we're going to teach the users so this part we","start_time_text":"5:02"},{"start_ms":"308080","end_ms":"314000","text":"just take the whole the whole code into the context but then we're going to write a chapter one by one so here we","start_time_text":"5:08"},{"start_ms":"314000","end_ms":"319600","text":"already starting to write a chapter one for the front end application structure for this individual chapter writing it","start_time_text":"5:14"},{"start_ms":"319600","end_ms":"324800","text":"will only focus on the kind of the files inside of the repository that matters","start_time_text":"5:19"},{"start_ms":"324800","end_ms":"330720","text":"for these concepts so it doesn't take the whole repository it only takes those relevant and which one are relevant is","start_time_text":"5:24"},{"start_ms":"330720","end_ms":"336560","text":"also decided by large language model on the fly so you can keep focus on the most relevant part and the best quality you","start_time_text":"5:30"},{"start_ms":"336560","end_ms":"344560","text":"know chapters uh for the tutorial i see and you being like a system person","start_time_text":"5:36"},{"start_ms":"344560","end_ms":"349680","text":"you mentioned does your background make this easier for you than other people","start_time_text":"5:44"},{"start_ms":"349680","end_ms":"356720","text":"because this is like thinking about systems right yeah it's yeah it's like how you design a system when you have","start_time_text":"5:49"},{"start_ms":"356720","end_ms":"362479","text":"like a very large workloads previously maybe it's a compute or 
maybe a data","start_time_text":"5:56"},{"start_ms":"362479","end_ms":"369039","text":"crawling all kinds of these different systems you're going to identify what is the key bottleneck here what is the most complex part of the task here and how do","start_time_text":"6:02"},{"start_ms":"369039","end_ms":"374960","text":"we decompose how do we build a microservices that talks with each others each handling individual task","start_time_text":"6:09"},{"start_ms":"374960","end_ms":"380240","text":"here so I do think like this system designs or this mental model helps a lot","start_time_text":"6:14"},{"start_ms":"380240","end_ms":"386240","text":"in terms of designing large language model systems in this AI era and how does","start_time_text":"6:20"},{"start_ms":"386240","end_ms":"392400","text":"somebody gain this kind of mental models is there something that you would recommend for people who are interested","start_time_text":"6:26"},{"start_ms":"392400","end_ms":"398479","text":"in developing a better system mindset or do they just choose the AI","start_time_text":"6:32"},{"start_ms":"398479","end_ms":"404240","text":"and learn yeah I would say it's first of all the","start_time_text":"6:38"},{"start_ms":"404240","end_ms":"411360","text":"whole large model applications systems is pretty new concepts and I I'm still in the stage of explorations i'm still","start_time_text":"6:44"},{"start_ms":"411360","end_ms":"417919","text":"trying build different systems get my intuitions uh what's the trade-off what's the results from different","start_time_text":"6:51"},{"start_ms":"417919","end_ms":"424160","text":"designs so it's pretty early age there's no good tutorials for that and I'm trying to make tutorials based on what I","start_time_text":"6:57"},{"start_ms":"424160","end_ms":"430880","text":"have already learned but still a nascent area but I do think like traditional system designs is helpful in somehow","start_time_text":"7:04"},{"start_ms":"430880","end_ms":"437120","text":"like 
how do you design a service how do you build a application that help a bit","start_time_text":"7:10"},{"start_ms":"437120","end_ms":"444880","text":"I guess and also people can follow you on YouTube to learn more about your thinking and mental models yeah I'm very","start_time_text":"7:17"},{"start_ms":"444880","end_ms":"451120","text":"new to YouTube i just made YouTube over the past few weeks so I think I have a decent technical skill but my","start_time_text":"7:24"},{"start_ms":"451120","end_ms":"457039","text":"presentation is not that I'm a follower thank you thank I'm still learning i'm","start_time_text":"7:31"},{"start_ms":"457039","end_ms":"464400","text":"still trying to practice my skills in terms of YouTube so would greatly appreciate your support and criticisms","start_time_text":"7:37"},{"start_ms":"464400","end_ms":"470479","text":"yeah people who are watching follow him follow Zach thank you thank you yeah the large model","start_time_text":"7:44"},{"start_ms":"470479","end_ms":"477680","text":"call is a bit slow here but that's because we're using the best models we are trying to ask it to do a pretty","start_time_text":"7:50"},{"start_ms":"477680","end_ms":"482720","text":"complex task here actually one more question so the default here that cuz I","start_time_text":"7:57"},{"start_ms":"482720","end_ms":"490560","text":"know that it's like writing chapter 1 2 3 4 5 and the default model that you're recommending people to use is Gemini 2.5","start_time_text":"8:02"},{"start_ms":"490560","end_ms":"497680","text":"well I it it changes right Gemini 2.5 is the best model last week uh but so what","start_time_text":"8:10"},{"start_ms":"497680","end_ms":"503440","text":"is it this week yeah OpenAI just announced o3 and o4-mini i'm not sure how good they","start_time_text":"8:17"},{"start_ms":"503440","end_ms":"509039","text":"are i haven't tested them out but maybe they are better i don't know but just as of last week they are the best model 
i","start_time_text":"8:23"},{"start_ms":"509039","end_ms":"514240","text":"see so is it done or not done it is done let's check out the results here so we","start_time_text":"8:29"},{"start_ms":"514240","end_ms":"519839","text":"have the YouTube summarization here and we have the the results it says this","start_time_text":"8:34"},{"start_ms":"519839","end_ms":"525440","text":"project is a web application that use artificial intelligence to create a summary of YouTube videos user paste a","start_time_text":"8:39"},{"start_ms":"525440","end_ms":"531920","text":"link and choose options like the model the app fetches the video's transcript and use AI to generate a summary","start_time_text":"8:45"},{"start_ms":"531920","end_ms":"537600","text":"that shows the progress in real time and it even generates like a diagram as a","start_time_text":"8:51"},{"start_ms":"537600","end_ms":"544480","text":"system person I very much love this kind of diagram because text is sometimes just pretty linear but you really want","start_time_text":"8:57"},{"start_ms":"544480","end_ms":"550959","text":"to have this two dimensional understanding of what's going on and how this each components relate to each other so it's something I really emphasized","start_time_text":"9:04"},{"start_ms":"550959","end_ms":"556959","text":"when I was designing this actually people who are interested in the system thinking should use this tool because","start_time_text":"9:10"},{"start_ms":"556959","end_ms":"562399","text":"with this graph they're learning systems thinking right yeah yeah and yeah it's","start_time_text":"9:16"},{"start_ms":"562399","end_ms":"568080","text":"like I've encoded a lot of my way of how I write this tutorial inside of this","start_time_text":"9:22"},{"start_ms":"568080","end_ms":"573600","text":"prompt so I think the it so here I provide a different rules on what you","start_time_text":"9:28"},{"start_ms":"573600","end_ms":"580240","text":"should do when you're generating 
these chapters you should you should begin with a high level motivation if the abstraction is too complex break down to key","start_time_text":"9:33"},{"start_ms":"580240","end_ms":"586560","text":"concepts the code block you should also make it minimal make it simplified you should also understand you know you","start_time_text":"9:40"},{"start_ms":"586560","end_ms":"592560","text":"should also illustrate the complex concept using this Mermaid diagram so it's been encoded inside of the prompts and","start_time_text":"9:46"},{"start_ms":"592560","end_ms":"600240","text":"the model just pick it up and help generate this kind of the a very nicely visualized architecture for us using","start_time_text":"9:52"},{"start_ms":"600240","end_ms":"605760","text":"Mermaid diagrams wow that is so cool okay can I keep looking at this tutorial","start_time_text":"10:00"},{"start_ms":"605760","end_ms":"612160","text":"that's generated yes so we have different chapters so each box here correspond to a different","start_time_text":"10:05"},{"start_ms":"612160","end_ms":"619200","text":"chapters we start from the high level front end application structures so this is a TypeScript issue","start_time_text":"10:12"},{"start_ms":"619200","end_ms":"625600","text":"i'm not a really frontend person so I don't really know what's going on here but it breaks it down into the","start_time_text":"10:19"},{"start_ms":"625600","end_ms":"632399","text":"application the layouts the pages the the routes and also provide another nice","start_time_text":"10:25"},{"start_ms":"632399","end_ms":"639440","text":"sequence diagram on what's going on when people asking for YouTube summary and how does this different call different","start_time_text":"10:32"},{"start_ms":"639440","end_ms":"645120","text":"files inside of this application work together wow this is so cool it's such a","start_time_text":"10:39"},{"start_ms":"645120","end_ms":"651680","text":"good learning resource it's almost like how computer works and 
but this is like how this repo works and then it's","start_time_text":"10:45"},{"start_ms":"651680","end_ms":"658240","text":"breaking it down step by step exactly and like this is only chapter one here you also have this chapter two more on","start_time_text":"10:51"},{"start_ms":"658240","end_ms":"663440","text":"the UI sides chapter three how do you get this transcript from the YouTube","start_time_text":"10:58"},{"start_ms":"663440","end_ms":"668720","text":"audio chapter four the how does the AI comes into play and work with this","start_time_text":"11:03"},{"start_ms":"668720","end_ms":"676360","text":"transcript to generate a summary and you have the whole summarization pipeline for you so it's starting from the most","start_time_text":"11:08"},{"start_ms":"676360","end_ms":"681600","text":"user-facing section on the UI parts and then a step by step dig deeper and","start_time_text":"11:16"},{"start_ms":"681600","end_ms":"686959","text":"deeper into how does it work internally what's the workflow what's everything","start_time_text":"11:21"},{"start_ms":"686959","end_ms":"693120","text":"what's the back end pipelines working under the hood so dig deeper but also in","start_time_text":"11:26"},{"start_ms":"693120","end_ms":"698640","text":"a very progressive way organized way yeah also one question this is ran in","start_time_text":"11:33"},{"start_ms":"698640","end_ms":"706000","text":"Cursor so theoretically you can open the right hand side panel and chat with the agent on this right if I have a question","start_time_text":"11:38"},{"start_ms":"706000","end_ms":"711920","text":"about a class and I wanted more detail about it is it possible for me to ask","start_time_text":"11:46"},{"start_ms":"711920","end_ms":"719279","text":"about some concepts in here yeah we can just ask Cursor which also is Gemini 2.5","start_time_text":"11:51"},{"start_ms":"719279","end_ms":"725519","text":"Pro it is reading the file so it's not just purely ChatGPT actually 
contextualize","start_time_text":"11:59"},{"start_ms":"725519","end_ms":"731519","text":"your answers based on these tutorials and they're talking about how and how","start_time_text":"12:05"},{"start_ms":"731519","end_ms":"737760","text":"CSS work how shenan works and what's the difference here oh my god Zach this","start_time_text":"12:11"},{"start_ms":"737760","end_ms":"743760","text":"needs to be productized this is so valuable to people whoever is","start_time_text":"12:17"},{"start_ms":"743760","end_ms":"748959","text":"watching this are so lucky because now they know they understand another way to","start_time_text":"12:23"},{"start_ms":"748959","end_ms":"755440","text":"learn and this is so cool i don't know if people are recognizing the value in","start_time_text":"12:28"},{"start_ms":"755440","end_ms":"762880","text":"this tool right here yeah if you have a new codebase you can first ask it to generate a tutorial then step side by","start_time_text":"12:35"},{"start_ms":"762880","end_ms":"768880","text":"side you have a teacher along you if you have any question in the middle you can just ask through a chatbot and it will","start_time_text":"12:42"},{"start_ms":"768880","end_ms":"775839","text":"contextualize your questions in the context of this tutorials it can even do some web search it can help you analyze and","start_time_text":"12:48"},{"start_ms":"775839","end_ms":"780959","text":"generate such a great answer you can even customize this tutorial generation","start_time_text":"12:55"},{"start_ms":"780959","end_ms":"786800","text":"workflow based on your own needs yeah you can for instance if you have your own knowledge if you have your own","start_time_text":"13:00"},{"start_ms":"786800","end_ms":"794800","text":"writing styles you can even add more images to this tutorial based on your demands and this is open source so","start_time_text":"13:06"},{"start_ms":"794800","end_ms":"800160","text":"people can either contribute to it or they can just like 
download it and clone","start_time_text":"13:14"},{"start_ms":"800160","end_ms":"807519","text":"it actually what you did is just you just use one command and pasted the GitHub repo link in there and voila this","start_time_text":"13:20"},{"start_ms":"807519","end_ms":"814639","text":"is there yeah it's very simple it's fully open sourced we provide example GitHub repository coming from AutoGen","start_time_text":"13:27"},{"start_ms":"814639","end_ms":"821440","text":"Browser Use CrewAI DSPy NumPy Requests any kind of the popular repository you","start_time_text":"13:34"},{"start_ms":"821440","end_ms":"827120","text":"can use this tool to generate a very friendly tutorials and you can just clone this repos install and set up your","start_time_text":"13:41"},{"start_ms":"827120","end_ms":"833200","text":"own large models and you can just paste your any repository you want to learn about just run this a single lines of","start_time_text":"13:47"},{"start_ms":"833200","end_ms":"840000","text":"the command and it will generate a tutorial for you in around five minutes it's fully open source and actually we","start_time_text":"13:53"},{"start_ms":"840000","end_ms":"846000","text":"just get a lot of different pull requests over the past few weeks people asking for multi-language support","start_time_text":"14:00"},{"start_ms":"846000","end_ms":"851040","text":"people asking to making the call a model friendly by integrating and providing","start_time_text":"14:06"},{"start_ms":"851040","end_ms":"858000","text":"sample of different models adding virtual environment support local Git repository and so on and uh you know","start_time_text":"14:11"},{"start_ms":"858000","end_ms":"864639","text":"that's so cool it's free i can't believe it yeah and I will appreciate your contribution for any of the new features","start_time_text":"14:18"},{"start_ms":"864639","end_ms":"873040","text":"you want and just send a post to our review and just send a new PR and I will review and 
merges oh my goodness this is","start_time_text":"14:24"},{"start_ms":"873040","end_ms":"878639","text":"such a treat this is crazy i'm so happy to learn about this and this product i'm","start_time_text":"14:33"},{"start_ms":"878639","end_ms":"885519","text":"pretty sure I'm going to use it a lot because I just a lot of times I'm vibe coding and sometimes I have no idea what","start_time_text":"14:38"},{"start_ms":"885519","end_ms":"891199","text":"this thing is doing and I just could use some help and I think this is like the","start_time_text":"14:45