{"id":8325,"date":"2025-11-10T21:00:08","date_gmt":"2025-11-10T13:00:08","guid":{"rendered":"https:\/\/people.utm.my\/shahabuddin\/?p=8325"},"modified":"2025-11-11T09:56:38","modified_gmt":"2025-11-11T01:56:38","slug":"how-i-built-a-custom-ai-chatbot-for-my-university-courses","status":"publish","type":"post","link":"https:\/\/people.utm.my\/shahabuddin\/?p=8325","title":{"rendered":"How I Built a Custom AI Chatbot for My University Courses"},"content":{"rendered":"\n<p>It\u2019s 2 AM. A student is staring at a folder full of lecture slides, project guidelines, and 80-page PDFs. They have a specific question: &#8220;What are the submission requirements for Project Phase 1?&#8221; They know the answer is&nbsp;<em>somewhere<\/em>&nbsp;in those files, but where?<\/p>\n\n\n\n<p>We\u2019ve all been there. As an educator, I see this problem all the time. Students have the materials, but&nbsp;<strong>information retrieval<\/strong>&nbsp;is a huge, unaddressed challenge.<\/p>\n\n\n\n<p>So, I asked myself: &#8220;What if I could build a 24\/7 AI teaching assistant for each of my courses? One that&nbsp;<em>only<\/em>&nbsp;knows about that specific course&#8217;s materials?&#8221;<\/p>\n\n\n\n<p>That&#8217;s exactly what I did. The result is a lightweight, custom-built, multi-course AI chatbot. 
Here&#8217;s how I built it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The &#8220;Aha!&#8221; Moment: Why Not Just Use ChatGPT?<\/h3>\n\n\n\n<p>The first question is obvious: &#8220;Why not just use an off-the-shelf AI?&#8221;<\/p>\n\n\n\n<p>The simple answer:&nbsp;<strong>No public AI knows about my private course materials.<\/strong>&nbsp;I can&#8217;t ask it about the &#8220;SBEG3163 System Analysis&#8221; syllabus or the &#8220;MBEX1013 Research Methodology&#8221; project guide.<\/p>\n\n\n\n<p>The solution is a technique called&nbsp;<strong>Retrieval-Augmented Generation (RAG)<\/strong>.<\/p>\n\n\n\n<p>In plain English, RAG means we &#8220;teach&#8221; the AI by handing it a cheat sheet&nbsp;<em>before<\/em>&nbsp;it answers. The process is simple:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Retrieve:<\/strong>&nbsp;When a student asks a question, we first search our&nbsp;<em>own<\/em>&nbsp;documents for the most relevant passages (the cheat sheet).<\/li>\n\n\n\n<li><strong>Augment:<\/strong>&nbsp;We paste those passages into a prompt for the AI, alongside the student&#8217;s question.<\/li>\n\n\n\n<li><strong>Generate:<\/strong>&nbsp;We tell the AI, &#8220;Using&nbsp;<em>only<\/em>&nbsp;this text I just gave you, answer the student&#8217;s question.&#8221;<\/li>\n<\/ol>\n\n\n\n<p>This gives us the language skills of a large model like Google&#8217;s Gemini, with its answers grounded in the facts of our own private course materials.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The &#8220;Franken-Stack&#8221;: Simple, Powerful Tech<\/h3>\n\n\n\n<p>I didn&#8217;t want a heavy, expensive, complicated system. I wanted something that could run on any basic web host, so I built a &#8220;vanilla&#8221; stack that turned out to be surprisingly powerful:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Frontend:<\/strong>&nbsp;Simple, &#8220;vanilla&#8221; HTML, CSS, and JavaScript. No React, no Vue. 
Just&nbsp;<code>chat.js<\/code>&nbsp;and a&nbsp;<code>fetch()<\/code>&nbsp;call.<\/li>\n\n\n\n<li><strong>Backend:<\/strong>&nbsp;Good ol&#8217; PHP (<code>api.php<\/code>). Why? Because it&#8217;s&nbsp;<em>everywhere<\/em>, it&#8217;s simple, and it&#8217;s perfect for acting as a middle-man between the browser and the Gemini API.<\/li>\n\n\n\n<li><strong>The &#8220;Brain&#8221;:<\/strong>&nbsp;The&nbsp;<strong>Google Gemini API<\/strong>. We send our RAG prompt to it, and it generates a human-like answer.<\/li>\n\n\n\n<li><strong>The &#8220;Database&#8221;:<\/strong>&nbsp;A folder of&nbsp;<strong>JSON files<\/strong>&nbsp;(<code>knowledge\/<\/code>). This &#8220;NoDB&#8221; approach means there&#8217;s no database server to set up. Each course gets its own&nbsp;<code>course_code.json<\/code>&nbsp;file.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How It Works: The Two-Part System<\/h3>\n\n\n\n<p>The whole project is split into two distinct parts:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Part 1: The &#8220;Librarian&#8221; (Offline Indexing)<\/h4>\n\n\n\n<p>Before the chatbot can answer questions, someone has to read all the books. That job belongs to our&nbsp;<code>indexer.php<\/code>&nbsp;script.<\/p>\n\n\n\n<p>This is a password-protected admin page that I run&nbsp;<strong>once<\/strong>&nbsp;each time I upload new materials. 
It:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Scans<\/strong>&nbsp;the&nbsp;<code>materials\/<\/code>&nbsp;folder for all course subfolders (like&nbsp;<code>materials\/mbex1013\/<\/code>).<\/li>\n\n\n\n<li><strong>Reads<\/strong>&nbsp;all the&nbsp;<code>.pdf<\/code>&nbsp;and&nbsp;<code>.docx<\/code>&nbsp;files inside each one.<\/li>\n\n\n\n<li><strong>Parses &amp; Chunks<\/strong>&nbsp;the extracted text into small, overlapping passages.<\/li>\n\n\n\n<li><strong>Saves<\/strong>&nbsp;each course&#8217;s chunks into its own clean JSON file (e.g.,&nbsp;<code>knowledge\/mbex1013.json<\/code>).<\/li>\n<\/ol>\n\n\n\n<p>This&nbsp;<code>knowledge\/<\/code>&nbsp;folder becomes our &#8220;library,&#8221; and the JSON files are the &#8220;index cards&#8221; for our AI.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Part 2: The &#8220;Concierge&#8221; (Real-time Chat)<\/h4>\n\n\n\n<p>Here is what happens when a student asks a question:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li>A student clicks on the &#8220;Research Methodology&#8221; card on our landing page. This sends them to&nbsp;<code>chat.html?course=mbex1013<\/code>.<\/li>\n\n\n\n<li>The&nbsp;<code>chat.js<\/code>&nbsp;script on that page reads the&nbsp;<code>course<\/code>&nbsp;code from the URL.<\/li>\n\n\n\n<li>The student asks, &#8220;What is a literature review?&#8221;<\/li>\n\n\n\n<li><code>chat.js<\/code>&nbsp;sends the question and the course code to our backend,&nbsp;<code>api.php<\/code>, with a&nbsp;<code>fetch<\/code>&nbsp;request.<\/li>\n\n\n\n<li><code>api.php<\/code>&nbsp;wakes up. 
It sees the course code&nbsp;<code>mbex1013<\/code>&nbsp;and&nbsp;<strong>loads only&nbsp;<code>knowledge\/mbex1013.json<\/code><\/strong>.<\/li>\n\n\n\n<li>It searches this file (by keyword matching) for the chunks most relevant to &#8220;literature review.&#8221;<\/li>\n\n\n\n<li>It builds a big prompt for Google&#8217;s AI: &#8220;Hey Gemini, using this text about literature reviews [&#8230;], please answer the student&#8217;s question.&#8221;<\/li>\n\n\n\n<li>Gemini sends back an HTML-formatted answer.<\/li>\n\n\n\n<li><code>api.php<\/code>&nbsp;passes that answer back to the student&#8217;s browser.<\/li>\n\n\n\n<li>The answer appears in the chat box.<\/li>\n<\/ol>\n\n\n\n<p>The entire round trip takes about 2-3 seconds.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Secret Sauce: Keeping Courses Separate<\/h3>\n\n\n\n<p>The most important feature is that the &#8220;Research Methodology&#8221; bot knows&nbsp;<em>nothing<\/em>&nbsp;about &#8220;System Analysis.&#8221; The knowledge is strictly&nbsp;<strong>siloed<\/strong>.<\/p>\n\n\n\n<p>This separation comes from that simple URL parameter (<code>?course=...<\/code>). The&nbsp;<code>api.php<\/code>&nbsp;script acts as a deliberately simple bouncer: it&nbsp;<em>only<\/em>&nbsp;loads the single JSON file it&#8217;s told to. This prevents &#8220;cross-contamination&#8221; and keeps each answer focused on the&nbsp;<em>one<\/em>&nbsp;course the student is asking about.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What I Learned (The Good and The Bad)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Good:<\/strong>&nbsp;It works! It\u2019s fast, incredibly cheap (basic PHP hosting and pay-as-you-go API calls), and fully custom. I have 100% control over the UI, the prompts, and the data.<\/li>\n\n\n\n<li><strong>The Bad (Limitation 1):<\/strong>&nbsp;The indexing is&nbsp;<strong>manual<\/strong>. 
If I update a PDF, I have to&nbsp;<em>remember<\/em>&nbsp;to go to&nbsp;<code>indexer.php<\/code>&nbsp;and run it again. The next step is to automate this with a cron job.<\/li>\n\n\n\n<li><strong>The Bad (Limitation 2):<\/strong>&nbsp;My search is basic (keyword matching). A &#8220;smarter&#8221; bot would use&nbsp;<strong>vector embeddings<\/strong>&nbsp;(numeric representations of text that let you search by&nbsp;<em>meaning<\/em>, not just keywords). This is the next major upgrade.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">See It Live!<\/h3>\n\n\n\n<p>I\u2019m a big believer in building simple tools that solve real problems. This project went from an idea to a deployed, working tool in just a few days.<\/p>\n\n\n\n<p>You can see the live project, fully functional, right here:\u00a0<strong><a href=\"https:\/\/dev.kstutm.com\/chatbot\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/dev.kstutm.com\/chatbot\/<\/a><\/strong><\/p>\n\n\n\n<p>Feel free to click on a course and ask it a question! (Just remember: it&nbsp;<em>only<\/em>&nbsp;knows what&#8217;s in the materials I gave it.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It\u2019s 2 AM. A student is staring at a folder full of lecture slides, project guidelines, and 80-page PDFs. They have a specific question: &#8220;What are the submission requirements for Project Phase 1?&#8221; They know the answer is&nbsp;somewhere&nbsp;in those files, but where? We\u2019ve all been there. 
As an educator, I see this problem all the [&hellip;]<\/p>\n","protected":false},"author":6946,"featured_media":8331,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[1285,24,111,5],"tags":[203,1297,376],"class_list":["post-8325","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development","category-knowledge","category-project","category-teaching","tag-ai","tag-chatbot","tag-course"],"uagb_featured_image_src":{"full":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM.png",1177,812,false],"thumbnail":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM-150x150.png",150,150,true],"medium":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM-300x207.png",300,207,true],"medium_large":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM-768x530.png",640,442,true],"large":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM-1024x706.png",640,441,true],"1536x1536":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM.png",1177,812,false],"2048x2048":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM.png",1177,812,false],"slider-thumb":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM-542x352.png",542,352,true],"pop-thumb":["https:\/\/people.utm.my\/shahabuddin\/wp-content\/uploads\/sites\/890\/2025\/11\/Screenshot-2025-11-10-at-12.46.07-PM-542x340.png",542,340,true]},"uagb_author
_info":{"display_name":"Dr. Shah","author_link":"https:\/\/people.utm.my\/shahabuddin\/?author=6946"},"uagb_comment_info":0,"uagb_excerpt":"It\u2019s 2 AM. A student is staring at a folder full of lecture slides, project guidelines, and 80-page PDFs. They have a specific question: &#8220;What are the submission requirements for Project Phase 1?&#8221; They know the answer is&nbsp;somewhere&nbsp;in those files, but where? We\u2019ve all been there. As an educator, I see this problem all the&hellip;","_links":{"self":[{"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/posts\/8325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/users\/6946"}],"replies":[{"embeddable":true,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8325"}],"version-history":[{"count":3,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/posts\/8325\/revisions"}],"predecessor-version":[{"id":8333,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/posts\/8325\/revisions\/8333"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=\/wp\/v2\/media\/8331"}],"wp:attachment":[{"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/people.utm.my\/shahabuddin\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]
}}