Winter LLM Bootcamp

What is Retrieval Augmented Generation (RAG)

Last updated 1 year ago

Large Language Models (LLMs) like GPT-4 or Mistral-7b are extraordinary in many ways, yet they come with challenges.

For now, let's focus on one specific limitation: the timeliness of their data. Because these models are trained on data up to a particular cut-off date, they know nothing about what happened afterwards, which makes them poorly suited to real-time or organization-specific information.

Imagine you're a developer architecting an LLM-enabled app for Amazon. You're aiming to support shoppers as they comb through Amazon for the latest deals on sneakers. Naturally, you want to furnish them with the most current offers available. After all, nobody wants to rely on outdated information, and the same holds for data queried from your LLM.

This is where Retrieval-Augmented Generation, commonly known as RAG, significantly improves the capabilities of LLMs.

Think of RAG as a resourceful friend in an exam or during a speech who (figuratively speaking, of course) swiftly passes you the most relevant "cue card" out of a ton of information, so you know what to write or say next.

With RAG, efficient retrieval of the most relevant data for your use case ensures the text generated is both current and substantiated.

RAG, as its name indicates, operates through a three-fold process:

  • Retrieval: It sources pertinent information.

  • Augmentation: This information is then added to the model's initial input.

  • Generation: Finally, the LLM utilizes this augmented input to create an informed output.

Simply put, RAG empowers LLMs to include real-time, reliable data from external databases in their generated text.
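
The three steps can be sketched in a few lines of Python. This is only a toy illustration, not how a production system works: the `DOCUMENTS` list, the word-overlap scoring in `retrieve`, and the `generate` stub are all assumptions made for this example. A real application would typically retrieve with vector similarity search and call an actual LLM in the generation step.

```python
import re

# Hypothetical, hand-written "knowledge base" standing in for a live deals feed.
DOCUMENTS = [
    "Deal: Trail running sneakers are 30% off until Friday.",
    "Deal: Wireless headphones are 15% off this week.",
    "Policy: Returns are accepted within 30 days of delivery.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Retrieval: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def augment(query, context_docs):
    """Augmentation: add the retrieved text to the model's initial input."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def generate(prompt):
    """Generation: placeholder only. Swap in a call to your LLM of choice."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


query = "What are the latest deals on sneakers?"
context_docs = retrieve(query, DOCUMENTS)
prompt = augment(query, context_docs)
answer = generate(prompt)
print(prompt)
```

Printing `prompt` shows the point of the augmentation step: the model now receives the current sneaker deal alongside the shopper's question, instead of having to rely on whatever it memorized before its training cut-off.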

For a better explanation, check out this video by Marina Danilevsky, Senior Research Staff Member at IBM Research, in which she describes two challenges with LLMs that Retrieval-Augmented Generation helps resolve.

(Credits: IBM Technology)
Perhaps that wasn't the perfect example, but you get the point. 😄