Retrofitting our Dropbox app



As you dive deeper into this journey, remember the cornerstone of this bootcamp: to publish an open-source project complete with a detailed README. This project should not only utilize the frameworks we've covered but also aim to solve meaningful problems.

Retrofitting, in our context, means creatively adapting an existing tool for new and innovative applications. This section shows how you, as a builder, can take something as simple as the Dropbox AI Chat app and use it to make sense of something complex, such as the EU AI Act.

Below, you can explore a video tutorial by Avril Aysha, a Developer Advocate in the stream data processing space, demonstrating how she harnessed the Dropbox document sync application to create a RAG app.

Link to the Project

  • The repository being referred to can be found here: https://github.com/pathway-labs/dropbox-ai-chat. Make sure to star it. ⭐

  • If you struggle to build the application with the help of the README on the GitHub repo above, the video and the description below should help.

  • Please note that Conda is used here instead of Docker. Depending on your comfort level with these tools, you can choose either.

Quick intro to the problem solved: Navigating the maze of new regulations, like the EU AI Act, can be a complex challenge for founders and data practitioners. This app, which builds on the Dropbox example, aims to make understanding these regulations more straightforward. Imagine a tool that helps you dissect and comprehend these intricate policies, easing the process of staying compliant and informed. As you explore this application, think of the diverse scenarios you can unlock with just the Dropbox AI Chat example we're seeing here.

Connecting the Dots

  • The prompt is processed into embeddings and used as embedded_query.

  • The data coming from our data source (i.e., Dropbox) is split into smaller chunks with the help of Pathway (pw), then converted into embeddings and stored in index.

  • Using these, we build the augmented prompt from the retrieved information and feed it to GPT-3.5 Turbo.
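Before looking at the full pipeline below, it may help to see what the chunking step does conceptually. The sketch that follows is an illustrative, plain-Python stand-in for the repo's chunk_texts helper (the function name and the character budget are assumptions, not Pathway's actual implementation): it splits extracted text into word-boundary chunks under a fixed size.

```python
def chunk_texts(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on word boundaries."""
    chunks, current, length = [], [], 0
    for word in text.split():
        # Start a new chunk if adding this word would exceed the budget
        if length + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk is then embedded separately, so retrieval later operates on small, focused pieces of the source documents rather than whole files.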

# Real-time data coming from external unstructured data sources like a PDF file
input_data = pw.io.fs.read(
    dropbox_folder_path,
    mode="streaming",
    format="binary",
    autocommit_duration_ms=50,
)

# Chunk input data into smaller documents
documents = input_data.select(texts=extract_texts(pw.this.data))
documents = documents.select(chunks=chunk_texts(pw.this.texts))
documents = documents.flatten(pw.this.chunks).rename_columns(chunk=pw.this.chunks)

# Compute embeddings for each document using the OpenAI Embeddings API
embedded_data = embeddings(context=documents, data_to_embed=pw.this.chunk)

# Construct an index on the generated embeddings in real-time
index = index_embeddings(embedded_data)

# Generate embeddings for the query from the OpenAI Embeddings API
embedded_query = embeddings(context=query, data_to_embed=pw.this.query)

# Build prompt using indexed data
responses = prompt(index, embedded_query, pw.this.query)

# Feed the prompt to ChatGPT and obtain the generated answer.
response_writer(responses)

# Run the pipeline
pw.run()
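Under the hood, the index_embeddings and prompt steps above boil down to a nearest-neighbour search over the chunk embeddings followed by prompt construction. Here is a minimal, library-free sketch of that idea; the function names, the list-based index, and the prompt template are illustrative assumptions, not Pathway's actual API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(embedded_query: list[float], index: list, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query.

    index is a list of (chunk_text, embedding) pairs.
    """
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(embedded_query, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the augmented prompt from retrieved context and the user query."""
    context = "\n".join(context_chunks)
    return f"Given the following context:\n{context}\nAnswer this question: {query}"
```

The augmented prompt produced this way is what gets sent to the chat model, which is why the quality of retrieval directly determines the quality of the final answer.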

Following the guidelines in the referenced repository, you can deploy the Dropbox AI Chat tool to help users navigate and interpret the complexities of the European Union AI Act. Although merely placing documents in a Dropbox folder doesn't transform this into a "real-time" Large Language Model (LLM) application (a crucial aspect of this bootcamp), building this static solution to address a specific issue is a significant step toward understanding, conceptualizing, and developing relevant Retrieval-Augmented Generation (RAG)/LLM applications.

Onto the next challenge.

If you look closely at the repo linked above and visit api.py, you'll be able to connect the dots from what we've learned so far.

You can pat yourself on the back if you've come this far from scratch. 😄