Retrofitting our Dropbox app



As you dive deeper into this journey, remember the cornerstone of this bootcamp: to publish an open-source project complete with a detailed README. This project should not only utilize the frameworks we've covered but also aim to solve meaningful problems.

Retrofitting, in our context, means creatively adapting an existing tool for new and innovative applications. This section shows how you, as a builder, can take something as simple as the Dropbox AI Chat app and use it to make sense of something complex, such as the EU AI Act.

Below, you can explore a video tutorial by Avril Aysha, a Developer Advocate in the stream data processing space, demonstrating how she harnessed the Dropbox document sync application to create a RAG app.

Link to the Project

  • The repository being referred to can be found here: https://github.com/pathway-labs/dropbox-ai-chat. Make sure to star it. ⭐

  • If you struggle to build the application with the help of the README on the GitHub repo above, the video and the description below should help.

  • Please note that Conda is used here instead of Docker. Depending on your comfort level with these tools, you can choose either.

Quick intro to the problem solved: Navigating the maze of new regulations, like the EU AI Act, can be a complex challenge for founders and data practitioners. This app, which builds on the Dropbox example, aims to make understanding these regulations more straightforward. Imagine a tool that helps you dissect and comprehend these intricate policies, easing the process of staying compliant and informed. As you explore this application, think of the diverse scenarios you can unlock with just the Dropbox AI Chat example we're seeing here.

Connecting the Dots

  • The prompt is processed into embeddings and used as embedded_query.

  • The data coming from our data source (i.e., Dropbox) is split into smaller chunks with the help of Pathway (pw), then converted into embeddings and stored in index.

  • Using these, we build the augmented prompt from the retrieved information and feed it to GPT-3.5 Turbo.
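Before looking at the full pipeline below, it may help to see what the chunking step does conceptually. The sketch that follows is an illustrative, plain-Python stand-in for the repo's chunk_texts helper (the function name and the character budget are assumptions, not Pathway's actual implementation): it splits extracted text into word-boundary chunks under a fixed size.

```python
def chunk_texts(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on word boundaries."""
    chunks, current, length = [], [], 0
    for word in text.split():
        # Start a new chunk if adding this word would exceed the budget
        if length + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk is then embedded separately, so retrieval later operates on small, focused pieces of the source documents rather than whole files.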

# Real-time data coming from external unstructured data sources like a PDF file
input_data = pw.io.fs.read(
    dropbox_folder_path,
    mode="streaming",
    format="binary",
    autocommit_duration_ms=50,
)

# Chunk input data into smaller documents
documents = input_data.select(texts=extract_texts(pw.this.data))
documents = documents.select(chunks=chunk_texts(pw.this.texts))
documents = documents.flatten(pw.this.chunks).rename_columns(chunk=pw.this.chunks)

# Compute embeddings for each document using the OpenAI Embeddings API
embedded_data = embeddings(context=documents, data_to_embed=pw.this.chunk)

# Construct an index on the generated embeddings in real-time
index = index_embeddings(embedded_data)

# Generate embeddings for the query from the OpenAI Embeddings API
embedded_query = embeddings(context=query, data_to_embed=pw.this.query)

# Build prompt using indexed data
responses = prompt(index, embedded_query, pw.this.query)

# Feed the prompt to ChatGPT and obtain the generated answer.
response_writer(responses)

# Run the pipeline
pw.run()
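Under the hood, the index_embeddings and prompt steps above boil down to a nearest-neighbour search over the chunk embeddings followed by prompt construction. Here is a minimal, library-free sketch of that idea; the function names, the list-based index, and the prompt template are illustrative assumptions, not Pathway's actual API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(embedded_query: list[float], index: list, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query.

    index is a list of (chunk_text, embedding) pairs.
    """
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(embedded_query, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the augmented prompt from retrieved context and the user query."""
    context = "\n".join(context_chunks)
    return f"Given the following context:\n{context}\nAnswer this question: {query}"
```

The augmented prompt produced this way is what gets sent to the chat model, which is why the quality of retrieval directly determines the quality of the final answer.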

Following the guidelines in the referenced repository, you can deploy the Dropbox AI Chat tool to help users navigate and interpret the complexities of the European Union AI Act. Although merely placing documents in a Dropbox folder doesn't transform this into a "real-time" Large Language Model (LLM) application (a crucial aspect of this bootcamp), building this static solution to address a specific issue is a significant step toward understanding, conceptualizing, and developing relevant Retrieval-Augmented Generation (RAG)/LLM applications.

Onto the next challenge.

If you look closely at the repo linked above and visit api.py, you'll be able to connect the dots from what we've learned so far.

You can pat yourself on the back if you've come this far from scratch. 😄