RAG with ChromaDB

ChromaDB is an open-source vector database designed to make it easy to build AI applications with embeddings. This guide shows you how to integrate ChromaDB into your Cycls agent to build a Retrieval-Augmented Generation (RAG) workflow. You will learn how to:

Add ChromaDB as a dependency.
Store and query document embeddings.
Retrieve context to use in your agent’s response.

Prerequisites

Python 3.9+
cycls package installed
Docker installed (for local testing)
OpenAI API key

pip install cycls

Step 1: Create the Agent

Create a new file called app.py and set up your agent with ChromaDB and OpenAI dependencies:

import cycls

@cycls.app(pip=["chromadb", "openai"], copy=[".env"])
async def app(context):
    import chromadb
    from chromadb.utils import embedding_functions
    import os

    # 1. Setup OpenAI Embedding Function
    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.getenv("OPENAI_API_KEY"),
        model_name="text-embedding-3-small"
    )

    # 2. Initialize ChromaDB client
    client = chromadb.Client()

    # 3. Create/Get collection with specific embedding function
    collection = client.get_or_create_collection(
        name="docs",
        embedding_function=openai_ef
    )

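    # 4. Add documents once (embeddings are generated automatically via OpenAI).
    #    The guard skips re-adding and re-embedding them on every message.
    if collection.count() == 0:
        collection.add(
            documents=["I love cats", "I love dogs", "The weather is nice"],
            ids=["1", "2", "3"]
        )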

    # 5. Get user query
    query = context.messages[-1]["content"]

    # 6. Perform similarity search
    results = collection.query(
        query_texts=[query],
        n_results=1
    )

    # 7. Return the retrieved context
    retrieved_doc = results['documents'][0][0]
    yield f"Found context: {retrieved_doc}"

app.local()
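
The agent above only echoes the single closest document. To use retrieved context in a response, one option is to fold the documents into a prompt before calling a language model. The following is a minimal sketch, not part of Cycls or ChromaDB: the helper names `retrieved_documents` and `build_prompt` are hypothetical, and `sample_results` is a hand-written stand-in (with illustrative distance values) for the nested lists that `collection.query(...)` returns, one inner list per query text.

```python
def retrieved_documents(results, query_index=0):
    """Return the documents retrieved for one query, or [] if there are none."""
    per_query = results.get("documents") or []
    if query_index >= len(per_query):
        return []
    return list(per_query[query_index])


def build_prompt(question, documents):
    """Combine retrieved documents and the user question into one prompt string."""
    if not documents:
        return f"Question: {question}"
    context_block = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


# Hand-written stand-in for a query result with n_results=2 (values are illustrative).
sample_results = {
    "ids": [["1", "2"]],
    "documents": [["I love cats", "I love dogs"]],
    "distances": [[0.21, 0.35]],
}

docs = retrieved_documents(sample_results)
prompt = build_prompt("Which pets do I like?", docs)
print(prompt)
```

Inside the agent, `results` from step 6 could be passed to `retrieved_documents` in place of `sample_results`, and the resulting prompt sent to whichever model you use.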

Step 2: Set Up Environment

Create a .env file with your OpenAI API key:

OPENAI_API_KEY=sk-proj-...
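
If the key is missing, `os.getenv("OPENAI_API_KEY")` returns `None` and the failure only surfaces later, when the embedding function is built or first called. A small check can make that error clearer. This is a sketch under the assumption that the variable reaches the process environment, as the Step 1 code expects; `require_env` is a hypothetical helper, not a Cycls or ChromaDB function.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = (environ.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demonstration with a fake environment so the example runs without a real key.
fake_env = {"OPENAI_API_KEY": "placeholder-key"}
print(require_env("OPENAI_API_KEY", fake_env))
```

In app.py, `api_key=require_env("OPENAI_API_KEY")` could replace the `os.getenv(...)` call.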

Step 3: Run the Agent

Execute your agent script:

python app.py

Cycls builds a local Docker image and starts your agent. You can then chat with it to test the semantic search: each message returns the stored document closest in meaning to your query.
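
To make the idea of semantic search concrete, here is a toy illustration of nearest-neighbor lookup over vectors. The three-dimensional vectors are invented by hand for this example; they are not real OpenAI embeddings, and ChromaDB's actual index and distance metric may differ from the cosine similarity used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs):
    """Return the document whose vector is most similar to the query vector."""
    return max(doc_vecs, key=lambda doc: cosine_similarity(query_vec, doc_vecs[doc]))


# Hand-made toy vectors standing in for embeddings of the three documents.
toy_vectors = {
    "I love cats": [0.9, 0.1, 0.0],
    "I love dogs": [0.1, 0.9, 0.0],
    "The weather is nice": [0.0, 0.0, 1.0],
}

# Pretend embedding of a query about kittens: close to the "cats" direction.
toy_query = [0.8, 0.2, 0.1]
print(nearest(toy_query, toy_vectors))
```

With real embeddings the vectors have many more dimensions, but the principle is the same: the query is embedded, and the closest stored vectors determine which documents come back.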

Full Code

Here is the complete app.py file:

import cycls

@cycls.app(pip=["chromadb", "openai"], copy=[".env"])
async def app(context):
    import chromadb
    from chromadb.utils import embedding_functions
    import os

    # Setup OpenAI Embedding Function
    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.getenv("OPENAI_API_KEY"),
        model_name="text-embedding-3-small"
    )

    # Initialize ChromaDB client
    client = chromadb.Client()

    # Create collection with the embedding function
    collection = client.get_or_create_collection(
        name="docs",
        embedding_function=openai_ef
    )

    # Add documents once; the guard skips re-adding and re-embedding them on every message
    if collection.count() == 0:
        collection.add(
            documents=["I love cats", "I love dogs", "The weather is nice"],
            ids=["1", "2", "3"]
        )

    # Query using the latest message
    query = context.messages[-1]["content"]
    results = collection.query(query_texts=[query], n_results=1)

    # Return retrieved context (same output format as in Step 1)
    retrieved_doc = results['documents'][0][0]
    yield f"Found context: {retrieved_doc}"

app.local()
