6/10/24
In this tutorial, you will learn how to:
Use the RAG (Retrieval-Augmented Generation) framework for context retrieval and answer generation.
Integrate LLMs (Large Language Models) using the OpenAI API to generate human-like responses.
Build interactive web applications with Streamlit.
Containerize applications using Docker.
Deploy your application on Google Cloud Run.
The user fills in a question template with a company name and CIK.
The app retrieves relevant context from Kay.ai.
The context is passed to the OpenAI API to generate the answer.
The answer is presented to the user.
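The four-step workflow can be sketched as one small Python function. This is an illustration only: `run_workflow`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins for the Kay.ai retriever and the OpenAI call, and the template text is the one the Streamlit app uses later in this tutorial.

```python
from typing import Callable, List

# Question template used by the app later in the tutorial
QUESTION_TEMPLATE = (
    "Does the {company_name}, cik:{cik}, currently operate a "
    "supply chain finance program (SCF program)?"
)


def run_workflow(
    company_name: str,
    cik: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> dict:
    """Fill the template, retrieve context, generate an answer, return all three."""
    # Step 1: the user's inputs fill the question template
    question = QUESTION_TEMPLATE.format(company_name=company_name, cik=cik)
    # Step 2: fetch relevant context (Kay.ai in the real app)
    context = retrieve(question)
    # Step 3: pass the question plus context to the LLM (OpenAI in the real app)
    answer = generate(question, context)
    # Step 4: hand everything to the UI for presentation
    return {"question": question, "context": context, "answer": answer}


# Hypothetical offline stand-ins so the flow can run without API keys
def fake_retrieve(question: str) -> List[str]:
    return ["The SCF program is available to suppliers of goods and services."]


def fake_generate(question: str, context: List[str]) -> str:
    return "Yes" if any("SCF" in chunk for chunk in context) else "Uncertain"


result = run_workflow("COCA COLA CO", "0000021344", fake_retrieve, fake_generate)
print(result["question"])
print(result["answer"])
```

Passing the retriever and generator in as functions keeps the orchestration testable; in the real app they would wrap `KayAiRetriever` and `ChatOpenAI`.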
RAG operates through three main stages:
Retrieval: Relevant document chunks are retrieved from an external knowledge base using semantic similarity calculations.
Generation: The retrieved document chunks, combined with the user’s query, form a comprehensive prompt from which the LLM generates an accurate, contextually relevant answer.
  Contextual Prompting: The retrieved document chunks and the user’s query are combined into a prompt for the LLM.
  Answer Generation: The LLM generates a response based on the combined prompt.
Augmentation: The generation process is enhanced by continuously updating the knowledge base and integrating domain-specific information from it.
  Enhanced Accuracy: By grounding answers in external knowledge, RAG reduces the likelihood of generating factually incorrect content.
  Continuous Updates: Because the knowledge base can be updated at any time, answers can reflect new information without retraining the model.
  Domain-Specific Information: RAG enables the integration of domain-specific information, making it suitable for specialized tasks.
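To make retrieval and contextual prompting concrete, here is a minimal, self-contained sketch. It swaps in a toy bag-of-words "embedding" with cosine similarity in place of a real embedding model, the sample chunks are made up for illustration, and the final LLM call is left out.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (real systems use a neural embedding model)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Retrieval: rank chunks by similarity to the query and keep the top k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: List[str]) -> str:
    # Contextual prompting: combine the retrieved chunks with the user's question
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}\nAnswer:"


chunks = [
    "Two financial institutions offer a voluntary supply chain finance program to our suppliers.",
    "Our payment terms with the majority of suppliers are 120 days.",
    "The company sponsors several defined benefit pension plans.",
]
query = "Does the company operate a supply chain finance program?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)  # this string is what would be sent to the LLM for answer generation
```

Swapping `embed` for a real embedding model and `chunks` for a vector store gives the same shape of pipeline that Kay.ai and LangChain handle for you in the rest of this tutorial.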
# Import necessary modules
from getpass import getpass
import os
from langchain_community.retrievers import KayAiRetriever
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI
# Prompt user for KAY_API_KEY and set it as an environment variable
KAY_API_KEY = getpass("Enter KAY_API_KEY: ")  # Prompt for the key without echoing it
os.environ["KAY_API_KEY"] = KAY_API_KEY # Set environment variable for KAY_API_KEY
# Set up the retriever using the KAY_API_KEY environment variable
retriever = KayAiRetriever.create(
dataset_id="company", data_types=["10-Q", "10-K","8-K"], num_contexts=10
)
# Retrieve relevant documents
docs = retriever.get_relevant_documents(
"Does the coca cola co, cik:0000021344, currently operate a supply chain finance program (SCF program)? (Yes/No)"
# "Does the apple inc currently operate a supply chain finance program (SCF program) in 2023? (Yes/No)"
)
docs[0]
Out[3]: Document(page_content='Company Name: COCA COLA CO \n Company Industry: BEVERAGES \n Form Title: 10-K 2021-FY \n Form Section: Risk Factors \n Text: Based on all of the aforementioned factors, the Company believes its current liquidity position is strong and will continue to be sufficient to fund our operating activities and cash commitments for investing and financing activities for the foreseeable future.Cash Flows from Operating Activities As part of our continued efforts to improve our working capital efficiency, we have worked with our suppliers over the past several years to revisit terms and conditions, including the extension of payment terms.Our current payment terms with the majority of our suppliers are 120 days.Additionally, two global financial institutions offer a voluntary supply chain finance ("SCF") program which enables our suppliers, at their sole discretion, to sell their receivables from the Company to these 51 financial institutions on a non recourse basis at a rate that leverages our credit rating and thus may be more beneficial to them.The SCF program is available to suppliers of goods and services included in cost of goods sold as well as suppliers of goods and services included in selling, general and administrative expenses in our consolidated statement of income.The Company and our suppliers agree on the contractual terms for the goods and services we procure, including prices, quantities and payment terms, regardless of whether the supplier elects to participate in the SCF program.', metadata={'chunk_type': 'text', 'chunk_years_mentioned': [], 'company_name': 'COCA COLA CO', 'company_sic_code_description': 'BEVERAGES', 'data_source': '10-K', 'data_source_link': 'https://www.sec.gov/Archives/edgar/data/21344/000002134422000009', 'data_source_publish_date': '2022-02-22T00:00:00+00:00', 'data_source_uid': '0000021344-22-000009', 'title': 'COCA COLA CO | 10-K 2021-FY '})
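Each retrieved `Document` carries its text in `page_content` and the filing details in `metadata`. The `page_content` above starts with "Key: value" header lines and puts the chunk text after "Text:". The helper below, `split_kay_chunk` (a hypothetical name), assumes every chunk follows that layout; this is inferred from the single example shown, not documented by Kay.ai.

```python
from typing import Dict, Tuple


def split_kay_chunk(page_content: str) -> Tuple[Dict[str, str], str]:
    """Split a chunk into its header fields and its body text.

    Assumes the layout seen in the output above: newline-separated
    "Key: value" lines followed by the body after a "Text:" field.
    """
    header, sep, body = page_content.partition("Text:")
    if not sep:
        # No header block found: treat everything as body text
        return {}, page_content.strip()
    fields = {}
    for line in header.split("\n"):
        key, colon, value = line.partition(":")
        if colon:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields, body.strip()


sample = (
    "Company Name: COCA COLA CO \n Company Industry: BEVERAGES \n "
    "Form Title: 10-K 2021-FY \n Form Section: Risk Factors \n "
    "Text: Our current payment terms with the majority of our suppliers are 120 days."
)
fields, text = split_kay_chunk(sample)
print(fields["Form Title"], "|", text)
```

With a real result you would call `split_kay_chunk(docs[0].page_content)`; source details such as `docs[0].metadata["data_source_link"]` are already available as a plain dictionary.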
# Prompt user for OPENAI_API_KEY and set it as an environment variable
OPENAI_API_KEY = getpass("Enter OPENAI_API_KEY: ")  # Prompt for the key without echoing it
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY # Set environment variable for OPENAI_API_KEY
# Set up the conversational retrieval chain using the OpenAI API key
model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
# Questions that ask for a 'Yes', 'No', or 'Uncertain' answer
questions = [
"Answer with 'Yes' or 'No' or 'Uncertain': Does the company currently operate a supply chain finance program (SCF program)?",
"Answer with 'Yes' or 'No' or 'Uncertain': Does the implementation of the SCF program positively impact the company's liquidity?",
]
# Initialize chat history
chat_history = []
# Iterate through questions and get answers
for question in questions:
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")
-> **Question**: Answer with 'Yes' or 'No' or 'Uncertain': Does the company currently operate a supply chain finance program (SCF program)?
**Answer**: Yes
-> **Question**: Answer with 'Yes' or 'No' or 'Uncertain': Does the implementation of the SCF program positively impact the company's liquidity?
**Answer**: The implementation of the SCF program positively impacts the company's liquidity by providing additional funding to eligible depository institutions, including the company. This helps ensure that the bank has the ability to meet the needs of all their depositors. The Program offers advances of up to one year in length to banks, savings associations, credit unions, and other eligible depository institutions pledging collateral eligible for purchase by the Federal Reserve Bank, such as U.S. Treasuries, U.S. agency securities, and U.S. agency mortgage-backed securities. This additional funding source can enhance the bank's liquidity position and ability to meet operational needs.
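Both questions refer only to "the company", so nothing in the query tells the retriever which filer to search. The second answer, which describes a bank funding program rather than Coca-Cola's suppliers, suggests that unrelated filings were retrieved. One way to reduce this, sketched here as an assumption rather than the tutorial's own method, is to name the company and CIK in every question, as the earlier retrieval query did. `build_questions` and `BASE_QUESTIONS` are hypothetical names.

```python
from typing import List

# The notebook's questions, with a placeholder where the company is named
BASE_QUESTIONS = [
    "Does {company} currently operate a supply chain finance program (SCF program)?",
    "Does the implementation of the SCF program positively impact {company}'s liquidity?",
]
PREFIX = "Answer with 'Yes' or 'No' or 'Uncertain': "


def build_questions(company_name: str, cik: str) -> List[str]:
    # Put the company name and CIK into each question so retrieval targets that filer
    company = f"{company_name} (cik:{cik})"
    return [PREFIX + q.format(company=company) for q in BASE_QUESTIONS]


for q in build_questions("COCA COLA CO", "0000021344"):
    print(q)
```

The resulting list can replace `questions` in the loop above without other changes.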
Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science. With Streamlit, you can turn data scripts into shareable web applications in minutes.
Project setup
Create a folder called “my_app”.
Inside the folder, create a Python file called “app.py”.
Folder Structure
import streamlit as st
import pandas as pd
# Title of the app
st.title("Simple Streamlit App")
# Display a text input widget
name = st.text_input("Enter your name:")
# Display a button and capture its state
if st.button("Submit"):
    st.write(f"Hello, {name}!")
# Display a data frame
data = pd.DataFrame({
'Column 1': [1, 2, 3, 4],
'Column 2': [10, 20, 30, 40]
})
st.write(data)
The app.py file:
import os

import streamlit as st
from langchain_community.retrievers import KayAiRetriever
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

# API keys are masked on screen with type="password"
KAY_API_KEY = st.text_input("Enter KAY API Key:", type="password")
OPENAI_API_KEY = st.text_input("Enter OPENAI API Key:", type="password")

# Inputs that fill the question template and the highlight (labels are illustrative)
company_name = st.text_input("Company name:")
cik = st.text_input("CIK:")
keywords = st.text_input("Keyword to highlight:")

if KAY_API_KEY and OPENAI_API_KEY:
    os.environ["KAY_API_KEY"] = KAY_API_KEY
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
    retriever = KayAiRetriever.create(dataset_id="company", data_types=["10-Q", "10-K", "8-K"], num_contexts=10)
    model = ChatOpenAI(model_name="gpt-4-turbo-preview")
    # Conversational retrieval chain that answers each question
    qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
    if st.button("Retrieve and Ask"):
        if company_name and cik and keywords:
            dynamic_questions = [
                f"Does the {company_name}, cik:{cik}, currently operate a supply chain finance program (SCF program)?",
            ]
            for question in dynamic_questions:
                docs = retriever.get_relevant_documents(question)
                result = qa({"question": question, "chat_history": []})
                st.write(f"-> **Question**: {question}")
                st.write(f"**Answer**: {result['answer']}")
                st.markdown("-------")
                # Show the top five retrieved chunks with the keyword highlighted
                for doc in docs[:5]:
                    page_content = doc.page_content
                    highlighted_content = page_content.replace(keywords, f'<mark style="background-color: #FFFF00">**{keywords}**</mark>')
                    st.markdown(highlighted_content, unsafe_allow_html=True)
                    metadata = doc.metadata
                    st.write(f"Data Source: {metadata.get('data_source')} - Publish Date: {metadata.get('data_source_publish_date')}")
        else:
            st.error("Please enter all required fields.")
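The `str.replace` call in the app only highlights exact-case matches, and because the chunk is rendered with `unsafe_allow_html=True`, any `<` or `&` in a filing is passed through as raw HTML. A possible alternative, sketched below with a hypothetical `highlight` helper, matches the keyword case-insensitively and HTML-escapes everything else.

```python
import html
import re


def highlight(text: str, keyword: str, color: str = "#FFFF00") -> str:
    """Wrap each case-insensitive match of keyword in a <mark> tag.

    Text outside the matches is HTML-escaped so filing content cannot
    inject markup when rendered with unsafe_allow_html=True.
    """
    if not keyword:
        return html.escape(text)
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    parts = []
    last = 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[last:m.start()]))  # text before the match
        parts.append(f'<mark style="background-color: {color}">{html.escape(m.group(0))}</mark>')
        last = m.end()
    parts.append(html.escape(text[last:]))  # text after the last match
    return "".join(parts)


print(highlight("The SCF program; scf terms", "scf"))
```

In the app, the `highlighted_content = page_content.replace(...)` line would become `highlighted_content = highlight(page_content, keywords)`. Matching on the raw text and escaping each segment separately avoids accidentally matching inside entities such as `&lt;`. This version drops the bold markers the original placed inside the `<mark>` tag.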
project_root/
│
├── app.py # Main Streamlit application file
├── Dockerfile # Dockerfile to containerize the app
├── docker-compose.yml # Docker Compose file for multi-container applications
├── requirements.txt # Python dependencies
├── README.md # Project documentation (optional)
└── .streamlit/                # Streamlit configuration directory (optional)
└── config.toml
# Use the official Python image from the Docker Hub
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the requirements.txt file into the container
COPY requirements.txt requirements.txt
# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the container
COPY . .
# Set environment variable to disable Streamlit's file watcher
ENV STREAMLIT_SERVER_FILE_WATCHER_TYPE=none
# Expose the port that Streamlit will run on
EXPOSE 8501
# Run the Streamlit app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
requirements.txt
docker-compose.yml
Access the Streamlit App:
http://localhost:8501
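After the container starts, Streamlit may need a few seconds before it serves pages. A small Python sketch can poll the health endpoint that recent Streamlit versions expose at `/_stcore/health` until it answers HTTP 200. The function name and default timeouts are assumptions.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str = "http://localhost:8501",
                       timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll Streamlit's health endpoint until it returns 200 or time runs out."""
    url = base_url.rstrip("/") + "/_stcore/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts, and HTTP errors all derive from OSError
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("App is up" if wait_until_healthy(timeout=5) else "App did not respond")
```

Returning a boolean instead of raising keeps the helper easy to use in a script that decides whether to open the browser or print the container logs.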
Create the Repository
Build the Image
Authorize Docker
Tag the Image
Push the Image
us-central1-docker.pkg.dev/fintechclass/fintechdemo
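The repository path above follows Artifact Registry's `LOCATION-docker.pkg.dev/PROJECT-ID/REPOSITORY` naming, and each image adds its own name and an optional tag. A hypothetical helper that assembles the full image reference, with basic checks on each segment, might look like this:

```python
def artifact_registry_uri(region: str, project_id: str, repo: str,
                          image: str, tag: str = "latest") -> str:
    """Compose REGION-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG."""
    segments = {"region": region, "project_id": project_id, "repo": repo, "image": image}
    for name, value in segments.items():
        # Each part must be a single non-empty path segment
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty path segment, got {value!r}")
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"


print(artifact_registry_uri("us-central1", "fintechclass", "fintechdemo", "streamlit-app"))
```

When `docker tag` and `docker push` are given a reference without a tag, Docker uses `latest`, which is why the default here matches the `latest` tag in the output below.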
gcloud auth configure-docker us-central1-docker.pkg.dev
WARNING: Your config file at [/Users/ndackyssa/.docker/config.json] contains these credential helper entries:
{
"credHelpers": {
"asia.gcr.io": "gcloud",
"eu.gcr.io": "gcloud",
"europe-west1-docker.pkg.dev": "gcloud",
"gcr.io": "gcloud",
"marketplace.gcr.io": "gcloud",
"staging-k8s.gcr.io": "gcloud",
"us.gcr.io": "gcloud",
"us-central1-docker.pkg.dev": "gcloud"
}
}
Adding credentials for: us-central1-docker.pkg.dev/fintechclass/fintechdemo
gcloud credential helpers already registered correctly.
docker tag streamlit-app us-central1-docker.pkg.dev/fintechclass/fintechdemo/streamlit-app
docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
us-central1-docker.pkg.dev/fintechclass/fintechdemo/streamlit-app latest 6ef16c1f7fb2 27 minutes ago 605MB
streamlit-app latest 6ef16c1f7fb2 27 minutes ago 605MB
docker push us-central1-docker.pkg.dev/fintechclass/fintechdemo/streamlit-app
Using default tag: latest
The push refers to repository [us-central1-docker.pkg.dev/fintechclass/fintechdemo/streamlit-app]
eb810b7dbcc4: Pushed
88c20489b4b3: Pushing [========================================> ] 371MB/452.7MB
65c6c5030a15: Pushed
e3118f7a7cf8: Pushed
88e353c378e1: Pushed
f9e773ed6a29: Pushed
16a19a1fe64b: Pushed
e8a6046370e7: Pushed
2bd1a2222589: Pushed
# Build the Docker image for the AMD64 platform
docker buildx build --platform linux/amd64 -t streamlit-app .
# Tag the Docker image
docker tag streamlit-app us-central1-docker.pkg.dev/YOUR_PROJECT_ID/REPO_NAME/streamlit-app
# Push the Docker image to Artifact Registry
docker push us-central1-docker.pkg.dev/YOUR_PROJECT_ID/REPO_NAME/streamlit-app
# Deploy the container on Cloud Run (Streamlit listens on 8501, so set --port)
gcloud run deploy streamlit-app \
  --image us-central1-docker.pkg.dev/YOUR_PROJECT_ID/REPO_NAME/streamlit-app \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8501
For example, if your project ID is my-project, your repository is my-repo, and your region is us-central1:
# Build the Docker image for the AMD64 platform that Cloud Run runs
docker buildx build --platform linux/amd64 -t streamlit-app .
# Authenticate Docker with Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev
# Tag the Docker image
docker tag streamlit-app us-central1-docker.pkg.dev/my-project/my-repo/streamlit-app
# Push the Docker image to Artifact Registry
docker push us-central1-docker.pkg.dev/my-project/my-repo/streamlit-app
# Deploy the container on Cloud Run (Streamlit listens on 8501, so set --port)
gcloud run deploy streamlit-app \
  --image us-central1-docker.pkg.dev/my-project/my-repo/streamlit-app \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8501
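If you prefer to drive these steps from Python, the sketch below builds the same command sequence and prints it in a dry run by default. `deploy_commands` and `run_all` are hypothetical names, and it assumes the image and the Cloud Run service share one name, as they do above. Cloud Run sends traffic to port 8080 unless told otherwise, while the Dockerfile starts Streamlit on 8501, so the deploy step passes `--port 8501`.

```python
import shlex
import subprocess
from typing import List


def deploy_commands(project_id: str, repo: str, region: str = "us-central1",
                    service: str = "streamlit-app", port: int = 8501) -> List[List[str]]:
    """Return the build, auth, tag, push, and deploy commands as argument lists."""
    image = f"{region}-docker.pkg.dev/{project_id}/{repo}/{service}"
    return [
        ["docker", "buildx", "build", "--platform", "linux/amd64", "-t", service, "."],
        ["gcloud", "auth", "configure-docker", f"{region}-docker.pkg.dev"],
        ["docker", "tag", service, image],
        ["docker", "push", image],
        ["gcloud", "run", "deploy", service, "--image", image,
         "--platform", "managed", "--region", region,
         "--allow-unauthenticated", "--port", str(port)],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print("$", shlex.join(cmd))  # show the exact command line
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_all(deploy_commands("my-project", "my-repo"))
```

Keeping each command as a list of arguments avoids shell quoting issues; call `run_all(..., dry_run=False)` only after checking the printed commands.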
Install the Google Cloud SDK, then initialize it:
gcloud init
Welcome! This command will take you through the configuration of gcloud.
Settings from your current configuration [default] are:
core:
account: donothack@gmail.com
disable_usage_reporting: 'True'
project: fintechclass
Pick configuration to use:
[1] Re-initialize this configuration [default] with new settings
[2] Create a new configuration
Please enter your numeric choice: 1
Your current configuration has been set to: [default]
You can skip diagnostics next time by using the following flag:
gcloud init --skip-diagnostics
Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).
Choose the account you would like to use to perform operations for this configuration:
[1] donothack@gmail.com
[2] Log in with a new account
Please enter your numeric choice: 1
You are logged in as: [donothack@gmail.com].
Pick cloud project to use:
[1] addr******
[2] fintechclass
Please enter numeric choice or text value (must exactly match list item): 2
Your current project has been set to: [fintechclass].
Not setting default zone/region (this feature makes it easier to use
[gcloud compute] by setting an appropriate default value for the
--zone and --region flag).
See https://cloud.google.com/compute/docs/gcloud-compute section on how to set
default compute region and zone manually. If you would like [gcloud init] to be
able to do this for you the next time you run it, make sure the
Compute Engine API is enabled for your project on the
https://console.developers.google.com/apis page.
Your Google Cloud SDK is configured and ready to use!
* Commands that require authentication will use donothack@gmail.com by default
* Commands will reference project `fintechclass` by default
Run `gcloud help config` to learn how to change individual settings
This gcloud configuration is called [default]. You can create additional configurations if you work with multiple accounts and/or projects.
Run `gcloud topic configurations` to learn more.
Some things to try next:
* Run `gcloud --help` to see the Cloud Platform services you can interact with. And run `gcloud help COMMAND` to get help on any gcloud command.
* Run `gcloud topic --help` to learn about advanced features of the SDK like arg files and output formatting
* Run `gcloud cheat-sheet` to see a roster of go-to `gcloud` commands.
Updates are available for some Google Cloud CLI components. To install them,
please run:
$ gcloud components update
To take a quick anonymous survey, run:
$ gcloud survey
gcloud auth login
Your browser has been opened to visit:
https://*accounts.google.com/o/oauth2/auth?response_type=****
You are now logged in as [donothack@gmail.com].
Your current project is [fintechclass]. You can change this setting by running:
$ gcloud config set project PROJECT_ID
Set the Project: