ApertureDB
ApertureDB is a database that stores, indexes, and manages multi-modal data like text, images, videos, bounding boxes, and embeddings, together with their associated metadata.
This notebook explains how to use the embeddings functionality of ApertureDB.
Install ApertureDB Python SDK
This installs the Python SDK used to write client code for ApertureDB.
%pip install --upgrade --quiet aperturedb
Note: you may need to restart the kernel to use updated packages.
Run an ApertureDB instance
To continue, you need a running ApertureDB instance and an environment configured to use it.
There are various ways to do that, for example:
docker run --publish 55555:55555 aperturedata/aperturedb-standalone
adb config create local --active --no-interactive
Download some web documents
Here we do a mini-crawl of a single web page.
# For loading documents from web
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.aperturedata.io")
docs = loader.load()
USER_AGENT environment variable not set, consider setting it to identify your requests.
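The warning above can be avoided by setting the USER_AGENT environment variable. A minimal sketch follows, assuming the check happens when the loader module is imported, so the variable is set first; the agent string `my-aperturedb-notebook/0.1` is a hypothetical placeholder.

```python
import os

# Hypothetical identifier; replace it with a string that describes your client.
# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("USER_AGENT", "my-aperturedb-notebook/0.1")

user_agent = os.environ["USER_AGENT"]
print(user_agent)
```

Run this cell before the cell that imports WebBaseLoader.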
Select embeddings model
We want to use OllamaEmbeddings so we have to import the necessary modules.
Ollama can be set up as a Docker container as described in its documentation, for example:
# Run server
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Tell server to load a specific model
docker exec ollama ollama run llama2
from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings()
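An embeddings model maps each piece of text to a vector of floats, and texts are compared by comparing their vectors. The sketch below illustrates one common comparison, cosine similarity, on hand-made vectors. The vectors and the `cosine_similarity` helper are illustrative assumptions; they are not output from Ollama and not part of any library used here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real models produce far longer vectors).
query = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 1.1]
far = [0.0, 1.0, 0.0]

sim_close = cosine_similarity(query, close)
sim_far = cosine_similarity(query, far)
print(sim_close > sim_far)
```

Vectors pointing in similar directions score near 1, while orthogonal vectors score 0.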
Split documents into segments
We want to turn our single document into multiple segments.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
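To give a feel for what splitting with overlap means, here is a deliberately simplified sketch that cuts a string into fixed-size windows. It is not the algorithm RecursiveCharacterTextSplitter uses, which prefers to break on separators such as paragraphs and sentences; the helper `split_with_overlap` is hypothetical, and its parameter names are chosen to echo the splitter's `chunk_size` and `chunk_overlap` options.

```python
def split_with_overlap(text, chunk_size=20, chunk_overlap=5):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "ApertureDB stores images, videos, and embeddings with metadata."
chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk begins with the last five characters of the previous one, so text that straddles a boundary still appears intact in at least one segment.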