RAG with Mistral AI
司馬 博文
3/14/2024
Mistral AI is a French company, founded in 2023 by one researcher from Google DeepMind and two former researchers from Meta Platforms, that develops open-source large language models.
Its model is said to outperform the Llama 2 70B model, but it is specialized for only five European languages.
The Mistral Cookbook publishes community-contributed examples of using Mistral AI's language models. In this article we will look at and play with a few of them.
Using the API requires registration, but pricing is usage-based rather than subscription-based.
RAG (Retrieval-Augmented Generation) (Lewis et al., 2020) is a technique that improves performance on tasks such as question answering by combining a language model with information retrieval: relevant passages are first retrieved from an external knowledge base, and are then supplied to the model as context when generating the answer.
Let us implement this using Mistral.
The first step is to install the needed packages, mistralai and faiss-cpu, and import them:
from mistralai.client import MistralClient, ChatMessage
import requests
import numpy as np
import faiss
import os
from getpass import getpass

api_key = getpass("Type your API Key")
client = MistralClient(api_key=api_key)
We will use Paul Graham's essay as the knowledge base.
# Download the essay (URL as in the Mistral cookbook; adjust it if the file has moved)
response = requests.get("https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt")
text = response.text
f = open('essay.txt', 'w')
f.write(text)
f.close()
To make it efficient to retrieve the relevant information later, we split the external document into smaller chunks.
In a RAG system, it is crucial to split the document into smaller chunks so that it’s more effective to identify and retrieve the most relevant information in the retrieval process later. In this example, we simply split our text by character, combine 2048 characters into each chunk, and we get 37 chunks.
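A minimal sketch of this character-level chunking, using the text string loaded above:

chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]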
For each text chunk, we then need to create text embeddings, which are numeric representations of the text in the vector space. Words with similar meanings are expected to be in closer proximity, i.e. have a shorter distance, in the vector space. To create an embedding, use Mistral's embeddings API endpoint and the embedding model mistral-embed. We create a get_text_embedding function to get the embedding of a single text chunk, and then use a list comprehension to get text embeddings for all text chunks.
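A sketch of that helper, assuming the v0 mistralai client created earlier; its embeddings method takes the model name and the input text:

def get_text_embedding(input):
    # Embed a single text chunk with mistral-embed
    embeddings_batch_response = client.embeddings(model="mistral-embed", input=input)
    return embeddings_batch_response.data[0].embedding

text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])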
Once we get the text embeddings, a common practice is to store them in a vector database for efficient processing and retrieval. There are several vector databases to choose from. In our simple example, we use the open-source vector database Faiss, which allows for efficient similarity search.
With Faiss, we instantiate an instance of the Index class, which defines the indexing structure of the vector database. We then add the text embeddings to this indexing structure.
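For example, with a flat L2 index, one of the simplest Faiss index types:

d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings.astype("float32"))  # Faiss expects float32 vectors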
Whenever users ask a question, we also need to create embeddings for this question using the same embedding model as before.
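For example, embedding the question we ask below:

question = "What were the two main things the author worked on before college?"
question_embeddings = np.array([get_text_embedding(question)])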
We can perform a search on the vector database with index.search, which takes two arguments: the first is the vector of the question embeddings, and the second is the number of similar vectors to retrieve. This function returns the distances and the indices of the most similar vectors to the question vector in the vector database. Based on the returned indices, we can then retrieve the actual relevant text chunks that correspond to those indices.
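Concretely, retrieving the two nearest chunks:

D, I = index.search(question_embeddings.astype("float32"), 2)  # distances and indices
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]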
Finally, we can offer the retrieved text chunks as the context information within the prompt. Here is a prompt template where we can include both the retrieved text and user question in the prompt.
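A sketch of such a template, together with a small helper (run_mistral, our own wrapper rather than a library function) that sends the final prompt to the chat endpoint:

prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

def run_mistral(user_message, model="mistral-medium-latest"):
    # Single-turn chat completion with the v0 client
    messages = [ChatMessage(role="user", content=user_message)]
    chat_response = client.chat(model=model, messages=messages)
    return chat_response.choices[0].message.content

print(run_mistral(prompt))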
In the next sections, we show how to do a similar basic RAG with some of the popular RAG frameworks: LangChain, LlamaIndex, and Haystack. We start with LangChain:
from langchain_community.document_loaders import TextLoader
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_mistralai.embeddings import MistralAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
# Load data
loader = TextLoader("essay.txt")
docs = loader.load()
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
# Define the embedding model
embeddings = MistralAIEmbeddings(model="mistral-embed", mistral_api_key=api_key)
# Create the vector store
vector = FAISS.from_documents(documents, embeddings)
# Define a retriever interface
retriever = vector.as_retriever()
# Define LLM
model = ChatMistralAI(mistral_api_key=api_key)
# Define prompt template
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")
# Create a retrieval chain to answer questions
document_chain = create_stuff_documents_chain(model, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)
response = retrieval_chain.invoke({"input": "What were the two main things the author worked on before college?"})
print(response["answer"])
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
# Load data
reader = SimpleDirectoryReader(input_files=["essay.txt"])
documents = reader.load_data()
# Define LLM and embedding model, passing the API key obtained earlier
# (alternatively, set the MISTRAL_API_KEY environment variable)
Settings.llm = MistralAI(model="mistral-medium", api_key=api_key)
Settings.embed_model = MistralAIEmbedding(model_name='mistral-embed', api_key=api_key)
# Create vector store index
index = VectorStoreIndex.from_documents(documents)
# Create query engine
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query(
"What were the two main things the author worked on before college?"
)
print(str(response))
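Finally, the same RAG with Haystack. The documents are indexed eagerly first, and a small pipeline then handles the query side: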
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import ChatMessage
from haystack.utils.auth import Secret
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.mistral import MistralDocumentEmbedder, MistralTextEmbedder
from haystack_integrations.components.generators.mistral import MistralChatGenerator
# Index the essay: read it, split it into passages, embed the passages,
# and write them to an in-memory document store
document_store = InMemoryDocumentStore()
docs = TextFileToDocument().run(sources=["essay.txt"])
split_docs = DocumentSplitter(split_by="passage", split_length=2).run(documents=docs["documents"])
embeddings = MistralDocumentEmbedder(api_key=Secret.from_token(api_key)).run(documents=split_docs["documents"])
DocumentWriter(document_store=document_store).run(documents=embeddings["documents"])
# Query-side components: embed the question, retrieve similar passages,
# build the prompt, and generate the answer
text_embedder = MistralTextEmbedder(api_key=Secret.from_token(api_key))
retriever = InMemoryEmbeddingRetriever(document_store=document_store)
prompt_builder = DynamicChatPromptBuilder(runtime_variables=["documents"])
llm = MistralChatGenerator(api_key=Secret.from_token(api_key), model='mistral-small')
chat_template = """Answer the following question based on the contents of the documents.\n
Question: {{query}}\n
Documents:
{% for document in documents %}
{{document.content}}
{% endfor %}
"""
messages = [ChatMessage.from_user(chat_template)]
# Assemble the query pipeline and wire the components together
rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", text_embedder)
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
question = "What were the two main things the author worked on before college?"
result = rag_pipeline.run(
{
"text_embedder": {"text": question},
"prompt_builder": {"template_variables": {"query": question}, "prompt_source": messages},
"llm": {"generation_kwargs": {"max_tokens": 225}},
}
)
print(result["llm"]["replies"][0].content)