Import modules and create a GPT-4 model instance with the LangChain wrapper, along with an embedding model and a text splitter. For this task, I recommend setting the LLM's request_timeout parameter to 120 seconds to reduce timeout errors when calling the OpenAI API.
import os
import json
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.document_loaders import UnstructuredURLLoader, SeleniumURLLoader
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
with open('opinai_key.txt') as f:
    # strip the trailing newline, which would otherwise break authentication
    os.environ['OPENAI_API_KEY'] = f.read().strip()
llm = ChatOpenAI(
    model_name='gpt-4',
    temperature=0,
    request_timeout=120)
embeddings = OpenAIEmbeddings()
text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0)
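To see what chunk_size and chunk_overlap control, here is a simplified, dependency-free sketch of the splitting logic. This is only an illustration of the two parameters, not LangChain's actual implementation, which prefers to split on separators such as paragraph breaks rather than at fixed character offsets:

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Naive fixed-width splitter illustrating chunk_size/chunk_overlap."""
    chunks = []
    step = chunk_size - chunk_overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

sample = "x" * 2500
print([len(c) for c in split_text(sample, 1000, 0)])    # → [1000, 1000, 500]
# with overlap, consecutive chunks share their boundary characters
print([len(c) for c in split_text(sample, 1000, 200)])  # → [1000, 1000, 900, 100]
```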
url = 'https://www.cnn.com/2023/04/24/success/managers-employee-engagement-trust/index.html'
# use this loader for html content
loader = UnstructuredURLLoader(urls=[url])
# use this loader for pages that require JavaScript to render
# loader = SeleniumURLLoader(urls=[url])
# load and chunk webpage text
web_doc = loader.load()
web_docs = text_splitter.split_documents(web_doc)
# create Chroma vectorstore
webdoc_store = Chroma.from_documents(
    web_docs,
    embeddings,
    collection_name="web_docs")
Running Chroma using direct local API. Using DuckDB in-memory for database. Data will be transient.
system_template = """
You are an assistant designed to identify potential sources of bias in texts.
----------------
{context}"""
messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template("{question}")
]
prompt = ChatPromptTemplate.from_messages(messages)
chain_type_kwargs = {"prompt": prompt}
# create Q+A chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=webdoc_store.as_retriever(),
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=False)
Technically, this whole task can be accomplished with a single query, but I have found that separating identification from evaluation produces better, more consistent results.
query = """
Identify forms of bias present in this document.
"""
id_result = qa({"query": query})['result']
Define evaluation criteria and describe a schema to structure the output. It seems that GPT-4 reliably outputs valid JSON when prompted to do so.
query = """
on a scale of 1 to 4, score the likelihood that each of the forms
of bias identified in the bias check will generate false or misleading
information, with
1 being "not likely to generate false or misleading information",
2 being "moderately likely to generate false or misleading information",
3 being "likely to generate false or misleading information", and
4 being "highly likely to generate false or misleading information."
explain the reason for each ranking.
Output should be STRICT JSON, containing
a dictionary containing the website url,
a list of dictionaries containing the types of biases with their explanations and scores, and
a dictionary containing frequency counts for each category of likelihood.
formatted like this:
[
    {"url": str},
    [
        {
            "bias_type": str,
            "score": int,
            "score_definition": str,
            "explanation": str
        },
    ],
    {
        "score_frequencies":
        {
            "1_not_likely": int,
            "2_moderately_likely": int,
            "3_likely": int,
            "4_highly_likely": int
        }
    }
]
""" + f"""
here is the bias check: {id_result}
here is the url: {url}
"""
eval_result = qa({"query": query})['result']
The built-in Python JSON module will only deserialize a string containing a valid JSON document, so it can be used to validate the response.
data = json.loads(eval_result)
print(json.dumps(data, indent=2))
[ { "url": "https://www.cnn.com/2023/04/24/success/managers-employee-engagement-trust/index.html" }, [ { "bias_type": "Confirmation bias", "score": 2, "score_definition": "moderately likely to generate false or misleading information", "explanation": "The text heavily relies on Gallup's research and surveys to support the argument that managers are the key to employee engagement, retention, productivity, and trust in leadership. By primarily using one source, the text may inadvertently confirm pre-existing beliefs about the importance of managers without considering alternative perspectives or research from other sources." }, { "bias_type": "Selection bias", "score": 3, "score_definition": "likely to generate false or misleading information", "explanation": "The text focuses on the negative aspects of managerial training, stating that '99% of employers don't provide effective training.' This statement may not represent the entire population of employers and could be an overgeneralization. The text does not provide information on successful training programs or examples of companies that have effectively trained their managers." }, { "bias_type": "Authority bias", "score": 2, "score_definition": "moderately likely to generate false or misleading information", "explanation": "The text relies on the authority of Gallup's research and Ashley Herd, a former human resources executive, to support its claims. By doing so, it assumes that their opinions and findings are more valid than those of other experts or sources." }, { "bias_type": "Anecdotal bias", "score": 2, "score_definition": "moderately likely to generate false or misleading information", "explanation": "The text provides anecdotal evidence from Gallup's surveys and Ashley Herd's experience to support its claims. While these anecdotes may be relevant, they may not be representative of the broader population or situation." 
}, { "bias_type": "Negativity bias", "score": 3, "score_definition": "likely to generate false or misleading information", "explanation": "The text emphasizes the negative aspects of managerial training and its impact on employee engagement, retention, productivity, and trust in leadership. This focus on negative outcomes may lead readers to believe that the situation is worse than it actually is, without considering positive examples or potential solutions." } ], { "score_frequencies": { "1_not_likely": 0, "2_moderately_likely": 3, "3_likely": 2, "4_highly_likely": 0 } } ]