
A new era of productivity: Prompt Architected Software

September 4, 2023

AUTHOR:  Jason

AUTHOR:  Victoria

Amidst the generative AI boom, enterprises across all industries are bolstering their IT departments. Innovation directors are asking us how they can leverage chatbots, or Large Language Models (LLMs), and ride this wave.

Engineering a well-structured prompt is crucial to obtaining accurate and relevant responses from chatbots. However, individual prompts are only half the magic. To fully support your company’s workflows, an overarching architecture, designed to maximise value from a large selection of prompts, is essential.

We introduced the term ‘prompt architecting’ at the end of our previous article about building your ‘own ChatGPT’ or fine-tuning an LLM. There, we explained why fine-tuning an LLM or building one from scratch are both a lot of work and usually unnecessary.

In short, creating an LLM from scratch is an ambitious, expensive, and highly risky undertaking recommended primarily to those who plan to compete with the likes of Google or Microsoft. Fine-tuning is a comparatively smaller job, as it only involves retraining a portion of an LLM with new data. But it is still unnecessarily arduous unless you really need an on-prem LLM (you’d know if you do). The route we recommended for the majority of enterprises was prompt architecting with an API-enabled version of an existing model.

Put simply, ‘prompt architecting’ is to prompt engineering what software architecture is to software engineering. Instead of engineering individual prompts that achieve a single goal, we create entire pieces of software that chain, combine, and even generate tens, if not hundreds, of prompts on the fly to achieve a desired outcome.

The goal of this article is to demystify the term ‘prompt architecting’. We start with the origins of its name. Then, we explain the two main steps. We finish with an HR chatbot case study to see how it looks in practice.

Where does ‘prompt architecting’ get its name? Its classical roots

Architecture, whether in the context of buildings, software, or prompts, involves balancing factors such as functionality, user experience (UX), security, regulatory compliance, and aesthetics. For example, software architects focus on defining the high-level structure, components, and interactions within the system. This initial blueprint sets the foundation for the entire software project.

On the other hand, software engineering delves into the detailed implementation of the software architecture. Software engineering involves the nitty-gritty aspects of coding, testing, debugging, and optimising the software to bring the architectural vision to life.

Next, we look at the world of prompts.

What is prompt architecting?

Let’s apply the above train of thought to prompts (the text or image input we send to LLMs like ChatGPT).

Given the right sequence of prompts, LLMs are remarkably smart at bending to your will. This is why at Springbok, the first step in any of our projects is to design the appropriate prompt architecture for our client’s use case.

In this section, we explain Springbok’s approach to creating prompt architectures.

Prompt engineering

‘Prompt engineering’ popularly refers to the process of crafting individual effective inputs, called ‘prompts’, for an LLM.

Typical goals for such activities include optimising the model’s response in terms of format, content, style, length, or any number of other factors. Practically, prompt engineering involves framing the question in a specific way, adding context, or giving examples of output to guide the model to format its output similarly.
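As a rough illustration of ‘giving examples of output’, here is a minimal Python sketch that assembles a single few-shot prompt. The function name, the Q:/A: layout, and the example question about working from home are our own assumptions for illustration, not a fixed format.

```python
def build_few_shot_prompt(instruction, examples, question):
    """Assemble one prompt from an instruction, worked examples, and a new question."""
    parts = [instruction.strip(), ""]
    for example_question, example_answer in examples:
        parts.append(f"Q: {example_question}")
        parts.append(f"A: {example_answer}")
        parts.append("")
    # Leave the final answer blank so the model completes it in the same format.
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


# Hypothetical example content, for illustration only.
prompt = build_few_shot_prompt(
    instruction="Answer HR questions in one short, polite sentence.",
    examples=[
        ("Can I work from home?", "Yes, with your line manager's approval."),
    ],
    question="How many days of annual leave am I entitled to?",
)
print(prompt)
```

The worked example shows the model the length and tone we want before it sees the real question.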

Prompt architecting

A prompt architecture at Springbok often consists of a data flow diagram and a traditional software architecture diagram.

We consider the following aspects:

Functional requirements

We adapt the data flow architecture to the way our client wants the output to be displayed. This might be an intuitive dashboard, a conversational interface, or a document that complies with a provided template.

Non-functional requirements

There are also non-functional requirements: balancing cost-effectiveness against performance, working around rate limits, and minimising latency. Equally important are ensuring security through robust authentication and authorisation, and adhering to data location requirements.
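One common way of working around rate limits is to retry with exponential backoff. The sketch below is a generic illustration rather than a description of any particular client library: `RateLimitError` is a hypothetical stand-in for whatever exception your LLM client raises, and the default delays are assumptions.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an LLM client raises when rate limited."""


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the waiting behaviour can be tested without real delays.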

Integration with additional data sources

We consider the potential requirements for integrating the LLM with additional data sources. This may include databases for efficient data retrieval, Salesforce for CRM communication, various platforms for content compatibility, and Optical Character Recognition (OCR) capabilities for processing text from images or scanned documents.

Output quality control

We put measures in place to filter offensive language, align generated content with desired tone of voice and branding guidelines, and mitigate the risk of false or misleading information being generated through hallucination.
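To give a flavour of the simplest such measures, here is a minimal Python sketch of a response checker covering two of them: blocked terms and response length. The blocklist entries and the word limit are placeholders we made up, and checks for tone of voice or hallucination would need more than this.

```python
import re

BLOCKLIST = {"idiot", "stupid"}  # placeholder terms, not a real moderation list
MAX_WORDS = 80                   # assumed length budget


def check_response(text):
    """Return a list of problems found; an empty list means the text passed."""
    problems = []
    words = re.findall(r"[a-z']+", text.lower())
    flagged = sorted(BLOCKLIST.intersection(words))
    if flagged:
        problems.append("blocked terms: " + ", ".join(flagged))
    if len(words) > MAX_WORDS:
        problems.append(f"too long: {len(words)} words (limit {MAX_WORDS})")
    return problems
```

Returning a list of problems, rather than a simple pass or fail, lets the calling software decide whether to regenerate, rewrite, or escalate.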

Analytics

We design mechanisms for collecting user feedback to improve the system’s performance, provide transparency in the decision-making process, gather usage statistics for data-driven decisions, and implement abuse detection mechanisms to prevent malicious use.

What’s the difference between prompt architecting and fine-tuning?

If you are considering prompt architecting, you have likely already explored the concept of fine-tuning. Here is the key distinction between the two.

Fundamentally, fine-tuning involves modifying the LLM, whereas prompt architecture does not.

Fine-tuning is a substantial endeavour that entails retraining a segment of an LLM with a new dataset. This process imbues the LLM with domain-specific knowledge, tailoring it to your specific business needs.

By contrast, prompt architecting does not involve making any modifications to the LLM itself or to its training data. Rather, it focuses on optimising how you can steer an existing LLM.

The often surprising – and sometimes hidden – major expense of fine-tuning comes from acquiring and preparing the dataset. The data has to be aligned with your desired outcomes, and made compatible with your LLM. Fine-tuning itself is relatively straightforward and inexpensive; it involves uploading your prepared data, initiating the process, and covering the associated API usage fees and computing costs.

Once this data preparation cost is taken into account, fine-tuning is vastly more expensive than starting with prompt architecting-based solutions. Not to mention, a sophisticated prompt architecture is often necessary even if you do wish to fine-tune.

When approaching technology partners, ask about their dataset preparation capabilities. If they omit comprehensive cost estimates, treat it as a red flag: it could indicate an unreliable service or a lack of practical experience in handling this task.

In general, almost all prospective fine-tuning use cases worth doing should first go through a proof-of-concept (PoC) stage with a prompt architecture, to see whether fine-tuning is worth the operational investment.

What does prompt architecting look like in practice? An HR chatbot case study

Let’s take a look at prompt architecting in a case study. In this section, we consider an HR chatbot that is asked about annual leave.

How do we bend LLMs to our will so that we can produce reproducible, reliable outcomes that help our customers, both external and internal, get their work done?

People come to us wanting to achieve things like:

  1. Lawyers: “I want to be able to automatically generate contracts based on my firm’s best practices”
  2. Head of HR: “I want to be able to automatically answer HR-related employee queries based on our handbook”
  3. Head of CX: “I want to automatically answer customer queries based on our product troubleshooting instructions”

Luckily, none of these use cases requires fine-tuning!

We advocate creating software products that use prompts cleverly to steer ChatGPT the way you want. We call this approach ‘prompt architecting’. It is similar to prompt engineering, but with a key difference.

Instead of engineering individual prompts that achieve a single goal, we create entire pieces of software that chain, combine, and even generate tens, if not hundreds, of prompts on the fly to achieve a desired outcome.

How is this done in practice? The specific architecture for any given problem will be heavily specialised. However, every solution will rely on some variation of the following steps:

Step 1

We accept a message from the user. This could be as simple as “Hello, my name is Jason”, but let’s dig into the following example: “How many days of annual leave am I entitled to?”

Step 2

We identify the context of the message and embellish the user’s message with some information. Continuing with our annual leave example:

User context: “Jessica is an Associate, she is currently on probation, please answer questions accordingly.”

Contextual information: “Employees are entitled to 30 days annual leave per year, excluding bank holidays. During the probationary period, employees are only allowed to take at most 10 days of their annual leave. The subsequent entitlement may only be taken after their probationary period.”

Chatbot instructions: “You are an HR chatbot, answer the following query using the above context in a polite and professional tone.”

Question: “How many days of annual leave am I entitled to?”
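In code, this embellishment step can be as simple as joining labelled blocks of text. The following Python sketch is one possible way to do it; the function name and the exact layout of the final prompt are our assumptions for illustration.

```python
def build_prompt(user_context, contextual_information, instructions, question):
    """Join the four parts in order, one labelled block each, separated by blank lines."""
    sections = [
        ("User context", user_context),
        ("Contextual information", contextual_information),
        ("Chatbot instructions", instructions),
        ("Question", question),
    ]
    return "\n\n".join(f"{label}: {text}" for label, text in sections)


prompt = build_prompt(
    user_context=(
        "Jessica is an Associate, she is currently on probation, "
        "please answer questions accordingly."
    ),
    contextual_information=(
        "Employees are entitled to 30 days annual leave per year, excluding "
        "bank holidays. During the probationary period, employees are only "
        "allowed to take at most 10 days of their annual leave. The subsequent "
        "entitlement may only be taken after their probationary period."
    ),
    instructions=(
        "You are an HR chatbot, answer the following query using the above "
        "context in a polite and professional tone."
    ),
    question="How many days of annual leave am I entitled to?",
)
print(prompt)
```

In a real system the user context and contextual information would be looked up per user and per query rather than hard-coded.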

Step 3

We send the message to our favourite LLM and receive an answer!

“Employees are entitled to 30 days annual leave per year excluding bank holidays. Since you are in probation, you can take at most 5 days until the end of your probation period”

Step 4

We check the answer for errors, using multiple techniques including a secondary LLM and semantic similarity checks. Here, the answer mistakenly says 5 days instead of 10, so we ask the LLM to regenerate the answer and correct the mistake:

“Employees are entitled to 30 days annual leave per year excluding bank holidays. Since you are in probation, you can take at most 10 days until the end of your probation period”

Step 5

We send the message to the user!
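The check-and-regenerate loop can be sketched in a few lines of Python. Note the assumptions: `ask_llm` is a hypothetical callable standing in for a real LLM call, and a simple number comparison is swapped in for the secondary LLM and semantic similarity checks mentioned above. It is a toy check, chosen because it catches the ‘5 days’ mistake in this example.

```python
import re


def unsupported_numbers(answer, context):
    """Numbers stated in the answer that never appear in the trusted context."""
    in_context = set(re.findall(r"\d+", context))
    return sorted(set(re.findall(r"\d+", answer)) - in_context)


def answer_with_checks(ask_llm, prompt, context, max_retries=2):
    """Ask, check, and re-ask with a correction note until the check passes."""
    answer = ask_llm(prompt)
    for _ in range(max_retries):
        bad = unsupported_numbers(answer, context)
        if not bad:
            return answer
        correction = (
            f"{prompt}\n\nYour previous answer mentioned {', '.join(bad)}, "
            "which is not in the context. Answer again using only the context."
        )
        answer = ask_llm(correction)
    if unsupported_numbers(answer, context):
        raise ValueError("no answer passed the checks")
    return answer


context = (
    "Employees are entitled to 30 days annual leave per year, excluding bank "
    "holidays. During the probationary period, employees are only allowed to "
    "take at most 10 days of their annual leave."
)
# Canned replies standing in for a real model: the first contains the mistake.
replies = iter([
    "Employees are entitled to 30 days annual leave. You can take at most 5 days.",
    "Employees are entitled to 30 days annual leave. You can take at most 10 days.",
])
final = answer_with_checks(lambda _prompt: next(replies), "How many days?", context)
print(final)
```

Capping the number of retries keeps cost and latency bounded. If no answer passes, the software can fall back to a safe response instead of sending an unchecked one.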


With this general methodology, and some clever software to implement the architecture, we now have a framework that grants:

  1. Full control: Using a software layer with contexts and checks, you control exactly what ChatGPT does.
  2. Accuracy: You directly provide the information that ChatGPT uses, and you can even reference the original source.
  3. Steerability: You give your chatbot a persona and make sure that it stays on-brand.

You get a conversational solution that does what you expect. It runs checks and balances to ensure it does not go rogue and ruin your brand’s image.

We have developed multiple components that largely deal with the hallucination issue. They let you customise this process for all sorts of applications, with multiple types of context used for embellishing messages, and response checkers that scan for offensive language, tone of voice, factual correctness, semantic similarity, and even response length.

Some tasks inevitably remain challenging: handling large quantities of data, long conversations, and managing sources of truth for our chatbots (nobody likes managing multiple versions of the same information in different formats).

We’ve built automatic pipelines that can break down long documents into sensible snippets, as well as clever memory systems that build on open-source techniques from cutting-edge packages like LangChain. Our chatbot-specific content management system (CMS) syncs up with your existing documentation written for humans, keeping the versions of information intended for robots away from our human brains.
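To illustrate what breaking a document into snippets can mean at its simplest, here is a plain-Python sketch that packs whole paragraphs into size-limited snippets. It is not Springbok’s pipeline and does not use LangChain; the function name and the character limit are assumptions.

```python
def split_into_snippets(document, max_chars=500):
    """Pack whole paragraphs into snippets of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own snippet.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    snippets, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The next paragraph would overflow: close this snippet, start a new one.
            snippets.append(current)
            current = paragraph
    if current:
        snippets.append(current)
    return snippets
```

Splitting on paragraph boundaries keeps each snippet readable on its own, which matters when a snippet is later pasted into a prompt as context.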

A recent legal sector example

We recently worked with Dentons to develop FleetAI, taking the solution from conception to launch in just four months.

This makes Dentons the first law firm to have the technology to systematically incorporate generative AI into its day-to-day workflows.

They already have two bots:

  1. One bot conducts legal research, generates legal content, and identifies relevant legal arguments
  2. The other allows multiple legal documents to be uploaded so that key data such as clauses and obligations can be extracted, analysed, and queried

How can I get started?

You can either build your own dedicated in-house generative AI team or reach out to a technology partner like Springbok.

If you are interested in starting a conversation, reach out on [email protected]! We would be pleased to hear from you.

Meet The Author

Jason

Head of Engineering

Jason is a co-founder and the Engineering Lead at Springbok. His previous experience includes working as a Data Science/Software Engineer at Jaguar Land Rover. Jason has led the software engineering delivery of four of Springbok’s most significant chatbot projects to date.

Victoria

Founder & CEO

Victoria is a co-founder of Springbok AI and has been its managing director since its start. Her previous experience includes scaling the food tech business Yfood and leading sales at Rasa (a conversational AI framework). At Springbok AI she leads a team of 40 engineers and commercial staff on multiple projects and contributes to the further development of the business.
