Taming Chatbot Codebase Complexity with a Bespoke DSL ✈︎

“Programming, when stripped of all its circumstantial irrelevancies, boils down to no more and no less than very effective thinking so as to avoid unmastered complexity, to very vigorous separation of your many different concerns.” — Edsger W. Dijkstra

Hipmunk's Hello Hipmunk chatbot provides personalized travel suggestions to users, helping them search for flights and hotels, as well as find new places to travel to.

The initial version of Hello Hipmunk was created in a very short period of time. It started out as an experiment to evaluate what would be possible with a travel chatbot, but quickly grew in both ability and complexity. Some of this complexity is due to the difficulty of the problems being solved, for example:

  • Processing natural language, which is notoriously variable and messy.
  • Remembering prior information the user has provided and applying it in future contexts, when appropriate.

But some of the chatbot’s complexity wasn’t necessary. When solving new, difficult problems it’s sometimes not until after you’ve implemented one solution that other, cleaner solutions occur to you.

In this blog post, we'll discuss our recent efforts at reducing the complexity of one part of the Hello Hipmunk codebase: the view layer. Specifically, we'll present a simple Domain Specific Language (DSL) we created for specifying the language, structure, and other attributes of chatbot messages in a clear, easy-to-understand way.

Before we delve into the DSL itself, let's look briefly at what a chatbot's view layer (the code that converts data into the visual elements an end user sees) looks like.

Chatbot view layer

The output of a chatbot's view layer is the JSON that specifies to a messaging platform's API the characteristics of the UI elements we want shown to the user, such as text, buttons, "cards", etc. For example, this is the structure of the JSON that Facebook's Messenger messaging platform expects to receive in order to display a button_template, a message that simply consists of text with buttons, to a user:

{
  "recipient":{
    "id":"<PSID>"
  },
  "message":{
    "attachment":{
      "type":"template",
      "payload":{
        "template_type":"button",
        "text":"What do you want to do next?",
        "buttons":[
          {
            "type":"web_url",
            "url":"https://www.messenger.com",
            "title":"Visit Messenger"
          },
          {
            ...
          },
          {...}
        ]
      }
    }
  }
}

Of course, we don't want to have to repeat all this structure every time we want to generate a button template message in our codebase. So let's create an abstraction to keep our code DRY: a simple function that accepts text and button data as inputs, and returns the above JSON structure with everything inserted in the appropriate places:

def create_button_template_json(text, button_data):
    # Build the nested template/payload structure shown above (body elided).
    ...
    return button_template_json
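
As a rough illustration only, the body of such a function might look like the sketch below. The name sketch_button_template_json, the recipient_id parameter, and the "button_title"/"url" keys of each button entry are assumptions made for this example, not the actual Hello Hipmunk implementation.

```python
def sketch_button_template_json(recipient_id, text, button_data):
    """Build a Messenger button-template payload as a plain dict.

    Hypothetical sketch: each entry of button_data is assumed to be a dict
    with "button_title" and "url" keys.
    """
    # Translate our own button description into the shape Messenger expects.
    buttons = [
        {"type": "web_url", "url": button["url"], "title": button["button_title"]}
        for button in button_data
    ]
    return {
        "recipient": {"id": recipient_id},
        "message": {
            "attachment": {
                "type": "template",
                "payload": {
                    "template_type": "button",
                    "text": text,
                    "buttons": buttons,
                },
            }
        },
    }


example_payload = sketch_button_template_json(
    "<PSID>",
    "What do you want to do next?",
    [{"button_title": "Visit Messenger", "url": "https://www.messenger.com"}],
)
```

Callers then only supply the parts that vary (recipient, text, buttons), while the fixed nesting lives in one place.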

Now, if a Messenger user asks the Hello Hipmunk chatbot to show the best single flight from San Francisco to London, we can generate the JSON for our reply message like so:

def build_flight_search_result(search_result):
    # search_result carries the destination city, price, and booking URL.
    return create_button_template_json(
        text="Flight to {city} for {price}".format(
            city=search_result.destination_city,
            price=search_result.price,
        ),
        button_data=[{
            "button_title": "Book flight",
            "url": search_result.booking_url,
        }],
    )

The problem

This is a step in the right direction, but it's still hard to glance at the function build_flight_search_result and quickly identify all the language that the user will actually see, because the language and the code used to build the message are intermixed.

This wouldn't be much of a problem if we only had a few methods like this where the chatbot's language and the code are mixed together. However, a chatbot's UI is largely composed of language. And, if you call the various methods that build your JSON (like create_button_template_json) from several places in your codebase, then your chatbot's language will get spread throughout your codebase. This can make it difficult to keep track of all your chatbot's responses and reason about your system in general. And this problem only grows worse as a chatbot's repertoire of skills increases and the quantity of language in your codebase expands.

Moving toward a solution

Our goal was to be able to look at the chatbot message definitions in our codebase and understand them as quickly as possible-- that is, to know what language the user will see, and in what chat UI elements that language will appear. This would make it easier for both developers and non-developers to visually parse, understand, and modify the chatbot's messages as needed.

How to go about achieving this?

First, we noticed that there are a few things being done in build_flight_search_result:

  • Specifying the language the user will see (e.g., "Flight to {city} for {price}").
  • Specifying the structure of what UI elements that language will appear in (text bubbles, buttons, etc.).
  • Calling the appropriate code to insert variables into strings and perform the translation to JSON.

How can we extract the information we want to see (the language and structure of the message) from the information we don't want to see-- all the surrounding code?

It's worth pointing out that this is the basic motivation underlying all abstractions in programming. In order to reduce complexity, we only want to see (and manipulate) those parts of the system that are relevant to solving the particular problem at hand. Ideally, all other details will be hidden.

For instance, few people choose to implement web apps in assembly language, where they'd have to manipulate CPU registers manually for every trivial operation. Rather, we create high level languages that abstract away many lower-level details of how to do things, such as the specific register manipulations needed to multiply two numbers. Then, we can simply request what we want in our higher-level language using its syntax, e.g., 5 * 5, and trust that it will carry out the many lower-level operations necessary to accomplish it.

This is what we wanted to achieve: to specify what our chatbot messages should look like (and how they should behave), but not have to worry about how they're created, a style of programming known as declarative programming.

The first question we faced in this quest was to decide how to represent the what above-- the description of our chatbot messages. We chose to use YAML, a human-readable data serialization language.

The very basics of YAML

YAML is a superset of JSON, and offers a very concise method of representing data with minimal syntax. YAML has a variety of features, but for the purposes of this post, it will suffice to show how to represent the basic data structures: maps and arrays (see the YAML specification if you'd like to learn more). Here's how you specify the map {"a_key": "a_value"}:

a_key: a_value

To specify the array ["value_1"]:

- value_1

It doesn't get much simpler or cleaner looking than that!

Let's now dive into the details of the simple DSL we created-- the Bot UI Descriptor Language (BUDL), which we pronounce as "boodle" (because it's fun to say).

Bot UI Descriptor Language (BUDL)

To give us some context for discussing some of the features of the BUDL language, let's imagine we're building a travel chatbot that performs flight searches for you (crazy concept, right?). We won't go through an exhaustive list of BUDL's features, just a few examples to give you a sense of how it works.

Specifying a plain text message

One of the most basic chatbot UI elements you can describe with BUDL is a plain text message. Here's the BUDL syntax for a plain text message:

message_descriptions_by_action:
  introduce_bot:
    - text: Hi, I'm a flight search chatbot.  Where are you headed?

One thing to note right away is the lack of any code or distracting syntax. We merely describe the chatbot's language and the UI elements in which the language should be placed. In a BUDL file, a chatbot has different "actions", e.g., "introduce_bot", and each action can consist of any number of messages (in this case, we just have a single text message).

This combination of message structure and chatbot language gives a clear description of the UI element that will be created, meeting our goal of making chatbot messages easy to visually parse.

Note on BUDL's underlying implementation

Now that we have a language for describing chatbot messages at what looks like a good level of abstraction, it's worth noting that we'll need the code that carries out the lower-level details of how to actually build those messages.

However, we won't go into any details about the code that implements these lower-level details in this post, as we're more concerned with the higher-level abstraction-- the DSL. But we'll refer to this "implementation code" below as the BUDLProcessor.

Injecting dynamically computed content

Chatbots do more than communicate with static text messages. We need the ability to make our messages more dynamic-- to insert the custom data that's been calculated for each chatbot response.

message_descriptions_by_action:
  indicate_price_found:
    - text: "I found a round-trip ticket to Hawaii for {price}"

To inject content into a string, we simply include the name of the variable that will contain the content, surrounded by curly braces, e.g., {price} above. Then, when calling the BUDLProcessor, we pass in a key-value object whose keys are the names of these variables and whose values are the content to be interpolated into the string.
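
The interpolation step can be sketched in a few lines of Python. This is not the real BUDLProcessor: render_text is a hypothetical helper, the message description is shown as the dict a YAML parser would produce, and the price value is made up.

```python
def render_text(message_description, variables):
    """Fill {placeholders} in a text message description from a dict."""
    # str.format raises KeyError if a placeholder has no matching variable,
    # which surfaces missing data early instead of sending a broken message.
    return {"text": message_description["text"].format(**variables)}


# Parsed form of the indicate_price_found message above.
description = {"text": "I found a round-trip ticket to Hawaii for {price}"}
rendered = render_text(description, {"price": "$450"})
```

Using the standard str.format syntax here is a design assumption; it keeps the placeholders familiar to Python developers, but any templating scheme with named variables would serve the same purpose.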

Example of creating more complicated UI elements

One common chatbot UI element is the "card", which typically consists of some or all of the following elements: image, title, subtitle, buttons. Here's an example:

[Image: button template]

Multiple cards can be sent to the user at once, to be scrolled through in a "carousel", in order to show more detailed information about things like flights, hotels, product details, etc.

This is how we specify cards in BUDL:

message_descriptions_by_action:
  show_flight_search_results:
    cards:
      - repeated_element_id: flight_card
        image: "{flight_details_image}"
        title: "${flight_price}"
        subtitle: "Price found at {flight_search_time}"
        buttons:
          - type: web_url
            button_text: "Book with {booking_airline_name}"
            url: "{link_to_book_flight}"

One thing to note about how you specify cards in BUDL is that we only describe one of each type of card we want. To render multiple instances of a particular card, we included the concept of a "repeated_element": a UI element that may be generated multiple times, depending on the data passed to the BUDLProcessor.

To indicate that an element can be repeated, we simply include the key "repeated_element_id" and a string to identify the element by, in this case "flight_card". Then, we can simply pass the necessary data (for as many elements as we need) into the BUDLProcessor, and they'll get generated automatically.
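
One way to picture the expansion of a repeated element is the sketch below. It assumes the data arrives as a dict mapping each repeated_element_id to a list of variable dicts; that input shape, the helper names, and the sample flight data are all hypothetical, and the real BUDLProcessor may work differently.

```python
def fill_placeholders(node, variables):
    """Recursively interpolate variables into every string of a nested structure."""
    if isinstance(node, str):
        return node.format(**variables)
    if isinstance(node, list):
        return [fill_placeholders(item, variables) for item in node]
    if isinstance(node, dict):
        return {key: fill_placeholders(value, variables) for key, value in node.items()}
    return node


def render_repeated_cards(card_description, data_by_element_id):
    """Render one card per variable dict supplied for the card's repeated_element_id."""
    element_id = card_description["repeated_element_id"]
    # The id is bookkeeping for the processor, so it is left out of the output.
    template = {k: v for k, v in card_description.items() if k != "repeated_element_id"}
    return [
        fill_placeholders(template, variables)
        for variables in data_by_element_id.get(element_id, [])
    ]


# Parsed form of (part of) the flight_card description above.
flight_card = {
    "repeated_element_id": "flight_card",
    "title": "${flight_price}",
    "subtitle": "Price found at {flight_search_time}",
    "buttons": [
        {
            "type": "web_url",
            "button_text": "Book with {booking_airline_name}",
            "url": "{link_to_book_flight}",
        }
    ],
}

# Made-up data for two flights; one card is rendered per entry.
flight_data = {
    "flight_card": [
        {
            "flight_price": "450",
            "flight_search_time": "9:00 AM",
            "booking_airline_name": "Example Air",
            "link_to_book_flight": "https://example.com/book/1",
        },
        {
            "flight_price": "520",
            "flight_search_time": "9:00 AM",
            "booking_airline_name": "Sample Airways",
            "link_to_book_flight": "https://example.com/book/2",
        },
    ]
}

cards = render_repeated_cards(flight_card, flight_data)
```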

Example of controlling additional message properties

In addition to being able to specify the language and UI elements in a message, the declarative nature of BUDL also works well for specifying other properties a message should have.

For example, one variable worth considering when designing a chatbot user experience is the tempo the chatbot communicates at. If a single chatbot response consists of several messages, it can be overwhelming for the user to receive them all in rapid succession. Rather, it seems that users prefer a chatbot to communicate at a tempo somewhat similar to how a (perhaps quick thinking) human might.

To offer some control over the temporal aspects of the conversation, BUDL supports the ability to specify a custom delay for each message in a chatbot's response using the key message_delay. E.g., adding a 1s delay to the second text message below is as simple as:

message_descriptions_by_action:
  flight_no_longer_available:
    - text: Sorry, that flight appears to no longer be available.
    - text: I looked again and here are some additional flights you may like.
      message_delay: 1

As this example should make clear, adding support for the ability to specify a new message property in BUDL is as simple as adding a new key for the desired property (and modifying the BUDLProcessor to handle it appropriately).
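
To make the "modify the BUDLProcessor" half of that concrete, here is a minimal sketch of how a processor might honor message_delay. The send_messages function and its send and sleep parameters are hypothetical; the delay is assumed to be in seconds, and passing sleep in as a parameter lets tests run without actually waiting.

```python
import time


def send_messages(message_descriptions, send, sleep=time.sleep):
    """Send each message in order, pausing first if it declares a message_delay."""
    for description in message_descriptions:
        delay = description.get("message_delay", 0)
        if delay:
            sleep(delay)
        # message_delay controls timing only, so it is not part of what we send.
        payload = {k: v for k, v in description.items() if k != "message_delay"}
        send(payload)


# Parsed form of the flight_no_longer_available action above.
flight_no_longer_available = [
    {"text": "Sorry, that flight appears to no longer be available."},
    {
        "text": "I looked again and here are some additional flights you may like.",
        "message_delay": 1,
    },
]

sent, pauses = [], []
send_messages(flight_no_longer_available, send=sent.append, sleep=pauses.append)
```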

Enabling linguistic variability

People grow tired of automated, impersonal and repetitive conversation. To keep our users engaged, we want a way to increase our chatbot's linguistic variability. In order to support this in BUDL, we included the ability to randomly select phrases from a given list when necessary. E.g., here's how you define multiple ways of saying something (in a "templates" section of the BUDL file):

templates:
  greeting:
    - Hello!
    - Howdy!
    - Hi there!

And here's how you would randomly select one greeting from the list for a "greet_user" chatbot action:

message_descriptions_by_action:
  greet_user:
    - text: "{greeting:random_choice}"
    - text: Where are you interested in flying?
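
Resolving a {name:random_choice} reference could look roughly like the following. The regular expression, the resolve_random_choices helper, and the optional rng parameter are assumptions made for this sketch rather than a description of the actual BUDLProcessor.

```python
import random
import re

# Matches references such as {greeting:random_choice}.
RANDOM_CHOICE_PATTERN = re.compile(r"\{(\w+):random_choice\}")


def resolve_random_choices(text, templates, rng=random):
    """Replace each {name:random_choice} with a random phrase from templates[name]."""

    def pick(match):
        return rng.choice(templates[match.group(1)])

    return RANDOM_CHOICE_PATTERN.sub(pick, text)


# Parsed form of the "templates" section above.
templates = {"greeting": ["Hello!", "Howdy!", "Hi there!"]}
greeting_text = resolve_random_choices("{greeting:random_choice}", templates)
```

Passing a seeded random.Random instance as rng makes the selection repeatable, which can be handy in tests.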

Discussion

The above examples should give you a good sense of what we were aiming to achieve with BUDL, and some of the features it currently supports.

In creating BUDL, we separated the description of our chatbot message language and structure from the code that creates the final output JSON data structure. One might first assume that this would add complexity to our codebase, as we now have more things!

But, as Rich Hickey notes in his talk Simple Made Easy, simplicity often entails creating more -- not fewer -- things. This has the effect of separating the various concerns in your code, making it easier to reason about each of the things in isolation.

One of the benefits of separating the various concerns in one's codebase is increasing the reusability of components-- even in completely new contexts.

For instance, let's take a look at how we've started reusing our BUDL message descriptions in a context we didn't initially anticipate: integration testing.

Creating integration tests for chatbots can be tricky, given the fact that their output contains so much dynamically created content. However, we can now use our BUDL message descriptions as a spec, making it much easier to validate our chatbot's actual output given various inputs.

What's important to realize is that this was not possible when our message descriptions were intermixed with code. Now, a BUDL message specification provides a template against which actual messages can be checked for correctness, and each message clearly marks where the dynamically generated content will be inserted (so we know what to ignore during comparisons).
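
One possible way to use a message description as a spec is to turn its text into a regular expression in which every placeholder matches arbitrary content. The helpers below are a hypothetical sketch of that idea, not the test code actually used for Hello Hipmunk.

```python
import re

# Any {placeholder} in a message description, e.g. {price}.
PLACEHOLDER = re.compile(r"\{[^{}]+\}")


def template_to_regex(template):
    """Compile a template into a regex: literal text must match, placeholders are wildcards."""
    parts = []
    position = 0
    for match in PLACEHOLDER.finditer(template):
        # Escape the literal text before the placeholder so it matches verbatim.
        parts.append(re.escape(template[position:match.start()]))
        parts.append(r".+?")
        position = match.end()
    parts.append(re.escape(template[position:]))
    return re.compile("".join(parts), re.DOTALL)


def matches_description(template, actual_text):
    """Return True if actual_text could have been produced from template."""
    return template_to_regex(template).fullmatch(actual_text) is not None


spec = "I found a round-trip ticket to Hawaii for {price}"
ok = matches_description(spec, "I found a round-trip ticket to Hawaii for $450")
wrong_wording = matches_description(spec, "I found a one-way ticket to Hawaii for $450")
```

A check like this verifies the static language exactly while ignoring whatever value was interpolated.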

One of the biggest benefits we've had in adopting BUDL has been that it provides an easy-to-understand representation of the chatbot's messages for any non-programmers responsible for creating and editing the chatbot's language. In fact, PMs and designers are now creating pull requests for edits they've made to BUDL files, eliminating the trouble of having to specify lots of little language changes in different areas of the chatbot's UI in tickets for engineers to work on (and then having to verify that all these little changes were made correctly).

Chatbot applications present their own unique challenges when it comes to managing the complexity of your codebase. We hope you've enjoyed this post on our attempts to rein in and master the complexity of this one part of our codebase (and maybe it's even given you some ideas you can apply in your own codebase).

And if you're a chatbot developer who loves thinking about chatbots as much as we do, please get in touch!