
Conversation classification best practices for data

Last Update: Oct 2024 • Est. Read Time: 5 MIN
To check plan availability, see the pricing page.

In today's world, data is often called the new oil: it is the fuel that powers Machine Learning (ML), the technology behind the recent revolution in Artificial Intelligence (AI). If you intend to power your tools and processes with Machine Learning, you need to make sure you are using the highest-quality fuel.

August 2023 update: This feature is no longer available to new customers.


What is Machine Learning?

In traditional programming, developers write code that gives the computer explicit instructions on how to solve a particular task. Machine Learning applications differ: the computer instead leverages huge amounts of data and learns to solve a problem by looking at examples. The output of this process, usually called training a model, relies heavily on the quality of the examples used.

Let's go over a basic example to help explain how this works. Imagine you want to teach a computer to recognize pictures of cats. To do so, we would show it hundreds, or even thousands, of example pictures of cats, as if saying, "Hey, this is a cat." We would also show it examples of other pets and different objects and say, "Oh, and this is not a cat." Instead of explicitly describing a cat as a domestic animal that has whiskers, meows, and purrs, we let the computer learn by example and figure out its own tricks to identify what a typical cat looks like. Once training is complete, we expect the computer to generalize how cats look so that, when faced with a new picture, it can successfully identify cats.
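The learn-by-example idea can be illustrated with a toy text classifier. This is a deliberately simplified sketch, not how any production model works: it counts which words appear under each label during training, then tags new text with the best-matching label. All the example messages and label names here are made up.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn by example: count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts

def predict(model, text):
    """Score each label by how many of its training words the new text shares."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words)
              for label, counts in model.items()}
    return max(scores, key=scores.get)

# Hypothetical labelled conversations (the "this is a cat" step)
examples = [
    ("where is my order status update", "order_status"),
    ("track my order shipping status", "order_status"),
    ("i want to return this item for a refund", "returns"),
    ("how do i return a damaged item", "returns"),
]
model = train(examples)
print(predict(model, "any status update on my order"))  # order_status
```

A real model uses far more data and far better statistics, but the shape of the process is the same: labelled examples in, a generalizing predictor out.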

When it comes to Conversation Classification, you may wish to automate repetitive manual processes, such as reading an incoming message and identifying the contact reason or applying selected tags to classify conversations into topics. In Kustomer, you can use data from your own past conversations as examples to help build a customized automated system that will simulate the cognitive process of understanding the contents of a message.

This general mechanism of learning by example, correcting itself via trial and error, and tweaking its own predictions can be used to teach a computer many other things using collections of data. Modern AI is built from millions of such small, very specific models, each able to simulate some fascinating function of the human brain.

What does data quality mean?

ML applications are often accused of acting as sealed black boxes, especially when they don't deliver the expected behavior. What we do know in advance is that, beyond the algorithms involved, the quality of the data a model is fed will have the biggest impact on its overall performance.

On one hand, an ML model will learn any patterns found in the data used for training, and only those; it will never be able to handle knowledge not present in the training data. On the other hand, it will also struggle when trying to learn from noisy and conflicting information. There is a popular saying: Garbage in, garbage out. That’s why we need to minimize that garbage in the input.
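The "garbage in, garbage out" effect can be made concrete with a toy calculation (illustrative only; the messages and labels are invented): when identical messages carry conflicting labels, even a model that memorizes the training data perfectly has a hard ceiling on its accuracy.

```python
from collections import Counter

# Hypothetical training set where agents disagreed on identical messages
training = [
    ("i want my money back", "refund"),
    ("i want my money back", "complaint"),  # conflicting label
    ("i want my money back", "refund"),
    ("where is my package", "order_status"),
]

# For each distinct text, the best any model can do is predict its
# majority label; the minority conflicting labels are unavoidable errors.
by_text = {}
for text, label in training:
    by_text.setdefault(text, Counter())[label] += 1

best_hits = sum(counts.most_common(1)[0][1] for counts in by_text.values())
print(best_hits / len(training))  # 0.75: one "garbage" label caps accuracy
```

No amount of extra training can push accuracy past that ceiling; only cleaning up the conflicting labels can.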

Data quality refers to the intrinsic regularity, clarity, and consistency of your data: the tags and custom fields that label your conversations and messages, and the custom categories you use to manage your customer service. While you can't control your customers' requests, you can control how conversations are organized by designing suitable tags and custom fields.

The following example shows the historical data of a simplified ideal customer care service of an online store.

There are four different categories, with apparently clear boundaries, that everybody understands and agrees on, and most of the tickets fall into one of them. The tickets have been manually labelled by agents, according to their topic, as part of their daily job. Notice that there are some unlabelled conversations, located between categories. This isn't an issue as long as they are not the majority and the other groups make sense. In addition, the categories don’t overlap. This process of category identification is easy to automate not because of the conversations themselves, but because of the design of the categories. 

On the other hand, the image below shows the same conversations without a clear definition of categories. In this case, some categories have overlapping meanings and names. When the limits of a category are not clear, it's easy for agents to get confused when categorizing conversations and to apply incorrect labels. Looking at the categories below, you can see that the chosen category might vary depending on the questions the agent asks while deciding which one to apply. If the customer wants to return an item, is it a return, or a return and a refund? They're not happy with the service so far, so technically it could also be labeled a complaint. Should the agent just select all of the labels that might apply, just in case?

In addition, as far as the promotions, sales, and coupons categories are concerned, there is too much granularity. It depends on the business, of course, but considering the volume and the similarity of the labels, that space could likely be simplified by grouping all those conversations under a single category.
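Consolidating over-granular labels can be as simple as a remapping pass over historical tags. The category names below are hypothetical, chosen only to mirror the example:

```python
# Hypothetical remapping: collapse several narrow labels into one category
MERGE = {"promotions": "offers", "sales": "offers", "coupons": "offers"}

historical_tags = ["promotions", "returns", "coupons", "sales", "order_status"]
merged = [MERGE.get(tag, tag) for tag in historical_tags]
print(merged)  # ['offers', 'returns', 'offers', 'offers', 'order_status']
```

Keeping the mapping explicit also documents the decision, so the old granular tags can be split out again later if the business needs them.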

Finally, there is another problem with this scenario: a potentially useful new category is missing, one that would gather a group of conversations sharing a common pattern: customers asking for an update on their order status.

What can you do to ensure high quality data?

There are a number of steps you can take to ensure the highest quality of your data and leverage automation from the very beginning.

  • Design tags and custom fields that best suit your needs. While it’s impossible to cover all cases, try to focus on the most important ones in terms of expected volume. For example, order status, returns & refunds, or promo codes.
  • Create new tags to cover unlabelled information rather than redefining and remapping everything over again.
  • Add granularity in an incremental fashion. When possible, start with a broad category, such as pets, before attempting to split it into various, more specific categories, such as cats, dogs, or birds.
  • Try to use clearly defined tags and categories to organize your information. A good rule of thumb is to test them with people. If your team understands their boundaries correctly and knows when to apply them, an ML algorithm will be able to figure out their meanings.
  • Try to make your team work as consistently as possible. Again, this is easier to achieve when tag limits are clear. 
  • Try to keep data stable for a while so that you can grow the number of examples using your tags and custom fields. While you can iterate and re-define your tags and custom fields at any time, changing this information frequently will reduce the number of data points you accumulate. Also, keep in mind that any re-definition of tags and categories will require the model to be re-trained.
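The "test them with people" rule of thumb from the list above can be made concrete: have two agents tag the same sample of conversations independently and measure how often they agree. A rough sketch, with made-up tag names:

```python
# The same five conversations, tagged independently by two agents
agent_a = ["returns", "order_status", "returns", "promotions", "returns"]
agent_b = ["returns", "order_status", "refunds", "promotions", "complaint"]

agreement = sum(a == b for a, b in zip(agent_a, agent_b)) / len(agent_a)
print(f"agreement: {agreement:.0%}")  # agreement: 60%
```

Low agreement concentrated on specific tag pairs (here, returns vs. refunds, and returns vs. complaint) is a sign that those category boundaries need sharper definitions before you train a model on the labels.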