Training Data & Rules
When you're making an assistant with Rasa, you'll need training data. This is the text data used to train the models and features you're using. It includes user-generated text as well as conversational patterns. It could come from customer support logs, assuming data collection and re-use are covered in your privacy policy, or from user conversations with your assistant.
Although pre-existing logs offer a good place to start, data from actual users interacting with your assistant is always the best data to work with.
Let's now discuss different parts of the data that you'll provide.
Stories
Stories represent training data to teach your assistant what it should do next.
If you already have conversational data it's good to start with the patterns you've found there. It's also possible to create your own patterns, but we recommend using interactive learning (via the rasa interactive command) to get started. You can start to create common flows and "happy flows" and then try to add some stories that contain common digressions.
Once you have some stories to start from, you can train a first model and test it with users to gather feedback.
Examples
Here's an example story:
stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
You can be quite expressive in a story file though. You could, for example, use or statements. The story below uses an or statement to indicate that a user can use either the affirm or the thanks intent to confirm a signup.
stories:
- story: newsletter signup
  steps:
  - intent: signup_newsletter
  - action: utter_ask_confirm_signup
  - or:
    - intent: affirm
    - intent: thanks
  - action: action_signup_newsletter
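One way to read an or statement is as shorthand for writing one story per alternative. The sketch below illustrates that reading on the newsletter signup steps. It is a conceptual illustration only, not Rasa's internal implementation, and the helper name expand_or_steps is hypothetical.

```python
from itertools import product


def expand_or_steps(steps):
    """Return every plain step sequence that a story with `or` steps stands for.

    Hypothetical helper for illustration; it does not call into Rasa.
    """
    # Each position holds a list of alternatives; a plain step has exactly one.
    alternatives = [step["or"] if "or" in step else [step] for step in steps]
    # The Cartesian product picks one alternative per position.
    return [list(combo) for combo in product(*alternatives)]


# The steps of the "newsletter signup" story, written as Python dicts.
newsletter_steps = [
    {"intent": "signup_newsletter"},
    {"action": "utter_ask_confirm_signup"},
    {"or": [{"intent": "affirm"}, {"intent": "thanks"}]},
    {"action": "action_signup_newsletter"},
]

expanded = expand_or_steps(newsletter_steps)
for variant in expanded:
    print(variant)
```

In this sketch the number of variants multiplies with every additional or step: two or steps with two alternatives each would already stand for four separate stories.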
You're also able to use checkpoints in your stories.
stories:
- story: beginning of conversation
  steps:
  - intent: greet
  - action: utter_greet
  - checkpoint: ask_feedback
- story: provide feedback
  steps:
  - checkpoint: ask_feedback
  - action: utter_ask_feedback
  - intent: inform
  - action: utter_thank_you
  - action: utter_anything_else
- story: no feedback
  steps:
  - checkpoint: ask_feedback
  - action: utter_ask_feedback
  - intent: deny
  - action: utter_no_problem
  - action: utter_anything_else
You can use or statements and checkpoints to modularize and simplify your training data. They can be useful, but do not overuse them. Using lots of checkpoints can quickly make your example stories hard to understand, and will slow down training.
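To see why checkpoints can make stories harder to follow, it can help to spell out the full conversations they imply. The sketch below joins every story that ends at a checkpoint with every story that starts from the same checkpoint. It is a simplified illustration under stated assumptions: it handles a single level of checkpoints, the helper name join_at_checkpoints is hypothetical, and it is not how Rasa processes checkpoints internally.

```python
def join_at_checkpoints(stories):
    """Glue stories that end at a checkpoint to the stories that start from it.

    Hypothetical helper for illustration; handles one level of checkpoints only.
    """
    joined = []
    for first in stories:
        last_step = first["steps"][-1]
        if "checkpoint" not in last_step:
            continue  # this story does not hand over to another one
        name = last_step["checkpoint"]
        for second in stories:
            if second["steps"][0].get("checkpoint") == name:
                # Drop the checkpoint markers themselves; they only connect stories.
                joined.append({
                    "story": f'{first["story"]} + {second["story"]}',
                    "steps": first["steps"][:-1] + second["steps"][1:],
                })
    return joined


# The three checkpoint stories from the example, written as Python dicts.
stories = [
    {"story": "beginning of conversation", "steps": [
        {"intent": "greet"}, {"action": "utter_greet"},
        {"checkpoint": "ask_feedback"}]},
    {"story": "provide feedback", "steps": [
        {"checkpoint": "ask_feedback"}, {"action": "utter_ask_feedback"},
        {"intent": "inform"}, {"action": "utter_thank_you"},
        {"action": "utter_anything_else"}]},
    {"story": "no feedback", "steps": [
        {"checkpoint": "ask_feedback"}, {"action": "utter_ask_feedback"},
        {"intent": "deny"}, {"action": "utter_no_problem"},
        {"action": "utter_anything_else"}]},
]

joined = join_at_checkpoints(stories)
for story in joined:
    print(story["story"], "->", len(story["steps"]), "steps")
```

Here one opening story and two follow-ups give two full conversations. With more stories on either side of a checkpoint, the number of implied conversations grows quickly, and a reader has to hold all of them in mind.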
Rules
Rules are a type of training data used to train your assistant's dialogue management model. Rules provide a way to describe short pieces of conversations that should always go the same way.
The main difference between a rule and a story is that a story can be seen as an example to learn from, while a rule is a pattern that the assistant must follow.
Here's an example of a rule that you may have in your rules.yml file.
rules:
- rule: Greeting Rule
  steps:
  - intent: greet
  - action: utter_greet
This rule says "whenever I see a user use the greet intent, the response should always be the utter_greet response". We'll talk about rules in more depth in future videos, because they also play a large role in using forms, but for now we can see rules.yml as another file that contains data for our assistant.
Intents
The examples for your intents are stored in your nlu.yml file. Here's an example of such a file:
nlu:
- intent: greet_smalltalk
  examples: |
    - hi
    - hello
    - howdy
    - hey
    - sup
    - how goes it
    - whats up?
In this example, we're giving many examples of the greet_smalltalk intent. This is training data for Rasa to learn from.
Some Tips
When you're adding training data, it helps to keep the following tips in mind.
When you're starting out with pre-existing logs, we recommend going through the logs by hand to find examples that fit an intent. You may be able to use machine learning techniques to help you find intents, but we recommend always keeping a human in the loop so that the assigned intents can be checked for correctness.
If you don't have logs to start from, consider starting out with the most common intents and try to use domain knowledge or the experience of your colleagues to come up with sensible examples. You can always add an "out of scope"-intent for text that your assistant doesn't cover right away. Once you've got some basic intents, it's best to start sharing your assistant so that you can learn from user data.
User-generated data is better than synthetic data. We want to learn how real users interact with the assistant, and we don't want to risk overfitting on synthetic data.
Each utterance should match exactly one intent in your training data. For situations where an utterance may be ambiguous, Rasa provides an end-to-end learning system that doesn't rely on intents. If you've got some ambiguous utterances, they can be added to a story like so:
stories:
- story: happy path
  steps:
  - user: "Ciao!"
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
We use the example "Ciao!" here because it can mean either "hello" or "goodbye". In the context of this story it means "hello", but we shouldn't add "Ciao!" as an example for the "hello" intent because it is ambiguous: it could just as well mean "goodbye".
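A quick way to spot candidates for this treatment is to look for example texts that appear under more than one intent. The sketch below does that on data shaped like the nlu.yml entries, with the examples block kept as a multi-line string. The helper names parse_examples and find_shared_examples, and the greet and goodbye entries, are assumptions made for illustration; the sketch does not use any Rasa API.

```python
from collections import defaultdict


def parse_examples(block):
    """Turn an `examples: |` text block into a list of example strings."""
    return [
        line.strip()[2:].strip()
        for line in block.splitlines()
        if line.strip().startswith("- ")
    ]


def find_shared_examples(nlu_entries):
    """Map each example text used by more than one intent to those intents."""
    intents_by_text = defaultdict(set)
    for entry in nlu_entries:
        for example in parse_examples(entry["examples"]):
            # Compare case-insensitively so "Ciao!" and "ciao!" count as one text.
            intents_by_text[example.lower()].add(entry["intent"])
    return {
        text: sorted(intents)
        for text, intents in intents_by_text.items()
        if len(intents) > 1
    }


# Hypothetical entries, shaped like the items under `nlu:` in nlu.yml.
nlu_entries = [
    {"intent": "greet", "examples": "- hi\n- hello\n- Ciao!\n"},
    {"intent": "goodbye", "examples": "- bye\n- see you later\n- ciao!\n"},
]

shared = find_shared_examples(nlu_entries)
print(shared)  # {'ciao!': ['goodbye', 'greet']}
```

Any text reported this way is a candidate for removal from the intent examples and for use as a literal user message in a story instead. A check like this only finds exact repeats, so a human still needs to review utterances that are ambiguous without being duplicated.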
Links
- Rasa Training Format Documentation
- Rasa NLU Data Documentation
- Rasa Stories Documentation
- Rasa Rules Documentation
Exercises
Try to answer the following questions to test your knowledge.
- When does it make sense to use or statements in your stories.yml file?
- When does it make sense to use checkpoints in your stories.yml file?
- Is it a good idea to start making an assistant with lots of intents? Why (not)?