Training Data & Rules
Although pre-existing logs offer a good place to start, data from actual users interacting with your assistant is always the best data to work with.
Let's now discuss different parts of the data that you'll provide.
Stories represent training data to teach your assistant what it should do next.
If you already have conversational data, it's good to start with the patterns you've found there. It's also possible to create your own patterns, but we recommend using interactive learning (via the `rasa interactive` command) to get started. You can start by creating the common flows ("happy paths") and then try to add some stories that contain common digressions.
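As an illustration, a story with a digression might look like the sketch below. The `bot_challenge` intent and `utter_iamabot` response are hypothetical names chosen for this example, not something your assistant necessarily has:

```yaml
stories:
- story: greet with a digression
  steps:
  - intent: greet
  - action: utter_greet
  # hypothetical digression: the user asks whether they're talking to a bot
  - intent: bot_challenge
  - action: utter_iamabot
  # the conversation then returns to the happy path
  - intent: mood_great
  - action: utter_happy
```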
Once you have some stories to start from, you can train a first model, which allows you to test your assistant with users and gather feedback.
Here's an example story:
```yaml
stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
```
You can be quite expressive in a story file, though. You could, for example, use `or` statements. The story below uses an `or` statement to indicate that a user can use either the `affirm` or the `thanks` intent to confirm a signup.
```yaml
stories:
- story: newsletter signup
  steps:
  - intent: signup_newsletter
  - action: utter_ask_confirm_signup
  - or:
    - intent: affirm
    - intent: thanks
  - action: action_signup_newsletter
```
You're also able to use checkpoints in your stories.
```yaml
stories:
- story: beginning of conversation
  steps:
  - intent: greet
  - action: utter_greet
  - checkpoint: ask_feedback

- story: provide feedback
  steps:
  - checkpoint: ask_feedback
  - action: utter_ask_feedback
  - intent: inform
  - action: utter_thank_you
  - action: utter_anything_else

- story: no feedback
  steps:
  - checkpoint: ask_feedback
  - action: utter_ask_feedback
  - intent: deny
  - action: utter_no_problem
  - action: utter_anything_else
```
You can use `or` statements and checkpoints to modularize and simplify your training data. They can be useful, but don't overuse them: lots of checkpoints can quickly make your example stories hard to understand, and will slow down training.
Rules are a type of training data used to train your assistant's dialogue management model. Rules provide a way to describe short pieces of conversations that should always go the same way.
The main difference between a rule and a story is that a story can be seen as an example to learn from, while a rule is a pattern that the assistant must follow.
Here's an example of a rule that you may have in your `rules.yml` file:
```yaml
rules:
- rule: Greeting Rule
  steps:
  - intent: greet
  - action: utter_greet
```
This rule says: "whenever a user sends the `greet` intent, the response should always be the `utter_greet` response". We'll talk about rules in more depth in future videos, because they also play a large role in using forms, but for now we can see the rules file as another file that contains training data for our assistant.
The examples for your intents are stored in your `nlu.yml` file. Here's an example of such a file:
```yaml
nlu:
- intent: greet_smalltalk
  examples: |
    - hi
    - hello
    - howdy
    - hey
    - sup
    - how goes it
    - whats up?
```
In this example, we're giving many examples of the `greet_smalltalk` intent. This is training data for Rasa to learn from.
When you're adding training data, it helps to keep the following themes in mind.
When you're starting out with pre-existing logs, we recommend going through the logs by hand to see if you can find examples that fit an intent. You may be able to use machine learning techniques to help you find intents, but we recommend always keeping a human in the loop so that you can guarantee correctness.
If you don't have logs to start from, consider starting out with the most common intents and try to use domain knowledge or the experience of your colleagues to come up with sensible examples. You can always add an "out of scope" intent for text that your assistant doesn't cover right away. Once you've got some basic intents, it's best to start sharing your assistant so that you can learn from user data.
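A sketch of what such an "out of scope" intent could look like in your NLU data; the example texts below are invented for illustration and will depend on your assistant's domain:

```yaml
nlu:
- intent: out_of_scope
  examples: |
    - what's the weather on mars?
    - can you order me a pizza?
    - I want to talk about something else entirely
```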
User generated data is better than synthetic data. We're interested in learning how users interact with the assistant and we don't want to risk overfitting on synthetic data.
Each utterance should match exactly one intent in your training data. For situations where an utterance may be ambiguous, Rasa provides an end-to-end learning system that doesn't rely on intents. Ambiguous utterances can be added to a story like so:
```yaml
stories:
- story: happy path
  steps:
  - user: "Ciao!"
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
```
The reason we're using the example "Ciao!" here is that it can mean either "hello" or "goodbye". In the context of the shown story it means "hello", but we shouldn't add "Ciao!" as an example for the "hello" intent because it is ambiguous: it could also mean "goodbye".
- Rasa Training Format Documentation
- Rasa NLU Data Documentation
- Rasa Stories Documentation
- Rasa Rules Documentation
Try to answer the following questions to test your knowledge.
- When does it make sense to use `or` statements in your stories?
- When does it make sense to use checkpoints in your stories?
- Is it a good idea to start making an assistant with lots of intents? Why (not)?