After the AI block has been trained, we automatically calculate a performance score from 0-100 for you. And you might have already guessed: Higher is better.
The technical term for the measure we are using is "balanced hold-out accuracy" and it takes into account two things:
The accuracy of the model
How balanced your training dataset is
While most people are comfortable with accuracy (how often the model is right across all predictions), the balance of the data is frequently ignored, which can lead to poor performance in the real world.
An example: Imagine you want to categorize customer support requests by their content into two classes – technical issues and customer feedback. Now imagine that you have 1,000 examples to train your model, but only 50 of them relate to customer feedback. This is what is called an imbalanced dataset.
The tricky part comes in when the data used for testing is also imbalanced, say 95 technical issues to 5 pieces of customer feedback. In that case, we wouldn't be able to tell the difference between a really good model and one that simply says "technical issue" every time – in both cases, the accuracy would be high.
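To make this concrete, here is a minimal sketch in plain Python (the labels and the always-"technical" model are hypothetical, and balanced accuracy is computed here as the average of per-class recall, a common definition):

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    # Fraction of predictions that are correct, regardless of class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Average of per-class recall: each class counts equally,
    # no matter how many examples it has.
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical imbalanced test set: 95 technical issues, 5 customer feedback.
y_true = ["technical"] * 95 + ["feedback"] * 5
# A useless model that answers "technical" every single time.
y_pred = ["technical"] * 100

print(accuracy(y_true, y_pred))           # 0.95 -- looks great
print(balanced_accuracy(y_true, y_pred))  # 0.5  -- no better than chance
```

Plain accuracy rewards the lazy model for betting on the majority class, while balanced accuracy exposes that it never gets the minority class right.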
And that's why we need to take into account how balanced the training dataset has been.
Of course, there are other metrics to evaluate the performance of a machine learning model. We tried all major ones on dozens of problems and found that balanced hold-out accuracy was generally a good indicator for the majority of practical applications. If you need something specific, feel free to reach out!
If you really want to get into the weeds, Jason Brownlee has written a fantastic article on imbalanced classification problems. If not, here's how to get back to the app.