Nextory is one of Europe's largest audiobook and e-reading platforms. The company offers thousands of book titles in multiple languages and has one of the most elaborate categorization systems of all audiobook providers.
Having an elaborate multi-hierarchy categorization system helps end users find niche books in very specific categories and is a unique selling point of their product.
Nextory faced the tedious task of classifying every book according to THEMA, an elaborate categorization standard with a hierarchical structure.
Until then, books had been categorized according to this standard by hand, and Nextory wanted to use machine learning to automate the process.
The goal was to use existing information about each book (author, title, description, cover photo) to build a machine learning model that predicts its hierarchical book categories.
After analyzing the data and the previously hand-labelled book categories, we studied the THEMA specification and worked out a way to produce multi-hierarchical predictions. We used image recognition on the cover images and NLP tools to extract features from the text descriptions. We then implemented a machine learning model that accurately predicts book categories according to the THEMA standard. The model was put into production in a cloud environment, where engineers at Nextory could access it.
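
To make the idea of hierarchical labels concrete, here is a minimal sketch in Python. It assumes that a category's ancestors can be read from the prefixes of its code (so a code such as `FBA` sits under `FB`, which sits under `F`). The helper names and the example codes are hypothetical and only illustrate the label structure; this is not the model or the pipeline that was built for Nextory.

```python
from __future__ import annotations


def expand_with_ancestors(code: str) -> list[str]:
    """Return a code's prefix ancestors and the code itself, broadest first.

    Assumption: every prefix of a code is its ancestor in the hierarchy.
    """
    return [code[:i] for i in range(1, len(code) + 1)]


def to_label_set(codes: list[str]) -> set[str]:
    """Turn a book's assigned codes into a flat multi-label target
    that contains every level of the hierarchy."""
    labels: set[str] = set()
    for code in codes:
        labels.update(expand_with_ancestors(code))
    return labels


def make_consistent(predicted: set[str]) -> set[str]:
    """Keep only predicted codes whose ancestors were all predicted too."""
    return {
        code
        for code in predicted
        if all(ancestor in predicted for ancestor in expand_with_ancestors(code))
    }


# Illustrative codes only.
book_labels = to_label_set(["FBA", "FM"])   # {"F", "FB", "FBA", "FM"}
cleaned = make_consistent({"F", "FB", "FMX"})  # "FMX" is dropped: "FM" is missing
```

Expanding each code into all of its levels lets a standard multi-label classifier be trained on the full hierarchy, and the consistency step removes predictions that skip a level.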
As a result, staff who previously labelled categories by hand could move on to more qualified tasks.
Overall, the model correctly predicted book categories 96% of the time. A prediction counts as correct here only when all categories of a given book are labelled correctly.
If you are statistically inclined, you understand that this metric alone is not sufficient. Therefore, we have some additional metrics for you:
Averaging | Precision | Recall |
---|---|---|
Micro | 98.3% | 98.4% |
Sample | 90.6% | 90.6% |
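
For readers who want to see how these averages differ, the sketch below computes an exact-match ratio together with micro-averaged and sample-averaged precision and recall on a toy example. The function names and the data are made up for illustration, and the definitions follow the common multi-label conventions; we are assuming, not restating, how the figures above were calculated.

```python
from __future__ import annotations


def exact_match_ratio(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Share of books whose predicted label set equals the true set exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_precision_recall(
    y_true: list[set[str]], y_pred: list[set[str]]
) -> tuple[float, float]:
    """Pool all individual label decisions across books, then compute once."""
    true_positives = sum(len(t & p) for t, p in zip(y_true, y_pred))
    n_predicted = sum(len(p) for p in y_pred)
    n_actual = sum(len(t) for t in y_true)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_actual if n_actual else 0.0
    return precision, recall


def sample_precision_recall(
    y_true: list[set[str]], y_pred: list[set[str]]
) -> tuple[float, float]:
    """Compute precision and recall per book, then average over books."""
    precisions = [len(t & p) / len(p) if p else 0.0 for t, p in zip(y_true, y_pred)]
    recalls = [len(t & p) / len(t) if t else 0.0 for t, p in zip(y_true, y_pred)]
    return sum(precisions) / len(y_true), sum(recalls) / len(y_true)


# Toy data: three books with made-up category codes.
y_true = [{"F", "FB"}, {"A", "AB", "ABA"}, {"C"}]
y_pred = [{"F", "FB"}, {"A", "AB"}, {"C", "CB"}]

exact = exact_match_ratio(y_true, y_pred)
micro_p, micro_r = micro_precision_recall(y_true, y_pred)
sample_p, sample_r = sample_precision_recall(y_true, y_pred)
```

Micro averaging weights every label decision equally, so books with many categories influence the score more. Sample averaging weights every book equally, regardless of how many categories it has.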
If you'd like to learn more about hierarchical multiclass models and evaluation metrics, don't hesitate to reach out!