Customer Cases

Genre categorization

Automating the THEMA standard.
Keywords
  • Manual work replacement
  • Categorization
  • Machine learning
  • AI Pipelines
  • Python
INTRODUCTION

Nextory is one of Europe's largest audiobook and e-reading platforms. The company offers thousands of book titles in multiple languages and has one of the most elaborate categorization systems of all audiobook providers.

Having an elaborate multi-hierarchy categorization system helps end users find niche books in very specific categories and is a unique selling point of their product.


CHALLENGE

Nextory faced a mundane task: classifying every book according to THEMA, an elaborate subject categorization standard with a hierarchical structure.

Books were categorized according to this standard by hand, and Nextory wanted to explore whether machine learning could automate the process.


GOAL

Use existing information and data about books (author, title, description, cover photo) to implement a machine learning model that can predict the hierarchical book categories.


SOLUTION

After analyzing the data and the previously hand-labelled book categories, we studied the THEMA specification and devised a way to produce multi-hierarchical predictions. We used image recognition on the cover images and NLP tools for feature extraction on the text descriptions. We then implemented a machine learning model that accurately predicts book categories according to the THEMA standard. The model was deployed to production in a cloud environment, where engineers at Nextory could access it.

As a result, staff who previously hand-labelled categories could move on to more qualified tasks.
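One way to think about multi-hierarchical labels can be sketched as follows. This is a minimal illustration, not the production model: it assumes that a THEMA code's ancestors are its prefixes (so a three-character code sits under its two-character and one-character prefixes). The function names `expand_with_ancestors` and `is_hierarchy_consistent` are hypothetical, and the codes are illustrative examples.

```python
def expand_with_ancestors(codes):
    """Return the given codes plus every prefix of each code, sorted.

    Assumption: a category's ancestors are the prefixes of its code.
    """
    expanded = set()
    for code in codes:
        for end in range(1, len(code) + 1):
            expanded.add(code[:end])
    return sorted(expanded)


def is_hierarchy_consistent(predicted):
    """True if every predicted code's parent prefix is also predicted."""
    predicted = set(predicted)
    return all(len(code) == 1 or code[:-1] in predicted for code in predicted)


# Illustrative leaf categories for one book.
labels = expand_with_ancestors(["FBA", "FM"])
print(labels)  # → ['F', 'FB', 'FBA', 'FM']
print(is_hierarchy_consistent(labels))  # → True
```

Expanding labels this way turns the task into ordinary multi-label prediction, while the consistency check flags predictions that name a category without its parent.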


RESULTS

Overall, the model correctly predicted book categories 96% of the time. This figure is an exact-match rate: a prediction counts as correct only when every category for a given book is labeled correctly.
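The all-categories-correct metric described above can be sketched as an exact-match ratio. This is a minimal illustration with made-up data; the function name `exact_match_ratio` is hypothetical, and the category codes are illustrative only.

```python
def exact_match_ratio(true_sets, predicted_sets):
    """Fraction of books whose predicted category set equals the true set."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true_sets and predicted_sets must have equal length")
    if not true_sets:
        return 0.0
    hits = sum(set(t) == set(p) for t, p in zip(true_sets, predicted_sets))
    return hits / len(true_sets)


# Four illustrative books; the third is missing one category.
true_labels = [["F", "FB"], ["F", "FM"], ["Y", "YF"], ["F"]]
predictions = [["FB", "F"], ["F", "FM"], ["Y"], ["F"]]
print(exact_match_ratio(true_labels, predictions))  # → 0.75
```

A single missing or extra category makes the whole book count as wrong, which is why this metric is strict and benefits from being read alongside precision and recall.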

If you are statistically inclined, you understand that this metric alone is not sufficient. Therefore, we have some additional metrics for you:

            Precision    Recall
Micro       98.3%        98.4%
Sample      90.6%        90.6%
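The difference between the micro and sample rows can be sketched as follows. This is a minimal illustration with made-up data, assuming the standard multi-label definitions: micro averaging pools true positives, false positives, and false negatives over all books before dividing, while sample averaging computes precision and recall per book and then takes the mean. The function names are hypothetical.

```python
def micro_precision_recall(true_sets, predicted_sets):
    """Pool counts over all books, then compute precision and recall once."""
    tp = fp = fn = 0
    for t, p in zip(true_sets, predicted_sets):
        t, p = set(t), set(p)
        tp += len(t & p)   # categories predicted and correct
        fp += len(p - t)   # categories predicted but wrong
        fn += len(t - p)   # categories missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sample_precision_recall(true_sets, predicted_sets):
    """Compute precision and recall per book, then average over books."""
    precisions, recalls = [], []
    for t, p in zip(true_sets, predicted_sets):
        t, p = set(t), set(p)
        overlap = len(t & p)
        precisions.append(overlap / len(p) if p else 0.0)
        recalls.append(overlap / len(t) if t else 0.0)
    n = len(precisions)
    if n == 0:
        return 0.0, 0.0
    return sum(precisions) / n, sum(recalls) / n


# Two illustrative books; the second has one wrong and one missed category.
true_labels = [{"F", "FB", "FBA"}, {"F", "FM"}]
predictions = [{"F", "FB", "FBA"}, {"F", "FY"}]
print(micro_precision_recall(true_labels, predictions))
print(sample_precision_recall(true_labels, predictions))
```

Here the micro scores come out near 0.8 and the sample scores at 0.75: books with many categories weigh more in the micro average, whereas every book weighs equally in the sample average.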

If you'd like to learn more about hierarchical multiclass models and evaluation metrics, don't hesitate to reach out!


Check out some of our other work

Deep Learning

Ever wondered how city transportation is planned? See how we helped Trivector improve their transportation data with clever use of data science.

Trip segmentation for multimodal travel flows


Big Data

IoT devices in the water industry are currently rolling out at scale. Multiple systems, different protocols, a variety of data formats and other challenges lie ahead.

Big data architectures for latency, freshness and scale


Machine Learning

Preventing leaks, monitoring flows, preparing for the unforeseeable. We evaluated the potential for machine learning in the water industry.

Pump efficiency and water flow predictions

