
AI product recommendations in practice: what the data actually needs to work

By Felix Hartmann, Optima Ecom AI | February 19, 2025 | 6 min read

The standard pitch for recommendation engines is something like: "show customers what they want before they know they want it." In practice, the quality of what you can show depends almost entirely on the quality of the data you're feeding into the system.

This guide covers what good product data looks like for recommendations, what happens when it's missing or inconsistent, and why store size is less important than data structure.

What a recommendation engine is actually doing

Most recommendation systems in e-commerce use some combination of two approaches: collaborative filtering (customers who bought X also bought Y) and content-based filtering (this product is similar to that one based on attributes). More sophisticated systems blend both and add further signals such as current inventory, margin, or seasonal relevance.

Collaborative filtering needs purchase history. Content-based filtering needs product attributes. If either source is thin, incomplete, or inconsistent, the recommendations degrade accordingly.
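
To make the two approaches concrete, here is a minimal sketch on toy data. The product names, orders, and attribute sets are invented for illustration, and Jaccard overlap is only one simple way to compare attributes. It is not how any particular engine works.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each order is the set of products bought together.
orders = [
    {"scarf", "gloves"},
    {"scarf", "gloves", "hat"},
    {"scarf", "hat"},
    {"socks"},
]

# Hypothetical attribute sets used for the content-based comparison.
attributes = {
    "scarf": {"material:merino", "season:winter", "type:accessory"},
    "gloves": {"material:merino", "season:winter", "type:accessory"},
    "hat": {"material:cotton", "season:summer", "type:accessory"},
    "socks": set(),  # a product with no structured data
}


def co_purchase_counts(orders):
    """Collaborative signal: how often each pair appears in the same order."""
    counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[(a, b)] += 1
    return counts


def jaccard(a, b):
    """Content signal: shared attributes divided by all attributes."""
    if not a or not b:
        return 0.0  # nothing to compare when either product has no attributes
    return len(a & b) / len(a | b)


pair_counts = co_purchase_counts(orders)
scarf_vs_gloves = jaccard(attributes["scarf"], attributes["gloves"])
scarf_vs_socks = jaccard(attributes["scarf"], attributes["socks"])
```

The product with no attributes scores zero against everything, and a product that appears in only one order contributes no pairs. That is the degradation described above, in miniature.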

The catalogue quality problem

Here's a common scenario: a store with 800 products, added over several years by different people following different conventions. Products in the first 300 have detailed attributes and consistent tagging. The next 300 have minimal descriptions and generic category assignments. The last 200 were added quickly and have almost no structured data.

A recommendation engine fed this catalogue will surface reasonable suggestions for the well-described products and weak or irrelevant suggestions for the others. The system isn't broken — it's being asked to infer relationships from data that doesn't support reliable inference.

This is why a store with 200 well-structured products often gets better recommendations than a store with 2,000 inconsistently described ones. Recommendations are only as good as the catalogue data permits.

What good product data looks like

For content-based filtering to work usefully, you want consistent category taxonomy (the same category hierarchy applied to all products), specific attributes (not just "colour: blue" but "material: merino wool, weight: 200gsm, fit: slim"), accurate tags that reflect how customers search and browse, and clear relationships between products (variants, accessories, complementary items).
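
One way to picture those four elements is as a single product record. The class and field names below are assumptions made for this sketch, not a schema from any platform, and the SKUs and tags are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProductRecord:
    """Hypothetical shape of a well-structured product record."""

    product_id: str
    # The same hierarchy levels applied to every product.
    category_path: tuple[str, ...]
    # Specific attributes, not just colour.
    attributes: dict[str, str] = field(default_factory=dict)
    # Tags phrased the way customers search and browse.
    tags: set[str] = field(default_factory=set)
    # Relationship name -> related product IDs (variants, accessories, ...).
    related: dict[str, list[str]] = field(default_factory=dict)


jumper = ProductRecord(
    product_id="SKU-1001",
    category_path=("Clothing", "Knitwear", "Jumpers"),
    attributes={
        "colour": "blue",
        "material": "merino wool",
        "weight": "200gsm",
        "fit": "slim",
    },
    tags={"merino jumper", "slim fit knitwear"},
    related={"accessories": ["SKU-2040"], "complementary": ["SKU-3310"]},
)
```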

For collaborative filtering, you need purchase history with enough volume to establish patterns. "Customers who bought X also bought Y" requires that X and Y have both been bought in sufficient combinations to draw a meaningful signal. For a new store or one with low monthly order volume, collaborative filtering is limited — the dataset is too small to reveal reliable patterns.
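
The "sufficient combinations" point can be sketched as a minimum count on each pair. The threshold of 3 and the function name are assumptions chosen for illustration; a sensible cut-off depends on the store.

```python
from collections import Counter
from itertools import permutations


def also_bought(orders, min_pair_count=3):
    """Return {(x, y): share of orders containing x that also contain y}.

    Pairs seen together fewer than min_pair_count times are dropped,
    because a handful of co-purchases is not a reliable pattern.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = set(order)  # ignore duplicate lines within one order
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered pairs (x, y)
    return {
        (x, y): n / item_counts[x]
        for (x, y), n in pair_counts.items()
        if n >= min_pair_count
    }


# Hypothetical orders: "a" and "b" bought together three times, "a" and "c" once.
sample_orders = [["a", "b"], ["a", "b"], ["a", "b"], ["a", "c"], ["a"]]
rules = also_bought(sample_orders)
```

Here the pair of "a" and "c" is discarded: one shared order is not enough to say anything. With a small dataset, most pairs end up on the wrong side of any reasonable threshold.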

How purchase history affects the system

Six months of order history gives a baseline; a year or more is better for identifying seasonal patterns. The relevant questions are: how many orders per month on average, how many distinct customers versus repeat buyers, and how varied the purchase combinations are.
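
Those three questions can be answered with a short summary over exported orders. This sketch assumes each order has already been reduced to a (date, customer ID, items) tuple; that input shape and the function name are hypothetical.

```python
from collections import Counter
from datetime import date


def summarise_orders(orders):
    """orders: list of (order_date, customer_id, items) tuples."""
    # Only months that contain at least one order are counted.
    months = Counter((d.year, d.month) for d, _, _ in orders)
    per_customer = Counter(customer for _, customer, _ in orders)
    baskets = {frozenset(items) for _, _, items in orders}
    return {
        "months_with_orders": len(months),
        "avg_orders_per_month": len(orders) / len(months) if months else 0.0,
        "distinct_customers": len(per_customer),
        "repeat_customers": sum(1 for n in per_customer.values() if n > 1),
        "distinct_baskets": len(baskets),
    }


# Hypothetical sample data.
history = [
    (date(2024, 1, 5), "c1", ["scarf", "gloves"]),
    (date(2024, 1, 20), "c2", ["scarf"]),
    (date(2024, 2, 3), "c1", ["gloves", "scarf"]),
    (date(2024, 3, 9), "c3", ["hat"]),
]
summary = summarise_orders(history)
```

A low count of distinct baskets relative to total orders suggests customers keep buying the same combinations, which leaves little for collaborative filtering to learn from.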

A store with 50 orders per month and a narrow catalogue has much less collaborative signal than one with 500 orders per month across a broader range. That doesn't mean recommendations are impossible at lower volumes. It means the weighting shifts toward content-based methods, and the system configuration needs to account for that.
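
As a rough illustration of that shift in weighting, the sketch below blends the two scores with a weight that grows with order volume. The linear ramp and the 500-orders-per-month saturation point are assumptions made for the example, not recommended settings.

```python
def collaborative_weight(orders_per_month, full_signal_at=500):
    """Weight between 0 and 1 given to the collaborative score.

    Assumption: the weight rises linearly with monthly orders and is capped
    at 1.0 once volume reaches full_signal_at.
    """
    return max(0.0, min(1.0, orders_per_month / full_signal_at))


def blended_score(collab_score, content_score, orders_per_month):
    """Mix the two scores; low-volume stores lean on content-based scores."""
    w = collaborative_weight(orders_per_month)
    return w * collab_score + (1 - w) * content_score


# Same pair of scores, two hypothetical stores.
low_volume = blended_score(collab_score=0.9, content_score=0.4, orders_per_month=50)
high_volume = blended_score(collab_score=0.9, content_score=0.4, orders_per_month=500)
```

At 50 orders per month the result sits close to the content-based score. At 500 it is driven entirely by the collaborative one.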

Platform considerations

Shopify's product API gives clean access to product data, variants, and collections. WooCommerce access depends on the database structure and which attributes you're using, so it varies more from store to store. Both platforms provide order history access, but that data usually needs cleaning and normalising before it enters the recommendation pipeline.
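
A cleaning step of that kind might look like the sketch below. The field names ("status", "lines", "sku", "quantity") are placeholders for this example and are not the Shopify or WooCommerce schema. Which order statuses to exclude is also an assumption.

```python
from collections import defaultdict

# Assumption: these statuses should not count as purchase signal.
EXCLUDED_STATUSES = {"cancelled", "refunded"}


def normalise_orders(raw_orders):
    """Drop excluded orders, tidy SKUs, and merge duplicate line items."""
    cleaned = []
    for order in raw_orders:
        if order.get("status") in EXCLUDED_STATUSES:
            continue
        quantities = defaultdict(int)
        for line in order.get("lines", []):
            # Treat " sku-1 " and "SKU-1" as the same product.
            sku = str(line.get("sku") or "").strip().upper()
            if sku:
                quantities[sku] += int(line.get("quantity", 1))
        if quantities:  # skip orders with no usable lines
            cleaned.append({"order_id": order["order_id"], "items": dict(quantities)})
    return cleaned


# Hypothetical raw export.
raw = [
    {"order_id": 1, "status": "paid", "lines": [
        {"sku": " sku-1 ", "quantity": 1},
        {"sku": "SKU-1", "quantity": 2},
        {"sku": "", "quantity": 1},
    ]},
    {"order_id": 2, "status": "cancelled", "lines": [{"sku": "SKU-2", "quantity": 1}]},
]
clean = normalise_orders(raw)
```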

One practical note for Shopify: the default "You may also like" section uses basic product type and collection matching. If you're seeing obviously wrong recommendations there, it's usually a symptom of inconsistent product taxonomy — which will need to be cleaned up regardless of whether you add AI-powered recommendations.

What to do before starting a recommendation project

Audit your top 100 products for attribute consistency. If they're not consistently described, extend that audit to the full catalogue before starting any project. Time spent on data quality before the build produces better results than any amount of algorithm tuning after it.
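
A first pass at that audit can be scripted. The sketch below assumes products are exported as plain dictionaries and that "category", "material", and "fit" are the attributes you care about. Both are assumptions to adapt to your own catalogue.

```python
from collections import defaultdict

# Hypothetical list of attributes every product is expected to have.
REQUIRED = ("category", "material", "fit")


def audit(products, required=REQUIRED):
    """Return (missing, inconsistent).

    missing: product id -> required attributes that are absent or blank.
    inconsistent: normalised category -> the different spellings found.
    """
    missing = {}
    spellings = defaultdict(set)
    for product in products:
        gaps = [key for key in required if not str(product.get(key) or "").strip()]
        if gaps:
            missing[product["id"]] = gaps
        category = product.get("category")
        if category:
            # Group labels that differ only by case or surrounding spaces.
            spellings[category.strip().lower()].add(category)
    inconsistent = {k: sorted(v) for k, v in spellings.items() if len(v) > 1}
    return missing, inconsistent


# Hypothetical catalogue extract.
catalogue = [
    {"id": 1, "category": "Knitwear", "material": "merino wool", "fit": "slim"},
    {"id": 2, "category": "knitwear ", "material": "", "fit": "slim"},
    {"id": 3},
]
missing, inconsistent = audit(catalogue)
```

This catches only the mechanical problems: blank fields and near-duplicate category labels. Whether the attributes are accurate still needs a person to check.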

Check your order history volume and distribution. If you have fewer than 200 orders per month, collaborative filtering will be limited and a content-based approach is more appropriate.

Define what "a good recommendation" looks like for your store. Do you want to cross-sell across categories? Upsell within the same category? Avoid recommending certain product combinations? These merchandising decisions need to exist before a recommendation engine can reflect them.
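
Once those decisions exist, they can be written down as rules that filter whatever the engine proposes. In this sketch the rule names, the product fields, and the reading of "upsell" as "same category, higher price" are all assumptions for illustration.

```python
def apply_rules(source, candidates, mode="cross_sell", blocked_pairs=frozenset()):
    """Filter candidate recommendations for a source product.

    mode="cross_sell": keep only products from other categories.
    mode="upsell": keep only same-category products with a higher price.
    blocked_pairs: frozensets of two product ids that must never be paired.
    """
    kept = []
    for candidate in candidates:
        if frozenset((source["id"], candidate["id"])) in blocked_pairs:
            continue
        same_category = candidate["category"] == source["category"]
        if mode == "cross_sell" and same_category:
            continue
        if mode == "upsell" and not (same_category and candidate["price"] > source["price"]):
            continue
        kept.append(candidate)
    return kept


# Hypothetical products.
jumper = {"id": "J1", "category": "knitwear", "price": 80}
candidates = [
    {"id": "J2", "category": "knitwear", "price": 120},
    {"id": "J3", "category": "knitwear", "price": 60},
    {"id": "S1", "category": "accessories", "price": 25},
    {"id": "S2", "category": "accessories", "price": 30},
]
blocked = frozenset({frozenset(("J1", "S2"))})

cross_sell = [c["id"] for c in apply_rules(jumper, candidates, "cross_sell", blocked)]
upsell = [c["id"] for c in apply_rules(jumper, candidates, "upsell", blocked)]
```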

If you want to discuss whether your catalogue is in the right shape for a recommendation project, the discovery call is the right place. Book one here.

Recommendation system performance depends on data quality, catalogue structure, and purchase history volume. Outcomes vary and cannot be guaranteed in advance.