Aether


Cross-Merchant Product Matching with GTINs

Price comparison sounds simple. Find the same product at multiple retailers, show the prices side by side. But the word “same” is doing a lot of work in that sentence. Establishing that two product listings from two different merchants refer to the same physical item - and not merely a similar one - is the core technical challenge in any cross-merchant comparison service.

The matching problem

Consider a Samsung Galaxy S25 Ultra. One retailer lists it as “Samsung Galaxy S25 Ultra 256GB Titanium Black”. Another lists “SAMSUNG Galaxy S25 Ultra 5G Smartphone, 256GB, Titanium Black”. A third lists “Samsung Galaxy S25 Ultra - 256 GB - Titanium Black - Unlocked”. These are all the same product, but the titles do not match. Even after normalising case and removing punctuation, the strings are different.
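A quick sketch makes this concrete. The helper below is hypothetical (normalise_title is not from any library); it lower-cases each title, replaces punctuation with spaces, and collapses whitespace, which is roughly what "normalising" means here.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case the title and replace runs of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


titles = [
    "Samsung Galaxy S25 Ultra 256GB Titanium Black",
    "SAMSUNG Galaxy S25 Ultra 5G Smartphone, 256GB, Titanium Black",
    "Samsung Galaxy S25 Ultra - 256 GB - Titanium Black - Unlocked",
]

normalised = [normalise_title(t) for t in titles]
distinct = set(normalised)

for original, cleaned in zip(titles, normalised):
    print(f"{original!r} -> {cleaned!r}")
print(f"{len(distinct)} distinct strings after normalisation")
```

All three listings still produce different strings: one adds "5g smartphone", another splits "256 gb" and appends "unlocked". Exact comparison of normalised titles is not enough on its own.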

Now consider that Samsung also sells the Galaxy S25 Ultra in 512GB, and in Titanium Grey, Titanium Violet, and Titanium Silver. Each of those is a different product with a different price. A matching system that groups all “Samsung Galaxy S25 Ultra” listings together regardless of storage and colour is giving the user misleading information. The comparison must be exact: same product, same variant.

Title matching - even with fuzzy logic, stemming, and normalisation - is fragile at this level of specificity. Two listings might have identical titles but refer to different regional variants. Two listings with different titles might be the same product described under different retailer naming conventions. The more you rely on text matching, the more edge cases you encounter.

What GTINs solve

A GTIN - Global Trade Item Number - is the number encoded in a product’s barcode. You will also see it referred to as EAN (European Article Number, now officially International Article Number) or UPC (Universal Product Code), depending on the region and format. These are all part of the same GS1 system: a 12-digit UPC and a 13-digit EAN are simply GTINs of different lengths.
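Because the same code can arrive as a 12-digit UPC in one feed and a 13-digit EAN in another, it helps to bring every code to one canonical form before comparing. GS1 represents shorter GTINs in a 14-digit field by padding with leading zeros; the sketch below does that. The function name to_gtin14 is an assumption for illustration, and the sketch does not handle zero-suppressed UPC-E codes, which need expanding first.

```python
def to_gtin14(code: str) -> str:
    """Strip spaces and hyphens, then left-pad a GTIN-8/12/13/14 to 14 digits."""
    digits = code.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a numeric code: {code!r}")
    if len(digits) not in (8, 12, 13, 14):
        raise ValueError(f"unexpected GTIN length {len(digits)}: {code!r}")
    return digits.zfill(14)


# The same item expressed as a UPC-A (12 digits) and as an EAN-13 (13 digits).
upc = to_gtin14("036000291452")
ean = to_gtin14("0036000291452")
print(upc, ean, upc == ean)
```

Both inputs become 00036000291452, so a plain string comparison on the padded form treats them as the same identifier.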

The critical property of a GTIN is that it identifies a specific product variant. The 256GB Titanium Black Samsung Galaxy S25 Ultra has a different GTIN from the 512GB Titanium Black, and both are different from the 256GB Titanium Grey. If two listings from two different retailers share the same GTIN, they are the same physical product. Not similar. Not equivalent. The same item.

This makes GTINs the most reliable foundation for cross-merchant product matching. No fuzzy logic, no probability scores, no manual review. Provided the GTIN data itself is accurate, a shared GTIN is a definitive match.
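In code, GTIN matching reduces to grouping listings by an exact key. The sketch below is a minimal illustration: the Listing class, the merchant names, the prices, and the GTIN values are all placeholders invented for the example, not real product data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Listing:
    merchant: str
    title: str
    price: float
    gtin: Optional[str] = None  # None when the feed carries no GTIN


def group_by_gtin(listings: Iterable[Listing]) -> Dict[str, List[Listing]]:
    """Group listings that share a GTIN; listings without one are left out."""
    groups: Dict[str, List[Listing]] = defaultdict(list)
    for listing in listings:
        if listing.gtin:
            groups[listing.gtin].append(listing)
    return dict(groups)


listings = [
    Listing("Retailer A", "Samsung Galaxy S25 Ultra 256GB Titanium Black", 1099.00, "00000000000017"),
    Listing("Retailer B", "SAMSUNG Galaxy S25 Ultra 5G Smartphone, 256GB, Titanium Black", 1049.00, "00000000000017"),
    Listing("Retailer C", "Samsung Galaxy S25 Ultra 512GB Titanium Black", 1299.00, "00000000000024"),
    Listing("Retailer D", "Own-brand phone case", 9.99),  # no GTIN in the feed
]

groups = group_by_gtin(listings)
cheapest = min(groups["00000000000017"], key=lambda item: item.price)
print(cheapest.merchant, cheapest.price)
```

The two 256GB listings land in one group despite their different titles, and the 512GB variant stays separate. The listing without a GTIN is not matched to anything here.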

Where GTINs fall short

If GTINs solved the problem completely, this would be a short post. There are real limitations.

Coverage is incomplete. Not all products have GTINs assigned. Retailer own-brand products often lack them. Accessories and peripherals have inconsistent coverage. Some retailers include GTINs in their product data feeds, others do not. In practice, GTIN coverage is best for branded electronics from major manufacturers and weakest for own-brand goods and niche accessories.

Bundles break the model. A retailer might sell a laptop bundled with a carry case and assign the bundle its own GTIN - different from the standalone laptop’s GTIN. The bundle and the standalone product are not the same item, and their GTINs correctly reflect that. But a user asking “what is the cheapest price for this laptop?” expects to see both options, which means the matching system needs to understand bundle relationships in addition to direct GTIN matching.
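One way to model this, sketched under heavy assumptions, is a lookup from each bundle's GTIN to the GTINs of the items it contains. Where that table would come from is out of scope; the identifiers, the Offer class, and the offers_for function below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Offer:
    merchant: str
    gtin: str
    price: float


# Hypothetical bundle table: bundle GTIN -> GTINs of the items inside it.
BUNDLE_CONTENTS: Dict[str, Tuple[str, ...]] = {
    "BUNDLE-GTIN": ("LAPTOP-GTIN", "CASE-GTIN"),
}


def offers_for(
    product_gtin: str,
    offers: Sequence[Offer],
    bundle_contents: Dict[str, Tuple[str, ...]],
) -> Tuple[List[Offer], List[Offer]]:
    """Return (direct offers, bundle offers that contain the product)."""
    direct = [o for o in offers if o.gtin == product_gtin]
    bundled = [o for o in offers if product_gtin in bundle_contents.get(o.gtin, ())]
    return direct, bundled


offers = [
    Offer("Retailer A", "LAPTOP-GTIN", 899.00),
    Offer("Retailer B", "BUNDLE-GTIN", 879.00),  # laptop plus carry case
    Offer("Retailer C", "CASE-GTIN", 25.00),
]

direct, bundled = offers_for("LAPTOP-GTIN", offers, BUNDLE_CONTENTS)
print([o.merchant for o in direct], [o.merchant for o in bundled])
```

Keeping the two lists separate lets the caller show the bundle as an option without presenting it as the same item as the standalone laptop.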

Data quality varies. A retailer might include a GTIN in their feed that is incorrect - a data entry error, a reused barcode, or a confusion between packaging variants. Bad GTINs in source data produce false matches, which are arguably worse than no matches at all (a user sees a confident price comparison that is actually comparing two different products).
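Some of these errors can be caught cheaply. The last digit of every GTIN is a check digit computed from the others with alternating weights of 3 and 1, counted from the right. The sketch below validates it; the function name is an assumption. It can flag many typos, but not a reused barcode or a code that is well-formed but belongs to a different packaging variant.

```python
def has_valid_check_digit(gtin: str) -> bool:
    """Return True if the final digit matches the GS1 mod-10 check digit."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    *body, check = [int(ch) for ch in gtin]
    # Moving left from the digit next to the check digit, weights go 3, 1, 3, 1, ...
    total = sum(
        digit * (3 if index % 2 == 0 else 1)
        for index, digit in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10 == check


print(has_valid_check_digit("036000291452"))   # well-formed UPC-A
print(has_valid_check_digit("4006381333931"))  # well-formed EAN-13
print(has_valid_check_digit("036000291453"))   # last digit mistyped
```

The first two calls return True and the third returns False. A feed row that fails this check is better treated as having no GTIN than as a candidate for a definitive match.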

A practical matching strategy

Given these limitations, a reliable matching approach uses GTINs as the primary mechanism and supplements with other signals where GTINs are unavailable.

The hierarchy works roughly like this: if two listings share a GTIN, treat them as matched. If GTINs are missing, fall back to a combination of brand, manufacturer part number (MPN), and normalised title. The further down the fallback chain you go, the lower the confidence in the match - and the more important it becomes to communicate that confidence to the consuming agent.

A match based on a shared GTIN is definitive. A match based on brand plus normalised title is probable. A match based on title alone is a suggestion. Treating all three with equal confidence is how bad product comparisons happen.
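The hierarchy can be sketched as a single function that returns a confidence level rather than a yes or no. Everything named here (Confidence, Listing, match_confidence) is hypothetical. Two choices are assumptions of this sketch rather than claims from the text above: brand plus MPN is rated "probable", and two listings whose GTINs are both present but differ are treated as a non-match without falling back to titles.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    DEFINITIVE = "definitive"   # shared GTIN
    PROBABLE = "probable"       # same brand plus same MPN or normalised title
    SUGGESTION = "suggestion"   # same normalised title only


@dataclass(frozen=True)
class Listing:
    title: str
    brand: Optional[str] = None
    gtin: Optional[str] = None
    mpn: Optional[str] = None


def _norm(text: Optional[str]) -> str:
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both values are non-empty and equal after normalising."""
    return bool(_norm(a)) and _norm(a) == _norm(b)


def match_confidence(a: Listing, b: Listing) -> Optional[Confidence]:
    if a.gtin and b.gtin:
        # Assumption: differing GTINs mean different products; do not fall back.
        return Confidence.DEFINITIVE if a.gtin == b.gtin else None
    if _same(a.brand, b.brand) and (_same(a.mpn, b.mpn) or _same(a.title, b.title)):
        return Confidence.PROBABLE
    if _same(a.title, b.title):
        return Confidence.SUGGESTION
    return None


with_gtin = match_confidence(
    Listing("Galaxy S25 Ultra 256GB", gtin="00000000000017"),
    Listing("SAMSUNG S25 Ultra, 256GB", gtin="00000000000017"),
)
brand_and_title = match_confidence(
    Listing("Galaxy S25 Ultra 256GB", brand="Samsung"),
    Listing("galaxy s25 ultra, 256gb", brand="SAMSUNG"),
)
title_only = match_confidence(
    Listing("Galaxy S25 Ultra 256GB"),
    Listing("galaxy s25 ultra 256gb"),
)
print(with_gtin, brand_and_title, title_only)
```

Returning the level alongside the match leaves the decision about how much certainty is enough to whoever consumes the result.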

The design principle is the same one that applies to data freshness: be explicit about what you know and how confident you are. Let the agent decide how much certainty is enough for the task at hand.