Unified AI Approach Using Encoding and Generative Large Language Models for Variant Product Matching in e-Commerce

Abstract

We introduce VARM, a variant relationship matcher strategy, to identify pairs of variant products in e-commerce catalogs. Traditional definitions of entity resolution are concerned with whether product mentions refer to the same underlying product. However, this fails to capture product relationships that are critical for e-commerce applications, such as listing similar, but not identical, products on the same webpage or sharing reviews between them. Here, we formulate a new type of entity resolution for variant product relationships to capture these links between similar e-commerce products. In contrast with the traditional definition, the new definition requires identifying both whether two products are variant matches of each other and which attributes vary between them. To satisfy these two requirements, we developed a strategy that leverages the strengths of both encoding and generative AI models. First, we construct a dataset that captures webpage product links, and therefore variant product relationships, to train an encoding large language model (LLM) to predict variant matches for any given pair of products. Second, we use retrieval-augmented generation-prompted generative LLMs to extract variation and common attributes amongst groups of variant products. To validate our strategy, we evaluated model performance using real data from one of the world’s leading e-commerce retailers. The results showed that our strategy outperforms alternative solutions and paves the way to exploiting these new types of product relationships.

Keywords

Generative Artificial Intelligence; GenAI; e-commerce; entity resolution; large language models; LLM

Introduction

Entity resolution (ER)¹ is an important task in data integration whose goal is to determine whether two mentions refer to the same real-world entity. Industry practitioners and academic researchers have long devised techniques to address ER in various domains, e.g., resolving social media handles² and resolving products in e-commerce.³ ER usually refers to exact ER, wherein two mentions are deemed to match if and only if every attribute of the two mentions agrees. Data integration in e-commerce, however, entails addressing subtle but nontrivial versions of the basic ER task, and the traditional definition of ER fails to capture product relationships that are critical for e-commerce catalogs.

As illustrated in the e-commerce site screenshot in Figure 1A, highly related products, but not identical products, are listed on the same webpage to facilitate the search. These consolidated webpage listings allow customers to look at the common product attributes while being able to easily choose amongst the different product variations, given by the variation attributes, such as color or size. Identifying these kinds of product relationships not only improves e-commerce listings, but it can also be exploited for many other applications, such as review sharing or search deduplication. To capture this notion, we formulate a new type of ER task for variant product relationships.

FIG. 1.

Variant product relationships in e-commerce sites. (A) E-commerce webpage screenshot showing a variant product listing example where the variation attributes, color and size, index the linked variant products on the same webpage. (B) Example showing how the same type of product, such as keyboards, can be associated with different variation attributes depending on the brand or listing website.

This new definition of variant products imposes additional considerations since the variation attributes can differ across product types or even brands. For example, color and size are adequate variation attributes for clothing products, whereas drinks vary by flavor or sugar content and televisions by diagonal screen size (Appendix Fig. A1). Moreover, even for products of the same type, the variation attributes may change depending on the brand: keyboards from HyperX vary by switch and model, whereas keyboards from Razer have keyboard switch, keyboard layout, and color/design as variation attributes (Fig. 1B, Table A2).

FIG. A1.

Example variant products identified by VARM. (A) Pair of watches belonging to the same variation group, regardless of difference in color, material or dial style, as shown by the shared listing on the webpage. (B) Pair of variant edible products where the variation attribute is flavor. VARM, variant relationship matcher.

Table A2.

Example product listings on e-commerce platforms

Product	Product type	Brand	Variation attributes	URL
Razer Huntsman Mini - Linear optical switch - U.S.-Mercury	Keyboard	Razer	Keyboard switch, Keyboard layout, Color/design	https://www.razer.com/gaming-keyboards/Razer-Huntsman-Mini/RZ03-03390400-R3M1
HyperX alloy origins - Mechanical gaming Keyboard	Keyboard	HyperX	Switch, model	https://hyperx.com/collections/gaming-keyboards/products/hyperx-alloy-origins-mechanical-gaming-keyboard?variant=42330451148957
Linen-blend wide Strap mini dress	Dress	Abercrombie & Fitch	Color, size, length	https://www.abercrombie.com/shop/us/p/linen-blend-wide-strap-mini-dress-56256821
Satin slip dress	Dress	Zara	Size	https://www.zara.com/us/en/satin-slip-dress-p08354475.html
Love ring	Ring	Cartier	Size, metal	https://www.cartier.com/en-us/jewelry/rings/love/love-ring-classic-model--CRB4084800.html
Eternity solitaire ring	Ring	Swarovski	Color, size	https://www.swarovski.com/en-US/p-M5697472/Eternity-solitaire-ring-Lab-grown-diamonds-1-2-ct-tw-Round-cut-14K-yellow-gold

In practice, identifying variant product relationships imposes two main challenges. First, one has to establish whether a given pair of products are variations of the same entity (variant match) or different products (mismatch). Supervised methods based on encoding large language models (LLMs) are the current state of the art for establishing whether two products are identical (exact match), but they come at the cost of collecting extensive labeled datasets for model training,^4,5 which, unlike for exact matching, are not readily available for variant matching. Second, one has to determine the variation attributes for the relevant set of variant products. While this could also be framed as a supervised task, it would require learning thousands of variation attribute labels since they can vary by product type or brand, further increasing the need for labeled training data and the overall complexity of the task.

Here, we introduce a new strategy for variant relationship matching, VARM, that leverages the respective strengths of generative and encoding LLMs to overcome the challenges of identifying this new kind of product relationship. First, to capture variant product information, we construct a dataset of variant product pair relationships given by products listed on the same webpage, which we use to train an encoding LLM to predict variant match relationships. Second, we use generative LLMs to predict variation attributes for groups of variant products, without requiring training data or being limited to a fixed set of variation attribute labels. To further provide the generative model with e-commerce information, we use retrieval-augmented generation (RAG)^6,7 to supply context about products from similar product types and brands. Overall, this work presents the following main contributions:

− We introduce the novel variant product identification task.

− We formulate a novel method to identify variant products, capitalizing on the information present in e-commerce webpages.

− We develop a strategy that uses the synergistic properties of encoding and generative LLMs to accurately predict variant matching products and variation attributes.

− We validate the model on three relevant datasets from e-commerce services.

Related Work

Given that ER has been a topic of research for more than half a century, almost all major approaches of machine learning have been applied to solve it, including supervised and unsupervised approaches.⁸ More recently, the focus has shifted to deep learning, including bespoke neural networks,⁹ pretrained language models,⁴ and most recently, generative AI.^10–12 None of these works address variant matching, which is the focus of this work. Narayan et al.¹⁰ show that GPT3 can perform ER when provided with few-shot task demonstrations (in-context learning); Peeters and Bizer¹¹ show that adding entity matching rules to the prompt can help boost ChatGPT’s matching quality; and Peeters and Bizer¹² use GPT4 to provide explanations alongside match/mismatch predictions and to automatically divide mispredictions into easy-to-understand error classes. More importantly, Peeters and Bizer show that despite recent advances, fine-tuning on sufficient labeled data can still outperform the best (zero-shot) generative AI ER results. Following this result, we fine-tune pretrained LLMs to generate variant match labels; to address the more challenging task of predicting variation attributes, we exploit the world knowledge inherent in generative AI models.

Methods

Our variant relationship matching, VARM, strategy can be largely divided into two main tasks (Fig. 2): (1) variant match prediction and (2) identification of variation attributes.

FIG. 2.

VARM strategy schematic. The variant product relationships present in webpages are exploited to construct a dataset with matching variant product pairs, which are then augmented to generate negative or mismatched examples for encoding LLM training. The groups of variant products are also used to predict variation attributes using a generative AI that can also take RAG product information in the prompt. The trained models can be used to predict variant product relationships and attributes for any new pair of products. LLM, large language model; RAG, retrieval-augmented generation; VARM, variant relationship matcher.

Variant match prediction

Each product $p$ can be defined as a structured set of key-value pairs $p = \{(a_{i}, v_{i})\}_{1 \leq i \leq k}$, where $a_{i}$ is the attribute name and $v_{i}$ is the attribute’s value represented as text (see Appendix A1). Given a product pair $(p_{1}, p_{2})$, the encoding model aims to predict the match label in a binary classification task.

DistilBERT was chosen as the encoding model given its competitive performance on product entity tasks,^4,12 but the strategy generalizes to alternative model choices. The model has 66 M parameters and was distilled from a 110 M parameter teacher model with 12 hidden layers (transformer blocks) and a hidden size of 768.^13,14 Given a pair of products, the attributes from both products are first concatenated and tokenized into a single sequence of text tokens to enable early fusion.¹⁵ The two product descriptions are separated by a $[SEP]$ token and padded and/or truncated to meet the 512-token input limit, while ensuring that half the tokens come from each product in the pair. To provide the model with relevant product understanding, it was first fine-tuned on e-commerce relevant tasks.^16,17 To distinguish it from the off-the-shelf model, we refer to the version pretrained on the e-commerce dataset with an ecom tag.
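The pair construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a toy whitespace tokenizer in place of the model's subword tokenizer, an assumed `[CLS] p1 [SEP] p2 [SEP]` layout, and hypothetical helper names (`serialize`, `build_pair_input`).

```python
from typing import Dict, List

MAX_TOKENS = 512  # input limit stated in the text


def serialize(product: Dict[str, str]) -> List[str]:
    """Flatten attribute name/value pairs into tokens (toy whitespace tokenizer)."""
    tokens: List[str] = []
    for name, value in product.items():
        tokens.extend(f"{name} : {value}".split())
    return tokens


def build_pair_input(p1: Dict[str, str], p2: Dict[str, str],
                     max_tokens: int = MAX_TOKENS) -> List[str]:
    """Concatenate two products into one sequence, splitting the token budget evenly."""
    budget = max_tokens - 3          # reserve room for [CLS] and two [SEP] tokens
    half = budget // 2
    t1 = serialize(p1)[:half]            # truncate the first product to its share
    t2 = serialize(p2)[:budget - half]   # the second product gets the remainder
    tokens = ["[CLS]"] + t1 + ["[SEP]"] + t2 + ["[SEP]"]
    # Pad short sequences so every input has the same length.
    return tokens + ["[PAD]"] * (max_tokens - len(tokens))
```

Splitting the budget before concatenation, rather than truncating the joined sequence, keeps a long first product from crowding out the second one.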

To capture variant product relationships, a labeled dataset was generated by capitalizing on the positive variant product links present in webpage listings and variation groups and by synthesizing negative samples from those positive links (see dataset details below). Model weights were fine-tuned to perform the variant product matching task in a supervised fashion by minimizing the cross-entropy loss of a linear classification layer on this dataset. The training regime was limited to a single epoch with Adaptive Moment Estimation (Adam) optimization and a learning rate of $5 \times 10^{-6}$ without weight decay.
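For reference, the objective described above can be written out as the standard two-class cross-entropy. The symbols are assumptions introduced here for illustration: $h_n$ is the encoder's pooled representation of the $n$-th product pair, $W$ and $b$ are the parameters of the linear classification layer, and $y_n \in \{0, 1\}$ is the label (mismatch or variant match).

```latex
z_n = W h_n + b, \qquad
\hat{p}_{n,c} = \frac{\exp(z_{n,c})}{\sum_{c' \in \{0,1\}} \exp(z_{n,c'})}, \qquad
\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \log \hat{p}_{n,\,y_n}
```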

Variation attribute identification

For a set of products in a given variation group, we formulated the variation attribute estimation of VARM as a “Text-to-Text” task inspired by the recent success of generative AI.^18–20 Specifically, the input is structured as an instruction that encapsulates the text attributes for all the products belonging to a given variation group. Therefore, for each given variation group, we have a set of products $p_{1 : k} \in P$ with associated attributes $(a_{1 : k, 1 : i}, v_{1 : k, 1 : i})$ , and instruction $I$ that the generative model $f$ takes to predict attribute target class $c \in C$ as:

$$f : \{P (a_{1 : k, 1 : i}, v_{1 : k, 1 : i}), I\} \to C$$

where the set of class labels $C$ is limited to common and variation attributes. Note that we do not set a constraint to provide labels for all of the attributes nor to limit the output to structured attributes. As such, the model can identify variation attributes in the product attribute keys $a$ or values $v$. Still, we penalize the model when providing contradictory labels for the same attribute.

We formalize the model as both a zero-shot and few-shot learner utilizing an off-the-shelf LLM, Claude3 Haiku²¹ (see parameter settings in Appendix Table A1). The zero-shot formulation implicitly assumes that LLMs have been trained on massive amounts of language data and thus possess contextual understanding, given a correctly engineered prompt.^20,22 For prompt engineering, we combined chain-of-thought and instruction techniques to predict all attribute labels for a given variation group.^23,24 To mitigate the impact of this assumption, we provide product-relevant information as part of the prompt using RAG.⁷ For a given product pair of a certain product type and brand, information about variation attributes is retrieved online from the webpage-linked products dataset. Specifically, variation groups containing products of the same product type or brand are filtered, and the associated variation attributes are collected and structured into a list of unique variation attributes to be included in the prompt.
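The retrieval step described above can be sketched as a simple filter over known variation groups. This is a hedged illustration only: the record schema (`product_type`, `brand`, `variation_attributes`) and the function name are hypothetical, and the actual retrieval mechanism is not specified in the text.

```python
from typing import Dict, List


def retrieve_variation_attributes(groups: List[Dict], product_type: str,
                                  brand: str) -> List[str]:
    """Collect unique variation attributes from groups sharing the type or brand."""
    unique: List[str] = []
    for group in groups:
        # Keep groups that match on product type OR brand, as described in the text.
        if group["product_type"] == product_type or group["brand"] == brand:
            for attribute in group["variation_attributes"]:
                if attribute not in unique:  # preserve first-seen order
                    unique.append(attribute)
    return unique


# Toy records based on the keyboard and dress examples in Table A2.
known_groups = [
    {"product_type": "Keyboard", "brand": "Razer",
     "variation_attributes": ["Keyboard switch", "Keyboard layout", "Color/design"]},
    {"product_type": "Keyboard", "brand": "HyperX",
     "variation_attributes": ["Switch", "model"]},
    {"product_type": "Dress", "brand": "Zara", "variation_attributes": ["Size"]},
]
context = retrieve_variation_attributes(known_groups, "Keyboard", "Razer")
```

The resulting list would then be inserted into the prompt as additional context for the generative model.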

Table A1.

Generative model parameter settings

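Model	Task	Parameter	Value
Claude instant V1 ($GenAI_{zero\_shot}$, $GenAI_{few\_shot}$)	Product matching	max_tokens_to_sample	30
Claude instant V1 ($GenAI_{zero\_shot}$, $GenAI_{few\_shot}$)	Product matching	temperature	0
Claude instant V1 ($GenAI_{zero\_shot}$, $GenAI_{few\_shot}$)	Product matching	top_k	100
Claude Sonnet 3.5v2 ($GenAI_{multimodal}$)	Product matching	max_tokens_to_sample	200
Claude Sonnet 3.5v2 ($GenAI_{multimodal}$)	Product matching	temperature	0
Claude Sonnet 3.5v2 ($GenAI_{multimodal}$)	Product matching	top_k	100
Claude3 Haiku ($GenAI_{zero\_shot}$, $GenAI_{RAG}$)	Attribute identification	max_tokens	500
Claude3 Haiku ($GenAI_{zero\_shot}$, $GenAI_{RAG}$)	Attribute identification	temperature	0
Claude3 Haiku ($GenAI_{zero\_shot}$, $GenAI_{RAG}$)	Attribute identification	top_p	0.9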

The detailed prompt provided to the model is structured as follows:

*Variation attribute identification prompt*

You are an expert on products. The following list of products are the same entity but variations of each other. Some of the descriptors are unique to each product and some other are different across them. The attributes can be descriptors or keywords within the descriptor.

Below are the products’ descriptions:

{variation_group_products}

You need to complete the following tasks.

Compare the details in the all products above and determine the attributes that are common and different across the products.

If an attribute is “different,” it cannot be “same.”

Respond: “Different:” followed by a list the attributes that are the different across them.

Respond: “Same:” followed by a list the attributes that are same across them.

Respond: “Reason:” explaining why or how attributes are different, for the different attributes.

Do not add an explanation about the output format. Ensure the output is exclusively in JSON format.

Return only a JSON block using double quotes in this format and in this order:

{[“Different”: [“list different attributes”], “Same”: [list same attributes], “Reason”: [Reason why attributes are different]]}

Do not return anything except a JSON. Always begin your output with: “{“
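A reply to the prompt above could be post-processed along the following lines. This is a sketch under stated assumptions: the reply is taken to be a plain JSON object with the keys "Different", "Same", and "Reason", and the function name and output field names are hypothetical. The contradiction check mirrors the rule that an attribute labeled "different" cannot also be "same".

```python
import json
from typing import Dict, List


def parse_attribute_response(raw: str) -> Dict[str, List[str]]:
    """Parse the model's JSON reply and flag attributes labeled both ways."""
    reply = json.loads(raw)
    # Normalize case and whitespace so "Color" and "color " compare equal.
    variation = [a.strip().lower() for a in reply.get("Different", [])]
    common = [a.strip().lower() for a in reply.get("Same", [])]
    contradictory = sorted(set(variation) & set(common))
    return {"variation": variation, "common": common, "contradictory": contradictory}


example = '{"Different": ["Color", "Size"], "Same": ["Brand", "color"], "Reason": ["..."]}'
parsed = parse_attribute_response(example)
```

A non-empty `contradictory` list marks replies that assign both labels to the same attribute, which could then be penalized during evaluation.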

Datasets

The datasets used to develop and evaluate VARM models were¹:

Webpage-linked products: dataset containing pairs of products listed on the same e-commerce webpage as illustrated in Figure 1A. Since these pairs were presented together, we can assume them to belong to the same variation group and therefore are variant products, irrespective of the variation attribute(s). Using these positive relationships, we generated synthetic samples by shuffling product pairs. Exploiting our understanding of product relationships, we used an informed strategy to generate negative samples according to the following three buckets: hard negative samples coming from shuffling product pairs for another product from the same brand and product type, medium difficulty samples by shuffling product pairs for another product from the same product type but not brand, and easy samples by shuffling pairs for another product from a different product type and brand. Text information from 2 M product pairs and 168 K variation groups is structured into product attributes, split into 70:30 training:evaluation sets with balanced class labels, while enforcing that products from the same variation groups are in the same data split.

Expertly audited variant product pairs: dataset containing 470 labeled pairs of products listed on different e-commerce websites and paired based on product similarity, representing a challenging dataset that mimics real use cases for variant product matching. The variant match labels were given by catalog experts taking webpage structure as ground truth.

Expertly audited variation group attributes: dataset with individual attributes from 10 variation groups, with 2–12 products each, labeled as variation or common depending on whether each attribute was different or the same across the products in the variation group, as determined by human catalog experts. The dataset contained 81 labeled attributes, with 35 and 46 labeled as variation and common, respectively.
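The three negative-sample buckets described for the webpage-linked products dataset can be illustrated with the sketch below. The product schema (`id`, `group`, `brand`, `product_type`) and function names are hypothetical, and the handling of pairs that share a brand but not a product type is an assumption, since the text does not assign them to a bucket.

```python
import random
from typing import Dict, List, Optional

Product = Dict[str, str]  # hypothetical schema: "id", "group", "brand", "product_type"


def negative_bucket(anchor: Product, other: Product) -> Optional[str]:
    """Return the difficulty bucket of pairing `anchor` with `other`, or None."""
    if anchor["group"] == other["group"]:
        return None  # same variation group, so this is a positive pair
    same_type = anchor["product_type"] == other["product_type"]
    same_brand = anchor["brand"] == other["brand"]
    if same_type and same_brand:
        return "hard"
    if same_type:
        return "medium"
    if not same_brand:
        return "easy"
    return None  # same brand, different type: outside the three buckets (assumption)


def sample_negative(anchor: Product, catalog: List[Product], difficulty: str,
                    rng: random.Random) -> Optional[Product]:
    """Pick a random mismatched partner for `anchor` from the requested bucket."""
    candidates = [p for p in catalog if negative_bucket(anchor, p) == difficulty]
    return rng.choice(candidates) if candidates else None
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs.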

Results

VARM accurately learns variation matching product relationships from website structure

To validate our VARM strategy, we evaluated the outputs of the different model components under different experimental conditions and compared its performance with state-of-the-art models performing the same tasks.

First, since the webpage-linked products dataset initially contains only positive variation match samples, we generated synthetic samples by shuffling product pairs. We used an informed strategy to generate the negative samples according to types of products and brands (see Datasets section in Methods).

As a control, we compared the performance of models trained on this dataset with that of models trained on a dataset containing random pair shuffles for negative sample generation. For each experiment, we generated 1 M negative samples to obtain a label-balanced dataset with 2 M pairs partitioned into 70:30 training:evaluation splits, while ensuring that products from the same variation group fall in the same split.
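One way to enforce the group constraint on the split is to assign whole variation groups, rather than individual pairs, to each partition. The sketch below assumes each positive pair carries a hypothetical `group` field and that negatives are generated afterwards within each partition. Note that the 70:30 ratio is applied to groups here, so it only approximates a 70:30 ratio over pairs.

```python
import random
from typing import Dict, List, Tuple


def split_by_group(pairs: List[Dict], train_fraction: float = 0.7,
                   seed: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Split pairs so that each variation group lands in exactly one partition."""
    groups = sorted({pair["group"] for pair in pairs})
    random.Random(seed).shuffle(groups)  # seeded for reproducibility
    cut = int(round(train_fraction * len(groups)))
    train_groups = set(groups[:cut])
    train = [pair for pair in pairs if pair["group"] in train_groups]
    evaluation = [pair for pair in pairs if pair["group"] not in train_groups]
    return train, evaluation
```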

We fine-tuned encoding LLMs on the aforementioned sets of negative samples and the positive labels from the webpage-linked products dataset, starting from an off-the-shelf DistilBERT² model or a DistilBERT model pretrained on e-commerce tasks, $DistilBERT^{ecom}$. We also compared the performance of a zero-shot generative matching model using Claude on a held-out test split of the dataset (prompt details in Appendix A3). While the generative model provides an above-average performance without any training data, the fine-tuned models clearly outperformed it when trained on the task (Fig. 3).

FIG. 3.

Model performance as a function of training set size on the webpage-linked products dataset. Performance of the generative AI model, $GenAI_{zero\_shot}$, off-the-shelf $DistilBERT$, and $DistilBERT$ pretrained on e-commerce tasks, $DistilBERT^{ecom}$, fine-tuned using the random ($rand$) or informed ($inf$) negative sampling strategy.

To better understand how model performance depends on labeled data, we varied the amount of data used to fine-tune the encoding models. As expected, the performance increases monotonically with training set size. However, fine-tuning an off-the-shelf LLM, $DistilBERT$, requires tens of thousands of training examples to achieve performance similar to that of an equivalent model that was pretrained on e-commerce tasks and further fine-tuned with a few thousand examples, $DistilBERT^{ecom}$. This is likely because the off-the-shelf model has to learn product representations alongside the variant matching task, whereas the $DistilBERT^{ecom}$ model only has to learn the new task boundaries. Moreover, the strategy used to generate the negative samples also affects how quickly the task is learned. Fine-tuning with randomly generated negative samples, $DistilBERT_{rand}$, results in performance comparable to that of the same models trained with a hundred times less data generated by taking product type and brand dependencies into consideration, $DistilBERT_{inf}$ (Fig. 3). Still, it is important to note that the difficulty of the task is influenced by the synthetically generated negative examples and may not be fully representative of the true distributions found in practice.

To assess the validity of VARM’s matching model component for practical e-commerce applications, we tested its performance on an expertly labeled dataset with varied product pairs across e-catalogs. Given that the example pairs were sampled and not synthetically generated, they represent conditions that could be faced when tackling e-commerce tasks. Testing the generalization performance of the different models showed that all models can accurately estimate variant product relationships. In addition, providing few-shot examples using RAG as part of the instruction, $GenAI_{few\_shot}$, improved performance, suggesting that the model can learn product relationships from the context. We also experimented with multimodal signals by providing product images in addition to text as input features, $GenAI_{multimodal}$, which improved overall performance metrics relative to $GenAI_{zero\_shot}$ (see methods in A.3). It is worth noting that while the false positive rates (FPR) are relatively high, this is likely an outcome of the class balance ratio in our held-out evaluation set. In the real world, we are likely to face many more “easy-to-identify” pairs of products that are unrelated to each other.

Still, the $DistilBERT_{inf}^{ecom}$ model, pretrained on e-commerce tasks before being fine-tuned on the variant matching task, outperforms the alternative models in AUROC, accuracy, and precision (Table 1).

Table 1.

Model performance comparison evaluated on the e-commerce relevant variant product dataset

Model	AUROC	Accuracy	Precision	Recall	F1 score	FPR
$GenAI_{zero\_shot}$	—	74.68	92.01	75.98	83.89	33.87
$GenAI_{few\_shot}$	—	80.42	93.23	82.11	87.93	30.64
$GenAI_{multimodal}$	—	78.09	92.70	79.66	86.32	32.35
$DistilBERT_{rand}$	68.17	82.12	91.78	88.23	89.55	58.06
$DistilBERT_{inf}$	69.98	86.81	92.41	99.99	92.93	53.22
$DistilBERT_{rand}^{ecom}$	89.71	85.74	97.90	87.74	91.44	27.41
$DistilBERT_{inf}^{ecom}$	90.61	87.44	98.39	90.68	92.61	33.87

Boldface highlights the best performing model/row for each metric/column.
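For readers less familiar with the metrics reported in Table 1, the sketch below shows how they follow from confusion-matrix counts. The function name is hypothetical and any counts passed to it are illustrative only; they are not taken from the evaluation dataset.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, and FPR from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # FPR is normalized by the number of true mismatches only, so a small
    # negative class can yield a high FPR even when precision is high.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}
```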

VARM correctly classifies variation attributes

Identifying attributes that are common or vary across groups of variant products is critical for multiple applications. To assess VARM’s ability to label attributes, we sampled 500 variation groups present in the webpage-linked products dataset where the variation attribute is known. We tested the performance of the zero-shot generative AI model, $GenAI_{zero\_shot}$, prompted to solve this task, and of the same model when provided with additional RAG information about variation attributes from other products of similar type and brand, $GenAI_{RAG}$. Both $GenAI_{zero\_shot}$ and $GenAI_{RAG}$ generate qualitatively correct responses, with consistent explanations that take into consideration product type and brand (examples in Appendix A2).

To get a quantitative estimate of the performance, we estimated recall on structured variation attributes, which avoids penalizing the model for additional variation attributes it finds that may not have a structured key. As a baseline, we define a heuristic model that estimates variation attributes as the structured attributes that vary across more than 90% of the products in the variation group. All models can predict variation attributes above chance, with generative models outperforming heuristic-based methods. Moreover, providing additional context about products in the prompt, $GenAI_{RAG}$, further boosts attribute identification performance (Table 2).

Table 2.

Variation attribute identification performance on the webpage-linked dataset sample

Model	Recall (color, size)	Recall (all)
Heuristic	70.65	70.28
$GenAI_{zero\_shot}$	79.56	79.67
$GenAI_{RAG}$	90.95	80.40

Boldface highlights the best performing model/row for each metric/column.

Recall performance for models predicting color and size variation attributes and all variation attributes.
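The recall measure used in Table 2 can be sketched as below. This is an illustrative version under stated assumptions: the function name and the dictionary layout are hypothetical, attribute names are compared case-insensitively, and recall is micro-averaged over all gold attributes, which the text does not specify.

```python
from typing import Dict, List


def variation_recall(predictions: Dict[str, List[str]],
                     gold: Dict[str, List[str]]) -> float:
    """Micro-averaged recall of gold variation attributes across variation groups."""
    hits = 0
    total = 0
    for group_id, gold_attributes in gold.items():
        predicted = {a.strip().lower() for a in predictions.get(group_id, [])}
        for attribute in gold_attributes:
            total += 1
            if attribute.strip().lower() in predicted:
                hits += 1
    return hits / total if total else 0.0
```

Because only gold attributes are counted, extra predicted attributes (for example, ones without a structured key) do not lower the score.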

While identifying variation attributes is sufficient to cluster variant products, extracting common attributes is also critical to provide a complete description of the product group. To test VARM’s ability to determine both common and variation attributes, catalog experts evaluated model predictions for both labels on a dataset with 10 variation groups. The evaluation showed that both versions of VARM, $GenAI_{zero\_shot}$ and $GenAI_{RAG}$, can accurately predict common and variation attributes. Interestingly, providing information about the variation attributes using RAG improves performance not only for the variation attributes but also for the common attributes, suggesting that the generative LLM has a global understanding of the task and can generalize across the different rules (Table 3).

Table 3.

Common and variation attribute identification performance

Model	Attribute type	Number of attributes	Accuracy
$GenAI_{zero\_shot}$	Common	46	80.43
$GenAI_{zero\_shot}$	Variation	35	73.53
$GenAI_{zero\_shot}$	All	81	75.29
$GenAI_{RAG}$	Common	46	84.78
$GenAI_{RAG}$	Variation	35	74.29
$GenAI_{RAG}$	All	81	79.01

Boldface highlights the best performing model/row for each metric/column.

Discussion

The recent developments in LLM technology have popularized its use, with successful applications to a multitude of use cases, including e-commerce tasks.¹² In particular, encoding LLMs are generally preferred for classification tasks or learning embeddings, while generative AI models are used for text generation tasks like summarization or translation.^12,17,25 In this work, we capitalize on the respective advantages of encoding and generative LLMs to solve a new ER task aimed at identifying variant matches and variation attributes amongst e-commerce products. While using both encoding and generative models provides higher performance, it also increases design and computational complexity. A promising direction to reduce this complexity would be to also use generative models for ER by leveraging generated samples for in-context learning,²⁶ which, as shown by our few-shot examples, helps the model learn this kind of product relationship. Given the rapid progress in the field, generative models could provide performance comparable to that of encoding models.²⁷

Here, we show how variant product relationships can be learned by leveraging the information present in website structures. Still, it will be worth exploring in future work how to adjust the granularity of these relationships so that the approach can be applied across e-commerce sectors. For example, jewelry variation attributes could be material type at a coarse level of description, but for jewelry retailers the relevant variation attributes could be finer, such as ring size or gem type. We also showed that the strategy used to augment the dataset and generate negative examples was important for model performance, suggesting that data augmentation methods could prove useful to define new types of product relationships.^28,29

This work expands the traditional definition of ER to identify variant relationships and implements a model to successfully identify these relationships amongst products. While it was only tested in e-commerce catalog applications, the new formulation and model strategy can be directly extended to other areas using ER, such as data curation or customer identification.³⁰

Authors’ Contributions

Conceptualization—P.H.V., L.W., and P.S.; Data curation—P.H.V.; Formal analysis—P.H.V.; Investigation—P.H.V.; Methodology—P.H.V., L.W., and P.S.; Project administration—P.H.V., L.W., P.S., and B.X.; Software—P.H.V. and Y.C.; Supervision—L.W., P.S., B.X., and C.L.; Validation—P.H.V. and Y.C.; Visualization—P.H.V.; Writing—original draft—P.H.V., L.W., and P.S.; Writing—review—P.H.V., L.W., P.S., and B.X.

Footnotes

Acknowledgments

The authors thank Abhishek Tripathi, Alvaro Quitral, Venessa Tauro, Daniel Marti, Nilotpal Das, Raghavendran Balu, Marian George, and Yi Ren for their assistance accessing resources and insightful feedback.

Author Disclosure Statement

The authors declare no competing interests or conflict of interest.

Funding Information

All work was performed while authors were employed at Amazon on in-house infrastructure. The views expressed in this article are solely the authors’ and may not necessarily reflect Amazon’s viewpoint.

Appendix

References

1. Fellegi, Sunter. A theory for record linkage. Journal of the American Statistical Association, 1969.

2. Qian, Popa, Sen. 2017. Active Learning for Large-scale Entity Resolution. CIKM.

3. Jain, Sarawagi, Sen. Deep Indexed Active Learning for Matching Heterogeneous Entity Representations. PVLDB, 2022.

4. Li, Li, Suhara, et al. Deep entity matching with pre-trained language models. Proc VLDB Endow, 2020; 14(1):50–60; doi: 10.14778/3421424.3421431

5. Tracz, Wójcik, Jasinska, et al. BERT-based similarity learning for product matching. 2020.

6. Gao, Xiong, Gao, et al. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv, 2024. Available from: https://arxiv.org/abs/2312.10997

7. Lewis, Perez, Piktus, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv, 2021. Available from: https://arxiv.org/abs/2005.11401

8. Getoor, Machanavajjhala. 2013. Entity resolution for big data. In Tutorial at KDD.

9. Mudgal, Li, Rekatsinas, et al. 2018. Deep learning for entity matching: A design space exploration. In Proceedings of the 2018 International Conference on Management of Data.

10. Narayan, Chami, Orr, et al. 2022. Can Foundation Models Wrangle Your Data? In PVLDB.

11. Peeters, Bizer. 2023. Using ChatGPT for entity matching. In European Conference on Advances in Databases and Information Systems.

12. Peeters, Bizer. 2024. Entity Matching using Large Language Models. arXiv preprint. Available from: https://arxiv.org/abs/2310.11244

13. Devlin, Chang, Lee, et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT.

14. Sanh, Debut, Chaumond, et al. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv, 2019. Available from: http://arxiv.org/abs/1910.01108

15. Yang, Zhao, Wu, et al. Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey. Sci China Inf Sci, 2025; 68(10). Available from: https://arxiv.org/abs/2406.08068

16. Jin, Mao, Li, et al. 2023. Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track. Available from: https://openreview.net/forum?id=uXBO47JcJT

17. Zhang, Yuan, Liu, et al. E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce. arXiv, 2021. Available from: https://arxiv.org/abs/2009.02835

18. Gilardi, Alizadeh, Kubli. ChatGPT outperforms crowd workers for text-annotation tasks. Proc Natl Acad Sci U S A, 2023; 120(30):e2305016120; doi: 10.1073/pnas.2305016120

19. Kulkarni, Bansal. Exploring real-world applications of GenAI in retail. J Arti Inte & Cloud Comp, 2023; 2(186):1–5; doi: 10.47363/JAICC/2023

20. Sun, Li, et al. Text Classification via Large Language Models. arXiv, 2023.

21. Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. 2024.

22. Kojima, Gu, Reid, et al. 2023. Large Language Models are Zero-Shot Reasoners. ArXiv, 2205.

23. Wei, Wang, Schuurmans, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. ArXiv, 2023.

24. Zhang, Zhang, Li, et al. Automatic Chain of Thought Prompting in Large Language Models. ArXiv, 2022.

25. Chodak. 2024. Artificial Intelligence in E-Commerce. Springer Nature Switzerland, Cham, 187–233; doi:10.1007/978-3-031-55225-0_7

26. Brown, Mann, Ryder, et al. Language models are few-shot learners. CoRR, abs/2005.14165. 2020.

27. Gursoy, Cai. Artificial intelligence: An overview of research trends and future directions. IJCHM, 2025; 37(1):1–17.

28. Jiang, Shang, Liu, et al. BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search. In 2020 IEEE International Conference on Data Mining (ICDM). IEEE; 2020, pp. 212–221; doi: 10.1109/ICDM50108

29. Wen, Kumar Vasthimal, Lu, et al. 2019. Building Large-Scale Deep Learning System for Entity Recognition in E-Commerce Search. In Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (Auckland, New Zealand) (BDCAT ‘19). Association for Computing Machinery, New York, NY, USA, 149–154; doi:10.1145/3365109.3368765

30. Binette, Steorts. (Almost) all of entity resolution. Sci Adv, 2022; 8(12):eabi8021; doi: 10.1126/sciadv.abi8021

31. Anthropic. 2024. Claude 3.5 sonnet model card addendum.