V. Rastogi, N. Dalvi and M. Garofalakis, "Large-scale collective entity matching", in 35th International Conference on Very Large Data Bases, 2009.
https://doi.org/10.14778/1938545.1938546
There have been several recent advancements in the Machine Learning community on the Entity Matching (EM) problem. However, their lack of scalability has prevented them from being applied in practical settings on large real-life datasets. Towards this end, we propose a principled framework to scale any generic EM algorithm. Our technique consists of running multiple instances of the EM algorithm on small neighborhoods of the data and passing messages across neighborhoods to construct a global solution. We prove formal properties of our framework and experimentally demonstrate the effectiveness of our approach in scaling EM algorithms.
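The neighborhood-plus-messages idea can be illustrated with a small sketch. It is a simplified illustration, not the paper's algorithm, and it rests on several assumptions. The neighborhoods are given up front (for example, from a cheap blocking step). The black-box matcher accepts already-known matches as evidence. The only "messages" are positive matches found in other neighborhoods. The names `scale_matcher`, `closure_matcher`, and the toy data are hypothetical. The loop reruns the matcher on every neighborhood until no neighborhood produces a new match.

```python
from itertools import combinations
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]
# Hypothetical matcher interface: (neighborhood, evidence matches) -> matches.
Matcher = Callable[[Set[str], Set[Pair]], Set[Pair]]


def scale_matcher(neighborhoods: List[Set[str]], matcher: Matcher,
                  max_rounds: int = 10) -> Set[Pair]:
    """Run a local matcher per neighborhood, passing found matches as messages."""
    matches: Set[Pair] = set()
    for _ in range(max_rounds):
        changed = False
        for hood in neighborhoods:
            # Messages: matches found anywhere so far that fall inside this hood.
            evidence = {p for p in matches if p[0] in hood and p[1] in hood}
            new = matcher(hood, evidence) - matches
            if new:
                matches |= new
                changed = True
        if not changed:  # fixpoint: no neighborhood learned anything new
            break
    return matches


def closure_matcher(similar: Set[Pair]) -> Matcher:
    """Toy collective matcher (an assumption): transitive closure of
    base-similar pairs plus evidence, restricted to one neighborhood."""
    def match(hood: Set[str], evidence: Set[Pair]) -> Set[Pair]:
        parent = {r: r for r in hood}

        def find(x: str) -> str:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for a, b in similar | evidence:
            if a in hood and b in hood:
                parent[find(a)] = find(b)
        # Pairs come out ordered (a < b) because the hood is sorted first.
        return {(a, b) for a, b in combinations(sorted(hood), 2)
                if find(a) == find(b)}
    return match


# Hypothetical toy data: two overlapping neighborhoods.
hoods = [{"a", "b", "c"}, {"b", "c", "d"}]
matcher = closure_matcher({("a", "b"), ("b", "d"), ("c", "d")})
result = scale_matcher(hoods, matcher)
print(sorted(result))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

In this toy run, the match `('a', 'c')` appears only in the second round. The first neighborhood can infer it only after the second neighborhood reports `('b', 'c')`. The pair `('a', 'd')` is never produced because no single neighborhood contains both records. That limitation comes from this simplified sketch and should not be read as a claim about the framework in the paper.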