Periklis Chrysogelos, "Streaming, high performance support vector methods", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2016
https://doi.org/10.26233/heallink.tuc.66234
Data are now generated continuously, and machine learning can exploit this stream to produce better models. Support vector machines (SVMs) are a popular machine learning model that can be adapted to various tasks, such as classification, regression and clustering. We study the problem of continuously updating L2 support vector machines in a distributed environment where new data constantly arrive at remote sites. We approach this as the problem of tracking the minimum of a convex function over the convex hull of the union of fully dynamic sets, each located at one of the sites. We give communication-efficient solutions for both the exact and approximate variants of the problem and show that they apply to a kernelized SVM trained in an explicit feature space. In our proposed methods, the sites communicate only when necessary, that is, only when the model has actually become outdated. For the cases where the sites are forced to communicate, we propose two algorithms: one iterative and one with a single stage of communication.
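
The idea of communicating only when the model is outdated can be illustrated with a standard first-order optimality check. The following is a minimal sketch, not the thesis's algorithm: it assumes, for illustration only, that the tracked convex function is f(w) = ||w||^2, that only insertions occur, and that each site knows the current minimizer w_star. The function name insertion_outdates_model is hypothetical. For a convex f, w_star remains the minimizer over the enlarged hull exactly when the gradient of f at w_star has a non-negative inner product with (p - w_star), which here reduces to w_star . p >= ||w_star||^2.

```python
def insertion_outdates_model(w_star, p, tol=1e-9):
    """Return True if inserting point p can lower the minimum of f(w) = ||w||^2.

    Assumption (illustrative): w_star minimizes f over the convex hull of the
    points seen so far. By first-order optimality, w_star stays optimal after
    adding p iff grad f(w_star) . (p - w_star) >= 0, i.e. w_star . p >= ||w_star||^2.
    """
    dot_wp = sum(a * b for a, b in zip(w_star, p))   # w_star . p
    dot_ww = sum(a * a for a in w_star)              # ||w_star||^2
    # The site needs to communicate only if the condition is violated.
    return dot_wp < dot_ww - tol


# Points (1, 0) and (0, 1): the minimum-norm point of their hull is (0.5, 0.5).
w_star = [0.5, 0.5]
far_point_outdates = insertion_outdates_model(w_star, [2.0, 2.0])   # no update needed
near_point_outdates = insertion_outdates_model(w_star, [0.1, 0.1])  # update needed
```

In this sketch a site evaluates the test locally for each arriving point and stays silent while it returns False, so communication cost is paid only for points that actually change the minimum. Deletions, which a fully dynamic setting also allows, are not covered here.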