Institutional Repository
Technical University of Crete

Maximum likelihood stochastic transformation adaptation for medium and small data sets

Boulis Constantinos, Diakoloukas Vasilis, Digalakis Vasilis

Simple Record


URI: http://purl.tuc.gr/dl/dias/719ECF19-A80F-4AFB-A104-A786DB940E4B
Identifier: http://www.sciencedirect.com/science/article/pii/S0885230801901688
Identifier: https://doi.org/10.1006/csla.2001.0168
Language: en
Size: 29 pages
Title: Maximum likelihood stochastic transformation adaptation for medium and small data sets
Creator: Boulis Constantinos
Creator: Diakoloukas Vasilis
Creator: Digalakis Vasilis
Publisher: Elsevier
Abstract: Speaker adaptation is recognized as an essential part of today's large-vocabulary automatic speech recognition systems. A family of techniques that has been extensively applied for limited adaptation data is transformation-based adaptation. In transformation-based adaptation we partition our parameter space into a set of classes, estimate a transform (usually linear) for each class and apply the same transform to all the components of the class. It is known, however, that additional gains can be made if we do not constrain the components of each class to use the same transform. In this paper two speaker adaptation algorithms are described. First, instead of estimating one linear transform for each class (as maximum likelihood linear regression (MLLR) does, for example) we estimate multiple linear transforms per class of models and a transform weights vector which is specific to each component (Gaussians in our case). This in effect means that each component receives its own transform without having to estimate each one of them independently. This scheme, termed maximum likelihood stochastic transformation (MLST), achieves a good trade-off between robustness and acoustic resolution. MLST is evaluated on the Wall Street Journal (WSJ) corpus for non-native speakers and it is shown that in the case of 40 adaptation sentences the algorithm outperforms MLLR by more than 13%. In the second half of this paper, we introduce a variant of MLST designed to operate under sparsity of data. Since the majority of the adaptation parameters are the transformations, we estimate them on the training speakers and adapt to a new speaker by estimating the transform weights only. First we cluster the speakers into a number of sets and estimate the transformations on each cluster. The new speaker will use transformations from all clusters to perform adaptation. This method, termed basis transformation, can be seen as a speaker similarity scheme. Experimental results on the WSJ show that when basis transformation is cascaded with MLLR, marginal gains can be obtained over MLLR alone, for adaptation of native speakers.
Type: Peer-Reviewed Journal Publication
License: http://creativecommons.org/licenses/by/4.0/
Date: 2015-11-02
Date of Publication: 2001
Subject Category: Data sets
Bibliographic Citation: C. Boulis, V. Diakoloukas and V. Digalakis, "Maximum likelihood stochastic transformation adaptation for medium and small data sets," Comput. Speech Lang., vol. 15, no. 3, pp. 257-285, Jul. 2001. doi: 10.1006/csla.2001.0168
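
The adaptation rule sketched in the abstract, where each Gaussian component combines several class-level linear transforms through its own weight vector, can be illustrated with a minimal NumPy sketch. The function and variable names below are assumptions made for illustration only; the sketch shows the adapted-mean computation and does not reproduce the paper's maximum-likelihood estimation of the transforms or weights.

```python
import numpy as np

# Illustrative sketch of the MLST mean adaptation described in the abstract:
# a class of Gaussian components shares K linear transforms (A_k, b_k), and
# each component carries its own weight vector over those transforms, so each
# component effectively receives an individual transform.
def mlst_adapt_mean(mu, transforms, weights):
    """Adapt a single Gaussian mean.

    mu         : (d,) original component mean
    transforms : list of K (A, b) pairs, A is (d, d), b is (d,)
    weights    : (K,) component-specific transform weights
    """
    adapted = np.zeros_like(mu)
    for w, (A, b) in zip(weights, transforms):
        adapted += w * (A @ mu + b)
    return adapted

# Hypothetical example: one class with K = 2 transforms, a 3-dimensional mean.
rng = np.random.default_rng(0)
transforms = [(rng.standard_normal((3, 3)), rng.standard_normal(3)) for _ in range(2)]
mu = rng.standard_normal(3)
weights = np.array([0.7, 0.3])  # in MLST these are estimated per component
print(mlst_adapt_mean(mu, transforms, weights))
```

Per the abstract, MLST estimates both the transforms and the per-component weight vectors from the adaptation data, while the basis-transformation variant pre-estimates the transforms on clusters of training speakers so that only the weight vectors need to be estimated for a new speaker.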
