Institutional Repository
Technical University of Crete

Maximum likelihood stochastic transformation adaptation for medium and small data sets

Boulis Constantinos, Diakoloukas Vasilis, Digalakis Vasilis

Simple Record


URI: http://purl.tuc.gr/dl/dias/719ECF19-A80F-4AFB-A104-A786DB940E4B
Identifier: http://www.sciencedirect.com/science/article/pii/S0885230801901688
Identifier: https://doi.org/10.1006/csla.2001.0168
Language: en
Size: 29 pages
Title: Maximum likelihood stochastic transformation adaptation for medium and small data sets
Creator: Boulis Constantinos
Creator: Diakoloukas Vasilis
Creator: Digalakis Vasilis
Publisher: Elsevier
Abstract: Speaker adaptation is recognized as an essential part of today's large-vocabulary automatic speech recognition systems. A family of techniques that has been extensively applied for limited adaptation data is transformation-based adaptation. In transformation-based adaptation we partition our parameter space into a set of classes, estimate a transform (usually linear) for each class and apply the same transform to all the components of the class. It is known, however, that additional gains can be made if we do not constrain the components of each class to use the same transform. In this paper two speaker adaptation algorithms are described. First, instead of estimating one linear transform for each class (as maximum likelihood linear regression (MLLR) does, for example) we estimate multiple linear transforms per class of models and a transform weights vector which is specific to each component (Gaussians in our case). This in effect means that each component receives its own transform without having to estimate each one of them independently. This scheme, termed maximum likelihood stochastic transformation (MLST), achieves a good trade-off between robustness and acoustic resolution. MLST is evaluated on the Wall Street Journal (WSJ) corpus for non-native speakers and it is shown that in the case of 40 adaptation sentences the algorithm outperforms MLLR by more than 13%. In the second half of this paper, we introduce a variant of MLST designed to operate under sparsity of data. Since the majority of the adaptation parameters are the transformations, we estimate them on the training speakers and adapt to a new speaker by estimating the transform weights only. First we cluster the speakers into a number of sets and estimate the transformations on each cluster. The new speaker will use transformations from all clusters to perform adaptation. This method, termed basis transformation, can be seen as a speaker similarity scheme. Experimental results on the WSJ show that when basis transformation is cascaded with MLLR, marginal gains can be obtained over MLLR alone, for adaptation of native speakers.
Type: Peer-Reviewed Journal Publication
License: http://creativecommons.org/licenses/by/4.0/
Date: 2015-11-02
Date of Publication: 2001
Subject Category: Data sets
Bibliographic Citation: C. Boulis, V. Diakoloukas and V. Digalakis, "Maximum likelihood stochastic transformation adaptation for medium and small data sets," Comput. Speech Lang., vol. 15, no. 3, pp. 257-285, Jul. 2001. doi: 10.1006/csla.2001.0168
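
The adaptation rule sketched in the abstract, where each Gaussian component combines several class-level linear transforms through its own weight vector, can be illustrated with a minimal NumPy sketch. The function and variable names below are assumptions made for illustration only; the sketch shows the adapted-mean computation and does not reproduce the paper's maximum-likelihood estimation of the transforms or weights.

```python
import numpy as np

# Illustrative sketch of the MLST mean adaptation described in the abstract:
# a class of Gaussian components shares K linear transforms (A_k, b_k), and
# each component carries its own weight vector over those transforms, so each
# component effectively receives an individual transform.
def mlst_adapt_mean(mu, transforms, weights):
    """Adapt a single Gaussian mean.

    mu         : (d,) original component mean
    transforms : list of K (A, b) pairs, A is (d, d), b is (d,)
    weights    : (K,) component-specific transform weights
    """
    adapted = np.zeros_like(mu)
    for w, (A, b) in zip(weights, transforms):
        adapted += w * (A @ mu + b)
    return adapted

# Hypothetical example: one class with K = 2 transforms, a 3-dimensional mean.
rng = np.random.default_rng(0)
transforms = [(rng.standard_normal((3, 3)), rng.standard_normal(3)) for _ in range(2)]
mu = rng.standard_normal(3)
weights = np.array([0.7, 0.3])  # in MLST these are estimated per component
print(mlst_adapt_mean(mu, transforms, weights))
```

Per the abstract, MLST estimates both the transforms and the per-component weight vectors from the adaptation data, while the basis-transformation variant pre-estimates the transforms on clusters of training speakers so that only the weight vectors need to be estimated for a new speaker.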
