DOI: 10.1007/978-3-030-87802-3_14
Article

What Causes Phonetic Reduction in Russian Speech: New Evidence from Machine Learning Algorithms

Published: 27 September 2021

Abstract

This paper describes the second stage of a study that uses machine learning algorithms to identify the factors influencing the phonetic reduction of words in Russian speech. We discuss the limitations of the first stage and try to overcome some of them by enlarging the dataset and applying new algorithms: random forest, gradient boosting, and perceptron. The data were texts from the Corpus of Russian Speech. The dataset was split into two separate datasets: one consisting of single words and the other of multiword units from our corpus. For single words, the most important features turned out to be the number of syllables and whether the word is an adjective; both were selected by all algorithms. For multiword units, the main features were the number of syllables, frequency in Russian spoken texts (in ipm), and token frequency in a given text. In further research, we plan to expand the dataset and look more closely at features such as text type and token frequency in a given text.
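Two of the features named above are frequency measures. The following is a minimal sketch of how such values could be computed, assuming that "ipm" stands for instances per million tokens and that token frequency in a given text is a raw count. The function names, the toy text, and the corpus size are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def token_frequencies(tokens):
    """Count how often each token occurs in a single text."""
    return Counter(tokens)


def ipm(count, corpus_size):
    """Normalise a raw count to instances per million tokens.

    Multiplication is done before division so that round values
    (e.g. 150 hits in 3,000,000 tokens) stay exact in floating point.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


# Hypothetical toy text (transliterated), not data from the corpus.
text_tokens = "nu ya sejchas pridu nu sejchas".split()
freqs = token_frequencies(text_tokens)

# Token frequency of one word within this text.
sejchas_in_text = freqs["sejchas"]

# Hypothetical corpus-level figures: 150 occurrences in 3,000,000 tokens.
sejchas_ipm = ipm(150, 3_000_000)

print(sejchas_in_text, sejchas_ipm)
```

Normalising to ipm makes counts from corpora of different sizes comparable, whereas the in-text count reflects how often a word has already occurred in the particular recording.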

Published In

Speech and Computer: 23rd International Conference, SPECOM 2021, St. Petersburg, Russia, September 27–30, 2021, Proceedings
Sep 2021
855 pages
ISBN: 978-3-030-87801-6
DOI: 10.1007/978-3-030-87802-3
Editors: Alexey Karpov, Rodmonga Potapova

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Phonetic reduction
2. Speech
3. Machine learning
4. Russian
