SELECTED PUBLICATIONS
Books and Chapters
- K.Markov, T.Matsui, “Speech and Music Emotion Recognition Using Gaussian Processes”, in Modern Methodology and Applications in Spatial-Temporal Modeling, Springer, pp.65-85, 2015. (pdf)
- S.Sakti, K.Markov, S.Nakamura, W.Minker, “Incorporating Knowledge Sources into Statistical Speech Recognition”, Lecture Notes in Electrical Engineering 42, Springer, New York, 2009. (order)
Journals
- Z.Feng, K.Markov, J.Saito, T.Matsui, “Neural Cough Counter: A Novel Deep Learning Approach for Cough Detection and Monitoring”, IEEE Access, vol.12, pp.118816-118829, 2024. (pdf)
- J.Villegas, S.Lee, J.Perkins, K.Markov, “Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners”, Speech Communication, vol.147, pp.74-81, 2023. (pdf)
- D.Vazhenina, K.Markov, “End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features”, Electronics 9, 1157, 2020. (pdf)
- J.Villegas, K.Markov, J.Perkins, S.Lee, “Prediction of Creaky Speech by Recurrent Neural Networks Using Psychoacoustic Roughness”, IEEE Journal of Selected Topics in Signal Processing, vol.14, no.2, pp.355-366, Feb. 2020. (pdf)
- J.Yu, K.Markov, T.Matsui, “Articulatory and Spectrum Information Fusion Based on Deep Recurrent Neural Networks”, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.27, no.4, pp.742-752, 2019. (pdf)
- K.Markov, T.Matsui, “Music Genre and Emotion Recognition Using Gaussian Processes”, IEEE Access, vol.2, pp.688-697, 2014. (pdf)
- A.Karpov, K.Markov, I.Kipyatkova, D.Vazhenina, A.Ronzhin, “Large vocabulary Russian speech recognition using syntactico-statistical language modeling”, Speech Communication 56, pp.213-228, 2014. (pdf)
- K.Markov, T.Matsui, “High level feature extraction for the self-taught learning algorithm”, EURASIP Journal on Audio, Speech, and Music Processing, 2013, 2013:6. (html/pdf)
- S.Sakti, K.Markov, S.Nakamura, “Incorporating Knowledge Sources into a Statistical Acoustic Model for Spoken Language Communication Systems”, IEEE Trans. on Computers, vol.56, no.9, pp.1199-1211, Sept. 2007. (pdf)
- S.Nakamura, K.Markov, (and others), “The ATR Multilingual Speech-to-Speech Translation System”, IEEE Trans. on Audio, Speech and Language Processing, vol.14, no.2, pp.365-376, 2006. (pdf)
- K.Markov, J.Dang, S.Nakamura, “Integration of Articulatory and Spectrum Features based on the Hybrid HMM/BN Modeling Framework”, Speech Communication, vol.48, pp.161-175, 2006. (pdf)
- K.Markov, S.Nakamura, “Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.981-988, 2006. (pdf)
- S.Sakti, K.Markov, S.Nakamura, “A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone Context Dependency”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.954-961, 2006. (pdf)
- S.Sakti, S.Nakamura, K.Markov, “Improving Acoustic Models Precision by Incorporating a Wide Phonetic Context based on a Bayesian Framework”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.946-953, 2006. (pdf)
- S.Matsuda, T.Jitsuhiro, K.Markov, S.Nakamura, “ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.989-997, 2006. (pdf)
- K.Markov, S.Nakamura, “A Hybrid HMM/BN Acoustic Model For Automatic Speech Recognition”, IEICE Trans. on Information and Systems, vol.E86-D, no.3, pp.438-445, 2003. (pdf)
- K.Markov, T.Matsui, R.Gruhn, J.Zhang, S.Nakamura, “Noise and Channel Distortion Robust ASR System For DARPA SPINE2 Task”, IEICE Trans. on Information and Systems, vol.E86-D, no.3, pp.497-503, 2003. (pdf)
- J.Zhang, K.Markov, T.Matsui, S.Nakamura, “A Study On Acoustic Modeling of Pauses For Recognizing Noisy Conversational Speech”, IEICE Trans. on Information and Systems, vol.E86-D, no.3, pp.489-496, 2003. (pdf)
- K.Markov, S.Nakagawa, “Integrating Pitch and LPC-Residual Information With LPC-Cepstrum For Text-Independent Speaker Recognition”, Journal of Acoustical Society of Japan. (E), vol.20, no.4, pp.281-291, 1999. (pdf)
- K.Markov, S.Nakagawa, “Text-Independent Speaker Recognition Using Non-Linear Frame Likelihood Transformation”, Speech Communication, vol.24, pp.193-209, 1998. (pdf)
- K.Markov, S.Nakagawa, “Text-Independent Speaker Identification Utilizing Likelihood Normalization Technique”, IEICE Trans. on Information and Systems, vol.E80-D, no.5, pp.589-593, 1997. (pdf)
Conferences
- M.Ikeda, K.Markov, “FastSpeech2 Based Japanese Emotional Speech Synthesis”, in IEEE 12th International Conference on Intelligent Systems, pp.1-5, IEEE, 2024. (pdf)
- V.Do, K.Markov, “Using Large Language Models for Bug Localization and Fixing”, in 12th International Conference on Awareness Science and Technology (iCAST), pp.192-197, IEEE, 2023. (pdf)
- S.Majima, K.Markov, “Personality Prediction from Social Media Posts using Text Embedding and Statistical Features”, in 17th Conference on Computer Science and Intelligence Systems (FedCSIS), pp.235-240, IEEE, 2022. (pdf)
- M.Ito, K.Markov, “Sentence Embedding based Emotion Recognition from Text Data”, in Proceedings of the Conference on Research in Adaptive and Convergent Systems, pp.53-57, 2022. (pdf)
- K.Yamashita, K.Markov, “Medical Image Enhancement Using Super Resolution Methods”, in: Krzhizhanovskaya V. et al. (eds) Computational Science – ICCS 2020, Lecture Notes in Computer Science, vol.12141, pp.496-508, Springer, 2020. (pdf)
- J.Yu, K.Markov, A.Karpov, “Speaking Style Based Apparent Personality Recognition”, International Conference on Speech and Computer, pp.540-548, Aug. 2019. (pdf)
- J.Yu, K.Markov, “Deep Learning based Personality Recognition from Facebook Status Updates”, IEEE Int. Conference on Awareness Science and Technology, pp.383-387, Nov. 2017. (pdf)
- K.Markov, T.Matsui, “Robust Speech Recognition using Generalized Distillation Framework”, in Proc. Interspeech, pp.2364-2368, Sep. 2016. (pdf)
- J.Yu, K.Markov, T.Matsui, “Articulatory and Spectrum Features Integration using Generalized Distillation Framework”, IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2016. (pdf)
- K.Markov, T.Matsui, F.Septier, G.Peters, “Dynamic Speech Emotion Recognition with State-Space Models”, in Proc. European Signal Processing Conference (EUSIPCO 2015), pp.2122-2126, Sep. 2015. (pdf)
- M.Soleymani, A.Aljanaki, Y.Yang, M.Caro, F.Eyben, K.Markov, B.Schuller, R.Veltkamp, F.Weninger, F.Wiering, “Emotional Analysis of Music: A Comparison of Methods”, in Proc. ACM International Conference on Multimedia (MM 2014), pp.1161-1164, Nov. 2014. (pdf)
- D.Vazhenina, K.Markov, “Sequence Memoizer based Language Model for Russian Speech Recognition”, in Proc. International Workshop on Spoken Language Technologies for Under-resourced Languages, SLTU-2014, pp.183-187, May 2014. (pdf)
- D.Vazhenina, K.Markov, “Factored Language Modeling for Russian LVCSR”, in Proc. International Joint Conference on Awareness Science and Technology & Ubi-Media Computing, Nov. 2013. (pdf)
- K.Markov, T.Iwata, T.Matsui, “Music Emotion Recognition using Gaussian Processes”, in Proc. MediaEval'2013 Benchmark Workshop, Oct. 2013. (pdf)
- K.Markov, T.Matsui, “Music Genre Classification using Gaussian Process Models”, in Proc. IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2013. (pdf)
- D.Vazhenina, K.Markov, “Evaluation of Advanced Language Modeling Techniques for Russian LVCSR”, M.Zelezny et al. (Eds.): SPECOM2013, LNAI 8113, pp.124-131, 2013. (pdf)
- K.Markov, T.Matsui, “Non-negative Matrix Factorization Based Self-Taught Learning With Application To Music Genre Classification”, in Proc. IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2012. (pdf)
- K.Markov, “Towards Continuous Online Learning based Cognitive Speech Processing”, in Proc. IEEE Int. Workshop on Statistical Machine Learning for Speech Processing, Mar. 2012. (pdf)
- K.Markov, T.Matsui, “Music Genre Classification using Self-Taught Learning via Sparse Coding”, in Proc. IEEE ICASSP, pp.1929-1932, Mar. 2012. (pdf)
- D.Vazhenina, K.Markov, “Phoneme set selection for Russian speech recognition”, in Proc. IEEE 7th International Conference on Natural Language Processing and Knowledge Engineering, pp.475-478, Nov. 2011. (pdf) Best paper award.
- A.Karpov, A.Ronzhin, K.Markov, M.Zelezny, “Viseme-Dependent Weight Optimization for CHMM-based Audio-Visual Speech Recognition”, in Proc. Interspeech, pp.2678-2681, Sep. 2010. (pdf)
- D.Vazhenina, K.Markov, “Recent Developments in the Russian Speech Recognition Technology”, in Proc. IEEE/ACIS International Conference on Computer and Information Science, pp.535-537, 2010.
- K.Markov, “Advanced Approaches to Speaker Diarization of Audio Documents”, in Proc. 2nd International Conference on Ubiquitous Media Computing, pp.39-45, Dec. 2009. (pdf)
- K.Markov, “Structured Models Design for Improved Speech Recognition”, in Proc. of 12th. International Conference on Humans and Computers, pp.45-50, Dec. 2009. (pdf)
- K.Markov, S.Nakamura, “Improved Novelty Detection for On-line GMM based Speaker Diarization”, in Proc. Interspeech, pp.363-366, Sept. 2008. (pdf)
- K.Markov, S.Nakamura, “Language Identification with Dynamic Hidden Markov Network”, in Proc. IEEE ICASSP, pp.4233-4236, 2008. (pdf)
- K.Markov, S.Nakamura, “Never-Ending Learning System for On-line Speaker Diarization”, in Proc. IEEE ASRU Workshop, pp.699-704, 2007. (pdf)
- K.Markov, S.Nakamura, “Never-Ending Learning with Dynamic Hidden Markov Network”, in Proc. Interspeech, pp.1437-1440, 2007. (pdf)
- S.Sakti, K.Markov, S.Nakamura, “An HMM Acoustic Model Incorporating Various Additional Knowledge Sources”, in Proc. Interspeech, pp.2117-2120, 2007. (pdf)
- S.Sakti, K.Markov, S.Nakamura, “A Method to Integrate Additional Knowledge Sources into HMM based on Junction Tree Decomposition”, in Proc. EUSIPCO, pp.2404-2408, 2007. (pdf)
- K.Markov, S.Nakamura, “Forward-Backwards Training of Hybrid HMM/BN Acoustic Models”, in Proc. ICSLP, pp.621-624, 2006. (pdf)
- S.Sakti, K.Markov, S.Nakamura, “The use of Bayesian Network for Incorporating Accent, Gender and Wide-Context Dependency Information”, in Proc. ICSLP, pp.1563-1566, 2006. (pdf)
- S.Sakti, K.Markov, S.Nakamura, “Incorporation of Pentaphone Context Dependency based on Hybrid HMM/BN Acoustic Modeling Framework”, in Proc. IEEE ICASSP, pp.1177-1180, 2006. (pdf)
- K.Markov, S.Nakamura, “Modeling Successive Frame Dependencies With Hybrid HMM/BN Acoustic Model”, in Proc. IEEE ICASSP, pp.701-704, 2005. (pdf)
- S.Sakti, S.Nakamura, K.Markov, “Incorporating a Bayesian wide phonetic context model for acoustic re-scoring”, in Proc. Eurospeech, pp.1629-1632, 2005. (pdf)
- R.Gruhn, K.Markov, S.Nakamura, “A Statistical Lexicon For Non-Native Speech Recognition”, in Proc. ICSLP, pp.851-854, 2004. (pdf)
- K.Markov, S.Nakamura, J.Dang, “Integration of Articulatory Dynamic Parameters in HMM/BN Based Speech Recognition System”, in Proc. ICSLP, pp.774-777, 2004. (pdf)
- S.Matsuda, T.Jitsuhiro, K.Markov, S.Nakamura, “Speech Recognition System Robust to Noise and Speaking Styles”, in Proc. ICSLP, pp.844-847, 2004. (pdf)
- S.Nakamura, K.Markov, “A Hybrid HMM/Bayesian Network Approach to Robust Speech Recognition”, in Proc. Special Workshop in Maui (SWIM): Lectures by Masters in Speech Processing, 2004.
- K.Markov, J.Dang, Y.Iizuka, S.Nakamura, “Hybrid HMM/BN ASR System Integrating Spectrum and Articulatory Features”, in Proc. Eurospeech, pp.965-968, 2003. (pdf)
- K.Markov, S.Nakamura, “Hybrid HMM/BN LVCSR System Integrating Multiple Acoustic Features”, in Proc. IEEE ICASSP, pp.888-891, 2003. (pdf)
- J.Dang, Y.Iizuka, K.Markov, S.Nakamura, “Improvement of Speech Recognition Method using Speech Production Mechanism”, in Proc. 15th. International Congress of Phonetic Sciences, pp.731-734, 2003. (pdf)
- K.Markov, S.Nakamura, “Modeling HMM State Distributions With Bayesian Networks”, in Proc. ICSLP, pp.1013-1016, 2002. (pdf)
- K.Markov, S.Nakagawa, S.Nakamura, “Discriminative Training of HMM Using Maximum Normalized Likelihood Algorithm”, in Proc. IEEE ICASSP, pp.497-500, 2001. (pdf)
- K.Markov, S.Nakamura, “Frame Level Likelihood Transformations For ASR and Utterance Verification”, in Proc. ICSLP, pp.1038-1041, 2000. (pdf)
- K.Markov, S.Nakagawa, “Text-Independent Speaker Recognition Using Multiple Information Sources”, in Proc. ICSLP, pp.173-176, 1998. (pdf)
- K.Markov, S.Nakagawa, “Discriminative Training Of GMM Using a Modified EM Algorithm For Speaker Recognition”, in Proc. ICSLP, pp.177-180, 1998. (pdf)
- K.Markov, S.Nakagawa, “Speaker Verification Using Frame and Utterance Level Likelihood Normalization”, in Proc. IEEE ICASSP, pp.1087-1090, 1997. (pdf)
- K.Markov, S.Nakagawa, “Frame Level Likelihood Normalization For Text-Independent Speaker Identification Using Gaussian Mixture Models”, in Proc. ICSLP, pp.1764-1767, 1996. (pdf)
Granted Patents
- K.Markov, S.Nakamura, 5065693, “System for simultaneous learning and recognition of spatio-temporal patterns”, 2012.
- S.Sakti, K.Markov, S.Nakamura, 4861912, “Knowledge source integrating probabilistic computation method and program”, 2011.
- K.Markov, S.Nakamura, 3936266, “Speech recognition apparatus and program”, 2007.
- R.Yamada, K.Markov, others, 3520022, “Foreign language learning apparatus, foreign language learning method and media”, 2004.