SELECTED PUBLICATIONS

Books and Chapters

  • K.Markov, T.Matsui, “Speech and Music Emotion Recognition Using Gaussian Processes”, in Modern Methodology and Applications in Spatial-Temporal Modeling, Springer, pp.65-85, 2015. (pdf)
  • S.Sakti, K.Markov, S.Nakamura, W.Minker, “Incorporating Knowledge Sources into Statistical Speech Recognition”, Lecture Notes in Electrical Engineering 42, Springer, New York, 2009. (order)

Journals

  • Z.Feng, K.Markov, J.Saito, T.Matsui, “Neural Cough Counter: A Novel Deep Learning Approach for Cough Detection and Monitoring”, IEEE Access, Volume 12, 2024, pp 118816-118829. (pdf)
  • J.Villegas, S.Lee, J.Perkins, K.Markov, “Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners”, Speech Communication, Volume 147, 2023, pp 74-81. (pdf)
  • D.Vazhenina, K.Markov, “End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features”, Electronics 9, 1157, 2020. (pdf)
  • J.Villegas, K.Markov, J.Perkins and S.Lee, “Prediction of Creaky Speech by Recurrent Neural Networks Using Psychoacoustic Roughness”, IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 2, pp. 355-366, Feb. 2020. (pdf)
  • J.Yu, K.Markov, T.Matsui, “Articulatory and Spectrum Information Fusion Based on Deep Recurrent Neural Networks”, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol.27, no.4, pp.742-752, 2019. (pdf)
  • K.Markov, T.Matsui, “Music Genre and Emotion Recognition Using Gaussian Processes”, IEEE Access, vol.2, pp.688-697, 2014. (pdf)
  • A.Karpov, K.Markov, I.Kipyatkova, D.Vazhenina, A.Ronzhin, “Large vocabulary Russian speech recognition using syntactico-statistical language modeling”, Speech Communication 56, pp.213-228, 2014. (pdf)
  • K.Markov, T.Matsui, “High level feature extraction for the self-taught learning algorithm”, EURASIP Journal on Audio, Speech, and Music Processing, 2013, 2013:6. (html/pdf)
  • S.Sakti, K.Markov, S.Nakamura, “Incorporating Knowledge Sources into a Statistical Acoustic Model for Spoken Language Communication Systems”, IEEE Trans. on Computers, vol.56, no.9, pp.1199-1211, Sept. 2007. (pdf)
  • S.Nakamura, K.Markov, (and others), “The ATR Multilingual Speech-to-Speech Translation System”, IEEE Trans. on Audio, Speech and Language Processing, vol.14, no.2, pp.365-376, 2006. (pdf)
  • K.Markov, J.Dang, S.Nakamura, “Integration of Articulatory and Spectrum Features based on the Hybrid HMM/BN Modeling Framework”, Speech Communication, vol.48, pp.161-175, 2006. (pdf)
  • K.Markov, S.Nakamura, “Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.981-988, 2006. (pdf)
  • S.Sakti, K.Markov, S.Nakamura, “A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone Context Dependency”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.954-961, 2006. (pdf)
  • S.Sakti, S.Nakamura, K.Markov, “Improving Acoustic Models Precision by Incorporating a Wide Phonetic Context based on a Bayesian Framework”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.946-953, 2006. (pdf)
  • S.Matsuda, T.Jitsuhiro, K.Markov, S.Nakamura, “ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles”, IEICE Trans. on Information and Systems, vol.E89-D, no.3, pp.989-997, 2006. (pdf)
  • K.Markov, S.Nakamura, “A Hybrid HMM/BN Acoustic Model For Automatic Speech Recognition”, IEICE Trans. on Information and Systems, vol.E86-D, no.3, pp.438-445, 2003. (pdf)
  • K.Markov, T.Matsui, R.Gruhn, J.Zhang, S.Nakamura, “Noise and Channel Distortion Robust ASR System For DARPA SPINE2 Task”, IEICE Trans. on Information and Systems, vol.E86-D, no.3, pp.497-503, 2003. (pdf)
  • J.Zhang, K.Markov, T.Matsui, S.Nakamura, “A Study On Acoustic Modeling of Pauses For Recognizing Noisy Conversational Speech”, IEICE Trans. on Information and Systems, vol.E86-D, no.3, pp.489-496, 2003. (pdf)
  • K.Markov, S.Nakagawa, “Integrating Pitch and LPC-Residual Information With LPC-Cepstrum For Text-Independent Speaker Recognition”, Journal of the Acoustical Society of Japan (E), vol.20, no.4, pp.281-291, 1999. (pdf)
  • K.Markov, S.Nakagawa, “Text-Independent Speaker Recognition Using Non-Linear Frame Likelihood Transformation”, Speech Communication, vol.24, pp.193-209, 1998. (pdf)
  • K.Markov, S.Nakagawa, “Text-Independent Speaker Identification Utilizing Likelihood Normalization Technique”, IEICE Trans. on Information and Systems, vol.E80-D, no.5, pp.589-593, 1997. (pdf)

Conferences

  • M. Ikeda, K.Markov, "FastSpeech2 Based Japanese Emotional Speech Synthesis", In IEEE 12th International Conference on Intelligent Systems, pp. 1-5. IEEE, 2024. (pdf)
  • V. Do, K.Markov, "Using Large Language Models for Bug Localization and Fixing", In 12th International Conference on Awareness Science and Technology (iCAST), pp. 192-197. IEEE, 2023. (pdf)
  • S.Majima, K.Markov, "Personality Prediction from Social Media Posts using Text Embedding and Statistical Features", In 17th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 235-240. IEEE, 2022. (pdf)
  • M.Ito, K.Markov, "Sentence Embedding based Emotion Recognition from Text Data", In Proceedings of the Conference on Research in Adaptive and Convergent Systems, pp. 53-57. 2022. (pdf)
  • K.Yamashita, K.Markov, "Medical Image Enhancement Using Super Resolution Methods", In: Krzhizhanovskaya V. et al. (eds) Computational Science – ICCS 2020, Lecture Notes in Computer Science, vol 12141, pp.496-508, Springer, 2020. (pdf)
  • J.Yu, K.Markov, A.Karpov, "Speaking Style Based Apparent Personality Recognition", International Conference on Speech and Computer, pp.540-548, Aug 2019. (pdf)
  • J.Yu, K.Markov, "Deep Learning based Personality Recognition from Facebook Status Updates", IEEE Int. Conference on Awareness Science and Technology, pp.383-387, Nov 2017. (pdf)
  • K.Markov, T.Matsui, "Robust Speech Recognition using Generalized Distillation Framework", In Proc. Interspeech, pp.2364-2368, Sep 2016. (pdf)
  • J.Yu, K.Markov, T.Matsui, "Articulatory and Spectrum Features Integration using Generalized Distillation Framework", IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2016. (pdf)
  • K.Markov, T.Matsui, F.Septier, G.Peters, "Dynamic Speech Emotion Recognition with State-Space Models", In Proc. European Signal Processing Conference (EUSIPCO 2015), pp.2122-2126, Sep 2015. (pdf)
  • M.Soleymani, A.Aljanaki, Y.Yang, M.Caro, F.Eyben, K.Markov, B.Schuller, R.Veltkamp, F.Weninger, F.Wiering, "Emotional Analysis of Music: A Comparison of Methods", In Proc. ACM International Conference on Multimedia (MM 2014), pp.1161-1164, November 2014. (pdf)
  • D.Vazhenina, K.Markov, "Sequence Memoizer based Language Model for Russian Speech Recognition", In Proc. International Workshop on Spoken Language Technologies for Under-resourced Languages, SLTU-2014, pp.183-187, May 2014. (pdf)
  • D.Vazhenina, K.Markov, "Factored Language Modeling for Russian LVCSR", In Proc. International Joint Conference on Awareness Science and Technology & Ubi-Media Computing, Nov. 2013. (pdf)
  • K.Markov, T.Iwata, T.Matsui, "Music Emotion Recognition using Gaussian Processes", In Proc. MediaEval'2013 Benchmark Workshop, Oct. 2013. (pdf)
  • K.Markov, T.Matsui, "Music Genre Classification using Gaussian Process Models", In Proc. IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2013. (pdf)
  • D.Vazhenina, K.Markov, "Evaluation of Advanced Language Modeling Techniques for Russian LVCSR", M.Zelezny et al. (Eds.): SPECOM2013, LNAI 8113, pp.124-131, 2013. (pdf)
  • K.Markov, T.Matsui, "Non-negative Matrix Factorization Based Self-Taught Learning With Application To Music Genre Classification", In Proc. IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2012. (pdf)
  • K.Markov, "Towards Continuous Online Learning based Cognitive Speech Processing", In Proc. IEEE Int. Workshop on Statistical Machine Learning for Speech Processing, Mar. 2012. (pdf)
  • K.Markov, T.Matsui, "Music Genre Classification using Self-Taught Learning via Sparse Coding", In Proc. IEEE ICASSP, pp.1929-1932, Mar. 2012. (pdf)
  • D.Vazhenina, K.Markov, "Phoneme set selection for Russian speech recognition", in Proc. IEEE 7th International Conference on Natural Language Processing and Knowledge Engineering, pp.475-478, Nov. 2011. (pdf) Best paper award.
  • A.Karpov, A.Ronzhin, K.Markov, M.Zelezny, “Viseme-Dependent Weight Optimization for CHMM-based Audio-Visual Speech Recognition”, in Proc. Interspeech, pp.2678-2681, Sep. 2010. (pdf)
  • D.Vazhenina, K.Markov, “Recent Developments in the Russian Speech Recognition Technology”, in Proc. IEEE/ACIS International Conference on Computer and Information Science, pp.535-537, 2010.
  • K.Markov, “Advanced Approaches to Speaker Diarization of Audio Documents”, in Proc. 2nd International Conference on Ubiquitous Media Computing, pp.39-45, Dec. 2009. (pdf)
  • K.Markov, “Structured Models Design for Improved Speech Recognition”, in Proc. of 12th International Conference on Humans and Computers, pp.45-50, Dec. 2009. (pdf)
  • K.Markov, S.Nakamura, “Improved Novelty Detection for On-line GMM based Speaker Diarization”, in Proc. Interspeech, pp.363-366, Sept. 2008. (pdf)
  • K.Markov, S.Nakamura, “Language Identification with Dynamic Hidden Markov Network”, in Proc. IEEE ICASSP, pp.4233-4236, 2008. (pdf)
  • K.Markov, S.Nakamura, “Never-Ending Learning System for On-line Speaker Diarization”, in Proc. IEEE ASRU Workshop, pp.699-704, 2007. (pdf)
  • K.Markov, S.Nakamura, “Never-Ending Learning with Dynamic Hidden Markov Network”, in Proc. Interspeech, pp.1437-1440, 2007. (pdf)
  • S.Sakti, K.Markov, S.Nakamura, “An HMM Acoustic Model Incorporating Various Additional Knowledge Sources”, in Proc. Interspeech, pp.2117-2120, 2007. (pdf)
  • S.Sakti, K.Markov, S.Nakamura, “A Method to Integrate Additional Knowledge Sources into HMM based on Junction Tree Decomposition”, in Proc. EUSIPCO, pp.2404-2408, 2007. (pdf)
  • K.Markov, S.Nakamura, “Forward-Backwards Training of Hybrid HMM/BN Acoustic Models”, in Proc. ICSLP, pp.621-624, 2006. (pdf)
  • S.Sakti, K.Markov, S.Nakamura, “The use of Bayesian Network for Incorporating Accent, Gender and Wide-Context Dependency Information”, in Proc. ICSLP, pp.1563-1566, 2006. (pdf)
  • S.Sakti, K.Markov, S.Nakamura, “Incorporation of Pentaphone Context Dependency based on Hybrid HMM/BN Acoustic Modeling Framework”, in Proc. IEEE ICASSP, pp.1177-1180, 2006. (pdf)
  • K.Markov, S.Nakamura, “Modeling Successive Frame Dependencies With Hybrid HMM/BN Acoustic Model”, in Proc. IEEE ICASSP, pp.701-704, 2005. (pdf)
  • S.Sakti, S.Nakamura, K.Markov, “Incorporating a Bayesian wide phonetic context model for acoustic re-scoring”, in Proc. Eurospeech, pp.1629-1632, 2005. (pdf)
  • R.Gruhn, K.Markov, S.Nakamura, “A Statistical Lexicon For Non-Native Speech Recognition”, in Proc. ICSLP, pp.851-854, 2004. (pdf)
  • K.Markov, S.Nakamura, J.Dang, “Integration of Articulatory Dynamic Parameters in HMM/BN Based Speech Recognition System”, in Proc. ICSLP, pp.774-777, 2004. (pdf)
  • S.Matsuda, T.Jitsuhiro, K.Markov, S.Nakamura, “Speech Recognition System Robust to Noise and Speaking Styles”, in Proc. ICSLP, pp.844-847, 2004. (pdf)
  • S.Nakamura, K.Markov, “A Hybrid HMM/Bayesian Network Approach to Robust Speech Recognition”, in Proc. Special Workshop in Maui (SWIM): Lectures by Masters in Speech Processing, 2004.
  • K.Markov, J.Dang, Y.Iizuka, S.Nakamura, “Hybrid HMM/BN ASR System Integrating Spectrum and Articulatory Features”, in Proc. Eurospeech, pp.965-968, 2003. (pdf)
  • K.Markov, S.Nakamura, “Hybrid HMM/BN LVCSR System Integrating Multiple Acoustic Features”, in Proc. IEEE ICASSP, pp.888-891, 2003. (pdf)
  • J.Dang, Y.Iizuka, K.Markov, S.Nakamura, “Improvement of Speech Recognition Method using Speech Production Mechanism”, in Proc. 15th International Congress of Phonetic Sciences, pp.731-734, 2003. (pdf)
  • K.Markov, S.Nakamura, “Modeling HMM State Distributions With Bayesian Networks”, in Proc. ICSLP, pp.1013-1016, 2002. (pdf)
  • K.Markov, S.Nakagawa, S.Nakamura, “Discriminative Training of HMM Using Maximum Normalized Likelihood Algorithm”, in Proc. IEEE ICASSP, pp.497-500, 2001. (pdf)
  • K.Markov, S.Nakamura, “Frame Level Likelihood Transformations For ASR and Utterance Verification”, in Proc. ICSLP, pp.1038-1041, 2000. (pdf)
  • K.Markov, S.Nakagawa, “Text-Independent Speaker Recognition Using Multiple Information Sources”, in Proc. ICSLP, pp.173-176, 1998. (pdf)
  • K.Markov, S.Nakagawa, “Discriminative Training Of GMM Using a Modified EM Algorithm For Speaker Recognition”, in Proc. ICSLP, pp.177-180, 1998. (pdf)
  • K.Markov, S.Nakagawa, “Speaker Verification Using Frame and Utterance Level Likelihood Normalization”, in Proc. IEEE ICASSP, pp.1087-1090, 1997. (pdf)
  • K.Markov, S.Nakagawa, “Frame Level Likelihood Normalization For Text-Independent Speaker Identification Using Gaussian Mixture Models”, in Proc. ICSLP, pp.1764-1767, 1996. (pdf)

Granted Patents

  • K.Markov, S.Nakamura, 5065693, “System for simultaneous learning and recognition of spatio-temporal patterns”, 2012.
  • S.Sakti, K.Markov, S.Nakamura, 4861912, “Knowledge source integrating probabilistic computation method and program”, 2011.
  • K.Markov, S.Nakamura, 3936266, “Speech recognition apparatus and program”, 2007.
  • R.Yamada, K.Markov, others, 3520022, “Foreign language learning apparatus, foreign language learning method and media”, 2004.