Linguist and Engineer/Physicist
Employment history
- Since 2025: Research fellow at INALCO, ER-TIM, Equipe de Recherche Texte, Informatique, Multilinguisme
- 2025: Research fellow at CNRS, LPP, Laboratoire de Phonétique et Phonologie
- 2023 - 2024: Research fellow at CNRS, LLF, Laboratoire de Linguistique Formelle and Lacito, Langues et Civilisations à Tradition Orale.
- 2018 - 2022: Doctoral student at Lacito (Villejuif) and Gipsa-lab (Grenoble Image Parole Signal Automatique).
- 2020 - 2021: Mandarin teacher, private classes, 2h/week.
- 2018 - 2019: Temporary lecturer for the class "Introduction to linguistics and language families", Université Grenoble Alpes (24 hours).
- 2017 - 2018: Research Intern at Gipsa-lab as part of a Linguistics MA. Analysis of oral and nasal signals for the acoustic study of the oral and nasal cues of nasalised sounds in French and Taiwan Mandarin.
- 2014 - 2017: Confirmed Engineer at Areva NP. Neutronic design of nuclear fuel assemblies, Lyon, France.
- 2011 - 2014: Confirmed Engineer at Wecan, an Areva-CGN (China General Nuclear) joint venture specializing in nuclear design and safety, Shenzhen, China.
- 2007 - 2011: Engineer at Areva NP, in nuclear safety analyses, Paris, France.
University Education
- 2018 - 2022: PhD in Phonetics, Phonology and Speech Sciences, Université Paris III Sorbonne Nouvelle.
- 2017 - 2018: Master's degree in Linguistics, with a specialization in the field of experimental phonetics and phonology, Université Grenoble Alpes, obtained with honors.
- 2014 - 2015: Level 3 University degree (DU niveau 3) in Chinese language and culture, Université Lyon 3, obtained with honors.
- 2001 - 2007: Master of Engineering at École Nationale Supérieure de Physique de Grenoble (PHELMA, Institut National Polytechnique de Grenoble).
Student Supervision Experience
- 2023: Co-supervision of MA student Siman Chen (INALCO). Leveraging Speech Models for Audio-based Lexical Retrieval in Dictionaries. The case of the Teochew language (6 months).
- 2023: Supervision of BA student Berthilde Biard (Université Sorbonne Nouvelle). Data extraction and analysis for the construction of verbal and nominal paradigms in Naish languages (3 months).
Languages spoken
- English: C1 (TOEFL test: 627/677)
- Mandarin: B2/C1 (HSK level 5: 214/300, 2017; HSK level 3: 299/300, 2015)
- Spanish: Intermediate
- Na (Narua): Working knowledge
- Russian: Basic knowledge
Computer skills
- General IT: GNU/Linux systems administration (Ubuntu, Mint), knowledge of GPU installations, PyTorch, LaTeX, LibreOffice, MS Office
- Programming: Linux/Unix: Bash, csh, ksh; General: Python, C, XML, Jupyter, version control systems (Git), package management (pip, conda), Docker
- Speech processing: Praat
- NLP: Transformer-based neural networks (wav2vec 2.0, XLSR)
- Statistics: R, seaborn
Foreign collaborations
- 2025: Research trip to Uppsala University as part of the workshop on automating language documentation. This workshop gathered around twenty scholars in computational linguistics to discuss the future of documentary linguistics in the LLM era.
- 2019: Research trip to Yunnan Minzu University (Kunming), organized at the invitation of Mr. He Likun and Mr. Liu Jinrong (School of Ethnic Cultures). During this visit I attended seminars covering a range of descriptive work on Yunnan languages, and I gave a seminar on the phonological system of Shekua Na.
Technical achievements
- 2021: Creation of a fully customizable keyboard for Linux users interested in writing with the International Phonetic Alphabet. See link
- 2020: Development of a tool that converts Praat TextGrids to XML format to speed up Pangloss deposits. See GitHub
- 2018: Design of the acquisition module for separate recording of oral and nasal tract output, by modifying the Glottal Enterprises nasalance plate.
Grants and projects
- 2019: International mobility grant (€3,750), obtained from the UGA IDEX International Mobility Commission.
Publications
Documentation work
- Since the beginning of my work as a field linguist, I have focused on the Shekua variety of the Na language, also known as Lataddi Narua. Shekua is a small village close to the Grass Sea of Lugu Lake. Its speakers use a variety closely related to Yongning Narua, whose tone system has been described in Michaud (2017). My fieldwork experience is outlined below:
- 2023: Interviews with Shekua Na speakers: narratives, phonological confirmation paradigms, dialogues (Yunnan, 2 months)
- 2019: Interviews with Shekua Na speakers: phonetics and phonology, narratives (Sichuan and Yunnan, 3 months)
- 2018: Transcription, translation and archiving of unpublished recordings by ā huì (MA, Yunnan University).
Ongoing research
- Comparative work on Na dialects: My research focuses on a synchronic comparison between Lataddi Na (Fily, 2022; Dobbs and Lǎ, 2016) and Yongning Na (Michaud, 2017). These varieties show great segmental similarities but significant differences in their tonal systems, which makes them a solid basis for tone comparison. My approach aims to identify precise tonal correspondences between a system whose tonal realisations distinguish two levels (H, L in Lataddi) and a system whose tonal realisations distinguish three levels (H, M, L in Yongning), in order to highlight any possible rearrangements. As the Na of Lataddi and the Na of Yongning are likely closely related, this comparative approach will make it possible to better characterise and quantify the differences between them.
- NLP for less documented languages: In this field, data is not everything, particularly given the urgency of the situation: every 15 days, somewhere on Earth, the last speaker of an endangered language passes away (Evans, 2009), taking with them a wealth of invaluable knowledge. Concern about the erosion of linguistic diversity is widely shared by the scientific community, which is harnessing machine learning technologies to accelerate the documentation of endangered languages. This work has gained significant momentum since certain barriers were overcome in machine learning, the most important advance being the incorporation of linguistic context through attention mechanisms (Vaswani et al., 2017). These advances are raising hopes within the scientific community, which now has tools that genuinely save time on laborious, low added-value tasks (generating a first draft of a transcription from an audio recording, diarisation, etc.). Today, tasks such as automatic speech recognition are handled by language models with error rates that specialists still consider high. For field linguists, however, technologies that can reduce by 80 percent the number of corrections needed in a transcribed text represent a huge saving, given the scale of the task they face. To that end, I joined several projects aimed at enabling NLP technologies for less documented languages. This work gave me a better understanding of how large language models encode speech properties and, as a result, of how they should be used to obtain results worth exploiting in endangered language documentation.