This article is a preprint and has not been peer-reviewed.
Results presented in preprints should not be reported in the media as verified information.
Lightweight Algorithms for ENG→RU Keyboard Layout Correction Under High Load