On the automatic generation of commit messages in version control systems

I.A. Kosyanenko, R.G. Bolbakov


No modern software development project is complete without a version control system to track changes. Version control is especially important in teamwork, when work on different parts of the same project is delegated to several developers. Each change must be described accurately and together with its context, while staying within the limits customary for the version control system (for example, Git convention keeps commit message lines to at most 72 characters, although Git itself does not enforce this). Nevertheless, describing or commenting on changes remains a routine, non-automated task to which developers often pay too little attention. Modern natural language processing methods (or other models and methods that generate natural-language descriptions of changes from source code) can help developers save time on writing up-to-date commit messages and keep the version history of the source code repository accurate.






ISSN: 2307-8162