Mining of formal software specifications for further use in automatic commit message generation

Ivan Kosyanenko, Roman Bolbakov


In today's team-based software development, good commit messages - comments on changes made in natural language - are essential. The metric for evaluating a commit message is its relevance. A good commit message should not only describe the changes made, but also their context.

With the growing popularity of version control systems, the number of studies aimed at creating tools for automatic generation of commit messages has increased. The most promising for this task are methods based on deep learning and neural machine translation. Such methods have program code or something to which it can be reduced as input. The only possible way to automatically generate a message for a commit is to "look" at the history of changes (diff). All modern version control systems provide the user with such a history, and special algorithms have been designed to compare changes between two versions of program code. 

But there are a great many programming paradigms, even more there are programming languages with their own features and syntax. To make a tool for generating commit messages universal, it is necessary to be able to reduce the program code in any programming language to a single "notation".

One possible example of such notation is formal specifications. A specification shows what a program should do. There are many ways to write formal specifications, from algebraic expressions and specification languages to deterministic finite state machines. Since a commit message should give an indication of what and how the new software version has changed, the idea of using program specifications in commit message generation tools seems very rich.

Currently, most programs do not have a formal specification. It is believed that the time spent on elaborating specifications is not worth the result. One of the effective approaches to solve this problem is specification mining.

Full Text:

PDF (Russian)


Chen F., Kim M., Cho J. Novel Natural Language Summary of Program Code via Leveraging Multiple Input Representations //Findings of the Association for Computational Linguistics: EMNLP 2021. – 2021. – pp. 2510-2520

Sutzkever I., Vinyals O., Le Q. V. Sequence to sequence learning with neural networks //Advances in neural information processing systems. – 2014. – Vol. 27

Choi Y. S. et al. Learning Sequential and Structural Information for Source Code Summarization //Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. – 2021. – pp. 2842-2851.

Burdonov I. B., Demakov A.V., Kosachev A. S., Maksimov A.V., Petrenko A. K. Formal specifications in reverse engineering and program verification technologies // Proceedings of the ISP RAS. 2000. no. URL: (accessed: 25.09.2022) (in russian).

Lamsweerde A. Formal specification: a roadmap //Proceedings of the Conference on the Future of Software Engineering. - 2000. – pp. 147-159.

Bowen J. P. Formal specification and documentation using Z: A case study approach. – London : International Thomson Computer Press, 1996. – Vol. 66.

Guttag J. V., Horning J. J., Wing J. M. The Larch Family of Specification Languages //IEEE Softw. – 1985. – Vol. 2. – No. 5. – pp. 24-36.

Monastyrnaya V. S., Frolov V. V. Visual language Dragon and its application //Actual problems of aviation and cosmonautics. – 2016. – Vol. 2. – no. 12. – pp. 78-79 (in russian).

Sennrich R., Haddow B., Birch A. Improving neural machine translation models with monolingual data //arXiv preprint arXiv:1511.06709. – 2015.

Le T. D. B., Lo D. Deep specification mining //Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. – 2018. – pp. 106-117.

Makarov T. et al. Recurrent neural network based language model //Interspeech. – 2010. – Vol. 2. – No. 3. – pp. 1045-1048.

Tishchenko V. A. Path-compressed prefix tree as the basis of a classifier by lexicographic feature //Materials of the XXXIII International Scientific and Practical Conference "Advances in Science and Technology". – 2020. – p. 129. (in russian)

Chen T. Y., Cuo F. C., Marcel R. On the statistical properties of the f-measure //Fourth International Conference on Quality Software, 2004. QSIC 2004. Proceedings. – IEEE, 2004. – pp. 146-153.

Robinson B. et al. Scaling up automated test generation: Automatically generating maintainable regression unit tests for programs //2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). – IEEE, 2011. – pp. 23-32.

Hochreiter S., Schmidhuber J. Long short-term memory //Neural computing. – 1997. – Vol. 9. – No. 8. – pp. 1735-1780.

McQueen J. Some methods for classification and analysis of multivariate observations //Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967. – 1967. – pp. 281-297.

Rokach L., Maison O. Clusteringmethods //Data mining and knowledge discovery handbook. – Springer, Boston, MA, 2005. – pp. 321-352.

Khalid S., Nadeem A. Automatic generation of finite state machine from object-oriented format specifications //2010 6th International Conference on Emerging Technologies (ICET). – IEEE, 2010. – pp. 304-309.

Li W. Specification mining: New formalisms, algorithms and applications. – University of California, Berkeley, 2013. — pp.25-42.

Bayer A., Lucker M., Schallhart C. Comparing LTL semantics for runtime verification //Journal of Logic and Computation. – 2010. – Vol. 20. – No. 3. – Pp. 651-674.

Label M., Su Z. Symbolic meaning of temporal specifications //Proceedings of the 30th international conference on Software engineering. - 2008. – pp. 51-60.

Cherukuri H., Ferrari A., Spoletini P. Towards Explainable Formal Methods: From LTL to Natural Language with Neutral Machine Translation //International Working Conference on Requirements Engineering: Foundation for Software Quality. – Springer, Cham, 2022. – pp. 79-86.

Kosyanenko I. A., Bolbakov R. G. On automatic generation of commit messages in version control systems //International Journal of Open Information Technologies. – 2022. – Vol. 10. – No. 4. – pp. 55-60. (in russian)


  • There are currently no refbacks.

Abava  Кибербезопасность FRUCT 2023

ISSN: 2307-8162