The application of the branch and bound method in the problem of reconstructing the matrix of distances between DNA strings

Boris Melnikov, Marina Trenina

Abstract


In this paper, we continue to consider one of the tasks of biocybernetics, i.e. the problem of reconstructing the distance matrix between DNA sequences. In this problem, not all the elements of the matrix under consideration are known at the input of the algorithm (usually 50% or less of the elements). The basis for the development of the algorithm for reconstructing such a matrix is the method of comparative evaluation of algorithms for calculating distances between DNA sequences developed and investigated by us, based on a special analysis of the distance matrix. In this analysis, we applied the badness of each of the triangles of the matrix determined by us before. Continuing to improve the algorithms for solving this problem, we consider the use of the branches and bounds method in it. To do this, for some known sequence of unfilled elements, we apply the algorithms we considered in previous publications, but now we choose the sequences ourselves using developed by us version of method of branches and bounds. In our interpretation of this method, all possible sequences of unknown elements of the upper triangular part of the matrix are taken as the set of admissible solutions. In each current subtask, any of the blank elements of the matrix is taken as the separating element, and the sum of the badness values for all triangles that have already been formed by the time this subtask is considered is taken as the boundary. Thus, the definition of elements of an incompletely filled matrix occurs in such a sequence that the final badness indicator for all triangles is selected using greedy heuristics that fits completely into the framework of the classical variants of the description of the branches and bounds method. As a result of applying such an algorithm, we get results that, from the point of view of the value of badness, are the lowest possible (in the case of a completed version of the branch and boundary method), or close to optimal ones (in the case of its uncompleted version). At the same time, in our computational experiments, the running time of the algorithm practically coincides with the time of the algorithm considered by us in the previous publication (it exceeds it by no more than 10%), and the badness value usually decreases by 20-40% from the initial value. Thus, we are able to quickly and efficiently restore the DNA matrix, often even if it is filled less than 40%.


Full Text:

PDF (Russian)

References


Mel'nikov B. F., Trenina M. A. Ob odnoj zadache vosstanovlenija matric rasstojanij mezhdu cepochkami DNK. International Journal of Open Information Technologies.ISSN: 2307-8162. Vol. 6. No. 6. 2018. C. 1–13.

Mel'nikov B. F., Pivneva S. V, Trifonov M. A. Mul'tijevristicheskij podhod k sravneniju kachestva opredeljaemyh metrik na mnozhestve posledovatel'nostej DNK. Sovremennye informacionnye tehnologii i IT obrazovanie. T. 13. # 2. 2017. S. 89–96.

Mel'nikov B. F., Trenina M. A., Kochergin A. S. Podhod k uluchsheniju algoritmov raschjota rasstojanij mezhdu cepochkami DNK (na primere algoritma Nidlmana – Vunsha). Izvestija vysshih uchebnyh zavedenij. Povolzhskij region. Fiziko-matematicheskie nauki. # 1 (45). 2018. (https://izvuz_fmn.pnzgu.ru/fmn4118)

Melnikov B. F., Pivneva S. V., Trifonov M. A. Comparative analysis of algorithms calculating distances of DNA sequences and some related problems. Sbornik trudov III mezhdunarodnoj konferencii i molodezhnoj shkoly «Informacionnye tehnologii i nanotehnologii (ITNT-2017)», Samarskij nacional'nyj issledovatel'skij universitet imeni akademika S.P. Koroleva. 2017. S. 1640–1645.

Mel'nikov B. F., Radionov A. N. O vybore strategii v nedetermirovannyh antagonisticheskih igrah. Programmirovanie. # 5.

S. 55–62.

Mel'nikov B. F. Jevristiki v programmirovanii nedeterminirovannyh igr. Izvestija RAN. Programmirovanie. # 5. 2001. S. 63–80.

Gudman S, Hidetniemi S. Vvedenie v razrabotku i analiz algoritmov. M.: Mir. 1981. 364 s.

Hromkovic J. Algorithmics for Hard Problems. Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics. Springer. 2003. 538 p.

Melnikov B. Multiheuristic approach to discrete optimization problems.

Cybernetics and Systems Analysis. 2006. Vol. 42. No. 3. P. 335–341.

Home – Nucleotide – NCBI [Jelektron. resurs]. – Rezhim dostupa:

https://www.ncbi.nlm.nih.gov/nuccore, svobodnyj

Ajala F, Kajger Dzh. Sovremennaja genetika. Per. s angl. T. 1. M.: Mir. 1987. 295 s.

Melnikov B., Radionov A., Moseev A., Melnikova E. Some specific heuristics for situation clustering problems. ICSOFT, Technologies, Proceedings 1st International Conference on Software and Data Technologies. 2006. P. 272–279.

Mel'nikov B. F., Mel'nikova E. A. Klasterizacija situacij v algoritmah real'nogo vremeni dlja zadach diskretnoj optimizacii. Sistemy upravlenija i informacionnye tehnologii. T. 28. # 2. 2007. S. 16–20.


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность MoNeTec 2024

ISSN: 2307-8162