
•A higher accuracy of calculations, apparently, is not of
interest here. Note that if these violations of the triangle
inequality exist for standard algorithms of calculating
distances between genomes, then such violations are not
our problem.
•The remaining columns (badness) have the same meaning
as before. The variant of badness (4) is described in more
detail in [1] 8.
Based on the calculation results, we can see some advantage
of the Needleman – Wunsch algorithm over the other algorithms.
Now, we are ready to formulate the main motivation for
all our work related to DNA analysis algorithms.
The most important question in this case is the following
one:
can we talk about the effectiveness of such algorithms
and the adequacy of these models based only on
the analysis of the matrices of distances between the
genomes, without the involvement of biologists?
The authors of this paper believe that this question should be
answered in the affirmative: yes, we can!
4. The Object of the Research
In this section, we describe the object of the research of this
paper.
Firstly, let us list the species of monkeys we are considering;
see Table III. It is important to note that all the species
belong to different genera: apparently, this fact leads to
a more or less successful distribution of the elements of the
distance matrix.
After that, we present the distances calculated for the
mtDNA of these species in the form of two tables, one for each
of two different distance calculation algorithms.
Namely, for our article we have considered the algorithms of
Jaro – Winkler and Needleman – Wunsch 9.
Table IV is the calculated distance matrix for the Jaro –
Winkler algorithm. The species numbers correspond to those
shown in Table III. The peculiarity of this algorithm is that it
gives very close values for these species; therefore, the 3-digit
numbers shown in the table are the 3 decimal digits that
follow 0.0; for instance, 541 means 0.0541.
Table V is the calculated distance matrix for the
Needleman – Wunsch algorithm. The species numbers also
correspond to those shown in Table III. This algorithm gives
less close values for these species; therefore, the 3-digit
numbers shown in the table are the 3 decimal digits that
follow 0. (not 0.0); for instance, 375 means 0.375. It is important
to note that such a tenfold increase in values does not
change any of the values of the badness of the triangles we
are considering: indeed, considering the first triangle of
Table IV, with the sides 0.0541, 0.0677, and 0.0635, we can say
that its badness is exactly equal to the badness of the triangle
with the sides 0.541, 0.677, and 0.635.
8 Note that below, we shall use the same numbers for the
previously discussed methods of calculating the pair correlation and for the
variants of badness: for instance, (2) is both the second correlation method
and the second badness. However, no ambiguity arises:
it will always be clear from the context what exactly is meant.
9 The authors express their gratitude to the post-graduate students Li
Jiamian and Mu Jingyuan (Shenzhen MSU – BIT University, China), who
calculated the tables given below.
Note in advance that the tables can be copied from the PDF file and easily
processed by any computer program.
(In general, as follows from the previous material, we can
work with Tables IV and V, as well as with any other tables
built on the same principle, simply as with tables of integers:
the values of badness that we are interested in will be the
same.)
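The scale invariance noted above can be checked numerically. The sketch below assumes, purely for illustration, a badness of the form (a − b)/a for sides sorted as a ≥ b ≥ c; this formula and the name `badness` are assumptions made here, not the definition of δ used in this paper (see [1]). Any badness that depends only on ratios of the sides would behave in the same way.

```python
import math


def badness(sides):
    """Hypothetical ratio-based badness: (a - b) / a for sides a >= b >= c.

    Assumption for illustration only; not the paper's definition of delta.
    """
    a, b, _ = sorted(sides, reverse=True)
    return (a - b) / a


tri_table = (0.0541, 0.0677, 0.0635)   # first triangle of Table IV
tri_scaled = (0.541, 0.677, 0.635)     # the same sides multiplied by 10
tri_integer = (541, 677, 635)          # the same sides read as integers

values = [badness(t) for t in (tri_table, tri_scaled, tri_integer)]

# The three values agree up to floating-point rounding.
assert all(math.isclose(v, values[0], rel_tol=1e-9) for v in values)
```

Because only ratios of sides enter the computation, the tables can be processed directly as tables of integers.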
The values of the average badness (denoted δ) are shown at
the bottom of both tables. It is important that these values are
very small (in both cases, we also indicate a triangle with sides
differing by 1, the badness of which is approximately equal to
the average badness of the 4960 triangles of the corresponding
table). From our point of view, the resulting “averaged”
triangles (with the sides 10.5, 9.5, and 8.5 for the first example
and with the sides 11, 10, and 9 for the second example) are
visually almost indistinguishable from equilateral triangles 10.
5. The Proposed Approach to Calculation of the Pair Correlation
This section could be considered as the main one. We
consider the approach to calculation of the pair correlation
proposed by us.
First, it is necessary to say how exactly we obtain the
sequences of triangles whose sequences of badness are
the subject of analysis by the various pair correlation
algorithms. The answer is very simple: for the fixed vertices
numbered 1 and 2, we take as the third vertex all other
possible options in ascending order, then fix vertices 1 and 3
(instead of 1 and 2) and do the same, etc.
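The enumeration order just described can be sketched as follows. The sketch assumes that each triangle is taken once, i.e. that the third vertex always has a larger number than the two fixed ones, and that there are 32 species; the value 32 is an inference from the count of 4960 triangles, since C(32, 3) = 4960. The names `triangles` and `badness_sequence` are hypothetical, and the badness function is passed in as a parameter rather than fixed.

```python
from itertools import combinations


def triangles(n):
    """Return vertex triples (i, j, k), i < j < k, in lexicographic order:
    (1, 2, 3), (1, 2, 4), ..., (1, 2, n), (1, 3, 4), ...
    """
    return combinations(range(1, n + 1), 3)


def badness_sequence(dist, badness):
    """Badness of every triangle of a symmetric distance matrix `dist`
    (a list of lists, 0-indexed), listed in the order given by `triangles`.
    """
    n = len(dist)
    return [
        badness((dist[i - 1][j - 1], dist[i - 1][k - 1], dist[j - 1][k - 1]))
        for i, j, k in triangles(n)
    ]


N_SPECIES = 32  # assumption: inferred from C(32, 3) = 4960
order = list(triangles(N_SPECIES))
```

Applying `badness_sequence` to two distance matrices of the same size gives two badness sequences indexed by the same sequence of triangles.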
Thus, we obtain two different sequences of badness for
the same sequence of triangle numbers. For these sequences,
we calculate the pair correlation by all the methods described
above (recall that they were designated from (0) to (3)), and,
in addition, we also use method (4), which we briefly
describe below. Recall also that in this method
we tried to take into account both the relative values of the
elements in pairs (as in methods (1), (2) and (3)) and their exact
values (as in method (0), i.e. the usual calculation
of the correlation coefficient).
Thus, as in methods (2) and (3), we consider the set of pairs
of pairs: the first pair is X_i and X_j (for realizations of the
random variable X), and the second one is Y_i and Y_j (for Y).
Also as in methods (2) and (3), each value can be in the
range from −1 to 1 (with the usual meaning of these values),
and the final correlation value is obtained by averaging all
the obtained values (in our case, 4960 values).
For these pairs, we obtain the value shown in Fig. 5. In it,
the values X_i and X_j are on the left side, and the values
Y_i and Y_j are on the right side.
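The general scheme of scoring pairs of pairs and averaging the scores can be illustrated by a minimal sketch. The per-pair value of method (4) is the one defined by Fig. 5 and is not reproduced here; as a stand-in, the sketch uses a simple sign-concordance score (the quantity averaged in Kendall's tau-a), which also lies between −1 and 1. The names `concordance` and `pairwise_correlation` are hypothetical, and averaging over all index pairs i < j is an assumption of this sketch.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def concordance(xi, xj, yi, yj):
    """Stand-in score in [-1, 1], NOT the value of method (4):
    +1 if the two pairs are ordered the same way, -1 if oppositely,
    0 if either pair is tied.
    """
    return sign(xi - xj) * sign(yi - yj)


def pairwise_correlation(xs, ys, score=concordance):
    """Average `score` over all index pairs i < j of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    pairs = list(combinations(range(len(xs)), 2))
    if not pairs:
        raise ValueError("at least two elements are required")
    total = sum(score(xs[i], xs[j], ys[i], ys[j]) for i, j in pairs)
    return total / len(pairs)
```

A score that also takes the exact values into account, as method (4) does, could be passed through the `score` parameter without changing the averaging step.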
10 In some of our previous works, another variant of badness was also
considered, i.e. σ, not δ. The strict definition of σ is of little interest for
this work, but this should be taken into account when considering the
previously cited articles.
4. The Object of the Research 5. The Proposed Approach to
Calculation of the Pair Correlation
MOLECULAR SCIENCES AND APPLICATIONS
DOI: 10.37394/232023.2024.4.6
Boris Melnikov, Elena Melnikova