Statistical Modeling of Chlorinated Chemical Compounds Bioactivity
VLADIMIR MUKHOMOROV
Physical Department
State Polytechnic University
St. Petersburg
RUSSIA
Abstract: - A general regression equation based on physical concepts of the behavior of a polar molecule in a
condensed medium is derived. The regression equation makes it possible, from a unified standpoint, to
statistically significantly explain the toxicity of both chlorine-substituted benzenes and saturated and
unsaturated chlorinated hydrocarbons. Statistically significant explanatory molecular features that determine
the bioactivity of drugs have been identified.
Key-Words: - Regression, significance, information, quality criteria, collinearity, toxicity, intermolecular,
electronic, pseudopotential, chlorinated chemical compounds
Received: May 26, 2021. Revised: April 15, 2022. Accepted: May 12, 2022. Published: June 6, 2022.
1 Introduction
Much attention has recently been paid to finding
various quantitative relationships that link variations
in the molecular structure of chemical compounds to
their biological activity. For these purposes, one uses
either abstract statistical models (which leave the
mechanism of biological activity undisclosed) or
physicochemical assumptions about the possible
behaviour of chemical compounds in the biosystem.
This article analyzes the toxic effects of
chlorobenzene derivatives, as well as saturated and
unsaturated chlorine-containing compounds. Using
the methods of the theory of intermolecular
interactions, the corresponding regression
relationships will be derived here for the purposes of
statistical analysis of the relationship between the
structure of a molecule and its biological activity.
2 Problem Formulation
In accordance with modern concepts, the biological
activity of chemical compounds is determined by
their physicochemical properties at the macroscopic
level (solubility, distribution, permeability), as well
as at the microscopic level (electronic characteristics
of molecules). In this regard, it can be assumed that
the biological effect is equally determined by two
circumstances: the transport of the molecule to the
site of action and the physicochemical interaction of
the molecule with the receptor. Attempts to obtain
appropriate regression equations that take into
account different factors have been repeatedly
discussed in the literature [1,2]. The authors of the
papers [3,4] point out the difficulties in the
physicochemical interpretation of the observations
used in these studies.
3 Problem Solution
3.1 Chlorinated benzene derivatives
The Hansch model [5,6] has been the most widely
used in recent years. This model relates the
bioactivity of chemical compounds to their
lipophilic characteristics. In many practical cases,
this model has proved useful. Therefore, let us
check whether the bioactivity (average lethal doses
of benzene chlorine derivatives for white rats upon
oral administration [7]) is really related to the
partition coefficient P of the substance in the
octanol–water system. We use the well-known
Hansch equation
$A = B_0 + B_1 \lg P + B_2 (\lg P)^2$, (1)
where $A = 1000/LD_{50}$ is the bioactivity, and $B_0$, $B_1$, and $B_2$ are
some unknown parameters that are defined by
minimizing the squared deviation of function values
(1) from known experimental values. Toxicity and
lgP values for a number of substituted benzenes are
given in Table 1.
Using equation (1), the coefficient of determination
$R^2 = 0.237$ was determined. This coefficient
characterizes the magnitude of the statistical
WSEAS TRANSACTIONS on SYSTEMS
DOI: 10.37394/23202.2022.21.13
Vladimir Mukhomorov
E-ISSN: 2224-2678
115
Volume 21, 2022
relationship between activity A and distribution
parameters for immiscible solvents. That is, the
"explaining" ability of the model is only 23.7%. The
statistical significance of the multiple correlation
coefficient can be tested using the following
inequality [8]: $t = |R|(N - m - 1)^{0.5}/(1 - R^2)^{0.5} = 1.67 < t^{cr}_{0.05}(f = 9) = 2.26$,
where m is the number of explanatory variables and
N is the sample size.
Consequently, there is no reliable relationship
between regression (1) and the
observed toxicity at the 95% confidence level.
Comparison of the multiple correlation coefficient
R = 0.49 with the critical value
$R^{cr}_{0.05}(N - m - 1 = 9; m = 2) = 0.697$ [9] also indicates that the correlation
coefficient R is insignificant at the significance level
α = 0.05. Moreover, as the analysis showed, the use
of various modifications of equation (1) (see, for
example, [10]), including the use of the Hammett
constants σ, also does not allow one to establish a
relationship between changes in the structure of the
molecule and the variation of the biological
response. Apparently, the relationship between the
molecular structure and the bioactivity of a chemical
compound must be sought based on other properties
of this series of compounds. In molecular
pharmacology, it is known that the biological action
of a chemical compound depends on its ability to
accumulate in certain areas of the body through
interaction with sensitive local biostructures of the
body.
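The significance test quoted above for the Hansch regression (1) can be reproduced with a few lines of code. The following is a minimal sketch that uses only the values reported in the text ($R^2 = 0.237$, N = 12, m = 2, and the tabulated critical value 2.26); the function name is illustrative and is not part of the original work.

```python
import math


def multiple_r_t_statistic(r_squared: float, n: int, m: int) -> float:
    """t = |R| * sqrt(N - m - 1) / sqrt(1 - R^2) for a multiple correlation coefficient."""
    r = math.sqrt(r_squared)
    return r * math.sqrt(n - m - 1) / math.sqrt(1.0 - r_squared)


# Values reported in the text for regression (1).
t_hansch = multiple_r_t_statistic(r_squared=0.237, n=12, m=2)

# Tabulated two-sided critical value quoted in the text for f = 9, alpha = 0.05.
T_CRITICAL = 2.26
is_significant = t_hansch > T_CRITICAL

print(f"t = {t_hansch:.2f}, significant at the 5% level: {is_significant}")
```

The computed statistic stays below the critical value, which is the basis for rejecting the lipophilicity model for this series.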
Analysis of the relationship between various toxicity
indicators of chemical compounds and their
physicochemical properties showed the following
[11]. The greatest number of correlations is found
with the properties of low-molecular weight
compounds, which are determined at the electronic
level and are related to the energy of intermolecular
interaction of molecules [12,13]. The presence of
long-range components in the energy of the
intermolecular interaction must lead to a
concentration gradient of chemical compounds. This
contributes to the emergence of a diffusion flow of
low molecular weight chemical compounds directed
towards the active center.
The process of binding exogenous molecules in the
body can be described approximately by the additive
components of pairwise intermolecular interactions.
Difficulties in consistently taking into account the
contributions to the intermolecular interaction are
due to the variety of types of pair interactions of
molecules, as well as to the difficulty of correctly
taking into account the influence of the condensed
phase on these interactions. Intermolecular
interactions are usually divided into two groups
according to their radius of action and relative
strength: specific and non-specific (universal). The
first group includes anisotropic pairwise quasi-
chemical bonds (donor-acceptor complexes,
hydrogen bonds) that arise when the electron shells
of interacting molecules overlap markedly.
Nonspecific interactions include various
electrostatic interactions, as well as short-range
dispersion forces. These interactions are determined
not only by the individual properties of individual
chemical compounds, but also by the properties of
the biosubstrate, that is, the condensed phase in
which exogenous chemical compounds are
distributed. The proposed mathematical model
should highlight the additive components of the
interaction, that is, orientation interactions,
polarization and short-range contributions (on a
molecular scale). These contributions are related to
the dipole moment, electronic polarizability,
ionization potential, and also to the position of the
one-electron MO energies on the energy scale of an
isolated molecule. This approach makes it possible
to establish the existence of possible causal
relationships between molecular features and
bioresponse.
The analyzed series of chlorobenzene derivatives
(Table 1) is interesting from the point of view that in
this case it is possible to move away from the
problems associated with the conformational
transitions of molecules. In addition, these
molecules have similar sizes and are not expected to
be involved in metabolic transformations.
It is known [16,17] that chlorinated compounds
have good electron-acceptor properties. This allows them
to participate in the formation of donor-acceptor
molecular complexes due to electron transfer. The
change in total energy (∆E) when a bond is formed
between atom s of the donor molecule and atom t of
the acceptor molecule can be written as follows
$\Delta E = -\frac{q_s q_t}{\kappa_s R_{st}} + \sum_m \sum_n \frac{(C_s^m C_t^n \Delta\beta_{st})^2}{\varepsilon_m - \varepsilon_n}$. (2)
Here the summation occurs over the m occupied
molecular orbitals (MO) of the donor and over the n
vacant MO of the acceptor; εm and εn are the
energies of single-particle MO of the donor and
acceptor, respectively; Δβst is the change in the
resonance integral of the interacting atoms s and t at
the distance Rst between the atoms; Csm and Ctn are
the expansion coefficients of MO in atomic orbitals;
κs is the static permittivity of the condensed
medium.
Table 1
Molecular parameters of chlorobenzene derivatives and their mean lethal doses (LD50) for white rats
after oral administration of drugs.

| No. | Chemical compound | lgP | I1*), eV | μ1, D [14] | α1**)·10^24, cm^3 | Z, arb. units | H, bits | Aexp, 1000/LD50 [14] | Amod, (15) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Chlorobenzene | 2.84 | 9.15 | 1.69 | 13.2 | 3.00 | 1.33 | 0.303 | 0.248 |
| 2 | p-Dichlorobenzene | 3.39 | 9.11 | 0 | 15.0 | 3.50 | 1.46 | 0.398 | 0.476 |
| 3 | o-Dichlorobenzene | 3.39 | 9.23 | 2.51 | 15.9 | 3.50 | 1.46 | 0.468 | 0.788 |
| 4 | 1,2,4,5-Tetrachlorobenzene | 4.89 | 9.31 | 0 | 19.7 | 4.50 | 1.46 | 0.667 | 1.036 |
| 5 | 2,4,6-Trichlorophenol | 3.06 | 9.02 | 1.62 | 19.0 | 4.00 | 1.78 | 1.299 | 0.900 |
| 6 | 1,2,4-Trichlorobenzene | 4.13 | 9.26 | 1.25 | 18.3 | 4.00 | 1.50 | 1.323 | 0.847 |
| 7 | 3,4-Dichloroaniline | 2.69 | 8.24 | 4.16 | 17.8 | 3.43 | 1.73 | 1.429 | 1.405 |
| 8 | p-Nitrochlorobenzene | 2.39 | 10.21 | 2.52 | 16.5 | 3.71 | 1.99 | 1.802 | 2.291 |
| 9 | m-Nitrochlorobenzene | 2.46 | 10.00 | 3.38 | 17.9 | 3.71 | 1.99 | 2.326 | 2.572 |
| 10 | o-Nitrochlorobenzene | 2.53 | 9.95 | 4.25 | 16.7 | 3.71 | 1.99 | 2.949 | 2.900 |
| 11 | 2,4-Dinitrochlorobenzene | 2.45 | 10.86 | 3.29 | 19.1 | 4.25 | 2.11 | 3.571 | 3.030 |
| 12 | 2,3,5,6-Tetrachloronitrobenzene | 4.55 | 9.86 | 5.34 | 24.1 | 5.00 | 1.99 | 4.000 | 4.041 |

*) Dipole moments of molecules and their ionization potentials were calculated using the MINDO/3 quantum mechanical
method. **) The polarizabilities of the molecules were determined using the Lefevre additive scheme [15].
The surrounding dielectric medium is considered as
structureless and continuous, which is characterized
by the dielectric constant κs. The first term in
equation (2) determines the electrostatic interaction
between atoms, which have electronic charges qs
and qt. Electrostatic forces favor the interaction of
donor and acceptor atoms, but usually are not
decisive for the stabilization of the complex. In
highly polar solvents, electrostatic interactions are
significantly weakened. The second term defines
quasi-chemical binding, i.e. it characterizes the
partial electron transfer from the donor MO to the
acceptor, thereby stabilizing the molecular complex.
The donor-acceptor mechanism arises when the free
orbital of the acceptor overlaps with the filled
orbital of the donor or donor group of atoms. The
donor is assumed to have a lone pair of electrons.
For example, a nitrogen atom has a lone pair of
electrons in the state $2s^2$. When two molecules
approach each other, the lone pair of electrons is
shared between the two molecules.
This sharing of electrons is accompanied by
the formation of bonds between molecules. The
interaction between the donor and acceptor leads to
a decrease in the energy of the ground state of the
entire system below the initial levels of the donor
and acceptor. The measure of the acceptor activity
of a chemical compound having a closed electron
shell is the position of the lowest free MO ($\varepsilon_{nb}^0$).
The lower the level $\varepsilon_{nb}^0$, the stronger the
acceptor properties. The index zero indicates that the
one-electron energy corresponds to an isolated molecule
in vacuum.
The numerical values of the one-electron MO
energies were determined using the quantum-chemical
CNDO/S' method [18]. The experimental values of
the interatomic bond lengths and bond angles
for all chemical compounds analyzed
here were taken from the reference book [19].
In a condensed polar medium, under the influence
of the electrostatic field of the dipole molecules of
the environment, the electronic levels are shifted
relative to their position in an isolated molecule.
The effective electric field $E_{eff}$ acting on a
molecule in a condensed medium differs from the
average macroscopic field. This is due to the effect
of polarization of the dielectric in an external field,
as well as due to the action of the reactive field of
the polar molecules of the dielectric medium. The
induced (reactive) electric field acts on the field
source, changing the electronic distribution of the
molecule (i.e., self-action of the polar molecule
occurs). The presence of a polarizable medium
between interacting molecules can significantly
change their total potential energy.
As is known [20], the Onsager reaction field
[21] acting on a molecule is proportional to its
dipole moment. To determine the effect of a reactive
field on the electronic states in a molecule, we use
the concepts developed in molecular spectroscopy
of the condensed state [22]. Intermolecular
interactions can significantly affect the optical
spectra of molecules, shifting the maxima of the
absorption and emission bands, changing the
intensity of the bands, and giving rise to new
spectral frequencies.
Within the framework of the continuum theory of
the reaction field, the one-electron energy εnb, under
the condition of thermodynamic equilibrium of the
molecule with the environment, will be determined
as follows:
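$\varepsilon_{nb} = \varepsilon_{nb}^0 - \frac{f_R \mu_1^2}{2a^3} + \frac{\alpha_1 f_R^2 \mu_1^2}{2a^6} - \frac{3 I_1 I_3 \alpha_1 \alpha_3 (n_3^2 - 1)}{2 (I_1 + I_3) a^6 (n_3^2 + 2)}$. (3)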
Here, the reaction field and dispersion interactions
are taken into account; $f_R = 2(\kappa_s - 1)/(2\kappa_s + 1)$ is the
reaction-field factor of a point dipole
for a polar medium with a static permittivity $\kappa_s$; a,
$\mu_1$, $\alpha_1$ and $I_1$ are the effective size of a molecule of a
low molecular weight exogenous chemical
compound (commensurate with the average radius of
the molecule), its dipole moment, its
average static electronic polarizability and its first
ionization potential, respectively.
In a condensed medium, each molecule is under the
influence of a combination of surrounding
molecules. This is partly taken into account by the
factor fR, which depends on the macroscopic
properties of the condensed medium. Let us agree
that index 1 refers to the molecule of an exogenous
chemical compound, index 2 refers to the
biosubstrate molecule with which the molecule of
the exogenous substance interacts. Index 3 refers to
a polar dielectric medium, the optical refractive
index of which is equal to $n_3$. The second and third
terms in equation (3) describe the interaction of the
dipole moment of the molecule with the field of the
electrostatic reaction of the polar dielectric medium
[22] and, thus, the effect of polarization of the
molecule by this field is taken into account. The
last term in (3) characterizes the effect of
short-range dispersion interactions (in the London
approximation) on the MO levels of an exogenous
chemical compound in a condensed medium.
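The behaviour of the reaction-field terms in equation (3) is easy to explore numerically. The sketch below evaluates the factor $f_R$ and the sum of the second and third terms of (3). The function names and the permittivity value used for water (about 78.4) are illustrative assumptions and do not come from the original work; all quantities are assumed to be given in one consistent (Gaussian-type) unit system.

```python
def reaction_field_factor(kappa_s: float) -> float:
    """Reaction-field factor f_R = 2(kappa_s - 1) / (2 kappa_s + 1)."""
    return 2.0 * (kappa_s - 1.0) / (2.0 * kappa_s + 1.0)


def reaction_field_shift(mu1: float, alpha1: float, a: float, kappa_s: float) -> float:
    """Second and third terms of Eq. (3): -f_R mu1^2/(2 a^3) + alpha1 f_R^2 mu1^2/(2 a^6)."""
    f_r = reaction_field_factor(kappa_s)
    return -f_r * mu1**2 / (2.0 * a**3) + alpha1 * f_r**2 * mu1**2 / (2.0 * a**6)


# Illustrative (assumed) permittivity of a highly polar medium such as water.
KAPPA_WATER = 78.4

f_vacuum = reaction_field_factor(1.0)         # no medium: the factor vanishes
f_water = reaction_field_factor(KAPPA_WATER)  # highly polar medium: close to 1

# A nonpolar molecule (mu1 = 0) experiences no reaction-field shift.
shift_nonpolar = reaction_field_shift(mu1=0.0, alpha1=15.0, a=3.0, kappa_s=KAPPA_WATER)

print(f"f_R(vacuum) = {f_vacuum:.3f}, f_R(water) = {f_water:.3f}")
```

Because both terms scale with $\mu_1^2$, the level shift depends on the molecule only through the squared dipole moment, the polarizability and the effective size, which is why $\mu_1^2$ later enters the regression as a single explanatory variable.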
On a macroscopic scale, the role of the reaction field
manifests itself in the fact that the liquid is
compressed in the region of the surrounding
molecule, thereby increasing the potential energy of
the polar liquid medium. This is one of the reasons
for the increase in the boiling point (Tb) and
decrease in the melting point (Tm) of polar liquids.
Therefore, it is natural that the correlation
coefficients between the bioresponse and each of the
parameters Tb, Tm and μ1 are close in magnitude to
each other [11]. However, in this case, the
regression equations are not informative enough,
since it is not clear which approximately additive
contributions of intermolecular interactions should
be taken into account in each particular case.
One can make some assumptions about the
molecular properties of the local region with which
the exogenous molecule is associated, if the main
contributions are known. For example, if dispersion
forces make the dominant contribution, then the local
biological object must be highly polarizable.
When a low molecular weight chemical compound
approaches a receptor, the molecule enters into a
pair interaction with it. In the dipole approximation,
the efficiency of this interaction is determined by
the following additive physical components:
1) dipole-dipole interaction, which after averaging
over all orientations of molecules at temperature T
has the following form
$E_{dip} = -\frac{2\mu_1^2\mu_2^2}{3\kappa_s R_0^6 k_B T}$, (4)
that is, the attraction between the dipoles depends
on the temperature;
2) inductive interaction
$E_{ind} = -\frac{\alpha_2\mu_1^2 + \alpha_1\mu_2^2}{\kappa_s^2 R_0^6}$; (5)
3) dispersion interaction, which, in the London
approximation, can be written in the following form:
$E_{disp} = -\frac{3\alpha_1\alpha_2 I_1 I_2}{2R_0^6(I_1 + I_2)}$. (6)
Approximate formulas (3) - (6) make it possible to
write the pair interaction of molecules in terms of
the properties of individual molecules. Here R0 is
the effective distance between the interacting
molecules; αi is the isotropic electronic
polarizability of the i-th molecule; μi is the dipole
moment of the i-th molecule; kB is the Boltzmann
constant; Ii is the first ionization potential of the i-th
molecule. It should be borne in mind that the
assumption of the additivity of intermolecular
interactions is not sufficiently rigorous [23].
However, for solving practical problems, the
approximation of interaction additivity is generally
accepted. The possibility of using simplified
formulas makes it possible to significantly expand
the scope of the theory of intermolecular
interactions.
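Formulas (4) - (6) can be written directly as functions of the molecular parameters. The following is a minimal sketch, assuming that all arguments are supplied in one consistent unit system; the function names and the numerical inputs are purely illustrative and are not taken from the original work.

```python
import math


def dipole_dipole_energy(mu1, mu2, r0, kappa_s, k_b_t):
    """Orientation-averaged dipole-dipole energy, Eq. (4)."""
    return -2.0 * mu1**2 * mu2**2 / (3.0 * kappa_s * r0**6 * k_b_t)


def induction_energy(mu1, mu2, alpha1, alpha2, r0, kappa_s):
    """Inductive interaction energy, Eq. (5)."""
    return -(alpha2 * mu1**2 + alpha1 * mu2**2) / (kappa_s**2 * r0**6)


def dispersion_energy(alpha1, alpha2, i1, i2, r0):
    """Dispersion energy in the London approximation, Eq. (6)."""
    return -3.0 * alpha1 * alpha2 * i1 * i2 / (2.0 * r0**6 * (i1 + i2))


# Illustrative, dimensionless inputs (assumptions, not data from the paper).
params = dict(mu1=1.5, mu2=2.0, alpha1=15.0, alpha2=20.0, i1=9.0, i2=10.0,
              kappa_s=4.0, k_b_t=0.6)


def total_pair_energy(r0):
    """Additive sum of the three contributions at separation r0."""
    p = params
    return (dipole_dipole_energy(p["mu1"], p["mu2"], r0, p["kappa_s"], p["k_b_t"])
            + induction_energy(p["mu1"], p["mu2"], p["alpha1"], p["alpha2"], r0, p["kappa_s"])
            + dispersion_energy(p["alpha1"], p["alpha2"], p["i1"], p["i2"], r0))


e_near = total_pair_energy(4.0)
e_far = total_pair_energy(8.0)
print(f"E(R0) / E(2 R0) = {e_near / e_far:.1f}")
```

All three contributions are attractive and share the $R_0^{-6}$ dependence, so doubling the separation weakens the total pair energy by a factor of 64.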
Studies have shown [20,24] that formulas (4) - (6)
are applicable to real systems. These
approximations make it possible to correctly
indicate the changes in the potential energy of the
interaction of molecules as a function of the distance
between them and the individual properties of the
molecules. Before writing the final equation that
determines the change in the energy of the system,
taking into account specific and nonspecific
interactions, let us simplify equation (2). For a
homologous series of chemical compounds, or for
related compounds that interact with the same
donor, the electron-accepting properties of the
molecule depend mainly on the position on the
energy scale of the lowest free molecular orbital $\varepsilon_{nb}$
of the acceptor: $E_{d-a} = f(\varepsilon_{nb})$. For a homologous
series of chemical compounds, the change in
interaction energy is approximately proportional to
the energy $\varepsilon_{nb}$: $E_{d-a} \propto \varepsilon_{nb}$. In general, the equation
that determines the stabilization energy
of the complex formed by a low-molecular-weight chemical
compound and a receptor has the following
form:
$\Delta E_{d-a} = k_0 + k\varepsilon_{nb}^0 + \mu_1^2\{-k f_R/(2a^3) + k\alpha_1 f_R^2/(2a^6) - 2\mu_2^2/(3\kappa_s R_0^6 k_B T) - \alpha_2/(\kappa_s^2 R_0^6)\} + \alpha_1\{-\mu_2^2/(\kappa_s^2 R_0^6) - 3\alpha_2 I_1 I_2/(2R_0^6(I_1 + I_2))\} - [3k\alpha_1\alpha_3 I_1 I_3/(2a^6(I_1 + I_3))]\cdot[(n_3^2 - 1)/(n_3^2 + 2)]$. (7)
Here k and k0 are some numerical coefficients. For
the purposes of regression analysis, this equation
can be simplified. It is known that the inductive
interaction Eind << Edisp and therefore in the fourth
term of the series (7) the first term in the curly
bracket can be ignored in comparison with the
second term. In what follows, we will assume that
the molecular parameters that relate to the
biosubstrate and the polar medium are constant for
the entire range of chemical compounds.
In this study, the active center of the biophase with
which the exogenous molecule interacts is not
specified. Therefore, for definiteness we will assume
$I_2 \approx I_3 \approx 10$ eV. This value of the ionization potential
is typical of most organic molecules [25,26].
Since the main goal is to obtain a regression
equation, this approximation is quite satisfactory.
Thus, equation (7) can be reduced to the following
multifactorial regression equation with three
explanatory variables that enter the equation
additively and have a joint simultaneous effect on
the resulting trait:
$A \equiv 1000/LD_{50} = B_0 + B_1\varepsilon_{nb}^0 + B_2\mu_1^2 + B_3\alpha_1 I_1/(I_1 + 10)$. (8)
For the convenience of presenting the statistical
material, in what follows we introduce the following
notation: $x_1 = \varepsilon_{nb}^0$, $x_2 = \mu_1^2$ and $x_3 = \alpha_1 I_1/(I_1 + 10)$.
Next, the problem is reduced to estimating the
multiple regression coefficients $B_i$ from the known
results of sample observations.
The statistics of the populations A, x1, x2 and x3 are
as follows:
A = 1000/LD50: N = 12, $A_{av} = 1.71 \pm 0.36$; 95%
confidence interval: 0.91 - 2.51; $A_{min} = 0.303$, $A_{max} = 4.00$,
$S_A = 1.266$, $\tau_{min} = 1.12 < \tau_{max} = 1.82 < \tau^{cr}_{0.05,2}(N) = 2.387 < \tau^{cr}_{0.05,1}(N) = 2.523$;
Shapiro-Wilk normality test: $W = 0.910 > W^{cr}_{0.05}(N) = 0.859$;
David-Hartley-Pearson normality test:
$U^{cr}_{1,0.05}(N) = 2.800 < U = (A_{max} - A_{min})/S_A = 2.92 < U^{cr}_{2,0.05}(N) = 3.910$;
representativeness of the sample size: $N_{repr} = 10$;
x1: N = 12, $x_{1av} = -2.26 \pm 0.28$; 95% confidence
interval: (-2.87, -1.64); $\varepsilon^0_{nb,min} = -3.67$, $\varepsilon^0_{nb,max} = -0.97$,
$S_{x1} = 0.964$, $\tau_{max} = 1.34 < \tau_{min} = 1.46 < \tau^{cr}_{0.05,2}(N) = 2.387 < \tau^{cr}_{0.05,1}(N) = 2.523$;
Shapiro-Wilk normality test: $W = 0.895 > W^{cr}_{0.05}(N) = 0.859$;
David-Hartley-Pearson normality test:
$U^{cr}_{1,0.05}(N) = 2.800 \approx U = (\varepsilon^0_{nb,max} - \varepsilon^0_{nb,min})/S_{x1} = 2.791 < U^{cr}_{2,0.05}(N) = 3.910$;
$N_{repr} = 10$;
x2: N = 12, $x_{2av} = 8.82 \pm 2.54$; 95% confidence
interval: 3.23 - 14.41; $\mu^2_{1,min} = 0$, $\mu^2_{1,max} = 28.52$,
$S_{x2} = 8.80$, $\tau_{min} = 1.00 < \tau_{max} = 2.24 < \tau^{cr}_{0.05,2}(N) = 2.387 < \tau^{cr}_{0.05,1}(N) = 2.523$;
Shapiro-Wilk normality test: $W = 0.885 > W^{cr}_{0.05}(N) = 0.859$;
David-Hartley-Pearson normality test:
$U^{cr}_{1,0.05}(N) = 2.800 < U = (\mu^2_{1,max} - \mu^2_{1,min})/S_{x2} = 3.25 < U^{cr}_{2,0.05}(N) = 3.910$;
$N_{repr} = 10$;
x3: N = 12, $x_{3av} = 8.43 \pm 0.42$; 95% confidence
interval: 7.75 - 9.58; $x_{3min} = 5.276$, $x_{3max} = 11.965$,
$S_{x3} = 1.744$, $\tau_{min} = 1.81 < \tau_{max} = 2.03 < \tau^{cr}_{0.05,2}(N) = 2.387 < \tau^{cr}_{0.05,1}(N) = 2.523$;
Shapiro-Wilk normality test: $W = 0.975 > W^{cr}_{0.05}(N) = 0.859$;
David-Hartley-Pearson normality test:
$U^{cr}_{1,0.05}(N) = 2.800 < U = (x_{3max} - x_{3min})/S_{x3} = 3.83 < U^{cr}_{2,0.05}(N) = 3.910$;
$N_{repr} = 10$. (9)
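Some of the sample statistics in (9) can be recomputed from the columns of Table 1. The sketch below does this for the bioactivity A and for $x_2 = \mu_1^2$. It assumes that the τ statistics are the standardized distances of the extreme values from the mean and that U is the range divided by the sample standard deviation, which is consistent with the figures quoted in (9); the helper name `describe` is illustrative. Small differences in the last digit relative to (9) can arise from rounding of the tabulated data.

```python
import math
import statistics

# Columns of Table 1: A_exp = 1000/LD50 and dipole moments mu_1 (D).
A_EXP = [0.303, 0.398, 0.468, 0.667, 1.299, 1.323,
         1.429, 1.802, 2.326, 2.949, 3.571, 4.000]
MU1 = [1.69, 0.0, 2.51, 0.0, 1.62, 1.25,
       4.16, 2.52, 3.38, 4.25, 3.29, 5.34]


def describe(sample):
    """Mean, sample standard deviation, standard error, tau and U statistics."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # unbiased (N - 1) estimate
    return {
        "n": n,
        "mean": mean,
        "s": s,
        "se": s / math.sqrt(n),
        "tau_min": (mean - min(sample)) / s,
        "tau_max": (max(sample) - mean) / s,
        "u": (max(sample) - min(sample)) / s,
    }


stats_a = describe(A_EXP)
stats_x2 = describe([mu**2 for mu in MU1])

for name, st in (("A", stats_a), ("x2", stats_x2)):
    print(f"{name}: mean = {st['mean']:.2f}, S = {st['s']:.3f}, "
          f"tau_max = {st['tau_max']:.2f}, U = {st['u']:.2f}")
```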
Since the sample size is limited, before analyzing
the regression (8), we perform the following
procedure. Let's analyze a regression that uses only
one explanatory variable, such as the variable x2:
A1(x2) = a0 + a1 x2.
For this regression, we get the following statistics:
N = 12, $m_1 = 1$; $R = 0.80 \pm 0.11$, $|R^*| = 0.82 > R^{cr}_{0.05}(N - 2) = 0.576$;
correlation coefficient significance test based on the
Fisher normalizing z-transform (with Hotelling
corrections taken into account):
$u_H = 1.07 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.591$;
RMSE = 0.794; the minimum sample size
sufficient for the reliability of the correlation
coefficient: $N^{min}_{0.05} = 6$; $a_0 = 0.17 \pm 0.09$, $a_1 = -1.01 \pm 0.26$,
$|t(a_1)| = 3.8 > t^{cr}_{0.05}(N - 2) = 2.228 > t(a_0) = 1.93$;
unexplained regression residuals (perturbing
variable) are normally distributed: Shapiro-Wilk
test: $W = 0.938 > W^{cr}_{0.05}(N) = 0.859$;
$F = 17.57 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$.
Here $R^* = R[1 + 0.5(1 - R^2)/(N - 3)]$ is the adjusted
correlation coefficient. Thus, there is a significant
relationship between the explanatory variable x2 and
bioresponse. Checking the relationship of regression
residuals δA1 with the explanatory variable x1 also
indicates the presence of a significant correlation
between them:
δA1(x1) = a01 + a11 x1,
N = 12, $m_1 = 1$; $R = -0.74 \pm 0.14$, $|R^*| = 0.76 > R^{cr}_{0.05}(N - 2) = 0.576$;
correlation coefficient significance test based on the
Fisher normalizing z-transform (with Hotelling
corrections taken into account):
$u_H = 0.92 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.591$;
RMSE = 0.533; the minimum sample size
sufficient for the reliability of the correlation
coefficient: $N^{min}_{0.05} = 7$; $a_{01} = -1.31 \pm 0.41$, $a_{11} = -0.58 \pm 0.17$,
$|t(a_{11})| = 3.48 > t^{cr}_{0.05}(N - 2) = 2.228$;
unexplained regression residuals (perturbing
variable) are normally distributed: Shapiro-Wilk
test: $W = 0.944 > W^{cr}_{0.05}(N) = 0.859$;
$F = 12.13 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$.
It follows that the regression δA1(x1) is also
statistically significant. That is, the unexplained
residuals of regression A1(x2) correlate with another
explanatory variable, namely variable x1. Further
verification of the relationship between the
regression residuals δA1(x1) and the explanatory
variable x3 showed that the residuals are not
associated with the molecular index characterizing
the dispersion interaction: R = 0.31 < R0.05cr(N – 2) =
0.576, F = 1.07 < F0.05cr(f1 = 1;f2 = 10) = 4.96.
Obviously, this contribution to the interaction of
molecules is insignificant for the analyzed sample.
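The one-variable regression of the bioactivity on $x_2 = \mu_1^2$ can be checked against the data of Table 1. The sketch below is a plain least-squares fit written without external libraries; it computes the correlation coefficient, the adjusted coefficient $R^*$ and the F-ratio using the formulas given in the text. The function name is illustrative, and the sketch is intended only as a check of R and F, not as a reproduction of every statistic listed above.

```python
import math

# Columns of Table 1: A_exp = 1000/LD50 and dipole moments mu_1 (D).
A_EXP = [0.303, 0.398, 0.468, 0.667, 1.299, 1.323,
         1.429, 1.802, 2.326, 2.949, 3.571, 4.000]
MU1 = [1.69, 0.0, 2.51, 0.0, 1.62, 1.25,
       4.16, 2.52, 3.38, 4.25, 3.29, 5.34]


def simple_regression(x, y):
    """Least-squares fit y = a0 + a1*x with correlation and F statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        # Adjusted coefficient R* = R[1 + 0.5(1 - R^2)/(N - 3)] from the text.
        "r_adjusted": r * (1.0 + 0.5 * (1.0 - r**2) / (n - 3)),
        # F-ratio for one explanatory variable: R^2 (N - 2) / (1 - R^2).
        "f": r**2 * (n - 2) / (1.0 - r**2),
    }


fit_x2 = simple_regression([mu**2 for mu in MU1], A_EXP)
F_CRITICAL = 4.96  # tabulated value quoted in the text for f1 = 1, f2 = 10
print(f"R = {fit_x2['r']:.3f}, R* = {fit_x2['r_adjusted']:.2f}, F = {fit_x2['f']:.1f}")
```

The fitted correlation is clearly above the critical value 0.576, in line with the conclusion that $\mu_1^2$ alone already explains a significant part of the variation in toxicity.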
It follows from inequalities (9) that at a significance
level of 5%, the populations A, x1, x2, and x3 are
homogeneous and normally distributed. The
homogeneity of the analyzed data was checked
using the τ-criterion [27,28]. The following statistics
were obtained for regression (8):
N = 12, $m_1 = 3$ is the number of explanatory
variables; multiple correlation coefficient:
$R_1 = 0.966 > R^{cr}_{0.05}(m_1; N - m_1 - 1) = 0.777$; multiple
determination coefficient: $R_1^2 = 0.933$, $R_1^{*2} = 0.91$;
standard error of the regression estimate: $S_A = 0.381$;
$B_0 = -1.29 \pm 0.82$, $B_1 = -0.78 \pm 0.18$, $B_2 = 0.06 \pm 0.02$,
$B_3 = 0.09 \pm 0.12$; $|t(B_1)| = 4.27 > t(B_2) = 3.51 > t^{cr}_{0.05}(f = N - m_1 - 1) = 2.306 > t(B_3) = 0.70$;
$F = 37.29 > F^{cr}_{0.05}(f_1 = 3; f_2 = 8) = 4.07$; $\Sigma_1 = 1.1586$;
$AIC_1 = -1.837$, $SC_1 = -1.5093$, $SS_1 = 0.1196$. (10)
Here $\Sigma_1$ is the sum of squared residuals;
$R^{cr}_{0.05}(m_1; N - m_1 - 1)$ is the critical value of the multiple
correlation coefficient [9], which determines the
lower acceptable limit of the degree of association
between the variations of the resultant variable and
all explanatory variables.
Statistics (10) contains information quality criteria
for the linear regression equations of Akaike [29]
and Schwarz [30], as well as the alternative ratio
$SS = \Sigma^{0.5}/(N - m)$. For the regression residuals, the
Shapiro-Wilk test gives $W = 0.975 > W^{cr}_{0.05}(N) = 0.859$.
The information quality criteria
for the regression equation are defined as follows:
$AIC = 2m/N + \ln(\Sigma/N)$,
$SC = (m + 1)\ln N/N + \ln(\Sigma/N)$. (11)
The significance of the multiple
determination coefficient (10) is assessed using
the F-statistic: $F = R^2(N - m - 1)/[m(1 - R^2)]$. Since
$F > F^{cr}_{0.05}$ (see Eq. (10)), it can be assumed that the
multiple coefficient of determination is reliably
different from zero with probability $1 - \alpha = 0.95$,
and the explanatory variables reliably explain
variations in bioactivity. The regression
coefficients $B_i$ differ significantly from zero if
$|t(B_i)| > t^{cr}$ at significance level α and number of
degrees of freedom $f = N - m_1 - 1$ for the two-sided
critical region. Therefore, the coefficient $B_3$ in
Eq. (10) is not statistically reliable. Regression (8)
explains 93.3% of the variability in bioactivity.
Only 6.7% of unexplained variations can be
attributed to unaccounted for factors or random
variations in the original data.
The importance of each of the
independent explanatory variables in assessing the
variability of the resultant variable is characterized by
standardized regression coefficients. The
standardized regression coefficients $B_i^*$ are related
to the ordinary regression coefficients (8) by the
following relationships:
$B_1^* = B_1 S_{x1}/S_A = -0.596$, $B_2^* = B_2 S_{x2}/S_A = 0.405$,
$B_3^* = B_3 S_{x3}/S_A = 0.098$. (12)
On a natural scale, the regression coefficients are
dimensional quantities. However, the standardized
coefficients are dimensionless and this makes it
possible to perform their quantitative comparison.
Knowledge of the standardized coefficients makes it
possible to determine the share of each explanatory
variable in explaining the variability of
the resultant variable. An approximate relation for the
multiple coefficient of determination can be used to
obtain information about the comparative influence
of individual variables [31]:
$R^2_{appr} = B_1^* r_{x1,A} + B_2^* r_{x2,A} + B_3^* r_{x3,A} = 0.535 + 0.321 + 0.075 = 0.931$. (13)
Here rx1,A = -0.897, rx2,A = 0.798 and rx3,A = 0.770
are the pairwise correlation coefficients between
each explanatory variable and the observed
bioactivity. From relation (13) it follows that the
greatest contribution to the explanation of the
variability in the toxicity of chemical compounds
comes from the variables x1 (53.5%) and x2 (32.1%).
The share of variable x3 is small
and amounts to only 7.5%. The
approximate value of the coefficient of
determination (13) is very close to the value of $R_1^2 = 0.933$
(see Eq. (10)).
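The decomposition (13) is a sum of products of the standardized coefficients (12) and the pairwise correlation coefficients. A minimal sketch using only the values quoted in the text is given below; the dictionary names are illustrative. Because the inputs are rounded to three decimals, the individual terms may differ from those in (13) in the last digit.

```python
# Standardized regression coefficients from (12).
STANDARDIZED_COEFFICIENTS = {"x1": -0.596, "x2": 0.405, "x3": 0.098}

# Pairwise correlation coefficients between each variable and the bioactivity.
PAIRWISE_CORRELATIONS = {"x1": -0.897, "x2": 0.798, "x3": 0.770}

# Share of each variable in the approximate coefficient of determination.
contributions = {
    name: STANDARDIZED_COEFFICIENTS[name] * PAIRWISE_CORRELATIONS[name]
    for name in STANDARDIZED_COEFFICIENTS
}
r2_approx = sum(contributions.values())

for name, share in contributions.items():
    print(f"{name}: {100 * share:.1f}%")
print(f"approximate R^2 = {r2_approx:.3f}")
```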
The adjusted (unbiased) coefficient of determination
is determined from the following:
$R^{*2} = 1 - (1 - R^2)(N - 1)/(N - m - 1)$. (14)
The adjusted coefficient of determination $R^{*2}$ is
applied so that models with different numbers of
explanatory factors can be compared.
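Formula (14) and the F-statistic used above to test the coefficient of determination are straightforward to evaluate. The sketch below applies both to the three-variable regression (8), using the rounded value $R_1^2 = 0.933$ with N = 12 and m = 3; the function names are illustrative. With the rounded input, F comes out slightly below the 37.29 quoted in (10).

```python
def adjusted_r_squared(r_squared: float, n: int, m: int) -> float:
    """Adjusted coefficient of determination, Eq. (14)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - m - 1)


def f_statistic(r_squared: float, n: int, m: int) -> float:
    """F = R^2 (N - m - 1) / [m (1 - R^2)]."""
    return r_squared * (n - m - 1) / (m * (1.0 - r_squared))


# Values reported in (10) for regression (8).
r2_adjusted_full = adjusted_r_squared(0.933, n=12, m=3)
f_full = f_statistic(0.933, n=12, m=3)

F_CRITICAL_FULL = 4.07  # tabulated value quoted in the text for f1 = 3, f2 = 8
print(f"adjusted R^2 = {r2_adjusted_full:.2f}, F = {f_full:.1f}")
```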
Thus, at the 95% confidence level, a very strong
relationship can be assumed between toxicity and
the explanatory variables x1 and x2. The coefficient
of determination R12 = 0.933 (10) indicates what
portion (in this case, 93.3%) of the total variance of
the bioresponse function is explained by the factors
x1, x2, and x3. Only 6.7% of the total variance cannot
be explained by the model (the uncertainty factor is
0.067) and appears to be due to unaccounted for
variables or random deviations in the original
sample. The values of coefficients B1* and B2*
reflect the significant dependence, at the 95%
confidence level, of the resultant variable on the
explanatory factors x1 and x2. At the same time, the
intermolecular dispersion interaction (~ x3) is of
minor importance (not significant at the chosen
significance level α), since t(B3) < tcr (two-sided
hypothesis evaluation). The difference of the
coefficient B3 from zero can be attributed to random
fluctuations in the original sample. Therefore,
nothing definite can be said about the influence of
the dispersion interaction on the resultant attribute.
Features that have low information content (i.e.,
"weight") can be excluded from further analysis.
The choice of independent explanatory variables is a
process of successive refinement of the initial
hypothesis. The following steps can be
distinguished in this process: formation of a primary
hypothesis (Eq. (8)) about the set of independent
variables; analysis of structural relationships;
narrowing of features and selection of significant
variables for modeling.
Since the influence of dispersion interactions is
insignificant, equation (8) can be replaced by the
following reduced two-factor regression equation:
$A \equiv 1000/LD_{50} = B_0 + B_1 x_1 + B_2 x_2$, (15)
N = 12, $m_2 = 2$, $B_0 = -0.76 \pm 0.29$, $B_1 = -0.86 \pm 0.14$,
$B_2 = 0.06 \pm 0.02$; $|t(B_1)| = 6.10 > t(B_2) = 3.98 > |t(B_0)| = 2.65 > t^{cr}_{0.05}(f = N - m_2 - 1) = 2.26$;
$R_2 = 0.968 > R^{cr}_{0.05}(2; 9) = 0.697$, $R_2^2 = 0.937$, $R_2^{*2} = 0.933$;
standard error of the regression estimate: $S_A = 0.370$;
$F = 59.04 > F^{cr}_{0.05}(f_1 = m_2; f_2 = N - m_2 - 1) = 4.26$;
$\Sigma_2 = 1.2294$; $AIC_2 = -1.9450$, $SC_2 = -1.6572$,
$SS_2 = 0.1109$. (16)
Regression residuals are normally distributed.
Shapiro-Wilk normality test: $W = 0.940 > W^{cr}_{0.05}(N) = 0.859$.
If the ratio F exceeds the tabulated value many times
over (for example, by a factor of at least four), then such a
regression, according to [32], has predictive
properties. Additional information about the
significance of variable x3 can be obtained by
analyzing the relationship between the regression
residuals (15) and variable x3. As the analysis
showed, the correlation coefficient is insignificant:
$R = 0.16 < R^{cr}_{0.05}(N - 2) = 0.576$; that is, the
explanatory variable x3 is unrelated to
the unexplained regression residuals (15).
Therefore, this variable does not really need to be
included in the regression equation. A comparison
of the information tests AIC1 and AIC2 for
regressions (8) and (15) shows that the quality of the
reduced regression (15) is higher than for regression
(8), although the number of explanatory variables
has decreased. Similar relations hold for the tests
SC1 = -1.5093 and SC2 = -1.6572. The values of the
Akaike and Schwarz tests are associated with the
ratios SS: SS1 = Σ10.5/(N m1) = 0.1196 and SS2 =
Σ20.5/(N m2) = 0.1109. The use of AIC, SC and SS
tests is justified because the R2 criterion may not be
informative. The fact is that for a model with a large
number of explanatory variables, the criterion R2
will always be no less than for a model with a
smaller number of explanatory variables. The
presence of collinearity between explanatory
variables can significantly distort the relationship
between the information tests for full regression and
reduced regression.
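The information tests quoted above can be recomputed from the residual sum of squares. The sketch below is only illustrative: the AIC and SC expressions are assumptions inferred from the reported numbers (they reproduce the values in (16)), while the SS ratio follows the expression given in the text. The function name is hypothetical.

```python
import math

def information_criteria(rss, n, m):
    """Return (AIC, SC, SS) for a regression with residual sum of squares rss,
    n observations and m explanatory variables.

    Assumed forms (inferred from the reported values, not quoted from the paper):
      AIC = ln(rss/n) + 2*m/n
      SC  = ln(rss/n) + (m + 1)*ln(n)/n
    SS = sqrt(rss)/(n - m) is the ratio stated in the text.
    """
    log_mean_rss = math.log(rss / n)
    aic = log_mean_rss + 2.0 * m / n
    sc = log_mean_rss + (m + 1) * math.log(n) / n
    ss = math.sqrt(rss) / (n - m)
    return aic, sc, ss

# Reduced regression (15): Sigma_2 = 1.2294, N = 12, m_2 = 2
aic2, sc2, ss2 = information_criteria(1.2294, 12, 2)
print(f"AIC2 = {aic2:.4f}, SC2 = {sc2:.4f}, SS2 = {ss2:.4f}")
```

Lower values of all three quantities indicate the better of two competing regressions fitted to the same data.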
If we analyse the residuals δA of a simple linear regression on the single explanatory variable $x_1$, $A \equiv 1000/LD_{50} = B_0 + B_1 x_1$ ($|R| = 0.90 > R^{cr}_{0.05}(N - 2) = 0.576$; $F = 41.2 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$), it turns out that these residuals are significantly correlated with the variable $x_2$: $r = 0.66 > R^{cr}_{0.05}(N - 2) = 0.576$, $F = 7.76 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$. Thus, for the regression equation (15), both explanatory variables $x_1$ and $x_2$ are significant. However, it is necessary to check that these variables are not significantly correlated with each other. Collinearity between explanatory variables can lead to a misjudgment of their impact on the outcome variable, and it can significantly distort the test relationships between the full and reduced regressions. The interdependence of the explanatory variables $x_1$ and $x_2$ is checked as follows:
$N = 12$; $x_1(x_2) = b_0 + b_1 x_2$, $r_{1,2} = -0.564 \pm 0.216$, $|r_{1,2}| = 0.564 < R^{cr}_{0.05}(N - 2) = 0.576$; $|t(b_1)| = 2.16 < t^{cr}_{0.05}(N - 2) = 2.228$; $b_0 = -2.80 \pm 0.81$, $b_1 = -6.15 \pm 2.38$; the minimum sample size sufficient for the reliability of the correlation coefficient: $N^{min}_{0.05} = 11$; $F = 4.67 < F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$. (17)
Since $F < F^{cr}$, the relationship between $x_1$ and $x_2$ is not significant at the 95% confidence level. For small sample sizes ($N \le 15$), the best estimate of the correlation coefficient is the adjusted correlation coefficient [33]:
$r^* = r\,[1 + 0.5\,(1 - r^2)/(N - 3)]$. (18)
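As a quick numerical illustration of relation (18), the following sketch applies the small-sample correction to two correlation coefficients reported elsewhere in the paper ($R = 0.62$ with $N = 12$ and $R = 0.89$ with $N = 5$). The function name is an assumption made for this example.

```python
def adjusted_correlation(r, n):
    """Small-sample correction of Eq. (18): r* = r*(1 + 0.5*(1 - r^2)/(n - 3))."""
    if n <= 3:
        raise ValueError("the correction requires n > 3")
    return r * (1.0 + 0.5 * (1.0 - r * r) / (n - 3))

for r, n in [(0.62, 12), (0.89, 5)]:
    print(f"r = {r}, N = {n}: r* = {adjusted_correlation(r, n):.3f}")
```

The correction grows as the sample shrinks and vanishes as $|r|$ approaches 1.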
For regression (15), the residuals are normally distributed (16). In this case, a more accurate quantification of collinearity can be used, as suggested by Farrar and Glauber [34]. The Farrar-Glauber statistic has a chi-square distribution with $f = m_2(m_2 - 1)/2$ degrees of freedom:
$\chi^2 = -\left(N - 1 - (2m_2 + 5)/6\right)\ln\left(r_{1,1}r_{2,2} - r_{2,1}r_{1,2}\right) = 3.64 < \chi^{2,cr}_{0.05}(f = 1) = 3.841$. (19)
Here $m_2 = 2$ is the number of explanatory variables, and $r_{1,1} = r_{2,2} = 1$. Since $\chi^2 < \chi^{2,cr}$, the hypothesis of the independence of the explanatory variables does not contradict the original data. The absence of a significant relationship between the features $\varepsilon_{nb}^0$ and $\mu_1^2$ is also indicated by the inequality:
$t = |r_{1,2}|\,(N - 2)^{0.5}/(1 - r_{1,2}^2)^{0.5} = 2.16 < t^{cr}_{0.05}(N - 2) = 2.23$. (20)
That is, the correlation coefficient $|r_{1,2}|$ is statistically insignificant. The collinearity of the variables $x_1$ and $x_2$ can also be estimated using the following relation [8]:
$F = (N - m_2)\,r_{1,2}^2/[(m_2 - 1)(1 - r_{1,2}^2)] = 4.67 < F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$. (21)
Inequality (21) also indicates that the explanatory variables $\varepsilon_{nb}^0$ and $\mu_1^2$ can be regarded as independent. Consequently, the explanatory variables $x_1$ and $x_2$ act on the resultant variable simultaneously, significantly, in opposite directions and independently.
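The three collinearity checks (19)-(21) for a pair of explanatory variables depend only on $r_{1,2}$, $N$ and $m$, so they can be reproduced in a few lines. This is a minimal sketch under that reading of the formulas; the function name and the dictionary of critical values (copied from the text) are illustrative.

```python
import math

def pairwise_collinearity(r12, n, m=2):
    """Return (chi2, t, F) of Eqs. (19)-(21) for two explanatory variables."""
    det = 1.0 - r12 * r12  # determinant of the 2x2 correlation matrix
    chi2 = -(n - 1 - (2 * m + 5) / 6.0) * math.log(det)
    t = abs(r12) * math.sqrt(n - 2) / math.sqrt(det)
    f = (n - m) * r12 * r12 / ((m - 1) * det)
    return chi2, t, f

# 5% critical values as quoted in the text
CRITICAL = {"chi2": 3.841, "t": 2.228, "F": 4.96}

chi2, t, f = pairwise_collinearity(-0.564, 12)
independent = chi2 < CRITICAL["chi2"] and t < CRITICAL["t"] and f < CRITICAL["F"]
print(f"chi2 = {chi2:.2f}, t = {t:.2f}, F = {f:.2f}, independent: {independent}")
```

For two variables the F statistic of (21) is simply the square of the t statistic of (20), which is why both tests lead to the same conclusion.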
The coefficients of regression (15) need to be standardized again to reveal the comparative effect of the explanatory variables on bioactivity:
$B_1^* = B_1 S_{x1}/S_A = -0.656$, $B_2^* = B_2 S_{x2}/S_A = 0.428$. (22)
Thus, in contrast to the coefficients of (15), the standardized coefficients $B_1^*$ and $B_2^*$ do not differ greatly in absolute value. Using the approximate relation (13), we determine the share contributed by each variable to the explanation of the variability of bioactivity:
$R_{appr}^2 = B_1^* r_{x1,A} + B_2^* r_{x2,A} = 0.562 + 0.377 = 0.939$. (23)
The largest contribution to (23) is made by the explanatory variable $\varepsilon_{nb}^0$ (56.2%). The approximate coefficient of determination $R_{appr}^2 = 0.939$ is very close to the coefficient of determination (16): $R_2^2 = 0.937$.
In order to compare the coefficients of determination of two models with different numbers of explanatory factors m, their adjusted values $R^{*2}$ must be calculated (14). For regression (8), the adjusted coefficient of determination is $R_1^{*2} = 0.90$ with $m_1 = 3$, whereas for the reduced regression (15) $R_2^{*2} = 0.933$ with $m_2 = 2$ (16). The difference $|R_1^{*2} - R_2^{*2}|$ measures the additional explanation of the variation of the resultant variable obtained by including another explanatory variable in the regression. Next, we use the following relation, which has the Fisher F-distribution:
$F = |R_1^{*2} - R_2^{*2}|\,(N - m_1 - 1)/(m_1 - m_2)/(1 - R_1^{*2})$. (24)
If $F > F^{cr}_{0.05}(f_1 = m_1 - m_2; f_2 = N - m_1 - 1)$, then the additional explanatory variable must be retained in the regression equation. In this case the opposite inequality holds: $F = 2.75 < F^{cr}_{0.05}(f_1 = 1; f_2 = 8) = 5.32$. That is, the additional explanatory variable $x_3$ does not improve the regression. The purpose of the regression equation is not only to describe the experiment satisfactorily; it should also indicate the physical phenomena associated with the variability of bioactivity.
Thus, the independent and simultaneously acting explanatory molecular variables $\varepsilon_{nb}^0$ and $\mu_1^2$ are associated with the toxicity of substituted chlorobenzenes. Model (15) makes it possible to make some assumptions about the properties of the biophase region with which an exogenous molecule can interact. In accordance with the regression equation (15), biophase molecules must have a significant dipole moment or charge. In addition, the energy of the highest occupied molecular orbital $\varepsilon_{hocc}$ of the biophase should be close to the energy $\varepsilon_{nb}^0$ of the exogenous molecule. If the initial information can be described briefly, then there is confidence that some objective regularity existing in the structure of the feature space has been revealed, which is what allows this reduction to be carried out [18]. The correlation between the experimental toxicity values of substituted chlorobenzenes and the theoretical values is shown in Fig. 1. The stabilization energy of the molecular complex increases with the energy of the donor-acceptor complex.
Table 1 also lists the molecular features Z and H for chlorine-substituted compounds. Z is the average number of electrons in the outer shell of the atoms in a molecule: $Z = \sum_i n_i Z_i / N$ [36,37]. Here $n_i$ is the number of atoms of the i-th sort with $Z_i$ electrons in the outer electron shell. The summation runs over all atoms in the molecule; $\sum_i n_i = N$ is the total number of atoms. The electronic factor Z is related to the pseudopotential of the molecule [38].
Fig. 1. Scatterplot and regression line. Calculated and experimental values of average lethal doses ($A \equiv 1000/LD_{50}$) of chlorobenzene derivatives (Table 1); horizontal axis: A (experiment), vertical axis: A (model). The regression line is given by the equation $A_{mod} = a_0 + a_1 A_{exp}$, $N = 12$, $a_0 = 0.191 \pm 0.168$, $a_1 = 0.904 \pm 0.080$; $R = 0.96 \pm 0.03$; $R^* = 0.97 > R^{cr}_{0.05}(N - 2) = 0.576$, $F = 159.2 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$; straightness index: $K = (N(1 - R^2))^{0.5} = 0.84 < K_{thr}$ (threshold value) $= 3.00$ [35].
The information function H [39] for a discrete data set is quantified as follows: $H = -\sum_i p_i \log_2 p_i$. The ratios $p_i = n_i/N$ satisfy the conditions $0 \le p_i \le 1$, $\sum_i p_i = 1$, where $p_i = 0$ means that the i-th event cannot occur; $\sum_i n_i = N$, and N is the number of atoms in the molecule. The ratio $n_i/N$ is the fraction of atoms of the i-th kind in the molecule. For the chemical compounds from Table 1, it was found that the molecular factor Z is closely related to the polarizability $\alpha_1$ of the molecule: the linear correlation coefficient is 0.93 ± 0.12. The statistical significance of the correlation coefficient is determined by the inequality:
$t = |R|\,(N - 2)^{0.5}/(1 - R^2)^{0.5} = 12.36 > t^{cr}_{0.05}(N - 2) = 2.23$. (25)
Inequality (25) uses a two-sided critical region for the t-quantile. On the Chaddock scale, this pairwise correlation coefficient corresponds to a “very close relationship” [40].
The relationship between the one-electron energies $\varepsilon_{nb}^0$ and the molecular factor Z was also verified. A very close linear relationship was found (Fig. 2A) between these factors for chlorine-substituted benzenes (compounds No. 1-6 in Table 1):
$\varepsilon_{nb}^0(Z)_1 = a_{01} + a_{11} Z$, $N_1 = 6$, $R_1 = -0.996 \pm 0.004$, $|R_1^*| = 0.998 > R^{cr}_{0.05}(N_1 - 2) = 0.811$; criterion of significance of the correlation coefficient based on the normalizing Fisher z-transform (the Hotelling correction is taken into account [33]): $u_H = 2.98 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.86$; $S_1 = 0.040$; sufficient sample size to ensure the validity of the correlation coefficient: $N^{min}_{0.05} < 4$; $a_{01} = 1.23 \pm 0.13$, $a_{11} = -0.75 \pm 0.03$, $|t(a_{11})| = 21.7 > t(a_{01}) = 9.46 > t^{cr}_{0.05}(N_1 - 2) = 2.776$; $F = 469.5 > F^{cr}_{0.05}(f_1 = 1; f_2 = 4) = 7.71$; straightness index: $K = 0.22 < K_{thr} = 3.0$. (26)
Because the corrected correlation coefficient is $|R_1^*| = 0.998$ and $S_1 = 0.040$, this relationship between the features is close to a functional dependence.
Z and $\varepsilon_{nb}$ statistics:
$N_1 = 6$, $Z_{av} = 3.75 \pm 0.21$; 95% confidence interval (3.20, 4.30), $Z_{min} = 3.00$, $Z_{max} = 4.50$, $S_{Z1} = 0.524$, $\tau_{max} = 1.43 = \tau_{min} = 1.43 < \tau^{cr,2}_{0.05}(N_1) = 1.996 < \tau^{cr,1}_{0.05}(N_1) = 2.184$; Shapiro-Wilk normality test: $W = 0.960 > W^{cr}_{0.05}(N_1) = 0.788$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N_1) = 2.200 < U = (Z_{max} - Z_{min})/S_Z = 2.863 < U^{cr,2}_{0.05}(N_1) = 3.012$; $N_{repr} = 5$; (27)
$N_1 = 6$, $\varepsilon_{nb}^{av} = -1.56 \pm 0.16$; 95% confidence interval (-1.97, -1.15), $\varepsilon_{nb}^{min} = -2.097$, $\varepsilon_{nb}^{max} = -0.971$, $S_\varepsilon = 0.392$, $\tau_{min} = 1.37 < \tau_{max} = 1.50 < \tau^{cr,2}_{0.05}(N_1) = 1.996 < \tau^{cr,1}_{0.05}(N_1) = 2.184$; Shapiro-Wilk normality test: $W = 0.975 > W^{cr}_{0.05}(N_1) = 0.788$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N_1) = 2.200 < U = (\varepsilon_{nb}^{max} - \varepsilon_{nb}^{min})/S_\varepsilon = 2.870 < U^{cr,2}_{0.05}(N_1) = 3.012$; $N_{repr} = 5$. (28)
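The homogeneity and range-based normality checks in (27) and (28) can be reproduced from the summary statistics alone. The sketch below assumes that τ is the distance of an extreme value from the mean in units of the standard deviation (an interpretation inferred from the reported numbers) and that U is the range divided by the standard deviation, as written in the text. Function names are illustrative.

```python
def extreme_deviations(mean, s, x_min, x_max):
    """Assumed tau statistics: distance of each extreme from the mean, in units of s."""
    return (mean - x_min) / s, (x_max - mean) / s

def range_ratio(s, x_min, x_max):
    """David-Hartley-Pearson statistic U = (max - min)/s."""
    return (x_max - x_min) / s

# Summary statistics of Z for compounds No. 1-6, taken from (27)
tau_min, tau_max = extreme_deviations(3.75, 0.524, 3.00, 4.50)
u = range_ratio(0.524, 3.00, 4.50)

homogeneous = max(tau_min, tau_max) < 1.996  # critical tau quoted in (27)
normal_by_range = 2.200 < u < 3.012          # critical bounds quoted in (27)
print(f"tau_min = {tau_min:.2f}, tau_max = {tau_max:.2f}, U = {u:.3f}")
print(homogeneous, normal_by_range)
```

A sample passes the range test when U lies between the lower and upper critical bounds for the given sample size.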
Fig. 2. Scatterplots and regression lines. Relationship between the energy of the lowest free molecular orbital $E_{nb} \equiv \varepsilon_{nb}^0$ (eV) of an exogenous molecule and the molecular factor Z (arb. units). (A) Chloro-substituted benzenes. (B) Nitrochlorobenzenes.
The sets Z and $\varepsilon_{nb}$ are homogeneous and have a normal distribution. A quantitative relationship between these features was also found (Fig. 2B) for chlorine-substituted nitrobenzenes (compounds No. 8-12 in Table 1):
$\varepsilon_{nb}^0(Z)_2 = a_{02} + a_{12} Z$, $N_2 = 5$, $R_2 = -0.89 \pm 0.12$, $|R_2^*| = 0.94 > R^{cr}_{0.05}(N_2 - 2) = 0.8783$; $S_2 = 0.166$; criterion of significance of the correlation coefficient based on the normalizing Fisher z-transform (the Hotelling correction is taken into account): $u_H = 1.904 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.98$; sufficient sample size to ensure the validity of the correlation coefficient: $N^{min}_{0.05} < 4$; $a_{02} = -1.37 \pm 0.59$, $a_{12} = -0.47 \pm 0.14$, $|t(a_{12})| = 3.32 > t^{cr}_{0.05}(N_2 - 2) = 3.182$; $F = 11.04 > F^{cr}_{0.05}(f_1 = 1; f_2 = 3) = 10.13$; straightness index: $K = 1.02 < K_{thr} = 3.0$. (29)
Z and $\varepsilon_{nb}$ statistics:
$N_2 = 5$, $Z_{av} = 4.11 \pm 0.26$; 95% confidence interval (3.38, 4.83), $Z_{min} = 3.71$, $Z_{max} = 5.00$, $S_{Z2} = 0.584$, $\tau_{min} = 0.682 < \tau_{max} = 1.527 < \tau^{cr,2}_{0.05}(N_2) = 1.869 < \tau^{cr,1}_{0.05}(N_2) = 2.080$; Shapiro-Wilk normality test: $W = 0.773 > W^{cr}_{0.05}(N_2) = 0.762$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N_2) = 2.200 < U = (Z_{max} - Z_{min})/S_Z = 2.21 < U^{cr,2}_{0.05}(N_2) = 3.222$; $N_{repr} = 4$;
$N_2 = 5$, $\varepsilon_{nb}^{av} = -3.28 \pm 0.14$; 95% confidence interval (-3.66, -2.90), $\varepsilon_{nb}^{min} = -3.657$, $\varepsilon_{nb}^{max} = -2.987$, $S_\varepsilon = 0.310$, $\tau_{max} = 0.95 < \tau_{min} = 1.22 < \tau^{cr,2}_{0.05}(N_2) = 1.869 < \tau^{cr,1}_{0.05}(N_2) = 2.080$; Shapiro-Wilk normality test: $W = 0.831 > W^{cr}_{0.05}(N_2) = 0.762$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N_2) = 2.200$, $U = (\varepsilon_{nb}^{max} - \varepsilon_{nb}^{min})/S_\varepsilon = 2.16 < U^{cr,2}_{0.05}(N_2) = 3.012$; $N_{repr} = 4$. (30)
An approximate comparative estimate of the regression coefficients $a_{11}$ and $a_{12}$ can be made using the following relation [41]:
$t = |a_{11} - a_{12}|/[S_1^2/(N_1 - 1)/S_{Z1}^2 + S_2^2/(N_2 - 1)/S_{Z2}^2]^{0.5} = 1.918 < t^{cr}_{0.05}(N_1 + N_2 - 4) = 2.365$. (31)
That is, the estimates of the regression coefficients differ insignificantly, since $t < t^{cr}$.
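Relation (31) can be evaluated directly from the quantities reported in (26)-(30). The following sketch, with a hypothetical function name, plugs in the slopes, residual standard errors, sample sizes and standard deviations of Z for the two series.

```python
import math

def slope_difference_t(a1, s1, n1, sx1, a2, s2, n2, sx2):
    """t statistic of Eq. (31) for comparing two regression slopes.

    a: slope, s: standard error of the regression estimate,
    n: sample size, sx: standard deviation of the explanatory variable.
    """
    var1 = s1 ** 2 / ((n1 - 1) * sx1 ** 2)
    var2 = s2 ** 2 / ((n2 - 1) * sx2 ** 2)
    return abs(a1 - a2) / math.sqrt(var1 + var2)

# Chlorobenzenes (26), (27) versus nitrochlorobenzenes (29), (30)
t_slopes = slope_difference_t(-0.75, 0.040, 6, 0.524, -0.47, 0.166, 5, 0.584)
slopes_differ = t_slopes > 2.365  # critical value quoted in (31)
print(f"t = {t_slopes:.2f}, slopes differ significantly: {slopes_differ}")
```

With the rounded inputs the result agrees with the reported 1.918 to within rounding.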
Dispersion interaction does not make a statistically significant contribution to the explanation of the bioresponse of the series of compounds from Table 1. However, there is a relationship between the molecular feature Z and the value of the dispersion contribution:
$x_3(Z) = a_0 + a_1 Z$, $N = 12$, $R = 0.96 \pm 0.03$, $R^* = 0.97 > R^{cr}_{0.05}(N - 2) = 0.576$; sufficient sample size to ensure the validity of the correlation coefficient: $N^{min}_{0.05} < 5$; criterion of significance of the correlation coefficient based on the normalizing Fisher z-transform (the Hotelling correction is taken into account): $u_H = 1.942 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.591$; $a_0 = -1.10 \pm 0.95$, $a_1 = 2.52 \pm 0.24$, $t(a_1) = 10.42 > t^{cr}_{0.05}(N - 2)$; standard error of the regression estimate: 0.44; $F = 108.5 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$. (32)
Statistics of set Z:
$N = 12$, $Z_{av} = 3.87 \pm 0.16$; 95% confidence interval: (3.52, 4.22); $Z_{min} = 3.00$, $Z_{max} = 5.00$, $S_Z = 0.548$, $\tau_{min} = 1.59 < \tau_{max} = 2.06 < \tau^{cr,2}_{0.05}(N) = 2.387 < \tau^{cr,1}_{0.05}(N) = 2.523$; Shapiro-Wilk normality test: $W = 0.951 > W^{cr}_{0.05}(N) = 0.859$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N) = 2.800 < U = (Z_{max} - Z_{min})/S_Z = 3.64 < U^{cr,2}_{0.05}(N) = 3.910$; $N_{repr} = 10$. (33)
It follows that the set of indices Z is homogeneous and normally distributed. The authors of [42] present the additive contributions to the pair interaction energy of a tetramethyluric acid molecule with aromatic hydrocarbon molecules, including the magnitude of the dispersion contribution. Our check showed that in this case, too, there is a statistically significant relationship between the molecular factor Z and the dispersion energy of the pairwise interaction.
If there are chemical compounds capable of forming
hydrogen bonds, the energy contribution due to
hydrogen bonds must be taken into account in the
regression equations (8) and (15). The hydrogen
bond is essentially a quasi-chemical short-range
interaction of molecules. Therefore, the properties
of a hydrogen bond are difficult to describe using
the properties of isolated molecules. However, some
quantitative estimates can be made. Considering that
the terms of equation (8) are proportional to the
contributions to the binding energy, it can be
assumed that the contributions to the bioactivity
from these interactions are also approximately
equal.
The application of model (15) to the calculation of the bioactivity of the 2,4-dichlorophenol molecule leads to the following value of the resultant feature: $A = 0.58$ ($\varepsilon_{nb}^0 = -1.4$ eV, $\mu_1 = 1.5$ D), which is markedly lower than the experimental value 2.08.
However, it should be kept in mind that in an
isolated 2,4-dichlorophenol molecule there is an
intramolecular hydrogen bond between the hydroxyl
group proton and the chlorine atom in the ortho-
position. When a molecule enters a polar condensed
medium, the intramolecular hydrogen bond is
broken due to the electric field of the reaction. This
state of the molecule corresponds to a lower total
energy. In other words, the molecular state is
stabilized and the hydroxyl group has the
opportunity to take part in the formation of an
intermolecular hydrogen bond. In a polar dielectric medium, as the dipole moment of a molecule increases, the equilibrium between the molecular forms with and without an intramolecular hydrogen bond shifts towards the state with the larger dipole moment, that is, the state without an intramolecular hydrogen bond. Such a situation has indeed been observed experimentally. The formation of an intermolecular hydrogen bond is accompanied by an additional contribution to the interaction energy. This in turn increases the bioresponse A by about 1.0 (the contribution to the total interaction energy is $-0.85\,\varepsilon_{nb}^0$). Therefore, the total calculated value of bioactivity A will be approximately 1.58, which is close to the experimental value. However, these remarks cannot
be applied to the 2,4,6-trichlorophenol molecule
(Table 1). This molecule contains a hydroxyl group.
However, the molecule is not involved in the
formation of intermolecular complexes through
hydrogen bonds. The fact is that the proton of the
hydroxyl group oscillates between two neighboring
chlorine atoms, which are characterized by
significant electronegativity. Oscillations are the
result of a proton tunneling through a potential
barrier. In this way, an intramolecular transition
from one equilibrium position to another is carried
out. Spectroscopic studies of 2,4,6-trichloro-,
2,4,5,6-tetrachloro- and pentachlorophenols confirm
the existence of intramolecular proton migration.
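The estimate for 2,4-dichlorophenol discussed above can be retraced numerically. The sketch below evaluates regression (15) with the coefficients from (16), assuming $x_1 = \varepsilon_{nb}^0$ and $x_2 = \mu_1^2$ (this assignment of $x_2$ is inferred from the reported numbers), and then adds the hydrogen-bond increment of about 1.0 mentioned in the text. The function name is illustrative.

```python
def model_15(eps_nb, dipole):
    """Reduced regression (15): A = B0 + B1*x1 + B2*x2.

    Assumes x1 = eps_nb (eV) and x2 = dipole**2 (D^2); coefficients from (16).
    """
    b0, b1, b2 = -0.76, -0.86, 0.06
    return b0 + b1 * eps_nb + b2 * dipole ** 2

a_isolated = model_15(-1.4, 1.5)   # isolated 2,4-dichlorophenol molecule
a_with_hbond = a_isolated + 1.0    # plus the approximate hydrogen-bond increment
print(f"A (model 15) = {a_isolated:.2f}, with H-bond correction = {a_with_hbond:.2f}")
```

The corrected estimate remains below the experimental value of 2.08 but is much closer to it than the uncorrected one.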
The application of the regression equation (8) to
predict the bioactivity of molecules is complicated
by the fact that the researcher is required to know
many molecular parameters. In particular,
knowledge of the energies of single-electron
molecular orbitals, which can be determined by
complicated and cumbersome quantum mechanical
calculations, is required. The results of these
calculations require professional analysis.
Fig. 3. Scatterplot and regression line: bioactivity A versus the information function H (bits). The regression line is approximated by the linear equation $A(H)_{mod} = b_0 + b_1 H$, $N = 12$; $b_0 = -5.25 \pm 1.15$, $b_1 = 4.02 \pm 0.07$, $N^{min}_{0.05} = 5$; $R = 0.89 \pm 0.07$; $R^* = 0.90 > R^{cr}_{0.05}(N - 2) = 0.576$, $S = 0.604$, $F = 37.6 > F^{cr}_{0.05}(f_1 = 1; f_2 = 10) = 4.96$; straightness index: $K = 1.59 < K_{thr} = 3.0$.
However, as the analysis showed, some rapid
assessment of the toxicity of chlorine-substituted
benzenes can be done using the information
function H (Table 1). Figure 3 shows a linear
relationship between the toxicity of chlorine-
substituted benzenes and the information function of
molecules. Using the regression equation in Fig.3,
we obtain the following estimate for the bioactivity
of 2,4-dichlorophenol (Aexp = 2.08), which was not
used in the original sample: Amod = 1.74 (H = 1.738
bits, Z = 3.692 arb. units.).
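The quick estimate above needs only the elemental composition of the molecule. As a minimal sketch, the code below computes Z and H for 2,4-dichlorophenol (C6H4Cl2O) and inserts H into the regression line of Fig. 3. The table of outer-shell electron counts uses standard values and is an assumption of this example, as are the function and variable names.

```python
import math

# Outer-shell electron counts (standard values; not copied from the paper's tables)
OUTER_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6, "Cl": 7}

def z_and_h(composition):
    """Return (Z, H) for a molecule given as {element: number of atoms}.

    Z = sum(n_i * Z_i) / N, H = -sum(p_i * log2(p_i)) with p_i = n_i / N.
    """
    n_total = sum(composition.values())
    z = sum(n * OUTER_ELECTRONS[el] for el, n in composition.items()) / n_total
    h = -sum((n / n_total) * math.log2(n / n_total)
             for n in composition.values() if n > 0)
    return z, h

z, h = z_and_h({"C": 6, "H": 4, "Cl": 2, "O": 1})  # 2,4-dichlorophenol
a_mod = -5.25 + 4.02 * h                            # regression line of Fig. 3
print(f"Z = {z:.3f}, H = {h:.3f} bits, A_mod = {a_mod:.2f}")
```

No quantum-chemical calculation is involved, which is what makes H and Z convenient for rapid screening.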
There is also a significant relationship between the toxicity of chemical compounds and the explanatory factor Z:
$A(Z) = b_0 + b_1 Z$, $N = 12$, $R = 0.62 \pm 0.20$, $R^* = 0.64 > R^{cr}_{0.05}(N - 2) = 0.576$; sufficient sample size to ensure the validity of the correlation coefficient: $N^{min}_{0.05} = 10$; $b_0 = -3.82 \pm 2.29$, $b_1 = 1.43 \pm 0.59$, $t(b_1) = 2.44 > t^{cr}_{0.05}(N - 2) = 2.228$; standard error of the regression estimate: $S_A = 1.044$; criterion of significance of the correlation coefficient based on the normalizing Fisher z-transform (the Hotelling correction is taken into account): $u_H = 0.697 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.591$; straightness index: $K = 2.7 < K_{thr} = 3.0$. (34)
For 2,4-dichlorophenol, we obtain the following toxicity estimate from equation (34): A = 1.46 at Z = 3.692 arb. units. The higher the Z and H values, the higher the toxicity of the chemical compound. The existence of such a trend for factor Z is also indicated by the Abbe-Linnik criterion [27]:
$q = 0.5\sum_{i=1}^{N-1}(Z_{i+1} - Z_i)^2 \Big/ \sum_{i=1}^{N}(Z_i - Z_{av})^2 = 0.415 < q^{cr}_{0.05}(N) = 0.5636$,
$Q^* = -(1 - q)\,[(2N + 1)/(2 - (1 - q)^2)]^{0.5} = -2.27 < u_{0.05} = -1.645$. (35)
A similar trend holds for the molecular information factor H:
$q = 0.231 < q^{cr}_{0.05}(N) = 0.5636$,
$Q^* = -3.24 < u_{0.05} = -1.645$. (36)
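The Abbe-Linnik statistics of (35) and (36) can be sketched as follows. The first function computes q from a sequence of values; it is assumed here that the values are listed in the order of the compounds being compared (for example, by increasing toxicity), since the ordering is what the criterion tests. The second function converts q into the normalized statistic Q*. Names are illustrative.

```python
import math

def abbe_q(values):
    """q = 0.5 * sum of squared successive differences / sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n
    successive = sum((values[i + 1] - values[i]) ** 2 for i in range(n - 1))
    spread = sum((v - mean) ** 2 for v in values)
    return 0.5 * successive / spread

def abbe_q_star(q, n):
    """Normalized statistic Q* = -(1 - q) * sqrt((2n + 1) / (2 - (1 - q)^2))."""
    return -(1.0 - q) * math.sqrt((2 * n + 1) / (2.0 - (1.0 - q) ** 2))

q_star_z = abbe_q_star(0.415, 12)  # factor Z, q from (35)
q_star_h = abbe_q_star(0.231, 12)  # factor H, q from (36)
print(f"Q*(Z) = {q_star_z:.2f}, Q*(H) = {q_star_h:.2f}")
```

A value of Q* below the quantile $u_{0.05} = -1.645$ indicates a systematic trend rather than random scatter.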
Regression (15) describes only an important initial stage in the manifestation of the biological action of a chemical compound, namely the stage associated with the fixation of the molecule.
3.2 Saturated and unsaturated chlorine-containing compounds
In this part of the article, we will again use the
general equation (8) to interpret the narcotic effect
of a series of saturated and unsaturated chlorine-
containing compounds. Isotoxic concentrations (C
in units of mM/l) of vapours of compounds causing
lateral positioning of 50% of white mice are used as
the biological response (Table 2). We use the
following regression equation, which is similar in
form to regression (8):
$A_{mod} \equiv 100/C = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3$,
$N = 15$, $B_0 = -11.88 \pm 1.11$, $B_1 = -0.96 \pm 0.35$, $B_2 = 0.90 \pm 0.13$, $B_3 = 2.72 \pm 0.27$, $R_3 = 0.980 > R^{cr}_{0.05}(m_3; N - m_3 - 1) = 0.703$, $R_3^2 = 0.960$, $R_3^{*2} = 0.949$; $m_3 = 3$; standard error of the regression estimate: $S_3 = 0.905$; $|t(B_0)| = 10.71 > t(B_3) = 10.24 > t(B_2) = 6.94 > |t(B_1)| = 2.79 > t^{cr}_{0.05}(f = N - m_3 - 1) = 2.20$; $F = 88.04 > F^{cr}_{0.05}(f_1 = 3; f_2 = 11) = 3.59$; $\Sigma_3 = 9.0072$; $AIC_3 = -0.1100$, $SC_3 = 0.2121$, $SS_3 = 0.2501$. (37)
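To illustrate how regression (37) is applied, the sketch below builds the explanatory variables from the molecular parameters of two compounds in Table 2 and evaluates the model. It assumes $x_1 = \varepsilon_{nb}^0$, $x_2 = \mu_1^2$ and $x_3 = \alpha_1 I_1/(I_1 + 10)$; the form of $x_2$ is inferred from the reported statistics, and all function and variable names are hypothetical.

```python
def explanatory_variables(eps_nb, ionization, dipole, polarizability):
    """Return (x1, x2, x3) under the assumed definitions of the variables."""
    x1 = eps_nb                                            # eV
    x2 = dipole ** 2                                       # D^2
    x3 = polarizability * ionization / (ionization + 10.0)
    return x1, x2, x3

def model_37(x1, x2, x3):
    """Regression (37) with the rounded coefficients reported in the text."""
    return -11.88 - 0.96 * x1 + 0.90 * x2 + 2.72 * x3

# (eps_nb0 [eV], I1 [eV], mu1 [D], alpha1 [1e-24 cm^3]) taken from Table 2
TABLE2_ROWS = {
    "ethyl chloride": (0.151, 10.70, 2.0, 6.40),
    "pentachloroethane": (-2.753, 10.88, 1.37, 14.11),
}

predictions = {name: model_37(*explanatory_variables(*row))
               for name, row in TABLE2_ROWS.items()}
for name, a_mod in predictions.items():
    print(f"{name}: A_mod = {a_mod:.2f}")
```

Because the coefficients are rounded, the results agree with the tabulated model values (0.55 and 12.4) only to within a few hundredths.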
Bioactivity statistics:
A: $N = 15$, $A_{av} = 5.27 \pm 1.04$; 95% confidence interval: (3.04, 7.49); $A_{min} = 0.71$, $A_{max} = 13.33$, $S_A = 4.01$, $\tau_{min} = 1.14 < \tau_{max} = 2.01 < \tau^{cr,2}_{0.05}(N) = 2.493 < \tau^{cr,1}_{0.05}(N) = 2.617$; Shapiro-Wilk normality test: $W = 0.881 = W^{cr}_{0.05}(N) = 0.881$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N) = 2.970 < U = (A_{max} - A_{min})/S_A = 3.15 < U^{cr,2}_{0.05}(N) = 4.170$; $N_{repr} = 12$.
Statistics of explanatory variables:
$x_1$: $N = 15$, $x_1^{av} = -1.19 \pm 0.27$; 95% confidence interval: (-1.77, -0.61); $x_1^{min} = -2.75$, $x_1^{max} = 0.45$, $S_{x1} = 1.04$, $\tau_{max} = 1.04 < \tau_{min} = 1.50 < \tau^{cr,2}_{0.05}(N) = 2.493 < \tau^{cr,1}_{0.05}(N) = 2.617$; Shapiro-Wilk normality test: $W = 0.930 > W^{cr}_{0.05}(N) = 0.881$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N) = 2.970 < U = (x_1^{max} - x_1^{min})/S_{x1} = 3.08 < U^{cr,2}_{0.05}(N) = 4.170$; $N_{repr} = 12$,
$x_2$: $N = 15$, $x_2^{av} = 3.59 \pm 0.61$; 95% confidence interval: (2.29, 4.89); $x_2^{min} = 0$, $x_2^{max} = 7.95$, $S_{x2} = 2.35$, $\tau_{min} = 1.53 < \tau_{max} = 1.86 < \tau^{cr,2}_{0.05}(N) = 2.493 < \tau^{cr,1}_{0.05}(N) = 2.617$; Shapiro-Wilk normality test: $W = 0.957 > W^{cr}_{0.05}(N) = 0.881$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N) = 2.970 < U = (x_2^{max} - x_2^{min})/S_{x2} = 3.39 < U^{cr,2}_{0.05}(N) = 4.170$; $N_{repr} = 12$,
$x_3$: $N = 15$, $x_3^{av} = 4.71 \pm 0.30$; 95% confidence interval: (4.07, 5.34); $x_3^{min} = 3.308$, $x_3^{max} = 7.353$, $S_{x3} = 1.149$, $\tau_{min} = 1.22 < \tau_{max} = 2.31 < \tau^{cr,2}_{0.05}(N) = 2.493 < \tau^{cr,1}_{0.05}(N) = 2.617$; Shapiro-Wilk normality test: $W = 0.923 > W^{cr}_{0.05}(N) = 0.881$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N) = 2.970 < U = (x_3^{max} - x_3^{min})/S_{x3} = 3.52 < U^{cr,2}_{0.05}(N) = 4.170$; $N_{repr} = 12$. (38)
Thus, the populations $x_1$, $x_2$, $x_3$ and A are homogeneous and normally distributed.
For regression (37), the t-values of all coefficients satisfy $|t(B_i)| > t^{cr}_{0.05}(f = N - m_3 - 1)$. Consequently, the coefficients characterize the significant effect of each of the intermolecular interaction contributions on the toxicity of chemical compounds. To determine the comparative influence of the individual contributions of intermolecular interactions, we turn to standardized regression coefficients, using relations (22):
$B_1^* = B_1 S_{x1}/S_A = -0.251$, $B_2^* = B_2 S_{x2}/S_A = 0.525$, $B_3^* = B_3 S_{x3}/S_A = 0.778$. (39)
Let us determine the contribution of each variable to the variability of bioactivity. The approximate value of the multiple coefficient of determination is given by relation (13):
$R_{appr}^2 = B_1^* r_{x1,A} + B_2^* r_{x2,A} + B_3^* r_{x3,A} = 0.098 + 0.179 + 0.686 = 0.963$. (40)
Here, the pair correlation coefficients are $r_{x1,A} = -0.392$, $r_{x2,A} = 0.337$ and $r_{x3,A} = 0.880$. The approximate multiple coefficient of determination (40) practically coincides with the coefficient of determination $R_3^2 = 0.960$ (37). The standardized regression coefficients (39) show that the explanatory variables do not affect the variability of the bioactivity equally. In contrast to the series of chlorobenzenes (Table 1), for which the contribution from the variable $x_1 = \varepsilon_{nb}^0$ is maximal, for the series of chemical compounds from Table 2 this contribution to regression (37) is minimal. At the same time, the maximum contribution to the variability of bioactivity is made by the pair dispersion interactions, $x_3 = \alpha_1 I_1/(I_1 + 10)$. For chlorobenzenes, the sequence of influence of the intermolecular interactions on the bioactivity is as follows (13): $x_1$ (53.5%) > $x_2$ (32.1%) > $x_3$ (7.5%), whereas for the series of compounds from Table 2 the hierarchy of influence is the opposite: $x_1$ (9.8%) < $x_2$ (17.9%) < $x_3$ (68.6%). The parentheses give the percentage contributions of the explanatory variables. Note that the signs of the standardized coefficients $B_i^*$ of regression (37) remain the same as for the regression equation (10). Consequently, the direction of influence of the explanatory variables on the bioactivity of the molecules does not change. Thus, the molecular factors taken into account by the model explain 96.0% of the variation in the bioresponse ($R_3^2 = 0.960$), and only 4.0% remains unexplained.
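The decomposition (40) is a sum of products of standardized coefficients and pair correlation coefficients, so the individual shares can be recomputed from the rounded values given in (39) and in the text. This is a small sketch with illustrative names; because the inputs are rounded, the products may differ from the reported terms in the third decimal place.

```python
def contributions(std_coeffs, correlations):
    """Per-variable terms B_i* * r_(xi,A) of the approximate R^2."""
    return [b * r for b, r in zip(std_coeffs, correlations)]

std_coeffs = [-0.251, 0.525, 0.778]    # B1*, B2*, B3* from (39)
correlations = [-0.392, 0.337, 0.880]  # r_x1,A, r_x2,A, r_x3,A

parts = contributions(std_coeffs, correlations)
r2_appr = sum(parts)
for name, part in zip(("x1", "x2", "x3"), parts):
    print(f"{name}: {100 * part:.1f}%")
print(f"approximate R^2 = {r2_appr:.3f}")
```

The ordering of the shares, with the dispersion term $x_3$ dominant, is the same as in (40).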
The hierarchy of values of the regression
coefficients Bi* makes it possible to indicate the
relative importance of the intermolecular
contributions taken into account in the regression.
The maximum contribution to the regression
equation is made by the explanatory variables that
determine the pair dispersion interaction. The
influence of acceptor interactions (~ εnb0) can be
considered as corrections to the dispersion and
dipole-dipole contributions. It seems that the
molecular region of the biophase with which the
molecule interacts has the highest filled molecular
orbital energy level, which lies below the εnb0 level
on the energy scale. That is, such an arrangement of
energy levels does not favor the transfer of an
electron to a free one-electron level εnb0.
Testing for collinearity between the explanatory variables gave the following pairwise correlation coefficients: $r_{1,2} = 0.541$, $r_{1,3} = -0.545$ and $r_{2,3} = 0.064$. To check quantitatively for collinearity between explanatory variables, we use the Farrar-Glauber test (19):
$\chi^2 = -(N - 1 - (2m + 5)/6)\ln(\det|r_{i,j}|) = 12.16 > \chi^{2,cr}_{0.05}(f = m(m - 1)/2) = 7.82$; $i = 1,2,3$; $j = 1,2,3$. (41)
Since $\chi^2 > \chi^{2,cr}$, the null hypothesis of no collinearity between the explanatory variables must be rejected at the significance level α = 0.05. To determine which explanatory variable generates the greatest interdependence between variables, the following relation is used [8]:
$t_{i,k} = r_{i,k}(N - m)^{0.5}/(1 - r_{i,k}^2)^{0.5}$, (42)
which has a t-distribution with $f = N - m$ degrees of freedom. Using relation (42), we obtain the following sequence of inequalities: $|t_{1,3}| = 2.251 > t_{1,2} = 2.228 > t^{cr}_{0.05}(f = N - m_3) = 2.179 > |t_{2,3}| = 0.222$. It follows from these inequalities that the variable $x_1$ generates the interdependence of features.
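For three explanatory variables the Farrar-Glauber statistic (41) requires the determinant of the full correlation matrix. The sketch below computes it by Gaussian elimination and then evaluates the pairwise statistics (42). It uses the rounded correlation coefficients from the text, so the result matches the reported 12.16 only to within rounding; all names are illustrative.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det

def farrar_glauber(corr, n):
    """Chi-square statistic of Eq. (41) for an m x m correlation matrix."""
    m = len(corr)
    return -(n - 1 - (2 * m + 5) / 6.0) * math.log(determinant(corr))

def pair_t(r, n, m):
    """Pairwise statistic of Eq. (42) with n - m degrees of freedom."""
    return r * math.sqrt(n - m) / math.sqrt(1.0 - r * r)

r12, r13, r23 = 0.541, -0.545, 0.064
corr = [[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]]
chi2 = farrar_glauber(corr, 15)
t12, t13, t23 = (pair_t(r, 15, 3) for r in (r12, r13, r23))
print(f"chi2 = {chi2:.2f}; t12 = {t12:.3f}, t13 = {t13:.3f}, t23 = {t23:.3f}")
```

Both pairs involving $x_1$ exceed the critical value 2.179 in absolute value, while the pair $(x_2, x_3)$ does not.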
Table 2
Physico-chemical parameters of chlorine-containing compounds and isotoxic concentrations (C, mM/l) of the vapors of these compounds, causing the lateral position of 50% of white mice.

No  Chemical compound          Z, arb. units  H, bits  d20, g/ml [43]  n20 [43]  εnb0, eV  I1*), eV  μ1*), D    α1**)·10^24, cm3  A exp., 100/C [44]  Amod, Eq. (37)
1   Ethyl chloride             2.50           1.299    0.8978          1.3676    0.151     10.70     2.0 [14]   6.40              0.71                0.55
2   Propyl chloride            2.36           1.241    0.8909          1.3879    0.445     10.44     1.8 [14]   8.24              1.23                2.03
3   Vinyl chloride             3.00           1.460    0.9999          1.4046    -0.522    9.82      1.44 [14]  7.83 [15]         1.56                1.02
4   1,1-Dichlorovinyl          4.00           1.586    1.2180          1.4249    -1.216    9.59      1.13       8.07              2.50                1.17
5   1,2-Dichlorovinyl          4.00           1.586    1.2837          1.4490    -1.214    9.15      1.77       7.78 [15]         2.50                2.20
6   1,1-Dichloroethane         3.25           1.500    1.1757          1.4164    -1.206    10.60     1.80 [14]  8.38              3.08                3.90
7   Chloromethylene            2.80           1.372    1.3255          1.4242    -1.215    10.79     2.40 [14]  6.48 [15]         3.08                3.59
8   Trichlorovinyl             3.45           1.539    1.4642          1.4773    -2.276    9.48      1.01       10.06             4.00                4.53
9   Chloroform                 4.86           1.449    1.4832          1.4459    -2.467    11.00     1.51       8.23              5.00                4.25
10  Tetrachlorovinyl           6.00           0.919    1.6227          1.5053    -2.623    9.66      0          12.03             5.00                6.70
11  1,2-Dichloroethane         3.25           1.500    1.2531          1.4476    -0.089    10.92     2.43       8.45              5.71                5.49
12  1,2-Dichloropropane        2.91           1.436    1.1559          1.4394    0.099     10.60     2.82       10.20             9.52                9.42
13  1,1,2-Trichloroethane      4.00           1.562    1.4714          1.4940    -1.264    10.89     2.72       10.28             10.00               10.5
14  1,1,2,2-Tetrachloroethane  4.75           1.500    1.5953          1.4940    -1.680    10.89     2.17       12.15             11.76               11.2
15  Pentachloroethane          5.50           1.299    1.6796          1.5025    -2.753    10.88     1.37       14.11             13.33               12.4

*) Dipole moments and ionization potentials of the molecules are calculated by the quantum-chemical method MINDO/3.
**) Polarizabilities of molecules are calculated using the Clausius-Mossotti formula: $\alpha = 3M(n_{20}^2 - 1)/[4\pi d_{20} N_A (n_{20}^2 + 2)]$; $d_{20}$ and $n_{20}$ are the density and refractive index at 20 °C, respectively.
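The polarizability formula in the footnote can be checked against a table entry. The sketch below evaluates it for ethyl chloride using the density and refractive index from Table 2; the molar mass is not listed in the table and is computed here from standard atomic weights, which is an assumption of this example. Function and variable names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol

def polarizability_cm3(molar_mass, density, refractive_index):
    """alpha = 3M(n^2 - 1) / [4*pi*d*N_A*(n^2 + 2)], in cm^3.

    molar_mass in g/mol, density in g/ml (= g/cm^3).
    """
    n2 = refractive_index ** 2
    return (3.0 * molar_mass * (n2 - 1.0)
            / (4.0 * math.pi * density * AVOGADRO * (n2 + 2.0)))

# Ethyl chloride, C2H5Cl: molar mass from standard atomic weights (assumption)
molar_mass = 2 * 12.011 + 5 * 1.008 + 35.45
alpha = polarizability_cm3(molar_mass, 0.8978, 1.3676)
alpha_1e24 = alpha * 1e24
print(f"alpha1 * 10^24 = {alpha_1e24:.2f} cm^3")
```

The result reproduces the tabulated value of 6.40 for ethyl chloride to within rounding.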
To eliminate or reduce the collinearity of the explanatory variables, a linear transformation can be applied, for example to the variable $x_2$: $x_2^* = x_1 - x_2$. Multiple regression (37) is then written as follows:
$A_{mod} \equiv 100/C = B_0 + B_1 x_1 + B_2 x_2^* + B_3 x_3$,
$N = 15$, $B_0 = -11.88 \pm 1.11$, $B_1 = -0.064 \pm 0.287$, $B_2 = -0.897 \pm 0.129$, $B_3 = 2.716 \pm 0.265$, $R_3 = 0.980 > R^{cr}_{0.05}(m_3 = 3; \nu = N - m_3 - 1) = 0.703$, $R_3^2 = 0.960$, $R_3^{*2} = 0.949$; $m_3 = 3$; standard error of the regression estimate: $S_A = 0.905$; $|t(B_0)| = 10.71 > t(B_3) = 10.24 > |t(B_2)| = 6.94 > t^{cr}_{0.05}(f = N - m_3 - 1) = 2.20 > |t(B_1)| = 0.224$; $F = 88.06 > F^{cr}_{0.05}(f_1 = 3; f_2 = 11) = 3.59$; $\Sigma_3 = 9.007$, $AIC_3 = -0.1100$, $SC_3 = 0.2121$, $SS_3 = 0.2501$; $B_1^* = -0.0168$, $B_2^* = -0.4448$, $B_3^* = 0.7783$. (43)
The approximate coefficient of determination $R_{appr}^2$ practically coincides with the coefficient of determination $R_3^2 = 0.960$ (43):
$R_{appr}^2 = B_1^* r_{x1,A} + B_2^* r_{x2^*,A} + B_3^* r_{x3,A} = 0.007 + 0.269 + 0.685 = 0.961$. (44)
The regression coefficient $B_1$ (43) is statistically insignificant at the 95% confidence level. Correlation coefficients were also determined between the explanatory variables $x_1$, $x_2^*$ and $x_3$: $r_{1,2^*} = 0.112$, $r_{1,3} = -0.545$ and $r_{2^*,3} = -0.207$. The correlation coefficient between $x_1$ and $x_2^*$ has decreased substantially (compare with $r_{1,2} = 0.541$). From the Farrar-Glauber relation (41) we now obtain the following inequality: $\chi^2 = 5.06 < \chi^{2,cr}_{0.05}(f = 3) = 7.82$. Thus, the null hypothesis that there is no significant multicollinearity between the explanatory variables can be accepted.
Statistics of the population of the explanatory variable $x_2^*$:
$N = 15$, $x_2^{*av} = -4.78 \pm 0.51$; 95% confidence interval: (-5.88, -3.68); $x_2^{*min} = -8.66$, $x_2^{*max} = -2.49$, $S_{x^*2} = 1.989$, $\tau_{max} = 1.151 < \tau_{min} = 1.951 < \tau^{cr,2}_{0.05}(N) = 2.493 < \tau^{cr,1}_{0.05}(N) = 2.617$; Shapiro-Wilk normality test: $W = 0.919 > W^{cr}_{0.05}(N) = 0.881$; David-Hartley-Pearson normality test: $U^{cr,1}_{0.05}(N) = 2.970 < U = (x_2^{*max} - x_2^{*min})/S_{x^*2} = 3.10 < U^{cr,2}_{0.05}(N) = 4.170$. (45)
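The transformed variable can be rebuilt from Table 2. The sketch below assumes $x_1 = \varepsilon_{nb}^0$, $x_2 = \mu_1^2$ and $x_2^* = x_1 - x_2$ (the minus sign and the form of $x_2$ are inferred from the reported extremes and mean) and summarizes the resulting sample. Variable names are illustrative.

```python
import statistics

# (eps_nb0 [eV], mu1 [D]) for compounds 1-15 of Table 2
EPS_MU = [
    (0.151, 2.0), (0.445, 1.8), (-0.522, 1.44), (-1.216, 1.13), (-1.214, 1.77),
    (-1.206, 1.80), (-1.215, 2.40), (-2.276, 1.01), (-2.467, 1.51), (-2.623, 0.0),
    (-0.089, 2.43), (0.099, 2.82), (-1.264, 2.72), (-1.680, 2.17), (-2.753, 1.37),
]

x1 = [eps for eps, _ in EPS_MU]
x2 = [mu ** 2 for _, mu in EPS_MU]
x2_star = [a - b for a, b in zip(x1, x2)]  # assumed transformation x2* = x1 - x2

summary = {
    "mean": statistics.mean(x2_star),
    "min": min(x2_star),
    "max": max(x2_star),
    "stdev": statistics.stdev(x2_star),
}
print({key: round(value, 3) for key, value in summary.items()})
```

The mean and the two extremes can be compared directly with the values listed in (45).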
Therefore, the set of elements $x_2^*$ is homogeneous and normally distributed. The influence of the explanatory factor $x_1 = \varepsilon_{nb}^0$ on the variability of bioactivity is markedly smaller than the contributions from the variables $x_2$ (or $x_2^*$) and $x_3$. We therefore perform the following comparative analysis, using the reduced regression:
$A \equiv 100/C = B_0 + B_2 x_2 + B_3 x_3$. (46)
The statistics of regression (46) are as follows:
$N = 15$, $m_4 = 2$, $R_4 = 0.965 > R_{0.05}^{cr}(m_4; N - m_4 - 1) = 0.627$,
$R_4^2 = 0.932$, $R_4^{*2} = 0.921$; standard error
of the regression estimate: $S_4 = 1.131$; $B_0 = -12.063 \pm 1.384$,
$B_2 = 0.681 \pm 0.129$, $B_3 = 3.164 \pm 0.264$;
$|t(B_3)| = 12.00 > |t(B_0)| = 8.72 > |t(B_2)| = 5.28 >
t_{0.05}^{cr}(f = N - m_4 - 1) = 2.18$; $F = 82.04 > F_{0.05}^{cr}(f_1 = 2; f_2 = 12) = 3.88$;
$\Sigma_4 = 15.354$, $AIC_4 = 0.2900$,
$SC_4 = 0.5649$, $SS_4 = 0.3014$. (47)
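Several entries of (47) are tied together by standard relations, which the following sketch re-derives. It assumes that $\Sigma_4$ is the residual sum of squares and that the ± values are standard errors of the coefficients; both readings are assumptions, although they are consistent with the quoted numbers up to rounding.

```python
import math

N, m = 15, 2                 # observations and explanatory variables in (46)
r2 = 0.932                   # coefficient of determination R4^2
rss = 15.354                 # residual sum of squares, Sigma_4 (assumed meaning)

dof = N - m - 1
adj_r2 = 1.0 - (1.0 - r2) * (N - 1) / dof      # adjusted coefficient of determination
s_est = math.sqrt(rss / dof)                   # standard error of the estimate
f_stat = (r2 / m) / ((1.0 - r2) / dof)         # overall F statistic

# t-values as coefficient / standard error, using the +/- values quoted in (47)
coeffs = {"B0": (-12.063, 1.384), "B2": (0.681, 0.129), "B3": (3.164, 0.264)}
t_abs = {name: abs(b / se) for name, (b, se) in coeffs.items()}

print(f"adjusted R^2 = {adj_r2:.3f}, S = {s_est:.3f}, F = {f_stat:.1f}")
print({name: round(t, 2) for name, t in t_abs.items()})
```

Because $R_4^2$ is quoted to three digits, the recomputed F agrees with the reported 82.04 only to within a few tenths.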
There is no collinearity between the variables $x_2$ and $x_3$
($r_{2,3} = 0.064$). The multiple sample correlation
coefficient $R_4$ significantly exceeds the tabulated critical
value $R_{0.05}^{cr}(m_4; N - m_4 - 1)$.
Thus, 93.2% of the total variance is due to the
variability of the molecular factors $x_2$ and $x_3$, and
6.8% remains unexplained. However, all three
comparative information tests ($AIC_4$, $SC_4$, $SS_4$)
indicate that the quality of regression (46) is
markedly reduced compared to regression (37).
The standardized regression coefficients of (46) are:
$B_2^{*} = 0.399$, $B_3^{*} = 0.907$. (48)
The adjusted coefficients of determination are $R_3^{*2} = 0.95$
for regression (37) and $R_4^{*2} = 0.921$ for the reduced
regression (46). Using the standardized
regression coefficients (48), an approximate
coefficient of determination can be determined:
$$R_{appr}^2 = B_2^{*} r_{x_2,A} + B_3^{*} r_{x_3,A} = 0.14 + 0.80 = 0.94. \tag{49}$$
This value of the coefficient of determination is very
close to the value $R_4^2 = 0.932$ (47).
The significance of the contribution of $x_1$ to the
regression equation (37) can be checked with the
following statistic:
$$F = \frac{|R_3^{*2} - R_4^{*2}|\,(N - m_3 - 1)}{(m_3 - m_4)(1 - R_3^{*2})} = 6.04 > F_{0.05}^{cr}(f_1 = m_3 - m_4;\ f_2 = N - m_3 - 1) = 4.84. \tag{50}$$
Since $F > F^{cr}$, the additional explanatory variable $\varepsilon_{nb}^{0}$
is significant at the 95% confidence level. Let us
also check whether the decrease in the variance of
equation (37) is merely the result of an increase in the
number of explanatory variables compared to regression (46).
For normally distributed sets, the
two variances of equations (37) and (46) are compared
using a statistic that has an F-distribution:
$$F = S_4^2/S_3^2 = 1.56 < F_{0.05}^{cr}(f_1 = N - m_4 - 1;\ f_2 = N - m_3 - 1) = 2.79. \tag{51}$$
Therefore, at the 95% confidence level, the decrease
in the variance of the regression equation (37) is not
due to an increase in the number of explanatory
variables. Thus, variable $x_1$ must be retained in the
regression equation. None of the three information tests
(AIC, SC and SS) contradicts inequality (50).
Preference is given to the model for which the
information test has the least value. The test
comparison is only valid for models built for
samples containing the same number of
observations. A comparison of the quality criteria of
regression equations (10), (16), (37) and (46) is
presented in Table 3.
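The two comparisons of the full model (37) with the reduced model (46), statistics (50) and (51), can be sketched as follows. The adjusted coefficients are recomputed here from the quoted $R_3^2 = 0.960$ and $R_4^2 = 0.932$, and the residual variances from the residual sums of squares $\Sigma_3 = 9.0072$ and $\Sigma_4 = 15.354$; because of rounding in these inputs the first statistic comes out near 6.1 rather than exactly 6.04.

```python
N = 15
m3, m4 = 3, 2                        # regressors in the full (37) and reduced (46) models
r2_full, r2_red = 0.960, 0.932       # coefficients of determination
rss_full, rss_red = 9.0072, 15.354   # residual sums of squares

def adjusted_r2(r2, n, m):
    """Adjusted coefficient of determination for n observations and m regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

adj_full = adjusted_r2(r2_full, N, m3)
adj_red = adjusted_r2(r2_red, N, m4)

# Statistic (50): gain in adjusted R^2 per added regressor, relative to the
# unexplained share of the full model.
f_gain = abs(adj_full - adj_red) * (N - m3 - 1) / ((m3 - m4) * (1.0 - adj_full))

# Statistic (51): ratio of residual variances, reduced model over full model.
f_var = (rss_red / (N - m4 - 1)) / (rss_full / (N - m3 - 1))

print(f"F (50) = {f_gain:.2f} (critical 4.84), F (51) = {f_var:.2f} (critical 2.79)")
```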
Table 3
Quality criteria for linear regression equations
i    m    N     Σi        AICi       SCi        SSi       Equation
1    3    12    1.1586    -1.8377    -1.5093    0.1196    (10)
2    2    12    1.2294    -1.9450    -1.6572    0.1109    (16)
3    3    15    9.0072    -0.1100     0.2121    0.2501    (37)
4    2    15    15.354     0.2900     0.5649    0.3014    (46)
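The entries of Table 3 can be regenerated from m, N and Σi alone. The formulas below are not quoted from the paper; they are forms inferred from the tabulated numbers (which they reproduce to the printed precision), so the sketch should be read as a consistency check under that assumption.

```python
import math

def quality_criteria(rss, n, m):
    """Information criteria in the forms inferred from Table 3 (assumed):
    AIC = ln(rss/n) + 2m/n, SC = ln(rss/n) + (m + 1) ln(n)/n, SS = sqrt(rss)/(n - m).
    """
    log_term = math.log(rss / n)
    aic = log_term + 2.0 * m / n
    sc = log_term + (m + 1) * math.log(n) / n
    ss = math.sqrt(rss) / (n - m)
    return aic, sc, ss

# (m, N, residual sum of squares) for each equation, copied from Table 3
rows = {"(10)": (3, 12, 1.1586), "(16)": (2, 12, 1.2294),
        "(37)": (3, 15, 9.0072), "(46)": (2, 15, 15.354)}
table = {eq: quality_criteria(rss, n, m) for eq, (m, n, rss) in rows.items()}

for eq, (aic, sc, ss) in table.items():
    print(f"{eq}: AIC = {aic:.4f}, SC = {sc:.4f}, SS = {ss:.4f}")
```

Since the three criteria depend on the sample size, the comparison is meaningful only within the pair (10), (16) and within the pair (37), (46).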
As expected, the quality criteria of the regressions
correlate with each other: $R_{\text{AIC-SC}} = 0.9997$, $R_{\text{AIC-SS}} =
0.997$ and $R_{\text{SC-SS}} = 0.995$. The inequalities for
the t-values of the regression coefficients (38)
suggest the following ranking of
intermolecular interactions: dispersion interaction
(short-range) > dipole-dipole, induction and
polarization interactions > interaction associated
with electron transfer. Each of the contributions of
this sequence has a very specific physical meaning.
This, in turn, makes it possible to associate
interactions with certain physical characteristics of
biophase molecules. Highlighting interactions is
useful for determining the presumed characteristic
molecular properties of the local centers with which
a bioactive molecule interacts.
The fact that the main contribution to the regression
equation comes from the dispersion interaction
allows us to make some assumptions about the
molecular properties of the biophase with which the
bioactive molecule forms a complex. According to
the definition (6) the interaction of molecules of a
number of chlorinated chemical compounds (Table
2) is most likely to take place with the biophase
region, which is characterized by high values of
electronic polarizability (α2) and has a high first
ionization potential (I2). It is also noteworthy that
for models (15) and (46) the dipole interaction turns
out to be significant. This result indicates the
importance of mutual orientations between bioactive
molecules and biophase molecules.
Figure 4 shows the correlation between the observed
bioactivity values of molecules and the calculated
bioactivities of chemical compounds using equation
(37).
A relationship was also found between the
dispersion contribution $E_{disp} \sim x_3$ and the molecular
factor Z of the chlorine-substituted hydrocarbons
presented in Table 2:
$x_3(Z) = a_0 + a_1 Z$, $N = 15$, $R = 0.71 \pm 0.14$, $R^{*} =
0.73 > R_{0.05}^{cr}(N - 2) = 0.514$; sample size sufficient
for the reliability of the correlation coefficient:
$N_{0.05}^{min} = 7$; correlation coefficient significance test
based on the Fisher normalizing z-transform
(with the Hotelling correction taken into account): $u_H =
0.87 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.523$; $a_0 = 1.91
\pm 0.81$, $a_1 = 0.74 \pm 0.21$, $t(a_1) = 3.58 > t(a_0) = 2.36
> t_{0.05}^{cr}(N - 2) = 2.16$; standard error of the
regression estimate: 0.846; $F = 12.85 > F_{0.05}^{cr}(f_1 =
1; f_2 = 13) = 4.70$; straightness index: $K = 2.73 < K_{thr}
= 3.00$. (52)
Fig.4. Scatter plot and regression line (axes: A (experiment)
and A (model)). The
regression line is defined by the equation $A_{mod} = a_0
+ a_1 A_{exp}$; $N = 15$, $a_0 = 0.212 \pm 0.355$, $a_1 = 0.960 \pm
0.054$; $R = 0.98 \pm 0.01$, $R^{*} = 0.982 > R_{0.05}^{cr}(N - 2) =
0.514$; $S = 0.81$; $F = 313.6 > F_{0.05}^{cr}(f_1 = 1; f_2 = 13) =
4.7$; $K = 0.77 < K_{thr} = 3.0$.
Statistics of the explanatory variable Z:
$N = 15$, $Z_{av} = 3.78 \pm 0.28$; 95% confidence
interval: $(3.17, 4.38)$; $Z_{min} = 2.36$, $Z_{max} = 6.00$, $S_Z =
1.10$; $\tau_{min} = 1.29 < \tau_{max} = 2.03 < \tau_{0.05}^{cr,2}(N) =
2.493 < \tau_{0.05}^{cr,1}(N) = 2.617$; Shapiro-Wilk normality
test: $W = 0.934 > W_{0.05}^{cr}(N) = 0.881$; David-Hartley-
Pearson normality test: $U_{1,0.05}^{cr}(N) = 2.970 < U =
(Z_{max} - Z_{min})/S_Z = 3.31 < U_{2,0.05}^{cr}(N) = 4.170$; $N_{repr}
= 12$. (53)
The sets of elements $x_3$ (39) and Z (53) satisfy the
homogeneity and normality conditions at the 95%
confidence level.
Fig.5. Scatter plot and regression line (axes: Z, arb. units,
and E(nb), eV). The
regression line is defined by equation (54).
For the chemical compounds from Table 2 there is
also a significant relationship between the molecular
factor Z and the energy $\varepsilon_{nb}^{0}$ (Fig. 5):
$\varepsilon_{nb}^{0}(Z) = a_0 + a_1 Z$, $N = 15$, $R = -0.85 \pm 0.08$, $|R^{*}| =
0.85 > R_{0.05}^{cr}(N - 2) = 0.514$; sample size sufficient
for the reliability of the correlation coefficient:
$N_{0.05}^{min} = 5$; correlation coefficient significance test
based on the Fisher normalizing z-transform
(with the Hotelling correction taken into account): $u_H =
1.179 > u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.523$; standard
error of the regression estimate: 0.575; $a_0 = 1.87 \pm
0.55$, $a_1 = -0.81 \pm 0.14$, $|t(a_1)| = 5.79 > t(a_0) =
3.41 > t_{0.05}^{cr}(N - 2) = 2.16$; $F = 33.50 > F_{0.05}^{cr}(f_1 = 1;
f_2 = 13) = 4.70$; straightness index: $K = 2.04 < K_{thr} =
3.00$. (54)
The statistical significance of the correlation
coefficient is characterized by the inequality (26): $t
= 4.35 > t_{0.05}^{cr}(N - 2) = 2.16$. According to the
Chaddock scale, the correlation coefficient is in the
range from 0.7 to 0.9, which is characterized as a “close
relationship”. For the chemical compounds presented
in Table 2, a relationship was also found between
bioactivity and the molecular factor Z. There is a
statistically significant trend between the Z value
and the bioactivity of chemical compounds. Indeed,
using the Abbe-Linnick test (35) we obtain the
following inequality:
q = 0.473 < q0.05cr(N = 15) = 0.6027,
Q* = - 2.24 < u0.05 = - 1.645. (55)
Here Zav = 3.78 is the arithmetic mean of the set Zi.
Since inequalities (55) are satisfied, the null
hypothesis about the absence of a trend is rejected.
The alternative hypothesis of a trend is accepted.
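The two quantities in (55) can be sketched as follows. Definition (35) is not reproduced in this part of the paper, so the code assumes the usual Abbe ratio (half the mean squared successive difference divided by the sample variance) and the normal approximation $Q^{*} = -(1 - q)\sqrt{(2N + 1)/(2 - (1 - q)^2)}$; the latter reproduces the reported $Q^{*}$ from the reported q. The demonstration series is hypothetical and is not the paper's data.

```python
import math

def abbe_q(series):
    """Abbe ratio q: sum of squared successive differences over twice the
    sum of squared deviations from the mean (assumed standard definition)."""
    mean = sum(series) / len(series)
    num = sum((b - a) ** 2 for a, b in zip(series, series[1:]))
    den = 2.0 * sum((x - mean) ** 2 for x in series)
    return num / den

def abbe_q_star(q, n):
    """Normal approximation Q* = -(1 - q) * sqrt((2n + 1) / (2 - (1 - q)^2))."""
    return -(1.0 - q) * math.sqrt((2 * n + 1) / (2.0 - (1.0 - q) ** 2))

# Q* implied by the q value reported in (55) for N = 15
q_star = abbe_q_star(0.473, 15)
print(f"Q* = {q_star:.2f}")

# Hypothetical ordered series with an upward drift (illustration only)
demo = [1.0, 1.4, 1.3, 2.1, 2.6, 2.4, 3.3, 3.9]
q_demo = abbe_q(demo)
print(f"q (demo) = {q_demo:.3f}")
```

A value of q well below 1 signals that neighbouring elements of the ordered series are closer to each other than to the overall mean, which is what a trend produces.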
The following significant linear relationship of
bioactivity with factor Z was also obtained:
$A(Z) = d_0 + d_1 Z$, $N = 15$, $R = 0.54 \pm 0.20$, $R^{*} = 0.56
> R_{0.05}^{cr}(N - 2) = 0.514$; sample size sufficient for
the reliability of the correlation coefficient: $N_{0.05}^{min} =
13$; correlation coefficient significance test based on
the Fisher normalizing z-transform (with the Hotelling
correction taken into account): $u_H = 0.592 >
u_{0.05}(N) = z_{0.975}(N - 1)^{-0.5} = 0.523$; $d_0 = -2.23 \pm
3.35$, $d_1 = 1.99 \pm 0.95$, $t(d_1) = 2.33 > t_{0.05}^{cr}(N - 2) =
2.160$; standard error of the regression estimate: $S_A
= 3.50$; $F = 5.42 > F_{0.05}^{cr}(f_1 = 1; f_2 = 13) = 4.70$.
(56)
It should be noted that regression (56) does not
make it possible to distinguish between cis- and
trans-isomers. The Z factor correlates with the
electronic polarizability of the molecule $\alpha_1$ ($R = 0.72
> R_{0.05}^{cr}(N - 2) = 0.514$; $t = 3.14 > t_{0.05}^{cr}(N - 2) =
2.160$). There is also a relationship ($R = 0.71$; $t =
3.07 > t_{0.05}^{cr}(N - 2)$) with the value of $\alpha_1 I_1/(I_1 + 10)$.
There is also a relationship between the Z factor and
the MO energy $\varepsilon_{nb}^{0}$ ($|R| = 0.85 > R_{0.05}^{cr}(N - 2) =
0.514$; $t = 4.35 > t_{0.05}^{cr}(N - 2) = 2.160$) (Fig. 5). For
the chemical compounds (Table 2) having the general
formula C2HxCly (sample size N = 11) there is a
very close linear relationship between the factor Z
and the value of the one-electron MO energy $\varepsilon_{nb}^{0}$: $R
= -0.95$, $|R^{*}| = 0.96 > R_{0.05}^{cr}(N - 2) = 0.602$; $t =
5.18 > t_{0.05}^{cr}(N - 2) = 2.262$. Note that the energy
$\varepsilon_{nb}^{0}$ of the molecule actually characterizes the
affinity of the molecule for the electron.
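The t-values quoted for these correlations can be reproduced numerically. Inequality (26) is not restated here; the sketch assumes the form $t = |\operatorname{artanh} R|\sqrt{N - 3}$ (Fisher z divided by its standard error), which is an inference from the quoted numbers rather than a formula taken from the paper. The function name and the dictionary labels are hypothetical.

```python
import math

def fisher_t(r, n):
    """Significance statistic |artanh(r)| * sqrt(n - 3) (assumed form of (26))."""
    return abs(math.atanh(r)) * math.sqrt(n - 3)

# label: (|R|, N, critical t quoted in the text)
cases = {
    "Z vs alpha1": (0.72, 15, 2.160),
    "Z vs alpha1*I1/(I1 + 10)": (0.71, 15, 2.160),
    "Z vs eps_nb, all compounds": (0.85, 15, 2.160),
    "Z vs eps_nb, C2HxCly": (0.95, 11, 2.262),
}
t_values = {name: fisher_t(r, n) for name, (r, n, _) in cases.items()}

for name, (r, n, t_crit) in cases.items():
    print(f"{name}: t = {t_values[name]:.2f} (critical {t_crit})")
```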
When analyzing the data in Table 2, of the two
possible structures of the 1,2-dichlorovinyl molecule,
the cis-isomer structure, which corresponds to the lower
total energy state of the molecule in a polar medium,
was chosen (assuming $\kappa_s = 80$ for the static
dielectric permittivity of the polar medium and $n_3^2 =
1.777$ for the squared refractive index). The quantitative
determination of the difference in the total energies
of a molecule in cis- and trans-configurations is
associated with the calculation of the difference
between two large quantities, the accuracy of which
depends significantly on the accuracy of the
quantum-chemical method used. Nevertheless, some
approximate quantitative estimates can be made if
the same method for calculating the electronic
structure of molecules is used in both cases. It is
assumed that possible inaccuracies of the quantum-
chemical method can be compensated for. The total
electronic energies of the trans-isomer and cis-
isomer calculated using the CNDO/2 method are
Etrans = - 1295.97 eV and Ecis = - 1295.95 eV,
respectively. However, in a polar dielectric medium,
the energy of the dipolar cis-isomer is lowered relative to
that of the nonpolar trans-isomer by
$$E_R = -\frac{(\kappa_s - 1)(n_3^2 + 2)\mu_1^2}{3a^3(2\kappa_s + n_3^2)} = -0.08\ \text{eV}. \tag{57}$$
Here $\mu_1 = 1.77$ D is taken for the dipole moment and $a = 2.5$ Å is the
effective size of the molecule. In absolute value, this energy is
noticeably higher than the thermal
energy of the translational motion of a molecule at
room temperature (about 0.02 eV). The
difference in the total electronic energies of the
two isomers is thereby compensated. Then,
according to Boltzmann's statistics, one can estimate
the number of molecules in cis- and trans-
configurations in a condensed dielectric medium as
follows:
$$N_{cis} = N_{trans}\exp[(E_{trans} - E_{cis} - E_R)/k_B T] = 13\,N_{trans}. \tag{58}$$
Consequently, at room temperature, there are more
than ten molecules in the cis-configuration per
molecule in the trans-configuration. That is, most of
the molecules in a polar dielectric medium will
preferably be in the cis-configuration.
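The estimate in (57) and (58) can be retraced numerically. The sketch below assumes Gaussian units for (57), the conversion 1 D²/Å³ ≈ 0.624 eV, and a temperature of 298.15 K (the paper says only "room temperature"); the function and constant names are introduced for illustration. With these inputs the reaction-field term comes out near -0.08 eV and the population ratio is of the order of ten; the exact figure is sensitive to the temperature and to the rounding of $E_R$, so it should not be expected to match the quoted factor of 13 digit for digit.

```python
import math

DEBYE2_PER_A3_IN_EV = 0.62415   # 1 D^2 / 1 Å^3 expressed in eV (Gaussian units)
K_B = 8.617333e-5               # Boltzmann constant, eV/K

def reaction_field_energy(mu_debye, a_angstrom, kappa_s, n_sq):
    """Energy lowering of a dipole in a polar dielectric medium, as in (57)."""
    factor = (kappa_s - 1.0) * (n_sq + 2.0) / (3.0 * (2.0 * kappa_s + n_sq))
    return -factor * mu_debye**2 / a_angstrom**3 * DEBYE2_PER_A3_IN_EV

def cis_to_trans_ratio(e_trans, e_cis, e_r, temperature):
    """Boltzmann population ratio N_cis / N_trans, as in (58)."""
    return math.exp((e_trans - e_cis - e_r) / (K_B * temperature))

# Parameter values quoted in the text; the temperature is an assumption.
e_r = reaction_field_energy(mu_debye=1.77, a_angstrom=2.5, kappa_s=80.0, n_sq=1.777)
ratio = cis_to_trans_ratio(-1295.97, -1295.95, e_r, temperature=298.15)

print(f"E_R = {e_r:.3f} eV, N_cis / N_trans = {ratio:.1f}")
```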
Comparing the regressions (15) and (46), we can
make some assumptions about the selective nature
of the biological action at the molecular level of the
analyzed chlorine-containing chemical compounds.
In the first case (Table 1), the chlorinated chemical
compounds are most likely to interact with those
areas in the body that are characterized by strong donor
properties. In the second case (Table 2), the active
sites of the biophase are more prone to the
formation of molecular associates, which are
stabilized due to dispersion and dipole interactions.
This is typical for the active centers of the biophase,
which have high polarization properties (large
values of polarizability α2). Since short-range
interactions are important for equations (15) and
(46), the molecules must approach the active
centers of the biosubstrate to within 5 Å
for the biological activity of chlorine-containing
chemical compounds to manifest itself.
4 Conclusion
Thus, the studied variability in the toxicity of
chlorinated hydrocarbons (Tables 1 and 2) is
probably due to the accumulation (at least in the
initial stages of action) of exogenous chemical
compounds in the body's active centers. The change
in toxicity can be associated with the electronic
structure of exogenous molecules, which determines
their ability to form bound molecular complexes due
to various types of intermolecular interactions.
When the donor electron level εdon lies
higher on the energy scale than the acceptor level εacc,
a real electron transfer from the impurity molecule
to the biophase molecule is possible. In this case,
an energy approximately equal to the difference εdon - εacc
is released nonradiatively in the local region of the
biophase through relaxation mechanisms. This
energy can be used to destroy the equilibrium
structure of the biophase.
Multiple regressions (8), (15) and (37) reflect the
existence of objective relationships between the
bioresponse and a variety of molecular factors that
characterize the interaction of molecules in a
condensed medium. The general structure of the
response function (8), whose explanatory variables
have a clear physical meaning, is able to cover various
aspects of the toxicity of the analyzed
chlorine-containing
chemical compounds of two different classes.
The additive molecular features x1, x2 and x3 are
intensive indicators that determine the cause-and-
effect relationship between bioactivity and the molecular
structure of chemical compounds. At the same time,
the models statistically significantly reflect the
relationship between individual molecular factors
and the biological response of the body. It is
important to note that the relatively simple
mathematical formulas derived from rigorous
theoretical concepts made it possible to obtain a
statistically significant relationship between
bioresponse and the molecular parameters of
chemical compounds. Such a relationship is difficult
to establish by the usual direct deduction from the
general to the particular.
References:
[1] Zahradnik R., Arch. Int. Pharmacodyn.,
Vol.135, 1962, p.311.
[2] Zahradnik R., Experientia, Vol.18, 1962, p.534.
[3] Boček K., Kopecky J., Krivucova M.,
Experientia, Vol.20, 1964, p.657.
[4] Kopecky J., Boček K., Experientia, Vol.23,
1967, p.125.
[5] Hansch C., On the Use of Quantitative
Structure-Activity Relationships in Drug Design
(Review). Chem.-Pharm. Journal, Vol.14,
No.10, 1980, pp.15-30. (in Russian).
[6] Hansch C., Leo A., Substituent Constants for
Correlation Analysis in Chemistry and Biology.
John Wiley & Sons, New York, Chichester, Brisbane,
Toronto, 1979.
[7] Aibinder N. E., Bezdvorny V.N., Krasovitskaya
M. L., In: Study of Biological Action of New
Products of Organic Synthesis and Natural
Compounds. Perm. 91 p., 1981, (in Russian).
[8] Förster E., Rönz B., Methoden der
Korrelations- und Regressionsanalyse, Verlag
Die Wirtschaft Berlin, 290 p., 1979.
[9] Likeš J., Laga J., Základní statistické tabulky,
Praha, 356 p., 1978.
[10] Kubinyi H., In: Biological Activity and Chemical
Structure, Amsterdam, 1977.
[11] Golubev A.A., Lublina E.I., Tolokontsev N.A.,
Quantitative Toxicology, Leningrad, 1973 (in
Russian).
[12] Mukhomorov V.K., Frumin G.T., Quantitative
Ratios of Bioactivity - Electronic
Characteristics of Halocarbons of the Aliphatic
Series, Chem.-Pharm. Journal, Vol.16, No.10,
1982, pp.70-74 (in Russian).
[13] Mukhomorov V.K., Structure-Activity
Relationships. Intermolecular Interactions and
Toxicity of Compounds. Toxicology Letters,
Vol.88, Issue S1, 1996, p.87.
[14] Osipov V.A., Minkin V.M., Reference Book on
Dipole Moments, Moscow, 1965 (in Russian).
[15] Vereshchagin A.N., Polarizability of
Molecules, Moscow, 1980, (in Russian).
[16] Frumin G.T., Mukhomorov V.K., Comparative
Analysis of Statistical Distributions as the Basis
of Bioactivity-Structure Models, Chem.-Pharm.
Journal, Vol.16, No.10, 1982, pp.70-74 (in
Russian).
[17] Klopman M., Simonetta M., Fujimoto H.,
Reactivity of Molecules and Reaction
Pathways, Moscow, 1977 (in Russian).
[18] Shchembelov G.A., Ustinyuk V.M., Mamaev
V.M., Ischenko V.M., Gloriozov I.P., Luzhkov
V.B., Orlov V.V., Simkin V.Y., Pupyshev V.I.,
Burmistrov V.N., Quantum-Chemical
Methods of Molecule Calculation. Edited by
V.M. Ustynyuk, Moscow, Publishing House
Chemistry, 1980 (in Russian).
[19] Sutton L.E., Tables of Interatomic Distances,
Configuration in Molecules, London, 1958,
1965.
[20] Fröhlich H., Theory of Dielectrics, Oxford,
1958.
[21] Onsager L., Electric Moments of Molecules in
Liquids, J. Am. Chem. Soc., Vol.58, No.8,
1936, pp.1486-1493.
[22] Bakhshiev N.G., Spectroscopy of
Intermolecular Interactions, Leningrad, 1972
(in Russian).
[23] Williams D., Schaad L., Murrell J., Deviations
from Pairwise Additivity in Intermolecular
Potentials, J. Chem. Phys., Vol.47, No.12,
1967, pp.4916-4923.
[24] Kestner N., Sinanoglu O., Effective
Intermolecular Pair Potentials in Nonpolar
Media, J. Chem. Phys., Vol.38, No.7, 1963,
pp.1730-1740.
[25] Gurevich L.V., Karachevtsev G.V., Kondratyev
V.N., Lebedev Yu. A., Medvedev V.A.,
Potapov V.K., Khodeev Yu. S., Energies of
Breaking Chemical Bonds. Ionization
Potentials and Electron Affinity, Moscow,
Nauka, 1974 (in Russian).
[26] Bazilevsky M.V., Method of Molecular Orbits.
Reactivity of Organic Molecules, Moscow,
Publishing House Chemistry, 1969 (in
Russian).
[27] Kobzar A.I., Applied Mathematical Statistics.
For Engineers and Scientists, FizMatLit.,
Moscow, 2006 (in Russian).
[28] Bolshev L.N., Smirnov N.V., Tables of
Mathematical Statistics, Moscow, Nauka, 1983
(in Russian).
[29] Akaike H., A New Look at the Statistical
Model Identification, IEEE Transactions on
Automatic Control, Vol.19, No.6, 1974, pp.
716-723.
[30] Schwarz G., Estimating the Dimension
of a Model, The Annals of Statistics, Vol.6, No.2,
1978, pp. 461-464.
[31] Dmitriev E.A., Mathematical Statistics in Soil
Science, Moscow State University Press, 1995.
[32] Draper N.R., Smith H., Applied Regression
Analysis, John Wiley & Sons, New York, 1996.
[33] Sachs L., Statistische Auswertungsmethoden,
Springer Verlag, Berlin, New York, 1972.
[34] Farrar D.E., Glauber R.R., Multicollinearity in
Regression Analysis: The Problem Revisited,
The Review of Economics and Statistics,
Vol.49, No.1, 1967, pp. 92-107.
[35] Zaitsev G.N., Mathematics in Experimental
Botany, Moscow, 1990, (in Russian).
[36] Mukhomorov V.K., Statistical Aspects of the
Interrelation, Biomedical Statistics and
Informatics, Vol.1, No.1, 2016, pp. 24-34.
[37] Mukhomorov V.K., Entropy Approach to the
Study Biological Activity Chemical
Compounds, Advances in Biological
Chemistry, Vol.1, No.1, 2011, pp. 1-5.
[38] Veljkov V., Lalovič D., General Model
Pseudopotential for Positive Ions, Phys. Lett. A,
Vol.45, No.1, 1973, pp. 59-62.
[39] Quastler H., Information Theory in Biology,
Moscow, IL, 1960, (in Russian).
[40] Chaddock R.E., Principles and Methods of
Statistics, Boston, New York, 1925.
[41] Johnson N.L., Leone F.C., Statistics and
Experimental Design, Vol.1, Second Edition,
John Wiley & Sons, New York, London,
Toronto, 1977.
[42] Caillet J., Pullman B., In: Molecular
Associations in Biology, Ed. B. Pullman,
Academic Press, 1968.
[43] Handbook of Chemistry and Physics, 52nd Ed.,
Ed. R.C. Weast, Ohio, 1971.
[44] Golubev A.A., Lublina E.I., Tolokontsev N.A.,
Quantitative Toxicology, Leningrad, 1973.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative
Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US