
TABLE I
MINIMUM SAMPLE SIZE n_i FOR STATISTICAL POWER i

Hypothesis Test (a)    Power 0.70       Power 0.80       Power 0.90
Student’s t–Test       n_0.70 = 1775    n_0.80 = 2300    n_0.90 = 3050
Mann–Whitney           n_0.70 = 3       n_0.80 = 4       n_0.90 = 5

(a) Based on level of significance α = 0.05.
4. Artificial Intelligence and Statistical Moment Generating Functions
This section presents a machine learning approach to predicting the median count from laboratory training data. The proposed approach aims to provide accurate and reliable predictions, which can be valuable in a range of applications.
4.1 Expressing Median as Moments of a Distribution
In the field of statistics, a random variable is a function whose domain is the sample space and whose range is the real line. The distribution of the random variable $X$ tells which values the random variable takes on and how often it takes on these values. One way to uniquely characterize the distribution of a random variable is through a moment generating function, if the function exists. The moment generating function of a random variable can be expressed as the expected value,
$$M_X(t) = E\left[e^{tX}\right], \tag{1}$$
provided the expectation exists for all $t$ with $-h < t < h$, for some positive number $h$.
Furthermore, the mean and variance may be expressed in terms of the moment generating function:
• The mean is the first moment of $X$:
$$M_X'(0) = E[X] = \mu \tag{2}$$
• The variance is the second central moment of $X$:
$$M_X''(0) - \left[M_X'(0)\right]^2 = E[X^2] - \left(E[X]\right)^2 = \sigma^2 \tag{3}$$
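As a numerical check of Eqs. (2) and (3), the sketch below differentiates a moment generating function at $t = 0$ with central finite differences. It assumes, purely for illustration, that the counts follow a Poisson distribution, whose moment generating function is $\exp(\lambda(e^t - 1))$; the rate `LAM`, the step size, and the function names are hypothetical choices and do not come from the paper.

```python
import math

def poisson_mgf(t, lam):
    """MGF of a Poisson(lam) variable: M_X(t) = exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1.0))

def mgf_mean_variance(mgf, step=1e-4):
    """Approximate the mean and variance from an MGF using central differences at t = 0."""
    m_plus, m_zero, m_minus = mgf(step), mgf(0.0), mgf(-step)
    first = (m_plus - m_minus) / (2.0 * step)                # M'_X(0)  = E[X]
    second = (m_plus - 2.0 * m_zero + m_minus) / step ** 2   # M''_X(0) = E[X^2]
    return first, second - first ** 2                        # Eq. (2), Eq. (3)

LAM = 4.0  # hypothetical Poisson rate, chosen only for illustration
mean, variance = mgf_mean_variance(lambda t: poisson_mgf(t, LAM))
print(mean, variance)
```

For a Poisson variable both the mean and the variance equal $\lambda$, so both printed values should be close to 4, up to finite-difference error.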
The median can be characterized in terms of a moment generating function as well [11]. If $\tilde{x}$ indicates the sample median and $h(x)$ is the asymptotic distribution of $\tilde{x}$, then the $k$th moments $\tilde{\mu}_k$ of $\tilde{x}$ are defined as
$$\tilde{\mu}_k = E_h\left[(x - \tilde{\mu}_1)^k\right]. \tag{4}$$
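The moments in Eq. (4) can be approximated by simulation. The sketch below is a Monte Carlo illustration, not the derivation in [11]: it draws repeated samples, takes the median of each, and computes the mean of the medians together with their central moments about that mean. The standard normal sampling distribution, the sample size, the number of replicates, and the function name are all assumptions, chosen because the asymptotic variance of the normal sample median is known in closed form.

```python
import random
import statistics

def simulate_median_moments(n, replicates, k_max=4, seed=0):
    """Monte Carlo estimate of the moments of the sample median.

    Draws `replicates` samples of size `n` from a standard normal
    distribution (an illustrative assumption), takes each sample's median,
    and returns {1: mean of the medians, k: k-th central moment for k >= 2}.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    medians = [
        statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replicates)
    ]
    mu1 = statistics.fmean(medians)
    moments = {1: mu1}
    for k in range(2, k_max + 1):
        # k-th central moment of the simulated medians about mu1
        moments[k] = statistics.fmean([(m - mu1) ** k for m in medians])
    return moments

moments = simulate_median_moments(n=101, replicates=2000)
print(moments)
```

For standard normal data the asymptotic variance of the sample median is $\pi/(2n)$, about 0.0156 for $n = 101$, so `moments[2]` should land near that value.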
The connection among the moments of the median, machine learning, and artificial intelligence is that robust machine learning can be expressed as conditional moment restrictions that constrain the conditional expectations of the moment generating function [12].
4.2 Artificial Intelligence Sub-field of Machine Learning
Machine learning is a sub-field of artificial intelligence. In supervised machine learning, algorithms learn from training data to predict responses when presented with new data. In the case of the median, one can start with real count data obtained from biological experiments in the laboratory and compute the sample medians from them. These sample medians serve as the training data.
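A minimal sketch of this first step, assuming the laboratory counts are grouped by experimental condition, is shown here. The condition names and count values are hypothetical placeholders, not data from the study.

```python
from statistics import median

# Hypothetical replicate counts per experimental condition (illustrative values only).
counts_by_condition = {
    "control": [12, 15, 9, 14, 11],
    "treatment_a": [22, 18, 25, 30, 19, 21],
}

def build_training_medians(counts):
    """Map each condition to the sample median of its replicate counts."""
    return {condition: median(values) for condition, values in counts.items()}

training_medians = build_training_medians(counts_by_condition)
print(training_medians)  # {'control': 12, 'treatment_a': 21.5}
```

With an even number of replicates, `statistics.median` averages the two middle values, which is why the second median is not an integer.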
Using the training data, the next step is to constrain the conditional expectations of the moment generating function for the median. Finally, the training data and robust machine learning are used to predict the median count values directly.
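The paper does not specify an estimator at this point, so the following is only one possible illustration of robust median prediction. It uses least absolute deviations (median) regression, a standard technique substituted here as an assumption: the fitted line minimizes the total absolute error, which makes it much less sensitive to outliers than a least squares fit. The single predictor, the data values, and the function name are hypothetical.

```python
from itertools import combinations

def fit_median_line(xs, ys):
    """Least-absolute-deviations fit of y = a + b*x.

    There is always an optimal LAD line through at least two data points,
    so an exhaustive search over point pairs is exact for small training sets.
    """
    points = list(zip(xs, ys))

    def total_abs_error(a, b):
        return sum(abs(y - (a + b * x)) for x, y in points)

    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical pair does not define a line y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        candidate = (total_abs_error(a, b), a, b)
        if best is None or candidate < best:
            best = candidate
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]

# Hypothetical training data: x is an illustrative experimental setting,
# y is the sample median count observed at that setting.
settings = [1, 2, 3, 4, 5]
median_counts = [10, 12, 14, 40, 18]  # 40 is a deliberate outlier

intercept, slope = fit_median_line(settings, median_counts)
predicted_median = intercept + slope * 6  # prediction for a new setting
print(intercept, slope, predicted_median)  # 8.0 2.0 20.0
```

The fitted line follows the four consistent points and ignores the outlier at setting 4. The pairwise search costs cubic time in the number of training points, so it is suitable only for small illustrative data sets.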
5. Discussion and Conclusions
Biologists employ a wide range of biological procedures to obtain experimental results. Before conducting the decisive experiment that tests their hypothesis, they often devise treatments and modify their subjects. Executing the decisive experiment can nonetheless be complex, especially when measuring specific variables. To generate accurate data, biological researchers must commit to using appropriate statistical techniques. Researchers must also carefully consider the characteristics and properties of their data when analyzing results, because applying inappropriate statistical methods may lead to erroneous conclusions.
We presented the theoretical and methodological foundation for using artificial intelligence and machine learning to predict the median count. This research provides the functional formulation of conditional moment restrictions. In future work, medians obtained from simulated data will be compared with AI-aided medians, and a root mean squared error analysis will be used to evaluate the quality of the AI predictions.
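The planned comparison can be sketched as a root mean squared error between reference medians and predicted medians. This is a minimal illustration that assumes the two lists are paired element by element; the numeric values are hypothetical and are not results from this work.

```python
import math

def rmse(reference, predicted):
    """Root mean squared error between paired reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values for illustration only.
simulated_medians = [12.0, 21.5, 17.0]
ai_aided_medians = [13.0, 20.5, 17.0]

error = rmse(simulated_medians, ai_aided_medians)
print(error)
```

Here the errors are -1, 1, and 0, so the result is $\sqrt{2/3}$, roughly 0.816. A smaller value indicates predictions closer to the simulated medians.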
References
[1] Kim A., Mok B.R., Hahn S., Yoo J., Kim D.H., and Kim T.A. Alternative splicing variant of NRP/B promotes tumorigenesis of gastric cancer. BMB Rep. 2022 Jul;55(7):348-353. doi: 10.5483/BMBRep.2022.55.7.034. PMID: 35725010; PMCID: PMC9340087.
[2] He Y., Lu J., Ye Z., Hao S., Wang L., Kohli M., Tindall D.J., Li B., Zhu R., Wang L., Huang H. Androgen receptor splice variants bind to constitutively open chromatin and promote abiraterone-resistant growth of prostate cancer. Nucleic Acids Res. 2018 Feb 28;46(4):1895-1911. doi: 10.1093/nar/gkx1306. PMID: 29309643; PMCID: PMC5829742.
[3] Li Y., Gao Xx., Wei C., Guo R., Xu H., Bai Z., Zhou J., Zhu J., Wang
W., Wu Y., Li J., Zhang Z., and Xie X. Modification of Mcl-1 alternative
splicing induces apoptosis and suppresses tumor proliferation in gastric
cancer. Aging (Albany NY). 2020 Oct 14;12(19):19293-19315. doi:
10.18632/aging.103766. Epub 2020 Oct 14. PMID: 33052877; PMCID:
PMC7732305.
[4] Greenbaum, A., A. Rajput, and G. Wan, RON kinase isoforms demonstrate variable cell motility in normal cells. Heliyon, 2016. 2(9): p. e00153.
[5] St-Pierre A.P., Shikon V., and Schneider D.C. Count data in biology-Data transformation or model reformation? Ecol Evol. 2018 Feb 16;8(6):3077-3085. doi: 10.1002/ece3.3807. PMID: 29607007; PMCID: PMC5869353.
[6] Ramachandran, K.M. and Tsokos, C.P. Mathematical Statistics with
Applications in R. San Diego, CA: Academic Press, 2020.
[7] Glenn Griesinger, N.L., Vrinceanu, D., Jackson, M., and Howell, W.C. Elementary Statistics: A Guide to Data Analysis Using R. San Diego, CA: Cognella, 2023.
[8] Passini MA, Bu J, Richards AM, Kinnecom C, Sardi SP, Stanek LM, Hua Y, Rigo F, Matson J, Hung G, Kaye EM, Shihabuddin LS, Krainer AR, Bennett CF, and Cheng SH. Antisense oligonucleotides delivered to the mouse CNS ameliorate symptoms of severe spinal muscular atrophy. Sci Transl Med. 2011 Mar 2;3(72):72ra18. doi: 10.1126/scitranslmed.3001777. PMID: 21368223; PMCID: PMC3140425.
MOLECULAR SCIENCES AND APPLICATIONS
DOI: 10.37394/232023.2024.4.5
Paul D. Glenn II, Nancy L. Glenn Griesinger, Demetrios Kazakos