An application of size-bias method
HASSAN HAJI
Department of Statistics
Imam Khomeini International University
Qazvin, IRAN
Abstract: In this paper, we derive an upper bound on the Kolmogorov distance between the
distribution of a sum of indicator random variables and the standard normal distribution by using the
size-bias method. We also give lower and upper bounds for the distribution function of the sum of
indicator random variables at two special points.
Keywords: Indicator random variables, size-biased distribution, Kolmogorov distance.
1. Introduction
Size bias occurs famously in waiting-time
paradoxes, undesirably in sampling schemes, and
unexpectedly in connection with Stein’s method,
tightness, analysis of the lognormal distribution,
Skorohod embedding, infinite divisibility, and
number theory [1,4]. For a non-negative random variable $X$ with $\mu = \mathbb{E}(X) < \infty$, we say a random variable $X^s$ has the size-biased distribution with respect to $X$ if
$$\mathbb{E}(X f(X)) = \mu\, \mathbb{E}(f(X^s))$$
for all $f : [0,\infty) \to \mathbb{R}$ such that $\mathbb{E}|X f(X)| < \infty$ [2].
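For a discrete non-negative $X$, the defining identity is equivalent to $P(X^s = x) = x\,P(X = x)/\mu$. The following Python sketch checks the identity exactly on a small example. The uniform distribution on $\{0,1,2,3\}$, the test function `f`, and the helper name `size_bias` are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def size_bias(pmf):
    """Return the size-biased pmf x * p(x) / mu of a non-negative discrete pmf."""
    mu = sum(x * p for x, p in pmf.items())
    return {x: x * p / mu for x, p in pmf.items() if x > 0}

# Hypothetical example: X uniform on {0, 1, 2, 3}.
pmf = {x: Fraction(1, 4) for x in range(4)}
mu = sum(x * p for x, p in pmf.items())
pmf_s = size_bias(pmf)

def f(x):
    # Any test function with E|X f(X)| finite would do here.
    return x * x + 1

lhs = sum(x * f(x) * p for x, p in pmf.items())      # E(X f(X))
rhs = mu * sum(f(x) * p for x, p in pmf_s.items())   # mu * E(f(X^s))
print(pmf_s)
print(lhs, rhs)
```

Both sides are computed with exact rational arithmetic, so they can be compared with `==` rather than with a tolerance.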
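Let $Y = \sum_{i=1}^{n} Y_i$, where $Y_i \ge 0$ and $\mathbb{E}(Y_i) < \infty$. A random variable with the size-biased distribution of $Y$ can be constructed as follows:
1. Let $Y_i^s$ have the size-biased distribution of $Y_i$, independent of $(Y_j)_{j \ne i}$ and $(Y_j^s)_{j \ne i}$, for $i = 1, \dots, n$.
2. Define a vector $(Y_j^{(i)})_{j \ne i}$ such that its conditional distribution given $Y_i^s$ coincides with that of $(Y_j)_{j \ne i}$ given $Y_i$.
3. Choose an index $J$ such that
$$P(J = j) = \frac{\mathbb{E}(Y_j)}{\mathbb{E}(Y)}.$$
Then
$$Y^s = \sum_{k \ne J} Y_k^{(J)} + Y_J^s$$
has the size-biased distribution with respect to $Y$ (see Section 2.4 in [2]). For an indicator random variable $I$,
$$P(I^s = 1) = \frac{P(I = 1)}{\mathbb{E}(I)} = 1,$$
which means $I^s = 1$. Then in this case,
$$Y^s = \sum_{k \ne J} Y_k^{(J)} + 1.$$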
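The three-step construction can be checked exactly in the simplest case of independent indicators, where conditioning on $Y_J = 1$ leaves the other summands unchanged. The Python sketch below compares the law produced by the construction with the direct formula $P(Y^s = y) = y\,P(Y = y)/\mathbb{E}(Y)$. The success probabilities in `p` and the function name `pmf_of_sum` are hypothetical, and the independence assumption is a simplification that does not hold for the permutation indicators studied later.

```python
from fractions import Fraction
from itertools import product

# Hypothetical success probabilities P(Y_i = 1) of independent indicators.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
n = len(p)

def pmf_of_sum(probs):
    """Exact pmf of a sum of independent indicators with the given probabilities."""
    pmf = {}
    for bits in product((0, 1), repeat=len(probs)):
        w = Fraction(1)
        for b, q in zip(bits, probs):
            w *= q if b else 1 - q
        pmf[sum(bits)] = pmf.get(sum(bits), Fraction(0)) + w
    return pmf

pmf_Y = pmf_of_sum(p)
mu = sum(p)  # E(Y)

# Direct definition: P(Y^s = y) = y * P(Y = y) / E(Y).
direct = {y: y * q / mu for y, q in pmf_Y.items() if y > 0}

# Construction: pick J with P(J = j) = p_j / mu, replace Y_J by 1,
# and (by independence) keep the remaining summands as they are.
constructed = {}
for j in range(n):
    rest = pmf_of_sum(p[:j] + p[j + 1:])
    for y, q in rest.items():
        constructed[y + 1] = constructed.get(y + 1, Fraction(0)) + (p[j] / mu) * q

print(direct == constructed)
```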
The paper is organized as follows. In Section 2, we present simple calculations for a sum of indicator random variables defined on a random permutation. Section 3 is devoted to our results and their proofs. We use the size-bias method to prove Theorem 2, which gives an upper bound on the Kolmogorov distance between the distribution of the sum of indicator random variables and the standard normal distribution. We also give lower and upper bounds for the distribution function of this sum, denoted $\xi_n$ and defined in Section 2, at two special points. To emphasize the practical usefulness of our results, we note that $\xi_n$ is related to the number of leaves in tree structures. In other words, the expectation and variance of $\xi_n$ are important for the study of random trees.
Received: October 25, 2022. Revised: May 10, 2023. Accepted: June 14, 2023. Published: July 12, 2023.
EQUATIONS
DOI: 10.37394/232021.2023.3.3
Hassan Haji
E-ISSN: 2732-9976
25
Volume 3, 2023
2. Preliminaries
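Set
$$\mathbb{I}(A) := \begin{cases} 1, & \text{if } A \text{ is true}, \\ 0, & \text{otherwise}. \end{cases}$$
Let $t = (t_1, \dots, t_n)$ be a uniformly random permutation of $\{1, 2, \dots, n\}$ and
$$\xi_n = \sum_{i=1}^{n-1} I_{i,i+1}, \quad (n > 1),$$
where $I_{i,j} = \mathbb{I}(t_i > t_j)$, so that $\xi_n$ is the number of descents of $t$. We have the following facts:
$$P(I_{i,i+1} = 1) = \frac{1}{2}, \tag{1}$$
$$P(I_{i,i+1} I_{j,j+1} = 1) = \begin{cases} \dfrac{1}{6}, & |i-j| = 1, \\[2mm] \dfrac{1}{4}, & |i-j| > 1. \end{cases} \tag{2}$$
Let $I_{i,j,k} = \mathbb{I}(t_i > t_j > t_k)$. Then
$$P(I_{i,i+2,i+1} I_{j,j+2,j+1} = 1) \begin{cases} = 0, & |i-j| = 1, \\[1mm] \le \dfrac{1}{36}, & |i-j| > 1, \end{cases} \tag{3}$$
with equality in the second case when $|i-j| \ge 3$, since the two triples of positions are then disjoint.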
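Thus from (1),
$$\mathbb{E}(\xi_n) = \sum_{i=1}^{n-1} P(I_{i,i+1} = 1) = \frac{n-1}{2}.$$
From (2),
$$\mathbb{E}(\xi_n^2) = \sum_{i=1}^{n-1} \mathbb{E}(I_{i,i+1}) + \sum_{i \ne j} \mathbb{E}(I_{i,i+1} I_{j,j+1}) = \frac{n-1}{2} + \frac{(n-2)(n-3)}{4} + \frac{2(n-2)}{6} = \frac{3n^2 - 5n + 4}{12}, \tag{4}$$
and thus
$$\mathrm{Var}(\xi_n) = \frac{3n^2 - 5n + 4}{12} - \frac{(n-1)^2}{4} = \frac{n+1}{12}.$$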
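The mean $(n-1)/2$ and variance $(n+1)/12$ of the number of descents can be confirmed by brute-force enumeration for small $n$. The Python sketch below is only a sanity check under the assumption that the permutation is uniformly distributed; the function name `descent_moments` is illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def descent_moments(n):
    """Exact mean and variance of the number of descents of a uniform permutation of {1..n}."""
    total = total_sq = 0
    for t in permutations(range(1, n + 1)):
        d = sum(t[i] > t[i + 1] for i in range(n - 1))  # number of descents of t
        total += d
        total_sq += d * d
    mean = Fraction(total, factorial(n))
    var = Fraction(total_sq, factorial(n)) - mean ** 2
    return mean, var

for n in range(2, 8):
    mean, var = descent_moments(n)
    print(n, mean, var, mean == Fraction(n - 1, 2), var == Fraction(n + 1, 12))
```

Enumeration costs $n!$ steps, so this check is practical only for small $n$.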
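Since $\mathbb{I}(A^c) = 1 - \mathbb{I}(A)$, writing $I^c_{i,j} = 1 - I_{i,j} = \mathbb{I}(t_i < t_j)$ we have
$$\mathrm{Var}\Big(\sum_{i=1}^{n-1} I^c_{i,i+1}\Big) = \mathrm{Var}(n - 1 - \xi_n) = \frac{n+1}{12}. \tag{5}$$
Let $I^c_{i,j,k} = \mathbb{I}(t_i < t_j < t_k)$. By the symmetry $t_i \mapsto n + 1 - t_i$, the bounds in (3) also hold for the indicators $I^c_{i,i+2,i+1}$, and from (3),
$$\mathbb{E}\Big(\Big(\sum_{i=1}^{n-2} I^c_{i,i+2,i+1}\Big)^2\Big) \le \frac{n-2}{6} + \frac{(n-3)(n-4)}{36} = \frac{n^2 - n}{36}.$$
Hence
$$\mathrm{Var}\Big(\sum_{i=1}^{n-2} I^c_{i,i+2,i+1}\Big) \le \frac{n^2 - n}{36} - \frac{(n-2)^2}{36} = \frac{3n - 4}{36}. \tag{6}$$
In the same manner,
$$\mathrm{Var}\Big(\sum_{i=2}^{n-1} I^c_{i,i-1,i+1}\Big) \le \frac{3n - 4}{36}. \tag{7}$$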
Theorem 1 [4] Let $X$ be a nonnegative random variable with mean $\mu$ and variance $\sigma^2$, respectively, both finite and positive. Suppose $X^s$ has the size-biased distribution with respect to $X$ and satisfies $|X^s - X| \le C$ for some constant $C > 0$ with probability one. Let $A = C\mu/\sigma^2$.
If $X^s \ge X$ with probability one, then
$$F_X(\mu - \sigma t) \le \exp\Big(-\frac{t^2}{2A}\Big) \quad \text{for all } t > 0.$$
If the moment generating function $m(\theta) = \mathbb{E}(e^{\theta X})$ is finite at $\theta = 2/C$, then
$$1 - F_X(\mu + \sigma t) \le \exp\Big(-\frac{t^2}{2(A + Bt)}\Big) \quad \text{for all } t > 0, \text{ where } B = C/(2\sigma).$$
Such concentration of measure results are applied to a number of new examples: the number of relatively ordered subsequences of a random permutation, sliding window statistics including the number of $m$-runs in a sequence of coin tosses, the number of local maxima of a random function on a lattice, the number of urns containing exactly one ball in an urn allocation model, and the volume covered by the union of $n$ balls placed uniformly over a volume-$n$ subset of $\mathbb{R}^d$.
3. Main Results
In this section, an upper bound on the Kolmogorov distance between the distribution of a sum of indicator random variables and the standard normal distribution is obtained by using the size-bias method. Lower and upper bounds for the distribution function of the sum of indicator random variables at two special points are also given.
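The Wasserstein distance between any two probability measures $\mu$ and $\nu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is defined as follows:
$$\mathrm{dis}_W(\mu, \nu) = \sup_{h \in H} \Big| \int_{\mathbb{R}} h(x)\, d\mu(x) - \int_{\mathbb{R}} h(x)\, d\nu(x) \Big|,$$
where $H := \{h : \mathbb{R} \to \mathbb{R} : |h(x) - h(y)| \le |x - y|\}$.
For random variables $X$ and $Y$, the Kolmogorov distance between their distributions is defined as
$$\mathrm{dis}_K(X, Y) = \sup_{x} |F_X(x) - F_Y(x)|.$$
Also, for a random variable $X$ with Lebesgue density bounded by $C$ [7],
$$\mathrm{dis}_K(X, Y) \le \sqrt{2C\, \mathrm{dis}_W(X, Y)}. \tag{8}$$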
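To make the Kolmogorov distance concrete, the Python sketch below computes it exactly between the standardized number of descents $(\xi_n - (n-1)/2)/\sqrt{(n+1)/12}$ and a standard normal variable. It relies on two ingredients that are not in the paper and are assumptions of this sketch: the Eulerian-number recurrence $A(n,k) = (k+1)A(n-1,k) + (n-k)A(n-1,k-1)$ for the number of permutations with $k$ descents, and the fact that for a discrete law the supremum is attained at an atom, either at the value of the distribution function or at its left limit. All function names are illustrative.

```python
from math import erf, sqrt

def eulerian_row(n):
    """Row [A(n,0), ..., A(n,n-1)]; A(n,k) counts permutations of {1..n} with k descents."""
    row = [1]
    for m in range(2, n + 1):
        new = [0] * m
        for k in range(m):
            left = row[k] if k < m - 1 else 0
            right = row[k - 1] if k >= 1 else 0
            new[k] = (k + 1) * left + (m - k) * right
        row = new
    return row

def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def kolmogorov_to_normal(n):
    """sup_x |F_T(x) - Phi(x)| for the standardized descent count T."""
    row = eulerian_row(n)
    total = sum(row)  # equals n!
    mu, sigma = (n - 1) / 2, sqrt((n + 1) / 12)
    dist, cum = 0.0, 0
    for k, a in enumerate(row):
        z = phi_cdf((k - mu) / sigma)
        below = cum / total   # left limit of F at the atom k
        cum += a
        at = cum / total      # value of F at the atom k
        dist = max(dist, abs(below - z), abs(at - z))
    return dist

for n in (10, 50, 200):
    print(n, kolmogorov_to_normal(n))
```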
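Let $X$ be a non-negative random variable with $\mathbb{E}(X) < \infty$. Let $X^s$ have the size-biased distribution with respect to $X$. If
$$T_X = \frac{X - \mathbb{E}(X)}{\sqrt{\mathrm{Var}(X)}}$$
and $Z \sim N(0,1)$, then [6,7]:
$$\mathrm{dis}_W(T_X, Z) \le \frac{\mathbb{E}(X)}{\mathrm{Var}(X)} \sqrt{\frac{2}{\pi}} \sqrt{\mathrm{Var}(\mathbb{E}(X^s - X \mid X))} + \frac{\mathbb{E}(X)}{\mathrm{Var}(X)^{3/2}}\, \mathbb{E}((X^s - X)^2). \tag{9}$$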
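Using Jensen's inequality for $f(x) = x^2$,
$$\mathrm{Var}(\mathbb{E}(X \mid \mathcal{G}_1)) \le \mathrm{Var}(\mathbb{E}(X \mid \mathcal{G}_2)), \tag{10}$$
where $\mathcal{G}_1, \mathcal{G}_2$ are two sigma-fields satisfying $\mathcal{G}_1 \subseteq \mathcal{G}_2$ [3]. Thus, if $\mathcal{F} = \sigma(I_{1,2}, \dots, I_{n-1,n})$, then $\sigma(\xi_n) \subseteq \mathcal{F}$.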
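Theorem 2 Suppose $Z \sim N(0,1)$ and
$$T_n = \frac{\xi_n - \frac{n-1}{2}}{\sqrt{\frac{n+1}{12}}}.$$
Then
$$\mathrm{dis}_K(T_n, Z) \le \left( \frac{3\sqrt{3}}{\sqrt{n+1}} \sqrt{\frac{2}{\pi}} + \frac{12\sqrt{3}\,(n-1)}{(n+1)^{3/2}} \right)^{1/2}.$$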
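To get a feeling for the size of this bound, the Python sketch below evaluates it for several $n$. It assumes the bound of Theorem 2 reads $\big(3\sqrt{3}\sqrt{2/\pi}/\sqrt{n+1} + 12\sqrt{3}(n-1)/(n+1)^{3/2}\big)^{1/2}$, which is the form obtained by inserting $\mathbb{E}(\xi_n) = (n-1)/2$ and $\mathrm{Var}(\xi_n) = (n+1)/12$ into (9) and (8); the function name `theorem2_bound` is illustrative.

```python
from math import pi, sqrt

def theorem2_bound(n):
    """Assumed form of the Theorem 2 upper bound on dis_K(T_n, Z)."""
    first = 3 * sqrt(3) * sqrt(2 / pi) / sqrt(n + 1)
    second = 12 * sqrt(3) * (n - 1) / (n + 1) ** 1.5
    return sqrt(first + second)

for n in (10**2, 10**3, 10**4, 10**6):
    print(n, round(theorem2_bound(n), 4))
```

Under this reading the bound exceeds 1 at $n = 100$ and falls below 1 at $n = 1000$, and it decays at the rate $n^{-1/4}$, so it is informative only for fairly large $n$.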
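Proof. Choose an index $J$ uniformly at random from the set $\{1, \dots, n-1\}$, then size-bias $I_{J,J+1}$ by setting it equal to one, and take the remaining summands conditional on $I_{J,J+1} = 1$. We can realize $I_{J,J+1} = 1$ by adjusting the order of $t_J$ and $t_{J+1}$ such that $t_J > t_{J+1}$, and $\xi_n^s$ denotes the number of descents in $t$ after adjusting the order of $t_J$ and $t_{J+1}$. Write $I^c_{i,j} = \mathbb{I}(t_i < t_j)$ and $I^c_{i,j,k} = \mathbb{I}(t_i < t_j < t_k)$. Then for $J = 1$,
$$M_1 := \xi_n^s - \xi_n = (1 - I_{1,2})(1 + I_{1,3} - I_{2,3}) = I^c_{1,2} - I^c_{1,3,2},$$
for $J = n-1$,
$$M_{n-1} := \xi_n^s - \xi_n = (1 - I_{n-1,n})(1 + I_{n-2,n} - I_{n-2,n-1}) = I^c_{n-1,n} - I^c_{n-1,n-2,n},$$
and for $2 \le J \le n-2$,
$$M_J := \xi_n^s - \xi_n = (1 - I_{J,J+1})(1 + I_{J-1,J+1} - I_{J-1,J} + I_{J,J+2} - I_{J+1,J+2}) = I^c_{J,J+1} - I^c_{J,J+2,J+1} - I^c_{J,J-1,J+1}.$$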
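From (5), (6) and (7),
$$\mathrm{Var}(\mathbb{E}(\xi_n^s - \xi_n \mid \mathcal{F})) = \frac{1}{(n-1)^2} \mathrm{Var}\Big(M_1 + \sum_{i=2}^{n-2} M_i + M_{n-1}\Big)$$
$$= \frac{1}{(n-1)^2} \mathrm{Var}\Big(\sum_{i=1}^{n-1} I^c_{i,i+1} - \sum_{i=1}^{n-2} I^c_{i,i+2,i+1} - \sum_{i=2}^{n-1} I^c_{i,i-1,i+1}\Big)$$
$$\le \frac{3}{(n-1)^2} \Big(\mathrm{Var}\Big(\sum_{i=1}^{n-1} I^c_{i,i+1}\Big) + \mathrm{Var}\Big(\sum_{i=1}^{n-2} I^c_{i,i+2,i+1}\Big) + \mathrm{Var}\Big(\sum_{i=2}^{n-1} I^c_{i,i-1,i+1}\Big)\Big)$$
$$\le \frac{3}{(n-1)^2} \Big(\frac{n+1}{12} + 2 \cdot \frac{3n-4}{36}\Big) = \frac{9n - 5}{12(n-1)^2}.$$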
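Also, since $|M_i| \le 1$ for every $i$,
$$\mathbb{E}((\xi_n^s - \xi_n)^2) = \mathbb{E}(\mathbb{E}((\xi_n^s - \xi_n)^2 \mid \mathcal{F})) = \frac{1}{n-1} \mathbb{E}\Big(M_1^2 + \sum_{i=2}^{n-2} M_i^2 + M_{n-1}^2\Big) \le 1.$$
The proof is completed by applying (10) and (9), and then (8) with $C = 1/2$, since
$$\frac{1}{\sqrt{2\pi}} e^{-x^2/2} \le \frac{1}{2}$$
for all $x \in \mathbb{R}$.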
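Suppose $Z \sim N(0,1)$. It is obvious that for $0 < z < 1$,
$$F_Z\Big(\frac{(n-1)(z-1)}{2\sqrt{(n+1)/12}}\Big) \to 0, \quad n \to \infty, \tag{11}$$
and for $z > 1$,
$$F_Z\Big(\frac{(n-1)(z-1)}{2\sqrt{(n+1)/12}}\Big) \to 1, \quad n \to \infty. \tag{12}$$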
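Theorem 3 We have
$$\lim_{n \to \infty} F_{\xi_n}\Big(\frac{n-1}{2} s\Big) = \begin{cases} 1, & s > 1, \\ 0, & s < 1. \end{cases}$$
Proof. Since $\xi_n \ge 0$, we have $F_{\xi_n}\big(\frac{n-1}{2} s\big) = 0$ for $s < 0$, and $F_{\xi_n}(0) = 1/n! \to 0$. Also
$$F_{\xi_n}\Big(\frac{n-1}{2} s\Big) = F_{T_n}\Big(\frac{(n-1)(s-1)}{2\sqrt{(n+1)/12}}\Big).$$
From Theorem 2 and the definition of the Kolmogorov distance,
$$\Big| F_{T_n}\Big(\frac{(n-1)(s-1)}{2\sqrt{(n+1)/12}}\Big) - F_Z\Big(\frac{(n-1)(s-1)}{2\sqrt{(n+1)/12}}\Big) \Big| = O\Big(\frac{1}{n^{1/4}}\Big).$$
From (11) and (12), the proof is completed.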
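The limit in Theorem 3 can be observed numerically. The Python sketch below computes $F_{\xi_n}((n-1)s/2)$ exactly for $s = 0.9$ and $s = 1.1$ and growing $n$. It assumes the Eulerian-number recurrence $A(n,k) = (k+1)A(n-1,k) + (n-k)A(n-1,k-1)$ for the number of permutations of $\{1,\dots,n\}$ with $k$ descents, which is not part of the paper; the function names are illustrative.

```python
from fractions import Fraction

def eulerian_row(n):
    """Row [A(n,0), ..., A(n,n-1)]; A(n,k) counts permutations of {1..n} with k descents."""
    row = [1]
    for m in range(2, n + 1):
        new = [0] * m
        for k in range(m):
            left = row[k] if k < m - 1 else 0
            right = row[k - 1] if k >= 1 else 0
            new[k] = (k + 1) * left + (m - k) * right
        row = new
    return row

def cdf_at_scaled_point(n, s):
    """P(xi_n <= (n-1)*s/2), computed exactly for a uniform random permutation."""
    row = eulerian_row(n)
    threshold = Fraction(n - 1) * Fraction(s) / 2
    favorable = sum(a for k, a in enumerate(row) if k <= threshold)
    return Fraction(favorable, sum(row))

for n in (25, 100, 400):
    below = float(cdf_at_scaled_point(n, Fraction(9, 10)))
    above = float(cdf_at_scaled_point(n, Fraction(11, 10)))
    print(n, below, above)
```

The first printed column should decrease towards 0 and the second should increase towards 1 as $n$ grows.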
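Theorem 4 For $s > 0$,
$$F_{\xi_n}((n-1)(s + 1/2)) \ge 1 - \exp\Big(-\frac{(n-1)s^2}{1+s}\Big)$$
and
$$F_{\xi_n}((n-1)(1/2 - s)) \le \exp(-(n-1)s^2).$$
Proof. The inequalities are proved with the selection
$$t = \frac{(n-1)s}{\sqrt{(n+1)/12}}$$
in Theorem 1, since $|\xi_n^s - \xi_n| \le 1$. Indeed, with $C = 1$, $\mu = (n-1)/2$ and $\sigma^2 = (n+1)/12$ we have $\sigma t = (n-1)s$, $t^2/(2A) = (n-1)s^2$ and $t^2/(2(A + Bt)) = (n-1)s^2/(1+s)$.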
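The two inequalities of Theorem 4 can be checked against the exact distribution of the number of descents for small $n$. The Python sketch below assumes the inequalities read $F_{\xi_n}((n-1)(s+1/2)) \ge 1 - \exp(-(n-1)s^2/(1+s))$ and $F_{\xi_n}((n-1)(1/2-s)) \le \exp(-(n-1)s^2)$, and it uses the Eulerian-number recurrence for the exact law, which is an ingredient added here and not taken from the paper. The grid of values of $n$ and $s$ and all function names are illustrative.

```python
from fractions import Fraction
from math import exp

def eulerian_row(n):
    """Row [A(n,0), ..., A(n,n-1)]; A(n,k) counts permutations of {1..n} with k descents."""
    row = [1]
    for m in range(2, n + 1):
        new = [0] * m
        for k in range(m):
            left = row[k] if k < m - 1 else 0
            right = row[k - 1] if k >= 1 else 0
            new[k] = (k + 1) * left + (m - k) * right
        row = new
    return row

def descent_cdf(n, x):
    """F(x) = P(xi_n <= x), exactly, for a uniform random permutation of {1..n}."""
    row = eulerian_row(n)
    return Fraction(sum(a for k, a in enumerate(row) if k <= x), sum(row))

def check_theorem4(n, s):
    """Return (lower bound holds, upper bound holds) at the two special points."""
    upper_point = (n - 1) * (s + Fraction(1, 2))
    lower_point = (n - 1) * (Fraction(1, 2) - s)
    lower_bound = 1 - exp(-(n - 1) * float(s) ** 2 / (1 + float(s)))
    upper_bound = exp(-(n - 1) * float(s) ** 2)
    return (float(descent_cdf(n, upper_point)) >= lower_bound,
            float(descent_cdf(n, lower_point)) <= upper_bound)

results = [check_theorem4(n, Fraction(j, 20)) for n in (5, 10, 20, 40) for j in range(1, 11)]
print(all(a and b for a, b in results))
```

A finite grid such as this one can only illustrate the inequalities; it does not replace the proof.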
References
[1] Arratia, R. and Goldstein, L., Size bias, sampling,
the waiting time paradox, and infinite divisibility:
when is the increment independent?, Available at
http://bcf.usc.edu/~larry/papers/pdf/csb.pdf, 2009.
[2] Arratia, R., Goldstein, L. and Kochman, F.,
Size-bias for one and all, Preprint. Available at
arXiv:1308.2729, 2013.
[3] Billingsley, P., Probability and Measure,
John Wiley and Sons, New York, 1885.
[4] Ghosh, S. and Goldstein, L., Concentration of
measures via size-biased couplings, Probability
Theory and Related Fields, 2011, 149, 271-278.
[5] Ghosh, S. and Goldstein, L., Applications of
size biased couplings for concentration of
measures, Electronic Communications in
Probability, 2011, 16, 70-83.
[6] Goldstein, L. and Rinott, Y., Multivariate
normal approximations by Stein’s method and
size bias couplings, Journal of Applied
Probability, 1996, 33(1), 1-17.
[7] Ross, N., Fundamentals of Stein’s method,
Probability Surveys, 2011, 8, 210-293.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The author contributed in the present research, at all
stages from the formulation of the problem to the
final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The author has no conflict of interest to declare that
is relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US