Positive Definite Kernels for Partitions
JYRKO CORREA-MORRIS
Mathematics and Natural Sciences Department
Miami Dade College, Padron Campus
627 SW 27 Ave, Miami, Florida
USA
Abstract: This paper presents a comprehensive exploration of various families of positive definite kernels for
comparing partitions. It not only reviews existing examples from the literature but also introduces novel classes
of positive definite kernels. These new classes include kernels based on agreement and ones designed using the
concept of hidden variables. The study also focuses on assessing the compatibility of these kernels with structural
properties that capture the intrinsic notion of proximity between partitions. Notably, agreement-based kernels are
demonstrated to align well with this notion. Moreover, the paper provides two generic procedures for designing
hidden-feature-based kernels that also adhere to the specified structural properties.
Key-Words: clustering, positive definite kernels, lattice of partitions
Received: May 11, 2023. Revised: August 12, 2023. Accepted: September 23, 2023. Published: October 9, 2023.
1 Introduction
1.1 Overview.
Since the turn of the century, there has been a signif-
icant increase in interest in exploring existing clus-
tering methods, [1], [2], [3], within the field of pat-
tern recognition. This surge in interest can be traced
back to J. Kleinberg’s seminal paper on an impossi-
bility theorem for clustering, [4], which has served as
a catalyst for other formal studies, [5], [6], [7].
The exploration of measures for comparing par-
titions has also been influenced by this growing in-
terest. Evidently, notable contributions have been
made, [8], [9], [10], [11]. There seems to be a con-
sensus regarding the existence of fundamental princi-
ples that govern the natural proximity of partitions,
particularly when dealing with structural data. For
instance, when comparing graphs or trees, the edit distance, despite its computational complexity, is
often considered the natural proximity measure, and
proposed measures are evaluated based on their com-
patibility with this intrinsic paradigm, [12]. In this pa-
per, we focus on studying an almost unexplored class
of measures: that of positive definite kernels for com-
paring partitions. A similarity measure k : X × X → ℝ is said to be a positive definite kernel when the dataset X can be embedded into a high-dimensional Euclidean (possibly an infinite-dimensional Hilbert) space H by means of a map φ such that the similarity k(x, x′) between the original data x and x′ matches the dot product ⟨φ(x), φ(x′)⟩_H of the vectors φ(x) and φ(x′) in H, [13]. The map φ : X → H, the space H, and the vectors φ(x), x ∈ X, are referred to as the feature map, feature space, and feature vectors, respectively.
Kernel-based methods have emerged as powerful
tools for efficiently handling complex data such as
strings, graphs, images, documents, and multimodal
data, [14]. The key insight is that by computing sim-
ilarities as dot products in a high-dimensional feature
space H without explicitly visiting it, linear models
can be effectively applied in non-linear settings, [13].
To achieve this, a nonlinear feature map φ is utilized
to map the data into the high-dimensional space H,
where linear models can be employed. This approach
allows for the utilization of additional structures of the
feature space (e.g., vector space structure, topological
structure) that may not be present in the original data
space. As a result, operations like arithmetical oper-
ations on feature vectors can be extended to graphs,
histograms, and partitions, providing greater flexibil-
ity in handling diverse data types.
However, the success of kernel-based methods
largely depends on the choice of the kernel measure,
which should align with the specific application and
data characteristics.
In the context of clustering, the term “clustering
kernel” is often associated with the kernel K-means
method, [15]. This method involves mapping the
original data X into the feature space H, enabling K-
means to be performed in the transformed space us-
ing an implicit nonlinear map φ. A challenge with
this approach is that the explicit knowledge of the
feature map is not available, making the updating
of centroids cumbersome. Nonetheless, kernel K-
means has demonstrated empirical success in iden-
tifying complex clusters that are non-linearly sepa-
rable in the original data space. Furthermore, spec-
tral clustering, which leverages eigenvectors of a cer-
tain matrix computed from the similarity matrix, has
been shown to be a special case of kernel K-means,
[16]. Although variants of this method exist (e.g.,
[17], [18], [19], [20]), their contributions to kernel
construction and design are somewhat limited.
In contrast, a cluster ensemble algorithm (see,
for instance, [21]) that utilized positive definite ker-
nels to compare partitions and construct average-
based consensus functions was reported, [22]. The
authors introduced “Set significance-based kernels”
for this purpose. Recognizing that maximizing con-
sensus functions can be computationally expensive,
even with simple measures like the Rand Index,
[23], the authors transformed the consensus prob-
lem from the space of partitions to the feature space,
which is a high-dimensional Euclidean space H. This
shift allowed for the use of well-known and eas-
ily computable solutions to the problem of maximiz-
ing average-based consensus functions in the feature
space. The article presented a novel approach to
tackle the cluster ensemble problem, reducing the task
of maximizing consensus functions to a kernel preim-
age problem. In other words, the consensus solution,
computed exactly in the feature space, needed to be
brought back to the space of partitions. However, [22]
made limited progress in solving the preimage prob-
lem, as it heavily depended on the specific kernel
used. Recent contributions in the field have shown
significant advancements towards achieving this goal
for specific kernels. For instance, [24] focused on
consensus functions defined from the Rand index,
while [25] concentrated on prototype-based kernels.
Nevertheless, this area of research remains rich with
several unanswered questions.
Driven by these inquiries, this paper presents sev-
eral classes of positive kernels for partitions and in-
vestigates their behavior by analyzing structural prop-
erties that emulate the inherent proximity between
partitions.
1.2 The natural notion of proximity
between partitions
Let S be a finite dataset. A partition p of S is a collection of nonempty subsets {C1, C2, . . . , Cs} of S, called the clusters of p, such that S = C1 ∪ C2 ∪ · · · ∪ Cs and Ci ∩ Cj = ∅ if i ≠ j. The set of all partitions of S (henceforth denoted by P_S) is endowed with the refinement order of partitions. The partition p is said to refine the partition p′ if, and only if, every cluster of p′ is the union of some clusters of p. In particular, every cluster of p is contained in some cluster of p′. The notation p ⪯ p′ means p refines p′.
Also, P_S is endowed with two operations: the meet ∧ and the join ∨, which assign to any two given partitions p and p′ the coarsest partition p ∧ p′ that refines both p and p′ and the finest partition p ∨ p′ simultaneously refined by p and p′, respectively. Two elements x, x′ ∈ S are in the same cluster of p ∧ p′ when they are in the same cluster of p and in the same cluster of p′. Two elements x, x′ ∈ S are placed in the same cluster of p ∨ p′ when there is a sequence x = x_{i1}, x_{i2}, . . . , x_{ik} = x′ such that, for all j ∈ {1, 2, . . . , k − 1}, x_{ij} and x_{ij+1} are either placed in the same cluster of p or in the same cluster of p′.
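For concreteness, both lattice operations can be computed directly from the cluster lists. The following minimal Python sketch (not taken from the paper; representing partitions as lists of frozensets is an assumption made here purely for illustration) builds the meet as the collection of nonempty pairwise intersections and the join by merging clusters that share elements.

```python
def meet(p, q):
    """Coarsest partition refining both p and q: nonempty pairwise intersections."""
    return [c & d for c in p for d in q if c & d]

def join(p, q):
    """Finest partition simultaneously refined by both p and q (union-find merge)."""
    parent = {x: x for cluster in p for x in cluster}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for cluster in list(p) + list(q):
        members = sorted(cluster)
        for a, b in zip(members, members[1:]):   # link elements sharing a cluster
            parent[find(a)] = find(b)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return [frozenset(g) for g in groups.values()]

p = [frozenset({1, 2}), frozenset({3, 4})]
q = [frozenset({1, 2, 3}), frozenset({4})]
print(meet(p, q))   # [frozenset({1, 2}), frozenset({3}), frozenset({4})]
print(join(p, q))   # [frozenset({1, 2, 3, 4})]
```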
P_S can be spatially organized into layers of partitions such that only partitions with the same number of clusters are placed in the same layer: (1) Place the all-singleton partition in the bottom layer. (2) A partition p′ covers another partition p when p refines p′ and there is no partition p′′, other than p and p′, that is refined by p and refines p′. In the kth layer, place all partitions that cover some partition placed in the (k−1)th layer. (3) A line segment is drawn between two partitions p and p′ if one of them covers the other. (The resulting diagram can be seen here: https://blogs.ams.org/visualinsight/2015/06/15/lattice-of-partitions/)
The underlying notion of proximity among partitions is associated to this spatial organization. Indeed, since the Hasse diagram is a connected graph, we can reach one partition from another by exclusively traveling along the edges of this diagram; the shorter the path connecting two partitions, the greater the similarity between these partitions. To see some of the characteristic features of such a proximity, let us start by considering an ascending path starting at a partition p. Note that every partition in this path is not only refined by p, but by all the partitions that precede it in the path. Thus, if p′ and p′′ are two partitions in this path such that p′ is reached before p′′ when traveling this path from p, we have that p refines both p′ and p′′, p′ refines p′′, and, obviously, p is closer to p′ than to p′′. Since three partitions p ⪯ p′ ⪯ p′′ are always part of an ascending path starting at p, such a distinguishing quality of the natural proximity can be formally stated as follows:
Bottom-Up Collinearity: If p ⪯ p′ ⪯ p′′, then p is more similar to p′ than to p′′. Thus, it is expected that any similarity measure k for comparing partitions (in particular, positive kernels) satisfies: k(p, p′) ≥ k(p, p′′), provided that p ⪯ p′ ⪯ p′′.
Analogously, we can take instead a descending path starting at a partition p′′. All partitions that we find along the way not only refine p′′ but all the partitions that appear before them in this path. Consequently, if the partition p is farther from p′′ in this path than another partition p′, then p′′ is refined by both p and p′, and p′ is refined by p. This distinguishing property can be formally formulated as follows:
Top-Down Collinearity: If p ⪯ p′ ⪯ p′′, then p′′ is more similar to p′ than to p. Accordingly, it is expected that any similarity measure k for comparing partitions (in particular, positive kernels) satisfies: k(p′′, p′) ≥ k(p′′, p), as long as p ⪯ p′ ⪯ p′′.
The two properties above only involve partitions
either in an ascending path or in a descending path of the diagram that describes the natural spatial organization of partitions. As we saw above, these conditions require, for any two considered partitions, that one refines the other. How about partitions p and p′ such that neither of them refines the other? Note that, in such a case, any path from one of these partitions to the other either ascends first and then descends, or descends first and then ascends. The partition at which this path turns back happens to be the join p ∨ p′ of p and p′ or the meet p ∧ p′ of p and p′, respectively. Thus, the similarity between two partitions p and p′ is at most equal to the highest similarity value from among the similarity values between the partitions p and p ∨ p′ and between the partitions p and p ∧ p′. This leads us to the following property.
Meet-Join Predominance: The similarity between two partitions p and p′ never exceeds the maximum value from among the similarity between p and p ∧ p′ and the similarity between p and p ∨ p′. Thus, it is expected that, for any partitions p and p′, k(p, p′) ≤ max{k(p, p ∧ p′), k(p, p ∨ p′)}. When, for any two partitions, the previous inequality always holds with the meet (resp. with the join), we name this property Meet Predominance (resp. Join Predominance).
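These three properties can be checked mechanically for any candidate similarity. The following small sketch (illustrative only; it reuses the meet and join helpers from the sketch above, and the function names are assumptions of this illustration) states the inequalities directly.

```python
def check_collinearity(k, p, p1, p2):
    """For a chain p <= p1 <= p2: Bottom-Up and Top-Down Collinearity."""
    bottom_up = k(p, p1) >= k(p, p2)
    top_down = k(p2, p1) >= k(p2, p)
    return bottom_up and top_down

def check_meet_join_predominance(k, p, q):
    """k(p, q) never exceeds the larger of k(p, meet(p, q)) and k(p, join(p, q))."""
    return k(p, q) <= max(k(p, meet(p, q)), k(p, join(p, q)))
```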
1.3 Contributions and organization
The main contributions of this paper are: (1) The
class of agreement-based kernels is introduced and
formally studied. In particular, the Rand kernel and consensus-based kernels are presented as paradigms of this family. (2) Kernels designed from a set of hidden features are investigated. Special attention is dedicated to set-significance-based kernels and diameter-based kernels. (3) The compliance of positive definite kernels with the intrinsic notion of proximity between partitions is examined in detail. Analogous properties
have been previously employed to characterize met-
rics used for comparing partitions, [11]. In that work,
these properties were applied to study the behavior of
the metrics. Additionally, the compatibility between
these properties and submodularity has been investi-
gated, [26].
Aside from the Introduction, the paper is organized in five additional sections. In Section 2, agreement-based kernels are introduced and their theoretical foundation is established. Section 3 introduces and studies the class of kernels designed from hidden features. In particular, two main procedures are presented for designing hidden features whose associated kernel complies with the intrinsic notion of proximity between partitions. Section 4
shows that prototype-based kernels do not respect the
natural notion of proximity between partitions. Sec-
tion 5 is devoted to the conclusions and future work,
while the proofs of the main results are included in the
appendix.
2 Agreement-based Kernels
In this section, J denotes the set of all possible unordered pairs of elements in S while, for a given partition p of S, J(p) denotes the set of those pairs in J that lie in the same cluster of p. For two partitions p and p′ of S, there are the sets: N00 = J(p) ∩ J(p′), N01 = J(p) ∩ J(p′)^C, N10 = J(p)^C ∩ J(p′), and N11 = J(p)^C ∩ J(p′)^C, where the superscript C indicates the complement of the set. The sizes of these sets are denoted by n00, n01, n10, and n11, respectively.
If a pair of objects is either in N00 or N11 (resp. N01 or N10), then both partitions p and p′ agree (resp. differ) with regard to whether these objects should be placed in the same cluster or not. Thus, n00 + n11 (resp. n01 + n10) represents the total number of agreements (resp. disagreements) between p and p′. Note also that every pair belongs to exactly one of these sets and therefore $\binom{n}{2} = n_{00} + n_{01} + n_{10} + n_{11}$. The following identities hold:
$$n_{00} = \sum_{i=1}^{s}\sum_{j=1}^{r}\binom{n_{ij}}{2};$$
$$n_{01} = \sum_{i=1}^{s}\binom{n_i}{2} - \sum_{i=1}^{s}\sum_{j=1}^{r}\binom{n_{ij}}{2};$$
$$n_{10} = \sum_{j=1}^{r}\binom{n'_j}{2} - \sum_{i=1}^{s}\sum_{j=1}^{r}\binom{n_{ij}}{2};$$
$$n_{11} = \binom{n}{2} - n_{00} - n_{01} - n_{10};$$
where n_i, n′_j, and n_ij denote the sizes of the ith cluster of p (1 ≤ i ≤ s), the jth cluster of p′ (1 ≤ j ≤ r), and the overlap between the ith cluster of p and the jth cluster of p′, respectively.
To define an agreement-based kernel, we consider non-negative functions ω, ν : J → ℝ₊ (not necessarily different), which associate a non-negative weight to each pair of objects {x, x′}. The agreement-based kernel corresponding to the weight functions ω and ν is defined by:
$$k_{\omega\nu}(p,p') := \sum_{\{x,x'\}\in N_{00}} \omega(x,x') + \sum_{\{x,x'\}\in N_{11}} \nu(x,x'). \tag{1}$$
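A direct way to evaluate Eq. (1) is to enumerate the unordered pairs and accumulate ω over N00 and ν over N11. The sketch below is an unoptimized Python illustration under the assumption that partitions are given as lists of sets and that the weight functions take the two elements of a pair as arguments; none of these names come from the paper.

```python
from itertools import combinations

def same_cluster(p, x, y):
    """True if x and y lie in the same cluster of partition p."""
    return any(x in c and y in c for c in p)

def agreement_kernel(p, q, elements, omega, nu):
    total = 0.0
    for x, y in combinations(sorted(elements), 2):
        in_p, in_q = same_cluster(p, x, y), same_cluster(q, x, y)
        if in_p and in_q:            # pair in N00: together in both partitions
            total += omega(x, y)
        elif not in_p and not in_q:  # pair in N11: separated in both partitions
            total += nu(x, y)
    return total

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
# With unit weights this reduces to the Rand kernel: RK(p, q) = n00 + n11 = 3.
print(agreement_kernel(p, q, {1, 2, 3, 4}, lambda x, y: 1, lambda x, y: 1))  # 3.0
```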
Proposition 1. Any agreement kernel kων is a posi-
tive definite kernel.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.77
Jyrko Correa-Morris
E-ISSN: 2224-2880
704
Volume 22, 2023
Agreement-based kernels are compatible with the
intrinsic notion of proximity on the lattice of parti-
tions of S.
Proposition 2. Any agreement kernel kων satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance. However, these kernels fail to satisfy Join Predominance.
2.1 Rand Index
The Rand kernel is perhaps the best-known agreement-based kernel. It quantifies the similarity between two partitions p and p′ by counting the total number of agreements, all pairs being considered to have the same weight (i.e., for every pair {x, x′} ∈ J, ω({x, x′}) = ν({x, x′}) = 1). Thus,
$$RK(p,p') = n_{11} + n_{00}. \tag{2}$$
The original Rand index, published in 1971 by W. M. Rand, is given by $RI(p,p') = \frac{n_{11}+n_{00}}{n_{11}+n_{10}+n_{01}+n_{00}}$, and hence $RI(p,p') = \frac{RK(p,p')}{\binom{n}{2}}$. On the other hand, we can normalize any kernel K by setting $NK(p,p') = \frac{K(p,p')}{\sqrt{K(p,p)\,K(p',p')}}$. Since $RK(p,p) = RK(p',p') = \binom{n}{2}$, we get $NRK(p,p') = \frac{RK(p,p')}{\binom{n}{2}} = RI(p,p')$, which proves that the original Rand index concurs with the normalized Rand kernel.
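The counting identities above translate directly into code: the four pair counts come from the contingency table, and dividing the Rand kernel by C(n, 2) recovers the original Rand index. A small sketch follows, assuming partitions are given as lists of sets; the names are illustrative.

```python
from math import comb

def pair_counts(p, q, n):
    """Return (n00, n01, n10, n11) for partitions p and q of an n-element set."""
    n00 = sum(comb(len(ci & cj), 2) for ci in p for cj in q)
    n01 = sum(comb(len(ci), 2) for ci in p) - n00
    n10 = sum(comb(len(cj), 2) for cj in q) - n00
    n11 = comb(n, 2) - n00 - n01 - n10
    return n00, n01, n10, n11

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
n00, n01, n10, n11 = pair_counts(p, q, 4)
rand_kernel = n00 + n11                      # RK(p, q)
rand_index  = rand_kernel / comb(4, 2)       # RI(p, q) = RK(p, q) / C(n, 2)
print(rand_kernel, rand_index)               # 3 0.5
```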
To show that agreement-based kernels fail in general to satisfy Join Predominance, let us consider the partitions p = {{x1, x2}, {x3, x4}} and p′ = {{x1, x2, x3}, {x4}} of S = {x1, x2, x3, x4}. Note that p ∨ p′ = {{x1, x2, x3, x4}} and that, for p and p′, N00 = {{x1, x2}}, N01 = {{x3, x4}}, N10 = {{x1, x3}, {x2, x3}}, and N11 = {{x1, x4}, {x2, x4}}; while for p and p ∨ p′, N00 = {{x1, x2}, {x3, x4}} and N10 = {{x1, x3}, {x1, x4}, {x2, x3}, {x2, x4}} (N01 and N11 are empty). This gives RK(p, p′) = 3 and RK(p, p ∨ p′) = 2, which means that p is more similar to p′ than to p ∨ p′.
To end, note that closely related to the Rand kernel is its counterpart, the well-known Symmetric Difference metric:
$$SDM(p,p') = n_{10} + n_{01}; \tag{3}$$
defined as the total number of disagreements. Thus, given partitions p and p′, the equality $RK(p,p') + SDM(p,p') = \binom{n}{2}$ holds. Any positive definite kernel k has an associated metric M_k, which is defined by M_k(p, p′) = k(p, p) + k(p′, p′) − 2k(p, p′). SDM is (up to a constant factor) the metric associated to RK:
$$\begin{aligned} M_{RK}(p,p') &= RK(p,p) + RK(p',p') - 2\,RK(p,p') \\ &= 2\binom{n}{2} - 2\,[n_{11} + n_{00}] \\ &= 2\,[n_{01} + n_{10}] \\ &= 2\,SDM(p,p'). \end{aligned}$$
2.2 Consensus Kernels
The motivation for consensus kernels lies in the os-
tensible benefits of incorporating some prior knowl-
edge. The intuition here is to use the co-association
matrix associated to a given ensemble E to determine the weights of the pairs. Formally, given an ensemble E = {p1, p2, . . . , pm} of partitions of S, the associated co-association matrix is the n × n matrix A_E = (a_ij)_{i,j=1}^{n} whose (i, j)-entry carries the ratio between the number of partitions in E for which the pair {x_i, x_j} is placed in the same cluster and the total number m of partitions in the ensemble.
Defining the weight functions ω and ν from the matrix A_E by ω({x_i, x_j}) = a_ij and ν({x_i, x_j}) = 1 − a_ij, we define the consensus kernel associated to the ensemble E by:
$$k_E(p,p') = \sum_{\{x_i,x_j\}\in N_{00}} a_{ij} + \sum_{\{x_i,x_j\}\in N_{11}} (1 - a_{ij}). \tag{4}$$
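A sketch of Eq. (4): the co-association weight a_ij is estimated from the ensemble as the fraction of its members that place x_i and x_j together, and the agreement kernel is then evaluated with ω = a_ij and ν = 1 − a_ij. The helper same_cluster is repeated here so the snippet stands alone; all names are illustrative assumptions.

```python
from itertools import combinations

def same_cluster(p, x, y):
    return any(x in c and y in c for c in p)

def consensus_kernel(p, q, elements, ensemble):
    total = 0.0
    for x, y in combinations(sorted(elements), 2):
        a = sum(same_cluster(e, x, y) for e in ensemble) / len(ensemble)  # a_ij
        in_p, in_q = same_cluster(p, x, y), same_cluster(q, x, y)
        if in_p and in_q:            # N00: weight by co-association a_ij
            total += a
        elif not in_p and not in_q:  # N11: weight by 1 - a_ij
            total += 1 - a
    return total

ensemble = [[{1, 2}, {3, 4}], [{1, 2, 3}, {4}], [{1}, {2, 3, 4}]]
print(consensus_kernel([{1, 2}, {3, 4}], [{1, 2, 3}, {4}], {1, 2, 3, 4}, ensemble))
```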
3 Another family of positive definite kernels: Hidden features
Hidden features is a term used to designate data features that are not actually observed but rather determined by or inferred from the observed features. Thus, hidden features are associated to functions f_i : X → ℝ that assign to every datum, in our case a partition p, the measurement f_i(p) of the ith hidden feature. Every partition is then represented by the vector φ(p) = (f1(p), f2(p), . . . , f_i(p), . . .). Given a finite or enumerable set of hidden features, a positive definite kernel for comparing partitions can be defined by setting k(p, p′) = Σ_i f_i(p) · f_i(p′).
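In code, such a kernel is nothing more than a dot product of feature measurements; the sketch below takes an arbitrary family of feature functions. The particular features used in the example (cluster-size counts) are an assumption made purely for illustration and are not one of the families studied in this paper.

```python
def hidden_feature_kernel(features, p, q):
    """features: iterable of callables f_i mapping a partition to a real number."""
    return sum(f(p) * f(q) for f in features)

# Example hidden features: number of clusters of each size 1..4 (illustrative only).
features = [lambda p, s=s: sum(1 for c in p if len(c) == s) for s in range(1, 5)]
print(hidden_feature_kernel(features, [{1, 2}, {3, 4}], [{1, 2, 3}, {4}]))  # 0
```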
In this section, we seek sufficient conditions for
these kernels to be in compliance with the natural
proximity between partitions. In this regard, Theorem
1 provides a procedure that enables us to construct
many examples of such kernels.
Theorem 1. Let f_i, i ∈ I (finite set), be hidden features for the partitions of the dataset S. Suppose that the following conditions hold:
(i) For all p ∈ P_S, f_i(p) ≥ 0.
(ii) p ⪯ p′ and f_i(p) > 0 imply f_i(p) ≥ f_i(p′) > 0. Moreover, the equality holds for all hidden features f_i if, and only if, p and p′ are the same partition.
(iii) p ⪯ p′ implies
$$\sum_{i:\,f_i(p)>0} f_i(p) \;-\; \sum_{i:\,f_i(p)>0} f_i(p') \;<\; \sum_{i:\,f_i(p)=0} f_i(p'). \tag{5}$$
(iv) For all p, p′ ∈ P_S, if both f_i(p) and f_i(p′) are positive, then f_i(p ∧ p′) is positive as well.
Then the kernel k(p, p′) = Σ_{i∈I} f_i(p) · f_i(p′) satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance.
Condition (i) establishes that all feature vectors φ(p) have all their components nonnegative, which amounts to saying that the feature map φ embeds P_S into the first closed orthant of the |I|-dimensional Euclidean space, where |I| is the number of hidden features. Condition (ii) asserts that, if the measurement of a hidden feature f_i at the partition p is positive, then it decreases along any ascending path starting at p without vanishing. A word of caution is necessary here. Condition (ii) must not be mistaken with that which asserts that f_i decreases along any ascending path without vanishing. Condition (ii) allows that f_i(p) = 0 and f_i(p′) > 0, for some p′ refined by p. Additionally, Condition (iii) establishes that the combined weight at p′ of the hidden features that vanish at p sufficiently exceeds the amount by which φ(p) dominates φ(p′) on the hidden features that are positive at p. To get a better insight into (iii), let φ(p)⁺ and φ(p′)⁺ denote the vectors obtained from φ(p) and φ(p′) by considering only those hidden variables that are positive when measured on p, and let φ(p′)⁰ be the vector obtained from φ(p′) by considering only those hidden variables that vanish when measured on p. In terms of the 1-norm of Euclidean spaces, (ii) implies that ‖φ(p)‖₁ = ‖φ(p)⁺‖₁ > ‖φ(p′)⁺‖₁ while (iii) can be restated as ‖φ(p)⁺‖₁ − ‖φ(p′)⁺‖₁ < ‖φ(p′)⁰‖₁, which forces ‖φ(p)‖₁ < ‖φ(p′)‖₁. Finally, Condition (iv) prevents f_i from vanishing at the meet p ∧ p′ of two partitions p and p′ if f_i returns a positive measurement for both partitions p and p′.
3.1 Set significance based kernels
The set-significance-based kernels were introduced by Vega-Pons et al., [22]. In this case, the hidden variables are the subsets of S. Given a partition p, one determines how significant every subset s ⊆ S is with respect to p. Assuming that, for a partition p, the most significant subsets of S are its clusters, the closer the subset s is to being a cluster of p, the more significant it is for p. As an attempt to capture this essence, the following definition is given. A function µ : 2^S × P_S → ℝ≥0 is said to be a measure of set significance if for all clusters c and c′ of p and all same-size subsets s1 and s2 of S with s1 ⊆ c and s2 ⊆ c′, µ(s1, p) < µ(s2, p) if, and only if, c is bigger than c′; and µ(s1, p) = µ(s2, p) if, and only if, c and c′ have the same size.
Listing the subsets s1, s2, . . . , s_{2^n} of S in a predetermined order, the ith hidden feature f_i measures the significance of the ith subset s_i of S through the formula f_i(p) = µ(s_i, p).
In [22], the authors used the measure of significance:
$$\mu(s_i,p) = \begin{cases} \dfrac{|s_i|}{|c|}, & \text{if } s_i \subseteq c \text{ for some } c \in p; \\[4pt] 0, & \text{otherwise}, \end{cases} \tag{6}$$
where the bars | · | indicate the size of the set.
The kernel kµ is a typical example of a kernel con-
structed from the procedure described in Theorem 1.
Proposition 3. $k_\mu(p,p') = \frac{1}{4}\sum_{i=1}^{s}\sum_{j=1}^{r} 2^{n_{ij}}\cdot\frac{n_{ij}^2 + n_{ij}}{n_i\, n'_j}$. Furthermore, kµ satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance.
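A small sketch can be used to sanity-check the closed form of Proposition 3 against the defining sum over all nonempty subsets of S, using the significance measure of Eq. (6). The partition representation and function names are assumptions of this illustration; the brute-force version is exponential in |S| and only meant for tiny examples.

```python
from itertools import chain, combinations

def mu(s, p):
    """Significance of subset s for partition p, as in Eq. (6)."""
    for c in p:
        if s <= c:
            return len(s) / len(c)
    return 0.0

def k_mu_bruteforce(p, q, elements):
    subsets = chain.from_iterable(combinations(sorted(elements), r)
                                  for r in range(1, len(elements) + 1))
    return sum(mu(set(s), p) * mu(set(s), q) for s in subsets)

def k_mu_closed_form(p, q):
    return 0.25 * sum((2 ** nij) * (nij ** 2 + nij) / (len(ci) * len(cj))
                      for ci in p for cj in q
                      if (nij := len(ci & cj)) > 0)

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
print(k_mu_bruteforce(p, q, {1, 2, 3, 4}), k_mu_closed_form(p, q))  # both ~1.667
```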
3.2 Expanding on Theorem 1
Theorem 1 provides a suitable procedure for the con-
struction of clustering kernels that are consistent with
the intrinsic notion of proximity between partitions.
However, this procedure hinges on sufficient condi-
tions that not all such kernels are supposed to meet.
This section gives an alternative procedure that also
ensures the compliance of hidden-feature-based ker-
nels with the aforementioned structural properties.
The idea consists of superseding conditions (ii)–(iv) with some analogous ones that are applicable to
families of hidden features that fail to satisfy those
required by Theorem 1.
Theorem 2. Let f_i, i ∈ I (finite set), be hidden features for the partitions of the dataset S. Suppose that the following conditions hold:
(i) For all p ∈ P_S, f_i(p) ≥ 0.
(v) p ⪯ p′ and f_i(p′) > 0 imply f_i(p) ≥ f_i(p′) > 0.
(vi) p ⪯ p′ implies
$$\sum_{i:\,f_i(p')>0} f_i(p) \;-\; \sum_{i:\,f_i(p')>0} f_i(p') \;<\; \sum_{i:\,f_i(p')=0} f_i(p). \tag{7}$$
Then the kernel k(p, p′) = Σ_{i∈I} f_i(p) · f_i(p′) satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance.
Now we introduce a novel class of kernels for par-
titions whose conformity with the intrinsic proxim-
ity between partitions can be justified by Theorem 2:
diameter-based kernels.
Let us consider a similarity measure σ : S × S → [0, 1] (note that σ does not quantify the similarity between partitions, but between the underlying data). By a similarity measure here we mean a
symmetric function (i.e., σ(x, x′) = σ(x′, x)) such that every datum maximizes the similarity to itself (i.e., σ(x, x) ≥ σ(x, x′) for all x, x′ ∈ S). Given a subset s of S, the diameter of s with respect to σ is diam(s) = max{σ(x, x′) : x, x′ ∈ s, x ≠ x′}.
Like set-significance-based kernels, diameter-based kernels k_σ are designed using the subsets s of S (sorted in a list) as the hidden variables. Denoting by c_i(p) the total number of clusters of p contained in the subset s_i, and by ν_i(p) the indicator of singleton clusters of p within s_i, having the value 1 if there are singleton clusters of p within s_i and the value 0 otherwise, the ith hidden feature is defined by:
$$f_i(p) = \begin{cases} 1 + \nu_i(p)\cdot\dfrac{\mathrm{diam}(s_i)}{c_i(p)}, & \text{if } c_i(p) > 0; \\[4pt] 0, & \text{if } c_i(p) = 0. \end{cases}$$
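Under the reading adopted above (c_i(p) counts the clusters of p contained in s_i, ν_i(p) flags the presence of a singleton cluster of p inside s_i, and diam is the largest pairwise σ-similarity within s_i), a single hidden feature can be sketched as follows; the similarity σ used in the example is a toy assumption made only for illustration.

```python
from itertools import combinations

def diam(s, sigma):
    pairs = combinations(sorted(s), 2)
    return max((sigma(x, y) for x, y in pairs), default=0.0)

def diameter_feature(s, p, sigma):
    contained = [c for c in p if c <= s]          # clusters of p contained in s
    if not contained:
        return 0.0
    nu = 1.0 if any(len(c) == 1 for c in contained) else 0.0
    return 1.0 + nu * diam(s, sigma) / len(contained)

sigma = lambda x, y: 1.0 / (1.0 + abs(x - y))      # toy similarity on integers
p = [{1, 2}, {3}, {4}]
print(diameter_feature({1, 2, 3}, p, sigma))        # s contains {1,2} and {3}: 1.25
```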
Proposition 4. kσ satisfies Bottom-Up Collinearity,
Top-Down Collinearity, and Meet Predominance.
4 A well-known family of not
well-behaved kernels
We conclude with a brief note regarding the prototype-based kernels. These kernels are designed from a set of prototypes P := {p1, p2, . . . , pm} ⊆ P_S and a dissimilarity measure d : P_S × P_S → ℝ (not necessarily a metric) for comparing partitions. Every partition p ∈ P_S is assigned the vector V_p = (d(p, p1), d(p, p2), . . . , d(p, pm)) (feature map), and the kernel is accordingly given by k_(d,P)(p, p′) = ⟨V_p, V_{p′}⟩.
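In code, a prototype-based kernel simply maps each partition to its vector of dissimilarities to the prototypes and returns the dot product of two such vectors. The sketch below uses the symmetric-difference count as a stand-in dissimilarity; the prototypes and d chosen here are illustrative assumptions, not the ones used in the example that follows.

```python
from itertools import combinations

def same_cluster(p, x, y):
    return any(x in c and y in c for c in p)

def sym_diff(p, q, elements):
    """Number of pairs on which p and q disagree (stand-in dissimilarity)."""
    return sum(same_cluster(p, x, y) != same_cluster(q, x, y)
               for x, y in combinations(sorted(elements), 2))

def prototype_kernel(p, q, prototypes, d):
    vp = [d(p, r) for r in prototypes]
    vq = [d(q, r) for r in prototypes]
    return sum(a * b for a, b in zip(vp, vq))

elements = {1, 2, 3, 4}
prototypes = [[{1, 2, 3, 4}], [{1, 2}, {3, 4}], [{1}, {2}, {3}, {4}]]
d = lambda p, q: sym_diff(p, q, elements)
print(prototype_kernel([{1, 2}, {3}, {4}], [{1, 2}, {3, 4}], prototypes, d))
```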
Justified in part by the intuition that the proto-
types can be chosen by experts in the task at hand
to stand for the different characteristic features of
the data population, prototype-based measures have
gained significant acknowledgment in the scientific
community. Both empirical and theoretical evidence
has been provided in support of their use in pattern
recognition tasks, [?]. However, in the case of struc-
tured data, this approach falls short in capturing the
essential facets of the intrinsic proximity notion.
To illustrate this fact, let us consider a simple example that can be easily generalized. The partitions of S = {x1, x2, x3, x4} are described through the feature vectors listed below using the prototype set P consisting of the partitions p1 = {{x1, x2, x3, x4}}, p2 = {{x1, x2, x4}, {x3}}, p3 = {{x1, x2}, {x3, x4}}, p4 = {{x1, x3}, {x2}, {x4}}, and p5 = {{x1}, {x2, x3, x4}}, and the β-entropy metric d_β (β = 4) on P_S defined by d_β(p, p′) = H_β(p | p′) + H_β(p′ | p), where the conditional β-entropy is given by
$$H_\beta(p\,|\,p') = \frac{1}{2^{1-\beta}-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{k'}\left(\frac{n_{ij}}{n}\right)^{\beta} - \sum_{j=1}^{k'}\left(\frac{n'_{j}}{n}\right)^{\beta}\right],$$
with k and k′ denoting the numbers of clusters of p and p′.
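A sketch of the β-entropy dissimilarity as reconstructed above: each conditional β-entropy subtracts the marginal term of the conditioning partition from the joint contingency term, and d_β symmetrizes the two. Treat this as an illustration of the formula under that reconstruction, not as a reference implementation behind the numerical table below.

```python
def h_beta(p, q, n, beta):
    """Conditional beta-entropy H_beta(p | q) from the contingency table of p and q."""
    joint = sum((len(ci & cj) / n) ** beta for ci in p for cj in q)
    marginal = sum((len(cj) / n) ** beta for cj in q)
    return (joint - marginal) / (2 ** (1 - beta) - 1)

def d_beta(p, q, n, beta=4):
    return h_beta(p, q, n, beta) + h_beta(q, p, n, beta)

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
print(d_beta(p, q, 4))
```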
{{x1, x2, x3, x4}} ↦ (0, 0.8, 1, 1.1, 0.8)
{{x1, x2, x3}, {x4}} ↦ (0.8, 0.6, 0.3, 0.3, 0.6)
{{x1, x2, x4}, {x3}} ↦ (0.8, 0, 0.3, 0.4, 0.8)
{{x1, x2}, {x3, x4}} ↦ (1, 0.3, 0, 0.2, 0.3)
{{x1, x2}, {x3}, {x4}} ↦ (1.1, 0.3, 0.1, 0.1, 0.4)
{{x1, x3, x4}, {x2}} ↦ (0.8, 0.6, 0.3, 0.3, 0.6)
{{x1, x3}, {x2, x4}} ↦ (1, 0.3, 0.3, 0.1, 0.3)
{{x1, x3}, {x2}, {x4}} ↦ (1.1, 0.4, 0.2, 0, 0.4)
{{x1, x4}, {x2, x3}} ↦ (1, 0.3, 0.3, 0.2, 0.3)
{{x1}, {x2, x3, x4}} ↦ (0.8, 0.6, 0.3, 0.4, 0)
{{x1}, {x2, x3}, {x4}} ↦ (1.1, 0.4, 0.2, 0.1, 0.3)
{{x1, x4}, {x2}, {x3}} ↦ (1.1, 0.3, 0.2, 0.1, 0.4)
{{x1}, {x2, x4}, {x3}} ↦ (1.1, 0.3, 0.2, 0.1, 0.3)
{{x1}, {x2}, {x3, x4}} ↦ (0.1, 0.4, 0.1, 0.1, 0.3)
{{x1}, {x2}, {x3}, {x4}} ↦ (1.1, 0.3, 0.1, 0.1, 0.3)
For p := {{x1}, {x2}, {x3, x4}}, p′ := {{x1, x2}, {x3, x4}}, and p′′ := {{x1, x2, x3, x4}}, we have that p ⪯ p′ ⪯ p′′ and k_(dβ,P)(p, p′) = 0.24 while k_(dβ,P)(p, p′′) = 0.68, which contradicts Bottom-Up Collinearity. Moreover, if we consider p := {{x1, x2}, {x3}, {x4}}, p′ := {{x1, x2}, {x3, x4}}, and p′′ := {{x1, x2, x3, x4}}, then k_(dβ,P)(p′, p′′) = 0.70 and k_(dβ,P)(p, p′′) = 0.78, hence Top-Down Collinearity also fails.
As for Meet Predominance, take p := {{x1}, {x2, x3, x4}} and p′ := {{x1, x2}, {x3, x4}}. Then p ∧ p′ = {{x1}, {x2}, {x3, x4}}, and k_(dβ,P)(p, p′) = 1.06; however, k_(dβ,P)(p ∧ p′, p′) = 0.35 and k_(dβ,P)(p ∧ p′, p) = 0.39.
5 Conclusions
In this paper, the almost unexplored class of positive
kernels for comparing partitions was rigorously stud-
ied on the basis of the structural properties that gov-
ern the natural proximity between partitions. Sev-
eral families of such kernels were introduced and
their theoretical foundations have been explained.
In particular, two generic procedures for designing
hidden-feature-based kernels in compliance with the
structural properties have been given, which can be
straightforwardly adapted for the comparison of other
structured data.
The main motivation of this paper lies in the new
perspective that positive kernels provide in the scope
of consensus methods. In this regard, an impor-
tant future direction consists of the development of
preimage methods that advance the performance of
the heuristics used currently to approximate consen-
sus solutions.
6 Appendix: Proofs of the results
PROOF (Proposition 1). Suppose that the pairs in J have been listed. To each partition p, the feature map φ assigns a vector φ(p) whose number of components is twice the number of pairs in J. Each pair {x, x′} ∈ J is assigned two components of the vector φ(p), denoted by φ(p)(x, x′, i), i = 0, 1, with
φ(p)(x, x′, i) = i · δ(x, x′) · √(ω({x, x′})) + (1 − i) · (1 − δ(x, x′)) · √(ν({x, x′})),
where δ(x, x′) = 1 if {x, x′} ∈ J(p), and δ(x, x′) = 0 otherwise. At most one of these two components is positive; the other is necessarily zero. Note also that, given any partitions p and p′, we have the following: (a) if a pair {x, x′} ∈ N00, then φ(p)(x, x′, 0) = φ(p′)(x, x′, 0) = 0 and φ(p)(x, x′, 1) = φ(p′)(x, x′, 1) = √(ω({x, x′})); (b) if a pair {x, x′} ∈ N01, then φ(p)(x, x′, 0) = 0, but φ(p′)(x, x′, 0) = √(ν({x, x′})), and φ(p)(x, x′, 1) = √(ω({x, x′})), but φ(p′)(x, x′, 1) = 0; (c) if a pair {x, x′} ∈ N10, then φ(p)(x, x′, 0) = √(ν({x, x′})), but φ(p′)(x, x′, 0) = 0, and φ(p)(x, x′, 1) = 0, but φ(p′)(x, x′, 1) = √(ω({x, x′})); and (d) if a pair {x, x′} ∈ N11, then φ(p)(x, x′, 0) = φ(p′)(x, x′, 0) = √(ν({x, x′})) and φ(p)(x, x′, 1) = φ(p′)(x, x′, 1) = 0. It is now straightforward to verify that ⟨φ(p), φ(p′)⟩ equals
$$\sum_{\{x,x'\}\in J} \varphi(p)(x,x',0)\cdot\varphi(p')(x,x',0) + \varphi(p)(x,x',1)\cdot\varphi(p')(x,x',1) = \sum_{\{x,x'\}\in N_{00}}\omega(\{x,x'\}) + \sum_{\{x,x'\}\in N_{11}}\nu(\{x,x'\}) = k_{\omega\nu}(p,p').$$
PROOF (Proposition 2). Let p ⪯ p′ ⪯ p′′. Let N00 and N11 (resp. N′00 and N′11) denote the sets of agreements between p and p′ (resp. between p and p′′). Since p refines both p′ and p′′, N00 = N′00. Since p′ refines p′′, N′11 ⊆ N11. This proves that every term in the expansion of kων(p, p′′) is a term in the expansion of kων(p, p′). Therefore, kων(p, p′) ≥ kων(p, p′′).
Under the same hypothesis, let N′′00 and N′′11 be the sets of agreements between p′ and p′′. Then N′11 = N′′11 and N′00 ⊆ N′′00. Thus, every term in the expansion of kων(p′′, p) is a term in the expansion of kων(p′′, p′), which forces kων(p′′, p′) ≥ kων(p′′, p).
Since the clusters of p ∧ p′ are all the possible nonempty intersections between the clusters of p and the clusters of p′, N00 = N′′′00 and N11 ⊆ N′′′11, where N′′′00 and N′′′11 denote the sets of agreements between p and p ∧ p′. Hence, kων(p, p ∧ p′) ≥ kων(p, p′).
PROOF (Theorem 1). Let p ⪯ p′ ⪯ p′′. Since p ⪯ p′ and p ⪯ p′′, condition (ii) assures that if f_i(p) > 0, then both f_i(p′) and f_i(p′′) are also positive. Thus, f_i(p)f_i(p′) and f_i(p)f_i(p′′) are positive if, and only if, f_i(p) is positive. In particular, these products are both zero or both positive. Moreover, in the case that f_i(p)f_i(p′) and f_i(p)f_i(p′′) are positive, since p′ ⪯ p′′ and f_i(p′) > 0, (ii) once again ensures that f_i(p′) ≥ f_i(p′′). Therefore, each positive term in the expansion of k(p, p′) is greater than or equal to the corresponding term in the expansion of k(p, p′′), which proves k(p, p′) ≥ k(p, p′′).
On the other hand, as we have noted above, if p ⪯ p′′, then k(p′′, p) = Σ_i f_i(p′′)f_i(p) = Σ_{i: f_i(p)>0} f_i(p′′)f_i(p). By virtue of condition (iii), k(p′′, p′) = Σ_{i: f_i(p′)>0} f_i(p′′)f_i(p′) ≥ Σ_{i: f_i(p)>0} f_i(p′′)f_i(p) + Σ_{i: f_i(p)=0} f_i(p′′)f_i(p) = k(p′′, p).
Finally, by virtue of condition (iv) we can deduce that if f_i(p)f_i(p′) is a positive term in the expansion of k(p, p′), then the corresponding term f_i(p)f_i(p ∧ p′) in the expansion of k(p, p ∧ p′) is also positive. Besides, since p ∧ p′ ⪯ p′, (ii) guarantees f_i(p ∧ p′) ≥ f_i(p′), and consequently, k(p, p ∧ p′) ≥ k(p, p′).
PROOF (Proposition 3). Let us consider any two partitions p = {c1, c2, . . . , cs} and p′ = {c′1, c′2, . . . , c′r} of S, with n_i, n′_j, and n_ij denoting the sizes of the ith cluster c_i of p, the jth cluster c′_j of p′, and the intersection c_i ∩ c′_j, respectively. Then
$$\begin{aligned} k_\mu(p,p') &= \sum_i f_i(p)f_i(p') \\ &= \sum_{i=1}^{s}\sum_{j=1}^{r}\;\sum_{\emptyset\neq s\subseteq c_i\cap c'_j} \frac{|s|}{|c_i|}\cdot\frac{|s|}{|c'_j|} \\ &= \sum_{i=1}^{s}\sum_{j=1}^{r}\;\sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot\frac{u}{n_i}\cdot\frac{u}{n'_j} \\ &= \sum_{i=1}^{s}\sum_{j=1}^{r} \frac{1}{n_i\, n'_j}\cdot\sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot u^2. \end{aligned} \tag{9}$$
On the other hand, note that
$$\begin{aligned} \sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot u^2 &= \sum_{u=1}^{n_{ij}} \frac{n_{ij}!}{u!\,(n_{ij}-u)!}\cdot u^2 \\ &= \sum_{u=1}^{n_{ij}} \frac{(n_{ij}-1)!}{(u-1)!\,(n_{ij}-u)!}\cdot n_{ij}\cdot u \\ &= n_{ij}\sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1}\cdot u. \end{aligned} \tag{10}$$
In addition, since
$$\binom{n_{ij}-1}{u-1}\cdot u = \binom{n_{ij}-1}{u-1}\cdot (u-1) + \binom{n_{ij}-1}{u-1} = (n_{ij}-1)\binom{n_{ij}-2}{u-2} + \binom{n_{ij}-1}{u-1},$$
we obtain
$$\sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1}\cdot u = (n_{ij}-1)\sum_{u=1}^{n_{ij}} \binom{n_{ij}-2}{u-2} + \sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1},$$
which gives us
$$\sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1}\cdot u = (n_{ij}-1)\cdot 2^{n_{ij}-2} + 2^{n_{ij}-1} = 2^{n_{ij}-2}\cdot (n_{ij}+1). \tag{11}$$
Substituting (11) in (10), we get
$$\sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot u^2 = 2^{n_{ij}-2}\cdot (n_{ij}^2 + n_{ij}),$$
and then replacing this in (9), we conclude
$$k_\mu(p,p') = \frac{1}{4}\sum_{i=1}^{s}\sum_{j=1}^{r} 2^{n_{ij}}\cdot\frac{n_{ij}^2 + n_{ij}}{n_i\, n'_j}.$$
To prove that kµ fulfills conditions (i)–(iv) in Theorem 1, note that µ(s, p) is either zero or the ratio between the respective sizes of two sets and therefore nonnegative. Furthermore, if p ⪯ p′, then every cluster of p is contained in some cluster of p′. If a subset s of S is contained in a cluster c of p, it will also be contained in the cluster c′ of p′ that contains c; that is, s ⊆ c ⊆ c′. Hence, the ratio between the size of s and the size of c is greater than or equal to the ratio between the size of s and the size of c′. Thus, if µ(s, p) > 0, then µ(s, p′) > 0 and µ(s, p) ≥ µ(s, p′). This proves (ii). In order to verify (iii) (still assuming p ⪯ p′), note that, if µ(s, p) > µ(s, p′), then s ⊆ c ⊊ c′. If s does not cover the entire c, consider the subset s′ = s ∪ (c′ ∖ c). Obviously, s′ ⊆ c′; however, there is no cluster of p containing s′. Thus, µ(s′, p′) > 0, but µ(s′, p) = 0. Moreover, given that |s|/|c| < 1,
$$\begin{aligned} \mu(s,p) - \mu(s,p') &= \frac{|s|}{|c|} - \frac{|s|}{|c'|} = \frac{|s|}{|c|}\cdot\frac{|c'|-|c|}{|c'|} \\ &< \frac{|c'|-|c|}{|c'|} \\ &< \frac{|c'|-|c|}{|c'|} + \frac{|s|}{|c'|} \\ &= \mu(s',p'). \end{aligned} \tag{12}$$
If s = c, then µ(s, p) = 1. In this case, if c′ ∖ c is another (single) cluster of p, say c0, then
µ(c, p) − µ(c, p′) + µ(c0, p) − µ(c0, p′) = 1 = µ(c′, p′);
otherwise, for s′ = c′ ∖ c, µ(s′, p) = 0 because it is the union of at least two clusters of p, and
µ(c, p) − µ(c, p′) = 1 − |c|/|c′| = (|c′| − |c|)/|c′| = µ(s′, p′).
Lastly, observe that, for any partitions p and p′, if the subset s is such that both µ(s, p) and µ(s, p′) are positive, then there are clusters c in p and c′ in p′ with s ⊆ c and s ⊆ c′. This implies s ⊆ (c ∩ c′). But c ∩ c′ is a cluster of p ∧ p′, and hence µ(s, p ∧ p′) is positive. This gives (iv).
PROOF (Proposition 4). By definition, the similarity measure σ has been considered to be non-negative. Therefore, all the components of the vector φσ(p) are non-negative. This gives (i).
Let us suppose now that p ⪯ p′. In this case, every cluster of p is contained in some cluster of p′ and, conversely, every cluster of p′ is a union of clusters of p. Thus, if s_i is a subset of S containing some cluster c′ of p′, then s_i contains at least one cluster c of p. Moreover, every singleton cluster of p′ is a singleton cluster of p. Accordingly, c_i(p) ≥ c_i(p′), and ν_i(p) = 0 forces ν_i(p′) = 0. Hence, f_i(p) = 1 + ν_i(p) · diam(s_i)/c_i(p) ≥ 1 + ν_i(p′) · diam(s_i)/c_i(p′) = f_i(p′). This gives (v).
Lastly, under the same assumption p ⪯ p′, we shall verify (vi). In this regard, note that
$$\sum_{i:\,c_i(p')>0} f_i(p) - \sum_{i:\,c_i(p')>0} f_i(p') = \sum_{\substack{i:\,c_i(p')>0 \\ \nu_i(p)=1}} \mathrm{diam}(s_i)\cdot\left(\frac{\nu_i(p)}{c_i(p)} - \frac{\nu_i(p')}{c_i(p')}\right).$$
If ν_i(p) = 1 and c_i(p) > c_i(p′) > 0, then we shall consider the subset s_{j_i} obtained from s_i by applying the following procedure:
Erase one element from each cluster c′ of p′ contained in s_i. The removal should be done in such a way that one cluster c of p that is not a cluster of p′ remains invariant.
Thus, we get that c_{j_i}(p′) = 0 and c_{j_i}(p) > 0. In addition, since diam(s_i) ≤ 1,
$$\mathrm{diam}(s_i)\cdot\left(\frac{\nu_i(p)}{c_i(p)} - \frac{\nu_i(p')}{c_i(p')}\right) < 1 \le 1 + \nu_{j_i}(p)\cdot\frac{\mathrm{diam}(s_{j_i})}{c_{j_i}(p)} = f_{j_i}(p).$$
This proves
$$\sum_{i:\,c_i(p')>0} f_i(p) - \sum_{i:\,c_i(p')>0} f_i(p') < \sum_{i:\,c_i(p')=0} f_i(p),$$
and consequently (vi).
References:
[1] Kakeru Narita, Teruhisa Hochin, Yoshihiro
Hayashi, Hiroki Nomiya: Incremental Hierarchi-
cal Clustering for Data Insertion and Its Evalua-
tion International Journal of Software Innovation
(IJSI), 8(2), 2022, 1-22.
[2] Pasi Fränti, Sami Sieranoja, How much can k-
means be improved by using better initialization
and repeats?, Pattern Recognition, Vol. 93, 2019,
95-112.
[3] Seyed Saeed Hamidi, Ebrahim Akbari, Homayun
Motameni, Consensus clustering algorithm based
on the automatic partitioning similarity graph,
Data & Knowledge Engineering, Vol. 124, 2019,
101754.
[4] Jon M. Kleinberg, An Impossibility Theorem for
Clustering, in Advances in Neural Information
Processing Systems 15 [Neural Information Pro-
cessing Systems, NIPS 2002, December 9-14,
2002, Vancouver, British Columbia, Canada],
MIT Press, (2002) 446–453.
[5] Margareta Ackerman, Shai Ben-David, Simina
Brânzei, David Loker: Weighted clustering: To-
wards solving the user's dilemma. Pattern Recog-
nit. 120, 2021, 108152.
[6] Carlsson, Gunnar and Mémoli, Facundo, Classi-
fying Clustering Schemes, Foundations of Com-
putational Mathematics, 13 (2), (2013) 221–252.
[7] Jyrko Correa-Morris, An indication of unifica-
tion for different clustering approaches, Pattern
Recognit., 46 (9), (2013) 2548–2561.
[8] Jean-Pierre Barthélemy and Bruno Leclerc, The
Median Procedure for Partitions, Partitioning
Data Sets, American Mathematics Society, Series
in Discrete Math, 1995, 3-34.
[9] Marina Meilǎ, Local equivalences of distances
between clusterings - a geometric perspective.
Mach. Learn. 86(3), 2012, 369-389.
[10] Jyrko Correa-Morris, Abel Urra-Yglesias, Este-
fano Reyes, Juan Martínez, Belarmino Gonzalez:
Comparing Partitions: Metric Characterizations,
Mean Partition, and Distortion, SAI (1), 2021,
857-875.
[11] Jyrko Correa-Morris, Comparing Partitions:
Shortest Path Length Metrics and Submodularity,
Intern. J. of Math. Models and Meth. in Appl. Sci.,
13, 2019, 45-51.
[12] Serratosa, F. Redefining the Graph Edit Dis-
tance. SN COMPUT. SCI. 2, 2021, 438.
[13] Shawe-Taylor, J., and Cristianini, N., Kernel
Methods for Pattern Analysis, Cambridge: Cam-
bridge University Press, 2004.
[14] Marco Cuturi, Positive definite kernels in ma-
chine learning, arXiv preprint arXiv:0911.5367,
2009 - arxiv.org.
[15] Bo Ma, Hui-yang Qu, Hau-san Wong, Kernel
clustering-based discriminant analysis, Pattern
Recognit., 40(1), 2007, 324-327.
[16] Inderjit S. Dhillon and Yuqiang Guan and Brian
Kulis, Kernel k-means: spectral clustering and
normalized cuts, In KDD ’04: Proceedings of
the tenth ACM SIGKDD international conference
on Knowledge discovery and data mining, 2004,
551-556.
[17] Zhu Y and Ting K., Kernel-based clustering via
Isolation Distributional Kernel. Information Sys-
tems, 117, 2023, 102212.
[18] Yang X, Lin G, Liu Y, Nie F and Lin L., Fast
Spectral Embedded Clustering Based on Struc-
tured Graph Learning for Large-Scale Hyperspec-
tral Image, IEEE Geoscience and Remote Sensing
Letters, 19, 2020, 1-5.
[19] Wang R, Lu J, Lu Y, Nie F and Li X, Dis-
crete and Parameter-Free Multiple Kernel k -
Means, IEEE Transactions on Image Processing,
31, 2022, 2796-2808.
[20] Liu H, Chen J, Dy J and Fu Y., Transform-
ing Complex Problems into K-means Solutions,
IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 45(7), 2023, 1-20.
[21] Brijnesh J. Jain, The Mean Partition Theorem
in consensus clustering, Pattern Recognition, 79,
2018, 427-439.
[22] Sandro Vega-Pons, Jyrko Correa-Morris, José
Ruiz-Shulcloper: Weighted partition consensus
via kernels. Pattern Recognit. 43(8), 2010, 2712-
2724.
[23] William M. Rand, Objective Criteria for the
Evaluation of Clustering Methods, Journal
of the American Statistical Association, 66(336),
1971, 668-675.
[24] Sandro Vega-Pons, Xiaoyi Jiang, José Ruiz-
Shulcloper: Segmentation Ensemble via Kernels,
ACPR 2011, 2011, 686-690.
[25] Lucas Franek, Xiaoyi Jiang: Ensemble cluster-
ing by means of clustering embedding in vector
spaces. Pattern Recognit. 47(2), 2014, 833-842.
[26] Jyrko Correa-Morris: The median partition and
submodularity, Appl. Math. Comput. 410, 2021,
126450.
Contribution of Individual Authors to the
Creation of a Scientific Article
Jyrko Correa-Morris was responsible for the sections
of the paper.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
Partial support for this research has been provided
by the U.S. Department of Education under grant
P120A200005.
Conflicts of Interest
The authors have no conflicts of interest to
declare that are relevant to the content of this
article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US