Positive Definite Kernels for Partitions
JYRKO CORREA-MORRIS
Mathematics and Natural Sciences Department
Miami Dade College, Padron Campus
627 SW 27 Ave, Miami, Florida
USA
Abstract: This paper presents a comprehensive exploration of various families of positive definite kernels for
comparing partitions. It not only reviews existing examples from the literature but also introduces novel classes
of positive definite kernels. These new classes include kernels based on agreement and ones designed using the
concept of hidden variables. The study also focuses on assessing the compatibility of these kernels with structural
properties that capture the intrinsic notion of proximity between partitions. Notably, agreement-based kernels are
demonstrated to align well with this notion. Moreover, the paper provides two generic procedures for designing
hidden-feature-based kernels that also adhere to the specified structural properties.
Key-Words: clustering, positive definite kernels, lattice of partitions
Received: May 11, 2023. Revised: August 12, 2023. Accepted: September 23, 2023. Published: October 9, 2023.
1 Introduction
1.1 Overview.
Since the turn of the century, there has been a signif-
icant increase in interest in exploring existing clus-
tering methods, [1], [2], [3], within the field of pat-
tern recognition. This surge in interest can be traced
back to J. Kleinberg’s seminal paper on an impossi-
bility theorem for clustering, [4], which has served as
a catalyst for other formal studies, [5], [6], [7].
The exploration of measures for comparing par-
titions has also been influenced by this growing in-
terest. Evidently, notable contributions have been
made, [8], [9], [10], [11]. There seems to be a con-
sensus regarding the existence of fundamental princi-
ples that govern the natural proximity of partitions,
particularly when dealing with structural data. For
instance, when comparing graphs or trees, the edit distance, despite its computational complexity, is
often considered the natural proximity measure, and
proposed measures are evaluated based on their com-
patibility with this intrinsic paradigm, [12]. In this pa-
per, we focus on studying an almost unexplored class
of measures: that of positive definite kernels for com-
paring partitions. A similarity measure k : X × X → ℝ is said to be a positive definite kernel when the dataset X can be embedded into a high-dimensional Euclidean (possibly an infinite-dimensional Hilbert) space H by means of a map φ such that the similarity k(x, x′) between the original data x and x′ matches the dot product ⟨φ(x), φ(x′)⟩_H of the vectors φ(x) and φ(x′) in H, [13]. The map φ : X → H, the space H, and the vectors φ(x), x ∈ X, are referred to as the feature map, feature space, and feature vectors, respectively.
Kernel-based methods have emerged as powerful
tools for efficiently handling complex data such as
strings, graphs, images, documents, and multimodal
data, [14]. The key insight is that by computing sim-
ilarities as dot products in a high-dimensional feature
space H without explicitly visiting it, linear models
can be effectively applied in non-linear settings, [13].
To achieve this, a nonlinear feature map φ is utilized
to map the data into the high-dimensional space H,
where linear models can be employed. This approach
allows for the utilization of additional structures of the
feature space (e.g., vector space structure, topological
structure) that may not be present in the original data
space. As a result, operations like arithmetical oper-
ations on feature vectors can be extended to graphs,
histograms, and partitions, providing greater flexibil-
ity in handling diverse data types.
However, the success of kernel-based methods
largely depends on the choice of the kernel measure,
which should align with the specific application and
data characteristics.
In the context of clustering, the term “clustering
kernel” is often associated with the kernel K-means
method, [15]. This method involves mapping the
original data X into the feature space H, enabling K-
means to be performed in the transformed space us-
ing an implicit nonlinear map φ. A challenge with
this approach is that the explicit knowledge of the
feature map is not available, making the updating
of centroids cumbersome. Nonetheless, kernel K-
means has demonstrated empirical success in iden-
tifying complex clusters that are non-linearly sepa-
rable in the original data space. Furthermore, spec-
tral clustering, which leverages eigenvectors of a cer-
tain matrix computed from the similarity matrix, has
been shown to be a special case of kernel K-means,
[16]. Although variants of this method exist (e.g.,
[17], [18], [19], [20]), their contributions to kernel
construction and design are somewhat limited.
In contrast, a cluster ensemble algorithm (see,
for instance, [21]) that utilized positive definite ker-
nels to compare partitions and construct average-
based consensus functions was reported, [22]. The
authors introduced “Set significance-based kernels”
for this purpose. Recognizing that maximizing con-
sensus functions can be computationally expensive,
even with simple measures like the Rand Index,
[23], the authors transformed the consensus prob-
lem from the space of partitions to the feature space,
which is a high-dimensional Euclidean space H. This
shift allowed for the use of well-known and eas-
ily computable solutions to the problem of maximiz-
ing average-based consensus functions in the feature
space. The article presented a novel approach to
tackle the cluster ensemble problem, reducing the task
of maximizing consensus functions to a kernel preim-
age problem. In other words, the consensus solution,
computed exactly in the feature space, needed to be
brought back to the space of partitions. However, [22]
made limited progress in solving the preimage prob-
lem, as it heavily depended on the specific kernel
used. Recent contributions in the field have shown
significant advancements towards achieving this goal
for specific kernels. For instance, [24] focused on
consensus functions defined from the Rand index,
while [25] concentrated on prototype-based kernels.
Nevertheless, this area of research remains rich with
several unanswered questions.
Driven by these inquiries, this paper presents sev-
eral classes of positive kernels for partitions and in-
vestigates their behavior by analyzing structural prop-
erties that emulate the inherent proximity between
partitions.
1.2 The natural notion of proximity
between partitions
Let S be a finite dataset. A partition p of S is a collection of nonempty subsets {C1, C2, . . . , Cs} of S, called the clusters of p, such that S = C1 ∪ C2 ∪ · · · ∪ Cs and Ci ∩ Cj = ∅ if i ≠ j. The set of all partitions of S (henceforth denoted by P_S) is endowed with the refinement order of partitions. The partition p is said to refine the partition p′ if, and only if, every cluster of p′ is the union of some clusters of p. In particular, every cluster of p is contained in some cluster of p′. The notation p ⪯ p′ means p refines p′.
Also, P_S is endowed with two operations: the meet ∧ and the join ∨, which assign to any two given partitions p and p′ the coarsest partition p ∧ p′ that refines both p and p′ and the finest partition p ∨ p′ simultaneously refined by p and p′, respectively. Two elements x, x′ ∈ S are in the same cluster of p ∧ p′ when they are in the same cluster of p and in the same cluster of p′. Two elements x, x′ ∈ S are placed in the same cluster of p ∨ p′ when there is a sequence x = x_{i1}, x_{i2}, . . . , x_{ik} = x′ such that, for all j ∈ {1, 2, . . . , k − 1}, x_{ij} and x_{ij+1} are either placed in the same cluster of p or in the same cluster of p′.
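For concreteness, both lattice operations can be computed directly from the cluster lists. The following minimal Python sketch (not taken from the paper; representing partitions as lists of frozensets is an assumption made here purely for illustration) builds the meet as the collection of nonempty pairwise intersections and the join by merging clusters that share elements.

```python
def meet(p, q):
    """Coarsest partition refining both p and q: nonempty pairwise intersections."""
    return [c & d for c in p for d in q if c & d]

def join(p, q):
    """Finest partition simultaneously refined by both p and q (union-find merge)."""
    parent = {x: x for cluster in p for x in cluster}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for cluster in list(p) + list(q):
        members = sorted(cluster)
        for a, b in zip(members, members[1:]):   # link elements sharing a cluster
            parent[find(a)] = find(b)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return [frozenset(g) for g in groups.values()]

p = [frozenset({1, 2}), frozenset({3, 4})]
q = [frozenset({1, 2, 3}), frozenset({4})]
print(meet(p, q))   # [frozenset({1, 2}), frozenset({3}), frozenset({4})]
print(join(p, q))   # [frozenset({1, 2, 3, 4})]
```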
P_S can be spatially organized into layers of partitions such that only partitions with the same number of clusters are placed in the same layer: (1) Place the all-singleton partition in the bottom layer. (2) A partition p′ covers another partition p when p refines p′ and there is no partition p′′, other than p and p′, that is refined by p and refines p′. In the kth layer, place all partitions that cover some partition placed in the (k−1)th layer. (3) A line segment is drawn between two partitions p and p′ if one of them covers the other. (The resulting diagram can be seen here: https://blogs.ams.org/visualinsight/2015/06/15/lattice-of-partitions/)
The underlying notion of proximity among partitions is associated to this spatial organization. Indeed, since the Hasse diagram is a connected graph, we can reach one partition from another by exclusively traveling along the edges of this diagram; the shorter the path connecting two partitions, the greater the similarity between these partitions. To see some of the characteristic features of such a proximity, let us start by considering an ascending path starting at a partition p. Note that every partition in this path is not only refined by p, but by all the partitions that precede it in the path. Thus, if p′ and p′′ are two partitions in this path such that p′ is reached before p′′ when traveling this path from p, we have that p refines both p′ and p′′, p′ refines p′′, and, obviously, p is closer to p′ than to p′′. Since three partitions p ⪯ p′ ⪯ p′′ are always part of an ascending path starting at p, such a distinguishing quality of the natural proximity can be formally stated as follows:
Bottom-Up Collinearity: If p ⪯ p′ ⪯ p′′, then p is more similar to p′ than to p′′. Thus, it is expected that any similarity measure k for comparing partitions (in particular, positive kernels) satisfies: k(p, p′) ≥ k(p, p′′), provided that p ⪯ p′ ⪯ p′′.
Analogously, we can take instead a descending path starting at a partition p′′. All partitions that we find along the way not only refine p′′ but all the partitions that appear before them in this path. Consequently, if the partition p is farther from p′′ in this path than another partition p′, then p′′ is refined by both p and p′, and p′ is refined by p. This distinguishing property can be formally formulated as follows:
Top-Down Collinearity: If p ⪯ p′ ⪯ p′′, then p′′ is more similar to p′ than to p. Accordingly, it is expected that any similarity measure k for comparing partitions (in particular, positive kernels) satisfies: k(p′′, p′) ≥ k(p′′, p), as long as p ⪯ p′ ⪯ p′′.
The two properties above only involve partitions
either in an ascending path or in a descending path of the diagram that describes the natural spatial organization of partitions. As we saw above, these conditions require, for any two considered partitions, that one refines the other. How about partitions p and p′ such that neither of them refines the other? Note that, in such a case, any path from one of these partitions to the other either ascends first and then descends, or descends first and then ascends. The partition at which this path turns back happens to be the join p ∨ p′ of p and p′ or the meet p ∧ p′ of p and p′, respectively. Thus, the similarity between two partitions p and p′ is at most equal to the highest similarity value from among the similarity values between the partitions p and p ∨ p′ and between the partitions p and p ∧ p′. This leads us to the following property.
Meet-Join Predominance: The similarity between two partitions p and p′ never exceeds the maximum value from among the similarity between p and p ∧ p′ and the similarity between p and p ∨ p′. Thus, it is expected that, for any partitions p and p′, k(p, p′) ≤ max{k(p, p ∧ p′), k(p, p ∨ p′)}. When, for any two partitions, the previous inequality always holds with the meet (resp. with the join), we name this property Meet Predominance (resp. Join Predominance).
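These three properties can be checked mechanically for any candidate similarity. The following small sketch (illustrative only; it reuses the meet and join helpers from the sketch above, and the function names are assumptions of this illustration) states the inequalities directly.

```python
def check_collinearity(k, p, p1, p2):
    """For a chain p <= p1 <= p2: Bottom-Up and Top-Down Collinearity."""
    bottom_up = k(p, p1) >= k(p, p2)
    top_down = k(p2, p1) >= k(p2, p)
    return bottom_up and top_down

def check_meet_join_predominance(k, p, q):
    """k(p, q) never exceeds the larger of k(p, meet(p, q)) and k(p, join(p, q))."""
    return k(p, q) <= max(k(p, meet(p, q)), k(p, join(p, q)))
```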
1.3 Contributions and organization
The main contributions of this paper are: (1) The
class of agreement-based kernels is introduced and
formally studied. In particular, the Rand kernel and consensus-based kernels are presented as paradigms of this family. (2) Kernels designed from a set of hidden features are investigated. Special attention is dedicated to set-significance-based kernels and diameter-based kernels. (3) The compliance of positive definite kernels with the intrinsic notion of proximity between partitions is examined in detail. Analogous properties
have been previously employed to characterize met-
rics used for comparing partitions, [11]. In that work,
these properties were applied to study the behavior of
the metrics. Additionally, the compatibility between
these properties and submodularity has been investi-
gated, [26].
Aside from the Introduction, the paper is organized in five additional sections. In Section 2, agreement-based kernels are introduced and their theoretical foundation is established. Section 3 introduces and studies the class of kernels designed from hidden features. In particular, two main procedures are presented for designing hidden features whose associated kernel complies with the intrinsic notion of proximity between partitions. Section 4
shows that prototype-based kernels do not respect the
natural notion of proximity between partitions. Sec-
tion 5 is devoted to the conclusions and future work,
while the proofs of the main results are included in the
appendix.
2 Agreement-based Kernels
In this section, J denotes the set of all possible unordered pairs of elements in S while, for a given partition p of S, J(p) denotes the set of those pairs in J that lie in the same cluster of p. For two partitions p and p′ of S, there are the sets: N00 = J(p) ∩ J(p′), N01 = J(p) ∩ J(p′)^C, N10 = J(p)^C ∩ J(p′), and N11 = J(p)^C ∩ J(p′)^C, where the superscript C indicates the complement of the set. The sizes of these sets are denoted by n00, n01, n10, and n11, respectively.
If a pair of objects is either in N00 or N11 (resp. N01 or N10), then both partitions p and p′ agree (resp. differ) with regard to whether these objects should be placed in the same cluster or not. Thus, n00 + n11 (resp. n01 + n10) represents the total number of agreements (resp. disagreements) between p and p′. Note also that every pair belongs to exactly one of these sets and therefore $\binom{n}{2} = n_{00} + n_{01} + n_{10} + n_{11}$. The following identities hold:
$$n_{00} = \sum_{i=1}^{s}\sum_{j=1}^{r}\binom{n_{ij}}{2};$$
$$n_{01} = \sum_{i=1}^{s}\binom{n_i}{2} - \sum_{i=1}^{s}\sum_{j=1}^{r}\binom{n_{ij}}{2};$$
$$n_{10} = \sum_{j=1}^{r}\binom{n'_j}{2} - \sum_{i=1}^{s}\sum_{j=1}^{r}\binom{n_{ij}}{2};$$
$$n_{11} = \binom{n}{2} - n_{00} - n_{01} - n_{10};$$
where n_i, n′_j, and n_ij denote the sizes of the ith cluster of p (1 ≤ i ≤ s), the jth cluster of p′ (1 ≤ j ≤ r), and the overlap between the ith cluster of p and the jth cluster of p′, respectively.
To define an agreement-based kernel, we consider non-negative functions ω, ν : J → ℝ₊ (not necessarily different), which associate a non-negative weight to each pair of objects {x, x′}. The agreement-based kernel corresponding to the weight functions ω and ν is defined by:
$$k_{\omega\nu}(p,p') := \sum_{\{x,x'\}\in N_{00}} \omega(x,x') + \sum_{\{x,x'\}\in N_{11}} \nu(x,x'). \tag{1}$$
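A direct way to evaluate Eq. (1) is to enumerate the unordered pairs and accumulate ω over N00 and ν over N11. The sketch below is an unoptimized Python illustration under the assumption that partitions are given as lists of sets and that the weight functions take the two elements of a pair as arguments; none of these names come from the paper.

```python
from itertools import combinations

def same_cluster(p, x, y):
    """True if x and y lie in the same cluster of partition p."""
    return any(x in c and y in c for c in p)

def agreement_kernel(p, q, elements, omega, nu):
    total = 0.0
    for x, y in combinations(sorted(elements), 2):
        in_p, in_q = same_cluster(p, x, y), same_cluster(q, x, y)
        if in_p and in_q:            # pair in N00: together in both partitions
            total += omega(x, y)
        elif not in_p and not in_q:  # pair in N11: separated in both partitions
            total += nu(x, y)
    return total

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
# With unit weights this reduces to the Rand kernel: RK(p, q) = n00 + n11 = 3.
print(agreement_kernel(p, q, {1, 2, 3, 4}, lambda x, y: 1, lambda x, y: 1))  # 3.0
```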
Proposition 1. Any agreement kernel kων is a posi-
tive definite kernel.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.77
Jyrko Correa-Morris
E-ISSN: 2224-2880
704
Volume 22, 2023
Agreement-based kernels are compatible with the
intrinsic notion of proximity on the lattice of parti-
tions of S.
Proposition 2. Any agreement kernel kων satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance. However, these kernels fail to satisfy Join Predominance.
2.1 Rand Index
The Rand kernel is perhaps the best-known agreement-based kernel. It quantifies the similarity between two partitions p and p′ by counting the total number of agreements, all pairs being considered to have the same weight (i.e., for every pair {x, x′} ∈ J, ω({x, x′}) = ν({x, x′}) = 1). Thus,
$$RK(p,p') = n_{11} + n_{00}. \tag{2}$$
The original Rand index, published in 1971 by W. M. Rand, is given by $RI(p,p') = \frac{n_{11}+n_{00}}{n_{11}+n_{10}+n_{01}+n_{00}}$, and hence $RI(p,p') = \frac{RK(p,p')}{\binom{n}{2}}$. On the other hand, we can normalize any kernel K by setting $NK(p,p') = \frac{K(p,p')}{\sqrt{K(p,p)\,K(p',p')}}$. Since $RK(p,p) = RK(p',p') = \binom{n}{2}$, we get $NRK(p,p') = \frac{RK(p,p')}{\binom{n}{2}} = RI(p,p')$, which proves that the original Rand index concurs with the normalized Rand kernel.
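The counting identities above translate directly into code: the four pair counts come from the contingency table, and dividing the Rand kernel by C(n, 2) recovers the original Rand index. A small sketch follows, assuming partitions are given as lists of sets; the names are illustrative.

```python
from math import comb

def pair_counts(p, q, n):
    """Return (n00, n01, n10, n11) for partitions p and q of an n-element set."""
    n00 = sum(comb(len(ci & cj), 2) for ci in p for cj in q)
    n01 = sum(comb(len(ci), 2) for ci in p) - n00
    n10 = sum(comb(len(cj), 2) for cj in q) - n00
    n11 = comb(n, 2) - n00 - n01 - n10
    return n00, n01, n10, n11

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
n00, n01, n10, n11 = pair_counts(p, q, 4)
rand_kernel = n00 + n11                      # RK(p, q)
rand_index  = rand_kernel / comb(4, 2)       # RI(p, q) = RK(p, q) / C(n, 2)
print(rand_kernel, rand_index)               # 3 0.5
```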
To show that agreement-based kernels fail in general to satisfy Join Predominance, let us consider the partitions p = {{x1, x2}, {x3, x4}} and p′ = {{x1, x2, x3}, {x4}} of S = {x1, x2, x3, x4}. Note that p ∨ p′ = {{x1, x2, x3, x4}} and that, for p and p′, N00 = {{x1, x2}}, N01 = {{x3, x4}}, N10 = {{x1, x3}, {x2, x3}}, and N11 = {{x1, x4}, {x2, x4}}; while for p and p ∨ p′, N00 = {{x1, x2}, {x3, x4}} and N10 = {{x1, x3}, {x1, x4}, {x2, x3}, {x2, x4}} (N01 and N11 are empty). This gives RK(p, p′) = 3 and RK(p, p ∨ p′) = 2, which means that p is more similar to p′ than to p ∨ p′.
To end, note that closely related to the Rand kernel is its counterpart, the well-known Symmetric Difference metric:
$$SDM(p,p') = n_{10} + n_{01}; \tag{3}$$
defined as the total number of disagreements. Thus, given partitions p and p′, the equality $RK(p,p') + SDM(p,p') = \binom{n}{2}$ holds. Any positive definite kernel k has an associated metric M_k, which is defined by M_k(p, p′) = k(p, p) + k(p′, p′) − 2k(p, p′). SDM is (up to a constant factor) the metric associated to RK:
$$\begin{aligned} M_{RK}(p,p') &= RK(p,p) + RK(p',p') - 2\,RK(p,p') \\ &= 2\binom{n}{2} - 2\,[n_{11} + n_{00}] \\ &= 2\,[n_{01} + n_{10}] \\ &= 2\,SDM(p,p'). \end{aligned}$$
2.2 Consensus Kernels
The motivation for consensus kernels lies in the os-
tensible benefits of incorporating some prior knowl-
edge. The intuition here is to use the co-association
matrix associated to a given ensemble E to determine the weights of the pairs. Formally, given an ensemble E = {p1, p2, . . . , pm} of partitions of S, the associated co-association matrix is the n × n matrix A_E = (a_ij)_{i,j=1}^{n} whose (i, j)-entry carries the ratio between the number of partitions in E for which the pair {x_i, x_j} is placed in the same cluster and the total number m of partitions in the ensemble.
Defining the weight functions ω and ν from the matrix A_E by ω({x_i, x_j}) = a_ij and ν({x_i, x_j}) = 1 − a_ij, we define the consensus kernel associated to the ensemble E by:
$$k_E(p,p') = \sum_{\{x_i,x_j\}\in N_{00}} a_{ij} + \sum_{\{x_i,x_j\}\in N_{11}} (1 - a_{ij}). \tag{4}$$
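A sketch of Eq. (4): the co-association weight a_ij is estimated from the ensemble as the fraction of its members that place x_i and x_j together, and the agreement kernel is then evaluated with ω = a_ij and ν = 1 − a_ij. The helper same_cluster is repeated here so the snippet stands alone; all names are illustrative assumptions.

```python
from itertools import combinations

def same_cluster(p, x, y):
    return any(x in c and y in c for c in p)

def consensus_kernel(p, q, elements, ensemble):
    total = 0.0
    for x, y in combinations(sorted(elements), 2):
        a = sum(same_cluster(e, x, y) for e in ensemble) / len(ensemble)  # a_ij
        in_p, in_q = same_cluster(p, x, y), same_cluster(q, x, y)
        if in_p and in_q:            # N00: weight by co-association a_ij
            total += a
        elif not in_p and not in_q:  # N11: weight by 1 - a_ij
            total += 1 - a
    return total

ensemble = [[{1, 2}, {3, 4}], [{1, 2, 3}, {4}], [{1}, {2, 3, 4}]]
print(consensus_kernel([{1, 2}, {3, 4}], [{1, 2, 3}, {4}], {1, 2, 3, 4}, ensemble))
```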
3 Another family of positive definite kernels: Hidden features
Hidden features is a term used to designate data features that are not actually observed but rather determined by or inferred from the observed features. Thus, hidden features are associated to functions f_i : X → ℝ that assign to every datum, in our case a partition p, the measurement f_i(p) of the ith hidden feature. Every partition is then represented by the vector φ(p) = (f1(p), f2(p), . . . , f_i(p), . . .). Given a finite or enumerable set of hidden features, a positive definite kernel for comparing partitions can be defined by setting k(p, p′) = Σ_i f_i(p) · f_i(p′).
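In code, such a kernel is nothing more than a dot product of feature measurements; the sketch below takes an arbitrary family of feature functions. The particular features used in the example (cluster-size counts) are an assumption made purely for illustration and are not one of the families studied in this paper.

```python
def hidden_feature_kernel(features, p, q):
    """features: iterable of callables f_i mapping a partition to a real number."""
    return sum(f(p) * f(q) for f in features)

# Example hidden features: number of clusters of each size 1..4 (illustrative only).
features = [lambda p, s=s: sum(1 for c in p if len(c) == s) for s in range(1, 5)]
print(hidden_feature_kernel(features, [{1, 2}, {3, 4}], [{1, 2, 3}, {4}]))  # 0
```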
In this section, we seek sufficient conditions for
these kernels to be in compliance with the natural
proximity between partitions. In this regard, Theorem
1 provides a procedure that enables us to construct
many examples of such kernels.
Theorem 1. Let f_i, i ∈ I (finite set), be hidden features for the partitions of the dataset S. Suppose that the following conditions hold:
(i) For all p ∈ P_S, f_i(p) ≥ 0.
(ii) p ⪯ p′ and f_i(p) > 0 imply f_i(p) ≥ f_i(p′) > 0. Moreover, the equality holds for all hidden features f_i if, and only if, p and p′ are the same partition.
(iii) p ⪯ p′ implies
$$\sum_{i:\,f_i(p)>0} f_i(p) \;-\; \sum_{i:\,f_i(p)>0} f_i(p') \;<\; \sum_{i:\,f_i(p)=0} f_i(p'). \tag{5}$$
(iv) For all p, p′ ∈ P_S, if both f_i(p) and f_i(p′) are positive, then f_i(p ∧ p′) is positive as well.
Then the kernel k(p, p′) = Σ_{i∈I} f_i(p) · f_i(p′) satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance.
Condition (i) establishes that all feature vectors φ(p) have all their components nonnegative, which amounts to saying that the feature map φ embeds P_S into the first closed orthant of the |I|-dimensional Euclidean space, where |I| is the number of hidden features. Condition (ii) asserts that, if the measurement of a hidden feature f_i at the partition p is positive, then it decreases along any ascending path starting at p without vanishing. A word of caution is necessary here. Condition (ii) must not be mistaken with that which asserts that f_i decreases along any ascending path without vanishing. Condition (ii) allows that f_i(p) = 0 and f_i(p′) > 0, for some p′ refined by p. Additionally, Condition (iii) establishes that the combined weight at p′ of the hidden features that vanish at p sufficiently exceeds the amount by which φ(p) dominates φ(p′) on the hidden features that are positive at p. To get a better insight into (iii), let φ(p)⁺ and φ(p′)⁺ denote the vectors obtained from φ(p) and φ(p′) by considering only those hidden variables that are positive when measured on p, and let φ(p′)⁰ be the vector obtained from φ(p′) by considering only those hidden variables that vanish when measured on p. In terms of the 1-norm of Euclidean spaces, (ii) implies that ‖φ(p)‖₁ = ‖φ(p)⁺‖₁ > ‖φ(p′)⁺‖₁ while (iii) can be restated as ‖φ(p)⁺‖₁ − ‖φ(p′)⁺‖₁ < ‖φ(p′)⁰‖₁, which forces ‖φ(p)‖₁ < ‖φ(p′)‖₁. Finally, Condition (iv) prevents f_i from vanishing at the meet p ∧ p′ of two partitions p and p′ if f_i returns a positive measurement for both partitions p and p′.
3.1 Set significance based kernels
The set-significance-based kernels were introduced by Vega-Pons et al., [22]. In this case, the hidden variables are the subsets of S. Given a partition p, one determines how significant every subset s ⊆ S is with respect to p. Assuming that, for a partition p, the most significant subsets of S are its clusters, the closer the subset s is to being a cluster of p, the more significant it is for p. As an attempt to capture this essence, the following definition is given. A function µ : 2^S × P_S → ℝ≥0 is said to be a measure of set significance if for all clusters c and c′ of p and all same-size subsets s1 and s2 of S with s1 ⊆ c and s2 ⊆ c′, µ(s1, p) < µ(s2, p) if, and only if, c is bigger than c′; and µ(s1, p) = µ(s2, p) if, and only if, c and c′ have the same size.
Listing the subsets s1, s2, . . . , s_{2^n} of S in a predetermined order, the ith hidden feature f_i measures the significance of the ith subset s_i of S through the formula f_i(p) = µ(s_i, p).
In [22], the authors used the measure of significance:
$$\mu(s_i,p) = \begin{cases} \dfrac{|s_i|}{|c|}, & \text{if } s_i \subseteq c \text{ for some } c \in p; \\[4pt] 0, & \text{otherwise}, \end{cases} \tag{6}$$
where the bars | · | indicate the size of the set.
The kernel kµ is a typical example of a kernel con-
structed from the procedure described in Theorem 1.
Proposition 3. $k_\mu(p,p') = \frac{1}{4}\sum_{i=1}^{s}\sum_{j=1}^{r} 2^{n_{ij}}\cdot\frac{n_{ij}^2 + n_{ij}}{n_i\, n'_j}$. Furthermore, kµ satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance.
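A small sketch can be used to sanity-check the closed form of Proposition 3 against the defining sum over all nonempty subsets of S, using the significance measure of Eq. (6). The partition representation and function names are assumptions of this illustration; the brute-force version is exponential in |S| and only meant for tiny examples.

```python
from itertools import chain, combinations

def mu(s, p):
    """Significance of subset s for partition p, as in Eq. (6)."""
    for c in p:
        if s <= c:
            return len(s) / len(c)
    return 0.0

def k_mu_bruteforce(p, q, elements):
    subsets = chain.from_iterable(combinations(sorted(elements), r)
                                  for r in range(1, len(elements) + 1))
    return sum(mu(set(s), p) * mu(set(s), q) for s in subsets)

def k_mu_closed_form(p, q):
    return 0.25 * sum((2 ** nij) * (nij ** 2 + nij) / (len(ci) * len(cj))
                      for ci in p for cj in q
                      if (nij := len(ci & cj)) > 0)

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
print(k_mu_bruteforce(p, q, {1, 2, 3, 4}), k_mu_closed_form(p, q))  # both ~1.667
```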
3.2 Expanding on Theorem 1
Theorem 1 provides a suitable procedure for the con-
struction of clustering kernels that are consistent with
the intrinsic notion of proximity between partitions.
However, this procedure hinges on sufficient condi-
tions that not all such kernels are supposed to meet.
This section gives an alternative procedure that also
ensures the compliance of hidden-feature-based ker-
nels with the aforementioned structural properties.
The idea consists of superseding conditions (ii)–(iv) with some analogous ones that are applicable to
families of hidden features that fail to satisfy those
required by Theorem 1.
Theorem 2. Let f_i, i ∈ I (finite set), be hidden features for the partitions of the dataset S. Suppose that the following conditions hold:
(i) For all p ∈ P_S, f_i(p) ≥ 0.
(v) p ⪯ p′ and f_i(p′) > 0 imply f_i(p) ≥ f_i(p′) > 0.
(vi) p ⪯ p′ implies
$$\sum_{i:\,f_i(p')>0} f_i(p) \;-\; \sum_{i:\,f_i(p')>0} f_i(p') \;<\; \sum_{i:\,f_i(p')=0} f_i(p). \tag{7}$$
Then the kernel k(p, p′) = Σ_{i∈I} f_i(p) · f_i(p′) satisfies Bottom-Up Collinearity, Top-Down Collinearity, and Meet Predominance.
Now we introduce a novel class of kernels for par-
titions whose conformity with the intrinsic proxim-
ity between partitions can be justified by Theorem 2:
diameter-based kernels.
Let us consider a similarity measure σ : S × S → [0, 1] (note that σ does not quantify the similarity between partitions, but between the underlying data). By a similarity measure here we mean a
symmetric function (i.e., σ(x, x′) = σ(x′, x)) such that every datum maximizes the similarity to itself (i.e., σ(x, x) ≥ σ(x, x′) for all x, x′ ∈ S). Given a subset s of S, the diameter of s with respect to σ is diam(s) = max{σ(x, x′) : x, x′ ∈ s, x ≠ x′}.
Like set-significance-based kernels, diameter-based kernels k_σ are designed using the subsets s of S (sorted in a list) as the hidden variables. Denoting by c_i(p) the total number of clusters of p contained in the subset s_i, and by ν_i(p) the indicator of singleton clusters of p within s_i, having the value 1 if there are singleton clusters of p within s_i and the value 0 otherwise, the ith hidden feature is defined by:
$$f_i(p) = \begin{cases} 1 + \nu_i(p)\cdot\dfrac{\mathrm{diam}(s_i)}{c_i(p)}, & \text{if } c_i(p) > 0; \\[4pt] 0, & \text{if } c_i(p) = 0. \end{cases}$$
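Under the reading adopted above (c_i(p) counts the clusters of p contained in s_i, ν_i(p) flags the presence of a singleton cluster of p inside s_i, and diam is the largest pairwise σ-similarity within s_i), a single hidden feature can be sketched as follows; the similarity σ used in the example is a toy assumption made only for illustration.

```python
from itertools import combinations

def diam(s, sigma):
    pairs = combinations(sorted(s), 2)
    return max((sigma(x, y) for x, y in pairs), default=0.0)

def diameter_feature(s, p, sigma):
    contained = [c for c in p if c <= s]          # clusters of p contained in s
    if not contained:
        return 0.0
    nu = 1.0 if any(len(c) == 1 for c in contained) else 0.0
    return 1.0 + nu * diam(s, sigma) / len(contained)

sigma = lambda x, y: 1.0 / (1.0 + abs(x - y))      # toy similarity on integers
p = [{1, 2}, {3}, {4}]
print(diameter_feature({1, 2, 3}, p, sigma))        # s contains {1,2} and {3}: 1.25
```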
Proposition 4. kσ satisfies Bottom-Up Collinearity,
Top-Down Collinearity, and Meet Predominance.
4 A well-known family of not
well-behaved kernels
We conclude with a brief note regarding the prototype-based kernels. These kernels are designed from a set of prototypes P := {p1, p2, . . . , pm} ⊆ P_S and a dissimilarity measure d : P_S × P_S → ℝ (not necessarily a metric) for comparing partitions. Every partition p ∈ P_S is assigned the vector V_p = (d(p, p1), d(p, p2), . . . , d(p, pm)) (feature map), and the kernel is accordingly given by k_(d,P)(p, p′) = ⟨V_p, V_{p′}⟩.
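In code, a prototype-based kernel simply maps each partition to its vector of dissimilarities to the prototypes and returns the dot product of two such vectors. The sketch below uses the symmetric-difference count as a stand-in dissimilarity; the prototypes and d chosen here are illustrative assumptions, not the ones used in the example that follows.

```python
from itertools import combinations

def same_cluster(p, x, y):
    return any(x in c and y in c for c in p)

def sym_diff(p, q, elements):
    """Number of pairs on which p and q disagree (stand-in dissimilarity)."""
    return sum(same_cluster(p, x, y) != same_cluster(q, x, y)
               for x, y in combinations(sorted(elements), 2))

def prototype_kernel(p, q, prototypes, d):
    vp = [d(p, r) for r in prototypes]
    vq = [d(q, r) for r in prototypes]
    return sum(a * b for a, b in zip(vp, vq))

elements = {1, 2, 3, 4}
prototypes = [[{1, 2, 3, 4}], [{1, 2}, {3, 4}], [{1}, {2}, {3}, {4}]]
d = lambda p, q: sym_diff(p, q, elements)
print(prototype_kernel([{1, 2}, {3}, {4}], [{1, 2}, {3, 4}], prototypes, d))
```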
Justified in part by the intuition that the proto-
types can be chosen by experts in the task at hand
to stand for the different characteristic features of
the data population, prototype-based measures have
gained significant acknowledgment in the scientific
community. Both empirical and theoretical evidence
has been provided in support of their use in pattern
recognition tasks, [?]. However, in the case of struc-
tured data, this approach falls short in capturing the
essential facets of the intrinsic proximity notion.
To illustrate this fact, let us consider a simple example that can be easily generalized. The partitions of S = {x1, x2, x3, x4} are described through the feature vectors listed below using the prototype set P consisting of the partitions p1 = {{x1, x2, x3, x4}}, p2 = {{x1, x2, x4}, {x3}}, p3 = {{x1, x2}, {x3, x4}}, p4 = {{x1, x3}, {x2}, {x4}}, and p5 = {{x1}, {x2, x3, x4}}, and the β-entropy metric d_β (β = 4) on P_S defined by d_β(p, p′) = H_β(p | p′) + H_β(p′ | p), where the conditional β-entropy is given by
$$H_\beta(p\,|\,p') = \frac{1}{2^{1-\beta}-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{k'}\left(\frac{n_{ij}}{n}\right)^{\beta} - \sum_{j=1}^{k'}\left(\frac{n'_{j}}{n}\right)^{\beta}\right],$$
with k and k′ denoting the numbers of clusters of p and p′.
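A sketch of the β-entropy dissimilarity as reconstructed above: each conditional β-entropy subtracts the marginal term of the conditioning partition from the joint contingency term, and d_β symmetrizes the two. Treat this as an illustration of the formula under that reconstruction, not as a reference implementation behind the numerical table below.

```python
def h_beta(p, q, n, beta):
    """Conditional beta-entropy H_beta(p | q) from the contingency table of p and q."""
    joint = sum((len(ci & cj) / n) ** beta for ci in p for cj in q)
    marginal = sum((len(cj) / n) ** beta for cj in q)
    return (joint - marginal) / (2 ** (1 - beta) - 1)

def d_beta(p, q, n, beta=4):
    return h_beta(p, q, n, beta) + h_beta(q, p, n, beta)

p = [{1, 2}, {3, 4}]
q = [{1, 2, 3}, {4}]
print(d_beta(p, q, 4))
```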
{{x1, x2, x3, x4}} ↦ (0, 0.8, 1, 1.1, 0.8)
{{x1, x2, x3}, {x4}} ↦ (0.8, 0.6, 0.3, 0.3, 0.6)
{{x1, x2, x4}, {x3}} ↦ (0.8, 0, 0.3, 0.4, 0.8)
{{x1, x2}, {x3, x4}} ↦ (1, 0.3, 0, 0.2, 0.3)
{{x1, x2}, {x3}, {x4}} ↦ (1.1, 0.3, 0.1, 0.1, 0.4)
{{x1, x3, x4}, {x2}} ↦ (0.8, 0.6, 0.3, 0.3, 0.6)
{{x1, x3}, {x2, x4}} ↦ (1, 0.3, 0.3, 0.1, 0.3)
{{x1, x3}, {x2}, {x4}} ↦ (1.1, 0.4, 0.2, 0, 0.4)
{{x1, x4}, {x2, x3}} ↦ (1, 0.3, 0.3, 0.2, 0.3)
{{x1}, {x2, x3, x4}} ↦ (0.8, 0.6, 0.3, 0.4, 0)
{{x1}, {x2, x3}, {x4}} ↦ (1.1, 0.4, 0.2, 0.1, 0.3)
{{x1, x4}, {x2}, {x3}} ↦ (1.1, 0.3, 0.2, 0.1, 0.4)
{{x1}, {x2, x4}, {x3}} ↦ (1.1, 0.3, 0.2, 0.1, 0.3)
{{x1}, {x2}, {x3, x4}} ↦ (0.1, 0.4, 0.1, 0.1, 0.3)
{{x1}, {x2}, {x3}, {x4}} ↦ (1.1, 0.3, 0.1, 0.1, 0.3)
For p := {{x1}, {x2}, {x3, x4}}, p′ := {{x1, x2}, {x3, x4}}, and p′′ := {{x1, x2, x3, x4}}, we have that p ⪯ p′ ⪯ p′′ and k_(dβ,P)(p, p′) = 0.24 while k_(dβ,P)(p, p′′) = 0.68, which contradicts Bottom-Up Collinearity. Moreover, if we consider p := {{x1, x2}, {x3}, {x4}}, p′ := {{x1, x2}, {x3, x4}}, and p′′ := {{x1, x2, x3, x4}}, then k_(dβ,P)(p′, p′′) = 0.70 and k_(dβ,P)(p, p′′) = 0.78, hence Top-Down Collinearity also fails.
As for Meet Predominance, take p := {{x1}, {x2, x3, x4}} and p′ := {{x1, x2}, {x3, x4}}. Then p ∧ p′ = {{x1}, {x2}, {x3, x4}}, and k_(dβ,P)(p, p′) = 1.06; however, k_(dβ,P)(p ∧ p′, p′) = 0.35 and k_(dβ,P)(p ∧ p′, p) = 0.39.
5 Conclusions
In this paper, the almost unexplored class of positive
kernels for comparing partitions was rigorously stud-
ied on the basis of the structural properties that gov-
ern the natural proximity between partitions. Sev-
eral families of such kernels were introduced and
their theoretical foundations have been explained.
In particular, two generic procedures for designing
hidden-feature-based kernels in compliance with the
structural properties have been given, which can be
straightforwardly adapted for the comparison of other
structured data.
The main motivation of this paper lies in the new
perspective that positive kernels provide in the scope
of consensus methods. In this regard, an impor-
tant future direction consists of the development of
preimage methods that advance the performance of
the heuristics used currently to approximate consen-
sus solutions.
6 Appendix: Proofs of the results
PROOF (Proposition 1). Suppose that the pairs in J have been listed. To each partition p, the feature map φ assigns a vector φ(p) whose number of components is twice the number of pairs in J. Each pair {x, x′} ∈ J is assigned two components of the vector φ(p), denoted by φ(p)(x, x′, i), i = 0, 1, with
φ(p)(x, x′, i) = i · δ(x, x′) · √(ω({x, x′})) + (1 − i) · (1 − δ(x, x′)) · √(ν({x, x′})),
where δ(x, x′) = 1 if {x, x′} ∈ J(p), and δ(x, x′) = 0 otherwise. At most one of these two components is positive; the other is necessarily zero. Note also that, given any partitions p and p′, we have the following: (a) if a pair {x, x′} ∈ N00, then φ(p)(x, x′, 0) = φ(p′)(x, x′, 0) = 0 and φ(p)(x, x′, 1) = φ(p′)(x, x′, 1) = √(ω({x, x′})); (b) if a pair {x, x′} ∈ N01, then φ(p)(x, x′, 0) = 0, but φ(p′)(x, x′, 0) = √(ν({x, x′})), and φ(p)(x, x′, 1) = √(ω({x, x′})), but φ(p′)(x, x′, 1) = 0; (c) if a pair {x, x′} ∈ N10, then φ(p)(x, x′, 0) = √(ν({x, x′})), but φ(p′)(x, x′, 0) = 0, and φ(p)(x, x′, 1) = 0, but φ(p′)(x, x′, 1) = √(ω({x, x′})); and (d) if a pair {x, x′} ∈ N11, then φ(p)(x, x′, 0) = φ(p′)(x, x′, 0) = √(ν({x, x′})) and φ(p)(x, x′, 1) = φ(p′)(x, x′, 1) = 0. It is now straightforward to verify that ⟨φ(p), φ(p′)⟩ equals
$$\sum_{\{x,x'\}\in J} \varphi(p)(x,x',0)\cdot\varphi(p')(x,x',0) + \varphi(p)(x,x',1)\cdot\varphi(p')(x,x',1) = \sum_{\{x,x'\}\in N_{00}}\omega(\{x,x'\}) + \sum_{\{x,x'\}\in N_{11}}\nu(\{x,x'\}) = k_{\omega\nu}(p,p').$$
PROOF (Proposition 2). Let p ⪯ p′ ⪯ p′′. Let N00 and N11 (resp. N′00 and N′11) denote the sets of agreements between p and p′ (resp. between p and p′′). Since p refines both p′ and p′′, N00 = N′00. Since p′ refines p′′, N′11 ⊆ N11. This proves that every term in the expansion of kων(p, p′′) is a term in the expansion of kων(p, p′). Therefore, kων(p, p′) ≥ kων(p, p′′).
Under the same hypothesis, let N′′00 and N′′11 be the sets of agreements between p′ and p′′. Then N′11 = N′′11 and N′00 ⊆ N′′00. Thus, every term in the expansion of kων(p′′, p) is a term in the expansion of kων(p′′, p′), which forces kων(p′′, p′) ≥ kων(p′′, p).
Since the clusters of p ∧ p′ are all the possible nonempty intersections between the clusters of p and the clusters of p′, N00 = N′′′00 and N11 ⊆ N′′′11, where N′′′00 and N′′′11 denote the sets of agreements between p and p ∧ p′. Hence, kων(p, p ∧ p′) ≥ kων(p, p′).
PROOF (Theorem 1). Let p ⪯ p′ ⪯ p′′. Since p ⪯ p′ and p ⪯ p′′, condition (ii) assures that if f_i(p) > 0, then both f_i(p′) and f_i(p′′) are also positive. Thus, f_i(p)f_i(p′) and f_i(p)f_i(p′′) are positive if, and only if, f_i(p) is positive. In particular, these products are both zero or both positive. Moreover, in the case that f_i(p)f_i(p′) and f_i(p)f_i(p′′) are positive, since p′ ⪯ p′′ and f_i(p′) > 0, (ii) once again ensures that f_i(p′) ≥ f_i(p′′). Therefore, each positive term in the expansion of k(p, p′) is greater than or equal to the corresponding term in the expansion of k(p, p′′), which proves k(p, p′) ≥ k(p, p′′).
On the other hand, as we have noted above, if p ⪯ p′′, then k(p′′, p) = Σ_i f_i(p′′)f_i(p) = Σ_{i: f_i(p)>0} f_i(p′′)f_i(p). By virtue of condition (iii), k(p′′, p′) = Σ_{i: f_i(p′)>0} f_i(p′′)f_i(p′) ≥ Σ_{i: f_i(p)>0} f_i(p′′)f_i(p) + Σ_{i: f_i(p)=0} f_i(p′′)f_i(p) = k(p′′, p).
Finally, by virtue of condition (iv) we can deduce that if f_i(p)f_i(p′) is a positive term in the expansion of k(p, p′), then the corresponding term f_i(p)f_i(p ∧ p′) in the expansion of k(p, p ∧ p′) is also positive. Besides, since p ∧ p′ ⪯ p′, (ii) guarantees f_i(p ∧ p′) ≥ f_i(p′), and consequently, k(p, p ∧ p′) ≥ k(p, p′).
PROOF (Proposition 3). Let us consider any two partitions p = {c1, c2, . . . , cs} and p′ = {c′1, c′2, . . . , c′r} of S, with n_i, n′_j, and n_ij denoting the sizes of the ith cluster c_i of p, the jth cluster c′_j of p′, and the intersection c_i ∩ c′_j, respectively. Then
$$\begin{aligned} k_\mu(p,p') &= \sum_i f_i(p)f_i(p') \\ &= \sum_{i=1}^{s}\sum_{j=1}^{r}\;\sum_{\emptyset\neq s\subseteq c_i\cap c'_j} \frac{|s|}{|c_i|}\cdot\frac{|s|}{|c'_j|} \\ &= \sum_{i=1}^{s}\sum_{j=1}^{r}\;\sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot\frac{u}{n_i}\cdot\frac{u}{n'_j} \\ &= \sum_{i=1}^{s}\sum_{j=1}^{r} \frac{1}{n_i\, n'_j}\cdot\sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot u^2. \end{aligned} \tag{9}$$
On the other hand, note that
$$\begin{aligned} \sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot u^2 &= \sum_{u=1}^{n_{ij}} \frac{n_{ij}!}{u!\,(n_{ij}-u)!}\cdot u^2 \\ &= \sum_{u=1}^{n_{ij}} \frac{(n_{ij}-1)!}{(u-1)!\,(n_{ij}-u)!}\cdot n_{ij}\cdot u \\ &= n_{ij}\sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1}\cdot u. \end{aligned} \tag{10}$$
In addition, since
$$\binom{n_{ij}-1}{u-1}\cdot u = \binom{n_{ij}-1}{u-1}\cdot (u-1) + \binom{n_{ij}-1}{u-1} = (n_{ij}-1)\binom{n_{ij}-2}{u-2} + \binom{n_{ij}-1}{u-1},$$
we obtain
$$\sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1}\cdot u = (n_{ij}-1)\sum_{u=1}^{n_{ij}} \binom{n_{ij}-2}{u-2} + \sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1},$$
which gives us
$$\sum_{u=1}^{n_{ij}} \binom{n_{ij}-1}{u-1}\cdot u = (n_{ij}-1)\cdot 2^{n_{ij}-2} + 2^{n_{ij}-1} = 2^{n_{ij}-2}\cdot (n_{ij}+1). \tag{11}$$
Substituting (11) in (10), we get
$$\sum_{u=1}^{n_{ij}} \binom{n_{ij}}{u}\cdot u^2 = 2^{n_{ij}-2}\cdot (n_{ij}^2 + n_{ij}),$$
and then replacing this in (9), we conclude
$$k_\mu(p,p') = \frac{1}{4}\sum_{i=1}^{s}\sum_{j=1}^{r} 2^{n_{ij}}\cdot\frac{n_{ij}^2 + n_{ij}}{n_i\, n'_j}.$$
To prove that kµ fulfills conditions (i)–(iv) in Theorem 1, note that µ(s, p) is either zero or the ratio between the respective sizes of two sets and therefore nonnegative. Furthermore, if p ⪯ p′, then every cluster of p is contained in some cluster of p′. If a subset s of S is contained in a cluster c of p, it will also be contained in the cluster c′ of p′ that contains c; that is, s ⊆ c ⊆ c′. Hence, the ratio between the size of s and the size of c is greater than or equal to the ratio between the size of s and the size of c′. Thus, if µ(s, p) > 0, then µ(s, p′) > 0 and µ(s, p) ≥ µ(s, p′). This proves (ii). In order to verify (iii) (still assuming p ⪯ p′), note that, if µ(s, p) > µ(s, p′), then s ⊆ c ⊊ c′. If s does not cover the entire c, consider the subset s′ = s ∪ (c′ ∖ c). Obviously, s′ ⊆ c′; however, there is no cluster of p containing s′. Thus, µ(s′, p′) > 0, but µ(s′, p) = 0. Moreover, given that |s|/|c| < 1,
$$\begin{aligned} \mu(s,p) - \mu(s,p') &= \frac{|s|}{|c|} - \frac{|s|}{|c'|} = \frac{|s|}{|c|}\cdot\frac{|c'|-|c|}{|c'|} \\ &< \frac{|c'|-|c|}{|c'|} \\ &< \frac{|c'|-|c|}{|c'|} + \frac{|s|}{|c'|} \\ &= \mu(s',p'). \end{aligned} \tag{12}$$
If s = c, then µ(s, p) = 1. In this case, if c′ ∖ c is another (single) cluster of p, say c0, then
µ(c, p) − µ(c, p′) + µ(c0, p) − µ(c0, p′) = 1 = µ(c′, p′);
otherwise, for s′ = c′ ∖ c, µ(s′, p) = 0 because it is the union of at least two clusters of p, and
µ(c, p) − µ(c, p′) = 1 − |c|/|c′| = (|c′| − |c|)/|c′| = µ(s′, p′).
Lastly, observe that, for any partitions p and p′, if the subset s is such that both µ(s, p) and µ(s, p′) are positive, then there are clusters c in p and c′ in p′ with s ⊆ c and s ⊆ c′. This implies s ⊆ (c ∩ c′). But c ∩ c′ is a cluster of p ∧ p′, and hence µ(s, p ∧ p′) is positive. This gives (iv).
PROOF (Proposition 4). By definition, the similarity measure σ has been considered to be non-negative. Therefore, all the components of the vector φσ(p) are non-negative. This gives (i).
Let us suppose now that p ⪯ p′. In this case, every cluster of p is contained in some cluster of p′ and, conversely, every cluster of p′ is a union of clusters of p. Thus, if s_i is a subset of S containing some cluster c′ of p′, then s_i contains at least one cluster c of p. Moreover, every singleton cluster of p′ is a singleton cluster of p. Accordingly, c_i(p) ≥ c_i(p′), and ν_i(p) = 0 forces ν_i(p′) = 0. Hence, f_i(p) = 1 + ν_i(p) · diam(s_i)/c_i(p) ≥ 1 + ν_i(p′) · diam(s_i)/c_i(p′) = f_i(p′). This gives (v).
Lastly, under the same assumption p ⪯ p′, we shall verify (vi). In this regard, note that
$$\sum_{i:\,c_i(p')>0} f_i(p) - \sum_{i:\,c_i(p')>0} f_i(p') = \sum_{\substack{i:\,c_i(p')>0 \\ \nu_i(p)=1}} \mathrm{diam}(s_i)\cdot\left(\frac{\nu_i(p)}{c_i(p)} - \frac{\nu_i(p')}{c_i(p')}\right).$$
If ν_i(p) = 1 and c_i(p) > c_i(p′) > 0, then we shall consider the subset s_{j_i} obtained from s_i by applying the following procedure:
Erase one element from each cluster c′ of p′ contained in s_i. The removal should be done in such a way that one cluster c of p that is not a cluster of p′ remains invariant.
Thus, we get that c_{j_i}(p′) = 0 and c_{j_i}(p) > 0. In addition, since diam(s_i) ≤ 1,
$$\mathrm{diam}(s_i)\cdot\left(\frac{\nu_i(p)}{c_i(p)} - \frac{\nu_i(p')}{c_i(p')}\right) < 1 \le 1 + \nu_{j_i}(p)\cdot\frac{\mathrm{diam}(s_{j_i})}{c_{j_i}(p)} = f_{j_i}(p).$$
This proves
$$\sum_{i:\,c_i(p')>0} f_i(p) - \sum_{i:\,c_i(p')>0} f_i(p') < \sum_{i:\,c_i(p')=0} f_i(p),$$
and consequently (vi).
References:
[1] Kakeru Narita, Teruhisa Hochin, Yoshihiro
Hayashi, Hiroki Nomiya: Incremental Hierarchi-
cal Clustering for Data Insertion and Its Evalua-
tion International Journal of Software Innovation
(IJSI), 8(2), 2022, 1-22.
[2] Pasi Fränti, Sami Sieranoja, How much can k-
means be improved by using better initialization
and repeats?, Pattern Recognition, Vol. 93, 2019,
95-112.
[3] Seyed Saeed Hamidi, Ebrahim Akbari, Homayun
Motameni, Consensus clustering algorithm based
on the automatic partitioning similarity graph,
Data & Knowledge Engineering, Vol. 124, 2019,
101754.
[4] Jon M. Kleinberg, An Impossibility Theorem for
Clustering, in Advances in Neural Information
Processing Systems 15 [Neural Information Pro-
cessing Systems, NIPS 2002, December 9-14,
2002, Vancouver, British Columbia, Canada],
MIT Press, (2002) 446–453.
[5] Margareta Ackerman, Shai Ben-David, Simina
Brânzei, David Loker: Weighted clustering: To-
wards solving the user's dilemma. Pattern Recog-
nit. 120, 2021, 108152.
[6] Carlsson, Gunnar and Mémoli, Facundo, Classi-
fying Clustering Schemes, Foundations of Com-
putational Mathematics, 13 (2), (2013) 221–252.
[7] Jyrko Correa-Morris, An indication of unifica-
tion for different clustering approaches, Pattern
Recognit., 46 (9), (2013) 2548–2561.
[8] Jean-Pierre Barthélemy and Bruno Leclerc, The
Median Procedure for Partitions, Partitioning
Data Sets, American Mathematics Society, Series
in Discrete Math, 1995, 3-34.
[9] Marina Meilǎ, Local equivalences of distances
between clusterings - a geometric perspective.
Mach. Learn. 86(3), 2012, 369-389.
[10] Jyrko Correa-Morris, Abel Urra-Yglesias, Este-
fano Reyes, Juan Martínez, Belarmino Gonzalez:
Comparing Partitions: Metric Characterizations,
Mean Partition, and Distortion, SAI (1), 2021,
857-875.
[11] Jyrko Correa-Morris, Comparing Partitions:
Shortest Path Length Metrics and Submodularity,
Intern. J. of Math. Models and Meth. in Appl. Sci.,
13, 2019, 45-51.
[12] Serratosa, F. Redefining the Graph Edit Dis-
tance. SN COMPUT. SCI. 2, 2021, 438.
[13] Shawe-Taylor, J., and Cristianini, N., Kernel
Methods for Pattern Analysis, Cambridge: Cam-
bridge University Press, 2004.
[14] Marco Cuturi, Positive definite kernels in ma-
chine learning, arXiv preprint arXiv:0911.5367,
2009 - arxiv.org.
[15] Bo Ma, Hui-yang Qu, Hau-san Wong, Kernel
clustering-based discriminant analysis, Pattern
Recognit., 40(1), 2007, 324-327.
[16] Inderjit S. Dhillon and Yuqiang Guan and Brian
Kulis, Kernel k-means: spectral clustering and
normalized cuts, In KDD ’04: Proceedings of
the tenth ACM SIGKDD international conference
on Knowledge discovery and data mining, 2004,
551-556.
[17] Zhu Y and Ting K., Kernel-based clustering via
Isolation Distributional Kernel. Information Sys-
tems, 117, 2023, 102212.
[18] Yang X, Lin G, Liu Y, Nie F and Lin L., Fast
Spectral Embedded Clustering Based on Struc-
tured Graph Learning for Large-Scale Hyperspec-
tral Image, IEEE Geoscience and Remote Sensing
Letters, 19, 2020, 1-5.
[19] Wang R, Lu J, Lu Y, Nie F and Li X, Dis-
crete and Parameter-Free Multiple Kernel k -
Means, IEEE Transactions on Image Processing,
31, 2022, 2796-2808.
[20] Liu H, Chen J, Dy J and Fu Y., Transform-
ing Complex Problems into K-means Solutions,
IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 45(7), 2023, 1-20.
[21] Brijnesh J. Jain, The Mean Partition Theorem
in consensus clustering, Pattern Recognition, 79,
2018, 427-439.
[22] Sandro Vega-Pons, Jyrko Correa-Morris, José
Ruiz-Shulcloper: Weighted partition consensus
via kernels. Pattern Recognit. 43(8), 2010, 2712-
2724.
[23] William M. Rand, Objective Criteria for the
Evaluation of Clustering Methods, Journal
of the American Statistical Association, 66(336),
1971, 668-675.
[24] Sandro Vega-Pons, Xiaoyi Jiang, José Ruiz-
Shulcloper: Segmentation Ensemble via Kernels,
ACPR 2011, 2011, 686-690.
[25] Lucas Franek, Xiaoyi Jiang: Ensemble cluster-
ing by means of clustering embedding in vector
spaces. Pattern Recognit. 47(2), 2014, 833-842.
[26] Jyrko Correa-Morris: The median partition and
submodularity, Appl. Math. Comput. 410, 2021,
126450.
Contribution of Individual Authors to the
Creation of a Scientific Article
Jyrko Correa-Morris was responsible for the sections
of the paper.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
Partial support for this research has been provided
by the U.S. Department of Education under grant
P120A200005.
Conflicts of Interest
The authors have no conflicts of interest to
declare that are relevant to the content of this
article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US