Reduction of search space for the mean partition problem
JYRKO CORREA-MORRIS
Mathemtics and Natural Sciences Department
Miami Dade College, Padron Campus
627 SW 27 Ave, Miami, Florida
USA
Abstract: The contributions of this paper are threefold. First, it conducts a formal comparison of the primary ap-
proaches to consensus clustering, using the concepts of agreement and consent. Secondly, it presents theoretical
evidence justifying the preference for mean-based methods, which rely on consent, over other agreement-based
procedural methods like the q-rule, which are now mostly used as quality evaluators in practice. Thirdly, the
paper computes the exact reduction achieved by criteria available in existing literature to assess the quality of
mean-based consensus solutions and reduce the search space’s size. Finally, it compiles the regions where con-
sensus functions associated with well-known dissimilarity measures, such as the Mirkin metric and Variation of
Information, accumulate their consensus solutions.
Key-Words: Lattice of partitions, Consensus, Mean partition, Quota Rules, Cluster ensemble, Search Space
Reduction
Received: May 25, 2023. Revised: August 26, 2023. Accepted: September 28, 2023. Published: October 20, 2023.
1 Introduction
Clustering problems are widespread and have led to
the development of numerous algorithms; however,
the lack of reliable performance information has cre-
ated “The Users Dilemma,” [1], making algorithm
selection uncertain.
To address this, two approaches have emerged.
First, formal theories aim to establish generic con-
cepts and rules to classify and understand cluster-
ing algorithms, [2], [3], [4], [5]. Second, more ad-
vanced algorithms combine outcomes from multiple
traditional methods to produce integrated solutions,
known as “consensus methods,” [6], [7], [8], [9]. This
paper primarily focuses on consensus methods, par-
ticularly those based on the concept of a mean parti-
tion.
1.1 Consensus methods
Consensus algorithms take a set of partitions
p1,p2, . . . , pm, usually called an ensemble or a
profile, generated by standard clustering algorithms
and merge them into a holistic solution called the
“consensus partition.” These input partitions rep-
resent partial solutions, akin to expert judgments
from different perspectives. The consensus par-
tition aims to condense these partial perspectives
into a comprehensive representation. There are
several interesting axiomatic studies, many of them
published in journal of economics or social sci-
ences, [10], [11], [12], [13], [14], which address
partitions as equivalence relations and define the
consensus mechanisms as a function that receives an
m-tuple of equivalence relations (an ordered ensem-
ble) and returns the equivalence relation resulting of
taking the meet (see Section 2) of some subset of the
inputs determined by a fixed subset of the indexes
(e.g., the meet of the first, third, and fifth partitions
in the tuple), [11]. Similar mechanisms are obtained
if instead of the meet, the join operator is used. In
the first case, the consensus mechanism is called a
meet aggregator, while in the second case it is called
join aggregator. What varies from one aggregator
to another of the same kind is the set of indexes
to be considered. Apart from the axiomatic ap-
proaches, there exist two closely related fundamental
mechanisms of consensus in the ambit of partitions,
which have been often applied in pattern recognition
and machine learning tasks: quota rules [15] and
mean-based consensus, [16], [6], [7], [8].
Quota rules constitute a slightly more flexible join
aggregator. Given 0qm(mdenotes the total
number of partitions in the ensemble), the q-quota rule
is the finest partition that places in the same cluster
all those pairs of objects that were placed in the same
cluster of more than qpartitions of the ensemble, [15],
[17]. Given a subset Aof X, we said that a partition
of Xrefines Ais every cluster of this partition either
lies entirely within Aor entirely outside A. The dual
q-quota rule, 0qm, is the coarsest partition that
refines all subsets Aof Xthat are refined by more
than qpartitions of the ensemble.
In turn, the consensus solutions in the mean-based
approach are defined as the minimizers of the con-
sensus function φ(p) =
m
i=1
D(p,pi), where Dis
a dissimilarity measure for quantifying the resem-
blance between two partitions and p1,p2, . . . , pmare
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
736
Volume 22, 2023
the members of the ensemble.
When comparing different methodologies, quota
rules align nicely with the idea of agreement, offer-
ing straightforwardly interpretable results. However,
they are highly sensitive to factions within the en-
semble, which is common in practice. This sensitiv-
ity leads to fragmented partitions and limited insights
into data interrelations. Meet aggregators share sim-
ilar sensitivity issues. In contrast, dual quota rules
and join aggregators tend to produce compact parti-
tions dominated by a few large clusters, known as the
“chaining effect.”
Mean-based methods, favored for their robustness
in practical applications, handle diverse ensembles
and provide more informative outcomes compared to
quota rules, especially regarding cluster number and
size. However, accurately computing mean-based so-
lutions can be computationally expensive due to the
large search space, unless additional information is
available for effective reduction. Additionally, the ac-
curacy of approximations by commonly used heuris-
tic methods for averaging remains uncertain.
From a foundational perspective, there are uncer-
tainties about how the consensus function φthat com-
putes the mean using a dissimilarity measure aligns
meet and join operations in the lattice of partitions
with basic arithmetic operations on real numbers.
These unique characteristics raise questions concern-
ing the solutions quality and regarding the solutions
location.
1.1.1 Quality of consensus solutions
As mentioned earlier, quota rules and dual quota rules
align well with the concept of agreement, making
their results easily interpretable and practical. While
quota rules offer limited but consistently reliable in-
formation, dual quota rules might provide some noise
by potentially including some irrelevant information.
Quota rules may not be highly informative as consen-
sus solutions, but they capture essential information.
In contrast, dual quota rules encompass all the infor-
mation (even if it includes noise) that a quality con-
sensus solution should contain.
Consequently, these rules, although not serving as
primary consensus solutions, serve as quality evalua-
tors. They assess the quality of mean-based consen-
sus solutions as follows: a consensus solution is con-
sidered acceptable if it contains at least the informa-
tion provided by a quota rule and no more than what a
dual quota rule offers. These rules play a crucial role
in evaluating the effectiveness of mean-based consen-
sus solutions.
This situation has been empirically addressed by
several researchers, [18], [19], [20].
1.1.2 Location of consensus solutions
To tackle computational challenges, various ap-
proaches have proposed reducing the search space,
each with its own method. While these reductions
may appear intuitive, they often lack substantial ev-
idence or support from general principles governing
consensus solution behavior.
The exponential growth of the Bell sequence (the
nth Bell number represents the number of partitions of
a set with nelements) underscores the impossibility
of evaluating each individual partition of a given fi-
nite dataset to find consensus solutions. Hence, prac-
tical procedures frequently employ unverified search
methods. Pruning the search space becomes neces-
sary to handle computational constraints. This in-
volves identifying a smaller region where consensus
solutions are likely to be found, thereby improving
computational efficiency and approximation quality.
However, certain questions arise that warrant con-
sideration: Is there a common region where con-
sensus solutions, regardless of the consensus func-
tion, should be sought? For many dissimilarity mea-
sures, the answer is yes. As mentioned earlier, quota
rules and dual quota rules serve as lower and upper
bounds of consensus solutions, significantly reducing
the search space.
Another important question is whether all consen-
sus functions accumulate their solutions in the same
region of the search space. The answer is no. For ex-
ample, consensus functions associated with the lattice
metric and the symmetric difference metric accumu-
late their solutions in opposing regions of the partition
space, [15].
1.2 Contributions
Motivated by these considerations, this paper delves
into the distinction between “agreement” and “con-
sent” within the context of clustering consensus meth-
ods, focusing on significant subtleties and nuances,
and formalizes these concepts. Subsequently, the arti-
cle translates the primary agreement-based consensus
criteria into rules for evaluating the quality of mean-
based consensus solutions.
In addition to these criteria, the paper incorporates
other criteria derived from axiomatic approaches,
though it is important to note that not all of them are
equally applicable or desirable. These are necessary
steps to facilitate access to the main contribution of
the paper: the exact computation of the pruning capa-
bility of these criteria.
For each criterion, the paper investigates its po-
tential as a means to prune the search space by pre-
cisely calculating the number of partitions it elimi-
nates. This aspect of the methods has not been ex-
plored to the best of my knowledge, making it a novel
and unexplored area of research.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
737
Volume 22, 2023
By addressing these aspects, the paper aims to
deepen our understanding of clustering consensus
methods and provide valuable insights into their per-
formance and effectiveness.
2 The size of the lattice of partitions
Let X={x1, x2, . . . , xn}be a finite set. A parti-
tion p of Xis a collection p ={C1, C2, . . . , Cs}of
subsets of Xsuch that CiCj=for i=jand
s
i=1Ci=X. The subsets C1, C2, . . . , Csare called
the clusters or blocks of p. Throughout this paper,
mXdenotes the partition all whose clusters are sin-
gletons —mX:= {{x}:xX}—, while gXde-
notes the partition whose only cluster is the entire set
X—gX:= {X}. There is a natural partial order
between partition: we say that the partition p refines
the partition p, in notation notation p p, if the
following condition holds: for every pair of objects
x, xX, if xand xare in the same cluster of p,
then they are also in the same cluster of p. Naturally,
ppmeans that p strictly refines p. We say that
a partition p covers a partition pif (a) p and pare
different partitions; (b) prefines and (c) if pre-
fines p′′ which in turn refines p, then either p′′ =p
or p′′ =p (i.e., there is no partition in between pand
p). The notation pp means p covers p. For any
two partitions p and pthere are partitions that refine
both. The coarsest partition that satisfies this property
is denoted by p pand is called the meet of p and p.
Two elements x, y Xare placed in the same cluster
of p pif and only if they are simultaneously placed
in the same cluster of p and in the same cluster of p.
Therefore the clusters of p pare all the possible
non-empty intersections of one cluster of p with one
cluster of p. For any two partitions p and pthere are
partitions that are refined by both. The finest partition
that satisfies this property is denoted by p pand is
called the join of p and p. Two elements x, y X
are placed in the same cluster of p pif and only if
there is a sequence x=xi1, xi2, . . . , xik=ysuch
that, for all j {1,2, . . . , k 1},xijand xij+1 are
either placed in the same cluster of p or in the same
cluster of p. Thus, the cluster of p pare the subsets
of Xthat can be expressed both as unions of clusters
of p and as unions of clusters of p, and are minimal
with respect to this property. Henceforth, PXdenotes
the lattice of all partitions of X.
The partitions with exactly n1clusters, i.e.,
partitions all whose clusters are singletons except for
one which includes exactly two elements, are called
the atoms or atomic partitions of PX. From now
on, pxxdenotes the atomic partition whose only non-
singleton cluster is {x, x}. Moreover, AX, or simply
A, denotes the set of atoms of PXwhile A(p)denotes
the set of all the atoms of PXthat refine the given
partition p. Analogously, the partitions with exactly
2 clusters are called the co-atoms or co-atomic parti-
tions of PX. Henceforth, pMdenotes the co-atomic
partition whose only clusters are Mand XM.
CAXdenotes the set of co-atoms of PXand CA(p)
denotes the set of co-atoms of PXthat are refined the
given partition p. It is well-known that every partition
p=mXcan be expressed as the join of atom while
every partition p =gXcan be represented as the meet
of co-atoms.
The number of partitions in PXis given by the nth
Bell number, Bn, where nis the number of elements
in X. Bell numbers can be computed by using any of
the following equivalent formulas:
Bn=1
e
k=0
kn
k!(Dobinski’s formula); (1)
Bn=
n1
k=0 n1
kBk, B0= 1; (2)
Bn=
n
k=1
S(n, k).(3)
Equation (3) defines the Bell numbers as the sum
of the Stirling numbers of the second kind, S(n, k),
where, for specific values of nand k,S(n, k)gives
the number of partitions of nelements into kclusters.
In turn, Equations (1) and (2) together show that the
difference between two consecutive Bell numbers is
an exponential, which implies that Bnincreases ex-
tremely quickly as nincreases. Table 1 shows the first
ten Bell numbers, which are additionally compared to
2n1, as well as the difference between any two con-
secutive of them, in order to numerically illustrate the
claim above.
Table 1: The first ten Bell numbers
n7 8 9 10
Bn877 4,140 21,147 115,975
BnBn1674 3,273 17,007 94,828
2n164 128 256 512
3 Consensus by agreement vs.
consensus by consent: Quota rules
and mean-based consensus
This section commences by exploring the distinction
between two closely related concepts: ”agreement”
and ”consent.” The primary objective of this discus-
sion is to offer formal evidence regarding the dissimi-
larity between the two main approaches in consensus
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
738
Volume 22, 2023
methods: quota rules and mean-based methods. Fur-
thermore, it seeks to clarify why the latter approach
has gained more popularity in applied fields such as
pattern recognition and machine learning.
In essence, agreement can be viewed as what is
desired, whereas consent represents what is accepted.
Consequently, in an ensemble, a member may choose
to consent to a proposal that is not their top choice,
aiming to achieve a solution that might not be optimal
but is satisfactory to all members. As a result, obtain-
ing consent appears to be easier than achieving agree-
ment. This flexibility in consent-based methods leads
to more informative solutions with a better balance
between the size and number of clusters. On the con-
trary, agreement-based methods tend to reveal only a
limited amount of the underlying data structure, pri-
marily due to either a high level of fragmentation or a
high level of concentration. Organizing the lattice of
partitions, we observe that agreement methods often
produce partitions that lie either in the lower or higher
levels of the partition space.
Despite the limited information provided by
agreement-based methods, it is undeniable and can
be valuable as a source of information when seeking
to prune the search space in mean-based methods. In
this context, it plays a crucial role, even though it may
not suffice as a standalone consensus solution.
To formalize these ideas, I shall introduce the fol-
lowing definition:
Definition 1 In the process of creating the consensus
solution p, the jth member of the ensemble, pj, is
said to agree that the pair of objects xand xto be
placed in the same cluster of pif and only if xand x
are in the same cluster of pj. Thus, the ensemble E=
{p1,p2, . . . pm}is said to contain a quota q[0,1]
of agreement with the fact that the pair of objects x
and xin Xto be placed in the same cluster of pif
and only if more than mq members of the ensemble E
agree that xand xto be placed in the same cluster of
p. In other words, if
|{pj E :pxxpj}| > mq.
In the process of creating the consensus solution
p, the jth member of the ensemble, pj, is said to con-
sent, according to the dissimilarity D,that the pair of
objects xand xin Xto be placed in the same cluster
of pif and only if the partition obtained from pby
merging those clusters containing xand x, respec-
tively, either maintains or lowers the dissimilarity
with respect to pj. In other words, D(pj,ppxx)
D(pj,p). The entire ensemble Eis said to consent,
according to D, that the pair of objects xand xin
Xto be placed in the same cluster of pif and only if
φ(ppxx)φ(p), which amounts to saying that
m
i=1
D(pi,ppxx)
m
i=1
D(pi,p).
The dissimilarity D(pj,p)can be thought as the
degree of consent that the jth member of the ensem-
ble, pj, assigns to the proposal “the partition pis the
consensus solution”.
This definition establishes that a consensus
reached by agreement is the solution of certain q-
quota rule, while a consensus obtained by consent is
the solution of a certain mean-based method. Now, it
is worthy, for the sake of completeness, to introduce
a more formal definition of quota rules.
Quota rules:Given an ensemble E=
{p1,p2, . . . pm}of partial partitionings of the
data set X, and a real number q[0,1], the q-quota
rule, in notation pq, is the finer partition from among
those that are refined by every atom that, in turn,
refines more than mq members of the ensemble.
By definition, the finer partition refined by some
given partitions is their join. It can be therefore con-
cluded that:
pq={pxx A :|{pi E :pxxpi}| > mq}.
In the case that the set of such atoms is empty, then
pq=mX.
Among the q-quota rules, unanimity rule and ma-
jority rule stand out by being used the most in practice
(see, for instance, [15]).
Unanimity rule:The unanimity rule is the q-quota
rule that corresponds to q=m1
m. (In this case,
mq =m1, which means that the atoms to be con-
sidered are those that refine all the members of the
ensemble.) Accordingly, the unanimity consensus is
given by the meet of all the members of the ensemble:
m
i=1
pi.
Majority rule:The majority rule is the q-quota
rule that corresponds to q= 1/2.
In an effort to address the issue of quota rules tend-
ing to generate small clusters, the concept of Dual
Quota Rules has been introduced [15].
Dual quota rules:Given an ensemble E=
{p1,p2, . . . pm}of partial partitionings of the data set
Xand a real number q[0,1], the dual q-quota rule,
in notation pd
q, is the coarser partition from among
those that refine every co-atoms that, in turn, is re-
fined by more than mq members of the ensemble.
Since the coarser partition that refines some given
partitions is their meet, it can be concluded that:
pd
q={pMCA,:|{pi E :pipM}| > mq}.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
739
Volume 22, 2023
In the event that the set of such co-atoms is empty,
then pd
q=gX.
The Dual unanimity rule and the Dual majority
rule correspond to q=m1
mand q=1
2, respectively.
Similar to quota rules, these dual rules are frequently
employed in practical applications.
Restated in terms of consent, quota rules and dual
quota serve as criteria designed to evaluate the in-
ner workings of other consensus methods that may
not be equally transparent. This assessment provides
valuable information about the quality and location of
their consensus solutions, which can then be utilized
to effectively prune the search space.
Quota rules for consent:Given an ensemble E=
{p1,p2, . . . pm}of partial partitionings of the data set
Xand a real number q[0,1], the q-quota rule for
consent establishes that any consensus partition, par-
ticularly the minimizers of the mean-based consensus
function φ(p) =
m
i=1
D(p,pi), must be refined by pq.
In essence, q-quota rules for consent stipulate that
pairs of objects appearing together in the same cluster
for more than qm partitions of the ensemble should
be grouped together in the same cluster of any con-
sensus partition. Consequently, this rule ensures that
any consensus solution must be at least as inclusive as
pq, rather than declaring pqitself as the consensus so-
lution. To illustrate, in terms of intervals in PX, these
rules propose replacing a larger interval, namely the
entire set PXof partitions X, with a smaller interval
defined as [pq,gX].
Similarly, the dual q-quota rules for consent assert
that any consensus solution lies within the interval
[mX,pq].
Dual quota rules for consent:Given an ensem-
ble E={p1,p2, . . . pm}of partial partitionings of
the data set Xand a real number q[0,1], the dual
q-quota rule for consent establishes that any consen-
sus partition, particularly the minimizers of the mean-
based consensus function φ(p) =
m
i=1
D(p,pi), must
refine pd
q.
It is time to introduce the Undesired Atoms crite-
rion. Instead of listing the pairs that should appear
together in the same cluster of a consensus partition,
the following criterion has a prohibitive approach.
Undesired atoms:Given an ensemble E=
{p1,p2, . . . pm}of partial partitionings of the dataset
X, this rule establishes that any atomic parti-
tion pxxfor which there is no sequence x=
xi1, xi2, . . . , xik=xsuch that every atom pxij,xij+1 ,
j= 1,2, . . . , j 1, refines at least one partition of the
ensemble E, must refine no consensus solution.
This rule finds its foundation in the notion that
there are essentially two fundamental reasons for
placing a pair of objects in the same cluster of a con-
sensus partition: (i) when the ensemble reaches an
agreement on this pairing, and (ii) when the ensem-
ble decides to do so through consent or cooperation.
For (i), this pair must appear together in a sufficient
number of ensemble members, ensuring the required
quota of agreement. Conversely, in (ii), this pair is
placed together in the same cluster of the consensus
partition due to the chaining effect resulting from co-
operation among the ensemble members. An unde-
sirable atom is one that lacks justification for its ele-
ments to be grouped together in the same cluster, nei-
ther through agreement nor cooperation.
To elaborate, if a partition p refined by an unde-
sirable atom pxxis prohibited from being a consen-
sus solution, it implies that the union of the intervals
[pxx,gX], where pxxranges over the set of undesired
atoms, does not contain any consensus partitions.
To conclude this section, I will introduce two novel
criteria that I have recently uncovered while inves-
tigating the influence of submodularity on the local-
ization of consensus partitions, [21], [22]. Submodu-
lar functions can be considered akin to discrete con-
vex functions, and as it is widely recognized, con-
vexity is one of the most advantageous scenarios in
continuous optimization. The exploration of the im-
pact of this seemingly natural property has garnered
increasing interest within several areas of mathemat-
ics and computer science, including optimization and
machine learning.
Unanimity rule for consent bounded from above
by the ensemble:Given an ensemble E=
{p1,p2, . . . pm}of partial partitionings of the dataset
X, this rule establishes that any consensus solution
must be refined by the meet of all the members of E,
m
j=1 pj, and refines at least one member of E.
Essentially, this rule asserts that any consensus
partition can be obtained by refining a certain parti-
tion of the ensemble to be discovered, ensuring that
no pairs placed together in the same cluster for any
partition in the ensemble are split apart. Formally,
this statement implies that all consensus partitions lie
within the union of the intervals
m
j=1
pj,pi
, pi E.
While this rule may impose a restrictive condition,
it is not intended to be satisfied by every consensus
function; rather, it has emerged naturally when inves-
tigating the behavior of consensus solutions produced
by submodular consensus functions.
In a similar vein and stemming from the same
source, the following and final rule posits that any
consensus partition can be identified by merging clus-
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
740
Volume 22, 2023
ters from a particular member of the ensemble to be
revealed, with the assurance that all new pairs are the
result of cooperation among the members of the en-
semble.
Dual unanimity rule for consent bounded from
below by the ensemble:Given an ensemble E=
{p1,p2, . . . pm}of partial partitionings of the dataset
X, this rule establishes that any consensus solution
must be refined by at least one the pis in Eand re-
fines m
j=1 pj, the join of all the members of E. Ac-
cordingly, any consensus partition lies in the union of
the intervals
pi,
m
j=1
pj
, pi E.
4 On the exact prune of the search
space
In this section, we calculate the exact reduction of-
fered by each criterion examined in the previous sec-
tion. The phrase ”reduction provided by the rule R
implies that if an average consensus function satisfies
the rule R, the minimizers of φare among the parti-
tions of PXaccepted by R.
As each criterion provides a region that is either an
interval or a collection of intervals, it is convenient to
determine the number of partitions contained within
an interval [p,p].
Lemma 1 Let pand p={C1, C2, . . . , Ck}be par-
titions such that pp. If the cluster Ci,1ik,
of pis the union of niclusters of p, then the number
of partitions in the interval [p,p]is
k
i=1
Bni.
Table 2 shows some illustrative numbers.
Table 2: Computing the number of partitions in [p,p]
by applying Lemma 1
n k #p n1;. . . ;nk
k
i=1
BniBn
10 2 7 4 75 115,975
10 2 8 4 225 115,975
10 2 8 6 406 115,975
12 3 10 5 520 4,213, 597
12 4 11 4 300 4,213,597
15 3 12 5 3 900 1,382,958,545
15 4 10 4 375 1,382,958,545
PROOF 1 Any partition that refines the partition p
can be thought of as an union of partitions of the clus-
ters of p. Thus, as a first step toward our desired
conclusion, we could think that if we compute the to-
tal number aiof ways in which every cluster Ciof p
can be partitioned, then the product of the product of
the ais would provide us with an estimate of the num-
ber that we are looking for. Notice that Ciis a set
itself, hence the number of partitions of Ciis the Bell
number corresponding to the number of elements in X
that form Ci. However, we do not want to count every
partition that refines p, but only those that are refined
by pas well. To take into consideration the latter con-
dition, it suffices to count only those partitions of Ci
in which the elements lying in the same cluster of p
remain together. This means that the finest partition
of Cithat we can consider is that consisting of those
clusters of pthat are contained in Ci. This suggests
that, instead of considering the elements of Xthat
form Cito determine the Bell number that matches
ai, the clusters of pcontained in Cishould be con-
sidered as individual elements to determine such Bell
number. Since there are niof such clusters of p, the
number of ways of partitioning Ciunder the required
condition is Bni, the nith Bell number. Therefore, the
number of partitions in the interval [p,p]is
k
i=1
Bni.
Now, we are poised to calculate the precise reduc-
tion offered by each criterion presented in the previ-
ous section. I will present them in the same order in
which they were introduced earlier.
Proposition 1 If pqhas kclusters, then the reduction
provided by q-quota rule for consent is of BnBk
partitions.
PROOF 2 Remember that the q-quota rule for con-
sent basically establishes that any consensus partition
must lie in the interval pq,gX. Since the only cluster
of gXis the entire set X, which is the union of kclus-
ters of pq, it can be concluded, in view of Lemma 1,
that the number of partitions in the interval pq,gXis
Bk. Consequently, the reduction provided by q-quota
rule is of BnBkpartitions.
Let us analyze now the reduction provided by the
dual q-quota rule.
Proposition 2 If the cluster Ciof pd
qhas nielements,
then the reduction provided by the dual q-quota rule
is of Bnk
i=1 Bnipartitions.
PROOF 3 The dual q-quota rules basically states
that any consensus partition lies in the interval
mX,pd
q. Since all the clusters of mXare singletons,
each cluster of pd
qis the union of niclusters of mXif
and only if it contains nielements of X. The result is
now a direct consequence of Lemma 1.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
741
Volume 22, 2023
Now, let us direct our attention to the Undesired
Atoms rule. Before calculating the reduction pro-
vided by this rule, we must address a more funda-
mental question: How many undesirable atoms are
there? An atom pxxis deemed undesirable if and only
if the elements xand xbelong to different clusters
of
m
j=1
pj—the join of all members in the ensemble
E. Thus, the number of undesirable atoms is equiva-
lent to the number of possibilities for selecting a pair
of elements from Xthat reside in distinct clusters of
m
j=1
pj.
Given this, each element in an arbitrary cluster Cu
of the join can be paired with each element in an-
other arbitrary cluster Cvof this partition, resulting
in #Cu·#Cvundesirable pairs. This proves the fol-
lowing lemma.
Lemma 2 If
m
j=1
pjhas kdistinct clusters
C1, C2, . . . , Ckwith n1, n2, . . . , nk, respectively,
then, according to E, there are
1u<vk
nunv
undesired atoms.
Lemma 3 The join of r1atoms consists of at least
nrclusters (and at most n1clusters).
PROOF 4 Given an arbitrary partition pand an ar-
bitrary atom pxx, either xand xare together in the
same cluster of por not. In the case that xand xare
placed in the same cluster of p, then ppxx=p. Oth-
erwise, when the join of pand pxxis taken, the clus-
ters of pthat contain xand x, respectively, merge into
a single cluster. Thus, ppxxp, which means that
ppxxhas one cluster less than p. Therefore, each
time the join of any partition and an atom is taken,
either the partition states the same or its number of
clusters decreases by one.
Suppose the join of ratoms is now considered.
We can think of the initial partition pas any of these
atoms, and then, we will take the join with the r1
remaining atoms, one at a time. By virtue of the rea-
soning that we just made, after having performing the
join with all the r1remaining atoms, the initial p
lost no more than r1clusters.
Proposition 3 Let UA ={A1,A2, . . . , Aq}be the
set of all the undesired atoms according to the ensem-
ble E. The reduction provided by Undesired Atoms
rule is of
qBn1+
n1
j=2
q
r=j
(1)r+1prj
Bnj(4)
partitions, where prj is the number of sets consist-
ing of exactly rundesired atoms whose join has nj
clusters (1jr).
PROOF 5 According to this rule, if a partition lies
in some of the intervals [Aj,gX], then it is prohib-
ited from being a consensus solution. As a result,
the union of these intervals must be removed from the
search space and the reduction provided by this rule
is the cardinality of such a union.
Applying the inclusion-exclusion principle, we get
that #
q
j=1
[Aj,gX]is equal to
q
r=1
u∈Or
q
(1)r+1#
u
pu,gX,(5)
where Or
q:= {u= (u1, u2, . . . , ur)Zr: 0 <
u1< u2< . . . < ur< q}and upushould be
understood as pu1pu2. . . pur, provided u=
(u1, u2, . . . , ur) Or
q.
Now, by virtue of Lemma 3, the join of r2atoms
at least nrclusters. Let us group all the possible
joins of rundesired atoms according to their number
of clusters. Using the fact that if a partition phas
njclusters, the interval [p,gX]has Bnjparti-
tions, we can conclude that Aj1Aj2. . . Ajrcon-
tributes to (5) with (1)r+1Bnjprovided this join
has njclusters. Denoting now by prj the num-
ber of joins of ratoms that have njclusters, we
get that the total contribution such undesired atoms
is (1)r+1prj Bnj. Adding up these contributions,
the result follows.
Next, we examine the Unanimity Rule for Con-
sent Bounded from Above by the Ensemble. As
demonstrated earlier, this rule stipulates that consen-
sus solutions should lie in the union of the inter-
vals
m
j=1
pj,pi
,i= 1,2, . . . , m. Applying again
the inclusion-exclusion principle combined with the
property that the intersection of two intervals of the
form [p,p]and [p,p′′], respectively, is the interval
[p,pp′′]because pp′′ is the coarser partition that
simultaneously refines the partitions pand p′′, it fol-
lows that
m
i=1
m
j=1
pj,pi
is equal to:
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
742
Volume 22, 2023
m1
r=1
(1)r1
u∈Or
m
#
m
j=1
pj,
u
pu
.
It suffices now to apply Lemma 1 to prove the fol-
lowing proposition.
Proposition 4 Given u Or
m,r= 1,2, . . . , m 1,
let nu
sustand for the number of clusters of
m
j=1
pj
whose union is the suth cluster of
u
pu, where
pu1,pu2, . . . , purbelong to ensemble E. The re-
duction provided by Unanimity Rule for Consent
Bounded From Above by the Ensemble is of:
Bn
m1
r=1
(1)r1
u∈Or
m
su
Bnu
su.
Analogously to the previous rule, we can calculate
the reduction provided by the Dual Unanimity Rule
for Consent Bounded from Below by the Ensemble.
This rule essentially states that any consensus solu-
tion lies in the union of the intervals
pi,
m
j=1
pj
,
i= 1,2, . . . , m. The cardinality of these intervals
can be determined using the inclusion-exclusion prin-
ciple coupled with the fact that the intersection of two
intervals of the form [p,p]and [p′′,p]is the interval
[pp′′,p]because pp′′ is the finer partition that
simultaneously refines the partitions pand p′′.
Thus, the cardinality #
m
i=1
pi,
m
j=1
pj
is equal to:
m1
r=1
(1)r1
u∈Or
m
#
u
pu,
m
j=1
pj
.
On account of Lemma 1 we get the following
proposition.
Proposition 5 Given u Or
m,r= 1,2, . . . , m
1, let mu
sstand for the number of clusters of
u
pu,
pu1,pu2, . . . , pur E, whose union is the sth cluster
of
m
j=1
pj,1sk. The reduction provided by Dual
Unanimity Rule for Consent Bounded From Below by
the Ensemble is of:
Bn
m1
r=1
(1)r1
u∈Or
m
k
s=1
Bmu
s.
5 State-of-the-art dissimilarity
measures
The Table 3 compiles the most significant results
reported in literature for consensus functions associ-
ated to well-known dissimilarity functions.
Table 3: State-of-the-art measures’ compliance with
reduction criteria (Unanimity (U); Dual Unanimity
(DU); Majority (M); Dual Majority (DM)).
Measure U DU M DM
Var. of Inform.
Mirkin metric
Dual Symm. Diff. Metric
Lattice Metric
0-1 Disagreement meas.
Split-merge meas.
As previously mentioned, the works of
Barthélemy and Leclerc ensure that the consen-
sus functions linked to any metric constructed from
a lower valuation exhibit a behavior akin to the
consensus functions associated with the Variation
of Information and Mirkin metric, respectively.
Likewise, the consensus functions related to any
metric constructed from an upper valuation display a
behavior similar to the consensus function associated
with the lattice metric [15].
6 Conclusions and future work
This paper embarked on an exploration of lattice the-
ory to delve into the fundamental distinctions between
the two primary approaches to consensus within the
domain of partitional clustering. By presenting for-
mal evidence, we clarified the preference for average-
based methods over quota rules, despite the latters
intuitive and easily interpretable nature. We under-
scored that quota rules and similar criteria have not
diminished in importance; rather, they have assumed
a new role.
These methods are now instrumental in apprais-
ing the quality of average-consensus solutions and
furnishing valuable insights into the whereabouts of
average solutions. They assist in the efficient prun-
ing of the search space. The principal contribution
of this article is the compilation of key criteria for
evaluating average-consensus solutions and the quan-
tification of the exact reduction in the search space
achieved by each criterion. Moreover, these findings
offer insights into the pruning capabilities of each cri-
terion, enhancing our understanding of mean-based
consensus methods, which carry significant practical
implications.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
743
Volume 22, 2023
Looking ahead, further investigation is warranted
to analyze the various average consensus functions
corresponding to well-known dissimilarity measures.
While the Symmetric Difference measure has re-
ceived extensive scrutiny, other widely-used metrics
such as Variation of Information and Van Dongen’s
metric remain less explored. The results presented
here expand the horizons for in-depth exploration of
mean-based consensus methods in future research.
References:
[1] Anil K. Jain, Data clustering: 50 years beyond K-
means. Pattern Recognit. Lett. 31(8), 2010, 651-
666.
[2] Jon M. Kleinberg, An Impossibility Theorem for
Clustering, in Advances in Neural Information
Processing Systems 15 [Neural Information Pro-
cessing Systems, NIPS 2002, December 9-14,
2002, Vancouver, British Columbia, Canada],
MIT Press, (2002) 446–453.
[3] Margareta Ackerman, Shai Ben-David, Simina
Brânzei, David Loker: Weighted clustering: To-
wards solving the users dilemma. Pattern Recog-
nit. 120, 2021, 108152.
[4] Carlsson, Gunnar and Mamoli, Facundo, Classi-
fying Clustering Schemes, Foundations of Com-
putational Mathematics, 13 (2), (2013) 221–252.
[5] Jyrko Correa-Morris, An indication of unifica-
tion for different clustering approaches, Pattern
Recognit., 46 (9), (2013) 2548–2561.
[6] Tossapon Boongoen, Natthakan Iam-On, Cluster
ensembles: A survey of approaches with recent
extensions and applications, Computer Science
Review, 28, 2018, 1-25.
[7] Mustafa R. Kadhim, Guangyao Zhou, Wenhong
Tian, A novel self-directed learning framework
for cluster ensemble, Journal of King Saud Uni-
versity - Computer and Information Sciences,
34(10), Part A, 7841-7855.
[8] Hanan G. Ayad, Mohamed S. Kamel, On voting-
based consensus of cluster ensembles, Pattern
Recognition, 43(5), 2010, 1943-1953.
[9] Caroline X. Gao, Dominic Dwyer, Ye Zhu,
Catherine L. Smith, Lan Du, Kate M. Filia, Jo-
hanna Bayer, Jana M. Menssink, Teresa Wang,
Christoph Bergmeir, Stephen Wood, Sue M. Cot-
ton, An overview of clustering methods with
guidelines for application in mental health re-
search, Psychiatry Research, 327, 2023, 115265.
[10] Fishburn, Peter C., and Ariel Rubinstein, Aggre-
gation of equivalence relations, Journal of classi-
fication 3, 1986, 61-65.
[11] Dinko Dimitrov, Thierry Marchant and Debasis
Mishra, Separability and aggregation of equiva-
lence relations, Economic Theory, 51 (1), 2012,
212.
[12] Livieratos, John, Phokion G. Kolaitis, and Lef-
teris Kirousis, On the computational complexity
of non-dictatorial aggregation, Journal of Artifi-
cial Intelligence Research, 72, 2021, 137-183.
[13] Baronchelli Andrea, The emergence of con-
sensus: a primer, R. Soc. Open Sci., 2018,
5172189172189.
[14] Jeff E. Biddle; Daniel S. Hamermesh, Theory
and Measurement: Emergence, Consolidation,
and Erosion of a Consensus, History of Political
Economy, 49 (Supplement), 2017, 34–57.
[15] Jean-Pierre Barthélemy and Bruno Leclerc, The
Median Procedure for Partitions, Partitioning
Data Sets, American Mathematics Society, Series
in Discrete Math, 1995, 3-34.
[16] Sandro Vega-Pons, Jyrko Correa-Morris, José
Ruiz-Shulcloper: Weighted partition consensus
via kernels. Pattern Recognit. 43(8), 2010, 2712-
2724.
[17] Zoi Terzopoulou, Quota rules for incomplete
judgments, Mathematical Social Sciences, 107,
2020, 23-36.
[18] J. He, L. Cai and X. Guan, Differential Pri-
vate Noise Adding Mechanism and Its Applica-
tion on Consensus Algorithm, IEEE Transactions
on Signal Processing, 68, 2020, 4069-4082, doi:
10.1109/TSP.2020.3006760.
[19] Teixeira, P., Marques, M. M., Hagger, M.
S., Silva, M., Brunet, J., Duda, J., Hag-
ger, M., Classification of techniques used in
self-determination theory-based interventions in
health contexts: an expert consensus study, 2019,
https://doi.org/10.31234/osf.io/z9wqu
[20] Adrianna Kozierkiewicz-Hetmańska, The anal-
ysis of expert opinions’ consensus quality, Infor-
mation Fusion, 34, 2017, 80-86.
[21] Jyrko Correa-Morris, Comparing Partitions:
Shortest Path Length Metrics and Submodularity,
Intern. J. of Math. Models and Meth. in Appl. Sci.,
13, 2019, 45-51.
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
744
Volume 22, 2023
[22] Jyrko Correa-Morris: The median partition and
submodularity, Appl. Math. Comput. 410, 2021,
126450.
Contribution of Individual Authors to the
Creation of a Scientific Article
Jyrko Correa-Morris was responsible for the sections
of the paper.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
Partial support for this research has been provided
by the U.S. Department of Education under grant
P120A200005.
Conflicts of Interest
The authors have no conflicts of interest to
declare that are relevant to the content of this
article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US
WSEAS TRANSACTIONS on MATHEMATICS
DOI: 10.37394/23206.2023.22.81
Jyrko Correa-Morris
E-ISSN: 2224-2880
745
Volume 22, 2023