Mathematics of Classifications, Chu spaces and the Continuum
DANIEL PARROCHIA
Institute of philosophical research (IRPHIL)
University Jean Moulin - Lyon III
1C avenue des Frères Lumière CS 78242 69372 Lyon Cedex 08
FRANCE
Abstract: We call classification the operation consisting of distributing objects in classes or groups which are,
in general, less numerous than them. The relations between these classes may be (or not) partially or totally
ordered. So there exist many kinds of classification schemes. Formally speaking, a classification may be a lattice,
a semilattice, a chain, a hypergraph, a matroid, a tree, etc. Our purpose is to find the underlying mathematical
structure of all these classifications. We explain how we can represent them in a unique way, constituting what has
been called before a ”metaclassification” and which is, in fact, a Chu space. Thus, category theory can describe
in terms of morphisms different operations on classifications that, according to Barwise and Seligman, we report
in the following. Finally, we show that such a ”metaclassification” is the fundamental brick from which we can
get some information about the mathematical continuum.
Key-Words: Partition, partition lattice, chains of partitions, Chu spaces, ellipsoids, continuum.
Received: May 19, 2022. Revised: February 13, 2023. Accepted: March 21, 2023. Published: April 13, 2023.
1 Introduction
In this article, we first define what should be an invari-
ant for a classification, as well as a classification func-
tion. Then we give different examples of increasingly
strong classificatory structures (hypergraphs, covers,
class systems, partitions). In the following, we study
more particularly the lattice of the partitions of a finite
set. We show that this model generalizes to the case
of minimum covers and to other situations. We then
introduce the project of a general theory of classifi-
cations with its related concepts (metaclassification,
Chu spaces, categories and infomorphisms). Finally,
failing to be able to construct a real algebra of classifi-
cations, we introduce a conjecture of convergence for
the ellipsoids of the space of classifications. We end
on the question of the continuum, already mentioned
in [19].
2 Invariants and classification
functions
Let Ebe a nonempty finite or infinite set. The various
elements of Eare usually compared by the means of
some invariant (see, [13]).
For example, partitioning N, the set of natural num-
bers, into odd and even numbers, supposes you take
for invariant their classes modulo 2. Now, if you
want to classify the abstract sets in general, then you
will have to take for invariant their cardinals. In ex-
perimental sciences (physics, chemistry, natural sci-
ences...), there are more complex invariants, such as
discrete groups, Lie groups and so on.
Invariants are criteria that allow us to tell whether
the objects we compare are similar or not. How-
ever, in practical domains, the invariants are not easy
to find. Generally, we must previously construct
the notions of ”likeness”or ”proximity”of two ob-
jects according to the phenomenology, and use, in or-
der to express them rigorously, a mathematical coef-
ficient of similarity, which is a pairwise ”distance”
among the objects of the basic set. These ones are
points of the space that we identify with the set E=
{a, b, ..., x...}.
We then define a function d:E×ERwith par-
ticular properties (generally, it satisfies the axioms of
some distance or dissimilarity coefficient) that we will
define later.
Now a classification function is a function fthat takes
this distance function don Eand returns some orga-
nization Γof E(hypergraph, system of classes, par-
tition, etc.). Γis a ”classification” and the sets in Γ
will be called ”classes”. There may be some prob-
lem to define such a function and some yet reasonable
requirements (such as scale invariance, richness and
consistency) lead to a contradiction. One or the other
must be weakened (see, [14]).
3 Examples of different structures
Definition 3.1 (Hypergraph).Let P(E)be the pow-
erset of E. A hypergraph is a pair H= (E, P ),
where Eis a set of vertices (or nodes) and Pa set
of nonempty subsets called (hyper)edges or links.
Therefore, Pis a subset of P(E)\ . In such a struc-
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
119
Volume 20, 2023
ture, the set of edges, does not ”cover” the set E, be-
cause some node may have a degree zero, i.e. may
have no link to some edge (see, [6]).
Definition 3.2 (Covers).Assume Definition 3.1 and
suppose now we add the following conditions (see,
[18]):
(C0)EP.
In this case, we say that Pis a cover (or covering) of
the set of vertices E.
Definition 3.3 (Systems of classes).Assume now
Definition 3.1 and 3.2. and suppose that, for every
element, its singleton is in P. In symbols:
(C1)xP, {x} P.
then, we get a system of classes, in the sense of [7].
Definition 3.4 (Partitions).Assume 3.1, 3.2 and 3.3
and let us add now the new following condition. For
every cibelonging to P:
(C2)cicj=.
(C3)ci=E.
Then Pis a partition of Eand the ciare the classes of
the partition P.
4 The lattice of partitions
Call now xP y, the relation ”x belongs to the same
class as y” and denote P(E)the set of partitions of
a set E. A partition Pis finer than a partition Pif
xP y =xP y. This relation allows us to define a
partial order on P(E)that we shall denote PP.
We can see immediately that (P(E),) is a lattice:
1) it is a partial order; 2) moreover, every pair (P, P )
has a greatest lower bound PPand a least up-
per bound PP. One proves that P(E)is comple-
mented, semi-modular and atomic (if the initial data
Eis a non atomic set, we can, under reasonable con-
ditions, reduce the data to the atomic elements of E)
(see, [15]).
Example : The lattice of partitions for |E| = 3 (see Fig.
1) : as usual, we write (a, b)for {{a},{b}}:
P0= (a, b, c); P1= (ab, c); P2= (ac, b);
P3= (a, bc); P4=E= (abc).
P0is the discrete partition, whose classes are single-
tons. P4is only one class, say: E.
(123)
(12,3)!! ! ! (13,2)! ! (1, 23)
! ! ! (1, 2, 3)
Figure 1: The partition lattice for |E|= 3
5 Hierarchical classification
In this context, a hierarchical classification, i.e a chain
Cof partitions of the lattice P(E), is a totally ordered
subset of P(E). We have:
C={P1, P2, ..., Pn}with P1< P2< ... < Pn,
and PiP(E).
Example : the four chains of P(E)for |E|= 3
are:
C0={E, P0};C1={E, P1, P0};
C2={E, P2, P0};C3={E, P3, P0}.
Of course, there is a one-to-one correspondence be-
tween chains and usual presentation of hierarchical
classifications, that are trees or dendograms (see Fig.
2):
{{a,b,c}}! ! ! ! ! {{a,b,c}}
{{a,b}{c}!! ! {a,b}! ! ! {c}! ! !
!!!!
{{a}{b}{c}} {a}! ! ! ! {b}! ! ! {c}
0
1
2
Figure 2: Correspondence between chain and hierar-
chical classification
Note that the whole set of chains C(E) has itself a
mathematical structure : it is a semilattice for set in-
tersection. This model allows us to get all the possible
partitions of P(E)and all the possible chains of C(E),
the set of chains of the partition lattice P(E).
6 Distance and ultrametric matrices
Given a tree or dendogram Tcorresponding to a chain
of partitions, we introduce now (as in Fig.2, right)
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
120
Volume 20, 2023
an index iI(IN)associated to the levels
of this structure, so it becomes an indexed hierarchy
H.
Let us now define on the product set E×Ea mapping
d:E×ER+, such that, for all e, eE, we have
the following properties:
1. d(a, b)0;
2. d(e, e) = 0 iff e=e;
3. d(e, e) = d(e, e).
dis a dissimilarity coefficient. Now if it satisfies
also:
4. e, e, e′′ E, d(e, e)Max(d(e, e′′), d(e, e′′)),
then, dis an ultrametrics.
By setting, for all e, eE, d(e, e) = i(He, e), we
can deduce from the indexed hierarchy Ha dissimi-
larity don E, which is in fact an ultrametrics on E.
Clearly, d(e, e)is the smallest iR+, so that eand
eare in the same class of Pi, i being the level of the
classification.
Example : Consider Fig. 2 (right) : 0 is the level of
discrete partition, 1 the intermediary level and 2 the
level of the coarse partition, the set Eitself, as a whole
part of itself. We can see that:
d(a, a) = 0 because level 0 is the level where 1 be-
longs to the same class as itself. d(a, b) = 1 because
level 1 is the level where 1belongs to the same class
as 2.d(a, c) = 2 because level 2 is the level where 1
belongs to the same class as 3.
So we can write the corresponding ultrametric ma-
trix:
a b c
a0 1 2
b1 0 2
c2 2 0
The problem is that there is no chance the objects
of the world be spontaneously ordered according to
some ultrametric distance. If there any, we have to
extract it from the empirical data, with some reason-
able hypotheses.
7 A generalization
To solve the problem, clustering scientists propose
algorithms to construct classifications by bottom-up
or top-down methods which aggregate the objects to
be classified by comparing the pairwise distances de-
fined on them. There is a lot of possibilities.
Since, a new problem arise : there is no consensus
about the type of distances we can choose to put em-
pirical datas in order, whether it concerns the dis-
tances between the objects or the distances between
the subsets of objects.
But in the fundational point of view which is ours, it
is clear that we must choose the most general distance
dthat we can find.
As the relations between objects and classes are not
necessarily exclusive but yield overlapping situations,
some searchers like Jardine and Sibson (see, [12])
have proposed to weaken the notions of partition and
classification, changing them, respectively, in the no-
tions of k-partitions and k-classifications.
Let us explain, first, the meaning of such generaliza-
tions. The authors begin with a definition of a dis-
similarity coefficient like the previous one. They just
precise that it is not necessary that the function dsat-
isfies (4), the ultrametric inequality. On the contrary,
it may satisfy a weaker condition such that:
5. d(a, c)d(a, b) + d(b, c)for all a, b, c E(this
inequality defines in fact what is called now a ”linear
dissimilarity).
On this basis, Jardine and Sibson introduce the fol-
lowing definitions:
Definition 7.1 (k-partition).Ak-partition is a par-
tition which makes possible the existence of a max-
imum of k1objects in the overlaps between the
classes (i.e. in the intersection of them).
Definition 7.2 (k-classification).Ak-classification is
a nested sequence of k-partitions. The system will
be hierarchic in case k= 1, and overlapping in case
k > 1.
Definition 7.3 (k-dendogram).By a corresponding
generalization, we may define the notion of a k-
dendogram, i.e. a dendogram where clusters, at a
given level, may overlap to the extent of k1ob-
jects.
In this view, the nearest neighbor algorithm aggre-
gates to an object emits nearest neighbor enand then
iterate the same method with en, etc.). The result,
noted as B1in the hierarchical case, can be general-
ized in a sequence of methods denoted by (Bk). B1
is the ”single-link” method, which leads to a hierar-
chical dendogram (1-dendogram). The second mem-
ber of the sequence (B2) may be called the ”double-
link”method, and leads to a dendogram in which clus-
ters may overlap to the extent of one object (a 2-
dendogram). And so on.
Let Ebe the set of objects to be classified, with Card
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
121
Volume 20, 2023
E=p. Then Bp(1) gives an exact representation
of the dissimilarity coefficient. It can be shown that
the family of measures of distortion is monotonic de-
creasing with increasing k, becoming zero in case
k=p1.
The sequence of cluster methods (Bk)can be given a
simple graph-theoretic description which generalizes
the one given for the single-link method. The clusters
at level hin (Bk(d)) may be obtained as follows. A
graph is drawn, whose vertices represent the objects
and whose edges join just those pairs of points which
represent objects, with dissimilarity less than or equal
to a given number h.
The maximal complete subgraphs (i.e. the maximal
subsets of the set of vertices in which all possible
edges are present) are marked, and wherever the ver-
tex sets of two such subgraphs intersect in at least k
vertices, further edges are drawn in to make the union
of the two vertex sets into a complete subgraph. The
process is repeated until there is no further alteration.
We just indicate the result in Fig. 3:
a! c b
d
e
a! b c d e
Figure 3: A k-classification
As Jardine and Sibson said, this algorithm is not suit-
able for computation, and graphic representations are
not necessarily very meaningful. But it does not mat-
ter for our purpose.
The main point is the comment of J.-P. Benzecri (see,
[5]). This one shows that the generalization of the
ultrametric inequality, which is carried out by the
authors and leads to their hierarchy of overlapping
clusters, is based on a system of pseudo-bowls
defined as follows:
Definition 7.4 (Pseudo-bowl).We call ”pseudo-
bowl”with diameter δ, compared to the index of dis-
tance d, a subset Bof E, the set of objects, which is
maximal among those such that:
i, iB, d(i, i)δ.
As Jardine and Sibson show, Bis an ”ultimate clus-
ter”for the binary relation d(i, i)δ. This relation
is an equivalence relation only in the case dis ultra-
metric.
On the contrary, if dsatisfies the inequality of order
k, two pseudo-bowls of the same diameter (compared
with d) coincide if the cardinality of their intersection
is superior or equal to k2. Obviously, in the case
k= 3, d is ultrametric.
Of course, as Benzecri writes, Jardine and Sibson’s
approach incurs some criticism:
1. As the authors recognized, there is no simple algo-
rithm for generating the system of all pseudo-bowls
with any diametrer δassociated to a given distance
d.
2. Now if we want the condition of order kvery dif-
ferent from the condition p= 3, the number k3
(i.e. the order of covering which is allowed between
the clusters) must be negligible, compared with the
cardinality of the set Eof objects. Unhappily, it does
not seem that the algorithm of distance changes could
be still practicable for k > 6.
3. The tree-representation of this kind of classifica-
tion is not very clear.
Anyway, this proves that the most general type of dis-
similarity coefficient that we can choose is the one
which yields this space of pseudo-bowls.
8 Other situations
In fact, the case of Fig. 3, as the following example of
a more complex semilattice (see Fig. 4), where clus-
ters, at different levels, may overlap to the extent of
2 - 1 = 1 object (so we have a 2-dendogram in the
terminology of Jardine and Sibson), already give ul-
trametrics :
Figure 4: A more complex semilattice
Indeed, this semilattice is an overlapping classifica-
tion. Pose:
i={a, b, c};j={b, d};g={c, d};h={d, e};
k={g, h}={c, d, e};l={k, f}={c, d, e, f};
m={i, j, k}={a, b, c, d, e};
n={m, l}={a, b, c, d, e, f }.
Its ultrametric matrix of distances appears down be-
low:
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
122
Volume 20, 2023
a b c d e f
a0 2 2 4 4 5
b2 0 2 2 4 5
c2 2 0 1 2 3
d4 2 1 0 1 3
e4 4 2 1 0 3
f5 5 3 3 3 0
So, to some k-classifications may correspond ultra-
metric distances. These ones are included in the set
of all ultrametrics distances whose structure has been
discovered by Gondran ([10]) in the following theo-
rem:
Theorem 8.1 (Gondran 1976).The matrix of ultra-
metric distances between the elements of a set Ehas
a structure of semiring over R+{∞}.
Proof. There exists an associative law , which may
be interpreted as ”Min”.
didj=Min(di, dj).
The unit element of is (+), and we have:
di+= + di=Min(di,+) = di.
There exists also an associative law *, which is dis-
tributive over . This law * is interpreted as ”Max”,
and its unit element is 0, since:
didj=Max(di, dj)
and:
0di=di0 = Max(0, di) = di.
This structure may be extended to Mn, the set of all
the matrices of the chains of C(E).
In the same way, as the set of minimal covers form
a lattice, the chains of minimal covers are the chains
of this lattice and their ultrametric matrices are in the
semiring (R {+∞},,) of Gondran. However,
as the set of covers is only a preorder, we cannot get
easily the chains of covers, and so, in this case, the
previous good property is lost.
This means also that many overlapping classifications
have matrices of distances that are not ultrametric.
The situation may be the same for fuzzy classifica-
tions.
9 Towards a general theory of
classifications
This situation suggests to present every type of quasi-
taxonomic organization defined on a set of objects
(system of classes, partitions, k-partitions, classifica-
tions, k-classifications, etc.) as a system of intersect-
ing bowls (see, [19]).
In order to get the most general point of view, we
may replace (as we have done in Fig. 3 right) bowls
by ellipsoids, knowing that, when some ”good”
distance, if there is any, has been defined on the
set of objects for some classification, the ellipsoids
may be interpreted as ellipsoids of inertia which give
information about the aspects of classes.
Proposition 9.1. Classes and every type of quasi-
taxonomic organization of classes on a set of objects
Emay be represented as ellipsoids of the space, i.e.,
if we project it on a surface of dimension 2, as simple
ellipses of the plane.
Proof. Let Ebe the set of objects to be classified.
As we know, every object eimay be represented as
a point of the space. Let Ska subset of points. Every
subset Sjmay be circumbscribed by an ellipse. Sup-
pose now that an Sjbelongs to some Sk. Then, Sk
will be also an ellipsoid. And so on, until we reach
the whole space of objets E.
We use ellipsoids or ellipses, rather than simple cir-
cles or n-spheres, because in factorial analysis, we
can represent classes by ellipsoids of the space. In the
case when classifications are constructed by bottom-
up methods from a table of data, we can get infor-
mation on the direction of extension of the ellipses
associated to the classes in relation to the respec-
tive weight of their elements (we call that kind of
ellipses ”ellipses of inertia”). In computer science,
there exist some programs to get that automatically
(see, [11]).
When we do not get such an information, it is always
possible to have a qualitative representation of classes
in 2-dimension by ellipses. In those cases, the direc-
tion and the extension of ellipses are purely conven-
tional.
Partitions and chains of partitions give very clear hy-
pergraphs. Covers, chains of covers, lattices of cov-
ers or lattices of minimal covers, crossed partitions
(like the Mendeleiev table of Elements), etc. would
yield more complicated representations. Indeed, it is
always possible to draw these ones.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
123
Volume 20, 2023
10 Metaclassification
Let us recall now the following sets that can be de-
fined on a nonempty finite set E:
1. P(E), power set of E;
2. P(E), set of all partitions of E.
3. C(E), set of all chains of partitions on E.
Let us define now : U(E) = {P(E),P(E),C(E)}.
U(E)is the whole universe of classifications.
Even if it is a bit complicated, we can give an idea of
this universe by representing all the possible pseudo-
bowls, in fact ellipsoids (Neuville’s ellipsoids) and
sequences of ellipsoids, in one and the same figure.
Intersecting or non intersecting classes are at the pe-
riphery or the biggest circle. Then, partitions may be
found in the middle circle and chains just around the
center. Pierre Neuville called this figure a ”metaclas-
sification”(see Fig. 5).
Figure 5: Metaclassification
When we ask : what is such a structure (from a math-
ematical point of view)? The answer is that we have
got, at the most general level, the set of all possible
open subsets of a set. Why do we speak of ”open sub-
sets”? Because we must observe that the boundaries
of the ellipsoids are just there to realize the classes.
But only the elements that they contain are to be con-
sidered as actually forming the classes. So, the classes
themselves must be identified only with the interiors
of the ellipsoids.
Now a well-known result of topology is that the al-
gebra of the set of open subsets of a set is a Heyting
algebra (see, [21]) . By the way, the previous struc-
ture is also a relatively pseudo-complemented lattice
(see, [22]).
11 Chu spaces
The foregoing developments are already included in
[18] and [19]. But we think we have now a bit more.
In fact, more complete presentation of metaclassifica-
tion makes it appear as a Chu space.
The Chu space concept originated with Michael Barr
in 1979 (see, [1]) but the details were developed by his
student Po-Hsiang Chu, whose masters thesis formed
the appendix of the 1979 paper1.
Statically, a Chu space (A, r, X)over a set Kconsists
of a set Aof points, a set Xof states, and a function
r:A×XK. This makes it an A×Xmatrix with
entries drawn from K, or equivalently a K-valued bi-
nary relation between Aand X(ordinary binary rela-
tions being 2-valued).
More dynamically, Chu spaces transform in the man-
ner of topological spaces, with Aas the set of points,
Xas the set of open sets, and ras the membership
relation between them, where Kis the set of all pos-
sible degrees of membership of a point in an open
set.
It is easy to recognize these elements in the above
metaclassification, so we can state the following
proposition:
Proposition 11.1. The metaclassification M(U)is
the Chu space (E, r, C)where Eis the set of elements
to be classified, Cthe set of classes, and rthe degree
of membership of an element to a class.
If K={0,1}, we get the situation described in [18]
or [19] and, for every eEand C C, we have
eCor e/C, which yields formal classifications.
But if K= [0,1], we may integrate in the structure
fuzzy classifications in the sense of Zadeh (see, [25],
[4]) or in the sense of Lerman (see, [16]).
One of the advantages of using Chu’s space is to get
information about what is in the metaclassification
structure and how, precisely, it is there. This amounts
to comparing two different metaclassifications, i.e.
two particular Chu spaces, therefore to constructing
a continuous function from one to the next.
Now we can observe that the counterpart of a con-
tinuous function from (A, r, X)to (B, s, Y )is a pair
(f, g)of functions f:AB, g :YXsatisfying
the adjointness condition s(f(a), y) = r(a, g(y)) for
all aAand yY. That is, fmaps points for-
wards at the same time as g maps states backwards.
The adjointness condition makes g the inverse image
function f1, while the choice of Xfor the codomain
of gcorresponds to the requirement for continuous
functions that the inverse image of open sets be open.
Such a pair is called a Chu transform or morphism of
Chu spaces2.
As it is known, the category of Chu spaces over K
1For the history of the construction, see, [2].
2A replacement of the adjointness condition (s(f(a), y) = r(a, g(y))
by the more general one (s(f(a), y)r(a, g(y)) leads to a variant of
Chu spaces, called ”dialectica spaces” and due to de Paiva (see, [17]).
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
124
Volume 20, 2023
and their maps is denoted by Chu(Set,K). As is clear
from the symmetry of the definitions, it is a self-dual
category: it is equivalent (in fact isomorphic) to its
dual, the category obtained by reversing all the maps.
It is furthermore a -autonomous category with dual-
izing object (K, λ, {∗})where λ:K× {∗} Kis
defined by λ(k, ) = k, as indicated in [1]3.
Usually, Chu spaces arise as the case V=Set, that
is, when the monoidal category Vis specialized to the
cartesian closed category Set of sets and their func-
tions. But a more general enriched category, The cat-
egory Chu(V, k)appeared for the first time in the ap-
pendix to [1], which constitutes the masters thesis of
Po-Hsiang Chu.
Chu spaces are very important for our foundational
point of view in classification theory4.
From the metaclassification, understood as a Chu
space, we can deduce a lot of mathematical structures.
For example, Chu spaces over 2 realize both topolog-
ical spaces and coherent spaces introduced by J.-Y.
Girard, while Chu spaces over Krealize any cate-
gory of vector spaces over a field whose cardinality
is at most that of K. This was extended by Vaughan
Pratt (see, [20]) to the realization of k-ary relational
structures by Chu spaces over 2k. For example, the
category Grp of groups and their homomorphisms is
realized by Chu(Set, 8) since the group multiplication
can be organized as a ternary relation (see, [8]), and
Chu(Set, 2) realizes a wide range of ”logical” struc-
tures such as semilattices, distributive lattices, com-
plete and completely distributive lattices, Boolean al-
gebras, complete atomic Boolean algebras, etc. Of
course, the semiring of Gondran (which corresponds
to hierarchical classifications) may be realized in the
same way.
So Chu spaces not only formalize our metaclassifica-
tion structure but give all required accesses to a clas-
3Incidentally, one knows that, as such, it is a model of Jean-Yves Gi-
rard’s linear logic (see, [9]).
4Chu space were not studied in their own right until more than a decade
after the appearence of the more general enriched notion. In [19], we
quoted the book of Barwise and Seligman, who used some application
of them in information theory (see, [3], 33, 92). We even mention ([19],
38) that they defined a classification (A, B, R) as consisting of a set A
of tokens, a set Bof types and a relation Rbetween tokens and types
(such that RA×B), but this language was not very clear and lacked
generality. For the authors, tokens could be objects and types could be
attributes describing the objects. But, as this structure was essentially de-
fined for the study of a channel in information theory and has been only
applied elsewere in formal concept analysis, it was very difficult to see
the real importance of Chu spaces for a foundational viewpoint in classi-
fication theory. It was not obvious, indeed, to see that they could realize
a lot of mathematical structures and that using categories and morphisms
to formalize classification theory was very useful, even if the authors in-
troduced some trivial operations on classifications as, for example, sums,
construction of invariants or quotients, and some dualizing of these last
(see, [3], 81-88).
sification of mathematical structures. But we can say
more.
12 Chu spaces and informorphisms
The application of the formalism of Chu spaces in
classification theory was carried out by Jon Barwise
and Jerry Seligman in their reflection on information
theory (see, [3]). As one can easily understand, the
Chu spaces defined above formalize not only the
notion of ”meclassification”, but that of ”classifi-
cation” itself. In addition, they allow us to define
in an elegant and unified way a certain number of
operations on classifications.
According to Barwise and Seligman (see, [3], 69), a
classification A= (tok(A), typ(A),|=A)is defined by
the following objects:
1. a set tok(A)of objects to classify, called ”tokens”
of A;
2. a set, typ(A) of objects used to classify tokens,
and called ”types” of A;
3. a binary relation, |=Abetween tok(A) and
typ(A).
It is clear that such a definition is in fact only a para-
phrase of the mathematical notion of Chu space. We
now introduce some terminology.
12.1 Terminology
1. If a|=Aα, then ais said to be of type αin A.
2. For any token atok(A),the type-set of ais the
set:
typ(a) = {αtyp(A)|a|=Aα}.
3. Similarly, for any type αtyp(A),the set of to-
kens of αis the set:
tok(a) = {atyp(A)|a|=Aα}.
4. α1and α2in typ(A) are coextensive in A- which
is written α1Aα2- if tok α1) = tok(α2).
5. The tokens a1and a2are indistinguishable in A-
which is written a1Aa2- if typ(α1) = typ(α2).
6. Finally, the Aclassification is separated if there
is no distinct indistinguishable token, i.e. if
α1Aα2implies a1Aa2.
7. The classification Ais extensional if all the co-
extensive types are identical, i.e. if α1Aα2
implies that α1α2.
Like any Chu space, a classification, of course, can be
both extensional and separate.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
125
Volume 20, 2023
12.2 Informorphisms
Barwise and Seligman, following the model of
morphisms between Chu spaces, introduce the notion
of infomorphism. It is indeed necessary to be able to
think of the relations between classifications defined
on the same objects or, on the contrary, on different
objects. To do this, given two classifications to be
compared, one then defines f=f^, fpairs of
functions such as f^ is a function of the types of one
of the classifications in the types of the other, and f
is a function of the tokens of one of the classifications
in the tokens of the other.
Definition 12.1. We say that f, from Ato B, is a
contravariant pair, and we write: f:ABif f^:
typ (A)to typ (B)and f: tok (B)to tok (A).
Definition 12.2. We say that f, from Ato B, is a
covariant pair, and we write: f:ABif f^:
typ(A)typ(B)and f: tok(A)typ (B).
Definition 12.3. Ainfomorphism f:ABfrom
Ato Bis a contravariant pair of functions f=f^,
f satisfying the following fundamental property of
informorphisms:
f(b) |=Aαssi b|=B
f^(α),
for each token btok(B) and each type
αtyp(A).
The accents will be omitted to the right of fif there
is no confusion, knowing that the circumflex puts f
on types and that the reverse accent puts it on to-
kens.
In terms of a diagram, we can then write:
typ(A)f//
|=A
typ(B)
|=B
tok(A)tok(B)
f
oo
12.3 Duality type-tokens
When including a subsection you must use, for its
heading, small letters, 12pt, left justified, bold, Times
New Roman as here.
Definition 12.4. For any classification A, the flip of
Ais the classification Awhose tokens are the types
of A, whose types are the tokens of Aand such that
α|=Aaif and only if a|=Aα.
Given a pair of functions f:AB, we define
f:BAby f^ = f and f = f^. When
classifications are defined by double-entry data
matrices, this operation is equivalent to swapping
rows and columns.
We can prove three propositions concerning infomor-
phisms:
Proposition 12.1. f:ABis an infomorphism if
and only if f:BAis an informorphism.
Proposition 12.2. 1. (A)=Aand (f)=f.
2. (fg)=gf.
Proposition 12.3. Given an informorphism f:A
B, if a type αis coextensive with a type αin A, then,
f(α)is coextensive with f(α)in B.
12.4 Operations on classifications
There are obviously many operations which, from
a given classification, make it possible to generate
another classification. Barwise and Seligman have
described essentially two of them, as well as their
interactions with informorphisms.
Definition 12.5. Given the classifications Aand B
the sum A+Bof Aand Bis the classification defined
as follows:
1. The set tok(A+B) is the Cartesian product of
tok(A) and tok (B). Specifically, the tokens
(A+B) are the pairs (a, b)of tokens such as
atokA)and btok(B).
2. The set typ(A+B) is the disjoint union of typ(A)
and typ(B). Concretely, the types of A+Bare
pairs (i, α) where i= 0 and αtyp(A) or else
i= 1 and αtyp(B).
3. The classification relation |=A+Bof A+Bis de-
fined by:
(a, b)|=A+B(0, α)iff a|=Aα;
(a, b)|=A+B(1, β)iff b|=Bβ;
Definition 12.6. There are natural infomorphisms
σA:AA+Band σB:BA+Bdefined as
follows:
1. σA(α) = 0, αfor each α in typ(A);
2. σB(β) = 1, βfor each βtyp(B);
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
126
Volume 20, 2023
3. For each pair a, b tok (A+B), σA(a, b) = a
and σB(a, b) = b.
Proposition 12.4 (Universal application for sums).
Given the infomorphisms f:ACand g:BC,
there exists a unique informorphism f+gsuch that
the following diagram is commutative.
A
f
##
F
F
F
F
F
F
F
F
F
σA//A+B
f+g
B
σB
oo
g
{{w
w
w
w
w
w
w
w
w
C
The notions of sum and infomorphism can then be
generalized to families of classifications without any
problem (see, [3], 83).
12.5 Invariants and quotients
Definition 12.7 (Invariant).Given a classification A,
an invariant is a pair I= , R)composed of a set
Σtyp(A)of types of Aand of a binary relation R
between the tokens of Asuch that, if aRb, then, for
each αΣ, a |=Aαif and only if b|=Aα.
Definition 12.8 (Quotient).Let I= , R)be an in-
variant of the classification A. The quotient of Aby
I, denoted by A/I, is the classification having types
Σ, whose tokens are the R-classes of equivalence of
tokens of A, with the condition:
[a]R|=A/Iαif and only if a|=Aα.
We also introduce the following definitions:
Definition 12.9. Given an invariant I= , R)
over A, the it canonical quotient infomorphism τI:
A/IAis the function inclusion defined on the
types and who, on the tokens. applies each token of
Ato its equivalence class.
Definition 12.10. Given an invariant I= , R)on
A, an infomorphism f:BArespects Iif:
1. For each β in typ(B), f(β)Σ.
2. If a1Ra2, then, f(a1) = f(a2).
We can also formulate the following propositions and
definitions:
Proposition 12.5. Let Ibe an invariant of A. Given
any infomorphism f:BAwhich respects I, there
exists a unique informorphism f:BA/Isuch
that the following diagram is commutative:
A/IτI//A
B
f
OOf
|
|
|
|
==
|
|
|
|
|
Definition 12.11. An informorphism f:ABis
said to be ”token-identical” if tok(A) = tok(B) and if
f is the identity function on this set. Dually, fis said
to be ”type-identical” if typ(A) = typ(B) and if f^ is,
this time, the identity.
Proposition 12.6. Given the classifications Aand
B, if Ais separated, then there is at most one type-
identical infomorphism from Ato B.
We can still show, as Barwise and Seligman do (see,
[3], 87-88), that there exist dual notions of invariants
and quotients, hence also a dual proposition to Propo-
sition 11.6.
In conclusion, the immersion of the theory of classi-
fications, via the notion of Chu space, in a categori-
cal formalism, undoubtedly has the merit of unifying
the formalisms ordinary used to describe classifica-
tions and the operations that can be done on them. In
this sense, it presents a stylistic improvement in the
presentation of the theory. However, basically, noth-
ing is really fundamentally new, and category theory,
as Barwise and Seligman apply it, tends instead to
complicate and obscure the simpler operations (em-
bedding of one classification into another one, sum
of two (or more) classifications, etc.). The formalism
does not allow, in particular, to identify new classes
of morphisms which would make it possible to go fur-
ther than what was proposed in [19] or [18]. It re-
mains that the notion of ”Chu space” makes it pos-
sible to treat classifications of objects and classifi-
cation of mathematical structures in the same way.
It also shows that the ”metaclassification” (or set of
Neuville ellipsoids) defined in [19] is in fact a Chu
space.
13 A conjecture of convergence
As we may see, Fig. 5, representing what we have
called ”metaclassification”, shows classes and se-
quences of classes that are partitions or classifica-
tions, overlapping or not, formal or fuzzy, and so
on. This includes of course the pseudo-partitions
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
127
Volume 20, 2023
and pseudo-classifications in the sense of Bruck-
ner.
Now an observation : knowing that the aim of a par-
tition or of a classification is always to locate an ele-
ment in one or more than one class, if we have suc-
cessive partitions or classifications, it means that all
the sequences of classes contained in this graph are
decreasing sequences. René Thom (see, [24]) have
already suggested that a decreasing sequence of em-
bedded open sets could converge on this minimal el-
ement which is the point.
So we can propose the following conjecture:
Conjecture 13.1. The universe U(E)of all classifi-
cations possibly defined on a set Eeffectively presents
itself as a decreasing sequence of open subsets con-
verging on this minimal element which is the point,
i.e., as it happens, the index of the metaclassification.
It follows from the previous observations that U(E),
equipped with variable distances donly defined by the
fact that d<δis not a complete metric space. So we
cannot apply to this space the good theorems of topol-
ogy like, for instance, the theorem of Baire.
Moreover, let Tbe the topology defined on U(E). We
know that a sequence (un)nNUNconverges on
Uif, for every open set Oof Tcontaining the
element , there exists a natural number Nsuch that
all the un, for nN, belong to O. We know also
that it would be sufficient that the space U(E)were a
separable space to assert that this limit is unique. But
U(E)is not a separable space and we do not know
how to fix this N.
However, this conjecture - the convergence of the
Neuville’s ellipsoids - if it comes to be verified -
would mean that we have in fact a proof of existence
of a ”metaclassification”, that is, a proof of existence
of a kind of ”classification of classifications”, even if
we cannot be clearer about what it contains. Taking
things back to front, this may also suggest that a point
is something which is capable of generating a meta-
classification.
Of course, this result would be a relatively poor one:
because we have not very much information about
this overlapping classification.
In particular, we do not know its laws of composition,
we have no metric on it and we do not know the rela-
tions between the classifications it contains and how
we could go from one to another: in fact, we lack
a true algebra of classifications. But this existence
would prove that one can think of such a structure,
even if it is not actually accessible in details.
|E|1 2 3 4 5 6 7 8
|P(E)|1 2 5 15 52 203 877 4140
Table 1: Number of partitions in P(E)according to
the cardinal of E
Another result, connected to it, is the following one:
Proposition 13.2. The set of all classifications on an
infinite set generates the continuum.
Proof. To see that, it is sufficient to remember that,
from Cantor, the cardinality of [0,1] is the same than
the cardinality of Ror Rnand even of RN. Now if
we associate a pair (w1, w2)of elements of Cto the
semi-axes of any ellipses, we can prove (see, [23])
that the set of all the sequences of ellipses has the
same cardinality than R2, i.e the same cardinality than
[0,1].
14 Concrete problems: the search for
an algebra of classifications
There are many modes of organization on a set and
it is sometimes very difficult to choose between
them.
As we know, partitions are very numerous (see table
1).
So it is not very easy to examine which classifica-
tion is the best one among, say, several thousands of
them. The situation is worse with weaker structures
like covers or even minimal covers. Let’s quickly ex-
plain what it is.
Recall that a family Fof nonempty subsets of a set
E, whose union contains the given set E(and which
contains no duplicated subsets) is called a cover (or a
covering) of E.
A particular kind of cover is the minimal cover. A
minimal cover is a cover for which the removal of one
member destroys the covering property.
Of course, we can make orderings on covers and build
hierarchies of covers or minimal covers (see, [19]).
But if the set L(E)of minimal covers is a lattice for
the refinement relation, the set R(E)of all covers has
no interesting properties : it is only a preorder (or a
quasi-order) for the refinement relation (that we de-
fine in the same way as for the partition ordering).
Moreover, computing the covers of a set leads imme-
diately to big numbers (see table 2). So it becomes
rapidly impossible to examine the very numerous pos-
sible chains of covers.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
128
Volume 20, 2023
|E|1 2 3 4
|R(E)|1 5 109 32297
Table 2: Number of covers in R(E)according to the
cardinal of E
Worse again would be the situation if we examine the
number of all possible fuzzy partitions, classifications
or chains of covers that we can define on a set.
One way to escape would be to get some algebra of
classifications, which can explain clearly the links
between their structures. Such an algebra is diffi-
cult to get because classifications are commutative
but nonassociative structures, and almost all mathe-
matical structures are associative. But there are some
candidates like, for instance, what we call ”dendri-
form algebras”(for more information on them, see,
[19]).
15 Conclusion
Let us resume the main point of these little introduc-
tion to the mathematics of classification. Accord-
ing to us, the whole set of classifications realizes a
new construction of the continuum. There are already
many constructions of the continuum in mathemat-
ics but this one does not need a particular hypothe-
sis about the infinite. One just needs Cantor theorem
about the bijection between [0,1] and Ror, more gen-
erally, Rn. We must hope now that the metaclassifi-
cation and its construction of the continuum will be
explored in the future by mathematicians, so that we
get much more information about the set of ellipsoids
or sequences of ellipsoids contained in it.
References
[1] Barr, M., *-Autonomous categories, Springer-
Verlag, Berlin, 1979.
[2] Barr, M., ”The Chu Construction: history of an
idea”, Theory and Applications of Categories, Vol.
17, No. 1, 10?16, 2006.
[3] Barwise, J. and Seligman J., Information Flow,
the Logic of Distributed Systems, Cambridge Uni-
versity Press, Cambridge, 1997, 2008.
[4] Bellacicco, A., ”Fuzzy Classifications”, Syn-
these, Vol. 33, No. 2/4, 273-281, Sept. 1976.
[5] Benzécri, J.-P., et alii., L’Analyse des données,
tome 2, correspondances, Dunod, Paris, 1973.
[6] Berge, C., Graphes et hypergraphes, Dunod,
Paris, 1970.
[7] Brucker, F., Barthélemy, J.-P., Eléments de clas-
sifications, Hermès-Lavoisier, Paris, 2007.
[8] Certaine, J., ”The ternary operation (abc) =
ab1cof a group, Bull. Amer. Math. Soc. 49, 869-
877, 1943.
[9] Girard, J.-Y., ”Linear logic”, Theoretical Com-
puter Science, 50, 1-102, 1987.
[10] Gondran, M., Minoux, M., Graphes et Algo-
rithmes, Eyrolles, Paris, 1979.
[11] Jambu, M., Classification automatique pour
l’analyse des données, 2 tomes, Bordas-Dunod,
Paris, 1978.
[12] Jardine, N., Sibson, R., ”A model for taxon-
omy”, Mathematical Bio-sciences 2 (1968), 465-
482.
[13] Kechris, A. S., Actions of Polish groups and
classification problems, Analysis and Logic, Lon-
don Math. Soc. Lecture Note Series, Cambridge,
Cambridge University Press, 2001.
[14] Kleinberg, J., ”An impossibility theorem for
Clustering”, Advances in Neural Information Pro-
cessing Systems (NIPS), Proceedings of the 2002
Conference held in British Columbia, Canada, 15,
2002, 529-536.
[15] Lerman, I., Les bases de la classification au-
tomatique, Gauthier-Villars, Paris, 1970.
[16] Lerman, I., Classification automatique et anal-
yse ordinale des données, Dunod, Paris, 1981.
[17] Paiva (de), V., ”A dialectica-like model of lin-
ear logic”, Proc. Conf. on Category Theory and
Computer Science, Springer-Verlag Lecture Notes
in Computer Science, 389, 341-356, Manchester,
September 1989.
[18] Parrochia, D., ”Mathematical Theory of Clas-
sification”, Knowledge Organization 45(2):184-
201, January 2018.
[19] Parrochia D, Neuville, P., Towards a general
theory of classifications, Birkhäuser, Basel, 2013.
[20] Pratt, V. R., ”The Stone gamut: A coordinati-
zation of mathematics”, Proc. 10th Annual IEEE
Symp. on Logic in Computer Science, 444-454,
Montreal, 1995.
[21] Rasiowa, H., Sikorski, R., The Mathematics
of Metamathematics, Paǹstwowe Wydawnictwo
Naukowe, Varsovie, 1963, 3rd ed., 1970.
[22] Rasiowa, H., An Algebraic Approach to Non-
Classical Logics, North Holland, Amsterdam,
London, 1974.
[23] Serre, J.-P., Course in Arithmetics, Springer-
Verlag, Berlin, 1973.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
129
Volume 20, 2023
[24] Thom, R., ”Une définition continue du Nom-
bre”, Les rencontres physiciens-mathématiciens de
Strasbourg - RCP25, Conférences de M. Chap-
eron, A. Chenciner, R. Lozi, J. Martinet et J.P.
Ramis, P. Moussa, R. Moussu, F. Pham, R. Thom,
exp. no 6, p. 163-189, tome 41, 1990.
[25] Zadeh, L. A., ”Fuzzy sets”, Information and
Control, 8, 338-353, 1965.
Contribution of individual authors to
the creation of a scientific article
(ghostwriting policy)
Daniel Parrochia is the sole author of this article and
all that it contains.
Conflict of interest
The author declares no conflict of interest.
Sources of funding for research
presented in a scientific article or
scientific article itself
No source of funding.
Creative Commons Attribution
License 4.0 (Attribution 4.0
International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.14
Daniel Parrochia
E-ISSN: 2224-3402
130
Volume 20, 2023