Classifying PDO Kalamata Olive Oil from Geographic Origins of the Messenia Region based on Statistical Machine Learning

THEODOROS ANAGNOSTOPOULOS¹, IOAKEIM SPILIOPOULOS²

¹Department of Business Administration, University of West Attica, 12241 Athens, GREECE

²Department of Food Science and Technology, University of Peloponnese, 24100 Kalamata, GREECE

Abstract: - Kalamata is a smart city located in southern Greece in the Mediterranean basin and the capital of the Messenia regional unit. It is known for the famous Protected Designation of Origin (PDO) Kalamata olive oil, produced mainly from the Koroneiki olive variety. The PDO Kalamata olive oil, established by Council Regulation (EC) No 510/2006, owes its quality and special characteristics to the geographical environment, the olive tree variety, and the human factor. The PDO Kalamata olive oil is produced exclusively in the regional unit of Messenia and is the main source of income for local farmers. However, soil chemical composition, microclimates, and agronomic factors vary within Messenia, leading to differences in the characteristics of PDO Kalamata olive oil. In this paper, we use statistical machine learning algorithms to determine the geographical origin of Kalamata olive oil at PDO level based on synchronous excitation−emission fluorescence spectroscopy of olive oils. Evaluations of the statistical models are promising for differentiating the origin of PDO Kalamata olive oil with high prediction accuracy, thus enabling companies that process and bottle Kalamata olive oil to choose olive oil from a specific region of Messenia that fulfills certain characteristics. The current research effort focuses on a specific olive oil variety within a limited geographic region; future research should therefore validate the proposed methodology on other olive oil varieties and production areas.

Key-Words: - PDO Kalamata olive oil, synchronous emission-excitation, fluorescence spectroscopy, statistical machine learning, data fusion, data visualization, multiclass classification, model evaluation.

1 Introduction

Smart agriculture is the dimension of the smart city concept that aims to define methods for efficient geographic cultivation in rural areas, [1]. The cultivation of plants useful to city residents is the main area of interest for smart farming, [2]. In the olive oil sector specifically, farmers in the Messenia region of Greece produce the protected designation of origin (PDO) extra virgin olive oil known as Kalamata olive oil. In the rural areas of the smart city of Kalamata, the Koroneiki olive variety is cultivated almost exclusively and yields extra virgin olive oil with distinctive organoleptic properties, [3], [4]. Such areas allow farmers to earn more income, since local microclimates affect the quality of the oil produced from this variety, [5].

To protect olive oil quality and prevent its adulteration, international bodies such as the European Commission, the International Olive Council, and the Codex Alimentarius have developed standards that regulate olive oil by establishing a set of physical, chemical, and organoleptic characteristics, [6]. Traditional chemical methods for ensuring olive oil quality focus on the identification and quantification of predefined compounds, or classes of compounds, specified in the regulations of these bodies. These methods are time-consuming and require expensive apparatus. The same holds for the methods that detect olive oil adulteration, which moreover fail to detect certain adulterants.

Received: June 9, 2023. Revised: February 24, 2024. Accepted: April 3, 2024. Published: May 13, 2024.

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2024.20.15

Theodoros Anagnostopoulos, Ioakeim Spiliopoulos

E-ISSN: 2224-3496

137

Volume 20, 2024

In recent years, non-targeted analysis has attracted much attention. This approach screens olive oil without any prior knowledge of its chemical composition, using analytical techniques that produce a signal affected by all the compounds (i.e., metabolites) present in the oil. These methods shorten the analysis process, but they require large amounts of data to perform data analytics based on statistical machine learning algorithms, [7].

To assess the quality of the collected olive oil, specific Internet of Things (IoT) devices need to be incorporated, [8]. A technique commonly used for this purpose is fluorescence spectroscopy, in which the instrument is calibrated to irradiate the olive oil sample at selected excitation wavelengths and record the emitted radiation, [9]. Fluorescence spectroscopy has been used extensively in recent years because of its precision in recognizing the chemical components of olive oil samples and, thus, in assessing their overall quality, [10]. Specifically, the technology can measure the chemical constituents of a given olive oil sample and discriminate both its quality category and its origin, [11]. Fluorescence spectroscopy can also detect, with high effectiveness, the adulteration of olive oil with lower-quality oils, such as sunflower or soybean oil, [12]. Collecting samples from different geographical origins yields distinct data sources, [13], which can be visualized and analyzed with statistical machine learning algorithms. In particular, statistical classifiers can assign olive oil samples to categories that differentiate the quality of each sample, [14].

In this paper, we use synchronous emission-excitation fluorescence spectra of PDO Kalamata olive oils from different local geographic origins in Messenia as input data. The PDO Kalamata olive oils came from the areas of (1) Aris, (2) Thouria, (3) Verga, (4) Arfara, and (5) Meligalas. These data sets are then fed to several statistical machine learning algorithms to assess which of them best recognizes the different local cultivations. The adopted algorithms are evaluated with specific evaluation methods and metrics to identify the best classification of the input olive oil samples. The aim of the research effort is to characterize the specific origin of each PDO Kalamata olive oil within the Messenia region, so that companies that process and bottle Kalamata olive oil can choose olive oil from the specific regions of Messenia whose oil has superior characteristics.

The rest of the paper is organized as follows. Section 2 presents prior work in the research area. Section 3 defines the adopted data model. Section 4 defines the evaluation parameters. Section 5 describes the experiments and their results. Section 6 discusses the strengths and weaknesses of the proposed research effort, while Section 7 concludes the paper and proposes future work.

2 Prior Work

Extra virgin and virgin olive oils have recently attracted consumer interest because of their quality and the potential health benefits of their consumption. The high price and reputation of extra virgin olive oil make it a target for fraudsters. Significant research has been performed on olive oil analysis, classification, authentication, origin, and adulteration. Spectroscopic techniques such as ultraviolet-visible (i.e., UV–Vis) absorption [15], [16], fluorescence spectroscopy [17], Raman spectroscopy [18], mass spectrometry [19], nuclear magnetic resonance [20], and FT-NIR [21] have been proposed to classify olive oil and to detect its adulteration and origin. Classification based on statistical machine learning is used to compare virgin olive oil quality in [22], where fluorescence spectroscopy is combined with principal component analysis and factorial discriminant analysis to monitor and classify virgin olive oil varieties. Raman spectroscopy is used in [23] to identify olive oil quality with classification techniques; the method employs a one-dimensional convolutional neural network and achieves strong classification results. Portable Raman spectroscopy is used in [24] to provide quality assessment and control of several olive oil varieties; the method also adequately covers cases of virgin olive oil adulterated with low-quality oils.

Classification and authentication techniques are used in [25] to distinguish the origins of virgin olive oil. Specifically, an authentication process is proposed that analyzes volatile olive oil compounds with chemometrics to assess the quality of certain olive oil varieties within a local geographic area. Statistical machine learning algorithms are used in [26] to classify specific olive oil varieties; the method feeds spectroscopic data that fuse emission and absorption measurements into machine learning discrimination algorithms and achieves high prediction accuracy. Fluorescence spectroscopy is used in [27] to classify high-quality olive oil; the proposed method studies thermal oxidation and uses an ultraviolet (UV) fluorescence spectroscopy system to perform imaging-based classification of extra virgin olive oil varieties.

A time series classification algorithm is used in [28] to distinguish several virgin olive oil varieties: a statistical transformation of the input data is performed for each virgin olive oil variety before an ensemble classification scheme is assessed, yielding high prediction accuracy. A multivariate classification analysis is used in [29] to distinguish extra virgin olive oils; the method combines Fourier Transform Infrared Spectroscopy (FTIR) with multivariate analysis to classify the geographic origins of virgin olive oils from several producing countries. In [30], adulterated olive oil is discriminated using Attenuated Total Reflection (ATR) and FTIR spectroscopy; the proposed methods distinguish pure virgin olive oil samples from oil blends by applying partial least squares discriminant analysis (PLS-DA) to the olive oil compound data.

Methods for distinguishing the local geographic origins of extra virgin olive oils have been proposed in [31]. There, the classification of geographic origin is based on chemometric data generated from several olive oil compounds, which are fed to fluorescence spectroscopy decision-making models to achieve high prediction accuracy. Synchronous scanning fluorescence spectroscopy, which produces detailed chemometric data, is also used in [32]; the resulting knowledge is exploited by statistical classification models that distinguish several varieties of edible extra virgin olive oil. The premium quality of edible olive oils is assessed in [33] by means of synchronous fluorescence spectroscopy, which enables tocopherols to be quantified among the olive oil compounds.

In [34], the geographic origins of olive oil varieties are determined through chemometric analysis; this methodology predicts the registered designation of an olive oil with high precision from synchronous excitation-emission fluorescence spectra. Rapid spectroscopic methods (Vis–NIR and FT-MIR), together with PLS analysis, were applied in [35] to study the thermal stress of virgin olive oils, and the generated data were processed with statistical learning models to evaluate the spectroscopic and chemometric techniques. Pattern recognition is also used to classify extra virgin olive oil varieties in [36], where near-infrared spectroscopy serves as a screening method to authenticate extra virgin olive oils from nearby local geographic origins. In [37], olive oil varieties are monitored over their shelf life and classified by geographic origin; IoT sensor and actuator technology is used to enhance fluorescence spectroscopy measurements, so that the multiclass classification, based on statistical learning models, can be assessed correctly.

Many research approaches address the origins of olive oil using statistical machine learning, and promising efforts incorporate data generated by several chemometric techniques. However, data manipulation needs improvement in order to reveal the interconnections in the data that lead to efficient results. In this research effort, fluorescence spectroscopy data are exploited through enhanced data preprocessing. The resulting optimized data are then used by a statistical machine learning algorithm to perform multiclass classification that distinguishes between the local cultivation origins of the Koroneiki olive variety around the smart city of Kalamata, in the Messenia region of Greece.

3 Data Model

The data provided for the analysis are synchronous emission-excitation fluorescence spectra. These spectra were recorded on a Perkin Elmer LS55 spectrofluorometer using a 1% w/v solution of olive oil in n-hexane, with Δλ (i.e., the difference between the excitation and emission wavelengths) set to 30 nm, [38]. The excitation and emission slits were set to 4 nm, and the scan rate was 50 nm/min. Each olive oil sample was measured in triplicate, each time using a freshly prepared solution. Each measurement of an olive oil sample was treated statistically as a separate sample of the same origin.
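As an illustration of what Δλ = 30 nm means in a synchronous scan, the sketch below pairs each excitation wavelength with the emission wavelength recorded at the same moment, since both monochromators advance together at a constant offset. The function name, scan range, and step are hypothetical placeholders chosen for the example, not the instrument settings used in the study.

```python
def synchronous_pairs(ex_start_nm: float, ex_stop_nm: float,
                      step_nm: float, delta_lambda_nm: float = 30.0) -> list[tuple[float, float]]:
    """Return (excitation, emission) wavelength pairs for a synchronous scan.

    Both monochromators advance together, so every emission wavelength sits
    delta_lambda_nm above its excitation wavelength (hypothetical helper).
    """
    # Count steps with integer arithmetic to avoid accumulating float error.
    n_steps = int(round((ex_stop_nm - ex_start_nm) / step_nm))
    pairs = []
    for k in range(n_steps + 1):
        ex = ex_start_nm + k * step_nm
        pairs.append((ex, ex + delta_lambda_nm))  # lambda_em = lambda_ex + delta_lambda
    return pairs


if __name__ == "__main__":
    # Hypothetical excitation range of 270-300 nm in 10 nm steps (illustration only).
    for ex, em in synchronous_pairs(270.0, 300.0, 10.0):
        print(f"excitation {ex:.0f} nm -> emission {em:.0f} nm")
```

Each printed line shows a pair that is 30 nm apart, which is the constant offset that defines the recorded synchronous spectrum.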

These data have a specific structure. They were collected from PDO Kalamata olive oil produced in several local areas in the rural surroundings of the smart city of Kalamata, in the Messenia region of Greece. Specifically, data were collected from 29 olive oil samples from the local cultivation areas of (1) Aris, (2) Thouria, (3) Verga, (4) Arfara, and (5) Meligalas. These local cultivation areas define the following 5 classes: (1) Aris: Class 1, (2) Thouria: Class 2, (3) Verga: Class 3, (4) Arfara: Class 4, and (5) Meligalas: Class 5. The distribution of the collected samples per class is as follows: (1) 2 samples from Aris, (2) 2 samples from Thouria, (3) 7 samples from Verga, (4) 15 samples from Arfara, and (5) 3 samples from Meligalas.
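Because the class sizes are strongly imbalanced (15 of the 29 samples come from Arfara), a useful reference point when reading accuracy figures is the majority-class baseline, i.e., the accuracy of a classifier that always predicts Arfara. The short sketch below computes it from the counts listed above; the helper name is illustrative, and the baseline is an added reading aid rather than part of the reported evaluation.

```python
from collections import Counter

# Per-class sample counts as listed in the text (29 samples in total).
CLASS_COUNTS = {
    "Aris": 2,       # Class 1
    "Thouria": 2,    # Class 2
    "Verga": 7,      # Class 3
    "Arfara": 15,    # Class 4
    "Meligalas": 3,  # Class 5
}


def class_summary(counts: dict[str, int]) -> tuple[int, dict[str, float], float]:
    """Return total size, per-class proportions, and majority-class baseline accuracy."""
    total = sum(counts.values())
    proportions = {name: n / total for name, n in counts.items()}
    # Always predicting the largest class is correct exactly as often as it occurs.
    baseline = max(counts.values()) / total
    return total, proportions, baseline


if __name__ == "__main__":
    labels = [name for name, n in CLASS_COUNTS.items() for _ in range(n)]
    assert Counter(labels) == Counter(CLASS_COUNTS)  # label list matches the counts
    total, proportions, baseline = class_summary(CLASS_COUNTS)
    print(total)               # 29
    print(round(baseline, 3))  # 0.517
```

Any classifier worth adopting on these data should clearly exceed roughly 52%, and the two-sample classes (Aris, Thouria) weigh very little in overall accuracy, which is one reason the per-class confusion matrix is informative.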

3.1 Data Structure

Synchronous emission-excitation fluorescence spectra are composed of two-dimensional coordinates, i.e., $(x, y)$, for each olive oil class $c \in [1, 5]$, where $c = 1$ refers to Aris, $c = 2$ to Thouria, $c = 3$ to Verga, $c = 4$ to Arfara, and $c = 5$ to Meligalas local geographic origins. The $x$ dimension depicts the emission wavelength, measured in nanometers (nm), for all sample classes, while the $y$ dimension depicts the photoluminescence intensity, an arbitrary net number based on the internal calibration of the spectrofluorometer, for each of the $c$ data sample classes.

3.1.1 Visualizing Initial Data Samples

According to the initial measurements, the $x$ dimension takes values in the same initial wavelength interval for all data sample classes (i.e., the 5 classes of local cultivation origin). The initial data samples are then averaged over the $y$ values of each olive oil class $c \in [1, 5]$. This averaging provides a simple, easily understandable visualization of the initial data for each of the 5 classes. The averaged values in the $y$ dimension vary with the examined class, and each of the five origins (Aris, Thouria, Verga, Arfara, and Meligalas) spans its own intensity interval. Figure 1 visualizes the initial data samples (i.e., synchronous photoluminescence spectra) per class of olive oil geographic origin. The classes are not easily distinguished from each other on the basis of the initial measurements, which motivates the data fusion process that follows, aimed at making the classes more clearly separable.
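The per-class averaging used for the Figure 1 visualization can be sketched as follows, assuming every spectrum is sampled on the same wavelength grid and stored as one row of a NumPy array. The function name and the toy numbers are hypothetical and serve only to show the operation.

```python
import numpy as np


def class_mean_spectra(spectra: np.ndarray, labels: np.ndarray) -> dict[int, np.ndarray]:
    """Average the intensity (y) at every wavelength (x) over all samples of a class.

    spectra: shape (n_samples, n_wavelengths), one synchronous spectrum per row,
             all sampled on the same wavelength grid.
    labels:  shape (n_samples,), class identifiers c in 1..5.
    """
    return {int(c): spectra[labels == c].mean(axis=0) for c in np.unique(labels)}


if __name__ == "__main__":
    # Toy stand-in data: 4 spectra on a 3-point wavelength grid (not measured values).
    spectra = np.array([[1.0, 2.0, 3.0],
                        [3.0, 4.0, 5.0],
                        [10.0, 10.0, 10.0],
                        [20.0, 30.0, 40.0]])
    labels = np.array([1, 1, 4, 4])
    means = class_mean_spectra(spectra, labels)
    print(means[1])  # [2. 3. 4.]
    print(means[4])  # [15. 20. 25.]
```

Plotting each returned curve against the shared wavelength grid gives one averaged spectrum per origin, which is the kind of view shown in Figure 1.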

Fig. 1: Initial data (i.e., synchronous emission-excitation fluorescence spectra) assigned to a specific class of local origin of PDO Kalamata olive oil. The spectra were recorded at Δλ = 30 nm.

3.2 Data Fusion

Data fusion is a widely adopted method in the machine learning literature when the inherent complexity of the initial data sources needs to be understood in depth. Here, the data fusion process is applied to the experimental input data to visualize the qualitative statistical trends of the data, so that efficient machine learning algorithms can then be applied. The initial data samples are transformed by a specific data fusion process that removes outliers and missing values that occurred during the initial measurement process performed by the fluorescence spectroscopy device.

As a result of this transformation, the classes become more easily distinguishable from each other than in the initial data. Specifically, the $x$ dimension values of each data sample are grouped into wavelength intervals indexed by $i$. These intervals form the predictive attributes that are input to the statistical learning classifier to predict the class of olive oil origin. Concretely, $i \in [1, 5]$ is the identifier assigned to each transformed wavelength interval (i.e., predictive attribute) of the olive oil components, where $i = 1$ refers to the interval assigned to the predictive attribute 'tocopherols', $i = 2$ to 'phenolic compounds', $i = 3$ to 'oxidation products of triglycerides', $i = 4$ to 'oxidation products of tocopherols', and $i = 5$ to 'chlorophylls'.

The values measured in the $y$ dimension initially have an arbitrary distribution, which depends on the examined olive oil class, as produced by the synchronous photoluminescence spectra. These values are transformed into aggregated values $\bar{y}_{c,i}$ for each olive oil class of origin $c$ and each wavelength interval $i$ (i.e., each predictive attribute), according to the olive oil compounds measured for each local geographic origin. Concretely, $\bar{y}_{c,i}$ is a transformed average value assigned to the fused predictive attribute $i$ of a data sample of class $c$. Such fused values are computed for the data instances of all 29 olive oil samples (i.e., class values) from the different local areas, for each of the $i$ intervals (i.e., predictive attributes).
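One plausible reading of the fusion step, assuming each fused attribute is the mean intensity of a spectrum over the wavelength interval assigned to a component, is sketched below. The band boundaries in `HYPOTHETICAL_BANDS` are placeholders (the actual intervals are not reproduced here), the helper names are illustrative, and only missing readings (NaN) are handled; outlier removal is not attempted. Applied to all 29 spectra, such a step would yield a 29 × 5 attribute matrix, one column per predictive attribute.

```python
import numpy as np

# Hypothetical wavelength intervals (nm), one per predictive attribute i = 1..5.
# These boundaries are illustrative placeholders, NOT the intervals used in the paper.
HYPOTHETICAL_BANDS = {
    "tocopherols": (290.0, 310.0),
    "phenolic compounds": (310.0, 340.0),
    "oxidation products of triglycerides": (340.0, 400.0),
    "oxidation products of tocopherols": (400.0, 500.0),
    "chlorophylls": (640.0, 700.0),
}


def fuse_spectrum(wavelengths: np.ndarray, intensities: np.ndarray,
                  bands: dict) -> np.ndarray:
    """Collapse one spectrum into one mean intensity per wavelength interval.

    Missing readings encoded as NaN are skipped by np.nanmean.
    """
    fused = []
    for name, (low, high) in bands.items():
        mask = (wavelengths >= low) & (wavelengths < high)  # half-open interval [low, high)
        if not mask.any():
            raise ValueError(f"no wavelengths fall inside the band for {name!r}")
        fused.append(np.nanmean(intensities[mask]))
    return np.array(fused)


def fuse_dataset(wavelengths: np.ndarray, spectra: np.ndarray,
                 bands: dict = HYPOTHETICAL_BANDS) -> np.ndarray:
    """(n_samples, n_wavelengths) spectra -> (n_samples, n_bands) attribute matrix."""
    return np.vstack([fuse_spectrum(wavelengths, row, bands) for row in spectra])


if __name__ == "__main__":
    wl = np.arange(280.0, 710.0, 1.0)                    # hypothetical 1 nm grid
    spectra = np.vstack([wl / 100.0, np.full(wl.size, 2.0)])
    spectra[1, 15] = np.nan                              # missing reading at 295 nm
    print(fuse_dataset(wl, spectra).shape)               # (2, 5)
```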

Fig. 2: Fused data (i.e., based on the initial synchronous emission-excitation fluorescence spectra) assigned to a specific class of local origin of PDO Kalamata olive oil. The spectra were recorded at Δλ = 30 nm.

3.2.1 Visualizing Fused Data Samples

For the fused data, each class $c \in [1, 5]$ (i.e., each of the 5 classes of origin) is assigned specific values for each of the $i$ intervals (i.e., predictive attributes). The fused data samples are then averaged into values $\bar{y}_{c,i}$, $i \in [1, 5]$, for each olive oil class. This averaging provides a simple, easily understandable visualization of the fused data for each of the 5 classes. The averaged fused values $\bar{y}_{c,i}$ (i.e., the value range of the predictive attributes) vary with the examined class, and each of the five origins (Aris, Thouria, Verga, Arfara, and Meligalas) has its own range of fused values. Figure 2 visualizes the fused data samples per class of olive oil origin. The classes are now more easily distinguished from each other on the basis of the fused data, which is the reason for treating the initial data samples with a data fusion process.

4 Evaluation Parameters

To assess the performance of the adopted statistical machine learning algorithm, specific evaluation methods and evaluation metrics are used to perform experiments and observe the derived results.

4.1 Evaluation Method

Several evaluation methods exist for statistical machine learning algorithms. The authors adopt one of the most widely used, 10-fold cross-validation, [39], because of its simplicity and effectiveness. This method divides the input dataset into 10 parts (folds) of approximately equal size. In each of 10 iterations, 9 folds are used to train the statistical learning classification algorithm and the remaining fold is used to test the classifier, so that every fold, and therefore every instance, is used exactly once for testing. The method is widely adopted in machine learning methodology because it provides effective estimates of predictive performance from the available input data.
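The fold mechanics can be illustrated with a short NumPy sketch (the function name is hypothetical). Since 29 instances cannot be split into 10 equal parts, `np.array_split` produces nine folds of 3 instances and one fold of 2. The sketch shuffles once with a fixed seed and does not stratify by class; a stratified variant, which keeps class proportions similar across folds, is a common alternative, although with only 2 Aris and 2 Thouria samples most folds cannot contain every class either way.

```python
import numpy as np


def ten_fold_indices(n_samples: int, n_folds: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for plain (non-stratified) k-fold CV.

    Indices are shuffled once, then cut into n_folds parts whose sizes differ
    by at most one, because n_samples need not be divisible by n_folds.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k, test_idx in enumerate(folds):
        # Train on the other n_folds - 1 folds, test on fold k.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, test_idx


if __name__ == "__main__":
    splits = list(ten_fold_indices(29))
    print([len(test) for _, test in splits])  # [3, 3, 3, 3, 3, 3, 3, 3, 3, 2]
```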


Table 1. Confusion matrix

4.2 Evaluation Metrics

Given the proposed evaluation method, specific evaluation metrics must be adopted to support the experimental setup. These metrics are (1) prediction accuracy, (2) correctly classified instances, and (3) the confusion matrix, which together assess the efficiency of a statistical classification algorithm.

4.2.1 Prediction Accuracy

The effectiveness of the adopted statistical learning algorithm is assessed with the prediction accuracy evaluation metric, $a \in [0, 1]$, defined in Eq. (1):

$$a = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

where $TP$ is the number of instances correctly classified as positive, $TN$ the number of instances correctly classified as negative, $FP$ the number of instances incorrectly classified as positive, and $FN$ the number of instances incorrectly classified as negative. A low value of $a$ indicates a weak classifier, while a high value indicates an efficient statistical learning classifier. Together, the four quantities (1) $TP$, (2) $TN$, (3) $FP$, and (4) $FN$ determine the experimental value of the prediction accuracy and help explain the dynamics of the data sources and the observed results.

4.2.2 Correctly Classified Instances

In statistical machine learning, prediction accuracy is commonly expressed as a percentage, which makes the results easier to interpret and present. The term correctly classified instances, $CCI \in [0, 100]$, is used for this purpose and is defined in Eq. (2):

$$CCI = 100 \cdot a \qquad (2)$$

A value close to $0\%$ means that the classification algorithm is not efficient, while a value close to $100\%$ indicates that the algorithm classifies instances very well.
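Eqs. (1) and (2) translate directly into code. The sketch below also includes the usual multiclass counterpart of Eq. (1), namely the sum of the confusion-matrix diagonal divided by the total number of instances, which is how accuracy is read off a 5-class matrix. Function names and the counts in the usage lines are illustrative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Prediction accuracy a, Eq. (1): share of correct decisions, in [0, 1]."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no instances to evaluate")
    return (tp + tn) / total


def correctly_classified_instances(a: float) -> float:
    """CCI, Eq. (2): prediction accuracy expressed as a percentage."""
    return 100.0 * a


def multiclass_accuracy(confusion: list[list[int]]) -> float:
    """Multiclass analogue of Eq. (1): correct (diagonal) count over all instances."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


if __name__ == "__main__":
    a = accuracy(tp=8, tn=5, fp=1, fn=2)      # illustrative counts
    print(a)                                  # 0.8125
    print(correctly_classified_instances(a))  # 81.25
```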

4.2.3 Confusion Matrix

The adopted statistical classification algorithm is also evaluated with the confusion matrix. A confusion matrix is a square matrix whose rows correspond to the actual class and whose columns correspond to the predicted class ("classified as"). For the multiclass classification of 5 classes (i.e., Class 1: Aris, Class 2: Thouria, Class 3: Verga, Class 4: Arfara, and Class 5: Meligalas), it has the encoded form described in Table 1. The quantity "A" is the number of Class 1 instances correctly classified as Class 1; "B" is the number of Class 1 instances falsely classified as Class 2; "C" is the number of Class 1 instances falsely classified as Class 3; "D" is the number of Class 1 instances falsely classified as Class 4; and "E" is the number of Class 1 instances falsely classified as Class 5. The remaining elements of the confusion matrix are defined in the same way. A classification model is considered efficient if it maximizes the elements of the main diagonal of the confusion matrix (i.e., "A", "G", "M", "S", and "Y") and minimizes the other elements. The confusion matrix is part of machine learning evaluation methodology because it explains in detail the statistical nature of the results summarized by the prediction accuracy metric.
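A 5-class confusion matrix laid out as in Table 1 (rows: actual class, columns: "classified as") can be accumulated from pairs of actual and predicted labels. The function name and labels below are hypothetical and only illustrate the layout; they are not the study's predictions.

```python
import numpy as np


def confusion_matrix(actual: list[int], predicted: list[int], n_classes: int = 5) -> np.ndarray:
    """Rows = actual class, columns = predicted ("classified as") class.

    Class identifiers are 1-based (1..n_classes), matching Table 1, so entry
    [0, 0] is "A", [0, 1] is "B", ..., [4, 4] is "Y".
    """
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a - 1, p - 1] += 1  # shift 1-based labels to 0-based indices
    return cm


if __name__ == "__main__":
    actual = [1, 1, 2, 3, 3, 4, 5]     # illustrative labels only
    predicted = [1, 2, 2, 3, 1, 4, 5]
    cm = confusion_matrix(actual, predicted)
    print(cm)
    print("correct:", int(np.trace(cm)), "of", int(cm.sum()))  # correct: 5 of 7
```

The trace of the matrix is the number of correctly classified instances, so dividing it by the matrix total gives the prediction accuracy discussed above.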

5 Experiments and Results

The data model based on the fused data values is used to perform the experiments and observe the derived results. An experimental setup is needed to structure the experimental phase with specific evaluation methods and metrics and to observe the results of the current research effort.

5.1 Experimental Setup

Specific parameters are defined to set up the experimental process: the number of classes assigned to the data sample instances and the predictive attributes used to describe each class. A statistical machine learning algorithm must then be adopted to perform the experiments and observe the results.

5.1.1 Multiclass Classification

Since the number of classes is 5, this classification process is characterized as a multiclass classification problem. The 5 classes are defined as follows: (1) Class 1: Aris, (2) Class 2: Thouria, (3) Class 3: Verga, (4) Class 4: Arfara, and (5) Class 5: Meligalas. The number of predictive attributes is also 5: (1) Predictive Attribute 1: 'tocopherols', (2) Predictive Attribute 2: 'phenolic compounds', (3) Predictive Attribute 3: 'oxidation products of triglycerides', (4) Predictive Attribute 4: 'oxidation products of tocopherols', and (5) Predictive Attribute 5: 'chlorophylls'. The number of data sample instances is 29, distributed per class as follows: (1) 2 samples from Class 1, i.e., Aris, (2) 2 samples from Class 2, i.e., Thouria, (3) 7 samples from Class 3, i.e., Verga, (4) 15 samples from Class 4, i.e., Arfara, and (5) 3 samples from Class 5, i.e., Meligalas.

Actual \ Classified as | Class 1 | Class 2 | Class 3 | Class 4 | Class 5
Class 1 | A | B | C | D | E
Class 2 | F | G | H | I | J
Class 3 | K | L | M | N | O
Class 4 | P | Q | R | S | T
Class 5 | U | V | W | X | Y

5.1.2 Logistic Statistical Learning Algorithm

To select the statistical learning algorithm best suited to this multiclass classification problem, we experimented with several statistical learning classifiers available in the Weka machine learning software, [40]. The algorithm with the best predictive behavior was the Logistic classifier (i.e., Weka's implementation of logistic regression, which builds a multinomial logistic regression model with a ridge estimator), so it was adopted for further experimentation in the current research study.
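To make the experimental pipeline concrete, the sketch below runs a multinomial (softmax) logistic regression with an L2 ridge penalty under 10-fold cross-validation in NumPy. It is a simplified stand-in written for illustration, not a reimplementation of Weka's Logistic: it fits by plain batch gradient descent, standardizes attributes with training-fold statistics, and uses hypothetical hyperparameters (`ridge`, `lr`, `epochs`). Because the measured fused attributes are not reproduced in the text, the usage example draws SYNTHETIC data with the paper's class sizes (2, 2, 7, 15, 3), so its accuracy says nothing about the reported results.

```python
import numpy as np


def fit_softmax(X, y, n_classes, ridge=1e-2, lr=0.1, epochs=2000):
    """Multinomial logistic regression with an L2 (ridge) penalty, fitted by
    batch gradient descent. y holds 0-based class indices."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])             # append intercept column
    W = np.zeros((d + 1, n_classes))
    Y = np.eye(n_classes)[y]                         # one-hot targets
    for _ in range(epochs):
        logits = Xb @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        grad = Xb.T @ (P - Y) / n                    # gradient of mean cross-entropy
        grad[:-1] += ridge * W[:-1]                  # ridge penalty, intercept excluded
        W -= lr * grad
    return W


def predict_softmax(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)


def cross_validated_predictions(X, y, n_classes, n_folds=10, seed=0):
    """Out-of-fold prediction for every instance (plain k-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty_like(y)
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        W = fit_softmax((X[train] - mu) / sd, y[train], n_classes)
        pred[test] = predict_softmax(W, (X[test] - mu) / sd)
    return pred


if __name__ == "__main__":
    # SYNTHETIC stand-in for the 29 x 5 fused attribute matrix (not measured data):
    # class k (0-based: Aris=0 ... Meligalas=4) is shifted along attribute k.
    counts = [2, 2, 7, 15, 3]
    rng = np.random.default_rng(42)
    y = np.repeat(np.arange(5), counts)
    X = rng.normal(size=(y.size, 5)) + 3.0 * np.eye(5)[y]
    pred = cross_validated_predictions(X, y, n_classes=5)
    print("accuracy on synthetic data:", np.mean(pred == y))
```

The out-of-fold predictions can be passed, together with the true labels, to a confusion-matrix routine to obtain a table in the layout of Table 1.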

5.2 Derived Results

The experimental phase is evaluated with the evaluation method (i.e., 10-fold cross-validation) and the metrics defined in Section 4, which assess the efficiency of the adopted statistical learning algorithm, in this case logistic regression. These evaluation parameters determine how the effectiveness of the experimental setup is judged; they are needed to interpret the observed results, to explain the findings of the research effort, and to reveal the inherent complexity of the provided data sources.

5.2.1 Observed Prediction Accuracy

The adopted multiclass classification algorithm is evaluated with 10-fold cross-validation. Under this evaluation method, the observed prediction accuracy is approximately 96.55%, i.e., 28 of the 29 instances (the sum of the main diagonal of the confusion matrix in Table 2) are correctly classified. This high prediction accuracy indicates that the adopted statistical learning algorithm is suitable for the examined multiclass classification problem. It also suggests that the algorithm could be applied to new, unseen olive oil instances in future research that extends the current study within the geographical region of interest.
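For reference, the prediction accuracy is the standard ratio of correctly classified instances to all instances. Written in terms of the confusion matrix C, where C_{ij} is the number of instances of actual class i classified as class j and K = 5 is the number of classes, and using the diagonal counts of Table 2, it reads:

```latex
\mathrm{Accuracy}
  = \frac{\sum_{i=1}^{K} C_{ii}}{\sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij}}
  = \frac{2 + 2 + 6 + 15 + 3}{29}
  = \frac{28}{29} \approx 0.9655
```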

5.2.2 Observed Correctly Classified Instances

Under 10-fold cross-validation, 28 of the 29 instances are correctly classified; the remaining instance is a Class 3 sample that is misclassified (see Table 2). This indicates that the selected statistical machine learning algorithm is a well-suited choice for the examined classification problem.
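The number of correctly classified instances is the sum of the main diagonal of the confusion matrix. The small Python helper below makes this explicit and also reports per-class recall. Its name, `summarize_confusion`, is hypothetical. It assumes that rows are actual classes and columns are predicted classes, following the Weka "classified as" convention used in Table 2.

```python
def summarize_confusion(matrix):
    """Summarize a square confusion matrix (rows: actual, columns: predicted).

    Returns (correct, total, accuracy, recalls), where recalls[i] is the
    fraction of actual class-i instances that were classified as class i.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("confusion matrix must be square")
    correct = sum(matrix[i][i] for i in range(n))  # main diagonal
    total = sum(sum(row) for row in matrix)
    accuracy = correct / total if total else 0.0
    # recall of class i = diagonal entry / row total (0.0 for an empty row)
    recalls = [matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0
               for i in range(n)]
    return correct, total, accuracy, recalls
```

These quantities depend only on the diagonal and the row totals of Table 2. With those counts, the helper returns 28 correct instances out of 29, a recall of 6/7 for Class 3 and a recall of 1.0 for every other class.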

5.2.3 Observed Confusion Matrix

Table 2 presents the confusion matrix derived with the 10-fold cross-validation evaluation method for the examined multiclass classification problem.

Table 2. Confusion matrix observed results (rows: actual class; columns: predicted class)

Class 1   Class 2   Class 3   Class 4   Class 5   ← Classified as
   2         0         0         0         0      Class 1
   0         2         0         0         0      Class 2
   1         0         6         0         0      Class 3
   0         0         0        15         0      Class 4
   0         0         0         0         3      Class 5

Most of the classified instances lie on the main diagonal of Table 2, i.e., they are correctly classified. Specifically, 28 of the 29 instances fall on the diagonal, and the single off-diagonal entry corresponds to one Class 3 instance classified as Class 1. Such prediction behavior indicates a robust classification algorithm for the examined multiclass classification problem. The detailed confusion matrix also allows the experimental results to be examined class by class. This helps assess how efficiently the adopted machine learning algorithm would predict the origin of PDO Kalamata olive oil for other experimental instances.

6 Discussion

The problem definition describes a multiclass classification problem with 5 discrete classes, 5 separate predictive attributes and a total of 29 sample instances. These are based on the local geographic origins of the Koroneiki olive oil variety, which is cultivated in the area of the smart city of Kalamata in the Messenia region, Greece. The current research effort achieved high values on the adopted evaluation metrics, which indicates the robustness of the examined evaluation setup. Nevertheless, the study has significant strengths as well as certain weaknesses, which are presented in the following subsections to complete the methodological research frame.

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2024.20.15

Theodoros Anagnostopoulos, Ioakeim Spiliopoulos

E-ISSN: 2224-3496

143

Volume 20, 2024

6.1 Weaknesses of the Study

The initial data measured by the synchronous photoluminescence spectra IoT device are primitive raw values. They must be further processed before they can be fed into a statistical machine learning algorithm and evaluated properly. Visualizing the initial data sources results in a complex plot in which the 5 adopted classes cannot be clearly distinguished. This limits the evaluation capability of the available initial data: the classes overlap with each other, which makes inference difficult. Consequently, the visualized initial data are not suitable for further experimentation in their current form, and a fusion process is required to remove outliers and missing values before further processing.
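The paper does not detail the fusion process, so the following is only a hypothetical Python sketch of the cleaning step named above. It first drops samples with missing values. It then removes samples whose attributes fall outside Tukey's interquartile-range (IQR) fences. The IQR rule, the factor k = 1.5, the function name `clean_rows` and the toy data are our assumptions and do not describe the authors' procedure. The sketch requires Python 3.8 or later for `statistics.quantiles`.

```python
import math
from statistics import quantiles


def clean_rows(rows, k=1.5):
    """Hypothetical cleaning step: drop rows with missing values (None or NaN),
    then drop rows with any attribute outside [Q1 - k*IQR, Q3 + k*IQR].

    rows: list of equal-length lists of numbers (one list per sample).
    Returns the retained rows in their original order.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    complete = [r for r in rows if not any(missing(v) for v in r)]
    if len(complete) < 2:  # quartiles need at least two samples
        return complete
    fences = []
    for j in range(len(complete[0])):
        column = [r[j] for r in complete]
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in complete
            if all(lo <= v <= hi for v, (lo, hi) in zip(r, fences))]
```

The fences are computed per attribute after the incomplete rows have been removed. As a result, a missing value cannot distort the quartiles.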

6.2 Strengths of the Study

The data fusion process adopted in the current research study removes the vagueness among the 5 adopted data classes. Specifically, the fused data yield 5 discrete predictive attributes that address the vagueness of the initial data. Visualizing the fused data shows that the classes and the predictive attributes can be distinguished easily, which makes further experimentation possible. Moreover, the adopted evaluation method and metrics proved effective in assessing the derived results. The Logistic statistical machine learning classifier emerged as an efficient solution to the multiclass classification problem, since it was able to predict the different classes from the fused data sources. This effectiveness makes it possible to distinguish the origin of PDO Kalamata olive oil produced in specific local areas within the rural surroundings of the smart city of Kalamata.

7 Conclusions and Future Work

PDO Kalamata olive oil is an extra virgin olive oil produced in the Messenia regional unit in the southwestern Peloponnese, Greece. It takes its name from Kalamata, the capital city of Messenia. Because of differences in soil composition, microclimates and agronomic factors, olive oil from different areas of Messenia has diverse characteristics, although these remain within the limits described by Council Regulation (EC) No 510/2006. Synchronous photoluminescence spectra of olive oils, acquired with the adopted IoT device, can distinguish the different origins of PDO Kalamata olive oil. In this research effort, we use statistical machine learning algorithms to classify several geographic origins. The evaluation of the statistical models is based on the adopted methods and metrics. It has proved promising for differentiating origins, thus enabling olive oil companies to choose PDO Kalamata olive oil with superior characteristics from a specific area of Messenia.

According to our research outcomes, future work should mainly focus on incorporating more detailed input measurement data, enabled by improvements in synchronous photoluminescence spectra IoT technology. Such data would provide a more robust input to the selected statistical classification algorithm. Data fusion techniques should then be reapplied to the more detailed initial data, and the fused data should be used as input to several statistical learning algorithms, which might yield more effective results. The current research could also be extended to verify the authenticity and detect the adulteration of olive oils with Protected Designation of Origin (PDO), thus addressing the fraud problem in the olive oil trade. Finally, the current research effort focuses on a specific olive oil variety within a limited geographic region. Future research should therefore also validate the proposed methodology on other olive oil varieties and production areas, both within the Messenia region and in other regions of Greece known for the quality of their olive oil production.

References:

[1] S. Paiho, P. Tuominen, J. Rokman, M. Ylikerala, J. Pajula, and H. Siikavirta, “Opportunities of collected city data for smart cities”, IET Smart Cities, Volume 4, Issue 4, 2022, pages 275 – 291.

[2] J. L. D. Boer, and B. Erickson, “Setting the Record Straight on Precision Agriculture Adoption”, Agronomy Journal, Volume 111, Issue 4, 2019, pages 1535 – 2139.


[3] X. Miao, J. Ma, X. Miu, H. Zhang, Y. Geng, W. Hu, Y. Deng, and N. Li, “Integrated transcriptome and proteome analysis the molecular mechanisms of nutritional quality in ‘Chenggu-32’ and ‘Koroneiki’ olives fruits (Olea europaea L.)”, Journal of Plant Physiology, Volume 288, Issue 154072, 2023, pages 1 – 12.

[4] L. Trabelsi, B. Ncube, A. B. Hassena, M. Zouairi, F. B. Amar, and K. Gargouri, “Comparative study of productive performance of two olive oil cultivars Chemlali Sfax and Koroneiki under arid conditions”, South African Journal of Botany, Volume 154, Issue 1, 2023, pages 356 – 364.

[5] A. Issa, M. E. Riachy, C. B. Mitri, J. Doumit, W. Skaff, and L. Karam, “Influence of geographical origin, harvesting time and processing system on the characteristics of olive-mill wastewater: A step toward reducing the environmental impact of the olive oil sector”, Environmental Technology & Innovation, Volume 32, Issue 103365, 2023, pages 1 – 12.

[6] R. Aparicio, M. T. Morales, R. A. Ruiz, N. Tena, and D. L. G. González, “Authenticity of olive oil: Mapping and comparing official methods and promising alternatives”, Food Research International, Volume 54, Issue 2, 2013, pages 2025 – 2038.

[7] D. I. Ellis, H. Muhamadali, S. A. Haughey, C. T. Elliott, and R. Goodacre, “Point–and–shoot: Rapid quantitative detection methods for on–site food fraud analysis–moving out of the laboratory and into the food supply chain”, Analytical Methods, Volume 7, Issue 22, 2015, pages 9375 – 9716.

[8] P. Rajak, A. Ganguly, S. Adhikary, and S. Bhattacharya, “Internet of Things and smart sensors in agriculture: Scopes and challenges”, Journal of Agriculture and Food Research, Volume 14, Issue 100776, 2023, pages 1 – 13.

[9] J. Krause, H. Gruger, L. Gebauer, X. Zheng, J. Knobbe, T. Pügner, A. Kicherer, R. Gruna, T. Langle, and J. Beyerer, “Smart Spectrometer–Embedded Optical Spectroscopy for Applications in Agriculture and Industry”, Sensors, Volume 21, Issue 13, 2021, pages 1 – 18.

[10] R. Karoui, and C. Blecker, “Fluorescence Spectroscopy Measurement for Quality Assessment of Food Systems – a Review”, Food and Bioprocess Technology, Volume 4, Issue 1, 2011, pages 364 – 386.

[11] S. Khani, J. B. Ghasemi, and Z. P. Vanak, “Development of computer vision system for classification of olive oil samples with different harvesting years and estimation of chlorophyll and carotenoid contents: A comparison of the proposed method’s efficiency with UV-Vis spectroscopy”, Journal of Food Composition and Analysis, Volume 129, Issue 106078, 2024, pages 1 – 42.

[12] S. K. Drakopoulou, A. S. Kritikou, C. Baessmann, and N. Thomaidis, “Untargeted 4D-metabolomics using Trapped Ion Mobility combined with LC-HRMS in extra virgin olive oil adulteration study with lower-quality olive oils”, Food Chemistry, Volume 434, Issue 137410, 2024, pages 1 – 9.

[13] M. E. Schiano, F. Sodano, C. Cassiano, E. Magli, S. Seccia, M. G. Rimoli, and S. Albrizio, “Monitoring of seven pesticide residues by LC-MS/MS in extra virgin olive oil samples and risk assessment for consumers”, Food Chemistry, Volume 442, Issue 138498, 2024, pages 1 – 8.

[14] R. Reda, T. Saffaj, I. Bouzida, O. Saidi, M. Belgrir, B. Lakssir, and E. M. E. Hadrami, “Optimized variable selection and machine learning models for olive oil quality assessment using portable near infrared spectroscopy”, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Volume 303, Issue 123213, 2023, pages 1 – 11.

[15] K. D. T. M. Milanez, T. C. A. Nóbrega, D. S. Nascimento, M. Insausti, B. S. F. Band, and M. J. C. Pontes, “Multivariate modeling for detecting adulteration of extra virgin olive oil with soybean oil using fluorescence and UV–Vis spectroscopies: A preliminary approach”, LWT – Food Science and Technology, Volume 85, Issue 1, 2017, pages 9 – 15.

[16] R. A. Santos, J. C. Cancilla, A. P. Pérez, A. Moral, and J. S. Torrecilla, “Quantifying binary and ternary mixtures of monovarietal extra virgin olive oils with UV–vis absorption and chemometrics”, Sensors and Actuators B: Chemical, Volume 234, Issue 1, 2016, pages 115 – 121.

[17] I. D. Merás, J. D. Manzano, D. A. Rodríguez, and A. M. Peña, “Detection and quantification of extra virgin olive oil adulteration by means of autofluorescence excitation–emission profiles combined with multi–way classification”, Talanta, Volume 178, Issue 1, 2018, pages 751 – 762.


[18] Y. Li, T. Fang, S. Zhu, F. Huang, Z. Chen, and Y. Wang, “Detection of olive oil adulteration with waste cooking oil via Raman spectroscopy combined with iPLS and SiPLS”, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Volume 189, Issue 1, 2018, pages 37 – 43.

[19] F. D. Girolamo, A. Masotti, I. Lante, M. Scapaticci, C. D. Calvano, C. Zambonin, M. Muraca, and L. A. Putignani, “Simple and effective mass spectrometric approach to identify the adulteration of the Mediterranean diet component extra–virgin olive oil with corn oil”, International Journal of Molecular Sciences, Volume 16, Issue 9, 2015, pages 20896 – 20912.

[20] A. Rotondo, L. Mannina, and A. Salvo, “Multiple Assignment Recovered Analysis (MARA) NMR for a Direct Food Labeling: The Case Study of Olive Oils”, Food Analytical Methods, Volume 12, Issue 1, 2019, pages 1238 – 1245, DOI: 10.1007/s12161-019-01460-4.

[21] M. M. Mossoba, H. Azizian, A. R. F. Kia, S. R. Karunathilaka, and J. K. G. Kramer, “First Application of Newly Developed FT–NIR Spectroscopic Methodology to Predict Authenticity of Extra Virgin Olive Oil Retail Products in the USA”, Lipids, Volume 52, Issue 5, 2017, pages 443 – 455, DOI: 10.1007/s11745-017-4250-5.

[22] H. Zaroual, C. Chene, E. M. E. Hadrami, and R. Karoui, “Comparison of four classification statistical methods for characterizing virgin olive oil quality storage up to 18 months”, Food Chemistry, Volume 370, Issue 131009, 2022, pages 1 – 16.

[23] X. Wu, S. Gao, Y. Niu, Z. Zhao, B. Xu, R. Ma, H. Liu, and Y. Zhang, “Identification of olive oil in vegetable blend oil by one-dimensional convolutional neural network combined with Raman spectroscopy”, Journal of Food Composition and Analysis, Volume 108, Issue 104396, 2022, pages 1 – 7.

[24] I. H. A. S. Barros, L. S. Paixao, M. H. C. Nascimento, V. J. Lacerda, P. R. Figueiras, and W. Romao, “Use of portable Raman spectroscopy in the quality control of extra virgin olive oil and adulterated compound oils”, Vibrational Spectroscopy, Volume 116, Issue 103299, 2021, pages 1 – 10.

[25] L. Cecchi, M. Migliorini, E. Giambanelli, A. Rosseti, A. Cane, N. Mulinacci, and F. Melani, “Authentication of the geographical origin of virgin olive oils from the main worldwide producing countries: A new combination of HS-SPME-GC-MS analysis of volatile compounds and chemometrics applied to 1217 samples”, Food Control, Volume 112, Issue 107156, 2020, pages 1 – 10.

[26] D. Stefas, N. Gyftokostas, P. Kourelias, E. Nanou, V. Kokkinos, C. Bouras, and S. Couris, “Discrimination of olive oils based on the olive cultivar origin by machine learning employing the fusion of emission and absorption spectroscopic data”, Food Control, Volume 130, Issue 108318, 2021, pages 1 – 8.

[27] V. Rotich, D. F. A. Riza, F. Giametta, T. Suzuki, Y. Ogawa, and N. Kondo, “Thermal oxidation assessment of Italian extra virgin olive oil using an UltraViolet (UV) induced fluorescence imaging system”, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Volume 237, Issue 118373, 2020, pages 1 – 8.

[28] A. Bagnall, L. Davis, J. Hills, and J. Lines, “Transformation Based Ensembles for Time Series Classification”, Proceedings of the 2012 SIAM International Conference on Data Mining (SDM), Anaheim, California, USA, April 26 – 28, 2012, pages 307 – 318.

[29] H. S. Tapp, M. Defernez, and E. K. Kemsley, “FTIR Spectroscopy and Multivariate Analysis Can Distinguish the Geographic Origins of Extra Virgin Olive Oils”, Journal of Agricultural and Food Chemistry, Volume 51, Issue 21, 2003, pages 6110 – 6115, DOI: 10.1021/jf030232s.

[30] P. D. L. Mata, A. D. Vidal, J. M. B. Sendra, A. R. Medina, L. C. Rodriguez, and M. J. A. Canada, “Olive oil assessment in edible oil blends by means of ATR-FTIR and chemometrics”, Food Control, Volume 23, Issue 2, 2012, pages 449 – 455.

[31] E. Sikorska, I. Khmelinskii, and M. Sikorski, “Analysis of Olive Oils by Fluorescence Spectroscopy: Methods and Applications”, InTech, Volume 1, Issue 1, 2012, pages 63 – 88, DOI: 10.5772/30676.

[32] E. Sikorska, T. Gorecki, I. V. Khmelinskii, M. Sikorski, and J. Koziol, “Classification of edible oils using synchronous scanning fluorescence spectroscopy”, Food Chemistry, Volume 89, Issue 2, 2005, pages 217 – 225.

[33] E. Sikorska, A. G. Swiglo, I. Khmelinskii, and M. Sikorski, “Synchronous Fluorescence of Edible Vegetable Oils. Quantification of Tocopherols”, Journal of Agricultural and Food Chemistry, Volume 53, Issue 18, 2005, pages 6988 – 6994, DOI: 10.1021/jf0507285.


[34] N. Dupuy, Y. L. Dreau, D. Ollivier, J. Artaud, C. Pinatel, and J. Kister, “Origin of French Virgin Olive Oil Registered Designation of Origins Predicted by Chemometric Analysis of Synchronous Excitation-Emission Fluorescence Spectra”, Journal of Agricultural and Food Chemistry, Volume 53, Issue 24, 2005, pages 9361 – 9368.

[35] R. M. Maggio, E. Valli, A. Bendini, A. M. G. Caravaca, T. G. Toschi, and L. Cerretani, “A spectroscopic and chemometric study of virgin olive oils subjected to thermal stress”, Food Chemistry, Volume 127, Issue 1, 2011, pages 216 – 221.

[36] E. Bertran, M. Blanco, J. Coello, H. Iturriaga, S. Maspoch, and I. Montoliu, “Near infrared spectrometry and pattern recognition as screening methods for the authentication of virgin olive oils of very close geographical origins”, Journal of Near Infrared Spectroscopy, Volume 8, Issue 1, 2000, pages 45 – 52.

[37] A. L. Prieto, N. Tena, R. A. Ruiz, D. L. G. Gonzalez, and E. Sikorska, “Monitoring Virgin Olive Oil Shelf-Life by Fluorescence Spectroscopy and Sensory Characteristics: A Multidimensional Study Carried Out under Simulated Market Conditions”, Foods, Volume 9, Issue 12, 2020, pages 1 – 20, DOI: 10.3390/foods9121846.

[38] The LS-55 and LS-45 Fluorescence Spectrofluorometers, Perkin Elmer, [Online]. https://resources.perkinelmer.com/lab-solutions/resources/docs/BRO_LS-55andLS-45FluorescenceSpectrophotometer.pdf (Accessed Date: February 26, 2024).

[39] E. Frank, M. A. Hall, and I. H. Witten, “The WEKA Workbench”, Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, Morgan Kaufmann, Fourth Edition, 2016.

[40] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA Data Mining Software: An Update”, SIGKDD Explorations, Volume 11, Issue 1, 2009, pages 10 – 18.

Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)

The authors equally contributed to the present research, at all stages from the formulation of the problem to the final findings and solution.

Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The authors have no conflicts of interest to declare.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US
