Data Augmentation to Improve the Soundscape Ranking Index

Prediction

ROBERTO BENOCCI1, ANDREA POTENZA1, GIOVANNI ZAMBON1,

ANDREA AFIFY2,3, H. EDUARDO ROMAN2

1Department of Earth and Environmental Sciences (DISAT)

University of Milano-Bicocca

Piazza della Scienza 1, 20126 Milano

2Department of Physics

University of Milano-Bicocca

Piazza della Scienza 3, 20126 Milano

3NEXiD Edge

NEXiD

Via Fabio Filzi 27, 20124 Milano

Abstract: Predicting the sound quality of an environment represents an important task especially in urban parks

where the coexistence of sources of anthropic and biophonic nature produces complex sound patterns. To this

end, an index has been deﬁned by us, denoted as soundscape ranking index (SRI), which assigns a positive weight

to natural sounds (biophony) and a negative one to anthropogenic sounds. A numerical strategy to optimize the

weight values has been implemented by training two machine learning algorithms, the random forest (RF) and

the perceptron (PPN), over an augmented data-set. Due to the availability of a relatively small fraction of labelled

recorded sounds, we employed Monte Carlo simulations to mimic the distribution of the original data-set while

keeping the original balance among the classes. The results show an increase in the classiﬁcation performance.

We discuss the issues that special care needs to be addressed when the augmented data are based on a too small

original data-set.

Key-Words: - soundscape; data augmentation; machine learning; ecoacoustic indices; soundscape ranking index;

urban parks;

Received: April 9, 2023. Revised: July 2, 2023. Accepted: September 2, 2023. Published: September 20, 2023.

1 Introduction

Soundscape analysis has grown in importance dur-

ing the last decades as a tool for a rapid evalua-

tion of environmental sounds. The analysis relies on

non-invasive techniques for ecological surveillance

thanks to the use of a widespread monitoring tech-

nology known as passive acoustic monitoring (PAM)

techniques. The latter can provide large amount of

data for long periods of time even in hardly acces-

sible geographical locations. In this respect, ecoa-

coustics has emerged as a new branch of acous-

tics, which is based on the idea that acoustic ambi-

ence measurements can provide a signiﬁcant infor-

mation on the sound sources present within a given

landscape. Thus, the contributions from different

sound sources, identiﬁed as geophony (physical), bio-

phony (biological) and anthropogenic (anthrophony

or technophony) sources, [1], [2], [3], [4], may pro-

vide an estimation of the health of a habitat across

space and time. The complexity of the assemblage of

sounds in an environment is usually summarized in

terms of ecoacoustics indices, which can be evaluated

over speciﬁc intervals of time, which map the acous-

tic dynamics of an ecosystem, [5], into time series

describing the observed status changes taking place

in a given habitat, [6]. The information carried by

the calculation of ecoacoustic indices are usually val-

idated by listening to hours of recordings in order to

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

891

Volume 19, 2023

ITALY

identify known sound categories. This activity is gen-

erally referred to as sound-truthing.

The manual procedure to identify the different

sound sources is typically very time consuming, re-

quiring speciﬁc technical preparationa and long-time

training. For this reason, this methodology is usu-

ally applied to small data-sets, [7], [8]. A summa-

tive approach has also been proposed by providing

qualitative information of the sound characteristics in

each audio recording, [9]. In this sense, the consid-

eration of just few or many bird sounds, few or many

bird species, and low or high trafﬁc noise, etc., may

simplify and speed up the validation process. How-

ever, such information alone cannot provide an ac-

curate assessment of the acoustic quality characteriz-

ing a complex habitat. In order to estimate a sound-

scape index, an attempt has been recently made by

empirically accounting for the contribution of differ-

ent sound sources, which assigns a positive weight to

natural sounds (biophony) and a negative one to an-

thropogenic sounds. The results can be summarized

in terms of an index denoted as soundscape ranking

index (SRI), which has proved to disentangle com-

plex environmental sound blends such as those found

in urban parks, [10]. The analysis outlined in this pa-

per refers to the sound recordings taken at Parco Nord

(Northern Park) of the city Milan (Italy).

It is well established that machine learning (ML)

algorithms can be trained to learn generic features

extracted from a database, [11], [12], [13], which

can then be used for dealing with many useful and

complex tasks. For example, the application of ML

techniques for studying the dynamics of a sound-

scape has provided accurate identiﬁcations of differ-

ent bird species, [14], [15], and also the separation

of the different audio sources when they are mixed

in a set containing an unknown combination of their

components, [16]. In urban areas, ML models have

been used to predict long-term acoustic patterns from

short-term sound measurements, [17], and for ﬁlter-

ing out anomalous noise events before computing the

actual trafﬁc noise maps, [18]. ML techniques have

been also applied for soundscape classiﬁcation, [19],

[20], species recognition, [21], [22], and the identiﬁ-

cation of mixed noise sources using a two-stage clas-

siﬁer which is able to discriminate different urban

acoustic events in real time, by relying on the nor-

mally present redundant information from a network

of acoustic sensors in the city of Barcelona, [23].

The implementation of a huge manual labelling

(about 60,000 sound recordings) has proved to be

satisfactory for the identiﬁcation of different sound

sources, [24]. A huge dataset collected over four

years across Sonoma County (California) by citizen

scientists was used to predict soundscape components

with success, [25]. The automatic recognition of

the soundscape quality of urban recordings has been

studied by applying four different support vector ma-

chine (SVM) regressors to a combination of spectral

features, [26]. In [27], a combination of temporal,

spectral, and perceptual features was used to classify

urban sound events belonging to nine different cate-

gories. In [28], different acoustic indicators taken in

the city of Barcelona were used to train several clus-

tering algorithms with recognizing the possibility of

clustering the city area according to the noise levels.

ML has been also used in a preliminary work, to op-

timize the deﬁnition of SRI, [29]. Here, the set of

weights that deﬁne the SRI were searched by train-

ing four machine learning algorithms, Decision Tree

(DT), Random Forest (RF), Adaptive Boosting (Ad-

aBoost), Support Vector Machine (SVM), over a rela-

tively small number of labelled sound recorded audio

ﬁles.

More speciﬁcally, it was shown that two classiﬁ-

cation models, DT and AdaBoost, were able to pro-

vide a set of weights characterized by a rather good

classiﬁcation performance. The obtained results were

in quantitative agreement with two different statis-

tical approaches: a self-consistent estimation of the

mean SRI values at each site, [30], and a cluster anal-

ysis performed, over the extracted features at each

site, [31].

In this work, we studied the possibility of pre-

dicting the SRI at an urban park in the city of Mi-

lan (Italy) from the extracted spectral features of au-

dio recordings in the form of seven ecoacoustic in-

dices. Using these features, in [29], we obtained

quite poor classiﬁcation scores. In actual facts, clas-

siﬁcation performance is strongly affected by unbal-

anced classes (in our case the number of recording

occurring in each sound quality class). To deal with

this issue, we aim here at improving the classiﬁcation

scores by using Monte Carlo simulations to create a

larger and well-balanced labelled dataset.

2 Materials and Methods

The urban area of study is described, together with

the employed instrumentation. The method used is

brieﬂy reviewed by reminding the reader of the def-

inition of the SRI and the way it is optimized using

machine learning methods.

2.1 Area of Study

Following our previous works, we discuss results on

the urban zone known as the Parco Nord (Northern

Park) in Milan city. The latter covers an area of about

800 hectares, located within a quite developed and ur-

banised zone. The park possesses a tree-covered par-

cel of little more than 20 hectares surrounded by few

agricultural ﬁelds, lawns, paths and roads (Figure 1).

The location of the park is delimited to the north by

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

892

Volume 19, 2023

trafﬁcked roads, including a highway and a main city

street, at about 100 m from the wooded parcel. To the

west side of the park there is also a small city airport,

which is around 500 m from the tree-line edge.

2.2 Audio recorders

In our study, we employed commercial SMT Security

low-cost digital audio recorders. Their working set up

was chosen to be able to measure at a sampling rate of

48 kHz in an almost continuous mode. Devices with

similar characteristics in terms of frequency response

were selected in the measuring campaign. For the

analysis, we scheduled the recording to coincide with

greatest singing avifauna activity, taken on May 25th

2015, during the morning hours from 06:30 a.m. to

10:00 a.m. (CET). They correspond to 3.5 h for each

sensor and recording session. As mentioned in Fig-

ure 1, some sensors did not work properly and were

excluded. As a result, we were left with 16 sites,

while the total recordings data analyzed resulted in

1120 ﬁles.

2.3 Aural Survey

In order to quantify the quantify the presence of bio-

phonies, anthrophonies and geophonies, an aural sur-

vey was implemented. In particular, an expert lis-

tened carefully each one of the recordings according

to the following scheme: for every three minutes of

continuous recording, only one-minute was listened

and interpreted. This yielded a total of 70 min of

listening per site. In order to facilitate the task, the

expert considered only ﬁve sound categories: birds,

other animals, road trafﬁc noise, other noise sources

(such as airplanes, trains). Then, information about

each type of sound source, such as its occurrence or

not presence, in addition to the intensity, were deter-

mined.

More speciﬁcally, four parameters were consid-

ered and evaluated: (1) Individual abundance, re-

ferred to simply as, no–few–many subjects. (2)

Perceived singing activity, represented in terms of

the fraction of time (percent) attributed to avian

vocalizations, (0 100 %). (3) Species richness,

also referred to as, none–one–more than one species.

(4) Vocalization intensity, again referred to as, no–

low–high intensity. Regarding other animals and/or

people, the indicator was considered as an either

present or absent event.

It was found that road trafﬁc was the main anthro-

pogenic noise in the zone. For this source, only two

parameters were considered: (1) Noise intensity, rep-

resented as, no–low–high intensity. (2) Typology of

trafﬁc, that is either continuous or intermittent trafﬁc,

or no trafﬁc at all.

2.4 The soundscape ranking index, SRI

We brieﬂy review the deﬁnition of the SRI, intro-

duced as a simple criterion to estimate the sound qual-

ity of a local environment, [10]. Let us consider a

single audio recording labeled by the letter `. In this

case the SRI is deﬁned as,

SRI`=

∑

i=0

cNiNi,`,(1)

where nc+1 is the total number of identiﬁed cate-

gories (birds, other animals, road trafﬁc noise, other

noise sources, rain and wind), here nc=4, Ni,r=1

if the ith sound category is present at the record-

ing r, and Ni,r=0 otherwise. The coefﬁcients can

take one of the following values: cNi>0 (cNi→c+,

c++) when the sound category corresponds to a nat-

ural sound, in that case we split the values into two

possible sub-ranges (+,++); and cNi<0 (cNi→c−,

c−−) corresponding to a potential disturbing event,

also split into two sub-ranges. The absence of birds

vocalization is regarded as a neutral event, cN0=0,

setting the separation between natural and anthro-

pogenic events. In Table 1, we summarize the em-

ployed ranges of variability for the coefﬁcients cNi.

Our deﬁnition Eq. (1) is expected to yield SRI val-

ues representative of the quality of the environmental

sound. For the sake of simplicity, we choose three

intervals of SRI to deﬁne the environmental sound

quality, for a single recording denoted simply as `,

given by

SRI`<0[poor quality],

0≤SRI`≤2[medium quality],(2)

SRI`>2[good quality].

2.5 SRI optimization procedure

To each sound category i, we assign a weight, cNi,

according to the attributes extracted from the audio

recording (see the column ‘Attribute’ in Table 2). For

instance, the singing activity gets a weight depend-

ing on the percentage of singing birds detected in

each recording: the interval (0,25]%, corresponds to

a weight of 0.25 ×c++, while the (25,50]% one be-

comes the weight 0.50 ×c++, etc. As a general con-

straint, we assume that cNican vary within the inter-

vals mentioned in Table 1.

For the purpose of the using the ML techniques,

both the spectral features associated to the ecoacous-

tic indices, and the corresponding SRIs need to be

split into a ‘training’ set, and a ‘test’ one. The former

is used as input for each classiﬁcation model, whereas

the ﬁnal performance of the latter is quantiﬁed ac-

cording to the classiﬁcation metrics used, here em-

ployed the F1–score. In the process of classiﬁcation,

the SRIs become the target variable which is expected

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

893

Volume 19, 2023

Figure 1: Aerial view of the Parco Nord, on which we have indicated the grid of sensors employed in the record-

ings (red and yellow dots). The red spots correspond to the effectively working sensors, while the fewer yellow

spots those which did not function properly and were excluded from the analysis. On the plot, we have high-

lighted the A4 highway and Padre Turoldo street to the north of the park. The Bresso airport can be seen at the

left side of the picture.

Table 1: Intervals of variation for the coefﬁcient cNiwhich are assigned to each sound category, i=0,...,4, to be

employed in Eq. (1). In the present calculations, we have taken the total range of variation of the coefﬁcients to

be: −5≤cNi≤5.

cNiRange

c++ [2, 5]

c+[0, 2]

cN00

c−[-2,0]

c−− [-5,-2]

to be predicted by the classiﬁcation model. The pro-

cess is repeated for every combination of weights,

cNi, for i>0, that are varied in the corresponding in-

terval of variability. The optimal set of weights deﬁn-

ing our target SRI is taken from the best classiﬁcation

score obtained.

2.6 Classiﬁcation models

It is convenient to brieﬂy describe the classiﬁca-

tion models used for predicting the manual labelling

sound categories and the SRI index from the ecoa-

coustic indices. As a matter of fact, ML learning

methods are suitable to capture the potentially non-

linear relationships among variables, which are not

known a priori. The models we have considered here,

which have been implemented in Python program-

ming language, [32], are the following,

• Random Forest (RF),

• Single-layer neural network or Perceptron

(PPN).

The RF is used to ﬁt a given number of decision tree

classiﬁers on various sub-samples of the data-set, us-

ing an averaging procedure to improve its predictive

accuracy, and very importantly to also control over-

ﬁtting, [33]. For RF, we use default parameters with

the exception of Max-depth = 3, typically devoted to

control the size of the tree to prevent over-ﬁtting.

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

894

Volume 19, 2023

Table 2: Coefﬁcient cNiassigned to each sound category to be used in Eq. (1).

Category Attribute cNi

no cN0

Birds singing few c+

many c++

no cN0

Birds species .2c+

>2 c++

0 0.00 ×c++

(0,25] 0.25 ×c++

Singing activity (%) (25,50] 0.50 ×c++

(50,75] 0.75 ×c++

(75,100] 1.00 ×c++

no trafﬁc c+

Trafﬁc type continuous c−

intermittent c−−

zero c+

Trafﬁc intensity low c−

high c−−

Other sound sources absent cN0

present c−−

The PPN was the ﬁrst and simplest type of artiﬁ-

cial neural network worked out and reported in the

Literature, [34]. In a PPN network, the informa-

tion oscillates back and forth from the initial input

nodes, via the (if any) hidden nodes, toward the out-

put nodes. The simplest kind of neural network is

clearly just a single-layer Perceptron, consisting of

one layer of output nodes. The inputs are then fed di-

rectly to the outputs via a series of weights. The sum

of all the products of weights and inputs are then cal-

culated at each node. If the resulting value is above a

given threshold, the neuron ﬁres and is updated as ac-

tivated value, otherwise it remains at the deactivated

value. For the Perceptron model, we used the follow-

ing settings,

• seven neurons as input layer with activation

function ReLU,

• three neurons as output layer with activation

function Softmax,

• number of epochs 100,

• learning rate 0.001.

In our calculations for the implementation of the

SRI optimization procedure, the supervised classiﬁ-

cation models have been trained on the 80% of the

data, and tested on the remaining 20%. The split-

ting of the data have been performed using a stratiﬁed

procedure in keeping the proportions between the tar-

get variable classes. We chose to implement the RF

model and the PPN for essentially two reasons: The

former yielded a good performance over other classi-

ﬁcation models in similar contexts, [29], whereas the

latter represents the simplest neural network, both in

terms of complexity and computing resources.

2.7 Ecoacoustic indices

In this work, we focused on the following set of ecoa-

coustic indices: the acoustic entropy index (H), [35],

the acoustic complexity index (ACI), [36], the nor-

malized difference soundscape index (NDSI), [37],

the bio-acoustic index (BI), [38], the dynamic spec-

tral centroid (DSC), [39], the acoustic diversity index

(ADI), [39] and the acoustic evenness index (AEI),

[39].

The indices were evaluated using the R statisti-

cal package (in particular, the version 3.5.1, [40]).

Speciﬁcally, the fast Fourier transform (FFT) was

computed by the function spectro available in the R

package “seewave”, [41]. The calculations were re-

stricted to the frequency interval (0.1-12) kHz based

on 1024 data points, corresponding to a frequency

resolution FR = 46.875 Hz and, therefore, to a time

resolution TR = 1/FR = 0.0213 s. The indices were

calculated using the “soundecology” package, [42].

Finally, a dedicated script running in the “R” envi-

ronment has been written to calculate the DSC index.

For each one-minute recording, obtained as discussed

above, we computed seven cumulative indices. Each

recording was then represented by seven indices or

features.

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

895

Volume 19, 2023

2.8 Data augmentation by Monte Carlo

simulations

We used Monte Carlo simulations to generate a set

of random numbers with the same distribution as the

original data set, that in our case correspond to the

distribution of the seven ecoacoustic indices in each

sound quality class. For each combination of weights,

cNi, for i>0, varied in the assigned interval, a distri-

bution for each of the seven ecoacoustics indices was

derived. For each combination of weights, a Monte

Carlo data augmentation has been applied in order to

reproduce the original distribution in each class, and

care was taken in order to obtain a balanced distribu-

tion of classes. In our case, we managed to obtain

500 instances for each class. As an example, Figure 2

illustrates the density distribution of ACI of the orig-

inal data-set split into the three classes obtained for a

particular combination of weights (c++=2.7, c+= 1.5,

c−=-1.0, c−−=-2.5).

Operationally, each of the ecoacoustic indices is

ﬁrst normalized, by subtracting its mean value and

dividing by its standard deviation. Then, the cumu-

lative probability is computed and, ﬁnally, its inverse

function is derived (Figure 3). A random number thus

corresponds to a value of ACI belonging to the origi-

nal distribution.

3 Results

We ran RF and PPN models to attempt a prediction of

the soundscape ranking indexes calculated assigning

a set of weights to each sound category. We varied the

different weights according to the intervals reported

in Table 2. Then, the best combination of weights

is the one which has the highest score provided by

each classiﬁcation model. In our case, we decided to

employ the F1score and the accuracy criterion. The

F1score is usually suited for dealing with unbalanced

classes, whereas the accuracy method for balanced

classes. The results are reported in Table 3.

The best F1–score for RF yields 0.63 and 0.60-

0.61 for DNN (Table 3). This means that the simpler

RF model can provide a slightly higher classiﬁcation

score.

3.1 Implementation of the Monte Carlo

simulations

For each combination of weights, the occurrences in

each class can substantially vary as it can be observed

in Figure 4, showing the counts distribution for each

class (Class 1, Class 2, Class 3), produced by differ-

ent weight combinations whose range of variability is

deﬁned in Table 2.

The median values and the interquartile ranges

(IQR) for Class 1 and Class 2 show that the major-

ity of counts are unbalanced with respect to Class 3

(Table 4). This result may also be interpreted by con-

sidering that the majority of data ﬁles show a preva-

lence of Good (Class 3) SRIs.

This unbalanced distribution of classes may in-

troduce a bias in the classiﬁcation process. One of

the methods to augment the data-set is to lean on a

Monte Carlo expansion of the data based on the dis-

tribution of the original data-set split into each class.

Monte Carlo data augmentation can effectively repro-

duce the original distribution in each class, thus lead-

ing to a more balanced distribution of classes. In our

case, we managed to obtain 500 instances for each

class.

At ﬁrst, we compared the classiﬁcation by using

the augmented data-set that, as we mentioned, have

been balanced to reach 500 elements for each class

using the same set of weights that are reported in Ta-

ble 3. The results are shown in Table 5, suggesting

that data augmentation and data balancing provide

quite different results than those obtained using the

original unbalanced data.

Since the expanded data represent a balanced as-

set for the classiﬁcation models, we explored the ef-

fect of data augmentation on other combination of

weights. Thus, we ﬁrst expanded the data on the ba-

sis of MC calculation, creating new data sets with

balanced classes and then we repeated the classiﬁ-

cation for all the augmented data sets (360,000 ﬁles

obtained from the combinations of all the possible

weights in the range reported in Table 1). One of the

major concerns when using MC data augmentation is

the numerousness of the original class. Indeed, data

expansion based on few data (see data belonging to

whiskers of the boxplot of Figure4) would bias the

new augmented class. For this reason, we decided to

set the numerousness of the original class to be ex-

panded above 100 counts. The results of the distribu-

tion for the Accuracy measure, for both models, are

reported in Figure5. The two distributions are similar

showing an accuracy peak value of 0.50 and 0.53 for

PPN and RF, respectively.

In actual situations, we are mainly interested in

the highest scores yielded by the two classiﬁcation

models. These results are reported in Table 6.

4 Discussion

The seven spectral features, derived by integrating

the ecoacoustic indices over the whole length of the

recording (1 minute), contain condensed informa-

tion about the spectral variability that could be repre-

sented by lower integration times. Therefore, it does

not appear to be enough to represent the complex-

ity of the soundscape in a single summative index.

However, both models yield a signiﬁcant improve-

ment in classifying the new augmented dataset, in

going from 0.60-0.63 (F1–score) to 0.74-0.75 (Accu-

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

896

Volume 19, 2023

Figure 2: Density distribution of ACI of the original data-set split into the three classes obtained with the following

combination of weights: c++=2.7, c+= 1.5, c−=-1.0, c−−=-2.5.

Figure 3: (Left panel) Cumulative probability distribution calculated for the normalized ACI, by subtracting the

mean value and dividing by the standard deviation. (Right panel) Inverse function used for the MC simulations.

Table 3: Results for the RF and PPN models, using eco-acoustic indices as extracted spectral features. Clas-

siﬁcation measure (F1–Score with its standard deviation), range of weights values and class numerousness are

reported. Class 1, Class 2, Class 3, correspond, respectively, to: Poor, Medium, Good, soundscape quality.

F1–score c++ c+c−c−− Class 1 Class 2 Class 3

RF 0.63±0.12 2.0 2.0 [-1.4, -1.5] [-2.6, -2.7] 228 515 377

DNN 0.61±0.04 2.1 0 0 -4.8 432 364 324

0.60±0.06 [2.2; 2.3] [0.9; 1.3] [-1.9; -1.8] [-2.2; -2.0] [245; 276] [388; 413] [456; 462]

Table 4: Median values and interquartile ranges (IQR) for Class 1, Class2 and Class 3, corresponding to Figure4.

Median IQR

Class1 165 185

Class 2 170 120

Class 2 785 292

racy; F1–score measure is pretty similar). The set of

weights that realizes this new classiﬁcation presents

some differences from those obtained in the original

unbalanced classiﬁcation but are similar to each other

(RF and PPN with augmented data).

In Figure 6, we compare the SRI maps calculated

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

897

Volume 19, 2023

Figure 4: Counts distribution for each class: Class 1, Class 2, Class 3, corresponding to Poor, Medium, Good

soundscape quality, of the original data-set calculated over all the combination of weights.

Table 5: Accuracy measure calculated for RF and PPN models, using seven eco-acoustic indices as extracted

spectral features, 500 augmented data for each class and the same weights reported in Table 3.

Accuracy

RF 0.48

DNN 0.59

Figure 5: Density distribution of accuracy measure calculated from the augmented data. The peak of the distri-

butions is at 0.51 (dashed red line) and 0.53 (dashed black line) for PPN and RF, respectively.

Table 6: Best accuracy scores obtained for RF and PPN models, using 500 augmented spectral features per class.

Weights values and class numerousness of original data are also reported.

Accuracy max F1–score c++ c+c−c−− Class 1 Class 2 Class 3

RF 0.751 0.751 2.1 1.9 -0.0 -3.5 123 128 869

PPN 0.746 0.737 2.0 1.5 -0.1 -5.0 371 354 395

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

898

Volume 19, 2023

Figure 6: SRI maps obtained for the models: (a) PPN and, (b) RF.

over the study area based on the weights obtained for

the RF and PPN models with the augmented data. For

each of the 16 sites, we considered the median value

of the SRI computed over all the measurements corre-

sponding to the labeled recordings. As expected, the

results are very similar showing a clustering of sites

facing the trafﬁc noise sources and those at the park

interior. These results are in fairly good agreement

with previous analysis obtained using a statistical ap-

proach, [30], [31].

5 Conclusions

The soundscape analysis in urban parks can assess the

distance between natural habitats and artiﬁcial/recon-

structed green areas. In this work, seven ecoacous-

tic indices, calculated over 16 sites of Parco Nord

of Milan, Italy, have been used to predict a single

index, named the soundscape ranking index, SRI,

which has the advantage of yielding a quick overview

of the quality of environment sound. SRI is com-

puted through the optimization of the weights which

appear in its deﬁnition (Eq. 2). Each combination

of weights provides a different numerousness of the

classes (Class 1, Class 2, Class 3 corresponding to

Poor, Medium, Good soundscape quality) of the orig-

inal data-set. These imbalanced classes may repre-

sent one of the factors inﬂuencing the classiﬁcation

models. In actual situations, the use of two very sim-

ple classiﬁcation models, RF and PPN, with the orig-

inal data-set, yielded a maximum F1-score of 0.60-

0.61. We applied MC calculations to balance the

three classes for each combination of weights and

recomputed the classiﬁcation algorithms. From the

new computation, we ruled out those combinations

of weights with initial numerousness lower than 100.

This choice was justiﬁed to avoid the introduction

of artiﬁcial peaked distributions in the original data,

which once augmented would have affected the clas-

siﬁcation performance. In this way, we ended up with

the following classiﬁcation performance expressed in

terms of Accuracy (usually employed for balanced

classes classiﬁcation, but we also report the F1–score

for comparison with the previous calculations):

• RF : Accuracy = 0.75 (F1–score = 0.751),

• PPN : Accuracy = 0.746 (F1–score = 0.737).

As a future development, we envisage the use of

larger labeled datasets, that is by using additional

recordings with the corresponding aural survey, and

the application of deep neural network to develop

more efﬁcient classiﬁcations. The deﬁnition of SRI

and the threshold deﬁning the classes of sound qual-

ity will be subject of a further investigation.

References:

[1] Krause, B. The Loss of Natural Soundscapes.

Earth Island Journal, Spring 2002. (www.earth-

island.org/journal/index.php/magazine/archive)

[2] Pijanowski, B. C., Farina, A., Gage, S.

H., Dumyahn, S. L., Krause, B. L. What

is soundscape ecology? An introduction

and overview of an emerging new science.

Landscape Ecology 26, 1213–1232 (2011).

(https://doi.org/10.1007/s10980-011-9600-8).

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

899

Volume 19, 2023

[3] Pavan, G. Fundamentals of Soundscape

Conservation. In: Farina A., Gage S.

H. (Eds.). Ecoacoustics: The Ecolog-

ical Role of Sounds, 235–258 (2017).

(https://doi.org/10.1002/9781119230724.ch14).

[4] Sethi, S. S., Jones, N. S., Fulcher, B.

D., Picinali, L., Clink, D. J., Klinck, H.,

Orme, C. D. L., Wrege, P. H., Ewers, R.

M. Characterizing soundscapes across diverse

ecosystems using a universal acoustic fea-

ture set. Proceedings of the National Academy

of Sciences 117(29), 17049–17055 (2020).

(https://doi.org/10.1073/pnas.2004702117).

[5] Lellouch, L., Pavoine, S., Jiguet, F., Glotin,

H., Sueur, J. Monitoring temporal change of

bird communities with dissimilarity acoustic in-

dices. Methods in Ecology and Evolution 5(6),

495–505 (2014). (https://doi.org/10.1111/2041-

210X.12178).

[6] Kasten, E. P., Gage, S. H., Fox, J., Joo,

W. The remote environmental assess-

ment laboratory’s acoustic library: An

archive for studying soundscape ecology.

Ecological Informatics 12, 50–67 (2012).

(https://doi.org/10.1016/j.ecoinf.2012.08.001).

[7] Pérez-Granados, C., Traba, J. Estimating

bird density using passive acoustic monitor-

ing: A review of methods and suggestions

for further research. Ibis 163, 1–19 (2021).

(https://doi.org/10.1111/ibi.12944).

[8] Shonﬁeld, J., Bayne, E. M. Autonomous

recording units in avian ecological research:

Current use and future applications. Avian

Conservation and Ecology 12(1), 14 (2017).

(https://doi.org/10.5751/ace-00974-120114).

[9] Benocci, R., Roman, H. E., Bisceglie, A., An-

gelini, F., Brambilla, G., Zambon, G. Eco-

acoustic assessment of an urban park by statisti-

cal analysis. Sustainability 13(14), 7857 (2021).

(https://doi.org/10.3390/su13147857).

[10] Benocci, R., Roman, H. E., Bisceglie, A., et al.

Auto-correlations and long time memory of envi-

ronment sound: The case of an Urban Park in the

city of Milan (Italy). Ecological Indicators 134,

108492 (2022). (https://doi.org/10.1016/j.ecol-

ind.2021.108492).

[11] Cavallari, G. B., Ribeiro, L. S., Ponti, M. A.

Unsupervised representation learning using con-

volutional and stacked auto-encoders: A do-

main and cross-domain feature space analy-

sis. In: 31st SIBGRAPI Conference on Graph-

ics, Patterns and Images (SIBGRAPI). IEEE,

440–446 (2018). (https://doi.org/10.1109/SIB-

GRAPI.2018.00063).

[12] Ponti, M. A., Ribeiro, L. S. F., Nazare, T. S.,

Bui, T., Collomosse, J. Everything you wanted

to know about deep learning for computer vi-

sion but were afraid to ask. In: 30th SIBGRAPI

Conference on Graphics, Patterns and Images

Tutorials (SIBGRAPI-T). IEEE, 17–41 (2017).

(https://doi.org/10.1109/SIBGRAPI-T.2017.12).

[13] Nunes C., Solteiro Pires E. J., Reis A. Ma-

chine Learning and Deep Learning applied

to End-of-Line Systems: A review. WSEAS

Transactions on Systems 21, 147–156 (2022).

(https://doi.org/10.37394/23202.2022.21.16)

[14] Christin, S., Hervet, É., Lecomte, N. Applica-

tions for deep learning in ecology. Methods in

Ecology and Evolution 10(10), 1632–1644

(2019). (https://doi.org/10.1111/2041-

210X.13256).

[15] Fairbrass, A. J., Firman, M., Williams, C.,

Brostow, G. J., Titheridge, H., Jones, K. E.

CityNet–Deep learning tools for urban ecoacous-

tic assessment. Methods in Ecology and Evo-

lution 10(10), 1632–1644 (2019). Methods in

Ecology and Evolution 10(2), 186–197 (2019).

(https://doi.org/10.1111/2041-210X.13114).

[16] Lin, T. H., Tsao, Y. Source separation in ecoa-

coustics: A roadmap towards versatile sound-

scape information retrieval. Remote Sensing in

Ecology and Conservation 6(3), 236–247 (2020).

(https://doi.org/10.1002/rse2.141).

[17] Navarro, J. M., Pita, A. Machine Learn-

ing Prediction of the Long-Term Environmen-

tal Acoustic Pattern of a City Location Us-

ing Short-Term Sound Pressure Level Measure-

ments. Applied Sciences 13(3), 1613 (2023).

(https://doi.org/10.3390/app13031613).

[18] Orga F., Socoró J. C., Alías F., Alsina-

Pagès R. M., Zambon G., Benocci R., Bis-

ceglie A. Anomalous noise events considerations

for the computation of road trafﬁc noise lev-

els: The DYNAMAP’s Milan case study (2017)

24th International Congress on Sound and Vi-

bration, ICSV 2017. (pdf at: http://hdl.han-

dle.net/2072/376268).

[19] Piczak, K. J. Environmental sound classiﬁcation

with convolutional neural networks. In: IEEE

25th international workshop on machine learning

for signal processing (MLSP). IEEE, 1–6 (2015).

(https://doi.org/10.1109/MLSP.2015.7324337).

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

900

Volume 19, 2023

[20] Salamon, J., Bello, J. P. Deep convolutional

neural networks and data augmentation for

environmental sound classiﬁcation. IEEE Sig-

nal processing letters 24(3), 279–283 (2017).

(https://doi.org/10.1109/LSP.2017.2657381).

[21] Ward, J. H. Hierarchical grouping to optimize

an objective function. Journal of the American

Statistical Association 58(301), 236–244 (1963).

(https://doi.org/10.1080/01621459.1963.10500845).

[22] Ruff, Z. J., Lesmeister, D. B., Appel, C.

L., Sullivan, C. M. Workﬂow and convolu-

tional neural network for automated identiﬁca-

tion of animal sounds. Ecological Indicators 124,

107419 (2021). (https://doi.org/10.1016/j.ecol-

ind.2021.107419).

[23] Vidaña-Vila, E., Navarro, J., Stowell, D.,

Alsina-Pagès, R. M. Multilabel Acous-

tic Event Classiﬁcation Using Real-World

Urban Data and Physical Redundancy

of Sensors. Sensors 21(22), 7470 (2021).

(https://doi.org/10.3390/s21227470).

[24] Mullet, T. C., Gage, S. H., Morton, J. M.,

Huettmann, F. Temporal and spatial variation

of a winter soundscape in south-central Alaska.

Landscape Ecology 31(5), 1117–1137 (2016).

(https://doi.org/10.1007/s10980-015-0323-0).

[25] Quinn, C. A., Burns, P., Gill, G., Bali-

gar, S., Snyder, R. L., Salas, L., Goetz,

S. J., Clark, M. L. Soundscape classiﬁcation

with convolutional neural networks reveals tem-

poral and geographic patterns in ecoacoustic

data. Ecological Indicators 138, 108831 (2022).

(https://doi.org/10.1016/j.ecolind.2022.108831).

[26] Giannakopoulos, T., Siantikos, G., Perantonis,

S., Votsi, N. E. and Pantis, J. Automatic sound-

scape quality estimation using audio analysis. In:

Proceedings of the 8th ACM International Con-

ference on Pervasive Technologies Related to As-

sistive Environments, Corfu, Greece (2015), pp.

1–9. (https://doi.org/10.1145/2769493.2769501).

[27] Tsalera, E., Papadakis, A., Samarakou,

M. Monitoring, proﬁling and classiﬁca-

tion of urban environmental noise using

sound characteristics and the KNN algo-

rithm. Energy Reports 6, 223–230 (2020).

(https://doi.org/10.1016/j.egyr.2020.08.045).

[28] Pita, A., Rodriguez, F. J., Navarro, J.

M. Cluster analysis of urban acoustic envi-

ronments on Barcelona sensor network data.

International Journal of Environmental Re-

search and Public Health 18(16), 8271 (2021).

(https://doi.org/10.3390/ijerph18168271).

[29] Benocci, R., Aﬁfy, A., Potenza, A., Ro-

man, H.E., Zambon, G. Toward the deﬁni-

tion of a soundscape ranking index (SRI)

in an urban park by machine learning

techniques. Sensors 23(10), 4797 (2023).

(https://doi.org/10.3390/s23104797).

[30] Benocci, R., Aﬁfy, A., Potenza, A., Ro-

man, H.E., Zambon, G. Self-Consistent

Soundscape Ranking Index: The Case of

an Urban Park. Sensors 2023, 23, 3401.

(https://doi.org/10.3390/s23073401).

[31] Benocci, R., Potenza, A., Bisceglie, A.,

Roman, H.E., Zambon, G. Mapping of the

Acoustic Environment at an Urban Park in

the City Area of Milan, Italy, Using Very

Low-Cost Sensors. Sensors 2022, 22, 3528.

(https://doi.org/10.3390/s22093528).

[32] Python. Available at https://www.python.org/

(accessed on 05/05/2023).

[33] Tibshirani, R., Friedman, J. The Elements of

Statistical Learning: Data Mining, Inference,

and Prediction. (Trevor Hastie, Second Edition,

2009).

[34] Zell, A. Simulation Neuronaler Netze (Addison-

Wesley, Bonn 1994). (https://doc1.biblio-

thek.li/aal/FLMF007250.pdf).

[35] Sueur, J., Pavoine, S., Hamerlynck, O., Du-

vail, S. Rapid acoustic survey for biodiver-

sity appraisal. PLoS ONE 3(12), e4065 (2008).

(https://doi.org/10.1371/journal.pone.0004065).

[36] Pieretti, N., Farina, A., Morri, D. A new

methodology to infer the singing activity of

an avian community: The Acoustic Complex-

ity Index (ACI). Ecological Indicators 11(3),

868–873 (2011). (https://doi.org/10.1016/j.ecol-

ind.2010.11.005).

[37] Grey, J. M., Gordon, J. W. Perceptual ef-

fects of spectral modiﬁcations on musical

timbres. The Journal of the Acoustical So-

ciety of America 63(5), 1493–1500 (1978).

(https://doi.org/10.1121/1.381843).

[38] Boelman, N. T., Asner, G. P., Hart, P. J., Mar-

tin, R. E. Multitrophic invasion resistance in

hawaii: Bioacoustics, ﬁeld surveys, and airborne

remote sensing. Ecological Applications 17(8),

2137–2144 (2007). (https://doi.org/10.1890/07-

0004.1).

[39] Yang, W., Kang, J. Soundscape and

sound preferences in urban squares:

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

901

Volume 19, 2023

A case study in Shefﬁeld. Journal of

Urban Design 10(1), 61–80 (2005).

(https://doi.org/10.1080/13574800500062395).

[40] R Core Team. R: A Language and Environ-

ment for Statistical Computing. R Foundation for

Statistical Computing: Vienna, Austria, 2018.

(Available online: https://www.R-project.org/.

(accessed on 05/05/2023).

[41] Seewave: Sound Analysis and Synthesis. Avail-

able online: https://cran.r-project.org/web/pack-

ages/seewave/index.html. (accessed on

05/05/2023).

[42] Soundecology: Soundscape Ecology. Available

online: https://cran.r-project.org/web/pack-

ages/soundecology/index.html. (accessed on

05/05/2023).

Contribution of Individual Authors to the

Creation of a Scientiﬁc Article (Ghostwriting

Policy)

R.B. conceptualized the study, carried out the M.C.

simulation and wrote the original draft preparation;

A.P. organized the data sets and produced map rep-

resentations; G.Z. supervised the research; A.A. im-

plemented the M.L. software; H.E.R. reviewed and

edited the manuscript.

Sources of Funding for Research Presented in a

Scientiﬁc Article or Scientiﬁc Article Itself:

No funding was received for conducting this study.

Conﬂicts of Interest

The authors have no conﬂicts of interest to

declare that are relevant to the content of this

article.

Creative Commons Attribution License 4.0

(Attribution 4.0 International , CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en

_US

WSEAS TRANSACTIONS on ENVIRONMENT and DEVELOPMENT

DOI: 10.37394/232015.2023.19.85

Roberto Benocci, Andrea Potenza,

Giovanni Zambon, Andrea Afify,

H. Eduardo Roman

E-ISSN: 2224-3496

902

Volume 19, 2023