Framework for Posture and Face Recognition Using Kinect: An Ambient-Intelligence Method
LIVIU VLADUTU, COSMIN MARIAN
Faculty of Automatic Control and Computer Science
University “Politehnica” of Bucharest
Splaiul Independentei no. 313
Sector 6, Bucureşti, 060042
ROMANIA
Abstract: - This paper describes one of the first fully open-source frameworks for posture and face detection and classification using Kinect sensors, built entirely on an open-source software platform. It is designed for the automatic monitoring of either an elderly person in an enclosed space (a room with furniture, appliances, and other rigid objects) or a patient suffering from a physiological disorder. It is also based on an original hardware platform (the AmI-Platform), which includes multiple Kinect sensors, data-acquisition stations, and data-crunching servers.
The article tackles two of the most important issues in ambient intelligence: the accurate determination of the person and of his or her facial landmarks (to understand emotions), and fast, reliable posture determination (to take appropriate measures in case of an accident).
Key-Words: - face detection and recognition, posture classification, Kinect, Ambient Intelligence, Open Source, Ambient Assisted Living, sensor networks, facial landmarks.
Received: February 26, 2022. Revised: November 22, 2022. Accepted: January 15, 2023. Published: February 22, 2023.
1 Introduction
The problem that motivated this project from its inception was the aging of Europe (a.k.a. the graying of Europe). Three main factors influence this phenomenon: higher life expectancy, a decrease in fertility, and a certain decrease in the mortality rate (life expectancy has increased all over Europe, Japan, and North America). We designed a data pipeline for the online acquisition, processing, and storage of localization and tracking data for an older person in a room, as previously described in [3] and [1], in order to deal with a growing elderly population and the need for constant care and supervision (for example, to avoid falls). As input devices, we used the MS Kinect 360, the first generation of motion-sensing input devices developed by Microsoft.
Our context-awareness framework (which we call AmI-Lab) is applied to a major problem in ambient intelligence (from now on, AmI), namely ambient assisted living (a.k.a. AAL): the vision, promoted by the European Commission's Information Society Technologies Advisory Group (ISTAG) [5], of proactively supporting people in their daily activities. With increased support over the last 10 years from the European Commission (see the AAL projects in the reference section), we aim to improve the quality of life of the elderly in their homes by increasing their autonomy, assisting them in their daily activities, and letting them feel more secure, protected, and supported. Our current
approach continues our previous work in
providing the AmI site with support for online
data acquisition and multi-sensor fusion [1], [3],
and will be completed with a comprehensive
activity recognition framework—ubiquitous
supervision adapted for people with special
needs (older people, persons with disabilities),
from a clinical and non-clinical perspective.
Typical emotion expressions are seen in various psychiatric conditions, including schizophrenia, depression, and autism spectrum conditions [2], so another target population for this project is people with neurological disorders. The authors of [2] conclude that "emotions can be expressed in the face, gesture, posture, voice, and behavior and affect physiological parameters such as the heart rate or the body temperature."
Our paper is organized into nine main chapters. The second chapter briefly introduces the organization of the Ambient Intelligence Laboratory of the Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, showing the physical structure of the laboratory and the main components of the data-acquisition pipeline.
Chapter 3 presents our approach to posture representation and the components (segment lines, segment-to-segment angles, and segment-to-plane angles) that are used as features to determine the posture of the persons "seen" by the Kinect sensors.
Chapter 4 continues with a brief introduction to our proprietary skeleton viewer, written in Scala, a programming language that smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive.
Chapter 5 follows with a detailed description of how we designed the experiments for both posture and face recognition.
Chapter 6 introduces the basics of modern face
detection and recognition algorithms in both
images and video frames and their relation to
the AmI-Lab platform.
The first part of chapter 7, "Experimental Results," describes the labeling procedure we used and the four classification procedures (SVM, random forests, decision trees, and extra trees), all implemented with Scikit-learn, an open-source ensemble of simple and efficient tools for data mining and data analysis built on Python, NumPy, SciPy, and Matplotlib. The chapter concludes with figures showing the performance of the face recognition and detection algorithms.
Chapter 8 discusses related work (the state of the art over the last 20 years) on the classification and detection of postures and faces.
Finally, chapter 9 presents the conclusions and further milestones for the continuation of this project, aimed mainly at a tighter integration with the AmI-Platform for online skeleton classification, and highlights for other researchers the importance of IoT, smart cities, and sensor-network projects.
2 Problem Formulation
The work comprised two distinct phases: (2.1) the hardware setup of the laboratory, including wiring, computers, and cabled and wireless networks, and (2.2) the design of the data pipeline and toolboxes.
2.1 The Ambient-Intelligence Laboratory
(AmI-Lab)
The experiments described in this paper were carried out at the Ambient Intelligence Laboratory (AmI-Lab) of the University Politehnica of Bucharest. The laboratory contains both a hardware platform and a software platform [15], as extensively presented in [1]. The hardware platform consists of nine Kinect cameras placed around the room, data-acquisition computers, and network and data-processing computers.
2.2 The AmI-Platform
The AmI Software Platform is a distributed
platform architecture that acts as a data pipeline
with measurements coming from sensors
(Kinect cameras, Arduino sensor boards, etc.)
connected to the data-acquisition computers.
2.2.1 Database
The data from sensors is sent to acquisition
units for long- and short-term storage and to
processing units. We used this platform to
easily record experiments and retrieve the
collected data in a format that is easy to read
and process. The database we used is MongoDB (a NoSQL DBMS), and for communication between the PDUs (Processing Data Units) we used Kestrel, a distributed message-queue system open-sourced by Twitter. The architecture is presented in Figure 1.
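As an illustration, the snippet below sketches how a single measurement could be stored in MongoDB with the standard pymongo client; the host, database and collection names ("ami_platform", "measurements") and the document fields are our assumptions for this example, not the platform's actual schema.

import time
from pymongo import MongoClient

# Connect to the MongoDB instance used for long- and short-term storage
# (host, database, and collection names are illustrative assumptions).
client = MongoClient("mongodb://localhost:27017/")
db = client["ami_platform"]

measurement = {
    "sensor_id": "kinect_3",          # hypothetical identifier of the Kinect that produced the data
    "type": "skeleton",               # "skeleton" or "image_rgb", as described in Section 5
    "timestamp": time.time(),
    "data": {"head": [12.0, 950.0, 2100.0]},  # joint positions in mm (truncated example)
}
db.measurements.insert_one(measurement)       # persist the measurement for later retrieval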
2.2.2 The tracking system of the data pipeline
As clearly detailed in [1], the software that
processes this data is organized as a pipeline
made up of individual PDUs (processing data
units) that communicate through a system of
distributed queues. There are two types of
PDUs: data acquisition PDUs, which poll the
sensors for new data and record these
measurements into a database of trajectories, and data crunching PDUs, which use face recognition and voice recognition web services and the recorded data to generate automatic examples by correlating spatio-temporal information. The software is installed on two types of processing nodes: data acquisition nodes (running data acquisition PDUs and the database engine) and data crunching nodes (running detection and correlation algorithms).

Figure 1. The architecture of the implemented framework.
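To make the pipeline concrete, the following minimal sketch shows the general shape of a data-acquisition PDU as described above: it polls a sensor, stores the measurement, and forwards it to a queue consumed by the data-crunching PDUs. The sensor, queue, and storage interfaces are hypothetical placeholders, not the actual AmI-Platform API.

import json
import time

def acquisition_pdu(sensor, queue, collection, period=0.033):
    """Hypothetical data-acquisition PDU: poll a sensor, store each measurement,
    and forward it to the distributed queue consumed by data-crunching PDUs."""
    while True:
        measurement = sensor.read()              # placeholder sensor interface (returns a dict or None)
        if measurement is not None:
            collection.insert_one(measurement)   # long-term storage (e.g., a MongoDB collection)
            queue.put(json.dumps(measurement))   # hand off to the crunching PDUs (e.g., a Kestrel queue client)
        time.sleep(period)                       # ~30 Hz polling, an illustrative choice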
3. The Problem of Posture Representation
For the purpose of posture recognition, we have chosen to use the schematic skeleton data derived from the Kinect depth-map images [6] by the NITE/OpenNI middleware. The skeleton provided consists of 15 3D joint positions (e.g., head, neck, right/left foot), forming 16 relevant segments between them (e.g., left/right arm, left/right leg), as shown in Figure 2. For each joint, the X, Y, and Z coordinates are provided in millimeters, relative to the camera coordinate reference system, with its origin at the center of the camera and the Z axis pointing out of it.
In order to represent a posture using these data, we devised a list of features that can be computed from the joint positions. The chosen features fall into two classes: angles between two segments, and angles between a segment and the horizontal plane. Below, we give the list of angles we have selected. Table 1 displays the 15 features of the type angle between two segments, and Table 2 displays the three features of the type angle between a segment and the plane. Taking into account all features, we arrive at 19 features.
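As a concrete illustration of how such features can be computed from the 3D joint positions, the sketch below derives a segment-to-segment angle and a segment-to-plane angle with NumPy; it assumes the vertical direction is the camera's Y axis, which is our assumption for the example rather than something stated above, and the joint coordinates used in the example are invented.

import numpy as np

def segment_segment_angle(p1, p2, q1, q2):
    """Angle (degrees) between the segments p1->p2 and q1->q2, given 3D joints in mm."""
    u = np.asarray(p2, float) - np.asarray(p1, float)
    v = np.asarray(q2, float) - np.asarray(q1, float)
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def segment_plane_angle(p1, p2, plane_normal=(0.0, 1.0, 0.0)):
    """Angle (degrees) between a segment and the horizontal plane (normal assumed along Y)."""
    u = np.asarray(p2, float) - np.asarray(p1, float)
    n = np.asarray(plane_normal, float)
    sin_a = abs(np.dot(u, n)) / (np.linalg.norm(u) * np.linalg.norm(n))
    return np.degrees(np.arcsin(np.clip(sin_a, 0.0, 1.0)))

# Example: angle of the right upper arm relative to the neck-torso segment (invented coordinates).
right_shoulder, right_elbow = [300, 1200, 2500], [350, 1000, 2500]
neck, torso = [250, 1250, 2500], [250, 1000, 2500]
print(segment_segment_angle(right_shoulder, right_elbow, neck, torso))
print(segment_plane_angle(right_shoulder, right_elbow))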
3.1 Purpose of the framework for posture representation
The purpose of our framework is to use this set
of features and supervised learning to
automatically recognize a set of postures. This
framework is independent of the set of postures
considered, allowing us to use it in a larger
range of possible applications where posture
recognition from Kinect skeletons is needed.
For the purpose of evaluating this framework,
we have selected a list of 10 postures that are
distinguished by the relative positions of the
two hands.
3.2 The set of postures used for evaluation
• Right Hand down, Left Hand down (RH-D_LH_D), in Figure 4
• Right Hand lateral, Left Hand down (RH-L_LH_D)
• Right Hand front, Left Hand down (RH-F_LH_D)
• Right Hand up, Left Hand down (RH-U_LH_D)
• Right Hand down, Left Hand up (RH-D_LH_U)
• Right Hand lateral, Left Hand up (RH-L_LH_U)
• Right Hand lateral, Left Hand lateral (RH-L_LH_L)
• Right Hand front, Left Hand front (RH-F_LH_F)
• Right Hand up, Left Hand front (RH-U_LH_F)
• Right Hand up, Left Hand up (RH-U_LH_U)
Figure 2. The distribution of the 15 3D joint
positions used for acquisition
4 Posture visualization & annotation
To help us visualize the collected data and annotate it for the supervised learning algorithms, we built a visualization tool in the Scala programming language [7], which runs on the Java platform, using the Swing UI toolkit and OpenGL for the 3D visualization of the skeleton. A screenshot
of the window of the application is presented in
Figure 3. The user can load a file of data
consisting of measurements in JSON format
([1] and [8]) with 3D skeleton data and RGB
data captured by the video camera. At the
bottom of the window, there is a slider that
allows the user to go from frame to frame and
inspect and annotate the data.
The window contains a 3D visualization area
where the user can rotate the skeleton in 3D in
order to examine it, an RGB view that shows the
image from the Kinect camera with the skeleton
projected in 2D and superimposed over it, and
an area that contains buttons with the
considered postures. By pressing the
corresponding buttons, the user annotates a
certain skeleton to have that posture. The
posture list is loaded from a text file, making
the application itself independent of the
considered set of postures. In the bottom left
corner, the user has a list of the features
considered, showing the value of each for the
currently selected frame. The user can select certain
features in order to display a color-coded map
of the value of that feature under the slider (see
Figure 3).
Once the relevant postures have been annotated,
the user can save the postures along with their
label in a file that can be loaded later or used in
another application. This application proved
very useful to us both in terms of easing the
annotation process and helping us better
understand the data.
5. Design of the experiment
The experiment was run in the Ambient Intelligence Laboratory of the University Politehnica of Bucharest, Faculty of Automatic Control and Computer Science. The
data was collected in several sessions. In each
session, the subject positioned himself in the
viewing field of a Kinect camera at variable
angles (0, 45, 90, 135, 180 degrees) from the
direction in which the camera was pointing and
switched from posture to posture. The data
provided by the Kinect, including the 3D
skeleton, the 2D skeleton, and the RGB data
with timestamps, was saved in several files [1].
The files contained on each line a measurement
in JSON format with fields describing the
measurement type (image_rgb or skeleton), the
timestamp, and corresponding data (e.g., for
image_rgb, the resolution and the compressed
image as JPEG with the Base64 encoding, and, for skeleton, the 3D and 2D positions of the joints). In total, we collected 9 raw data files consisting of 3867 measurements, of which 3736 were skeleton data and 104 were RGB images, totaling 69.3 MB. The approach we used is also compatible with Scala/Spark big-data frameworks and other modern data-processing pipelines.

Figure 3. Screenshot of the viewer
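A minimal sketch of reading such a file is given below; the exact field names ("type", "image") and the file name are assumptions for illustration, since only the general structure of the format is described above.

import base64
import json

def load_measurements(path):
    """Parse one JSON measurement per line and split it by measurement type."""
    skeletons, images = [], []
    with open(path) as f:
        for line in f:
            m = json.loads(line)
            if m.get("type") == "skeleton":
                skeletons.append(m)                           # 3D and 2D joint positions plus timestamp
            elif m.get("type") == "image_rgb":
                images.append(base64.b64decode(m["image"]))   # JPEG bytes (field name assumed)
    return skeletons, images

skeletons, images = load_measurements("session_01.json")      # hypothetical file name
print(len(skeletons), "skeleton measurements,", len(images), "RGB frames")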
The next step was to load each data file in the
viewer application and annotate the relevant
skeletons with the correct postures. It is
important to note that some postures were not
considered to match any of the postures in our
set, so they were not annotated. The annotated
data was saved to files and merged into one
single file. Each line of the file contained a
labeled measurement in the JSON format
containing the value of each feature, the label
associated with it, and the timestamp. A total of
2527 postures were annotated in a file of 1.6
MB. The next step was to write a Python script
that loads the data and uses the scikit-learn
Python toolkit in order to run several learning algorithms on the labeled data and evaluate their performance using cross-validation [9], with a split of 75% of the data for training and 25% for validation. Scikit-learn is an
open-source, simple, and efficient tool for data
mining and data analysis built on NumPy,
SciPy, and Matplotlib; see [10], [11].
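Our actual script is in the repository [15]; the sketch below only illustrates the overall procedure (loading the labeled features, a 75%/25% split, and the four classifiers), with the file and field names ("labeled_postures.json", "features", "label") chosen for illustration.

import json
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

# Load one labeled posture per line (field names are illustrative assumptions).
X, y = [], []
with open("labeled_postures.json") as f:
    for line in f:
        m = json.loads(line)
        X.append(m["features"])   # the angle features of Section 3
        y.append(m["label"])      # the annotated posture class
X, y = np.array(X), np.array(y)

# 75% of the data for training and 25% for validation, as described above.
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.75, random_state=0)

classifiers = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Extra Trees": ExtraTreesClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # Score = fraction of correctly classified validation examples.
    print(name, clf.score(X_val, y_val))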
We have found this tool very easy to use for the
tasks we face, allowing us to write only 105
lines of Python code for loading the data,
applying and validating the learning algorithms
on the data, and displaying the results. The
Scala code for our viewer will be uploaded to
the folder "viewer-scala" in the repo [15].
The use of a supervised learning technique
allowed us to reach our end goal of having a
framework for automatically learning postures
by just giving examples of skeletons that have a
certain posture from a given set of postures.
6. Face detection and recognition
implementation
Face detection and recognition are of primary importance for us, since events that are about to occur (such as a fall) or have just happened can be observed not only in a posture alteration but also in how the facial landmarks are suddenly modified (in a grimace), expressing pain, surprise, or anger. We used the 68-point mappings (for mouth, eyes, and eyebrows) that were obtained by training a shape predictor on the labeled iBUG 300-W dataset [12] (https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/).
One can find in the literature many books and tutorials on computer vision with OpenCV, the powerful library that we used in conjunction with dlib, NumPy, pickle (for serialization), and face recognition based on machine learning and OpenCV; in particular, we strongly recommend the books and tutorials by Rosebrock [13], [14] and by Kaehler and Bradski [17].
The algorithms run quite fast; for recognition, we used more than 320 images, of which approximately 175 were used for testing. There were 8 classes for face classification: the 6 students from Figure 5, "first_author" (FA in Table 9), and a class "unknown," for which we tested faces from the LFW database [16] (see the References).
The classification results (97%) are promising, and the algorithms will be studied further; faster implementations are to be written in C++ (see the last chapter) using OpenCV and gcc (work in progress). On all acquisition stations, we used OpenCV (version 3.4.3) with Ubuntu 18.04 and 22.04 (LTS) as operating systems. The code for face detection and a new folder, "FaceRecognition," have been uploaded to GitHub in the "FaceDetect" folder of the repository [15].
That folder also contains the code for
recognition in frames and videos, along with
some of the bash scripts for checking the data
and installing OpenCV.
The code also works for collections of frames (video sequences), but we do not discuss this here due to lack of space.
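For orientation, the sketch below shows one common way to perform such recognition using dlib-based face embeddings through the face_recognition Python package; whether this exact package was used on the platform is not stated above, and the file paths and the 0.6 distance tolerance are illustrative assumptions.

import pickle
import face_recognition

# Build an embedding for one known class and serialize it with pickle, as mentioned above
# (assumes exactly one face is visible in the known image).
known_image = face_recognition.load_image_file("known/first_author.jpg")   # hypothetical path
known_encoding = face_recognition.face_encodings(known_image)[0]
with open("encodings.pickle", "wb") as f:
    pickle.dump({"first_author": known_encoding}, f)

# Classify the faces found in a test frame as "first_author" or "unknown".
test_image = face_recognition.load_image_file("test/frame_001.jpg")        # hypothetical path
for encoding in face_recognition.face_encodings(test_image):
    match = face_recognition.compare_faces([known_encoding], encoding, tolerance=0.6)[0]
    print("first_author" if match else "unknown")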
7. Experimental Results
As already mentioned, out of the 3736 postures gathered, we annotated 2527 postures with the following labels (see Table 3); 1209 postures were found not to match any of the classes. For the purpose of supervised learning, we chose to test four learning algorithms already implemented in scikit-learn. For each algorithm, we computed a score given by the ratio of correctly classified examples in the validation set.
Another measurement we performed was to build and graphically display a confusion matrix [19]. The confusion matrix shows which postures were misclassified as which other postures and how many times.
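As a sketch, such a confusion matrix can be computed and displayed with scikit-learn and Matplotlib as below, reusing the validation labels and a fitted classifier from the training sketch of Section 5 (the variable names clf, X_val, and y_val are assumptions carried over from that sketch).

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# clf is one of the fitted classifiers from the Section 5 sketch;
# y_val holds the true posture labels of the validation set.
y_pred = clf.predict(X_val)
cm = confusion_matrix(y_val, y_pred)

plt.imshow(cm, cmap="Blues")        # each cell counts how often a true posture was predicted as another
plt.colorbar()
plt.xlabel("Predicted posture")
plt.ylabel("True posture")
plt.show()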
Below, we present the results obtained with each of the four algorithms.
Support Vector Machines [20]: for this algorithm, we obtained a score of 0.875; among the selected algorithms, this was the one with the worst performance. The results are presented in Table 4.
Random Forest Classifier: this algorithm obtained a score of 0.96, making it one of the best-performing algorithms among those tested for our problem. The results are in Table 5.
Extra Tree Classifier: This algorithm had a
score of 0.958, having a very close performance
to the one obtained by the Random Forest
Classifier. Results are in Table 6.
Decision Tree Classifier: The score obtained
by this classifier was 0.93 (Table 7).
Figure 4. Decision Tree Classifier for postures (confusion matrix), based on the information in Table 5.
Figure 4 above shows a screen capture of the confusion matrix, yet another way to quantify the classification errors. Looking at the
performance figures, we can conclude that we obtained good results for the proposed framework and for the relatively short list of postures considered. Even though the list of postures was short, the postures themselves were very similar to each other, yet we obtained prediction scores of over 95%.
Comparing the four algorithms, we notice that the ensemble learning methods [21] performed very well, being a good match for the problem and for the set of features we used. To conclude this section, Table 8 summarizes the performance of the four algorithms tested.
8. Related Work
Since the introduction of the first-generation Kinect in November 2010, many research groups have dealt with the problem of body posture classification using the schematic skeleton representation returned by a dedicated chip (within the Kinect sensor) for the range camera developed by PrimeSense, an Israeli company. We use the simplified expression "Kinect sensor," although in fact it represents a set of integrated sensors for range, visible light, sound, and infrared light.
The main applications of posture classification are:
1. human gesture recognition for AAL (ambient assisted living), which is our aim;
2. sign-language understanding (using mainly hand-shape recognition) [22], [23];
3. posture analysis for ergonomic training [24].
Apart from previous work of our group [3], we are aware of other similar recent works on Kinect-based posture classification for AAL and senior-monitoring applications, i.e., [25] and the EvAAL competition (2013) [26]. While one approach focuses on event detection (falling) when transitioning from sitting to standing and vice versa [25], the other primarily summarizes a competition of fast setups (60 minutes) for indoor localization and tracking [26]. We have described a general framework for posture analysis (a complete set based on proprietary code for the data-acquisition, storage, and analysis pipeline) that can easily be extended to more postures and greater completeness.
While [27] only displays a human skeletal joint model using an LED cube, [28] analyzes more postures (22 of them), including leg postures, and uses PCA for feature extraction before an SVM classifier. Our work is, as far as we know, the only one that offers a complete kit of solutions (face and posture acquisition and classification) for human subjects who might need constant monitoring.
9. Conclusions & Future Work
At this moment, all the classification is supervised, meaning that for all the data used for classification (i.e., the matrices of angles describing a posture) the category they belong to is known. Since we have a large number of postures to classify, we used cross-validation to check how well the classifier generalizes when dealing with an independent set.
The main contributions of this work to the
scientific community are:
A. the design of a proprietary, flexible pipeline for the acquisition and classification of human postures; we have not used any commercial software whatsoever: all the code was either written by the authors using diverse programming languages (see above) or taken from the open-source community (from GitHub or SVN);
B. a classifier that is easily extensible to include more families of postures (such as lying down with different hand and leg positions, or being bent sideways with different hand positions); it all depends on the scenario used for the experiments. We recorded all the experiments in a laboratory without any support to attenuate a (simulated) fall.
The main conclusions about the importance of AmI-based elderly monitoring are:
• Mobility monitoring: the system can monitor the movement and physical activity of the elderly, helping to detect any changes in their behavior and to alert caretakers if necessary. The desktops used for acquiring signals can be replaced by cheaper tablets (see our previous experience, [29]), laptops, or smartphones.
• Ease of use: cheap sensors (like the Kinect) were designed to be user-friendly, making it easier for the elderly to use and interact with the system. This can be especially important for seniors, who may have limited technical skills or mobility.
• Continuous monitoring: the use of sensors and other devices allows for continuous monitoring of the elderly, providing a real-time understanding of their well-being and allowing for quick intervention if necessary.
• Improved safety: by using AmI in conjunction with commodity hardware, elderly individuals can enjoy a safer and more secure environment, with features like fall detection, emergency response systems, and real-time monitoring.
• Increased independence: AmI systems can help the elderly maintain their independence and quality of life by enabling them to continue living in their own homes for longer periods of time. This can also reduce the burden on caregivers and nursing homes.
• Indoor localization: the system can provide real-time information about the location of the elderly within their home, helping caretakers respond quickly if needed.
As future work, we envisage:
a) systems that think like humans in interpreting ambient information (posture, facial expression, sound, and trajectories) to understand a subject's mood and health/behavior in real time;
b) systems that act like humans in taking the appropriate, immediate actions in response to a) above;
c) using this framework to create different "real-life-like" scenarios (within the context-awareness framework), which can be linked to our recent IoT project;
d) trying dimensionality reduction before classification, using Principal Component Analysis (PCA) or Independent Component Analysis (a minimal PCA sketch is given at the end of this section); in addition, this system can be leveraged on top of an existing marketplace in order to allow researchers from around the world to share sensor recordings among them.
We have chosen Figshare (www.figshare.com) as the marketplace for sharing our sensor recordings with other researchers from the global community, so that they can also test their data analysis and classification algorithms and code. A version of the algorithms for face detection and recognition is in the works using gcc (versions 7.3, 8, and above) as the C++ compiler instead of Python. A new paper might include a modern approach for the real-time detection and classification of objects using deep learning.
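As a minimal sketch of the dimensionality-reduction idea mentioned in item d) above, PCA can be chained in front of one of the classifiers already used; the number of components (8) is an illustrative choice, and the variable names reuse those of the training sketch in Section 5.

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Project the angle features onto a few principal components before classification.
model = make_pipeline(PCA(n_components=8), RandomForestClassifier())
model.fit(X_train, y_train)
print("validation score with PCA:", model.score(X_val, y_val))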
Table 3
[Distribution of the 2527 annotated postures, by posture class]

Posture index   Number of annotated postures   Significance of the posture
P1              718                            R-hand Down, L-hand Down
P2              269                            R-hand Down, L-hand Front
P3              251                            R-hand Down, L-hand Lateral
P4              230                            R-hand Down, L-hand Up
P5              149                            R-hand Front, L-hand Down
P6              185                            R-hand Front, L-hand Front
P7              186                            R-hand Lateral, L-hand Down
P8              219                            R-hand Lateral, L-hand Lateral
P9              168                            R-hand Up, L-hand Down
P10             152                            R-hand Up, L-hand Up
Table 4
[Results obtained using the SVM classifier (confusion matrix)]

       P1   P2   P3   P4   P5   P6   P7   P8   P9  P10
P1    166    0    0    0    1    1    0    0    0    0
P2      7   28    2    0   10    0    0    0    0    0
P3      3    0   22    1    0    0    0    0    0    0
P4      0    0    2   34    0    0    3    0    0    0
P5      7    4    0    0   62    0    0    0    0    0
P6      8    0    0    0    0   57    1    0    0    0
P7      0    0    5    0    0    8   52    0    0    0
P8      5    2    0    1    0    0    0   45    0    0
P9      3    0    0    0    0    1    0    0   47    0
P10     0    0    0    0    0    0    0    0    4   40
Table 5
[Results obtained using the Random Forest classifier (confusion matrix)]

       P1   P2   P3   P4   P5   P6   P7   P8   P9  P10
P1    164    2    0    0    1    1    0    0    0    0
P2      1   46    0    0   10    0    0    0    0    0
P3      1    2   22    1    0    0    0    0    0    0
P4      0    0    1   37    0    1    3    0    0    0
P5      0    0    0    0   73    0    0    0    0    0
P6      2    0    0    0    0   60    2    1    0    1
P7      0    0    0    0    0    2   63    0    0    0
P8      3    0    0    0    0    0    0   50    0    0
P9      1    0    0    0    0    1    0    0   49    0
P10     0    0    0    0    0    0    0    0    1   43
Table 6
[Results obtained using the Extra Trees classifier (confusion matrix)]

       P1   P2   P3   P4   P5   P6   P7   P8   P9  P10
P1    165    0    0    0    0    1    0    1    0    0
P2      0   47    2    0   10    0    0    0    0    0
P3      2    0   23    1    0    0    0    0    0    0
P4      0    0    2   36    0    1    0    0    0    0
P5      1    0    0    0   72    0    0    0    0    0
P6      4    0    0    0    0   60    1    0    0    1
P7      0    0    0    0    0    2   63    0    0    0
P8      2    0    0    0    0    0    0   51    0    0
P9      2    0    0    0    0    0    0    1   48    0
P10     0    0    0    0    0    0    0    0    3   41
Table 7
[Results obtained using the Decision Tree classifier (confusion matrix)]

       P1   P2   P3   P4   P5   P6   P7   P8   P9  P10
P1    162    2    0    0    4    1    0    0    0    0
P2      2   43    1    0    1    0    0    0    0    0
P3      2    2   22    1    0    0    0    0    0    0
P4      0    0    4   35    0    1    3    0    0    0
P5      0    0    0    0   70    0    0    3    0    0
P6      3    0    0    0    2   57    2    0    1    1
P7      0    0    0    0    0    1   62    0    0    0
P8      2    1    0    0    0    0    0   50    0    0
P9      2    0    0    0    0    0    0    1   45    0
P10     0    0    0    0    0    0    0    0    1   43
Table 8
[Performance of the four posture classification algorithms]

Index   Algorithm                    Classification precision
1       Support Vector Machines     87.5%
2       Random Forest               96%
3       Extra Trees classifier      95.8%
4       Decision Tree classifier    93%
Acknowledgment:
The authors would like to express their gratitude to the European Commission for its support of the ERRIC FP7 project. This project ensured the financial support not only for the hardware equipment within the AmI-Lab but also for the authors' monthly scholarships. Last, but not least, we would like to express our gratitude to Prof. Adina Florea (director of the project), Andrei Ismail, Diana Tatu, Cosmin-Gabriel Samoila, Denis Ilie, and Mihai Trascau for their important contributions to the design of the AmI-Lab project and for helping us with the experiments.
References:
[1] Ismail A. Marian C. and Florea A., “A
Framework for Consistent Experimentation
in AmI”, AgTAmI 2013 workshop, The 19th
International Conference on Control
Systems and Computer Science (CSCS), 2013.
[2] Grabowski K., Rynkiewicz A., et al., "Emotional expression in psychiatric conditions: New technology for clinicians", Psychiatry and Clinical Neurosciences, Japanese Society of Psychiatry and Neurology, 2018.
[3] Ismail A., Florea A. “Multimodal Indoor
Tracking of a Single Elder in an AAL
Environment”, Ambient Intelligence-Software
and Applications: 4th International Symposium
on Ambient Intelligence (ISAmI 2013-
Advances in Intelligent Systems and
Computing), Heidelberg, Springer-Verlag.
[4] Ten S., Mirror Image (2010), “How Kinect
depth sensor works stereo triangulation?”,
https://mirror2image.wordpress.com/2010/11/3
0/how-kinect-works-stereo-triangulation/
[5] European Commission, IST Advisory Group (ISTAG), "Scenarios for ambient intelligence in 2010"; Ambient Assisted Living (AAL) Projects: http://www.aal-europe.eu/projects-main/
[6] Shotton J, Fitzgibbon A et. al. “Real-Time
Human Pose Recognition in Parts from Single
Depth Images”, IEEE CVPR, pp. 1297-1304
2011
[7] Odersky M, Spoon L and Venners B.
Programming in Scala: Updated for Scala 2.12
3rd Edition, Walnut Creek California, Artima
Press, 2016.
[8] Crockford, D, “The application/json media
type for javascript object notation
(JSON)”,RFC 4627, Network Working Group,
2006.
[9] Kohavi R, "A study of cross-validation and
bootstrap for accuracy estimation and model
selection." IJCAI. Vol. 14. No. 2. pp. 1137-
1143, 1995.
[10] Pedregosa F. et al, “Scikit-learn: Machine
Learning in Python”, The Journal of Machine
Learning Research 12, pp. 2825-2830, 2011.
[11] Cournapeau D., Scikit-Learn repo: http://scikit-
learn.org
[12] Sagonas, C., Antonakos E., Tzimiropoulos G
et al. “300 faces In-the-wild challenge:
Database and results”. Image and Vision
Computing (IMAVIS), Special Issue on Facial
Landmark Localisation "In-The-Wild”, 2016.
[13] Rosebrock, A.”Face Alignment with OpenCV
and Python”, PyImageSearch, 2017.
[14] Rosebrock, A., "Deep Learning for Computer Vision with Python, Practitioner Bundle", 1st Edition (1.2.2), PyImageSearch, 2017.
[15] Vladutu L., Marian C., Ismail A., Trascau M.
https://github.com/vliviu
[16] Huang Gary B, “Labeled Faces in the Wild”:
http://vis-www.cs.umass.edu/lfw/
[17] Kaehler A, Bradski G, Learning OpenCV 3:
Computer Vision in C++ with the OpenCV
Library, O’Reilly Media, 2017.
[18] Rosebrock, A.”Practical Python and OpenCV”,
PyImageSearch, 2016.
[19] Stehman, S. V., "Selecting and interpreting
measures of thematic classification accuracy"
Remote sensing of Environment 62.1, pp. 77-
89, 1997.
[20] Hearst MA, "Support vector machines.",
Intelligent Systems and their Applications,
IEEE 13.4: pp. 18-28, 1998.
[21] Dietterich, T. G., "Ensemble methods in
machine learning", in Multiple classifier
systems. Berlin Heidelberg, Springer pp. 1-15,
2000.
[22] Keskin C., Kıraç F., Kara, Y. E., Akarun
L.,“Real Time Hand Pose Estimation Using
Depth Sensors”, Advances in Computer Vision
and Pattern Recognition, pp 119-137, 2013.
[23] Pugeault N., Bowden R., “Spelling It Out:
Real–Time ASL Finger-spelling Recognition”,
(ICCV 2011 Workshops) IEEE International
Conference on Computer Vision, pp. 1114-
1119, 2011.
[24] Ray S. & Teizer J., "Real-time construction worker posture analysis for ergonomics training", Advanced Engineering Informatics, Vol. 26, Issue 2, pp. 439-455, 2012.
[25] Parajuli M., Tran D., Ma W. & Sharma D., "Senior health monitoring using Kinect", IEEE Fourth Int'l Conference on Communications & Electronics (ICCE), 2012.
[26] EvAAL, "Evaluation of localization and activity recognition systems for ambient assisted living: The experience of the 2012 EvAAL competition", Journal of Ambient Intelligence and Smart Environments, 5, pp. 119-132, 2013.
[27] Zheng X., "3D Human Postures Recognition Using Kinect", Proceedings of the 2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics, Volume 01, pp. 344-347, 2012.
[28] Zhang Z., Liu Y., Li A., Wang M., "A novel method for user-defined human posture recognition using Kinect", 7th IEEE International Congress on Image and Signal Processing, pp. 736-740, 2014.
[29] Albu F., Hagiescu D., Puica M., Vladutu L., "Intelligent tutor for first grade children's handwriting application", INTED2015, Madrid, 2nd-4th of March 2015 (ISBN: 978-84-606-5761-3), pp. 3708-3717.
Contribution of Individual Authors & Collaborators to the Creation of a Scientific Article (Ghostwriting Policy)
Liviu Vladutu (LV), Cosmin Marian (CM), and Andrei Ismail carried out the program design, implementation, and optimization, and the experiments for the data pipeline. LV drafted and corrected the paper.
Adina Florea was the Director of the project,
supervising all phases.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The project was financed by the ERRIC FP7
project, number 264207, FP7-REGPOT-2010-1
(http://www.erric.eu)
Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0:
https://creativecommons.org/licenses/by/4.0/deed.en_US
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.