Applying Compiling Techniques in DTD Conversion

SOUHEIL TAWK1, MAROUN ABI ASSAF *2, JOSEPH CONSTANTIN1, KABLAN BARBAR1,

JACQUES BOU ABDO3

1LaRRIS, Faculty of Sciences, Lebanese University, Fanar, LEBANON

2Department of Computer Science and Information Technology,

Holy Spirit University of Kaslik, Jounieh, LEBANON

3School of Information Technology, University of Cincinnati, USA

*Corresponding Author

maroun.abiassaf@usek.edu.lb

Abstract: In this paper, we propose to apply compilation techniques to Documents Type Definition (DTD). At

the syntactic level, we create a context free grammar, denoted GDT D , for the set of all DTDs in order to generate

them. The semantic level is defined by using attributes. Each nonterminal of GDT D is assigned a set of attributes

and for each production of GDT D a set of attribute equations is defined. This DTD compilation process can

be used in different domains. In this article, we provide an attribute grammar that can be used to eliminate

unnecessary parentheses in element declarations in a DTD. In addition, the DTD compilation process can be

applied to the domain of model conversion. It makes it possible to split the model conversion into two steps: the

syntactic transformation step, which consists of rewriting the source DTD in terms of XML syntax, and the second

step of the effective conversion algorithm, which can be easily expressed by an Extensible Stylesheet Language

Transformation (XSLT).

Key-Words: Attributed Grammar, Compiler Design, Computation Theory, DTD, Models Conversion

Received: March 14, 2024. Revised: September 13, 2024. Accepted: November 13, 2024. Published: December 20, 2024.

1 Introduction

DTDs are textual models for the structure of Exten-

sible Markup Language (XML) documents which as

well as used for data exchange over the Internet, [1].

They are integrated into XML documents as well as

converted into relational models of databases. This is

because the Structured Query Language (SQL) lan-

guage offers a very powerful way of extracting data

from databases, [2]. It is important to know that we

store data in relational databases if we are interested

in data manipulation. If we need to exchange or trans-

form data over the Internet, we store it in XML docu-

ments.

Many mapping algorithms have been proposed in the

literature for the problem of mapping DTDs to rela-

tional schemas, [3], [4], [5], [6], [7], [8], [9], [10],

[11], [12], [13]. Some of these algorithms did not con-

sider the semantics of DTDs during the mapping pro-

cess, so the resulting relational schema loses the orig-

inal DTD information, [3], [4], [5]. In [6], the authors

use a table and object relational mapping. The ta-

ble based mapping cannot handle mixed content, and

the object based mapping of mixed content is ineffi-

cient. Therefore, these two models are poorly suited

for centered documents. Other mapping algorithms

preserved the cardinality constraints, [7], [8]. How-

ever, they did not take functional dependencies via

DTDs into account. In [9], the authors developed a

mapping algorithm that preserves both the structure

of DTDs and the semantics implied by the constraints

of DTDs, but the multivalued dependencies are not

preserved by the mapping algorithm. The mapping

algorithm in [10], was developed to map DTDs to a

relational schema that preserves not only the content

and structure, but also the semantics of the original

XML documents. A hybrid inline algorithm was pro-

posed to map DTDs to relational schemas while pre-

serving functional dependencies, [11]. In [12], the au-

thors defined the dependency relationships in XML

documents and, based on these relationships, created

a set of rules for mapping the XML document to a

relational database. In [13], the authors described a

technique for mapping a specific DTD to a relational

schema. This technique consisted of converting the

DTD to an Xschema, simplifying the Xschema con-

straints, and mapping the collection types.

Unfortunately, all studies focus on the conversion al-

gorithms and ignore how to read the DTD source

and divide it into small units for the conversion al-

gorithms. We believe that our approach is the first

to focus on applying of syntatic and semantic analy-

sis of the compilation process to DTDs. In fact, the

concept of attributed grammar has already been ap-

plied in various research area, [14], [15], [16]. In [14],

the author introduces the concept of attributed gram-

mar for compiler description as a declarative way by

associating attributes with nonterminals and attribute

equations with productions. In logic programming,

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

547

Volume 21, 2024

the authors of [15], consider an abstract substitution

as an attribute attached to the nodes of a tree, and

then the propagation process of abstract substitutions

through the tree can be expressed in terms of attribute

evaluation. In the field of web development, the au-

thors of [16], give a system that is able to automati-

cally generate web forms or editors to collect data re-

lated to predefined DTDs. Once the the data entry is

complete, the web based form creates the correspond-

ing XML documents. The system receives a DTD

as input and relies on attributed grammars to assign

semantics to the elements and generates an HTML

based editor on the fly.

In this paper, we want to create a formal framework

for analyzing DTDs viewed as text models or strings.

For this purpose, we consider the set of DTDs as a

context free language called LDT D . We define a con-

text free grammar GDT D that recognizes the language

LDT D . Then, any DT D conversion can be realized

by enriching the grammar GDT D with an attributed

system which consists of associating attributes with

the nonterminals and attribute equations with the pro-

ductions of the grammar GDT D . We give an exam-

ple of an attribute grammar that eliminates unneces-

sary parentheses in an element declaration in a DTD.

An element declaration is a list or regular expres-

sion composed of the operator ’,’ , the disjunction se-

quence ’!’ , the opening and closing parentheses, and

the names of the subelements. Another example is the

model conversion from DT D to a relational model,

which consists of converting the DTD to another in-

termediate model in terms of XML syntax. The con-

tribution of this approach is that in the case of mul-

tiple transformations, a stylesheet can be written and

applied to a suitable intermediate XML based model

for each transformation without having to reanalyze

the source DTD, [17], [18]. The main advantage of

our approach:

1. The formalization of the set of DTDs by a context

free grammar GDT D . Then each operation over a

DTD consists of defining attributes over GDT D ,

which means developing an attributed grammar

based compiler.

2. The application of GDT D in the field of DTD

conversion. We can define attributes via GDT D

that convert a given DTD into another DTD

while preserving the declaration of elements

and attributes and respecting the XML syntax.

The intermediate model thus obtained is called

DT DXM L and any conversion algorithm for

the source DTD can be expressed by an XSLT

stylesheet applicable to the intermediate model

DT DXM L. This approach makes it possible to

write the DTD conversion process in two sepa-

rate modules: one module is an attribute gram-

mar application for DTDS or a DTD compiler

and the other module is just an XSLT stylesheet

for the conversion algorithm (Fig. 1).

The paper is structured as follows: Section 2 ex-

plains the principle of algebraic and attributed gram-

mars in arithmetic theory. Section 3 defines an alge-

braic grammar for generating DTDs. Section 4 de-

fines an attributed grammar for eliminating unnec-

essary parentheses in DTD declarations. Section 5

shows the results of the experiment, while section 6

summarizes the paper with a conclusion.

2 Algebraic and Attributed

Grammars

This section is dedicated to explain the syntactic anal-

ysis of DTDs. These concepts include context free

and attributed grammars, which form the basis of

compilation theory in order to create compilers for

programming languages. A context free grammar

(CFG) is a system of rules that enables the genera-

tion of a set of words from an axiom. The alphabet of

the language is divided into two groups: Terminals,

i.e. the letters of the generated words, and nontermi-

nals, i.e. intermediate symbols that can be replaced by

terminals or other nonterminals. In formal language

theory, a context free grammar is a formal grammar

whose production rules can be applied to a nontermi-

nal symbol independently of its context. In particu-

lar, in a context free grammar, each production rule

has the form A→α, where A is a single nonterminal

symbol and αis a string of terminals and/or nontermi-

nals. αcan be empty). Regardless of which symbols

surround it, the single nonterminal A on the left can

always be replaced by αon the right.

Attributed grammars are a formalism for describ-

ing the semantic analysis of programming languages

(Fig. 2). The principle is to assign attributes to each

nonterminal and to assign a system of attribute equa-

tions to each production of the grammar, [19], [20].

The attributes are evaluated along a derivation tree

Figure 1: Conversion steps: formalisation of the set of

DTDs by a context free grammar GDT D . Transform

of the DTD into intermediate model DT DXM L using

GDT D with attributes. Conversion algorithm can be

expressed by an XSLT stylesheet

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

548

Volume 21, 2024

(for a specific input word) to determine the seman-

tics of the input word. In the case of programming

languages, there is a context free grammar for each

programming language. The attributes are used to

calculate the semantics of all statements of a pro-

gram when they are considered as input words of

the grammar of the corresponding programming lan-

guage. An Attribute Grammar (AG) is defined as

AG =< G, A, E > where:

•G=<Σ, N, S, P > where:

•Σis the set of terminals symbols.

• N is is a set of nonterminal symbols.

• S is the start symbol (axiom) and X∈N.

• P is the set of the productions rules.

P = p / p: X0→α1X1α2…αnXnαn+1

where ∀i∈[0 . . . n],Xi∈Nand ∀j∈

[1 . . . n + 1] ,αj∈Σ∗

• A is a set of attributes divided into two subsets

Asand Ah,A=As∪Ah, and for each X∈N

are associated: As(X)is the set of synthesized

attributes where As(X)⊂As,Ah(X)is the set

of inherited attributes where Ah(X)⊂Ah.

•Eis the set of equations in which for each syn-

thesized attribute s∈As(X0), there is an at-

tribute equation defining s(X0)of the form :

s(X0) = fs(s1(X1), . . . , sn(Xn), h(X0)) where

∀i∈[1 . . . n], si(Xi)∈As(Xi)and h∈

Ah(X0), as shown in Fig. 3, and for each inher-

ited attribute hj∈Ah(Xj), there is an attribute

equation defining hj(Xj)of the form:

hj(Xj) = fhj (h0(X0), s1(X1), . . . , sn(Xn))

where h0∈Ah(X0)and ∀i∈[1 . . . n],si∈

As(Xi)as shown in Fig. 4.

Figure 2: Structure of a compiler. It composed of syn-

tax analysis and semantic analysis.

Figure 3: Order of computation of synthesized at-

tributes.

Figure 4: Order of computation of synthesized at-

tributes.

3 Definition of an Algebraic

Grammar for DTDs

In our approach for the formalization of the syntac-

tical analysis of the DTDs, we define the algebraic

grammar GDT D =<Σ, N, S, P > by :

•Σ = {<, >, (,),|,∗,?,+,",!,[,],#, , , identifier,

DOCTYPE, ELEMENT, ATTLIST, EMPTY,

ANY, PCDATA, id, CDATA, ID, IDREF,

FIXED, REQUIRED, IMPLIED, string } : de-

notes the alphabet.

• N = {Start, DTD, LineType, Attlist, AttSuite, El-

ementType, List, Occurs, AttType, Enumerate,

EnumerateSuite, AttOption, InitialVal}: is the

set of non terminals,

• Start is the axiom.

• P is the following set of productions:

P0: Start →DTD

P1: DTD →<!DOCTYPE identifier [Line-

Type]

P2: LineType →<!ELEMENT identifier

ElementType>AttList LineType

P3: LineType →

P4: AttList →<!ATTLIST identifier1 identi-

fier2 AttType AttOption AttSuite>

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

549

Volume 21, 2024

P5: AttList →

P6: AttSuite→identifier AttType AttOption

AttSuite

P7: AttSuite →

P8: ElementType →EMPTY

P9: ElementType →ANY

P10: ElementType→(#PCDATA)

P11: ElementType→(List) Occurs

P12: List →id Occurs , List

P13: List →id Occurs |List

P14: List →(List) Occurs, List

P15: List →(List) Occurs |List

P16: List →(List) Occurs

P17: List →id Occurs

P18: Occurs →*

P19: Occurs →?

P20: Occurs →

P21: Occurs →+

P22: AttType →CDATA

P23: AttType →ID

P24: AttType →IDREF

P25: AttType →(Enumerate) Initialval

P26: Enumerate →string EnumerateSuite

P27: EnumerateSuite →Enumerate

P28: EnumerateSuite →

P29: AttOption →#FIXED InitialVal

P30: AttOption →#REQUIRED

P31: AttOption →#IMPLIED

P32: InitialVal →string

P33: InitialVal →””

We start in the grammar GDT D to write the produc-

tions P0and P1that define the syntax of a DT D. We

declare the elements with their productions. We asso-

ciate a nonterminal to produce an element and another

nonterminal to define the type of an element. These

nonterminals are ElementType and LineType respec-

tively. We add the Attlist nonterminal because the el-

ement can have attributes and the LineType can pro-

duce multiple elements (P2and P3). The Attlist can

produce attribute declarations, the AttType nontermi-

nal stands for the type of an attribute and the AttOp-

tion nonterminal produces the options for an attribute.

We write the productions P4-P7. The element type

can be (#PCDATA), ANY, EMPTY, composed or

generate a list of operators (P8-P11). To develop

the nonterminal List, let’s look at an example of the

element declaration <!element A(B,C*,(D|E),F*)>.

The structure of a type element is therefore a paren-

thesized expression with operators (+, *, …) and sep-

arators (, and |). We therefore suggest the productions

P12 -P17 for List. The productions P18 -P21 generate

terminal operators such as *, ?, +, and . The produc-

tions P22 -P25 generate the attribute type, which can

be CDATA, ID, IDREF and enumerate with initial

value. The productions P26 -P28 are used to generate

the character string enumerate in XML. The produc-

tions P29 -P33 are used to generate attribute options

with an initial value. The attribute options can be

#FIXED, #REQUIRED, #IMPLIED. The initial value

is generated in the P32 and P33 productions. This

value can be empty or defined as a string.

Based on this grammar, the derivation tree of the

first two lines of following DTD is shown in Fig. 5

DTD →<!DOCTYPE A [

<!ELEMENT A(B,(C,D)) >

<!ELEMENT B(#PCDATA) >

<!ELEMENT C(#PCDATA) >

<!ELEMENT D(#PCDATA) >

Figure 5: Example of a derivation tree for a given

DTD: The productions P0and P1generate the first

line of the DTD. The production P2defines the ele-

ment A and calls P11 using the nonterminal Element-

Type. We use the nonterminal List of productions

P11,P12,P16 and P17 to define the elements B, C and

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

550

Volume 21, 2024

4 Attributed Grammar for

Simplifying DTD Declarations.

In this section, we give an example of an attributed

grammar for eliminating useless parenthesis element

declarations in DTDs. The structure of an element

declaration in a DTD is a rational expression on the

alphabet composed of the operator (,), the or operator

(|), the star operator (*), the exclamation mark (!), the

parentheses, and the identifiers. We distinguish three

categories of lists:

• The lists that contain only the comma operator,

e.g. (B, C, D, E , F).

• The lists that contain only the or operator. For

example, (B|C|D|E|F).

• The list with the commas and an or operator. As

an example, (B, C |D, E ).

Now, we will define the notion of useless parenthesis

in the structure of the element declaration.

In general, parentheses are considered unnecessary if

they do not change the meaning of the declaration

and do not improve the readability of the code. In

our case, we consider parentheses unnecessary if they

only occur in inner lists. For example, the parenthe-

ses of the inner list (C, D) are useless in the list (B, (C,

D), E) and then (B, (C, D), E) can be reduced to the

corresponding list (B, C, D, E). In contrast, the paren-

theses of the inner list (C, D) should be retained in the

list (B |(C, D), E) as the operators of the inner and

outer lists are different.

The parentheses of the inner list (D, E, F) in the outer

list (B, C, ( D, E ,F), G) are useless, as the only opera-

tor of the list D, E, F is a comma (,) and the preceding

and following operators are also a comma (,). The

list (B, C ,(D, E, F), G) can be reduced to (B, C , D,

E, F, G). In addition, the parenthesis of the inner list

(C|D) should be retained in the list (A,B,(C|D),E). By

definition, the inner brackets are useless iff :

• Its list has a unique operator (, or |).

• The two operators before and after the inner list in

the outer list are the same and equal to the unique

operator of the inner list.

Now, we consider the occurrence of the inner list in

a derivation tree in the grammar GDT D where we ex-

press:

• The unique operator of the inner list.

• The preceding and following operators in the

form of attributes over the production of GDT D .

First, we need an attribute to calculate the reduced list

and the other parts of an element declaration; this at-

tribute is the synthesized attribute which is called ”d”.

The unique operator of an inner list is represented by

a synthesized attribute ”o”. To calculate the value

of ”o” for an inner list, we use another inherited at-

tribute ”o1”, which is initialized to null at the begin-

ning of an inner list. Thus, the productions and their

attributes are given in Table 1, Table 2, Table 3, Ta-

ble 4, Table 5, Table 6, and Table 7 (Appendix). We

can divide the productions of the grammar GDT D

into three groups:

• group 1 : from P0to P11.

• group 2 : from P12 to P17.

• group 3 : from P18 to P33.

The productions of group 1 and group 3 reproduce in

the attribute ”d” all parts of the given DTD that are

outside the inner parentheses. If we use the DTD :

<! Doctype A [

<! ELEMENT A ((B, (C))) >

<! ELEMENT B (#REQUIRED) >

<! ELEMENT C (#REQUIRED) >

The evaluation of the attributes via the productions of

group 1 and group 3 generates the main parts of the

previous DTD, namely:

<! Doctype A [

<! ELEMENT A ( ) >

<! ELEMENT B (#REQUIRED) >

<! ELEMENT C (#REQUIRED) >

The attribute evaluation on the productions of group

2 decides whether the parentheses of the inner lists

(B, (C)) and (C) in the productions P14 ,P15,P16

should be eliminated or not. We note that the produc-

tion P11 generates the outer parentheses in the dec-

larations <! ELEMENT A (. . .)>and initializes

the preceding operator of the inner lists to the empty

string. The production P12 and P13 do the same as

P11 in terms of attributes. Note that if a nonterminal

appears multiple times in a production, these occur-

rences are numbered in order. Let us define in pro-

duction P14,o(List2) is the operator of the inner list

”List2”. The nonterminal ”List” appears three times

in the production. These occurrences are numbered

from 0 to 2. o1(List0) is the operator preceding the

inner list ”List2”. The operator ”,” which appears

after the nonterminal ”Occurs” is the following op-

erator of the list ”List2”. We check if the opera-

tor of the inner list ”List2” and the preceding oper-

ator o1(List0) are the same as the following operator

”,”. If it is true, we eliminate the parentheses, oth-

erwise we regenerate them in the statement defining

d(List0). The production P15 does the same as the

production P14 by using the operator ”|. The produc-

tion P16 verifies if the the operator preceding the inner

list o1(List0) is the same as the operator of the inner

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

551

Volume 21, 2024

list o(List2), then the parenthesis are eliminated, oth-

erwise they should be generated. The production P17

generates the last identifier.

5 Experimental Results

At the implementation level, the syntactic analyzer

consists of functions that correspond to all GDT D

productions while the semantic analyzer consists of

a single function for the evaluation of attributes in a

derivation tree. This evaluation function is a switch

statement, in which there is one case for each GDT D

production. Fig. 6 shows the structure of the DTD

compiler. It is worth mentioning here that the syntac-

tic analyzer is the same for all DTD conversion meth-

ods. However, we create a new attribute evaluation

function for each DTD conversion method. The de-

veloped program takes a DTD as input and produces

a DTD as output by calculating the structure of an ele-

ment and removing unnecessary parentheses. An ex-

ample of the DTD:

<!DOCTYPE A [

<!ELEMENT A ((B, (C)))>

<!ATTLIST A id ID #REQUIRED>

<!ELEMENT B (#PCDATA)>

<!ELEMENT C (((E) |F))>

<!ELEMENT E (#PCDATA)>

<!ELEMENT F (#PCDATA)>

We obtain the following output:

<!DOCTYPE A [

<!ELEMENT A (B, C)>

<!ATTLIST A id ID #REQUIRED>

<!ELEMENT B (#PCDATA)>

<!ELEMENT C (E |F)>

<!ELEMENT E (#PCDATA)>

<!ELEMENT F (#PCDATA)>

6 Conclusion

In this paper, we have created a new formal frame-

work for the set of DTDs in the form of a context free

language and the corresponding context free gram-

mar. The context free grammar generates a tree based

implementation, called derivation tree, for each DTD

in the computer memory. In addition to this, we

can enrich the context free grammar GDT D with at-

tributes:

Figure 6: Algorithm Architecture using Compiling

Techniques in DTD Conversion

Table 1: Semantic rules for the productions P0-P4

of the grammar GDT D

P0:

d(Start) = d(DTD)

P1:

d(DTD) = ”<!DocType ” & identifier & ”[” &

d(LineType) & ”] >"

P2:

d(LineType0) = ”<! Element” & identifier &

d(ElementType) ”>” & d(AttList) & d(Line-

Type8)

P3:

d(LineType) = ””

d(AttList) = ”<!ATTLIST” & identifier1 &

identifier2 & d(AttType) & d(AttOption) &

d(AttSuite) & ”>”

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

552

Volume 21, 2024

• Perform syntactic transformation of the source

DTD such as eliminating the unnecessary paren-

theses in the declaration in the DTD.

• Rewrite the source DTD into an XML docu-

ment or an intermediate model while retaining

the DTD declarations. Then each DTD transfor-

mation can be expressed by an XSLT stylesheet

applicable to the intermediate model.

Our approach differs from existing approaches in that

it provides a formal framework for analyzing DTDs

based on compilation techniques. We define a con-

text free grammar that recognizes the language of the

DTD. Then any DTD conversion can be realized by

enriching this grammar with an attributed system.

Our contribution is that in the case of multiple trans-

formations, we can convert the DTD to another inter-

mediate model such as XML and apply a stylesheet to

the intermediate model for each transformation with-

out having to analyze the source DTD.

In the future, the entire framework, grammar with at-

tributes, can be used to split the model conversion

into two steps: The first step is to convert the DTD

into an intermediate model in terms of XML syntax,

while the second step of effective conversion can be

expressed by an XSLT stylesheet. We will also ex-

plore a way to map DTDs to relational schemas and

use traditional relational database engines to process

XML documents, since XML is emerging as the data

format of the Internet and there is a need for storing

and querying XML documents.

Acknowledgment:

Special thanks to Miss Ferial Srour Nemr for her

excellent proofreading.

References:

[1] A. M. Saba, E. Shahab, H. Abdolrahimpour, M.

Hakimi, and A. Moazzam, A comparative

analysis of xml documents, xml enabled

databases and native xml databases, arXiv, vol.

abs/1707.08259, 2017. [Online]. Available:

https://arxiv.org/abs/1707.08259.

[2] M. Reed, Python Programming and SQL,

Independently published, January 2023.

[3] S. Lu, Y. Sun, M. Atay, and F. Fotouhi, A new

inlining algorithm for mapping XML DTDs to

relational schemas, in: Jeusfeld, M.A., Pastor,

O. (eds) Conceptual Modeling for Novel

Application Domains. Lecture Notes in

Computer Science, Springer, Berlin,

Heidelberg., vol. 2814, 2003, pp. 366–377.

[4] P. Bohannon, J. Freire, P. Roy, and J. Simeon,

Form XML Schemas to relations: a cost based

approach to XML Storage, in Proceedings of the

18th International Conference on Data

Engineering (ICDE2002) IEEE Computer

Society, 2002, p. 564–580.

[5] P. Bohannon, J. Freire, J. Haritsa, M. Ramanath,

P. Roy, and J. Simeon, Bridging the XML

relational divide with LegoDB: a demonstration,

in Proceedings 19th International Conference

on Data Engineering (Cat. No.03CH37405),

Bangalore, India, 2003, p. 759–761, doi:

10.1109/ICDE.2003.1260859.

[6] R. Bourret, C. Bornhovd, and A. Buchmann, A

generic load/extract utility for data transfer

between XML documents and relational

databases, in Proceedings Second International

Workshop on Advanced Issues of Ecommerce

and Web Based Information Systems, WECWIS

2000, Milpitas, CA, USA, 2000, pp. 134–143,

doi: 10.1109/WECWIS.2000.853868.

[7] Y. Chen, S. Davidson, and Y. Zheng,

Constraints preserving XML storage in relations,

International Workshop on the Web and

Databases, 2002. [Online]. Available:

https://api.semanticscholar.org/Cor-

pusID:59848379

[8] D. Lee, M. Mani, and W. Chu, Schema

conversion methods between XML and

relational models, B. Omelayenko, M. Klein

(Eds.), Knowledge Transformation for the

Semantic Web, IOS Press, 2003, pp. 1–17,.

[9] T. Lv and P. Yan, Mapping DTDs to relational

schemas with semantic constraints, Information

and Software Technology, vol. 48, no. 4, 2006,

pp. 245–252.

[10] Z. Tan, J. Xu, W. Wang, , and B. Shi, Storing

normalized xml documents in normalized

relations, in The Fifth International Conference

on Computer and Information Technology

(CIT’05), Shanghai, 2005, pp. 123–129, doi:

10.1109/CIT.2005.175.

[11] A. J. Rafsanjani and S. H. Mirian

Hosseinabadi, RIAL: Redundancy Reducing

Inlining Algorithm to Map XML DTD to

Relations, In Proceedings of International

Conferences on Computational Intelligence for

Modelling Control and Automation, 2008, pp.

25–30, doi: 10.1109/CIMCA.2008.19.

[12] Q. Zhu and W. Yang, Mapping from the XML

Schema to the Relational Database with

Functional Dependency Preserved, in 2010

International Conference on Machine Vision

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

553

Volume 21, 2024

and Human machine Interface, Kaifeng, China,

2010, pp. 412–415.

[13] A. A. A. El Aziz and A. Kannan, Survey of

mapping xml dtds(documents) to relational

schemas, International Journal of Computer

Science and Management Research, vol. 2,

2013, pp. 1175–1193.

[14] D. Knuth, Semantics of context free languages,

Math. Systems Theory, vol. 2, 1968, pp.

127–145, https://doi.org/10.1007/BF01692511.

[15] K. Barbar and K. Musumbu, Implementation of

abstract interpretation algorithms by means of

attribute grammar, in Proceedings of 26th

Southeastern Symposium on System Theory,

Athens, OH, USA, 1994, pp. 87–92, doi:

10.1109/SSST.1994.287905.

[16] K. Barbar, Automatic generator of xml

documents editors based on attributed

grammars, in Proceedings of the 5th

international conference on Soft computing as

transdisciplinary science and technology, 2008,

pp. 166–172,

https://doi.org/10.1145/1456223.1456260.

[17] D. Li, X. Li, V. Stolz, QVT based model

transformation using XSLT, ACM SIGSOFT

Software Engineering Notes, Vol.36, No.1,

2011,pp. 1-8,

https://doi.org/10.1145/1921532.1921563.

[18] H. Mizumoto, N. SUZUKI, An XSLT

transformation method for distributed XML,

Journal of information processing, Vol.23, No.3,

2015,pp. 353-365, doi:10.2197/ipsjjip.23.353.

[19] A. V. Aho, Compilers: Principles, Techniques,

and Tools,ser. Always Learning. Pearson, 2014.

[20] T. Mogensen, Introduction to Compiler Design,

ser. Undergraduate Topics in Computer Science.

Springer Cham, 2024. [Online]. Available:

https://doi.org/10.1007/978-3-031-46460-7

Contribution of Individual Authors to the

Creation of a Scientific Article (Ghostwriting

Policy)

SOUHEIL TAWK is a PHD student of the the-

sis ”Systems based on attribute grammars for

model conversion: Application to the conver-

sion of DTD into relational schema”. He carried

out the experimental and the optimization parts.

MAROUN ABI ASSAF was responsible of the

Logic, algorithms, and the bibliography parts.

JOSEPH CONSTANTIN is the cosupervisor

of the thesis. He was responsible of find-

ing the Grammar productions, the attributed

Grammar equations, and writing of the paper.

KABLAN BBRBAR is the supervisor of the thesis.

He was responsible of the Logic part and the imple-

mentation the attributed Grammar equations.

JACQUES BOU ABDO was an examiner

of the thesis. He was responsible for the

correction of the paper from logic errors.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflicts of Interest

The authors have no conflicts of interest to

declare that are relevant to the content of this

article.

Creative Commons Attribution License 4.0

(Attribution 4.0 International , CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en

_US

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

554

Volume 21, 2024

APPENDIX

Table 2: Semantic rules for the productions P5-P10

of the grammar GDT D

P5:

x(AttList) = ” ”

d(Attsuite0) = identifier & d(AttType) & d(At-

tOption) & d(AttSuite4)

P7:

d(AttSuite) = ” ”

P8:

d(ElementType) = ”Empty”

P9:

d(ElementType) = ”ANY”

P10:

d(ElementType) = ”#PCDATA”

Table 3: Semantic rules for the productions P11-P13

of the grammar GDT D

P11:

o1(List2) = ”

d(ElementType) = ”(” & d(List) & ”)” & d(Oc-

curs)

P12:

o1(List4) = 





0,0if(o1(List0) =00 or

o1(List0) =0,0)

0m0,otherwise

d(List0) = id & d(Occurs)& ”,” & d(List4)

o(List0) = o(List4)

P13:

o1(List4) = 





0|0, if(o1(List0) =00 or

o1(List0) =0|0)

0m0,otherwise

d(List0)= id & d(Occurs) & ”|” & d(List4)

o(List0) = o(List4)

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

555

Volume 21, 2024

Table 4: Semantic rules for the productions P14-P15

of the grammar GDT D

P14:

o1(List2) =’ ’

o1(List6) = 





0,0, if(o1(List0) =00 or

o1(List0) =0,0)

0m0,otherwise

if (o1(List0) = o(List2) & o(List2) =’,’) then

d(List0) = d(List2) & d(Occurs) & ”,” & d(List6)

else

d(List0) = ”(” & d(List2) & ”)” & d(Occurs) &

”,” & d(List6)

endif

o(List0) =o(List6)

P15:

o1(List2) =’ ’

o1(List6) = 





0|0, if(o1(List0) =00 or

o1(List0) =0|0)

0m0,otherwise

if (o1(List0) = o(List2) & o1(List2) = ‘|’) then

d(List0) = d(List2) & d(Occurs) & ”|” & d(List6)

else

d(List0) = ”(” & d(List2) & ”)” & d(Occurs) &

”|” & d(List6)

endif

o(List0) = o(List6)

Table 5: Semantic rules for the productions P16-P20

of the grammar GDT D

P16:

o1(List2) =’ ’

if (o1(List0) = o(List2) ) then

d(List0) = d(List2) & d(Occurs)

else

d(List0) = “(” & d(List2) & ”)” & d(Occurs)

endif

o(List0) = o1(List0)

P17:

d(List) = identifier & d(Occurs)

o(List) = o1(List)

P18:

d(Occurs) = ”*”

P19:

d(Occurs) = ”?”

P20:

d(Occurs) = ” ”

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

556

Volume 21, 2024

Table 6: Semantic rules for the productions P21-P27

of the grammar GDT D

P21:

d(Occurs) = ”+”

P22:

d(AttType) = ”CDATA”

P23:

d(AttType) = ”ID”

P24:

d(AttType) = ”IDREF”

P25:

d(AttType) = ”(” & d(Enumerate) & ”)” & d(Ini-

tialVal)

P26:

d(Enumerate) = string & d(EnumerateSuite)

P27:

d(EnumerateSuite) = d(Enumerate)

Table 7: Semantic rules for the productions P28-P33

of the grammar GDT D

P28:

d(Enumerate) = ””

P29:

d(AttOption) = ”#” & ”FIXED” & d(InitialVal)

P30:

d(AttOption) = ”#REQUIRED”

P31:

d(AttOption) = ”#IMPLIED”

P32:

d(InitialVal) = string

P33:

d(InitialVal) = ” ”

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS

DOI: 10.37394/23209.2024.21.51

Souheil Tawk, Maroun Abi Assaf,

Joseph Constantin, Kablan Barbar,

Jacques Bou Abdo

E-ISSN: 2224-3402

557

Volume 21, 2024