of character subdivision into factors, resulting in a
relatively faster search process.
In scenarios where search performance is
evaluated with longer characters and larger text
volumes, the new algorithm's search outcomes do not
consistently match those of other algorithms. This
discrepancy arises from the need for continuous
analysis of longer characters in each search
operation, resulting in a higher frequency of analyses.
Furthermore, the actual search process can be
influenced by computer programming, as character
segmentation based on factor values directly affects
search time. Additionally, it's worth noting that the
new algorithm lacks a shift table, unlike algorithms
such as BM or KMP, which allows for shifting the
search window by more than one character.
7 Conclusion
This research introduces a novel data structure
known as multi-character inverted lists (m-cIVL)
designed for searching string data. The development
of this structure enhances the ability to compare
character strings, going beyond the limitations of
one-to-one character matching. Theoretical findings
suggest that this newly created data structure exhibits
a time complexity of O(m / k) for constructing string
matches, a worst-case space complexity of O(|Σ| +
m), and an average-case space complexity of O(2(m /
k)). The search algorithm operates with a time
complexity of O(n + nocc). In experimental
evaluations, this new solution proves to be a good fit
and performs efficiently, especially when utilizing a
binary character set with a limited number of
characters and shorter lengths. It is noteworthy that
this algorithm adapts well to longer patterns and
outperforms other approaches in terms of text search
performance.
8 Suggestions for future work
Although the new algorithm is effective, there is
room for improvement, especially in algorithms for
finding or comparing pairs, which only require a
single shift of the search window. Developing
strategies or algorithms that allow multiple characters
to be scrolled simultaneously would significantly
improve the efficiency of the search process.
Additionally, the following suggestions could be
considered to further improve the algorithm's
efficiency and open up new avenues for future
research:
1. Develop processes for generating shift tables
that accommodate multiple characters in the search
window.
2. Optimize the search direction, including
exploring reverse searches within the search window.
3. Utilize simultaneous access factors across
various segments of a character pattern during
searches.
4. Investigate techniques to expedite the
verification of the inverted lists continuity.
Acknowledgement:
Special thanks to Ramkhamhaeng University for
providing funding for this research. Extensive
gratitude goes to the Department of Computer
Science for their support, including the provision of
necessary time and research space. Finally,
appreciation is extended to Assoc. Prof. Dr. Veera
Boonjing for collaborating on the creation and
expansion of the inverted lists in the initial version of
this research structure.
References:
[1] Boyer R.S. and Moore J. S., A fast string
searching algorithm, Communications of the
ACM. 20, pp. 762-772, 1997.
[2] Chrochemore M. and Handcart C., Automata
for Matching Patterns, Handbook of Formal
Languages, Volume 2, Linear Modeling:
Background and Application, G. Rozenberg and
A.Salomaa ed.,Springer-Verlag,Berlin., Ch. 9,
pp. 399-462, 1997.
[3] Chrochemore M., Off-line serial exact string
searching, Pattern Matching Algorithms, A.
Apostolico and Z. Galil ed., Oxford University
Press. Chapter 1, pp. 1-53, 1997.
[4] Crochemore M., Gasieniec L., and Rytter W.,
Constant-space string-matching in sublinear
average time, Compression and Complexity of
Sequences 1997, pp. 230 – 239, 1997.
[5] Monz C. and Rijke M. de. (2006, August, 12)
Inverted Index Construction. Available:
http://staff.science.uva.nl/~christof/courses/ir/tr
ansparencies/clean-w-05.pdf.
[6] Escardo M., (2008, October 15), Complexity
considerations for hash tables Available:
http://www.cs.bham.ac.uk/~mhe/foundations2/
node92.html.
[7] Charras C. and Lecroq T.. (2008, October 10).
Handbook of Exact String Matching. Available:
www-igm.univ-lv.fr/~lecroq/string/string.pdf.
[8] Navarro G. and Raffinot M., Flexible Pattern
Matching in Strings, The press Syndicate of
The University of Cambridge., pp. 15-40, 2002.
[9] Galil Z., Giancarlo R., On the exact complexity
of string matching upper bounds, SIAM Journal
on Computing, 21(3)., pp. 407-437, 1992.
WSEAS TRANSACTIONS on COMPUTERS
DOI: 10.37394/23205.2023.22.18