International Journal of Applied Mathematics, Computational Science and Systems Engineering
E-ISSN: 2766-9823
Volume 2, 2020
Multidimensional Data Indexing in BIG DATA
Author:
Abstract: The multidimensional trie hashing (MTH) access method is an extension of trie hashing to dynamic multi-key files (or databases). It maintains d separate tries in main memory, each indexing one attribute. The data file represents a d-dimensional array stored on disk in linear order. A mapping function translates the indexes produced by the tries into physical addresses. On average, a record can be found in one disk access, which places the method among the most efficient known. However, MTH has two disadvantages: a low occupancy of file buckets (40-50%) and a large memory footprint relative to the file size, since the tries reside in memory. We propose a refinement of MTH on two levels: first, by using the compact trie representations suggested in [23]; second, by applying delayed splitting (partial expansion), as introduced in the first dynamic hashing methods and as used in [25]. A performance analysis of the new scheme, mainly by simulation, shows on the one hand a high load factor (70-80%) with an access time of practically one disk access, and on the other hand a twofold increase in file size for the same memory space as MTH.
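The addressing idea summarized in the abstract (one index per attribute, then a mapping function to a linear bucket address) can be illustrated with a minimal sketch. This is not the paper's algorithm: a sorted list of split keys searched with `bisect` stands in for each in-memory trie, and a fixed row-major linearization stands in for the MTH mapping function, which must also handle a file that grows dynamically. All names (`attribute_index`, `bucket_address`, `locate`) and the sample boundaries are hypothetical.

```python
from bisect import bisect_right

def attribute_index(boundaries, key):
    """Return the interval number of `key` along one attribute.

    `boundaries` is a sorted list of split keys; it is an assumed
    stand-in for one in-memory trie, not a trie implementation.
    """
    return bisect_right(boundaries, key)

def bucket_address(indexes, sizes):
    """Map a d-dimensional index tuple to a linear bucket address.

    Uses a static row-major layout (an assumption made for illustration).
    """
    address = 0
    for i, n in zip(indexes, sizes):
        if not 0 <= i < n:
            raise IndexError(f"index {i} outside dimension of size {n}")
        address = address * n + i
    return address

# Hypothetical two-attribute file: (name, age).
name_bounds = ["g", "p"]  # three intervals along the name axis
age_bounds = [30]         # two intervals along the age axis
sizes = (len(name_bounds) + 1, len(age_bounds) + 1)

def locate(record):
    """Compute the bucket address of a (name, age) record."""
    indexes = (
        attribute_index(name_bounds, record[0]),
        attribute_index(age_bounds, record[1]),
    )
    return bucket_address(indexes, sizes)
```

Under these assumptions, locating a record needs only in-memory lookups followed by one address computation, which is consistent with the single disk access claimed for MTH; the low bucket occupancy and the memory cost of the tries are not modeled here.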
Pages: 12-18
International Journal of Applied Mathematics, Computational Science and Systems Engineering, E-ISSN: 2766-9823, Volume 2, 2020, Art. #3