WSEAS Transactions on Information Science and Applications
Print ISSN: 1790-0832, E-ISSN: 2224-3402
Volume 20, 2023
Automatic SQL to HQL-NoSQL Querying using PostgreSQL and Integrated Hive-Hbase
Authors: ,
Abstract: The amount of digital data is growing constantly in almost all fields. This data falls into two categories: structured and unstructured. Non-relational databases, known as NoSQL, have become one of the main fields of big data. Many companies still use relational databases such as PostgreSQL and MySQL, but the rapid evolution and diversity of stored data oblige them to adopt big data tools such as HBase or Hive. Big data tools are characterized by their capacity, speed, and ability to store diverse types of data. Data analysis and high storage capacity are the main reasons companies look for new database systems. Migrating to a new system requires modifying existing data and applications, and the process is costly because new specialists are needed to handle the transition. Furthermore, because older systems draw on varied data sources, e.g., real-time applications that continuously collect new data, companies cannot abandon relational databases entirely. For this reason, we present a system, termed Automatic Query Language (AQL), for migrating data from PostgreSQL to integrated HBase/Hive databases. In addition, we provide a platform that allows any user to query PostgreSQL, Hive, and HBase databases automatically using only SQL. Each query is directed to the component that performs better for that type of operation. With the completed platform, we were able to insert and select data in both the relational database and the big data components. Join operations were not a problem, because complex analytical queries were executed using Hive, which was integrated with HBase. Tests of the AQL system showed that HBase inserts data more efficiently than PostgreSQL and Hive, and that select queries in Hive perform better than in PostgreSQL for large data sizes, whereas PostgreSQL performs better for small data sizes.
Keywords: Automatic Query Language, Big data, HBase, HDFS, Hive, PostgreSQL, Relational Database, Sqoop
Pages: 16-27
DOI: 10.37394/23209.2023.20.3
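
The routing behaviour summarized in the abstract (inserts favour HBase, selects favour Hive for large data and PostgreSQL for small data, joins go to Hive) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name route_query, the row-count cutoff LARGE_TABLE_ROWS, and the fallback to PostgreSQL for other statements are all assumptions made for illustration, since the abstract does not state how AQL decides or where the boundary between small and large data lies.

```python
import re

# Hypothetical cutoff: the abstract gives no figure separating
# "small" from "big" data, so this value is purely illustrative.
LARGE_TABLE_ROWS = 1_000_000


def route_query(sql: str, table_rows: int,
                large_table_rows: int = LARGE_TABLE_ROWS) -> str:
    """Return the name of the backend assumed to suit this SQL statement.

    The rules mirror the performance findings reported in the abstract;
    they are a sketch, not the AQL system's actual decision logic.
    """
    statement = sql.strip().lower()

    # Inserts: the abstract reports HBase as the most efficient target.
    if statement.startswith("insert"):
        return "hbase"

    if statement.startswith("select"):
        # Joins: complex analytical queries were run through Hive.
        if re.search(r"\bjoin\b", statement):
            return "hive"
        # Plain selects: Hive for large data, PostgreSQL for small data.
        return "hive" if table_rows >= large_table_rows else "postgresql"

    # Assumption: anything else stays on the relational database.
    return "postgresql"


if __name__ == "__main__":
    print(route_query("INSERT INTO sales VALUES (1, 'a')", table_rows=500))
    print(route_query("SELECT * FROM sales", table_rows=500))
    print(route_query("SELECT * FROM sales", table_rows=5_000_000))
```

In practice the cutoff would have to be calibrated against measurements such as those reported in the paper, and the string returned here would select a real connection to the corresponding system.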