an attempt to understand the quality of the software in terms of some quantitative change, namely the changes in source code. Again, the authors of [2] have discussed the effect of identifier names on the quality of the software. According to them, low-quality identifier names lower the quality of the software as a whole. A tool was used to extract identifier names from the source code of Java projects. They have also discussed which types of identifier names lead to which types of problems. Here again, changes in the code affect the software quality. The authors of [3] have explained the effect of code refactoring on the quality of the software. They have developed techniques to improve software quality by refactoring code and have tried to formalize the method of code refactoring in their paper. The authors of [4] have, in a similar manner, analyzed structural changes across software versions, the chief motive being to measure changes related to the structure of the source code. Although they make no attempt to relate changes in code structure to the quality of the software, it is quite evident that there may be a relation between the two, and this may constitute an interesting study. However, these works look at software projects only from a static point of view. It is necessary to look at these projects from a dynamic perspective, from the viewpoint of their ever-changing nature, of their coming into being and going out of being. This has been the primary focus of the present paper.
Several studies have tried to understand the features of open-source projects through parameters such as the number of active contributors, the use of different programming languages, the particular structure of the project, and many other parameters that the authors consider quantitative. Many such works are found in [5] and [6]. Again, these look largely at popularity. There must also be a judgement of quality, using suitable metrics based on the events of open-source projects.
There are other works, such as [7] and [8], that analyze the data in datasets related to GitHub repositories. In such studies, mainly the stars, forks, and issues are considered. Many have also included code and an outline of how the data was retrieved, that is, they have elaborated the mining methods in detail. One such study selected words at random from a given word list and passed them to a GitHub API, which returned a list of repositories; some of these were then selected at random and mined to extract their data.
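The sampling procedure is not reproduced in those works in runnable form; the following is a minimal sketch of such random repository sampling, assuming the public GitHub search API and a hypothetical word list file words.txt.

import random
import requests

# Minimal sketch of random repository sampling through the GitHub
# search API. The word list file and the sample size are assumptions
# for illustration, not the exact procedure used in [7] or [8].
with open("words.txt") as f:
    words = [line.strip() for line in f if line.strip()]

word = random.choice(words)
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": word},
    headers={"Accept": "application/vnd.github+json"},
)
resp.raise_for_status()

# The search endpoint returns matching repositories under "items";
# a few of them are picked at random for further mining.
repos = resp.json()["items"]
for repo in random.sample(repos, min(5, len(repos))):
    print(repo["full_name"], repo["stargazers_count"], repo["forks_count"])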
The study used a metric of popularity defined as popularity = stars + forks + pulls. The authors have tried to correlate the documentation of a project with this defined value of popularity. Though the method is not discussed in great detail, it does motivate the present study to think in similar terms.
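As a worked example, a repository with 120 stars, 40 forks, and 15 pull requests would score 120 + 40 + 15 = 175. Below is a minimal sketch of computing this score over mined repository records; the record layout and the example values are assumptions for illustration, not data from [7] or [8].

# Minimal sketch of the popularity score popularity = stars + forks +
# pulls applied to mined repository records. The record fields and
# the example values are illustrative assumptions.
def popularity(repo: dict) -> int:
    # Sum of stars, forks, and pull requests, as defined above.
    return repo["stars"] + repo["forks"] + repo["pulls"]

repos = [
    {"name": "example/repo-a", "stars": 120, "forks": 40, "pulls": 15},
    {"name": "example/repo-b", "stars": 30, "forks": 5, "pulls": 2},
]

# Rank the repositories by the popularity score, highest first.
for repo in sorted(repos, key=popularity, reverse=True):
    print(repo["name"], popularity(repo))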
In [9], the authors have adequately described that a large number of GitHub repositories are personal and not active. This may have a large effect on the conclusions that one may draw from a dataset of GitHub repositories. To show this, the authors analyzed parts of the GHTorrent datasets and sent surveys to GitHub users. They also highlighted the fact that a substantial number of projects had very few commits, so it might not be proper to jump to conclusions from the commit data of GitHub.
In [10], the authors have shown that the frequency of commits and the evolution of file versions in eight large GitHub projects have a certain degree of correlation. The projects discussed there are very successful. The study presented a picture of the number of commits and the number of lines of code changed in each file, along with a comparison between the number of commits and the file changes in different versions. All these works point towards the attempt to design a software quality vs. quantity model for understanding the relation between the two.
3 Methodology
Detailed mining has been done from a public dataset available on Google BigQuery. The total data processed in BigQuery amounted to about 43.2 TB. More than 170 queries were performed on the dataset to extract the data. Since the process is costly, it could be performed only once. The extracted data was cross-checked against ClickHouse [11], and all the data was tabulated.
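The individual queries are not reproduced here; the following is a minimal sketch of one such extraction step, assuming the public githubarchive dataset on BigQuery and the google-cloud-bigquery Python client with credentials already configured.

from google.cloud import bigquery

# Minimal sketch: count GH Archive events by type for one year.
# The table `githubarchive.year.2021` refers to the public GH Archive
# dataset on BigQuery; the query is illustrative, not one of the
# queries actually used in this study.
client = bigquery.Client()

sql = """
    SELECT type, COUNT(*) AS event_count
    FROM `githubarchive.year.2021`
    GROUP BY type
    ORDER BY event_count DESC
"""

for row in client.query(sql).result():
    print(row["type"], row["event_count"])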
It needs to be mentioned here that GH Archive has made available the data from GitHub for the last eleven years, that is, from 2011 to 2021. This is a detailed collection of several events. The events and their identifiers are as below:
1. CommitCommentEvent: triggered when a commit receives a comment
2. CreateEvent: triggered when a branch or a tag is created
3. DeleteEvent: triggered when a branch or a tag is deleted