Abstract:
Poor design in software code significantly increases the total cost of
software development, because poorly designed code has poor structure, which
reduces its readability, understandability, and maintainability. Software
restructuring is thus a crucial activity
in software development. Cohesion is an important measure in assessing the
quality of software. The cohesion of a software module is the degree to which
module components belong together. Ill-structured code is characterized
by low cohesion. Software restructuring techniques based on hierarchical
agglomerative clustering (HAC) algorithms have been widely used to restructure
large modules with low cohesion into smaller modules with high cohesion.
These techniques generate clustering trees (or dendrograms) of the modules.
The clustering trees are then sliced at different cut-points to obtain the desired
restructurings. Choosing the appropriate cut-points is a difficult problem in
clustering. Previous HAC techniques exacerbate this problem because they
generate clustering trees with a large number of cut-points. Moreover, of the
clusters returned by those cut-points, only a few lead to a meaningful
restructuring.
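
To make the notions of clustering tree and cut-point concrete, the following is a minimal sketch of a generic single-linkage HAC over functions. It is not the technique proposed in this thesis. The function names, the sets of accessed variables, and the use of Jaccard similarity are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical input: each function mapped to the variables it accesses.
features = {
    "read_config": {"path", "config"},
    "parse_config": {"config", "options"},
    "draw_axis": {"canvas", "scale"},
    "draw_line": {"canvas", "scale", "points"},
}

def jaccard(a, b):
    """Jaccard similarity between two sets of accessed variables."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def single_linkage(features):
    """Build the clustering tree as a merge history.

    Returns a list of (similarity, cluster_a, cluster_b) tuples, one per
    merge, in the order the merges were performed.
    """
    def link(pair):
        # Single linkage: similarity of the closest pair of members.
        a, b = pair
        return max(jaccard(features[x], features[y]) for x in a for y in b)

    clusters = [frozenset([name]) for name in sorted(features)]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=link)
        merges.append((link((a, b)), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

def cut(features, merges, threshold):
    """Slice the tree: keep only merges with similarity >= threshold."""
    clusters = [frozenset([name]) for name in features]
    for similarity, a, b in merges:
        if similarity >= threshold:
            clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)

merges = single_linkage(features)
# Each distinct merge similarity is a candidate cut-point.
cut_points = sorted({similarity for similarity, _, _ in merges}, reverse=True)
for threshold in cut_points:
    print(round(threshold, 3), cut(features, merges, threshold))
```

Every merge level in the tree is a candidate cut-point, so even this four-function example offers three possible restructurings, and a tree built over a large module offers many more.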
In this thesis, we propose a new hierarchical clustering technique for restructuring
software at the function level. It generates clustering trees in which
the number of cut-points is reduced and the quality of the cut-points is improved.
To establish this, we compare the results of our technique with those of
four previous hierarchical clustering algorithms. We also develop an easy-to-use
software tool that allows the user to generate clustering trees of functions using
five different clustering algorithms, including the algorithm proposed in this
thesis. Finally, we characterize the clusters returned by cut-points in
the context of software restructuring.