Abstract:
The Health Level Seven (HL7) organization has developed a powerful abstract
model of patient care called the Reference Information Model (RIM), which is intended
to serve as a unified framework for the integration and sharing of information and the
use of data across different healthcare domains. Healthcare data pose a number of
exciting research challenges that distinguish them from data in other industries: data
sparseness, high dimensionality, schema change, continuously valued data, complex data
modeling features and performance. Entity Attribute Value (EAV) is a widely used
solution to these challenges, but it is not a search-efficient data model for knowledge
discovery. This thesis presents two search-efficient open schema data models, Optimized
Entity Attribute Value (OEAV) and Positional Bitmap Approach (PBA), as alternatives to
the widely used EAV data model for handling the data sparseness, schema change and high
dimensionality of medical data. It has been shown both analytically and experimentally
that the proposed open schema data models are dramatically more efficient than EAV in
knowledge discovery operations and occupy less storage space.
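The search-efficiency problem with EAV can be illustrated with a small sketch. The code below is a hypothetical illustration, not the OEAV or PBA designs from this thesis (the abstract does not specify their internals). It stores sparse patient records as plain EAV triples, where answering a per-attribute query means scanning every triple, and contrasts that with a generic per-attribute layout in which a bitmap marks which entities have a value and the present values are stored densely in entity order. The record contents, attribute names and function names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse patient records: each patient has only a few attributes.
Record = Dict[str, float]
Triple = Tuple[int, str, float]


def to_eav(records: Dict[int, Record]) -> List[Triple]:
    """Flatten records into (entity, attribute, value) triples.
    Absent attributes cost no storage, which is why EAV suits sparse data."""
    return [(eid, attr, val) for eid, rec in records.items() for attr, val in rec.items()]


def eav_values(triples: List[Triple], attribute: str) -> List[Tuple[int, float]]:
    """Plain EAV with no index: every triple is scanned to answer one attribute query."""
    return [(e, v) for e, a, v in triples if a == attribute]


def to_bitmaps(records: Dict[int, Record], entities: List[int]):
    """Illustrative per-attribute layout (an assumption, not the thesis's PBA):
    bit i of an attribute's bitmap is set when entities[i] has a value for it,
    and the present values are kept densely in the same entity order."""
    bitmaps: Dict[str, int] = {}
    values: Dict[str, List[float]] = {}
    for pos, eid in enumerate(entities):
        for attr, val in records[eid].items():
            bitmaps[attr] = bitmaps.get(attr, 0) | (1 << pos)
            values.setdefault(attr, []).append(val)
    return bitmaps, values


def bitmap_values(bitmaps, values, entities, attribute) -> List[Tuple[int, float]]:
    """A per-attribute query touches only that attribute's bitmap and value list."""
    bits = bitmaps.get(attribute, 0)
    positions = [i for i in range(len(entities)) if (bits >> i) & 1]
    # Values were appended in ascending position order, so they line up with positions.
    return [(entities[i], v) for i, v in zip(positions, values.get(attribute, []))]
```

For example, with records `{101: {"glucose": 5.4}, 102: {"hba1c": 6.1, "glucose": 7.2}, 103: {}}`, both `eav_values(..., "glucose")` and `bitmap_values(..., "glucose")` return `[(101, 5.4), (102, 7.2)]`, but only the EAV version has to inspect the `hba1c` triple to get there.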
We have transformed HL7 RIM healthcare data into the EAV, OEAV and PBA data models.
New data mining algorithms have been proposed to discover knowledge from healthcare
data stored in these models, and we have applied them to each model. We have evaluated
the performance of the proposed algorithms experimentally using synthetic datasets.
The experimental results show that, for all the newly developed data mining algorithms,
the OEAV data model outperforms the others. PBA comes next and performs better than EAV,
on which these algorithms run slowest.