My primary research interests are in data mining, machine learning, and other statistical pattern classification methods for real-world, high-dimensional, large-scale data. My research career began with the development of sequential methods for use in bioinformatics. There is a vast amount of biological data available in the form of DNA and protein sequences that have little or no annotation. My research has focused on methods that can operate on large-scale sequential datasets to aid in understanding these data. Specifically, I have developed methods that can be used to annotate and categorize protein sequence data, as well as to discover discriminatory features and anomalies within the sequences. There is a significant need for such methods in bioinformatics because so much of the available protein sequence data lacks any functional characterization. My methods can help improve the understanding of the role of proteins in biological and biomedical research, yielding information about biological functions in species that genomic data alone cannot determine. A better functional understanding of human proteins, in turn, helps improve drug targeting in biomedical research. My immediate aim is to continue investigating methods that can operate effectively on large-scale genomic and proteomic sequence data, to help researchers understand other functional and structural characteristics of these data. My long-term aim is to carry this research beyond the biological domain into other areas that face challenges in understanding large amounts of sequential data, including text classification, handwriting recognition, atmospheric data, eye-tracking data, climate change, and stock market analysis, to name a few.
My secondary research interests stem from my experience as a senior software engineer. Before returning to academia, I was employed by an environmental instrumentation manufacturer for 11 years, where I designed and developed software for embedded and desktop platforms. During these years, I learned firsthand about the need for structured software engineering methodologies. I also learned that the methods available at that time were challenging to carry out in practice, particularly for small firms with rapid research and development cycles. While academia has given us a wide variety of disciplined approaches to software development that look good on paper, management often deems them too costly, resulting in “just-in-time” software that is riddled with “features” (bugs). Complicating matters is the fact that most Computer Aided Software Engineering (CASE) tools are developed with the desktop or server application developer in mind, where it is acceptable to release software that uses exorbitant amounts of computational resources. Yet these tools neglect a large number of embedded software engineers, who usually have limited resources available for their applications. As time permits, I would like to draw on my practical experience in software engineering (SE) to evaluate the current state of SE methodologies and identify areas of opportunity for the industry, particularly for those working with limited computational resources.