George Hripcsak, MD, MS

Professor of Biomedical Informatics
Chair, Department of Biomedical Informatics, Columbia University

Research

My research focuses on understanding and using the clinical information stored in the electronic medical record. This theme has several components:

1. Data mining and knowledge discovery. Machine learning and visualization are examples of techniques to uncover knowledge from vast clinical databases. My work focuses on testing and extending existing discovery methods to improve their performance on clinical databases. Important issues include training set size, data accuracy, data completeness, and representation (e.g., how to accommodate diagnostic data, which is nominal with many categories).

2. Natural language processing. In most institutions, the vast majority of the richly detailed clinical information is stored as narrative text, which is not generally amenable to automated analysis. Natural language processing can parse the narrative text, converting it to a structured and coded format. At present, natural language processors can do an excellent job in domains such as radiology, which have fairly focused language. In broader domains such as admission notes, natural language processing can do very well if the problem is known ahead of time and the processor can be tailored to the task.

3. Knowledge and data representation. With the advent of natural language processing and the improvement in the direct collection of structured data, we are overrun with complex coded information. Methods are needed to organize the information for visualization (so human beings can understand it) and analysis (so data mining tools can derive useful knowledge). It has been shown, for example, that the representation of the training set is more important to machine learning accuracy than the particular choice of learning algorithm.

4. Evaluation methodology. The complexity of clinical data, the presence of inaccurate and missing values, and the large but heterogeneous collection of patients conspire to make it difficult to draw conclusions using traditional statistical methods. Bias that would not affect a traditional randomized trial can overwhelm the true effect in a retrospective study of the electronic medical record.

5. Clinical demonstration. Demonstrating the usefulness of the above methods is critical to gathering support and to focusing new work on important areas. The methods can be applied to clinical research (largely hypothesis refinement) and clinical care (by generating timely advice and monitoring patient safety).

The above work is carried out within Columbia University’s Data Mining Group, which includes faculty and students from several departments.

In a separate area of research, I have focused on the use of new technologies, such as wireless networks and handheld computers, to improve communication among health care participants. Examples include community health information networks, portable computers for providers, home monitoring, and wearable computers for patients.

A number of research projects are available to Columbia University students, including:

·         Assessing the suitability of new data mining techniques for clinical data

·         Issues in the use of machine learning training sets containing clinical data (accuracy, completeness, size)

·         Issues in the representation of clinical data for data mining (complexity, nesting, etc.)

·         Formal models of data accuracy

·         Issues in de-identifying and scrubbing patient data for clinical research

·         Evaluation methodology

o        Application of reliability theory to the Delphi technique and to binomial models

o        Use of the bootstrap to assess variability (e.g., in critical incident technique)

o        Analysis of clustered data

o        Characterizing performance (ROC curve; kappa and prevalence)

o        Sample size analysis

·         Use of admit diagnosis to predict the patient state

·         Formal characterization of diagnostic uncertainty

·         Mapping clinical states to practice guidelines

·         Use of data mining in patient safety research (medical errors)

·         Linking of the clinical database to genome knowledge bases and databases

·         Use of data mining to assess the breadth of residency training

·         Use of data mining to study community-acquired pneumonia

·         Enhancing communication through wearable computers for patients and providers

Teaching

·         G4003 Theory and Methods in Biomedical Informatics (lecturer 2005- )

·         G4060 Evaluation Methods in Medical Informatics (1997-2004)

·         Research Elective in Medical Informatics (1995-1998)

·         G4001 Introduction to computer applications in health care and biomedicine (formerly W4501) (1993-1995)

o        Online lecture notes (no longer maintained)

Service

I designed and manage WebCIS, the Web-based clinical information system for the Columbia University Medical Center and NewYork-Presbyterian Hospital’s Columbia-Presbyterian campus. It is used by over 7,000 health care providers to access and enter data for 2,500,000 patients and contains data collected since 1979.

·         WebCIS Web-based clinical information system

Further information

My publications

PubMed search

Curriculum vitae

Contact:

George Hripcsak, MD, MS
622 West 168th Street, VC-5
New York, NY 10032

Email:

hripcsak@columbia.edu

Department of Biomedical Informatics, Columbia University