High costs, lack of speed, non-intuitive interfaces, and the inefficient, fragmented display of patient information have hindered adoption of the Electronic Health Record (EHR, also EMR). Critical factors inhibiting adoption of the EMR include the time health care providers spend accessing and documenting patient information during clinical encounters. We describe an emerging visual analytics system that unifies all EMR information fragments, such as current symptoms, history of present illness, previous treatments, available data, current medications, past history, family history, and others, into a single interactive visual framework. Based on this information, the physician can then follow a medical diagnostics chain that includes requests for further data, diagnosis, treatment, follow-up, and eventually a report of treatment outcome. As patients often have complex medical histories, we believe that this visualization and visual analytics framework can offer large benefits for navigating and reasoning with this information.
This talk will outline some of the key challenges in analyzing large volumes of biomedical image data, using microscopy image analysis as an example. It will present potential approaches to address these challenges and will look at emerging computational architectures as well as software middleware support.
The Theoretical and Experimental Algorithmics Lab (TEALab) designs and implements space- and cache-efficient, high-performing parallel algorithms for bioinformatics. I will first talk about our results on dynamic programming (DP). DP is used extensively in biosequence analysis, for example in protein homology search, gene structure prediction, motif search, analysis of repetitive genomic elements, RNA secondary structure prediction, and interpretation of mass spectrometry data. Standard looping implementations of DP have many advantages, but suffer in performance from poor temporal cache locality. We design multithreaded recursive divide-and-conquer algorithms that take full advantage of the temporal locality inherent in DP recurrences without sacrificing the advantages of the looping code. The resulting divide-and-conquer implementations often run orders of magnitude faster than highly optimized parallel looping implementations. Next, I will give a glimpse of our results on the computation of energetics in protein structures. Energetics computation lies at the core of molecular dynamics and docking. We show that using space- and cache-efficient octree data structures to maintain neighborhood information, instead of traditional non-bonded lists, can lead to significant performance gains in such computations.
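To illustrate the idea behind the recursive divide-and-conquer approach (a minimal sketch, not TEALab's actual multithreaded implementation), the edit-distance DP table can be filled by recursively splitting it into quadrants and processing them in dependency order, so that each base-case tile fits in cache:

```python
def edit_distance(a, b, base=64):
    """Edit distance via recursive quadrant decomposition of the DP table.

    Quadrants are processed in dependency order (top-left, then top-right
    and bottom-left, then bottom-right), so every cell's neighbors are
    ready when it is computed. Tiles of side <= base fall back to the
    standard looping code.
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i          # cost of deleting i characters
    for j in range(m + 1):
        D[0][j] = j          # cost of inserting j characters

    def solve(i0, i1, j0, j1):
        # Fill rows i0..i1-1, columns j0..j1-1 (boundaries already filled).
        if i1 - i0 <= base or j1 - j0 <= base:
            for i in range(i0, i1):
                for j in range(j0, j1):
                    D[i][j] = min(D[i - 1][j] + 1,
                                  D[i][j - 1] + 1,
                                  D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
            return
        im, jm = (i0 + i1) // 2, (j0 + j1) // 2
        solve(i0, im, j0, jm)   # top-left first
        solve(i0, im, jm, j1)   # these two depend only on the top-left...
        solve(im, i1, j0, jm)   # ...and could run on parallel threads
        solve(im, i1, jm, j1)   # bottom-right depends on both

    if n and m:
        solve(1, n + 1, 1, m + 1)
    return D[n][m]
```

The recursion touches each tile's working set once while it is hot in cache, which is the locality advantage the looping code lacks.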
We investigate computational and mechanism design aspects of the pricing, management, and allocation of medical resources in healthcare. In this talk I shall demonstrate our results on allocating medical treatments at hospitals of differing costs to patients who each value these hospitals differently, when the insurer’s budget constraint has to be met and waiting times are used to ration access to over-demanded hospitals. I shall also discuss potential policy implications of our results, open problems, and follow-up directions.
Tremendous advances have been made in reducing the cost of DNA synthesis. We are entering an age of synthetic biology, where we can design and synthesize new life forms for scientific and medical applications.
Our gene design algorithms optimize the DNA sequence of a gene for particular desired properties while coding for a particular protein. For vaccine design, we optimize the codon-pair bias of a sequence to modulate expression. We have also developed sequence design algorithms to optimize RNA secondary structure and the autocorrelation of tRNA usage. Experiments with our sequence designs are improving understanding of the mechanisms behind transcription and translation.
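A minimal sketch of the core constraint in gene design, namely optimizing a sequence while still coding for the same protein: each amino acid has several synonymous codons, and the designer chooses among them. The frequency table below is a hypothetical toy, not our actual codon-pair bias data:

```python
# Hypothetical per-amino-acid codon frequencies (toy values for illustration).
CODON_FREQ = {
    'M': {'ATG': 1.00},
    'K': {'AAA': 0.58, 'AAG': 0.42},
    'F': {'TTT': 0.55, 'TTC': 0.45},
    'L': {'CTG': 0.40, 'CTC': 0.20, 'TTA': 0.14, 'TTG': 0.13,
          'CTT': 0.12, 'CTA': 0.01},
}

def design_gene(protein):
    """Pick the most frequent synonymous codon for each amino acid.

    Any choice from CODON_FREQ[aa] encodes the same protein, so a real
    gene designer can instead optimize codon-pair bias, RNA secondary
    structure, or tRNA-usage autocorrelation over this combinatorial
    space of synonymous sequences.
    """
    return ''.join(max(CODON_FREQ[aa], key=CODON_FREQ[aa].get)
                   for aa in protein)
```

Real designs score adjacent codon pairs rather than single codons, but the degrees of freedom are the same.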
Addressing dynamic streams of events in real-time applications and the Semantic Web, and developing tools for implementing streaming applications, has recently become an important area of research. We developed ETALIS, an open-source system for complex event and stream processing with two accompanying languages: the ETALIS rule-based language for event patterns and the EP-SPARQL query language, both based on a declarative semantics grounded in logic programming. Using ETALIS, we have developed applications in transportation, sensor data streams from solar power plants, social networks, medical diagnosis, stock market applications, and robotics.
Knowledge representation and reasoning (KRR) is a branch of Artificial Intelligence that develops techniques for capturing complex informational relationships in a way that enables drawing non-trivial conclusions from the available information. One can think of SQL databases (or of RDF+SPARQL) as primitive KRR formalisms. More advanced knowledge representation provides the means to represent knowledge via logical formulas and logical inference. For instance, the well-known ontology language OWL is more expressive than RDF in many ways, but is still a severely limited KRR language. One problem with OWL is the often-criticized trade-off between computational complexity and (the lack of) expressivity. However, a much more serious barrier to the adoption of advanced KRR is the fact that logical formulas are notoriously hard for humans to author without mistakes, which makes reasoning systems fragile. Our own work is in the area of rule-based KRR, which is a much more expressive and practically useful KRR paradigm than OWL. In rule-based KRR, knowledge is represented using rules, i.e., "if-then" statements. We have developed several award-winning languages for representing rule-based knowledge at a high level that is more suitable for use by knowledge engineers. Much of this work has been incorporated into our popular Flora-2 system, which has been adopted by a number of research groups worldwide.
Applications include ontology management, integration of information, security policy analysis, financial regulations, process representation, healthcare, intelligent agents, knowledge-based networking, and more.
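The "if-then" rules at the heart of rule-based KRR can be sketched with a naive forward-chaining evaluator (a toy in the spirit of propositional Datalog, not Flora-2's actual syntax or semantics; the sample rules are invented):

```python
def forward_chain(facts, rules):
    """Apply 'if body then head' rules until no new facts are derived.

    facts: iterable of atoms (strings); rules: list of (body, head)
    pairs, where body is a tuple of atoms that must all hold.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and set(body) <= known:
                known.add(head)      # fire the rule
                changed = True
    return known

# Hypothetical healthcare-flavored rules for illustration.
rules = [
    (("fever", "cough"), "flu_suspected"),
    (("flu_suspected", "high_risk"), "order_test"),
]
```

The fixed-point loop is the essence of rule-based inference; production systems add variables, negation, and efficient indexing on top of it.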
In this brief talk I will share some of my experiences building clinical informatics systems in collaboration with clinicians and health care administrators at Stony Brook Medicine. It has been rewarding to take an idea from the drawing board all the way to its implementation and deployment and witness firsthand its impact on the practice of health care management and delivery. But on the flip side, this journey has been far from smooth. Difficulties in getting access to patient data, dealing with a myriad of “siloed” data sources, and the absence of any documentation are just a few of the pitfalls one has to face. By highlighting both the exciting opportunities in clinical informatics for computer scientists and the pitfalls, this talk hopes to encourage discussion on how to create a frictionless environment for R&D in clinical informatics at Stony Brook Medicine.
Multivariate analysis of functional Magnetic Resonance Imaging (fMRI) data is being widely used to gain insights into brain function and dysfunction. In this talk I will present classification of brain function as a first step toward answering basic questions about the mental state and clinical evaluation of the subject. However, fMRI data allows us to explore brain function further and discover probabilistic networks that reveal the mechanisms underlying mental states or clinical conditions. Learning such networks would require prohibitive amounts of fMRI data to train general algorithms, hence we need to impose appropriate prior constraints both on feature detection and on network structure. Apart from accuracy and efficiency, two other important constraints pertain to the learning of functional networks: the stability of the learned network across subjects and the interpretability of the results. I will present a number of such constraints, discuss their applicability to a number of fMRI datasets, and show results for populations of intense interest, such as patients with autism and drug addiction.
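One common way to impose a sparsity prior on network structure (a simplified sketch, not the specific constraints discussed in the talk) is to build a functional network from pairwise correlations of regional time series and keep only the strongest edges:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def sparse_network(series, k):
    """Keep the k edges with the largest |correlation| between regions.

    series: dict region -> BOLD time series. Returns a list of
    (region_a, region_b, r) edges; the hard edge budget k acts as a
    crude sparsity prior on the learned network structure.
    """
    regions = sorted(series)
    edges = [(a, b, pearson(series[a], series[b]))
             for i, a in enumerate(regions) for b in regions[i + 1:]]
    edges.sort(key=lambda e: abs(e[2]), reverse=True)
    return edges[:k]
```

Stability across subjects can then be assessed by how often the same edges survive the threshold in different subjects' networks.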
The Department of Psychiatry and the Center for Understanding Biology through Imaging Technologies (CUBIT) have several ongoing studies with multimodal image acquisition and analysis. We are the data center for a multisite, multimodal study called EMBARC. We have data from multiple magnetic resonance imaging (MRI) modalities, electroencephalograms (EEG), behavioral data, and clinical data. In addition to EMBARC, our lab has both ongoing imaging studies and a repository of neuroimages from completed studies. Ongoing imaging studies include multi-sequence MRI images of patients with multiple sclerosis (new study) as well as adolescents at risk of developing depression (>150 subjects imaged to date). Our current repository includes >1000 MRI images (containing structural, diffusion, and functional sequences) as well as Positron Emission Tomography (PET) images from a number of different tracers: ~300 [11C]DASB, serotonin transporter; ~300 [11C]WAY, serotonin 1A receptor; ~300 [18F]FDG, brain metabolism; ~200 [11C]PIB, beta-amyloid plaques; ~100 [11C]ABP, metabotropic glutamate receptor subtype 5; ~50 [11C]PE2I, dopamine transporter; ~50 [11C]CUMI, serotonin 1A receptor (agonist); ~15 [11C]Clorgiline, MAO-A; and ~10 [11C]Harmine, MAO-A. In all cases, thorough clinical and demographic data were collected. Our group is currently developing supervised and unsupervised techniques to extract the most meaningful combination of data (imaging and other) relating to clinical diagnosis or trajectory from these studies. This is accomplished using computer vision and machine learning algorithms.
The human brain and the neuronal networks comprising it are of immense interest to the scientific community. In this work, we focus on the structural connectivity of human brains, investigating sex differences across male and female connectomes (brain-graphs) for the knowledge discovery problem: which brain regions exhibit differences in connectivity across the two sexes? One of our main findings discloses a statistical difference between the sexes at the pars orbitalis of the connectome, a region that has been shown to function in language production. Moreover, we use these discriminative regions for the related learning problem: can we classify a given human connectome as belonging to one of the sexes just by analyzing its connectivity structure? We show that we can learn decision tree as well as support vector machine classification models for this task. Our models achieve up to 79% prediction accuracy with only a handful of brain regions as discriminating factors. Importantly, our results are consistent across two data sets, collected at two different centers, with two different scanning sequences, and from two different age groups (children and elderly). This is highly suggestive that we have discovered scientifically meaningful sex differences.
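As a toy version of this classification task (a single-region decision stump rather than the full decision tree or SVM models, and with made-up connectivity values), one can search for the region and threshold that best separate the two classes:

```python
def train_stump(X, y):
    """Return (feature, threshold, polarity) minimizing training errors.

    X: list of feature vectors (e.g., per-region connectivity degrees);
    y: list of 0/1 labels (e.g., 0 = female, 1 = male). A decision tree
    stacks such splits; an SVM learns a weighted combination instead.
    """
    best, best_errs = None, float('inf')
    for f in range(len(X[0])):                 # each candidate region
        for t in sorted({x[f] for x in X}):    # each candidate threshold
            for pol in (0, 1):                 # which side is class 1
                errs = sum(int((x[f] >= t) == bool(pol)) != lbl
                           for x, lbl in zip(X, y))
                if errs < best_errs:
                    best, best_errs = (f, t, pol), errs
    return best

def predict(stump, x):
    f, t, pol = stump
    return int((x[f] >= t) == bool(pol))
```

The winning feature index plays the role of a discriminative brain region; with only a handful of such regions, interpretable models become possible.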
Language is a window into the mind. It is not just what we write, but how we write that reveals a lot about our personal traits, socio-cognitive identities, concealed intentions such as deception, nuanced connotation, and even literary success. Understanding language is already hard enough, hence understanding minds through language may seem even harder. In this talk, I argue that data-driven analysis of writing style, i.e., how we write, can be surprisingly powerful in perceiving the cognitive context of writing beyond the explicit and literal content of text. What's more, computers can at times substantially outperform humans in perceiving people's minds (e.g., deception), despite lacking full-blown understanding of the underlying semantics and human-like common-sense knowledge. I will briefly highlight two of the most unconventional tasks in this problem space: deception detection in online review communities, and predicting the success of literary works.
This research leverages sonification to identify brain activity in drug addicts in order to understand the underlying neurobiological mechanisms of addiction. Using ambisonics, we can create a three-dimensional audio model of the brain and present fMRI data through spatialized sound. Brain-scan data is inherently very noisy; we are more likely to identify patterns over time through the sense of hearing than through sight alone. Ongoing issues in this technique are the volume of data combined with the complexity of spatializing multiple sound points.
Parkinson's Disease (PD) is a progressive nervous system disorder characterized by slow movement, rigidity, tremor, and postural instability. Because of degeneration of specific areas of the brain, individuals with PD have difficulty using internally driven cues to initiate and drive movement. However, many studies have demonstrated that persons with PD can achieve almost normal movement patterns when provided with external cues. In our current work we aim to develop a paradigm where spatiotemporal aspects of gait are analyzed through gesture capture (using the Kinect sensor) and presented sonically, in such a way that abnormalities can be clearly recognized by both clinicians and patients. Our ultimate goal is to use this information in a biofeedback system so that individuals with PD can use external sound cues to self-correct impaired gait patterns.
Mental illness affects one-quarter of the US population each year, and about 20 million people have serious mental illness (SMI). In addition to the psychological toll of their illness, those with SMI tend to have multiple physical problems. In fact, the average lifespan for someone with SMI is as much as 25 years less than the typical lifespan of the general population, with most of this decrease due to non-psychiatric illness. Medications can help in treating physical and mental disorders but can have side effects that worsen physical health. With complicated medication regimens, there are many opportunities for drug interactions. Treatment adherence is another challenge. Data from electronic record systems provide a way to analyze prescribing trends across the organization as well as to examine rates of adverse effects and the emergence of physical disorders with particular medications. Analysis of potential drug interactions, combined with analysis of medication decision support alert overrides, can give insights into methods for improving decision support. At the individual patient level, analysis of drug regimens could guide development of personalized recommendations for optimizing treatment by minimizing the number of medications, eliminating high-risk medications where possible, reducing the number of times a day that pills are taken, and decreasing the potential for adverse effects of medications. Such an approach would be expected to have many benefits, including reducing costs (for patients and for the health care system), reducing physical complications of psychotropic medications, and enhancing the quality of life of those with mental illness.
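The pairwise drug interaction analysis described above can be sketched as a lookup over all medication pairs in a regimen (the interaction table here is hypothetical illustration data, not clinical guidance):

```python
from itertools import combinations

# Hypothetical interaction table for illustration only -- not clinical data.
INTERACTIONS = {
    frozenset({"lithium", "ibuprofen"}): "NSAIDs can raise lithium levels",
    frozenset({"clozapine", "fluvoxamine"}): "fluvoxamine raises clozapine levels",
}

def check_regimen(meds):
    """Return interaction warnings triggered by a medication list.

    Checks every unordered pair of medications against the table; a real
    decision-support system would also weigh severity and override history.
    """
    return [(a, b, INTERACTIONS[frozenset({a, b})])
            for a, b in combinations(sorted(meds), 2)
            if frozenset({a, b}) in INTERACTIONS]
```

Aggregating such per-patient checks across an organization's EHR data is what enables the prescribing-trend and alert-override analyses mentioned above.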
Two Medicare data sets based on FOIA requests have recently been made available to the public. The first is a teaming graph that reveals which health care providers share Medicare patients within a 30-day window.
This graph has nearly 1 million nodes and 50 million directed, weighted edges. The second data set contains the medications prescribed by all Medicare providers that are filled at least 11 times. Both data sets pose computational challenges in applying algorithms due to their large size. By subsetting the larger data sets with the aid of an outside database, meaningful conclusions can be drawn about health care delivery in Medicare.
Due to recent advances in high-throughput experimentation, the exponential growth of biological data outpaces our ability to process it. We desperately need predictive models to convert all this raw data into a quantitative understanding of how biological organisms function and evolve. I will briefly describe a few quantitative models of complex biomolecular networks and dynamical processes operating on those networks that were developed in my lab. I will then introduce the Systems Biology Knowledgebase (KBase) project, funded by the Department of Energy and co-led by me together with three other co-PIs.
KBase (www.kbase.us) is an open source software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions; perform large-scale analyses on our scalable computing infrastructure; and model interactions in microbes, plants, and their communities.
Stony Brook University and Brookhaven National Laboratory joint ventures in computational and data-intensive science.
Management and coordination of clinical research studies involving large numbers of participants and multiple clinical and resource centers require different, yet integrated, strategies for study coordination and monitoring, participant tracking, data capture, report generation, and data analysis at the participant, site, and study-wide levels. Current approaches include “home-built”, open source, and commercially available systems, each of which has advantages and disadvantages and emphasizes different functions. Developing an integrated system that addresses the various study needs requires a multidisciplinary team with clinical, methodological, biostatistical, and systems design and development expertise. The unique needs of Coordinating Centers are often not met by available systems. Preventive Medicine has served as a Coordinating Center for NIH-funded multicenter international clinical research studies for the past 25 years. Our management systems have been developed in house or customized from a commercially available system to accommodate Coordinating Center activities. Currently, a framework for study coordination and management for new projects at the Coordinating Center level does not exist at Stony Brook. This need provides an opportunity to create modules for a commercially available or open source system, or to develop a new system for Stony Brook study coordination and management involving a faculty–student model.
In today’s research environment, investigators are faced with collecting research data from different sources, including paper, electronic data capture, imaging, medical databases (e.g., EMRs), and digital monitoring systems. There is a myriad of data types that need to be captured in a timely manner in an electronic format. These data need to be stored, tracked, and analyzed across data types, study participants, and study visits while ensuring completeness and accuracy. Data from many patient monitoring systems, e.g., blood pressure monitors, are manually entered into an EMR, with continuous data often lost before being electronically stored for research purposes. Electronic data are often downloaded and stored as blobs in the research record in an unanalyzable format. Interfaces are needed to transform these data for integration into the EMR or a clinical research management system for data analysis. The opportunity exists to redefine how researchers capture data using more interactive methods that engage both the researcher and the participant in the data collection process.