In an increasing number of scientific domains, advanced sensor technologies and complex simulations produce collections of large, low-dimensional spatio-temporal datasets. Data elements in these datasets are defined at points in a 2D or 3D coordinate system and over time. To fully exploit the potential of spatio-temporal sensor datasets, researchers need to be able to synthesize information at interactive rates, generate data products in batch mode, explore and classify features computed from different datasets, and assemble a rich view of the phenomenon being studied. The sheer size of these datasets and the processing demands of the analyses call for new capabilities, techniques, and runtime optimizations to efficiently manage, process, and query information both from data at rest (data stored in the system) and from data in transit from sensors. The primary objective of BMI research in this area is to develop and evaluate a suite of novel data and processing abstractions and optimizations within an integrated framework that enables analysis of extremely large low-dimensional spatio-temporal data for scientific, biomedical, and clinical research.
The methods and software systems developed in this research will support novel data representations and runtime optimizations that 1) ingest and manage large volumes of diverse datasets, 2) stage datasets using resources in the data path, such as clusters and GPU accelerators, and 3) rapidly process datasets using a repertoire of analysis operations. The research will also investigate the interplay among spatio-temporal data analysis applications, middleware software, and hardware configurations, targeting high-end parallel machines and clusters containing hybrid multicore CPU-GPU nodes; extreme-scale machines consisting of hundreds of thousands of CPU cores; and machines with deep memory and storage hierarchies linked to the machines that acquire and aggregate data.
People
Joel Saltz
Tahsin Kurc
Wei Zhu
Allen Tannenbaum
I.V. Ramakrishnan
Erich Bremer
Janos Hajagos
Projects
Informatics for Integrative Brain Tumor Whole Slide Analysis - funded by the National Library of Medicine, this project develops, deploys, and evaluates methodologies, information models, tools, and analytic pipelines that make it feasible to systematically carry out large-scale comparative analyses of brain tumor histological features using whole slide images, together with analyses of protein and gene expression patterns. The research and development effort involves (1) highly optimized algorithms and analytic pipelines for these comparative analyses, (2) flexible information models to manage the information associated with the analysis of brain tumor whole virtual slide data, and (3) runtime systems that take advantage of high performance computing platforms to scale image analyses to large datasets. The methods and tools will be used to determine the relationship between image-based tumor signatures and clinical outcome, gene expression category, genetic gains/losses, and methylation status, and to map the activity of signal transduction pathways and transcriptional networks relative to the tumor microenvironment using quantitative multiplex quantum dot immunohistochemistry and histology feature descriptions. This project is a collaborative effort between Stony Brook University (Joel Saltz, Tahsin Kurc), Emory University (Daniel Brat, Lee Cooper, David Gutman, Fusheng Wang, Jun Kong, Roberd Bostick, Carlos Moreno), and Rutgers Cancer Institute of New Jersey (David J. Foran).
Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray - funded by the National Institutes of Health, this project designs, implements, and evaluates (1) a new family of multi-stage search algorithms to facilitate quick, reliable interrogation of large-scale clinical and research microscopy applications, including whole-slide imaging and tissue microarrays; (2) high-throughput services capable of automatically detecting, archiving, and indexing user-specified objects (e.g., tissues, cells) in large collections of images, together with extensions to the data models and support for optimized pipeline selection; and (3) optimized imaging, computational, and content-based image retrieval algorithms and tools, evaluated on a wide range of tissues, cancer types, and biomarkers, to support clinical and research studies involving patient stratification, quality control, and outcomes assessment. These capabilities will be deployed as analytical tools, data models, user-centered interfaces, and reference libraries of imaged specimens, making them available to the clinical and research communities to support the future development and testing of new hypotheses, algorithms, and methods. This is a collaborative project between Rutgers Cancer Institute of New Jersey (David J. Foran), Stony Brook University (Joel Saltz, Tahsin Kurc), and University of Kentucky (Lin Yang).
High Throughput Analysis of Whole Slide Tissue Images on Hybrid CPU-GPU Systems - This project investigates data structures, data and programming abstractions, and runtime middleware to enable processing of large numbers of microscopy images on emerging hybrid computational clusters equipped with multi-core CPUs and co-processors (GPUs, MICs, etc.). It develops (1) runtime task scheduling techniques that map pipelines of image processing operations onto collections of CPU cores and co-processors, taking into account how the performance of each operation varies across co-processor types, (2) data abstractions that hide the complexity of managing and staging common data types, minimizing data management overheads in large-scale analyses, (3) optimized implementations of common data and computation patterns on co-processor architectures, and (4) high performance I/O capabilities built on scalable I/O systems such as ADIOS to reduce I/O overheads on large-scale computing clusters. This is a joint effort between Stony Brook University (Joel Saltz, Tahsin Kurc), University of Brasilia (George Teodoro), Emory University (Tony Pan), and the Oak Ridge National Laboratory (Scott Klasky).
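One way to take per-operation performance variability into account is to order ready tasks by their expected co-processor speedup. The following is a minimal sketch of such a speedup-ordered policy, not the project's middleware: it assumes each operation's CPU runtime and GPU speedup are known estimates (the `Task` fields, operation names, and numbers are hypothetical), and lets a free GPU take the ready task with the highest estimated speedup while a free CPU core takes the one with the lowest.

```python
import bisect
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpu_time: float     # hypothetical estimate: runtime on one CPU core, seconds
    gpu_speedup: float  # hypothetical estimate: GPU speedup over one CPU core


class SpeedupOrderedQueue:
    """Ready tasks kept sorted in ascending order of estimated GPU speedup."""

    def __init__(self):
        self._keys = []
        self._tasks = []

    def push(self, task):
        i = bisect.bisect(self._keys, task.gpu_speedup)
        self._keys.insert(i, task.gpu_speedup)
        self._tasks.insert(i, task)

    def pop_for(self, device):
        # A free GPU takes the task that benefits most from it;
        # a free CPU core takes the task that benefits least.
        idx = -1 if device == "gpu" else 0
        self._keys.pop(idx)
        return self._tasks.pop(idx)

    def __len__(self):
        return len(self._tasks)


def schedule(tasks, n_cpu, n_gpu):
    """Greedy list scheduling: the worker that becomes free first pulls the next task.

    Requires at least one worker. Returns (task name -> worker id, makespan).
    """
    queue = SpeedupOrderedQueue()
    for t in tasks:
        queue.push(t)
    # Heap entries: (time the worker becomes free, worker id, device type).
    workers = [(0.0, f"cpu{i}", "cpu") for i in range(n_cpu)]
    workers += [(0.0, f"gpu{i}", "gpu") for i in range(n_gpu)]
    heapq.heapify(workers)
    assignment, makespan = {}, 0.0
    while queue:
        free_at, wid, device = heapq.heappop(workers)
        task = queue.pop_for(device)
        if device == "gpu":
            duration = task.cpu_time / task.gpu_speedup
        else:
            duration = task.cpu_time
        finish = free_at + duration
        assignment[task.name] = wid
        makespan = max(makespan, finish)
        heapq.heappush(workers, (finish, wid, device))
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical pipeline stages with made-up runtime and speedup estimates.
    ops = [Task("op_a", 10.0, 1.5), Task("op_b", 10.0, 2.0),
           Task("op_c", 10.0, 8.0), Task("op_d", 10.0, 12.0)]
    print(schedule(ops, n_cpu=1, n_gpu=1))
```

Ordering the queue by estimated speedup rather than by arrival order keeps the operations that gain little from a GPU on CPU cores and leaves the GPU for the operations that gain the most. A real runtime would also have to handle data dependencies between pipeline stages and data transfer costs, which this sketch ignores.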
Consensus Clustering on a Shared-Memory System for Image Analysis - supported in part by grants from the National Library of Medicine and the University of Tennessee’s Center for Remote Data Analysis and Visualization, funded by the National Science Foundation, this project has investigated methods for efficiently implementing a consensus clustering method on large-scale shared-memory systems. The objective is to enable robust clustering of large numbers of nuclei and cells segmented in a set of images: the number of nuclei and cells in a dataset with hundreds of images can reach millions, requiring large amounts of memory and computing power. The project has employed existing parallel k-means clustering algorithms and developed shared-memory implementations of the consensus matrix construction and clustering steps. This is a joint effort between Stony Brook University (Tahsin Kurc, Joel Saltz), Emory University (Lee Cooper, Michael Nalisnik), and the Oak Ridge National Laboratory (Scott Klasky).
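The two steps named above can be sketched on a toy scale. This is a minimal sequential illustration, not the project's shared-memory implementation: it runs a plain Lloyd's k-means several times with different random seeds, records in a consensus matrix the fraction of runs in which each pair of objects (for example, per-nucleus feature vectors) fell into the same cluster, and then, as a stand-in for the final clustering step, groups objects whose pairwise consensus is at least a threshold. The function names, the seeding scheme, and the 0.5 threshold are illustrative assumptions. The matrix is dense and n × n, which is where the memory demand comes from: one million nuclei give 10^12 entries. The k-means runs and the rows of the matrix are independent of each other, which is what a shared-memory implementation can spread across cores.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def consensus_matrix(points, k, runs):
    """Entry (i, j) = fraction of k-means runs that put points i and j together."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for r in range(runs):
        labels = kmeans(points, k, seed=r)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph linking pairs with consensus >= threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Two hypothetical, well-separated groups of 2-D feature vectors.
    pts = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (20.0, 10.0), (21.0, 10.5), (22.0, 11.0)]
    print(consensus_clusters(consensus_matrix(pts, k=2, runs=10)))
```

Because the consensus matrix only records co-membership, the arbitrary numbering of clusters in individual k-means runs does not matter; a pair that is split in only a few unlucky runs still ends up together as long as its consensus stays above the threshold.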
Related Publications
- G. Teodoro, T. Pan, T. Kurc, J. Kong, L. Cooper, and J. Saltz: Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines, Parallel Computing, 39(4-5), 189-211, 2013. [paper]
- J. Saltz, G. Teodoro, T. Pan, L. Cooper, J. Kong, S. Klasky, T. Kurc: Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures, International Journal of High Performance Computing Applications, 27(3), pp. 263-272, 2013. [paper]
- F. Wang, J. Kong, J. Gao, L. Cooper, T. Kurc, Z. Zhou, D. Adler, C. Vergara-Niedermayr, B. Katigbak, D. Brat, J. Saltz: A high-performance spatial database based approach for pathology imaging algorithm evaluation, Journal of Pathology Informatics, 4, 2013. [paper]
- F. Wang, J. Kong, L. Cooper, T. Pan, T. Kurc, W. Chen, A. Sharma, C. Niedermayr, T.W. Oh, D. Brat, A.B. Farris, D.J. Foran, J. Saltz: A data model and database for high-resolution pathology analytical image informatics, Journal of Pathology Informatics, 2:32, 2011. [paper]
- D. Foran, L. Yang, W. Chen, J. Hu, L. Goodell, M. Reiss, F. Wang, T. Kurc, T. Pan, A. Sharma, J. Saltz: ImageMiner: A Software System for Comparative Analysis of Tissue Microarrays Using Content-Based Image Retrieval, High-Performance Computing, and Grid Technology, Journal of the American Medical Informatics Association, 18:352-353, May 23, 2011. PMID: 21606133. [paper]
- V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz: Parameterized Specification, Configuration and Execution of Data-Intensive Scientific Workflows. Cluster Computing: the Journal of Networks, Software Tools and Applications, Special Issue on High Performance Distributed Computing, Vol. 13(3), pp. 315-333, 2010. [paper]
- T. Kurc, S. Hastings, V.S. Kumar, S. Langella, A. Sharma, T. Pan, S. Oster, D. Ervin, J. Permar, S. Narayanan, Y. Gil, E. Deelman, M. Hall and J. Saltz: HPC and Grid Computing for Integrative Biomedical Research. International Journal of High Performance Computing Applications, Special Issue, the Workshop on Clusters and Computational Grids for Scientific Computing, Vol. 23(3), pp. 252-264, 2009. [paper]
- N. Vydyanathan, S. Krishnamoorthy, G.M. Sabin, U.V. Catalyurek, T. Kurc, P. Sadayappan, J. Saltz: An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications. IEEE Trans. Parallel Distrib. Syst., Vol. 20(8), pp. 1158-1172, 2009. [paper]
- V. S. Kumar, S. Narayanan, T. Kurc, J. Kong, M. N. Gurcan, J. H. Saltz, "Analysis and Semantic Querying in Large Biomedical Image Datasets", IEEE Computer Magazine, special issue on Data-Intensive Computing, Vol. 41(4), pp. 52-59, 2008. [paper]
- V. S. Kumar, B. Rutt, T. Kurc, U. V. Catalyurek, T. C. Pan, S. Chow, S. Lamont, M. Martone, J. H. Saltz, "Large-scale Biomedical Image Analysis in Grid Environments", IEEE Transactions on Information Technology in Biomedicine, Vol. 12(2), pp. 154-161, 2008. [paper]
- G. Teodoro, T. Tavares, R. Ferreira, T. Kurc, W. Meira Jr., D. O. Guedes, T. Pan, J. H. Saltz, "A Run-time System for Efficient Execution of Scientific Workflows on Distributed Environments", International Journal of Parallel Programming, Vol. 36(2), pp. 250-266, 2008. [paper]
- G. Teodoro, T. Pan, T. Kurc, J. Kong, L. A. Cooper, N. Podhorszki, S. Klasky, J. Saltz, "High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms," in the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Boston, Massachusetts, USA. May 20-24, 2013. [paper]
- P. Widener, T. Kurc, W. Chen, F. Wang, L. Yang, J. Hu, V. Kumar, V. Chu, L. Cooper, J. Kong, A. Sharma, T. Pan, J. Saltz, and D. Foran: High Performance Computing Techniques for Scaling Image Analysis Workflows. Lecture Notes in Computer Science, Applied Parallel and Scientific Computing (PARA 10), Springer, pp. 67-77, 2012. [paper]
- G. Teodoro, T. Kurc, T. Pan, L. Cooper, J. Kong, P. Widener, and J. Saltz: Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems. The 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2012. [paper]
- P. Widener, T. Kurc, W. Chen, F. Wang, L. Yang, J. Hu, V. Kumar, V. Chu, L. Cooper, J. Kong, A. Sharma, T. Pan, J. Saltz, D. Foran: Grid-Enabled, High-performance Microscopy Image Analysis. The 2nd International Workshop on High-Performance Medical Image Computing for Image-Assisted Clinical Intervention and Decision-Making (HP-MICCAI 2010) Beijing, China, September 2010.
- F. Wang, T. Kurc, P. Widener, T. Pan, J. Kong, L. Cooper, D. Gutman, A. Sharma, S. Cholleti, V. Kumar and J. Saltz: High-performance Systems for In Silico Microscopy Imaging Studies. The 7th International Conference on Data Integration in the Life Sciences, Gothenburg, Sweden, August 2010. [paper]
- V.S. Kumar, T. Kurc, G. Mehta, K. Vahi, V. Ratnakar, J. Kim, E. Deelman, Y. Gil, P. Sadayappan, M. Hall and J. Saltz, "An Integrated Framework for Parameter-based Optimization of Scientific Workflows", Proceedings of the ACM International Symposium on High Performance Distributed Computing (HPDC), June 2009. [paper]
- V.S. Kumar, T. Kurc, J. Saltz, G. Abdulla, S. Kohn, and C. Matarazzo, Architectural Implications for Spatial Object Association Algorithms, the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 09), Rome, Italy, May, 2009. [paper]
- S. Narayanan, U. Catalyurek, T. Kurc, and J. Saltz: Parallel Materialization of Large ABoxes. The 24th Annual ACM Symposium on Applied Computing (SAC 2009), Hawaii, USA, March, 2009. [paper]
- G. Khanna, U. Catalyurek, T. Kurc, R. Kettimuthu, P. Sadayappan, I. Foster, and J. Saltz, "Using Overlays For Efficient Data Transfer Over Shared Wide-Area Networks", Proceedings of SC2008 High Performance Computing, Networking, and Storage Conference, Nov 2008. [paper]
- G. Khanna, U. V. Catalyurek, T. Kurc, R. Kettimuthu, P. Sadayappan, J. H. Saltz, "A Dynamic Scheduling Approach for Coordinated Wide-Area Data Transfers using GridFTP", The 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS’08), April, 2008. [paper]
- V. Kumar, T. Kurc, J. Kong, U. Catalyurek, M. Gurcan, J. Saltz, Performance vs. Accuracy Trade-offs for Large-scale Image Analysis Applications, Cluster 2007, 2007. [paper]