Ph.D. Graduate, June 2011 (thesis)
Stanford University, Department of Computer Science
Advisor: Vijay Pande (Chemistry)
Advisor: Daphne Koller (Computer Science)
ihaque (at) cs [dot] stanford [dot] edu
I graduated in June 2011 with a Ph.D. in Computer Science from Stanford. My thesis was titled Accelerating Chemical Similarity Search Using GPUs and Metric Embeddings. I am currently a scientist at a biotech startup in the San Francisco Bay Area.
At Stanford I worked in the Folding@Home research group, and was primarily advised by Prof. Vijay Pande. I was co-advised in the CS department by Professor Daphne Koller. My primary research focus was in computational drug design, but I also have interests in data visualization, computational biology, and distributed systems.
Before coming to Stanford, I graduated from the University of California, Berkeley, with a degree in Electrical Engineering and Computer Science (Go Bears!). I did undergraduate research with Professors Kathy Yelick, Bora Nikolic, and John Wawrzynek. I was also a member and officer for several semesters at the Berkeley Mu Chapter of Eta Kappa Nu.
Even further back, I graduated from Bellarmine College Preparatory in San Jose (Go Bells!). I doubt any high school students will care to read this page, but if you do, I strongly encourage you to do speech and debate, as I did. Without a doubt, the skills I gained there have been extremely useful to me.
- Beauchamp KA, Bowman GR, Lane TJ, Maibaum L, Haque IS, and Pande VS. MSMBuilder2: Modeling Conformational Dynamics at the Picosecond to Millisecond Scale. Journal of Chemical Theory and Computation 2011. Published online ahead of print. paper link paper PDF software
- Pronk S, Larsson P, Pouya I, Bowman G, Haque I, Beauchamp K, Hess B, Pande V, Kasson P, and Lindahl E. Copernicus: A New Paradigm for Parallel Adaptive Molecular Dynamics. Accepted to Supercomputing 2011.
- Haque IS, Pande VS, and Walters WP. Anatomy of High-Performance 2D Similarity Calculations. Journal of Chemical Information and Modeling 2011. Published online ahead of print. paper link paper PDF Supplemental Information
- Haque IS and Pande VS. Error Bounds on the SCISSORS Approximation Method. Journal of Chemical Information and Modeling 2011. Published online ahead of print. paper link paper PDF Supplemental Information
- Haque IS. Accelerating Chemical Similarity Search Using GPUs and Metric Embeddings. PhD thesis, Stanford University, Stanford, CA USA. 2011. PDF download
- Haque IS and Pande VS. Large-Scale Chemical Informatics on GPUs. In GPU Computing Gems, Emerald Edition, Wen-Mei Hwu, Ed. Burlington, MA: Morgan Kaufmann. 2011 DOI link PDF
- Haque IS and Pande VS. SCISSORS: A Linear-Algebraical Technique to Rapidly Approximate Chemical Similarities. Journal of Chemical Information and Modeling 2010, 50, 1075-1088. paper link paper PDF
- Haque IS and Pande VS. Hard Data on Soft Errors - A Large-Scale Assessment of Real-World Error Rates in GPGPU. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing (CCGrid 2010). PDF talk Supplemental Information code
- Haque IS, Pande VS, and Walters WP. SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs. Journal of Chemical Information and Modeling 2010, 50(4), pp 560-564. paper link paper PDF code
- Ponder JW, Wu C, Ren P, Pande VS, Chodera JD, Schnieders MJ, Haque I, Mobley DL, Lambrecht DS, DiStasio RA Jr, Head-Gordon M, Clark GN, Johnson ME, Head-Gordon T. Current Status of the AMOEBA Polarizable Force Field. Journal of Physical Chemistry B 114(8), 2549-64 (2010). paper link paper PDF
- Haque IS and Pande VS. PAPER -- Accelerating Parallel Evaluations of ROCS. Journal of Computational Chemistry 31(1), 117-132 (2010). paper link paper PDF code
- Pitera J, Haque I, and Swope W. Absence of reptation in the high-temperature folding of the trpzip2 beta-hairpin peptide. Journal of Chemical Physics 124, 141102 (2006). link PDF
In silico cheminformatic prediction of toxicity
Predicting activity and toxicity of prospective drugs in silico is the major goal of computational drug discovery. Current state-of-the-art experimental techniques for drug development typically include a "high-throughput screening" (HTS) step in which (hundreds of) thousands of compounds are simultaneously tested for activity against a desired target. These experimental screens are labor-intensive, expensive, and time-consuming. I am interested in accurate computational approaches to improve this procedure. In collaboration with my advisors, I am using machine learning techniques and relative descriptors of molecules in order to predict biological activity of lead compounds for drug discovery.
GPUs originated in error-tolerant graphics applications, but are now used for error-intolerant scientific computing. In particular, current-generation GPUs do not have error protection (parity or ECC) on their memory subsystems. To investigate the impact of this design, we wrote a custom test code, MemtestG80, and ran it on over 50,000 GPUs on the Folding@home distributed computing network.
Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home found that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.
Haque IS and Pande VS. Hard Data on Soft Errors - A Large-Scale Assessment of Real-World Error Rates in GPGPU. Accepted to Resilience 2010: 3rd Workshop on Resiliency in High Performance Computing (held in conjunction with CCGrid 2010). Preprint PDF Supplemental Information
Methods for virtual high-throughput screening
Biophysical methods to computationally estimate binding affinity and compound activity, in theory, can make recommendations on promising compounds acting on previously-uncharacterized targets. The availability of structural information for relevant drug targets, combined with data about the interaction networks present within biological organisms, may make it possible to specifically design chemical agents with higher potency and lower toxicity. These methods are applicable not only to the design of pharmaceuticals, but also to the design of agents to interact with specific cellular systems for research work in chemical biology. I am particularly interested in combinations of docking and free energy approaches.
Docking techniques (also known as virtual high-throughput screening, or vHTS) trade accuracy for speed, with the goal of being an in silico alternative to wet-bench HTS methods. In collaboration with Dr. Kim Branson, I am interested in improving the accuracy of vHTS techniques, primarily through improved scoring techniques. Free-energy methods, by contrast, tend to be slow, but they are usually more accurate at predicting the Gibbs free energy of an interaction. This physical parameter determines the interaction affinity between a chemical agent and its target (colloquially speaking, how strongly the two "stick to" one another), which is in turn a critical determinant of the potency of a particular chemical. I am interested in improving the accuracy and performance of free-energy techniques to make them more applicable to drug design.
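To make the link between free energy and affinity concrete, here is a minimal sketch of the standard thermodynamic relation ΔG = RT ln(Kd), taking a 1 M standard state. The function names are illustrative and are not taken from any of the software described on this page.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant in mol/L from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # More negative free energies correspond to tighter binding (smaller Kd).
    for dg in (-6.0, -8.0, -10.0, -12.0):
        print(f"dG = {dg:6.1f} kcal/mol -> Kd ~ {kd_from_delta_g(dg):.2e} M")
```

At room temperature, each factor of ten in affinity corresponds to roughly 1.4 kcal/mol, which is why errors of a few kcal/mol in a predicted free energy matter so much when ranking compounds.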
Imran Haque, John D. Chodera, Michael R. Shirts, David L. Mobley, Vijay S. Pande. Toward Quantitative Prediction of Binding Affinities to JNK3 by Alchemical Free Energy Methods. Poster presented at the CUP IX conference, Santa Fe, NM, 17 Mar 2008.
PAPER is a GPU-accelerated implementation of Gaussian molecular shape overlay (the algorithm in OpenEye ROCS) running on NVIDIA graphics cards. We have demonstrated multiple-order-of-magnitude speedups relative to a CPU-based implementation of the same algorithm, and 5x speedup relative to OpenEye ROCS even on low-end graphics hardware (an NVIDIA 8600GT).
PAPER source code (GPL-licensed) is available at http://simtk.org/home/paper.
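As a rough illustration of the idea behind Gaussian shape overlay (this is not PAPER's or ROCS's code), the sketch below scores a fixed pose of two molecules by summing the closed-form overlap integrals of atom-centered spherical Gaussians. The Gaussian widths, the unit prefactors, and the first-order pairwise sum are simplifying assumptions. A real overlay program also optimizes the rotation and translation of one molecule to maximize the overlap, which is omitted here.

```python
import math


def gaussian_overlap(center_a, center_b, alpha_a, alpha_b, p_a=1.0, p_b=1.0):
    """Overlap integral of two spherical Gaussians p * exp(-alpha * |r - c|^2)."""
    d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return p_a * p_b * (math.pi / s) ** 1.5 * math.exp(-alpha_a * alpha_b * d2 / s)


def overlap_volume(mol_a, mol_b):
    """First-order overlap: sum of all atom-atom Gaussian overlaps.

    Each molecule is a list of (center, alpha) tuples; higher-order
    intersection terms are ignored in this sketch.
    """
    return sum(
        gaussian_overlap(ca, cb, aa, ab)
        for ca, aa in mol_a
        for cb, ab in mol_b
    )


def shape_tanimoto(mol_a, mol_b):
    """Tanimoto similarity of two poses based on their overlap volumes."""
    o_ab = overlap_volume(mol_a, mol_b)
    o_aa = overlap_volume(mol_a, mol_a)
    o_bb = overlap_volume(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)


if __name__ == "__main__":
    # Two hypothetical two-atom "molecules"; alpha values are made up.
    mol = [((0.0, 0.0, 0.0), 0.8), ((1.5, 0.0, 0.0), 0.8)]
    shifted = [((x + 1.0, y, z), a) for (x, y, z), a in mol]
    print(shape_tanimoto(mol, mol), shape_tanimoto(mol, shifted))
```

Because every pairwise term is independent of the others, the sum maps naturally onto many parallel threads, which is one reason this kind of calculation suits GPUs.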
Poor organization and expensive software should not restrict the public's access to public data. gCensus and gCensus-GT are my effort to make geographic data freely and easily accessible to the public, without the need for expensive GIS software, by leveraging Google's excellent free mapping program Google Earth.
Online since late 2006, gCensus exposes the entire 2000 US Census Summary Files 1 and 3. It is widely used, with over 7,500 unique maps generated in 2008. It has received extensive press coverage from, among others, ExtremeTech, Digg, Slashdot, and the San Jose Mercury News. gCensus can be found at http://gecensus.stanford.edu
gCensus-GT solves a parallel problem. While Google Earth Pro lets you load geotagged GeoTIFF images into Google Earth, for a fee, the conversion to KML is in fact a very simple process. gCensus-GT converts from GeoTIFF (and a variety of other geotagged formats) to KML/KMZ for free.
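To give a sense of why the KML side of such a conversion is simple, here is a minimal sketch (not gCensus-GT's actual code) that wraps an image in a KML GroundOverlay. It assumes the bounding box has already been read from the GeoTIFF's georeferencing tags, that the image is unrotated and in latitude/longitude coordinates, and that it has been exported to a format Google Earth can display. The function name and arguments are hypothetical.

```python
from xml.sax.saxutils import escape


def ground_overlay_kml(name, image_href, north, south, east, west):
    """Return a KML document that drapes one image over a lat/lon box.

    Bounds are in decimal degrees (WGS84).
    """
    if not -90.0 <= south < north <= 90.0:
        raise ValueError("expected -90 <= south < north <= 90")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <GroundOverlay>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <Icon><href>{escape(image_href)}</href></Icon>\n"
        "    <LatLonBox>\n"
        f"      <north>{north}</north>\n"
        f"      <south>{south}</south>\n"
        f"      <east>{east}</east>\n"
        f"      <west>{west}</west>\n"
        "    </LatLonBox>\n"
        "  </GroundOverlay>\n"
        "</kml>\n"
    )


if __name__ == "__main__":
    # Hypothetical file name and bounds, for illustration only.
    print(ground_overlay_kml("Example overlay", "overlay.png",
                             38.0, 37.0, -121.0, -122.5))
```

A KMZ file is a zip archive containing the KML document and the image it references, so packaging the result takes little extra work.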
Protein folding mechanics
The mechanism by which proteins fold into their native shapes is an open problem in biophysics. In work I performed with Drs. Jed Pitera and Bill Swope (of IBM Almaden Research Center), I investigated the mechanisms of beta-hairpin rearrangement using molecular dynamics simulations of a model peptide, trpzip2.
Computational analysis of genome regulation
Genome sequences alone do not tell us how genes are expressed in vivo, but computational analysis of gene expression levels can offer insight into the higher-level organization controlling cell biology. This research area (also known as computational systems biology) seeks to determine the structure of the systems which regulate the activity levels of genes and their products in order to produce biological function. Further understanding of this field would have effects not only on our understanding of biology, but also on medicine and pharmacology (by granting better understanding of the mechanisms of disease) and on synthetic biology (through a better understanding of the "architecture" behind existing biological systems).
In collaboration with fellow students and Professor Daphne Koller, I have worked on a machine learning-based model seeking to explain the relevance of DNA copy-number variation to the regulatory network and phenotype of cancer cells.
Brad Gulko, Imran Haque, Sharareh Noorbaloochi, and Keyan Salari. Role of DNA Copy Number Alterations in the trans-Regulatory Network of Cancer Cells. Poster presented at National Cancer Institute Integrative Cancer Biology Program meeting at Stanford, 13 Feb 2007.
Architecture and Implementation of LDPC Codecs
Low-density parity-check (LDPC) codes closely approach the Shannon limit, but their maximum-likelihood decoding is NP-hard. With Professor Bora Nikolic and Zhengya Zhang at Berkeley, I worked on hardware architectures for efficient iterative decoding of LDPC codes, as well as algorithms for hardware real-time analysis of our noise simulation.
Acknowledged in Z. Zhang, L. Dolecek, B. Nikolic, V. Anantharam, M. J. Wainwright, Investigation of error floors of structured low-density parity-check codes by hardware emulation. Proceedings of IEEE Global Communications Conference (GLOBECOM), San Francisco CA, November 2006. (Best Paper Award Finalist). link
- Thesis Defense: Accelerating Chemical Similarity Search with GPUs and Metric Embeddings. Ph.D. thesis defense, Department of Computer Science, Stanford University. Stanford, CA, 11 Apr 2011. PDF
- Keynote Talk: Folding@Everywhere: Computational Biochemistry in the New Era of HPC. Presented at Hyperience 2010: 5th National Informatics Congress. Urk, Netherlands. 24 Nov 2010. PDF
- Hard Data on Soft Errors: A Global-Scale Assessment of GPGPU Memory Soft Error Rates (Updated). Presented at Resilience Summit at Los Alamos Computer Science Symposium. Santa Fe, NM. 13 Oct 2010. PDF
- Hybrid Vigor: Using Heterogeneous HPC to Accelerate Chemical Biology. Presented at Bio-Molecular Simulations on Future Computing Architectures (Oak Ridge National Laboratory). Oak Ridge, TN. 17 Sep 2010. PDF
- Hard Data on Soft Errors: A Global-Scale Assessment of GPGPU Memory Soft Error Rates. Presented at 3rd Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids (Resilience 2010) @ CCGrid 2010. Melbourne, Australia, 17 May 2010. PDF
- Bigger, Longer, and Uncut: Chemical Informatics at Scale. Presented at CSIRO Molecular Health and Technologies. Melbourne, Australia. 13 May 2010. PDF
- LINGOs, GPUs, and Monitoring Vertex. Presented at OpenEye CUP XI conference, Santa Fe, NM, 10 Mar 2010. PDF
- Do GPUs really need ECC? A global-scale assessment of GPU Memory Soft Error Rates. Presented at NVIDIA Corporation, Santa Clara, CA, 2 Dec 2009. PDF
- Of Jacquard Looms and Jaccard Coefficients: multithreading biomolecular simulations in a GPU world. Presented at NSF-NAIS Workshop on Intelligent Software, Edinburgh, UK, 19-21 Oct 2009. PDF
- Cheminformatics at Scale - Bigger, Longer, and Uncut. Presented at OpenEye EuroCUP IV conference, Bergisch Gladbach, Germany, 28 Apr 2010. PDF
- Lies, Damned Lies, and AUC Confidence Intervals. Presented at OpenEye EuroCUP IV conference, Bergisch Gladbach, Germany, 28 Apr 2010. PDF
- GPUs: TeraFLOPs or TeraFLAWED? Presented at SC09: 2009 ACM/IEEE Conference on Supercomputing, Portland, OR, 17 Nov 2009. PDF
- Rochambeau: Playing Games with ROCS. Presented at OpenEye CUP X Conference, Santa Fe, NM, 9 Mar 2009. PDF
- Biomedical Informatics 214/CS 274 - Representations and Algorithms for Computational Molecular Biology. Course Assistant (instructor: Russ Altman), Spring 2010.
- CS 148 - Introduction to Computer Graphics and Imaging. Course Assistant (instructor: Pat Hanrahan), Fall 2010.
- CS 109 - Introduction to Probability for Computer Scientists. Course Assistant (instructor: Mehran Sahami), Winter 2011.
- Biochemistry 224 - Cell Biology of Physiological Processes (audited)
- Biochemistry 230 - Molecular Interventions in Human Disease
- Bioengineering 331 - Protein Engineering
- CS 148 - Introduction to Computer Graphics
- CS 229 - Machine Learning
- CS 279 - Computational Analysis and Reconstruction of Biological Networks
- ME 334 - Statistical Mechanics
- Structural Biology 241 - Biological Macromolecules