• News

  • New Tradeoffs for Machine Learning Systems. The next generation of data systems needs to make fundamentally new tradeoffs. For example, we proved that many statistical algorithms can be run in parallel without locks (Hogwild! or SCD) or with lower precision. This leads to a fascinating systems tradeoff between statistical and hardware efficiency. These ideas have been picked up by web and enterprise companies for everything from recommendation to deep learning. There are limits to the robustness of these algorithms; see our ICML 2016 best paper.

  • New Database Engines. We're thinking about how these new workloads change how one would build a database. We're building a new database, EmptyHeaded, that extends our theoretical work on optimal join processing. Multiway join algorithms are asymptotically faster than traditional database engines and, empirically, faster by orders of magnitude. We're using EmptyHeaded to unify database querying, graph pattern queries, linear algebra and inference, and RDF processing, with more to come.
  • Our course material from CS145 (Introduction to Databases) is available (send us a note), and we'll continue to update it. We know of a handful of courses that use these materials. If yours does, drop us a note!
  • Recent/Upcoming keynotes and talks: EDBT17, UAI17, ABS East 2017, Cornell, Alibaba, CMU (SiValley), SystemX, KBCOM (WSDM), ITBB18.
  • Manuscripts.
  • Upcoming talks, keynotes, and meetings: March: MEMEX, SIMPLEX, EDBT, AI and the future of business, AAAI Spring; April: Computer Forum; May: SIGMOD, SystemX, SIOPT 2017; June: MongoDB World, STOC Theory Fest, NAS Kavli; August: UAI, DIMACS large-scale learning.
  • Nature Communications. Kun's paper on automated cancer prognosis is out! He shows that automated approaches can outperform human pathologists at lung cancer prognosis. Update: Kun wins the data parasite award for this work!
  • Database Theory
  • Machine Learning and Optimization
  • Elated that our group's work was honored by a MacArthur Foundation Fellowship. So excited for what's next!
  • Talks in February: Mobilize Center, Distributed Inference at AAAI.
  • SIGMOD and PODS
  • VLDB15. Honored to receive the VLDB Early Career Award for scalable analytics. Talk video.
  • ICML16
  • Upcoming Meetings and Talks
  • ICDT16. It’s all a matter of degree: Using degree information to optimize multiway joins by Manas Joglekar discusses a technique for using degree information to perform joins faster (asymptotically!).
  • SODA16. Weighted SGD for ℓp Regression with Randomized Preconditioning by Jiyan, Yin Lam Chow, Michael Mahoney, and me studies preconditioning methods that speed up SGD in theory and in practice.
  • NIPS15. Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width by Chris De Sa explains a notion of width that allows one to bound mixing times for factor graphs.
  • NIPS15. Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms by Chris De Sa et al. derives results for low precision and non-convex Hogwild! (asynchronous) style algorithms.
  • VLDB15. Incremental Knowledge Base Construction Using DeepDive is our latest description of DeepDive.
  • New. Increasing the parallelism in multi-round MapReduce join plans. Semih Salihoglu, Manas Joglekar, and crew show that you can recover classical results about parallelizing acyclic queries using only Yannakakis's algorithm and our recent algorithms for generalized fractional hypertree decompositions for joins.
  • ICDE. Finland!
  • USC ML. Jan 26.
  • Berkeley. Feb 3.
  • Michigan. Mar 11.
  • ICDT. Mar 15-21.
  • Strata. Mar 28-31.
  • SIMPLEX. April 4-7.
  • Dagstuhl. Foundations of Databases. April 10-15. Slides for the EDBT/ICDT keynote on Joins and Convex Geometry. Frank explains our joins work very nicely.
  • Code
  • Application Overview Videos (See our YouTube channel, HazyResearch)
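The lock-free parallelism from the New Tradeoffs item above (Hogwild!) can be illustrated with a minimal sketch, assuming a sparse least-squares objective: several Python threads read and write one shared weight list with no locks, and each step only touches the coordinates an example actually uses. The names `hogwild_sgd` and `make_synthetic`, the step size, and the data generator are hypothetical choices for illustration, not our production code. CPython's GIL also limits real parallelism, so this shows the update pattern rather than a speedup.

```python
import random
import threading


def make_synthetic(true_w, n, seed=0, nnz=3):
    """Noiseless sparse regression data: each example has `nnz` nonzero features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        idx = rng.sample(range(len(true_w)), nnz)
        x = {i: rng.uniform(-1.0, 1.0) for i in idx}  # sparse feature dict
        y = sum(true_w[i] * v for i, v in x.items())
        data.append((x, y))
    return data


def hogwild_sgd(data, dim, num_threads=4, epochs=20, lr=0.05, seed=0):
    """Hogwild!-style SGD for least squares: threads share `w` with no locks."""
    w = [0.0] * dim  # shared model; reads and writes are unsynchronized

    def worker(tid):
        rng = random.Random(seed + tid)  # per-thread sampling
        for _ in range(epochs):
            for _ in range(len(data) // num_threads):
                x, y = data[rng.randrange(len(data))]
                # Prediction uses a possibly stale view of w (another thread may be writing).
                err = sum(w[i] * v for i, v in x.items()) - y
                # Sparse update: only coordinates in x's support are touched,
                # so concurrent updates rarely collide.
                for i, v in x.items():
                    w[i] -= lr * err * v

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


if __name__ == "__main__":
    true_w = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 1.5]
    data = make_synthetic(true_w, 400, seed=42)
    print([round(v, 3) for v in hogwild_sgd(data, len(true_w))])
```

Because updates are sparse, two threads seldom write the same coordinate at once, and an occasional lost update only slows convergence slightly; that is the statistical-versus-hardware-efficiency tradeoff in miniature.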
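The low-precision side of these algorithms (the Taming the Wild item above) commonly relies on unbiased, or stochastic, rounding: a value is rounded up or down at random so that its expected value is unchanged. The following minimal sketch assumes a fixed-point grid with spacing `scale`. The function name is hypothetical, and this is the generic technique, not code taken from the paper.

```python
import math
import random


def stochastic_round(x, scale, rng):
    """Round x to a multiple of `scale`, picking the upper neighbor with
    probability equal to the fractional distance, so E[result] == x."""
    q = x / scale
    lo = math.floor(q)
    frac = q - lo  # in [0, 1)
    return (lo + (1 if rng.random() < frac else 0)) * scale


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(0.3, 1.0, rng) for _ in range(100_000)]
    # Round-to-nearest would always return 0.0; the stochastic mean stays near 0.3.
    print(sum(samples) / len(samples))
```

Round-to-nearest would silently drop every gradient step smaller than half a grid cell. Stochastic rounding keeps those steps correct in expectation, at the cost of extra variance.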
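The multiway join idea behind EmptyHeaded (New Database Engines item above) can be illustrated on the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(a,c). Rather than joining two relations at a time, the sketch binds one attribute at a time and intersects the candidate values from every relation that mentions that attribute, in the style of the Generic Join algorithm. This is a simplified illustration, not EmptyHeaded's engine. The function names are hypothetical, and the worst-case optimal guarantee also requires each intersection to run in time proportional to its smaller input (for example, with tries or sorted arrays).

```python
from collections import defaultdict


def index(pairs):
    """One-level trie: first attribute -> set of second-attribute values."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def triangles(R, S, T):
    """Q(a,b,c) :- R(a,b), S(b,c), T(a,c), evaluated one attribute at a time."""
    R_a, S_b, T_a = index(R), index(S), index(T)
    out = []
    # a must appear as a first attribute in both R and T.
    for a in sorted(R_a.keys() & T_a.keys()):
        # b must extend a in R and start some tuple in S.
        for b in sorted(R_a[a] & S_b.keys()):
            # c must close the triangle in both S and T.
            for c in sorted(S_b[b] & T_a[a]):
                out.append((a, b, c))
    return out


if __name__ == "__main__":
    E = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(triangles(E, E, E))  # [(1, 2, 3), (2, 3, 4)]
```

A pairwise plan would first materialize R ⋈ S, which can be much larger than the final output on skewed graphs. The attribute-at-a-time plan never builds that intermediate result.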
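Yannakakis's algorithm, which the MapReduce join-plans item above builds on, evaluates an acyclic query in two phases. It first removes dangling tuples with two passes of semijoins over a join tree, then joins the fully reduced relations. The following minimal sketch handles the path query Q(a,b,c,d) :- R(a,b), S(b,c), T(c,d), assuming the join tree R - S - T rooted at R. The function names are hypothetical, and this single-machine version does not model the MapReduce rounds studied in the paper.

```python
def semijoin(left, right, lkey, rkey):
    """left ⋉ right: keep tuples of `left` whose lkey value occurs in `right`."""
    keys = {t[rkey] for t in right}
    return [t for t in left if t[lkey] in keys]


def yannakakis_path(R, S, T):
    """Q(a,b,c,d) :- R(a,b), S(b,c), T(c,d) on the join tree R - S - T."""
    # Bottom-up pass (T is the leaf, R the root).
    S = semijoin(S, T, 1, 0)  # keep S(b,c) whose c appears in T
    R = semijoin(R, S, 1, 0)  # keep R(a,b) whose b appears in S
    # Top-down pass.
    S = semijoin(S, R, 0, 1)  # keep S(b,c) whose b appears in R
    T = semijoin(T, S, 0, 1)  # keep T(c,d) whose c appears in S
    # After full reduction every remaining tuple contributes to some answer,
    # so the joins below build no dangling intermediate results.
    S_by_b, T_by_c = {}, {}
    for b, c in S:
        S_by_b.setdefault(b, []).append(c)
    for c, d in T:
        T_by_c.setdefault(c, []).append(d)
    out = []
    for a, b in R:
        for c in S_by_b[b]:  # key is guaranteed to exist after reduction
            for d in T_by_c[c]:
                out.append((a, b, c, d))
    return sorted(out), (R, S, T)


if __name__ == "__main__":
    R = [(1, 10), (2, 20), (3, 30)]
    S = [(10, 100), (20, 200), (40, 400)]
    T = [(100, 7), (100, 8), (300, 9)]
    print(yannakakis_path(R, S, T)[0])  # [(1, 10, 100, 7), (1, 10, 100, 8)]
```

The full reduction is what makes the running time linear in input plus output size for acyclic queries, since each join step only touches tuples that appear in the answer.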