My colleague, Michael Kuzyk, and I will be giving the following presentation at the Bio-IT World Conference and Expo in Boston, MA on April 13, 2011.
Selecting a LIMS for the Next-Generation Genomics Lab
In May 1961, President John F. Kennedy declared the goal of landing a man on the moon before the end of the decade. The Space Race fueled advances in microelectronics that, as predicted by Moore’s Law, have driven an exponential rate of improvement in data processing speed and cost for half a century.[i]
In 1990, the Human Genome Project ignited a similar technological race in gene sequencing. Today, high-throughput sequencing technologies are driving data generation and cost improvements at a rate even greater than that predicted by Moore’s Law.[ii] It took more than a decade and an estimated $3 billion to publish the first draft of the human genome in 2000. Just over 10 years later, businesses are competing to sequence an individual’s entire genome in a matter of weeks for about $10,000, and experts claim the thousand-dollar genome is within sight.[iii]
The incredible improvements in speed and cost are driving large-scale adoption of next-generation sequencing (NGS) instruments by genomics labs: from 200 in 2007 to an estimated 1,900 in 2010, with projected growth of over 1,000 per year, reaching an installed base of over 5,000 by 2013.[iv] This has resulted in unprecedented genomic data production. The 1000 Genomes Project, the first large project to capitalize on next-generation sequencing technologies, deposited twice as much raw sequencing data into the GenBank archives in its first six months of operation as had been deposited into GenBank in the 30 years since GenBank's inception.[v]
Organizations are confronting the reality that new sequencing instruments running at capacity can generate more information in a single year than the total deposited in GenBank at the beginning of 2008.[vi] In turn, industry analyses that once focused on the cost of sequencing genome data now focus on the challenges of managing it. In a J.P. Morgan survey conducted in 2010, lab directors cited data storage, data management, and informatics as the biggest collective hurdle to expanding next-generation sequencing operations.[vii]
To meet the data management challenge, genomics labs are revamping the workflows that support sequencing. Many traditional workflows rely on manual, one-at-a-time processes and on information stored in disconnected silos such as spreadsheets, emails, document-based communications, and paper lab notebooks. Others are supported by custom, home-brew software or modified open-source software that was developed to handle the low throughput of first-generation sequencing, microarray, and qPCR instruments. Both approaches are inadequate for the high-throughput, next-generation genomics lab.
Laboratory Information Management Systems (LIMS), commercially introduced in the early 1980s, are a mature class of software for managing data in the analytical laboratory and organizing it into meaningful information. LIMS have proven their effectiveness across multiple industries including pharmaceuticals, utilities, chemicals, food and beverage, oil and gas, and agriculture. The total LIMS market is presently about $400 million,[viii] and about 17% of LIMS applications are in life sciences.[ix]
A full-featured LIMS will manage laboratory data from sample log-in to reporting the results. However, the unprecedented throughput, experimental complexity, and rapid change associated with next-generation sequencing create unique challenges for a LIMS. The rapid timescales and expanded workflows associated with next-generation sequencing require a LIMS that can be configured quickly and easily to accommodate the specific NGS instrumentation chosen by a lab. Scientific programmers and bioinformaticians in genomics labs must be able to easily adapt the system themselves to support changing technologies and protocols. In addition, next-generation sequencing research requires iterative, collaborative work that is performed by different types of scientists.
[i] Moore's law describes a long-term trend in the history of computing hardware. The quantity of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years. The trend has continued for more than half a century and is not expected to stop until 2015 or 2020 or later. The capabilities of many digital electronic devices are strongly linked to Moore's law, including processing speed and memory capacity, and are improving at (roughly) exponential rates. http://en.wikipedia.org/wiki/Moore's_law
[ii] Stein, L. D. The case for cloud computing in genome informatics. Genome Biology 2010, 11, 207.
[iii] Batley, J.; Edwards, D. Genome sequence data: Management, storage, and visualization. BioTechniques 2009, 46, 333-336.
[iv] Third Quarter 2010 Earnings Conference Call, Caliper Life Sciences, October 2010, 16.
[v] Stein, L. D. The case for cloud computing in genome informatics. Genome Biology 2010, 11, 207.
[vi] Holt, R.; Jones, S. The new paradigm of flow cell sequencing. Genome Research 2008, 18 (6).
[vii] J.P. Morgan Equity Research. Next-generation Sequencing Survey 2010.
[viii] ARC Advisory Group. Laboratory Information Management Systems Outlook: Five Year Market Analysis and Technology Forecast through 2013, (2009).
[ix] 2009 Worldwide Survey of LIMS Users: Understanding Market Trends and End-User Attitudes. Published by Strategic Directions International, Los Angeles (2009).