-
Molecular Systems Biology Jul 2016
Review
Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology.
Topics: Computational Biology; Genomics; Humans; Machine Learning; Models, Genetic
PubMed: 27474269
DOI: 10.15252/msb.20156651
-
Genome Biology Feb 2021
Topics: Cloud Computing; Computational Biology; Genomics; Humans; Reproducibility of Results; Software
PubMed: 33608018
DOI: 10.1186/s13059-021-02299-x
-
GigaScience Aug 2018
Review
With the rapid development of next-generation sequencing technology, ever-increasing quantities of genomic data pose a tremendous challenge to data processing. Therefore, there is an urgent need for highly scalable and powerful computational systems. Among the state-of-the-art parallel computing platforms, Apache Spark is a fast, general-purpose, in-memory, iterative computing framework for large-scale data processing that ensures high fault tolerance and high scalability by introducing the resilient distributed dataset abstraction. In terms of performance, Spark can be up to 100 times faster in terms of memory access and 10 times faster in terms of disk access than Hadoop. Moreover, it provides advanced application programming interfaces in Java, Scala, Python, and R. It also supports some advanced components, including Spark SQL for structured data processing, MLlib for machine learning, GraphX for computing graphs, and Spark Streaming for stream computing. We surveyed Spark-based applications used in next-generation sequencing and other biological domains, such as epigenetics, phylogeny, and drug discovery. The results of this survey are used to provide a comprehensive guideline allowing bioinformatics researchers to apply Spark in their own fields.
Topics: Animals; Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Humans; Mice; Software
PubMed: 30101283
DOI: 10.1093/gigascience/giy098
-
Microbial Biotechnology Sep 2016
Review
Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier!
Topics: Computational Biology; Databases, Nucleic Acid; Genomics; Internet
PubMed: 27471065
DOI: 10.1111/1751-7915.12389
-
Interdisciplinary Sciences,... Jun 2017
Review
Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we have considered the educational measures, landmark research activities and the achievements of bioinformatics companies and the role of the Hong Kong government in the establishment of bioinformatics as a strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.
Topics: Computational Biology; Government Regulation; Hong Kong; Public Policy
PubMed: 26961385
DOI: 10.1007/s12539-016-0147-x
-
Current Opinion in Biotechnology Jun 2016
Review
Systems genetics is defined as the simultaneous assessment and analysis of multi-omics datasets. In the past few years, metabolomics has been established as a robust tool describing an important functional layer in this approach. The metabolome of a biological system represents an integrated state of genetic and environmental factors and has been referred to as a 'link between genotype and phenotype'. In this review, we summarize recent progress in statistical analysis methods for metabolomics data in combination with other omics layers. We put a special focus on complex, multivariate statistical approaches as well as pathway-based and network-based analysis methods. Moreover, we outline current challenges and pitfalls of metabolomics-focused multi-omics analyses and discuss future steps for the field.
Topics: Computational Biology; Humans; Metabolome; Metabolomics; Models, Biological; Systems Biology
PubMed: 27135552
DOI: 10.1016/j.copbio.2016.04.009
-
BMC Bioinformatics Apr 2011
Topics: Computational Biology; Humans; Neoplasms; Proteins; Systems Biology
PubMed: 21521513
DOI: 10.1186/1471-2105-12-120
-
PLoS Computational Biology Mar 2021
Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and a hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis.
Topics: Computational Biology; Data Science; Humans; Machine Learning; Models, Biological; Models, Statistical; Software
PubMed: 33661899
DOI: 10.1371/journal.pcbi.1008671
-
PLoS Computational Biology Dec 2018
Topics: Computational Biology; Publishing; Research Report; Surveys and Questionnaires; Writing
PubMed: 30571692
DOI: 10.1371/journal.pcbi.1006670
-
Current Opinion in Biotechnology Apr 2000
Review
The field of proteomics is becoming increasingly important as genome sequences are being completed and annotated. Recent advances in proteomics include experimental and mathematical proofs of the need to complement microarray analysis with protein analysis, improved sensitivity for mass spectrometric analysis of separated proteins, better informatic tools for gel analysis and protein spot annotation, first steps towards automated experimental procedures, and new technology for quantitation of protein changes.
Topics: Animals; Biotechnology; Computational Biology; Humans; Proteome
PubMed: 10753759
DOI: 10.1016/s0958-1669(00)00078-1