bio-informatics - OpenMD.com Journal Search

Deep learning for computational biology.

Molecular Systems Biology Jul 2016

Review

Authors: Christof Angermueller, Tanel Pärnamaa, Leopold Parts...

Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background on what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists on when and how to make the most of this new technology.

Topics: Computational Biology; Genomics; Humans; Machine Learning; Models, Genetic

PubMed: 27474269
DOI: 10.15252/msb.20156651

Promoting reproducibility with Code Ocean.

Genome Biology Feb 2021

Authors: Barbara Cheifet

Topics: Cloud Computing; Computational Biology; Genomics; Humans; Reproducibility of Results; Software

PubMed: 33608018
DOI: 10.1186/s13059-021-02299-x

Bioinformatics applications on Apache Spark.

GigaScience Aug 2018

Review

Authors: Runxin Guo, Yi Zhao, Quan Zou...

With the rapid development of next-generation sequencing technology, ever-increasing quantities of genomic data pose a tremendous challenge to data processing. Therefore, there is an urgent need for highly scalable and powerful computational systems. Among the state-of-the-art parallel computing platforms, Apache Spark is a fast, general-purpose, in-memory, iterative computing framework for large-scale data processing that ensures high fault tolerance and high scalability by introducing the resilient distributed dataset abstraction. In terms of performance, Spark can be up to 100 times faster in terms of memory access and 10 times faster in terms of disk access than Hadoop. Moreover, it provides advanced application programming interfaces in Java, Scala, Python, and R. It also supports some advanced components, including Spark SQL for structured data processing, MLlib for machine learning, GraphX for computing graphs, and Spark Streaming for stream computing. We surveyed Spark-based applications used in next-generation sequencing and other biological domains, such as epigenetics, phylogeny, and drug discovery. The results of this survey are used to provide a comprehensive guideline allowing bioinformatics researchers to apply Spark in their own fields.

Topics: Animals; Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Humans; Mice; Software

PubMed: 30101283
DOI: 10.1093/gigascience/giy098

Microbial bioinformatics 2020.

Microbial Biotechnology Sep 2016

Review

Authors: Mark J Pallen

Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier!

Topics: Computational Biology; Databases, Nucleic Acid; Genomics; Internet

PubMed: 27471065
DOI: 10.1111/1751-7915.12389

Rising Strengths Hong Kong SAR in Bioinformatics.

Interdisciplinary Sciences,... Jun 2017

Review

Authors: Chiranjib Chakraborty, C George Priya Doss, Hailong Zhu...

Hong Kong's bioinformatics sector is attaining new heights in combination with its economic boom and the predominance of the working-age group in its population. Factors such as a knowledge-based and free-market economy have contributed towards a prominent position on the world map of bioinformatics. In this review, we consider educational measures, landmark research activities, the achievements of bioinformatics companies, and the role of the Hong Kong government in establishing bioinformatics as a strength. However, several hurdles remain. New government policies will assist computational biologists to overcome these hurdles and further raise the profile of the field. There is a high expectation that bioinformatics in Hong Kong will be a promising area for the next generation.

Topics: Computational Biology; Government Regulation; Hong Kong; Public Policy

PubMed: 26961385
DOI: 10.1007/s12539-016-0147-x

Computational approaches for systems metabolomics.

Current Opinion in Biotechnology Jun 2016

Review

Authors: Jan Krumsiek, Jörg Bartel, Fabian J Theis...

Systems genetics is defined as the simultaneous assessment and analysis of multi-omics datasets. In the past few years, metabolomics has been established as a robust tool describing an important functional layer in this approach. The metabolome of a biological system represents an integrated state of genetic and environmental factors and has been referred to as a 'link between genotype and phenotype'. In this review, we summarize recent progress in statistical analysis methods for metabolomics data in combination with other omics layers. We put a special focus on complex, multivariate statistical approaches as well as pathway-based and network-based analysis methods. Moreover, we outline current challenges and pitfalls of metabolomics-focused multi-omics analyses and discuss future steps for the field.

Topics: Computational Biology; Humans; Metabolome; Metabolomics; Models, Biological; Systems Biology

PubMed: 27135552
DOI: 10.1016/j.copbio.2016.04.009

Cancer computational biology.

BMC Bioinformatics Apr 2011

Authors: Zohar Yakhini, Igor Jurisica

Topics: Computational Biology; Humans; Neoplasms; Proteins; Systems Biology

PubMed: 21521513
DOI: 10.1186/1471-2105-12-120

Hands-on training about overfitting.

PLoS Computational Biology Mar 2021

Authors: Janez Demšar, Blaž Zupan

Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose hands-on training for overfitting that is suitable for introductory-level courses and can be carried out on its own or embedded within any data science course. We use a workflow-based design of machine learning pipelines, experimentation-based teaching, and a hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis.

Topics: Computational Biology; Data Science; Humans; Machine Learning; Models, Biological; Models, Statistical; Software

PubMed: 33661899
DOI: 10.1371/journal.pcbi.1008671

One thousand simple rules.

PLoS Computational Biology Dec 2018

Authors: Philip E Bourne, Fran Lewitter, Scott Markel...

Topics: Computational Biology; Publishing; Research Report; Surveys and Questionnaires; Writing

PubMed: 30571692
DOI: 10.1371/journal.pcbi.1006670

Proteomic analysis.

Current Opinion in Biotechnology Apr 2000

Review

Authors: M J Dutt, K H Lee

The field of proteomics is becoming increasingly important as genome sequences are being completed and annotated. Recent advances in proteomics include experimental and mathematical proofs of the need to complement microarray analysis with protein analysis, improved sensitivity for mass spectrometric analysis of separated proteins, better informatic tools for gel analysis and protein spot annotation, first steps towards automated experimental procedures, and new technology for quantitation of protein changes.

Topics: Animals; Biotechnology; Computational Biology; Humans; Proteome

PubMed: 10753759
DOI: 10.1016/s0958-1669(00)00078-1