-
The Journal of Investigative Dermatology Sep 2017
Review
High-throughput biology presents unique opportunities and challenges for dermatological research. Drawing on a small handful of exemplary studies, we review some of the major lessons of these new technologies. We caution against several common errors and introduce helpful statistical concepts that may be unfamiliar to researchers without experience in bioinformatics. We recommend specific software tools that can aid dermatologists at varying levels of computational literacy, including platforms with command line and graphical user interfaces. The future of dermatology lies in integrative research, in which clinicians, laboratory scientists, and data analysts come together to plan, execute, and publish their work in open forums that promote critical discussion and reproducibility. In this article, we offer guidelines that we hope will steer researchers toward best practices for this new and dynamic era of data intensive dermatology.
Topics: Animals; Computational Biology; Dermatology; Forecasting; Genome, Human; Genomics; Guidelines as Topic; Humans; Research Design; Software
PubMed: 28843296
DOI: 10.1016/j.jid.2017.07.095
-
Briefings in Bioinformatics Jan 2019
Review
Big data management for information centralization (i.e. making data of interest findable) and integration (i.e. making related data connectable) in health research is a defining challenge in biomedical informatics. While essential to create a foundation for knowledge discovery, optimized solutions to deliver high-quality and easy-to-use information resources are not thoroughly explored. In this review, we identify the gaps between current data management approaches and the need for new capacity to manage big data generated in advanced health research. Focusing on these unmet needs and well-recognized problems, we introduce state-of-the-art concepts, approaches and technologies for data management from computing academia and industry to explore improvement solutions. We explain the potential and significance of these advances for biomedical informatics. In addition, we discuss specific issues that have a great impact on technical solutions for developing the next generation of digital products (tools and data) to facilitate the raw-data-to-knowledge process in health research.
Topics: Big Data; Computational Biology; Database Management Systems; Humans; Knowledge Bases; Machine Learning; Research
PubMed: 28968677
DOI: 10.1093/bib/bbx086
-
BMC Bioinformatics 2014
Review
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists.
We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.
Topics: Algorithms; Animals; Computational Biology; Cooperative Behavior; Genome; Genomics; Humans; Software
PubMed: 24564249
DOI: 10.1186/1471-2105-15-S1-S2
-
Briefings in Bioinformatics Mar 2024
Review
Across many scientific disciplines, the development of computational models and algorithms for generating artificial or synthetic data is gaining momentum. In biology, there is a great opportunity to explore this further, as more and more big data at the multi-omics level have been generated recently. In this opinion, we discuss the latest trends in biological applications based on process-driven and data-driven aspects. Moving ahead, we believe these methodologies can help shape novel multi-omics-scale cellular inferences.
Topics: Computational Biology; Algorithms; Genomics; Humans; Big Data; Proteomics; Multiomics
PubMed: 38711370
DOI: 10.1093/bib/bbae213
-
Cells Jan 2019
Review
Lysine succinylation is a form of posttranslational modification of proteins that plays an essential functional role in every aspect of cell metabolism in both prokaryotes and eukaryotes. Aside from experimental identification of succinylation sites, there has been an intense effort geared towards the development of sequence-based prediction through machine learning, due to its promising and essential properties of being highly accurate, robust and cost-effective. In spite of these advantages, there are several problems that are in need of attention in the design and development of succinylation site predictors. Notwithstanding the many studies employing machine learning approaches, few articles have examined this bioinformatics field in a systematic manner. Thus, we review the advancements regarding the current state-of-the-art prediction models, datasets, and online resources and illustrate the challenges and limitations to present a useful guideline for developing powerful succinylation site prediction tools.
Topics: Algorithms; Amino Acid Sequence; Animals; Computational Biology; Databases, Protein; Humans; Lysine; Species Specificity; Succinic Acid
PubMed: 30696115
DOI: 10.3390/cells8020095
-
EMBO Reports Aug 2008
Topics: Biomedical Research; Computational Biology; Databases, Nucleic Acid; Databases, Protein
PubMed: 18670437
DOI: 10.1038/embor.2008.141
-
Molecular Biology of the Cell Nov 2016
Topics: Animals; Cell Biology; Computational Biology; Humans; Systems Biology
PubMed: 27811327
DOI: 10.1091/mbc.E16-09-0673
-
Genomics, Proteomics & Bioinformatics Oct 2022
Review
Recently developed technologies to generate single-cell genomic data have made a revolutionary impact in the field of biology. Multi-omics assays offer even greater opportunities to understand cellular states and biological processes. The problem of integrating different omics data with very different dimensionality and statistical properties remains, however, quite challenging. A growing body of computational tools is being developed for this task, leveraging ideas ranging from machine translation to the theory of networks, and represents another frontier on the interface of biology and data science. Our goal in this review is to provide a comprehensive, up-to-date survey of computational techniques for the integration of single-cell multi-omics data, while making the concepts behind each algorithm approachable to a non-expert audience.
Topics: Computational Biology; Multiomics; Genomics; Algorithms
PubMed: 36581065
DOI: 10.1016/j.gpb.2022.11.013
-
Methods in Molecular Biology (Clifton,... 2019
Open-source software encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, open-source software comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the three principal approaches for sharing software between different programming languages: by remote procedure call (RPC), by sharing a local "call stack," and by calling one program from another. RPC provides a language-independent protocol over a network interface; examples are SOAP and Rserve. The local call stack provides a between-language mapping, not over the network interface but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java virtual machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite (EMBOSS). In general, call stack approaches outperform native Bio* implementations, and these, in turn, outperform "RPC"-based approaches. To test and compare strategies, we provide a downloadable Docker container with all examples, tools, and libraries included.
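As an illustration of the third approach named in this abstract (one program calling another), the following is a minimal Python sketch and not the chapter's own code. It assumes a stand-in child program, here a second Python process started with `sys.executable`, in place of a real external translator; the names `CHILD_PROGRAM` and `translate_via_program` are hypothetical. The child uses the standard genetic code and reports incomplete or unrecognized codons as `X`.

```python
import subprocess
import sys

# Hypothetical stand-in for an external command-line translator.
# It reads one DNA sequence per line on stdin and prints one protein per line.
CHILD_PROGRAM = r'''
import sys
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

for line in sys.stdin:
    dna = line.strip().upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    print("".join(TABLE.get(codon, "X") for codon in codons))
'''


def translate_via_program(sequences):
    """Translate DNA sequences by piping them through a separate process."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM],
        input="".join(seq + "\n" for seq in sequences),
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with an error
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(translate_via_program(["ATGGCCTAA", "atgaaa"]))
```

Sending all sequences to the child in one batch pays the process start-up cost once per call rather than once per sequence, which matters when throughput is being compared.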
Topics: Computational Biology; Programming Languages; Software; User-Computer Interface; Web Browser
PubMed: 31278684
DOI: 10.1007/978-1-4939-9074-0_25
-
Briefings in Bioinformatics May 2014
Review
With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data.
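The word-frequency idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: it uses a plain Euclidean distance between relative k-mer frequencies, chosen here for simplicity, and is not one of the specific dissimilarity measures the review evaluates. The function names and the default word length `k=3` are assumptions.

```python
from collections import Counter
from math import sqrt


def word_frequencies(seq, k):
    """Return the relative frequency of every overlapping word of length k."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k: no words to count
        return {}
    return {word: count / total for word, count in counts.items()}


def euclidean_dissimilarity(seq_a, seq_b, k=3):
    """Euclidean distance between the k-mer frequency vectors of two sequences.

    No alignment is computed; only word composition is compared.
    """
    freq_a = word_frequencies(seq_a, k)
    freq_b = word_frequencies(seq_b, k)
    words = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words))


if __name__ == "__main__":
    print(euclidean_dissimilarity("ACGTACGTACGT", "ACGTTTGTACGA"))
```

Because only word counts are used, the same functions apply to unassembled reads: the counts from all reads of a sample can be pooled into a single frequency vector before comparison.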
Topics: Algorithms; Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Markov Chains; Models, Statistical; Sequence Alignment; Sequence Analysis
PubMed: 24064230
DOI: 10.1093/bib/bbt067