PLoS Computational Biology, Jan 2020
Functional annotation of genes remains a challenge in fundamental biology and is a limiting factor for translational medicine. Computational approaches have been developed to process heterogeneous data into meaningful metrics, but often do not address how findings might be updated when new evidence comes to light. To address this challenge, we describe requirements for a framework for incremental data integration and propose an implementation based on phenotype ontologies and Bayesian probability updates. We apply the framework to quantify similarities between gene annotations and disease profiles. Within this scope, we categorize human diseases according to how well they can be recapitulated by animal models and quantify similarities between human diseases and mouse models produced by the International Mouse Phenotyping Consortium. The flexibility of the approach allows us to incorporate negative phenotypic data to better prioritize candidate genes, and to stratify disease mapping using sex-dependent phenotypes. All our association scores can be updated and we exploit this feature to showcase integration with curated annotations from high-precision assays. Incremental integration is thus a suitable framework for tracking functional annotations and linking to complex human pathology.
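Incremental integration with Bayesian updates can be illustrated with sequential Bayes' rule. The Python below is a minimal sketch, not the authors' model. It assumes that each phenotype observation is conditionally independent given the hypothesis "this gene is linked to this disease" and that each observation comes with a known, hypothetical true-positive rate (`tpr`) and false-positive rate (`fpr`). The names `Evidence`, `bayes_update`, and `integrate` are illustrative. A phenotype that was tested and found normal (negative data) lowers the score, and new evidence such as a curated high-precision assay can be folded into an existing score without recomputing it from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One phenotype observation for a gene (hypothetical evidence model)."""
    tpr: float      # P(abnormal phenotype observed | gene-disease link holds)
    fpr: float      # P(abnormal phenotype observed | no link)
    observed: bool  # True = abnormal phenotype seen; False = tested and normal


def bayes_update(prior: float, ev: Evidence) -> float:
    """Return the posterior P(link | ev) given the prior P(link)."""
    if ev.observed:
        like_link, like_none = ev.tpr, ev.fpr
    else:
        # Negative phenotypic data: use the complementary likelihoods.
        like_link, like_none = 1.0 - ev.tpr, 1.0 - ev.fpr
    numerator = prior * like_link
    return numerator / (numerator + (1.0 - prior) * like_none)


def integrate(prior: float, evidence: list[Evidence]) -> float:
    """Fold evidence in one item at a time; the result can be updated later."""
    score = prior
    for ev in evidence:
        score = bayes_update(score, ev)
    return score


if __name__ == "__main__":
    # Hypothetical first batch: one abnormal call, one normal (negative) call.
    batch1 = [Evidence(0.8, 0.1, True), Evidence(0.6, 0.2, False)]
    score = integrate(0.5, batch1)
    # Later, a hypothetical high-precision assay arrives: update, don't recompute.
    score = integrate(score, [Evidence(0.9, 0.05, True)])
    print(f"posterior = {score:.3f}")
```

Under the independence assumption, the order in which evidence arrives does not change the final score, which is what makes this style of update suitable for incremental integration.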
Topics: Animals; Computational Biology; Disease Models, Animal; Genetic Predisposition to Disease; Genotype; Humans; Mice; Molecular Sequence Annotation; Phenotype
PubMed: 31986132
DOI: 10.1371/journal.pcbi.1007586
Scientific Reports, Mar 2020
To date, reliable relationships between mammalian phenotypes, based on diagnostic test measurements, have not been reported on a large scale. The purpose of this study was to present a large mouse phenotype-phenotype relationships dataset as a reference resource, alongside detailed evaluation of the resource. We used bias-minimized comprehensive mouse phenotype data and applied association rule mining to a dataset consisting of only binary (normal and abnormal phenotypes) data to determine relationships among phenotypes. We present 3,686 evidence-based significant associations, comprising 345 phenotypes covering 60 biological systems (functions), and evaluate their characteristics in detail. To evaluate the relationships, we defined a set of phenotype-phenotype association pairs (PPAPs) as a module of phenotypic expression for each of the 345 phenotypes. By analyzing each PPAP, we identified phenotype sub-networks consisting of the largest numbers of phenotypes and distinct biological systems. Furthermore, using hierarchical clustering based on phenotype similarities among the 345 PPAPs, we identified seven community types within a putative phenome-wide association network. Moreover, to promote leverage of these data, we developed and published web-application tools. These mouse phenome-wide phenotype-phenotype association data reveal general principles of relationships among mammalian phenotypes and provide a reference resource for biomedical analyses.
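Association rule mining over binary (normal/abnormal) phenotype calls reduces to counting co-occurrences. The Python below is a simplified sketch of pairwise rules scored by support, confidence, and lift. The phenotype labels, the thresholds, and the function name `pairwise_rules` are hypothetical, and the sketch omits the significance testing the study used to select its evidence-based associations.

```python
from itertools import permutations


def pairwise_rules(profiles, min_support=0.1, min_confidence=0.5):
    """Mine rules A -> B from per-line sets of abnormal phenotypes.

    support(A, B)  = fraction of lines abnormal for both A and B
    confidence     = support(A, B) / support(A)
    lift           = confidence / support(B)
    """
    n = len(profiles)
    phenotypes = sorted(set().union(*profiles))
    support = {p: sum(p in s for s in profiles) / n for p in phenotypes}
    rules = []
    for a, b in permutations(phenotypes, 2):
        both = sum(a in s and b in s for s in profiles) / n
        if both < min_support:
            continue
        confidence = both / support[a]
        if confidence >= min_confidence:
            rules.append((a, b, both, confidence, confidence / support[b]))
    return rules


if __name__ == "__main__":
    # Hypothetical mouse lines: each set lists the phenotypes called abnormal.
    lines = [
        {"decreased_bone_density", "abnormal_tooth"},
        {"decreased_bone_density", "abnormal_tooth", "hyperactivity"},
        {"decreased_bone_density"},
        {"hyperactivity"},
    ]
    for a, b, s, c, lift in pairwise_rules(lines):
        print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={lift:.2f}")
```

A lift above 1 means the two abnormal phenotypes co-occur more often than expected if they were independent; grouping each phenotype's rules gives a module comparable in spirit to a PPAP.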
Topics: Animals; Genome-Wide Association Study; Mice; Phenotype
PubMed: 32127602
DOI: 10.1038/s41598-020-60891-w
The European Journal of Neuroscience, Jun 2021
Review
Microglia are the resident immune cells of the central nervous system (CNS) and are increasingly recognized as critical players in development, brain homeostasis, and disease pathogenesis. The lifespan, maintenance, proliferation, and turnover of microglia are important factors that regulate microglial behavior and affect their roles in the CNS. However, emerging evidence suggests that microglia are morphologically and phenotypically distinct in different brain areas, at different ages, and during disease. Ongoing research focuses on understanding how microglia acquire specific phenotypes in response to extrinsic cues in the environment and how phenotypes are specified by intrinsic properties of different populations of microglia. With the development of pharmacological and genetic tools that allow the investigation of microglia in vivo, there have been considerable advances in understanding molecular signatures of both homeostatic microglia and those reacting to injury and disease. Here, we review the master gene regulators that define microglia as well as discuss the evidence that microglia are heterogeneous and fall into distinct clusters that display specific intrinsic properties and perform unique tasks in different settings. Taken together, the information presented supports the idea that microglia morphology and transcriptional heterogeneity should be considered when studying the complex nature of microglia and their roles in brain health and disease.
Topics: Brain; Central Nervous System; Homeostasis; Microglia; Phenotype
PubMed: 33835613
DOI: 10.1111/ejn.15225
Scientific Reports, Jan 2023
Thoracic insufficiency syndromes are a genetically and phenotypically heterogeneous group of disorders characterized by congenital abnormalities or progressive deformation of the chest wall and/or vertebrae that result in restrictive lung disease and compromised respiratory capacity. We performed whole exome sequencing on a cohort of 42 children with thoracic insufficiency to elucidate the underlying molecular etiologies of syndromic and non-syndromic thoracic insufficiency and to predict extra-skeletal manifestations and disease progression. Molecular diagnosis was established in 24/42 probands (57%), with 18/24 (75%) probands having definitive diagnoses as defined by laboratory and clinical criteria and 6/24 (25%) probands having strong candidate genes. Genes identified in cohort patients most commonly encoded components of the primary cilium, connective tissue, and extracellular matrix. Novel associations of KIF7 and USP9X variants with thoracic insufficiency were identified. We report and expand the genetic and phenotypic spectrum of a cohort of children with thoracic insufficiency, reinforce the prevalence of extra-skeletal manifestations in thoracic insufficiency syndromes, and expand the phenotype of KIF7- and USP9X-related disease to include thoracic insufficiency.
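The reported proportions use two different denominators, which is easy to miss. The overall diagnostic yield is relative to the 42-child cohort, while the 75% and 25% split is relative to the 24 diagnosed probands:

```latex
\frac{24}{42} \approx 0.571 \;(57\%), \qquad
\frac{18}{24} = 0.75 \;(75\%), \qquad
\frac{6}{24} = 0.25 \;(25\%)
```

Relative to the full cohort, the definitive diagnoses correspond to 18/42, or roughly 43%.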
Topics: Phenotype; Spine
PubMed: 36653407
DOI: 10.1038/s41598-023-27641-0
Journal of the American Dental..., Nov 2019
BACKGROUND
A significant amount of clinical information captured as free-text narratives could be better used for several applications, such as clinical decision support, ontology development, evidence-based practice, and research. The Human Phenotype Ontology (HPO) is specifically used for semantic comparisons for diagnostic purposes. All these functions require quality coverage of the domain of interest. The authors used natural language processing to capture craniofacial and oral phenotype signatures from electronic health records and then used these signatures for evaluation of existing oral phenotype ontology coverage.
METHODS
The authors applied a text-processing pipeline based on the clinical Text Analysis and Knowledge Extraction System to annotate the clinical notes with Unified Medical Language System codes. The authors extracted the disease or disorder phenotype terms, which were then compared with HPO terms and their synonyms.
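The final comparison step, checking extracted terms against HPO labels and synonyms, can be sketched as a normalized string lookup. This Python is an illustrative assumption, not the authors' pipeline: the cTAKES and UMLS annotation steps are not reproduced, the toy ontology dictionary is a hypothetical stand-in for a real HPO release, and case- and punctuation-insensitive exact matching is assumed because the abstract does not state the matching criteria. The helper names `normalize`, `build_index`, and `find_gaps` are hypothetical.

```python
import re


def normalize(term: str) -> str:
    """Lower-case and collapse runs of whitespace/punctuation to one space."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def build_index(ontology: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized label and synonym to its preferred label."""
    index = {}
    for label, synonyms in ontology.items():
        for name in [label, *synonyms]:
            index[normalize(name)] = label
    return index


def find_gaps(extracted: list[str], ontology: dict[str, list[str]]) -> list[str]:
    """Return extracted phenotype terms with no label or synonym match."""
    index = build_index(ontology)
    return sorted({t for t in extracted if normalize(t) not in index})


if __name__ == "__main__":
    # Hypothetical toy ontology excerpt: preferred label -> synonyms.
    toy_ontology = {
        "Cleft palate": ["Palatal cleft"],
        "Macroglossia": ["Enlarged tongue"],
    }
    notes_terms = ["palatal cleft", "Enlarged  tongue", "tooth wear"]
    # "tooth wear" is a gap relative to this toy ontology only.
    print(find_gaps(notes_terms, toy_ontology))
```

Exact matching after normalization is deliberately conservative: it may flag terms that a real ontology covers under different wording, so flagged gaps would still need manual review.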
RESULTS
The authors retrieved 2,153 deidentified clinical notes from 558 patients. In total, 2,416 unique disease or disorder phenotype terms were extracted, including 210 craniofacial or oral phenotype terms. Twenty-six of these phenotypes were not found in the HPO.
CONCLUSIONS
The authors demonstrated that natural language processing tools could extract relevant phenotype terms from clinical narratives, which could help identify gaps in existing ontologies and enhance craniofacial and dental phenotyping vocabularies.
PRACTICAL IMPLICATIONS
The expansion of terms in the dental, oral, and craniofacial domains in the HPO is particularly important as the dental community moves toward electronic health records.
Topics: Electronic Health Records; Humans; Narration; Natural Language Processing; Phenotype; Vocabulary
PubMed: 31668172
DOI: 10.1016/j.adaj.2019.05.029
Journal of Biomedical Informatics, Apr 2023
Review
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7,960 records (after removing over 4,000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work.
There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
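All three percentages in this review are relative to the 139 included studies, which can be checked directly from the reported counts:

```latex
\frac{121}{139} \approx 0.871 \;(87.1\%), \qquad
\frac{77}{139} \approx 0.554 \;(55.4\%), \qquad
\frac{36}{139} \approx 0.259 \;(25.9\%)
```

The 7,960 screened records serve as the denominator only for the screening step, where 139/7,960, or about 1.7%, of records were retained.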
Topics: Algorithms; Electronic Health Records; Machine Learning; Natural Language Processing; Phenotype
PubMed: 36933631
DOI: 10.1016/j.jbi.2023.104335
Nature Communications, Oct 2023
Biological sciences, drug discovery, and medicine rely heavily on cell phenotype perturbation and microscope observation. However, most cellular phenotypic changes are subtle and thus hidden from us by natural cell variability: two cells in the same condition already look different. In this study, we show that conditional generative models can be used to transform an image of cells from any one condition to another, thus canceling cell variability. We visually and quantitatively validate that the principle of synthetic cell perturbation works on discernible cases. We then illustrate its effectiveness in revealing otherwise invisible cell phenotypes triggered by parasite infection in blood cells, by a disease-causing pathological mutation in differentiated neurons derived from iPSCs, or by low-concentration drug treatments. The proposed approach, which is easy to use and robust, opens the door to more accessible discovery of biological and disease biomarkers.
Topics: Cell Differentiation; Induced Pluripotent Stem Cells; Drug Discovery; Phenotype
PubMed: 37821450
DOI: 10.1038/s41467-023-42124-6
Annual Review of Biomedical Data Science, Jul 2021
Review
Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.
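The case/control logic behind phecodes can be sketched as follows. This Python is a simplified illustration under stated assumptions: the ICD-to-phecode map and the exclusion set are hypothetical placeholders (real analyses use the published phecode map files), the default of two qualifying codes for a case is an assumed convention rather than something stated in this abstract, and date handling is omitted (the sketch counts code occurrences). The function `phecode_status` and the `Status` enum are hypothetical names. Patients with some related evidence but not enough to be cases are left out of both groups instead of being counted as controls.

```python
from enum import Enum


class Status(Enum):
    CASE = "case"
    CONTROL = "control"
    EXCLUDED = "excluded"


def phecode_status(patient_icd: list[str],
                   icd_to_phecode: dict[str, str],
                   phecode: str,
                   exclusion: set[str],
                   min_codes: int = 2) -> Status:
    """Classify one patient for one phecode from their ICD code occurrences."""
    mapped = [icd_to_phecode.get(code) for code in patient_icd]  # None if unmapped
    hits = sum(p == phecode for p in mapped)
    if hits >= min_codes:
        return Status.CASE
    if hits > 0 or any(p in exclusion for p in mapped):
        # Some related evidence: neither a clean case nor a clean control.
        return Status.EXCLUDED
    return Status.CONTROL


if __name__ == "__main__":
    # Hypothetical map: ICD codes "A1", "A2" -> phecode "P1"; "B1" -> related "P1.1".
    icd_map = {"A1": "P1", "A2": "P1", "B1": "P1.1", "Z9": "P9"}
    exclusions = {"P1.1"}  # hypothetical related phecode excluded from controls
    patients = {
        "pt1": ["A1", "A2"],
        "pt2": ["A1"],
        "pt3": ["B1"],
        "pt4": ["Z9"],
    }
    for pid, codes in patients.items():
        print(pid, phecode_status(codes, icd_map, "P1", exclusions).value)
```

Running this classification for every phecode over every patient yields the case/control matrix that a phenome-wide association scan then tests against each genetic variant.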
Topics: Electronic Health Records; Genome-Wide Association Study; International Classification of Diseases; Phenomics; Phenotype
PubMed: 34465180
DOI: 10.1146/annurev-biodatasci-122320-112352
Orphanet Journal of Rare Diseases, Aug 2021
BACKGROUND
With the advent of whole exome (ES) and genome sequencing (GS) as tools for disease gene discovery, rare variant filtering, prioritization and data sharing have become essential components of the search for disease genes and variants potentially contributing to disease phenotypes. The computational storage, data manipulation, and bioinformatic interpretation of thousands to millions of variants identified in ES and GS, respectively, are challenging tasks. To aid in that endeavor, we constructed PhenoDB, GeneMatcher and VariantMatcher.
RESULTS
PhenoDB is an accessible, freely available, web-based platform that allows users to store, share, analyze and interpret their patients' phenotypes and variants from ES/GS data. GeneMatcher is accessible to all stakeholders as a web-based tool developed to connect individuals (researchers, clinicians, health care providers and patients) around the globe with interest in the same gene(s), variant(s) or phenotype(s). Finally, VariantMatcher was developed to enable public sharing of variant-level data and phenotypic information from individuals sequenced as part of multiple disease gene discovery projects. Here we provide updates on PhenoDB and GeneMatcher applications and implementation and introduce VariantMatcher.
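The matchmaking idea described for GeneMatcher, connecting people who share an interest in the same gene, can be sketched as an inverted index from gene to submitters. This Python is a hypothetical illustration only: it is not GeneMatcher's implementation or API, the function name `find_matches` is invented for the example, and the submitter names and gene symbols are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def find_matches(submissions: dict[str, set[str]]) -> dict[str, list[tuple[str, str]]]:
    """Return, per gene, every pair of distinct submitters who listed it."""
    by_gene: dict[str, set[str]] = defaultdict(set)
    for submitter, genes in submissions.items():
        for gene in genes:
            # Gene symbols are compared case-insensitively in this sketch.
            by_gene[gene.upper()].add(submitter)
    return {
        gene: list(combinations(sorted(people), 2))
        for gene, people in sorted(by_gene.items())
        if len(people) > 1  # a gene listed by only one submitter has no match
    }


if __name__ == "__main__":
    # Placeholder submitters and gene symbols.
    submissions = {
        "clinician_1": {"GENE_A", "GENE_B"},
        "researcher_2": {"gene_a"},
        "lab_3": {"GENE_B", "GENE_C"},
    }
    for gene, pairs in find_matches(submissions).items():
        print(gene, pairs)
```

Indexing by gene keeps each lookup proportional to the number of submitters for that gene rather than to the total number of submissions.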
CONCLUSION
Each of these tools has facilitated worldwide data sharing and data analysis and improved our ability to connect genes to phenotypic traits. Further development of these platforms will expand variant analysis, interpretation, and novel disease-gene discovery, and will facilitate functional annotation of the human genome for clinical genomics implementation and the precision medicine initiative.
Topics: Computational Biology; Databases, Genetic; Genomics; Humans; Phenotype; Software
PubMed: 34407837
DOI: 10.1186/s13023-021-01916-z
Mammalian Genome: Official Journal of..., Jun 2020
Review
Although the ontogeny of individual phenotypes was once thought to depend directly and uniquely on genotype, it is much more complicated. Individual genetics, environmental exposures, and their interaction are the three main determinants of an individual's phenotype. This picture was further complicated a decade ago, when the Lamarckian theory of acquired inheritance was rekindled by the discovery of epigenetic inheritance, according to which acquired phenotypes can be transmitted through fertilization and affect phenotypes across generations. The results of genome-wide association studies have also highlighted a large degree of missing heritability and have provided hints that not only acquired phenotypes but also individuals' genotypes affect phenotypes intergenerationally through indirect genetic effects. Here, we review available examples of indirect genetic effects in mammals, what is known of the underlying molecular mechanisms, and their potential impact on our understanding of missing heritability, phenotypic variation, and individual disease risk.
Topics: Animals; DNA Methylation; Epigenesis, Genetic; Gene-Environment Interaction; Genetic Variation; Genome-Wide Association Study; Genotype; Histone Code; Humans; Mammals; Multifactorial Inheritance; Phenotype
PubMed: 32529318
DOI: 10.1007/s00335-020-09841-5