-
Cell Reports Methods Dec 2023
Glycomics, the comprehensive profiling of all glycan structures in samples, is rapidly expanding to enable insights into physiology and disease mechanisms. However, glycan structure complexity and glycomics data interpretation present challenges, especially for differential expression analysis. Here, we present a framework for differential glycomics expression analysis. Our methodology encompasses specialized and domain-informed methods for data normalization and imputation, glycan motif extraction and quantification, differential expression analysis, motif enrichment analysis, time series analysis, and meta-analytic capabilities, synthesizing results across multiple studies. All methods are integrated into our open-source glycowork package, facilitating performant workflows and user-friendly access. We demonstrate these methods using dedicated simulations and glycomics datasets of N-, O-, lipid-linked, and free glycans. Differential expression tests here focus on human datasets and cancer vs. healthy tissue comparisons. Our rigorous approach allows for robust, reliable, and comprehensive differential expression analyses in glycomics, contributing to advancing glycomics research and its translation to clinical and diagnostic applications.
Topics: Humans; Glycomics; Polysaccharides
PubMed: 37992708
DOI: 10.1016/j.crmeth.2023.100652
-
Frontiers in Genetics 2020
Time-series can provide critical insights into the structure and function of microbial communities. The analysis of temporal data warrants statistical considerations, distinct from comparative microbiome studies, to address ecological questions. This primer identifies unique challenges and approaches for analyzing microbiome time-series. In doing so, we focus on (1) identifying compositionally similar samples, (2) inferring putative interactions among populations, and (3) detecting periodic signals. We connect theory, code and data via a series of hands-on modules with a motivating biological question centered on marine microbial ecology. The topics of the modules include characterizing shifts in community structure and activity, identifying expression levels with a diel periodic signal, and identifying putative interactions within a complex community. Modules are presented as self-contained, open-access, interactive tutorials in R and Matlab. Throughout, we highlight statistical considerations for dealing with autocorrelated and compositional data, with an eye to improving the robustness of inferences from microbiome time-series. In doing so, we hope that this primer helps to broaden the use of time-series analytic methods within the microbial ecology research community.
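As an illustration of what detecting a diel periodic signal can look like, the following minimal sketch fits a single 24-hour cosine to an evenly sampled series. It is a generic harmonic fit, not the primer's own module code; the function name `fit_diel_harmonic`, the 4-hour sampling interval, and the synthetic signal are assumptions made for this example. The closed-form coefficients used here hold only for evenly spaced samples that cover a whole number of periods.

```python
import math


def fit_diel_harmonic(times_h, values, period_h=24.0):
    """Fit mean + one cosine of the given period.

    Assumes evenly spaced samples spanning a whole number of periods,
    in which case the least-squares coefficients reduce to these sums.
    Returns (mean level, amplitude, time of peak in hours).
    """
    n = len(values)
    omega = 2.0 * math.pi / period_h
    mean_level = sum(values) / n
    a = 2.0 / n * sum(v * math.cos(omega * t) for t, v in zip(times_h, values))
    b = 2.0 / n * sum(v * math.sin(omega * t) for t, v in zip(times_h, values))
    amplitude = math.hypot(a, b)
    peak_time_h = (math.atan2(b, a) / omega) % period_h
    return mean_level, amplitude, peak_time_h


# Hypothetical noise-free expression profile: sampled every 4 h for 3 days,
# peaking at 14:00 each day.
times = [4.0 * i for i in range(18)]
signal = [5.0 + 2.0 * math.cos(2.0 * math.pi * (t - 14.0) / 24.0) for t in times]

mean_level, amplitude, peak_time = fit_diel_harmonic(times, signal)
print(f"mean={mean_level:.3f} amplitude={amplitude:.3f} peak={peak_time:.3f} h")
```

With real, autocorrelated data, the amplitude alone is not a significance test; a permutation or rank-based procedure would still be needed to decide whether the fitted rhythm is distinguishable from noise.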
PubMed: 32373155
DOI: 10.3389/fgene.2020.00310
-
PLoS Computational Biology Dec 2021
There is a growing realization that multi-way chromatin contacts formed in chromosome structures are fundamental units of gene regulation. However, due to the paucity and complexity of such contacts, it is challenging to detect and identify them using experiments. Based on an assumption that chromosome structures can be mapped onto a network of Gaussian polymer, here we derive analytic expressions for n-body contact probabilities (n > 2) among chromatin loci based on pairwise genomic contact frequencies available in Hi-C, and show that multi-way contact probability maps can in principle be extracted from Hi-C. The three-body (triplet) contact probabilities, calculated from our theory, are in good correlation with those from measurements including Tri-C, MC-4C and SPRITE. Maps of multi-way chromatin contacts calculated from our analytic expressions can not only complement experimental measurements, but also can offer better understanding of the related issues, such as cell-line dependent assemblies of multiple genes and enhancers to chromatin hubs, competition between long-range and short-range multi-way contacts, and condensates of multiple CTCF anchors.
Topics: Chromatin; Chromosome Mapping; DNA; Enhancer Elements, Genetic; Gene Expression Regulation; Genes; Genomics; High-Throughput Nucleotide Sequencing; Humans
PubMed: 34871311
DOI: 10.1371/journal.pcbi.1009669
-
Frontiers in Psychiatry 2022
BACKGROUND
In non-randomized studies (NRSs) where a continuous outcome variable (e.g., depressive symptoms) is assessed at baseline and follow-up, it is common to observe imbalance of the baseline values between the treatment/exposure group and control group. This may bias the study and consequently a meta-analysis (MA) estimate. These estimates may differ across statistical methods used to deal with this issue. Analysis of individual participant data (IPD) allows standardization of methods across studies. We aimed to identify methods used in published IPD-MAs of NRSs for continuous outcomes, and to compare different methods to account for baseline values of outcome variables in IPD-MA of NRSs using two empirical examples from the Thyroid Studies Collaboration (TSC).
METHODS
For the first aim we systematically searched in MEDLINE, EMBASE, and Cochrane from inception to February 2021 to identify published IPD-MAs of NRSs that adjusted for baseline outcome measures in the analysis of continuous outcomes. For the second aim, we applied analysis of covariance (ANCOVA), change score, propensity score and the naïve approach (ignores the baseline outcome data) in IPD-MA from NRSs on the association between subclinical hyperthyroidism and depressive symptoms and renal function. We estimated the study and meta-analytic mean difference (MD) and relative standard error (SE). We used both fixed- and random-effects MA.
RESULTS
Ten of the 18 included studies (56%) used the change score method, seven (39%) used ANCOVA, and one (5%) used the propensity score. The study estimates were similar across methods in studies in which groups were balanced at baseline with regard to outcome variables, but differed in studies with baseline imbalance. In our empirical examples, ANCOVA and change score yielded study results in the same direction, whereas the propensity score did not. In our applications, ANCOVA provided more precise estimates than the other methods at both the study and meta-analytic level. Heterogeneity was higher when change score was used as the outcome, moderate for ANCOVA, and null with the propensity score.
CONCLUSION
ANCOVA provided the most precise estimates at both study and meta-analytic level and thus seems preferable in the meta-analysis of IPD from non-randomized studies. For the studies that were well-balanced between groups, change score, and ANCOVA performed similarly.
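To make the difference between the naïve, change score, and ANCOVA approaches concrete, here is a minimal sketch on simulated data with baseline imbalance. It does not use the Thyroid Studies Collaboration data; the true effect (1.0), the baseline-to-follow-up slope (0.5), the baseline imbalance (2.0), and all function names are assumptions chosen for illustration. Under this data-generating model, the naïve estimate is expected to be biased upward by slope × imbalance and the change score downward by (slope − 1) × imbalance, while ANCOVA adjusts with the pooled within-group slope.

```python
import random
from statistics import mean


def simulate_group(n, baseline_mean, effect, rng, slope=0.5):
    """Return (baseline, follow_up) pairs for one hypothetical group."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(baseline_mean, 1.0)
        y = slope * x + effect + rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def naive_md(treated, control):
    """Mean difference in follow-up values, ignoring baseline."""
    return mean(y for _, y in treated) - mean(y for _, y in control)


def change_score_md(treated, control):
    """Mean difference in (follow-up minus baseline)."""
    return mean(y - x for x, y in treated) - mean(y - x for x, y in control)


def ancova_md(treated, control):
    """Group effect from regressing follow-up on group and baseline.

    Uses the identity that the ANCOVA group coefficient equals the raw
    follow-up difference minus (pooled within-group slope) times the
    baseline difference.
    """
    sxy = sxx = 0.0
    for group in (treated, control):
        mx = mean(x for x, _ in group)
        my = mean(y for _, y in group)
        for x, y in group:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    pooled_slope = sxy / sxx
    baseline_diff = mean(x for x, _ in treated) - mean(x for x, _ in control)
    return naive_md(treated, control) - pooled_slope * baseline_diff


rng = random.Random(0)
treated = simulate_group(2000, baseline_mean=2.0, effect=1.0, rng=rng)
control = simulate_group(2000, baseline_mean=0.0, effect=0.0, rng=rng)

print("naive       :", round(naive_md(treated, control), 2))
print("change score:", round(change_score_md(treated, control), 2))
print("ANCOVA      :", round(ancova_md(treated, control), 2))
```

When the two groups share the same baseline mean, the three estimators target the same quantity, which is consistent with the observation that the methods agree in well-balanced studies.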
PubMed: 35273528
DOI: 10.3389/fpsyt.2022.774251
-
Briefings in Bioinformatics May 2022
MOTIVATION
The human major histocompatibility complex (MHC), also known as human leukocyte antigen (HLA), plays an important role in the adaptive immune system by presenting non-self-peptides to T cell receptors. The MHC region has been shown to be associated with a variety of diseases, including autoimmune diseases, organ transplantation and tumours. However, structural analytic tools of HLA are still sparse compared to the number of identified HLA alleles, which hinders the disclosure of its pathogenic mechanism.
RESULT
To provide an integrative analysis of HLA, we first collected 1296 amino acid sequences, 256 protein data bank structures, 120 000 frequency data of HLA alleles in different populations, 73 000 publications and 39 000 disease-associated single nucleotide polymorphism sites, as well as 212 modelled HLA heterodimer structures. Then, we put forward two new strategies for building up a toolkit for transplantation and tumour immunotherapy, designing a risk alignment pipeline and an antigenic peptide prediction pipeline by integrating different resources and bioinformatic tools. By integrating 100 000 calculated HLA conformation differences and online tools, the risk alignment pipeline provides users with the functions of structural alignment, sequence alignment, residue visualization and risk report generation of mismatched HLA molecules. For tumour antigen prediction, we first predicted 370 000 immunogenic peptides based on the affinity between peptides and MHC to generate the neoantigen catalogue for 11 common tumours. We then designed an antigenic peptide prediction pipeline to provide the functions of mutation prediction, peptide prediction, immunogenicity assessment and docking simulation. We also present a case study of hepatitis B virus mutations associated with liver cancer that demonstrates the high legitimacy of our antigenic peptide prediction process. HLA3D, including different HLA analytic tools and the prediction pipelines, is available at http://www.hla3d.cn/.
Topics: Computational Biology; HLA Antigens; Histocompatibility Antigens Class I; Humans; Immunotherapy; Neoplasms; Peptides; Protein Binding
PubMed: 35289353
DOI: 10.1093/bib/bbac076
-
Biological Psychiatry Global Open... Oct 2022
BACKGROUND
Genetics and biology may influence the age of onset of anorexia nervosa (AN). The aims of this study were to determine whether common genetic variation contributes to age of onset of AN and to investigate the genetic associations between age of onset of AN and age at menarche.
METHODS
A secondary analysis of the Psychiatric Genomics Consortium genome-wide association study (GWAS) of AN was performed, which included 9335 cases and 31,981 screened controls, all from European ancestries. We conducted GWASs of age of onset, early-onset AN (<13 years), and typical-onset AN, and genetic correlation, genetic risk score, and Mendelian randomization analyses.
RESULTS
Two loci were genome-wide significant in the typical-onset AN GWAS. Heritability estimates (single nucleotide polymorphism-h²) were 0.01-0.04 for age of onset, 0.16-0.25 for early-onset AN, and 0.17-0.25 for typical-onset AN. Early- and typical-onset AN showed distinct genetic correlation patterns with putative risk factors for AN. Specifically, early-onset AN was significantly genetically correlated with younger age at menarche, and typical-onset AN was significantly negatively genetically correlated with anthropometric traits. Genetic risk scores for age of onset and early-onset AN estimated from independent GWASs significantly predicted age of onset. Mendelian randomization analysis suggested a causal link between younger age at menarche and early-onset AN.
CONCLUSIONS
Our results provide evidence consistent with a common variant genetic basis for age of onset and implicate biological pathways regulating menarche and reproduction.
PubMed: 36324647
DOI: 10.1016/j.bpsgos.2021.09.001
-
Archives of Toxicology Jul 2021
Review
Since the addition of fluoride to drinking water in the 1940s, there have been frequent and sometimes heated discussions regarding its benefits and risks. In a recently published review, we addressed the question of whether current exposure levels in Europe represent a risk to human health. This review was discussed in an editorial asking why we did not calculate benchmark doses (BMD) of fluoride neurotoxicity for humans. Here, we address why it is problematic to calculate BMDs based on the currently available data. Briefly, the conclusions of the available studies are not homogeneous, reporting negative as well as positive results; moreover, the positive studies lack control of confounding factors such as the influence of well-known neurotoxicants. We also discuss the limitations of several further epidemiological studies that did not meet the inclusion criteria of our review. Finally, it is important not to focus only on epidemiological studies. Rather, risk analysis should consider all available data, including epidemiological, animal, and in vitro studies. Despite remaining uncertainties, the totality of evidence does not support the notion that fluoride should be considered a human developmental neurotoxicant at current exposure levels in European countries.
Topics: Animals; Drinking Water; Epidemiologic Studies; Europe; Fluorides; Longitudinal Studies
PubMed: 34095968
DOI: 10.1007/s00204-021-03072-6
-
Sensors (Basel, Switzerland) Aug 2023
Green Chemistry is a vital instrument in achieving pollution control, and it plays an important role in helping society reach the Sustainable Development Goals (SDGs). Near-infrared (NIR) spectroscopy has been utilized as an alternative technique for molecular identification, making the process faster and less expensive. Near-infrared diffuse reflectance spectroscopy and Machine Learning (ML) algorithms were utilized in this study to construct identification and classification models of bacteria such as , , and . Furthermore, the models divide these bacteria into Gram-negative and Gram-positive groups. The green and quick approach was created by combining NIR spectroscopy with a diffuse reflectance accessory. Using infrared spectral data and ML techniques such as principal component analysis (PCA), hierarchical cluster analysis (HCA) and K-Nearest Neighbor (KNN), it was feasible to identify and classify four bacteria and to classify them into two groups, Gram-positive and Gram-negative, with 100% accuracy. We conclude that our study has a high potential for bacterial identification and classification, and that it is consistent with global policies of sustainable development and green analytical chemistry.
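The K-Nearest Neighbor step mentioned above can be sketched as follows. This is a minimal illustration on synthetic, well-separated "spectra", not the study's data or pipeline: the function names, the Gaussian band shapes, the peak positions, and the noise level are all assumptions, and the PCA and HCA steps are omitted. Each query spectrum is assigned the majority label among its `k` closest training spectra by Euclidean distance.

```python
import math
import random
from collections import Counter


def synthetic_spectrum(peak_index, rng, n_points=50, width=5.0, noise=0.05):
    """Hypothetical spectrum: one Gaussian band plus small Gaussian noise."""
    return [
        math.exp(-((i - peak_index) ** 2) / (2.0 * width ** 2)) + rng.gauss(0.0, noise)
        for i in range(n_points)
    ]


def knn_predict(train, query, k=3):
    """Majority vote among the k training spectra closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Assumed band positions for the two illustrative classes.
PEAKS = {"gram_negative": 15, "gram_positive": 35}

rng = random.Random(42)
train_set = [
    (synthetic_spectrum(peak, rng), label)
    for label, peak in PEAKS.items()
    for _ in range(10)
]
test_set = [
    (synthetic_spectrum(peak, rng), label)
    for label, peak in PEAKS.items()
    for _ in range(5)
]

correct = sum(knn_predict(train_set, spectrum) == label for spectrum, label in test_set)
print(f"accuracy: {correct}/{len(test_set)}")
```

Perfect accuracy on data this cleanly separated says little about real spectra; with measured data, performance should be estimated on samples held out from model building.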
Topics: Spectroscopy, Near-Infrared; Algorithms; Bacteria; Chemistry, Analytic; Escherichia coli; Machine Learning
PubMed: 37687792
DOI: 10.3390/s23177336
-
Nature Human Behaviour May 2023
Identifying genetic determinants of reproductive success may highlight mechanisms underlying fertility and identify alleles under present-day selection. Using data in 785,604 individuals of European ancestry, we identified 43 genomic loci associated with either number of children ever born (NEB) or childlessness. These loci span diverse aspects of reproductive biology, including puberty timing, age at first birth, sex hormone regulation, endometriosis and age at menopause. Missense variants in ARHGAP27 were associated with higher NEB but shorter reproductive lifespan, suggesting a trade-off at this locus between reproductive ageing and intensity. Other genes implicated by coding variants include PIK3IP1, ZFP82 and LRP4, and our results suggest a new role for the melanocortin 1 receptor (MC1R) in reproductive biology. As NEB is one component of evolutionary fitness, our identified associations indicate loci under present-day natural selection. Integration with data from historical selection scans highlighted an allele in the FADS1/2 gene locus that has been under selection for thousands of years and remains so today. Collectively, our findings demonstrate that a broad range of biological mechanisms contribute to reproductive success.
Topics: Child; Female; Humans; Aging; Fertility; Menopause; Reproduction; Selection, Genetic
PubMed: 36864135
DOI: 10.1038/s41562-023-01528-6
-
The Journal of Chemical Physics Oct 2022
Analytic high-order energy derivatives for metal nanoparticle-mediated infrared and Raman scattering spectra within the framework of quantum mechanics/molecular mechanics model with induced charges and dipoles.
This work is devoted to deriving and implementing analytic second- and third-order energy derivatives with respect to the nuclear coordinates and external electric field within the framework of the hybrid quantum mechanics/molecular mechanics method with induced charges and dipoles (QM/DIM). Using these analytic energy derivatives, one can efficiently compute the harmonic vibrational frequencies and the infrared (IR) and Raman scattering (RS) spectra of a molecule in the proximity of noble metal clusters/nanoparticles. The validity and accuracy of these analytic implementations are demonstrated by comparing results obtained with the finite-difference method and the analytic approaches, and with full QM and QM/DIM calculations. The complexes formed by pyridine and two sizes of gold clusters (Au and Au) at varying intersystem distances of 3, 4, and 5 Å are used as the test systems, and Raman spectra of 4,4'-bipyridine in the proximity of Au and Ag metal nanoparticles (MNP) are calculated by the QM/DIM method and compared with experimental results as well. We find that the QM/DIM model reproduces well the IR spectra obtained from full QM calculations for all configurations. Although it properly enhances some of the vibrational modes, it artificially overestimates the RS spectral intensities of several modes for systems with very short intersystem distances. We show that this could be improved, however, by incorporating the hyperpolarizability of the gold metal cluster in the evaluation of RS intensities. Additionally, we address the potential impact of charge migration between the adsorbate and MNPs.
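The validation strategy described above, comparing analytic derivatives with finite differences, can be illustrated on a toy one-dimensional energy function. The sketch below uses a Morse potential as a stand-in; it is not the QM/DIM energy, and the parameter values, step sizes, and function names are assumptions chosen for illustration. The analytic second and third derivatives are checked against standard central-difference formulas.

```python
import math

# Assumed toy parameters (arbitrary units): well depth, width, equilibrium distance.
D, A, R0 = 1.0, 1.0, 1.0


def energy(r):
    """Morse potential V(r) = D * (1 - exp(-A (r - R0)))^2."""
    return D * (1.0 - math.exp(-A * (r - R0))) ** 2


def analytic_second(r):
    """d2V/dr2 = 2 D A^2 (2 e^2 - e), with e = exp(-A (r - R0))."""
    e = math.exp(-A * (r - R0))
    return 2.0 * D * A ** 2 * (2.0 * e ** 2 - e)


def analytic_third(r):
    """d3V/dr3 = 2 D A^3 (e - 4 e^2), with e = exp(-A (r - R0))."""
    e = math.exp(-A * (r - R0))
    return 2.0 * D * A ** 3 * (e - 4.0 * e ** 2)


def fd_second(f, r, h=1e-3):
    """Central finite difference for the second derivative."""
    return (f(r + h) - 2.0 * f(r) + f(r - h)) / h ** 2


def fd_third(f, r, h=1e-2):
    """Central finite difference for the third derivative."""
    return (f(r + 2 * h) - 2.0 * f(r + h) + 2.0 * f(r - h) - f(r - 2 * h)) / (2.0 * h ** 3)


r = 1.2
err2 = abs(analytic_second(r) - fd_second(energy, r))
err3 = abs(analytic_third(r) - fd_third(energy, r))
print(f"second derivative: analytic={analytic_second(r):.6f} |error|={err2:.2e}")
print(f"third derivative : analytic={analytic_third(r):.6f} |error|={err3:.2e}")
```

The finite-difference step is a compromise: too large a step increases truncation error, too small a step amplifies round-off, and the third derivative is more sensitive to both, which is one reason analytic derivatives are attractive for higher orders.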
PubMed: 36319412
DOI: 10.1063/5.0118205