BMC Genomics Jun 2024
Deep Mutational Scanning (DMS) assays are powerful tools to study sequence-function relationships by measuring the effects of thousands of sequence variants on protein function. During a DMS experiment, several technical artefacts might non-linearly distort the functional scores obtained, potentially biasing the interpretation of the results. We therefore tested several technical parameters in the deepPCA workflow, a DMS assay for protein-protein interactions, to identify technical sources of non-linearities. We found that parameters common to many DMS assays, such as the amount of transformed DNA, the timepoint of harvest and the library composition, can cause non-linearities in the data. Designing experiments to minimize these non-linear effects will improve the quantification and interpretation of mutation effects.
Topics: Mutation; Workflow; Proteins; High-Throughput Nucleotide Sequencing; Protein Interaction Mapping; DNA Mutational Analysis; Protein Binding
PubMed: 38914936
DOI: 10.1186/s12864-024-10524-7

BMC Bioinformatics Jun 2024
BACKGROUND
Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps.
RESULTS
Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning through sequence assembly to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to comprehensively search for target virus sequences and beyond in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even when virus sequencing depth is in the single digits, and that it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify virus intrasample genetic variations and generate publication-quality figures illustrating the results.
CONCLUSIONS
Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.
Topics: Software; Genome, Viral; Humans; High-Throughput Nucleotide Sequencing; SARS-CoV-2; Metagenomics; Viruses; COVID-19; Virome; HeLa Cells
PubMed: 38914932
DOI: 10.1186/s12859-024-05846-y

BMC Genomics Jun 2024
BACKGROUND
Current RNA-seq analysis software tends to use similar parameters across different species without considering species-specific differences. However, the suitability and accuracy of these tools may vary when analyzing data from different species, such as humans, animals, plants, fungi, and bacteria. For most laboratory researchers lacking a background in information science, determining how to construct an analysis workflow that meets their specific needs from the array of complex analytical tools available poses a significant challenge.
RESULTS
Using RNA-seq data from plants, animals, and fungi, we observed that different analytical tools vary in performance when applied to different species. A comprehensive experiment was conducted specifically for analyzing plant pathogenic fungal data, with differential gene analysis as the ultimate goal. In this study, 288 pipelines using different tools were applied to analyze five fungal RNA-seq datasets, and the performance of their results was evaluated based on simulation. This led to the establishment of a relatively universal and superior fungal RNA-seq analysis pipeline that can serve as a reference, along with certain standards for selecting analysis tools. Additionally, we compared various tools for alternative splicing analysis. The results based on simulated data indicated that rMATS remained the optimal choice, although consideration could be given to supplementing it with tools such as SpliceWiz.
CONCLUSION
The experimental results demonstrate that, compared with default software parameter configurations, tuned combinations of analysis tools can provide more accurate biological insights. It is beneficial to select suitable analysis software carefully based on the data, rather than choosing tools indiscriminately, in order to achieve high-quality analysis results more efficiently.
Topics: Workflow; RNA-Seq; Software; Fungi; Computational Biology; Sequence Analysis, RNA; Alternative Splicing
PubMed: 38914930
DOI: 10.1186/s12864-024-10414-y

PloS One 2024
Retinitis pigmentosa (RP) is the most common inherited retinal dystrophy and a major cause of blindness. RP is caused by several variants of multiple genes, and genetic diagnosis by identifying these variants is important for optimizing treatment and estimating patient prognosis. Next-generation sequencing (NGS), which is currently widely used for diagnosis, is considered useful but is known to have limitations in detecting copy number variations (CNVs). In this study, we re-evaluated CNVs in EYS, the main causative gene of RP, identified via NGS using multiplex ligation-dependent probe amplification (MLPA). CNVs were identified in NGS samples of eight patients. To identify potential CNVs, MLPA was also performed on samples from 42 patients who were undiagnosed by NGS but carried one of the five major pathogenic variants reported in Japanese EYS-RP cases. All suspected CNVs based on NGS data in the eight patients were confirmed via MLPA. CNVs were found in 2 of the 42 NGS-undiagnosed RP cases. Furthermore, results showed that 121 of the 661 patients with RP had EYS as the causative gene, and 8.3% (10/121 patients with EYS-RP) had CNVs. Although NGS using the CNV calling criteria utilized in this study failed to identify CNVs in two cases, no false-positive results were detected. Collectively, these findings suggest that NGS is useful for CNV detection during clinical diagnosis of RP.
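The proportions reported in this abstract can be rechecked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names are illustrative, and the final figure (the share of MLPA-confirmed CNV cases that NGS had already flagged) is a derived value that assumes the ten MLPA-confirmed cases are the complete set of CNV carriers in the cohort.

```python
# Counts taken from the abstract; variable names are illustrative only.
total_rp_patients = 661   # all patients with RP
eys_rp_patients = 121     # patients whose causative gene was EYS
cnv_found_by_ngs = 8      # CNVs suspected by NGS and confirmed by MLPA
cnv_found_by_mlpa_only = 2  # CNVs missed by NGS, found by MLPA


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


cnv_total = cnv_found_by_ngs + cnv_found_by_mlpa_only

eys_share = percent(eys_rp_patients, total_rp_patients)  # EYS-RP among all RP
cnv_share = percent(cnv_total, eys_rp_patients)          # CNVs among EYS-RP
# Derived, assuming MLPA found every CNV carrier in the cohort:
ngs_detection_share = percent(cnv_found_by_ngs, cnv_total)

print(eys_share, cnv_share, ngs_detection_share)
```

The second value reproduces the 8.3% (10/121) stated in the abstract; the other two are not reported in the source and are shown only as a consistency check.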
Topics: Humans; Retinitis Pigmentosa; DNA Copy Number Variations; High-Throughput Nucleotide Sequencing; Female; Male; Eye Proteins; Middle Aged; Adult; Multiplex Polymerase Chain Reaction
PubMed: 38913662
DOI: 10.1371/journal.pone.0305812

Frontiers in Cellular and Infection... 2024
BACKGROUND
Invasive mold diseases of the central nervous system (CNS IMD) are exceedingly rare disorders, characterized by nonspecific clinical symptoms. This results in significant diagnostic challenges, often leading to delayed diagnosis and a risk of misdiagnosis. Metagenomic next-generation sequencing (mNGS) is of significant value for the diagnosis of infectious diseases, especially for the rapid and accurate identification of rare and difficult-to-culture pathogens. Therefore, this study aims to explore the clinical characteristics of CNS IMD in children and assess the effectiveness of mNGS technology in diagnosing CNS IMD.
METHODS
Three pediatric patients diagnosed with invasive mold disease brain abscess and treated in the Pediatric Intensive Care Unit (PICU) of the First Affiliated Hospital of Zhengzhou University from January 2020 to December 2023 were selected for this study.
RESULTS
Case 1, a 6-year-old girl, was admitted to the hospital with "acute liver failure." During her hospital stay, she developed fever, irritability, and seizures. CSF mNGS testing resulted in a negative outcome. Multiple brain abscesses were drained, and was detected in pus culture and mNGS. The condition gradually improved after treatment with voriconazole combined with caspofungin. Case 2, a 3-year-old girl, was admitted with "acute B-lymphoblastic leukemia." During induction chemotherapy, she developed fever and seizures. was detected in the intracranial abscess fluid by mNGS, and the condition gradually improved after treatment with voriconazole combined with caspofungin, followed by "right-sided brain abscess drainage surgery." Case 3, a 7-year-old girl, showed lethargy, fever, and right-sided limb weakness during the pending chemotherapy period for acute B-lymphoblastic leukemia. and was detected in the cerebrospinal fluid by mNGS. The condition gradually improved after treatment with amphotericin B combined with posaconazole. After a six-month follow-up post-discharge, the three patients improved without residual neurological sequelae, and the primary diseases were in complete remission.
CONCLUSION
The clinical manifestations of CNS IMD lack specificity. Early mNGS can assist in identifying the pathogen, providing a basis for definitive diagnosis. Combined surgical treatment when necessary can help improve prognosis.
Topics: Humans; Female; High-Throughput Nucleotide Sequencing; Child; Metagenomics; Brain Abscess; Antifungal Agents; Invasive Fungal Infections; Male; Central Nervous System Fungal Infections; Child, Preschool; Aspergillus fumigatus; Caspofungin
PubMed: 38912204
DOI: 10.3389/fcimb.2024.1393242

Scientific Data Jun 2024
The greater amberjack (Seriola dumerili) is a very important fishery species with high commercial value, and it is distributed worldwide. Transcriptome-based studies on S. dumerili have been limited by an inadequate reference genome and a lack of well-annotated full-length transcripts. In this study, a total of 12 tissues from juvenile and adult fish of both sexes were collected for next-generation RNA sequencing (RNA-seq) and full-length isoform sequencing (Iso-seq). For Iso-seq, a total of 163,218, 149,716, and 189,169 high-quality unique transcript sequences were obtained, with N50 values of 5,441, 5,255, and 5,939, from juvenile, adult male and adult female S. dumerili, respectively. We integrated the Iso-seq and RNA-seq data to construct a comprehensive gene annotation and systematically profiled the dynamics of gene expression across the 12 tissues. Our gene models had greater detail and accuracy than those from NCBI and Ensembl, with more precise polyA locations. These resources serve as a foundation for functional genomic studies and provide valuable insights into the molecular mechanisms underlying the development, reproduction and commercial traits of amberjack.
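The N50 figures quoted above summarize the length distribution of the transcript sets. As a minimal sketch of the standard definition (not the authors' pipeline), the function below returns the length L such that sequences of length L or longer together contain at least half of all bases. The function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy data, not taken from the study.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

For the toy data the total is 20 bases; the two longest sequences (6 and 5) already cover 11 bases, so the N50 is 5.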
Topics: Animals; Genome; Male; RNA-Seq; Female; Transcriptome; Molecular Sequence Annotation; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing; Fishes
PubMed: 38909036
DOI: 10.1038/s41597-024-03495-7

Plant Disease Jun 2024
Grapevine enamovirus 1 (GEV1) belongs to the genus Enamovirus, in the family Solemoviridae. It has been reported infecting grapevines in several countries, including Brazil (Silva et al. 2017), China (Ren et al. 2021) and France (Hily et al. 2022). To assess the prevalence and diversity of economically important grapevine viruses in nine Canadian vineyards, total RNA and double-stranded RNA (dsRNA) (Fall et al. 2020) were extracted from 30 and 100 composite samples, respectively, each consisting of five vines of the same cultivar. The cultivars included in this study were Frontenac noir (n=34), Vidal (n=32), Marquette (n=33), Riesling (n=31), and Pinot noir (n=31). The total RNA and dsRNA samples were subsequently multiplexed and diagnosed by high-throughput sequencing (HTS) on NovaSeq (600 S4 PE100) and MiSeq (2 × 250 cycle PE), respectively. From NovaSeq and MiSeq sequencing, an average of 410,000 to 1.3 million reads/sample were obtained, respectively, with mapped viral reads representing 10.92% to 12.48% of the total reads. After sequence quality was verified using Trimmomatic v.0.40 (Bolger et al. 2014), the clean sequences were screened against all possible viruses in the databases using the Virtool (Rott et al. 2017) and VirFind (Ho and Tzanetakis 2014) virus detection pipelines. GEV1 was detected in clean sequences from two, three, and two leaf samples of cultivars 'Marquette', 'Riesling', and 'Frontenac noir', respectively. Six of the seven HTS-assembled GEV1 genomes were partial, ranging from 4,523 to 6,000 nucleotides (nt), with genome coverage varying from 71% to 89%. Only one assembled contig, 6,314 nt long (Accession No. OR021829), represented a nearly complete genome, being only 53 and 3 nt shorter than Sd-CG (MT536978) at the 5' and 3' untranslated regions (UTRs), respectively. Isolate 3- Riesling-CAN (OR021829) shares 90.56 to 94.19% nt identity with several GEV1 isolates at 96-99% query coverage.
Phylogenetically, OR021829 is closer to GEV1 isolates from France and China (Figure S1). To validate the HTS results, the previously developed primer pair SetF and Set1R (Silva et al. 2017) was used for RT-PCR detection. The amplicons from all seven HTS-positive samples were sequenced using Sanger sequencing, confirming the presence of GEV1 in three of the studied grape cultivars in Canadian vineyards. Symptoms associated with the specific GEV1-infected vines could not be determined because composite samples were used. Each of the composite-sample HTS libraries also tested positive for at least one of the known grapevine viruses/viroids, namely grapevine leafroll-associated virus 3, grapevine Pinot gris virus, grapevine rupestris stem pitting-associated virus, Marafivirus syrahense grapevine Syrah virus-1 and hop stunt viroid. To our knowledge, this is the first report of GEV1 being detected in grapevines in Canada, or in any North American vineyard. GEV1 is a relatively new virus, and its biology remains largely unknown. Based on this sequence, new GEV1 primers can be developed to assess the genetic variability among GEV1 isolates and improve the detection of this virus in vineyards.
PubMed: 38907522
DOI: 10.1094/PDIS-11-23-2452-PDN

Acta Neuropathologica Communications Jun 2024
snRNA-seq of human cutaneous neurofibromas before and after selumetinib treatment implicates role of altered Schwann cell states, inter-cellular signaling, and extracellular matrix in treatment response.
Neurofibromatosis Type 1 (NF1) is caused by loss-of-function variants in the NF1 gene. Most patients with NF1 develop skin lesions called cutaneous neurofibromas (cNFs). Currently the only approved therapeutic for NF1 is selumetinib, a mitogen-activated protein kinase kinase (MEK) inhibitor. The purpose of this study was to analyze the transcriptome of cNF tumors before and on selumetinib treatment to understand both tumor composition and response. We obtained biopsy sets of tumors both pre- and on-selumetinib treatment from the same individuals and were able to collect sets from four separate individuals. We sequenced mRNA from 5844 nuclei and identified 30,442 genes in the untreated group, and sequenced 5701 nuclei and identified 30,127 genes in the selumetinib-treated group. We identified and quantified distinct populations of cells (Schwann cells, fibroblasts, pericytes, myeloid cells, melanocytes, keratinocytes, and two populations of endothelial cells). While we anticipated that cell proportions might change with treatment, we did not identify any one cell population that changed significantly, likely due to an inherent level of variability between tumors. We also evaluated differential gene expression based on drug treatment in each cell type. Ingenuity pathway analysis (IPA) was also used to identify pathways that differ on treatment. As anticipated, we identified a significant decrease in ERK/MAPK signaling in cells including Schwann cells but most specifically in myeloid cells. Interestingly, there is a significant decrease in opioid signaling in myeloid and endothelial cells; this downward trend is also observed in Schwann cells and fibroblasts. Cell communication was assessed by RNA velocity, Scriabin, and CellChat analyses, which indicated that Schwann cells and fibroblasts have dramatically altered cell states defined by specific gene expression signatures following treatment (RNA velocity).
There are dramatic changes in receptor-ligand pairs following treatment (Scriabin), and robust intercellular signaling between virtually all cell types associated with extracellular matrix (ECM) pathways (Collagen, Laminin, Fibronectin, and Nectin) is downregulated after treatment. These response-specific gene signatures and interaction pathways could provide clues for understanding treatment outcomes or inform future therapies.
Topics: Humans; Schwann Cells; Skin Neoplasms; Benzimidazoles; Extracellular Matrix; Signal Transduction; Neurofibroma; Female; Male; RNA-Seq; Middle Aged; Adult; Neurofibromatosis 1; Protein Kinase Inhibitors; Transcriptome
PubMed: 38907342
DOI: 10.1186/s40478-024-01821-z

Frontiers in Endocrinology 2024
OBJECTIVES
Thyroid cancer rarely occurs in children and adolescents. Molecular markers such as , , and have been widely used in adult PTC. It is currently unclear whether these molecular markers have equivalent potential for application in pediatric patients. This study aims to explore the potential utility of a multi-gene conjoint analysis based on next-generation targeted sequencing for pediatric papillary thyroid carcinoma (PTC).
MATERIALS AND METHODS
Patients diagnosed with PTC (aged 18 years or younger) in the pediatrics department of Lishui District Hospital of Traditional Chinese Medicine were retrospectively screened. A targeted enrichment and sequencing analysis of 116 genes associated with thyroid cancer was performed on paraffin-embedded tumor tissues and paired paracancerous tissues from fifteen pediatric (average age 14.60) and nine adult (average age 49.33) PTC patients. Demographic information, clinical indicators, ultrasonic imaging information and pathological data were collected. The Kendall correlation test was used to establish a correlation between molecular variations and clinical characteristics in pediatric patients.
RESULTS
A sample of 15 pediatric PTCs revealed a detection rate of 73.33% (11/15) for driver gene mutations and fusions. Compared to adult PTCs, the genetic mutation landscape of pediatric PTCs was more complex. Six mutant genes overlapped between the two groups, and an additional seventeen unique mutant genes were identified only in pediatric PTCs. There was only one unique mutant gene in adult PTCs. The tumor diameter of pediatric PTCs tended to be less than 4 cm (p<0.001), and the number of lymph node metastases was more than five (p<0.001). Mutations in specific genes unique to pediatric PTCs may contribute to the onset and progression of the disease by adversely affecting hormone synthesis, secretion, and action mechanisms, as well as the functioning of thyroid hormone signaling pathways. However, additional experiments are required to validate this hypothesis.
CONCLUSION
mutation and fusion are involved in the occurrence and development of adolescent PTC. For pediatric thyroid nodules that cannot be determined as benign or malignant by fine needle aspiration biopsy, multiple gene combination testing can provide a reference for personalized diagnosis and treatment by clinical physicians.
Topics: Humans; Female; Adolescent; Thyroid Cancer, Papillary; Male; Child; Thyroid Neoplasms; Mutation; Retrospective Studies; Proto-Oncogene Proteins B-raf; Adult; Middle Aged; Biomarkers, Tumor; Proto-Oncogene Proteins c-ret; High-Throughput Nucleotide Sequencing; DNA Mutational Analysis
PubMed: 38904052
DOI: 10.3389/fendo.2024.1405142

Human Genomics Jun 2024
BACKGROUND
Single cell RNA sequencing technology (scRNA-seq) has been proven useful in understanding cell-specific disease mechanisms. However, identifying genes of interest remains a key challenge. Pseudo-bulk methods that pool scRNA-seq counts in the same biological replicates have been commonly used to identify differentially expressed genes. However, such methods may lack power due to the limited sample size of scRNA-seq datasets, which can be prohibitively expensive.
RESULTS
Motivated by this, we proposed using the Bayesian-frequentist hybrid (BFH) framework to increase power. In simulated scenarios, we showed that the proposed BFH would be an optimal method compared with other popular single-cell differential expression methods when both FDR and power are considered. As an example, the method was applied to an idiopathic pulmonary fibrosis (IPF) case study.
CONCLUSION
In our IPF example, we demonstrated that with a proper informative prior, the BFH approach identified more genes of interest. Furthermore, these genes were reasonable based on the current knowledge of IPF. Thus, the BFH offers a unique and flexible framework for future scRNA-seq analyses.
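The pseudo-bulk step mentioned in the background, pooling scRNA-seq counts within the same biological replicate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the BFH method itself: the function name, cell and gene identifiers, and replicate labels are all hypothetical, and in practice pooling is often done per cell type within each replicate before differential expression testing.

```python
from collections import defaultdict


def pseudo_bulk(cell_counts, cell_to_replicate):
    """Sum per-cell gene counts within each biological replicate.

    cell_counts: dict mapping cell id -> dict of gene -> raw count
    cell_to_replicate: dict mapping cell id -> replicate label
    Returns a dict mapping replicate label -> dict of gene -> pooled count.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in cell_counts.items():
        replicate = cell_to_replicate[cell_id]
        for gene, count in gene_counts.items():
            pooled[replicate][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {rep: dict(genes) for rep, genes in pooled.items()}


# Hypothetical toy data: three cells from two donors.
cells = {
    "cell1": {"GENE_A": 3, "GENE_B": 0},
    "cell2": {"GENE_A": 1, "GENE_B": 2},
    "cell3": {"GENE_A": 5, "GENE_B": 4},
}
replicate_of = {"cell1": "donor1", "cell2": "donor1", "cell3": "donor2"}

bulk = pseudo_bulk(cells, replicate_of)
print(bulk)
```

After pooling, each replicate contributes one count vector, so the effective sample size is the number of replicates rather than the number of cells, which is the source of the limited power the abstract describes.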
Topics: Single-Cell Analysis; Bayes Theorem; Humans; RNA-Seq; Sequence Analysis, RNA; Idiopathic Pulmonary Fibrosis; Gene Expression Profiling; Algorithms
PubMed: 38902839
DOI: 10.1186/s40246-024-00638-0