Bioinformatics (Oxford, England) Jun 2024
SUMMARY
Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes show that Floria is >3× faster and recovers 21% more strain content than base-level assembly methods (Strainberry) while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took <20 min on average per sample and identified several species that have consistent strain heterogeneity. Applying Floria's short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses.
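The minimum error correction (MEC) objective named in the summary can be illustrated with a small sketch. This is not Floria's code: it only shows the standard MEC score, i.e. the number of allele calls that disagree with the majority-vote consensus of the cluster a read is assigned to. The read encoding (a dict mapping variant site to allele) and the function names are assumptions made for illustration.

```python
from collections import Counter


def consensus(reads):
    """Majority allele per site; ties go to the smallest allele label.

    Each read is a dict {site_index: allele} (hypothetical encoding).
    """
    votes = {}
    for read in reads:
        for site, allele in read.items():
            votes.setdefault(site, Counter())[allele] += 1
    # sorted() makes tie-breaking deterministic: max() keeps the first maximum
    return {site: max(sorted(c), key=lambda a: c[a]) for site, c in votes.items()}


def mec_score(clusters):
    """Total mismatches between reads and their own cluster consensus."""
    total = 0
    for reads in clusters:
        hap = consensus(reads)
        for read in reads:
            total += sum(1 for site, allele in read.items() if hap[site] != allele)
    return total
```

A clustering that groups reads from the same strain yields a lower score than one that mixes strains, which is why minimizing MEC is a natural criterion for read phasing.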
AVAILABILITY AND IMPLEMENTATION
Floria is available at https://github.com/bluenote-1577/floria, and the Floria-PL pipeline is available at https://github.com/jsgounot/Floria_analysis_workflow along with code for reproducing the benchmarks.
Topics: Metagenome; Metagenomics; Haplotypes; Software; Humans; Genome, Bacterial; Microbiota; Bacteria; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
PubMed: 38940183
DOI: 10.1093/bioinformatics/btae252

Bioinformatics (Oxford, England) Jun 2024
BACKGROUND
Charting cellular trajectories over gene expression is key to understanding dynamic cellular processes and their underlying mechanisms. While advances in single-cell RNA-sequencing technologies and computational methods have pushed forward the recovery of such trajectories, trajectory inference remains a challenge due to the noisy, sparse, and high-dimensional nature of single-cell data. This challenge can be alleviated by increasing either the number of cells sampled along the trajectory (breadth) or the sequencing depth, i.e. the number of reads captured per cell (depth). Generally, these two factors are coupled due to an inherent breadth-depth tradeoff that arises when the sequencing budget is constrained due to financial or technical limitations.
RESULTS
Here we study the optimal allocation of a fixed sequencing budget to optimize the recovery of trajectory attributes. Empirical results reveal that reconstruction accuracy of internal cell structure in expression space scales with the logarithm of either the breadth or depth of sequencing. We additionally observe a power law relationship between the optimal number of sampled cells and the corresponding sequencing budget. For linear trajectories, non-monotonicity in trajectory reconstruction across the breadth-depth tradeoff can impact downstream inference, such as expression pattern analysis along the trajectory. We demonstrate these results for five single-cell RNA-sequencing datasets encompassing differentiation of embryonic stem cells, pancreatic beta cells, hepatoblast and multipotent hematopoietic cells, as well as induced reprogramming of embryonic fibroblasts into neurons. By addressing the challenges of single-cell data, our study offers insights into maximizing the efficiency of cellular trajectory analysis through strategic allocation of sequencing resources.
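The breadth-depth tradeoff and the reported power law can be made concrete with a minimal sketch. It assumes the budget is simply cells × reads per cell, and it fits n_opt = a · B^b by ordinary least squares on log-transformed values. The function names and any numbers passed in are illustrative, not taken from the study.

```python
import math


def allocations(budget, depths):
    """For each candidate depth (reads per cell), the number of cells
    a fixed read budget can afford: cells = budget // depth."""
    return [(budget // d, d) for d in depths if d <= budget]


def fit_power_law(budgets, optimal_cells):
    """Fit optimal_cells = a * budget**b via linear regression in log-log space.

    Returns (a, b). Inputs must be positive.
    """
    xs = [math.log(v) for v in budgets]
    ys = [math.log(v) for v in optimal_cells]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b
```

In practice one would evaluate reconstruction accuracy at each allocation, record the best cell count per budget, and then pass those pairs to `fit_power_law`.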
Topics: Single-Cell Analysis; Sequence Analysis, RNA; Humans; Animals; High-Throughput Nucleotide Sequencing
PubMed: 38940162
DOI: 10.1093/bioinformatics/btae258

Bioinformatics (Oxford, England) Jun 2024
MOTIVATION
High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations.
RESULTS
We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a "bridging" system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages "supporting" information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch's significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%-62.1% and PsiCLASS by 23.0%-175.5% on human datasets.
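As a rough illustration of a precision-constrained partial area under the precision-recall curve, the sketch below keeps only operating points whose precision meets a threshold and integrates them with the trapezoid rule. The exact pAUC definition used for Aletsch is not given in the abstract, so this is one plausible reading; the function name and input format are assumptions.

```python
def partial_pr_auc(points, min_precision):
    """Trapezoidal area under (recall, precision) points with
    precision >= min_precision.

    points: iterable of (recall, precision) tuples, in any order.
    Points below the threshold are dropped without interpolation,
    which is a simplification.
    """
    kept = sorted(p for p in points if p[1] >= min_precision)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(kept, kept[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area
```

If precision dips below the threshold and later recovers, this simple version bridges the gap linearly; a stricter implementation would integrate each qualifying segment separately.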
AVAILABILITY AND IMPLEMENTATION
Aletsch is freely available at https://github.com/Shao-Group/aletsch. Scripts that reproduce the experimental results of this manuscript are available at https://github.com/Shao-Group/aletsch-test.
Topics: Algorithms; RNA-Seq; Software; Humans; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA
PubMed: 38940157
DOI: 10.1093/bioinformatics/btae215

Bioinformatics (Oxford, England) Jun 2024
MOTIVATION
Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty in even the seemingly straightforward task of elucidating the splicing status of the molecules from which sequenced fragments are drawn. This difficulty arises, in part, from the limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many reads in scRNA-seq is ambiguous because of a lack of definitive evidence. We are therefore in need of methods that can recover the splicing status of ambiguous reads which, in turn, can lead to more accuracy and confidence in downstream analyses.
RESULTS
We develop Forseti, a predictive model to probabilistically assign a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets deriving from different species and tissue types. Forseti combines these two trained models to predict the splicing status of the molecule of origin of reads by scoring putative fragments that associate each alignment of sequenced reads with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of many reads and identify the true gene origin of multi-gene mapped reads.
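The idea of combining a priming-site probability with a fragment-length probability can be sketched as follows. This is a simplified illustration, not Forseti's model: it assumes each splicing status comes with a list of candidate fragments, scores each fragment as the product of the two probabilities, keeps the best fragment per status, and normalizes across statuses. All names and the scoring rule are hypothetical.

```python
def status_probabilities(candidates):
    """candidates: {status: [(binding_prob, fragment_length_prob), ...]}.

    Returns {status: probability}, normalized over the statuses given.
    Scoring rule (an assumption): best product of the two probabilities.
    """
    scores = {
        status: max((b * f for b, f in frags), default=0.0)
        for status, frags in candidates.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No evidence either way: fall back to a uniform assignment
        return {status: 1.0 / len(scores) for status in scores}
    return {status: s / total for status, s in scores.items()}
```

A read whose only plausible priming site implies an unlikely fragment length under one status receives a low probability for that status, which is the intuition the abstract describes.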
AVAILABILITY AND IMPLEMENTATION
Forseti and the code used for producing the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
Topics: RNA Splicing; Single-Cell Analysis; Sequence Analysis, RNA; Humans; Software; RNA-Seq; Algorithms; Single-Cell Gene Expression Analysis
PubMed: 38940130
DOI: 10.1093/bioinformatics/btae207

Frontiers in Immunology 2024
This study discusses the importance of minimal residual disease (MRD) detection in acute myeloid leukemia (AML) patients using liquid biopsy and next-generation sequencing (NGS). AML prognosis is based on various factors, including genetic alterations. NGS has revealed the molecular complexity of AML and helped refine risk stratification and personalized therapies. The long-term survival rates for AML patients are low, and MRD assessment is crucial in predicting prognosis. Currently, the most common methods for MRD detection are flow cytometry and quantitative PCR, but NGS is being incorporated into clinical practice due to its ability to detect genomic aberrations in the majority of AML patients. Typically, bone marrow samples are used for MRD assessment, but using peripheral blood samples or liquid biopsies would be less invasive. Leukemia originates in the bone marrow, along with the cfDNA obtained from peripheral blood. This study aimed to assess the utility of cell-free DNA (cfDNA) from peripheral blood samples for MRD detection in AML patients. A cohort of 20 AML patients was analyzed using NGS, and a correlation between MRD assessment by cfDNA and circulating tumor cells (CTCs) in paired samples was observed. Furthermore, a higher tumor signal was detected in cfDNA compared to CTCs, indicating greater sensitivity. Challenges for the application of liquid biopsy in MRD assessment were discussed, including the selection of appropriate markers and the sensitivity of certain markers. This study emphasizes the potential of liquid biopsy using cfDNA for MRD detection in AML patients and highlights the need for further research in this area.
Topics: Neoplasm, Residual; Humans; Leukemia, Myeloid, Acute; High-Throughput Nucleotide Sequencing; Neoplastic Cells, Circulating; Male; Female; Middle Aged; Liquid Biopsy; Adult; Biomarkers, Tumor; Aged; Prognosis; Cell-Free Nucleic Acids
PubMed: 38938565
DOI: 10.3389/fimmu.2024.1252258

Frontiers in Pediatrics 2024
BACKGROUND
Type 1 Diabetes Mellitus (T1DM) is one of the most common endocrine disorders of childhood and adolescence, showing a rapidly increasing prevalence worldwide. A study indicated that the composition of the oropharyngeal and gut microbiota changed in T1DM. However, no studies have yet associated the changes between the microbiomes of the oropharyngeal and intestinal sites, nor between the flora and clinical indicators. In this study, we examined the composition and characteristics of oropharyngeal and intestinal flora in patients with T1DM compared with healthy children. We identified correlations between oropharyngeal and intestinal flora and evaluated their association with clinical laboratory tests in patients with T1DM.
METHODS
The oropharyngeal and fecal samples from 13 T1DM and 20 healthy children were analyzed by high-throughput sequencing of the V3-V4 region of 16S rRNA. The associations between microbes and microorganisms in oropharyngeal and fecal ecological niches, as well as the correlation between these and clinical indicators were further analyzed.
RESULTS
T1DM children had distinct microbiological characteristics: the dominant oropharyngeal genera included Streptococcus, Prevotella, Leptotrichia, and Neisseria, and the dominant intestinal genera included Blautia, Fusicatenibacter, Bacteroides, and Eubacterium_hallii_group. Furthermore, oropharyngeal Staphylococcus was significantly positively correlated with intestinal norank_f__Ruminococcaceae and Ruminococcus_torques_group in T1DM children. Moreover, in these children, differential genes in oropharyngeal and intestinal samples were enriched in metabolic pathways such as amino acid generation, fatty acid metabolism, and nucleotide sugar biosynthesis. Additionally, correlation analysis between the oropharyngeal/intestinal microbiome and laboratory tests showed significant correlations between several bacterial taxa in the oropharynx and intestines and glycated hemoglobin and C-peptide.
CONCLUSION
Unique microbial characteristics were found in the oropharynx and intestine in children with T1DM compared to healthy children. Positive correlations were found between changes in the relative abundance of oropharyngeal and gut microbiota in children with T1DM. Associations between the oropharyngeal/intestinal microbiota and laboratory investigations in children with T1DM suggest that the composition of the oropharyngeal and intestinal flora in children with T1DM may have some impact on glycemic control.
PubMed: 38938502
DOI: 10.3389/fped.2024.1382466

Plant Disease Jun 2024
Blackleg and soft rot diseases represent a major threat to the health of potato () and other vegetable, ornamental and fruit crops worldwide; their main causal agents are species of and . In May 2022, 60% of potato plants (cv. Spunta) in a production field in Córdoba, Argentina (31°32'36''S 64°09'46''W) showed soft rot, blackleg and wilt. To isolate the causal agent, decayed plant tissues were disinfected in 2% NaClO, macerated in sterile water and streaked on crystal violet pectate (CVP) medium. Plates were incubated at 28°C for 48 h. Colonies that produced a pit on CVP medium were purified on nutrient agar. Two of the isolates, called 1Aia and 1B, were characterized by tests commonly employed for the identification of pectinolytic bacteria (Schaad et al. 2001). Both produced Gram-negative rods that were facultatively anaerobic, oxidase negative, nonfluorescent on King's B, resistant to erythromycin and caused soft rot of potato slices. In addition, these isolates did not produce the blue pigment indigoidine and grew on nutrient glucose agar containing 5% NaCl. Phenotypic characteristics of the isolates 1Aia and 1B were compatible with spp. Genomic DNA was extracted using the commercially available Wizard® Genomic DNA Purification Kit (Promega) according to the manufacturer's instructions for the purification of DNA from Gram-negative bacteria. The isolates were positive in a PCR assay for (Duarte et al. 2004). The purified DNA of isolate 1Aia was used to construct a pooled Illumina library, which was sequenced at the Genomics Unit of the National Institute of Agricultural Technology (INTA, Argentina) using high-throughput Illumina sequencing technology. Average nucleotide identity (ANI) calculation performed by FastANI v0.1.3 (Jain et al. 2018) showed 96.11% identity between the genome of the type strain LMG 21371 of (Acc. no. JQOE00000000) and our strain 1Aia (Acc. no. JAYGXQ000000000). For the pathogenicity test, 3-week-old potato plants (cv. Spunta) planted in pots were infiltrated with 10 µl of a bacterial suspension (1 × 10^7 CFU/ml) 5 cm above the base of the stem using a sterile syringe. Negative controls were infiltrated with sterile water. Plants were kept under greenhouse conditions and regularly watered. The experiment was performed twice with six plants per treatment. Two days after inoculation, plants treated with strain 1Aia or 1B showed necrotic lesions on the stems and tuber soft rot symptoms, while control plants remained asymptomatic. To fulfill Koch's postulates, bacteria were re-isolated from symptomatic plants. Re-isolated bacteria, called 1Aia d and 1B d, were confirmed as according to biochemical and PCR results, as outlined above. Also, the ANI value between isolates 1Aia and 1Aia d was 99.99% (Acc. no. JAYGXR000000000). To our knowledge, this is the first report of the occurrence of in Argentina. This pathogen has been observed causing blackleg and tuber soft rot on potato in Brazil (Duarte et al. 2004), the Netherlands (Nunes Leite et al. 2014), Switzerland (de Werra et al. 2015), Russia (Voronina et al. 2019), Serbia (Loc et al. 2022) and the USA (Zhang et al. 2023), among other countries worldwide. Due to the important economic and nutritional value of the crop, the distribution of needs to be investigated and monitored in order to develop effective control strategies.
PubMed: 38937930
DOI: 10.1094/PDIS-03-24-0558-PDN

BMC Plant Biology Jun 2024
Integration analysis of miRNA-mRNA pairs between two contrasting genotypes reveals the molecular mechanism of jujube (Ziziphus jujuba Mill.) response to high-temperature stress.
With global warming, high temperature (HT) has become one of the most common abiotic stresses resulting in significant crop yield losses, especially for jujube (Ziziphus jujuba Mill.), an important temperate economic crop cultivated worldwide. This study aims to explore the coping mechanism of jujube to HT stress at the transcriptional and post-transcriptional levels, including identifying differentially expressed miRNAs and mRNAs as well as elucidating the critical pathways involved. High-throughput sequencing analyses of miRNA and mRNA were performed on jujube leaves, which were collected from "Fucuimi" (heat-tolerant) and "Junzao" (heat-sensitive) cultivars subjected to HT stress (42 °C) for 0, 1, 3, 5, and 7 days. In total, 45 known miRNAs, 482 novel miRNAs, and 13,884 differentially expressed mRNAs (DEMs) were identified. Integrated analysis of miRNA target gene prediction and mRNA-seq yielded 1306 pairs of differentially expressed miRNAs (DEMIs) and DEMs, including 484, 769, and 865 DEMI-DEM pairs discovered in "Fucuimi", "Junzao", and the between-genotype comparison groups, respectively. Furthermore, functional enrichment analysis of 1306 DEMs revealed that plant-pathogen interaction, starch and sucrose metabolism, spliceosome, and plant hormone signal transduction were crucial pathways in the response of jujube leaves to HT stress. The constructed miRNA-mRNA network, composed of 20 DEMIs and 33 DEMs, displayed significantly different expression between these two genotypes. This study further supports the regulatory role of miRNAs in the response to HT stress in plants and will provide a theoretical foundation for the innovation and cultivation of heat-tolerant varieties.
Topics: Ziziphus; MicroRNAs; RNA, Messenger; Genotype; RNA, Plant; Gene Expression Regulation, Plant; Hot Temperature; Plant Leaves; Stress, Physiological; High-Throughput Nucleotide Sequencing; Heat-Shock Response
PubMed: 38937704
DOI: 10.1186/s12870-024-05304-0

Genetics, Selection, Evolution : GSE Jun 2024
BACKGROUND
Genome sequence variants affecting complex traits (quantitative trait loci, QTL) are enriched in functional regions of the genome, such as those marked by certain histone modifications. These variants are believed to influence gene expression. However, due to the linkage disequilibrium among nearby variants, pinpointing the precise location of QTL is challenging. We aimed to identify allele-specific binding (ASB) QTL (asbQTL) that cause variation in the level of histone modification, as measured by the height of peaks assayed by ChIP-seq (chromatin immunoprecipitation sequencing). We identified DNA sequences that predict the difference between alleles in ChIP-seq peak height in H3K4me3 and H3K27ac histone modifications in the mammary glands of cows.
RESULTS
We used a gapped k-mer support vector machine, a novel best linear unbiased prediction model, and a multiple linear regression model that combines the other two approaches to predict variant impacts on peak height. For each method, a subset of 1000 sites with the highest magnitude of predicted ASB was considered as candidate asbQTL. The accuracy of this prediction was measured by the proportion where the predicted direction matched the observed direction. Prediction accuracy ranged between 0.59 and 0.74, suggesting that these 1000 sites are enriched for asbQTL. Using independent data, we investigated functional enrichment in the candidate asbQTL set and three control groups, including non-causal ASB sites, non-ASB variants under a peak, and SNPs (single nucleotide polymorphisms) not under a peak. For H3K4me3, a higher proportion of the candidate asbQTL were confirmed as ASB when compared to the non-causal ASB sites (P < 0.01). However, these candidate asbQTL did not enrich for the other annotations, including expression QTL (eQTL), allele-specific expression QTL (aseQTL) and sites conserved across mammals (P > 0.05).
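The accuracy measure described above, the proportion of top-ranked sites whose predicted direction matches the observed direction, can be sketched in a few lines. The function name, the signed-effect input format, and the default of 1000 sites are illustrative choices that mirror the description; values of exactly zero are assumed not to occur.

```python
def direction_accuracy(predicted, observed, top_n=1000):
    """Among the top_n sites ranked by |predicted effect|, return the
    fraction whose predicted sign agrees with the observed sign.

    predicted, observed: equal-length sequences of signed allele effects
    on peak height (hypothetical encoding).
    """
    ranked = sorted(zip(predicted, observed), key=lambda p: abs(p[0]), reverse=True)
    top = ranked[:top_n]
    matches = sum(1 for pred, obs in top if (pred > 0) == (obs > 0))
    return matches / len(top)
```

Under random guessing this proportion is expected to be about 0.5, which is the baseline against which the reported range of 0.59 to 0.74 should be read.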
CONCLUSIONS
We identified putatively causal sites for asbQTL using the DNA sequence surrounding these sites. Our results suggest that many sites influencing histone modifications may not directly affect gene expression. However, it is important to acknowledge that distinguishing between putative causal ASB sites and other non-causal ASB sites in high linkage disequilibrium with the causal sites regarding their impact on gene expression may be challenging due to limitations in statistical power.
Topics: Quantitative Trait Loci; Animals; Cattle; Histones; Alleles; Chromatin Immunoprecipitation Sequencing; Polymorphism, Single Nucleotide; Histone Code; Linkage Disequilibrium; Molecular Sequence Annotation; Female
PubMed: 38937662
DOI: 10.1186/s12711-024-00916-4

Nature Communications Jun 2024
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5'cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.
Topics: Animals; Humans; Mice; Sequence Analysis, RNA; Gene Library; High-Throughput Nucleotide Sequencing; RNA; RNA Caps
PubMed: 38937428
DOI: 10.1038/s41467-024-49523-3