-
International Journal of Medical... Jan 2019
OBJECTIVE
Reproducibility of research studies is key to advancing biomedical science by building on sound results and reducing inconsistencies between published results and study data. We propose that the available data from research studies combined with provenance metadata provide a framework for evaluating scientific reproducibility. We developed the ProvCaRe platform to model, extract, and query semantic provenance information from 435,248 published articles.
METHODS
The ProvCaRe platform consists of: (1) the S3 model and a formal ontology; (2) a provenance-focused text processing workflow to generate provenance triples consisting of subject, predicate, and object using metadata extracted from articles; and (3) the ProvCaRe knowledge repository that supports "provenance-aware" hypothesis-driven search queries. A new provenance-based ranking algorithm is used to rank the articles in the search query results.
RESULTS
The ProvCaRe knowledge repository contains 48.9 million provenance triples. Seven research hypotheses were used as search queries for evaluation and the resulting provenance triples were analyzed using five categories of provenance terms. The highest number of terms (34%) described provenance related to population cohort followed by 29% of terms describing statistical data analysis methods, and only 5% of the terms described the measurement instruments used in a study. In addition, the analysis showed that some articles included a higher number of provenance terms across multiple provenance categories suggesting a higher potential for reproducibility of these research studies.
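The abstract does not describe the ranking algorithm itself, so the following is only a minimal sketch of the general idea suggested by the results: represent each provenance triple as subject, predicate, and object, tag it with a provenance category, and order articles by how many distinct categories they cover. The `Triple` type, the category labels, and the `rank_articles` function are hypothetical names introduced for illustration; they are not part of ProvCaRe.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    """One provenance triple plus illustrative bookkeeping fields (assumed)."""
    article_id: str
    subject: str
    predicate: str
    obj: str
    category: str  # hypothetical label, e.g. "cohort" or "statistics"


def rank_articles(triples: Iterable[Triple]) -> List[str]:
    """Order article IDs by distinct categories covered, then by triple count.

    This is a stand-in heuristic, not the published ProvCaRe ranking.
    """
    categories = defaultdict(set)
    counts = defaultdict(int)
    for t in triples:
        categories[t.article_id].add(t.category)
        counts[t.article_id] += 1
    # Negative keys give descending order; the ID breaks remaining ties.
    return sorted(counts, key=lambda a: (-len(categories[a]), -counts[a], a))
```

Under this heuristic, an article with fewer triples spread over more categories ranks above one with many triples in a single category, which mirrors the observation that broader provenance coverage suggests higher reproducibility potential.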
CONCLUSION
The ProvCaRe knowledge repository (https://provcare.case.edu/) is one of the largest provenance resources for biomedical research studies that combines intuitive search functionality with a new provenance-based ranking feature to list articles related to a search query.
Topics: Algorithms; Biological Ontologies; Biomedical Research; Humans; Metadata; Reproducibility of Results; Semantics
PubMed: 30545485
DOI: 10.1016/j.ijmedinf.2018.10.009 -
Journal of the American Medical... Mar 2018
OBJECTIVE
To contribute a conceptual framework for evaluating data suitability to satisfy the research needs of observational studies.
MATERIALS AND METHODS
Suitability considerations were derived from a systematic literature review on researchers' common data needs in observational studies and a scoping review on frequent clinical database design considerations, and were harmonized to construct a suitability conceptual framework using a bottom-up approach. The relationships among the suitability categories are explored from the perspective of 4 facets of data: intrinsic, contextual, representational, and accessible. A web-based national survey of domain experts was conducted to validate the framework.
RESULTS
Data suitability for observational studies hinges on the following key categories: Explicitness of Policy and Data Governance, Relevance, Availability of Descriptive Metadata and Provenance Documentation, Usability, and Quality. We describe 16 measures and 33 sub-measures. The survey uncovered the relevance of all categories, with a 5-point Likert importance score of 3.9 ± 1.0 for Explicitness of Policy and Data Governance, 4.1 ± 1.0 for Relevance, 3.9 ± 0.9 for Availability of Descriptive Metadata and Provenance Documentation, 4.2 ± 1.0 for Usability, and 4.0 ± 0.9 for Quality.
CONCLUSIONS
The suitability framework evaluates a clinical data source's fitness for research use. Its construction reflects both researchers' points of view and data custodians' design features. The feedback from domain experts rated Usability, Relevance, and Quality categories as the most important considerations.
PubMed: 29024976
DOI: 10.1093/jamia/ocx095 -
International Journal of Data Science... 2022
Social media plays a vital role in information sharing at massive scale owing to its easy access, low cost, and fast dissemination of information. Its capacity to disseminate information across a wide audience has raised a critical challenge: determining the social data provenance of digital content. Social data provenance describes the origin, derivation process, and transformations of social content throughout its lifecycle. In this paper, we present a framework for a key-value pair (KVP) database using the novel concept of . In our proposed framework, a huge volume of social data is first fetched from social media (Twitter's network) through live streaming and simultaneously modelled in a KVP database by using a query-driven approach. The proposed framework is capable of capturing, storing, and querying provenance information for different query sets including select, aggregate, standing/historical, and data update (i.e., insert, delete, update) queries on . We evaluate the performance of the proposed framework in terms of provenance capturing overhead for different query sets including select, aggregate, and data update queries, and average execution time for various provenance queries.
PubMed: 34778513
DOI: 10.1007/s41060-021-00287-9 -
Frontiers in Plant Science 2022
Vessels are responsible for efficient and safe water transport in angiosperm xylem. Whereas large vessels efficiently conduct the bulk of water, small vessels might be important under drought stress or after winter when large vessels are embolized. Wood anatomy can adjust to the environment by plastic adaptation, but is also modified by genetic selection, which can be driven by climate or other factors. To distinguish between plastic and genetic components on wood anatomy, we used a trial where trees from ten Central European provenances were planted in three locations in Austria along a rainfall gradient. Because wood anatomy also adjusts to tree size and, in ring-porous species, vessel size depends on the amount of latewood and thereby ring width, we included tree size and ring width in the analysis. We found that the trees' provenance had a significant effect on average vessel area (VA), theoretical specific hydraulic conductivity (Ks), and the vessel fraction (VF), but correlations with annual rainfall of provenances were at best weak. The trial site had a strong effect on growth (ring width, RW), which increased from the driest to the wettest site, and on wood density (WD), which increased from wet to dry sites. Significant site × provenance interactions were seen only for WD. Surprisingly, the drier site had higher VA, higher VF, and higher Ks. This, however, is mainly a result of greater RW and thus a greater proportion of latewood in the wetter forest. The average size of vessels > 70 μm diameter increased with rainfall. We argue that Ks, which is measured per cross-sectional area, is not an ideal parameter to compare the capacity of ring-porous trees to supply leaves with water. Small vessels (<70 μm) on average contributed only 1.4% to Ks, and we found no evidence that their number or size was adaptive to aridity. RW and tree size had a strong effect on all vessel parameters, likely through the greater proportion of latewood in wide rings. This should be accounted for when searching for wood anatomical adaptations to the environment.
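The small contribution of vessels below 70 μm to Ks is what one would expect if theoretical conductivity is computed from the Hagen-Poiseuille law, under which each vessel's conductance scales with the fourth power of its diameter. The abstract does not state the formula used, so the sketch below assumes that scaling; the function name and the example diameters are hypothetical. Because the physical constants and the cross-sectional area cancel in a ratio, only the diameters are needed.

```python
def small_vessel_share(diameters_um, threshold_um=70.0):
    """Share of total theoretical conductivity carried by vessels narrower
    than threshold_um, assuming conductance is proportional to diameter**4
    (Hagen-Poiseuille scaling; an assumption, not taken from the abstract).
    """
    fourth_powers = [d ** 4 for d in diameters_um]
    total = sum(fourth_powers)
    if total == 0:
        raise ValueError("need at least one vessel with a non-zero diameter")
    small = sum(p for d, p in zip(diameters_um, fourth_powers) if d < threshold_um)
    return small / total


# Hypothetical example: two 50 um vessels next to a single 200 um vessel.
share = small_vessel_share([50, 50, 200])
print(f"{share:.2%}")
```

Even though two of the three vessels in this made-up example are small, they carry well under 1% of the theoretical flow, which illustrates why counts of small vessels say little about Ks.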
PubMed: 35574121
DOI: 10.3389/fpls.2022.795941 -
Biology Oct 2023
Salinity is a pressing and widespread abiotic stress, adversely affecting agriculture productivity and plant growth worldwide. Seed germination is the most critical stage to seedling growth and establishing plant species in harsh environments, including saline stress. However, seed germination characteristics and stress tolerance may vary among geographical locations, such as various provenances. (Linn.) Pall. () is a halophytic plant that exhibits high salt tolerance and is often considered a pioneer species for the restoration of grasslands. Understanding the germination characteristics and stress tolerance of the species could be helpful in the vegetation restoration of saline-alkali land. In this study, we collected seeds from seven different saline-alkali habitats (S1-S7) in the Songnen Plain region to assess the germination and seedling growth responses to NaCl, Na₂CO₃ and NaHCO₃, and to observe the recovery of seed germination after relieving the salt stress. We observed significant differences in germination and seedling growth under three salt stresses and among seven provenances. Resistance to Na₂CO₃ and NaHCO₃ stress was considerably higher during seedling growth than seed germination, while the opposite responses were observed for NaCl resistance. Seeds from S1 and S7 showed the highest tolerance to all three salt stress treatments, while S6 exhibited the lowest tolerance. Seeds from S2 exhibited low germination under control conditions, while low NaCl concentration and pretreatment improved germination. Ungerminated seeds under high salt concentrations germinated after relieving the salt stress. Germination of ungerminated seeds after the abatement of salt stress is an important adaptation strategy for black seeds. While seeds from most provenances regerminated under NaCl, under Na₂CO₃ and NaHCO₃, only seeds from S4 and S7 regerminated.
These findings highlight the importance of soil salinity in the maternal environment for successful seed germination and seedling growth under various salinity-alkali stresses. Therefore, seed sources and provenance should be considered for vegetation restoration.
PubMed: 37887053
DOI: 10.3390/biology12101343 -
G3 (Bethesda, Md.) Aug 2022
Genetic groups have been widely adopted in tree breeding to account for provenance effects within pedigree-derived relationship matrices. However, provenances or genetic groups have not yet been incorporated into single-step genomic BLUP ("HBLUP") analyses of tree populations. To quantify the impact of accounting for population structure in Eucalyptus globulus, we used HBLUP to compare breeding value predictions from models excluding base population effects and models including either fixed genetic groups or the marker-derived proxies, also known as metafounders. Full-sib families from 2 separate breeding populations were evaluated across 13 sites in the "Green Triangle" region of Australia. Gamma matrices (Γ) describing similarities among metafounders reflected the geographic distribution of populations and the origins of 2 land races were identified. Diagonal elements of Γ provided population diversity or allelic covariation estimates between 0.24 and 0.56. Genetic group solutions were strongly correlated with metafounder solutions across models and metafounder effects influenced the genetic solutions of base population parents. The accuracy, stability, dispersion, and bias of model solutions were compared using the linear regression method. Addition of genomic information increased accuracy from 0.41 to 0.47 and stability from 0.68 to 0.71, while increasing bias slightly. Dispersion was within 0.10 of the ideal value (1.0) for all models. Although inclusion of metafounders did not strongly affect accuracy or stability and had mixed effects on bias, we nevertheless recommend the incorporation of metafounders in prediction models to represent the hierarchical genetic population structure of recently domesticated populations.
Topics: Eucalyptus; Genome; Genomics; Genotype; Humans; Models, Genetic; Phenotype; Plant Breeding
PubMed: 35920792
DOI: 10.1093/g3journal/jkac180 -
Applied Clinical Informatics May 2021
BACKGROUND
Data readiness is a concept often used when referring to health information technology applications in the informatics disciplines, but it is not clearly defined in the literature. To avoid misinterpretations in research and implementation, a formal definition should be developed.
OBJECTIVES
The objective of this research is to provide a conceptual definition and framework for the term data readiness that can be used to guide research and development related to data-based applications in health care.
METHODS
PubMed, the National Institutes of Health RePORTER, Scopus, the Cochrane Library, and Duke University Library databases for business and information sciences were queried for formal mentions of the term "data readiness." Manuscripts found in the search were reviewed, and relevant information was extracted, evaluated, and assimilated into a framework for data readiness.
RESULTS
Of the 264 manuscripts found in the database searches, 20 were included in the final synthesis to define data readiness. In these 20 manuscripts, the term data readiness was revealed to encompass the constructs of data quality, data availability, interoperability, and data provenance.
DISCUSSION
Based upon our review of the literature, we define data readiness as the application-specific intersection of data quality, data availability, interoperability, and data provenance. While these concepts are not new, the combination of these factors in a novel data readiness model may help guide future informatics research and implementation science.
CONCLUSION
This analysis provides a definition to guide research and development related to data-based applications in health care. Future work should be done to validate this definition, and to apply the components of data readiness to real-world applications so that specific metrics may be developed and disseminated.
Topics: Databases, Factual; Delivery of Health Care; Humans; Medical Informatics
PubMed: 34289504
DOI: 10.1055/s-0041-1732423 -
Virology Journal May 2023
BACKGROUND
Apple stem grooving virus (ASGV) has a wide host range, notably including apples, pears, prunes and citrus. It is found worldwide.
METHOD
In this study, two near-complete genomes and seven coat protein (CP) sequences of Iranian isolates from apple were determined. Sequences added from GenBank provided alignments of 120 genomic sequences (54 of which were recombinant) and 276 coat protein genes (none of them recombinant).
RESULT
The non-recombinant genomes gave a well-supported phylogeny, with isolates from diverse hosts in China forming the base of the phylogeny, and a monophyletic clade of at least seven clusters of isolates from around the world with no host or provenance groupings among them, and all but one including isolates from China. The six regions of the ASGV genome (five in one frame, one −2 overlapping) gave significantly correlated phylogenies, but individually had less statistical support. The largest cluster of isolates contained those from Iran and had isolates with worldwide provenances, and came from a wide range of mono- and dicotyledonous hosts. Population genetic comparisons of the six regions of the ASGV genome showed that four were under strong negative selection, but two of unknown function were under positive selection.
CONCLUSION
ASGV most likely originated and spread in East Asia in one or more of various plant species, but not in Eurasia; the ASGV population of China had the greatest overall nucleotide diversity and largest number of segregating sites.
Topics: Malus; Iran; Flexiviridae; Fruit; Phylogeny; Plant Diseases
PubMed: 37237285
DOI: 10.1186/s12985-023-02075-2 -
The Science of the Total Environment May 2022
Strontium (Sr) isotope based provenance and mobility studies of ancient humans and animals necessitate representative isoscapes/baselines. However, regions/terranes that were shaped and affected by glaciers during the last Ice Ages and are covered by glaciogenic sediments present a challenge with regard to the choice of suitable surface proxy archives. Recent studies proposed that only ⁸⁷Sr/⁸⁶Sr signatures from pristine areas are relevant for this purpose. To test this theory, 160 new Sr concentrations [Sr] and ⁸⁷Sr/⁸⁶Sr signatures composed from ~960 subsamples of soil leachates and plants, complemented with 55 surface waters from agriculturally unaffected pristine forest sites from all over Denmark (island of Bornholm excluded) were analyzed. The results reveal that average ⁸⁷Sr/⁸⁶Sr signatures of all three proxies (plants: 0.7115 ± 0.0025; 2σ, n = 162; soil leachates: 0.7118 ± 0.0037; 2σ, n = 161; surface waters: 0.7104 ± 0.0030; 2σ, n = 55) are elevated compared to larger water bodies (creeks, rivers, lakes). In mixing diagrams, the data converge in a shared high-[Sr], low-⁸⁷Sr/⁸⁶Sr endmember, which points to either remnant natural carbonates and/or organic components retaining carbonate Sr in the studied Podzols/Luvisols. The indications for more abundant carbonates in the past, compared to today's acid-leached soils, imply that ⁸⁷Sr/⁸⁶Sr values measured from pristine forest locations and heathlands do not adequately reflect the biosphere compositions that prevailed ~12,000–2,000 years ago. Consequently, pristine forests in Denmark seem to be unsuitable proxy archive environments for constructing Sr isotope baselines for determining the provenance and mobility of ancient humans and animals. Hence, ⁸⁷Sr/⁸⁶Sr values measured in these pristine areas are non-representative and inadequate, and their use will lead to wrong interpretations. Finally, our study sheds light on the complexity of defining relevant and representative isoscapes/baselines in significantly changing environments and areas where the surface biosphere conditions do not necessarily reflect the underlying geology.
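The reasoning behind a mixing diagram can be made explicit with the standard two-endmember mixing relation. The abstract gives no equations, so the symbols here are assumptions introduced for illustration: $f$ is the mass fraction of endmember $A$, $C$ denotes Sr concentration, and $R$ denotes the ⁸⁷Sr/⁸⁶Sr ratio. The relation neglects the small difference in Sr atomic weight between endmembers.

```latex
\begin{align}
  C_{\mathrm{mix}} &= f\,C_A + (1-f)\,C_B \\
  R_{\mathrm{mix}} &= \frac{f\,C_A R_A + (1-f)\,C_B R_B}{C_{\mathrm{mix}}} \\
  R_{\mathrm{mix}} &= \frac{C_A C_B\,(R_B - R_A)}{C_A - C_B}\cdot\frac{1}{C_{\mathrm{mix}}}
                     + \frac{C_A R_A - C_B R_B}{C_A - C_B}
\end{align}
```

The last line shows that binary mixtures plot on a straight line when $R_{\mathrm{mix}}$ is drawn against $1/C_{\mathrm{mix}}$, so data from different proxies converging toward a common point at high [Sr] is consistent with a shared endmember such as carbonate-derived Sr.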
Topics: Animals; Denmark; Forests; Geology; Soil; Strontium; Strontium Isotopes
PubMed: 35093367
DOI: 10.1016/j.scitotenv.2022.153394 -
Learning Health Systems Oct 2020
Human phonemics responds to an urgent need in the medical research community; namely, reproducibility.
PubMed: 33083545
DOI: 10.1002/lrh2.10249