-
The VLDB Journal : Very Large Data... Oct 2018
Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To aid this effort, we built Titian, a library that enables data provenance (tracking data through transformations) in Apache Spark. Data scientists using the Titian Spark extension will be able to quickly identify the input data at the root cause of a potential bug or outlier result. Titian is built directly into the Spark platform and offers data provenance support at interactive speeds, orders of magnitude faster than alternative solutions, while minimally impacting Spark job performance; observed overheads for capturing data lineage rarely exceed 30% above the baseline job execution time.
PubMed: 31007500
DOI: 10.1007/s00778-017-0474-5 -
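The lineage idea in the Titian abstract above (tracing an outlier result back to the input records that produced it) can be illustrated with a minimal, single-machine sketch in plain Python. This is not Titian's API and does not use Spark; the names `Traced`, `load`, `t_map`, `t_filter`, `t_reduce_by_key`, and `trace_back` are hypothetical, and the input records are toy data chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Traced:
    """A value paired with the indices of the input records it derives from."""
    value: object
    lineage: FrozenSet[int]


def load(records):
    # Each input record starts with a lineage holding only its own index.
    return [Traced(r, frozenset({i})) for i, r in enumerate(records)]


def t_map(data, fn):
    # One-to-one transformation: the output inherits the input's lineage.
    return [Traced(fn(d.value), d.lineage) for d in data]


def t_filter(data, pred):
    # Dropped records simply disappear from downstream lineage.
    return [d for d in data if pred(d.value)]


def t_reduce_by_key(data, fn):
    # Values are (key, number) pairs; an aggregate's lineage is the union
    # of the lineages of every record folded into it.
    groups = {}
    for d in data:
        key, val = d.value
        if key in groups:
            acc, lin = groups[key]
            groups[key] = (fn(acc, val), lin | d.lineage)
        else:
            groups[key] = (val, d.lineage)
    return [Traced((k, v), lin) for k, (v, lin) in groups.items()]


def trace_back(result, records):
    # Backward trace: recover the raw inputs behind one output.
    return [records[i] for i in sorted(result.lineage)]


def parse(line):
    key, num = line.split(",")
    return key, int(num)


raw = ["a,1", "b,2", "a,400", "b,3", "b,-7"]  # toy input
parsed = t_map(load(raw), parse)
valid = t_filter(parsed, lambda kv: kv[1] >= 0)
totals = t_reduce_by_key(valid, lambda x, y: x + y)

# Pick the suspicious aggregate and ask which inputs produced it.
outlier = max(totals, key=lambda t: t.value[1])
culprits = trace_back(outlier, raw)
print(outlier.value, culprits)
```

Carrying a set of input indices with every value is the simplest possible design and would not scale to large data sets; it is meant only to show what a backward trace returns.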
PLoS Computational Biology Aug 2021
For many biological systems, a variety of simulation models exist. A new simulation model is rarely developed from scratch, but rather revises and extends an existing one. A key challenge, however, is to decide which model might be an appropriate starting point for a particular problem and why. To answer this question, we need to identify entities and activities that contributed to the development of a simulation model. Therefore, we exploit the provenance data model, PROV-DM, of the World Wide Web Consortium and, building on previous work, continue developing a PROV ontology for simulation studies. Based on a case study of 19 Wnt/β-catenin signaling models, we identify crucial entities and activities as well as useful metadata to both capture the provenance information from individual simulation studies and relate them to one another, forming a family of models. The approach is implemented in WebProv, a web application for inserting and querying provenance information. Our specialization of PROV-DM contains the entities Research Question, Assumption, Requirement, Qualitative Model, Simulation Model, Simulation Experiment, Simulation Data, and Wet-lab Data as well as activities referring to building, calibrating, validating, and analyzing a simulation model. We show that most Wnt simulation models are connected to other Wnt models by using (parts of) these models. However, the overlap, especially regarding the Wet-lab Data used for calibration or validation of the models, is small. Making these aspects of developing a model explicit and queryable is an important step for assessing and reusing simulation models more effectively. Exposing this information helps to integrate a new simulation model within a family of existing ones and may lead to the development of more robust and valid simulation models. We hope that our approach becomes part of a standardization effort and that modelers adopt the benefits of provenance when considering or creating simulation models.
Topics: Animals; Biochemical Phenomena; Computational Biology; Computer Graphics; Computer Simulation; Humans; Models, Biological; Software; Systems Biology; Wnt Signaling Pathway
PubMed: 34351901
DOI: 10.1371/journal.pcbi.1009227 -
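The entity/activity structure described in the abstract above can be sketched as a small in-memory graph. The following is a minimal illustration in Python, not WebProv's data model or API: the `ProvGraph` class, its methods, and the example identifiers (`model_A`, `wetlab_1`, and so on) are hypothetical. Only the "used" and "generated" links correspond to relations that PROV-DM defines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entity:
    id: str
    kind: str  # e.g. "Simulation Model" or "Wet-lab Data"


@dataclass
class Activity:
    id: str
    kind: str  # e.g. "building" or "calibrating"
    used: List[str] = field(default_factory=list)       # entities consumed
    generated: List[str] = field(default_factory=list)  # entities produced


class ProvGraph:
    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.activities: List[Activity] = []

    def add_entity(self, id: str, kind: str) -> None:
        self.entities[id] = Entity(id, kind)

    def add_activity(self, id: str, kind: str, used, generated) -> None:
        self.activities.append(Activity(id, kind, list(used), list(generated)))

    def sources_of(self, entity_id: str) -> Set[str]:
        """Every entity that entity_id depends on, directly or transitively."""
        seen: Set[str] = set()
        stack = [entity_id]
        while stack:
            current = stack.pop()
            for act in self.activities:
                if current in act.generated:
                    for source in act.used:
                        if source not in seen:
                            seen.add(source)
                            stack.append(source)
        return seen

    def ancestor_models(self, entity_id: str) -> Set[str]:
        # Restrict the dependency set to earlier simulation models.
        return {
            e for e in self.sources_of(entity_id)
            if self.entities[e].kind == "Simulation Model"
        }


# Hypothetical two-model family: model_B reuses model_A.
g = ProvGraph()
g.add_entity("wetlab_1", "Wet-lab Data")
g.add_entity("wetlab_2", "Wet-lab Data")
g.add_entity("model_A", "Simulation Model")
g.add_entity("model_B", "Simulation Model")
g.add_activity("build_A", "building", used=["wetlab_1"], generated=["model_A"])
g.add_activity("build_B", "building", used=["model_A", "wetlab_2"],
               generated=["model_B"])

print(g.sources_of("model_B"))
print(g.ancestor_models("model_B"))
```

A query such as `ancestor_models` is the kind of question the abstract raises: which existing models a given model was built on.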
International Journal of Cardiology Jun 2016 (Review)
Heparin, the widely used anticoagulant drug, is unusual among major pharmaceutical agents in being neither a single chemical entity nor a defined mixture of compounds. Its composition, while conforming to approximate average disaccharide composition or sulfation levels, exhibits heterogeneity and variability depending on the source, as well as its geographical origin. Furthermore, individual polysaccharide chains, whose physico-chemical properties are extremely similar, cannot be separated with current state-of-the-art techniques, presenting a challenge to those interested in the quality control of heparin, in ensuring its provenance and safety, and those with an interest in investigating the relationships between its structure and biological activity. The review consists of two main sections. The first is the Introduction, comprising (i) The History, Occurrence and Use of Heparin and (ii) Approaches to Structure-Activity Relationships. The second section is Improved Techniques for Structural Analysis, comprising (i) Separation and Identification, (ii) Spectroscopic Methods, (iii) Enzymatic Approaches and (iv) Other Physico-Chemical Approaches. The ~60 references cover recent technological advances in the study of heparin structural analysis, largely since 2010.
Topics: Anticoagulants; Drug Design; Heparin; Humans; Molecular Weight; Structure-Activity Relationship
PubMed: 27264867
DOI: 10.1016/S0167-5273(16)12002-9 -
PloS One 2023
The southern third of Africa is unusually rich in copper ore deposits. These were exploited by precolonial populations to manufacture wound-wire bangles, other forms of jewelry, and large copper ingots that were used as stores of copper or as forms of prestige. Rectangular, fishtail, and croisette ingots dating between the 5th and 20th centuries CE have been found in many locations in the Democratic Republic of the Congo (DRC), Zambia, and Zimbabwe, with isolated finds in Malawi and Mozambique. Molds for casting these ingots have been found mostly in the Central African Copperbelt, but also around the Magondi Belt copper deposits in northern Zimbabwe. For years, scholars have debated whether these ingots were exclusively made in the Copperbelt or if the molds found in Zimbabwe indicate that local copies were produced from Magondi Belt copper ore (Garlake 1970; Bisson 1976). Before the recent application of lead isotopic and chemical methods to provenance copper in central and southern Africa, there was no way to discern between these hypotheses. Rademakers et al. (2019) and Stephens et al. (2020) showed that copper artifacts from southern DRC (mostly from Upemba) and from northwestern Botswana (Tsodilo Hills) match the lead isotope ratios of ores from the Copperbelt. Building upon these previous studies, we present here the first results from a copper provenance project across the southern third of Africa, from the Copperbelt to northern South Africa. We apply lead isotopic analysis (LIA) and chemical analyses to establish the provenance of 29 croisette ingots recovered in Zimbabwe, 2 fishtail and 1 rectangular ingot recovered from sites in Zambia, and an "X" shaped ingot smelted in an experiment in Zambia in the 1970s. Our chemistry and lead isotopic results indicate that 16 of these objects were smelted with copper from the Copperbelt, 16 objects source more specifically to the Kipushi deposit within this geological district, and only one HXR ingot sources to the Magondi Belt in Zimbabwe. Taken together, we clearly illustrate that croisette ingots were traveling significant distances to reach their eventual sites of deposition, and that there was also local production of these objects in Zimbabwe.
Topics: Zambia; Zimbabwe; Copper; Africa, Southern; Botswana
PubMed: 36947492
DOI: 10.1371/journal.pone.0282660 -
Sensors (Basel, Switzerland) Jul 2022
Issues related to food authenticity, traceability, and fraud have increased in recent decades as a consequence of the deliberate and intentional substitution, addition, tampering, or misrepresentation of food ingredients, where false or misleading statements are made about a product for economic gains. This study aimed to evaluate the ability of a portable NIR instrument to classify egg samples sourced from different provenances or production systems (e.g., cage and free-range) in Australia. Whole egg samples (n: 100) were purchased from local supermarkets where the label in each of the packages was used as identification of the layers' feeding system as per the Australian legislation and standards. The spectra of the albumin and yolk were collected using a portable NIR spectrophotometer (950-1600 nm). Principal component analysis (PCA) and linear discriminant analysis (LDA) were used to analyze the NIR data. The results obtained in this study showed how the combination of chemometrics and NIR spectroscopy allowed for the classification of egg albumin and yolk samples according to the system of production (cage and free range). The proposed method is simple, fast, environmentally friendly and avoids laborious sample pre-treatment, and is expected to become an alternative to commonly used techniques for egg quality assessment.
Topics: Albumins; Australia; Chemometrics; Discriminant Analysis; Eggs; Principal Component Analysis; Spectroscopy, Near-Infrared
PubMed: 35808484
DOI: 10.3390/s22134988 -
American Journal of Botany Feb 2023
PREMISE
Riparian plants can exhibit intraspecific phenotypic variability across the landscape related to temperature and flooding gradients. Phenotypes that vary across a climate gradient are often partly genetically determined and may differ in their response to inundation. Changes to inundation patterns across a climate gradient could thus result in site-specific inundation responses. Phenotypic variability is more often studied in riparian trees, yet riparian shrubs are key elements of riparian systems and may differ from trees in phenotypic variability and environmental responses.
METHODS
We tested whether individuals of a clonal, riparian shrub, Pluchea sericea, collected from provenances spanning a temperature gradient differed in their phenotypes and responses to inundation and to what degree any differences were related to genotype. Plants were subjected to different inundation depths and a subset genotyped. Variables related to growth and resource acquisition were measured and analyzed using hierarchical, multivariate Bayesian linear regressions.
RESULTS
Individuals from different provenances differed in their phenotypes, but not in their response to inundation. Phenotypes were not related to provenance temperature but were partially governed by genotype. Growth was more strongly influenced by inundation, while resource acquisition was more strongly controlled by genotype.
CONCLUSIONS
Growth and resource acquisition responses in a clonal, riparian shrub are affected by changes to inundation and plant demographics in unique ways. Shrubs appear to differ from trees in their responses to environmental change. Understanding environmental effects on shrubs separately from those of trees will be a key part of evaluating impacts of environmental change on riparian ecosystems.
Topics: Ecosystem; Bayes Theorem; Floods; Climate; Genotype; Rivers
PubMed: 36462152
DOI: 10.1002/ajb2.16115 -
National Science Review Feb 2023 (Review)
This paper reviews published and presents new data on U-Pb detrital zircon ages, and petrographic, geochemical and isotope (Sm-Nd, Lu-Hf) compositions obtained from greywacke sandstones of Kazakhstan in order to reconstruct fossil intra-oceanic arcs that once existed at Pacific-type convergent margins of the Paleo-Asian Ocean (PAO) in Paleozoic time. We focus on orogenic belts of central Kazakhstan (Itmurundy and Tekturmas) and eastern Kazakhstan (Zharma and Char) in the western Central Asian Orogenic belt. These orogenic belts host accretionary complexes with greywacke sandstones of early Paleozoic (central Kazakhstan) and middle-late Paleozoic (eastern Kazakhstan) ages. First, we evaluate general perspectives for studying sandstones to reconstruct survived and disappeared magmatic arcs, taking into account episodes of subduction erosion. Then we discuss the analytical data from sandstones to make conclusions about the ages and formation settings of their igneous protoliths and define maximum deposition ages. Finally, we discuss the role of serpentinite mélanges in tectonic reconstructions. We argue that sandstones hosted by accretionary complexes are typically greywackes deposited close to their igneous sources and buried rapidly. The provenances of the studied greywacke sandstones of central and eastern Kazakhstan were dominated by mafic to andesitic igneous protoliths derived from juvenile mantle sources. The igneous rocks in the provenances were emplaced in an intra-oceanic arc setting. The sandstones were deposited in fore-arc/trench basins or, to a lesser degree, in back-arc basins. The data from both sandstones and serpentinite mélanges reconstruct middle-late-Cambrian, Ordovician, late-Devonian and Carboniferous arcs of the western PAO. The middle-late Cambrian arcs were fully destroyed by subduction erosion, whereas the Ordovician and Carboniferous arcs survived. The late-Devonian arcs were also eroded, but partly. Both the early and late Paleozoic active margins of the PAO were characterized by alternating periods of accretionary growth and subduction erosion.
PubMed: 36817843
DOI: 10.1093/nsr/nwac215 -
Biopreservation and Biobanking Apr 2018 (Review)
The known challenge of underutilization of data and biological material from biorepositories as potential resources for medical research has been the focus of discussion for over a decade. Recently developed guidelines for improved data availability and reusability, the FAIR Principles (Findability, Accessibility, Interoperability, and Reusability), are likely to address only parts of the problem. In this article, we argue that biological material and data should be viewed as a unified resource. This approach would facilitate access to complete provenance information, which is a prerequisite for reproducibility and meaningful integration of the data. A unified view also allows for optimization of long-term storage strategies, as demonstrated in the case of biobanks. We propose an extension of the FAIR Principles to include the following additional components: (1) quality aspects related to research reproducibility and meaningful reuse of the data, (2) incentives to stimulate effective enrichment of data sets and biological material collections and their reuse on all levels, and (3) privacy-respecting approaches for working with the human material and data. These FAIR-Health principles should then be applied to both the biological material and data. We also propose the development of common guidelines for cloud architectures, due to the unprecedented growth of volume and breadth of medical data generation, as well as the associated need to process the data efficiently.
Topics: Biological Specimen Banks; Confidentiality; Databases, Factual; Guidelines as Topic; Humans; Information Dissemination
PubMed: 29359962
DOI: 10.1089/bio.2017.0110 -
PLoS Computational Biology Sep 2015
The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built into the ReproPhylo system and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This 'single file' approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. This file and a Git repository are the primary reproducibility outputs of the program. In addition, ReproPhylo produces an extensive human-readable report and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform-independent CC0 Python module and is easily installed as a Docker image or a WinPython self-sufficient package, with a Jupyter Notebook GUI, or as a slimmer version in a Galaxy distribution.
Topics: Genomics; Models, Genetic; Phylogeny; Reproducibility of Results; Sequence Alignment; Software
PubMed: 26335558
DOI: 10.1371/journal.pcbi.1004447 -
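The 'single file' approach described in the ReproPhylo abstract above can be illustrated with a minimal sketch in Python. This is not ReproPhylo's code or API: the `Workflow` class, its methods, and the step functions are hypothetical, the sequences are toy data, and the sketch serializes to JSON rather than to a serialized Python object. The Git step is omitted.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


class Workflow:
    """Data, provenance log, and environment info kept in one object."""

    def __init__(self, data=None, provenance=None, environment=None):
        self.data = data or {}
        self.provenance = provenance or []
        self.environment = environment or {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        }

    def run_step(self, name, func, **params):
        # Run one analysis step and record what was run with which parameters.
        self.data[name] = func(self.data, **params)
        self.provenance.append(
            {"step": name, "function": func.__name__, "params": params}
        )

    def save(self, path):
        # Everything needed to inspect or repeat the analysis goes in one file.
        state = {
            "data": self.data,
            "provenance": self.provenance,
            "environment": self.environment,
        }
        Path(path).write_text(json.dumps(state, indent=2, sort_keys=True))

    @classmethod
    def load(cls, path):
        return cls(**json.loads(Path(path).read_text()))


def store_sequences(data, sequences):
    # Hypothetical step: register the input sequences.
    return dict(sequences)


def gc_content(data, source):
    # Hypothetical step: fraction of G and C bases per sequence.
    return {
        name: (seq.count("G") + seq.count("C")) / len(seq)
        for name, seq in data[source].items()
    }


wf = Workflow()
wf.run_step("seqs", store_sequences, sequences={"s1": "ACGT", "s2": "GGCC"})
wf.run_step("gc", gc_content, source="seqs")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "workflow.json"
    wf.save(path)
    restored = Workflow.load(path)

print(restored.provenance)
```

Because the saved file is plain text, it could be committed to a version control system after every step, so that each iteration of the analysis leaves a recoverable state.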
Scientific Reports Jun 2021
Chronic wasting disease (CWD) is a fatal, contagious, neurodegenerative prion disease affecting both free-ranging and captive cervid species. CWD is spread via direct or indirect contact or oral ingestion of prions. In the gastrointestinal tract, prions enter the body through microfold cells (M-cells), and the abundance of these cells can be influenced by the gut microbiota. To explore potential links between the gut microbiota and CWD, we collected fecal samples from farmed and free-ranging white-tailed deer (Odocoileus virginianus) around the Midwest, USA. Farmed deer originated from farms that were depopulated due to CWD. Free-ranging deer were sampled during annual deer harvests. All farmed deer were tested for CWD via ELISA and IHC, and we used 16S rRNA gene sequencing to characterize the gut microbiota. We report significant differences in gut microbiota by provenance (Farm 1, Farm 2, Free-ranging), sex, and CWD status. CWD-positive deer from Farms 1 and 2 had increased abundances of Akkermansia, Lachnospiraceae UCG-010, and RF39 taxa. Overall, differences by provenance and sex appear to be driven by diet, while differences by CWD status may be linked to CWD pathogenesis.
Topics: Animals; Deer; Enzyme-Linked Immunosorbent Assay; Female; Gastrointestinal Microbiome; Male; Prions; RNA, Ribosomal, 16S; Wasting Disease, Chronic
PubMed: 34168170
DOI: 10.1038/s41598-021-89896-9