Journal of Chemical Information and..., Jan 2022
Projects in chemo- and bioinformatics often consist of scattered data of various types that are difficult to access in a meaningful way for efficient data analysis. The data are usually too diverse even to be manipulated effectively. Sdfconf is data manipulation and analysis software that addresses this problem in a logical and robust manner. Other software commonly used for such tasks is either not designed with molecular and/or conformational data in mind or provides only a narrow set of tasks to be accomplished. Furthermore, many tools are only available within commercial software packages. Sdfconf is a flexible, robust, and free-of-charge tool for linking data from various sources for meaningful and efficient manipulation and analysis of molecular data sets. Sdfconf packages molecular structures and metadata into a complete ensemble, from which one can access both the whole data set and individual molecules and/or conformations. In this software note, we offer some practical examples of the utilization of sdfconf.
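The ensemble idea described in this abstract, pairing molecular structures with metadata accessible at the data-set, molecule, and conformation level, can be sketched as follows. All class and attribute names here are invented for illustration and are not sdfconf's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Conformation:
    # 3D coordinates plus per-conformation metadata (e.g. energies)
    coordinates: list
    meta: dict = field(default_factory=dict)

@dataclass
class Molecule:
    name: str
    conformations: list = field(default_factory=list)

class Ensemble:
    """Toy container giving access to the whole set or to individual items."""
    def __init__(self):
        self.molecules = {}

    def add(self, mol: Molecule):
        self.molecules[mol.name] = mol

    def tag_all(self, key, func):
        # Attach derived metadata to every conformation in the data set
        for mol in self.molecules.values():
            for conf in mol.conformations:
                conf.meta[key] = func(conf)

ens = Ensemble()
ens.add(Molecule("ligand_1", [Conformation([(0.0, 0.0, 0.0)], {"energy": -1.2})]))
ens.tag_all("n_atoms", lambda c: len(c.coordinates))
print(ens.molecules["ligand_1"].conformations[0].meta)
# {'energy': -1.2, 'n_atoms': 1}
```

The point of the sketch is the linkage: metadata computed over the whole ensemble lands on the individual conformations, so both views stay consistent.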
Topics: Computational Biology; Data Analysis; Data Management; Software
PubMed: 34932340
DOI: 10.1021/acs.jcim.1c01051
Journal of Integrative Bioinformatics, Dec 2022
Core facilities have to offer technologies that best serve the needs of their users and provide them a competitive advantage in research. They have to set up and maintain on the order of ten to a hundred instruments, which produce large amounts of data and serve thousands of active projects and customers. Particular emphasis has to be given to the reproducibility of results. Increasingly, the entire process, from building the research hypothesis, conducting the experiments, and taking the measurements, through data exploration and analysis, is driven by only a few experts in various scientific fields. Still, the ability to perform the entire data exploration in real time on a personal computer is often hampered by the heterogeneity of software, the data structure formats of the output, and the enormous data sizes. These factors impact the design and architecture of the implemented software stack. At the Functional Genomics Center Zurich (FGCZ), a joint state-of-the-art research and training facility of ETH Zurich and the University of Zurich, we have developed the B-Fabric system, which has for more than a decade provided an entire life sciences community with fundamental data science support. In this paper, we sketch how such a system can be used to glue together data (including metadata), computing infrastructures (clusters and clouds), and visualization software to support instant data exploration and visual analysis. We illustrate our approach, as implemented in daily operations, using visualization applications for mass spectrometry data.
Topics: Data Management; Reproducibility of Results; Software; Genomics
PubMed: 36073980
DOI: 10.1515/jib-2022-0031
Molecular & Cellular Proteomics: MCP, 2021
Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitive, human proteomics data. Both in the United States and in the European Union, there are legal frameworks protecting the privacy of individuals. Implementing privacy standards for publicly released research data in genomics and transcriptomics has led to processes to control who may access the data, so-called "controlled access" data. In parallel with the technological developments in the field, it is clear that the privacy risks of sharing proteomics data need to be properly assessed and managed. In our view, the proteomics community must be proactive in addressing these issues. Yet a careful balance must be kept. On the one hand, neglecting to address the potential of identifiability in human proteomics data could lead to reputational damage of the field, while on the other hand, erecting barriers to open access to clinical proteomics data will inevitably reduce reuse of proteomics data and could substantially delay critical discoveries in biomedical research. In order to balance these apparently conflicting requirements for data privacy and efficient use and reuse of research efforts through the sharing of clinical proteomics data, development efforts will be needed at different levels including bioinformatics infrastructure, policymaking, and mechanisms of oversight.
Topics: Confidentiality; Data Management; Humans; Information Dissemination; Proteomics
PubMed: 33711481
DOI: 10.1016/j.mcpro.2021.100071
Journal of Assisted Reproduction and..., Jul 2021
Topics: Artificial Intelligence; Data Management; Fertilization in Vitro; Humans; Reproductive Medicine
PubMed: 33715133
DOI: 10.1007/s10815-021-02122-3
PloS One, 2022
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library, which allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We illustrate the use of Sim2Ls with two applications and discuss best practices towards FAIR simulation workflows and associated data.
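The "well-defined and verified inputs and outputs" idea in this abstract can be sketched in plain Python. Every name below is hypothetical and does not reflect the actual Sim2L API; the sketch only shows the pattern of declaring typed inputs and outputs and checking them before and after a run:

```python
# Hypothetical sketch: a workflow factory that registers declared inputs
# and outputs, then validates both around the wrapped run function.

def make_workflow(inputs, outputs, run):
    def execute(**kwargs):
        # Verify every declared input is present and well-typed
        for name, spec in inputs.items():
            if name not in kwargs:
                raise ValueError(f"missing input: {name}")
            if not isinstance(kwargs[name], spec["type"]):
                raise TypeError(f"input {name} must be {spec['type'].__name__}")
        results = run(**kwargs)
        # Verify the run produced every declared output
        for name in outputs:
            if name not in results:
                raise ValueError(f"missing output: {name}")
        return results
    return execute

relax = make_workflow(
    inputs={"temperature": {"type": float}},
    outputs={"energy": {"type": float}},
    run=lambda temperature: {"energy": -temperature / 10},  # toy model
)
print(relax(temperature=300.0))  # {'energy': -30.0}
```

Declaring the I/O contract separately from the run function is what makes a workflow's requirements discoverable and its cached results trustworthy.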
Topics: Computer Simulation; Data Management; Ecosystem; Reproducibility of Results; Software; Workflow
PubMed: 35271613
DOI: 10.1371/journal.pone.0264492
BMJ Open, Aug 2022
OBJECTIVES
This article aims to measure the willingness of the Swiss public to participate in personalised health research, and their preferences regarding data management and governance.
SETTING
Results are presented from a nationwide survey of members of the Swiss public.
PARTICIPANTS
15 106 randomly selected Swiss residents received the survey in September 2019. The response rate was 34.1% (n=5156). Respondent age ranged from 18 to 79 years, with fairly uniform spread across sex and age categories between 25 and 64 years.
PRIMARY AND SECONDARY OUTCOME MEASURES
Willingness to participate in personalised health research and opinions regarding data management and governance.
RESULTS
Most respondents preferred to be contacted and reconsented for each new project using their data (39%, 95% CI: 37.4% to 40.7%), or stated that their preference depends on the project type (29.4%, 95% CI: 27.9% to 31%). Additionally, most preferred that their data or samples be stored anonymously (52%, 95% CI: 50.3% to 53.8%) rather than in coded form (43.4%, 95% CI: 41.7% to 45.1%). Of those who preferred that their data be anonymised, most also indicated a wish to be recontacted for each new project (36.8%, 95% CI: 34.5% to 39.2%); however, these preferences conflict, since fully anonymised data cannot be linked back to the participant. Most respondents desired to personally own their data. Finally, most Swiss respondents trust their doctors, along with researchers at universities, to protect their data.
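For readers unfamiliar with how confidence intervals like those above are derived for survey proportions, a normal-approximation (Wald) interval is sketched below. The study's reported intervals may additionally reflect survey weighting or a different interval method, so exact agreement with the published bounds is not expected:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a sample proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Recontact preference (39%) among n = 5156 respondents
lo, hi = proportion_ci(0.39, 5156)
print(f"{lo:.3f} to {hi:.3f}")  # 0.377 to 0.403
```

With n over 5000, the interval is narrow (about ±1.3 percentage points), which is why the survey can distinguish preferences only a few points apart.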
CONCLUSION
Insight into public preference can enable Swiss biobanks and research institutions to create management and governance strategies that match the expectations and preferences of potential participants. Models allowing participants to choose how to interact with the process, while more complex, may increase individual willingness to provide data to biobanks.
Topics: Adolescent; Adult; Aged; Biological Specimen Banks; Data Management; Humans; Middle Aged; Surveys and Questionnaires; Switzerland; Trust; Young Adult
PubMed: 36028266
DOI: 10.1136/bmjopen-2022-060844
GigaScience, Dec 2022
The importance of effective research data management (RDM) strategies to support the generation of Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience data grows with each advance in data acquisition techniques and research methods. To maximize the impact of diverse research strategies, multidisciplinary, large-scale neuroscience research consortia face a number of unsolved challenges in RDM. While open science principles are largely accepted, it is practically difficult for researchers to prioritize RDM over other pressing demands. The implementation of a coherent, executable RDM plan for consortia spanning animal, human, and clinical studies is becoming increasingly challenging. Here, we present an RDM strategy implemented for the Heidelberg Collaborative Research Consortium. Our consortium combines basic and clinical research in diverse populations (animals and humans) and produces highly heterogeneous and multimodal research data (e.g., neurophysiology, neuroimaging, genetics, behavior). We present a concrete strategy for initiating early-stage RDM and FAIR data generation for large-scale collaborative research consortia, with a focus on sustainable solutions that incentivize incremental RDM while respecting research-specific requirements.
Topics: Animals; Humans; Data Management; Neuroimaging; Research Personnel
PubMed: 37401720
DOI: 10.1093/gigascience/giad049
Seminars in Oncology Nursing, Apr 2023 (Review)
OBJECTIVES
To provide an overview of three consecutive stages involved in the processing of quantitative research data (ie, data management, analysis, and interpretation) with the aid of practical examples to foster enhanced understanding.
DATA SOURCES
Published scientific articles, research textbooks, and expert advice were used.
CONCLUSION
Typically, a considerable amount of numerical research data is collected that requires analysis. On entry into a data set, data must be carefully checked for errors and missing values, and then variables must be defined and coded as part of data management. Quantitative data analysis involves the use of statistics. Descriptive statistics help summarize the variables in a data set to show what is typical for a sample. Measures of central tendency (ie, mean, median, mode), measures of spread (eg, standard deviation), and parameter estimation measures (eg, confidence intervals) may be calculated. Inferential statistics aid in testing hypotheses about whether a hypothesized effect, relationship, or difference is likely true. Inferential statistical tests produce a probability value, the P value, which indicates whether an effect, relationship, or difference might exist in reality. Crucially, it must be accompanied by a measure of magnitude (effect size) to help interpret how small or large this effect, relationship, or difference is. Effect sizes provide key information for clinical decision-making in health care.
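The descriptive-statistics and effect-size steps described above can be sketched with Python's standard library. The sample values are invented for illustration, and Cohen's d with a pooled standard deviation is used as one common effect-size measure (a t-test, eg via scipy.stats.ttest_ind, would supply the accompanying P value):

```python
import statistics
from math import sqrt

# Invented example data: an outcome score in two groups
control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2]
treated = [5.0, 4.7, 5.3, 4.9, 5.1, 4.8]

# Descriptive statistics: what is typical for each sample
mean_c, mean_t = statistics.mean(control), statistics.mean(treated)
sd_c, sd_t = statistics.stdev(control), statistics.stdev(treated)

# Effect size (Cohen's d with pooled SD): how large the difference is,
# independent of sample size
pooled_sd = sqrt((sd_c**2 + sd_t**2) / 2)
d = (mean_t - mean_c) / pooled_sd
print(round(d, 2))
```

A P value alone would only say the difference is unlikely to be chance; d expresses the difference in standard-deviation units, which is what supports clinical interpretation.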
IMPLICATIONS FOR NURSING PRACTICE
Developing capacity in the management, analysis, and interpretation of quantitative research data can have a multifaceted impact in enhancing nurses' confidence in understanding, evaluating, and applying quantitative evidence in cancer nursing practice.
Topics: Humans; Data Management; Research Design; Data Collection
PubMed: 36868925
DOI: 10.1016/j.soncn.2023.151398
International Journal of Population..., 2021
Data pooling from pre-existing datasets can be useful to increase study sample size and statistical power in order to answer a research question. However, individual datasets may contain variables that measure the same construct differently, posing challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and the Alberta Pregnancy Outcomes and Nutrition study. Variables were harmonized considering multiple features across the datasets: the construct measured; the question asked and response options; the measurement scale used; the frequency of measurement; the timing of measurement; and the data structure. Completely matching, partially matching, and completely unmatching variables across the datasets were determined based on these features. Variables that were an exact match were pooled as is. Partially matching variables were harmonized or processed under a common format across the datasets considering the frequency of measurement, the timing of measurement, the measurement scale used, and response options. Variables that were completely unmatching could not be harmonized into a single variable. The variable harmonization strategies used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies, which permit researchers to answer novel research questions in a statistically efficient, timely, and cost-efficient manner that could not be achieved using a single data source.
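The "partially matching" case described above, where two cohorts measure the same construct on different scales, can be sketched as follows. The cohort records and variable names are invented for illustration and are not the actual study variables:

```python
# Illustrative sketch: harmonizing a partially matching variable by
# collapsing both cohorts' measures to a common binary format.

# Cohort A records smoking as cigarettes/day; cohort B as a yes/no flag.
cohort_a = [{"id": 1, "cigs_per_day": 0}, {"id": 2, "cigs_per_day": 10}]
cohort_b = [{"id": 7, "smoker": "no"}, {"id": 8, "smoker": "yes"}]

def harmonize_a(rec):
    # The finer-grained count can only be pooled at the coarser level
    return {"id": rec["id"], "cohort": "A", "smoker": rec["cigs_per_day"] > 0}

def harmonize_b(rec):
    return {"id": rec["id"], "cohort": "B", "smoker": rec["smoker"] == "yes"}

pooled = [harmonize_a(r) for r in cohort_a] + [harmonize_b(r) for r in cohort_b]
print([r["smoker"] for r in pooled])  # [False, True, False, True]
```

Note the direction of harmonization: the richer measure is degraded to the common format, which is why completely unmatching variables (with no shared level of granularity) cannot be pooled at all.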
Topics: Alberta; Cohort Studies; Data Collection; Data Management; Female; Humans; Pregnancy; Sample Size
PubMed: 34888420
DOI: 10.23889/ijpds.v6i1.1680
F1000Research, 2022
Background: Knowing the needs of the bioimaging community with respect to research data management (RDM) is essential for identifying measures that enable adoption of the FAIR (findable, accessible, interoperable, reusable) principles for microscopy and bioimage analysis data across disciplines. As an initiative within Germany's National Research Data Infrastructure, we conducted this community survey in summer 2021 to assess the state of the art of bioimaging RDM and the community's needs. Methods: An online survey was conducted with a mixed question-type design. We created a questionnaire tailored to relevant topics of the bioimaging community, including specific questions on bioimaging methods and bioimage analysis, as well as more general questions on RDM principles and tools. 203 survey entries were included in the analysis, covering the perspectives of various life and biomedical science disciplines and of participants at different career levels. Results: The results highlight the importance and value of bioimaging RDM and data sharing. However, the practical implementation of FAIR practices is impeded by technical hurdles, lack of knowledge, and insecurity about the legal aspects of data sharing. The survey participants request metadata guidelines and annotation tools and endorse the usage of image data management platforms. At present, OMERO (Open Microscopy Environment Remote Objects) is the best known and most widely used platform. Most respondents rely on image processing and analysis, which they regard as the most time-consuming step of the bioimage data workflow. While knowledge about and implementation of electronic lab notebooks and data management plans is limited, respondents acknowledge their potential value for data handling and publication. Conclusions: The bioimaging community acknowledges and endorses the value of RDM and data sharing. Still, there is a need for information, guidance, and standardization to foster the adoption of FAIR data handling. This survey may help inspire targeted measures to close this gap.
Topics: Humans; Data Management; Metadata; Information Dissemination; Surveys and Questionnaires; Workflow
PubMed: 36405555
DOI: 10.12688/f1000research.121714.2