Journal of Synchrotron Radiation Nov 2023
High-data-throughput and multimodal-acquisition experiments will prevail in next-generation synchrotron beamlines. Orchestrating dataflow pipelines that connect the data acquisition, processing, visualization and storage ends is becoming increasingly complex and essential for enhancing beamline performance. Mamba Data Worker (MDW) has been developed to address the data challenges of the forthcoming High Energy Photon Source (HEPS). It is an important component of the Mamba experimental control and data acquisition software ecosystem, enabling fast data acquisition and transmission, dynamic configuration of data processing pipelines, data multiplexing in streaming, and customized data and metadata assembly. This paper presents the architecture and development plan of MDW, outlines the essential technologies involved, and illustrates its current application at the Beijing Synchrotron Radiation Facility (BSRF).
PubMed: 37729071
DOI: 10.1107/S1600577523006951
Bioinformatics (Oxford, England) Sep 2023
SUMMARY
In functional imaging studies, accurately synchronizing the time course of experimental manipulations and stimulus presentations with resulting imaging data is crucial for analysis. Current software tools lack such functionality, requiring manual processing of the experimental and imaging data, which is error-prone and potentially non-reproducible. We present VoDEx, an open-source Python library that streamlines the data management and analysis of functional imaging data. VoDEx synchronizes the experimental timeline and events (e.g. presented stimuli, recorded behavior) with imaging data. VoDEx provides tools for logging and storing the timeline annotation, and enables retrieval of imaging data based on specific time-based and manipulation-based experimental conditions.
AVAILABILITY AND IMPLEMENTATION
VoDEx is an open-source Python library and can be installed via the "pip install" command. It is released under a BSD license, and its source code is publicly accessible on GitHub (https://github.com/LemonJust/vodex). A graphical interface is available as a napari-vodex plugin, which can be installed through the napari plugins menu or using "pip install." The source code for the napari plugin is available on GitHub (https://github.com/LemonJust/napari-vodex). The software version at the time of submission is archived at Zenodo (version v1.0.18, https://zenodo.org/record/8061531).
Topics: Software; Image Processing, Computer-Assisted; Animals; Programming Languages
PubMed: 37699009
DOI: 10.1093/bioinformatics/btad568
BMC Public Health Sep 2023
BACKGROUND
To present the population-based cancer statistics for Khyber Pakhtunkhwa (KP), Pakistan, an incidence study was conducted at the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) in Lahore, Pakistan, in 2023.
METHODS
Records from various centres on new cancers diagnosed among residents of KP between January and December 2020 were gathered. Both active and passive methods of data collection were applied, and the information was saved in a central repository at SKMCH&RC. The incidence rates were computed by age group and sex and presented per 100,000 population.
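The rate computation described here is straightforward arithmetic; a minimal sketch of crude and age-standardised rates per 100,000 population (all case counts, populations, and standard-population weights below are hypothetical placeholders, not the registry's actual figures):

```python
# Crude and age-standardised incidence rates per 100,000 population.
# All figures below are hypothetical, for illustration only.

# (age group) -> (new cases, population at risk)
cases_by_age = {
    "0-14":  (120, 3_000_000),
    "15-19": (45,  1_200_000),
    "20+":   (980, 5_800_000),
}

# Standard-population weights (hypothetical; must sum to 1)
std_weights = {"0-14": 0.30, "15-19": 0.10, "20+": 0.60}

def rate_per_100k(cases, population):
    """Crude incidence rate per 100,000 population."""
    return cases / population * 100_000

# Age-specific rates
age_rates = {g: rate_per_100k(c, p) for g, (c, p) in cases_by_age.items()}

# Age-Standardised Incidence Rate (ASIR):
# weighted sum of age-specific rates over the standard population
asir = sum(age_rates[g] * std_weights[g] for g in age_rates)
```

Age standardisation makes rates comparable across populations with different age structures, which is why the abstract reports ASIRs rather than crude rates.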
RESULTS
Among children (0-14 years), the Age-Standardised Incidence Rate (ASIR) was 4.0 in girls and 6.1 in boys, and haematologic malignancies were more prevalent; in adolescents (15-19 years), the ASIR was 7.7 in females and 9.4 in males, and bone tumours, haematologic malignancies, and neurological cancers were prominent; in adult females (≥20 years), the ASIR was 84.9, and cancers of the breast, digestive system, and reproductive organs were predominant; in adult males, the ASIR was 73.0, and cancers of the gastrointestinal tract, lip/oral cavity/pharynx, prostate, and Non-Hodgkin Lymphoma (NHL) were common.
CONCLUSIONS
It is crucial to investigate the aetiology of these diseases at the community level because dietary elements, infectious diseases, and tobacco use all appear to be significant contributors. Prospective studies could play a key role in highlighting the factors linked to these diseases. Therefore, cancer registration must continue in conjunction with the exploration of risk factors.
Topics: Adolescent; Adult; Male; Child; Female; Humans; Incidence; Pakistan; Prospective Studies; Neoplasms; Hematologic Neoplasms
PubMed: 37710250
DOI: 10.1186/s12889-023-16686-5
Journal of Medical Internet Research Aug 2023
Review
BACKGROUND
Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains.
OBJECTIVE
This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data.
METHODS
The Arksey and O'Malley stage-based methodological framework for scoping reviews was applied. PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The three data sources were deduplicated using reference management software. Two independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines.
RESULTS
A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data owing to its sensitive nature, were also reported. We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic.
CONCLUSIONS
This work summarises projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID)
RR2-10.2196/22505.
Topics: Humans; COVID-19; Pandemics; Big Data; Cardiovascular Diseases; Data Accuracy
PubMed: 37639292
DOI: 10.2196/45013
Animals : An Open Access Journal From... Oct 2023
The health and welfare of livestock are significant for ensuring the sustainability and profitability of the agricultural industry. Addressing efficient ways to monitor and report the health status of individual cows is critical to prevent outbreaks and maintain herd productivity. The purpose of the study is to develop a machine learning (ML) model to classify the health status of milk cows into three categories. In this research, data are collected from existing non-invasive IoT devices and tools in a dairy farm, monitoring the micro- and macroenvironment of the cow in combination with particular information on age, days in milk, lactation, and more. A workflow of various data-processing methods is systematized and presented to create a complete, efficient, and reusable roadmap for data processing, modeling, and real-world integration. Following the proposed workflow, the data were treated, and five different ML algorithms were trained and tested to select the most descriptive one to monitor the health status of individual cows. The highest result for health status assessment is obtained by random forest classifier (RFC) with an accuracy of 0.959, recall of 0.954, and precision of 0.97. To increase the security, speed, and reliability of the work process, a cloud architecture of services is presented to integrate the trained model as an additional functionality in the Amazon Web Services (AWS) environment. The classification results of the ML model are visualized in a newly created interface in the client application.
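The reported accuracy, recall, and precision are standard multi-class classification metrics; a minimal pure-Python sketch of how they can be computed (macro-averaged) for a three-category health status. The labels and predictions below are invented for illustration and are not the study's data:

```python
# Hypothetical ground-truth and predicted health statuses for 10 cows
y_true = ["healthy", "healthy", "at_risk", "sick", "healthy",
          "at_risk", "sick", "healthy", "at_risk", "sick"]
y_pred = ["healthy", "healthy", "at_risk", "sick", "at_risk",
          "at_risk", "sick", "healthy", "at_risk", "healthy"]

labels = sorted(set(y_true))

def accuracy(t, p):
    """Fraction of samples classified correctly."""
    return sum(a == b for a, b in zip(t, p)) / len(t)

def macro_recall(t, p):
    """Per-class recall = TP / actual positives, then unweighted mean."""
    per_class = []
    for lab in labels:
        tp = sum(a == lab and b == lab for a, b in zip(t, p))
        actual = sum(a == lab for a in t)
        per_class.append(tp / actual)
    return sum(per_class) / len(per_class)

def macro_precision(t, p):
    """Per-class precision = TP / predicted positives, then unweighted mean."""
    per_class = []
    for lab in labels:
        tp = sum(a == lab and b == lab for a, b in zip(t, p))
        pred = sum(b == lab for b in p)
        per_class.append(tp / pred if pred else 0.0)
    return sum(per_class) / len(per_class)
```

In practice a library such as scikit-learn computes these same quantities for a trained random forest classifier; the sketch only makes the arithmetic behind the three reported scores explicit.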
PubMed: 37893978
DOI: 10.3390/ani13203254
Frontiers in Big Data 2024
Review
With the rapid growth of information and communication technologies, governments worldwide are embracing digital transformation to enhance service delivery and governance practices. In the rapidly evolving landscape of information technology (IT), secure data management stands as a cornerstone for organizations aiming to safeguard sensitive information. Robust data modeling techniques are pivotal in structuring and organizing data, ensuring its integrity, and facilitating efficient retrieval and analysis. As the world increasingly emphasizes sustainability, integrating eco-friendly practices into data management processes becomes imperative. This study focuses on the specific context of Pakistan and investigates the potential of cloud computing in advancing e-governance capabilities. Cloud computing offers scalability, cost efficiency, and enhanced data security, making it an ideal technology for digital transformation. Through an extensive literature review, analysis of case studies, and interviews with stakeholders, this research explores the current state of e-governance in Pakistan, identifies the challenges faced, and proposes a framework for leveraging cloud computing to overcome these challenges. The findings reveal that cloud computing can significantly enhance the accessibility, scalability, and cost-effectiveness of e-governance services, thereby improving citizen engagement and satisfaction. This study provides valuable insights for policymakers, government agencies, and researchers interested in the digital transformation of e-governance in Pakistan and offers a roadmap for leveraging cloud computing technologies in similar contexts. The findings contribute to the growing body of knowledge on e-governance and cloud computing, supporting the advancement of digital governance practices globally. This research identifies monitoring parameters necessary to establish a sustainable e-governance system incorporating big data and cloud computing. 
The proposed framework, Monitoring and Assessment System using Cloud (MASC), is validated through secondary data analysis and successfully fulfills the research objectives. By leveraging big data and cloud computing, governments can revolutionize their digital governance practices, driving transformative changes and enhancing efficiency and effectiveness in public administration.
PubMed: 38638340
DOI: 10.3389/fdata.2024.1349116
Chemistry of Materials : a Publication... Nov 2023
Polymer-based semiconductors and organic electronics encapsulate a significant research thrust for informatics-driven materials development. However, device measurements are described by a complex array of design and parameter choices, many of which are sparsely reported. For example, the mobility of a polymer-based organic field-effect transistor (OFET) may vary by several orders of magnitude for a given polymer as a plethora of parameters related to solution processing, interface design/surface treatment, thin-film deposition, postprocessing, and measurement settings have a profound effect on the value of the final measurement. Incomplete contextual, experimental details hamper the availability of reusable data applicable for data-driven optimization, modeling (e.g., machine learning), and analysis of new organic devices. To curate organic device databases that contain reproducible and findable, accessible, interoperable, and reusable (FAIR) experimental data records, data ontologies that fully describe sample provenance and process history are required. However, standards for generating such process ontologies are not widely adopted for experimental materials domains. In this work, we design and implement an object-relational database for storing experimental records of OFETs. A data structure is generated by drawing on an international standard for batch process control (ISA-88) to facilitate the design. We then mobilize these representative data records, curated from the literature and laboratory experiments, to enable data-driven learning of process-structure-property relationships. The work presented herein opens the door for the broader adoption of data management practices and design standards for both the organic electronics and the wider materials community.
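As a concrete illustration of the object-relational idea, an OFET measurement can be stored alongside an ordered process history, echoing ISA-88's hierarchy of process steps and parameters. The schema below is a hypothetical sketch using SQLite; the table and column names are assumptions for illustration, not the paper's actual database design:

```python
import sqlite3

# Hypothetical minimal schema: each sample links to ordered process
# steps, each carrying one parameter. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    polymer TEXT NOT NULL,
    mobility_cm2_Vs REAL          -- measured OFET mobility
);
CREATE TABLE process_step (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER REFERENCES sample(id),
    step_order INTEGER,           -- position in the process history
    name TEXT,                    -- e.g. spin-coating, annealing
    parameter TEXT,               -- e.g. spin speed, temperature
    value REAL,
    unit TEXT
);
""")

conn.execute("INSERT INTO sample VALUES (1, 'P3HT', 0.02)")
conn.executemany(
    "INSERT INTO process_step VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, "spin-coating", "spin speed", 2000.0, "rpm"),
     (2, 1, 2, "annealing", "temperature", 150.0, "degC")],
)

# Retrieve a sample's full process history in processing order
rows = conn.execute("""
    SELECT s.polymer, p.name, p.parameter, p.value, p.unit
    FROM sample s JOIN process_step p ON p.sample_id = s.id
    WHERE s.id = 1 ORDER BY p.step_order
""").fetchall()
```

Keeping the process history as ordered rows rather than free text is what makes the records queryable for the data-driven process-structure-property analysis the abstract describes.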
PubMed: 38027538
DOI: 10.1021/acs.chemmater.3c01842
Database : the Journal of Biological... Oct 2023
With the rapidly growing amount of biological data, powerful but also flexible data management and visualization systems are of increasingly crucial importance. The COVID-19 pandemic has more than highlighted this need and the challenges scientists are facing. Here, we provide an example and a step-by-step template for non-IT personnel to easily implement an intuitive, interactive data management solution to manage and visualize the high influx of biological samples and associated metadata in a laboratory setting. Our approach is illustrated with the genomic surveillance for SARS-CoV-2 in Germany, covering over 11 600 internal and 130 000 external samples from multiple datasets. We compare three data management options used in laboratories: (i) simple, yet error-prone and inefficient spreadsheets, (ii) complex and long-to-implement laboratory information management systems and (iii) high-performance database management systems. We highlight the advantages and pitfalls of each option and outline why a document-oriented NoSQL option via MongoDB Atlas can be a suitable solution for many labs. Our example can be treated as a template and easily adapted to allow scientists to focus on their core work and not on complex data administration.
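As a concrete illustration of the document-oriented model the authors favour, a genomic-surveillance sample can be stored as one self-describing document and retrieved by field. The records and field names below are hypothetical, not the paper's schema; in MongoDB the equivalent query would be a `find` call with an equality filter of the same shape:

```python
# Hypothetical sample documents, in the shape a MongoDB
# collection might hold them. All field values are invented.
samples = [
    {"sample_id": "S-0001", "source": "internal", "lineage": "BA.5",
     "collected": "2022-07-14", "meta": {"lab": "A", "ct_value": 21.3}},
    {"sample_id": "S-0002", "source": "external", "lineage": "BA.2",
     "collected": "2022-03-02", "meta": {"lab": "B", "ct_value": 28.9}},
]

def find(collection, **criteria):
    """Tiny stand-in for a MongoDB-style equality filter."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

internal = find(samples, source="internal")
```

Because each document carries its own metadata, new fields can be added to later samples without a schema migration, which is the flexibility argument the abstract makes against rigid spreadsheets and heavyweight LIMS.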
Topics: Humans; SARS-CoV-2; COVID-19; Pandemics; Genomics; Database Management Systems
PubMed: 37847816
DOI: 10.1093/database/baad071
Journal of Infection and Public Health Oct 2023
Review
INTRODUCTION
Africa bears the largest burden of communicable and non-communicable diseases globally, yet it contributes only about 1% of global research output, partly because of inaccessibility and low maintenance of medical data. Data is widely recognized as a crucial tool for improving population health. Despite the introduction of electronic health data systems in low- and middle-income countries (LMICs) to improve data quality, some LMICs still lack an efficient system to collect and archive data. This study aims to examine the underlying causes of data archive inaccessibility and poor maintenance in LMICs, and to highlight sustainable mitigation measures.
METHOD
The authors conducted a comprehensive search on PubMed, Google Scholar, and organization websites using the search string "data archive" or "medical data" or "public health statistics" AND "challenges" AND "maintenance" AND "Low Middle Income Countries" or "LMIC" to identify relevant studies and reports for inclusion in our review. All articles related to data archives in low- and middle-income countries were considered without restriction, owing to the scarcity of data.
RESULT
Medical data archives in LMICs face challenges impacting data quality. Insufficient training, organizational constraints, and limited infrastructure hinder archive maintenance. To improve, support for public datasets, digital literacy, and technology infrastructure is needed. Standardization, cloud solutions, and advanced technologies can enhance data management, while capacity building and training programs are crucial.
CONCLUSION
The creation and maintenance of data archives to facilitate the storage of retrospective datasets is critical for producing reliable and consistent data to support the development of resilient health systems and the surveillance of diseases in LMICs.
Topics: Humans; Developing Countries; Public Health; Retrospective Studies; Africa
PubMed: 37566992
DOI: 10.1016/j.jiph.2023.07.001
International Journal of Population... 2023
INTRODUCTION
Around the world, many organisations are working on ways to increase the use, sharing, and reuse of person-level data for research, evaluation, planning, and innovation while ensuring that data are secure and privacy is protected. As a contribution to broader efforts to improve data governance and management, in 2020 members of our team published 12 minimum specification essential requirements (min specs) to provide practical guidance for organisations establishing or operating data trusts and other forms of data infrastructure.
APPROACH AND AIMS
We convened an international team, consisting mostly of participants from Canada and the United States of America, to test and refine the original 12 min specs. Twenty-three (23) data-focused organisations and initiatives recorded the various ways they address the min specs. Sub-teams analysed the results, used the findings to make improvements to the min specs, and identified materials to support organisations/initiatives in addressing the min specs.
RESULTS
Analyses and discussion led to an updated set of 15 min specs covering five categories: one min spec for Legal, five for Governance, four for Management, two for Data Users, and three for Stakeholder & Public Engagement. Multiple changes were made to make the min specs language more technically complete and precise. The updated set of 15 min specs has been integrated into a Canadian national standard that, to our knowledge, is the first to include requirements for public engagement and Indigenous Data Sovereignty.
CONCLUSIONS
The testing and refinement of the min specs led to significant additions and improvements. The min specs helped the 23 organisations/initiatives involved in this project communicate and compare how they achieve responsible and trustworthy data governance and management. By extension, the min specs, and the Canadian national standard based on them, are likely to be useful for other data-focused organisations and initiatives.
Topics: Humans; United States; Canada; Privacy
PubMed: 38419825
DOI: 10.23889/ijpds.v8i4.2142