- Scientific Reports Nov 2023
Adverse pregnancy outcomes, such as low birth weight (LBW) and preterm birth (PTB), can have serious consequences for both the mother and infant. Early prediction of such outcomes is important for their prevention. Previous studies using traditional machine learning (ML) models for predicting PTB and LBW have encountered two important limitations: extreme class imbalance in medical datasets and the inability to account for complex relational structures between entities. To address these limitations, we propose a node embedding-based graph outlier detection algorithm to predict adverse pregnancy outcomes. We developed a knowledge graph using a well-curated representative dataset of the Emirati population and two node embedding algorithms. The graph autoencoder (GAE) was trained on a combination of the original risk factors and node embedding features. Samples that were difficult to reconstruct at the output of the GAE were identified as outliers and taken to represent PTB and LBW cases. Our experiments using LBW, PTB, and very PTB datasets demonstrated that incorporating node embedding considerably improved performance, achieving a 12% higher AUC-ROC than a traditional GAE. Our study demonstrates the effectiveness of node embedding and graph outlier detection in improving the prediction performance of adverse pregnancy outcomes in well-curated population datasets.
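The core idea of flagging samples that are hard to reconstruct can be sketched without a graph autoencoder. The example below is only an illustration under loud assumptions: it swaps the GAE for a one-component linear reconstruction (the first principal axis, found by power iteration), uses hypothetical toy points rather than any data from the study, and ignores node embeddings entirely. The function names are invented for this sketch.

```python
import math

def first_principal_axis(rows, iters=200):
    """Return (mean, unit axis) of the leading principal direction via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divided by n; the scale does not affect the axis).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def reconstruction_errors(rows, mean, axis):
    """Squared distance between each row and its projection onto the axis."""
    errors = []
    for r in rows:
        c = [x - m for x, m in zip(r, mean)]
        t = sum(ci * ai for ci, ai in zip(c, axis))  # 1-D "encoding"
        errors.append(sum((ci - t * ai) ** 2 for ci, ai in zip(c, axis)))
    return errors

# Hypothetical toy data: five points on the line y = 2x and one point off it.
points = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (2, 9)]
mean, axis = first_principal_axis(points)
errors = reconstruction_errors(points, mean, axis)
# The sample with the largest reconstruction error is the outlier candidate.
outlier_index = errors.index(max(errors))
```

In this toy setting the off-line point receives the largest error. A real pipeline would replace the linear projection with the trained autoencoder and choose a threshold on the error distribution.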
Topics: Pregnancy; Female; Infant, Newborn; Humans; Pregnancy Outcome; Premature Birth; Infant, Low Birth Weight; Mothers; Risk Factors
PubMed: 37963898
DOI: 10.1038/s41598-023-46726-4
- Data Mining and Knowledge Discovery 2023
It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch conference on machine learning, pp 56-64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147-153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles yielded better accuracy than individual methods, as long as the ensemble members were properly selected.
SUPPLEMENTARY INFORMATION
The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
PubMed: 37424877
DOI: 10.1007/s10618-023-00931-x
- Journal of Medical Internet Research May 2021
BACKGROUND
Perioperative quantitative monitoring of neuromuscular function in patients receiving neuromuscular blockers has become internationally recognized as an absolute and core necessity in modern anesthesia care. Because of their kinetic nature, artifactual recordings of acceleromyography-based neuromuscular monitoring devices are not unusual. These generate a great deal of cynicism among anesthesiologists, constituting an obstacle toward their widespread adoption. Through outlier analysis techniques, monitoring devices can learn to detect and flag signal abnormalities. Outlier analysis (or anomaly detection) refers to the problem of finding patterns in data that do not conform to expected behavior.
OBJECTIVE
This study was motivated by the development of a smartphone app intended for neuromuscular monitoring based on combined accelerometric and angular hand movement data. During the paired comparison stage of this app against existing acceleromyography monitoring devices, it was noted that the results from both devices did not always concur. This study aims to engineer a set of features that enable the detection of outliers in the form of erroneous train-of-four (TOF) measurements from an acceleromyographic-based device. These features are tested for their potential in the detection of erroneous TOF measurements by developing an outlier detection algorithm.
METHODS
A data set encompassing 533 high-sensitivity TOF measurements from 35 patients was created based on a multicentric open label trial of a purpose-built accelero- and gyroscopic-based neuromuscular monitoring app. A basic set of features was extracted based on raw data while a second set of features was purpose engineered based on TOF pattern characteristics. Two cost-sensitive logistic regression (CSLR) models were deployed to evaluate the performance of these features. The final output of the developed models was a binary classification, indicating if a TOF measurement was an outlier or not.
RESULTS
A total of 7 basic features were extracted based on raw data, while another 8 features were engineered based on TOF pattern characteristics. The model training and testing were based on separate data sets: one with 319 measurements (18 outliers) and a second with 214 measurements (12 outliers). The F1 score (95% CI) was 0.86 (0.48-0.97) for the CSLR model with engineered features, significantly larger than the CSLR model with the basic features (0.29 [0.17-0.53]; P<.001).
CONCLUSIONS
The set of engineered features and their corresponding incorporation in an outlier detection algorithm have the potential to increase overall neuromuscular monitoring data consistency. Integrating outlier flagging algorithms within neuromuscular monitors could potentially reduce overall acceleromyography-based reliability issues.
TRIAL REGISTRATION
ClinicalTrials.gov NCT03605225; https://clinicaltrials.gov/ct2/show/NCT03605225.
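Two quantities in the abstract above lend themselves to a short worked sketch: the class imbalance that motivates cost-sensitive training, and the F1 score used for evaluation. The abstract does not say how the misclassification costs were chosen, so the weighting below is an assumption: it uses the common "balanced" heuristic (total / (number of classes × class count)). The confusion counts passed to `f1_score` are hypothetical and are not results from the study. Only the training-set sizes (319 measurements, 18 outliers) come from the abstract.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Training set from the abstract: 319 measurements, 18 of them outliers.
weights = balanced_class_weights({"inlier": 319 - 18, "outlier": 18})

# Hypothetical confusion counts, for illustration only.
example_f1 = f1_score(tp=10, fp=2, fn=2)
```

Under this heuristic an outlier measurement is weighted roughly 17 times more heavily than an inlier, which is one plausible way to keep a logistic regression from ignoring the rare class.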
Topics: Accelerometry; Humans; Machine Learning; Neuromuscular Blockade; Neuromuscular Monitoring; Reproducibility of Results
PubMed: 34152273
DOI: 10.2196/25913
- Entropy (Basel, Switzerland) Dec 2021
People nowadays use the internet to share their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source of data for data analytics, sentiment analysis, natural language processing, etc. Conventionally, the true sentiment of a customer review matches its corresponding star rating. There are exceptions in which the star rating of a review is opposite to its true sentiment. In this work, such reviews are labeled as the outliers in a dataset. The state-of-the-art methods for anomaly detection involve manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews, and it proposes a statistics-based outlier detection and correction method (SODCM), which helps identify such reviews and rectify their star ratings to enhance the performance of a sentiment analysis algorithm without any data loss. This paper focuses on applying SODCM to datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also studies the dataset and assesses the effect of SODCM on the performance of a sentiment analysis algorithm. The results show that SODCM achieves higher accuracy and recall than other state-of-the-art anomaly detection algorithms.
PubMed: 34945950
DOI: 10.3390/e23121645
- Addiction (Abingdon, England) Dec 2021
BACKGROUND AND AIMS
Over-the-counter codeine products were up-scheduled to prescription only in Australia from February 2018. This trend study aimed to identify changes in codeine supply before and after the February 2018 implementation.
DESIGN, SETTING AND CASES
Time-series regression analysis of monthly medicine supplies in Australia from 2014 to 2018. The February 2018 up-scheduling was pre-specified as the intervention; outlier analysis was used to automatically detect sudden, unexpected changes before February 2018.
MEASUREMENTS
Per-capita supplies based on national data for pharmaceutical wholesales and population exposure. Weight of supplies in milligrams for low-dose codeine (≤ 15 mg per tablet or ≤ 1.92 mg per ml, originally sold over the counter but up-scheduled after February 2018), high-dose combination codeine (30 mg per tablet, prescription only throughout the study period) and all codeine.
FINDINGS
Several level shifts in supply occurred during the 5 years, led by one of -4.4% [95% confidence interval (CI) = -6.6 to -2.1%] in high-dose codeine in 2015, followed by shifts in low-dose codeine of -40.0% (CI = -46.9 to -32.3%) and -82.2% (CI = -84.3 to -79.9%), respectively, before and after February 2018. High-dose codeine supply increased by 4.4% (CI = 1.8-7.1%) immediately after up-scheduling. Also detected were transient increases and decreases in 2016 and 2017. Compared with pre-2015 levels, the February 2018 up-scheduling was associated with reductions of 45.7% (CI = 43.2-48.0%) and 89.3% (CI = 87.9-90.6%), respectively, in all and low-dose codeine supply but no change in high-dose codeine supply. The level shifts and transient changes were located around various regulatory activities, including public announcements and expert advisory meetings on up-scheduling.
CONCLUSION
Up-scheduling of over-the-counter codeine products in Australia in 2018 appears to have been associated with a near halving of Australia's national codeine supply. The transition occurred in multiple forms and phases.
Topics: Analgesics, Opioid; Australia; Codeine; Humans; Nonprescription Drugs
PubMed: 33999465
DOI: 10.1111/add.15566
- Evolution; International Journal of... Apr 2023
An evolutionary debate contrasts the importance of genetic convergence versus genetic redundancy. In genetic convergence, the same adaptive trait evolves because of similar genetic changes. In genetic redundancy, the adaptive trait evolves using different genetic combinations, and populations might not share the same genetic changes. Here we address this debate by examining single nucleotide polymorphisms (SNPs) associated with the rapid evolution of character displacement in Anolis carolinensis populations inhabiting replicate islands with and without a competitor species (1Spp and 2Spp islands, respectively). We identify 215 outlier SNPs that have improbably large FST values, low nucleotide variation, and greater linkage than expected, and that are enriched for genes underlying animal movement. The pattern of SNP divergence between 1Spp and 2Spp populations supports both genetic convergence and genetic redundancy for character displacement. In support of genetic convergence: all 215 outlier SNPs are shared among at least three of the five 2Spp island populations, and 23% of outlier SNPs are shared among all five 2Spp island populations. In contrast, in support of genetic redundancy: many outlier SNPs only have meaningful allele frequency differences between 1Spp and 2Spp islands on a few 2Spp islands. That is, on at least one of the 2Spp islands, 77% of outlier SNPs have allele frequencies more similar to those on 1Spp islands than to those on 2Spp islands. Focusing on genetic convergence is scientifically rigorous because it relies on replication. Yet, this focus distracts from the possibility that there are multiple, redundant genetic solutions that enhance the rate and stability of adaptive change.
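As a rough illustration of what a "large FST value" means for a single biallelic SNP, the sketch below computes Wright's FST for two populations from their allele frequencies, using FST = (HT − HS) / HT. It assumes equal population sizes and ignores sampling corrections. The abstract does not state which estimator or outlier test the authors used, so this is a generic textbook form, not their pipeline. The allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's FST for one biallelic SNP, given allele frequencies p1 and p2.

    Assumes two populations of equal size and no sampling correction.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)            # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                               # monomorphic SNP: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total

# Hypothetical frequencies: strongly diverged versus identical populations.
diverged = fst_two_populations(0.1, 0.9)   # approximately 0.64
identical = fst_two_populations(0.5, 0.5)  # 0.0
```

A genome scan would compute such a value per SNP and then ask which SNPs fall far out in the upper tail of the genome-wide distribution.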
Topics: Animals; Gene Frequency; Genomics; Phenotype; Polymorphism, Single Nucleotide; Selection, Genetic
PubMed: 36857409
DOI: 10.1093/evolut/qpad031
- International Journal of Biometeorology Nov 2020
Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran's I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
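Percentile-based outlier removal, which the abstract above singles out as simple and effective, can be sketched in a few lines. The cut-offs (5th and 95th percentiles) and the bloom dates below are hypothetical choices for illustration, since the abstract does not give the thresholds or the data. The percentile function uses plain linear interpolation between order statistics.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile of an ascending list, with p in [0, 100]."""
    if not sorted_values:
        raise ValueError("percentile of empty data")
    rank = (len(sorted_values) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def percentile_filter(values, lower=5, upper=95):
    """Keep only observations between the lower and upper percentile cut-offs."""
    ordered = sorted(values)
    lo_cut = percentile(ordered, lower)
    hi_cut = percentile(ordered, upper)
    return [v for v in values if lo_cut <= v <= hi_cut]

# Hypothetical bloom dates (day of year) reported for one species in one season.
bloom_days = [118, 120, 121, 122, 123, 124, 125, 126, 127, 40, 210]
cleaned = percentile_filter(bloom_days)
# cleaned == [118, 120, 121, 122, 123, 124, 125, 126, 127]
```

The filter trims a fixed share of each tail, so it should be applied per species and region. With very small groups it can discard legitimate early or late observations.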
Topics: Citizen Science; Climate Change; Community Participation; Humans; Reproducibility of Results; Volunteers
PubMed: 32671668
DOI: 10.1007/s00484-020-01968-z
- Journal of Applied Statistics 2020
Outlier detection can be seen as a pre-processing step for locating data points in a data sample, which do not conform to the majority of observations. Various techniques and methods for outlier detection can be found in the literature dealing with different types of data. However, many data sets are inflated by true zeros and, in addition, some components/variables might be of compositional nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data or - as in this contribution - household expenditure data which are used, for example, to estimate the Purchase Power Parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared by a complex simulation study that considers various challenges included in data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations by well-known outlier detection methods. Our aim is to assess the performance of outlier detection methods in terms of their effectiveness to identify outliers when applied to challenging data sets such as the household expenditures data surveyed all over the world. Moreover, different methods are evaluated through a close-to-reality simulation study. Differences in performance of univariate and multivariate robust techniques for outlier detection and their shortcomings are reported. We found that robust multivariate methods outperform robust univariate methods. The best performing methods in finding the outliers and in providing a low false discovery rate were found to be the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov). 
In addition, these methods also performed best when the outliers were imputed based on the corresponding outlier detection method and indicators were estimated from the data sets.
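To make the notion of a robust univariate rule concrete, here is a minimal sketch based on the median and the median absolute deviation (the modified z-score, with the conventional constant 0.6745 and cut-off 3.5). It is not one of the methods named in the abstract (GSE, BACON-EEM, CoDa-Cov). Treating exact zeros as structural zeros that are neither used for estimation nor flagged is an assumption made for this example, and the expenditure values are hypothetical.

```python
import statistics

def mad_outlier_flags(values, threshold=3.5, skip_zeros=True):
    """Flag values whose modified z-score exceeds the threshold.

    Zeros are treated as structural zeros (assumption): they are left out of
    the median/MAD estimates and are never flagged.
    """
    used = [v for v in values if not (skip_zeros and v == 0)]
    if not used:
        return [False] * len(values)
    med = statistics.median(used)
    mad = statistics.median([abs(v - med) for v in used])
    if mad == 0:
        return [False] * len(values)  # no spread: nothing can be scored
    flags = []
    for v in values:
        if skip_zeros and v == 0:
            flags.append(False)
        else:
            flags.append(abs(0.6745 * (v - med) / mad) > threshold)
    return flags

# Hypothetical household expenditures on one item; zeros mean "never purchased".
expenditures = [0, 210, 195, 0, 220, 205, 1900, 215]
flags = mad_outlier_flags(expenditures)
# flags == [False, False, False, False, False, False, True, False]
```

A rule like this looks at one variable at a time, so it cannot detect observations that are unusual only in combination. That limitation is consistent with the abstract's finding that multivariate methods performed better.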
PubMed: 35707025
DOI: 10.1080/02664763.2019.1671961
- IEEE ... International Conference on... Jul 2022
When observing and measuring human gait data for further analysis, determining whether the observed behavior is within the normal range of variability, or should be considered abnormal, is very challenging. Moreover, gait data are usually multivariate, including motion capture, electromyography, force measurements, etc., with each source having its own unique causes of irregularities and anomalies. This paper introduces a unique algorithm for outlier detection in periodic gait data using multiple sources and multiple procedures to improve the overall accuracy. The proposed algorithm's performance is evaluated using realistic synthetic gait data to gauge its accuracy against an objectively known solution. It is shown that the proposed method is able to detect 91.2% of the true outliers in an extensive synthetic dataset, while only producing false positives at a rate of 0.1%, outperforming other procedures usually utilized in gait data outlier detection. The proposed method is a systematic way of removing outliers from gait data, with direct applications to human biomechanics, rehabilitation and robotics, and can be applied to other scientific fields dealing with periodic data.
Topics: Algorithms; Biomechanical Phenomena; Electromyography; Gait; Humans
PubMed: 36176090
DOI: 10.1109/ICORR55369.2022.9896411
- Genes Oct 2021
Domestication of teleost fish is a recent development, and in most cases started less than 50 years ago. Shedding light on the genomic changes in key economic traits during the domestication process can provide crucial insights into the evolutionary processes involved and help inform selective breeding programmes. Here we report on the recent domestication of a native marine teleost species in New Zealand, the Australasian snapper (Chrysophrys auratus). Specifically, we use genome-wide data from a three-generation pedigree of this species to uncover genetic signatures of domestication selection for growth. Genotyping-By-Sequencing (GBS) was used to generate genome-wide SNP data from a three-generation pedigree to calculate generation-wide averages of FST between every generation pair. The level of differentiation between generations was further investigated using ADMIXTURE analysis and Principal Component Analysis (PCA). After that, genome scans using Bayescan, LFMM and XP-EHH were applied to identify SNP variants under putative selection following selection for growth. Finally, genes near candidate SNP variants were annotated to gain functional insights. Analysis showed that between-generation FST values slightly increased as generational time increased. The extent of these changes was small, and both ADMIXTURE analysis and PCA were unable to form clear clusters. Genome scans revealed a number of SNP outliers, indicative of selection, of which a small number overlapped across analysis methods and populations. Genes of interest within proximity of putative selective SNPs were related to biological functions, and revealed an association with growth, immunity, neural development and behaviour, and tumour repression. Even though few genes overlapped between outlier SNP methods, gene functionalities showed greater overlap between methods. 
While the genetic changes observed were small in most cases, a number of outlier SNPs could be identified, of which some were found by more than one method. Multiple outlier SNPs appeared to be predominantly linked to gene functionalities that modulate growth and survival. Ultimately, the results help to shed light on the genomic changes occurring during the early stages of domestication selection in teleost fish species such as snapper, and will provide useful candidates for the ongoing selective breeding of this and related species.
Topics: Animals; Biological Evolution; Domestication; Genome; Genome-Wide Association Study; Genotyping Techniques; New Zealand; Pedigree; Perciformes; Phenotype; Polymorphism, Single Nucleotide; Selective Breeding
PubMed: 34828341
DOI: 10.3390/genes12111737