PloS One 2023
This study proposes a robust outlier detection method, based on the circular median, for non-parametric linear-circular regression when the response variable contains outliers and the residuals follow a wrapped Cauchy distribution. Nadaraya-Watson and local linear regression methods were used to obtain the non-parametric regression fits. The performance of the proposed method was investigated using a real dataset and a comprehensive simulation study with different sample sizes, contamination degrees, and heterogeneity degrees. The method performs well at medium and higher contamination degrees, and its performance improves as the sample size and the homogeneity of the data increase. In addition, when the response variable of the linear-circular regression contains outliers, the local linear estimation method fits the data better than the Nadaraya-Watson method.
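To make the idea concrete, here is a minimal sketch of outlier flagging in a regression with a linear predictor and a circular (angular) response, assuming a Gaussian kernel Nadaraya-Watson fit and a threshold on the circular distance of each residual from the circular median of all residuals. The `bandwidth` and `cutoff` values are hypothetical tuning choices; this is not the authors' procedure, and neither the wrapped Cauchy residual model nor the local linear fit is shown.

```python
import math

def nw_circular(x_train, theta_train, x0, bandwidth):
    """Nadaraya-Watson estimate of the circular mean response at x0.

    Gaussian kernel weights on the linear predictor; the fitted angle is the
    direction of the kernel-weighted sum of unit vectors (cos, sin).
    """
    s = c = 0.0
    for x, th in zip(x_train, theta_train):
        w = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        s += w * math.sin(th)
        c += w * math.cos(th)
    return math.atan2(s, c)

def circ_dist(a, b):
    """Angular distance in [0, pi] between two angles given in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def circular_median(angles):
    """Sample angle that minimises the summed circular distance to all angles."""
    return min(angles, key=lambda m: sum(circ_dist(m, a) for a in angles))

def flag_outliers(x, theta, bandwidth, cutoff):
    """Flag points whose residual lies more than `cutoff` radians (hypothetical
    threshold) from the circular median of all residuals."""
    fitted = [nw_circular(x, theta, xi, bandwidth) for xi in x]
    # Residual angles wrapped to (-pi, pi]
    resid = [math.atan2(math.sin(t - f), math.cos(t - f)) for t, f in zip(theta, fitted)]
    med = circular_median(resid)
    return [circ_dist(r, med) > cutoff for r in resid]
```

For example, with `x = [i / 10 for i in range(30)]`, a response `theta = [0.5 * xi for xi in x]`, and 3 radians added to `theta[15]`, only index 15 is flagged at `bandwidth=0.3, cutoff=1.0`. The median is used instead of the circular mean because a single large residual can pull a mean direction but barely moves a median.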
Topics: Humans; Linear Models; Computer Simulation; Drug Contamination; Sample Size; Seizures
PubMed: 37307265
DOI: 10.1371/journal.pone.0286448
Frontiers in Psychiatry 2023
CONTEXT
Cortisol, a hormone regulated by the hypothalamic-pituitary-adrenal (HPA) axis, has been linked to attention deficit hyperactivity disorder (ADHD). The nature of the relationship between cortisol and ADHD, and whether it is causal or explained by reverse causality, remains a matter of debate.
OBJECTIVE
This study aims to evaluate the bidirectional causal relationship between morning plasma cortisol levels and ADHD.
METHODS
This study used a bidirectional two-sample Mendelian randomization (MR) design to analyze the association between morning plasma cortisol levels and ADHD, using genetic information from the Psychiatric Genomics Collaboration (PGC) database (n = 55,347) and the ADHD Working Group of the CORtisol NETwork (CORNET) Consortium (n = 12,597). Three MR methods were employed: inverse variance weighting (IVW), MR-Egger regression, and the weighted median. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to evaluate whether there was a causal effect of morning plasma cortisol levels on ADHD, and of ADHD on morning plasma cortisol levels. The MR-Egger intercept was used to test for horizontal pleiotropy. Sensitivity analyses used the leave-one-out method and the MR pleiotropy residual sum and outlier (MR-PRESSO) test.
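The core MR estimators named above can be sketched from per-SNP summary statistics: `bx` (SNP-exposure effects), `by` (SNP-outcome effects) and `se_y` (standard errors of `by`). This is a minimal illustration assuming the standard fixed-effect IVW formula and an MR-Egger fit by weighted least squares; the weighted median and MR-PRESSO are not shown, and the function names are hypothetical.

```python
import numpy as np

def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance-weighted causal estimate and its SE."""
    bx, by, se_y = map(np.asarray, (bx, by, se_y))
    w = bx**2 / se_y**2
    beta = np.sum(bx * by / se_y**2) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return beta, se

def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (weights 1/se_y^2).

    SNPs are first oriented so that bx > 0 (assumes no bx is exactly zero).
    A non-zero intercept suggests directional horizontal pleiotropy.
    Returns (intercept, slope)."""
    bx, by, se_y = map(np.asarray, (bx, by, se_y))
    sign = np.sign(bx)
    bx, by = bx * sign, by * sign
    w = 1.0 / se_y**2
    X = np.column_stack([np.ones_like(bx), bx])
    # Solve the weighted normal equations (X^T W X) b = X^T W y
    coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * by))
    return coef[0], coef[1]

def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed with each SNP removed in turn."""
    return [
        ivw(np.delete(bx, i), np.delete(by, i), np.delete(se_y, i))[0]
        for i in range(len(bx))
    ]
```

When every SNP satisfies `by = 0.5 * bx` exactly, IVW and the MR-Egger slope both return 0.5, the Egger intercept is zero, and every leave-one-out estimate stays at 0.5, which is the "stable results" pattern the sensitivity analysis looks for.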
RESULTS
Bidirectional MR showed that lower morning plasma cortisol levels were associated with ADHD (ADHD → cortisol OR = 0.857; 95% CI, 0.755-0.974; P = 0.018), suggesting a reverse causal relationship between cortisol and ADHD. By contrast, no genetic evidence was found for a causal effect of morning plasma cortisol levels on the risk of ADHD (OR = 1.006; 95% CI, 0.909-1.113; P = 0.907). The MR-Egger intercepts were close to zero, indicating no horizontal pleiotropy among the selected instrumental variables. The leave-one-out sensitivity analysis gave stable results, with no single instrumental variable markedly affecting the estimates. Heterogeneity tests were not significant, and MR-PRESSO detected no significant outliers. The F statistics of all selected single-nucleotide polymorphisms (SNPs) were >10, indicating no weak instrumental variables. The overall MR results were therefore judged reliable.
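The reported ORs, CIs and the F > 10 weak-instrument check rest on two small calculations, sketched here under common conventions: an MR estimate on the log-odds scale is exponentiated with a normal-approximation 95% interval, and a per-SNP F statistic is approximated as `(bx / se_x)**2`. Both function names are hypothetical.

```python
import math

def or_with_ci(beta, se, z=1.96):
    """Convert a log-odds-ratio estimate and its SE into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def snp_f_statistic(bx, se_x):
    """Approximate per-SNP F statistic; values above 10 are the usual rule of
    thumb for avoiding weak-instrument bias."""
    return (bx / se_x) ** 2
```

A CI that excludes 1 (as for the ADHD → cortisol direction) corresponds to a log-scale interval that excludes 0.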
CONCLUSION
The study findings suggest a reverse causal relationship between morning plasma cortisol levels and ADHD, with low cortisol levels associated with ADHD. No genetic evidence was found to support a causal relationship between morning plasma cortisol levels and the risk of ADHD. These results suggest that ADHD may lead to a significant reduction in morning plasma cortisol secretion.
PubMed: 37389173
DOI: 10.3389/fpsyt.2023.1148759
BioData Mining Sep 2023
No current univariate outlier detection algorithm transforms and models arbitrarily shaped distributions to remove univariate outliers. Some algorithms model skew, fewer model kurtosis, and none model bimodality or monotonicity. To overcome these challenges, we implemented an algorithm for Skew and Tail-heaviness Adjusted Removal of Outliers (STAR_outliers) that robustly removes univariate outliers from distributions with many different shape profiles, including extreme skew, extreme kurtosis, bimodality, and monotonicity. We show that STAR_outliers removes simulated outliers with greater recall and precision than several general algorithms, and that it models the outlier bounds of real data distributions more accurately.
BACKGROUND
Reliably removing univariate outliers from arbitrarily shaped distributions is difficult. Incorrectly assuming unimodality or overestimating tail heaviness fails to remove outliers, while underestimating tail heaviness incorrectly removes regular data from the tails. Skew often produces one heavy tail and one light tail, and we show that several sophisticated outlier removal algorithms often fail to remove outliers from the light tail. Multivariate outlier detection algorithms have recently become popular, but after testing PyOD's multivariate outlier removal algorithms, we found them inadequate for univariate outlier removal: they usually do not accept univariate input, and they do not fit their distributions of outliership scores with a model on which an outlier threshold can be accurately established. There is thus a need for a flexible outlier removal algorithm that can model arbitrarily shaped univariate distributions.
RESULTS
To model arbitrarily shaped univariate distributions effectively, we combined several well-established algorithms into a new algorithm called STAR_outliers. STAR_outliers removes more simulated true outliers and fewer non-outliers than several other univariate algorithms. These include several normality-assuming outlier removal methods, PyOD's isolation forest (IF) outlier removal algorithm (ACM Transactions on Knowledge Discovery from Data (TKDD) 6:3, 2012) with default settings, and an IQR-based algorithm by Verardi and Vermandele that removes outliers while accounting for skew and kurtosis (Verardi and Vermandele, Journal de la Société Française de Statistique 157:90-114, 2016). Because the IF algorithm's default model fit the outliership scores poorly, we also evaluated the isolation forest algorithm with an alternative thresholding model that removes as many data points as STAR_outliers does, in order of decreasing outliership score. We also compared these algorithms on the publicly available 2018 National Health and Nutrition Examination Survey (NHANES) data, setting the outlier threshold to keep values falling within the main 99.3 percent of the fitted model's domain. On average, STAR_outliers removes a proportion of values from these features that is significantly closer to 0.7 percent than the other outlier removal methods do.
CONCLUSIONS
STAR_outliers is an easily implemented Python package for removing outliers that outperforms multiple commonly used methods of univariate outlier removal.
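The 99.3 percent / 0.7 percent figures match a familiar baseline: for normally distributed data, Tukey's fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) keep about 99.3% of values. The sketch below, which is not STAR_outliers itself, computes those fences, checks the normal-theory coverage, and shows the "remove as many points as another method, in order of decreasing outliership score" comparison rule. All function names are hypothetical.

```python
from statistics import NormalDist, quantiles

def tukey_fences(values, k=1.5):
    """Classic Tukey fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def normal_coverage_of_tukey(k=1.5):
    """Share of a normal distribution inside Tukey fences (about 0.993 for k=1.5)."""
    z = NormalDist()
    q3 = z.inv_cdf(0.75)          # upper quartile of N(0, 1)
    upper = q3 + k * (2 * q3)     # IQR of N(0, 1) is 2 * q3
    return z.cdf(upper) - z.cdf(-upper)

def remove_top_k(values, scores, k):
    """Drop the k values with the highest outlier scores (stable for ties)."""
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:k])
    return [v for i, v in enumerate(values) if i not in drop]
```

The symmetric fences are exactly what fails on skewed data: one tail is over-trimmed and the other under-trimmed, which is the problem the skew- and tail-adjusted approach targets.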
PubMed: 37667378
DOI: 10.1186/s13040-023-00342-0
Journal of Nutrition Education and... Jan 2021
OBJECTIVE
The goal of this study was to explore the impact of 5 decision rules for removing outliers from adolescent food frequency questionnaire (FFQ) data.
DESIGN
This secondary analysis used baseline and 3-month data from a weight loss intervention clinical trial.
PARTICIPANTS
African American adolescents (n = 181) were recruited from outpatient clinics and community health fairs.
VARIABLES MEASURED
Data collected included self-reported FFQ and mediators of weight (food addiction, depressive symptoms, and relative reinforcing value of food), caregiver-reported executive functioning, and objectively measured weight status (percentage overweight).
ANALYSIS
Descriptive statistics examined patterns in study variables at baseline and follow-up. Correlational analyses explored the relationships between FFQ data and key study variables at baseline and follow-up.
RESULTS
Compared with not removing outliers, using decision rules reduced the number of cases and restricted the range of data. The magnitude of baseline FFQ-mediator relationships was attenuated under all decision rules but varied (increasing, decreasing, and reversing direction) at follow-up. Decision rule use increased the magnitude of change in FFQ estimated energy intake and significantly strengthened its relationship with weight change under 2 fixed range decision rules.
CONCLUSIONS AND IMPLICATIONS
The results support careful evaluation of outliers, and testing and reporting the effects of different outlier decision rules through sensitivity analyses.
Topics: Adolescent; Diet; Diet Records; Diet Surveys; Energy Intake; Female; Humans; Male; Motivation; Reproducibility of Results; Surveys and Questionnaires
PubMed: 33012663
DOI: 10.1016/j.jneb.2020.08.002
BMC Health Services Research Sep 2021
BACKGROUND
As healthcare systems strive for efficiency, hospital "length of stay outliers" have the potential to significantly affect a hospital's overall utilization. There is a tendency to exclude such "outlier" stays from local quality improvement and data reporting because they are assumed to be rare and can disproportionately skew the mean and other summary data. This study sought to assess the influence of length of stay (LOS) outliers on inpatient length of stay and hospital capacity over a 5-year period at a large urban academic medical center.
METHODS
From January 2014 through December 2019, 169,645 consecutive inpatient cases were analyzed and assigned an expected LOS based on national academic center benchmarks. Cases in the top 1% of national sample LOS by diagnosis were flagged as length of stay outliers.
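The flagging rule (LOS above the top 1% of a national benchmark for the same diagnosis) and the share-of-patient-days metric can be sketched as follows. The input shape, a list of `(diagnosis, los_days)` pairs, and the choice of the inclusive 99th-percentile interpolation are assumptions made for illustration, not details from the study.

```python
from collections import defaultdict
from statistics import quantiles

def los_thresholds(benchmark):
    """99th-percentile LOS per diagnosis from (diagnosis, los_days) pairs.

    Each diagnosis needs at least two benchmark stays; the 'inclusive' method
    keeps the threshold within the observed range."""
    by_dx = defaultdict(list)
    for dx, los in benchmark:
        by_dx[dx].append(los)
    return {dx: quantiles(v, n=100, method="inclusive")[98] for dx, v in by_dx.items()}

def outlier_summary(cases, thresholds):
    """Flag local cases above their diagnosis threshold and report the outlier
    share of discharges and of total patient days, in percent."""
    flags = [los > thresholds[dx] for dx, los in cases]
    total_days = sum(los for _, los in cases)
    outlier_days = sum(los for (_, los), f in zip(cases, flags) if f)
    return {
        "pct_discharges": 100 * sum(flags) / len(cases),
        "pct_patient_days": 100 * outlier_days / total_days,
    }
```

The two percentages diverge sharply because each outlier stay is long: in the reported data, outliers were about 1.56% of discharges but 9.8% of patient days.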
RESULTS
From 2014 to 2019, mean outlier LOS increased (40.98 to 45.11 days), as did inpatient LOS with outliers excluded (5.63 to 6.19 days). Outlier cases increased both in number (from 297 to 412) and as a percent of total discharges (0.98 to 1.56%), and outlier patient days increased from 6.7 to 9.8% of total inpatient plus observation days over the study period.
CONCLUSIONS
Outlier cases utilize a disproportionate and increasing share of hospital resources and available beds. The current tendency to exclude such outlier stays in data reporting due to assumed rare occurrence may need to be revisited. Outlier stays require distinct and targeted interventions to appropriately reduce length of stay to both improve patient care and maintain hospital capacity.
Topics: Hospitals, Urban; Humans; Length of Stay; Quality Improvement; Retrospective Studies
PubMed: 34503494
DOI: 10.1186/s12913-021-06972-6
BMJ Open Jul 2023
OBJECTIVES
To measure differences at various deciles in days alive and out of hospital to 90 days (DAOH) and explore its utility for identifying outliers of performance among district health boards (DHBs).
METHODS
Days in hospital and mortality within 90 days of surgery were extracted by linking data from the New Zealand National Minimum Data Set and the births and deaths registry between 1 January 2011 and 31 December 2021 for all adults in New Zealand undergoing acute laparotomy (AL, a relatively high-risk group), elective total hip replacement (THR, a medium-risk group) or lower segment caesarean section (LSCS, a low-risk group). DAOH was calculated without censoring to zero in cases of mortality. For each DHB, direct risk standardisation was used to adjust for potential confounders, and results were presented in deciles according to baseline patient risk. The Mann-Whitney U test assessed overall DAOH differences between DHBs, and comparisons are presented between selected deciles of DAOH for each operation.
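The difference between DAOH with and without censoring to zero is easiest to see per patient. This sketch assumes `hospital_days` counts in-hospital days within the 90-day window and `death_day` is the day of death (or `None`); direct risk standardisation and the Mann-Whitney U test are not shown, and the function names are hypothetical.

```python
from statistics import quantiles

def daoh(hospital_days, death_day=None, horizon=90):
    """Days alive and out of hospital within `horizon` days, without censoring
    to zero: a patient who dies keeps the days alive and out of hospital
    accumulated before death."""
    days_alive = horizon if death_day is None else min(death_day, horizon)
    return max(0, days_alive - hospital_days)

def daoh_censored(hospital_days, death_day=None, horizon=90):
    """Conventional variant for comparison: any death within the horizon scores 0."""
    if death_day is not None and death_day <= horizon:
        return 0
    return max(0, horizon - hospital_days)

def daoh_deciles(values):
    """Cut points at the 0.1, ..., 0.9 deciles of a DAOH distribution."""
    return quantiles(values, n=10, method="inclusive")
```

A patient with 5 hospital days who dies on day 30 scores 25 uncensored but 0 censored, so the uncensored version keeps variation in the lower deciles, where the abstract reports differences were most marked.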
RESULTS
We obtained national data for 35 175, 52 032 and 117 695 patients undergoing AL, THR and LSCS procedures, respectively. Calculating DAOH without censoring at zero allowed differences between procedures and DHBs to be identified. Risk-adjusted national mean DAOH scores were 64.0, 79.0 and 82.0 days at the 0.1 decile and 75.0, 82.0 and 84.0 days at the 0.2 decile for AL, THR and LSCS, respectively, consistent with their expected risk profiles. Differences between procedures and DHBs were most marked at lower deciles of the DAOH distribution, and outlier DHBs were detectable. Corresponding 90-day mortality rates were 5.45%, 0.78% and 0.01%.
CONCLUSION
In New Zealand after direct risk adjustment, differences in DAOH between three types of surgical procedure reflected their respective risk levels and associated mortality rates. Outlier DHBs were identified for each procedure. Thus, our approach to analysing DAOH appears to have considerable face validity and potential utility for contributing to the measurement of perioperative outcomes in an audit or quality improvement setting.
Topics: Pregnancy; Adult; Humans; Female; Cross-Sectional Studies; New Zealand; Cesarean Section; Hospitals; Treatment Outcome
PubMed: 37491100
DOI: 10.1136/bmjopen-2022-063787
Sensors (Basel, Switzerland) Jul 2023
This paper presents a comprehensive study on the development of the models and soft sensors required to implement automated bioreactor feeding of Chinese hamster ovary (CHO) cells using Raman spectroscopy and chemometric methods. The study integrates methods such as partial least squares regression, variable importance in projection, and competitive adaptive reweighted sampling, and highlights their effectiveness in overcoming challenges such as high dimensionality, multicollinearity, and outlier detection in Raman spectra. The paper emphasizes the importance of data preprocessing and of the relationship between independent and dependent variables in model construction. It also describes the development of a simulation environment built around a model of CHO cell kinetics, which allows advanced control algorithms for nutrient dosing to be developed and the effects of different parameters on the growth and productivity of CHO cells to be observed. All developed models were validated and shown to be highly robust and accurate, with a 40% reduction in root mean square error compared with established methods. The results provide valuable insights into the practical application of these methods for monitoring and automated cell feeding, and contribute to the further development of process analytical technology in the bioprocess industry.
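Partial least squares regression with variable importance in projection (VIP) scores is a common way to rank Raman wavenumbers in high-dimensional, collinear spectra. The sketch below assumes a single-response PLS model fitted with the NIPALS algorithm in plain NumPy and the usual VIP formula; it is an illustration, not the paper's pipeline, and competitive adaptive reweighted sampling is not shown.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit single-response PLS by NIPALS and return one VIP score per feature.

    X: (n_samples, n_features), e.g. spectra; y: (n_samples,) response.
    Both are mean-centred internally."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    ss = np.zeros(n_components)  # y sum of squares explained by each component
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - q * t              # deflate y
        W[:, a] = w
        ss[a] = q**2 * tt
    # VIP_j = sqrt(p * sum_a ss_a * w_ja^2 / sum_a ss_a), with unit-length w_a
    return np.sqrt(n_features * (W**2 @ ss) / ss.sum())
```

Because each weight vector has unit length, the squared VIP scores always sum to the number of features, which is why "VIP > 1" is the conventional threshold for an above-average variable.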
Topics: Cricetinae; Animals; Cricetulus; CHO Cells; Bioreactors; Spectrum Analysis, Raman; Least-Squares Analysis
PubMed: 37514911
DOI: 10.3390/s23146618
Sensors (Basel, Switzerland) Apr 2020
With the advent of unmanned aerial vehicles (UAVs), vision-aided inertial navigation systems (V-INS) have become a major area of UAV research. In the V-INS front-end, image processing extracts information about the surrounding environment and identifies features or points of interest. Using the extracted vision data and inertial measurement unit (IMU) dead reckoning, the back-end most commonly estimates vehicle and feature states with an extended Kalman filter (EKF). An important assumption of the EKF is Gaussian white noise; in practice, however, measurement outliers that arise under many realistic conditions are often non-Gaussian. Failing to compensate for unknown noise parameters can seriously degrade the reliability and robustness of these navigation systems. Compensating for the uncertainty introduced by outliers requires modified versions of the estimator or additional techniques incorporated into the filter. The main purpose of this paper is to develop accurate and robust V-INS for UAVs, particularly in situations involving such unknown outliers. Feature correspondence in the image processing front-end rejects vision outliers, and a statistical test in the filtering back-end then detects the remaining outliers in the vision data. For frequently occurring outliers, a variational approximation for Bayesian inference is used to compute the optimal noise precision matrices of the measurement outliers. The overall process of outlier removal and adaptation is referred to here as "outlier-adaptive filtering". Although almost all V-INS approaches remove outliers in some way, few researchers have treated outlier adaptation in V-INS in much detail. Here, results from flight datasets validate the improved accuracy of V-INS employing the proposed outlier-adaptive filtering framework.
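A common form of back-end statistical test is chi-square gating on the EKF innovation: a measurement is rejected when its squared Mahalanobis distance under the innovation covariance exceeds a chi-square quantile. The sketch below assumes 2-D image measurements (so 2 degrees of freedom, where the quantile has the closed form −2 ln(1 − p)) and a caller-supplied Jacobian `H`. It is not the paper's filter, and the variational Bayesian noise adaptation is not shown.

```python
import math
import numpy as np

def chi2_gate_2dof(prob=0.95):
    """Chi-square quantile for 2 degrees of freedom: -2 ln(1 - prob)."""
    return -2.0 * math.log(1.0 - prob)

def gated_update(x, P, z, h, H, R, prob=0.95):
    """One EKF measurement update guarded by an innovation chi-square test.

    x, P: prior state and covariance; z: 2-D measurement (e.g. pixel coords);
    h: predicted measurement h(x); H: measurement Jacobian; R: measurement noise.
    Returns (x, P, accepted); a rejected measurement leaves x and P unchanged."""
    nu = z - h                                # innovation
    S = H @ P @ H.T + R                       # innovation covariance
    d2 = float(nu @ np.linalg.solve(S, nu))   # squared Mahalanobis distance
    if d2 > chi2_gate_2dof(prob):
        return x, P, False
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x_new = x + K @ nu
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new, True
```

Hard gating like this discards outliers outright. When outliers are frequent, discarding too many measurements starves the filter, which motivates adapting the noise model instead.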
PubMed: 32260451
DOI: 10.3390/s20072036
PeerJ 2021
Mahalanobis distances for ecological niche modelling and outlier detection: implications of sample size, error, and bias for selecting and parameterising a multivariate location and scatter method.
The Mahalanobis distance is a statistical technique that has been used in statistics and data science for data classification and outlier detection, and in ecology to quantify species-environment relationships in habitat and ecological niche models. Mahalanobis distances are based on the location and scatter of a multivariate normal distribution, and can measure how distant any point in space is from the centre of this kind of distribution. Three different methods for calculating the multivariate location and scatter are commonly used: the sample mean and variance-covariance, the minimum covariance determinant, and the minimum volume ellipsoid. The minimum covariance determinant and minimum volume ellipsoid were developed to be robust to outliers by estimating the multivariate location and scatter from a subset of the full sample (the subset whose covariance matrix has the smallest determinant, or whose enclosing ellipsoid has the smallest volume, respectively), with the proportion of the full sample forming the subset being controlled by a user-defined parameter. This outlier robustness means the minimum covariance determinant and the minimum volume ellipsoid are highly relevant for ecological niche analyses, which are usually based on natural history observations that are likely to contain errors. However, natural history observations will also contain extreme bias, to which the minimum covariance determinant and the minimum volume ellipsoid will also be sensitive. To provide guidance for selecting and parameterising a multivariate location and scatter method, a series of virtual ecological niche modelling experiments were conducted to demonstrate the performance of each method under different levels of sample size, error, and bias. The results show that there is no optimal modelling approach, and that choices need to be made based on the individual data and question. The sample mean and variance-covariance method performs best with very small sample sizes if the data are free of error and bias.
At larger sample sizes the minimum covariance determinant and minimum volume ellipsoid methods perform as well or better, but only if they are appropriately parameterised. Modellers who are more concerned about the prevalence of errors should retain a smaller proportion of the full data set, while modellers more concerned about the prevalence of bias should retain a larger proportion of the full data set. I conclude that Mahalanobis distances are a useful niche modelling technique, but only for questions relating to the fundamental niche of a species where the assumption of multivariate normality is reasonable. Users of the minimum covariance determinant and minimum volume ellipsoid methods must also clearly report their parameterisations so that the results can be interpreted correctly.
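The effect of the subset-proportion parameter can be illustrated with a simplified, MCD-style estimator: starting from the classical estimate, repeatedly refit the mean and covariance on the `h` points closest to the current centre (concentration, or C-steps). This is a sketch under stated assumptions: `support_fraction` plays the role of the user-defined proportion, only a single start is used (FAST-MCD uses many random starts), no consistency correction is applied, and the minimum volume ellipsoid is not shown.

```python
import numpy as np

def mahalanobis(X, center, cov):
    """Mahalanobis distance of each row of X from `center` under `cov`."""
    diff = X - center
    return np.sqrt(np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T))

def classical_estimate(X):
    """Sample mean and variance-covariance matrix."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def cstep_mcd(X, support_fraction=0.75, n_iter=20):
    """Simplified MCD-style location/scatter via C-steps from one start:
    refit on the h = ceil(support_fraction * n) points closest to the current
    estimate until the estimate stops changing."""
    h = int(np.ceil(support_fraction * len(X)))
    center, cov = classical_estimate(X)
    for _ in range(n_iter):
        d = mahalanobis(X, center, cov)
        new_center, new_cov = classical_estimate(X[np.argsort(d)[:h]])
        if np.allclose(new_center, center) and np.allclose(new_cov, cov):
            break
        center, cov = new_center, new_cov
    return center, cov
```

With 95 standard-normal points and 5 gross errors at (10, 10), the classical estimate is dragged toward the errors and gives them moderate distances, whereas the C-step estimate gives them distances far above any inlier. Lowering `support_fraction` trims more aggressively, which mirrors the abstract's advice to retain a smaller proportion when errors are the main concern.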
PubMed: 34026369
DOI: 10.7717/peerj.11436
Plants (Basel, Switzerland) Apr 2023
We epigenotyped 211 individuals from 17 populations using methylation-sensitive amplification polymorphism (MSAP) and investigated the associations of methylated (mMSAP) and unmethylated (uMSAP) loci with 16 environmental variables. Data on genetic variation based on amplified fragment length polymorphism (AFLP) were obtained from an earlier study. We found a significant positive correlation between genetic and epigenetic variation. Mean unbiased expected heterozygosity per locus was significantly higher for mMSAP and uMSAP (0.223 and 0.131, respectively; P < 0.001) than for AFLP (0.104). Genome scans detected 10 mMSAP and 9 uMSAP outliers associated with various environmental variables. Full-model redundancy analysis (RDA) found a significant linear fit of 11 and 12 environmental variables to the outlier mMSAP and uMSAP ordinations, respectively. When conditioned on geography, partial RDA revealed that five and six environmental variables, respectively, were the most important influences on outlier mMSAP and uMSAP variation. We found higher genetic (average = 0.298) than epigenetic (mMSAP and uMSAP average = 0.044 and 0.106, respectively) differentiation, and stronger genetic isolation-by-distance (IBD) than epigenetic IBD. Strong epigenetic isolation-by-environment (IBE) was found, particularly based on the outlier data, when controlling either for geography (mMSAP and uMSAP = 0.128 and 0.132, respectively, P = 0.001) or for genetic structure (mMSAP and uMSAP = 0.105 and 0.136, respectively, P = 0.001). Our results suggest that epigenetic variants can be substrates for natural selection linked to environmental variables and can complement genetic changes in the adaptive evolution of populations.
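Isolation-by-distance and isolation-by-environment are commonly assessed with Mantel-type tests that correlate two pairwise distance matrices and judge significance by permutation. The sketch below is a plain (not partial) Mantel test in NumPy. The study's controls for geography or genetic structure would need a partial Mantel or a similar conditioning step, which is not shown. The function name and defaults are hypothetical.

```python
import numpy as np

def mantel(D1, D2, n_perm=999, seed=0):
    """Simple Mantel test: Pearson r between the upper triangles of two
    square distance matrices, with a one-sided permutation p-value in which
    rows and columns of D2 are permuted together."""
    D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
    iu = np.triu_indices_from(D1, k=1)
    a = D1[iu]
    r_obs = np.corrcoef(a, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        r = np.corrcoef(a, D2[np.ix_(perm, perm)][iu])[0, 1]
        if r >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)
```

Permuting whole populations (rows and columns together) rather than individual matrix cells keeps the dependence among pairwise distances intact. With 999 permutations, the smallest attainable p-value is 0.001.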
PubMed: 37050184
DOI: 10.3390/plants12071558