ESMO Open Oct 2023
PubMed: 37769399
DOI: 10.1016/j.esmoop.2023.101833 -
Mathematical Biosciences and... Jul 2023
Trajectory outlier detection can identify abnormal phenomena in large volumes of trajectory data, which helps discover or predict potential traffic risks. In this work, we propose a trajectory outlier detection model based on a variational auto-encoder. First, the model encodes the trajectory data as parameters of distribution functions based on the statistical characteristics of urban traffic. Then, an auto-encoder network is built and trained; its training goal is to maximize the generation probability of the original trajectories when decoding. Once training is complete, trajectory outliers can be detected from the difference between a trajectory and the trajectory generated by the model. The advantage of the proposed model is that detection only requires computing this difference, which greatly reduces the amount of computation and makes the model well suited to real-time detection scenarios. In addition, the distance threshold between abnormal and normal trajectories can be set by referring to the proportion of abnormal trajectories in the training data set, which removes the difficulty of setting the threshold manually and makes the model easier to apply in different real-world settings. In terms of effectiveness, the proposed model achieves more than 95% accuracy, outperforming two typical density-based and classification-based detection methods as well as recent machine-learning-based methods. In terms of efficiency, the model converges well during training, and training time grows slowly with data scale, which is better than or comparable to the comparison methods.
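The detection step the abstract describes can be sketched in a few lines. This is a minimal illustration of the reconstruction-error idea only, not the paper's implementation; the function names and the quantile-based threshold rule are assumptions based on the abstract's description:

```python
import numpy as np

def detection_threshold(train_errors, anomaly_fraction):
    """Set the distance threshold from the assumed proportion of anomalous
    trajectories in the training set, instead of tuning it manually."""
    return np.quantile(train_errors, 1.0 - anomaly_fraction)

def is_outlier(trajectory, reconstruction, threshold):
    """Flag a trajectory whose distance to its model reconstruction
    exceeds the threshold."""
    error = np.linalg.norm(trajectory - reconstruction)
    return error > threshold
```

At detection time only the distance between the original trajectory and its reconstruction is computed, which is what makes the approach cheap enough for real-time use.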
PubMed: 37679172
DOI: 10.3934/mbe.2023675 -
Scientific Reports Oct 2023
This study conducted a comprehensive analysis of multiple supervised machine learning models, regressors and classifiers, to accurately predict diamond prices. Diamond pricing is a complex task due to the non-linear relationships between key features such as carat, cut, clarity, table, and depth. The analysis aimed to develop an accurate predictive model by utilizing both regression and classification approaches. To preprocess the data, the study employed various techniques: it addressed outliers, standardized the predictors, performed median imputation of missing values, and resolved multicollinearity issues. Equal-width binning on the cut variable was performed to handle class imbalance. Correlation-based feature selection was used to eliminate highly correlated variables, ensuring that only relevant features were included in the models. Outliers were handled using the inter-quartile range method, numerical features were normalized through standardization, and missing values in numerical features were imputed with the median, preserving the integrity of the dataset. Among the models evaluated, the random forest (RF) regressor exhibited exceptional performance. It achieved the lowest root mean squared error (RMSE) of 523.50, indicating superior accuracy compared to the other models, and a high R-squared (R²) score of 0.985, suggesting it explained a significant portion of the variance in diamond prices. Furthermore, the area under the curve for the RF classifier on the test set was 1.00, indicating perfect classification performance. These results solidify the RF's position as the best-performing model in terms of accuracy and predictive power, in both regression and classification. The MLP regressor showed promising results with an RMSE of 563.74 and an R² score of 0.980, demonstrating its ability to capture the complex relationships in the data.
Although it achieved slightly higher errors than the RF regressor, further analysis is needed to determine its suitability and potential advantages. The XGBoost regressor achieved an RMSE of 612.88 and an R² score of 0.972, indicating its effectiveness in predicting diamond prices, though with slightly higher errors than the RF regressor. The boosted decision tree regressor had an RMSE of 711.31 and an R² score of 0.968, capturing some of the underlying patterns but with higher errors than the RF and XGBoost models. In contrast, the KNN regressor yielded a higher RMSE of 1346.65 and a lower R² score of 0.887, indicating inferior performance in accurately predicting diamond prices. The linear regression model performed similarly to the KNN regressor, with an RMSE of 1395.41 and an R² score of 0.876. The support vector regression model showed the highest RMSE of 3044.49 and the lowest R² score of 0.421, indicating limited effectiveness in capturing the complex relationships in the data. Overall, the study demonstrates that the RF outperforms the other models in accuracy and predictive power, as evidenced by its lowest RMSE, highest R² score, and perfect classification performance, highlighting its suitability for accurately predicting diamond prices. The study not only provides an effective tool for the diamond industry but also emphasizes the importance of considering both regression and classification approaches when developing accurate predictive models. The findings contribute valuable insights for pricing strategies, market trends, and decision-making processes in the diamond industry and related fields.
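The preprocessing steps the study names (inter-quartile range outlier handling, standardization, median imputation) follow standard recipes. A minimal NumPy sketch, with illustrative function names not taken from the paper:

```python
import numpy as np

def iqr_clip(x):
    """Inter-quartile range rule: values outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are clipped to the fences."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

def standardize(x):
    """Z-score normalization of a numeric feature."""
    return (x - x.mean()) / x.std()

def median_impute(x):
    """Replace missing values (NaN) with the column median."""
    return np.where(np.isnan(x), np.nanmedian(x), x)
```

Each step is applied per feature before model fitting; whether to clip or drop IQR outliers is a design choice the abstract does not specify.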
PubMed: 37828360
DOI: 10.1038/s41598-023-44326-w -
Mikrochimica Acta Jan 2024
Review
In this tutorial review, we provide a guiding reference on good practice in building calibration and correlation experiments, and we explain how the results should be evaluated and interpreted. The review centers on calibration experiments where the relationship between response and concentration is expected to be linear, although some of the described principles of good practice can be applied to non-linear systems as well. Furthermore, it gives prominence to the meaning and correct interpretation of some of the statistical terms commonly associated with calibration and regression. To reach a mutual understanding in this significant field, we present, through a practical example, a step-by-step procedure that deals with typical challenges related to linearity and outlier assessment, calculation of the associated error of the predicted concentration, and limits of detection. The use of regression lines to compare analytical methods is also elaborated. The regression and correlation results are obtained using a Microsoft Excel spreadsheet, perhaps one of the most widely used user-friendly software tools in education and research.
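The core calibration quantities the review discusses can be illustrated with a short least-squares sketch. The 3.3·s/slope rule for the limit of detection is one common convention among several; the code is an illustration, not the review's spreadsheet procedure:

```python
import numpy as np

def calibration_line(conc, response):
    """Ordinary least-squares calibration line: response = slope*conc + intercept.
    Also returns the residual standard deviation (n - 2 df for a line)."""
    slope, intercept = np.polyfit(conc, response, 1)
    residuals = response - (slope * conc + intercept)
    s_y = np.sqrt(np.sum(residuals ** 2) / (len(conc) - 2))
    return slope, intercept, s_y

def limit_of_detection(slope, s_y):
    """A common LOD estimate: 3.3 * residual SD / slope."""
    return 3.3 * s_y / slope
```

The residual standard deviation also feeds the confidence band of a predicted concentration, which is where the "associated error" the review describes comes from.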
PubMed: 38191690
DOI: 10.1007/s00604-023-06157-4 -
Frontiers in Bioinformatics 2023
Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that detects orthogonal outliers and subsequently reduces dimensionality. We validate our method using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data, and single-cell RNA sequencing data, to address the task of data cleaning and visualization.
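For orientation, classical (non-robust) MDS, the method the paper hardens against orthogonal outliers, can be sketched as follows; the paper's simplex-based detection step is not reproduced here:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS from a pairwise distance matrix D via double centering
    and eigendecomposition of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]            # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

Orthogonal outliers inflate the Gram matrix's spectrum and distort every embedded point at once, which is why detecting and removing them before this step matters.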
PubMed: 37637212
DOI: 10.3389/fbinf.2023.1211819 -
Frontiers in Immunology 2024
BACKGROUND
Previous studies have reported associations of Crohn's disease (CD) and ulcerative colitis (UC) with the risks of extraintestinal cancers, but the causality remains unclear.
METHODS
We used genetic variations robustly associated with CD and UC, extracted from genome-wide association studies (GWAS), as instrumental variables. Nine types of extraintestinal cancers in European and Asian populations were selected as outcomes. The inverse variance weighted method was the primary approach for the two-sample Mendelian randomization analysis. Sensitivity analyses were carried out to evaluate the reliability of our findings.
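The inverse variance weighted (IVW) estimator named as the primary approach has a simple closed form: a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin. A sketch under the standard fixed-effect IVW formulation (variable names are illustrative):

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW two-sample MR estimate: weights are the inverse
    variances of the SNP-outcome effects."""
    w = 1.0 / se_outcome ** 2
    beta = np.sum(w * beta_exposure * beta_outcome) / np.sum(w * beta_exposure ** 2)
    se = np.sqrt(1.0 / np.sum(w * beta_exposure ** 2))
    return beta, se
```

The causal odds ratios reported below correspond to exp(beta) when the outcome effects are on the log-odds scale.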
RESULTS
In the European population, we found that CD showed a potential causal relationship with pancreatic cancer (OR: 1.1042; 95% CI: 1.0087-1.2088; P=0.0318). Meanwhile, both CD (outliers excluded: OR: 1.0208; 95% CI: 1.0079-1.0339; P=0.0015) and UC (outliers excluded: OR: 1.0220; 95% CI: 1.0051-1.0393; P=0.0108) were associated with a slight increase in breast cancer risk. Additionally, UC exhibited a potential causal effect on cervical cancer (outliers excluded: OR: 1.1091; 95% CI: 1.0286-1.1960; P=0.0071). In the East Asian population, CD had significant causal effects on pancreatic cancer (OR: 1.1876; 95% CI: 1.0741-1.3132; P=0.0008) and breast cancer (outliers excluded: OR: 0.9452; 95% CI: 0.9096-0.9822; P=0.0040). For UC, it exhibited significant causal associations with gastric cancer (OR: 1.1240; 95% CI: 1.0624-1.1891; P=4.7359×10), bile duct cancer (OR: 1.3107; 95% CI: 1.0983-1.5641; P=0.0027), hepatocellular carcinoma (OR: 1.2365; 95% CI: 1.1235-1.3608; P=1.4007×10) and cervical cancer (OR: 1.3941; 95% CI: 1.1708-1.6599; P=0.0002), as well as a potential causal effect on lung cancer (outliers excluded: OR: 1.1313; 95% CI: 1.0280-1.2449; P=0.0116).
CONCLUSIONS
Our study provided evidence that genetically predicted CD may be a risk factor for pancreatic and breast cancers in the European population, and for pancreatic cancer in the East Asian population. UC may be a risk factor for cervical and breast cancers in Europeans, and for gastric, bile duct, hepatocellular, lung, and cervical cancers in East Asians. Screening for and prevention of these site-specific extraintestinal cancers should therefore be emphasized in patients with CD and UC.
Topics: Humans; Breast Neoplasms; Colitis, Ulcerative; Crohn Disease; East Asian People; Genetic Predisposition to Disease; Genome-Wide Association Study; Pancreatic Neoplasms; Reproducibility of Results; Risk Factors; Uterine Cervical Neoplasms; European People; Neoplasms
PubMed: 38404590
DOI: 10.3389/fimmu.2024.1339207 -
Regulatory Toxicology and Pharmacology... Aug 2023
Review
While some regulatory assessment criteria are available on how to generally evaluate dermal absorption (DA) studies for risk assessment purposes, practical guidance and examples are lacking. The current manuscript highlights the challenges in interpreting data from in vitro assays and proposes holistic, data-based assessment strategies from an industry perspective. Inflexible decision criteria may be inadequate for real data and may lead to irrelevant DA estimates. We recommend using mean values for reasonably conservative DA estimates from in vitro studies. In cases where additional conservatism is needed, e.g., due to non-robust data or acute exposure scenarios, the upper 95% confidence interval of the mean may be appropriate. It is critical to review the data for potential outliers, and we provide example cases and strategies to identify aberrant responses. Some regional regulatory authorities require evaluation of the stratum corneum (SC) residue; here, as a very simple pro-rata approach, we propose to review whether the predicted post-24-h absorption flux exceeds the predicted elimination flux by desquamation, because otherwise the SC residue cannot contribute to the systemic dose. Overall, adjusting DA estimates for mass balance (normalization) is not recommended.
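The "upper 95% confidence interval of the mean" recommended for added conservatism is a standard t-based bound. A stdlib-only sketch in which the t critical value is supplied by the caller (an illustration, not the manuscript's procedure):

```python
import statistics

def upper_95ci_mean(values, t_crit):
    """Upper limit of the 95% CI of the mean: mean + t * SEM.
    t_crit is the two-sided 95% t critical value for n - 1 degrees of
    freedom (e.g. 2.776 for n = 5); passed in to keep the sketch
    dependency-free."""
    n = len(values)
    sem = statistics.stdev(values) / n ** 0.5
    return statistics.fmean(values) + t_crit * sem
```

With few replicates, as is typical for in vitro DA studies, the t critical value is substantially larger than the normal 1.96, which is what makes this bound conservative.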
Topics: Skin; Skin Absorption; Pesticides; Epidermis; Industry; Risk Assessment
PubMed: 37302560
DOI: 10.1016/j.yrtph.2023.105432 -
Bioengineering (Basel, Switzerland) Apr 2024
Review
Monitoring fetal heart rate (FHR) through cardiotocography is crucial for the early diagnosis of fetal distress situations, necessitating prompt obstetrical intervention. However, FHR signals are often marred by various contaminants, making preprocessing techniques essential for accurate analysis. This scoping review, following PRISMA-ScR guidelines, describes the preprocessing methods reported in original research articles on human FHR (or beat-to-beat interval) signals, retrieved from PubMed and Web of Science from their inception up to May 2021. Of the 322 unique articles identified, 54 were included; from these, prevalent preprocessing approaches were identified, primarily focusing on the detection and correction of poor-signal-quality events. Detection usually entailed analyzing deviations from neighboring samples, whereas correction often relied on interpolation techniques. There is also a lack of consensus regarding the definitions of missing samples, outliers, and artifacts. Trends indicate a surge in research interest in the decade 2011-2021. This review underscores the need to standardize FHR signal preprocessing techniques to enhance diagnostic accuracy. Future work should focus on applying and evaluating these methods across FHR databases to assess their effectiveness and propose improvements.
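The prevalent pattern the review identifies (detect deviations from neighboring samples, correct by interpolation) can be sketched as follows; the 25 bpm jump threshold is an illustrative value, not one taken from the review:

```python
import numpy as np

def clean_fhr(signal, max_jump=25.0):
    """Flag samples deviating too far from the last accepted sample,
    then correct the flagged samples by linear interpolation."""
    x = signal.astype(float)
    bad = np.zeros(len(x), dtype=bool)
    last_good = x[0]
    for i in range(1, len(x)):
        if abs(x[i] - last_good) > max_jump:
            bad[i] = True              # likely artifact / signal-loss spike
        else:
            last_good = x[i]
    good = np.flatnonzero(~bad)
    x[bad] = np.interp(np.flatnonzero(bad), good, x[good])
    return x
```

Comparing against the last accepted sample rather than the immediate predecessor keeps a single spike from masking the samples that follow it.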
PubMed: 38671789
DOI: 10.3390/bioengineering11040368 -
PloS One 2024
The detection of water quality indicators such as temperature, pH, turbidity, conductivity, and TDS involves five national standard methods. Chemically based measurement techniques may generate liquid residue, causing secondary pollution. The proposed water quality monitoring and data analysis system can effectively address the issue that conventional methods require multiple pieces of equipment and repeated measurements. This paper analyzes the distribution characteristics of the historical data from five sensors at a specific time, displays them graphically in real time, and provides an early warning when standards are exceeded. For four water samples from different sections of the Li River, compared against the national standard methods, the average measurement errors for temperature, pH, TDS, conductivity, and turbidity are 0.98%, 2.23%, 2.92%, 3.05%, and 3.98%, respectively. The quartile method is then used to analyze outliers in over 100,000 records across five selected historical periods. Experimental results show the system is relatively stable in measuring temperature, pH, and TDS, with outlier proportions of 0.42%, 0.84%, and 1.24%; for turbidity and conductivity, the proportions are 3.11% and 2.92%. In an experiment using seven methods to fill outliers, the K-nearest-neighbor algorithm outperformed the others. The analysis of data trends, outliers, means, and extreme values assists decision-making, such as updating and maintaining equipment, addressing extreme water quality situations, and enhancing regional water quality oversight.
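The quartile method used to compute the outlier proportions is the usual IQR fence rule; a sketch (the paper's exact implementation is not described in the abstract):

```python
import numpy as np

def outlier_proportion(x):
    """Share of readings outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    return float(mask.mean())
```

Applied per sensor and per period, this yields figures directly comparable to the 0.42%-3.11% proportions reported above.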
Topics: Water Quality; Rivers; Environmental Monitoring; Fresh Water; Cluster Analysis
PubMed: 38498583
DOI: 10.1371/journal.pone.0299435 -
Alzheimer's & Dementia (Amsterdam,... 2024
INTRODUCTION
Overlooking the heterogeneity in Alzheimer's disease (AD) may lead to diagnostic delays and failures. Neuroanatomical normative modeling captures individual brain variation and may inform our understanding of individual differences in AD-related atrophy.
METHODS
We applied neuroanatomical normative modeling to magnetic resonance imaging from a real-world clinical cohort with confirmed AD (n = 86). Regional cortical thickness was compared to a healthy reference cohort (n = 33,072), and the number of outlying regions was summed (total outlier count) and mapped at the individual and group levels.
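The "total outlier count" summary can be illustrated with a simple z-score sketch; the -1.96 cutoff and the variable names are assumptions for illustration, not values taken from the paper:

```python
import numpy as np

def total_outlier_count(patient_thickness, ref_mean, ref_sd, z_thresh=-1.96):
    """Z-score each cortical region against the reference cohort and count
    regions with abnormally low thickness (normative-modeling style)."""
    z = (patient_thickness - ref_mean) / ref_sd
    return int(np.sum(z < z_thresh))
```

Summing per-region outlier flags into a single count per patient is what allows heterogeneity to be mapped at the individual level rather than averaged away at the group level.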
RESULTS
The superior temporal sulcus contained the highest proportion of outliers (60%). Elsewhere, overlap between patient atrophy patterns was low. Mean total outlier count was higher in patients who were non-amnestic, at more advanced disease stages, and without depressive symptoms. Amyloid burden was negatively associated with outlier count.
DISCUSSION
Brain atrophy in AD is highly heterogeneous and neuroanatomical normative modeling can be used to explore anatomo-clinical correlations in individual patients.
PubMed: 38487076
DOI: 10.1002/dad2.12559