- Methods in Molecular Biology (Clifton,... 2014 (Review)
The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has developed impressively in recent years thanks to the rapid increase in the storage capacity and processing power of computers. Like many other disciplines, bioinformatics has widely employed machine learning methods. The difficulty and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning, such as feature assessment, unsupervised versus supervised learning, and types of classification. Then, we point out the main issues in designing machine learning experiments and evaluating their performance. Finally, we introduce some supervised learning methods.
Topics: Artificial Intelligence; Bayes Theorem; Discriminant Analysis; Probability; Support Vector Machine
PubMed: 24272434
DOI: 10.1007/978-1-62703-748-8_7
- Sensors (Basel, Switzerland) Mar 2022
The transport network in eastern Japan was severely damaged by the 2011 Tohoku earthquake. To understand the road recovery conditions after a large earthquake, a large amount of time is needed to collect information on the extent of the damage and road usage. In our previous study, we applied cluster analysis to analyze the data on driving vehicles in Fukushima prefecture to classify the road recovery conditions among municipalities within the first six months after the earthquake. However, the results of the cluster analysis and relevant factors affecting road recovery from that study were not validated. In this study, we proposed a framework for determining post-earthquake road recovery patterns and validated the cluster analysis results by using discriminant analysis and observing them on a map to identify their common characteristics. In addition, our analysis of objective data reflecting regional characteristics showed that the road recovery conditions were similar according to the topography and the importance of roads.
Topics: Automobile Driving; Cluster Analysis; Discriminant Analysis; Earthquakes; Japan
PubMed: 35336384
DOI: 10.3390/s22062213
- International Journal of Environmental... Apr 2021
The purpose of this study was to investigate the multivariate profile of different types of Brazilian runners and to identify the discriminant pattern of the distinct types of runners, as well as runners' ability to self-classify. The sample comprised 1235 Brazilian runners of both sexes (492 women; 743 men), with a mean age of 37.94 ± 9.46 years. Individual characteristics were obtained through an online questionnaire: sex, age, body height (m) and body mass (kg), socioeconomic status, and training information (i.e., self-classification, practice time, practice motivation, running pace, frequency and training volume/week). Multivariate analysis of variance was conducted by sex, and discriminant analysis was used to identify which among running pace, practice time, body mass index (BMI) and training volume could differentiate groups such as "professional athletes", "amateur athletes" and "recreational athletes". For both sexes, running pace was the variable that best discriminated the groups, followed by BMI and volume/week. Practice time is not a good indicator for differentiating runner types. In both sexes, semi-professional runners were those who self-classified most accurately, with amateur runners presenting the highest classification error. This information can be used to guide long-term training and athlete selection programs, and to identify the strengths and weaknesses of athletes.
Topics: Adult; Anthropometry; Brazil; Discriminant Analysis; Female; Humans; Male; Middle Aged; Motivation; Running
PubMed: 33923769
DOI: 10.3390/ijerph18084248
- Science Progress Oct 2021
In the wider spectrum of Taiwanese public service spheres, the herculean services and dedication of its committed police personnel have long been recognized, respected, and admired. Regrettably, however, questions concerning their conduct, discipline, and abuse of power have surfaced intermittently. A classic example that lingers in the public memory is the bribing of some unscrupulous elements of the police department by Taiwanese video game companies in the closing decades of the 20th century, which triggered public outrage and calls for scrutiny of serious lapses in the discipline and conduct of police personnel. This research paper endeavors to understand, analyze and address some of those issues based on empirical data on the police personnel of certain specific work zones/areas, taking into account both sentenced police officers and law-abiding police officers. It examines available data for seven critical variables, including their degree of variation, through the Identification and Analysis Method to develop a Predictive Model on Police Ethics and to identify the important factors that affect Police Ethics. Based on the integrated research, it is proposed that this Predictive Model has good applicability as well as accurate predictive ability in addressing the core issues that affect Police Ethics. It is hoped that, through this Early Warning Predictive Model, all the stakeholders (policy and decision-makers, regulatory police agencies and, more importantly, the police personnel themselves) will effectively address the issues that affect Police Ethics, so as to undertake competent and effective measures to erase or lessen the menace and provide early rehabilitative care and assistance to build a strong, constructive and visionary Taiwanese Police Force to meet the challenges of the 21st century and beyond.
Topics: Discriminant Analysis; Government Agencies; Humans; Police
PubMed: 34783615
DOI: 10.1177/00368504211055638
- Biometrics Jun 2022
Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two-step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerosis cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.
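To make the graph-smoothing idea in this abstract concrete, here is a minimal sketch of the standard normalized Laplacian, L = I - D^(-1/2) A D^(-1/2), and the quadratic form beta' L beta that is small when connected predictors have similar degree-scaled coefficients. The abstract does not state SIDANet's exact penalty, so the quadratic form, the function names and the three-node path graph are all illustrative assumptions, not the authors' implementation.

```python
import math

def normalized_laplacian(adj):
    """Return L = I - D^(-1/2) A D^(-1/2) for a symmetric adjacency matrix.

    Assumes every node has at least one edge, so all degrees are positive.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            identity = 1.0 if i == j else 0.0
            lap[i][j] = identity - adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap

def smoothness_penalty(beta, lap):
    """Quadratic form beta' L beta (an assumed, generic smoothing penalty)."""
    n = len(beta)
    return sum(beta[i] * lap[i][j] * beta[j] for i in range(n) for j in range(n))

# Hypothetical predictor network: a path graph 0 - 1 - 2.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
lap = normalized_laplacian(path_graph)

# Coefficients proportional to sqrt(degree) are perfectly "smooth" on the graph.
smooth = smoothness_penalty([1.0, math.sqrt(2.0), 1.0], lap)
# A coefficient vector concentrated on one node is penalized.
rough = smoothness_penalty([1.0, 0.0, 0.0], lap)
```

In this toy case the smooth vector incurs essentially no penalty while the concentrated one does, which is the behavior that encourages selecting connected predictors together.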
Topics: Discriminant Analysis; Humans
PubMed: 33739448
DOI: 10.1111/biom.13458
- Scientific Reports Mar 2022
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impaired social interaction and restricted, repetitive behavior. Multiple studies have suggested mitochondrial dysfunction, glutamate excitotoxicity, and impaired detoxification mechanism as accepted etiological mechanisms of ASD that can be targeted for therapeutic intervention. In the current study, blood samples were collected from 40 people with autism and 40 control participants after informed consent and full approval from the Institutional Review Board of King Saud University. Sodium (Na), Potassium (K), lactate dehydrogenase (LDH), glutathione-s-transferase (GST), and mitochondrial respiratory chain complex I (MRC1) were measured in plasma of both groups. Predictive models were established to discriminate individuals with ASD from controls. The predictive power of these five variables, individually and in combination, was compared using the area under a ROC curve (AUC). We compared the performance of principal component analysis (PCA), discriminant analysis (DA), and binary logistic regression (BLR) as ways to combine single variables and create the predictive models. K had the highest AUC (0.801) of any single variable, followed by GST, LDH, Na, and MRC1, respectively. Combining the five variables resulted in higher AUCs than those obtained using single variables across all models. Both DA and BLR were superior to PCA and comparable to each other. In our study, the combination of Na, K, LDH, GST, and MRC1 showed the highest promise in discriminating individuals with autism from controls. These results provide a platform that can potentially be used to verify the efficacy of our models with a larger sample size or evaluate other biomarkers.
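As a reading aid for the AUC comparison in this abstract, the sketch below computes the area under a ROC curve directly from its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. The function name and the score values are hypothetical and are not taken from the study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties contribute one half, matching the Mann-Whitney U formulation.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical marker values, invented for illustration only.
cases = [4.1, 4.6, 5.0, 5.2]
controls = [3.8, 4.0, 4.4, 4.8]
marker_auc = auc(cases, controls)  # 13 of 16 pairs are ranked correctly
```

The same function applies unchanged to a combined score, such as a discriminant function or a logistic regression probability, which is how single variables and multi-variable models can be compared on one scale.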
Topics: Autism Spectrum Disorder; Discriminant Analysis; Glutathione Transferase; Humans; L-Lactate Dehydrogenase; Logistic Models; Principal Component Analysis; Sodium
PubMed: 35260688
DOI: 10.1038/s41598-022-07829-6
- Health Informatics Journal Sep 2020
Coronary artery disease is one of the most prevalent chronic pathologies in the modern world, leading to the deaths of thousands of people, both in the United States and in Europe. This article reports the use of data mining techniques to analyse a population of 10,265 people who were evaluated by the Department of Advanced Biomedical Sciences for myocardial ischaemia. Overall, 22 features are extracted, and linear discriminant analysis is implemented twice through both the Knime analytics platform and R statistical programming language to classify patients as either normal or pathological. The former of these analyses includes only classification, while the latter method includes principal component analysis before classification to create new features. The classification accuracies obtained for these methods were 84.5 and 86.0 per cent, respectively, with a specificity over 97 per cent and a sensitivity between 62 and 66 per cent. This article presents a practical implementation of traditional data mining techniques that can be used to help clinicians in decision-making; moreover, principal component analysis is used as an algorithm for feature reduction.
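The three figures reported in this abstract (accuracy, sensitivity, specificity) all derive from one confusion matrix. The minimal sketch below shows how overall accuracy can sit in the mid-to-high 80s while sensitivity stays much lower, when normal cases outnumber pathological ones. The counts are invented for illustration and are not the study's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # share of pathological cases detected
        "specificity": tn / (tn + fp),   # share of normal cases correctly cleared
    }

# Hypothetical counts: 100 pathological and 300 normal patients.
metrics = binary_metrics(tp=64, fn=36, tn=291, fp=9)
```

With these assumed counts, sensitivity is 0.64 and specificity is 0.97, yet accuracy is 0.8875 because the larger normal class dominates the total.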
Topics: Algorithms; Coronary Artery Disease; Discriminant Analysis; Europe; Humans; Principal Component Analysis
PubMed: 31969043
DOI: 10.1177/1460458219899210
- Scientific Reports Nov 2022
Elatine is a genus in which flower and seed characteristics are the most important diagnostic features; seed shape and the structure of the seed cover have been found to be the most reliable identification characters. We combined classic discriminant methods with deep learning techniques to analyze seed morphometric data from 28 populations of six Elatine species from 11 countries throughout the Northern Hemisphere, to compare the obtained results and then check their taxonomic classification. Our findings indicate that among the discriminant methods, Quadratic Discriminant Analysis (QDA) had the highest percentage of correct matching (mean fit: 91.23%); only the deep learning method based on a Convolutional Neural Network (CNN) achieved a higher match (mean fit: 93.40%). The QDA method recognized the seeds of E. brochonii and E. orthosperma with 99% accuracy, and the CNN method with 100%. Other taxa, such as E. alsinastrum, E. triandra, E. californica and E. hungarica, were matched with an accuracy of at least 95% (CNN). Our results indicate that the CNN obtains more accurate classifications than classic discriminant methods and better recognizes the entire pool of taxa analyzed. The least well recognized species are E. macropoda and E. hexandra (88% and 78% match).
Topics: Deep Learning; Malpighiales; Seeds; Discriminant Analysis; Flowers
PubMed: 36443472
DOI: 10.1038/s41598-022-24660-1
- Journal of Integrative Bioinformatics Jun 2021
Some species of cover crops produce phenolic compounds with allelopathic potential. The use of mathematical, statistical and computational tools to analyze data obtained with spectrophotometry can assist in discriminating chemical profiles to choose which species and cultivation practices are best for weed management purposes. The aim of this study was to perform exploratory and discriminant analysis using the R package specmine on the phenolic profile of L., L. and L. shoots obtained by UV-vis scanning spectrophotometry. Plants were collected at 60, 80 and 100 days after sowing and at 15 and 30 days after rolling in an experiment in Brazil. Exploratory and discriminant analyses, namely principal component analysis, hierarchical clustering analysis, t-test, fold-change, analysis of variance and supervised machine learning analysis, were performed. Results showed a stronger tendency for phenolic profiles to cluster according to plant species rather than crop management system, period of sampling or plant phenologic stage. PCA showed a strong distinction of L. and L. 30 days after rolling. Because it is fast and easy to use, the R package specmine can be recommended as a supporting tool for exploratory and discriminant analysis of multivariate data.
Topics: Cluster Analysis; Crops, Agricultural; Discriminant Analysis; Secale; Spectrophotometry, Ultraviolet
PubMed: 34085494
DOI: 10.1515/jib-2019-0056
- Statistical Methods in Medical Research Apr 2022
Discriminant analysis procedures that assume parsimonious covariance and/or mean structures have been proposed for distinguishing between two or more populations in multivariate repeated measures designs. However, these procedures rely on the assumption of multivariate normality, which is not tenable in multivariate repeated measures designs characterized by binary, ordinal, or mixed types of response distributions. This study investigates the accuracy of repeated measures discriminant analysis (RMDA) based on the multivariate generalized estimating equations (GEE) framework for classification in multivariate repeated measures designs with the same or different types of responses repeatedly measured over time. Monte Carlo methods were used to compare the accuracy of RMDA procedures based on GEE with that of RMDA based on maximum likelihood estimators (MLE) under diverse simulation conditions, which included the number of repeated measure occasions, number of responses, sample size, correlation structure, and type of response distribution. RMDA based on GEE exhibited higher average classification accuracy than RMDA based on MLE, especially for multivariate non-normal distributions. Three repeatedly measured responses, namely severity of epilepsy, current number of anti-epileptic drugs, and parent-reported quality of life in children with epilepsy, were used to demonstrate the application of these procedures.
Topics: Child; Computer Simulation; Discriminant Analysis; Humans; Models, Statistical; Monte Carlo Method; Quality of Life; Sample Size
PubMed: 34898331
DOI: 10.1177/09622802211032705