Source Code For Biology and Medicine May 2012
BACKGROUND
The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications.
RESULTS
Our paper describes the construction and performance of an open-source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in three stages: (1) detecting contiguous text blocks using spatial layout processing, (2) classifying the text blocks into rhetorical categories using a rule-based method, and (3) stitching the classified blocks together in the correct order, so that text is extracted from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 1. Additionally, we compared the accuracy of the text extracted by LA-PDFText with the text from the Open Access subset of PubMed Central, and then compared this accuracy with that of the text extracted by the PDF2Text system, which is commonly used to extract text from PDF files. Finally, we discuss a preliminary error analysis for our system and identify further areas of improvement.
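The three stages can be illustrated with a minimal Python sketch. It is not LA-PDFText's implementation. The `Line`/`Block` types, the two-column split at x = 300, the 4-point vertical gap threshold and the heading regex are all hypothetical. A real system would take line coordinates from a PDF parser and would use a much richer rule set.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heading vocabulary (not LA-PDFText's actual rules).
SECTION_HEADINGS = re.compile(
    r"^(abstract|background|introduction|(materials and )?methods|results|"
    r"discussion|conclusions?|references)$",
    re.IGNORECASE,
)

@dataclass
class Line:
    page: int
    x0: float   # left edge of the line
    y0: float   # top edge, measured down from the top of the page
    y1: float   # bottom edge
    text: str

@dataclass
class Block:
    page: int
    column: int
    y1: float   # bottom edge of the lowest line in the block
    lines: list = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(line.text for line in self.lines)

def detect_blocks(lines, column_split=300.0, max_gap=4.0):
    """Stage 1: merge lines that are vertically close within one column."""
    def column(line):
        return 0 if line.x0 < column_split else 1

    blocks = []
    # Reading order assumed: page, then left column before right, then top to bottom.
    for line in sorted(lines, key=lambda l: (l.page, column(l), l.y0)):
        col = column(line)
        last = blocks[-1] if blocks else None
        if (last and last.page == line.page and last.column == col
                and line.y0 - last.y1 <= max_gap):
            last.lines.append(line)
            last.y1 = max(last.y1, line.y1)
        else:
            blocks.append(Block(line.page, col, line.y1, [line]))
    return blocks

def classify_block(block):
    """Stage 2: rule-based label, either a section name or 'body'."""
    text = block.text.strip().rstrip(".").strip()
    if SECTION_HEADINGS.match(text):
        return text.lower()
    return "body"

def stitch(blocks):
    """Stage 3: concatenate body blocks under the most recent heading."""
    sections, current = {}, "front matter"
    for block in blocks:  # blocks arrive in reading order from detect_blocks
        label = classify_block(block)
        if label != "body":
            current = label
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(block.text)
    return {name: " ".join(parts) for name, parts in sections.items()}

if __name__ == "__main__":
    demo = [
        Line(1, 50, 100, 110, "Introduction"),
        Line(1, 50, 120, 130, "PDF is widely used"),
        Line(1, 50, 132, 142, "for articles."),
        Line(1, 50, 160, 170, "Methods"),
        Line(1, 50, 180, 190, "We detect blocks."),
    ]
    print(stitch(detect_blocks(demo)))
```

Keeping the three stages as separate functions mirrors the staged design in the abstract. Each stage could be swapped out independently, for example by replacing the regex rules with a learned classifier.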
CONCLUSIONS
LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.
PubMed: 22640904
DOI: 10.1186/1751-0473-7-7
Bioinformatics (Oxford, England) Jan 2016
MOTIVATION
A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles.
METHODS
We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions.
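The candidate step can be sketched in Python: every pair of distinct gene symbols in a sentence becomes a candidate, and each candidate gets a probability. The sketch rests on several assumptions. The gene dictionary, interaction keyword list, two features and hand-set weights are hypothetical. The logistic formula only stands in for DeepDive, which learns feature weights from training data and computes probabilities with its own statistical inference.

```python
import itertools
import math
import re

# Hypothetical resources; the published app uses its own dictionaries and 724 features.
GENE_SYMBOLS = {"TP53", "MDM2", "EGFR", "GRB2", "MYC"}
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits", "regulates"}

# Hand-set weights standing in for weights DeepDive would learn.
WEIGHTS = {"bias": -1.0, "keyword_between": 2.0, "close": 1.0}

def tokenize(sentence):
    return re.findall(r"[A-Za-z0-9-]+", sentence)

def candidate_pairs(sentence):
    """All unordered pairs of distinct gene symbols co-occurring in the sentence,
    with the token positions of their first mentions."""
    tokens = tokenize(sentence)
    first_pos = {}
    for i, tok in enumerate(tokens):
        if tok in GENE_SYMBOLS and tok not in first_pos:
            first_pos[tok] = i
    genes = sorted(first_pos, key=first_pos.get)  # order of first appearance
    pairs = [(a, b, first_pos[a], first_pos[b]) for a, b in itertools.combinations(genes, 2)]
    return tokens, pairs

def features(tokens, i, j):
    between = [t.lower() for t in tokens[min(i, j) + 1:max(i, j)]]
    return {
        "keyword_between": any(w in INTERACTION_WORDS for w in between),
        "close": len(between) <= 5,
    }

def probability(feats):
    """Logistic function of the summed weights of the active features."""
    score = WEIGHTS["bias"] + sum(WEIGHTS[name] for name, on in feats.items() if on)
    return 1.0 / (1.0 + math.exp(-score))

def score_sentence(sentence):
    tokens, pairs = candidate_pairs(sentence)
    return {(a, b): probability(features(tokens, i, j)) for a, b, i, j in pairs}

if __name__ == "__main__":
    print(score_sentence("TP53 binds MDM2 in the nucleus, while EGFR is unaffected."))
```

For the example sentence, the TP53/MDM2 pair has an interaction word and a short span between the mentions, so it scores highest. MDM2/EGFR has neither an interaction word nor any other active feature beyond proximity, so it stays at 0.5.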
RESULTS
Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision, depending on whether direct or indirect interactions were considered and whether precision was measured at the sentence or document level. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles.
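The abstract reports precision (P) and recall (R) for the Database of Interacting Proteins evaluation but no F1 score. Assuming the standard balanced F1 (the harmonic mean of P and R), the two figures combine as follows. This is a derived value, not one stated by the authors.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.76 \times 0.49}{0.76 + 0.49}
    = \frac{0.7448}{1.25}
    \approx 0.60
```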
AVAILABILITY AND IMPLEMENTATION
Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Data Curation; Data Mining; Databases, Genetic; Epistasis, Genetic; Humans; Information Storage and Retrieval; Publications; Software
PubMed: 26338771
DOI: 10.1093/bioinformatics/btv476
Molecules (Basel, Switzerland) Jun 2021
Review
Subcritical water refers to high-temperature and high-pressure water. A unique and useful characteristic of subcritical water is that its polarity decreases dramatically with increasing temperature. Therefore, subcritical water can behave similarly to methanol or ethanol, which makes it a green extraction fluid for a variety of organic species. This review focuses on the subcritical water extraction (SBWE) of natural products. The extracted materials include medicinal and seasoning herbs, vegetables, fruits, food by-products, algae, shrubs, tea leaves, grains, and seeds. A wide range of natural products, such as alkaloids, carbohydrates, essential oils, flavonoids, glycosides, lignans, organic acids, polyphenolics, quinones, steroids, and terpenes, have been extracted using subcritical water. Various SBWE systems and their advantages and drawbacks are also discussed in this review. In addition, we review co-solvents used to assist SBWE, including ethanol, methanol, salts, and ionic liquids. Other extraction techniques combined with SBWE, such as microwave and sonication, are also covered. Temperature clearly has the most significant effect on SBWE efficiency and is therefore the key parameter to optimize. The optimal temperature for extracting the natural products mentioned above ranges from 130 to 240 °C. This review can help readers learn more about SBWE technology, especially those interested in the green extraction of natural products. The major advantage of SBWE of natural products is that water is nontoxic, which makes it more suitable for extracting herbs, vegetables, and fruits. Another advantage is that no liquid waste disposal is required after SBWE. Compared with organic solvents, subcritical water not only offers ecological, economic, and safety advantages but also has a density, ion product, and dielectric constant that can be adjusted by temperature.
These tunable properties allow subcritical water to carry out class-selective extractions, such as extracting polar compounds at lower temperatures and less polar ingredients at higher temperatures. SBWE can mimic traditional herbal decoction for preparing herbal medicines, with higher extraction efficiency. Because SBWE employs high temperature and high pressure, great caution is needed for safe operation. Another challenge for the application of SBWE is the potential degradation of organic analytes at high temperatures. We highly recommend conducting analyte stability checks when carrying out SBWE. For analytes with poor SBWE efficiency, a small amount of an organic modifier, such as ethanol, a surfactant, or an ionic liquid, may be added.
Topics: Biological Products; Hot Temperature; Plant Extracts; Solvents; Sonication; Water
PubMed: 34209151
DOI: 10.3390/molecules26134004
International Journal of Molecular... Sep 2022
Review
Plants produce a variety of high-value chemicals (e.g., secondary metabolites) with a plethora of biological activities that may be utilised in many industries (e.g., agrisciences, cosmetics, drugs, nutraceuticals, household products, etc.). Exposure to different environments, as well as treatment (e.g., exposure to chemicals), can influence the chemical makeup of these plants and, in turn, which chemicals predominate in them. Essential oils (EOs) usually have complex compositions (>300 organic compounds, e.g., alkaloids, flavonoids, phenolic acids, saponins and terpenes). They are obtained from botanically defined plant raw materials by dry/steam distillation or a suitable mechanical process (without heating). In certain cases, an antioxidant may be added to the EO. EOs are produced by more than 17,500 plant species, but only ca. 250 EOs are commercially available. The interesting bioactivity of the chemicals produced by plants makes them high in value, motivating investment in their production, extraction and analysis. Traditional methods for extracting plant-derived biomolecules include cold pressing and hydro/steam distillation. Newer methods include solvent/Soxhlet extraction and sustainable processes that reduce waste, decrease processing times and deliver competitive yields, such as microwave-assisted extraction (MAE), ultrasound-assisted extraction (UAE), subcritical water extraction (SWE) and supercritical CO2 extraction (scCO2). Once the high-value extracts have been obtained from a given feedstock, their contents may be analysed by techniques such as chromatography and mass spectrometry.
The bioactive components, which can be used in a variety of formulations and products (e.g., displaying anti-aging, antibacterial, anticancer, anti-depressive, antifungal, anti-inflammatory, antioxidant, antiparasitic, antiviral and anti-stress properties), are biorenewable high-value chemicals.
Topics: Anti-Bacterial Agents; Antifungal Agents; Antioxidants; Antiparasitic Agents; Antiviral Agents; Carbon Dioxide; Flavonoids; Oils, Volatile; Plant Extracts; Plants; Saponins; Solvents; Steam; Terpenes
PubMed: 36142238
DOI: 10.3390/ijms231810334
Molecules (Basel, Switzerland) Jun 2022
The extraction of bioactive compounds from fruits such as lemon has gained relevance because these compounds have health-promoting properties, such as antioxidant and anticancer activity; however, the extraction method can significantly affect these properties. High hydrostatic pressure and ultrasound are emerging extraction methods that offer an alternative to conventional extraction, improving extractability and yielding extracts rich in bioactive compounds. Therefore, lemon extracts (LEs) were obtained by conventional (orbital shaking), ultrasound-assisted, and high-hydrostatic-pressure extraction. The extracts were then microencapsulated with maltodextrin at 10% (M10), 20% (M20), and 30% (M30). The impact of microencapsulation on the physicochemical properties of the LEs and on their phenolics (TPC), flavonoids (TFC), and relative bioaccessibility (RB) was evaluated. M30 gave higher microencapsulation efficiency for TPC and TFC and lengthened the time required for the microcapsules to dissolve in water, while moisture content, water activity, and hygroscopicity decreased. The RBs of TPC and TFC were higher in microcapsules with M30 and lower when conventional extraction was used. The data suggest that microencapsulated LE is promising because it protects the bioactivity of phenolic compounds. In addition, this freeze-dried product can be used as a functional ingredient in food or supplement formulations.
Topics: Antioxidants; Capsules; Phenols; Plant Extracts; Water
PubMed: 35807411
DOI: 10.3390/molecules27134166
Foods (Basel, Switzerland) Jul 2018
Review
Some functional foods contain biologically active compounds (BAC) that can be derived from various biological sources (fruits, vegetables, medicinal plants, wastes, and by-products). Global food markets demand foods from plant materials that are “safe”, “fresh”, “natural”, and with “nutritional value” while processed in sustainable ways. Functional foods commonly incorporate plant extracts rich in BACs produced by conventional extraction. This approach can have negative thermal effects on extraction yield and quality and consumes large amounts of organic solvents and energy. By contrast, sustainable extraction techniques are aligned with “green” concepts and can provide raw materials on an industrial scale with optimal expenditure of energy and chemicals. These include microwave-assisted extraction (MAE), ultrasound-assisted extraction (UAE), high-pressure assisted extraction (HPAE), high voltage electric discharges assisted extraction (HVED), pulsed electric fields assisted extraction (PEF), supercritical fluids extraction (SFE), and others. This review provides an overview of relevant innovative food processing and extraction technologies applied to various plant matrices as raw materials for functional food production.
PubMed: 29976906
DOI: 10.3390/foods7070106
Molecules (Basel, Switzerland) Aug 2022
The essential oil extracted from Cinnamomum camphora leaves is a mixture of volatile compounds, mainly terpenes, and is widely used in the medicine, perfume and chemical industries. In this study, the extraction of essential oil from the leaves by steam distillation and by supercritical CO2 extraction was summarized and compared, and the camphor tree essential oil was analyzed by GC/MS. The extraction rate of essential oil by steam distillation was less than 0.5%, whereas that of supercritical CO2 extraction was 4.63% at 25 MPa, 45 °C and 2.5 h. GC/MS identified 21 and 42 compounds, respectively. The alcohol content of the essential oil is more than 35%, and the terpenoid content is more than 80%. Steam distillation can extract low-boiling volatile substances and yields more esters and epoxides. The supercritical method is suitable for extracting weakly polar substances and gives an oil with a high alcohol content. Supercritical CO2 extraction can selectively extract essential oil components and effectively prevent oxidation and the loss of heat-sensitive substances.
Topics: Carbon Dioxide; Chromatography, Supercritical Fluid; Cinnamomum camphora; Distillation; Oils, Volatile; Plant Extracts; Steam; Terpenes
PubMed: 36080152
DOI: 10.3390/molecules27175385
Molecules (Basel, Switzerland) Jun 2023
Luteolin from Patrinia exhibits strong antiviral activity. Here, the conditions for extracting and enriching luteolin from Patrinia were optimized. Response surface methodology was used to determine the optimal extraction parameters in terms of reflux time, solvent ratio, extraction temperature, material-to-liquid ratio, and number of extractions. A macroporous resin method was then used to enrich luteolin from Patrinia. The following optimal extraction and enrichment conditions were established: an extraction time of 43.00 min, a methanol/hydrochloric acid solvent ratio of 13:1, an extraction temperature of 77.60 °C, a material-to-liquid ratio of 1:22, and a total of two extractions. NKA-9 was determined to be the most appropriate resin for enrichment. The optimal adsorption conditions were a pH of 5.0, a temperature of 25 °C, an initial luteolin concentration of 19.58 µg/mL, a sample loading volume of 2.9 BV, and a sample loading rate of 2 BV/h. The optimal desorption conditions were elution with distilled water, 30% ethanol and 80% ethanol, using 5 BV at a flow rate of 2 BV/h. After optimization, the enrichment recovery rate was 80.06% and the luteolin content increased 3.8-fold. Additionally, the enriched product exhibited a significant inhibitory effect on PRV (porcine pseudorabies virus) in vitro and in vivo, providing data for the development and application of luteolin from Patrinia.
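Resin-column conditions expressed in bed volumes (BV) only become absolute volumes and times once the column size is known. The small Python helper below converts the reported sample loading (2.9 BV at 2 BV/h) into millilitres, millilitres per hour and hours. It assumes a hypothetical 50 mL resin bed; the abstract does not state the column size, and `ColumnRun`/`bed_volume_plan` are illustrative names only.

```python
from typing import NamedTuple

class ColumnRun(NamedTuple):
    volume_ml: float      # total liquid passed through the column
    flow_ml_per_h: float  # volumetric flow rate
    hours: float          # time needed to pass that volume

def bed_volume_plan(bed_volume_ml, volume_bv, rate_bv_per_h):
    """Convert a BV-based specification into absolute volume, flow and time."""
    if bed_volume_ml <= 0 or rate_bv_per_h <= 0:
        raise ValueError("bed volume and flow rate must be positive")
    return ColumnRun(
        volume_ml=volume_bv * bed_volume_ml,
        flow_ml_per_h=rate_bv_per_h * bed_volume_ml,
        hours=volume_bv / rate_bv_per_h,  # BV divided by BV/h gives hours
    )

if __name__ == "__main__":
    # Hypothetical 50 mL bed; loading of 2.9 BV at 2 BV/h as reported.
    print(bed_volume_plan(50.0, 2.9, 2.0))  # roughly 145 mL at 100 mL/h over 1.45 h
```

The loading time (1.45 h) does not depend on the column size, because both the volume and the flow rate scale with the bed volume. This is why BV units make the reported conditions transferable between columns.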
Topics: Animals; Swine; Patrinia; Luteolin; Plant Extracts; Ethanol; Solvents
PubMed: 37446667
DOI: 10.3390/molecules28135005
RSC Advances Jul 2023
Lignin is a rich source of high-value, low-molecular-weight compounds. However, robust methods for isolating the extractable fraction from lignocellulose are yet to be established. In this study, supercritical fluid extraction (SFE) and CO2-expanded liquid extraction (CXLE) were employed to extract lignin from softwood and hardwood chips. Ethanol, acetone, and ethyl lactate were investigated as green organic co-solvents in the extractions. Additionally, the effects of temperature, CO2 percentage and the water content of the co-solvent were investigated with a design-of-experiments approach using full factorial designs. Ethyl lactate and acetone provided the highest gravimetric yields. The water content of the extraction mixture had the greatest impact on the amount of extractable lignin monomers (LMs) and lignin oligomers (LOs), while the type of organic solvent was of minor importance. The most effective extraction was achieved with a combination of liquid CO2/acetone/water (10/72/18, v/v/v) at 60 °C, 350 bar, 30 min and a flow rate of 2 mL/min. The optimized method enabled the detection of 13 LMs and 6 lignin dimers (LDs) from the hardwood chips. The results demonstrate the potential of supercritical fluids and green solvents for the mild and benign extraction of lignin from wood.
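A full factorial design runs every combination of the chosen factor levels. The Python sketch below enumerates such a design for the three factors named in the abstract and splits the optimal 10/72/18 (v/v/v) mixture into per-component volumes for a 30 min run at 2 mL/min. The two levels per factor in `FACTORS` are hypothetical, since the abstract does not give the levels. The computed volumes are nominal pump-delivered volumes that ignore compressibility of the liquid CO2.

```python
from itertools import product

# Hypothetical two-level settings; the study's actual levels are not given in the abstract.
FACTORS = {
    "temperature_C": (40, 60),
    "co2_percent": (10, 30),
    "water_percent_in_cosolvent": (0, 20),
}

def full_factorial(factors):
    """Every combination of factor levels, one dict per experimental run."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]

def component_volumes(flow_ml_per_min, minutes, ratio):
    """Split the total solvent volume according to a v/v/... ratio."""
    total_ml = flow_ml_per_min * minutes
    parts = sum(ratio.values())
    return {name: total_ml * share / parts for name, share in ratio.items()}

if __name__ == "__main__":
    runs = full_factorial(FACTORS)
    print(len(runs), "runs")  # 2 levels ** 3 factors = 8 runs
    for run in runs:
        print(run)
    # Optimum from the abstract: CO2/acetone/water 10/72/18 at 2 mL/min for 30 min.
    print(component_volumes(2.0, 30.0, {"CO2": 10, "acetone": 72, "water": 18}))
```

With two levels per factor, the number of runs is 2 to the power of the number of factors. This is what lets a full factorial design estimate interactions, such as temperature with water content, as well as main effects.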
PubMed: 37483673
DOI: 10.1039/d3ra01873c
Molecules (Basel, Switzerland) Mar 2022
This study aimed to compare the influence of extraction methods on the pharmaceutical and cosmetic properties of medicinal and aromatic plants (MAPs). For this purpose, the dried plant materials were extracted using advanced techniques (microwave-assisted (MAE), ultrasound-assisted (UAE), and homogenizer-assisted (HAE) extraction) and conventional techniques (maceration, percolation, decoction, infusion, and Soxhlet). Inhibition of tyrosinase, elastase, α-amylase, butyrylcholinesterase, and acetylcholinesterase was tested using L-3,4-dihydroxyphenylalanine, N-succinyl-Ala-Ala-p-nitroanilide, butyryl, and acetylcholine as substrates. Antioxidant activities were studied by ABTS, DPPH, and FRAP assays. In terms of extraction yield, the advanced extraction techniques gave the highest values (MAE > UAE > HAE). Chemical profiles depended on the phenolic compounds tested, whereas antioxidant activities were consistently higher for the conventional techniques, mainly infusion and decoction. Regarding the pharmaceutical and cosmetic properties, the highest inhibitory activities against α-amylase and acetylcholinesterase were observed for Soxhlet and macerated extracts, whereas the highest activity against tyrosinase was obtained with MAE > maceration > Soxhlet. Elastase and butyrylcholinesterase inhibitory activities followed the order Soxhlet > maceration > percolation, with no activity recorded for the other methods tested. In conclusion, advanced methods afford extracts with high yields, while conventional methods may be an adequate approach when minimal changes in the biological properties of the extract are desired.
Topics: Acetylcholinesterase; Antioxidants; Butyrylcholinesterase; Monophenol Monooxygenase; Pancreatic Elastase; Plant Extracts; Plants, Medicinal; alpha-Amylases
PubMed: 35408473
DOI: 10.3390/molecules27072074