Highlights
- Easy-to-use, scalable NLP framework that can leverage Spark.
- Introduction of BERT-based Relation Extraction models.
- State-of-the-art performance on Named Entity Recognition and Relation Extraction.
- Reported SOTA performance on multiple public benchmark datasets.
- Application of these models to real-world use cases.
Abstract
Keywords
Current code version | v1.0
Permanent link to code/repository used for this code version | https://github.com/SoftwareImpacts/SIMPAC-2022-44
Permanent link to Reproducible Capsule | https://codeocean.com/capsule/1636734/tree/v1
Legal Code License | Apache License, 2.0
Code versioning system used | git
Software code languages, tools, and services used | Python, Scala, Java
Compilation requirements, operating environments & dependencies | Windows or Linux, JVM, Spark
If available, link to developer documentation/manual | https://nlp.johnsnowlabs.com/docs/en/licensed_install
Support email for questions | [email protected]
1. Introduction
2. Approach



3. Applications
3.1 Prescription parsing
3.2 Adverse drug event detection
3.3 Putting facts on a timeline
3.4 Generating knowledge graph

3.5 Improved entity mapping


4. Impact overview
Declaration of Competing Interest
Acknowledgments
Footnotes
The code (and data) in this article has been certified as Reproducible by Code Ocean (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
User license
Creative Commons Attribution (CC BY 4.0)