Álvaro García-Barragán
Tagline: Computer Engineer working at the Biomedical Technology Center of the Technical University of Madrid #NLP #DeepLearning #Transformers
Madrid, Spain
About
💻 Machine learning model developer working with medical data. 🏥 🕰️ Currently researching NLP for medical data, in particular fine-tuning Named Entity Recognition models and new LLMs. ⚙️ ⛏️ Used to working collaboratively in the Git environment as a software developer and maintainer. 😺 GitHub user: https://github.com/Alvaro8gb. ⌨️ Used to writing efficient, scalable code with software design patterns 🔷. Programming languages I know, in order of preference: Python 🐍, Java ☕️, C 🐧.
Education
MSc
from: 2023, until: 2024
Field of study: Data Science
School: Universidad Politécnica de Madrid
Description: Courses:
- Machine Learning
- Deep Learning
- Data Processes
- Data Visualisation
- Big Data
- Statistical Data Analysis
- Open Data and Knowledge Graphs
- Cloud Computing and Big Data
- Information Retrieval, Extraction and Integration
- Programming for Data Processing
- Image Processing, Analysis and Classification
- Ethical, Legal and Societal Aspects in Data Science
- Research Methodology
BSc
from: 2019, until: 2023
Field of study: Computer Engineering
School: Universidad Complutense de Madrid
Description: Final Degree Project, titled “Structuring Electronic Health Records of breast cancer with Natural Language Processing”, graded 9.5/10.
English
from: 2019, until: 2021
Field of study: English Language and Literature, General
School: Escuela Oficial de Idiomas
Work Experience
Data Science Researcher
from: 2024, until: present
Organization: Medical Data Analytics Laboratory (MEDAL)
Location: Madrid, Community of Madrid, Spain · Hybrid
Description: Associated with the project ELADAIS (Extracción, Almacenamiento y Análisis de Datos con Alto Impacto Social; Extraction, Storage and Analysis of Data with High Social Impact).
Professor
from: 2024, until: present
Organization: FUNDACIÓN ORTEGA-MARAÑÓN
Location: Madrid, Community of Madrid, Spain · On-site
Professor
from: 2024, until: present
Organization: IAU Institute for American Universities
Location: Madrid, Community of Madrid, Spain · On-site
Research Fellow (Becario de investigación)
from: 2021, until: 2024
Organization: Universidad Politécnica de Madrid
Location: Community of Madrid, Spain
Description:
- Associated with the CLARIFY European project (https://www.clarify2020.eu/).
- Develop Natural Language Processing models with neural networks 🧠 such as Google BERT. Train, validate and develop Named Entity Recognition models.
- Use transformer-based LLMs, both pretrained and fine-tuned.
- Use embedding-based search.
- Deploy web services on a Linux server (Flask, brat, Prodigy).
- Work in teams (in English or Spanish).
- Explain code to others.
- Learn breast and lung cancer terminology.
- Write papers.
- Write Bash scripts.
- Program in Python (pandas, csv, json, spaCy, scikit-learn, TensorFlow, Keras, statistics).
- Program in Java (Apache Lucene).
- Query SQL databases and MongoDB.
- Run programs on the Magerit-Cesvima supercomputer.
- Learn how to be a researcher.
- Write multithreaded scripts.
Research stay
from: 2023, until: 2023
Organization: TIB – Leibniz-Informationszentrum Technik und Naturwissenschaften
Location: Hannover, Lower Saxony, Germany · On-site
Description: Integrating Electronic Health Records (EHRs) of Spanish patients with breast cancer:
- Ontology/Vocabulary: UMLS (NCI, SNOMED CT)
- Neuro-symbolic system (LLM + KG) for Entity Alignment
Teacher
from: 2019, until: 2023
Organization: Private tutoring (self-employed)
Location: Madrid, Community of Madrid, Spain
Description: Private lessons in physics/chemistry for first-year ESO (lower secondary), chemistry/mathematics for Bachillerato (upper secondary), and Java and C programming for technical university degrees.
Publications
Transformers for extracting breast cancer information from Spanish clinical narratives
Publisher: Artificial Intelligence in Medicine, Elsevier
Date: 2023
Description: The wide adoption of electronic health records (EHRs) offers immense potential as a source of support for clinical research. However, previous studies focused on extracting only a limited set of medical concepts to support information extraction in the cancer domain for the Spanish language. Building on the success of deep learning for processing natural language texts, this paper proposes a transformer-based approach to extract named entities from breast cancer clinical notes written in Spanish and compares several language models. To facilitate this approach, a schema for annotating clinical notes with breast cancer concepts is presented, and a corpus for breast cancer is developed. Results indicate that both BERT-based and RoBERTa-based language models demonstrate competitive performance in clinical Named Entity Recognition (NER). Specifically, BETO and multilingual BERT achieve F-scores of 93.71% and 94.63%, respectively. Additionally, RoBERTa Biomedical attains an F-score of 95.01%, while RoBERTa BNE achieves an F-score of 94.54%. The findings suggest that transformers can feasibly extract information in the clinical domain in the Spanish language, with the use of models trained on biomedical texts contributing to enhanced results. The proposed approach takes advantage of transfer learning techniques by fine-tuning language models to automatically represent text features and avoiding the time-consuming feature engineering process.
Structuring Breast Cancer Spanish Electronic Health Records Using Deep Learning
Publisher: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS)
Date: 2023
Description: Using Natural Language Processing (NLP) in the clinical domain has increased the possibility of automatically extracting information from oncology clinical narratives. Specifically, deep learning methods have been used to extract information in the cancer domain. However, most of the above proposals have concentrated only on extracting named entities from clinical narratives, but those proposals do not include a methodology for structuring the information after an information extraction step. In this paper, we propose an automatic pipeline based on deep learning for structuring breast cancer information from clinical narratives written in Spanish. The pipeline inputs a set of clinical documents written in narrative form and automatically generates a structured JSON file that contains the information for each patient. This pipeline integrates both clinical entity extraction and negation and uncertainty detection. Obtained results have shown that deep learning methods are feasible for structuring information in the breast cancer domain.
Skills
- C++
- Agile Methodologies
- Software Development Methodologies
- Operating Systems
- Prompt Engineering
- R (Programming Language)
- Machine Learning
- Data Processing
- Statistics
- Model Development
- Java
- UMLS
- PyTorch
- Technical Presentations
- Tutorials
- Presentations
- XGBoost
- Flask
- Docker
- Postman
- Postman API
- Elasticsearch
- Web Services API
- HTTP
- cURL
- Deep Learning
- Linux
- spaCy
- Prodigy
- Git
- Bash
- Linux Server
- Python (Programming Language)
- TensorFlow
- Pandas (Software)
- SQL
- Keras
- JSON
- Natural Language Processing (NLP)
- Deep Neural Networks (DNN)
- English
- Data Analysis
- Data Mining
- Big Data
- Data Modeling
- Data Science
- Data Visualization
- Problem Solving