NLP Data Scientist/Scientific Data Engineer
European Molecular Biology Laboratory
- Saffron Walden, England, United Kingdom
About
Responsibilities
Develop machine learning pipelines for extracting drug side effects from drug labels, clinical trials, publications and other documents.
Investigate modern NLP methodologies and propose ideas for the implementation of data extraction methods and pipelines.
Apply language models to extract and map drug‑related information from unstructured text, e.g. from scientific literature and ClinicalTrials.gov.
Implement and/or fine‑tune different NLP models, e.g. NER models, transformer models, LLMs.
Integrate project workflows with existing infrastructures in the EBI Chemical Biology Services and Open Targets teams.
Prepare and evaluate benchmark datasets from the open domain as training sets for NLP models.
Work with domain experts to develop new gold standards for NLP tasks where needed.
Assist with and/or perform data curation to prepare clean and reliable training sets.
Apply and/or adapt existing methods for mapping extracted entities to biomedical ontologies, e.g. drugs, side effects/phenotypes and diseases.
Work closely with Safety 2.0 project group members bridging the ChEMBL and Open Targets teams.
Work closely with the Open Targets Core team to ensure seamless integration of data and workflows into the Open Targets Platform and long‑term sustainability.
Collaborate with the Open Targets Partners to assess, prioritise, validate and refine the developed methods.
Disseminate the outcomes of the project to the scientific community and stakeholders through presentations and publications.
Qualifications
We are looking for two enthusiastic and talented NLP data scientists, cheminformaticians or bioinformaticians with experience in NLP and knowledge extraction to join the Open Targets Safety 2.0 project for a period of 3 years. You should enjoy delving into ways of addressing challenges in knowledge extraction and data standardisation, and want to contribute to open‑source code and resources.
PhD, Masters or equivalent experience in computational linguistics, computer science, bioinformatics, or cheminformatics.
Experience with language models, e.g. transformer models, LLMs and AI agents, for information extraction.
Experience with document and text preprocessing, cleaning and transformation techniques including mapping to ontologies.
Experience with data structures, data models and databases.
Knowledge of cheminformatics resources and/or bioinformatics databases.
Knowledge of data analysis and machine learning.
Proficiency in Python.
Knowledge of data frameworks, e.g. PySpark, pandas, Polars.
Excellent attention to detail.
Strong communication skills, both in presentations and verbally.
Experience working in a team‑oriented environment and working collaboratively.
Able to work independently, to manage your time and work to deadlines.
Additional Experience
Experience with the application of NLP methods to cheminformatics and/or biomedical domains.
Experience with version control.
Experience in Safety/toxicology in industry or research.
Benefits
Financial incentives: Monthly family, child and non‑resident allowances, annual salary review, pension scheme, death benefit, long‑term care, accident‑at‑work and unemployment insurances.
Flexible working arrangements – including hybrid working patterns.
Private medical insurance for you and your immediate family (including all prescriptions and generous dental & optical cover).
Generous time off: 30 days of annual leave, plus public holidays.
Relocation package including installation grant (if required).
Campus life: Free shuttle bus to and from work, on‑site library, subsidised on‑site gym and cafeteria, casual dress code, extensive sports and social club activities (on campus and remotely).
Family benefits: On‑site nursery, 10 days of child sick leave, generous parental leave, holiday clubs on campus, and monthly family and child allowances.
Benefits for non‑UK residents: Visa exemption, education grant for private schooling, financial support to travel back to your home country every second year and a monthly non‑resident allowance.
Application Details
Hybrid Working: At EMBL‑EBI, we embrace a hybrid approach to work that supports both flexibility and community. Team members are usually on site at least three days a week, and a desk will always be available. We enjoy the energy of working together and encourage regular campus presence.
Interviews: We plan to hold introductory meetings with selected candidates remotely starting in February 2026.
Contract length: 3 years (project‑based).
Salary: Grade 5 to Grade 6, depending on experience and qualifications. Monthly salary starts at £3,229 to £3,612 after tax but excluding pension and insurances, plus other paid benefits based on personal circumstances.
Why join us?
Do something meaningful. At EMBL‑EBI you can apply your talent and passion to accelerate science and tackle some of humankind’s greatest challenges. EMBL‑EBI, part of the European Molecular Biology Laboratory, is a worldwide leader in the storage, analysis and dissemination of large biological datasets. We provide the global research community with access to publicly available databases and tools which are crucial for the advancement of healthcare, food security and biodiversity.
Join a culture of innovation – we are located on the Wellcome Genome Campus, alongside other prominent research and biotech organisations, and surrounded by beautiful Cambridgeshire countryside. This is a highly collaborative and inclusive community where our employees enjoy a relaxed atmosphere. We are committed to ensuring our employees feel valued, supported and empowered to reach their professional potential.
Diversity and inclusion: At EMBL, we strongly believe that inclusive and diverse teams benefit from higher levels of innovation and creative thought. We encourage applications from women, LGBTQ+ individuals and people of all nationalities.
Languages
- English