This job posting is no longer available
About
You will be working directly with Brainscape's Knowledge Manager to iterate on LLM prompts, analyze real user data, and ensure our AI output meets a high quality bar - both at launch and as models evolve. The immediate priority is migrating and testing our existing bulk flashcard creation prompts in an updated AI environment with newer GPT models. These prompts power three user-facing features: importing pasted or uploaded content into flashcards, summarizing documents into flashcards, and generating flashcards from a user-described topic. From there, the role expands into ongoing QA, regression testing, and prompt optimization across all of Brainscape's AI features.
This is a part-time contract role (~5-10 hours/week, remote) through the end of 2026, with potential to extend or convert to a permanent position. Hourly rate is $40-$100 (based on experience and location).
Responsibilities
- Migrate and test existing bulk flashcard creation prompts in an updated AI environment with newer GPT models - and plan future migrations as OpenAI retires older models
- Run test suites, manually review AI outputs for quality and correctness, and fine-tune prompts accordingly
- Analyze real user data to identify failure patterns and inform prompt improvements
- Streamline testing and evaluation workflows to make QA faster and more repeatable
- Monitor production quality post-launch and detect regressions as underlying models shift
- Build and maintain model evaluation datasets from real user inputs across all AI features
- Write new test cases for edge cases, multilingual content, and messy real-world inputs
- Document prompt changes, test results, and lessons learned
- Work with the Content Team to apply flashcard authoring quality standards
Qualifications
- 1+ years hands-on prompt engineering experience with LLMs / OpenAI API (systematic testing and iteration, not just casual ChatGPT usage)
- Familiarity with Cursor IDE or similar AI-assisted development tools (our work is primarily Python - Cursor experience is more important than raw Python skill)
- Some experience with Git version control and collaborating via shared repositories (we use GitLab)
- A habit of documenting what you tried, what worked, and why - you don't need a formal QA background, but you naturally keep track of your process
- Clear written communication skills
- Proactive attitude; ability to work independently and manage your own time
- BONUS: Experience building prompt evals, AI quality assurance, or using GPT to grade GPT outputs
- BONUS: Experience with regression testing for AI systems or detecting model drift
- BONUS: Background in education technology (EdTech) or content creation - especially microlearning, flashcards, or other concise Q&A formats
- BONUS: A degree in Computer Science, Information Science, or a similar field
To Apply
Please apply here and let us know why you think you'd be a good fit. If you are a top candidate, we will follow up with a short application form to help direct you to the right person for an interview.
Contract duration: more than 6 months, with 40 hours per week.
Mandatory skills: Artificial Intelligence, Prompt Engineering, Python
Language skills
- English
Notice to users
This posting was published by one of our partners. You can view the original posting here.