About
Our team is dedicated to developing comprehensive data curation and evaluation solutions to enhance our model across various quality dimensions. These include visual quality, prompt adherence, identity preservation, naturalness, and visual text generation, among others. We employ diverse approaches, such as sourcing billions of images and identifying suitable ones through a combination of manual annotations and signals from machine learning models. We also utilize both manual and automated evaluation methods to pinpoint quality gaps and data requirements.
Job Responsibilities:
Main Responsibilities
- Data Curation: Manage data labeling workflows, including enqueueing data for labeling, maintaining the labeling UI, and extracting labels into datasets for the modeling team.
- Data Engineering (Pipelines): Maintain large-scale, efficient, and reliable data processing pipelines (billions of images). This includes data sourcing, running machine learning models to understand content, and using LLMs to clean data.
- Data Engineering (Governance): Maintain a portfolio of datasets, ensuring governance of access, retention, and privacy compliance.
Additional Responsibilities
- Annotations: Spend time manually annotating training data based on modeling team requirements. Use LLMs and other models to annotate training data or to evaluate generated content, then audit the results to understand these models' performance.
- Analysis: Collaborate with engineers to identify and summarize model gaps based on evaluations. Use these findings to identify necessary data, and then mine and prepare that data for subsequent model training iterations.
- Auditing: Scale validated evaluation protocols with product development teams, including coordination and auditing. Also, audit and correct human-labeled data.
Skills:
- Verbal and written communication skills, problem-solving skills, and interpersonal skills.
- Attention to detail and an aptitude for experimental investigation.
- Basic ability to work independently and manage one's time.
- Basic knowledge of Python and SQL.
- Basic knowledge of computer vision and generative models.
- Basic knowledge of data ETL workflows and pipelines.
- [New] Use of LLMs for data labeling-related work.
Education/Experience: Associate's degree or equivalent training required in Computer Science, Electronic Engineering, Physics, Bioinformatics, or another STEM subject. Prior industry experience in software development and testing and/or research experience in human-computer interaction is preferred. Previous experience at a major technology company is preferred.
Additional requirements: Must be onsite, working with engineers.
Pursuant to the California Fair Chance Act, Los Angeles County Fair Chance Ordinance for Employers, Los Angeles Fair Chance Initiative for Hiring Ordinance, and San Francisco Fair Chance Ordinance, qualified applicants will be considered for assignment with arrest and conviction records. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness, meet client expectations, standards, and accompanying requirements, and safeguard business operations and company reputation.
Languages
- English