About
Key Responsibilities
Build AI-ready data pipelines:
Design, construct, and optimize scalable Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) pipelines specifically for AI and ML models.
Architect data solutions:
Develop and manage data architectures, including data lakes, data warehouses, and vector databases, to support various AI workloads.
Ensure data quality and governance:
Implement data validation, security, and governance policies to ensure the integrity, accessibility, and compliance of data used in AI models.
Support AI model lifecycle:
Collaborate with data scientists and ML engineers to prepare, integrate, and manage large-scale datasets for model training and deployment.
Manage real-time data:
Develop streaming data pipelines using technologies like Apache Kafka to support real-time AI applications and analytics.
Optimize cloud infrastructure:
Use AWS cloud services to build, deploy, and scale AI data solutions efficiently.
Deploy AI models:
Automate the training of AI/ML models and their deployment to production as APIs and microservices.
Monitor and troubleshoot:
Implement data observability tools to monitor pipeline health, identify data drift, and quickly resolve any data quality issues that may impact model performance.
AI-assisted development:
Use AI assistants like Copilot in Microsoft Fabric notebooks to generate, explain, and fix code, accelerate data analysis, and streamline data transformation tasks.
Required Qualifications
Education:
A Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related technical field is typically required.
Experience:
Proven experience in a data engineering or similar role, with specific experience supporting AI and ML projects.
Programming:
Fluency in Python and SQL, and familiarity with other languages such as Java or Scala.
Frameworks:
Hands-on experience with ML frameworks such as TensorFlow, PyTorch, and scikit-learn, as well as LLM-specific tools such as LangChain or LlamaIndex.
Big data:
Experience with distributed data processing frameworks such as Apache Spark and Hadoop.
Cloud platforms:
Proficiency with at least one major cloud provider (AWS, Azure, or GCP) and its AI and data services.
Databases:
Expertise in both relational (SQL) and NoSQL databases, including vector databases for GenAI applications.
DevOps and MLOps:
Experience with CI/CD, Docker, and ML lifecycle management tools like MLflow is highly valued.
Languages
- English
Notice for Users
This job comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their site.