About
Manus is a company focused on accelerating the transition to BioAlternatives through innovative solutions. The Data Engineer will be responsible for building the data backbone that connects various production-related systems and transforming raw data into reliable datasets for analytics and AI/ML applications.
Responsibilities:
• Map workflows, systems, data owners, and data flows across all Production-related activities, including Operations, Quality, Supply Chain, Finance, Maintenance, Labs, etc.
• Document data types, formats, quality, retention, and access controls
• Help classify data sources for the ingestion pipeline (real-time, batch, API, file-based)
• Build connectors for on-prem systems
• Develop ingestion jobs using Python or ETL tools
• Implement Kafka producers to stream data to the cloud warehouse
• Work with the India team to ensure schema consistency and metadata requirements
• Normalize datasets from multiple systems into standard schemas
• Handle missing values, outliers, timestamp alignment, and unit harmonization
• Apply mapping tables, reference data, and business rules
• Prepare data for the Silver (Data Vault) and Gold (Star Schema) layers
• Work with senior warehouse engineers in India to implement Hubs, Links, and Satellites (Data Vault 2.0), Dimension and Fact tables, and Data Quality checks (freshness, completeness, uniqueness)
• Maintain detailed documentation for ingestion pipelines
• Work closely with Manus operations, QA, engineering, and IT
• Provide weekly updates to the Program Lead
Qualifications
Required:
• Master's degree in Data Science, Computer Science, Information Systems, or a related field
• 1-2 years of industry experience
• Strong Python skills
• Working knowledge of SQL (joins, window functions, CTEs)
• Experience using Pandas, PySpark, or similar tools for data transformation
• Understanding of Apache Kafka: producers/consumers, topics, partitions, and offset management
• Experience with batch ingestion using REST APIs, ODBC/JDBC, CSV/JSON pipelines, and scheduled jobs
• Strong communication skills (required for the data survey)
• Curiosity and willingness to work across manufacturing and biotech systems
• Ability to document findings clearly and consistently
• Collaborative mindset: must coordinate with geographically distributed teams and be willing to work across multiple time zones
Preferred:
• Experience with Kafka connectors
• Familiarity with Azure Data Factory, Airflow, or another orchestration tool
• Understanding of Bronze/Silver/Gold patterns (Medallion Architecture), Data Lake concepts, and data cleaning techniques
• Knowledge of Slowly Changing Dimensions (SCD)
• Basic understanding of Azure Storage, Event Hubs, and Synapse or Databricks
• Familiarity with Git and CI/CD
Company:
Manus is the world's leading bioalternatives scale-up platform. Founded in 2011, the company is headquartered in Cambridge, USA, and has 51-200 employees. It is currently at the growth stage.
Language skills
- English
Notice to users
This offer comes from a TieTalent partner platform. Applications are submitted directly on the partner's site.