About
Migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud (GCP), enabling metrics and reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis), leveraging PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, and high-performance data solutions.
About Program/Project
The IAM Data Modernization project involves migrating an on-premises SQL data warehouse to a target-state Data Lake in the GCP cloud environment. Key highlights include:
· Integration Scope: 30+ source system data ingestions and multiple downstream integrations
· Capabilities: metrics, reporting, and GenAI use cases with natural language querying, advanced pattern/trend analysis, faster summarization, and cross-domain metric monitoring
· Benefits: scalability and access to advanced cloud functionality; a highly available and performant semantic layer with historical data support; a unified data strategy for executive reporting, analytics, and GenAI across cyber domains
This modernization establishes a single source of truth for enterprise-wide data-driven decision-making.

Required Skills
DevOps / CI-CD
· Experience implementing CI/CD pipelines for data and analytics workloads
· Familiarity with Git-based source control, build automation, and deployment strategies

Containers & Platform
· Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
· Understanding of containerized architecture, scaling, and environment management
· Proven ability to build CI/CD pipelines for data and infrastructure workloads
· Experience managing secrets securely using GCP Secret Manager (see the sketch after this list)
· Ownership of observability, SLOs, dashboards, alerts, and runbooks
· Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability

Big Data & Processing
· Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization
· Solid understanding of distributed data processing concepts

Data & Cloud Architecture
· Strong experience designing data platforms on Google Cloud Platform (GCP)
· Experience with Data Lakes, data warehousing, and large-scale migration programs
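As an illustration of the secrets-management requirement above, here is a minimal sketch of reading a database credential from GCP Secret Manager with the google-cloud-secret-manager Python client; the project ID and secret name are hypothetical placeholders, not part of this role's environment.

```python
# Minimal sketch: fetching a secret from GCP Secret Manager at pipeline start-up.
# Project and secret names are illustrative placeholders.
from google.cloud import secretmanager

def get_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Return the payload of a secret version as a UTF-8 string."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

if __name__ == "__main__":
    # e.g. a warehouse password consumed by an ingestion job, never hard-coded
    db_password = get_secret("my-gcp-project", "warehouse-db-password")
```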
Data Lake Architecture & Storage
· Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
· Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
· Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
· Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
· Expertise in partitioning strategies, backfills, and large-scale data organization (see the sketch after this list)
· Ability to design data models optimized for analytics and BI consumption
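A minimal sketch of the partitioned, columnar layout referred to above, assuming a Bronze/Silver layering on GCS: a PySpark job writing a curated table as snappy-compressed Parquet partitioned by date. Bucket names, paths, and columns are hypothetical.

```python
# Minimal sketch: writing a Silver-layer table to GCS as partitioned Parquet.
# Bucket, paths, and columns are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("silver-identity-events").getOrCreate()

bronze = spark.read.parquet("gs://example-datalake-bronze/identity_events/")

silver = (
    bronze
    .dropDuplicates(["event_id"])                      # keep re-runs idempotent
    .withColumn("event_date", F.to_date("event_ts"))   # derive the partition column
)

(
    silver.write
    .mode("overwrite")
    .partitionBy("event_date")                         # lets readers prune by date
    .option("compression", "snappy")
    .parquet("gs://example-datalake-silver/identity_events/")
)
```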
Data Ingestion & Orchestration
· Experience building batch and streaming ingestion pipelines using GCP-native services
· Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
· Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
· Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow) (see the sketch after this list)
· Ability to design robust error handling, replay, and backfill mechanisms
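To make the orchestration requirement concrete, here is a minimal Cloud Composer / Airflow DAG sketch chaining an ingestion task and a transformation task with daily scheduling and retries; the DAG ID, task callables, and schedule are assumptions for illustration only.

```python
# Minimal sketch: a daily Cloud Composer / Airflow DAG with retries.
# DAG id, callables, and schedule are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_source():
    ...  # e.g. land raw files from a source system into the Bronze layer

def transform_to_silver():
    ...  # e.g. trigger the PySpark job that builds the Silver tables

with DAG(
    dag_id="identity_events_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = PythonOperator(task_id="ingest_source", python_callable=ingest_source)
    transform = PythonOperator(task_id="transform_to_silver", python_callable=transform_to_silver)

    ingest >> transform  # run transformation only after ingestion succeeds
```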
Data Processing & Transformation
· Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
· Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control (see the sketch after this list)
· Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
· Advanced Python programming skills for data engineering, including testing and maintainable code design
· Experience managing schema evolution while minimizing downstream impact
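A minimal sketch of the BigQuery optimization levers named above: creating a date-partitioned, clustered table from a staging table via the google-cloud-bigquery Python client. Dataset, table, and column names are hypothetical.

```python
# Minimal sketch: building a partitioned + clustered BigQuery table to cut scan costs.
# Dataset, table, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE OR REPLACE TABLE analytics.identity_events
PARTITION BY DATE(event_ts)          -- prune scans by date at query time
CLUSTER BY domain, event_type        -- co-locate frequently filtered columns
AS
SELECT event_id, event_ts, domain, event_type, user_id
FROM staging.identity_events_raw
"""

client.query(ddl).result()  # blocks until the DDL job completes
```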
Analytics & Data Serving
· Expertise in BigQuery performance optimization and data serving patterns
· Experience building semantic layers and governed metrics for consistent analytics
· Familiarity with BI integration, access controls, and dashboard standards
· Understanding of data exposure patterns via views, APIs, or curated datasets
Data Governance, Quality & Metadata
· Experience implementing data catalogs, metadata management, and ownership models
· Understanding of data lineage for auditability and troubleshooting
· Strong focus on data quality frameworks, including validation, freshness checks, and alerting (see the sketch after this list)
· Experience defining and enforcing data contracts, schemas, and SLAs
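A minimal sketch of the validation and freshness checks mentioned above, written as a PySpark step that fails the run (and thereby surfaces an alert) when thresholds are breached. The table path, columns, 24-hour freshness threshold, and UTC timestamps are assumptions for illustration.

```python
# Minimal sketch: validation and freshness checks that fail the run when breached.
# Table path, columns, and thresholds are illustrative placeholders; event_ts assumed UTC.
import datetime

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks-identity-events").getOrCreate()

df = spark.read.parquet("gs://example-datalake-silver/identity_events/")

# Completeness: key columns must not be null.
null_keys = df.filter(F.col("event_id").isNull() | F.col("user_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"Data quality failure: {null_keys} rows with null keys")

# Freshness: the newest event must be less than 24 hours old.
max_ts = df.agg(F.max("event_ts")).collect()[0][0]
if max_ts is None or (datetime.datetime.utcnow() - max_ts).total_seconds() > 24 * 3600:
    raise ValueError(f"Freshness check failed: latest event_ts is {max_ts}")
```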
Good to have

Security, Privacy & Compliance
· Hands-on experience implementing fine-grained access controls for BigQuery and GCS
· Experience with sprint planning and providing technical guidance to the team
· Strong stakeholder communication and solution-architecture skills
Qualifications
· Experience: 10–14+ years in DevOps and Data Architecture, with 5+ years designing on PySpark/GCP/OCP at scale; prior on-prem-to-cloud migration experience is a must
· Education: Bachelor's/Master's in Computer Science, Information Systems, or equivalent experience
· Certifications: Google Cloud Professional Cloud Architect/DevOps/OCP (required, or to be obtained within 3 months)
· Plus: Professional Data Engineer, Security Engineer
Language Skills
- English
Note for Users
This job listing comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.