About
We are looking for a Data Engineer to design and maintain scalable data solutions that power advanced analytics and AI-driven insights. This role combines expertise in big data engineering and web scraping, enabling you to work on high-impact projects involving large, complex, and unstructured datasets. You will architect data pipelines, enforce governance standards, and build tools for extracting and processing data from diverse sources, including websites and external vendors. If you thrive on solving complex data challenges and want to work with cutting-edge technologies, this is the role for you.

What You'll Do
- Architect, develop, and maintain high-throughput data pipelines in Databricks and AWS (Glue, EMR, Fargate, Step Functions).
- Ingest, normalize, and enrich large volumes of structured and unstructured data, including internal data, market data, vendor feeds, and alternative sources.
- Collaborate with AI engineers, ML scientists, and software teams to translate requirements into scalable data architectures, schemas, and APIs.
- Optimize pipeline performance and cost using distributed processing techniques (Spark, Delta, Arrow) and AWS best practices (spot fleets, autoscaling).
- Enforce data governance, privacy, and lineage standards, cataloguing assets in Unity Catalog and managing PII/PCI classification.
- Build automated validation, testing, and monitoring frameworks to ensure data quality and freshness for both offline and online workloads.
- Support onboarding and integration of new external data vendors, ensuring compliance and rapid time-to-value.
- Continuously evaluate emerging GenAI tooling (vector stores, LLMOps platforms, synthetic-data generators) and drive proofs of concept.

Web Scraping Focus
- Own the creation of tools and workflows for web crawling and scraping using compliance-approved technologies.
- Test and validate scraped data for accuracy, quality, and compliance.
- Identify and resolve issues with scrapes and scale processes as needed.
What's Required
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in data engineering or a related role.
- Strong experience in Python and SQL.
- Experience with Spark or Scala and distributed data processing.
- Proficiency in building scalable, distributed data pipelines in a cloud environment.
- Familiarity with Linux/UNIX, HTTP, HTML, JavaScript, and networking concepts.
- Knowledge of web scraping tools and libraries (e.g., Requests, BeautifulSoup, Scrapy, Pandas, Selenium, Spark).
- Working knowledge of version control systems and open-source practices.
- Solid understanding of data architecture principles, data modeling, and data warehousing.
- Excellent analytical and problem-solving skills.
- Strong communication skills in English (written and spoken).
- Commitment to the highest ethical standards.
Preferred Skills
- Experience extracting text from PDFs, images, and applications.
- Familiarity with system monitoring/administration tools.
- Knowledge of graph databases.
- Prior experience analysing big data sets.
Vantage Point Global is fully committed to being an Equal Opportunities, inclusive employer. We are passionate about attracting diverse talent, and welcome applications regardless of ethnicity, culture, age, gender, nationality, religion, disability, or sexual orientation.

Things you need to know: To apply, you'll need to provide us with a CV and answer a few initial questions. We'd like to make you aware that if you have not heard back from us within three weeks of the date of application, we will not be progressing your application.
Languages
- English