Founding Data Engineer

Elicit

United States

Apply Now

About

Elicit Data Engineer Role

Elicit is an AI research assistant that uses language models to help professional researchers and high-stakes decision makers break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty.

Two main reasons for this role:

1. Currently, Elicit operates over academic papers and clinical trials. One of your key initial responsibilities will be to build a complete corpus of these documents, available as soon as they're published, combining different data sources and ingestion methods. Once that's done, there is a growing list of other document types and sources we'd love to integrate!
2. One of our main initiatives is to broaden the sorts of tasks you can complete in Elicit. We need a data engineer to figure out the best way to ingest massive amounts of heterogeneous data in such a way as to make it usable by LLMs. We need your help to integrate into our customers' custom data providers so that they can create task-specific workflows over them.

In general, we're looking for someone who can architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

Our tech stack:

- Data pipeline: Python, Flyte, Spark
- Backend: Node and Python, event sourcing
- Frontend: Next.js, TypeScript, and Tailwind
- We like static type checking in Python and TypeScript!
- All infrastructure runs in Kubernetes across a couple of clouds
- We use GitHub for code reviews and CI
- We deploy using the gitops pattern (i.e. deploys are defined and tracked by diffs in our k8s manifests)

Consider the questions:

- How would you optimize a Spark job that's processing a large amount of data but running slowly?
- What are the differences between RDD, DataFrame, and Dataset in Spark? When would you use each?
- How does data partitioning work in distributed systems, and why is it important?
- How would you implement a data pipeline to handle regular updates from multiple academic paper sources, ensuring efficient deduplication?

If you have a solid answer for these, without reference to documentation, then we should chat!

Location and travel:

We have a lovely office in Oakland, CA; there are people there every day, but we don't all work from there all the time. It's important to us to spend time with our teammates, however, so we ask that all Elicians spend about 1 week out of every 6 with teammates.

We wrote up more details on this page.

What you'll bring to the role:

- 5+ years of experience as a data engineer: owning make-or-break decisions about how to ingest, manage, and use data
- Strong proficiency in Python (5+ years' experience)
- You have created and owned a data platform at rapidly growing startups: gathering needs from colleagues, planning an architecture, deploying the infrastructure, and implementing the tooling
- Experience with architecting and optimizing large data pipelines, ideally with particular experience with Spark; ideally these are pipelines which directly support user-facing features (rather than internal BI, for example)
- Strong SQL skills, including understanding of aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering approaches
- Experience with columnar data storage formats like Parquet
- Strong opinions, weakly held, about approaches to data quality management
- Creative and user-centric problem-solving
- You should be excited to play a key role in shipping new features to users, not just building out a data platform!

Nice to have:

- Experience in developing deduplication processes for large datasets
- Hands-on experience with full-text extraction and processing from various document formats (PDF, HTML, XML, etc.)
- Familiarity with machine learning concepts and their application in search technologies
- Experience with distributed computing frameworks beyond Spark (e.g., Dask, Ray)
- Experience in science and academia: familiarity with academic publications, and the ability to accurately model the needs of our users
- Hands-on experience with industry-standard tools like Airflow, DBT, or Hadoop
- Hands-on experience with standard paradigms like data lake, data warehouse, or lakehouse

What you'll do:

- Building and optimizing our academic research paper pipeline
- Expanding the datasets Elicit works over
- Data for our ML systems

Your first week:

- Start building foundational context
- Make your first contribution to Elicit

Your first month:

- You'll complete your first multi-issue project
- You're actively improving the team

Your first quarter:

- You're flying solo
- You've developed an area of expertise
- You actively research and improve the product

Compensation, benefits, and perks:

- Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
- Fully covered health, dental, vision, and life insurance for you, generous coverage for the rest of your family
- Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
- 401K with a 6% employer match
- A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
- $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools to incorporate into your workflow, take courses, purchase educational resources, or attend AI-focused conferences and events
- A team administrative assistant who can help you with personal and work tasks

For all roles at Elicit, we use a data-backed compensation framework to keep salaries market-competitive, equitable, and simple to understand. For this role, we target starting ranges of:

- Senior (L4): $185-270k + equity
- Expert (L5): $215-305k + equity
- Principal (L6): $260 + significant equity

We're optimizing for a hire who can contribute at an L4/senior level or above.

We also offer above-market equity for all roles at Elicit, as well as employee-friendly equity terms (10-year exercise periods).

Language skills

English

Note for users

This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.
