Big data infrastructure internship | Adaltas
- France
About
Big Data and distributed computing are at the core of Adaltas. We accompany our partners in the deployment, maintenance, and optimization of some of the largest clusters in France. More recently, we have also been providing support for their day-to-day operations.
As a strong advocate of and active contributor to open source, we are at the forefront of the TDP (TOSIT Data Platform) initiative.
During this internship, you will contribute to the development of TDP, its industrialization, and the integration of new open source components and functionalities. You will be supported by the Alliage expert team, which provides vendor-level support for TDP.
You will also work with the Kubernetes ecosystem and the automation of Onyxia datalab deployments, which we want to make available to our customers as well as to students as part of our teaching modules (DevOps, big data, etc.).
Your work will help expand Alliage's open source support offering. Supported open source components include TDP, Onyxia, ScyllaDB, etc. For those who would like to do some web work in addition to big data, we already have a fully functional intranet (ticket management, time management, advanced search, mentions, and related articles), and additional features are planned.
You will practice GitOps release chains and write articles.
You will work in a team with senior advisors as mentors.
Adaltas is a consulting agency led by a team of open source experts focusing on data management. We deploy and operate the storage and computing infrastructures in collaboration with our customers.
A partner of Cloudera and Databricks, we are also open source contributors. We invite you to browse our site and our many technical publications to learn more about the company.
Skills required and to be acquired
Automating the deployment of the Onyxia datalab requires knowledge of Kubernetes and cloud-native technologies. You must be comfortable with the Kubernetes ecosystem, the Hadoop ecosystem, and the distributed computing model. You will master how the core components (HDFS, YARN, object storage, Kerberos, OAuth, etc.) work together to serve big data use cases.
Good knowledge of Linux and the command line is required.
During the internship, you will learn:
- The Kubernetes/Hadoop ecosystem in order to contribute to the TDP project
- Securing clusters with Kerberos and SSL/TLS certificates
- High availability (HA) of services
- The distribution of resources and workloads
- Supervision of services and hosted applications
- Operating fault-tolerant Hadoop clusters that can recover lost data after an infrastructure failure
- Infrastructure as Code (IaC) via DevOps tools such as Ansible and Vagrant
- The architecture and operation of a data lakehouse
- Code collaboration with Git, Gitlab, and Github
- The architecture and configuration methods of the TDP distribution
- Deploying and testing secure and highly available TDP clusters
- Contributing to the TDP knowledge base with troubleshooting guides, FAQs, and articles
- Actively contributing ideas and code to iteratively improve the TDP ecosystem
- Researching and analyzing the differences between the main Hadoop distributions
- Contributing to the development of a tool to collect customer logs and metrics on TDP and ScyllaDB
- Actively contributing ideas to develop our support solution
- Languages: French or English
- Duration: 6 months
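As a taste of the Infrastructure as Code work mentioned above, cluster nodes can be prepared with an Ansible playbook. The sketch below is purely illustrative: the `tdp_cluster` host group, package name, and user are assumptions for the example, not TDP's actual roles or conventions.

```yaml
# Hypothetical playbook: prepare nodes for a Hadoop/TDP deployment.
# The group name "tdp_cluster" and package/user names are illustrative.
- name: Prepare nodes for a Hadoop/TDP deployment
  hosts: tdp_cluster
  become: true
  tasks:
    - name: Install OpenJDK, required by Hadoop services
      ansible.builtin.package:
        name: java-1.8.0-openjdk
        state: present

    - name: Create a dedicated hdfs system user
      ansible.builtin.user:
        name: hdfs
        system: true
        shell: /sbin/nologin
```

Running the same playbook against a Vagrant-provisioned inventory is a common way to test such roles locally before targeting a real cluster.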
Much of the digital world runs on Open Source software, and the Big Data industry is booming. This internship is an opportunity to gain valuable experience in both domains. TDP is now the only truly Open Source Hadoop distribution, and its momentum is strong. As part of the TDP team, you will have the opportunity to learn one of the core big data processing models and to participate in the development and future roadmap of TDP. We believe this is an exciting opportunity and that, on completing the internship, you will be ready for a successful career in Big Data.
A laptop with the following characteristics:
- 32GB RAM
- 1TB SSD
- 8c/16t CPU
A cluster made up of:
- 3x 28c/56t Intel Xeon Scalable Gold 6132
- 3x 192GB RAM DDR4 ECC 2666MHz
- 3x 14 480GB SATA SSD (Intel S4500, 6Gbps)
A Kubernetes cluster and a Hadoop cluster.
Remuneration
- Restaurant tickets
- Participation in one international conference
Conferences we have attended in the past include KubeCon, organized by the CNCF, the Open Source Summit from the Linux Foundation, and FOSDEM.
For any request for additional information and to submit your application, please contact David Worms:
Ideal skills
- Big Data
- Distributed Computing
- Kubernetes
- Hadoop
- Linux
- HDFS
- Kerberos
- OAuth
- Ansible
- Vagrant
- Git
- Gitlab
- Github
Professional experience
- Data Engineer
- Fullstack
- DevOps
Language skills
- English