NorthBay Solutions is a US-based Premier AWS Partner and the only Premier-tier partner in the region. We have served clients globally for over a decade, delivering AWS Big Data, Data Lake, and Data Warehousing solutions.
Big Data
We are currently recruiting a Tech Lead/Associate Architect with solid experience in data engineering and a passion for Big Data. The ideal candidate will have over 10 years of experience, including 7 years specifically in data engineering.
You will have the chance to join a global, multicultural team of Data Engineers, contribute to its growth and development, participate in a wide variety of exciting and challenging projects, and work with our exceptional team of technology professionals.
Technology Stack Used & Required Knowledge:
• Must have experience as an ETL Data Engineer with Python, Java, and Big Data technologies (Hadoop, Spark, Hive).
• Exceptional understanding of the ETL cycle.
• Experience in creating and driving large-scale ETL pipelines in an AWS-based environment, and in integrating data from multiple data sources.
• Overall 8+ years of relevant work experience in big data engineering, ETL, Data Modeling, and Data Architecture.
• Experience with DevOps and Continuous Integration/Delivery (CI/CD) concepts and tools such as Bitbucket, Bamboo, Ansible, Sonar, and the Atlassian tool suite.
• Strong software development and programming skills with a focus on data, using Python/PySpark and/or Scala for data engineering (see the sketch after this list).
• Experience with and understanding of core AWS services such as IAM, CloudFormation, EC2, S3, EMR/Spark, Glue, Lambda, Athena, and Redshift is a plus.
• Understanding of data types and the handling of different data models.
• Experience with data management platforms such as AWS-based data lakes, Databricks, or Snowflake is a plus.
• Understanding of descriptive and exploratory statistics, predictive modelling, evaluation metrics, decision trees, and machine learning algorithms is a plus.
• Good scripting and programming skills.
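To make the PySpark expectation above concrete, here is a minimal ETL sketch of the kind of pipeline this role builds. It is a sketch under assumptions, not a reference implementation: the bucket paths and column names (order_ts, status) are hypothetical placeholders, and it assumes a Spark environment (such as AWS Glue or EMR) with S3 access already configured.

```python
# Minimal PySpark ETL sketch. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV files landed in S3.
raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/orders/")

# Transform: normalize types, drop incomplete rows, derive a partition column.
cleaned = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("status").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet for downstream Athena/Redshift queries.
(cleaned.write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders/"))
```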
Responsibilities & Required Skills:
• Design and develop solutions using Big Data technologies.
• Use innovative Big Data and Cloud technologies to build critical, highly complex distributed systems from scratch.
• Design, develop, and manage complex ETL jobs and pipelines.
• Function as a critical thought leader, consulting for and mentoring other teams as your new systems transform the way the entire firm leverages data.
• Deploy data models into production environments: feed the model with data stored in a warehouse or coming directly from sources, configure data attributes, manage computing resources, set up monitoring tools, etc.
• Mentor and train colleagues where necessary, helping them learn and improve their skills, and innovate and iterate on best practices.
• Solve complex issues and provide guidance to team members when needed.
• Make improvements and process recommendations that have an impact on the business.
• Lead and organize teams of various sizes.
• Client-facing skills: strong speaking and consultation skills.
• Flexibility in using different technologies/platforms.
• Ability to shift gears quickly and cope with change.
• Analytical and meticulous.
• Confident and initiative-taking.
• Disciplined, with a “team first” approach.
Checklist: Can you answer YES?
• Have you coded PySpark programs in production using AWS Glue/AWS EMR?
• Are you aware of how to handle CDC (change data capture) data in a Data Lake?
• Do you know about Delta Lake? (A minimal CDC merge sketch follows this checklist.)
• Do you know about Data Lake Architectures including Lambda and Kappa?
• Have you created data lakes in AWS having Terabytes and Petabytes of data?
• Are you familiar with Data Governance?
• Are you familiar with streaming architectures?
• Would you consider yourself an expert at Spark technology?
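For candidates wondering what the CDC and Delta Lake questions point at, here is a minimal sketch of applying a CDC batch to a Delta Lake table with MERGE. The table path, key column (order_id), and the op column encoding insert/update/delete (I/U/D) are hypothetical, and it assumes the delta-spark package is available to the Spark session.

```python
# Minimal sketch: apply CDC records to a Delta Lake table via MERGE.
# Paths, the key column, and the "op" column are hypothetical placeholders.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (SparkSession.builder.appName("cdc-merge")
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Incoming CDC batch: one row per change, tagged I (insert), U (update), D (delete).
changes = spark.read.parquet("s3://example-bucket/cdc/orders/")

target = DeltaTable.forPath(spark, "s3://example-bucket/lake/orders/")

# Apply all three change types in a single atomic MERGE.
(target.alias("t")
    .merge(changes.alias("c"), "t.order_id = c.order_id")
    .whenMatchedDelete(condition="c.op = 'D'")
    .whenMatchedUpdateAll(condition="c.op = 'U'")
    .whenNotMatchedInsertAll(condition="c.op = 'I'")
    .execute())
```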