PySpark (Chennai, Hyderabad)

Full-time · Posted 3 weeks ago
Employment Information

5-6 years of total experience in data engineering or big data development.

  • 2-3 years of hands-on experience with Apache Spark.
  • Strong programming skills in PySpark, Python, and Scala.
  • 2+ years of experience in Scala backend development.
  • Proficient in Scala, including both object-oriented and functional programming concepts.
  • Deep understanding and application of advanced functional programming concepts such as monads, applicatives, type classes, and category theory.
  • Hands-on experience with Scala Typelevel libraries such as Cats and Shapeless, used to build applications with strong typing and efficient concurrency.
  • Solid understanding of data lakes, lakehouses, and Delta Lake concepts (a minimal PySpark/Delta sketch follows this list).
  • Experience in SQL development and performance tuning.
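
As a rough illustration of the Spark and Delta Lake skills above, here is a minimal PySpark sketch; the paths, column names, and table layout are hypothetical, and it assumes a Databricks-style runtime where the Delta format is available.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Assumes a Databricks-style runtime with Delta Lake support preconfigured.
    spark = SparkSession.builder.appName("orders-delta-sketch").getOrCreate()

    # Hypothetical raw input: JSON order events landed in S3.
    raw = spark.read.json("s3://example-bucket/raw/orders/")

    # Light cleanup: type the timestamp, derive a partition column, dedupe.
    orders = (
        raw.withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts"))
           .dropDuplicates(["order_id"])
    )

    # Write a partitioned Delta table (path is hypothetical).
    (orders.write.format("delta")
           .mode("overwrite")
           .partitionBy("order_date")
           .save("s3://example-bucket/lakehouse/orders"))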

Proficient in AWS cloud services (e.g., S3, Glue, Lambda, EMR, Redshift, CloudWatch, IAM).
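
As a small example of driving these services from code, the boto3 sketch below starts a Glue job and lists the S3 objects it would read; the job name, bucket, and prefix are all hypothetical.

    import boto3

    # Credentials and region are assumed to come from the environment.
    glue = boto3.client("glue")
    s3 = boto3.client("s3")

    # Start a Glue ETL job and check its run state (job name is hypothetical).
    run = glue.start_job_run(JobName="orders-etl")
    status = glue.get_job_run(JobName="orders-etl", RunId=run["JobRunId"])
    print(status["JobRun"]["JobRunState"])

    # List the input objects (bucket and prefix are hypothetical).
    resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="raw/orders/")
    for obj in resp.get("Contents", []):
        print(obj["Key"])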

  • Familiarity with Airflow, dbt, or similar orchestration tools is a plus (a minimal Airflow sketch follows this list).
  • Experience with CI/CD tools such as Jenkins, GitHub Actions, or AWS CodePipeline.
  • Knowledge of data security, governance, and compliance frameworks.
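
For the orchestration point above, a minimal Airflow DAG sketch; the DAG id, schedule, and task bodies are hypothetical placeholders, and it assumes Airflow 2.4+ where the schedule argument is available.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical task bodies standing in for real extract/load steps.
    def extract():
        print("pulling raw files")

    def load():
        print("writing to the lakehouse")

    with DAG(
        dag_id="orders_pipeline",          # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2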

Responsibilities:

Develop and maintain scalable data pipelines using Apache Spark on Databricks.

  • Build end-to-end ETL/ELT pipelines on AWS/GCP/Azure using services like S3, Glue, Lambda, EMR, and Step Functions.
  • Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality data solutions.
  • Design and implement data models, schemas, and Lakehouse architecture in Databricks/Snowflake.
  • Optimize and tune Spark jobs for performance and cost-efficiency (see the tuning sketch after this list).
  • Integrate data from multiple structured and unstructured data sources.
  • Monitor and manage data workflows, ensuring data quality, consistency, and security.
  • Follow CI/CD, version control (Git), and DevOps best practices for data applications.
  • Write clean, reusable, well-documented code using Python / PySpark / Scala.
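
To make the tuning responsibility concrete, here is a minimal PySpark sketch of two common optimizations, a broadcast join and partition coalescing before write; all paths and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    facts = spark.read.parquet("s3://example-bucket/lakehouse/orders")
    dims = spark.read.parquet("s3://example-bucket/lakehouse/products")

    # Broadcast the small side so the join avoids a full shuffle.
    joined = facts.join(F.broadcast(dims), "product_id")

    # Aggregate, then coalesce to limit small output files.
    daily = (
        joined.groupBy("order_date")
              .agg(F.sum("amount").alias("revenue"))
              .coalesce(8)
    )

    daily.write.mode("overwrite").parquet("s3://example-bucket/marts/daily_revenue")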