Primary Skill:
• Minimum 5 years of experience in the Big Data ecosystem
• Spark (either PySpark or Scala; the customer is open to both languages)
• Hive
• Experience in data ingestion using Spark, including data processing, bucketing, and partitioning
JD:
Ability to clean, aggregate, and organize data from disparate sources and load it into an Azure data lake for a Data Consumer or Data Scientist to consume.
o Must Have:
- Data Transformation / Processing: Spark (Either PySpark or Scala)
• Transform data using techniques such as filtering, grouping, and aggregation to translate raw data into analysis-ready datasets.
- Scripting: Python for data tasks
- Query: PL/SQL, Hive
o Good to Have:
- Ingest: Informatica (PowerCenter is sufficient)
- Processing: Databricks Spark Engine and Databricks DB.
- Storage: Azure Data Storage such as Blob, ADLS Gen2
- Data Scheduling: Airflow
- Visualization: Tableau
- Knowledge of Data Governance and CI/CD