For additional information, please review .

**Responsibilities:**

* Operate end-to-end in the design, development, and implementation of robust big data solutions, ensuring optimal performance, scalability, data quality, and security.
* Collaborate closely within small, co-located squads (4-7 person teams), fostering high communication and low coordination overhead, to translate complex business requirements into technical specifications for big data processing and analytical solutions.
* Act as a player/coach within the team, mentoring junior members and leading by example in the development of efficient and innovative big data architectures.
* Design, develop, and optimize large-scale data pipelines using PySpark for data ingestion, transformation, and aggregation, always with an eye towards efficiency and domain relevance.
* Implement and manage real-time data streaming and event-driven architectures using technologies like Apache Kafka.
* Design and implement sophisticated data warehousing solutions and dimensional models for efficient data storage and retrieval, ensuring alignment with business needs.
* Work with various distributed data storage technologies, including distributed file systems (e.g., HDFS, S3) and NoSQL databases (e.g., MongoDB, Cassandra), selecting the right tool for the right problem.
* Implement efficient data processing and storage strategies to optimize the performance and scalability of big data applications, with a strong focus on the "why" behind the technology choices.
* Champion best practices in software development, including rigorous code reviews, implementing comprehensive testing, and supporting continuous integration and continuous deployment (CI/CD) pipelines.
* Demonstrate high autonomy and agency in driving projects forward, making informed decisions, and proactively identifying areas for improvement.
* Proactively leverage and contribute to the development of AI-powered development tools, including internal Citi AI tools like Copilot, Claude Code, Codex, and Antigravity, to significantly enhance productivity, code quality, and accelerate development cycles.
* Lead technical discussions and contribute strategically to the evolution of our big data technology stack, always seeking innovative approaches.
* Troubleshoot and resolve complex technical issues within big data environments, demonstrating strong analytical and problem-solving skills.

**Required Skills & Experience:**

* **Experience:** 6+ years of extensive, hands-on experience as a Senior Big Data Developer, with a strong emphasis on **PySpark** and the Apache Spark ecosystem, operating as a player/coach.
* **Programming Languages:**
  + Expert proficiency in Python, with a proven track record of developing robust, scalable, and high-performance PySpark applications for large-scale data processing.
* **Big Data Frameworks/Technologies:**
  + Deep understanding and extensive hands-on experience with Apache Spark (Spark Core, Spark SQL, Spark Streaming) and its ecosystem.
  + Experience with distributed computing frameworks such as Hadoop (HDFS, YARN).
* **Data Storage & Warehousing:**
  + Expert proficiency in SQL and extensive experience with data warehousing concepts and technologies (e.g., Hive, Snowflake, Redshift, Databricks SQL).
  + Proven experience with various data storage formats (e.g., Parquet, ORC, Avro) and data lake solutions (e.g., Delta Lake, Iceberg).
  + Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase) is a significant plus.
* **Messaging & Event Streaming:**
  + Strong experience with Apache Kafka for building real-time data pipelines and event-driven architectures.
* **Cloud Platforms:**
  + Demonstrated experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift, Azure Databricks/Data Factory/Synapse, GCP Dataflow/Dataproc/BigQuery) is highly desirable.
* **AI-Powered Development & Productivity:**
  + **Proven effectiveness with AI coding tools (e.g., Claude Code, Codex, Antigravity) is a mandatory requirement.**
  + A strong "AI-first thinker" mindset, demonstrating how to leverage and integrate AI tools into the development workflow for continuous improvement.
  + Experience with or a strong willingness to actively explore and implement other AI-powered tools to optimize big data development processes.
* **Domain Understanding:**
  + Strong ability to articulate the functional domain being worked in, understanding the business context, and explaining "why" the technical solutions matter.
* **Other Essential Skills:**
  + Advanced understanding of data structures, algorithms, and performance optimization techniques for large-scale distributed data processing.
  + Experience with RESTful API design and development for data ingestion or exposure points.
  + Familiarity with containerization technologies (e.g., Docker, Kubernetes) for deploying and managing big data applications.
  + Expert proficiency with version control systems, especially Git, and advanced branching strategies.
  + Exceptional problem-solving, analytical, and debugging skills in highly complex, distributed big data environments.
  + Superior communication and interpersonal skills, with a proven ability to work effectively and autonomously within small, high-performing teams, and to mentor others.
  + Demonstrated high autonomy and agency in tackling complex challenges and delivering impactful solutions.

**Education:**

* Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related quantitative field is required. Equivalent practical experience with a demonstrable track record of excellence will also be considered.

We are building an **A-team** of highly skilled and autonomous engineers, and we are seeking an exceptional PySpark Big Data Senior Developer to join our dynamic and focused squads. This role is for a hands-on player/coach who thrives in a high-autonomy environment, is deeply committed to leveraging AI for maximum productivity, and possesses a profound understanding of the functional domains our work impacts. The ideal candidate will be instrumental in designing, developing, and optimizing large-scale data processing solutions using PySpark and cutting-edge big data technologies. We are looking for an **AI-first thinker** who can raise the bar, coach others, and strategically contribute to our evolving technology landscape.
PySpark Big Data Senior Developer - Vice President
CITIBANK (SWITZERLAND) AG
Mississauga
Published 27 days ago