Job Summary

We are seeking a highly skilled Senior Platform Engineer – Data & AI to architect and build next‑generation AI‑native and Agentic platforms that power enterprise‑scale data, automation, and intelligent systems. This role goes beyond traditional data platforms to focus on Agentic AI ecosystems, including multi‑agent orchestration, agent lifecycle management, agent communication protocols, and AI‑driven platform automation.

You will design and operate a unified platform that supports:
- Data pipelines and real‑time streaming
- APIs and microservices
- GenAI and LLM‑powered applications
- Agentic workflows and multi‑agent systems

Working closely with AI/ML engineers, platform teams, SRE, and product teams, you will help build a scalable, observable, and governed AI platform on Google Cloud, leveraging automation, IaC, and modern cloud‑native patterns.

Responsibilities

Platform & Cloud Engineering
- Architect and build cloud‑native platforms on Google Cloud (GCP) supporting data, AI, and agentic workloads
- Design event‑driven architectures using Apache Kafka, Google Pub/Sub, or equivalent systems
- Build scalable microservices and APIs using modern frameworks (e.g., Java, Spring Boot)
- Develop and manage real‑time and batch data pipelines using Airflow, Dataform, Dataflow, Spark, or similar tools
- Implement Infrastructure‑as‑Code (IaC) using Terraform and Kubernetes for scalable, repeatable deployments
- Enable platform automation using CI/CD, GitOps, and self‑service frameworks
- Ensure platform scalability, reliability, and cost efficiency

Agentic Platform & Multi‑Agent Systems
- Design and build Agentic Platforms that support:
  - Agent lifecycle management
  - Task orchestration
  - Context and memory handling
- Develop and orchestrate multi‑agent systems using frameworks such as CrewAI, LangGraph, AutoGen, or equivalent
- Implement agent communication and coordination patterns across distributed systems
- Build and integrate:
  - Agent Gateway for managing agent interactions and routing
  - A2A (Agent‑to‑Agent) communication protocols
  - MCP (Model Context Protocol) or equivalent for context sharing and orchestration
  - ADK (Agent Development Kits) or internal frameworks for rapid agent development
- Enable use cases:
  - Autonomous pipeline monitoring and remediation
  - AI‑assisted platform operations
  - Intelligent workflow automation
  - Code and data pipeline generation

AI & GenAI Platform Engineering
- Integrate LLMs and GenAI services (e.g., OpenAI, Gemini, Claude) into platform workflows
- Build and support:
  - RAG pipelines and retrieval systems
  - Vector search and embedding architectures (Weaviate, Pinecone, FAISS)
- Enable AI‑driven automation for:
  - Platform operations
  - Data quality monitoring
  - Incident analysis and resolution
- Develop reusable AI platform services and APIs for enterprise consumption

Agent Observability & AI Operations
- Design and implement Agent Observability frameworks, including:
  - Agent execution tracing
  - Decision tracking and explainability
  - Latency and performance monitoring
  - Failure and retry analysis
- Integrate observability using tools such as:
  - OpenTelemetry, Prometheus, Grafana
  - AI/LLM observability tools (e.g., prompt tracing, evaluation frameworks)
- Enable end‑to‑end observability across data pipelines, APIs, and agent workflows

Data Architecture & Governance
- Lead initiatives in:
  - Data modeling and semantic layer design
  - Data cataloging and metadata management
  - Data quality and lineage tracking
- Implement governance frameworks using tools such as DataHub, Collibra, or equivalent
- Support data mesh and data fabric architectures for federated data ownership

Automation & Intelligent Platform Operations
- Build automation‑first platforms leveraging:
  - AI‑driven workflows
  - Self‑healing systems
  - Event‑driven automation
- Use GenAI to:
  - Automate operational tasks
  - Generate platform configurations and code
  - Enhance developer productivity
- Collaborate with SRE and Production Support teams to improve:
  - Reliability
  - Incident response
  - Operational efficiency

Engineering Enablement
- Develop platform SDKs, CLIs, and reusable blueprints
- Enable self‑service platform capabilities for engineering teams
- Standardize best practices for:
  - APIs
  - Data pipelines
  - Agent development
- Mentor engineers and promote a culture of innovation and continuous learning

Qualifications

Experience
- 8–12 years of experience in Platform Engineering, Data Engineering, Cloud Architecture, or AI Platform Engineering
- Proven experience building enterprise‑scale data and AI platforms

Core Technical Skills
- Strong programming expertise in Java, Python, Full‑Stack and SQL
- Experience building microservices and API‑driven architectures
- Deep understanding of distributed systems and cloud‑native design

Cloud & Platform Engineering
- Strong experience with Google Cloud Platform (GCP) (mandatory)
- Hands‑on experience with:
  - Kubernetes and containerized workloads
  - Terraform and Infrastructure‑as‑Code
  - CI/CD pipelines and GitOps

Streaming & Data Systems
- Experience with Kafka, Pub/Sub, Spark, Flink, or similar systems
- Strong background in real‑time and batch data processing

AI, GenAI & Agentic Systems
- Hands‑on experience with:
  - LLM frameworks and APIs
  - Multi‑agent orchestration frameworks (CrewAI, LangGraph, AutoGen, etc.)
  - RAG pipelines and vector databases
- Experience building or working with:
  - Agent Gateway architectures
  - A2A communication models
  - MCP or context‑sharing frameworks
  - Agent Development Kits (ADKs)

Full Stack & UI Development
- Experience building full stack applications with modern frontend frameworks (React, Angular, Vue.js)
- Strong understanding of REST/GraphQL APIs and UI integration patterns
- Experience with real‑time UI updates using WebSockets or streaming architectures
- Familiarity with design systems, UX principles, and responsive design
- Experience building platform dashboards, developer portals, or observability UIs

Observability & Reliability
- Experience with observability tools: Prometheus, Grafana, OpenTelemetry
- Strong debugging and system analysis skills
- Familiarity with AI/LLM observability and evaluation frameworks

Data Governance & Architecture
- Experience with:
  - Data catalogs and metadata platforms
  - Data quality and lineage frameworks
  - Semantic modeling and data governance

Preferred Qualifications
- Experience with Vertex AI, MLflow, Kubeflow, or ML platforms
- Prior implementation of data mesh or data fabric architectures
- Experience with Looker Modeler / LookML or semantic layers
- Exposure to AI safety, governance, and responsible AI practices
- Experience building enterprise AI/Agentic platforms at scale

Why You’ll Love This Role
- Work on cutting‑edge Agentic AI and multi‑agent systems
- Build AI‑native enterprise platforms at scale
- Drive innovation in automation, GenAI, and intelligent systems
- Collaborate with high‑impact teams across data, AI, and platform engineering
- Shape the future of AI‑driven enterprise architecture

Equinix Benefits

Equinix offers a comprehensive benefits package that includes health, dental, vision, life insurance, retirement plans, paid vacation, and paid holidays, as well as employee assistance and diversity and inclusion programs.

Equal Employment Opportunity

Equinix is an Equal Employment Opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, creed, national or ethnic origin, ancestry, place of birth, citizenship, sex, pregnancy, sexual orientation, gender identity or expression, marital or domestic partnership status, age, veteran or military status, disability, genetic information, political affiliation, or any other status protected by law.

Senior Platform Engineer – Data & AI
EQUINIX
Toronto, Toronto
Published 17 days ago