*About Us:*
We are redefining social media through cutting-edge AI and transformative user experiences. Our platform represents the next evolution in how people connect, share, and engage online. We're building technology that resonates deeply with Gen Z and future generations, combining innovative AI solutions with intuitive user experiences that challenge the status quo of traditional social media.
Position Overview:
We're seeking an exceptional Senior Data Engineer to architect and scale the data infrastructure powering our AI-driven social platform. In this role, you'll build robust, scalable systems that process massive amounts of user interaction data, power real-time AI features, and enable deep social analytics. You'll work at the intersection of big data and social networking, creating the foundation for next-generation social experiences.
Key Responsibilities:
- Design and implement high-performance data pipelines that power our AI features and user recommendations using Apache Airflow and Databricks
- Build and maintain cloud infrastructure on AWS that scales with our rapidly growing user base
- Develop real-time data processing systems to support interactive social features and instant content delivery
- Create efficient data structures and pipelines that enable sophisticated user behavior analysis
- Implement robust data security measures to protect user privacy and ensure compliance
- Collaborate with AI/ML teams to optimize data infrastructure for machine learning workflows
- Design and maintain metrics collection systems for user engagement and platform performance
- Lead technical discussions and mentor junior engineers in data engineering best practices
Experience:
- Minimum of 5 years of experience in data engineering, with a proven track record of building and maintaining production data platforms
- Demonstrated experience with large-scale social or consumer applications processing petabyte-scale data
Technical Expertise:
- Advanced proficiency in Python for data engineering, with experience in optimization and performance tuning
- Expert-level SQL knowledge, including complex query optimization and data modeling
- Strong understanding of distributed computing principles and big data architectures
Cloud & Infrastructure:
- Extensive experience with AWS services, specifically:
  - Data Storage: S3, EFS, FSx
  - Compute: EMR, ECS/EKS
  - Analytics: Redshift, Athena
  - Streaming: Kinesis, MSK
- Proven expertise in Terraform for infrastructure automation
- Experience managing production Kubernetes clusters with 10+ nodes
Data Processing & Orchestration:
- Deep expertise in Databricks ecosystem:
  - Delta Lake optimization
  - Spark performance tuning
  - Unity Catalog management
- Advanced Apache Airflow knowledge:
  - Custom operator development
  - DAG optimization
  - Multi-cluster deployment
- Experience with real-time stream processing using Kafka/Kinesis