As a Gauss Labs Software Engineer - Data, you will lead the architecture, design, and development of the data systems within our AI products for the semiconductor industry. You will work with other passionate and talented Software Engineers, AI Engineers, and Applied Scientists, and have opportunities to learn about various AI technologies and how they are applied to the semiconductor industry. You will have significant influence on our overall strategy by helping define product features, driving the data architecture, and spearheading the best practices that enable a quality product. You are the ideal candidate if you are passionate about new opportunities and have a demonstrated track record of delivering new features and products. A commitment to teamwork and strong communication skills with both business and technical partners are essential. Creating reliable, scalable, and high-performance products requires exceptional technical expertise, a sound understanding of the fundamentals of computer science, and practical experience building large-scale systems.
Responsibilities
- Design and develop the scalable data architecture for Gauss Labs' AI products, which include time series and image data for the semiconductor industry.
- Design and build data infrastructure systems, services, and tools to handle Gauss Labs' data-intensive products and business requirements, securely scaling over terabytes of data.
- Define the schemas, layout, storage format, and database technologies that will be used to store, retrieve and process the data for our products. Build databases, object stores, data warehouses, and lakes for time series, images, and structured data.
- Develop robust, well-instrumented, near-real-time stream processing data pipelines that can scale to handle future growth and adhere to SLAs. Design and develop stream processing features to execute various event-driven ETL and AI pipelines.
- Evolve Gauss Labs' data infrastructure and tools, and serve as technical lead for designing, building, and launching new data models and data pipelines within our products.
- Work closely with product/program managers to understand the product’s needs, business problems, and domain.
- Work cross-functionally with various engineering and data science teams to identify and resolve data-infrastructure challenges.
Key Qualifications
- BS/MS degree in Computer Science or Engineering, or strong industry experience in software development.
- 3+ years of experience as a hands-on Data Engineer/Architect including ETL jobs, data pipelines, and Big Data analytics.
- 5+ years of industry experience in building large-scale production systems.
- Startup spirit with the ability to be flexible and wear multiple hats.
- Proficient in large-scale data processing including batch and streaming, query engines, tooling, and storage formats.
- Experience working with Terabytes of data.
- Preferred experience in time series data storage and processing as well as image data processing.
- Experience in distributed systems design, common data platform architecture, and open-source data processing frameworks.
- Experience with technologies such as Hadoop, Spark, Kafka, Redis, Cassandra, Pandas, Dask, Airflow, Apache Beam, MongoDB, Hive, Impala, Hazelcast, Athena, and Presto.
- Experience in at least one modern programming language such as Python, Java, Go, or Rust, and proficiency in SQL.
- Experience architecting and implementing large operational data stores.
- Excellent verbal and written communication skills, able to collaborate cross-functionally.
- Plus: experience with Kubernetes, Kubeflow, Docker, and container technologies, as well as infrastructure-as-code and CI/CD technologies.