About Veryfi, Inc.
About the role
Skills: Machine Learning, Natural Language Processing, Data Warehousing
Veryfi is a YC-funded Silicon Valley startup that uses AI to understand documents like receipts and invoices. As a Data Engineer at Veryfi, you'll contribute to the evolution of our training data infrastructure and the development of new features and projects. You'll gather, process, and analyze diverse datasets to generate high-quality training data for our machine-learning models. Furthermore, by delving deep into our system, you'll have the autonomy to identify challenges and opportunities, taking ownership of developing solutions to refine existing tools and algorithms.
Key Responsibilities:
- Gather, process, and analyze diverse datasets to generate training data that fuels the development of our ML projects.
- Expand and optimize the training data pipelines to improve the speed and accuracy of our processes.
- Collaborate with a cross-functional team to define requirements and prioritize development efforts.
Essential Skills:
- Proficient in Python for data handling and processing, with experience using data science tools such as Pandas, NumPy, and SciPy.
- Strong analytical thinking with a focus on delivering results.
- Meticulous attention to detail, ensuring accuracy and precision in all data handling and processing tasks.
- Enthusiastic about learning and adapting to new technologies and methodologies, particularly in the realm of Machine Learning (ML).
- Innovation mindset, adept at challenging existing processes and driving positive change.
Preferred Qualifications:
- Familiarity with regex development, software engineering principles, and Linux command line tools.
- Experience with Natural Language Processing (NLP) techniques and libraries, including the use of Large Language Models (LLMs) and supervised learning for document data extraction.
- Effective organizational abilities, capable of managing projects independently from inception to completion.
- Exceptional verbal and written communication skills, effectively communicating problems, proposed solutions, and results to stakeholders in a multicultural environment.
- A Bachelor's degree in computer science, engineering, or a related field. Postgraduate studies are a plus but not required.
Keywords: NLP, Pattern Detection, Data Labeling, Software Development, Data Engineering.
Technology
(a) Native Mobile apps: Swift, Objective-C & Kotlin
(b) Backend: Python 3, TensorFlow, APIs on Django, Hub/Web on Flask
(c) IaaS: AWS with auto deploys to 4 geographies (read the deployment posts by Andrew here https://medium.com/the-road-to-silicon-valley)
(d) Database: Amazon Aurora