Apply Now: https://forms.gle/dgaCTEE4xezXHeu6A
Position: Python Developer (Web Scraping)
Location: Gurugram, Haryana
Type: Full-Time
About the Role
We are looking for an experienced Python Developer with a strong focus on web scraping to join our team. In this role, you will be responsible for extracting and processing data from websites using Python-based scraping tools and libraries. This position is ideal for a detail-oriented individual with expertise in building robust web scrapers and handling large data sets.
Key Responsibilities
- Web Scraping: Develop, maintain, and optimize Python-based web scrapers to extract data from various websites efficiently.
- Data Extraction: Scrape and process structured and unstructured data from multiple sources, ensuring accuracy and completeness.
- Automation: Design scripts to automate repetitive scraping tasks and schedule jobs using tools like cron, Airflow, or Celery.
- Data Storage: Store and manage scraped data in databases (SQL/NoSQL) or cloud storage solutions.
- Error Handling: Implement error-handling strategies for issues such as CAPTCHAs, IP blocking, and dynamically loaded content.
- Performance Optimization: Ensure scrapers are optimized for performance and can handle large-scale scraping without crashing or slowing down.
- Compliance: Follow web scraping best practices and ensure scraping activities comply with applicable laws and each website's terms of service.
- Collaboration: Work closely with data analysts, product managers, and other developers to understand data requirements and deliver high-quality results.
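To give a sense of the day-to-day work, here is a minimal sketch of the kind of scraper this role involves, using Requests and BeautifulSoup. The URL, HTML structure, and CSS selectors are hypothetical placeholders, not any specific target site.

```python
# Minimal scraper sketch: fetch a page, parse product listings out of it.
# The markup shape (div.product, h2.name, span.price) is an assumption
# for illustration only.
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, raising on HTTP errors."""
    response = requests.get(
        url, timeout=timeout, headers={"User-Agent": "example-scraper/0.1"}
    )
    response.raise_for_status()
    return response.text


def parse_products(html: str) -> list[dict]:
    """Extract product names and prices from listing markup."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select("div.product"):
        products.append({
            "name": item.select_one("h2.name").get_text(strip=True),
            "price": item.select_one("span.price").get_text(strip=True),
        })
    return products


if __name__ == "__main__":
    # Parsing is demonstrated on inline markup so the sketch runs offline.
    sample = """
    <div class="product"><h2 class="name">Widget</h2><span class="price">$9.99</span></div>
    <div class="product"><h2 class="name">Gadget</h2><span class="price">$19.99</span></div>
    """
    print(parse_products(sample))
```

In practice the fetch and parse steps stay separate, as above, so parsers can be unit-tested against saved HTML without hitting the network.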
Skills & Qualifications
- Proficiency in Python: Strong expertise in Python, with specific experience in libraries like BeautifulSoup, Scrapy, Selenium, and Requests.
- Web Scraping Tools: Familiarity with tools like Scrapy, BeautifulSoup, or Playwright for extracting data from both static and dynamic websites.
- APIs & Data Parsing: Experience working with RESTful APIs and parsing JSON, XML, and other data formats.
- Database Management: Knowledge of databases such as MySQL, PostgreSQL, MongoDB, or cloud databases for storing and processing scraped data.
- Problem-Solving: Ability to tackle challenges such as CAPTCHAs, proxies, and dynamic content.
- Version Control: Experience using Git for version control and collaboration.
- Performance Optimization: Familiarity with multithreading, asynchronous scraping (e.g., asyncio, aiohttp), and optimizing scrapers for speed and efficiency.
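The asynchronous-scraping pattern mentioned above can be sketched with asyncio and a semaphore to bound concurrency. The fetch coroutine here is a stand-in that simulates I/O latency; a real scraper would issue the request with aiohttp inside it.

```python
# Async scraping sketch: fan out over many URLs while capping how many
# requests are in flight at once.
import asyncio


async def fetch(url: str) -> str:
    # Placeholder for an aiohttp request; simulates network latency instead.
    await asyncio.sleep(0.01)
    return f"<html>content of {url}</html>"


async def fetch_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:  # at most max_concurrency requests in flight
            return await fetch(url)

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    pages = asyncio.run(fetch_all(urls))
    print(len(pages))
```

The semaphore is what keeps a large-scale crawl polite: without it, gather would open every connection simultaneously.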
Nice to Have
- Cloud Platforms: Experience with cloud services like AWS, GCP, or Azure for deploying and scaling scraping projects.
- Proxy Management: Understanding of rotating proxies and managing anti-scraping techniques.
- CI/CD Pipelines: Familiarity with CI/CD pipelines for automating testing and deployment.
- Web Technologies: Basic understanding of HTML, CSS, and JavaScript to handle complex web scraping scenarios.
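For candidates unfamiliar with the proxy-management point above, a simple round-robin rotation can be sketched as follows. The proxy addresses are made-up placeholders; real deployments would load a pool from a provider and combine rotation with backoff and User-Agent variation.

```python
# Proxy rotation sketch: cycle through a fixed pool, one proxy per request.
from itertools import cycle


class ProxyRotator:
    """Hand out proxies from a pool in round-robin order."""

    def __init__(self, proxies: list[str]):
        self._pool = cycle(proxies)

    def next_proxies(self) -> dict:
        """Return a requests-style proxies mapping for the next request."""
        proxy = next(self._pool)
        return {"http": proxy, "https": proxy}


rotator = ProxyRotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
# Each call yields the next proxy, e.g. for use as
# requests.get(url, proxies=rotator.next_proxies()).
```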