Apply Now: https://forms.gle/dgaCTEE4xezXHeu6A
Position: Python Developer (Web Scraping)
Location: Gurugram, Haryana
Type: Full-Time
About the Role
We are looking for an experienced Python Developer with a strong focus on web scraping to join our team. In this role, you will be responsible for extracting and processing data from websites using Python-based scraping tools and libraries. This position is ideal for a detail-oriented individual with expertise in building robust web scrapers and handling large data sets.
Key Responsibilities
- Web Scraping: Develop, maintain, and optimize Python-based web scrapers to extract data from various websites efficiently.
- Data Extraction: Scrape and process structured and unstructured data from multiple sources, ensuring accuracy and completeness.
- Automation: Design scripts to automate repetitive scraping tasks and schedule jobs using tools like cron, Airflow, or Celery.
- Data Storage: Store and manage scraped data in databases (SQL/NoSQL) or cloud storage solutions.
- Error Handling: Implement error-handling strategies for issues such as CAPTCHAs, IP blocking, and dynamically loaded content.
- Performance Optimization: Ensure scrapers are optimized for performance and can handle large-scale scraping without crashing or slowing down.
- Compliance: Follow web scraping best practices and ensure scraping activities comply with applicable laws and each website's terms of service.
- Collaboration: Work closely with data analysts, product managers, and other developers to understand data requirements and deliver high-quality results.
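To give a sense of the day-to-day work, here is a minimal sketch of the kind of scraper this role involves, using Requests and BeautifulSoup. The URL, HTML structure, and CSS selectors are hypothetical placeholders, not any specific target site.

```python
# Minimal scraper sketch: fetch a page, parse product listings out of it.
# The markup shape (div.product, h2.name, span.price) is an assumption
# for illustration only.
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, raising on HTTP errors."""
    response = requests.get(
        url, timeout=timeout, headers={"User-Agent": "example-scraper/0.1"}
    )
    response.raise_for_status()
    return response.text


def parse_products(html: str) -> list[dict]:
    """Extract product names and prices from listing markup."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select("div.product"):
        products.append({
            "name": item.select_one("h2.name").get_text(strip=True),
            "price": item.select_one("span.price").get_text(strip=True),
        })
    return products


if __name__ == "__main__":
    # Parsing is demonstrated on inline markup so the sketch runs offline.
    sample = """
    <div class="product"><h2 class="name">Widget</h2><span class="price">$9.99</span></div>
    <div class="product"><h2 class="name">Gadget</h2><span class="price">$19.99</span></div>
    """
    print(parse_products(sample))
```

In practice the fetch and parse steps stay separate, as above, so parsers can be unit-tested against saved HTML without hitting the network.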
Skills & Qualifications
- Proficiency in Python: Strong expertise in Python, with specific experience in libraries like BeautifulSoup, Scrapy, Selenium, and Requests.
- Web Scraping Tools: Familiarity with tools like Scrapy, BeautifulSoup, or Playwright for extracting data from both static and dynamic websites.
- APIs & Data Parsing: Experience working with RESTful APIs and parsing JSON, XML, and other data formats.
- Database Management: Knowledge of databases such as MySQL, PostgreSQL, MongoDB, or cloud databases for storing and processing scraped data.
- Problem-Solving: Ability to tackle challenges such as CAPTCHAs, proxies, and dynamic content.
- Version Control: Experience using Git for version control and collaboration.
- Performance Optimization: Familiarity with multithreading, asynchronous scraping (e.g., asyncio, aiohttp), and optimizing scrapers for speed and efficiency.
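The asynchronous-scraping pattern mentioned above can be sketched with asyncio and a semaphore to bound concurrency. The fetch coroutine here is a stand-in that simulates I/O latency; a real scraper would issue the request with aiohttp inside it.

```python
# Async scraping sketch: fan out over many URLs while capping how many
# requests are in flight at once.
import asyncio


async def fetch(url: str) -> str:
    # Placeholder for an aiohttp request; simulates network latency instead.
    await asyncio.sleep(0.01)
    return f"<html>content of {url}</html>"


async def fetch_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:  # at most max_concurrency requests in flight
            return await fetch(url)

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    pages = asyncio.run(fetch_all(urls))
    print(len(pages))
```

The semaphore is what keeps a large-scale crawl polite: without it, gather would open every connection simultaneously.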
Nice to Have
- Cloud Platforms: Experience with cloud services like AWS, GCP, or Azure for deploying and scaling scraping projects.
- Proxy Management: Understanding of rotating proxies and managing anti-scraping techniques.
- CI/CD Pipelines: Familiarity with CI/CD pipelines for automating testing and deployment.
- Web Technologies: Basic understanding of HTML, CSS, and JavaScript to handle complex web scraping scenarios.
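For candidates unfamiliar with the proxy-management point above, a simple round-robin rotation can be sketched as follows. The proxy addresses are made-up placeholders; real deployments would load a pool from a provider and combine rotation with backoff and User-Agent variation.

```python
# Proxy rotation sketch: cycle through a fixed pool, one proxy per request.
from itertools import cycle


class ProxyRotator:
    """Hand out proxies from a pool in round-robin order."""

    def __init__(self, proxies: list[str]):
        self._pool = cycle(proxies)

    def next_proxies(self) -> dict:
        """Return a requests-style proxies mapping for the next request."""
        proxy = next(self._pool)
        return {"http": proxy, "https": proxy}


rotator = ProxyRotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
# Each call yields the next proxy, e.g. for use as
# requests.get(url, proxies=rotator.next_proxies()).
```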