Welcome to the PySelenScrapr documentation!
PySelenScrapr is a Python package for scraping data from websites with Selenium. It wraps the Selenium package to make scraping safer and easier.
The main features of PySelenScrapr are:
- Safe: Each step has a retry and validate mechanism that makes the scraping process safer.
- Easy: The package is easy to use and has a simplified API.
- Retry Mechanism: A step that fails can be retried.
- Validate Mechanism: The result of a step can be validated.
- Data Binding: The result can be bound to a webhook or other services.
- Screenshot Mechanism: A screenshot of the page can be taken at any step.
- Error Handling: Errors can be handled more efficiently and sent to your backend.
- Docker Support: Combined with the Docker template at https://github.com/donnercody/python-selenium-scraper-template, you can run the script on a schedule with a user interface.
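The retry and validate features listed above can be illustrated with a small generic sketch. This is not the PySelenScrapr API: `run_step`, `StepFailed`, and `flaky_fetch` are hypothetical names made up for this example, and the sketch uses plain Python without Selenium so that it runs anywhere. It only shows the general idea of repeating an action until its result passes a validation check.

```python
import time


class StepFailed(Exception):
    """Raised when a step still fails after all retries (hypothetical name)."""


def run_step(action, validate, retries=3, delay=0.0):
    """Run `action` until `validate(result)` is truthy, at most `retries` times.

    Generic illustration only; this is not how PySelenScrapr is implemented.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            result = action()
            if validate(result):
                return result
            last_error = ValueError(f"validation failed on attempt {attempt}")
        except Exception as exc:  # any error from the action counts as a failed attempt
            last_error = exc
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    raise StepFailed(f"step failed after {retries} attempts") from last_error


# Hypothetical action that fails twice before it succeeds,
# standing in for a page element that is not ready yet.
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("page not ready")
    return "Example Domain"


title = run_step(flaky_fetch, validate=lambda text: bool(text.strip()))
print(title, attempts["count"])
```

In a real scraper the action would typically be a Selenium call such as locating an element, and the validation would check the extracted text or element state. See the API Reference for the package's own step, retry, and validation classes.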
Check out the Usage section for further information, including how to install the project.
Note
This project is under active development.
Contents:
API Reference
- pyselenscrapr package
- Submodules
- pyselenscrapr.ScrapingBackend module
- pyselenscrapr.ScrapingBot module
  - TakeScreenshotModes
  - ScrapingBot
    - ScrapingBot.__init__()
    - ScrapingBot.group_name()
    - ScrapingBot.backend_notify()
    - ScrapingBot.set_warning_handler()
    - ScrapingBot.set_exception_handler()
    - ScrapingBot.take_screenshot_on_error()
    - ScrapingBot.add_step_group()
    - ScrapingBot.add_step()
    - ScrapingBot.sleep()
    - ScrapingBot.all_groups_executed()
    - ScrapingBot.get_next_step()
    - ScrapingBot.set_current_group()
    - ScrapingBot.finished()
    - ScrapingBot.get_all_steps_by_interval()
    - ScrapingBot.get_next_group()
    - ScrapingBot.run()
    - ScrapingBot.get_converted_data()
    - ScrapingBot.save_backend_data()
    - ScrapingBot.send_error_to_backend()
    - ScrapingBot.send_data_to_backend()
    - ScrapingBot.__annotations__
    - ScrapingBot.__dict__
    - ScrapingBot.__module__
    - ScrapingBot.set_data()
    - ScrapingBot.append_data()
    - ScrapingBot.has_data()
    - ScrapingBot.get_data()
    - ScrapingBot.get_task_log()
- pyselenscrapr.ScrapingLogic module
  - tocontainer()
  - ScrapingLogic
    - ScrapingLogic.__init__()
    - ScrapingLogic.__getitem__()
    - ScrapingLogic.__getattr__()
    - ScrapingLogic.__repr__()
    - ScrapingLogic.sleep()
    - ScrapingLogic.replace_input_text()
    - ScrapingLogic.clear_input_text()
    - ScrapingLogic.send_keys_to_element()
    - ScrapingLogic.is_visible()
    - ScrapingLogic.set_data()
    - ScrapingLogic.take_screenshot()
    - ScrapingLogic.append_data()
    - ScrapingLogic.send_data_to_backend()
    - ScrapingLogic.has_data()
    - ScrapingLogic.get_data()
    - ScrapingLogic.notify()
    - ScrapingLogic.convert_table_to_df()
    - ScrapingLogic.get_number_of_content()
    - ScrapingLogic.convert_tables_to_df()
    - ScrapingLogic.get_all_elements()
    - ScrapingLogic.wait_for_reload()
    - ScrapingLogic.get_best_element()
    - ScrapingLogic.element_count()
    - ScrapingLogic.element_text()
    - ScrapingLogic.wait_until_present()
    - ScrapingLogic.wait_until_clickable()
    - ScrapingLogic.element_exists()
    - ScrapingLogic.set_attribute()
    - ScrapingLogic.scroll_to_element()
    - ScrapingLogic.click_on_element_by_xpath_with_jquery()
    - ScrapingLogic.inner_text_contains()
    - ScrapingLogic.click_on_best_element()
    - ScrapingLogic.click_by_jquery_on_node()
    - ScrapingLogic.scroll_to_element_by_js()
    - ScrapingLogic.__dict__
    - ScrapingLogic.__module__
- pyselenscrapr.ScrapingStep module
  - ScrapingStepErrorHandling
  - ScrapingStepInterval
    - ScrapingStepInterval.Order
    - ScrapingStepInterval.BeforeAnyStep
    - ScrapingStepInterval.AfterAnyStep
    - ScrapingStepInterval.BeforeValidation
    - ScrapingStepInterval.AfterValidation
    - ScrapingStepInterval.BeforeRetry
    - ScrapingStepInterval.AfterRetry
    - ScrapingStepInterval.BeforePagination
    - ScrapingStepInterval.AfterPagination
    - ScrapingStepInterval.__dict__
    - ScrapingStepInterval.__module__
  - ScrapingStepRepeat
  - IScrapingStep
  - ScrapingStep
    - ScrapingStep.childGroups
    - ScrapingStep.robot
    - ScrapingStep.__init__()
    - ScrapingStep.__str__()
    - ScrapingStep.name()
    - ScrapingStep.interval()
    - ScrapingStep.next_step()
    - ScrapingStep.raise_exception()
    - ScrapingStep.set_previous_step()
    - ScrapingStep.add_child_group()
    - ScrapingStep.previous_step()
    - ScrapingStep.can_execute()
    - ScrapingStep.was_executed()
    - ScrapingStep.execute()
    - ScrapingStep.set_executed()
    - ScrapingStep.before_validation()
    - ScrapingStep.reset()
    - ScrapingStep.log()
    - ScrapingStep.retry()
    - ScrapingStep.__annotations__
    - ScrapingStep.__module__
    - ScrapingStep.exit_bot_when_errored()
    - ScrapingStep.can_retry()
    - ScrapingStep.is_executed()
    - ScrapingStep.error_handling()
  - ScrapingStepConditional
- pyselenscrapr.ScrapingStepGroup module
- pyselenscrapr.ScrapingStepLoop module
- pyselenscrapr.ScrapingStepPagination module
- pyselenscrapr.ValidationError module
- Module contents