pyselenscrapr package
Submodules
pyselenscrapr.ScrapingBackend module
- class pyselenscrapr.ScrapingBackend.IScrapingBackend
Bases: ABC
This is an interface that represents a backend for the scraping process.
- saveData
- errorHandling
- notify
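A minimal sketch of a custom backend that keeps everything in memory, for example in tests. The method names saveData, errorHandling and notify come from the class dictionary; their parameters are not shown in this reference, so the single-argument signatures below are assumptions. To keep the sketch self-contained it subclasses a local stand-in for IScrapingBackend. In real code you would subclass pyselenscrapr.ScrapingBackend.IScrapingBackend instead.

```python
from abc import ABC


class IScrapingBackend(ABC):
    """Local stand-in for pyselenscrapr.ScrapingBackend.IScrapingBackend.

    Like the documented class, it declares no abstract methods.
    The one-argument signatures are assumptions.
    """

    def saveData(self, data):
        pass

    def errorHandling(self, error):
        pass

    def notify(self, message):
        pass


class InMemoryBackend(IScrapingBackend):
    """Hypothetical backend that records everything it receives."""

    def __init__(self):
        self.saved = []
        self.errors = []
        self.messages = []

    def saveData(self, data):
        # Keep every payload in arrival order.
        self.saved.append(data)

    def errorHandling(self, error):
        # Store the error as text so it can be inspected later.
        self.errors.append(str(error))

    def notify(self, message):
        self.messages.append(message)


backend = InMemoryBackend()
backend.saveData({"title": "Example"})
backend.errorHandling(ValueError("element not found"))
backend.notify("group 'default' finished")
print(backend.saved, backend.errors, backend.messages)
```

An instance of such a class is what the backend argument of ScrapingBot expects (typed IScrapingBackend | None).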
- class pyselenscrapr.ScrapingBackend.ScrapingBackendWebhook(url, error_route='/error', notify_route='/notify', data_route='/data')
Bases: IScrapingBackend
This class is used to send the data you scraped to a webhook.
- __init__(url, error_route='/error', notify_route='/notify', data_route='/data')
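A sketch of how the constructor arguments might combine into a webhook request, using only the standard library. It assumes each route is appended to the base url and that payloads are sent as JSON with POST; this reference does not document the actual transport, and build_webhook_request is a hypothetical helper, not part of pyselenscrapr. The request is only built here, not sent.

```python
import json
import urllib.request


def build_webhook_request(url, route, payload):
    """Hypothetical helper: build (but do not send) a JSON POST to url + route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url.rstrip("/") + route,  # assumption: route is appended to the base url
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "/data" mirrors the default data_route of ScrapingBackendWebhook.
req = build_webhook_request("https://example.com/hooks/", "/data", {"price": 9.99})
print(req.full_url, req.get_method())
# urllib.request.urlopen(req) would actually send the request.
```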
pyselenscrapr.ScrapingBot module
- class pyselenscrapr.ScrapingBot.TakeScreenshotModes
Bases: object
This enum defines when the bot should take a screenshot of the current page.
- OnError = 2
- Always = 1
- Never = 0
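The modes are plain integer constants, so they are compared with ==. A sketch of the decision the names imply; should_take_screenshot is a hypothetical helper, and a local stand-in class with the documented values replaces the import so the example runs on its own.

```python
class TakeScreenshotModes:
    """Stand-in with the same values as pyselenscrapr.ScrapingBot.TakeScreenshotModes."""
    Never = 0
    Always = 1
    OnError = 2


def should_take_screenshot(mode, step_failed):
    """Hypothetical helper: decide whether to capture the page after a step."""
    if mode == TakeScreenshotModes.Always:
        return True
    if mode == TakeScreenshotModes.OnError:
        return step_failed
    return False  # Never, or an unrecognized value
```

The chosen value is passed to ScrapingBot as take_screenshots_mode, whose default 0 corresponds to Never.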
- class pyselenscrapr.ScrapingBot.ScrapingBot(driver, max_retries=3, take_screenshots_mode=0, backend: IScrapingBackend | None = None, repeat_count_till_error=5)
Bases: object
The ScrapingBot class is the main class for creating a scraping bot. It defines the steps, and groups of steps, that the bot executes. Start the bot by calling the run() method.
- __init__(driver, max_retries=3, take_screenshots_mode=0, backend: IScrapingBackend | None = None, repeat_count_till_error=5)
- group_name()
- backend_notify(message)
- set_warning_handler(param)
- set_exception_handler(param)
- take_screenshot_on_error(path)
- add_step(step_or_callback: ScrapingStep, step_group: ScrapingStepGroup | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, after_validation: Callable[[ScrapingLogic], None] | None = None) → ScrapingStep
- sleep(seconds)
- all_groups_executed()
- get_next_step(step)
- set_current_group(group)
- finished()
- get_all_steps_by_interval(interval)
- get_next_group()
- run(first_group: str | ScrapingStepGroup | None = None)
Run the bot and execute all steps in the defined groups.
- Parameters:
first_group – the first group to run, given by name or as a ScrapingStepGroup. If None, the group named “default” is used.
- Returns:
True if the bot finished successfully, False otherwise.
- get_converted_data(data)
- save_backend_data(data)
- send_error_to_backend(error)
- send_data_to_backend(key=None, data=None)
- add_step_group
- set_data(key, value, send_to_backend=False)
- append_data(key, value, send_to_backend=False)
- has_data(key)
- get_data(key)
- get_task_log()
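A toy model of the contract documented for run(): start with first_group (falling back to "default"), execute the steps, and return True on success or False otherwise. Running the remaining groups in definition order and retrying a failing step up to max_retries times are assumptions based on the constructor arguments; the real bot also handles intervals, screenshots and backends, none of which is modeled. run_groups is hypothetical and accepts only group names, not ScrapingStepGroup objects.

```python
def run_groups(groups, first_group=None, max_retries=3):
    """Toy runner: groups maps a group name to a list of zero-argument callables.

    Each callable returns True on success. A step that still fails after
    max_retries retries makes the whole run return False.
    """
    start = first_group if first_group is not None else "default"
    # Assumption: the start group runs first, then the others in definition order.
    order = [start] + [name for name in groups if name != start]
    for name in order:
        for step in groups.get(name, []):
            retries = 0
            while not step():
                retries += 1
                if retries > max_retries:
                    return False
    return True


log = []
ok = run_groups({
    "default": [lambda: log.append("login") or True],
    "details": [lambda: log.append("scrape") or True],
})
print(ok, log)  # True ['login', 'scrape']
```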
pyselenscrapr.ScrapingLogic module
- pyselenscrapr.ScrapingLogic.tocontainer(func, bot)
- class pyselenscrapr.ScrapingLogic.ScrapingLogic(driver, bot)
Bases: object
ScrapingLogic is the main interface between the bot and the Selenium driver. It provides helper functions that make it easier to interact with the driver.
You can call all driver functions on this object, as well as the functions in this class.
- __init__(driver, bot)
Constructor for ScrapingLogic. It is called by the bot.
- Parameters:
driver – the selenium driver
bot – the bot that is using the driver
- __getitem__(item)
- __getattr__(item)
- __repr__()
Return repr(self).
- sleep(seconds)
Sleep for the given number of seconds.
- Parameters:
seconds – the number of seconds to sleep
- Returns:
None
- replace_input_text(selector, keys)
Replace the text in an input field. It first scrolls to the element, then selects all text in the input field, and then sends the new keys to it.
- Parameters:
selector – CSS or XPATH selector
keys – the new text that should be in the input field
- Returns:
True if the operation was successful, False otherwise
- clear_input_text(selector)
Clear the text in an input field. It first scrolls to the element, then selects all text in the input field, and then sends the DELETE key to it.
- Parameters:
selector – CSS or XPATH selector
- Returns:
True if the operation was successful, False otherwise
- send_keys_to_element(selector, keys)
Send keys to the element matched by a CSS or XPath selector.
- Parameters:
selector – CSS or XPATH selector
keys – the keys that should be sent to the element
- Returns:
True if the operation was successful, False otherwise
- is_visible(selector)
- set_data(key, value, send_to_backend=False)
- take_screenshot(step)
- append_data(key, value, send_to_backend=False)
- send_data_to_backend(key=None, data=None)
- has_data(key)
- get_data(key)
- notify(message)
- convert_table_to_df(t)
- get_number_of_content(selector)
- convert_tables_to_df(tables)
- get_all_elements(selector)
- wait_for_reload(timeout=40, min_wait=0.1)
- get_best_element(selector)
- element_count(selector)
- element_text(selector)
Get the text of the element matched by the selector, if the element is available.
- Parameters:
selector – CSS or XPATH selector
- Returns:
the text of the element
- wait_until_present(selector, timeout=20)
Wait until an element matching the selector is present in the DOM.
- Parameters:
selector – CSS or XPATH selector
timeout – the timeout in seconds
- Returns:
True if the element is present, False otherwise
- wait_until_clickable(selector, timeout=10)
- element_exists(selector)
- set_attribute(selector, attribute, value)
- scroll_to_element(selector)
- click_on_element_by_xpath_with_jquery(xpath)
- inner_text_contains(selector, text)
- click_on_best_element(selector)
- click_by_jquery_on_node(parent_button)
- scroll_to_element_by_js(object)
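Many of these helpers accept either a CSS or an XPath selector. One common way to tell the two apart is to treat strings that begin with "/", "./" or "(" as XPath; whether ScrapingLogic uses exactly this rule is an assumption, and locator_for is a hypothetical helper. The returned strings are the values of Selenium's By.XPATH and By.CSS_SELECTOR constants, so the pair can be passed to driver.find_element(by, value).

```python
def locator_for(selector):
    """Hypothetical helper: return a (by, value) pair for a CSS or XPath selector.

    "xpath" and "css selector" are the string values of selenium's
    By.XPATH and By.CSS_SELECTOR constants.
    """
    stripped = selector.strip()
    if stripped.startswith(("/", "./", "(")):
        return ("xpath", stripped)
    return ("css selector", stripped)


print(locator_for("//button[@type='submit']"))  # ('xpath', "//button[@type='submit']")
print(locator_for("form input[name='q']"))      # ('css selector', "form input[name='q']")
```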
pyselenscrapr.ScrapingStep module
- class pyselenscrapr.ScrapingStep.ScrapingStepErrorHandling
Bases: object
- ThrowException = 0
- RetryAndThrowException = 1
- Ignore = 2
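The names suggest three strategies: raise on the first failure, retry and raise once the retries are exhausted (RetryAndThrowException, value 1, is the ScrapingStep default), or ignore the failure. A sketch of that reading; run_with_strategy is hypothetical, the stand-in class mirrors the documented values, and the exact retry accounting inside ScrapingStep is not shown in this reference.

```python
class ScrapingStepErrorHandling:
    """Stand-in with the same values as pyselenscrapr.ScrapingStep.ScrapingStepErrorHandling."""
    ThrowException = 0
    RetryAndThrowException = 1
    Ignore = 2


def run_with_strategy(action, strategy, retry_count=3):
    """Hypothetical: call action() and handle failures according to strategy."""
    if strategy == ScrapingStepErrorHandling.Ignore:
        try:
            return action()
        except Exception:
            return None  # the failure is swallowed
    # ThrowException tries once; RetryAndThrowException adds retry_count retries.
    retries = retry_count if strategy == ScrapingStepErrorHandling.RetryAndThrowException else 0
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: propagate the last error
```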
- class pyselenscrapr.ScrapingStep.ScrapingStepInterval
Bases: object
- Order = 0
- BeforeAnyStep = 1
- AfterAnyStep = 2
- BeforeValidation = 3
- AfterValidation = 4
- BeforeRetry = 5
- AfterRetry = 6
- BeforePagination = 7
- AfterPagination = 8
- class pyselenscrapr.ScrapingStep.ScrapingStepRepeat
Bases: object
- Repeat = 0
- NoRepeat = 1
- class pyselenscrapr.ScrapingStep.IScrapingStep
Bases: object
- __init__()
- class pyselenscrapr.ScrapingStep.ScrapingStep(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)
Bases: IScrapingStep
ScrapingStep represents a single step in a scraping process.
- childGroups = []
- robot = None
- __init__(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)
Constructor for ScrapingStep.
- Parameters:
name – the name of the step; it is used for debugging
execute – a function that is called when the step is executed
can_execute – a function that checks whether the step can be executed
was_executed – a function that checks whether the step was executed
before_validation – a function that is called before the step is validated
retry – a function that is called if the step fails
previous_step – the previous step in the scraping process
error_handling – a ScrapingStepErrorHandling value that selects the error-handling strategy
interval – a ScrapingStepInterval value that selects when the step is executed
repeat – a ScrapingStepRepeat value that selects whether the step is repeated
exit_bot_when_errored – whether the bot should exit when this step fails
retry_count – the number of retries
- __str__()
Return str(self).
- interval() → ScrapingStepInterval
- next_step(step)
- raise_exception(message)
- set_previous_step(step)
- add_child_group(group)
- previous_step() → IScrapingStep
- execute(logic)
- set_executed()
- before_validation()
- reset()
- log(message)
- retry()
- exit_bot_when_errored()
- can_retry()
- is_executed(logic)
- error_handling() → ScrapingStepErrorHandling
- class pyselenscrapr.ScrapingStep.ScrapingStepConditional(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)
Bases: ScrapingStep
pyselenscrapr.ScrapingStepGroup module
- class pyselenscrapr.ScrapingStepGroup.ScrapingStepGroup(name: str, steps: [IScrapingStep] = [])
Bases: object
- __init__(name: str, steps: [IScrapingStep] = [])
- name = None
- add_step(step)
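The constructor signature shows a mutable default (steps = []). In Python such a default list is created once and reused on every call, so if an implementation stored it and appended to it in place, groups created without an explicit steps list would share their steps. Whether ScrapingStepGroup behaves this way is not visible in this reference; the self-contained sketch below uses a hypothetical SharedDefaultGroup class only to demonstrate the language behavior. Passing a fresh list avoids the shared-default case.

```python
class SharedDefaultGroup:
    """Hypothetical class that stores a mutable default argument directly."""

    def __init__(self, name, steps=[]):  # this default list is shared by all calls
        self.name = name
        self._steps = steps

    def add_step(self, step):
        self._steps.append(step)


a = SharedDefaultGroup("a")
b = SharedDefaultGroup("b")
a.add_step("login")
print(b._steps)  # ['login']: b sees a's step through the shared default list

c = SharedDefaultGroup("c", steps=[])  # an explicit list belongs to this group only
print(c._steps)  # []
```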
pyselenscrapr.ScrapingStepLoop module
- class pyselenscrapr.ScrapingStepLoop.ScrapingLogicIterator(logic: ScrapingLogic, element, index)
Bases: ScrapingLogic
- __init__(logic: ScrapingLogic, element, index)
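Constructor for ScrapingLogicIterator. It wraps an existing ScrapingLogic for one iteration of a loop.
- Parameters:
logic – the ScrapingLogic being wrapped
element – the element of the current iteration
index – the index of the current iteration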
- index()
- element()
- class pyselenscrapr.ScrapingStepLoop.ScrapingStepIteration(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)
Bases: ScrapingStep
- class pyselenscrapr.ScrapingStepLoop.ScrapingStepLoop(name: str, iteration_callback: Callable[[ScrapingLogic], any], iteration_steps: list[ScrapingStep], iterations: int | None = None, retry_count: int = 3)
Bases: ScrapingStep
- __init__(name: str, iteration_callback: Callable[[ScrapingLogic], any], iteration_steps: list[ScrapingStep], iterations: int | None = None, retry_count: int = 3)
Constructor for ScrapingStepLoop.
- Parameters:
name – the name of the step; it is used for debugging
iteration_callback – a function that is called with the ScrapingLogic
iteration_steps – the steps executed in each iteration
iterations – an optional number of iterations (None by default)
retry_count – the number of retries
- execute(logic: ScrapingLogic)
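A toy model of a loop step, assuming (as ScrapingLogicIterator(logic, element, index) suggests) that iteration_callback returns the elements to iterate over and that every step in iteration_steps runs once per element with access to that element and its index. Treating iterations as an optional cap is also an assumption. run_loop and IterationContext are hypothetical stand-ins, not the library's classes.

```python
from typing import Any, Callable, Iterable, List, Optional


class IterationContext:
    """Hypothetical stand-in for ScrapingLogicIterator: exposes element() and index()."""

    def __init__(self, element: Any, index: int):
        self._element = element
        self._index = index

    def element(self) -> Any:
        return self._element

    def index(self) -> int:
        return self._index


def run_loop(
    logic: Any,
    iteration_callback: Callable[[Any], Iterable[Any]],
    iteration_steps: List[Callable[[IterationContext], None]],
    iterations: Optional[int] = None,
) -> None:
    """Hypothetical: run every step once per element returned by iteration_callback."""
    elements = list(iteration_callback(logic))
    if iterations is not None:
        elements = elements[:iterations]  # assumption: iterations caps the loop
    for index, element in enumerate(elements):
        context = IterationContext(element, index)
        for step in iteration_steps:
            step(context)


rows = []
run_loop(
    logic=None,  # the real step would pass its ScrapingLogic here
    iteration_callback=lambda logic: ["row-a", "row-b", "row-c"],
    iteration_steps=[lambda ctx: rows.append((ctx.index(), ctx.element()))],
    iterations=2,
)
print(rows)  # [(0, 'row-a'), (1, 'row-b')]
```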
pyselenscrapr.ScrapingStepPagination module
- class pyselenscrapr.ScrapingStepPagination.ScrapingStepPaginationMode
Bases: object
- RandomPages = 1
- AllPages = 2
- class pyselenscrapr.ScrapingStepPagination.ScrapingStepPagination(name: str, execute: Callable[[IScrapingStep], any], goto_page: Callable[[IScrapingStep, int], None], validate_page: Callable[[IScrapingStep, int], bool], pagination_mode: ScrapingStepPaginationMode, page_count: Callable[[IScrapingStep], int], exit_bot_when_errored: bool = False)
Bases: ScrapingStep
- __init__(name: str, execute: Callable[[IScrapingStep], any], goto_page: Callable[[IScrapingStep, int], None], validate_page: Callable[[IScrapingStep, int], bool], pagination_mode: ScrapingStepPaginationMode, page_count: Callable[[IScrapingStep], int], exit_bot_when_errored: bool = False)
Constructor for ScrapingStepPagination.
- Parameters:
name – the name of the step; it is used for debugging
execute – a function that is called when the step is executed
goto_page – a function that receives the step and a page number and navigates to that page
validate_page – a function that receives the step and a page number and returns whether the page is valid
pagination_mode – a ScrapingStepPaginationMode value (RandomPages or AllPages)
page_count – a function that receives the step and returns the number of pages
exit_bot_when_errored – whether the bot should exit when this step fails
- finished()
- can_retry()
- retry()
- sleep_random()
- execute(logic)
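A sketch of the two pagination modes as their names suggest: AllPages visits every page in order, and RandomPages visits every page once in random order. Both readings are assumptions, as is 1-based page numbering. page_order and visit_pages are hypothetical helpers; the callbacks follow the documented shapes, with goto_page and validate_page receiving (step, page) and page_count and execute receiving the step.

```python
import random


class ScrapingStepPaginationMode:
    """Stand-in with the same values as the documented class."""
    RandomPages = 1
    AllPages = 2


def page_order(total_pages, mode, rng=random):
    """Hypothetical: the 1-based page numbers to visit for a pagination mode."""
    pages = list(range(1, total_pages + 1))
    if mode == ScrapingStepPaginationMode.RandomPages:
        rng.shuffle(pages)  # assumption: every page once, in random order
    return pages


def visit_pages(step, execute, goto_page, validate_page, mode, page_count, rng=random):
    """Hypothetical loop: for each page, navigate, validate, then execute."""
    for page in page_order(page_count(step), mode, rng):
        goto_page(step, page)
        if not validate_page(step, page):
            raise RuntimeError(f"page {page} failed validation")
        execute(step)


visited = []
visit_pages(
    step=None,  # the real callbacks receive the pagination step
    execute=lambda step: None,
    goto_page=lambda step, page: visited.append(page),
    validate_page=lambda step, page: True,
    mode=ScrapingStepPaginationMode.AllPages,
    page_count=lambda step: 3,
)
print(visited)  # [1, 2, 3]
```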