pyselenscrapr package

Submodules

pyselenscrapr.ScrapingBackend module

class pyselenscrapr.ScrapingBackend.IScrapingBackend

Bases: ABC

This is an interface that represents a backend for the scraping process.

saveData(data: dict, key: str | None = None)
errorHandling(error: Exception, debugData=None)
notify(message: str)
class pyselenscrapr.ScrapingBackend.ScrapingBackendWebhook(url, error_route='/error', notify_route='/notify', data_route='/data')

Bases: IScrapingBackend

This class is used to send the data you scraped to a webhook.

__init__(url, error_route='/error', notify_route='/notify', data_route='/data')
saveData(data: dict, key: str | None = None)
errorHandling(error: Exception, debugData=None)
notify(message: str)
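
A custom backend only needs the three methods listed for IScrapingBackend. The following is a minimal sketch of an in-memory backend, for example for use in tests. It relies on a local stand-in base class (BackendInterface, a hypothetical name) that mirrors the documented method names so that the example runs without the package installed; with pyselenscrapr available, one would presumably subclass IScrapingBackend instead. When the bot calls these methods, and what it passes as key and debugData, are assumptions here.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BackendInterface(ABC):
    """Stand-in mirroring the method names documented for IScrapingBackend."""

    @abstractmethod
    def saveData(self, data: dict, key: Optional[str] = None): ...

    @abstractmethod
    def errorHandling(self, error: Exception, debugData=None): ...

    @abstractmethod
    def notify(self, message: str): ...


class InMemoryBackend(BackendInterface):
    """Hypothetical backend that keeps everything in Python lists."""

    def __init__(self):
        self.saved = []     # (key, data) tuples
        self.errors = []    # (error, debugData) tuples
        self.messages = []  # notification strings

    def saveData(self, data: dict, key: Optional[str] = None):
        self.saved.append((key, data))

    def errorHandling(self, error: Exception, debugData=None):
        self.errors.append((error, debugData))

    def notify(self, message: str):
        self.messages.append(message)


backend = InMemoryBackend()
backend.saveData({"title": "Example"}, key="page-1")
backend.notify("scrape finished")
```

Keeping the collected calls in plain lists makes it easy to assert on what a scraping run would have sent, without needing a webhook endpoint.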

pyselenscrapr.ScrapingBot module

class pyselenscrapr.ScrapingBot.TakeScreenshotModes

Bases: object

This enum is used to define when the bot should take a screenshot of the current page.

OnError = 2
Always = 1
Never = 0
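
The three mode values can be read as a small decision rule. The sketch below restates the documented constants locally and shows one plausible way a caller could decide whether to capture a screenshot. should_take_screenshot is a hypothetical helper, not part of the package, and the bot's actual logic may differ.

```python
NEVER, ALWAYS, ON_ERROR = 0, 1, 2  # same values as TakeScreenshotModes


def should_take_screenshot(mode: int, errored: bool) -> bool:
    """Hypothetical helper: decide whether to capture the current page."""
    if mode == ALWAYS:
        return True
    if mode == ON_ERROR:
        return errored
    return False  # NEVER, or any unknown value
```
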
class pyselenscrapr.ScrapingBot.ScrapingBot(driver, max_retries=3, take_screenshots_mode=0, backend: IScrapingBackend | None = None, repeat_count_till_error=5)

Bases: object

The ScrapingBot class is the main class to be used for creating a scraping bot. It is used to define the steps and groups of steps that the bot should execute. The bot can be run by calling the run() method.

__init__(driver, max_retries=3, take_screenshots_mode=0, backend: IScrapingBackend | None = None, repeat_count_till_error=5)
group_name()
backend_notify(message)
set_warning_handler(param)
set_exception_handler(param)
take_screenshot_on_error(path)
add_step_group(group_name: str)
add_step(step_or_callback: ScrapingStep, step_group: ScrapingStepGroup | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, after_validation: Callable[[ScrapingLogic], None] | None = None) ScrapingStep
sleep(seconds)
all_groups_executed()
get_next_step(step)
set_current_group(group)
finished()
get_all_steps_by_interval(interval)
get_next_group()
run(first_group: str | ScrapingStepGroup | None = None)

Run the bot and execute all steps in the defined groups.

Parameters:

first_group – the name of the first group to run, or the group object itself. If it is None, “default” is used as the first group.

Returns:

True if the bot finished successfully, False otherwise.
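
The default described for first_group can be illustrated with a short sketch. resolve_first_group_name is a hypothetical helper written for this example only; it assumes that a group object exposes its name through a name attribute, as ScrapingStepGroup is documented to do. How run() resolves the group internally is not shown in this reference.

```python
def resolve_first_group_name(first_group=None) -> str:
    """Hypothetical helper mirroring the documented default of run()."""
    if first_group is None:
        return "default"
    if isinstance(first_group, str):
        return first_group
    # Otherwise assume a group-like object exposing a `name` attribute.
    return first_group.name
```
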

get_converted_data(data)
save_backend_data(data)
send_error_to_backend(error)
send_data_to_backend(key=None, data=None)
set_data(key, value, send_to_backend=False)
append_data(key, value, send_to_backend=False)
has_data(key)
get_data(key)
get_task_log()

pyselenscrapr.ScrapingLogic module

pyselenscrapr.ScrapingLogic.tocontainer(func, bot)
class pyselenscrapr.ScrapingLogic.ScrapingLogic(driver, bot)

Bases: object

ScrapingLogic is the main interface between the bot and the Selenium driver. It contains many helper functions that make it easier to interact with the driver.

You can use all driver functions and also the functions in this class.

__init__(driver, bot)

Constructor for ScrapingLogic. Will be called from the bot.

Parameters:
  • driver – the selenium driver

  • bot – the bot that is using the driver

__getitem__(item)
__getattr__(item)
__repr__()

Return repr(self).

sleep(seconds)

Sleep for a given number of seconds.

Parameters:

seconds – the number of seconds to sleep

Returns:

None

replace_input_text(selector, keys)

Replace the text in an input field. The function first scrolls to the element, then selects all text in the input field, and then sends the new keys to the input field.

Parameters:
  • selector – CSS or XPATH selector

  • keys – the new text that should be in the input field

Returns:

True if the operation was successful, False otherwise
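
The select-all-then-type sequence described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: replace_text and FakeInput are hypothetical names, the scroll step is omitted, and Control+A is assumed as the select-all shortcut (macOS browsers typically need Command instead). The CONTROL constant holds the WebDriver key code that Selenium exposes as Keys.CONTROL.

```python
CONTROL = "\ue009"  # WebDriver key code for Control (selenium's Keys.CONTROL)


def replace_text(element, keys: str) -> bool:
    """Select all text in `element` and type `keys` over it.

    `element` is any object with a Selenium-style send_keys() method.
    """
    try:
        element.send_keys(CONTROL + "a")  # select the existing text
        element.send_keys(keys)           # typing replaces the selection
        return True
    except Exception:
        return False


class FakeInput:
    """Tiny stand-in for a WebElement so the sketch runs without a browser."""

    def __init__(self, text=""):
        self.text = text
        self._all_selected = False

    def send_keys(self, keys):
        if keys == CONTROL + "a":
            self._all_selected = True
            return
        if self._all_selected:
            # Typing over a selection discards the old content first.
            self.text = ""
            self._all_selected = False
        self.text += keys


field = FakeInput("old value")
ok = replace_text(field, "new value")
```

Returning a boolean instead of raising matches the documented return value, which lets a step's validation callback decide what to do on failure.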

clear_input_text(selector)

Clear the text in an input field. The function first scrolls to the element, then selects all text in the input field, and then sends the DELETE key to the input field.

Parameters:

selector – CSS or XPATH selector

Returns:

True if the operation was successful, False otherwise

send_keys_to_element(selector, keys)

Send keys to an element. This function will send the keys to a CSS or XPATH element.

Parameters:
  • selector – CSS or XPATH selector

  • keys – the keys that should be sent to the element

Returns:

True if the operation was successful, False otherwise

is_visible(selector)
set_data(key, value, send_to_backend=False)
take_screenshot(step)
append_data(key, value, send_to_backend=False)
send_data_to_backend(key=None, data=None)
has_data(key)
get_data(key)
notify(message)
convert_table_to_df(t)
get_number_of_content(selector)
convert_tables_to_df(tables)
get_all_elements(selector)
wait_for_reload(timeout=40, min_wait=0.1)
get_best_element(selector)
element_count(selector)
element_text(selector)

Get the text of an element. This function will return the text of the element if it is available.

Parameters:

selector – CSS or XPATH selector

Returns:

the text of the element

wait_until_present(selector, timeout=20)

Wait until an element is present. This function will wait until an element is present in the DOM.

Parameters:
  • selector – CSS or XPATH selector

  • timeout – the timeout in seconds

Returns:

True if the element is present, False otherwise
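
Waiting with a timeout and a boolean result can be sketched as a generic polling loop. wait_until and poll_interval are hypothetical names for this example; the package may well rely on Selenium's WebDriverWait instead, which is the usual tool for this with a real driver.

```python
import time


def wait_until(predicate, timeout: float = 20, poll_interval: float = 0.1) -> bool:
    """Poll `predicate` until it is truthy or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With a driver at hand, the predicate could be something like a check that a lookup for the selector returns at least one element.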

wait_until_clickable(selector, timeout=10)
element_exists(selector)
set_attribute(selector, attribute, value)
scroll_to_element(selector)
click_on_element_by_xpath_with_jquery(xpath)
inner_text_contains(selector, text)
click_on_best_element(selector)
click_by_jquery_on_node(parent_button)
scroll_to_element_by_js(object)

pyselenscrapr.ScrapingStep module

class pyselenscrapr.ScrapingStep.ScrapingStepErrorHandling

Bases: object

ThrowException = 0
RetryAndThrowException = 1
Ignore = 2
class pyselenscrapr.ScrapingStep.ScrapingStepInterval

Bases: object

Order = 0
BeforeAnyStep = 1
AfterAnyStep = 2
BeforeValidation = 3
AfterValidation = 4
BeforeRetry = 5
AfterRetry = 6
BeforePagination = 7
AfterPagination = 8
class pyselenscrapr.ScrapingStep.ScrapingStepRepeat

Bases: object

Repeat = 0
NoRepeat = 1
class pyselenscrapr.ScrapingStep.IScrapingStep

Bases: object

__init__()
class pyselenscrapr.ScrapingStep.ScrapingStep(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Bases: IScrapingStep

ScrapingStep is a class that represents a single step in a scraping process.

childGroups = []
robot = None
__init__(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Constructor for ScrapingStep

Parameters:
  • name – a string representing the name of the step - it is used later for debugging purposes

  • execute – a function that will be executed when the step is executed

  • can_execute – a function that will be executed to check if the step can be executed

  • was_executed – a function that will be executed to check if the step was executed

  • before_validation – a function that will be executed before the step is validated

  • retry – a function that will be executed if the step fails

  • previous_step – the previous step in the scraping process

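  • error_handling – a ScrapingStepErrorHandling value selecting the error handling strategy (default 1, RetryAndThrowException)

  • interval – a ScrapingStepInterval value selecting when the step is executed (default 0, Order)

  • repeat – a ScrapingStepRepeat value indicating whether the step should be repeated (default 1, NoRepeat)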

  • retry_count – an integer representing the number of retries
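
The callbacks listed above suggest a guard, execute, validate, retry lifecycle. The sketch below is one plausible reading of that lifecycle, written as a standalone function so that it runs without the package. run_step is a hypothetical name, and both the order of the calls and the interpretation of retry_count as the number of additional attempts are assumptions; the bot's real scheduling may differ.

```python
def run_step(logic, execute, can_execute=None, was_executed=None,
             retry=None, retry_count=3) -> bool:
    """Hypothetical lifecycle: guard, execute, validate, retry."""
    if can_execute is not None and not can_execute(logic):
        return False  # precondition not met, the step is skipped
    for attempt in range(retry_count + 1):
        if attempt > 0 and retry is not None:
            retry(logic)  # recovery hook before a repeated attempt
        execute(logic)
        if was_executed is None or was_executed(logic):
            return True  # no validator given, or validation passed
    return False


# A plain dict stands in for the ScrapingLogic object in this example.
state = {"clicks": 0}
succeeded = run_step(
    logic=state,
    execute=lambda s: s.update(clicks=s["clicks"] + 1),
    was_executed=lambda s: s["clicks"] >= 2,  # passes on the second attempt
)
```

Separating was_executed from execute means a step can be considered failed even when execute raised no exception, for instance when a click had no visible effect.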

__str__()

Return str(self).

name() str
interval() ScrapingStepInterval
next_step(step)
raise_exception(message)
set_previous_step(step)
add_child_group(group)
previous_step() IScrapingStep
can_execute() bool
was_executed() bool
execute(logic)
set_executed()
before_validation()
reset()
log(message)
retry()
exit_bot_when_errored()
can_retry()
is_executed(logic)
error_handling() ScrapingStepErrorHandling
class pyselenscrapr.ScrapingStep.ScrapingStepConditional(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Bases: ScrapingStep


pyselenscrapr.ScrapingStepGroup module

class pyselenscrapr.ScrapingStepGroup.ScrapingStepGroup(name: str, steps: [IScrapingStep] = [])

Bases: object

__init__(name: str, steps: [IScrapingStep] = [])
name = None
add_step(step)

pyselenscrapr.ScrapingStepLoop module

class pyselenscrapr.ScrapingStepLoop.ScrapingLogicIterator(logic: ScrapingLogic, element, index)

Bases: ScrapingLogic

__init__(logic: ScrapingLogic, element, index)

Constructor for ScrapingLogicIterator.

Parameters:
  • logic – the ScrapingLogic instance the iterator is created from

  • element – the element of the current iteration

  • index – the index of the current iteration

index()
element()
class pyselenscrapr.ScrapingStepLoop.ScrapingStepIteration(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Bases: ScrapingStep

class pyselenscrapr.ScrapingStepLoop.ScrapingStepLoop(name: str, iteration_callback: Callable[[ScrapingLogic], any], iteration_steps: list[ScrapingStep], iterations: int | None = None, retry_count: int = 3)

Bases: ScrapingStep

__init__(name: str, iteration_callback: Callable[[ScrapingLogic], any], iteration_steps: list[ScrapingStep], iterations: int | None = None, retry_count: int = 3)

Constructor for ScrapingStepLoop

Parameters:
  • name – a string representing the name of the step - it is used later for debugging purposes

  • iteration_callback – a function that receives the ScrapingLogic

  • iteration_steps – a list of ScrapingStep objects

  • iterations – an optional integer (default None)

  • retry_count – an integer representing the number of retries

execute(logic: ScrapingLogic)

pyselenscrapr.ScrapingStepPagination module

class pyselenscrapr.ScrapingStepPagination.ScrapingStepPaginationMode

Bases: object

RandomPages = 1
AllPages = 2
class pyselenscrapr.ScrapingStepPagination.ScrapingStepPagination(name: str, execute: Callable[[IScrapingStep], any], goto_page: Callable[[IScrapingStep, int], None], validate_page: Callable[[IScrapingStep, int], bool], pagination_mode: ScrapingStepPaginationMode, page_count: Callable[[IScrapingStep], int], exit_bot_when_errored: bool = False)

Bases: ScrapingStep

__init__(name: str, execute: Callable[[IScrapingStep], any], goto_page: Callable[[IScrapingStep, int], None], validate_page: Callable[[IScrapingStep, int], bool], pagination_mode: ScrapingStepPaginationMode, page_count: Callable[[IScrapingStep], int], exit_bot_when_errored: bool = False)

Constructor for ScrapingStepPagination

Parameters:
  • name – a string representing the name of the step - it is used later for debugging purposes

  • execute – a function that will be executed when the step is executed

  • goto_page – a function that receives the step and an integer

  • validate_page – a function that receives the step and an integer and returns a bool

  • pagination_mode – a ScrapingStepPaginationMode value (RandomPages or AllPages)

  • page_count – a function that receives the step and returns an integer

  • exit_bot_when_errored – a bool (default False)

finished()
can_retry()
retry()
sleep_random()
execute(logic)

pyselenscrapr.ValidationError module

exception pyselenscrapr.ValidationError.ValidationError(validation_error_message)

Bases: Exception

__init__(validation_error_message)
validation_error_message = None
__str__()

Return str(self).

Module contents