pyselenscrapr package

Submodules

pyselenscrapr.ScrapingBackend module

class pyselenscrapr.ScrapingBackend.IScrapingBackend

Bases: ABC

This is an interface that represents a backend for the scraping process.

saveData(data: dict, key: str | None = None)

errorHandling(error: Exception, debugData=None)

notify(message: str)

class pyselenscrapr.ScrapingBackend.ScrapingBackendWebhook(url, error_route='/error', notify_route='/notify', data_route='/data')

Bases: IScrapingBackend

This class is used to send the data you scraped to a webhook.

__init__(url, error_route='/error', notify_route='/notify', data_route='/data')

saveData(data: dict, key: str | None = None)

errorHandling(error: Exception, debugData=None)

notify(message: str)
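A custom backend only needs the three methods listed for IScrapingBackend. The sketch below is a minimal in-memory variant for tests. It assumes nothing about the real package: `IScrapingBackend` here is a local stand-in that copies the documented signatures, and `InMemoryBackend` is a hypothetical name.

```python
from typing import Optional


class IScrapingBackend:
    """Local stand-in that mirrors the documented method signatures."""

    def saveData(self, data: dict, key: Optional[str] = None):
        pass

    def errorHandling(self, error: Exception, debugData=None):
        pass

    def notify(self, message: str):
        pass


class InMemoryBackend(IScrapingBackend):
    """Hypothetical backend that keeps everything in lists."""

    def __init__(self):
        self.saved = []
        self.errors = []
        self.messages = []

    def saveData(self, data: dict, key: Optional[str] = None):
        self.saved.append((key, data))

    def errorHandling(self, error: Exception, debugData=None):
        self.errors.append((error, debugData))

    def notify(self, message: str):
        self.messages.append(message)


backend = InMemoryBackend()
backend.saveData({"title": "Example"}, key="page-1")
backend.notify("scrape finished")
backend.errorHandling(ValueError("boom"), debugData={"url": "https://example.com"})
```

With the real package, such an object would presumably be passed as the `backend` argument of ScrapingBot.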

pyselenscrapr.ScrapingBot module

class pyselenscrapr.ScrapingBot.TakeScreenshotModes

Bases: object

This enum is used to define when the bot should take a screenshot of the current page.

OnError = 2

Always = 1

Never = 0

class pyselenscrapr.ScrapingBot.ScrapingBot(driver, max_retries=3, take_screenshots_mode=0, backend: IScrapingBackend | None = None, repeat_count_till_error=5)

Bases: object

The ScrapingBot class is the main class to be used for creating a scraping bot. It is used to define the steps and groups of steps that the bot should execute. The bot can be run by calling the run() method.

__init__(driver, max_retries=3, take_screenshots_mode=0, backend: IScrapingBackend | None = None, repeat_count_till_error=5)

group_name()

backend_notify(message)

set_warning_handler(param)

set_exception_handler(param)

take_screenshot_on_error(path)

add_step_group(group_name: str)

add_step(step_or_callback: ScrapingStep, step_group: ScrapingStepGroup | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, after_validation: Callable[[ScrapingLogic], None] | None = None) → ScrapingStep

sleep(seconds)

all_groups_executed()

get_next_step(step)

set_current_group(group)

finished()

get_all_steps_by_interval(interval)

get_next_group()

run(first_group: str | ScrapingStepGroup | None = None)

Run the bot and execute all steps in the defined groups.

Parameters:

first_group – the name of the first group to run. If it is None, “default” is used as the first group.

Returns:

True if the bot finished successfully, False otherwise

get_converted_data(data)

save_backend_data(data)

send_error_to_backend(error)

send_data_to_backend(key=None, data=None)

set_data(key, value, send_to_backend=False)

append_data(key, value, send_to_backend=False)

has_data(key)

get_data(key)

get_task_log()

pyselenscrapr.ScrapingLogic module

pyselenscrapr.ScrapingLogic.tocontainer(func, bot)

class pyselenscrapr.ScrapingLogic.ScrapingLogic(driver, bot)

Bases: object

ScrapingLogic is the main interface between the bot and the Selenium driver. It contains many helper functions that make it easier to interact with the driver.

You can use all driver functions and also the functions in this class.

__init__(driver, bot)

Constructor for ScrapingLogic. Will be called from the bot.

Parameters:

driver – the selenium driver
bot – the bot that is using the driver

__getitem__(item)

__getattr__(item)

__repr__(): Return repr(self).

sleep(seconds): Sleep for a given amount of seconds. :param seconds: the amount of seconds to sleep :return: None

replace_input_text(selector, keys)

Use this function to replace the text in an input field. It first scrolls to the element, then selects all text in the input field, and then sends the new keys to the input field.

Parameters:

selector – CSS or XPATH selector
keys – the new text that should be in the input field

Returns:

True if the operation was successful, False otherwise
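The scroll, select-all, type sequence described above can be sketched roughly as follows. This is not the library's implementation: `replace_text`, `FakeDriver` and `FakeInput` are hypothetical names, the fakes only imitate the two calls the sketch uses, and Ctrl+A as the select-all shortcut is an assumption that does not hold on every platform.

```python
CONTROL = "\ue009"  # same code point as selenium's Keys.CONTROL


def replace_text(driver, element, keys):
    """Scroll to the element, select its text, then type the replacement."""
    try:
        driver.execute_script("arguments[0].scrollIntoView(true);", element)
        element.send_keys(CONTROL + "a")  # select the current content
        element.send_keys(keys)           # typing replaces the selection
        return True
    except Exception:
        return False


class FakeDriver:
    """Stand-in driver that only remembers the last script it was given."""

    def execute_script(self, script, *args):
        self.last_script = script


class FakeInput:
    """Stand-in input element with a very rough model of text selection."""

    def __init__(self, value):
        self.value = value
        self._selected = False

    def send_keys(self, keys):
        if keys == CONTROL + "a":
            self._selected = True
        elif self._selected:
            self.value = keys
            self._selected = False
        else:
            self.value += keys


field = FakeInput("old text")
ok = replace_text(FakeDriver(), field, "new text")
```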

clear_input_text(selector)

If you have an input field and you want to clear the text in it, you can use this function. It will first scroll to the element, then select all text in the input field and then send the DELETE key to the input field.

Parameters:

selector – CSS or XPATH selector

Returns:

True if the operation was successful, False otherwise

send_keys_to_element(selector, keys)

Send keys to an element. This function sends the keys to the element matched by a CSS or XPATH selector.

Parameters:

selector – CSS or XPATH selector
keys – the keys that should be sent to the element

Returns:

True if the operation was successful, False otherwise

is_visible(selector)

set_data(key, value, send_to_backend=False)

take_screenshot(step)

append_data(key, value, send_to_backend=False)

send_data_to_backend(key=None, data=None)

has_data(key)

get_data(key)

notify(message)

convert_table_to_df(t)

get_number_of_content(selector)

convert_tables_to_df(tables)

get_all_elements(selector)

wait_for_reload(timeout=40, min_wait=0.1)

get_best_element(selector)

element_count(selector)

element_text(selector)

Get the text of an element. This function will return the text of the element if it is available.

Parameters:

selector – CSS or XPATH selector

Returns:

the text of the element

wait_until_present(selector, timeout=20)

Wait until an element is present. This function will wait until an element is present in the DOM.

Parameters:

selector – CSS or XPATH selector
timeout – the timeout in seconds

Returns:

True if the element is present, False otherwise
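A wait of this kind is usually a polling loop with a deadline. The sketch below shows that pattern in isolation, without Selenium. `wait_until`, its `poll_interval` parameter and the demo condition are assumptions for illustration, not part of the documented API.

```python
import time


def wait_until(condition, timeout=20, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Demo condition: pretends the element shows up on the third check.
attempts = {"count": 0}


def appears_on_third_check():
    attempts["count"] += 1
    return attempts["count"] >= 3


found = wait_until(appears_on_third_check, timeout=2, poll_interval=0.01)
missing = wait_until(lambda: False, timeout=0.05, poll_interval=0.01)
```

With a real driver, the condition would be something like a check that a lookup for the selector returns at least one element.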

wait_until_clickable(selector, timeout=10)

element_exists(selector)

set_attribute(selector, attribute, value)

scroll_to_element(selector)

click_on_element_by_xpath_with_jquery(xpath)

inner_text_contains(selector, text)

click_on_best_element(selector)

click_by_jquery_on_node(parent_button)

scroll_to_element_by_js(object)

pyselenscrapr.ScrapingStep module

class pyselenscrapr.ScrapingStep.ScrapingStepErrorHandling

Bases: object

ThrowException = 0

RetryAndThrowException = 1

Ignore = 2

class pyselenscrapr.ScrapingStep.ScrapingStepInterval

Bases: object

Order = 0

BeforeAnyStep = 1

AfterAnyStep = 2

BeforeValidation = 3

AfterValidation = 4

BeforeRetry = 5

AfterRetry = 6

BeforePagination = 7

AfterPagination = 8

class pyselenscrapr.ScrapingStep.ScrapingStepRepeat

Bases: object

Repeat = 0

NoRepeat = 1

class pyselenscrapr.ScrapingStep.IScrapingStep

Bases: object

__init__()

class pyselenscrapr.ScrapingStep.ScrapingStep(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Bases: IScrapingStep

ScrapingStep is a class that represents a single step in a scraping process.

childGroups = []

robot = None

__init__(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Constructor for ScrapingStep

Parameters:

name – a string representing the name of the step - it is used later for debugging purposes
execute – a function that will be executed when the step is executed
can_execute – a function that will be executed to check if the step can be executed
was_executed – a function that will be executed to check if the step was executed
before_validation – a function that will be executed before the step is validated
retry – a function that will be executed if the step fails
previous_step – the previous step in the scraping process
error_handling – an enum representing the error handling strategy
interval – an enum representing the interval at which the step will be executed
repeat – an enum representing if the step should be repeated
retry_count – an integer representing the number of retries
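One plausible reading of how these callbacks fit together is sketched below. The control flow is an assumption, since the actual order of calls inside the bot is not documented here. `run_step` and the dictionary used in place of a ScrapingLogic object are hypothetical.

```python
def run_step(logic, execute, can_execute=None, was_executed=None,
             retry=None, retry_count=3):
    """Run one step: precondition, action, confirmation, optional retries."""
    if can_execute is not None and not can_execute(logic):
        return False  # precondition not met, the step is skipped
    for attempt in range(retry_count + 1):
        try:
            execute(logic)
            if was_executed is None or was_executed(logic):
                return True  # the action ran and was confirmed
        except Exception:
            pass  # a failing attempt is treated like an unconfirmed one
        if retry is not None and attempt < retry_count:
            retry(logic)  # give the caller a chance to recover
    return False


# Demo: the action only "sticks" on the second attempt.
state = {"clicks": 0, "retries": 0}


def click(logic):
    logic["clicks"] += 1


def clicked_twice(logic):
    return logic["clicks"] >= 2


def count_retry(logic):
    logic["retries"] += 1


result = run_step(state, execute=click, was_executed=clicked_twice,
                  retry=count_retry, retry_count=3)
```

In this run the first attempt is not confirmed, the retry callback fires once, and the second attempt succeeds.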

__str__(): Return str(self).

name() → str

interval() → ScrapingStepInterval

next_step(step)

raise_exception(message)

set_previous_step(step)

add_child_group(group)

previous_step() → IScrapingStep

can_execute() → bool

was_executed() → bool

execute(logic)

set_executed()

before_validation()

reset()

log(message)

retry()

exit_bot_when_errored()

can_retry()

is_executed(logic)

error_handling() → ScrapingStepErrorHandling

class pyselenscrapr.ScrapingStep.ScrapingStepConditional(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Bases: ScrapingStep

pyselenscrapr.ScrapingStepGroup module

class pyselenscrapr.ScrapingStepGroup.ScrapingStepGroup(name: str, steps: [IScrapingStep] = [])

Bases: object

__init__(name: str, steps: [IScrapingStep] = [])

name = None

add_step(step)

pyselenscrapr.ScrapingStepLoop module

class pyselenscrapr.ScrapingStepLoop.ScrapingLogicIterator(logic: ScrapingLogic, element, index)

Bases: ScrapingLogic

__init__(logic: ScrapingLogic, element, index)

Constructor for ScrapingLogic. Will be called from the bot.

Parameters:

driver – the selenium driver
bot – the bot that is using the driver

index()

element()

class pyselenscrapr.ScrapingStepLoop.ScrapingStepIteration(name: str, execute: Callable[[ScrapingLogic], None], can_execute: Callable[[ScrapingLogic], bool] | None = None, was_executed: Callable[[ScrapingLogic], bool] | None = None, before_validation: Callable[[ScrapingLogic], None] | None = None, retry: Callable[[ScrapingLogic], None] | None = None, previous_step: IScrapingStep | None = None, error_handling: ScrapingStepErrorHandling = 1, interval: ScrapingStepInterval = 0, repeat: ScrapingStepRepeat = 1, exit_bot_when_errored: bool = False, retry_count: int = 3)

Bases: ScrapingStep

class pyselenscrapr.ScrapingStepLoop.ScrapingStepLoop(name: str, iteration_callback: Callable[[ScrapingLogic], any], iteration_steps: list[ScrapingStep], iterations: int | None = None, retry_count: int = 3)

Bases: ScrapingStep

__init__(name: str, iteration_callback: Callable[[ScrapingLogic], any], iteration_steps: list[ScrapingStep], iterations: int | None = None, retry_count: int = 3)

Constructor for ScrapingStep

Parameters:

name – a string representing the name of the step - it is used later for debugging purposes
execute – a function that will be executed when the step is executed
can_execute – a function that will be executed to check if the step can be executed
was_executed – a function that will be executed to check if the step was executed
before_validation – a function that will be executed before the step is validated
retry – a function that will be executed if the step fails
previous_step – the previous step in the scraping process
error_handling – an enum representing the error handling strategy
interval – an enum representing the interval at which the step will be executed
repeat – an enum representing if the step should be repeated
retry_count – an integer representing the number of retries

execute(logic: ScrapingLogic)

pyselenscrapr.ScrapingStepPagination module

class pyselenscrapr.ScrapingStepPagination.ScrapingStepPaginationMode

Bases: object

RandomPages = 1

AllPages = 2

class pyselenscrapr.ScrapingStepPagination.ScrapingStepPagination(name: str, execute: Callable[[IScrapingStep], any], goto_page: Callable[[IScrapingStep, int], None], validate_page: Callable[[IScrapingStep, int], bool], pagination_mode: ScrapingStepPaginationMode, page_count: Callable[[IScrapingStep], int], exit_bot_when_errored: bool = False)

Bases: ScrapingStep

__init__(name: str, execute: Callable[[IScrapingStep], any], goto_page: Callable[[IScrapingStep, int], None], validate_page: Callable[[IScrapingStep, int], bool], pagination_mode: ScrapingStepPaginationMode, page_count: Callable[[IScrapingStep], int], exit_bot_when_errored: bool = False)

Constructor for ScrapingStep

Parameters:

name – a string representing the name of the step - it is used later for debugging purposes
execute – a function that will be executed when the step is executed
can_execute – a function that will be executed to check if the step can be executed
was_executed – a function that will be executed to check if the step was executed
before_validation – a function that will be executed before the step is validated
retry – a function that will be executed if the step fails
previous_step – the previous step in the scraping process
error_handling – an enum representing the error handling strategy
interval – an enum representing the interval at which the step will be executed
repeat – an enum representing if the step should be repeated
retry_count – an integer representing the number of retries
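The constructor signature of ScrapingStepPagination takes goto_page, validate_page, page_count and execute callbacks. A rough sketch of how a run over all pages might combine them is given below. The loop itself, the 1-based page numbering, and the names `paginate_all`, `FakeStep` and `PAGES` are assumptions for illustration only.

```python
def paginate_all(step, execute, goto_page, validate_page, page_count):
    """Visit every page in order, validate it, and collect execute() results."""
    results = []
    for page in range(1, page_count(step) + 1):  # assumes pages start at 1
        goto_page(step, page)
        if not validate_page(step, page):
            raise RuntimeError(f"page {page} failed validation")
        results.append(execute(step))
    return results


class FakeStep:
    """Stand-in for the step object handed to each callback."""

    current_page = None


# Pretend site with three pages of rows.
PAGES = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}


def goto(step, page):
    step.current_page = page


step = FakeStep()
rows = paginate_all(
    step,
    execute=lambda s: PAGES[s.current_page],
    goto_page=goto,
    validate_page=lambda s, page: s.current_page == page,
    page_count=lambda s: len(PAGES),
)
```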

finished()

can_retry()

retry()

sleep_random()

execute(logic)

pyselenscrapr.ValidationError module

exception pyselenscrapr.ValidationError.ValidationError(validation_error_message)

Bases: Exception

__init__(validation_error_message)

validation_error_message = None

__str__(): Return str(self).
