# olivia_finder.myrequests

The myrequests package makes concurrent requests to a web server while trying to avoid being denied service (for example, by rate limiting or IP bans).

The package includes several modules that carry out this task transparently, such as obtaining proxies and user agents to disguise the origin of each request, and executing requests concurrently.
## Module structure

**Package structure**
```python
!tree ../../olivia_finder/olivia_finder/myrequests
```

../../olivia_finder/olivia_finder/myrequests
├── data
│   └── useragents.txt
├── __init__.py
├── job.py
├── proxy_builders
│   ├── __init__.py
│   ├── list_builder.py
│   ├── proxy_builder.py
│   └── ssl_proxies.py
├── proxy_handler.py
├── request_handler.py
├── request_worker.py
└── useragent_handler.py

2 directories, 11 files
## Subpackage `myrequests.proxy_builders`

The proxy builder subpackage takes care of obtaining a list of proxies.

Two implementations are available: one based on an online proxy provider called SSL Proxies, and one based on a plain proxy list. The list-based implementation is recommended because it is the more generic of the two.

We can obtain the data in two different ways:

- Through web scraping of a website that publishes updated proxies, such as SSLProxies
- From a proxy list in `<IP>:<PORT>` format served by a web server

Both are shown below.
**_Web scraping implementation (from sslproxies.org)_**

```python
from olivia_finder.myrequests.proxy_builders.ssl_proxies import SSLProxiesBuilder
```

```python
pb_SSLProxies = SSLProxiesBuilder()
pb_SSLProxies.get_proxies()
```
['78.46.190.133:8000',
'64.225.4.63:9993',
...
'103.129.92.95:9995',
'40.83.102.86:80',
'87.237.239.57:3128',
'86.57.137.63:2222',
'140.238.245.116:8100',
'171.244.65.14:4002',
'35.240.219.50:8080',
'115.144.1.222:12089',
'119.8.120.4:80',
'41.174.96.38:32650']
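The scraping approach boils down to extracting `<IP>:<PORT>` pairs from the provider's HTML. A minimal, hypothetical sketch of that extraction step (illustrative only, not the actual `SSLProxiesBuilder` internals) could look like this:

```python
import re

# Matches IPv4:PORT pairs such as 78.46.190.133:8000
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")

def extract_proxies(html: str) -> list[str]:
    """Return unique <IP>:<PORT> strings found in a page, in order."""
    found: list[str] = []
    for ip, port in PROXY_RE.findall(html):
        proxy = f"{ip}:{port}"
        if proxy not in found:
            found.append(proxy)
    return found

# Works on any text that embeds proxies, no network needed:
sample = "<td>78.46.190.133:8000</td> <td>64.225.4.63:9993</td>"
extract_proxies(sample)  # -> ['78.46.190.133:8000', '64.225.4.63:9993']
```

A regex over the raw HTML is more robust to page-layout changes than parsing a specific table structure, at the cost of occasionally matching stray numbers.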
**_Proxy list implementation (from a plain-text list)_**

```python
from olivia_finder.myrequests.proxy_builders.list_builder import ListProxyBuilder
```

```python
pb_ListBuilder = ListProxyBuilder(
    url="https://raw.githubusercontent.com/mertguvencli/http-proxy-list/main/proxy-list/data.txt")
pb_ListBuilder.get_proxies()
```
['77.247.108.17:33080',
'195.133.45.149:7788',
'94.110.148.115:3128',
'35.240.156.235:8080',
...
'103.157.83.229:8080',
'36.91.46.26:8080',
'82.165.184.53:80']
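For the list-based approach, the work amounts to fetching a text file and keeping the lines that look like `<IP>:<PORT>`. A hedged sketch of that idea (the real `ListProxyBuilder` may validate or filter differently):

```python
import re
from urllib.request import urlopen

# A line is kept only if it is exactly an <IP>:<PORT> pair
LINE_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d{2,5}$")

def parse_proxy_list(text: str) -> list[str]:
    """Filter a plain-text payload down to valid-looking proxy lines."""
    return [line.strip() for line in text.splitlines()
            if LINE_RE.match(line.strip())]

def fetch_proxy_list(url: str, timeout: float = 10.0) -> list[str]:
    """Download a plain-text proxy list and parse it."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_proxy_list(resp.read().decode("utf-8", errors="replace"))
```

Separating parsing from fetching keeps the filtering logic testable without any network access.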
## Module `myrequests.proxy_handler`

```python
from olivia_finder.myrequests.proxy_handler import ProxyHandler
```

```python
ph = ProxyHandler()
for i in range(10):
    print(ph.get_next_proxy())
```
http://170.130.55.153:5001
http://104.17.16.136:80
http://104.234.138.40:3128
http://45.131.5.32:80
http://203.32.120.18:80
http://172.67.23.197:80
http://185.162.229.77:80
http://203.13.32.148:80
http://172.67.251.80:80
http://103.19.130.50:8080
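Conceptually, the handler rotates over the pool produced by the proxy builders. A minimal stand-in for that rotation using `itertools.cycle` (an assumption based on the output above; the real `ProxyHandler` may also refresh the pool or discard dead proxies):

```python
from itertools import cycle

class RoundRobinProxies:
    """Hypothetical sketch of proxy rotation: cycle through a fixed
    pool, yielding proxies as http:// URLs like the output above."""

    def __init__(self, proxies: list[str]):
        self._pool = cycle(proxies)

    def get_next_proxy(self) -> str:
        return f"http://{next(self._pool)}"

rot = RoundRobinProxies(["170.130.55.153:5001", "104.17.16.136:80"])
rot.get_next_proxy()  # -> 'http://170.130.55.153:5001'
rot.get_next_proxy()  # -> 'http://104.17.16.136:80'
rot.get_next_proxy()  # wraps around to the first proxy again
```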
## Module `myrequests.useragent_handler`

```python
from olivia_finder.myrequests.useragent_handler import UserAgentHandler
```
The purpose of this class is to provide a set of user agents to the RequestHandler object, with the aim of hiding the original identity of each web request.

The class loads the user agents from a text file bundled with the package, and can also obtain them from a website dedicated to providing them.

If neither option is available, the defaults hardcoded in the class are used.
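That fallback chain (bundled file, then the network, then hardcoded defaults) can be sketched as follows; the function and its parameters are illustrative, not the actual UserAgentHandler API:

```python
DEFAULT_USERAGENTS = [
    # Stand-ins for the defaults hardcoded in the class
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def load_useragents(path=None, fetch=None):
    """Try the local file, then a remote fetcher, then the defaults."""
    if path is not None:
        try:
            with open(path, encoding="utf-8") as fh:
                lines = [line.strip() for line in fh if line.strip()]
            if lines:
                return lines
        except OSError:
            pass  # file missing or unreadable: fall through
    if fetch is not None:
        try:
            remote = fetch()
            if remote:
                return remote
        except Exception:
            pass  # network failed: fall through
    return list(DEFAULT_USERAGENTS)
```

Each stage falls through silently on failure, so callers always get a usable list.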
**User agents dataset included in the MyRequests package**

```python
!tail ../../olivia_finder/olivia_finder/myrequests/data/useragents.txt
```
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.91 Safari/537.36
Mozilla/5.0 (iPad; U; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) coc_coc_browser/50.0.125 Chrome/44.0.2403.125 Safari/537.36
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MAARJS; rv:11.0) like Gecko
Mozilla/5.0 (Linux; Android 5.0; SAMSUNG SM-N900T Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/2.1 Chrome/34.0.1847.76 Mobile Safari/537.36
Mozilla/5.0 (iPhone; CPU iPhone OS 8_4 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) GSA/7.0.55539 Mobile/12H143 Safari/600.1.4
The default constructor loads the user agents from the file:

```python
ua_handler = UserAgentHandler()
ua_handler.useragents_list[:5]
```
['Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.94 Chrome/37.0.2062.94 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9']
We can force obtaining the user agents from the Internet with the flag `use_file=False`:
```python
import gc

# Delete the object and force the garbage collector to free the memory
del ua_handler
UserAgentHandler.destroy()  # Delete the singleton instance
gc.collect()

from olivia_finder.myrequests.useragent_handler import UserAgentHandler

ua_handler = UserAgentHandler(use_file=False)
ua_handler.useragents_list[:5]
```
['Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)',
'Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)',
'Mozilla/5.0 (compatible; ABrowse 0.4; Syllable)',
'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Acoo Browser 1.98.744; .NET CLR 3.5.30729)',
'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Acoo Browser 1.98.744; .NET CLR 3.5.30729)']
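The `destroy()` call above suggests the handlers follow a singleton pattern with an explicit reset hook. A hedged sketch of that pattern (assumed from the usage, not taken from the package source):

```python
class SingletonHandler:
    """Single shared instance with an explicit destroy() reset hook."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    @classmethod
    def destroy(cls):
        """Drop the shared instance so the next call builds a fresh one."""
        cls._instance = None

a = SingletonHandler()
b = SingletonHandler()
assert a is b                        # same shared instance
SingletonHandler.destroy()
assert SingletonHandler() is not a   # fresh instance after destroy()
```

Without `destroy()`, the cached instance would keep serving the old configuration, which is why the example above resets before passing `use_file=False`.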
Once the class is initialized, it can provide a random user agent to the RequestHandler object for each request:
```python
useragents = [ua_handler.get_next_useragent() for _ in range(10)]
useragents
```
['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; ASU2JS; rv:11.0) like Gecko',
'Mozilla/5.0 (X11; Linux x86_64; U; en-us) AppleWebKit/537.36 (KHTML, like Gecko) Silk/3.68 like Chrome/39.0.2171.93 Safari/537.36',
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0',
'Mozilla/5.0 (Linux; Android 4.4.2; SM-T530NU Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.84 Safari/537.36',
'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/8.0; 1ButtonTaskbar)',
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.6.3 (KHTML, like Gecko) Version/8.0.6 Safari/600.6.3']
## Module `myrequests.request_handler`

```python
from olivia_finder.myrequests.job import RequestJob
from olivia_finder.myrequests.request_handler import RequestHandler
```
RequestHandler is the main class of the MyRequests package. It uses the ProxyHandler and UserAgentHandler classes to obtain the proxies and user agents that are applied to the web requests it performs.

The default constructor takes no parameters; the class instantiates its helper objects itself and uses the default configuration.
**Make a request**

```python
job = RequestJob(
    key="networkx",
    url="https://www.pypi.org/project/networkx/"
)
```

```python
rh = RequestHandler()
finalized_job = rh.do_request(job)
```
As a result we obtain the RequestJob object back, now populated with the response data:
```python
print(
    f'Key: {finalized_job.key}\n'
    f'URL: {finalized_job.url}\n'
    f'Response: {finalized_job.response}\n'
)
```
Key: networkx
URL: https://www.pypi.org/project/networkx/
Response: <Response [200]>
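Under the hood, a proxy and a user agent have to be attached to the outgoing request. A minimal sketch of that wiring using the `requests` library's conventions (a hypothetical helper; the real RequestHandler internals may differ):

```python
def build_request_kwargs(proxy=None, useragent=None, timeout=15.0):
    """Assemble keyword arguments suitable for requests.get(url, **kwargs)."""
    kwargs = {"timeout": timeout}
    if useragent:
        kwargs["headers"] = {"User-Agent": useragent}
    if proxy:
        # requests expects a scheme -> proxy URL mapping
        kwargs["proxies"] = {"http": proxy, "https": proxy}
    return kwargs

build_request_kwargs(proxy="http://1.2.3.4:80", useragent="Mozilla/5.0")
# -> {'timeout': 15.0,
#     'headers': {'User-Agent': 'Mozilla/5.0'},
#     'proxies': {'http': 'http://1.2.3.4:80', 'https': 'http://1.2.3.4:80'}}
```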
**Do parallel requests**

We can make parallel requests using threads; this is safe to do because the class is designed for concurrent use.
```python
# Initialize RequestHandler
from tqdm import tqdm

rh = RequestHandler()

# Initialize RequestJobs
request_jobs = [
    RequestJob(key="networkx", url="https://www.pypi.org/project/networkx/"),
    RequestJob(key="pandas", url="https://www.pypi.org/project/pandas/"),
    RequestJob(key="numpy", url="https://www.pypi.org/project/numpy/"),
    RequestJob(key="matplotlib",
               url="https://www.pypi.org/project/matplotlib/"),
    RequestJob(key="scipy", url="https://www.pypi.org/project/scipy/"),
    RequestJob(key="scikit-learn",
               url="https://www.pypi.org/project/scikit-learn/"),
    RequestJob(key="tensorflow",
               url="https://www.pypi.org/project/tensorflow/"),
    RequestJob(key="keras", url="https://www.pypi.org/project/keras/")
]

# Set number of workers
num_workers = 4

# Initialize progress bar
progress_bar = tqdm(total=len(request_jobs))

finalized_jobs = rh.do_requests(
    request_jobs=request_jobs,
    num_workers=num_workers,
    progress_bar=progress_bar
)
```
88%|████████▊ | 7/8 [00:02<00:00, 3.36it/s]
As a result we get a list of finalized RequestJob objects:
```python
for job in finalized_jobs:
    print(f'Key: {job.key}, URL: {job.url}, Response: {job.response}')
```
Key: networkx, URL: https://www.pypi.org/project/networkx/, Response: <Response [200]>
Key: scipy, URL: https://www.pypi.org/project/scipy/, Response: <Response [200]>
Key: pandas, URL: https://www.pypi.org/project/pandas/, Response: <Response [200]>
Key: tensorflow, URL: https://www.pypi.org/project/tensorflow/, Response: <Response [200]>
Key: numpy, URL: https://www.pypi.org/project/numpy/, Response: <Response [200]>
Key: scikit-learn, URL: https://www.pypi.org/project/scikit-learn/, Response: <Response [200]>
Key: matplotlib, URL: https://www.pypi.org/project/matplotlib/, Response: <Response [200]>
Key: keras, URL: https://www.pypi.org/project/keras/, Response: <Response [200]>
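Note that the results come back in completion order, not submission order, which is what you would expect from a queue-based worker pool. A hedged sketch of that pattern (the `request_worker` module presumably does something similar, but this is an illustration, not its code):

```python
import queue
import threading

def run_jobs(jobs, handle, num_workers=4):
    """Run handle(job) for every job using num_workers threads."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained: worker exits
            result = handle(job)
            with lock:
                results.append(result)  # completion order, not input order

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

run_jobs(range(8), lambda job: job * 2, num_workers=3)
```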
The job object contains the server's response to the request:

```python
print(finalized_jobs[0].response.text[10000:20000])
```
class="split-layout split-layout--middle package-description">
<p class="package-description__summary">Python package for creating and manipulating graphs and networks</p>
<div data-html-include="/_includes/edit-project-button/networkx">
</div>
</div>
</div>
</div>
<div data-controller="project-tabs">
<div class="tabs-container">
<div class="vertical-tabs">
<div class="vertical-tabs__tabs">
<div class="sidebar-section">
<h3 class="sidebar-section__title">Navigation</h3>
<nav aria-label="Navigation for networkx">
<ul class="vertical-tabs__list" role="tablist">
<li role="tab">
<a id="description-tab" href="#description" data-project-tabs-target="tab" data-action="project-tabs#onTabClick" class="vertical-tabs__tab vertical-tabs__tab--with-icon vertical-tabs__tab--is-active" aria-selected="true" aria-label="Project description. Focus will be moved to the description.">
<i class="fa fa-align-left" aria-hidden="true"></i>
Project description
</a>
</li>
<li role="tab">
<a id="history-tab" href="#history" data-project-tabs-target="tab" data-action="project-tabs#onTabClick" class="vertical-tabs__tab vertical-tabs__tab--with-icon" aria-label="Release history. Focus will be moved to the history panel.">
<i class="fa fa-history" aria-hidden="true"></i>
Release history
</a>
</li>
<li role="tab">
<a id="files-tab" href="#files" data-project-tabs-target="tab" data-action="project-tabs#onTabClick" class="vertical-tabs__tab vertical-tabs__tab--with-icon" aria-label="Download files. Focus will be moved to the project files.">
<i class="fa fa-download" aria-hidden="true"></i>
Download files
</a>
</li>
</ul>
</nav>
</div>
...