# olivia_finder.myrequests

The `myrequests` package is designed to make concurrent requests to a web server, allowing a high volume of requests without the server denying us service.

The package includes several modules that carry out this task transparently, such as obtaining proxies and useragents to disguise the origin of each request, and executing requests concurrently.

## Module structure

**Package structure**

```python
!tree ../../olivia_finder/olivia_finder/myrequests
```

    ../../olivia_finder/olivia_finder/myrequests
    β”œβ”€β”€ data
    β”‚Β Β  └── useragents.txt
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ job.py
    β”œβ”€β”€ proxy_builders
    β”‚Β Β  β”œβ”€β”€ __init__.py
    β”‚Β Β  β”œβ”€β”€ list_builder.py
    β”‚Β Β  β”œβ”€β”€ proxy_builder.py
    β”‚Β Β  └── ssl_proxies.py
    β”œβ”€β”€ proxy_handler.py
    β”œβ”€β”€ request_handler.py
    β”œβ”€β”€ request_worker.py
    └── useragent_handler.py

    2 directories, 11 files

## Subpackage `myrequests.proxy_builders`

The proxy builder subpackage takes care of obtaining a list of proxies.

Two implementations are available: one based on an online proxy provider called SSL Proxies, and one based on a plain proxy list. The list-based implementation is proposed as the better option because it is more generic.

We can approach the task in two different ways:

- Obtain the data through web scraping from a website that provides updated proxies, such as SSLProxies

- Obtain the data from a proxy list in `<IP>:<PORT>` format served by a web server

Both are shown below.

**_Web scraping implementation (from sslproxies.org)_**

```python
from olivia_finder.myrequests.proxy_builders.ssl_proxies import SSLProxiesBuilder

pb_SSLProxies = SSLProxiesBuilder()
pb_SSLProxies.get_proxies()
```

    ['78.46.190.133:8000',
     '64.225.4.63:9993',
     ...
     '103.129.92.95:9995',
     '40.83.102.86:80',
     '87.237.239.57:3128',
     '86.57.137.63:2222',
     '140.238.245.116:8100',
     '171.244.65.14:4002',
     '35.240.219.50:8080',
     '115.144.1.222:12089',
     '119.8.120.4:80',
     '41.174.96.38:32650']

**_Web list implementation (from lists)_**

```python
from olivia_finder.myrequests.proxy_builders.list_builder import ListProxyBuilder

pb_ListBuilder = ListProxyBuilder(
    url="https://raw.githubusercontent.com/mertguvencli/http-proxy-list/main/proxy-list/data.txt")
pb_ListBuilder.get_proxies()
```

    ['77.247.108.17:33080',
     '195.133.45.149:7788',
     '94.110.148.115:3128',
     '35.240.156.235:8080',
     ...
     '103.157.83.229:8080',
     '36.91.46.26:8080',
     '82.165.184.53:80']
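As an aside, the filtering step a list-based builder needs can be sketched with a small hypothetical helper (`parse_proxy_list` is an illustrative name, not part of the package): keep only the downloaded lines that match the `<IP>:<PORT>` format and discard everything else.

```python
import re

# Hypothetical helper, not part of olivia_finder: accepts only well-formed
# <IP>:<PORT> lines from a downloaded proxy list.
_PROXY_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}$")

def parse_proxy_list(text: str) -> list[str]:
    return [line.strip() for line in text.splitlines()
            if _PROXY_RE.match(line.strip())]

raw = "77.247.108.17:33080\nnot a proxy\n195.133.45.149:7788\n"
print(parse_proxy_list(raw))  # ['77.247.108.17:33080', '195.133.45.149:7788']
```

Validating each line this way keeps a malformed or vandalized upstream list from injecting garbage into the proxy pool.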

## Module `myrequests.proxy_handler`

```python
from olivia_finder.myrequests.proxy_handler import ProxyHandler

ph = ProxyHandler()
for i in range(10):
    print(ph.get_next_proxy())
```

    http://170.130.55.153:5001
    http://104.17.16.136:80
    http://104.234.138.40:3128
    http://45.131.5.32:80
    http://203.32.120.18:80
    http://172.67.23.197:80
    http://185.162.229.77:80
    http://203.13.32.148:80
    http://172.67.251.80:80
    http://103.19.130.50:8080
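The rotation behaviour of `get_next_proxy` can be illustrated with a minimal round-robin sketch. This is a hypothetical stand-in, not the package's actual implementation:

```python
from itertools import cycle

class RoundRobinProxies:
    """Hypothetical illustration: hand out proxies in turn, wrapping around."""

    def __init__(self, proxies):
        self._proxies = cycle(proxies)

    def get_next_proxy(self) -> str:
        # Prefix the scheme, as the ProxyHandler output above suggests
        return f"http://{next(self._proxies)}"

rr = RoundRobinProxies(["1.2.3.4:80", "5.6.7.8:3128"])
print([rr.get_next_proxy() for _ in range(3)])
# ['http://1.2.3.4:80', 'http://5.6.7.8:3128', 'http://1.2.3.4:80']
```

Cycling through the pool spreads requests across proxies so no single exit address accumulates enough traffic to get blocked.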

## Module `myrequests.useragent_handler`

```python
from olivia_finder.myrequests.useragent_handler import UserAgentHandler
```

The purpose of this class is to provide a set of useragents to be used by the RequestHandler object, with the aim of hiding the original identity of the web request.

The class can load the useragents from a text file included in the package, and can also obtain them from a website dedicated to providing them.

If neither option is available, the default useragents hardcoded in the class are used.
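The fallback chain just described (file first, then the web, then hardcoded defaults) can be sketched as follows; `load_useragents` and `DEFAULT_USERAGENTS` are illustrative names, not the class's real internals:

```python
from pathlib import Path

# Illustrative fallback data; the real class ships its own hardcoded defaults.
DEFAULT_USERAGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/45.0.2454.85 Safari/537.36"
]

def load_useragents(path, fetch_from_web=None):
    # 1) Try the bundled text file first
    try:
        lines = [l for l in Path(path).read_text().splitlines() if l.strip()]
        if lines:
            return lines
    except OSError:
        pass
    # 2) Fall back to an online source, if one was supplied
    if fetch_from_web is not None:
        try:
            return fetch_from_web()
        except Exception:
            pass
    # 3) Last resort: the hardcoded defaults
    return DEFAULT_USERAGENTS

print(load_useragents("does/not/exist.txt"))  # falls through to the defaults
```

Each stage swallows its own failure and hands control to the next, so callers always receive a usable list.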

Useragents dataset included in the MyRequests package:

```python
!tail ../../olivia_finder/olivia_finder/myrequests/data/useragents.txt
```

    Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.91 Safari/537.36
    Mozilla/5.0 (iPad; U; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36
    Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) coc_coc_browser/50.0.125 Chrome/44.0.2403.125 Safari/537.36
    Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E)
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36
    Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MAARJS; rv:11.0) like Gecko
    Mozilla/5.0 (Linux; Android 5.0; SAMSUNG SM-N900T Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/2.1 Chrome/34.0.1847.76 Mobile Safari/537.36
    Mozilla/5.0 (iPhone; CPU iPhone OS 8_4 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) GSA/7.0.55539 Mobile/12H143 Safari/600.1.4

The default constructor loads the useragents from the file:

```python
ua_handler = UserAgentHandler()
ua_handler.useragents_list[:5]
```

    ['Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.94 Chrome/37.0.2062.94 Safari/537.36',
     'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36',
     'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
     'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0',
     'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9']

We can force the useragents to be obtained from the Internet with the flag `use_file=False`:

```python
import gc

# Delete the object and force the garbage collector to free the memory
del ua_handler
UserAgentHandler.destroy()  # Delete the singleton instance
gc.collect()

from olivia_finder.myrequests.useragent_handler import UserAgentHandler
ua_handler = UserAgentHandler(use_file=False)
ua_handler.useragents_list[:5]
```

    ['Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)',
     'Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)',
     'Mozilla/5.0 (compatible; ABrowse 0.4; Syllable)',
     'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Acoo Browser 1.98.744; .NET CLR 3.5.30729)',
     'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; Acoo Browser 1.98.744; .NET CLR 3.5.30729)']

Once the class is initialized, it can provide a random useragent to the RequestHandler object to perform the request:

```python
useragents = [ua_handler.get_next_useragent() for _ in range(10)]
useragents
```

    ['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125 Safari/537.36',
     'Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; ASU2JS; rv:11.0) like Gecko',
     'Mozilla/5.0 (X11; Linux x86_64; U; en-us) AppleWebKit/537.36 (KHTML, like Gecko) Silk/3.68 like Chrome/39.0.2171.93 Safari/537.36',
     'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0',
     'Mozilla/5.0 (Linux; Android 4.4.2; SM-T530NU Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.84 Safari/537.36',
     'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/8.0; 1ButtonTaskbar)',
     'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
     'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
     'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125 Safari/537.36',
     'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.6.3 (KHTML, like Gecko) Version/8.0.6 Safari/600.6.3']

## Module `myrequests.request_handler`

```python
from olivia_finder.myrequests.job import RequestJob
from olivia_finder.myrequests.request_handler import RequestHandler
```

This is the main class of the MyRequests package. It uses the ProxyHandler and UserAgentHandler classes to obtain the proxies and user agents that will be used in the web requests it performs.

The default constructor receives no parameters; the class instantiates its dependencies itself and uses the default configuration.

**Make a request**

```python
job = RequestJob(
    key="networkx",
    url="https://www.pypi.org/project/networkx/"
)

rh = RequestHandler()
finalized_job = rh.do_request(job)
```

As a result we obtain the same RequestJob object, now populated with the response data:

```python
print(
    f'Key: {finalized_job.key}\n'
    f'URL: {finalized_job.url}\n'
    f'Response: {finalized_job.response}\n'
)
```

    Key: networkx
    URL: https://www.pypi.org/project/networkx/
    Response: <Response [200]>

**Do parallel requests**

We can make parallel requests using threads; this is safe because the class is designed for it.

```python
# Initialize RequestHandler
from tqdm import tqdm
rh = RequestHandler()

# Initialize RequestJobs
request_jobs = [
    RequestJob(key="networkx", url="https://www.pypi.org/project/networkx/"),
    RequestJob(key="pandas", url="https://www.pypi.org/project/pandas/"),
    RequestJob(key="numpy", url="https://www.pypi.org/project/numpy/"),
    RequestJob(key="matplotlib",
               url="https://www.pypi.org/project/matplotlib/"),
    RequestJob(key="scipy", url="https://www.pypi.org/project/scipy/"),
    RequestJob(key="scikit-learn",
               url="https://www.pypi.org/project/scikit-learn/"),
    RequestJob(key="tensorflow",
               url="https://www.pypi.org/project/tensorflow/"),
    RequestJob(key="keras", url="https://www.pypi.org/project/keras/")
]

# Set number of workers
num_workers = 4

# Initialize progress bar
progress_bar = tqdm(total=len(request_jobs))

finalized_jobs = rh.do_requests(
    request_jobs=request_jobs,
    num_workers=num_workers,
    progress_bar=progress_bar
)
```

      0%|          | 0/8 [00:00<?, ?it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 7/8 [00:02<00:00,  3.36it/s]

As a result we get the list of finalized RequestJob objects:

```python
for job in finalized_jobs:
    print(f'Key: {job.key}, URL: {job.url}, Response: {job.response}')
```

    Key: networkx, URL: https://www.pypi.org/project/networkx/, Response: <Response [200]>
    Key: scipy, URL: https://www.pypi.org/project/scipy/, Response: <Response [200]>
    Key: pandas, URL: https://www.pypi.org/project/pandas/, Response: <Response [200]>
    Key: tensorflow, URL: https://www.pypi.org/project/tensorflow/, Response: <Response [200]>
    Key: numpy, URL: https://www.pypi.org/project/numpy/, Response: <Response [200]>
    Key: scikit-learn, URL: https://www.pypi.org/project/scikit-learn/, Response: <Response [200]>
    Key: matplotlib, URL: https://www.pypi.org/project/matplotlib/, Response: <Response [200]>
    Key: keras, URL: https://www.pypi.org/project/keras/, Response: <Response [200]>
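The fan-out/collect pattern used here can be reproduced with the standard library alone. The following is a simplified sketch under stated assumptions (the stand-in `worker` function replaces the real per-job HTTP request), not `RequestHandler`'s actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def do_requests_sketch(jobs, worker, num_workers=4):
    """Run `worker` over `jobs` on a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, jobs))

# Stand-in worker; in practice it would fetch the URL for one RequestJob.
results = do_requests_sketch(["networkx", "pandas", "numpy"],
                             lambda key: f"{key}: <Response [200]>",
                             num_workers=2)
print(results)
```

Because network requests are I/O-bound, a small thread pool like this overlaps the waiting time of several requests instead of serializing them.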

The Job object contains the response to the request:

```python
print(finalized_jobs[0].response.text[10000:20000])
```

     class="split-layout split-layout--middle package-description">

          <p class="package-description__summary">Python package for creating and manipulating graphs and networks</p>

        <div data-html-include="/_includes/edit-project-button/networkx">
        </div>
        </div>
      </div>
    </div>

    <div data-controller="project-tabs">
      <div class="tabs-container">
        <div class="vertical-tabs">
          <div class="vertical-tabs__tabs">
            <div class="sidebar-section">
              <h3 class="sidebar-section__title">Navigation</h3>
              <nav aria-label="Navigation for networkx">
                <ul class="vertical-tabs__list" role="tablist">
                  <li role="tab">
                    <a id="description-tab" href="#description" data-project-tabs-target="tab" data-action="project-tabs#onTabClick" class="vertical-tabs__tab vertical-tabs__tab--with-icon vertical-tabs__tab--is-active" aria-selected="true" aria-label="Project description. Focus will be moved to the description.">
                      <i class="fa fa-align-left" aria-hidden="true"></i>
                      Project description
                    </a>
                  </li>
                  <li role="tab">
                    <a id="history-tab" href="#history" data-project-tabs-target="tab" data-action="project-tabs#onTabClick" class="vertical-tabs__tab vertical-tabs__tab--with-icon" aria-label="Release history. Focus will be moved to the history panel.">
                      <i class="fa fa-history" aria-hidden="true"></i>
                      Release history
                    </a>
                  </li>

                  <li role="tab">
                    <a id="files-tab" href="#files" data-project-tabs-target="tab" data-action="project-tabs#onTabClick" class="vertical-tabs__tab vertical-tabs__tab--with-icon" aria-label="Download files. Focus will be moved to the project files.">
                      <i class="fa fa-download" aria-hidden="true"></i>
                      Download files
                    </a>
                  </li>

                </ul>
              </nav>
            </div>

    ...
