olivia_finder.data_source
Description
This package provides a base data structure for all the derived classes whose purpose is to obtain data from a specific source.
It is composed of several modules:
Package structure:
├── csv_ds.py
├── data_source.py
├── librariesio_ds.py
├── repository_scrapers
│ ├── bioconductor.py
│ ├── cran.py
│ ├── npm.py
│ ├── pypi.py
│ └── r.py
└── scraper_ds.py
Package modules:
data_source.py
Implements the abstract class Datasource, which is the base class of the rest of the implementations
csv_ds.py
Implements a data source for *.csv files
librariesio_ds.py
Implements datasource for the API of Libraries.io
scraper_ds.py
Implements the abstract class ScraperDataSource, which is the base class of customized implementations for each repository
repository_scrapers/
Contains several web-scraping-based data source implementations for CRAN, Bioconductor, NPM and PyPI
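The relationship between these modules can be pictured with a small sketch. The classes below are illustrative assumptions, not the package's actual code: only the method names obtain_package_names, obtain_package_data and obtain_packages_data come from the usage examples in this document, while SketchDataSource and InMemoryDataSource are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class SketchDataSource(ABC):
    """Hypothetical stand-in for the abstract base class in data_source.py."""

    @abstractmethod
    def obtain_package_names(self) -> List[str]:
        ...

    @abstractmethod
    def obtain_package_data(self, package_name: str) -> Optional[dict]:
        ...

    def obtain_packages_data(self, package_names: List[str]) -> Tuple[List[dict], List[str]]:
        """Collect data for several packages, keeping track of the misses."""
        found, not_found = [], []
        for name in package_names:
            data = self.obtain_package_data(name)
            if data is None:
                not_found.append(name)
            else:
                found.append(data)
        return found, not_found


class InMemoryDataSource(SketchDataSource):
    """Hypothetical concrete source backed by a plain dictionary."""

    def __init__(self, packages: Dict[str, dict]):
        self._packages = packages

    def obtain_package_names(self) -> List[str]:
        return sorted(self._packages)

    def obtain_package_data(self, package_name: str) -> Optional[dict]:
        return self._packages.get(package_name)
```

Under this assumption, each concrete data source only has to supply the two abstract methods, and the batch method is shared.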
Doc pages
For more info see the data_source package docs: https://dab0012.github.io/olivia-finder/olivia_finder/data_source/data_source_module.html
Web Scraping-Based implementations
Constructor
The default constructor can be called without parameters.
The optional parameters depend on the implementation, but as a rule a name and a description can be set (for informational purposes).
The most relevant parameter is the RequestHandler object, which the web-scraping-based DataSource uses to make requests to the website it targets.
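The constructor pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions: SketchScraper is a hypothetical class, the parameter names are guesses, and the request handler is modelled as a plain callable rather than the library's RequestHandler object.

```python
from typing import Callable, Optional


def default_request_handler(url: str) -> str:
    """Placeholder handler; a real one would perform an HTTP GET."""
    raise NotImplementedError("network access is outside the scope of this sketch")


class SketchScraper:
    """Hypothetical scraper showing the constructor pattern described above."""

    def __init__(
        self,
        name: Optional[str] = None,
        description: Optional[str] = None,
        request_handler: Optional[Callable[[str], str]] = None,
    ):
        # Every argument is optional, so SketchScraper() also works.
        self.name = name or "sketch"
        self.description = description or "Illustrative scraper"
        self.request_handler = request_handler or default_request_handler

    def fetch(self, url: str) -> str:
        # All network traffic goes through the injected handler.
        return self.request_handler(url)


# Usage: inject a fake handler, e.g. for tests that must not hit the network.
scraper = SketchScraper(name="demo", request_handler=lambda url: f"<html>{url}</html>")
page = scraper.fetch("https://example.org/pkg")
```

Passing the handler in from outside keeps request policy (retries, proxies, rate limits) separate from the parsing logic of each repository.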
Implementation for CRAN
from olivia_finder.data_source.repository_scrapers.cran import CranScraper
cran_ds = CranScraper()
Implementation for Bioconductor
from olivia_finder.data_source.repository_scrapers.bioconductor import BioconductorScraper
bioconductor_scraper = BioconductorScraper()
Implementation for PyPI
from olivia_finder.data_source.repository_scrapers.pypi import PypiScraper
pypi_scraper = PypiScraper()
Implementation for NPM
from olivia_finder.data_source.repository_scrapers.npm import NpmScraper
npm_scraper = NpmScraper()
Implementation for GitHub repositories
from olivia_finder.data_source.repository_scrapers.github import GithubScraper
github_scraper = GithubScraper()
Obtain package names
CRAN package names
cran_ds.obtain_package_names()[:10]
['A3',
'AalenJohansen',
'AATtools',
'ABACUS',
'abbreviate',
'abbyyR',
'abc',
'abc.data',
'ABC.RAP',
'ABCanalysis']
Bioconductor package names
bioconductor_scraper.obtain_package_names()[:10]
['ABSSeq',
'ABarray',
'ACE',
'ACME',
'ADAM',
'ADAMgui',
'ADImpute',
'ADaCGH2',
'AGDEX',
'AHMassBank']
PyPI package names
pypi_scraper.obtain_package_names()[:10]
['0',
'0-._.-._.-._.-._.-._.-._.-0',
'000',
'0.0.1',
'00101s',
'00print_lol',
'00SMALINUX',
'0101',
'01changer',
'01d61084-d29e-11e9-96d1-7c5cf84ffe8e']
NPM package names
Note:
- This process is very expensive. The implementation works, but using it is not recommended unless necessary.
- It is recommended to import the list of npm packages that is provided as a txt file.
The output folder can be configured through the working_dir option in the config.ini file.
# npm_scraper.obtain_package_names(
# page_size=100, # Number of packages to obtain per request
# save_chunks=True, # Save packages in a chunk file
# show_progress_bar=True # Show progress bar
# )[:10]
The file with the NPM package list is the following
!wc -l ../results/package_lists/npm_packages.txt
wc: ../results/package_lists/npm_packages.txt: No such file or directory
!tail -n 20 results/package_lists/npm_packages.txt
tail: cannot open 'results/package_lists/npm_packages.txt' for reading: No such file or directory
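Loading a pre-built package list from a txt file could look like the sketch below. It assumes one package name per line, which this document does not confirm, and read_package_names is a hypothetical helper rather than part of olivia_finder.

```python
import io
from typing import Iterable, List


def read_package_names(lines: Iterable[str]) -> List[str]:
    """Return the non-empty, stripped lines of a package list."""
    return [line.strip() for line in lines if line.strip()]


# In-memory stand-in for the txt file; with a real file one would pass
# open(path, encoding="utf-8") instead.
sample_file = io.StringIO("aws-sdk\nrequest\n\n@types/node\n")
npm_names = read_package_names(sample_file)
```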
Obtain package data
CRAN data of A3 package
cran_ds.obtain_package_data('A3')
{'name': 'A3',
'version': '1.0.0',
'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'},
{'name': 'xtable', 'version': ''},
{'name': 'pbapply', 'version': ''}],
'url': 'https://cran.r-project.org/package=A3'}
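In the CRAN output above, each dependency's version field holds either a constraint such as '≥ 2.15.0' or an empty string. A minimal sketch of splitting that field into an operator and a version follows; split_constraint is a hypothetical helper, and the assumption that operator and version are separated by whitespace is based only on the sample output.

```python
from typing import Optional, Tuple


def split_constraint(version: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a string such as '≥ 2.15.0' into (operator, version)."""
    text = version.strip()
    if not text:
        return None, None            # no constraint recorded
    parts = text.split(maxsplit=1)
    if len(parts) == 1:
        return None, parts[0]        # bare version without operator
    return parts[0], parts[1]


a3_dependencies = [
    {"name": "R", "version": "≥ 2.15.0"},
    {"name": "xtable", "version": ""},
    {"name": "pbapply", "version": ""},
]
constraints = {dep["name"]: split_constraint(dep["version"]) for dep in a3_dependencies}
```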
If the request fails, None is returned
non_existent_package = cran_ds.obtain_package_data('NON_EXISTENT_PACKAGE')
print(non_existent_package)
None
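Because a failed lookup yields None instead of raising an exception, calling code has to check the result before using it. The sketch below shows one way to do that; summarize is a hypothetical helper, and the dictionary is a copy of the A3 output shown earlier.

```python
from typing import Optional


def summarize(package_data: Optional[dict]) -> str:
    """Return a one-line summary, or a notice when the lookup returned None."""
    if package_data is None:
        return "package not found"
    n_deps = len(package_data.get("dependencies", []))
    return f"{package_data['name']} {package_data['version']} ({n_deps} dependencies)"


a3 = {
    "name": "A3",
    "version": "1.0.0",
    "dependencies": [
        {"name": "R", "version": "≥ 2.15.0"},
        {"name": "xtable", "version": ""},
        {"name": "pbapply", "version": ""},
    ],
    "url": "https://cran.r-project.org/package=A3",
}
print(summarize(a3))
print(summarize(None))
```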
Bioconductor data of a4 package
bioconductor_scraper.obtain_package_data('a4')
{'name': 'a4',
'version': '1.48.0',
'dependencies': [{'name': 'a4Base', 'version': ''},
{'name': 'a4Preproc', 'version': ''},
{'name': 'a4Classif', 'version': ''},
{'name': 'a4Core', 'version': ''},
{'name': 'a4Reporting', 'version': ''}],
'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'}
PyPI data of the networkx package
pypi_scraper.obtain_package_data('networkx')
{'name': 'networkx',
'version': '3.1',
'url': 'https://pypi.org/project/networkx/',
'dependencies': [{'name': 'numpy', 'version': None},
{'name': 'scipy', 'version': None},
{'name': 'matplotlib', 'version': None},
{'name': 'pandas', 'version': None},
{'name': 'pre', 'version': None},
{'name': 'mypy', 'version': None},
{'name': 'sphinx', 'version': None},
{'name': 'pydata', 'version': None},
{'name': 'numpydoc', 'version': None},
{'name': 'pillow', 'version': None},
{'name': 'nb2plots', 'version': None},
{'name': 'texext', 'version': None},
{'name': 'lxml', 'version': None},
{'name': 'pygraphviz', 'version': None},
{'name': 'pydot', 'version': None},
{'name': 'sympy', 'version': None},
{'name': 'pytest', 'version': None},
{'name': 'codecov', 'version': None}]}
NPM data of aws-sdk package
npm_scraper.obtain_package_data('aws-sdk')
{'name': 'aws-sdk',
'version': '2.1406.0',
'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
{'name': 'events', 'version': '1.1.1'},
{'name': 'ieee754', 'version': '1.1.13'},
{'name': 'jmespath', 'version': '0.16.0'},
{'name': 'querystring', 'version': '0.2.0'},
{'name': 'sax', 'version': '1.2.1'},
{'name': 'url', 'version': '0.10.3'},
{'name': 'util', 'version': '^0.12.4'},
{'name': 'uuid', 'version': '8.0.0'},
{'name': 'xml2js', 'version': '0.5.0'},
{'name': '@types/node', 'version': '6.0.92'},
{'name': 'browserify', 'version': '13.1.0'},
{'name': 'chai', 'version': '^3.0'},
{'name': 'codecov', 'version': '^3.8.2'},
{'name': 'coffeeify', 'version': '*'},
{'name': 'coffeescript', 'version': '^1.12.7'},
{'name': 'cucumber', 'version': '0.5.x'},
{'name': 'eslint', 'version': '^5.8.0'},
{'name': 'hash-test-vectors', 'version': '^1.3.2'},
{'name': 'insert-module-globals', 'version': '^7.0.0'},
{'name': 'istanbul', 'version': '*'},
{'name': 'jasmine', 'version': '^2.5.3'},
{'name': 'jasmine-core', 'version': '^2.5.2'},
{'name': 'json-loader', 'version': '^0.5.4'},
{'name': 'karma', 'version': '^4.1.0'},
{'name': 'karma-chrome-launcher', 'version': '2.2.0'},
{'name': 'karma-jasmine', 'version': '^1.1.0'},
{'name': 'mocha', 'version': '^3.0.0'},
{'name': 'repl.history', 'version': '*'},
{'name': 'semver', 'version': '*'},
{'name': 'typescript', 'version': '2.0.8'},
{'name': 'uglify-js', 'version': '2.x'},
{'name': 'webpack', 'version': '^1.15.0'}],
'url': 'https://www.npmjs.com/package/aws-sdk'}
Obtain a list of packages data
CRAN data for the packages A3, AER and a non-existent package
cran_ds.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
([{'name': 'A3',
'version': '1.0.0',
'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'},
{'name': 'xtable', 'version': ''},
{'name': 'pbapply', 'version': ''}],
'url': 'https://cran.r-project.org/package=A3'},
{'name': 'AER',
'version': '1.2-10',
'dependencies': [{'name': 'R', 'version': '≥ 3.0.0'},
{'name': 'car', 'version': '≥ 2.0-19'},
{'name': 'lmtest', 'version': ''},
{'name': 'sandwich', 'version': '≥ 2.4-0'},
{'name': 'survival', 'version': '≥ 2.37-5'},
{'name': 'zoo', 'version': ''},
{'name': 'stats', 'version': ''},
{'name': 'Formula', 'version': '≥ 0.2-0'}],
'url': 'https://cran.r-project.org/package=AER'}],
['NON_EXISTING_PACKAGE'])
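The result above is a pair: a list of package records and a list of names that could not be found. A minimal sketch of unpacking it and flattening the records into (package, dependency) pairs follows; to_edges is a hypothetical helper and the data is an abbreviated copy of the output.

```python
from typing import List, Tuple

# Abbreviated copy of the result shown above: (found packages, names not found).
result = (
    [
        {"name": "A3", "dependencies": [{"name": "R"}, {"name": "xtable"}, {"name": "pbapply"}]},
        {"name": "AER", "dependencies": [{"name": "car"}, {"name": "lmtest"}]},
    ],
    ["NON_EXISTING_PACKAGE"],
)


def to_edges(packages: List[dict]) -> List[Tuple[str, str]]:
    """Flatten package records into (package, dependency) pairs."""
    return [
        (package["name"], dependency["name"])
        for package in packages
        for dependency in package["dependencies"]
    ]


found, not_found = result
edges = to_edges(found)
```

An edge list of this kind is a convenient input for building a dependency graph.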
Bioconductor data for the packages a4, a4Preproc, a4Classif, a4Core and a4Base
from tqdm import tqdm
bioconductor_scraper.obtain_packages_data(
package_names=['a4', 'a4Preproc', 'a4Classif', 'a4Core', 'a4Base'],
progress_bar=tqdm(total=5)
)
100%|██████████| 5/5 [00:00<00:00, 5.79it/s]
([{'name': 'a4',
'version': '1.48.0',
'dependencies': [{'name': 'a4Base', 'version': ''},
{'name': 'a4Preproc', 'version': ''},
{'name': 'a4Classif', 'version': ''},
{'name': 'a4Core', 'version': ''},
{'name': 'a4Reporting', 'version': ''}],
'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'},
{'name': 'a4Preproc',
'version': '1.48.0',
'dependencies': [{'name': 'BiocGenerics', 'version': ''},
{'name': 'Biobase', 'version': ''}],
'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Preproc.html'},
{'name': 'a4Classif',
'version': '1.48.0',
'dependencies': [{'name': 'a4Core', 'version': ''},
{'name': 'a4Preproc', 'version': ''},
{'name': 'methods', 'version': ''},
{'name': 'Biobase', 'version': ''},
{'name': 'ROCR', 'version': ''},
{'name': 'pamr', 'version': ''},
{'name': 'glmnet', 'version': ''},
{'name': 'varSelRF', 'version': ''},
{'name': 'utils', 'version': ''},
{'name': 'graphics', 'version': ''},
{'name': 'stats', 'version': ''}],
'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Classif.html'},
{'name': 'a4Core',
'version': '1.48.0',
'dependencies': [{'name': 'Biobase', 'version': ''},
{'name': 'glmnet', 'version': ''},
{'name': 'methods', 'version': ''},
{'name': 'stats', 'version': ''}],
'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Core.html'},
{'name': 'a4Base',
'version': '1.48.0',
'dependencies': [{'name': 'a4Preproc', 'version': ''},
{'name': 'a4Core', 'version': ''},
{'name': 'methods', 'version': ''},
{'name': 'graphics', 'version': ''},
{'name': 'grid', 'version': ''},
{'name': 'Biobase', 'version': ''},
{'name': 'annaffy', 'version': ''},
{'name': 'mpm', 'version': ''},
{'name': 'genefilter', 'version': ''},
{'name': 'limma', 'version': ''},
{'name': 'multtest', 'version': ''},
{'name': 'glmnet', 'version': ''},
{'name': 'gplots', 'version': ''}],
'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Base.html'}],
[])
PyPI data for the packages networkx, requests, tqdm and a non-existent package
pypi_scraper.obtain_packages_data(
['networkx', 'requests', "tqdm", "NON_EXISTING_PACKAGE"])
([{'name': 'networkx',
'version': '3.1',
'url': 'https://pypi.org/project/networkx/',
'dependencies': [{'name': 'numpy', 'version': None},
{'name': 'scipy', 'version': None},
{'name': 'matplotlib', 'version': None},
{'name': 'pandas', 'version': None},
{'name': 'pre', 'version': None},
{'name': 'mypy', 'version': None},
{'name': 'sphinx', 'version': None},
{'name': 'pydata', 'version': None},
{'name': 'numpydoc', 'version': None},
{'name': 'pillow', 'version': None},
{'name': 'nb2plots', 'version': None},
{'name': 'texext', 'version': None},
{'name': 'lxml', 'version': None},
{'name': 'pygraphviz', 'version': None},
{'name': 'pydot', 'version': None},
{'name': 'sympy', 'version': None},
{'name': 'pytest', 'version': None},
{'name': 'codecov', 'version': None}]},
{'name': 'requests',
'version': '2.31.0',
'url': 'https://pypi.org/project/requests/',
'dependencies': [{'name': 'charset', 'version': None},
{'name': 'idna', 'version': None},
{'name': 'urllib3', 'version': None},
{'name': 'certifi', 'version': None},
{'name': 'PySocks', 'version': None},
{'name': 'chardet', 'version': None}]},
{'name': 'tqdm',
'version': '4.65.0',
'url': 'https://pypi.org/project/tqdm/',
'dependencies': [{'name': 'colorama', 'version': None},
{'name': 'py', 'version': None},
{'name': 'twine', 'version': None},
{'name': 'wheel', 'version': None},
{'name': 'ipywidgets', 'version': None},
{'name': 'slack', 'version': None},
{'name': 'requests', 'version': None}]}],
['NON_EXISTING_PACKAGE'])
NPM data for the packages aws-sdk, request and a non-existent package
npm_scraper.obtain_packages_data(
['aws-sdk', 'request', "NON_EXISTING_PACKAGE"])
([{'name': 'aws-sdk',
'version': '2.1406.0',
'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
{'name': 'events', 'version': '1.1.1'},
{'name': 'ieee754', 'version': '1.1.13'},
{'name': 'jmespath', 'version': '0.16.0'},
{'name': 'querystring', 'version': '0.2.0'},
{'name': 'sax', 'version': '1.2.1'},
{'name': 'url', 'version': '0.10.3'},
{'name': 'util', 'version': '^0.12.4'},
{'name': 'uuid', 'version': '8.0.0'},
{'name': 'xml2js', 'version': '0.5.0'},
{'name': '@types/node', 'version': '6.0.92'},
{'name': 'browserify', 'version': '13.1.0'},
{'name': 'chai', 'version': '^3.0'},
{'name': 'codecov', 'version': '^3.8.2'},
{'name': 'coffeeify', 'version': '*'},
{'name': 'coffeescript', 'version': '^1.12.7'},
{'name': 'cucumber', 'version': '0.5.x'},
{'name': 'eslint', 'version': '^5.8.0'},
{'name': 'hash-test-vectors', 'version': '^1.3.2'},
{'name': 'insert-module-globals', 'version': '^7.0.0'},
{'name': 'istanbul', 'version': '*'},
{'name': 'jasmine', 'version': '^2.5.3'},
{'name': 'jasmine-core', 'version': '^2.5.2'},
{'name': 'json-loader', 'version': '^0.5.4'},
{'name': 'karma', 'version': '^4.1.0'},
{'name': 'karma-chrome-launcher', 'version': '2.2.0'},
{'name': 'karma-jasmine', 'version': '^1.1.0'},
{'name': 'mocha', 'version': '^3.0.0'},
{'name': 'repl.history', 'version': '*'},
{'name': 'semver', 'version': '*'},
{'name': 'typescript', 'version': '2.0.8'},
{'name': 'uglify-js', 'version': '2.x'},
{'name': 'webpack', 'version': '^1.15.0'}],
'url': 'https://www.npmjs.com/package/aws-sdk'},
{'name': 'request',
'version': '2.88.2',
'dependencies': [{'name': 'aws-sign2', 'version': '~0.7.0'},
{'name': 'aws4', 'version': '^1.8.0'},
{'name': 'caseless', 'version': '~0.12.0'},
{'name': 'combined-stream', 'version': '~1.0.6'},
{'name': 'extend', 'version': '~3.0.2'},
{'name': 'forever-agent', 'version': '~0.6.1'},
{'name': 'form-data', 'version': '~2.3.2'},
{'name': 'har-validator', 'version': '~5.1.3'},
{'name': 'http-signature', 'version': '~1.2.0'},
{'name': 'is-typedarray', 'version': '~1.0.0'},
{'name': 'isstream', 'version': '~0.1.2'},
{'name': 'json-stringify-safe', 'version': '~5.0.1'},
{'name': 'mime-types', 'version': '~2.1.19'},
{'name': 'oauth-sign', 'version': '~0.9.0'},
{'name': 'performance-now', 'version': '^2.1.0'},
{'name': 'qs', 'version': '~6.5.2'},
{'name': 'safe-buffer', 'version': '^5.1.2'},
{'name': 'tough-cookie', 'version': '~2.5.0'},
{'name': 'tunnel-agent', 'version': '^0.6.0'},
{'name': 'uuid', 'version': '^3.3.2'},
{'name': 'bluebird', 'version': '^3.2.1'},
{'name': 'browserify', 'version': '^13.0.1'},
{'name': 'browserify-istanbul', 'version': '^2.0.0'},
{'name': 'buffer-equal', 'version': '^1.0.0'},
{'name': 'codecov', 'version': '^3.0.4'},
{'name': 'coveralls', 'version': '^3.0.2'},
{'name': 'function-bind', 'version': '^1.0.2'},
{'name': 'karma', 'version': '^3.0.0'},
{'name': 'karma-browserify', 'version': '^5.0.1'},
{'name': 'karma-cli', 'version': '^1.0.0'},
{'name': 'karma-coverage', 'version': '^1.0.0'},
{'name': 'karma-phantomjs-launcher', 'version': '^1.0.0'},
{'name': 'karma-tap', 'version': '^3.0.1'},
{'name': 'nyc', 'version': '^14.1.1'},
{'name': 'phantomjs-prebuilt', 'version': '^2.1.3'},
{'name': 'rimraf', 'version': '^2.2.8'},
{'name': 'server-destroy', 'version': '^1.0.1'},
{'name': 'standard', 'version': '^9.0.0'},
{'name': 'tape', 'version': '^4.6.0'},
{'name': 'taper', 'version': '^0.5.0'}],
'url': 'https://www.npmjs.com/package/request'}],
['NON_EXISTING_PACKAGE'])
CSV-Based implementation
Constructor
from olivia_finder.data_source.csv_ds import CSVDataSource
bioconductor_csv = CSVDataSource(
    "aux_data/bioconductor_adjlist_test.csv",  # Path to the CSV file
    # Name of the field that contains the name of the package (the dependent)
    dependent_field="name",
    # Name of the field that contains the name of the dependency
    dependency_field="dependency",
    # Name of the field that contains the version of the package
    dependent_version_field="version",
    # Name of the field that contains the version of the dependency
    dependency_version_field="dependency_version",
    # Name of the field that contains the URL of the package
    dependent_url_field="url",
)
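The constructor arguments suggest an adjacency-list layout, with one row per (package, dependency) pair. The sketch below shows how such rows could be grouped per package using only the standard library. The file contents and example.org URLs are invented for illustration, and this is not how CSVDataSource is actually implemented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents; the column names match the constructor arguments above.
csv_text = """name,version,url,dependency,dependency_version
BANDITS,1.16.0,https://example.org/BANDITS,Rcpp,
BANDITS,1.16.0,https://example.org/BANDITS,DRIMSeq,1.28.0
ASICS,2.16.0,https://example.org/ASICS,ropls,1.32.0
"""

dependencies_by_package = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    # An empty dependency_version cell becomes None.
    dependencies_by_package[row["name"]].append(
        {"name": row["dependency"], "version": row["dependency_version"] or None}
    )
```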
Obtain package data
bioconductor_csv.obtain_package_data('BANDITS')
{'name': 'BANDITS',
'version': '1.16.0',
'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
'dependencies': [{'name': 'R', 'version': nan},
{'name': 'Rcpp', 'version': nan},
{'name': 'doRNG', 'version': nan},
{'name': 'MASS', 'version': nan},
{'name': 'data.table', 'version': nan},
{'name': 'R.utils', 'version': nan},
{'name': 'doParallel', 'version': nan},
{'name': 'parallel', 'version': nan},
{'name': 'foreach', 'version': nan},
{'name': 'methods', 'version': nan},
{'name': 'stats', 'version': nan},
{'name': 'graphics', 'version': nan},
{'name': 'ggplot2', 'version': nan},
{'name': 'DRIMSeq', 'version': '1.28.0'},
{'name': 'BiocParallel', 'version': '1.34.0'}]}
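In the output above, missing dependency versions appear as nan, whereas the scrapers report '' or None. If a uniform representation is needed, the values can be normalized as sketched below. This assumes the nan values are Python floats, which is typical when empty CSV cells are read with pandas but is not confirmed here, and clean_version is a hypothetical helper.

```python
import math
from typing import Any, Optional


def clean_version(value: Any) -> Optional[str]:
    """Map float NaN (and None) to None; return any other value as a string."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return str(value)


raw_dependencies = [
    {"name": "Rcpp", "version": float("nan")},
    {"name": "DRIMSeq", "version": "1.28.0"},
]
cleaned = [
    {"name": dep["name"], "version": clean_version(dep["version"])}
    for dep in raw_dependencies
]
```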
Obtain a list of packages data
bioconductor_csv.obtain_packages_data(
['BANDITS', 'ASICS', "NON_EXISTING_PACKAGE"])
([{'name': 'BANDITS',
'version': '1.16.0',
'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
'dependencies': [{'name': 'R', 'version': nan},
{'name': 'Rcpp', 'version': nan},
{'name': 'doRNG', 'version': nan},
{'name': 'MASS', 'version': nan},
{'name': 'data.table', 'version': nan},
{'name': 'R.utils', 'version': nan},
{'name': 'doParallel', 'version': nan},
{'name': 'parallel', 'version': nan},
{'name': 'foreach', 'version': nan},
{'name': 'methods', 'version': nan},
{'name': 'stats', 'version': nan},
{'name': 'graphics', 'version': nan},
{'name': 'ggplot2', 'version': nan},
{'name': 'DRIMSeq', 'version': '1.28.0'},
{'name': 'BiocParallel', 'version': '1.34.0'}]},
{'name': 'ASICS',
'version': '2.16.0',
'url': 'https://www.bioconductor.org/packages/release/bioc/html/ASICS.html',
'dependencies': [{'name': 'R', 'version': nan},
{'name': 'BiocParallel', 'version': '1.34.0'},
{'name': 'ggplot2', 'version': nan},
{'name': 'glmnet', 'version': nan},
{'name': 'grDevices', 'version': nan},
{'name': 'gridExtra', 'version': nan},
{'name': 'methods', 'version': nan},
{'name': 'mvtnorm', 'version': nan},
{'name': 'PepsNMR', 'version': '1.18.0'},
{'name': 'plyr', 'version': nan},
{'name': 'quadprog', 'version': nan},
{'name': 'ropls', 'version': '1.32.0'},
{'name': 'stats', 'version': nan},
{'name': 'SummarizedExperiment', 'version': '1.30.0'},
{'name': 'utils', 'version': nan},
{'name': 'Matrix', 'version': nan},
{'name': 'zoo', 'version': nan}]}],
['NON_EXISTING_PACKAGE'])
Web API-Based implementation (Libraries.io API)
Data can also be obtained through the web API of Libraries.io.
Keep in mind that the data from this source is not always up to date.
Constructor
In this case, the Libraries.io API key must be defined in the config.ini file
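As an illustration of reading such a key, the sketch below uses the standard configparser module. The section name librariesio and the option name api_key are assumptions, so check the project's config.ini for the real names.

```python
import configparser

# Hypothetical layout: the section and option names are assumptions.
config_text = """
[librariesio]
api_key = YOUR_API_KEY
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)   # with a real file: parser.read("config.ini")
api_key = parser.get("librariesio", "api_key", fallback=None)

if not api_key or api_key == "YOUR_API_KEY":
    print("Libraries.io API key is not configured")
```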
from olivia_finder.data_source.librariesio_ds import LibrariesioDataSource
pypi_libio = LibrariesioDataSource(platform="pypi")
nuget_libio = LibrariesioDataSource(platform="nuget")
cran_libio = LibrariesioDataSource(platform="cran")
Obtain package names
This functionality has not been implemented because there is no way to get this data through the API.
The library used to access the API from Python has a search function, but unfortunately it cannot be used efficiently for this task.
# Set the apikey as an environment variable
from pybraries.search import Search
search = Search()
info = search.project_search(platform='pypi')
for project in info:
    print(project['name'])
A string of keywords must be passed as a keyword argument
typescript
@types/node
eslint
webpack
prettier
@types/jest
@types/react
@babel/preset-typescript
@babel/runtime
jest
rxjs
postcss
vue-template-compiler
vue
axios
requests
moment
@types/react-dom
@types/mocha
babel-runtime
babel-preset-react
@babel/core
babel-core
@babel/preset-env
@babel/plugin-proposal-class-properties
@babel/plugin-transform-runtime
@babel/preset-react
babel-jest
commander
rollup
Obtain package data
pypi_libio.obtain_package_data('networkx')
{'name': 'networkx',
'version': '3.1rc0',
'dependencies': [{'name': 'codecov', 'version': '2.1.13'},
{'name': 'pytest-cov', 'version': '4.0.0'},
{'name': 'pytest', 'version': '7.4.0'},
{'name': 'sympy', 'version': '1.11.1'},
{'name': 'pydot', 'version': '0.9.10'},
{'name': 'pygraphviz', 'version': '1.3.1'},
{'name': 'lxml', 'version': '4.9.2'},
{'name': 'texext', 'version': '0.6.7'},
{'name': 'nb2plots', 'version': '0.6.1'},
{'name': 'pillow', 'version': '9.5.0'},
{'name': 'numpydoc', 'version': '1.5.0'},
{'name': 'sphinx-gallery', 'version': '0.13.0'},
{'name': 'pydata-sphinx-theme', 'version': '0.13.3'},
{'name': 'sphinx', 'version': '7.0.1'},
{'name': 'mypy', 'version': '1.4.1'},
{'name': 'pre-commit', 'version': '3.3.3'},
{'name': 'pandas', 'version': '2.0.1'},
{'name': 'matplotlib', 'version': '3.7.1'},
{'name': 'scipy', 'version': '1.11.0'},
{'name': 'numpy', 'version': '1.25.0'}],
'url': 'https://pypi.org/project/networkx/'}
nuget_libio.obtain_package_data('Microsoft.Extensions.DependencyInjection')
{'name': 'Microsoft.Extensions.DependencyInjection',
'version': '8.0.0-preview.5.23280.8',
'dependencies': [{'name': 'System.Threading.Tasks.Extensions',
'version': '4.5.4'},
{'name': 'Microsoft.Extensions.DependencyInjection.Abstractions',
'version': '3.1.32'},
{'name': 'Microsoft.Bcl.AsyncInterfaces', 'version': '7.0.0'}],
'url': 'https://www.nuget.org/packages/Microsoft.Extensions.DependencyInjection/'}
Obtain a list of packages data
cran_libio.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
[{'name': 'A3',
'version': '1.0.0',
'dependencies': [{'name': 'R', 'version': None},
{'name': 'randomForest', 'version': None}],
'url': 'https://cran.r-project.org/package=A3'},
{'name': 'AER',
'version': '1.2-9',
'dependencies': [{'name': 'vars', 'version': '0.5.3'},
{'name': 'urca', 'version': None},
{'name': 'tseries', 'version': None},
{'name': 'truncreg', 'version': None},
{'name': 'systemfit', 'version': None},
{'name': 'strucchange', 'version': None},
{'name': 'scatterplot3d', 'version': '0.3.4'},
{'name': 'sampleSelection', 'version': None},
{'name': 'rugarch', 'version': None},
{'name': 'ROCR', 'version': None},
{'name': 'rgl', 'version': '0.109.2'},
{'name': 'quantreg', 'version': '5.42.1'},
{'name': 'pscl', 'version': '1.5.5'},
{'name': 'plm', 'version': None},
{'name': 'np', 'version': None},
{'name': 'nnet', 'version': None},
{'name': 'nlme', 'version': None},
{'name': 'mlogit', 'version': None},
{'name': 'MASS', 'version': None},
{'name': 'longmemo', 'version': None},
{'name': 'lattice', 'version': None},
{'name': 'KernSmooth', 'version': None},
{'name': 'ineq', 'version': None},
{'name': 'foreign', 'version': None},
{'name': 'forecast', 'version': '8.17.0'},
{'name': 'fGarch', 'version': '3042.83.2'},
{'name': 'effects', 'version': None},
{'name': 'dynlm', 'version': None},
{'name': 'boot', 'version': None},
{'name': 'Formula', 'version': None},
{'name': 'stats', 'version': None},
{'name': 'zoo', 'version': None},
{'name': 'survival', 'version': None},
{'name': 'sandwich', 'version': None},
{'name': 'lmtest', 'version': None},
{'name': 'car', 'version': None},
{'name': 'R', 'version': None}],
'url': 'https://cran.r-project.org/package=AER'}]
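Unlike the scraper and CSV examples, the output above is a plain list with no second element naming the packages that were not found. Under that observation, the missing names can be recovered by comparing the request with the result, as in this sketch; missing_names is a hypothetical helper and the data is an abbreviated copy of the output.

```python
from typing import List


def missing_names(requested: List[str], results: List[dict]) -> List[str]:
    """Return the requested names that have no entry in the results."""
    returned = {package["name"] for package in results}
    return [name for name in requested if name not in returned]


requested = ["A3", "AER", "NON_EXISTING_PACKAGE"]
results = [{"name": "A3"}, {"name": "AER"}]   # abbreviated copy of the output above
not_found = missing_names(requested, results)
```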
1''' 2 3## Description 4 5 6This package is responsible for providing a base data structure for all the derivated classes whose purpose is the obtaining data from a specific source. 7 8It is composed of several modules: 9 10##### Package structure: 11 12```data_source 13├── csv_ds.py 14├── data_source.py 15├── librariesio_ds.py 16├── repository_scrapers 17│ ├── bioconductor.py 18│ ├── cran.py 19│ ├── npm.py 20│ ├── pypi.py 21│ └── r.py 22└── scraper_ds.py 23``` 24 25##### Package modules: 26 27- **data_source.py** 28 29 Implements the abstract class Datasource, which is the base class of the rest of the implementations 30 31- **csv_ds.py** 32 33 Implement datasource for \*.csv files 34 35- **librariesio_ds.py** 36 37 Implements datasource for the API of Libraries.io 38 39- **scraper_ds.py** 40 41 Implements the abstract class ScraperDataSource, which is the base class of customized implementations for each repository 42 43- **repository_scraper/** 44 45 Inside there are several implementations based on Datasource web Scraping for Cran, Bioconductor, NPM and PyPI 46 47##### Doc pages 48 49For more info see [data_source package docs](https://dab0012.github.io/olivia-finder/olivia_finder/data_source/data_source_module.html) 50 51 52 53## Web Scraping-Based implementations 54 55 56### Constructor 57 58 59- The default constructor does not receive parameters 60 61- The number of optional parameters depends on the implementation, but as a rule we can define a name and a description (With the purpose of offering information) 62 63- The most relevant parameter is the RequestHandler object, which will use by the webscraping based DataSource to make requests to the website to which it refers 64 65 66Implementation for CRAN 67 68 69 70```python 71from olivia_finder.data_source.repository_scrapers.cran import CranScraper 72cran_ds = CranScraper() 73``` 74 75Implementation for Bioconductor 76 77 78 79```python 80from olivia_finder.data_source.repository_scrapers.bioconductor 
import BioconductorScraper 81bioconductor_scraper = BioconductorScraper() 82``` 83 84Implementation for PyPi 85 86 87 88```python 89from olivia_finder.data_source.repository_scrapers.pypi import PypiScraper 90pypi_scraper = PypiScraper() 91``` 92 93Implementation for NPM 94 95 96 97```python 98from olivia_finder.data_source.repository_scrapers.npm import NpmScraper 99npm_scraper = NpmScraper() 100``` 101 102Github repository implementation 103 104 105```python 106from olivia_finder.data_source.repository_scrapers.github import GithubScraper 107github_scraper = GithubScraper() 108``` 109 110### Obtain package names 111 112 113CRAN package names 114 115 116 117```python 118cran_ds.obtain_package_names()[:10] 119``` 120 121 122 123 124 ['A3', 125 'AalenJohansen', 126 'AATtools', 127 'ABACUS', 128 'abbreviate', 129 'abbyyR', 130 'abc', 131 'abc.data', 132 'ABC.RAP', 133 'ABCanalysis'] 134 135 136 137Bioconductor package names 138 139 140 141```python 142bioconductor_scraper.obtain_package_names()[:10] 143``` 144 145 146 147 148 ['ABSSeq', 149 'ABarray', 150 'ACE', 151 'ACME', 152 'ADAM', 153 'ADAMgui', 154 'ADImpute', 155 'ADaCGH2', 156 'AGDEX', 157 'AHMassBank'] 158 159 160 161PyPi package names 162 163 164 165```python 166pypi_scraper.obtain_package_names()[:10] 167``` 168 169 170 171 172 ['0', 173 '0-._.-._.-._.-._.-._.-._.-0', 174 '000', 175 '0.0.1', 176 '00101s', 177 '00print_lol', 178 '00SMALINUX', 179 '0101', 180 '01changer', 181 '01d61084-d29e-11e9-96d1-7c5cf84ffe8e'] 182 183 184 185NPM package names 186 187<span style="color: red">Note:</span> 188 189- This process is very expensive, the implementation is functional but its use is not recommended unless it is necessary 190- It is recommended to import the list of npm packets properctioned as a txt file 191 192Output folder can be configured in `config.ini` file `working_dir` 193 194 195 196```python 197# npm_scraper.obtain_package_names( 198# page_size=100, # Number of packages to obtain per request 199# 
save_chunks=True, # Save packages in a chunk file 200# show_progress_bar=True # Show progress bar 201# )[:10] 202``` 203 204The file with the NPM package list is the following 205 206 207 208```python 209!wc -l ../results/package_lists/npm_packages.txt 210``` 211 212 wc: ../results/package_lists/npm_packages.txt: No existe el archivo o el directorio 213 214 215 216```python 217!tail -n 20 results/package_lists/npm_packages.txt 218``` 219 220 tail: no se puede abrir 'results/package_lists/npm_packages.txt' para lectura: No existe el archivo o el directorio 221 222 223### Obtain package data 224 225 226CRAN data of A3 package 227 228 229 230```python 231cran_ds.obtain_package_data('A3') 232``` 233 234 235 236 237 {'name': 'A3', 238 'version': '1.0.0', 239 'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'}, 240 {'name': 'xtable', 'version': ''}, 241 {'name': 'pbapply', 'version': ''}], 242 'url': 'https://cran.r-project.org/package=A3'} 243 244 245 246If the petition fails we will obtain None 247 248 249 250```python 251non_existent_package = cran_ds.obtain_package_data('NON_EXISTENT_PACKAGE') 252print(non_existent_package) 253``` 254 255 None 256 257 258Bioconductor data of a4 package 259 260 261 262```python 263bioconductor_scraper.obtain_package_data('a4') 264``` 265 266 267 268 269 {'name': 'a4', 270 'version': '1.48.0', 271 'dependencies': [{'name': 'a4Base', 'version': ''}, 272 {'name': 'a4Preproc', 'version': ''}, 273 {'name': 'a4Classif', 'version': ''}, 274 {'name': 'a4Core', 'version': ''}, 275 {'name': 'a4Reporting', 'version': ''}], 276 'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'} 277 278 279 280PyPi data od networkx package 281 282 283 284```python 285pypi_scraper.obtain_package_data('networkx') 286``` 287 288 289 290 291 {'name': 'networkx', 292 'version': '3.1', 293 'url': 'https://pypi.org/project/networkx/', 294 'dependencies': [{'name': 'numpy', 'version': None}, 295 {'name': 'scipy', 'version': None}, 296 {'name': 
'matplotlib', 'version': None}, 297 {'name': 'pandas', 'version': None}, 298 {'name': 'pre', 'version': None}, 299 {'name': 'mypy', 'version': None}, 300 {'name': 'sphinx', 'version': None}, 301 {'name': 'pydata', 'version': None}, 302 {'name': 'numpydoc', 'version': None}, 303 {'name': 'pillow', 'version': None}, 304 {'name': 'nb2plots', 'version': None}, 305 {'name': 'texext', 'version': None}, 306 {'name': 'lxml', 'version': None}, 307 {'name': 'pygraphviz', 'version': None}, 308 {'name': 'pydot', 'version': None}, 309 {'name': 'sympy', 'version': None}, 310 {'name': 'pytest', 'version': None}, 311 {'name': 'codecov', 'version': None}]} 312 313 314 315NPM data of aws-sdk package 316 317 318 319```python 320npm_scraper.obtain_package_data('aws-sdk') 321``` 322 323 324 325 326 {'name': 'aws-sdk', 327 'version': '2.1406.0', 328 'dependencies': [{'name': 'buffer', 'version': '4.9.2'}, 329 {'name': 'events', 'version': '1.1.1'}, 330 {'name': 'ieee754', 'version': '1.1.13'}, 331 {'name': 'jmespath', 'version': '0.16.0'}, 332 {'name': 'querystring', 'version': '0.2.0'}, 333 {'name': 'sax', 'version': '1.2.1'}, 334 {'name': 'url', 'version': '0.10.3'}, 335 {'name': 'util', 'version': '^0.12.4'}, 336 {'name': 'uuid', 'version': '8.0.0'}, 337 {'name': 'xml2js', 'version': '0.5.0'}, 338 {'name': '@types/node', 'version': '6.0.92'}, 339 {'name': 'browserify', 'version': '13.1.0'}, 340 {'name': 'chai', 'version': '^3.0'}, 341 {'name': 'codecov', 'version': '^3.8.2'}, 342 {'name': 'coffeeify', 'version': '*'}, 343 {'name': 'coffeescript', 'version': '^1.12.7'}, 344 {'name': 'cucumber', 'version': '0.5.x'}, 345 {'name': 'eslint', 'version': '^5.8.0'}, 346 {'name': 'hash-test-vectors', 'version': '^1.3.2'}, 347 {'name': 'insert-module-globals', 'version': '^7.0.0'}, 348 {'name': 'istanbul', 'version': '*'}, 349 {'name': 'jasmine', 'version': '^2.5.3'}, 350 {'name': 'jasmine-core', 'version': '^2.5.2'}, 351 {'name': 'json-loader', 'version': '^0.5.4'}, 352 {'name': 'karma', 
'version': '^4.1.0'}, 353 {'name': 'karma-chrome-launcher', 'version': '2.2.0'}, 354 {'name': 'karma-jasmine', 'version': '^1.1.0'}, 355 {'name': 'mocha', 'version': '^3.0.0'}, 356 {'name': 'repl.history', 'version': '*'}, 357 {'name': 'semver', 'version': '*'}, 358 {'name': 'typescript', 'version': '2.0.8'}, 359 {'name': 'uglify-js', 'version': '2.x'}, 360 {'name': 'webpack', 'version': '^1.15.0'}], 361 'url': 'https://www.npmjs.com/package/aws-sdk'} 362 363 364 365### Obtain a list of packages data 366 367 368CRAN data for the packages A3, AER y a non existent package 369 370 371 372```python 373cran_ds.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"]) 374``` 375 376 377 378 379 ([{'name': 'A3', 380 'version': '1.0.0', 381 'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'}, 382 {'name': 'xtable', 'version': ''}, 383 {'name': 'pbapply', 'version': ''}], 384 'url': 'https://cran.r-project.org/package=A3'}, 385 {'name': 'AER', 386 'version': '1.2-10', 387 'dependencies': [{'name': 'R', 'version': '≥ 3.0.0'}, 388 {'name': 'car', 'version': '≥ 2.0-19'}, 389 {'name': 'lmtest', 'version': ''}, 390 {'name': 'sandwich', 'version': '≥ 2.4-0'}, 391 {'name': 'survival', 'version': '≥ 2.37-5'}, 392 {'name': 'zoo', 'version': ''}, 393 {'name': 'stats', 'version': ''}, 394 {'name': 'Formula', 'version': '≥ 0.2-0'}], 395 'url': 'https://cran.r-project.org/package=AER'}], 396 ['NON_EXISTING_PACKAGE']) 397 398 399 400Bioconductor data for the packages TDARACNE, ASICS and a non existent package 401 402 403 404```python 405from tqdm import tqdm 406 407bioconductor_scraper.obtain_packages_data( 408 package_names=['a4', 'a4Preproc', 'a4Classif', 'a4Core', 'a4Base'], 409 progress_bar=tqdm(total=5) 410) 411``` 412 413 100%|██████████| 5/5 [00:00<00:00, 5.79it/s] 414 415 416 417 418 419 ([{'name': 'a4', 420 'version': '1.48.0', 421 'dependencies': [{'name': 'a4Base', 'version': ''}, 422 {'name': 'a4Preproc', 'version': ''}, 423 {'name': 'a4Classif', 'version': ''}, 424 
        {'name': 'a4Core', 'version': ''},
        {'name': 'a4Reporting', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'},
      {'name': 'a4Preproc',
       'version': '1.48.0',
       'dependencies': [{'name': 'BiocGenerics', 'version': ''},
        {'name': 'Biobase', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Preproc.html'},
      {'name': 'a4Classif',
       'version': '1.48.0',
       'dependencies': [{'name': 'a4Core', 'version': ''},
        {'name': 'a4Preproc', 'version': ''},
        {'name': 'methods', 'version': ''},
        {'name': 'Biobase', 'version': ''},
        {'name': 'ROCR', 'version': ''},
        {'name': 'pamr', 'version': ''},
        {'name': 'glmnet', 'version': ''},
        {'name': 'varSelRF', 'version': ''},
        {'name': 'utils', 'version': ''},
        {'name': 'graphics', 'version': ''},
        {'name': 'stats', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Classif.html'},
      {'name': 'a4Core',
       'version': '1.48.0',
       'dependencies': [{'name': 'Biobase', 'version': ''},
        {'name': 'glmnet', 'version': ''},
        {'name': 'methods', 'version': ''},
        {'name': 'stats', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Core.html'},
      {'name': 'a4Base',
       'version': '1.48.0',
       'dependencies': [{'name': 'a4Preproc', 'version': ''},
        {'name': 'a4Core', 'version': ''},
        {'name': 'methods', 'version': ''},
        {'name': 'graphics', 'version': ''},
        {'name': 'grid', 'version': ''},
        {'name': 'Biobase', 'version': ''},
        {'name': 'annaffy', 'version': ''},
        {'name': 'mpm', 'version': ''},
        {'name': 'genefilter', 'version': ''},
        {'name': 'limma', 'version': ''},
        {'name': 'multtest', 'version': ''},
        {'name': 'glmnet', 'version': ''},
        {'name': 'gplots', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Base.html'}],
     [])


```python
pypi_scraper.obtain_packages_data(
    ['networkx', 'requests', "tqdm", "NON_EXISTING_PACKAGE"])
```


    ([{'name': 'networkx',
       'version': '3.1',
       'url': 'https://pypi.org/project/networkx/',
       'dependencies': [{'name': 'numpy', 'version': None},
        {'name': 'scipy', 'version': None},
        {'name': 'matplotlib', 'version': None},
        {'name': 'pandas', 'version': None},
        {'name': 'pre', 'version': None},
        {'name': 'mypy', 'version': None},
        {'name': 'sphinx', 'version': None},
        {'name': 'pydata', 'version': None},
        {'name': 'numpydoc', 'version': None},
        {'name': 'pillow', 'version': None},
        {'name': 'nb2plots', 'version': None},
        {'name': 'texext', 'version': None},
        {'name': 'lxml', 'version': None},
        {'name': 'pygraphviz', 'version': None},
        {'name': 'pydot', 'version': None},
        {'name': 'sympy', 'version': None},
        {'name': 'pytest', 'version': None},
        {'name': 'codecov', 'version': None}]},
      {'name': 'requests',
       'version': '2.31.0',
       'url': 'https://pypi.org/project/requests/',
       'dependencies': [{'name': 'charset', 'version': None},
        {'name': 'idna', 'version': None},
        {'name': 'urllib3', 'version': None},
        {'name': 'certifi', 'version': None},
        {'name': 'PySocks', 'version': None},
        {'name': 'chardet', 'version': None}]},
      {'name': 'tqdm',
       'version': '4.65.0',
       'url': 'https://pypi.org/project/tqdm/',
       'dependencies': [{'name': 'colorama', 'version': None},
        {'name': 'py', 'version': None},
        {'name': 'twine', 'version': None},
        {'name': 'wheel', 'version': None},
        {'name': 'ipywidgets', 'version': None},
        {'name': 'slack', 'version': None},
        {'name': 'requests', 'version': None}]}],
     ['NON_EXISTING_PACKAGE'])


```python
npm_scraper.obtain_packages_data(
    ['aws-sdk', 'request', "NON_EXISTING_PACKAGE"])
```


    ([{'name': 'aws-sdk',
       'version': '2.1406.0',
       'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
        {'name': 'events', 'version': '1.1.1'},
        {'name': 'ieee754', 'version': '1.1.13'},
        {'name': 'jmespath', 'version': '0.16.0'},
        {'name': 'querystring', 'version': '0.2.0'},
        {'name': 'sax', 'version': '1.2.1'},
        {'name': 'url', 'version': '0.10.3'},
        {'name': 'util', 'version': '^0.12.4'},
        {'name': 'uuid', 'version': '8.0.0'},
        {'name': 'xml2js', 'version': '0.5.0'},
        {'name': '@types/node', 'version': '6.0.92'},
        {'name': 'browserify', 'version': '13.1.0'},
        {'name': 'chai', 'version': '^3.0'},
        {'name': 'codecov', 'version': '^3.8.2'},
        {'name': 'coffeeify', 'version': '*'},
        {'name': 'coffeescript', 'version': '^1.12.7'},
        {'name': 'cucumber', 'version': '0.5.x'},
        {'name': 'eslint', 'version': '^5.8.0'},
        {'name': 'hash-test-vectors', 'version': '^1.3.2'},
        {'name': 'insert-module-globals', 'version': '^7.0.0'},
        {'name': 'istanbul', 'version': '*'},
        {'name': 'jasmine', 'version': '^2.5.3'},
        {'name': 'jasmine-core', 'version': '^2.5.2'},
        {'name': 'json-loader', 'version': '^0.5.4'},
        {'name': 'karma', 'version': '^4.1.0'},
        {'name': 'karma-chrome-launcher', 'version': '2.2.0'},
        {'name': 'karma-jasmine', 'version': '^1.1.0'},
        {'name': 'mocha', 'version': '^3.0.0'},
        {'name': 'repl.history', 'version': '*'},
        {'name': 'semver', 'version': '*'},
        {'name': 'typescript', 'version': '2.0.8'},
        {'name': 'uglify-js', 'version': '2.x'},
        {'name': 'webpack', 'version': '^1.15.0'}],
       'url': 'https://www.npmjs.com/package/aws-sdk'},
      {'name': 'request',
       'version': '2.88.2',
       'dependencies': [{'name': 'aws-sign2', 'version': '~0.7.0'},
        {'name': 'aws4', 'version': '^1.8.0'},
        {'name': 'caseless', 'version': '~0.12.0'},
        {'name': 'combined-stream', 'version': '~1.0.6'},
        {'name': 'extend', 'version': '~3.0.2'},
        {'name': 'forever-agent', 'version': '~0.6.1'},
        {'name': 'form-data', 'version': '~2.3.2'},
        {'name': 'har-validator', 'version': '~5.1.3'},
        {'name': 'http-signature', 'version': '~1.2.0'},
        {'name': 'is-typedarray', 'version': '~1.0.0'},
        {'name': 'isstream', 'version': '~0.1.2'},
        {'name': 'json-stringify-safe', 'version': '~5.0.1'},
        {'name': 'mime-types', 'version': '~2.1.19'},
        {'name': 'oauth-sign', 'version': '~0.9.0'},
        {'name': 'performance-now', 'version': '^2.1.0'},
        {'name': 'qs', 'version': '~6.5.2'},
        {'name': 'safe-buffer', 'version': '^5.1.2'},
        {'name': 'tough-cookie', 'version': '~2.5.0'},
        {'name': 'tunnel-agent', 'version': '^0.6.0'},
        {'name': 'uuid', 'version': '^3.3.2'},
        {'name': 'bluebird', 'version': '^3.2.1'},
        {'name': 'browserify', 'version': '^13.0.1'},
        {'name': 'browserify-istanbul', 'version': '^2.0.0'},
        {'name': 'buffer-equal', 'version': '^1.0.0'},
        {'name': 'codecov', 'version': '^3.0.4'},
        {'name': 'coveralls', 'version': '^3.0.2'},
        {'name': 'function-bind', 'version': '^1.0.2'},
        {'name': 'karma', 'version': '^3.0.0'},
        {'name': 'karma-browserify', 'version': '^5.0.1'},
        {'name': 'karma-cli', 'version': '^1.0.0'},
        {'name': 'karma-coverage', 'version': '^1.0.0'},
        {'name': 'karma-phantomjs-launcher', 'version': '^1.0.0'},
        {'name': 'karma-tap', 'version': '^3.0.1'},
        {'name': 'nyc', 'version': '^14.1.1'},
        {'name': 'phantomjs-prebuilt', 'version': '^2.1.3'},
        {'name': 'rimraf', 'version': '^2.2.8'},
        {'name': 'server-destroy', 'version': '^1.0.1'},
        {'name': 'standard', 'version': '^9.0.0'},
        {'name': 'tape', 'version': '^4.6.0'},
        {'name': 'taper', 'version': '^0.5.0'}],
       'url': 'https://www.npmjs.com/package/request'}],
     ['NON_EXISTING_PACKAGE'])


## CSV-Based implementation


### Constructor


```python
from olivia_finder.data_source.csv_ds import CSVDataSource
bioconductor_csv = CSVDataSource(
    "aux_data/bioconductor_adjlist_test.csv",  # Path to the CSV file
    # Name of the field that contains the name of the package
    dependent_field="name",
    # Name of the field that contains the dependencies
    dependency_field="dependency",
    # Name of the field that contains the version of the package
    dependent_version_field="version",
    # Name of the field that contains the version of the dependency
    dependency_version_field="dependency_version",
    # Name of the field that contains the URL of the package
    dependent_url_field="url",
)
```

### Obtain package data


```python
bioconductor_csv.obtain_package_data('BANDITS')
```


    {'name': 'BANDITS',
     'version': '1.16.0',
     'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
     'dependencies': [{'name': 'R', 'version': nan},
      {'name': 'Rcpp', 'version': nan},
      {'name': 'doRNG', 'version': nan},
      {'name': 'MASS', 'version': nan},
      {'name': 'data.table', 'version': nan},
      {'name': 'R.utils', 'version': nan},
      {'name': 'doParallel', 'version': nan},
      {'name': 'parallel', 'version': nan},
      {'name': 'foreach', 'version': nan},
      {'name': 'methods', 'version': nan},
      {'name': 'stats', 'version': nan},
      {'name': 'graphics', 'version': nan},
      {'name': 'ggplot2', 'version': nan},
      {'name': 'DRIMSeq', 'version': '1.28.0'},
      {'name': 'BiocParallel', 'version': '1.34.0'}]}


### Obtain a list of packages data


```python
bioconductor_csv.obtain_packages_data(
    ['BANDITS', 'ASICS', "NON_EXISTING_PACKAGE"])
```


    ([{'name': 'BANDITS',
       'version': '1.16.0',
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
       'dependencies': [{'name': 'R', 'version': nan},
        {'name': 'Rcpp', 'version': nan},
        {'name': 'doRNG', 'version': nan},
        {'name': 'MASS', 'version': nan},
        {'name': 'data.table', 'version': nan},
        {'name': 'R.utils', 'version': nan},
        {'name': 'doParallel', 'version': nan},
        {'name': 'parallel', 'version': nan},
        {'name': 'foreach', 'version': nan},
        {'name': 'methods', 'version': nan},
        {'name': 'stats', 'version': nan},
        {'name': 'graphics', 'version': nan},
        {'name': 'ggplot2', 'version': nan},
        {'name': 'DRIMSeq', 'version': '1.28.0'},
        {'name': 'BiocParallel', 'version': '1.34.0'}]},
      {'name': 'ASICS',
       'version': '2.16.0',
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/ASICS.html',
       'dependencies': [{'name': 'R', 'version': nan},
        {'name': 'BiocParallel', 'version': '1.34.0'},
        {'name': 'ggplot2', 'version': nan},
        {'name': 'glmnet', 'version': nan},
        {'name': 'grDevices', 'version': nan},
        {'name': 'gridExtra', 'version': nan},
        {'name': 'methods', 'version': nan},
        {'name': 'mvtnorm', 'version': nan},
        {'name': 'PepsNMR', 'version': '1.18.0'},
        {'name': 'plyr', 'version': nan},
        {'name': 'quadprog', 'version': nan},
        {'name': 'ropls', 'version': '1.32.0'},
        {'name': 'stats', 'version': nan},
        {'name': 'SummarizedExperiment', 'version': '1.30.0'},
        {'name': 'utils', 'version': nan},
        {'name': 'Matrix', 'version': nan},
        {'name': 'zoo', 'version': nan}]}],
     ['NON_EXISTING_PACKAGE'])


## Web API-Based implementation (Libraries.io API)


This data source retrieves package data through the Libraries.io web API.

An important caveat is that the data served by Libraries.io is not always up to date.


### Constructor


In this case, the Libraries.io API key must be defined in the _config.ini_ file


```python
from olivia_finder.data_source.librariesio_ds import LibrariesioDataSource

pypi_libio = LibrariesioDataSource(platform="pypi")
nuget_libio = LibrariesioDataSource(platform="nuget")
cran_libio = LibrariesioDataSource(platform="cran")
```

### Obtain package names


<p style="color:red">
This functionality has not been implemented because there is no way to get this data through the API
</p>


The library used to access the API from Python has a search feature, but unfortunately it cannot be used efficiently for this task


```python
# The API key must already be set as an environment variable
from pybraries.search import Search

search = Search()
info = search.project_search(platform='pypi')

for project in info:
    print(project['name'])
```

    A string of keywords must be passed as a keyword argument
    typescript
    @types/node
    eslint
    webpack
    prettier
    @types/jest
    @types/react
    @babel/preset-typescript
    @babel/runtime
    jest
    rxjs
    postcss
    vue-template-compiler
    vue
    axios
    requests
    moment
    @types/react-dom
    @types/mocha
    babel-runtime
    babel-preset-react
    @babel/core
    babel-core
    @babel/preset-env
    @babel/plugin-proposal-class-properties
    @babel/plugin-transform-runtime
    @babel/preset-react
    babel-jest
    commander
    rollup


### Obtain package data


```python
pypi_libio.obtain_package_data('networkx')
```


    {'name': 'networkx',
     'version': '3.1rc0',
     'dependencies': [{'name': 'codecov', 'version': '2.1.13'},
      {'name': 'pytest-cov', 'version': '4.0.0'},
      {'name': 'pytest', 'version': '7.4.0'},
      {'name': 'sympy', 'version': '1.11.1'},
      {'name': 'pydot', 'version': '0.9.10'},
      {'name': 'pygraphviz', 'version': '1.3.1'},
      {'name': 'lxml', 'version': '4.9.2'},
      {'name': 'texext', 'version': '0.6.7'},
      {'name': 'nb2plots', 'version': '0.6.1'},
      {'name': 'pillow', 'version': '9.5.0'},
      {'name': 'numpydoc', 'version': '1.5.0'},
      {'name': 'sphinx-gallery', 'version': '0.13.0'},
      {'name': 'pydata-sphinx-theme', 'version': '0.13.3'},
      {'name': 'sphinx', 'version': '7.0.1'},
      {'name': 'mypy', 'version': '1.4.1'},
      {'name': 'pre-commit', 'version': '3.3.3'},
      {'name': 'pandas', 'version': '2.0.1'},
      {'name': 'matplotlib', 'version': '3.7.1'},
      {'name': 'scipy', 'version': '1.11.0'},
      {'name': 'numpy', 'version': '1.25.0'}],
     'url': 'https://pypi.org/project/networkx/'}


```python
nuget_libio.obtain_package_data('Microsoft.Extensions.DependencyInjection')
```


    {'name': 'Microsoft.Extensions.DependencyInjection',
     'version': '8.0.0-preview.5.23280.8',
     'dependencies': [{'name': 'System.Threading.Tasks.Extensions',
       'version': '4.5.4'},
      {'name': 'Microsoft.Extensions.DependencyInjection.Abstractions',
       'version': '3.1.32'},
      {'name': 'Microsoft.Bcl.AsyncInterfaces', 'version': '7.0.0'}],
     'url': 'https://www.nuget.org/packages/Microsoft.Extensions.DependencyInjection/'}


### Obtain a list of packages data


```python
cran_libio.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
```


    [{'name': 'A3',
      'version': '1.0.0',
      'dependencies': [{'name': 'R', 'version': None},
       {'name': 'randomForest', 'version': None}],
      'url': 'https://cran.r-project.org/package=A3'},
     {'name': 'AER',
      'version': '1.2-9',
      'dependencies': [{'name': 'vars', 'version': '0.5.3'},
       {'name': 'urca', 'version': None},
       {'name': 'tseries', 'version': None},
       {'name': 'truncreg', 'version': None},
       {'name': 'systemfit', 'version': None},
       {'name': 'strucchange', 'version': None},
       {'name': 'scatterplot3d', 'version': '0.3.4'},
       {'name': 'sampleSelection', 'version': None},
       {'name': 'rugarch', 'version': None},
       {'name': 'ROCR', 'version': None},
       {'name': 'rgl', 'version': '0.109.2'},
       {'name': 'quantreg', 'version': '5.42.1'},
       {'name': 'pscl', 'version': '1.5.5'},
       {'name': 'plm', 'version': None},
       {'name': 'np', 'version': None},
       {'name': 'nnet', 'version': None},
       {'name': 'nlme', 'version': None},
       {'name': 'mlogit', 'version': None},
       {'name': 'MASS', 'version': None},
       {'name': 'longmemo', 'version': None},
       {'name': 'lattice', 'version': None},
       {'name': 'KernSmooth', 'version': None},
       {'name': 'ineq', 'version': None},
       {'name': 'foreign', 'version': None},
       {'name': 'forecast', 'version': '8.17.0'},
       {'name': 'fGarch', 'version': '3042.83.2'},
       {'name': 'effects', 'version': None},
       {'name': 'dynlm', 'version': None},
       {'name': 'boot', 'version': None},
       {'name': 'Formula', 'version': None},
       {'name': 'stats', 'version': None},
       {'name': 'zoo', 'version': None},
       {'name': 'survival', 'version': None},
       {'name': 'sandwich', 'version': None},
       {'name': 'lmtest', 'version': None},
       {'name': 'car', 'version': None},
       {'name': 'R', 'version': None}],
      'url': 'https://cran.r-project.org/package=AER'}]
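
In the outputs shown in this document, the scraper and CSV data sources return a `(found, not_found)` tuple from `obtain_packages_data`, while the Libraries.io call returns a plain list with the missing package simply absent. The sketch below shows one way calling code could treat both shapes uniformly. The helper `split_batch_result` is hypothetical and not part of olivia_finder, the sample data is shortened from the outputs above, and whether the library always returns exactly these shapes is an assumption based only on those outputs.

```python
from typing import Any, Dict, List, Tuple, Union

PackageData = Dict[str, Any]
BatchResult = Union[List[PackageData], Tuple[List[PackageData], List[str]]]


def split_batch_result(
    result: BatchResult, requested: List[str]
) -> Tuple[List[PackageData], List[str]]:
    """Hypothetical helper: return (found, not_found) for either result shape."""
    if isinstance(result, tuple):
        # Tuple shape seen in the scraper and CSV outputs: (packages, missing names)
        found, not_found = result
        return list(found), list(not_found)
    # Plain-list shape seen in the Libraries.io output: infer the missing names
    found_names = {pkg["name"] for pkg in result}
    not_found = [name for name in requested if name not in found_names]
    return list(result), not_found


requested = ["A3", "AER", "NON_EXISTING_PACKAGE"]

# Shortened copies of the outputs shown above (dependency lists trimmed)
scraper_style: BatchResult = (
    [
        {"name": "A3", "version": "1.0.0", "dependencies": []},
        {"name": "AER", "version": "1.2-10", "dependencies": []},
    ],
    ["NON_EXISTING_PACKAGE"],
)
librariesio_style: BatchResult = [
    {"name": "A3", "version": "1.0.0", "dependencies": []},
    {"name": "AER", "version": "1.2-9", "dependencies": []},
]

found, missing = split_batch_result(librariesio_style, requested)
print([pkg["name"] for pkg in found], missing)
# prints: ['A3', 'AER'] ['NON_EXISTING_PACKAGE']
```

Inferring the missing names from the requested list keeps the helper independent of any particular data source, at the cost of assuming that returned package names match the requested names exactly.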