olivia_finder.data_source

Description

This package provides the base data structure for all derived classes whose purpose is to obtain data from a specific source.

It is composed of several modules:

Package structure:
├── csv_ds.py
├── data_source.py
├── librariesio_ds.py
├── repository_scrapers
│   ├── bioconductor.py
│   ├── cran.py
│   ├── npm.py
│   ├── pypi.py
│   └── r.py
└── scraper_ds.py

Package modules:
  • data_source.py

    Implements the abstract class Datasource, the base class for all other implementations

  • csv_ds.py

    Implements a data source for *.csv files

  • librariesio_ds.py

    Implements datasource for the API of Libraries.io

  • scraper_ds.py

    Implements the abstract class ScraperDataSource, which is the base class of customized implementations for each repository

  • repository_scrapers/

    Contains several web scraping-based data source implementations for CRAN, Bioconductor, NPM and PyPI
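The relationship between the abstract base class and its implementations can be sketched as follows. This is a minimal illustration, not the package's code: `DataSourceSketch` and `InMemoryDataSource` are hypothetical names, and only the three method names are taken from the usage examples in this document. The `(found, not_found)` return shape is an assumption inferred from the outputs shown for the scraper and CSV examples.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class DataSourceSketch(ABC):
    """Hypothetical stand-in for the abstract base class (not the real one)."""

    @abstractmethod
    def obtain_package_names(self) -> List[str]:
        """Return the names of all packages known to the source."""

    @abstractmethod
    def obtain_package_data(self, package_name: str) -> Optional[dict]:
        """Return the data of one package, or None if it cannot be obtained."""

    def obtain_packages_data(
        self, package_names: List[str]
    ) -> Tuple[List[dict], List[str]]:
        """Collect data for several packages, tracking the ones not found."""
        found, not_found = [], []
        for name in package_names:
            data = self.obtain_package_data(name)
            if data is None:
                not_found.append(name)
            else:
                found.append(data)
        return found, not_found


class InMemoryDataSource(DataSourceSketch):
    """Toy implementation backed by a dictionary, for illustration only."""

    def __init__(self, packages: Dict[str, dict]):
        self._packages = packages

    def obtain_package_names(self) -> List[str]:
        return sorted(self._packages)

    def obtain_package_data(self, package_name: str) -> Optional[dict]:
        return self._packages.get(package_name)


source = InMemoryDataSource(
    {"A3": {"name": "A3", "version": "1.0.0", "dependencies": [], "url": ""}}
)
found, not_found = source.obtain_packages_data(["A3", "NON_EXISTING_PACKAGE"])
```

Keeping the batch method in the base class means each concrete source only has to implement the single-package lookup.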

Doc pages

For more info see the data_source package docs: https://dab0012.github.io/olivia-finder/olivia_finder/data_source/data_source_module.html

Web Scraping-Based implementations

Constructor

  • The default constructor takes no parameters

  • The number of optional parameters depends on the implementation, but as a rule a name and a description can be set (for informational purposes)

  • The most relevant parameter is the RequestHandler object, which the web scraping-based DataSource uses to make requests to the website it targets
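The constructor pattern described above (optional name and description, plus an injectable request handler) might look like the sketch below. Everything here is hypothetical: `ScraperSketch`, `fake_handler` and the `request_handler` keyword are illustrative names, and the handler is modelled as a plain callable, which is an assumption and not the real RequestHandler interface.

```python
from typing import Callable, Optional


class ScraperSketch:
    """Hypothetical scraper showing optional constructor parameters."""

    def __init__(
        self,
        name: Optional[str] = None,
        description: Optional[str] = None,
        request_handler: Optional[Callable[[str], str]] = None,
    ):
        # Fall back to informative defaults when nothing is passed
        self.name = name or type(self).__name__
        self.description = description or ""
        # The handler is injected so that tests can replace network access
        self.request_handler = request_handler or self._default_handler

    @staticmethod
    def _default_handler(url: str) -> str:
        raise NotImplementedError("plug a real HTTP client in here")

    def fetch(self, url: str) -> str:
        """Delegate the request to whichever handler was configured."""
        return self.request_handler(url)


def fake_handler(url: str) -> str:
    """Stand-in handler that performs no network access."""
    return f"<html>{url}</html>"


default_scraper = ScraperSketch()
custom_scraper = ScraperSketch(
    name="CRAN",
    description="Scraper for CRAN packages",
    request_handler=fake_handler,
)
page = custom_scraper.fetch("https://cran.r-project.org/package=A3")
```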

Implementation for CRAN

from olivia_finder.data_source.repository_scrapers.cran import CranScraper
cran_ds = CranScraper()

Implementation for Bioconductor

from olivia_finder.data_source.repository_scrapers.bioconductor import BioconductorScraper
bioconductor_scraper = BioconductorScraper()

Implementation for PyPI

from olivia_finder.data_source.repository_scrapers.pypi import PypiScraper
pypi_scraper = PypiScraper()

Implementation for NPM

from olivia_finder.data_source.repository_scrapers.npm import NpmScraper
npm_scraper = NpmScraper()

GitHub repository implementation

from olivia_finder.data_source.repository_scrapers.github import GithubScraper
github_scraper = GithubScraper()

Obtain package names

CRAN package names

cran_ds.obtain_package_names()[:10]
['A3',
 'AalenJohansen',
 'AATtools',
 'ABACUS',
 'abbreviate',
 'abbyyR',
 'abc',
 'abc.data',
 'ABC.RAP',
 'ABCanalysis']

Bioconductor package names

bioconductor_scraper.obtain_package_names()[:10]
['ABSSeq',
 'ABarray',
 'ACE',
 'ACME',
 'ADAM',
 'ADAMgui',
 'ADImpute',
 'ADaCGH2',
 'AGDEX',
 'AHMassBank']

PyPI package names

pypi_scraper.obtain_package_names()[:10]
['0',
 '0-._.-._.-._.-._.-._.-._.-0',
 '000',
 '0.0.1',
 '00101s',
 '00print_lol',
 '00SMALINUX',
 '0101',
 '01changer',
 '01d61084-d29e-11e9-96d1-7c5cf84ffe8e']

NPM package names

Note:

  • This process is very expensive. The implementation is functional, but its use is not recommended unless strictly necessary
  • It is recommended to import the list of NPM packages provided as a txt file

The output folder can be configured via working_dir in the config.ini file

# npm_scraper.obtain_package_names(
#     page_size=100,                          # Number of packages to obtain per request
#     save_chunks=True,                       # Save packages in a chunk file
#     show_progress_bar=True                  # Show progress bar
# )[:10]

The file with the NPM package list can be inspected as follows (in this run the file was not found at either path)

!wc -l ../results/package_lists/npm_packages.txt
wc: ../results/package_lists/npm_packages.txt: No such file or directory
!tail -n 20 results/package_lists/npm_packages.txt
tail: cannot open 'results/package_lists/npm_packages.txt' for reading: No such file or directory
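Importing the package list from a text file, as recommended in the note, could be done along these lines. This is a sketch that assumes the file holds one package name per line. `load_package_names` is a hypothetical helper, and the sample file is created in a temporary directory only to keep the example self-contained.

```python
import tempfile
from pathlib import Path


def load_package_names(path):
    """Read one package name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the real package list file
    sample_file = Path(tmp_dir) / "npm_packages.txt"
    sample_file.write_text("aws-sdk\nrequest\n\n@types/node\n", encoding="utf-8")
    npm_names = load_package_names(sample_file)
```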

Obtain package data

CRAN data of A3 package

cran_ds.obtain_package_data('A3')
{'name': 'A3',
 'version': '1.0.0',
 'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'},
  {'name': 'xtable', 'version': ''},
  {'name': 'pbapply', 'version': ''}],
 'url': 'https://cran.r-project.org/package=A3'}
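In the CRAN output above, the version field of a dependency is either empty or a constraint such as '≥ 2.15.0'. A small helper can separate the operator from the version number. This is a sketch assuming the operator and version are separated by whitespace, as in the output shown. `split_constraint` is a hypothetical name.

```python
def split_constraint(version_spec):
    """Split a spec such as '≥ 2.15.0' into (operator, version)."""
    spec = (version_spec or "").strip()
    if not spec:
        # No constraint declared for this dependency
        return None, None
    parts = spec.split(maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # A bare version without an operator
    return None, parts[0]


a3_dependencies = [
    {'name': 'R', 'version': '≥ 2.15.0'},
    {'name': 'xtable', 'version': ''},
    {'name': 'pbapply', 'version': ''},
]
constraints = {
    dep['name']: split_constraint(dep['version']) for dep in a3_dependencies
}
```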

If the request fails, None is returned

non_existent_package = cran_ds.obtain_package_data('NON_EXISTENT_PACKAGE')
print(non_existent_package)
None
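Because a failed lookup yields None instead of raising an exception, calling code should check the result before using it. A minimal sketch, with `dependency_names` as a hypothetical helper and the sample record copied from the A3 output above:

```python
def dependency_names(package_data):
    """Return dependency names, or an empty list for a failed lookup."""
    if package_data is None:
        return []
    return [dep["name"] for dep in package_data.get("dependencies", [])]


a3_data = {
    'name': 'A3',
    'version': '1.0.0',
    'dependencies': [
        {'name': 'R', 'version': '≥ 2.15.0'},
        {'name': 'xtable', 'version': ''},
        {'name': 'pbapply', 'version': ''},
    ],
    'url': 'https://cran.r-project.org/package=A3',
}

names_for_a3 = dependency_names(a3_data)
names_for_missing = dependency_names(None)
```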

Bioconductor data of a4 package

bioconductor_scraper.obtain_package_data('a4')
{'name': 'a4',
 'version': '1.48.0',
 'dependencies': [{'name': 'a4Base', 'version': ''},
  {'name': 'a4Preproc', 'version': ''},
  {'name': 'a4Classif', 'version': ''},
  {'name': 'a4Core', 'version': ''},
  {'name': 'a4Reporting', 'version': ''}],
 'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'}

PyPI data of the networkx package

pypi_scraper.obtain_package_data('networkx')
{'name': 'networkx',
 'version': '3.1',
 'url': 'https://pypi.org/project/networkx/',
 'dependencies': [{'name': 'numpy', 'version': None},
  {'name': 'scipy', 'version': None},
  {'name': 'matplotlib', 'version': None},
  {'name': 'pandas', 'version': None},
  {'name': 'pre', 'version': None},
  {'name': 'mypy', 'version': None},
  {'name': 'sphinx', 'version': None},
  {'name': 'pydata', 'version': None},
  {'name': 'numpydoc', 'version': None},
  {'name': 'pillow', 'version': None},
  {'name': 'nb2plots', 'version': None},
  {'name': 'texext', 'version': None},
  {'name': 'lxml', 'version': None},
  {'name': 'pygraphviz', 'version': None},
  {'name': 'pydot', 'version': None},
  {'name': 'sympy', 'version': None},
  {'name': 'pytest', 'version': None},
  {'name': 'codecov', 'version': None}]}

NPM data of aws-sdk package

npm_scraper.obtain_package_data('aws-sdk')
{'name': 'aws-sdk',
 'version': '2.1406.0',
 'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
  {'name': 'events', 'version': '1.1.1'},
  {'name': 'ieee754', 'version': '1.1.13'},
  {'name': 'jmespath', 'version': '0.16.0'},
  {'name': 'querystring', 'version': '0.2.0'},
  {'name': 'sax', 'version': '1.2.1'},
  {'name': 'url', 'version': '0.10.3'},
  {'name': 'util', 'version': '^0.12.4'},
  {'name': 'uuid', 'version': '8.0.0'},
  {'name': 'xml2js', 'version': '0.5.0'},
  {'name': '@types/node', 'version': '6.0.92'},
  {'name': 'browserify', 'version': '13.1.0'},
  {'name': 'chai', 'version': '^3.0'},
  {'name': 'codecov', 'version': '^3.8.2'},
  {'name': 'coffeeify', 'version': '*'},
  {'name': 'coffeescript', 'version': '^1.12.7'},
  {'name': 'cucumber', 'version': '0.5.x'},
  {'name': 'eslint', 'version': '^5.8.0'},
  {'name': 'hash-test-vectors', 'version': '^1.3.2'},
  {'name': 'insert-module-globals', 'version': '^7.0.0'},
  {'name': 'istanbul', 'version': '*'},
  {'name': 'jasmine', 'version': '^2.5.3'},
  {'name': 'jasmine-core', 'version': '^2.5.2'},
  {'name': 'json-loader', 'version': '^0.5.4'},
  {'name': 'karma', 'version': '^4.1.0'},
  {'name': 'karma-chrome-launcher', 'version': '2.2.0'},
  {'name': 'karma-jasmine', 'version': '^1.1.0'},
  {'name': 'mocha', 'version': '^3.0.0'},
  {'name': 'repl.history', 'version': '*'},
  {'name': 'semver', 'version': '*'},
  {'name': 'typescript', 'version': '2.0.8'},
  {'name': 'uglify-js', 'version': '2.x'},
  {'name': 'webpack', 'version': '^1.15.0'}],
 'url': 'https://www.npmjs.com/package/aws-sdk'}
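The NPM output mixes exact versions ('4.9.2') with range expressions ('^0.12.4', '*', '0.5.x'). The sketch below separates the two groups with a deliberately simple regular expression. It only recognises plain `major.minor.patch` strings as exact and treats everything else (including pre-release tags) as a range, which is a simplification and not a full semver parser. `split_pinned` is a hypothetical helper.

```python
import re

# Matches plain versions such as '4.9.2'; anything else counts as a range
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def split_pinned(dependencies):
    """Separate dependency names into exactly pinned and ranged ones."""
    pinned, ranged = [], []
    for dep in dependencies:
        is_exact = EXACT_VERSION.match(dep["version"] or "")
        (pinned if is_exact else ranged).append(dep["name"])
    return pinned, ranged


sample_dependencies = [
    {'name': 'buffer', 'version': '4.9.2'},
    {'name': 'util', 'version': '^0.12.4'},
    {'name': 'coffeeify', 'version': '*'},
    {'name': 'cucumber', 'version': '0.5.x'},
]
pinned, ranged = split_pinned(sample_dependencies)
```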

Obtain a list of packages data

CRAN data for the packages A3, AER and a non-existent package

cran_ds.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
([{'name': 'A3',
   'version': '1.0.0',
   'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'},
    {'name': 'xtable', 'version': ''},
    {'name': 'pbapply', 'version': ''}],
   'url': 'https://cran.r-project.org/package=A3'},
  {'name': 'AER',
   'version': '1.2-10',
   'dependencies': [{'name': 'R', 'version': '≥ 3.0.0'},
    {'name': 'car', 'version': '≥ 2.0-19'},
    {'name': 'lmtest', 'version': ''},
    {'name': 'sandwich', 'version': '≥ 2.4-0'},
    {'name': 'survival', 'version': '≥ 2.37-5'},
    {'name': 'zoo', 'version': ''},
    {'name': 'stats', 'version': ''},
    {'name': 'Formula', 'version': '≥ 0.2-0'}],
   'url': 'https://cran.r-project.org/package=AER'}],
 ['NON_EXISTING_PACKAGE'])
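Judging from the output above, the call returns a pair: the list of package records that were found and the list of names that were not. A sketch of how that pair might be unpacked and turned into dependency edges follows. `to_edges` is a hypothetical helper, and the sample result is an abridged copy of the output shown.

```python
def to_edges(packages):
    """Flatten package records into (dependent, dependency) pairs."""
    return [
        (package["name"], dep["name"])
        for package in packages
        for dep in package["dependencies"]
    ]


# Abridged copy of the result shown above
result = (
    [
        {
            'name': 'A3',
            'version': '1.0.0',
            'dependencies': [
                {'name': 'R', 'version': '≥ 2.15.0'},
                {'name': 'xtable', 'version': ''},
                {'name': 'pbapply', 'version': ''},
            ],
            'url': 'https://cran.r-project.org/package=A3',
        }
    ],
    ['NON_EXISTING_PACKAGE'],
)

found, not_found = result
edges = to_edges(found)
```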

Bioconductor data for the packages a4, a4Preproc, a4Classif, a4Core and a4Base

from tqdm import tqdm

bioconductor_scraper.obtain_packages_data(
    package_names=['a4', 'a4Preproc', 'a4Classif', 'a4Core', 'a4Base'],
    progress_bar=tqdm(total=5)
)
100%|██████████| 5/5 [00:00<00:00,  5.79it/s]

([{'name': 'a4',
   'version': '1.48.0',
   'dependencies': [{'name': 'a4Base', 'version': ''},
    {'name': 'a4Preproc', 'version': ''},
    {'name': 'a4Classif', 'version': ''},
    {'name': 'a4Core', 'version': ''},
    {'name': 'a4Reporting', 'version': ''}],
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'},
  {'name': 'a4Preproc',
   'version': '1.48.0',
   'dependencies': [{'name': 'BiocGenerics', 'version': ''},
    {'name': 'Biobase', 'version': ''}],
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Preproc.html'},
  {'name': 'a4Classif',
   'version': '1.48.0',
   'dependencies': [{'name': 'a4Core', 'version': ''},
    {'name': 'a4Preproc', 'version': ''},
    {'name': 'methods', 'version': ''},
    {'name': 'Biobase', 'version': ''},
    {'name': 'ROCR', 'version': ''},
    {'name': 'pamr', 'version': ''},
    {'name': 'glmnet', 'version': ''},
    {'name': 'varSelRF', 'version': ''},
    {'name': 'utils', 'version': ''},
    {'name': 'graphics', 'version': ''},
    {'name': 'stats', 'version': ''}],
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Classif.html'},
  {'name': 'a4Core',
   'version': '1.48.0',
   'dependencies': [{'name': 'Biobase', 'version': ''},
    {'name': 'glmnet', 'version': ''},
    {'name': 'methods', 'version': ''},
    {'name': 'stats', 'version': ''}],
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Core.html'},
  {'name': 'a4Base',
   'version': '1.48.0',
   'dependencies': [{'name': 'a4Preproc', 'version': ''},
    {'name': 'a4Core', 'version': ''},
    {'name': 'methods', 'version': ''},
    {'name': 'graphics', 'version': ''},
    {'name': 'grid', 'version': ''},
    {'name': 'Biobase', 'version': ''},
    {'name': 'annaffy', 'version': ''},
    {'name': 'mpm', 'version': ''},
    {'name': 'genefilter', 'version': ''},
    {'name': 'limma', 'version': ''},
    {'name': 'multtest', 'version': ''},
    {'name': 'glmnet', 'version': ''},
    {'name': 'gplots', 'version': ''}],
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Base.html'}],
 [])

PyPI data for the packages networkx, requests, tqdm and a non-existent package

pypi_scraper.obtain_packages_data(
    ['networkx', 'requests', "tqdm", "NON_EXISTING_PACKAGE"])
([{'name': 'networkx',
   'version': '3.1',
   'url': 'https://pypi.org/project/networkx/',
   'dependencies': [{'name': 'numpy', 'version': None},
    {'name': 'scipy', 'version': None},
    {'name': 'matplotlib', 'version': None},
    {'name': 'pandas', 'version': None},
    {'name': 'pre', 'version': None},
    {'name': 'mypy', 'version': None},
    {'name': 'sphinx', 'version': None},
    {'name': 'pydata', 'version': None},
    {'name': 'numpydoc', 'version': None},
    {'name': 'pillow', 'version': None},
    {'name': 'nb2plots', 'version': None},
    {'name': 'texext', 'version': None},
    {'name': 'lxml', 'version': None},
    {'name': 'pygraphviz', 'version': None},
    {'name': 'pydot', 'version': None},
    {'name': 'sympy', 'version': None},
    {'name': 'pytest', 'version': None},
    {'name': 'codecov', 'version': None}]},
  {'name': 'requests',
   'version': '2.31.0',
   'url': 'https://pypi.org/project/requests/',
   'dependencies': [{'name': 'charset', 'version': None},
    {'name': 'idna', 'version': None},
    {'name': 'urllib3', 'version': None},
    {'name': 'certifi', 'version': None},
    {'name': 'PySocks', 'version': None},
    {'name': 'chardet', 'version': None}]},
  {'name': 'tqdm',
   'version': '4.65.0',
   'url': 'https://pypi.org/project/tqdm/',
   'dependencies': [{'name': 'colorama', 'version': None},
    {'name': 'py', 'version': None},
    {'name': 'twine', 'version': None},
    {'name': 'wheel', 'version': None},
    {'name': 'ipywidgets', 'version': None},
    {'name': 'slack', 'version': None},
    {'name': 'requests', 'version': None}]}],
 ['NON_EXISTING_PACKAGE'])

NPM data for the packages aws-sdk, request and a non-existent package

npm_scraper.obtain_packages_data(
    ['aws-sdk', 'request', "NON_EXISTING_PACKAGE"])
([{'name': 'aws-sdk',
   'version': '2.1406.0',
   'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
    {'name': 'events', 'version': '1.1.1'},
    {'name': 'ieee754', 'version': '1.1.13'},
    {'name': 'jmespath', 'version': '0.16.0'},
    {'name': 'querystring', 'version': '0.2.0'},
    {'name': 'sax', 'version': '1.2.1'},
    {'name': 'url', 'version': '0.10.3'},
    {'name': 'util', 'version': '^0.12.4'},
    {'name': 'uuid', 'version': '8.0.0'},
    {'name': 'xml2js', 'version': '0.5.0'},
    {'name': '@types/node', 'version': '6.0.92'},
    {'name': 'browserify', 'version': '13.1.0'},
    {'name': 'chai', 'version': '^3.0'},
    {'name': 'codecov', 'version': '^3.8.2'},
    {'name': 'coffeeify', 'version': '*'},
    {'name': 'coffeescript', 'version': '^1.12.7'},
    {'name': 'cucumber', 'version': '0.5.x'},
    {'name': 'eslint', 'version': '^5.8.0'},
    {'name': 'hash-test-vectors', 'version': '^1.3.2'},
    {'name': 'insert-module-globals', 'version': '^7.0.0'},
    {'name': 'istanbul', 'version': '*'},
    {'name': 'jasmine', 'version': '^2.5.3'},
    {'name': 'jasmine-core', 'version': '^2.5.2'},
    {'name': 'json-loader', 'version': '^0.5.4'},
    {'name': 'karma', 'version': '^4.1.0'},
    {'name': 'karma-chrome-launcher', 'version': '2.2.0'},
    {'name': 'karma-jasmine', 'version': '^1.1.0'},
    {'name': 'mocha', 'version': '^3.0.0'},
    {'name': 'repl.history', 'version': '*'},
    {'name': 'semver', 'version': '*'},
    {'name': 'typescript', 'version': '2.0.8'},
    {'name': 'uglify-js', 'version': '2.x'},
    {'name': 'webpack', 'version': '^1.15.0'}],
   'url': 'https://www.npmjs.com/package/aws-sdk'},
  {'name': 'request',
   'version': '2.88.2',
   'dependencies': [{'name': 'aws-sign2', 'version': '~0.7.0'},
    {'name': 'aws4', 'version': '^1.8.0'},
    {'name': 'caseless', 'version': '~0.12.0'},
    {'name': 'combined-stream', 'version': '~1.0.6'},
    {'name': 'extend', 'version': '~3.0.2'},
    {'name': 'forever-agent', 'version': '~0.6.1'},
    {'name': 'form-data', 'version': '~2.3.2'},
    {'name': 'har-validator', 'version': '~5.1.3'},
    {'name': 'http-signature', 'version': '~1.2.0'},
    {'name': 'is-typedarray', 'version': '~1.0.0'},
    {'name': 'isstream', 'version': '~0.1.2'},
    {'name': 'json-stringify-safe', 'version': '~5.0.1'},
    {'name': 'mime-types', 'version': '~2.1.19'},
    {'name': 'oauth-sign', 'version': '~0.9.0'},
    {'name': 'performance-now', 'version': '^2.1.0'},
    {'name': 'qs', 'version': '~6.5.2'},
    {'name': 'safe-buffer', 'version': '^5.1.2'},
    {'name': 'tough-cookie', 'version': '~2.5.0'},
    {'name': 'tunnel-agent', 'version': '^0.6.0'},
    {'name': 'uuid', 'version': '^3.3.2'},
    {'name': 'bluebird', 'version': '^3.2.1'},
    {'name': 'browserify', 'version': '^13.0.1'},
    {'name': 'browserify-istanbul', 'version': '^2.0.0'},
    {'name': 'buffer-equal', 'version': '^1.0.0'},
    {'name': 'codecov', 'version': '^3.0.4'},
    {'name': 'coveralls', 'version': '^3.0.2'},
    {'name': 'function-bind', 'version': '^1.0.2'},
    {'name': 'karma', 'version': '^3.0.0'},
    {'name': 'karma-browserify', 'version': '^5.0.1'},
    {'name': 'karma-cli', 'version': '^1.0.0'},
    {'name': 'karma-coverage', 'version': '^1.0.0'},
    {'name': 'karma-phantomjs-launcher', 'version': '^1.0.0'},
    {'name': 'karma-tap', 'version': '^3.0.1'},
    {'name': 'nyc', 'version': '^14.1.1'},
    {'name': 'phantomjs-prebuilt', 'version': '^2.1.3'},
    {'name': 'rimraf', 'version': '^2.2.8'},
    {'name': 'server-destroy', 'version': '^1.0.1'},
    {'name': 'standard', 'version': '^9.0.0'},
    {'name': 'tape', 'version': '^4.6.0'},
    {'name': 'taper', 'version': '^0.5.0'}],
   'url': 'https://www.npmjs.com/package/request'}],
 ['NON_EXISTING_PACKAGE'])

CSV-Based implementation

Constructor

from olivia_finder.data_source.csv_ds import CSVDataSource
bioconductor_csv = CSVDataSource(
    "aux_data/bioconductor_adjlist_test.csv",   # Path to the CSV file
    # Name of the field that contains the name of the package (the dependent)
    dependent_field="name",
    # Name of the field that contains the name of the dependency
    dependency_field="dependency",
    # Name of the field that contains the version of the package
    dependent_version_field="version",
    # Name of the field that contains the version of the dependency
    dependency_version_field="dependency_version",
    # Name of the field that contains the URL of the package
    dependent_url_field="url",
)
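The field names passed to the constructor suggest an adjacency-list CSV with one row per (package, dependency) pair. The sketch below shows how such rows could be grouped into the record shape used throughout this document. It uses only the standard library and is not the CSVDataSource implementation. The sample rows and the example.org URLs are made up for illustration, and `group_rows` is a hypothetical helper.

```python
import csv
import io

# Made-up sample: one row per (package, dependency) pair
SAMPLE_CSV = """name,version,url,dependency,dependency_version
BANDITS,1.16.0,https://example.org/BANDITS,Rcpp,
BANDITS,1.16.0,https://example.org/BANDITS,DRIMSeq,1.28.0
ASICS,2.16.0,https://example.org/ASICS,ropls,1.32.0
"""


def group_rows(csv_text):
    """Group adjacency-list rows into one record per dependent package."""
    packages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        package = packages.setdefault(
            row["name"],
            {
                "name": row["name"],
                "version": row["version"],
                "url": row["url"],
                "dependencies": [],
            },
        )
        package["dependencies"].append(
            # An empty version cell becomes None
            {"name": row["dependency"], "version": row["dependency_version"] or None}
        )
    return packages


packages = group_rows(SAMPLE_CSV)
```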

Obtain package data

bioconductor_csv.obtain_package_data('BANDITS')
{'name': 'BANDITS',
 'version': '1.16.0',
 'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
 'dependencies': [{'name': 'R', 'version': nan},
  {'name': 'Rcpp', 'version': nan},
  {'name': 'doRNG', 'version': nan},
  {'name': 'MASS', 'version': nan},
  {'name': 'data.table', 'version': nan},
  {'name': 'R.utils', 'version': nan},
  {'name': 'doParallel', 'version': nan},
  {'name': 'parallel', 'version': nan},
  {'name': 'foreach', 'version': nan},
  {'name': 'methods', 'version': nan},
  {'name': 'stats', 'version': nan},
  {'name': 'graphics', 'version': nan},
  {'name': 'ggplot2', 'version': nan},
  {'name': 'DRIMSeq', 'version': '1.28.0'},
  {'name': 'BiocParallel', 'version': '1.34.0'}]}
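Missing dependency versions appear as nan in the output above, whereas the scrapers return an empty string or None. If a uniform representation is needed, the nan values can be normalised as in this sketch. It assumes the missing values are float nan, and `clean_version` is a hypothetical helper.

```python
import math


def clean_version(value):
    """Map a float nan to None and leave any other value untouched."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


dependencies = [
    {'name': 'R', 'version': float('nan')},
    {'name': 'DRIMSeq', 'version': '1.28.0'},
]
cleaned = [
    {**dep, 'version': clean_version(dep['version'])} for dep in dependencies
]
```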

Obtain a list of packages data

bioconductor_csv.obtain_packages_data(
    ['BANDITS', 'ASICS', "NON_EXISTING_PACKAGE"])
([{'name': 'BANDITS',
   'version': '1.16.0',
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
   'dependencies': [{'name': 'R', 'version': nan},
    {'name': 'Rcpp', 'version': nan},
    {'name': 'doRNG', 'version': nan},
    {'name': 'MASS', 'version': nan},
    {'name': 'data.table', 'version': nan},
    {'name': 'R.utils', 'version': nan},
    {'name': 'doParallel', 'version': nan},
    {'name': 'parallel', 'version': nan},
    {'name': 'foreach', 'version': nan},
    {'name': 'methods', 'version': nan},
    {'name': 'stats', 'version': nan},
    {'name': 'graphics', 'version': nan},
    {'name': 'ggplot2', 'version': nan},
    {'name': 'DRIMSeq', 'version': '1.28.0'},
    {'name': 'BiocParallel', 'version': '1.34.0'}]},
  {'name': 'ASICS',
   'version': '2.16.0',
   'url': 'https://www.bioconductor.org/packages/release/bioc/html/ASICS.html',
   'dependencies': [{'name': 'R', 'version': nan},
    {'name': 'BiocParallel', 'version': '1.34.0'},
    {'name': 'ggplot2', 'version': nan},
    {'name': 'glmnet', 'version': nan},
    {'name': 'grDevices', 'version': nan},
    {'name': 'gridExtra', 'version': nan},
    {'name': 'methods', 'version': nan},
    {'name': 'mvtnorm', 'version': nan},
    {'name': 'PepsNMR', 'version': '1.18.0'},
    {'name': 'plyr', 'version': nan},
    {'name': 'quadprog', 'version': nan},
    {'name': 'ropls', 'version': '1.32.0'},
    {'name': 'stats', 'version': nan},
    {'name': 'SummarizedExperiment', 'version': '1.30.0'},
    {'name': 'utils', 'version': nan},
    {'name': 'Matrix', 'version': nan},
    {'name': 'zoo', 'version': nan}]}],
 ['NON_EXISTING_PACKAGE'])

Web API-Based implementation (Libraries.io API)

Based on the Web API of Libraries.io we can obtain data from this source.

Keep in mind that the data served by Libraries.io is not necessarily up to date

Constructor

In this case, the Libraries.io API key must be defined in the config.ini file

from olivia_finder.data_source.librariesio_ds import LibrariesioDataSource

pypi_libio  = LibrariesioDataSource(platform="pypi")
nuget_libio = LibrariesioDataSource(platform="nuget")
cran_libio  = LibrariesioDataSource(platform="cran")

Obtain package names

This functionality has not been implemented because the API offers no way to obtain this data

The library used to access the API from Python (pybraries, as imported in the example) has a search function, but unfortunately it cannot be used efficiently for this task

# Set the apikey as an environment variable
from pybraries.search import Search

search = Search()
info = search.project_search(platform='pypi')

for project in info:
    print(project['name'])
A string of keywords must be passed as a keyword argument
typescript
@types/node
eslint
webpack
prettier
@types/jest
@types/react
@babel/preset-typescript
@babel/runtime
jest
rxjs
postcss
vue-template-compiler
vue
axios
requests
moment
@types/react-dom
@types/mocha
babel-runtime
babel-preset-react
@babel/core
babel-core
@babel/preset-env
@babel/plugin-proposal-class-properties
@babel/plugin-transform-runtime
@babel/preset-react
babel-jest
commander
rollup

Obtain package data

pypi_libio.obtain_package_data('networkx')
{'name': 'networkx',
 'version': '3.1rc0',
 'dependencies': [{'name': 'codecov', 'version': '2.1.13'},
  {'name': 'pytest-cov', 'version': '4.0.0'},
  {'name': 'pytest', 'version': '7.4.0'},
  {'name': 'sympy', 'version': '1.11.1'},
  {'name': 'pydot', 'version': '0.9.10'},
  {'name': 'pygraphviz', 'version': '1.3.1'},
  {'name': 'lxml', 'version': '4.9.2'},
  {'name': 'texext', 'version': '0.6.7'},
  {'name': 'nb2plots', 'version': '0.6.1'},
  {'name': 'pillow', 'version': '9.5.0'},
  {'name': 'numpydoc', 'version': '1.5.0'},
  {'name': 'sphinx-gallery', 'version': '0.13.0'},
  {'name': 'pydata-sphinx-theme', 'version': '0.13.3'},
  {'name': 'sphinx', 'version': '7.0.1'},
  {'name': 'mypy', 'version': '1.4.1'},
  {'name': 'pre-commit', 'version': '3.3.3'},
  {'name': 'pandas', 'version': '2.0.1'},
  {'name': 'matplotlib', 'version': '3.7.1'},
  {'name': 'scipy', 'version': '1.11.0'},
  {'name': 'numpy', 'version': '1.25.0'}],
 'url': 'https://pypi.org/project/networkx/'}

NuGet data of the Microsoft.Extensions.DependencyInjection package

nuget_libio.obtain_package_data('Microsoft.Extensions.DependencyInjection')
{'name': 'Microsoft.Extensions.DependencyInjection',
 'version': '8.0.0-preview.5.23280.8',
 'dependencies': [{'name': 'System.Threading.Tasks.Extensions',
   'version': '4.5.4'},
  {'name': 'Microsoft.Extensions.DependencyInjection.Abstractions',
   'version': '3.1.32'},
  {'name': 'Microsoft.Bcl.AsyncInterfaces', 'version': '7.0.0'}],
 'url': 'https://www.nuget.org/packages/Microsoft.Extensions.DependencyInjection/'}

Obtain a list of packages data

cran_libio.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
[{'name': 'A3',
  'version': '1.0.0',
  'dependencies': [{'name': 'R', 'version': None},
   {'name': 'randomForest', 'version': None}],
  'url': 'https://cran.r-project.org/package=A3'},
 {'name': 'AER',
  'version': '1.2-9',
  'dependencies': [{'name': 'vars', 'version': '0.5.3'},
   {'name': 'urca', 'version': None},
   {'name': 'tseries', 'version': None},
   {'name': 'truncreg', 'version': None},
   {'name': 'systemfit', 'version': None},
   {'name': 'strucchange', 'version': None},
   {'name': 'scatterplot3d', 'version': '0.3.4'},
   {'name': 'sampleSelection', 'version': None},
   {'name': 'rugarch', 'version': None},
   {'name': 'ROCR', 'version': None},
   {'name': 'rgl', 'version': '0.109.2'},
   {'name': 'quantreg', 'version': '5.42.1'},
   {'name': 'pscl', 'version': '1.5.5'},
   {'name': 'plm', 'version': None},
   {'name': 'np', 'version': None},
   {'name': 'nnet', 'version': None},
   {'name': 'nlme', 'version': None},
   {'name': 'mlogit', 'version': None},
   {'name': 'MASS', 'version': None},
   {'name': 'longmemo', 'version': None},
   {'name': 'lattice', 'version': None},
   {'name': 'KernSmooth', 'version': None},
   {'name': 'ineq', 'version': None},
   {'name': 'foreign', 'version': None},
   {'name': 'forecast', 'version': '8.17.0'},
   {'name': 'fGarch', 'version': '3042.83.2'},
   {'name': 'effects', 'version': None},
   {'name': 'dynlm', 'version': None},
   {'name': 'boot', 'version': None},
   {'name': 'Formula', 'version': None},
   {'name': 'stats', 'version': None},
   {'name': 'zoo', 'version': None},
   {'name': 'survival', 'version': None},
   {'name': 'sandwich', 'version': None},
   {'name': 'lmtest', 'version': None},
   {'name': 'car', 'version': None},
   {'name': 'R', 'version': None}],
  'url': 'https://cran.r-project.org/package=AER'}]
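The records returned by Libraries.io can differ from those obtained by scraping: for A3, the dependency lists shown in this document do not match. A sketch for spotting such differences between two records of the same package follows. `compare_records` is a hypothetical helper, and the two sample records are copied from the outputs in this document with the url field omitted.

```python
def compare_records(left, right):
    """Report version agreement and dependencies unique to each record."""
    left_deps = {dep["name"] for dep in left["dependencies"]}
    right_deps = {dep["name"] for dep in right["dependencies"]}
    return {
        "same_version": left["version"] == right["version"],
        "only_left": sorted(left_deps - right_deps),
        "only_right": sorted(right_deps - left_deps),
    }


scraped_a3 = {
    'name': 'A3',
    'version': '1.0.0',
    'dependencies': [
        {'name': 'R', 'version': '≥ 2.15.0'},
        {'name': 'xtable', 'version': ''},
        {'name': 'pbapply', 'version': ''},
    ],
}
libio_a3 = {
    'name': 'A3',
    'version': '1.0.0',
    'dependencies': [
        {'name': 'R', 'version': None},
        {'name': 'randomForest', 'version': None},
    ],
}

diff = compare_records(scraped_a3, libio_a3)
```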
  1'''
  2
  3## Description
  4
  5
  6This package is responsible for providing a base data structure for all the derivated classes whose purpose is the obtaining data from a specific source.
  7
  8It is composed of several modules:
  9
 10##### Package structure:
 11
 12```data_source
 13├── csv_ds.py
 14├── data_source.py
 15├── librariesio_ds.py
 16├── repository_scrapers
 17│   ├── bioconductor.py
 18│   ├── cran.py
 19│   ├── npm.py
 20│   ├── pypi.py
 21│   └── r.py
 22└── scraper_ds.py
 23```
 24
 25##### Package modules:
 26
 27- **data_source.py**
 28
 29  Implements the abstract class Datasource, which is the base class of the rest of the implementations
 30
 31- **csv_ds.py**
 32
 33  Implement datasource for \*.csv files
 34
 35- **librariesio_ds.py**
 36
 37  Implements datasource for the API of Libraries.io
 38
 39- **scraper_ds.py**
 40
 41  Implements the abstract class ScraperDataSource, which is the base class of customized implementations for each repository
 42
 43- **repository_scraper/**
 44
 45  Inside there are several implementations based on Datasource web Scraping for Cran, Bioconductor, NPM and PyPI
 46
 47##### Doc pages
 48
 49For more info see [data_source package docs](https://dab0012.github.io/olivia-finder/olivia_finder/data_source/data_source_module.html)
 50
 51
 52
 53## Web Scraping-Based implementations
 54
 55
 56### Constructor
 57
 58
 59- The default constructor does not receive parameters
 60
 61- The number of optional parameters depends on the implementation, but as a rule we can define a name and a description (With the purpose of offering information)
 62
 63- The most relevant parameter is the RequestHandler object, which will use by the webscraping based DataSource to make requests to the website to which it refers
 64
 65
 66Implementation for CRAN
 67
 68
 69
 70```python
 71from olivia_finder.data_source.repository_scrapers.cran import CranScraper
 72cran_ds = CranScraper()
 73```
 74
 75Implementation for Bioconductor
 76
 77
 78
 79```python
 80from olivia_finder.data_source.repository_scrapers.bioconductor import BioconductorScraper
 81bioconductor_scraper = BioconductorScraper()
 82```
 83
 84Implementation for PyPi
 85
 86
 87
 88```python
 89from olivia_finder.data_source.repository_scrapers.pypi import PypiScraper
 90pypi_scraper = PypiScraper()
 91```
 92
 93Implementation for NPM
 94
 95
 96
 97```python
 98from olivia_finder.data_source.repository_scrapers.npm import NpmScraper
 99npm_scraper = NpmScraper()
100```
101
102Github repository implementation
103
104
105```python
106from olivia_finder.data_source.repository_scrapers.github import GithubScraper
107github_scraper = GithubScraper()
108```
109
110### Obtain package names
111
112
113CRAN package names
114
115
116
117```python
118cran_ds.obtain_package_names()[:10]
119```
120
121
122
123
124    ['A3',
125     'AalenJohansen',
126     'AATtools',
127     'ABACUS',
128     'abbreviate',
129     'abbyyR',
130     'abc',
131     'abc.data',
132     'ABC.RAP',
133     'ABCanalysis']
134
135
136
137Bioconductor package names
138
139
140
141```python
142bioconductor_scraper.obtain_package_names()[:10]
143```
144
145
146
147
148    ['ABSSeq',
149     'ABarray',
150     'ACE',
151     'ACME',
152     'ADAM',
153     'ADAMgui',
154     'ADImpute',
155     'ADaCGH2',
156     'AGDEX',
157     'AHMassBank']
158
159
160
161PyPi package names
162
163
164
165```python
166pypi_scraper.obtain_package_names()[:10]
167```
168
169
170
171
172    ['0',
173     '0-._.-._.-._.-._.-._.-._.-0',
174     '000',
175     '0.0.1',
176     '00101s',
177     '00print_lol',
178     '00SMALINUX',
179     '0101',
180     '01changer',
181     '01d61084-d29e-11e9-96d1-7c5cf84ffe8e']
182
183
184
185NPM package names
186
187<span style="color: red">Note:</span>
188
189- This process is very expensive, the implementation is functional but its use is not recommended unless it is necessary
190- It is recommended to import the list of npm packets properctioned as a txt file
191
192Output folder can be configured in `config.ini` file `working_dir`
193
194
195
196```python
197# npm_scraper.obtain_package_names(
198#     page_size=100,                          # Number of packages to obtain per request
199#     save_chunks=True,                       # Save packages in a chunk file
200#     show_progress_bar=True                  # Show progress bar
201# )[:10]
202```
203
204The file with the NPM package list is the following
205
206
207
208```python
209!wc -l ../results/package_lists/npm_packages.txt
210```
211
212    wc: ../results/package_lists/npm_packages.txt: No existe el archivo o el directorio
213
214
215
216```python
217!tail -n 20 results/package_lists/npm_packages.txt
218```
219
220    tail: no se puede abrir 'results/package_lists/npm_packages.txt' para lectura: No existe el archivo o el directorio
221
222
223### Obtain package data
224
225
226CRAN data of A3 package
227
228
229
230```python
231cran_ds.obtain_package_data('A3')
232```
233
234
235
236
237    {'name': 'A3',
238     'version': '1.0.0',
239     'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'},
240      {'name': 'xtable', 'version': ''},
241      {'name': 'pbapply', 'version': ''}],
242     'url': 'https://cran.r-project.org/package=A3'}
243
244
245
246If the petition fails we will obtain None
247
248
249
250```python
251non_existent_package = cran_ds.obtain_package_data('NON_EXISTENT_PACKAGE')
252print(non_existent_package)
253```
254
255    None
256
257
258Bioconductor data of a4 package
259
260
261
262```python
263bioconductor_scraper.obtain_package_data('a4')
264```
265
266
267
268
269    {'name': 'a4',
270     'version': '1.48.0',
271     'dependencies': [{'name': 'a4Base', 'version': ''},
272      {'name': 'a4Preproc', 'version': ''},
273      {'name': 'a4Classif', 'version': ''},
274      {'name': 'a4Core', 'version': ''},
275      {'name': 'a4Reporting', 'version': ''}],
276     'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'}
277
278
279
280PyPi data od networkx package
281
282
283
284```python
285pypi_scraper.obtain_package_data('networkx')
286```
287
288
289
290
291    {'name': 'networkx',
292     'version': '3.1',
293     'url': 'https://pypi.org/project/networkx/',
294     'dependencies': [{'name': 'numpy', 'version': None},
295      {'name': 'scipy', 'version': None},
296      {'name': 'matplotlib', 'version': None},
297      {'name': 'pandas', 'version': None},
298      {'name': 'pre', 'version': None},
299      {'name': 'mypy', 'version': None},
300      {'name': 'sphinx', 'version': None},
301      {'name': 'pydata', 'version': None},
302      {'name': 'numpydoc', 'version': None},
303      {'name': 'pillow', 'version': None},
304      {'name': 'nb2plots', 'version': None},
305      {'name': 'texext', 'version': None},
306      {'name': 'lxml', 'version': None},
307      {'name': 'pygraphviz', 'version': None},
308      {'name': 'pydot', 'version': None},
309      {'name': 'sympy', 'version': None},
310      {'name': 'pytest', 'version': None},
311      {'name': 'codecov', 'version': None}]}
312
313
314
NPM data for the aws-sdk package

```python
npm_scraper.obtain_package_data('aws-sdk')
```

    {'name': 'aws-sdk',
     'version': '2.1406.0',
     'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
      {'name': 'events', 'version': '1.1.1'},
      {'name': 'ieee754', 'version': '1.1.13'},
      {'name': 'jmespath', 'version': '0.16.0'},
      {'name': 'querystring', 'version': '0.2.0'},
      {'name': 'sax', 'version': '1.2.1'},
      {'name': 'url', 'version': '0.10.3'},
      {'name': 'util', 'version': '^0.12.4'},
      {'name': 'uuid', 'version': '8.0.0'},
      {'name': 'xml2js', 'version': '0.5.0'},
      {'name': '@types/node', 'version': '6.0.92'},
      {'name': 'browserify', 'version': '13.1.0'},
      {'name': 'chai', 'version': '^3.0'},
      {'name': 'codecov', 'version': '^3.8.2'},
      {'name': 'coffeeify', 'version': '*'},
      {'name': 'coffeescript', 'version': '^1.12.7'},
      {'name': 'cucumber', 'version': '0.5.x'},
      {'name': 'eslint', 'version': '^5.8.0'},
      {'name': 'hash-test-vectors', 'version': '^1.3.2'},
      {'name': 'insert-module-globals', 'version': '^7.0.0'},
      {'name': 'istanbul', 'version': '*'},
      {'name': 'jasmine', 'version': '^2.5.3'},
      {'name': 'jasmine-core', 'version': '^2.5.2'},
      {'name': 'json-loader', 'version': '^0.5.4'},
      {'name': 'karma', 'version': '^4.1.0'},
      {'name': 'karma-chrome-launcher', 'version': '2.2.0'},
      {'name': 'karma-jasmine', 'version': '^1.1.0'},
      {'name': 'mocha', 'version': '^3.0.0'},
      {'name': 'repl.history', 'version': '*'},
      {'name': 'semver', 'version': '*'},
      {'name': 'typescript', 'version': '2.0.8'},
      {'name': 'uglify-js', 'version': '2.x'},
      {'name': 'webpack', 'version': '^1.15.0'}],
     'url': 'https://www.npmjs.com/package/aws-sdk'}
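
The NPM dependency versions above are semver specifiers as declared by the package (for example `^0.12.4`, `2.x` or `*`), so some are exact pins and others are ranges. A minimal sketch for telling them apart, assuming an exact pin is written as three dot-separated numbers; the `is_exact` helper is hypothetical and not part of olivia_finder, and the sample entries are copied from the output above:

```python
import re

# Assumption: an exact pin looks like "MAJOR.MINOR.PATCH" with digits only.
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+")


def is_exact(spec):
    # True when the whole specifier is a plain three-part version number.
    return EXACT_VERSION.fullmatch(spec) is not None


# A few dependency entries copied from the aws-sdk output above
deps = [
    {"name": "buffer", "version": "4.9.2"},
    {"name": "util", "version": "^0.12.4"},
    {"name": "coffeeify", "version": "*"},
    {"name": "uglify-js", "version": "2.x"},
    {"name": "uuid", "version": "8.0.0"},
]

pinned = [d["name"] for d in deps if is_exact(d["version"])]
ranged = [d["name"] for d in deps if not is_exact(d["version"])]

print(pinned)
print(ranged)
```

This only inspects the specifier text; it does not resolve a range to a concrete release.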

### Obtain a list of packages data

CRAN data for the packages A3 and AER, plus a non-existent package

```python
cran_ds.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
```

    ([{'name': 'A3',
       'version': '1.0.0',
       'dependencies': [{'name': 'R', 'version': '≥ 2.15.0'},
        {'name': 'xtable', 'version': ''},
        {'name': 'pbapply', 'version': ''}],
       'url': 'https://cran.r-project.org/package=A3'},
      {'name': 'AER',
       'version': '1.2-10',
       'dependencies': [{'name': 'R', 'version': '≥ 3.0.0'},
        {'name': 'car', 'version': '≥ 2.0-19'},
        {'name': 'lmtest', 'version': ''},
        {'name': 'sandwich', 'version': '≥ 2.4-0'},
        {'name': 'survival', 'version': '≥ 2.37-5'},
        {'name': 'zoo', 'version': ''},
        {'name': 'stats', 'version': ''},
        {'name': 'Formula', 'version': '≥ 0.2-0'}],
       'url': 'https://cran.r-project.org/package=AER'}],
     ['NON_EXISTING_PACKAGE'])
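
Judging by the output above, the scraper-based `obtain_packages_data` returns a tuple: a list with the data of the packages that were found, and a list with the names that could not be found. A minimal sketch of how such a result might be unpacked; the `result` literal is an abridged stand-in copied from the output above, not a live call:

```python
# Stand-in for the value returned by obtain_packages_data(); abridged copy
# of the CRAN output above (the R dependency of A3 is omitted).
result = (
    [
        {
            "name": "A3",
            "version": "1.0.0",
            "dependencies": [
                {"name": "xtable", "version": ""},
                {"name": "pbapply", "version": ""},
            ],
            "url": "https://cran.r-project.org/package=A3",
        },
    ],
    ["NON_EXISTING_PACKAGE"],
)

# First element: packages found. Second element: names not found.
found, not_found = result

# Index the found packages by name for quick lookup
by_name = {pkg["name"]: pkg for pkg in found}
dep_names = [dep["name"] for dep in by_name["A3"]["dependencies"]]

print(dep_names)
print(not_found)
```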

Bioconductor data for the packages a4, a4Preproc, a4Classif, a4Core and a4Base, with a progress bar

```python
from tqdm import tqdm

bioconductor_scraper.obtain_packages_data(
    package_names=['a4', 'a4Preproc', 'a4Classif', 'a4Core', 'a4Base'],
    progress_bar=tqdm(total=5)
)
```

    100%|██████████| 5/5 [00:00<00:00,  5.79it/s]

    ([{'name': 'a4',
       'version': '1.48.0',
       'dependencies': [{'name': 'a4Base', 'version': ''},
        {'name': 'a4Preproc', 'version': ''},
        {'name': 'a4Classif', 'version': ''},
        {'name': 'a4Core', 'version': ''},
        {'name': 'a4Reporting', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4.html'},
      {'name': 'a4Preproc',
       'version': '1.48.0',
       'dependencies': [{'name': 'BiocGenerics', 'version': ''},
        {'name': 'Biobase', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Preproc.html'},
      {'name': 'a4Classif',
       'version': '1.48.0',
       'dependencies': [{'name': 'a4Core', 'version': ''},
        {'name': 'a4Preproc', 'version': ''},
        {'name': 'methods', 'version': ''},
        {'name': 'Biobase', 'version': ''},
        {'name': 'ROCR', 'version': ''},
        {'name': 'pamr', 'version': ''},
        {'name': 'glmnet', 'version': ''},
        {'name': 'varSelRF', 'version': ''},
        {'name': 'utils', 'version': ''},
        {'name': 'graphics', 'version': ''},
        {'name': 'stats', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Classif.html'},
      {'name': 'a4Core',
       'version': '1.48.0',
       'dependencies': [{'name': 'Biobase', 'version': ''},
        {'name': 'glmnet', 'version': ''},
        {'name': 'methods', 'version': ''},
        {'name': 'stats', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Core.html'},
      {'name': 'a4Base',
       'version': '1.48.0',
       'dependencies': [{'name': 'a4Preproc', 'version': ''},
        {'name': 'a4Core', 'version': ''},
        {'name': 'methods', 'version': ''},
        {'name': 'graphics', 'version': ''},
        {'name': 'grid', 'version': ''},
        {'name': 'Biobase', 'version': ''},
        {'name': 'annaffy', 'version': ''},
        {'name': 'mpm', 'version': ''},
        {'name': 'genefilter', 'version': ''},
        {'name': 'limma', 'version': ''},
        {'name': 'multtest', 'version': ''},
        {'name': 'glmnet', 'version': ''},
        {'name': 'gplots', 'version': ''}],
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/a4Base.html'}],
     [])
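
Each package dictionary nests its dependencies, so a list of packages can be flattened into (package, dependency) pairs when an edge list is more convenient. A minimal sketch under that assumption; `to_edges` is a hypothetical helper, not part of olivia_finder, and the sample data is an abridged copy of the output above:

```python
def to_edges(packages):
    # Flatten package dictionaries into (package, dependency) name pairs.
    edges = []
    for pkg in packages:
        for dep in pkg["dependencies"]:
            edges.append((pkg["name"], dep["name"]))
    return edges


# Abridged copy of two entries from the Bioconductor output above
packages = [
    {
        "name": "a4Preproc",
        "version": "1.48.0",
        "dependencies": [
            {"name": "BiocGenerics", "version": ""},
            {"name": "Biobase", "version": ""},
        ],
    },
    {
        "name": "a4Core",
        "version": "1.48.0",
        "dependencies": [
            {"name": "Biobase", "version": ""},
            {"name": "glmnet", "version": ""},
        ],
    },
]

edges = to_edges(packages)
for dependent, dependency in edges:
    print(dependent, "->", dependency)
```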

```python
pypi_scraper.obtain_packages_data(
    ['networkx', 'requests', "tqdm", "NON_EXISTING_PACKAGE"])
```

    ([{'name': 'networkx',
       'version': '3.1',
       'url': 'https://pypi.org/project/networkx/',
       'dependencies': [{'name': 'numpy', 'version': None},
        {'name': 'scipy', 'version': None},
        {'name': 'matplotlib', 'version': None},
        {'name': 'pandas', 'version': None},
        {'name': 'pre', 'version': None},
        {'name': 'mypy', 'version': None},
        {'name': 'sphinx', 'version': None},
        {'name': 'pydata', 'version': None},
        {'name': 'numpydoc', 'version': None},
        {'name': 'pillow', 'version': None},
        {'name': 'nb2plots', 'version': None},
        {'name': 'texext', 'version': None},
        {'name': 'lxml', 'version': None},
        {'name': 'pygraphviz', 'version': None},
        {'name': 'pydot', 'version': None},
        {'name': 'sympy', 'version': None},
        {'name': 'pytest', 'version': None},
        {'name': 'codecov', 'version': None}]},
      {'name': 'requests',
       'version': '2.31.0',
       'url': 'https://pypi.org/project/requests/',
       'dependencies': [{'name': 'charset', 'version': None},
        {'name': 'idna', 'version': None},
        {'name': 'urllib3', 'version': None},
        {'name': 'certifi', 'version': None},
        {'name': 'PySocks', 'version': None},
        {'name': 'chardet', 'version': None}]},
      {'name': 'tqdm',
       'version': '4.65.0',
       'url': 'https://pypi.org/project/tqdm/',
       'dependencies': [{'name': 'colorama', 'version': None},
        {'name': 'py', 'version': None},
        {'name': 'twine', 'version': None},
        {'name': 'wheel', 'version': None},
        {'name': 'ipywidgets', 'version': None},
        {'name': 'slack', 'version': None},
        {'name': 'requests', 'version': None}]}],
     ['NON_EXISTING_PACKAGE'])

```python
npm_scraper.obtain_packages_data(
    ['aws-sdk', 'request', "NON_EXISTING_PACKAGE"])
```

    ([{'name': 'aws-sdk',
       'version': '2.1406.0',
       'dependencies': [{'name': 'buffer', 'version': '4.9.2'},
        {'name': 'events', 'version': '1.1.1'},
        {'name': 'ieee754', 'version': '1.1.13'},
        {'name': 'jmespath', 'version': '0.16.0'},
        {'name': 'querystring', 'version': '0.2.0'},
        {'name': 'sax', 'version': '1.2.1'},
        {'name': 'url', 'version': '0.10.3'},
        {'name': 'util', 'version': '^0.12.4'},
        {'name': 'uuid', 'version': '8.0.0'},
        {'name': 'xml2js', 'version': '0.5.0'},
        {'name': '@types/node', 'version': '6.0.92'},
        {'name': 'browserify', 'version': '13.1.0'},
        {'name': 'chai', 'version': '^3.0'},
        {'name': 'codecov', 'version': '^3.8.2'},
        {'name': 'coffeeify', 'version': '*'},
        {'name': 'coffeescript', 'version': '^1.12.7'},
        {'name': 'cucumber', 'version': '0.5.x'},
        {'name': 'eslint', 'version': '^5.8.0'},
        {'name': 'hash-test-vectors', 'version': '^1.3.2'},
        {'name': 'insert-module-globals', 'version': '^7.0.0'},
        {'name': 'istanbul', 'version': '*'},
        {'name': 'jasmine', 'version': '^2.5.3'},
        {'name': 'jasmine-core', 'version': '^2.5.2'},
        {'name': 'json-loader', 'version': '^0.5.4'},
        {'name': 'karma', 'version': '^4.1.0'},
        {'name': 'karma-chrome-launcher', 'version': '2.2.0'},
        {'name': 'karma-jasmine', 'version': '^1.1.0'},
        {'name': 'mocha', 'version': '^3.0.0'},
        {'name': 'repl.history', 'version': '*'},
        {'name': 'semver', 'version': '*'},
        {'name': 'typescript', 'version': '2.0.8'},
        {'name': 'uglify-js', 'version': '2.x'},
        {'name': 'webpack', 'version': '^1.15.0'}],
       'url': 'https://www.npmjs.com/package/aws-sdk'},
      {'name': 'request',
       'version': '2.88.2',
       'dependencies': [{'name': 'aws-sign2', 'version': '~0.7.0'},
        {'name': 'aws4', 'version': '^1.8.0'},
        {'name': 'caseless', 'version': '~0.12.0'},
        {'name': 'combined-stream', 'version': '~1.0.6'},
        {'name': 'extend', 'version': '~3.0.2'},
        {'name': 'forever-agent', 'version': '~0.6.1'},
        {'name': 'form-data', 'version': '~2.3.2'},
        {'name': 'har-validator', 'version': '~5.1.3'},
        {'name': 'http-signature', 'version': '~1.2.0'},
        {'name': 'is-typedarray', 'version': '~1.0.0'},
        {'name': 'isstream', 'version': '~0.1.2'},
        {'name': 'json-stringify-safe', 'version': '~5.0.1'},
        {'name': 'mime-types', 'version': '~2.1.19'},
        {'name': 'oauth-sign', 'version': '~0.9.0'},
        {'name': 'performance-now', 'version': '^2.1.0'},
        {'name': 'qs', 'version': '~6.5.2'},
        {'name': 'safe-buffer', 'version': '^5.1.2'},
        {'name': 'tough-cookie', 'version': '~2.5.0'},
        {'name': 'tunnel-agent', 'version': '^0.6.0'},
        {'name': 'uuid', 'version': '^3.3.2'},
        {'name': 'bluebird', 'version': '^3.2.1'},
        {'name': 'browserify', 'version': '^13.0.1'},
        {'name': 'browserify-istanbul', 'version': '^2.0.0'},
        {'name': 'buffer-equal', 'version': '^1.0.0'},
        {'name': 'codecov', 'version': '^3.0.4'},
        {'name': 'coveralls', 'version': '^3.0.2'},
        {'name': 'function-bind', 'version': '^1.0.2'},
        {'name': 'karma', 'version': '^3.0.0'},
        {'name': 'karma-browserify', 'version': '^5.0.1'},
        {'name': 'karma-cli', 'version': '^1.0.0'},
        {'name': 'karma-coverage', 'version': '^1.0.0'},
        {'name': 'karma-phantomjs-launcher', 'version': '^1.0.0'},
        {'name': 'karma-tap', 'version': '^3.0.1'},
        {'name': 'nyc', 'version': '^14.1.1'},
        {'name': 'phantomjs-prebuilt', 'version': '^2.1.3'},
        {'name': 'rimraf', 'version': '^2.2.8'},
        {'name': 'server-destroy', 'version': '^1.0.1'},
        {'name': 'standard', 'version': '^9.0.0'},
        {'name': 'tape', 'version': '^4.6.0'},
        {'name': 'taper', 'version': '^0.5.0'}],
       'url': 'https://www.npmjs.com/package/request'}],
     ['NON_EXISTING_PACKAGE'])

## CSV-Based implementation

### Constructor

```python
from olivia_finder.data_source.csv_ds import CSVDataSource
bioconductor_csv = CSVDataSource(
    "aux_data/bioconductor_adjlist_test.csv",   # Path to the CSV file
    # Name of the field that contains the name of the package (the dependent)
    dependent_field="name",
    # Name of the field that contains the name of the dependency
    dependency_field="dependency",
    # Name of the field that contains the version of the package
    dependent_version_field="version",
    # Name of the field that contains the version of the dependency
    dependency_version_field="dependency_version",
    # Name of the field that contains the URL of the package
    dependent_url_field="url",
)
```
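
The contents of `bioconductor_adjlist_test.csv` are not shown here. As a rough illustration, the sketch below assumes an adjacency-list layout with one row per (package, dependency) pair and a header matching the field names passed to the constructor above. The rows are hypothetical, with values borrowed from the BANDITS example in this document, and the CSV is written to an in-memory buffer instead of a file:

```python
import csv
import io

# Column names match the field arguments passed to the constructor above.
fieldnames = ["name", "version", "url", "dependency", "dependency_version"]

# Hypothetical rows: one row per (package, dependency) pair.
bandits_url = "https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html"
rows = [
    {"name": "BANDITS", "version": "1.16.0", "url": bandits_url,
     "dependency": "Rcpp", "dependency_version": ""},
    {"name": "BANDITS", "version": "1.16.0", "url": bandits_url,
     "dependency": "DRIMSeq", "dependency_version": "1.28.0"},
]

# Write the header and the rows to an in-memory text buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

In this layout an empty `dependency_version` cell stands for a dependency without a recorded version.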

### Obtain package data

```python
bioconductor_csv.obtain_package_data('BANDITS')
```

    {'name': 'BANDITS',
     'version': '1.16.0',
     'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
     'dependencies': [{'name': 'R', 'version': nan},
      {'name': 'Rcpp', 'version': nan},
      {'name': 'doRNG', 'version': nan},
      {'name': 'MASS', 'version': nan},
      {'name': 'data.table', 'version': nan},
      {'name': 'R.utils', 'version': nan},
      {'name': 'doParallel', 'version': nan},
      {'name': 'parallel', 'version': nan},
      {'name': 'foreach', 'version': nan},
      {'name': 'methods', 'version': nan},
      {'name': 'stats', 'version': nan},
      {'name': 'graphics', 'version': nan},
      {'name': 'ggplot2', 'version': nan},
      {'name': 'DRIMSeq', 'version': '1.28.0'},
      {'name': 'BiocParallel', 'version': '1.34.0'}]}
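
In the CSV output above, a missing dependency version shows up as `nan`, whereas the scraper outputs use `''` or `None` for the same situation. A minimal sketch for normalising these to `None` before comparing sources, assuming the `nan` values are ordinary float NaNs; `clean_version` is a hypothetical helper, not part of olivia_finder:

```python
import math


def clean_version(value):
    # Map the "missing" markers seen in the outputs (None, '', NaN) to None.
    if value is None or value == "":
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# Abridged stand-in for the BANDITS data shown above
package = {
    "name": "BANDITS",
    "version": "1.16.0",
    "dependencies": [
        {"name": "Rcpp", "version": float("nan")},
        {"name": "DRIMSeq", "version": "1.28.0"},
    ],
}

cleaned = [
    {"name": dep["name"], "version": clean_version(dep["version"])}
    for dep in package["dependencies"]
]
print(cleaned)
```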

### Obtain a list of packages data

```python
bioconductor_csv.obtain_packages_data(
    ['BANDITS', 'ASICS', "NON_EXISTING_PACKAGE"])
```

    ([{'name': 'BANDITS',
       'version': '1.16.0',
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/BANDITS.html',
       'dependencies': [{'name': 'R', 'version': nan},
        {'name': 'Rcpp', 'version': nan},
        {'name': 'doRNG', 'version': nan},
        {'name': 'MASS', 'version': nan},
        {'name': 'data.table', 'version': nan},
        {'name': 'R.utils', 'version': nan},
        {'name': 'doParallel', 'version': nan},
        {'name': 'parallel', 'version': nan},
        {'name': 'foreach', 'version': nan},
        {'name': 'methods', 'version': nan},
        {'name': 'stats', 'version': nan},
        {'name': 'graphics', 'version': nan},
        {'name': 'ggplot2', 'version': nan},
        {'name': 'DRIMSeq', 'version': '1.28.0'},
        {'name': 'BiocParallel', 'version': '1.34.0'}]},
      {'name': 'ASICS',
       'version': '2.16.0',
       'url': 'https://www.bioconductor.org/packages/release/bioc/html/ASICS.html',
       'dependencies': [{'name': 'R', 'version': nan},
        {'name': 'BiocParallel', 'version': '1.34.0'},
        {'name': 'ggplot2', 'version': nan},
        {'name': 'glmnet', 'version': nan},
        {'name': 'grDevices', 'version': nan},
        {'name': 'gridExtra', 'version': nan},
        {'name': 'methods', 'version': nan},
        {'name': 'mvtnorm', 'version': nan},
        {'name': 'PepsNMR', 'version': '1.18.0'},
        {'name': 'plyr', 'version': nan},
        {'name': 'quadprog', 'version': nan},
        {'name': 'ropls', 'version': '1.32.0'},
        {'name': 'stats', 'version': nan},
        {'name': 'SummarizedExperiment', 'version': '1.30.0'},
        {'name': 'utils', 'version': nan},
        {'name': 'Matrix', 'version': nan},
        {'name': 'zoo', 'version': nan}]}],
     ['NON_EXISTING_PACKAGE'])

## Web API-Based implementation (Libraries.io API)

Package data can also be obtained through the Libraries.io web API.

Note that the data served by Libraries.io may be out of date, which should be kept in mind when using this source.
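
The outputs in this document illustrate the point: the CRAN scraper reports AER at version 1.2-10, while Libraries.io reports 1.2-9. A minimal sketch of how two sources might be compared for version differences; `version_mismatches` is a hypothetical helper, not part of olivia_finder, and the two lists are abridged copies of those outputs:

```python
def version_mismatches(source_a, source_b):
    # Return {name: (version_a, version_b)} for packages present in both
    # sources whose reported versions differ.
    versions_b = {pkg["name"]: pkg["version"] for pkg in source_b}
    return {
        pkg["name"]: (pkg["version"], versions_b[pkg["name"]])
        for pkg in source_a
        if pkg["name"] in versions_b
        and pkg["version"] != versions_b[pkg["name"]]
    }


# Versions copied from the CRAN scraper and Libraries.io outputs
scraped = [
    {"name": "A3", "version": "1.0.0"},
    {"name": "AER", "version": "1.2-10"},
]
libio = [
    {"name": "A3", "version": "1.0.0"},
    {"name": "AER", "version": "1.2-9"},
]

mismatches = version_mismatches(scraped, libio)
print(mismatches)
```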

### Constructor

In this case, the Libraries.io API key must be defined in the _config.ini_ file

```python
from olivia_finder.data_source.librariesio_ds import LibrariesioDataSource

pypi_libio  = LibrariesioDataSource(platform="pypi")
nuget_libio = LibrariesioDataSource(platform="nuget")
cran_libio  = LibrariesioDataSource(platform="cran")
```

### Obtain package names

<p style="color:red">
This functionality has not been implemented because there is no way to get this data through the API
</p>

The library used to access the API from Python (pybraries) has a search function, but it cannot be used efficiently for this task: as the output below shows, a search without keywords prints a warning and the results do not appear to be limited to the requested platform.

```python
# pybraries expects the API key to be set as an environment variable
from pybraries.search import Search

search = Search()
info = search.project_search(platform='pypi')

for project in info:
    print(project['name'])
```

    A string of keywords must be passed as a keyword argument
    typescript
    @types/node
    eslint
    webpack
    prettier
    @types/jest
    @types/react
    @babel/preset-typescript
    @babel/runtime
    jest
    rxjs
    postcss
    vue-template-compiler
    vue
    axios
    requests
    moment
    @types/react-dom
    @types/mocha
    babel-runtime
    babel-preset-react
    @babel/core
    babel-core
    @babel/preset-env
    @babel/plugin-proposal-class-properties
    @babel/plugin-transform-runtime
    @babel/preset-react
    babel-jest
    commander
    rollup

### Obtain package data

```python
pypi_libio.obtain_package_data('networkx')
```

    {'name': 'networkx',
     'version': '3.1rc0',
     'dependencies': [{'name': 'codecov', 'version': '2.1.13'},
      {'name': 'pytest-cov', 'version': '4.0.0'},
      {'name': 'pytest', 'version': '7.4.0'},
      {'name': 'sympy', 'version': '1.11.1'},
      {'name': 'pydot', 'version': '0.9.10'},
      {'name': 'pygraphviz', 'version': '1.3.1'},
      {'name': 'lxml', 'version': '4.9.2'},
      {'name': 'texext', 'version': '0.6.7'},
      {'name': 'nb2plots', 'version': '0.6.1'},
      {'name': 'pillow', 'version': '9.5.0'},
      {'name': 'numpydoc', 'version': '1.5.0'},
      {'name': 'sphinx-gallery', 'version': '0.13.0'},
      {'name': 'pydata-sphinx-theme', 'version': '0.13.3'},
      {'name': 'sphinx', 'version': '7.0.1'},
      {'name': 'mypy', 'version': '1.4.1'},
      {'name': 'pre-commit', 'version': '3.3.3'},
      {'name': 'pandas', 'version': '2.0.1'},
      {'name': 'matplotlib', 'version': '3.7.1'},
      {'name': 'scipy', 'version': '1.11.0'},
      {'name': 'numpy', 'version': '1.25.0'}],
     'url': 'https://pypi.org/project/networkx/'}

```python
nuget_libio.obtain_package_data('Microsoft.Extensions.DependencyInjection')
```

    {'name': 'Microsoft.Extensions.DependencyInjection',
     'version': '8.0.0-preview.5.23280.8',
     'dependencies': [{'name': 'System.Threading.Tasks.Extensions',
       'version': '4.5.4'},
      {'name': 'Microsoft.Extensions.DependencyInjection.Abstractions',
       'version': '3.1.32'},
      {'name': 'Microsoft.Bcl.AsyncInterfaces', 'version': '7.0.0'}],
     'url': 'https://www.nuget.org/packages/Microsoft.Extensions.DependencyInjection/'}

### Obtain a list of packages data

```python
cran_libio.obtain_packages_data(['A3', 'AER', "NON_EXISTING_PACKAGE"])
```

    [{'name': 'A3',
      'version': '1.0.0',
      'dependencies': [{'name': 'R', 'version': None},
       {'name': 'randomForest', 'version': None}],
      'url': 'https://cran.r-project.org/package=A3'},
     {'name': 'AER',
      'version': '1.2-9',
      'dependencies': [{'name': 'vars', 'version': '0.5.3'},
       {'name': 'urca', 'version': None},
       {'name': 'tseries', 'version': None},
       {'name': 'truncreg', 'version': None},
       {'name': 'systemfit', 'version': None},
       {'name': 'strucchange', 'version': None},
       {'name': 'scatterplot3d', 'version': '0.3.4'},
       {'name': 'sampleSelection', 'version': None},
       {'name': 'rugarch', 'version': None},
       {'name': 'ROCR', 'version': None},
       {'name': 'rgl', 'version': '0.109.2'},
       {'name': 'quantreg', 'version': '5.42.1'},
       {'name': 'pscl', 'version': '1.5.5'},
       {'name': 'plm', 'version': None},
       {'name': 'np', 'version': None},
       {'name': 'nnet', 'version': None},
       {'name': 'nlme', 'version': None},
       {'name': 'mlogit', 'version': None},
       {'name': 'MASS', 'version': None},
       {'name': 'longmemo', 'version': None},
       {'name': 'lattice', 'version': None},
       {'name': 'KernSmooth', 'version': None},
       {'name': 'ineq', 'version': None},
       {'name': 'foreign', 'version': None},
       {'name': 'forecast', 'version': '8.17.0'},
       {'name': 'fGarch', 'version': '3042.83.2'},
       {'name': 'effects', 'version': None},
       {'name': 'dynlm', 'version': None},
       {'name': 'boot', 'version': None},
       {'name': 'Formula', 'version': None},
       {'name': 'stats', 'version': None},
       {'name': 'zoo', 'version': None},
       {'name': 'survival', 'version': None},
       {'name': 'sandwich', 'version': None},
       {'name': 'lmtest', 'version': None},
       {'name': 'car', 'version': None},
       {'name': 'R', 'version': None}],
      'url': 'https://cran.r-project.org/package=AER'}]

'''