Local relational access to openly-available publication data sets

dspinellis/alexandria3k

Alexandria3k

The alexandria3k package supplies a library and a command-line tool providing efficient relational query access to the following large scientific publication open data sets. Data are decompressed on the fly, thus allowing the package's use even on storage-restricted laptops.

  • Crossref (157 GB compressed, 1 TB uncompressed). This contains metadata for about 134 million publications from all major international publishers, with full citation data for 60 million of them.
  • PubMed (43 GB compressed, 327 GB uncompressed). This comprises more than 36 million citations for biomedical literature from MEDLINE, life science journals, and online books, with rich domain-specific metadata, such as MeSH indexing, funding, genetic, and chemical details.
  • ORCID summary data set (25 GB compressed, 435 GB uncompressed). This contains about 78 million author details records.
  • DataCite (22 GB compressed, 197 GB uncompressed). This comprises research outputs and resources, such as data, pre-prints, images, and samples, containing about 50 million work entries.
  • United States Patent Office issued patents (11 GB compressed, 115 GB uncompressed). This contains about 5.4 million records.

Further supported data sets include funder bodies, journal names, open access journals, and research organizations.

The alexandria3k package installation contains all elements required to run it. It does not require the installation, configuration, and maintenance of a third party relational or graph database. It can therefore be used out-of-the-box for performing reproducible publication research on the desktop.
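The general idea of querying compressed data relationally without a database server can be sketched with the Python standard library alone. The example below is only an illustration of the technique, not alexandria3k's API or implementation: the record layout, the `works` table, and the function names `make_sample_archive`, `stream_records`, and `works_published_in` are all assumptions made up for this sketch. It streams a gzip-compressed JSON-lines archive record by record and loads only the matching rows into an embedded in-memory SQLite database.

```python
import gzip
import io
import json
import sqlite3


def make_sample_archive():
    """Build a tiny gzip-compressed JSON-lines archive (made-up records)."""
    records = [
        {"doi": "10.1000/a", "title": "First work", "year": 2021},
        {"doi": "10.1000/b", "title": "Second work", "year": 2023},
        {"doi": "10.1000/c", "title": "Third work", "year": 2023},
    ]
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return gzip.compress(payload)


def stream_records(compressed):
    """Yield one record at a time, decompressing the stream on the fly."""
    with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            yield json.loads(line)


def works_published_in(compressed, year):
    """Load only the matching rows into an in-memory SQLite table and query it."""
    db = sqlite3.connect(":memory:")  # embedded database, no server to set up
    db.execute("CREATE TABLE works (doi TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    db.executemany(
        "INSERT INTO works VALUES (:doi, :title, :year)",
        (r for r in stream_records(compressed) if r["year"] == year),
    )
    rows = db.execute("SELECT doi, title FROM works ORDER BY doi").fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    print(works_published_in(make_sample_archive(), 2023))
```

Filtering while streaming means that only the selected rows are ever stored uncompressed, which is why this style of processing can fit on a machine with limited storage.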

Installation and documentation

The alexandria3k package is available on PyPI. The complete reference and usage documentation for alexandria3k can be found here.
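Because the package is published on PyPI, it can typically be installed with `pip install alexandria3k`. As a small sketch of how to confirm that the installation succeeded, the following uses only the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption introduced for this example and is not part of alexandria3k.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="alexandria3k"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("alexandria3k is not installed in this environment")
    else:
        print(f"alexandria3k {found} is installed")
```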

Publication

Details about the rationale, design, implementation, and use of this software can be found in the following paper.

Diomidis Spinellis. Open reproducible scientometric research with Alexandria3k. PLoS ONE 18(11): e0294946. November 2023. doi: 10.1371/journal.pone.0294946