scrapper-fandom

Introduction

This script scrapes data from most wikis on fandom.com. It retrieves only the text contained in a given div and can easily be adjusted via class/Analyzer.mjs.
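
As a rough illustration of what this means (not the repository's actual code), the sketch below pulls the text of a single content div; the cheerio dependency and the .mw-parser-output selector are assumptions here, and the real selection logic lives in class/Analyzer.mjs.

// Illustrative sketch only; the project's actual extraction lives in class/Analyzer.mjs.
// Assumes cheerio is installed and that Fandom article text sits in ".mw-parser-output".
import * as cheerio from "cheerio";

async function extractDivText(pageUrl, selector = ".mw-parser-output") {
    const response = await fetch(pageUrl);
    const html = await response.text();
    const $ = cheerio.load(html);
    // Keep only the visible text of the chosen div, collapsing whitespace.
    return $(selector).text().replace(/\s+/g, " ").trim();
}

// Example: extractDivText("https://naruto.fandom.com/fr/wiki/Gaara").then(console.log);

Pointing the selector at a different div is the kind of adjustment class/Analyzer.mjs is meant to make easy.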

Installation

npm

npm install
npm start

pnpm

pnpm install
pnpm start

Configuration

You can change:

  • The source fandom
  • The source of the page containing the register of all pages
  • The name of the subfolder to be created in out/
  • The name of the file containing the scraped page content
  • The name of the file containing the history of links present on the wiki
// Base URL of the target fandom wiki, e.g. https://some-wiki.fandom.com (no trailing '/').
const from = "https://naruto.fandom.com";

// https://some-wiki.fandom.com/wiki/Special:AllPages or https://some-wiki.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages
const entry_point_from_all_pages =
    "https://naruto.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages?from=%22Gaara%22...%21%21";

// Name of the subfolder to be created in out/ (default: derived from the "from" hostname, e.g. "some-wiki").
const sub_dir = new URL(from).hostname.split(".")[0];

// Data file name.
const filename_data = `${sub_dir}-data.json`;

// History file name.
const filename_history = `${sub_dir}-history.json`;
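
For instance, tracing the expressions above with the Naruto defaults (nothing extra to configure, just the values they evaluate to):

// new URL(from).hostname   -> "naruto.fandom.com"
// sub_dir                  -> "naruto"
// filename_data            -> "naruto-data.json"
// filename_history         -> "naruto-history.json"
// The scraped data should therefore end up in out/naruto/naruto-data.json,
// with the link history alongside it in out/naruto/naruto-history.json.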

Example

Say I want to scrape the contents of the Solo Leveling fandom wiki.

Minimal configuration

const from = "https://solo-leveling.fandom.com";
const entry_point_from_all_pages =
    "https://solo-leveling.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages";

Processing

  1. Preparing the list of links to visit in order to capture the text data.

     (screenshot: Prelude)

  2. Retrieving page content.

     (screenshot: Retrieving)

  3. Collecting the data in out/solo-leveling/solo-leveling-data.json.

     (screenshot: Collecting)
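
Assuming the default file names from the configuration above, a finished run should leave a layout roughly like this:

out/
└── solo-leveling/
    ├── solo-leveling-data.json
    └── solo-leveling-history.json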

Licence

This project is licensed under the MIT License.