Hackmap

https://tomthe.github.io/hackmap

This website shows a few million submissions, comments and users of Hacker News. The position of each node is determined by an algorithm that places similar content close together. The size of a node reflects its score or the number of replies it received.

How to use:

Use the search box to find a user or a submission
Use the mouse wheel or two fingers to zoom in and out
Click on a node to get a link to its source

The data was downloaded from the HN API. Since ~40 million items are too many for a browser, I kept only the 2.6 million with at least some replies. I fed the comments to a SentenceTransformers model (all-MiniLM-L6-v2) to create text embeddings. Titles of submissions are often ambiguous, so I used the average embedding of their comments to get a better representation of each submission's content. The same was done for the users. Then I used UMAP to reduce the dimensionality of the embeddings to 3, 2 and 1 dimensions: 3 for the colors, 2 for the placement of the nodes and 1 for a plot with the time dimension. But the 3D colors didn't add much information, so I removed them.
I also used BERTopic to get clusters and names for these clusters, but they also don't add much information beyond the titles of the submissions.
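The averaging step described above can be sketched in a few lines. This is only an illustration in JavaScript, not the code used to build the map (the embeddings themselves came from SentenceTransformers); the function name `meanEmbedding` and the toy vectors are made up for this example.

```javascript
// Average a list of equal-length embedding vectors element-wise.
// Hypothetical helper, not part of the repository.
function meanEmbedding(vectors) {
  if (vectors.length === 0) {
    throw new Error("need at least one vector");
  }
  const dim = vectors[0].length;
  const mean = new Array(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) mean[i] += v[i];
  }
  return mean.map((x) => x / vectors.length);
}

// Toy data: three comments under one submission, with 4 dimensions
// instead of the 384 that all-MiniLM-L6-v2 produces.
const commentEmbeddings = [
  [1, 0, 2, 4],
  [3, 0, 2, 0],
  [2, 3, 2, 2],
];
const submissionEmbedding = meanEmbedding(commentEmbeddings);
console.log(submissionEmbedding); // [2, 1, 2, 2]
```

The same function would apply to users by passing in the embeddings of all comments written by one user.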
There are several implementations of maps like this. (todo: add links) Some of them are very sophisticated, but they don't show the actual text on the canvas. I think showing as much information as possible without overwhelming the user (and the browser) is very important for how much the user can get out of such a visualization of big data. Another important aspect is that I wanted to host the whole thing on a static host, which makes things much easier in the long term. I used mostly vanilla JavaScript (a good decision for such a site: no build step and no fighting against Svelte or React) and the excellent force-graph library.

Since there are too many data points to show at once, the page fetches a base map with the 40 000 most important nodes and then fetches additional data tiles when you zoom in. Unfortunately, I couldn't find the time to implement a static search over all the data, so the search currently only works for the base tile of 40 000 nodes.
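One way to decide which tiles to fetch on zoom is sketched below. The tile scheme is an assumption made for illustration: the map is taken to cover the unit square, level `z` splits it into 2^z by 2^z cells, and the key format `level/x/y` is hypothetical. The actual layout of the files under `data/hnbig4` may differ.

```javascript
// Assumed layout (not taken from the repository): the map covers
// [0, 1) x [0, 1) and tile level z splits it into 2^z x 2^z equal cells.
function tilesForViewport(view, level) {
  const n = 2 ** level;
  const clamp = (i) => Math.min(n - 1, Math.max(0, i));
  const x0 = clamp(Math.floor(view.minX * n));
  const x1 = clamp(Math.floor(view.maxX * n));
  const y0 = clamp(Math.floor(view.minY * n));
  const y1 = clamp(Math.floor(view.maxY * n));
  const keys = [];
  for (let x = x0; x <= x1; x++) {
    for (let y = y0; y <= y1; y++) keys.push(`${level}/${x}/${y}`);
  }
  return keys;
}

// Remember which tiles were already requested so each is fetched only once.
const loadedTiles = new Set();
function tilesToFetch(view, level) {
  const missing = tilesForViewport(view, level).filter((k) => !loadedTiles.has(k));
  missing.forEach((k) => loadedTiles.add(k));
  return missing;
}

const view = { minX: 0.1, maxX: 0.3, minY: 0.6, maxY: 0.7 };
console.log(tilesToFetch(view, 2)); // ["2/0/2", "2/1/2"]
console.log(tilesToFetch(view, 2)); // [] (already requested)
```

Because every tile is a plain file addressed by its key, a scheme like this needs no server logic, which fits the goal of hosting everything statically.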
The color of the nodes is based on the publication date. The size is based on the score of submissions and the number of direct and indirect child comments for comments and users.
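A minimal sketch of such a mapping follows. The concrete choices here (a blue-to-red hue ramp, a square-root size scale, and the function names) are assumptions for illustration, not the values used on the site.

```javascript
// Map a timestamp to a hue between blue (oldest) and red (newest).
// The ramp and the saturation/lightness values are arbitrary choices.
function colorForTime(t, tMin, tMax) {
  const f = Math.min(1, Math.max(0, (t - tMin) / (tMax - tMin)));
  const hue = Math.round(240 * (1 - f));
  return `hsl(${hue}, 70%, 50%)`;
}

// Map a weight (score, or number of child comments) to a node radius.
// The square root keeps the node's area roughly proportional to the weight.
function radiusForWeight(weight, minRadius = 1) {
  return minRadius + Math.sqrt(Math.max(0, weight));
}

console.log(colorForTime(50, 0, 100)); // "hsl(120, 70%, 50%)"
console.log(radiusForWeight(16));      // 5
```

A square-root scale keeps a handful of very popular submissions from covering everything around them, while still making them visibly larger.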

The biggest challenge in this project was that it worked so well that I was constantly distracted by the stories and comments I discovered while testing the plot. This is why I am releasing it now in this work-in-progress state. Known issues: Firefox doesn't render some nodes when zoomed in too far; Chrome renders them, but has problems showing the correct tooltips.

Candos and Todos:

Better search
More levels of tiles
Tuning of the size and show parameters
Earlier data from HN
Better UI
Other datasets (MusicBrainz, OpenAlex, Newspapers,...)

The code can be found on GitHub: github.com/tomthe/demographymap
