Searching the Forum [tech]

More forum searching shenanigans.

I added Google Analytics tags to the site! It doesn’t track what searches you do (and won’t ever), but it does track visits. We’re not exactly a highly active site. 🙂 But it’s helping me with page load and performance tracking.

The crawler now collects and stores a list of every image URL in every post. This means I could (for example) mirror all the in-forum images somewhere else, e.g. into a specialty Pinterest board or whatever. But for now, we’ll content ourselves with some statistics. There are 1,701 unique image links in the entire forum. Of those, 888 (52%) are ‘embedded’: someone uploaded the image to Roll20, and Roll20 in turn sent it to S3 for storage. The other 813 are off-site links, reaction GIFs, whatever. The off-site links come from 396 distinct hosts. The top hosts by count are Giphy (138), Pinterest (73), Imgur (46), YouTube (26), Wikia (24), Tumblr (20), Tenor (18), and our own wiki (11).
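
For the curious, a tally like the one above could be computed roughly as follows. This is a minimal Python sketch, not the crawler's actual code: the function name `summarize_image_links` and the assumption that embedded uploads can be recognised by an `amazonaws.com` hostname are both illustrative guesses. It also counts by exact hostname, so subdomains of the same service are not merged.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: "embedded" images live on an S3 hostname. Adjust to taste.
EMBEDDED_DOMAINS = ("amazonaws.com",)


def is_embedded_host(host):
    """True if host is one of the embedded domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in EMBEDDED_DOMAINS)


def summarize_image_links(urls):
    """Count unique image links, split embedded vs. off-site, tally hosts."""
    unique = set(urls)  # duplicates across posts count once
    embedded = 0
    offsite_hosts = Counter()
    for url in unique:
        host = (urlparse(url).hostname or "").lower()
        if is_embedded_host(host):
            embedded += 1
        else:
            offsite_hosts[host] += 1
    return {
        "total": len(unique),
        "embedded": embedded,
        "offsite": len(unique) - embedded,
        "distinct_hosts": len(offsite_hosts),
        "top_hosts": offsite_hosts.most_common(8),
    }
```

Calling `summarize_image_links(all_urls)` on the stored list would give the totals, the embedded/off-site split, and the top hosts in one pass.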

author: Bill G.
url: https://app.roll20.net/forum/permalink/6507808

Nothing surprising in those stats, but still interesting to see them. Thanks!

author: *** Dave H.
url: https://app.roll20.net/forum/permalink/6508105

The initial search page + data should now load in about 3 seconds instead of 11. This is a weird setup, so if anyone is using the forum search and suddenly has problems with it not working, please let me know.

EDIT: yeah, every time I crawl the site, one or two pages don’t make it. It’s not retrying on error due to shenanigans, so I’ll fix that over this weekend.

author: Bill G.
url: https://app.roll20.net/forum/permalink/6514304

Search hasn’t updated in several days because Roll20 made some changes that broke me. I’m getting socket timeouts, and my URL de-duping isn’t working thanks to their new pagination scheme. I’m working on fixing it.
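
To illustrate the de-duping problem in general terms: if the same page can be reached through several URL spellings, a plain "have I seen this string" check lets duplicates through. One common approach is to reduce each URL to a canonical key first. The Python sketch below is generic and makes no claim about Roll20's actual pagination scheme; `canonical_url` and `add_if_new` are hypothetical helper names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Reduce a URL to a stable key so equivalent spellings compare equal."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 produce the same key.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Treat /page and /page/ as the same resource (an assumption).
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment: it never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def add_if_new(seen, url):
    """Record url in the seen set; return True only on first sight."""
    key = canonical_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Which parts of the URL are safe to normalise depends entirely on the site, so the trailing-slash and parameter-order rules here would need checking against real pages.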

author: Bill G.
url: https://app.roll20.net/forum/permalink/6804376

FINALLY. Indexes were rebuilt: 2018-9-30 13:18:46

I switched from the built-in skip-duplicates logic (which is shit) to seenreq (which is frustrating), and used async/await instead of chained promises to make troubleshooting easier. But it’s working now, and I verified that it retries on every error and grabs each page only once.
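
The overall shape of "retry every error, fetch each page once" looks something like the sketch below. It is written in Python with `asyncio` purely for illustration; the real crawler runs on Node with seenreq, and the names `fetch_with_retry` and `crawl`, the default of three attempts, and the plain in-memory set are all assumptions rather than the actual code.

```python
import asyncio


async def fetch_with_retry(fetch, url, attempts=3, delay=0.0):
    """Await fetch(url), retrying on any exception up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await fetch(url)
        except Exception as error:  # retry on every error, e.g. socket timeouts
            last_error = error
            await asyncio.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_error


async def crawl(fetch, urls, attempts=3):
    """Fetch each distinct URL once, in order, and return {url: body}."""
    seen = set()
    pages = {}
    for url in urls:
        if url in seen:
            continue  # already handled, skip the duplicate
        seen.add(url)
        pages[url] = await fetch_with_retry(fetch, url, attempts)
    return pages
```

Because each `await` sits on its own line inside an ordinary loop and `try`/`except`, a failure shows up with a readable stack trace, which is the troubleshooting benefit of async/await over long promise chains.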

author: Bill G.
url: https://app.roll20.net/forum/permalink/6846350

Thanks, Bill! Your computer-fu is deeply appreciated.


author: *** Dave H.
url: https://app.roll20.net/forum/permalink/6846680