Show HN: VisaWhen – Data on US visa issuance backlogs

104 points| underyx | 4 years ago

Heya! Not the usual sort of thing to be posted here, but I wanted to show off what I made yesterday. Here's a sample page about H-1B visas issued in Bogota:

<https://visawhen.com/consulates/bogota/h1b>

The code is source-available (not open source) at <https://github.com/underyx/visawhen>. It's my first time choosing a source-available license over MIT, mainly out of fear that existing immigration startups would just gobble up this data and code. Frankly, I didn't think the implications through; I just threw a safe license on there.

The way the project works is:

- Use requests-html to find publicly available PDFs from government pages

- Use camelot to parse the PDFs and extract data tables from them

- Since the previous step takes far too long for my taste (around 8000 pages at around 5 seconds each), I've used dask to split the work into chunks and process them in parallel across my laptop's CPU cores.

- Do data cleanup and processing with pandas, and save all of it to a SQLite file.

- Read data from the SQLite file with Next.js and generate a static HTML page for each possible combination of embassy and visa type

- The pages use ECharts to visualize data, and Bulma as a CSS framework

- Build and host each commit via Netlify

- But proxy to Netlify from Cloudflare, which I believe has more edge locations on the free plan

- Collect any donations via Ko-fi

- Use Google Analytics to have a general idea about visitor counts

- Use FullStory session recordings to find out about bugs – I've fixed quite a few and I think I'll probably remove this tracking after a bit of time
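For anyone curious about the shape of the chunk-and-parallelize steps above, here's a rough sketch. It is not the project's actual code, and it makes several assumptions: it swaps in the standard library (`concurrent.futures` and `sqlite3`) for dask and pandas so it runs without extra installs, it uses threads for simplicity (real CPU-bound PDF parsing would want processes, or dask as described above), `extract_rows` is a hypothetical stand-in that returns placeholder rows rather than real data, and the `issuances` table and its columns are made up for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def estimated_hours(pages, seconds_per_page, workers=1):
    """Idealized wall-clock estimate, assuming work splits evenly across workers."""
    return pages * seconds_per_page / workers / 3600


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def extract_rows(page_numbers):
    """Hypothetical stand-in for the slow per-page PDF table extraction.

    Returns placeholder (post, visa_class, page, issued) tuples, not real data.
    """
    return [("bogota", "h1b", page, 0) for page in page_numbers]


def run_pipeline(page_numbers, db_path=":memory:", chunk_size=100, workers=4):
    """Extract chunks concurrently, then write all rows to SQLite from one thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(extract_rows, chunked(page_numbers, chunk_size)))

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issuances "
        "(post TEXT, visa_class TEXT, page INTEGER, issued INTEGER)"
    )
    for rows in results:
        conn.executemany("INSERT INTO issuances VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def page_paths(conn):
    """One static page path per (post, visa class) pair found in the data."""
    query = "SELECT DISTINCT post, visa_class FROM issuances ORDER BY post, visa_class"
    return [f"/consulates/{post}/{visa}" for post, visa in conn.execute(query)]
```

With the figures above, `estimated_hours(8000, 5)` comes to roughly 11 hours on a single worker and a bit under 3 hours on four, assuming the work splits evenly. Writing to SQLite from a single thread after the workers finish keeps the database code simple.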

…and that's where I'm at now. I'm pretty happy about the results. Most pages load in less than 300ms, which is something I care about all too much. More importantly, I've shared the site with some immigration communities I'm part of, and the response has been very positive! Let me know what y'all think.

26 comments

[+] simonw|4 years ago|reply
I downloaded your consulates.sqlite3 file and opened it up in https://datasette.io/ on my laptop - if you do the same (and run "datasette install datasette-vega" to get the charting plugin)

Having done that...

http://127.0.0.1:8001/consulates/backlogs?_facet=Post+Slug&_...

Full page screenshot here: https://static.simonwillison.net/static/2021/consulate-backl...

Shows an interesting graph where the number of L1 visa issuances in London drops from around 500 a month to 0 around March/April 2020, eventually climbing to between 19 and 65 per month in the past few months up to today.

This is a really neat dataset, congrats!

[+] jmercouris|4 years ago|reply
That’s super cool, very useful. I would be glad if instead there weren’t absurd delays.
[+] RileyJames|4 years ago|reply
Pretty cool. I once drove from Victoria to Calgary and back to get a US tourist visa because Vancouver had a 30-day wait and Calgary had 3 days.

I lucked out and drove there and back during a chinook, so the roads were good and the drive was epic.

I recall I was able to see the bookings available at each consulate, but I think I’d been preliminarily approved, or paid something at that stage.

Nice work on the site, adding any transparency to bureaucratic processes is a positive in my book.

[+] lgats|4 years ago|reply
Neat! I’ve built a little site on some similar US visa data, https://visa.ooo
[+] axaxs|4 years ago|reply
This site is awesome, thanks for sharing. Wish I had something like this a few years ago. The USCIS is nearly impossible to reach by phone, unless you memorize a very specific set of options and get lucky.
[+] Evidlo|4 years ago|reply
Maybe alphabetize the list of visa types.
[+] underyx|4 years ago|reply
Yeah, or I was thinking they should be sorted by amount issued, most to least popular.
[+] ftyers|4 years ago|reply
Would be cool to see a list of consulates ordered by speed for a given visa class. In the US ATM but have to leave the country to renew the H1B (ugh), but don't care where I do it. The thing has been authorised, just need to do the DS-160 and interview clown parade.
[+] simonw|4 years ago|reply
Have you considered using GitHub Actions to automate the data scraping? A GitHub Actions workflow is allowed to run for up to 6 hours so even with the PDF processing you are doing it may be enough time to generate the full site.
[+] graton|4 years ago|reply
It would be cool when you select a consulate and then select the visa type, if next to the visa type was a brief description of the visa. For those who don't know what all these visa type codes mean.
[+] underyx|4 years ago|reply
Yeah, thanks for the note! I was thinking of doing that as well. I left it for later cause it’ll be annoying to gather all the names :D
[+] jcims|4 years ago|reply
Very cool. Might be interesting to add some stats on how a particular consulate compares to the average of all backlogs and of those in the same country.
[+] whoisjuan|4 years ago|reply
Thanks for sharing details about the architecture and infrastructure behind this. That's pretty neat.