Continuing my annual tradition, I put together a quick-search page for Halifax Burger Week 2021, scraping the official site for data. This time I tweeted about the whole process. I've collected the thread below.
If you want to read the original thread on Twitter, start here. I've also rolled it up into a single page using Threadreader.
So #HFXBurgerWeek is starting tomorrow, and I haven't gotten around to making my unofficial fast-find burger listing web page yet. Well, I've got some time now and I should be able to punk rock this thing together in about an hour. Follow along in this thread! (Or mute me)
Here's the burger lineup on the official site:
https://burgerweek.co/burger-lineup/
Big delicious pictures, some search tools - but if you just want to find a burger quickly on your phone, it's a bit clunky. You have to tap to see details.
Devtools says it's 176 files and nearly 8 megabytes
I'm not blaming The Coast for their website - it has many jobs to do for many different stakeholders - but I like fast simple lookup tools over big splashy sites. In fact this year's version seems a bit more svelte than before. Still, I'm going to make my own much lighter version
I'm going to reuse a lot of the things I've already set up from previous years, so the main challenge is: how do I get the current burger data in a useful format? Two years ago they actually sent me a spreadsheet, but it's too late now for that. So I'm going to SCRAPE the SITE. I have no guarantees that this will work this year, but it's been good in previous years and it doesn't look like it's changed that much.
To do this, I'm going to write a script in Ruby using the Nokogiri parsing library - in terminal EMACS, because I'm an Old.
I like doing these kinds of scripts with a shell on one side and my code on the other, just writing outputs as I go until everything is working properly. I can full-screen this and just work on it step by step, and it's actually a very pleasing way to hack, for me.
The first thing to do is read in the main listing page. To do this I set up Nokogiri and tell it to load up the page. There was some weird SSL issue, but it turns out I can access the site just fine from regular http:// as well as https:// Just outputting what it loaded for now.
Just like your browser reads a page and then displays it in a window, Nokogiri reads a page and then stores it as a data structure in memory so you can do stuff to it with your code, like find and extract data.
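The setup is only a few lines. Roughly (a minimal sketch - the real script differs in the details):

require 'nokogiri'
require 'open-uri'

# Load the listing page - plain http:// here, since the https:// version
# was throwing SSL errors from my machine (newer Rubies want URI.open)
doc = Nokogiri::HTML(open('http://burgerweek.co/burger-lineup/'))

# Just print what got loaded, to confirm it worked
puts doc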
The site has a separate page for each burger, with all the details we need on it. What I want to do is get the address for each page from this main listing page.
My browser's inspector shows that each entry has a link of class "listing-item-container" - let's see if that works
One nice thing about Nokogiri is you can find stuff on a page using more or less the same syntax as CSS or JQuery selectors. So to find everything with class "listing-item-container" I just ask it for .css('.listing-item-container') - and it gives me 142 matches.
Pretty good!
Then I can just loop through those matches and ask it for the 'href' attribute.
Now I have a list of all of the separate burger page addresses!
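Spelled out in long form, that part is something like this sketch:

# Find every listing link on the page, then print where each one points
links = doc.css('.listing-item-container')

links.each do |link|
  puts link['href']
end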
Some Ruby people will say I'm being too verbose here breaking things out into all those lines. After all these years though I've learned that a bit of extra explicitness can really help clarity. But yes I could golf all of this down to one line if I wanted to:
Nokogiri::HTML(open('http://burgerweek.co/burger-lineup/')).css('.listing-item-container').each{|x| puts x.attr('href')}
Just because you CAN doesn't mean you SHOULD though. So back to the clunkier but more readable code.
[Jurassic Park gif]
This time instead of just outputting the entries, I'm mapping them into an array. That .map{|x| ... } thing is one of the reasons I still love working with Ruby so much and also kind of resent all the extra brackets and arrows etc that JavaScript requires for the same thing.
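Something like this (the variable name is mine):

# Collect the page addresses into an array instead of just printing them
burger_urls = doc.css('.listing-item-container').map { |link| link['href'] }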
Now that I have a list of all the burger pages, I can loop through it, load each one up with Nokogiri, and extract the necessary information. Easy as ... oh drat.
If you look back at the previous screenshot you'll see all the links are https:// and for some reason that's breaking here.
So I just do a quick substitution and boom now I'm going through every page!
However, this takes a while to run, like over a minute to hit all 143 or so pages. I also find it impolite to hit a site over and over again with a scraper like that, even if it's just getting the HTML. So I'm temporarily adding [0..2] to the end of the fetch, which just uses the first three elements of the array. This loads much more quickly - and I should be able to figure out most of the rest just from here.
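Put together, the fetch loop looks roughly like this sketch:

# The links on the page are https://, which is what breaks here, so swap in http://
burger_urls = burger_urls.map { |url| url.sub('https://', 'http://') }

# While developing, only hit the first three pages - quicker, and politer to the site
burger_urls[0..2].each do |url|
  page = Nokogiri::HTML(open(url))
  # ... pull the details out of each page here ...
end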
Here's one of the pages:
As you can see most of the information we'll need is up at the top.
There is also conveniently a map at the bottom.
Let's poke around the HTML to see if we can find useful selectors to point Nokogiri to.
Here's the HTML for the top part of the page. Annoyingly both the name of the burger and the name of the restaurant are H2s at the same level - but at least one has a different class. And they should both be in the same order. Price and FeedNS donation have their own classes.
Description looks like it's inside a "#listing-overview"
And - hooray! - the GPS coordinates are attributes in the map container tag - and also parameters in the Google Maps Directions link below, if necessary. Last year everything was in a GIS component and un-extractable.
Okay - so I'm going to try getting the restaurant name and the burger. Hmmm. They're both H2s, and the burger is mixed in with the price and other details. Kind of messy. This happens with stuff coming from CMSes. You can see on the right how it's coming out of Nokogiri.
(A few minutes pass...)
Okay I've hacked together a janky solution - but at least it works!
This gets all the H2s in the titlebar section. The content of the 1st is the restaurant. The content of the 2nd is the burger and some pricing. "split('$').first" is a quick way of getting everything up to the $
If the tag-stripped text of the second h2 is:
YOLO$18 $1 to Feed NS
Then split('$') turns it into an array broken on $:
["YOLO", "18 ", "1 to Feed NS"]
And the burger name will always be the 1st element
So:
h2s.last.text.split('$').first
Sometimes Ruby feels like poetry.
When I got into Ruby from Java, just that one line would have been paragraphs of iterators and class casting and all sorts of things. Ruby could put it all into one line - and one line that was still legible and expressive.
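Put together, the restaurant/burger extraction is roughly this - '.titlebar' is my stand-in for whatever the real container class is:

# Both headings live in the titlebar area of the page
h2s = page.css('.titlebar h2')

restaurant = h2s.first.text
burger     = h2s.last.text.split('$').first   # everything up to the first $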
The next ones are easier: price and FeedNS amount each have their own spans with dedicated classes and we can just do selectors for them. To keep track of things I'm outputting all the results to the console. (The "#{...}" lets me stick variables in strings.)
Results on the right.
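In sketch form, with the class names as placeholders for whatever the page really uses:

# Class names here are placeholders, not the real ones from the page
price   = page.css('.listing-price').text.strip
feed_ns = page.css('.listing-donation').text.strip

# Write everything to the console so I can keep an eye on the results
puts "#{restaurant}: #{burger} - #{price} / #{feed_ns}"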
The rest of the details on this page are coming in pretty easily.
"Tags" end up with lots of extra space around them, so I call ".strip" to remove all that. The "Description" is in an area with an ID ('#listing-overview') but I needed to pull out just the stuff in P tags.
We got latitude and longitude from the map container tag - and now we have everything we need!
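Roughly like this - again, the selector and attribute names are stand-ins for the real ones:

tags        = page.css('.listing-tags').text.strip
description = page.css('#listing-overview p').text.strip

map_div   = page.css('.listing-map').first
latitude  = map_div['data-latitude']
longitude = map_div['data-longitude']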
Now to try it on all the pages. Ideally the content will be structured consistently.
Ideally.
Hey it went through all of them without crashing on missing data, or even any mis-formatting! I've been doing this for decades and I'm still suspicious when this kind of thing works the first time. But I'll take it.
Hungry yet?
But what do we DO with this data now that we know we can get it?
We're going to turn it into JavaScript.
Then we're going to write that JS to a file.
Then we're going to load that file on a page where more JavaScript can write out the data in useful ways.
Anyone still here?
Instead of outputting the results, we're going to put them into a data structure - Ruby calls these Hashes, but they're sometimes called Dictionaries or Objects. They're just collections of keys and values, wrapped in { ... }.
(I'm using an older Ruby hash assignment style because I'm an Old.)
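A sketch of one entry - the key names here are illustrative, not necessarily the ones in the real file:

# One hash per burger, old-school hash-rocket syntax and all
next_entry = {
  'restaurant'  => restaurant,
  'burgername'  => burger,
  'price'       => price,
  'tags'        => tags,
  'description' => description,
  'latitude'    => latitude,
  'longitude'   => longitude
}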
Our final result is going to be an array - an ordered list - of these burger hashes. I set up an array before the main loop with:
burgers = []
And after we make each new burger entry we add it to the array:
burgers << next_entry
At the end we have a nice collection of data.
Now that we have this data, we want to turn it into JavaScript, because that's the only language web browsers understand. Thankfully this is pretty easy: just call ".to_json" on our array. Here we're sorting by restaurant first, and then putting a variable name in front of the JSON.
"JSON" stands for "JavaScript Object Notation". It's basically data formatted as JavaScript. Having spent much of my early career wrestling with XML, this is much nicer to work with. You can see it on the right. It looks quite a bit like the raw Ruby output from earlier.
One last test: I set it to fetch all the burgers from all the pages and then write everything out as JSON. It churns for a while - but everything is looking good!
Next we do a bit of UNIX command-line magic: instead of having our script write out to the screen, we can tell it instead to write out to a file! We just use ">" and give it a name to use.
ruby scraper_2021.rb > burgers_2021.js
And now we have a 60kb file of burger data!
Oops - I realized that I forgot to get the restaurant address! It's just an extra data element that we can pull from '.listing-address' on the page and put into our array of hashes.
BRB
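The fix is one more line back in the per-page loop, something like (the '.strip' is just tidying):

address = page.css('.listing-address').text.strip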
Okay now we have our burger data, we can start figuring out how to show it on our page. I'm mostly using the same HTML page as last year. Here it is:
That's it.
That's the site.
It's mostly framing stuff, linking in some CSS and JS (including the burgers_2021.js file we just made) and Google Analytics so I can see stuff about my visitors. All the real action happens near the middle, right here:
<div id="burgerlist"></div>
The other key part is the "burgerutils_2021.js" file, which contains a few key JavaScript functions.
The heart of the site is this function:
The JS file we created earlier gives us all the burger info in a "burgerData" variable, which we loop through here to build a bunch of HTML - which is then put in the "burgerlist" element on the page. It's a vintage approach - but it works!
(Had to go out for dinner - returned a few hours later)
... and I'm back.
The original versions of this project used JQuery to make the JavaScript easier to work with. Then I realized I was bringing in a whole lot of library code just to do a handful of relatively simple things. So I did some research on how to do JQuery stuff in plain old JS.
The first big advantage of JQuery is a streamlined way to not run anything until after the page is fully loaded. Turns out this code does the same thing.
To be honest I just copy-pasted it from StackOverflow. It does the trick though. (Also you may have noticed I'm using a programmer's font that converts things like != into custom symbols like in line 2 above)
Anyhow, when the page finishes loading, it runs the loadBurgers() function I showed earlier. It goes through the whole data set, builds a pile of HTML with it, and then dumps it all into that burgerlist div - using the non-JQuery plain JS method.
So yeah sure during my regular day-to-day work I'm transpiling ECMAScript and my node_modules folder is so huge I had to reconfigure my file system to support it - but part of this "punk rock" project is to do as much as possible with as little as possible.
It's. Just. Plain. JavaScript.
The fanciest part is sortByDistance(), which gets the current GPS position and, once it has it, calls updateListing(). That does some ugly juggling but eventually assigns a "distance" value to each burger, then re-sorts the list by distance and redraws it. The distance algorithm does some spherical trigonometry stuff I don't really understand. I copied it from StackOverflow as well - but it works!
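For the curious, what those StackOverflow snippets usually implement is the haversine formula for great-circle distance. Here's a sketch of it - in Ruby, to match the scraper, rather than the page's actual JavaScript:

# Haversine distance in km between two lat/long points
def distance_km(lat1, lon1, lat2, lon2)
  radius = 6371.0              # mean Earth radius in km
  to_rad = Math::PI / 180

  d_lat = (lat2 - lat1) * to_rad
  d_lon = (lon2 - lon1) * to_rad

  a = Math.sin(d_lat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(d_lon / 2)**2

  2 * radius * Math.asin(Math.sqrt(a))
end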
The last bit is a quick and dirty "search" function that simply checks whether each entry's text matches what was typed into the field, showing it if it does and hiding it if it doesn't. I'm not going to apologize for using a regular expression. It's a simple one at least!
The CSS is pretty basic. I used to have all of Bootstrap set up for this, along with custom fonts.
Now it's just a page or so of straightforward stuff: nice fonts if you have them, sans-serif if you don't.
If you're on mobile, there's some slightly different CSS that shrinks the text a bit. There's also custom styling for the one "awesome" button. EMACS doesn't colour nested CSS very well but meh.
Okay let's fire this up!
Looking pretty good! Except for all those "Undefined" burgers. Is this the new trend this year, like kimchi was in 2018? No, I probably just used the wrong name somewhere. It's my most common mistake.
Ah - the code was looking for "name" but the data file had "burgername". Naming is one of the trickiest things in software, believe it or not. This is one of the ways my English degree has actually helped my software career a lot. But with multiple files it's easy to lose track of which names you're using for what things.
Anyhow, it looks better now!
Tried the filter - it seems to work! However, it revealed a problem I should have caught earlier: special characters in the content. "mac ‘n’ cheeze" ugh. Need to do some search/replace. This is actually where I fire up VSCode since it's better at that than terminal emacs.
It's mostly smart quotes and em-dashes and things like that. Perennial headaches for this kind of parsing stuff. I just swap them with dumb quotes and regular dashes and some HTML special character codes etc. Close enough for rock.
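I did the swapping by hand in VSCode, but if you wanted to bake it into the scraper instead, it would look something like this sketch:

# Swap typographic characters for plain ASCII before writing out the data
def clean(text)
  text.gsub('‘', "'").gsub('’', "'")
      .gsub('“', '"').gsub('”', '"')
      .gsub('—', '-')
end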
I'm in California right now so the "show me the closest burger" function is going to be a bit weird - though for security reasons it doesn't work when just looking at the file on my computer. I need to get it published! When I first set this up several years ago, I had actually planned a full database-backed Ruby on Rails app - then I realized I could do the whole thing with a few flat files with absolutely nothing underneath.
The cool kids now call this JAMStack. I call it keeping things ~1997.
Since it's just flat files, I could put this on nearly any host anywhere. I like @websavers for plesk/php hosting here in Nova Scotia. I could probably use Netlify for this since it's "serverless". Or Github, or Amazon. But it's already on @heroku so I'll stay there for now.
So first I commit all my changes into Git version control. Since I've been in EMACS I'm using "magit" which is really nice, even for the limited stuff I use it for.
(I rename the files every year so browsers don't use the old cached ones. Once-a-year-timestamping)
(8:39 PM) It's overkill to use Heroku to host simple flat files, but I picked it because I already use it for lots of stuff and the base level is free.
However, to be able to use SSL I have to move to a paid level, and without SSL the GPS won't work.
So I might move things.
Tomorrow.
Anyhow, for the time being you can check out the updated "punk rock" Burger Week burgers listing at http://burgerweek.shindigital.com and feel free to "View Source" and see how it all fits together.
GPS won't work though.
Soon!
Anyhow, as I mentioned way upstream, the official site listing is 176 files and nearly 8 megabytes.
My version (which is arguably more useful in certain ways) is 7 files and 125kb.
The sum total of JavaScript functionality, which includes dynamically updated page contents, search/filter logic, and GPS distance sorting, fits into one 3.34KB file. No extra libraries. No transpiling. No monstrous node_modules folder. 100 lines of code.
I enjoy doing this every year, even if I only have the stomach for maybe one or two of these burgers - or even if I'm out of town for all of it like now.
It's like the Ramones doing 90 second songs in an era of double-length prog rock albums.
(By the way, the first recorded use of "punk" to refer to music was in a review of the Stooges' first album: "it's the music of punks, it's about cruising for burgers in your car" So burgers and punk go way back.)
(The next morning...)
Morning update! I realized that since Burger Week is only, um, a week, it wouldn't break the bank to upgrade to a paid plan at Heroku so I can get SSL working. It's $7 / month which is $1.75 for the week. Hosting is cheap. Especially if it's just flat files for a limited time!
Upgrading a server used to take months and cost thousands of dollars. Now it's the click of a button and you only pay for the time used.
Setting up SSL/https used to take all day. Now it's one command-line call. Thank you letsencrypt!
So now we can hit https://burgerweek.shindigital.com and it's secure! There are no forms to fill out or anything, so it doesn't really NEED to be secure, but in 2021 many browsers won't let you get GPS coordinates without SSL, and search engines will rank you lower if you don't have it.
... but there's now 2 problems:
1. you can still get to the old unsecured address and
2. the location sorting still doesn't seem to be working.
I think I know what to do about both of these. Light debugging goes well with morning coffee!
So what @halifaxbeard is jokingly suggesting here is that I submit a petition to Google to get the site added to an exclusive list of sites built into Chrome that will always be redirected to https.
Maybe not this early in the day!
Instead I just slapped some JS in at the top of the page that redirects.
Next up is the distance calculations not working. As I guessed, it was once again about Naming Things. For 2020 I used much shorter lookup keys in the burgers list - e.g. 'lt' instead of 'latitude' - to make the file as small as possible. I just forgot to update the function!
Labels are now updated, but I wanted to just make sure things are working properly, so I added every web programmer's oldest and best friend: console.log. This writes out to a behind-the-scenes browser panel, so you can see what's going on. I'm just showing the calculated results
Quick git commit and deployment and - boom! - we're in business!
(I'm in California right now so the distances are extra large - but it's working)
Closest burger week burger to me is at Peggys Cove, a mere 4832km away.
The last step, of course, is to REMOVE the console.log call from the code and redeploy.
I, um, may sometimes forget to do this step.
I know I'm not alone though!
Enable "Developer Tools" and check your favourite sites to see what gets left in there.
Okay - someone pointed out that my site isn't picking up the search terms. Looks like they added extra tag criteria this year, in addition to the regular tags they've had before.
This shouldn't be too difficult to extract though. Let's look at the HTML and figure out the best way to get the tags.
They're 'li' elements inside a 'listing-features' div.
So I add another Nokogiri selector, which returns an array of matches. I extract the text and merge it into a comma-separated string. Then I test it in the shell window in EMACS like before.
Looking good!
To keep things simple, I'm just adding the "Features" values (if any) to the end of the existing Tags list.
Ruby has an "unless" keyword, which simply means the opposite of "if". It usually just makes things extra confusing - but it's handy for short checks like this one for blanks.
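Which comes out to something like this (the 'listing-features' selector is my guess at the spelling):

# Pull the Features items, if any, and fold them into the existing tags string
features = page.css('.listing-features li').map { |li| li.text.strip }.join(', ')

tags += ", #{features}" unless features.empty?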
Since we're just adding to the tags data, we don't have to update anything else - so git commit and publish and ... now you can find the gluten free burgers nearest you.
Total time: 20 minutes, including documentation and a pee break
Anyhow, I hope at least someone out there enjoyed this dive into my punk-rock #HFXBurgerWeek side site.
Check it out if you need a fast way to find a burger near you:
https://burgerweek.shindigital.com
It's always fun to build and I'm glad I could share it with you.