What is this?
Pretty simple. Every 15 minutes, YAHNR scrapes Hacker News and generates a new page containing the top stories from the past 24 hours sorted in descending order by points.
Why, man, why?
- Since I sleep, I figure I miss some good posts during the night, and I'd like to see them.
- Curiosity. I read the NPR news app blog post on their tech setup and was impressed. Since pretty much everything I do is dynamic, I wanted to play around with this approach and see how it went.
How'd you do it?
Pretty simple! The code's up on GitHub.
- Wrote a script that scrapes the front page of HN and dumps the structured data into a JSON file.
- Combined the past 24 hours of JSON files into one big JSON file.
- Uploaded the JSON files to S3.
- Used some simple HTML/JS to parse the JSON file and generate the sorted table.
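The combine and sort steps above can be sketched roughly like this. It's an illustration, not the code from the repo: it assumes each scrape is a JSON list of stories with hypothetical `id`, `title` and `points` fields, and that a story seen in several scrapes should keep its highest point count. The `load_snapshots` helper and its directory layout are made up too.

```python
import json
from pathlib import Path


def combine_snapshots(snapshots):
    """Merge scraped snapshots, keeping the highest-scoring copy of each story."""
    best = {}
    for stories in snapshots:
        for story in stories:
            seen = best.get(story["id"])
            if seen is None or story["points"] > seen["points"]:
                best[story["id"]] = story
    # Highest points first, matching the order of the generated page.
    return sorted(best.values(), key=lambda s: s["points"], reverse=True)


def load_snapshots(directory):
    """Read every *.json snapshot in a directory (hypothetical layout)."""
    paths = sorted(Path(directory).glob("*.json"))
    return [json.loads(p.read_text()) for p in paths]


# Two tiny in-memory snapshots standing in for the scraped files.
snapshots = [
    [{"id": 1, "title": "A", "points": 10}, {"id": 2, "title": "B", "points": 50}],
    [{"id": 1, "title": "A", "points": 40}, {"id": 3, "title": "C", "points": 5}],
]
combined = combine_snapshots(snapshots)
print([s["id"] for s in combined])
```

Deduplicating on the story id before sorting keeps a long-lived story from showing up once per scrape in the final table.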
So it's all great?
- There's overlap between days for items that were on the front page for more than 24 hours.
- I should probably normalize point values by time of day, since some hours may just have higher average point volume.
- I'm not happy with the bigass JSON file. At the same time, I don't want to make 96 asynchronous calls to get the other JSON files.
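For the normalization idea in the list above, one possible approach is to divide each story's points by the average for its hour. This is only a sketch of something that isn't built yet: the `hour` field (0-23) and the `score` key are assumptions, and a real version would likely want averages taken over many days rather than a single one.

```python
from collections import defaultdict


def normalize_by_hour(stories):
    """Divide each story's points by the mean points of stories from the same hour."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for story in stories:
        totals[story["hour"]] += story["points"]
        counts[story["hour"]] += 1

    normalized = []
    for story in stories:
        mean = totals[story["hour"]] / counts[story["hour"]]
        # Guard against an hour whose stories all have zero points.
        score = story["points"] / mean if mean else 0.0
        normalized.append({**story, "score": score})
    # Rank by the normalized score instead of raw points.
    return sorted(normalized, key=lambda s: s["score"], reverse=True)


# Hypothetical data: a quiet hour (3) and a busy hour (15).
stories = [
    {"id": 1, "hour": 3, "points": 30},
    {"id": 2, "hour": 3, "points": 10},
    {"id": 3, "hour": 15, "points": 200},
    {"id": 4, "hour": 15, "points": 200},
]
ranked = normalize_by_hour(stories)
print([(s["id"], s["score"]) for s in ranked])
```

With this scoring, a 30-point story from a quiet hour can outrank a 200-point story from a busy one, which is the whole point of normalizing.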