What is this?

Pretty simple. Every 15 minutes, YAHNR scrapes Hacker News and generates a new page containing the top stories from the past 24 hours sorted in descending order by points.


Why man, why?

  • Since I sleep, I figure I miss some good posts during the night, and I'd like to see them.
  • Curiosity. I read the NPR news app blog post on their tech setup and was impressed. Since pretty much everything I do is dynamic, I wanted to play around with this approach and see how it went.

How'd you do it?

Pretty simple! The code's up on GitHub.

  1. Run a script that scrapes the front page of HN and dumps the structured data into a JSON file.
  2. Combine the past 24 hours of JSON files into one big JSON file.
  3. Upload the JSON files to S3.
  4. Use some simple HTML/JS to parse the big JSON file and generate the sorted table.
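
To make steps 2 and 4 concrete, here is a rough sketch in Python rather than the actual code on GitHub. The field names (scraped_at, stories, id, points) and the function name combine_snapshots are made up for illustration, and keeping only the highest point value seen for each story is an assumption about how repeated sightings could be merged.

```python
from datetime import datetime, timedelta, timezone


def combine_snapshots(snapshots, now, window=timedelta(hours=24)):
    """Merge per-scrape snapshots into one list, highest points first.

    Each snapshot is assumed (hypothetically) to look like:
    {"scraped_at": datetime, "stories": [{"id": ..., "points": ...}, ...]}
    """
    cutoff = now - window
    best = {}  # story id -> the sighting with the most points
    for snap in snapshots:
        if snap["scraped_at"] < cutoff:
            continue  # older than the 24-hour window
        for story in snap["stories"]:
            seen = best.get(story["id"])
            if seen is None or story["points"] > seen["points"]:
                best[story["id"]] = story
    # Descending order by points, as on the generated page.
    return sorted(best.values(), key=lambda s: s["points"], reverse=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [
        {"scraped_at": now - timedelta(hours=1),
         "stories": [{"id": 1, "title": "A", "points": 40}]},
        {"scraped_at": now,
         "stories": [{"id": 1, "title": "A", "points": 75},
                     {"id": 2, "title": "B", "points": 60}]},
    ]
    for story in combine_snapshots(demo, now):
        print(story["points"], story["title"])
```

In the real setup the sorting happens in the browser; doing it at combine time instead would be a design choice, not something the steps above require.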

So it's all great!

Not really!

  • There's overlap between days for items that were on the front page for more than 24 hours.
  • I should probably normalize point values by time of day, since some hours may simply have higher average point totals.
  • I'm not happy with the bigass JSON file. At the same time, I don't want to make 96 asynchronous calls (one per 15-minute scrape) to fetch the individual JSON files.
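
For the normalization idea in the list above, one possible approach (an assumption, not something YAHNR does) is to divide each story's points by the average points of stories from the same hour of day. The hour field and the normalize_by_hour name are hypothetical, and the averages would be more meaningful if computed over many days rather than a single 24-hour window.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_hour(stories):
    """Return copies of the stories with a relative 'score' field added.

    Each story is assumed (hypothetically) to carry an 'hour' field, 0-23,
    giving the hour of day it was posted, plus its raw 'points'.
    """
    by_hour = defaultdict(list)
    for story in stories:
        by_hour[story["hour"]].append(story["points"])
    hourly_mean = {hour: mean(points) for hour, points in by_hour.items()}

    normalized = []
    for story in stories:
        avg = hourly_mean[story["hour"]]
        # A score of 1.0 means "typical for that hour"; guard against avg == 0.
        score = story["points"] / avg if avg else 0.0
        normalized.append({**story, "score": score})
    return normalized
```

Sorting by score instead of points would then favor stories that did unusually well for their time slot.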