Codeless Sentiment Analysis of HackerNews
This post reports on an experiment with Node-RED (a project supported by the JS Foundation), using its visual data flow editor to fetch HN posts, filter those with positive words in the comments, and tweet them on @HackerGoodNews.
The flow
I recently came across Node-RED and wondered how good it would be as a tool to automate a few things I find more and more annoying to do manually. A friend of mine challenged me to use it to tweet “positive” posts from Hacker News (mostly because there is a sentiment analysis node in the built-in set of nodes). Extra points for making it without code (even though there is a function node to write arbitrary JavaScript to process messages).
Here is the result:
It works as follows:
- Parse HN's RSS feed: creates a new message (Node-RED’s unit of processing) every time a new post appears in HN’s RSS feed.
- limit 5 msg/s: ensures that no more than 5 messages per second go through the rest of the flow. Hacker News seems to dislike too-frequent requests, such as those made by the comment-fetching node.
- delay 1 hour: gives some time for the comments to appear.
- fetch comments: requests the comment web page of the post.
- keep text only: extracts the comments from the web page.
- join: merges all the comments into one text.
- sentiment: analyses the text.
- positive only: lets only messages with a positive score through.
- title > 120 chars ?: switches messages depending on whether the title of the post is longer than 120 characters.
- truncate title: shortens the long title.
- long title: produces a tweet with the shortened title (adding an ellipsis) and the URL of the post.
- short title: simply produces a tweet with the title and the URL of the post.
- Tweet: publishes the tweet on @hackergoodnews.
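For readers who prefer code to boxes and wires, the last few nodes of the flow (from "positive only" to "short title") can be read roughly as the JavaScript below. This is only a sketch of the logic, not what Node-RED runs internally: the function name buildTweet and its arguments are made up for illustration, and the only value taken from the flow is the 120-character limit.

```javascript
// Hypothetical helper mirroring the tail of the flow.
// Only the 120-character limit comes from the flow; the rest is assumed.
const MAX_TITLE_LENGTH = 120;

function buildTweet(title, url, score) {
  // "positive only": drop anything without a strictly positive score
  if (score <= 0) {
    return null;
  }
  // "title > 120 chars ?" followed by "truncate title" and "long title"
  if (title.length > MAX_TITLE_LENGTH) {
    return title.slice(0, MAX_TITLE_LENGTH) + "… " + url;
  }
  // "short title"
  return title + " " + url;
}

console.log(buildTweet("ReactOS 0.4.4 Released", "https://example.com/post", 231));
```

A message with a zero or negative score yields null here, which plays the role of a message that never reaches the Tweet node.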
The sentiment analysis is very simple: it uses the AFINN word list, in which words or phrases have been manually rated between -5 and 5. For example, “you’re a terrific fascist” has a (positive) score of 2 points, because terrific is worth 4 and fascist -2… Well, I said it’s simple, not perfect (and to be honest, I browsed the list for a while to find this example ;) )
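To make the arithmetic concrete, here is a minimal sketch of word-list scoring in JavaScript. It is an illustration under assumptions, not the code of the sentiment node: the function name afinnScore and the tokenizing rule are mine, the word list is reduced to the two entries quoted above, and multi-word phrases are not handled.

```javascript
// Tiny excerpt of an AFINN-style word list. The two entries are the ones
// quoted in this post; the real list is far longer.
const AFINN_EXCERPT = new Map([
  ["terrific", 4],
  ["fascist", -2],
]);

// Hypothetical scorer: sums the rating of every known word in the text.
function afinnScore(text, wordList) {
  // Lowercase, then split on anything that is not a letter or an apostrophe
  const tokens = text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
  // Words missing from the list contribute 0
  return tokens.reduce((sum, word) => sum + (wordList.get(word) || 0), 0);
}

console.log(afinnScore("you're a terrific fascist", AFINN_EXCERPT)); // 2
```

The sum ignores word order and context entirely, which is exactly why the example sentence comes out positive.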
As a more honest example, here are the 5 most negative posts of last weekend:
- Reflecting on One Very, Very Strange Year at Uber (score: -117)
- Uber Investigating Sexual Harassment Claims by Ex-Employee (score: -77)
- Fukishima is Worse than Ever (score: -60)
- Uber launches ‘urgent investigation’ into sexual harassment claims (score: -59)
- The Age of Rudeness (score: -48)
And here are the 5 most positive posts:
- I started a one-man biz that’s beating VC-backed startups (score: 615)
- Ask HN: Non-technical readers of HN, why are you here? (score: 273)
- ReactOS 0.4.4 Released (score: 231)
- Let’s not demonize driving, just stop subsidizing it (score: 175)
- Firefox 57 as the first release where only WebExtensions will be supported (score: 127)
Conclusion
Node-RED makes it really easy to build a tweeting bot that ignores sexual harassment and nuclear disasters without writing any code (of course, that relies on a sane community commenting on the news).
The only shortcoming I found concerns the feed parser, which does not remember what it has already processed. For example, if I were to restart my Node-RED instance, all the positive posts on the home page would be sent to Twitter again. That’s not a problem here, since Twitter detects and rejects duplicate statuses, but it would be more annoying if the messages were sent by email.
However, another (and maybe the biggest) strength of Node-RED is how easy it is to extend by writing new nodes (with code, this time). That gives me two solutions to my “problem”: either use one of the many existing nodes to hack some persistence into the feed parser (e.g. with a database node or a file node), or write my own feed parser. Good times ahead! :)
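As a rough idea of what the persistence hack could look like, here is a JavaScript sketch of a de-duplication step. Everything in it is hypothetical: the function name isNewPost, the "seenLinks" key, and the store argument, which is assumed to be any object with get and set methods. A plain Map is used below so the example runs on its own, but a Map lives in memory only; surviving a restart would require backing the store with a file or a database.

```javascript
// Hypothetical de-duplication step: returns true only the first time
// a given link is seen. `store` is any object exposing get(key) and
// set(key, value); an in-memory Map stands in for persistent storage.
function isNewPost(link, store) {
  const seen = store.get("seenLinks") || [];
  if (seen.includes(link)) {
    return false; // already processed: the message should be dropped
  }
  store.set("seenLinks", seen.concat(link)); // remember it for next time
  return true;
}

const memoryStore = new Map();
console.log(isNewPost("https://example.com/post/1", memoryStore)); // true
console.log(isNewPost("https://example.com/post/1", memoryStore)); // false
```

In practice the list of seen links would also need trimming so that it does not grow forever.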
Comments
Let’s see if the discussion of this article on HN will be positive ;)
Appendix
Here is the flow to import into Node-RED, if you want to tinker with it: