Codeless Sentiment Analysis of HackerNews
This post reports on an experiment with Node-RED (a project supported by the JS Foundation), using its visual data flow editor to fetch HN posts, keep those with positive words in the comments, and tweet them on @HackerGoodNews.
The flow
I recently came across Node-RED and wondered how good it would be as a tool to automate a few things I find increasingly annoying to do manually. A friend of mine challenged me to use it to tweet “positive” posts from Hacker News (mostly because a sentiment analysis node is part of the built-in set of nodes). Extra points for doing it without code (even though there is a function node for writing arbitrary JavaScript to process messages).
Here is the result:
It works as follows:
- Parse HN's RSS feed: creates a new message (Node-RED’s unit of processing) every time a new post appears in HN’s RSS feed.
- limit 5 msg/s: ensures that no more than 5 messages per second go through the rest of the flow. Hacker News seems to dislike overly frequent requests, such as those made by the comment-fetching node.
- delay 1 hour: gives the comments some time to appear.
- fetch comments: requests the comment web page of the post.
- keep text only: extracts the comments from the web page.
- join: merges all the comments into one text.
- sentiment: analyses the text.
- positive only: lets only messages with a positive score through.
- title > 120 chars ?: switches messages depending on whether the title of the post is longer than 120 characters.
- truncate title: shortens the long title.
- long title: produces a tweet with the shortened title (adding an ellipsis) and the URL of the post.
- short title: simply produces a tweet with the title and the URL of the post.
- Tweet: publishes the tweet on @hackergoodnews.
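The last few nodes boil down to a small piece of branching logic. The flow itself needs no code, but as a rough illustration outside Node-RED, here is a minimal Python sketch of it. The function name `build_tweet`, the `max_title` parameter and the example URL are hypothetical, and the exact ellipsis character appended to long titles is an assumption.

```python
def build_tweet(title: str, url: str, max_title: int = 120) -> str:
    """Return the tweet text for a post, shortening over-long titles."""
    if len(title) > max_title:
        # "truncate title" + "long title": cut the title and mark the cut
        title = title[:max_title].rstrip() + "…"
    # "short title": the title and the URL, separated by a space
    return f"{title} {url}"


print(build_tweet("ReactOS 0.4.4 Released", "https://example.com/post"))
```

Titles of 120 characters or fewer pass through unchanged; only longer ones are cut.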
The sentiment analysis is very simple: it uses the AFINN word list, in which words or phrases have been manually rated between -5 and 5. For example, “you’re a terrific fascist” has a (positive) score of 2 points, because terrific is worth 4 and fascist -2… Well, I said it’s simple, not perfect (and to be honest, I browsed the list for a while to find this example ;) )
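To make the arithmetic concrete, here is a minimal Python sketch of word-list scoring. It is not the sentiment node's actual implementation: the two-entry dictionary only contains the values quoted above, and the names `AFINN_EXCERPT` and `sentiment_score`, as well as the simple regex tokenizer, are assumptions made for illustration.

```python
import re
from typing import Mapping

# Two-entry excerpt using the values quoted in the text;
# the real AFINN list contains far more entries.
AFINN_EXCERPT = {"terrific": 4, "fascist": -2}


def sentiment_score(text: str, lexicon: Mapping[str, int] = AFINN_EXCERPT) -> int:
    """Sum the scores of all known words; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


print(sentiment_score("you're a terrific fascist"))  # 4 + (-2) = 2
```

Because each word is scored in isolation, context is ignored entirely, which is why the example sentence ends up positive.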
As a more honest example, here are the 5 most negative posts of last weekend:
- Reflecting on One Very, Very Strange Year at Uber (score: -117)
- Uber Investigating Sexual Harassment Claims by Ex-Employee (score: -77)
- Fukishima is Worse than Ever (score: -60)
- Uber launches ‘urgent investigation’ into sexual harassment claims (score: -59)
- The Age of Rudeness (score: -48)
And here are the 5 most positive posts:
- I started a one-man biz that’s beating VC-backed startups (score: 615)
- Ask HN: Non-technical readers of HN, why are you here? (score: 273)
- ReactOS 0.4.4 Released (score: 231)
- Let’s not demonize driving, just stop subsidizing it (score: 175)
- Firefox 57 as the first release where only WebExtensions will be supported (score: 127)
Conclusion
Node-RED makes it really easy to build a tweeting bot that ignores sexual harassment and nuclear disasters without even coding (of course, that relies on a sane community commenting on the news).
The only shortcoming I found concerns the feed parser, which does not remember what it has already processed. For example, if I were to restart my Node-RED instance, all the positive posts of the home page would be sent to Twitter again. That’s not a problem since Twitter detects and rejects duplicate statuses, but it would be more annoying if the messages were sent by email.
However, another (and maybe the biggest) strength of Node-RED is how easy it is to extend it by writing new nodes (with code, this time). That brings two solutions to my “problem”: either use one of the many existing nodes to hack some persistence into the feed parser (e.g. with a database node or a file node), or write my own feed parser. Good times ahead ! :)
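The first option, bolting persistence onto the feed parser with a file, could look roughly like the Python sketch below. It illustrates the idea only and is not a Node-RED node: the class name `SeenStore`, the method `is_new` and the JSON file layout are all hypothetical.

```python
import json
from pathlib import Path


class SeenStore:
    """Remembers which feed item links were already processed, across restarts."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Load previously seen links, or start empty on the first run
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self.seen = set()

    def is_new(self, link: str) -> bool:
        """Return True the first time a link is offered, False afterwards."""
        if link in self.seen:
            return False
        self.seen.add(link)
        # Persist immediately so a restart does not lose the state
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")
        return True
```

A flow would call `is_new` on each feed item's link and drop the message when it returns `False`, so a restart no longer replays the whole home page.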
Comments
Let’s see if the discussion of this article on HN will be positive ;)
Appendix
Here is the flow to import into Node-RED, if you want to tinker with it: