Data science for Hong Kong protests

Given the current situation, how can data science help find solutions? What light can we shine to help people make sense of what is going on?

If you have any ideas, or can even help us organise a hackathon (especially by providing a venue), please get in touch!

Collecting what is already available is valuable… For example:

I guess an easier start would be something like a time series chart of the number of people arrested?

That would be great!

Weirdly enough, the police stopped publishing arrest statistics on their website in June…


Crowdsourced tear gas data, including latitude and longitude!

I believe this heat map of tear gas is based on the spreadsheet above:

The government has released public order arrest statistics by age and gender:

Age of the arrestees    Male    Female
Under 12                   1         0
12 to 14                 104        55
15 to 17                 519       223
18 to 20                 886       306
21 to 25               1,388       439
26 to 30                 638       202
31 to 40                 449       153
41 to 50                 197        84
Above 50                 161        51
Total                  4,343     1,513

Drawing these arrest statistics as a reasonable histogram is a good exercise.
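Here is a rough sketch of what "reasonable" could mean, in case it helps someone get started. The age bands have different widths (3, 5 and 10 years), so plotting raw counts makes the wider bands look bigger than they are; dividing each count by the band width gives arrests per year of age, which is what a histogram should show. The counts are copied from the table above. The bounds for the two open-ended bands (0 for "Under 12" and 65 for "Above 50") are my own assumptions, as are the names `BANDS`, `density_per_year` and `text_histogram`, so treat those two bars with caution.

```python
# Arrest counts by age band, copied from the government table:
# (label, lowest age, highest age, male, female), ages inclusive.
# ASSUMPTION: the bounds 0 (for "Under 12") and 65 (for "Above 50")
# are placeholders chosen for illustration; the source gives no bounds.
BANDS = [
    ("Under 12", 0, 11, 1, 0),
    ("12 to 14", 12, 14, 104, 55),
    ("15 to 17", 15, 17, 519, 223),
    ("18 to 20", 18, 20, 886, 306),
    ("21 to 25", 21, 25, 1388, 439),
    ("26 to 30", 26, 30, 638, 202),
    ("31 to 40", 31, 40, 449, 153),
    ("41 to 50", 41, 50, 197, 84),
    ("Above 50", 51, 65, 161, 51),
]


def density_per_year(bands):
    """Return (label, width in years, total arrests, arrests per year of age)."""
    rows = []
    for label, low, high, male, female in bands:
        width = high - low + 1          # number of single-year ages in the band
        total = male + female
        rows.append((label, width, total, total / width))
    return rows


def text_histogram(rows, max_bar=50):
    """Render one text bar per band, scaled so the densest band is max_bar wide."""
    peak = max(density for _, _, _, density in rows)
    lines = []
    for label, _width, _total, density in rows:
        bar = "#" * round(density / peak * max_bar)
        lines.append(f"{label:<9} {density:7.1f}/yr {bar}")
    return lines


ROWS = density_per_year(BANDS)
print("\n".join(text_histogram(ROWS)))
```

With this normalisation the densest band is 18 to 20 (about 397 arrests per year of age), even though 21 to 25 has the largest raw count, simply because it spans five years instead of three. Swapping the text bars for a proper bar chart with widths proportional to the band widths would be the natural next step.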

Btw, there seem to be more than a dozen articles released by the gov every day. Would it be possible to do some NLP on them?

What sort of articles?

I’d like to do some author clustering!

It's actually a list of press releases published every day. For example, these are all the press releases from 2019-12-10:

Oh yes, I’ve wanted to collect and classify these press releases for years! Let’s work on that together?

Sure, and anyone who sees this on the open internet is welcome to join as well.