Have you ever wondered how many usernames are still available on Twitter? Or how hard it would be to find a unique username on a social network that was created a decade ago? I have!
In this article I will try to demonstrate why these questions are more complex than they appear at first glance.
Before we start, let’s have a look at the Twitter name policy.
The first point to highlight here is that, unlike other social networks, Twitter does not free up blocked accounts (unless you are a brand whose name has been taken). This means that all the accounts that have been blocked since 2006 are still locked!
While on the subject of Twitter policy, I also looked for official stats on how many users have registered since the tool was first released. Unfortunately, I was not able to find anything on the topic. Twitter, like many other tech companies, keeps these numbers secret.
So, without any official source of data, how are we going to tackle this problem?
Let’s build our own tools!
My first idea was to design a bot that would take advantage of the built-in username checker. After all, who knows better than Twitter which names are still valid?
Without going into the technical details, I tried to hijack the API… And it was not a viable solution. Long story short: Twitter kicked me out and I was not allowed to access my account anymore (oops).
Well, fair enough. Let’s try something else. Since you don’t need to be logged in to view a Twitter profile, what about creating a web scraper that will visit Twitter profiles and collect all the data for me?
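To give an idea of what such a scraper loop can look like, here is a minimal sketch in Python (the actual bot was written with CasperJS, so this is not the code from the repo). The `fetch_status` callable is a hypothetical placeholder for whatever performs the HTTP request, and treating a 404 as "no public profile" is an assumption: a missing page does not guarantee that the name can actually be registered.

```python
import time
from typing import Callable, Dict, Iterable


def profile_url(username: str) -> str:
    """Build the public profile address for a username."""
    return f"https://twitter.com/{username}"


def crawl(
    usernames: Iterable[str],
    fetch_status: Callable[[str], int],
    delay_seconds: float = 0.0,
) -> Dict[str, str]:
    """Visit each profile page and record whether a public profile exists.

    fetch_status is a hypothetical callable that takes a URL and returns
    an HTTP status code; plug in your own HTTP client here.
    """
    results: Dict[str, str] = {}
    for name in usernames:
        status = fetch_status(profile_url(name))
        # Assumption: 404 means no public profile was found under that name.
        results[name] = "no_profile" if status == 404 else "has_profile"
        # Pause between requests so the crawl stays polite.
        time.sleep(delay_seconds)
    return results
```

Passing the fetcher in as a parameter keeps the loop easy to try out with canned responses before pointing it at real pages.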
I started by selecting a sample of usernames I found relevant: 2035 first names that were given in France between 2004 and 2012.
How many do you think are still available on Twitter?
To my great surprise, not a single one was still available! I then tried with @_username and @username_ (as advised in the Twitter name policy).
Once again, the percentage of names still available today is under 2%!
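Building the three candidate handles for each first name and computing the share that is still free can be sketched as follows. This is an illustrative Python sketch, not the code from the repo. The function names are made up, and the validity pattern (letters, digits and underscores, 15 characters at most) reflects my understanding of Twitter's username rules rather than anything checked in this article.

```python
import re
from typing import Dict, List

# Assumed rule: letters, digits and underscores only, 15 characters at most.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{1,15}$")


def variants(first_name: str) -> List[str]:
    """Return the plain handle plus the two underscore variants.

    Candidates that break the assumed rule are dropped, which also
    discards names containing accented letters.
    """
    base = first_name.lower()
    candidates = [base, "_" + base, base + "_"]
    return [c for c in candidates if USERNAME_RE.match(c)]


def availability_rate(taken_by_name: Dict[str, bool]) -> float:
    """Percentage of names whose 'taken' flag is False."""
    if not taken_by_name:
        return 0.0
    free = sum(1 for taken in taken_by_name.values() if not taken)
    return 100.0 * free / len(taken_by_name)
```

Note that accented first names would need to be transliterated first (a step not shown here), otherwise they are simply filtered out.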
In search of an explanation
These surprisingly low results aroused my curiosity. Since the “join date” is displayed on the public profile, how about trying to see when these usernames were taken?
I reconfigured my bot to get this new data and display the result in a chart. Here is the D3 representation I got:
(Dataset: 2035 French names)
Two statements can be made from this chart:
- It seems that there was a huge peak of Twitter registrations at the beginning of 2007.
- Interestingly, we can see a recurring pattern every March as new users join!
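A chart like this one boils down to counting join dates per month. Here is a minimal Python sketch of that aggregation step. It assumes the scraped text looks like "Joined March 2009" (an assumption about the profile page, in English), and the function names are purely illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def month_key(join_text: str) -> str:
    """Turn a string such as 'Joined March 2009' into '2009-03'."""
    cleaned = join_text.replace("Joined", "").strip()
    # %B expects a full English month name under the default locale.
    parsed = datetime.strptime(cleaned, "%B %Y")
    return parsed.strftime("%Y-%m")


def registrations_per_month(join_texts: Iterable[str]) -> List[Tuple[str, int]]:
    """Count accounts per month, sorted chronologically."""
    counts = Counter(month_key(text) for text in join_texts)
    return sorted(counts.items())
```

The sorted list of (month, count) pairs is the kind of series a D3 line or bar chart can consume directly.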
Concerning the first statement, I tried to correlate this data with another source of information. According to a research paper, The Tweets They are a-Changin’: Evolution of Twitter Users and Behavior, the peak of user acquisition for Twitter was reached in 2009. So, is my original sample (2035 French names) biased? Maybe.
Another hypothesis is that the peak on my chart shows the moment when popular interest in Twitter emerged. Popular interest and massive user acquisition are two different concepts, which might explain the gap.
Concerning the second statement, I have to say that it’s completely unexpected. Does that mean that Twitter is actually releasing some usernames every March?
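One simple way to put a number on the March pattern is to pool all years together and look at the share of registrations falling in each calendar month. If join dates were spread evenly, each month would get roughly 1/12 (about 8.3%). The Python sketch below assumes the join dates are already available as "YYYY-MM" strings, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def calendar_month_share(month_keys: Iterable[str]) -> Dict[int, float]:
    """Share of registrations per calendar month (1-12), all years pooled.

    Expects keys formatted as 'YYYY-MM'. Returns an empty dict when
    there is no data.
    """
    counts = Counter(int(key.split("-")[1]) for key in month_keys)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {month: counts.get(month, 0) / total for month in range(1, 13)}
```

A March share well above 1/12 across several independent samples would suggest the pattern is more than noise, although it would not explain where it comes from.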
I ran some other tests to see if the same behavior was observable with some other data samples.
See the Pen Twitter growth (FR names with underscore ex: @_adrien) by Adrien Rahier (@FracArt) on CodePen.
See the Pen Twitter growth (FR names with underscore ex: @adrien_) by Adrien Rahier (@FracArt) on CodePen.
(Datasets: 1- 2035 French names with underscore at the beginning, 2- 2035 French names with underscore at the end, 3- 1831 animal words)
Concerning the first statement, these three charts still show two peaks: one in 2007 and one in 2009 (yes!). However, the weird March pattern is less pronounced here.
Finally, as a friend suggested, maybe some people are just doing the equivalent of domain name squatting (reserving a name without doing anything with it, in order to resell it later).
To clarify this, I made a quick Excel bar chart showing the distribution of tweets per user.
(Dataset: 2035 French names)
Once again, it’s hard to draw general conclusions from just one batch of users. However, it looks like most of the accounts have tweeted at least twice.
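The same bar chart can be rebuilt outside Excel by bucketing the tweet counts. The Python sketch below is only an illustration: the bucket boundaries are arbitrary choices of mine, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Bucket labels in display order; the boundaries are an arbitrary choice.
BUCKETS = ["0", "1", "2-10", "11-100", "101+"]


def bucket(tweet_count: int) -> str:
    """Map a tweet count to its bucket label."""
    if tweet_count == 0:
        return "0"
    if tweet_count == 1:
        return "1"
    if tweet_count <= 10:
        return "2-10"
    if tweet_count <= 100:
        return "11-100"
    return "101+"


def tweet_distribution(tweet_counts: Iterable[int]) -> List[Tuple[str, int]]:
    """Number of accounts per bucket, in display order."""
    counts = Counter(bucket(n) for n in tweet_counts)
    return [(label, counts.get(label, 0)) for label in BUCKETS]


def share_with_at_least(tweet_counts: List[int], minimum: int = 2) -> float:
    """Fraction of accounts with at least `minimum` tweets."""
    if not tweet_counts:
        return 0.0
    return sum(1 for n in tweet_counts if n >= minimum) / len(tweet_counts)
```

A large "0" or "1" bucket would support the squatting hypothesis, while a high share of accounts with two or more tweets points the other way.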
To conclude this article and answer the original questions: nowadays, it is very hard to find a username that hasn’t been taken yet on Twitter.
For example, among the 1831 French animal words I tested, only the common terms @fourmilier and @vampire are still available!
Beyond this simple statement, this side project also showed me that it’s not very complicated to create and train a bot that will crawl some webpages for you.
Oddly enough, this market sector, the automation of data-gathering tasks, has still not been claimed by any companies or startups. As far as I know, the only example I can cite is the “Le camping” startup Phantombuster (read more about them here).
So there is probably a lot to do in this business segment. Indeed, from a business intelligence perspective, this kind of information can be very profitable (i.e. your company has access to information that your competitors won’t have).
Finally, if you have other ideas about how to interpret these charts or extend this side project, feel free to comment on this article or ping me on Twitter: @AdrienRahier.
- All the code used in this article is available on this GitHub repo.
- You can see the four charts on my CodePen.
- CasperJS was used to scrape the data.
- D3.js and Excel were used to display the data.