Here's how I got your credit card number, or: Why sequential identifiers are a bad idea
People use image sharing services all the time. Sometimes what they post is harmless, but more often than not they forget that they're exposing a lot of their data to anyone with enough skill to know where, and how, to look.
Imgur, a popular image hosting and sharing website
A quick rundown
The idea for this article came from a recent video on the popular tech YouTube channel Linus Tech Tips, in which the host, Linus Sebastian, shows how a popular image sharing service called LightShot allowed anyone to access any picture given an identifier made of 2 letters of the alphabet followed by any combination of 4 digits.
Note: this isn't quite correct. Any combination of digits and letters is valid, as long as it doesn't start with a 0 and is less than 15 characters long.
The homepage of LightShot
The cold hard truth
This fact in itself doesn't sound too bad: it's a public image sharing service after all.
What's actually frightening is how little research it took to find out the truth: LightShot's identification system is not random at all. It is sequential; it just happens to use a fairly uncommon numeral system.
While we humans are mostly used to numbers in base 10, and computers process information in base 2, many other bases exist. One of them is base 36, whose digits are the ten Arabic numerals 0 to 9 plus the 26 letters of the alphabet (hence the name: 10 + 26 = 36).
An example of base36 numbers
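To make this concrete, here is a minimal Python sketch of converting between base 36 and base 10. It assumes lowercase IDs using the digits 0-9 and a-z; the helper names `to_base36` and `from_base36` are hypothetical, and `"ab1234"` is a made-up ID in the two-letters-plus-four-digits shape, not a real screenshot. Python's `int()` already parses base 36 strings, but the standard library has no built-in for the opposite direction, hence the small encoder.

```python
import string

# Base 36 digits: 0-9 followed by a-z (36 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a lowercase base 36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)  # peel off the least significant digit
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(s: str) -> int:
    """Decode a base 36 string; int() supports base 36 natively."""
    return int(s, 36)


print(to_base36(35))  # → z
print(to_base36(36))  # → 10
example_id = "ab1234"  # hypothetical ID
value = from_base36(example_id)
print(value, to_base36(value))  # round-trips back to "ab1234"
```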
Scraping the data
Knowing this, writing a simple script that iterates over all base 36 IDs under 64 bits and downloads every image ever uploaded to the platform was trivial. Not even Cloudflare was of any help: libraries that bypass its so-called "Under Attack Mode" are readily available in the open source community, which made our little scraping endeavor almost invisible.
An interesting finding is that LightShot doesn't seem to like IDs starting with a zero, as those just redirect to the main page, but that was an easy one-line fix. Another quirk I found is that LightShot exploits other image hosting services, such as Imgur and ImageShack, to serve its content.
Correction: they did so in the past, but have since switched to serving images from image.prntscr.com.
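The reason enumeration is so easy is that, with sequential IDs, the next identifier is simply the current one plus one. A minimal sketch, assuming the same lowercase base 36 alphabet; `next_ids` is a hypothetical helper that only does string arithmetic and makes no network requests:

```python
import string
from itertools import islice
from typing import Iterator

# Base 36 digits: 0-9 followed by a-z.
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer as lowercase base 36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def next_ids(seed_id: str) -> Iterator[str]:
    """Yield the identifiers that come right after seed_id, in order."""
    n = int(seed_id, 36)  # decode one known ID to an integer
    while True:
        n += 1  # sequential scheme: the next ID is just n + 1
        yield to_base36(n)


# Hypothetical starting ID; "z" rolls over exactly like "9" does in base 10.
print(list(islice(next_ids("ab123y"), 3)))  # → ['ab123z', 'ab1240', 'ab1241']
```

Because the encoder works on integers, it never emits a leading zero (other than for 0 itself), so IDs produced this way never start with a zero.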
The script in action: it retrieves the "true" URL where each image is hosted (on the far right), as well as the original LightShot ID (labeled 'x'). The 'i' value is the LightShot ID converted to base 10.
What's even scarier, though, is the information I found. As you might expect, a good chunk of it was innocent screenshots: games, presentations, graphs, reports, homework, chats, and the like. What I also found, however, was an unholy amount of confidential information such as credit card details, Bitcoin wallet keys, banking credentials, nudes, gore, and much more: probably not the kind of stuff you'd want everyone to know about, right?
For good measure, here are a couple of what are probably very confidential screenshots I found in the wild (with the most sensitive information stripped, obviously):
Login credentials for a crypto trading platform, along with wallet addresses
A screenshot of a Telegram chat with banking credentials and full credit card details
All of this took only 57 lines of Python code, 3 open source libraries, and about 2 hours, and I managed to download more than 20,000 images with a simplicity I can only describe as jaw-dropping.
The data I gathered in so little time is enough to ruin many people's lives, and the worst part is that they did this to themselves.
The conclusion, then, is simple: think twice about what you post, and especially where you post it, because your boyfriend may not be the only one enjoying your new tanga.
This should also be a lesson for software developers and engineers all over the world: using sequential identifiers in public services is a terrible idea and a recipe for disaster.
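What should be used instead? A common approach (a general suggestion, not something LightShot is known to do) is to generate public identifiers from a cryptographically secure random source, with enough bits that guessing is hopeless. A minimal sketch using Python's standard `secrets` module; `new_public_id` and `hit_probability` are hypothetical helpers, and the 16-byte (128-bit) size is an assumption, though a common one:

```python
import secrets


def new_public_id(num_bytes: int = 16) -> str:
    """Return a random, URL-safe ID carrying 8 * num_bytes bits of entropy."""
    # token_urlsafe draws from the OS CSPRNG and base64url-encodes the bytes
    # (without "=" padding), so 16 bytes become a 22-character string.
    return secrets.token_urlsafe(num_bytes)


def hit_probability(existing_ids: int, id_bits: int) -> float:
    """Chance that a single uniformly random guess matches some existing ID."""
    return existing_ids / 2 ** id_bits


print(new_public_id())  # 22 characters, different on every call
# Even with a (hypothetical) billion stored images, a blind guess almost never lands.
print(hit_probability(10**9, 128))
```

In a purely sequential scheme, by contrast, almost every value below the latest ID is likely to be a real upload, so nearly every guess is a hit.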
The three open source libraries I used:
- NumPy, for its amazing
- Cloudscraper, which allowed me to bypass LightShot's Cloudflare protection with ease
- Requests, for its no-nonsense and dead-simple API, which made downloading images a breeze