Data leak – over a billion people affected (PDL / OXY)

On October 16th, 2019, two cybersecurity experts – Bob Diachenko and Vinny Troia – discovered an unsecured Elasticsearch environment. Sadly, this is not unusual. Open-source Elasticsearch does not have security mechanisms of its own, and allowing access to it from the Internet is always a bad idea.

It turns out that this Elasticsearch instance had a huge amount of personal data indexed: 4 terabytes, to be precise. The company that owns the Elasticsearch database is unknown, but the collected data appears to be, or to have been, owned by People Data Labs (PDL) and OxyData.io.

Most of the data was unusually valuable because it was enriched: the data stored in those indices had previously been correlated from multiple smaller pieces to create one rich document. That enriched data is an information product sold by companies like PDL and OxyData. The documents contained e-mail addresses, phone numbers, personal data, and profile data from LinkedIn and Facebook. To put that in numbers, the data included:

  • PDL
    • 1.2 billion unique records
    • 650 million e-mail addresses
  • OxyData
    • 380 million unique records, mainly from LinkedIn
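
To make the idea of enrichment more concrete, here is a minimal Python sketch of correlating small fragments into one rich document. It is only an illustration: the `enrich` function, the choice of the e-mail address as the correlation key, and the sample records are all assumptions made up for this example, and nothing is known about how PDL or OxyData actually build their data sets.

```python
from collections import defaultdict


def enrich(fragments, key="email"):
    """Merge partial records that share the same key into one document each."""
    merged = defaultdict(dict)
    for fragment in fragments:
        identifier = fragment.get(key)
        if identifier is None:
            continue  # nothing to correlate this fragment on
        # Compare keys case-insensitively so "Jane@..." and "jane@..." match.
        doc = merged[identifier.lower()]
        for field, value in fragment.items():
            # Keep the first value seen for a field; later sources only fill gaps.
            doc.setdefault(field, value)
    return dict(merged)


# Entirely made-up fragments, as if collected from three separate sources.
fragments = [
    {"email": "jane@example.com", "name": "Jane Doe"},
    {"email": "Jane@example.com", "phone": "+1-555-0100"},
    {"email": "jane@example.com", "linkedin": "linkedin.com/in/janedoe-example"},
]

profiles = enrich(fragments)
print(profiles["jane@example.com"])
```

Each fragment on its own says little, but the merged document links a name, a phone number and a social profile to a single e-mail address, which is what makes enriched data so valuable.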

The question is: how do we know whether the data is genuine and up to date? Luckily, PDL offers 1,000 free queries per month against its database. Such queries were sent, and the data returned by PDL matched the data in the Elasticsearch indices 100%. The data was the same.

Both companies, PDL and OxyData, maintain that there was no hacking attack and that the data came from customers who had bought it. It’s hard to call it hacking or a breach when all you need to do is type http://35.199.58.125:9200 into your browser.

Of course, that address and port are unavailable right now 🙂
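
If you run Elasticsearch yourself, you can test for the same kind of exposure on your own cluster. The sketch below uses only the Python standard library; the function names `classify_status` and `check_cluster` are hypothetical, and it assumes that an unsecured node answers an anonymous request to its root URL with HTTP 200 while a secured one answers with 401 or 403. Treat it as a rough first check, not a security audit, and only point it at systems you operate.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Interpret the HTTP status of a request sent without credentials."""
    if status == 200:
        return "open"  # the node answered without asking who we are
    if status in (401, 403):
        return "protected"  # authentication or authorization is required
    return "unknown"


def check_cluster(base_url, timeout=5):
    """Send one anonymous GET to a cluster you operate and classify the reply."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx replies; the status code is still available.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"  # refused, timed out, or filtered by a firewall
```

For example, `check_cluster("http://localhost:9200")` (an assumed address for a local test node) should return "open" for a node with no security configured. If the same call returns "open" when aimed at your public address, anyone on the Internet can read your indices.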

That is why you should never use unsecured Elasticsearch for production data processing. It is important to point out that Elasticsearch itself is not to blame for this breach; the cause was a lack of security mechanisms, such as those offered by Energy Logserver.