FedCSIS 2020 Data Mining Competition

FedCSIS 2020 Challenge: Network Device Workload Prediction

FedCSIS 2020 Data Mining Challenge: Network Device Workload Prediction is the seventh data mining competition organized in association with the Conference on Computer Science and Information Systems (https://fedcsis.org/). This time, the task concerns the monitoring of large IT infrastructures and the estimation of their resource allocation. The challenge is sponsored by EMCA Software and the Polish Information Processing Society (PTI).


With this challenge, we want to answer the question of whether it is possible to reliably predict workload-related characteristics of monitored devices based on historical data gathered from those devices. This task is of paramount importance to IT and technical teams, who could use such a tool to manage the capacity of their infrastructure.

An additional difficulty within this challenge, and also the reason why it might be especially interesting for the data science community, arises from the fact that devices considered in the data are not uniform. In essence, logs cover readings from various types of hardware. Some of them are cross-dependent, as they are a part of the same IT system. Moreover, some devices have multiple interfaces for which the data is aggregated.

More details regarding the task and a description of the challenge data can be found in the Task description section (see: https://knowledgepit.ml/fedcsis20-challenge/).

As in previous years, a special session devoted to the competition will be held at the conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The papers will be indexed by the IEEE Digital Library and Web of Science. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by our sponsors:

First Prize: 1500 USD + one free FedCSIS'20 conference registration,
Second Prize: 1000 USD + one free FedCSIS'20 conference registration,
Third Prize: 500 USD + one free FedCSIS'20 conference registration.
The award ceremony will take place during the FedCSIS'20 conference. Please note that winners will be eligible for the money prizes only if their final score exceeds the baseline solution score by at least 10%.

For all additional details, see:

Energy Logserver on SEMAFOR 2020

Energy Logserver continues the tradition started two years ago: this year we will again appear at SEMAFOR as one of the patrons of the event. We invite everyone to attend our lecture, which will be given by EMCA CEO Artur Bicki. This year's topic is SIEM based on Elasticsearch.

The lecture will be devoted to building a SIEM platform from the components of projects around Elasticsearch. Using the Energy Logserver system, we will present functionality for analyzing and handling security events. With a live example, we will show how events from logs and network traffic can be analyzed and correlated, and how detected incidents can be managed.
Let's meet on March 19 at 12:10.


SEMAFOR is one of the largest cyber security conferences in Poland. For years it has been a place where the most modern and best solutions in the field of IT security are presented. Participants can not only gain extremely valuable knowledge straight from global experts, but also establish partner and business relationships.

The two-day event will be held in Warsaw on March 19-20. Start at 8 am!


Data leak – over a billion people affected (PDL / OXY)

On October 16th, 2019, two cybersecurity experts, Bob Diachenko and Vinny Troia, discovered an unsecured Elasticsearch environment. Sadly, this is not unusual. Open-source Elasticsearch does not have security mechanisms of its own, and allowing access to it from the Internet is always a bad idea.

It turned out that this Elasticsearch instance had a huge amount of personal data indexed: 4 terabytes, to be precise. The company that owns the Elasticsearch database is unknown, but it seems that the data is or was owned by People Data Labs (PDL) and OxyData.io.

Most of the data was unusually valuable because it had been enriched: the data stored in those indices had previously been correlated from multiple smaller pieces to create one rich document. Such enriched data is an information product sold by companies like PDL and OxyData. The documents contained e-mail addresses, phone numbers, personal data, and profile data from LinkedIn and Facebook. To put that in numbers, the data included:

  • PDL
    • 1.2 billion unique records
    • 650 million e-mail addresses
  • OxyData
    • 380 million unique records, mainly from LinkedIn

The question is: how do we know whether the data is genuine and up to date? Luckily, PDL offers 1k free queries per month to its database. Such queries were sent, and the data returned by PDL matched the data in the Elasticsearch indices 100%.

Both companies, PDL and OxyData, maintain that there was no hacking attack and that the data came from customers who had bought it. It is hard to call it hacking or a breach when all you need to do is enter an address and port in your browser.

Of course, the address and port are unavailable right now 🙂

That is why you should never use an unsecured Elasticsearch for production data processing. It is important to point out that Elasticsearch itself is not to blame for this breach; the cause was the lack of security mechanisms such as those offered by Energy Logserver.
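
To check whether a cluster you operate is exposed in the same way, you can send one request without credentials and look at the HTTP status. The Python sketch below is an illustration only: the function names and the localhost address are hypothetical, and it assumes that a 200 reply to an unauthenticated request means the cluster serves data without credentials, while 401 or 403 means some access control is in place. Run it only against systems you are responsible for.

```python
import urllib.error
import urllib.request


def classify_status(status_code: int) -> str:
    """Map the HTTP status of an unauthenticated request to a verdict."""
    if status_code == 200:
        return "open"        # reply served without any credentials
    if status_code in (401, 403):
        return "protected"   # credentials or permissions required
    return "unknown"         # anything else needs a manual look


def check_cluster(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to base_url and classify the reply."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx replies; the status code is still available.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical address; replace with a cluster you operate.
    print(check_cluster("http://localhost:9200"))
```

A result of "open" for an address reachable from the Internet is the situation described above. A result of "protected" only shows that this one request was rejected, not that the cluster is fully secured.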