Data Science Infrastructure

Our team has been participating in the RoboCup competition since 1998. Since 2009 we have collected multiple types of data and stored them in an unstructured way. Although abundant, the data could not be manipulated easily and was accessible only to team members. This project will unlock the data's potential by storing it in a structured database, on top of which a range of tools and operations can be implemented.

What has been done

The robots collect data during each game and write it to a binary file. Additional data, such as configuration files and the git commit of the source code that was running on the robot, is also collected. This raw data is already published at https://logs.naoth.de/

The data is organized by event and game. However, this organization is not applied consistently, and the structure is not documented.

What needs to be done?

  • put the data into a consistent file structure
  • document the file structure
  • set up a database
  • write code that parses the logs and configuration files and puts the data into the database
  • make the database accessible via the browser
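
The first four steps above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual design: the path layout <event>/<game>/<robot>/<file>, the table names, and the helper names parse_log_path, create_schema and insert_log are all hypothetical. Python's built-in sqlite3 module stands in for the real database so that the sketch runs without a server. It indexes log files by path only and does not parse the binary log contents.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout (an assumption, not the documented structure):
#   <event>/<game>/<robot>/<file>
LOG_PATH = re.compile(
    r"^(?P<event>[^/]+)/(?P<game>[^/]+)/(?P<robot>[^/]+)/(?P<filename>[^/]+)$"
)


@dataclass(frozen=True)
class LogEntry:
    event: str
    game: str
    robot: str
    filename: str


def parse_log_path(path: str) -> Optional[LogEntry]:
    """Return a LogEntry if the path follows the assumed layout, else None."""
    match = LOG_PATH.match(path)
    if match is None:
        return None
    return LogEntry(**match.groupdict())


def create_schema(conn: sqlite3.Connection) -> None:
    """Create illustrative tables for events, games and log files."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS events (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS games (
            id       INTEGER PRIMARY KEY,
            event_id INTEGER NOT NULL REFERENCES events(id),
            name     TEXT NOT NULL,
            UNIQUE (event_id, name)
        );
        CREATE TABLE IF NOT EXISTS logs (
            id       INTEGER PRIMARY KEY,
            game_id  INTEGER NOT NULL REFERENCES games(id),
            robot    TEXT NOT NULL,
            filename TEXT NOT NULL,
            UNIQUE (game_id, robot, filename)
        );
        """
    )


def insert_log(conn: sqlite3.Connection, entry: LogEntry) -> None:
    """Insert one log entry; re-inserting the same entry is a no-op."""
    # INSERT OR IGNORE is SQLite syntax; other databases spell this differently.
    conn.execute("INSERT OR IGNORE INTO events (name) VALUES (?)", (entry.event,))
    event_id = conn.execute(
        "SELECT id FROM events WHERE name = ?", (entry.event,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO games (event_id, name) VALUES (?, ?)",
        (event_id, entry.game),
    )
    game_id = conn.execute(
        "SELECT id FROM games WHERE event_id = ? AND name = ?",
        (event_id, entry.game),
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO logs (game_id, robot, filename) VALUES (?, ?, ?)",
        (game_id, entry.robot, entry.filename),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder paths, not real entries from the published logs.
    for path in ["event_a/game_1/robot_3/game.log", "event_a/stray_file.txt"]:
        entry = parse_log_path(path)
        if entry is None:
            print("does not follow the assumed layout:", path)
        else:
            insert_log(conn, entry)
```

Rejecting paths that do not match the layout, rather than guessing at them, is one way to surface the inconsistencies in the existing data while it is being reorganized.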

How can this be achieved?

It is possible to run PostgreSQL on a single server and expose it to the outside world. However, some level of resilience is necessary. To handle server restarts without interruption of service, we can use Kubernetes, which is the de facto standard for container orchestration in industry today. Tools already exist that make setting up Kubernetes and PostgreSQL easy.