Machine learning is one area that cannot succeed without data. Traditionally, machine learning frameworks read it from CSV files or similar data sources. This brings an interesting set of challenges because in most cases the data is stored in databases, not simple raw files. It takes time and effort to move data from one format to another. Additionally, one needs to write some code (usually python) to prepare the data just like the ML framework expects it.
In January I wrote my first post about node.js and MariaDB . In February I continued with a second post about using jQuery and some GIS calculations. Now it is time for the third and this time the main focus is not so much on GIS functionality, but instead on the capabilities MariaDB has for handling piles of unstructured data. In this case I’ll be focusing on crunching a pile of XML files without importing the XML data itself.