What’s the coolest thing you’ve programmed with Python?
A Python library for data science that works like a pandas dataframe but can handle arbitrary data sizes. It is available as open-source software (https://github.com/omegaml/omegaml).
Problem: In-memory data analysis quickly becomes complex and has significant drawbacks when data > RAM
Pandas is a library for statistical data processing. It only works “in memory”, i.e. the data must be fully loaded into RAM. Advantage: very fast. Disadvantage: it only works if the data is no larger than the available RAM. Beyond that point, things quickly become complicated and require complex infrastructure.
Example – Calculate sales per product:
• df is a dataframe of all sales.
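As a minimal sketch of what such a calculation typically looks like in plain pandas: the column names "product" and "amount" and the sample rows below are assumptions for illustration, not taken from the original example.

```python
import pandas as pd

# Hypothetical sales data; the column names "product" and "amount"
# and the values are assumptions for illustration only.
df = pd.DataFrame({
    "product": ["apple", "pear", "apple", "plum", "pear"],
    "amount": [10.0, 4.0, 5.0, 7.5, 6.0],
})

# Group the rows by product and sum the sale amounts within each group.
sales_per_product = df.groupby("product")["amount"].sum()
print(sales_per_product)
```

This runs entirely in memory, so it only works while the whole df fits into RAM.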
Solution “hybrid in-memory” – combination of pandas with horizontally scalable database (MongoDB).
With my library omega|ml[1] you can execute exactly the same statement; the only difference is that it is translated into MongoDB syntax and executed directly in the MongoDB cluster.
Data size is therefore no longer a problem, and the whole thing is still very fast. Sometimes omega|ml is even faster because the data doesn’t have to be loaded into memory first. As a side effect, working as a team on the same data also becomes easier, because you no longer have to laboriously pass CSV files around.
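To give a rough idea of what "translated into MongoDB syntax" can mean, here is a sketch of a MongoDB aggregation pipeline that computes a per-group sum. The helper groupby_sum_pipeline and the field names "product" and "amount" are hypothetical and only illustrate the idea; this is not omega|ml's actual translation code.

```python
def groupby_sum_pipeline(group_col, value_col):
    """Build a MongoDB aggregation pipeline that sums value_col per group_col.

    Hypothetical helper for illustration; not part of omega|ml.
    """
    return [
        # $group collects documents by the grouping field and sums the values.
        {"$group": {"_id": f"${group_col}", value_col: {"$sum": f"${value_col}"}}},
        # Sort by group key, matching the sorted keys of a pandas groupby.
        {"$sort": {"_id": 1}},
    ]


# Field names are assumptions, mirroring a sales-per-product calculation.
pipeline = groupby_sum_pipeline("product", "amount")
print(pipeline)
```

With a driver such as pymongo, a list like this could be passed to a collection's aggregate() method, so the grouping and summing happen inside the database and only the small result travels back to the client.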
It goes on – omega|ml is also the fastest way to turn a machine learning model into a REST API (takes less than 1 second)
Data retrieval is only a small but essential part of the library, which also makes such dataframes as well as machine learning models very easy to access via a REST API.
I described how it works in an article in Towards Data Science[2].
Open source & available free of charge
omega|ml is available free of charge as open source[3]. The part described above is called MDataFrame (the M stands for either “Massive” or “MongoDB”, which I use as the database).
Footnotes
[1] omega|ml – enable machine learning in production