Just read a few at a time…
Is 600 MB a lot for pandas? Of course, CSV isn’t really optimal but I would’ve sworn pandas happily works with gigabytes of data.
It really depends on the machine that is running the code. Pandas will always have the entire thing loaded in memory, and while 600Mb is not a concern for our modern laptops running a single analysis at a time, it can get really messy if the person is not thinking about hardware limitations
What do you mean not optimal? This is quite literally the most popular format for any serious data handling and exchange. One byte per separator and newline is all you need. It is not compressed so allows you to stream as well. If you don’t need tree structure it is massively better than others
I guess it’s more of a critique of how bad CSV is for storing large data than pandas being inefficient
pola.rs enters the chat…
You havent seen anything until you need to put a 4.2gb gzipped csv into a pandas dataframe, which works without any issues I should note.
I raise you thousands of gzipped files (total > 20GB) combined into one dataframe. Frankly, my work laptop did not like it all that much. But most basic operations still worked fine tho