• QuizzaciousOtter@lemm.ee · 4 months ago

    Is 600 MB a lot for pandas? Of course, CSV isn’t really optimal, but I would’ve sworn pandas happily works with gigabytes of data.

    • tequinhu@lemmy.world · 4 months ago

      It really depends on the machine that is running the code. Pandas will always have the entire thing loaded in memory, and while 600 MB is not a concern for our modern laptops running a single analysis at a time, it can get really messy if the person is not thinking about hardware limitations.
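
      If memory is the concern, a minimal sketch (toy in-memory data, made-up column name) of how `chunksize` lets pandas process a CSV without ever holding the whole file at once:

```python
import io
import pandas as pd

# Stand-in for a large CSV on disk: one "value" column, 10,000 rows.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10_000)))

total = 0
# chunksize makes read_csv return an iterator of small DataFrames,
# so peak memory is bounded by the chunk, not the file.
for chunk in pd.read_csv(csv_data, chunksize=1_000):
    total += chunk["value"].sum()

print(total)  # 49995000, same result as loading everything at once
```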

    • MoonHawk@lemmy.world · 4 months ago

      What do you mean not optimal? This is quite literally the most popular format for any serious data handling and exchange. One byte per separator and newline is all you need. It is not compressed, so it allows you to stream as well. If you don’t need a tree structure, it is massively better than the alternatives.
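
      The streamability point can be shown with a tiny sketch (toy data, stdlib only): plain CSV can be consumed row by row in constant memory, no parsing library needed.

```python
import csv
import io

# Stand-in for a file handle; a real file streams the same way.
buf = io.StringIO("id,name\n1,ada\n2,grace\n")

names = []
for row in csv.DictReader(buf):  # reads and parses one line at a time
    names.append(row["name"])

print(names)  # ['ada', 'grace']
```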

    • gigachad@sh.itjust.works · 4 months ago

      I guess it’s more of a critique of how bad CSV is for storing large data than of pandas being inefficient.

  • Kausta@lemm.ee · 4 months ago

    You haven’t seen anything until you need to put a 4.2 GB gzipped CSV into a pandas DataFrame, which works without any issues, I should note.
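
    A small sketch of that case (tiny generated file standing in for the 4.2 GB one): pandas infers the compression from the `.gz` suffix, or you can pass `compression="gzip"` explicitly, and decompresses on the fly.

```python
import gzip
import os
import tempfile
import pandas as pd

# Write a tiny gzipped CSV to a temp directory.
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n3,4\n")

# No special handling needed: compression is inferred from the suffix.
df = pd.read_csv(path)
print(df.shape)  # (2, 2)
```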

    • thisfro@slrpnk.net · 4 months ago

      I raise you thousands of gzipped files (total > 20 GB) combined into one DataFrame. Frankly, my work laptop did not like it all that much, but most basic operations still worked fine.
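
      A hedged sketch of that many-files pattern (three tiny generated shards instead of thousands): glob the gzipped CSVs, read them lazily, and let `pd.concat` materialise the combined frame once.

```python
import glob
import gzip
import os
import tempfile
import pandas as pd

# Fake a handful of gzipped CSV shards in a temp directory.
tmp = tempfile.mkdtemp()
for i in range(3):
    with gzip.open(os.path.join(tmp, f"part{i}.csv.gz"), "wt") as f:
        f.write(f"x\n{i}\n")

# A generator keeps only one decompressed shard alive at a time
# until concat builds the final DataFrame in one pass.
paths = sorted(glob.glob(os.path.join(tmp, "*.csv.gz")))
df = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
print(len(df))  # 3 rows, one per file
```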