PostgreSQL isn't immune to this problem. It has a long history with vacuum and each release makes things better. But I haven't run either PostgreSQL or MongoDB in production for a busy OLTP workload so I don't know whether these would be a serious problem. Context matters in this case. I can read many things on the internet that tell me MongoDB compaction and PostgrSQL vacuum are never an issue or are always an issue. The hard part is figuring out whether that is true for the workload that you care about.
I do have experience with InnoDB. InnoDB purge has some things in common with PostgreSQL vacuum. It cleans up old versions required by MVCC when the old versions are no longer visible. For InnoDB this means it physically removes rows that were delete marked. Performance can slow to a crawl when purge gets too far behind and there are a few things that could be done to control that, but purge has never been routine maintenance. It just runs in the background. The controls for it are:
- avoid long-running transactions when purge lag is too big. These block purge.
- set innodb_max_purge_lag_delay to delay row changes when lag is growing
- use more than one purge thread. This arrived in MySQL 5.5 and Percona 5.1 and the Percona implementation was really, really useful for me. Thanks Percona.
- use innodb_purge_batch_size. I have no experience with this.
Even though purge is online InnoDB can still require maintenance. On servers that I care about 2/3 of the data growth rate was caused by fragmentation versus 1/3 from new data. Defragmentation is useful to reclaim the space wasted by fragmentation, especially when using pure-flash storage. The frequency in this case is a few times per year.
How do you defragment servers in production? If you are fortunate to have several replicas per master and spare capacity, then you can defragment a replica and then replace the master. The other options include using online DDL that is new in MySQL 5.6 to defragment secondary indexes. To defragment a primary index you probably need an online schema change framework like pt-online-schema change, oak-online-alter-table or Facebook OSC.
I think there is a better way to defragment InnoDB that is incremental and less invasive. But this is only a theory today. One of my talented peers is trying to figure this out and we hope to have good news and code to share in a few months.