What is the overhead for innodb_buffer_pool_size in InnoDB? By overhead I mean the amount of memory used in Innodb that is determined by the size of the buffer pool. I ignore memory allocations that are determined by other my.cnf variables.
The overhead after startup is ~8% of innodb_buffer_pool_size or 1462 bytes per 16 KB page on a 64-bit Linux 2.6 server. The overhead at startup is ~21% or 3537 bytes per page because of a temporary large allocation used for crash recovery. The overhead for early 5.0 versions (<= 5.0.30 I think) is described by Percona. Domas has also written about this. And Peter Z got this right in his post on basic optimization, but I wanted to provide a few more details.
This lists the large allocations that depend on the size of the InnoDB buffer pool. They were measured when innodb_buffer_pool_size=12GB. The overhead excluding crash recovery code that runs at startup is 1462 bytes per 16kb page.
- 12 GB - buf_pool->frame_mem - memory to cache disk pages
- 708 MB - (944 bytes / 16kb page) - buf_pool->blocks - one instance of buf_block_t for each cached disk page. The size of the struct is 560 bytes and 3 os_event_struct (120 bytes each) instances are allocated separately for each buf_block_t. Assuming the 120 byte allocation is rounded up to 128 bytes, then the overhead for this is (560 + 3*128) == 944 bytes.
- 6 MB - (8 bytes / 16kb page) - buf_pool->blocks_of_frames
- 12 MB - (16 bytes / 16kb page) - buf_pool->page_hash
- 195 MB- (259 bytes / 16kb page) - btr_search_sys->hash_index
- 30 MB - (40 bytes / 16kb page) - lock_sys->rec_hash
- 146 MB - (195 bytes / 16kb page) - dict_init
- 1556 MB - (2075 bytes / 16kb page) - recv_sys_init - this is a temporary allocation used at startup and then deallocated.
This limit assumes that no memory is used by the rest of MySQL. You should use less than the limit to save space for the process image, thread stacks, sort areas, in-memory tables used for group by and order by processing, the MyISAM key cache, the OS buffer cache for the binlog and the OS buffer cache for MyISAM data blocks. Don't forget that InnoDB will do a large (0.12 * innodb_buffer_pool_size) temporary allocation at startup to run crash recovery.
The per buffer pool page overhead is high. Can it be lower? There is one buf_block_t instance per page in the buffer pool and:
- each buf_block_t has an InnoDB mutex_t and an InnoDB rw_lock_t.
- each rw_lock_t has a mutex_t.
- each rw_lock_t and mutex_t have an os_event_t (condition variable).
- each mutex_t has a pthread_mutex_t.
The overhead can be reduced to 2% by using 64kb pages instead of 16kb pages. This might be a good thing to based on the 5 minute rule. You have to change UNIV_PAGE_SIZE and UNIV_PAGE_SIZE_SHIFT, build MySQL and then run a lot of tests to get this. To get mysql-test to pass you need to increase the default values for innodb_log_file_size and innodb_buffer_pool_size -- I multiplied them by 4, and you need to fix an assertion in srv0start.
The size of buf_block_t would be smaller if the fields used to wake and sleep threads were moved from buf_block_t into the thread control block and other changes were made to the synchronization code that I will not describe here. InnoDB doesn't really have a thread control block. For threads handling user connections, trx_struct can be used. This save about 1/3 of the overhead. buf_block_t can be at least 56 bytes smaller by using integer rather than long integer for some fields and removing at least one pointer from non-debug builds.
InnoDB synchronization could be simplified to use one pthread_cond_t per mutex_struct and one per rw_lock_struct in place of os_event_t. This would save ~200 bytes per page.
rw_lock_struct and mutex_struct use long integers for some fields for which integers are sufficient. This would save 16 bytes for mutex_lock_struct and 32 bytes for rw_lock_struct. It would save 64 bytes per buf_block_t
The InnoDB adaptive search index is ~20% of the overhead or 1.6% of innodb_buffer_pool_size.
The InnoDB data dictionary is ~10% of the overhead or 0.8% of innodb_buffer_pool_size. I don't think the size of the InnoDB data dictionary needs to grow linearly with the size of the InnoDB buffer cache.