The first step is to use the tcmalloc library. I use tcmalloc_minimal rather than tcmalloc and suggest you do the same. To use tcmalloc without rebuilding mysqld, you may be able to set LD_PRELOAD. I prefer to link against it. To do that, run configure and then build mysqld:
./configure --with-mysqld-ldflags=-L/path/to/google-perftools \
I confirm that tcmalloc was really linked by running 'nm mysqld | grep -i tcmalloc'. In some cases it was not linked for me, because libtool put -lc before -ltcmalloc_minimal on the link command line. That came from odd dependencies listed in the *.la files that libtool used in my environment. So if tcmalloc is missing, that may be the problem.
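The nm check only inspects the binary on disk, so it cannot see a library injected with LD_PRELOAD. A possible runtime complement is to ask which loaded object actually supplies malloc. Because it looks at the resolved definition, it should also flag a binary where libtcmalloc_minimal is present but glibc's malloc still wins symbol resolution. This is a minimal sketch, assuming glibc (for dlsym, RTLD_DEFAULT and dladdr) and a tcmalloc shared library whose file name contains "tcmalloc". The helper names are hypothetical. On glibc older than 2.34, link with -ldl.

```cpp
// Sketch: report which loaded object supplies malloc for this process.
// Assumptions: glibc (dlsym, RTLD_DEFAULT, dladdr) and a tcmalloc shared
// library whose file name contains "tcmalloc".
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_DEFAULT and dladdr are GNU extensions
#endif
#include <dlfcn.h>
#include <string>

// Path of the object that provides the global definition of `symbol`,
// or "" if the symbol is not visible or dladdr cannot place it.
std::string providing_object(const char* symbol) {
  void* addr = dlsym(RTLD_DEFAULT, symbol);
  if (addr == nullptr) return "";
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return "";
  return info.dli_fname;
}

// True if malloc resolves into a tcmalloc library, whether it was linked
// in as a shared library or injected with LD_PRELOAD.
bool malloc_is_tcmalloc() {
  return providing_object("malloc").find("tcmalloc") != std::string::npos;
}
```

At startup, mysqld could log providing_object("malloc") so the error log shows either a libtcmalloc_minimal path or glibc's libc.so.6. If tcmalloc is linked statically into mysqld, the reported path is mysqld itself, so nm remains the right check in that case.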
Google Perftools also provides hierarchical profiling. If you aren't using DTrace, and you haven't compiled your Linux kernel and all system libraries with the flags that OProfile needs for hierarchical profiles, then you need perftools. Unlike tcmalloc, support for CPU profiling is not transparent: you need to call an exported function from libprofiler when each thread starts. I have modified MySQL to do that for all threads started in mysqld.cc, which includes some background threads and all user threads. I chose not to do this for the background threads started by InnoDB. Code and more details (see README.patch) are in my 5.0 branch, revision 2702.
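The general idea of registering threads at start time can be sketched as follows. This is not the revision 2702 code. It assumes the exported function is ProfilerRegisterThread from gperftools' profiler.h (extern "C", no arguments), and it resolves that function with dlsym so the same mysqld runs with or without libprofiler. ThreadStart, profiled_thread_main and create_profiled_thread are hypothetical names. Build with -pthread, plus -ldl on glibc older than 2.34.

```cpp
// Sketch: register each new thread with the gperftools CPU profiler.
// Assumption: libprofiler exports extern "C" void ProfilerRegisterThread(void).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for RTLD_DEFAULT in glibc's <dlfcn.h>
#endif
#include <dlfcn.h>
#include <pthread.h>

using RegisterThreadFn = void (*)();

// Resolve ProfilerRegisterThread once; nullptr when libprofiler isn't loaded.
// (Function-local statics are initialized thread-safely since C++11.)
RegisterThreadFn profiler_register_thread_fn() {
  static const RegisterThreadFn fn = reinterpret_cast<RegisterThreadFn>(
      dlsym(RTLD_DEFAULT, "ProfilerRegisterThread"));
  return fn;
}

// Call at the top of each thread body that should appear in CPU profiles.
// Returns true if the thread was registered, false if profiling is absent.
bool register_thread_with_profiler() {
  RegisterThreadFn fn = profiler_register_thread_fn();
  if (fn == nullptr) return false;
  fn();
  return true;
}

// Hypothetical wrapper for thread creation: the real body and its argument.
// The struct must stay valid until the new thread has read it (when body is called).
struct ThreadStart {
  void* (*body)(void*);
  void* arg;
};

// pthread start routine: register with the profiler, then run the body.
extern "C" void* profiled_thread_main(void* p) {
  ThreadStart* start = static_cast<ThreadStart*>(p);
  register_thread_with_profiler();
  return start->body(start->arg);
}

// Used in place of pthread_create where threads should be profiled.
int create_profiled_thread(pthread_t* tid, ThreadStart* start) {
  return pthread_create(tid, nullptr, profiled_thread_main, start);
}
```

Looking the function up at runtime keeps libprofiler optional: when it isn't linked or preloaded, registration is a no-op. Threads created without the wrapper, such as the InnoDB background threads, are simply not registered.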
Finally, I provided access to a few features from perftools (see revision 2703 for the code):
- RELEASE MEMORY is a SQL command that returns cached memory from tcmalloc back to the OS.
- SHOW MEMORY STATUS is a SQL command that prints the data returned from MallocExtension::GetStats. This is a lot of data on the state of memory allocated by tcmalloc.
- tcmalloc_max_thread_cache_size is a my.cnf parameter that sets the total size of tcmalloc's thread caches, a memory budget shared by all threads.
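These three features map onto tcmalloc's extension interface. The sketch below is not the revision 2703 code. It assumes the C wrappers declared in gperftools' malloc_extension_c.h (MallocExtension_ReleaseFreeMemory, MallocExtension_GetStats, MallocExtension_SetNumericProperty), and it assumes tcmalloc_max_thread_cache_size corresponds to the tcmalloc.max_total_thread_cache_bytes property. The functions are resolved with dlsym so the code also runs without tcmalloc. The helper names are hypothetical, and wiring them into the SQL parser and my.cnf handling is not shown. On glibc older than 2.34, link with -ldl.

```cpp
// Sketch: the three features via tcmalloc's C extension API, resolved at runtime.
// Assumption: the tcmalloc build exports the malloc_extension_c.h wrappers.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for RTLD_DEFAULT in glibc's <dlfcn.h>
#endif
#include <dlfcn.h>
#include <cstddef>
#include <string>
#include <vector>

using GetStatsFn = void (*)(char*, int);
using ReleaseFn = void (*)();
using SetPropFn = int (*)(const char*, std::size_t);

// Look up a C function by name; nullptr if tcmalloc isn't loaded.
template <typename Fn>
Fn lookup(const char* name) {
  return reinterpret_cast<Fn>(dlsym(RTLD_DEFAULT, name));
}

// RELEASE MEMORY: hand tcmalloc's free pages back to the OS.
bool release_memory() {
  auto fn = lookup<ReleaseFn>("MallocExtension_ReleaseFreeMemory");
  if (fn == nullptr) return false;
  fn();
  return true;
}

// SHOW MEMORY STATUS: the text report from GetStats ("" without tcmalloc).
std::string memory_status() {
  auto fn = lookup<GetStatsFn>("MallocExtension_GetStats");
  if (fn == nullptr) return "";
  std::vector<char> buf(64 * 1024, '\0');
  fn(buf.data(), static_cast<int>(buf.size()));
  buf.back() = '\0';  // guarantee termination even if the report was cut off
  return std::string(buf.data());
}

// tcmalloc_max_thread_cache_size: cap on the combined size of all thread caches.
bool set_max_thread_cache_bytes(std::size_t bytes) {
  auto fn = lookup<SetPropFn>("MallocExtension_SetNumericProperty");
  if (fn == nullptr) return false;
  return fn("tcmalloc.max_total_thread_cache_bytes", bytes) != 0;
}
```

In this sketch, mysqld would call release_memory() for RELEASE MEMORY, send memory_status() to the client for SHOW MEMORY STATUS, and call set_max_thread_cache_bytes() once at startup with the my.cnf value.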