Flash devices are described in part by their write amplification factor (WAF). When the OS writes a page once, the device might write it more than once, and this multiple is the write amplification factor. The WAF isn't always disclosed in marketing material, and even when it is, the value you get in production is workload dependent.
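To make the definition concrete, here is a minimal sketch of the WAF arithmetic. The counter values are made up for illustration; real devices expose equivalents through vendor-specific SMART attributes, not through anything with these names.

```python
# Minimal sketch of computing the write amplification factor (WAF).
# The byte counts below are hypothetical; they stand in for counters
# a real device would report.

def write_amplification(host_bytes_written, nand_bytes_written):
    """WAF = bytes physically written to flash / bytes written by the host."""
    return nand_bytes_written / host_bytes_written

# Example: the host wrote 100 GB, but garbage collection and wear
# leveling caused the device to write 250 GB internally.
host_gb = 100
nand_gb = 250
print(write_amplification(host_gb, nand_gb))  # 2.5
```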
Variations of the log-structured merge tree have been used by many new storage servers including HBase, Bigtable, Cassandra and leveldb. These servers append changes (delete, insert, update) to the end of a file rather than updating in place. To find one row by key an LSM server might have to read from multiple files, or from multiple locations within one file. I have been calling this the read penalty because a workload is very likely to do more disk reads when using an LSM than when using an update-in-place engine, but I think that read amplification factor (RAF) might be a better phrase. If a workload does 100 disk reads on InnoDB and 120 disk reads on an LSM then the RAF is 1.2. The RAF matters even for many write-optimized servers because an update-intensive workload requires many random disk reads, although an LSM can avoid some of those reads when the update is a replace or when the operation is commutative and doesn't require an immediate result. For example, an update that increments a counter can log +1 when the request doesn't need to return the old value.
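The sketch below shows where the extra reads come from: a point lookup probes sorted files from newest to oldest until it finds the key, and each probe can cost a disk read. This is deliberately simplified, assuming no memtable, no block cache and no filters, so the file list and counting are illustrative only.

```python
# Simplified LSM point lookup: probe files from newest to oldest until
# the key is found, counting one disk read per file probed. Real
# engines also check an in-memory memtable and a block cache first.

def lsm_point_lookup(key, files):
    """files is a list of dicts ordered newest first.
    Returns (value, number_of_reads)."""
    reads = 0
    for f in files:
        reads += 1
        if key in f:
            return f[key], reads
    return None, reads

files = [{"b": 2}, {"a": 1}, {"a": 0, "c": 3}]  # newest first
value, reads = lsm_point_lookup("a", files)
print(value, reads)  # 1 2 -> two reads for one row

# Against an update-in-place engine that did one disk read for the
# same lookup, this single operation has RAF = 2 / 1 = 2.0.
```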
Many LSM implementations use a bloom filter to reduce the RAF. The bloom filter lets the server skip reads from files that are known not to contain data for a given key. A bloom filter only works for point lookups; it cannot be used for a range scan, and the RAF for a workload will be at its worst when you map a relational schema directly to HBase (1 row in InnoDB --> 1 row in HBase). Fortunately, many LSM implementations support schemas in which more data is consolidated into one row, and in many cases something that requires a range scan in a SQL RDBMS can use a point lookup in HBase.
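A toy bloom filter sketch follows to show how a per-file filter lets a point lookup skip files. The hashing scheme (salted SHA-256, a single integer as the bitmap) is a teaching simplification, not what HBase or leveldb actually use.

```python
# Toy bloom filter: a per-file bitmap that answers "definitely not
# present" or "maybe present". A false positive causes one unnecessary
# file read; false negatives never happen, so no data is ever missed.

import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitmap stored as one big int for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-1")
print(bf.might_contain("row-1"))  # True -> must read the file
print(bf.might_contain("row-9"))  # almost certainly False -> skip it
```

The range-scan limitation falls out of the structure: the filter can only be asked about specific keys, and a scan doesn't know in advance which keys exist within the range.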
There are new products (TokuDB, Acunu, maybe RethinkDB) that claim to be better than an LSM, in part because their RAF is much closer to one for both point lookups and range scans. By closer to one I mean that there is (almost) no read penalty. This should be easy to verify with a production workload.
While there are very interesting performance models described in the literature, I use a very simple one when considering the read amplification factor. In my model all levels of a tree-structured index are in RAM except for the lowest level. In this model a point lookup with an update-in-place DBMS does at most one disk read, from an index leaf page, excluding access to external/overflow pages for LOB columns and other special cases. For anything that claims to be better than an update-in-place DBMS I want to know how many index leaf pages are read in the worst and average cases.
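The arithmetic behind the model is easy to check. The sketch below estimates how much RAM the non-leaf levels of a B-tree need; the page size, fanout and rows-per-leaf values are assumptions chosen for illustration, not measurements from any particular engine.

```python
# Back-of-the-envelope check of the "all non-leaf levels in RAM" model
# for a B-tree index. All sizing parameters are assumed values.

def btree_model(num_rows, leaf_rows_per_page=100, fanout=100,
                page_bytes=16 * 1024):
    leaf_pages = -(-num_rows // leaf_rows_per_page)  # ceiling division
    internal_pages, level = 0, leaf_pages
    while level > 1:
        level = -(-level // fanout)  # pages at the next level up
        internal_pages += level
    # With every internal page cached, a point lookup does at most one
    # disk read: the leaf page itself.
    return {
        "leaf_pages": leaf_pages,
        "internal_pages": internal_pages,
        "ram_for_internal_gb": internal_pages * page_bytes / 2**30,
        "disk_reads_per_point_lookup": 1,
    }

print(btree_model(num_rows=1_000_000_000))
# ~10M leaf pages, ~101K internal pages, ~1.5 GB of RAM to cache the
# internal levels -> one leaf read per point lookup under the model.
```

Under these assumptions, caching the non-leaf levels of a billion-row index costs roughly 1.5 GB of RAM, which is why the model is a reasonable baseline for comparing engines.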