**LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.**

Authors: Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

# Features
  * Keys and values are arbitrary byte arrays.
  * Data is stored sorted by key.
  * Callers can provide a custom comparison function to override the sort order.
  * The basic operations are `Put(key,value)`, `Get(key)`, `Delete(key)`.
  * Multiple changes can be made in one atomic batch.
  * Users can create a transient snapshot to get a consistent view of data.
  * Forward and backward iteration is supported over the data.
  * Data is automatically compressed using the [Snappy compression library](http://code.google.com/p/snappy).
  * External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
  * [Detailed documentation](http://htmlpreview.github.io/?https://github.com/google/leveldb/blob/master/doc/index.html) about how to use the library is included with the source code.
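
The ordering features listed above (sorted storage, a custom comparison function, forward and backward iteration) can be illustrated without LevelDB itself. The following is a minimal sketch that uses `std::map` as a stand-in for an ordered key-value store. `ShorterFirst`, `KeysForward`, and `KeysBackward` are hypothetical names invented for this illustration and are not part of the LevelDB API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical comparator (not part of LevelDB): orders keys by length first,
// then bytewise, instead of plain bytewise order.
struct ShorterFirst {
  bool operator()(const std::string& a, const std::string& b) const {
    if (a.size() != b.size()) return a.size() < b.size();
    return a < b;
  }
};

// Collects the keys of an ordered map in forward iteration order.
template <typename Map>
std::vector<std::string> KeysForward(const Map& m) {
  std::vector<std::string> keys;
  for (const auto& kv : m) keys.push_back(kv.first);
  return keys;
}

// Collects the keys of an ordered map in backward iteration order.
template <typename Map>
std::vector<std::string> KeysBackward(const Map& m) {
  std::vector<std::string> keys;
  for (auto it = m.rbegin(); it != m.rend(); ++it) keys.push_back(it->first);
  return keys;
}
```

With the default comparator, the keys `"b"`, `"aa"`, `"c"` iterate as `aa`, `b`, `c`. With `ShorterFirst` as the third template argument of `std::map`, the same keys iterate as `b`, `c`, `aa`, and backward iteration yields the reverse.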

# Limitations
  * This is not a SQL database.  It does not have a relational data model, it does not support SQL queries, and it has no support for indexes.
  * Only a single process (possibly multi-threaded) can access a particular database at a time.
  * There is no client-server support built in to the library.  An application that needs such support will have to wrap its own server around the library.

# Performance

Here is a performance report (with explanations) from a run of the
included db_bench program.  The results are somewhat noisy, but should
be enough to get a ballpark performance estimate.

## Setup

We use a database with a million entries.  Each entry has a 16-byte
key and a 100-byte value.  Values used by the benchmark compress to
about half their original size.

    LevelDB:    version 1.1
    Date:       Sun May  1 12:11:26 2011
    CPU:        4 x Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
    CPUCache:   4096 KB
    Keys:       16 bytes each
    Values:     100 bytes each (50 bytes after compression)
    Entries:    1000000
    Raw Size:   110.6 MB (estimated)
    File Size:  62.9 MB (estimated)
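
The estimated sizes in the report can be reproduced from the per-entry sizes. The sketch below assumes that "MB" in the report means 2^20 bytes and that only values shrink under compression (keys are counted at full size). Both are inferences from the numbers rather than statements from the report, and `EstimateSizeMB` is a hypothetical helper, not part of db_bench.

```cpp
#include <cstdint>

// Bytes per MB as the report appears to use it (2^20). This is an inference
// from the reported figures, not something the report states.
constexpr double kBytesPerMB = 1024.0 * 1024.0;

// Estimated size in MB of `entries` records, given the per-entry key and
// value sizes in bytes.
double EstimateSizeMB(std::int64_t entries, int key_bytes, int value_bytes) {
  return static_cast<double>(entries) * (key_bytes + value_bytes) / kBytesPerMB;
}
```

With 1,000,000 entries, `EstimateSizeMB(1000000, 16, 100)` gives about 110.6 and `EstimateSizeMB(1000000, 16, 50)` gives about 62.9, matching the raw and file sizes above.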

## Write performance

The "fill" benchmarks create a brand new database, in either
sequential or random order.  The "fillsync" benchmark flushes data
from the operating system to the disk after every operation; the other
write operations leave the data sitting in the operating system buffer
cache for a while.  The "overwrite" benchmark does random writes that
update existing keys in the database.

    fillseq      :       1.765 micros/op;   62.7 MB/s
    fillsync     :     268.409 micros/op;    0.4 MB/s (10000 ops)
    fillrandom   :       2.460 micros/op;   45.0 MB/s
    overwrite    :       2.380 micros/op;   46.5 MB/s

Each "op" above corresponds to a write of a single key/value pair.
That is, the random write benchmark runs at approximately 400,000 writes per second.
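
The per-operation latencies above convert to the quoted rates by simple arithmetic. This is a minimal sketch, assuming each op writes one 16-byte key plus one 100-byte value (116 bytes) and that MB means 2^20 bytes. The function names are hypothetical and not part of db_bench.

```cpp
// Operations per second implied by a latency in microseconds per operation.
double OpsPerSecond(double micros_per_op) { return 1e6 / micros_per_op; }

// Throughput in MB/s for operations that each move `bytes_per_op` bytes.
// Assumes 1 MB = 2^20 bytes, which is inferred from the reported figures.
double ThroughputMBps(double micros_per_op, int bytes_per_op) {
  return OpsPerSecond(micros_per_op) * bytes_per_op / (1024.0 * 1024.0);
}
```

For fillrandom, `OpsPerSecond(2.460)` is roughly 406,500 and `ThroughputMBps(2.460, 116)` is roughly 45.0, consistent with the figures above.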

Each "fillsync" operation costs much less (about 0.3 milliseconds)
than a disk seek (typically 10 milliseconds).  We suspect that this is
because the hard disk itself is buffering the update in its memory and
responding before the data has been written to the platter.  This may
or may not be safe, depending on whether the hard disk has enough
power to save its memory in the event of a power failure.

## Read performance

We list the performance of reading sequentially in both the forward
and reverse direction, and also the performance of a random lookup.
Note that the database created by the benchmark is quite small.
Therefore the report characterizes the performance of leveldb when the
working set fits in memory.  The cost of reading a piece of data that
is not present in the operating system buffer cache will be dominated
by the one or two disk seeks needed to fetch the data from disk.
Write performance will be mostly unaffected by whether or not the
working set fits in memory.

    readrandom   :      16.677 micros/op;  (approximately 60,000 reads per second)
    readseq      :       0.476 micros/op;  232.3 MB/s
    readreverse  :       0.724 micros/op;  152.9 MB/s

LevelDB compacts its underlying storage data in the background to
improve read performance.  The results listed above were measured
immediately after a lot of random writes.  The results after
compactions (which are usually triggered automatically) are better.

    readrandom   :      11.602 micros/op;  (approximately 85,000 reads per second)
    readseq      :       0.423 micros/op;  261.8 MB/s
    readreverse  :       0.663 micros/op;  166.9 MB/s

Some of the high cost of reads comes from repeated decompression of blocks
read from disk.  If we supply enough cache to leveldb so it can hold the
uncompressed blocks in memory, the read performance improves again:

    readrandom   :       9.775 micros/op;  (approximately 100,000 reads per second before compaction)
    readrandom   :       5.215 micros/op;  (approximately 190,000 reads per second after compaction)

## Repository contents

See doc/index.html for more explanation. See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.h.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

* **include/db.h**: Main interface to the DB. Start here.

* **include/options.h**: Control over the behavior of an entire database,
and also control over the behavior of individual reads and writes.

* **include/comparator.h**: Abstraction for user-specified comparison function.
If you want just bytewise comparison of keys, you can use the default
comparator, but clients can write their own comparator implementations if they
want custom ordering (e.g. to handle different character encodings).

* **include/iterator.h**: Interface for iterating over data. You can get
an iterator from a DB object.

* **include/write_batch.h**: Interface for atomically applying multiple
updates to a database.

* **include/slice.h**: A simple module for maintaining a pointer and a
length into some other byte array.

* **include/status.h**: Status is returned from many of the public interfaces
and is used to report success and various kinds of errors.

* **include/env.h**:
Abstraction of the OS environment.  A POSIX implementation of this interface is
in util/env_posix.cc.

* **include/table.h, include/table_builder.h**: Lower-level modules that most
clients probably won't use directly.
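
The "pointer and a length into some other byte array" idea described for slice.h can be sketched in a few lines. The `ByteView` class below is a hypothetical stand-in written for this illustration. It is not the library's actual class, and its method names are assumptions. It also shows why keys can be arbitrary byte arrays: the length is carried explicitly, so embedded NUL bytes are preserved.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the idea behind slice.h: a non-owning
// (pointer, length) view into bytes that are stored elsewhere.
class ByteView {
 public:
  ByteView(const char* data, std::size_t size) : data_(data), size_(size) {}
  explicit ByteView(const std::string& s) : data_(s.data()), size_(s.size()) {}

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Copies the referenced bytes into an owning string.
  std::string ToString() const { return std::string(data_, size_); }

  // Three-way bytewise comparison: negative, zero, or positive. When one
  // view is a prefix of the other, the shorter view orders first.
  int Compare(const ByteView& other) const {
    const std::size_t min_len = size_ < other.size_ ? size_ : other.size_;
    int r = std::memcmp(data_, other.data_, min_len);
    if (r == 0) {
      if (size_ < other.size_) r = -1;
      else if (size_ > other.size_) r = 1;
    }
    return r;
  }

 private:
  const char* data_;  // Not owned; the underlying bytes must outlive the view.
  std::size_t size_;
};
```

Because the view does not own its bytes, the caller has to keep the backing storage alive for as long as the view is in use.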