Frequently Asked Questions

This is a list of FAQs about MDBM. To suggest new entries, send mail to the mdbm-users group.

General

What is an MDBM?

MDBM is a fast memory-mapped hash-based key-value store.

What are the most common use cases for MDBM?

MDBM is commonly used for caching and storing static data for quick access.

How do I get started?

  • Build the package with ‘make’ and install it with ‘make install’.
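
A minimal first program might look like the following sketch (the file path and key/value strings are illustrative assumptions; link with -lmdbm):

  #include <mdbm.h>
  #include <stdio.h>

  int main(void)
  {
      /* 0, 0 => default page size and presize */
      MDBM *db = mdbm_open("/tmp/hello.mdbm", MDBM_O_RDWR | MDBM_O_CREAT,
                           0666, 0, 0);
      if (db == NULL) {
          perror("mdbm_open");
          return 1;
      }
      mdbm_store_str(db, "greeting", "hello, world", MDBM_REPLACE);
      char *val = mdbm_fetch_str(db, "greeting");  /* locks, copies out, unlocks */
      if (val != NULL)
          printf("%s\n", val);
      mdbm_close(db);
      return 0;
  }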

What are the different language bindings available for MDBM?

  • Bindings are included for C/C++ and Perl.

General Problems

How do I report a problem with MDBM?

Send mail to the mdbm-users group.

Why can’t I create a 2G MDBM on a 32-bit system?

A 32-bit process has limited address space, and the data-segment limit reserves part of it, making that part unavailable to mmap. To map a 2G MDBM, you’ll have to lower the data-size limit in your process (man limits) from the default (512MB) so that enough address space remains for the mapping. (See also “How do I control the amount of memory available to mmap?” below.)

I have a corrupted mdbm, what can I do?

Often, MDBMs get corrupted due to an application not handling locking correctly. If you are doing reads and writes, you must open that MDBM with locking (default is exclusive locking). You must hold the lock for the duration that you are referencing a record (for all operations, including reads, writes, and deletes). For example, when doing a read, you would typically:

  1. mdbm_lock
  2. mdbm_fetch
  3. copy out the data pointed to by the returned fetched datum
  4. mdbm_unlock
  5. access your copied-out data
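
A minimal sketch of that locked-read sequence in C (error handling abbreviated; the key and fixed-size buffer are illustrative assumptions):

  #include <mdbm.h>
  #include <string.h>

  /* Assumes db was opened with locking enabled (the default). */
  void read_one(MDBM *db)
  {
      datum key = { (char *)"mykey", 5 };
      datum val;
      char buf[256];
      int len = 0;

      mdbm_lock(db);                        /* 1. lock */
      val = mdbm_fetch(db, key);            /* 2. fetch: val points into the map */
      if (val.dptr != NULL && val.dsize <= (int)sizeof(buf)) {
          len = val.dsize;
          memcpy(buf, val.dptr, len);       /* 3. copy out while still locked */
      }
      mdbm_unlock(db);                      /* 4. unlock */
      /* 5. access the copied-out data in buf (len bytes) */
      (void)len;
  }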

Another common way for MDBMs to become corrupt is for an application to mistakenly write (via a bad pointer) into the MDBM’s mapped space. There is an MDBM protect feature for debugging purposes to catch these problems. See MDBM data store protection to catch wild-pointer reads/writes for more information.

Try using mdbm_check to identify the extent of the damage. If there is major corruption, mdbm_check will abort.

Once an MDBM is corrupt, most of the time it’s not possible to tell how it got corrupted. You need to catch it in the act of being corrupted.

File System

Will MDBM perform well over NFS?

No. MDBM over NFS must be avoided at all costs.

It performs very poorly. MDBM uses mmap(), and although the NFS driver supports the mmap() operation, mapped access gets converted to regular block fetches and updates, so performance suffers badly.

What’s the worst-case-scenario if I never call mdbm_close nor mdbm_sync (I just exit)? Can the database become corrupted? Can individual records become corrupted?

If you never call mdbm_sync and don’t pass either MDBM_O_ASYNC or MDBM_O_FSYNC to mdbm_open, the data will never be sync’d to disk (unless the system runs low on physical memory and starts swapping). If you reboot the system and open the database, it will be empty.

Is the whole file mapped to memory at any given time, or is there some sort of page fault/swapping that goes on?

The whole thing is mapped when you open the database, but individual pages are faulted in when they’re touched.

How often (if ever) is the backing file sync’d to disk?

Never, by default. If you want sync’ing, you either have to use mdbm_sync to manually sync or specify MDBM_O_ASYNC when opening to enable background sync’ing by the kernel syncing process.

There’s a flag for a memory-only database without a backing file. Is that correct?

There is a private internal flag which is not part of the public API to signify a memory-only MDBM. To create a memory-only MDBM, specify NULL for the mdbm_open file argument.

Memory only MDBMs must be initialized as a fixed-sized MDBM because there is no way to handle dynamic-growth size changes across processes. This means that you must specify a presize to mdbm_open. You must also use mdbm_limit_size_v3 with max_page equivalent to the presize.
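
A minimal sketch of such a setup (the page size and presize are illustrative assumptions; note that presize is in bytes while mdbm_limit_size_v3 takes a page count):

  #include <mdbm.h>

  MDBM *open_memory_only(void)
  {
      int page_size = 4096;
      int presize   = 4096 * 1024;          /* fixed size: 4MB (1024 4K pages) */
      /* NULL filename => memory-only MDBM */
      MDBM *db = mdbm_open(NULL, MDBM_O_RDWR | MDBM_O_CREAT, 0666,
                           page_size, presize);
      if (db == NULL)
          return NULL;
      /* Cap the page count to match the presize so the MDBM never grows. */
      if (mdbm_limit_size_v3(db, presize / page_size, NULL, NULL) < 0) {
          mdbm_close(db);
          return NULL;
      }
      return db;
  }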

Why is my MDBM file modified time not being updated after I store something?

Simply put, the modification time does not get updated on mmap’d writes. MDBM writes are simply writes to memory; they do not directly perform file-based operations that would affect a file’s modified time.

For a high-performance store (ex., when you want to do 50K-100K writes/sec), it wouldn’t be reasonable to update the modified time for each write.

Under what circumstances can a process or system crash cause loss of data in an MDBM?

A system crash will lose data unless something has synced the MDBM. By default, nothing does that.

If mdbm_sync is called, am I guaranteed that the MDBM will be corruption-free if there is a subsequent crash, even if it’s missing some updates subsequent to the sync?

That’s tricky. If the MDBM page size doesn’t match the OS page size, then it’s possible that VM pressure might cause only part of a database page to get synced to disk. That would corrupt that page.

mdbm_sync itself isn’t foolproof either because it doesn’t lock the MDBM and does a background sync. mdbm_fsync is better for integrity in that it locks the database and uses a synchronous fsync. The downside, of course, is that the database is locked until all the dirty pages are flushed.

What kind of overhead would I expect from using MDBM_O_ASYNC?

I infer that the system sync process, which flushes data to disk every 30 seconds, would also write the mmap’d changed pages to disk every 30 seconds, so the worst-case performance impact would be the time it takes to write the data, amortized over that interval.

But, does the sync process lock the pages (this could be important if we’re doing very high data rates on the MDBM - for example 100Ks/sec)?

Yes, the sync locks the pages, so if you touch a page while it’s being flushed, you’ll block. I haven’t looked at this closely in a while, but I also recall that a sync results in a page fault when you touch a page for the first time after it’s been synced. It’s a quick fault (no disk access), but it still hurts a bit.

Does mdbm_close do an implicit flush to disk?

mdbm_close syncs only if the MDBM was opened with MDBM_O_FSYNC; otherwise, mdbm_close itself won’t cause any flushing.

Does the MDBM file on the disk have the latest updates to the key-value pairs?

In general, unless you use mdbm_sync, or use mdbm_open with MDBM_O_FSYNC (or O_FSYNC in earlier MDBM versions), your data will probably not be written to disk by mdbm_close.

On FreeBSD, the mmap’d file that holds the MDBM is not sync’d to the physical disk unless mdbm_sync is used or during a normal system shutdown when all dirty file data gets sync’d to disk.

On RHEL, modifications to the mmap’d file are background-sync’d to disk after 30 seconds for files that are on a normal file-system mount. However, MDBMs that are hosted on a tmpfs file-system are not sync’d (and are also not preserved across a system reboot).

Can I copy an MDBM file from one machine to another?

Normally this is a bad idea. It is recommended to use mdbm_export on the source machine to obtain a portable file, then perform an mdbm_import of this file to create the MDBM on the destination machine. The mdbm_copy command is also available. Neither mdbm_export nor mdbm_copy guarantees data consistency, since calls to mdbm_store that store related data can occur mid-copy.

Sizes and limitations

My MDBM says it is 4G in size, will I need more RAM?

MDBM is a sparse file when large object mode is enabled. Use mdbm_stat to view the actual allocated size of the database (and Large Object Store).

Why has my MDBM dynamically grown to be huge? I don’t have nearly that much data.

You have a data-distribution problem. If you are using duplicate keys, or you have a pathological dataset, some of your pages are filling up too soon. When a page runs out of room, the MDBM splits and grows, up to its maximum limit. Once your MDBM can no longer grow, an attempted store to a full page will return an error.

There are only a few knobs to turn, in priority order:

  1. Enable large objects (only settable at create time) if you have a small percentage (<5%) of objects that are significantly bigger than the others.
    • The v2 implementation will create a 4GB file size because large objects are stored at a 4GB file offset and below. It’s a sparse file so only the necessary pages on disk are used for storing data.
  2. Increase your page size (only settable at create time)
    • This might decrease performance because many more keys might need to be compared on a page to determine whether your lookup key exists. If there are few keys/page this won’t be significant. If there are many (100+) keys/page it might be noticeable. If you have a lot of lookups where the key doesn’t exist, this hurts performance because it has to compare every key on the page.
  3. Try another hash function (only settable at create time)
    • If your key is a string, try the Jenkins hash function
    • If your key is binary, try CRC32, SHA-1, or MD5 (probably in that order, YMMV)
  4. Use mdbm_open and mdbm_limit_size to set the initial and maximum MDBM size when the file is created.
    • This will create a flatter internal btree which might help distribute your data more uniformly. This might help you reduce the number of nearly-full pages where a store operation would fail due to lack of space (this is your real problem, as opposed to a large sparse MDBM). A consequence is that the actual number of pages used on disk might be higher, but that’s probably a good trade-off.
  5. If you really don’t have a good idea of the final size of your MDBM (as needed in the previous option), use mdbm_open with an initial size set to your best guess.

If option 1 does not work, then you might need a combination of the others.

MDBM is a hashed key-value store, so changing your hash function or page size changes how your data is distributed among MDBM pages. If the hash function you chose happens to parcel out too many keys into a single page, that page will split and MDBM’s file size will double. If you keep adding data that happens to hit the same page, the MDBM will keep splitting and its file size will keep growing.
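
A minimal sketch of applying options 2 and 3 above at create time (the 16K page size and the hash choice are illustrative assumptions):

  #include <mdbm.h>

  MDBM *create_tuned(const char *path)
  {
      int page_size = 16 * 1024;                /* option 2: larger pages */
      MDBM *db = mdbm_open(path, MDBM_O_RDWR | MDBM_O_CREAT, 0666,
                           page_size, 0);
      if (db == NULL)
          return NULL;
      /* Option 3: set the hash before any data is stored (create time).
       * String keys often do well with the Jenkins hash. */
      mdbm_sethash(db, MDBM_HASH_JENKINS);
      return db;
  }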

Use mdbm_stat to look at your histogram data. You want to avoid having many pages that are nearly full when your MDBM is close to its maximum size.

How do I control the amount of memory available to mmap?

In FreeBSD this can be controlled using the kernel variable vm.max_proc_mmap, though it’s usually not necessary to tune this. 32-bit applications on FreeBSD trade off space for malloc against space for mmap according to the data segment size limit. This is controlled at the kernel level using the kern.maxdsiz loader variable (FreeBSD 4) or the compat.ia32.maxdsiz sysctl (FreeBSD 6/7). In addition, the process rlimit for data segment size can be used to lower the data segment size limit (and therefore make more room for mmaps).

How do I determine how my MDBM is mapped into memory?

MDBM v3 mmaps an entire MDBM file into memory. Simply mapping an MDBM does not make it memory resident. Although a file’s size on disk might be quite large, the sparse file structure will only bring pages containing data into memory when they are referenced (ex., fetch, store, or delete operation). The mdbm_preload routine may be used to make an MDBM memory resident.

On RHEL, you can review a process’ mapped regions and associated files via cat /proc/pid/smaps.

I’m using MDBM within PHP running under yapache. Given that each yapache child runs as a separate process, will each process mmap the MDBM separately?

Usually, the individual processes will map the MDBM separately (because the MDBM is opened after the child has been forked from the parent), but they will all be sharing the same physical RAM mapping for that file.

Can I use MDBMs in two or more machines in a cluster mode by connecting them?

MDBM can’t do this out of the box. Explore YDBM or DISC-GDS.

Iteration

How do I initialize an MDBM iterator?

The MDBM_ITER_INIT() macro initializes an iterator.
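
For example, a minimal sketch iterating over every record (assuming an open handle db):

  #include <mdbm.h>

  void visit_all(MDBM *db)
  {
      MDBM_ITER iter;
      kvpair kv;

      MDBM_ITER_INIT(&iter);
      mdbm_lock(db);                        /* hold the lock across the scan */
      for (kv = mdbm_first_r(db, &iter);
           kv.key.dptr != NULL;
           kv = mdbm_next_r(db, &iter)) {
          /* use kv.key / kv.val here; copy out anything needed after unlock */
      }
      mdbm_unlock(db);
  }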

While iterating across an entire database, am I guaranteed to see all key-values present in the database when the iteration starts, if deletes occur during the iteration? What if inserts or overwrites occur during iteration?

Deleting items will not affect iteration, as long as you only delete items you’ve already iterated over.

If you lock the database and begin an iteration, you will see all key-values. Deleting key-values you have already iterated over does not interfere with this.

If you start an iteration and then remove a key that you knew was in the database but hadn’t iterated over yet, the iteration will not return the (now deleted) key-value pair, even though it was in the database when the iteration began.

Overwriting depends on what you’re doing. If you’re just fetching the value pointer and rewriting in-place, that’s safe. If you’re replacing the value with a different size, that may cause garbage collection, which may cause your iterator to miss records.

Inserting records may also trigger garbage collection, which may cause your iterator to miss records.

Locking

Do I need to use locking if I’m only doing read access and using mdbm_replace?

If you have a read-only MDBM (there are no store/delete operations) in a single-threaded application, you do not need to lock. This is because the access operations are smart enough to check for replacement and to acquire an internal lock.

However, if you use mdbm_replace in a multi-threaded application, you do need to lock around fetches. A future enhancement will remove this locking requirement for multi-threaded applications.

When should I use mdbm_lock?

When two or more processes are reading and writing to the same MDBM. mdbm_lock is used by a process reading or writing to obtain exclusive access.

There doesn’t appear to be a distinction between read locks and write locks. Is that correct?

For exclusive locks, that’s correct.

Is there any mechanism for allowing multiple readers and one writer (MROW) that doesn’t have the readers block each other?

MDBM V3 has shared locks (sometimes called read-write locks).

Are the lock requests FIFO?

No, locks are scheduled according to process priority.

Why doesn’t mdbm_fetch automatically lock?

mdbm_fetch doesn’t lock so that an application can take greater control over locking, and over the corresponding performance, in a few ways:

  • mdbm_fetch doesn’t copy-out the data. An application could lock, fetch, look at that pointer’s data contents, and unlock. In some situations, this can be much faster than lock, fetch, copy-out, unlock, and look at returned contents.
  • If you are willing to trade off latency for higher throughput, you can lock, do multiple fetches (copy-out or not), and then unlock. This approach is application- and data-dependent.
  • If you have a master record and dependent records (specified in that master record), your app might require that accesses to the master record and the dependent records be done in a single locked context. Otherwise, dependent records could be deleted or modified in a way that is incompatible with the master record.

mdbm_fetch_str locks, does a copy-out of the value, and unlocks. This, however, is only for string data. mdbm_fetch_buf also locks while copying into the provided buffer.
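
A minimal sketch contrasting the two styles (the comparison helper is a hypothetical illustration):

  #include <mdbm.h>
  #include <string.h>

  /* Zero-copy: lock, inspect the mapped value in place, unlock. */
  int value_matches(MDBM *db, datum key, const char *expect)
  {
      int match = 0;
      mdbm_lock(db);
      datum val = mdbm_fetch(db, key);
      if (val.dptr != NULL)
          match = (val.dsize == (int)strlen(expect) &&
                   memcmp(val.dptr, expect, val.dsize) == 0);
      mdbm_unlock(db);
      return match;
  }

  /* Copy-out: mdbm_fetch_buf locks internally and copies into our buffer. */
  int fetch_into(MDBM *db, datum key, char *out, int outlen)
  {
      datum val;
      datum buf = { out, outlen };
      return mdbm_fetch_buf(db, &key, &val, &buf, 0);
  }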

With MDBM V4, I’m getting the following error message: multiple different lockfiles exist

If you’re seeing the following error message when opening an MDBM:

mdbm_lock.cc:68 YourFile.mdbm: multiple different lockfiles exist! : No such file or directory ERROR (2 No such file or directory) in mdbm_open_inner() mdbm.c:3817

Then this is what is likely happening: someone has opened YourFile.mdbm using a 32-bit process while you are using a 64-bit process, or vice versa. Make sure any tools you are using match your executables (bin vs. bin64).

Performance

If we have many little structures to store (possibly smaller than 64 bytes, keyed by registered user), how should we tune for that? (page-size?)

Many little structures work best. It’s bigger structures that create problems. You should try different page sizes to see what performs better. 8K or 16K are probably good starting points.

Are there are guidelines for tuning MDBM, or is it more of trial and error?

It’s mostly trial-and-error, but try to use the smallest page size that will fit the dataset without causing page overflows (a page overflow happens when a key to be inserted hashes to a page that’s already full and the database can’t be split because it’s already too big).

Use larger page sizes when the key+value size is larger, and smaller page sizes when the key+value size is smaller. Larger page sizes are slower because the hash buckets effectively take longer to locate a specific key. In V3, however, this was significantly sped up.

Also, if you know you don’t have duplicate keys (or don’t care if they get inserted), you can avoid the lookup that occurs on insert by using the MDBM_INSERT_DUP flag. That’ll speed things up even more.
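
For example, a minimal sketch (key and value contents are up to the application):

  #include <mdbm.h>

  /* Insert without the existing-key lookup; assumes duplicates are acceptable. */
  void insert_dup(MDBM *db, datum key, datum val)
  {
      mdbm_lock(db);
      mdbm_store(db, key, val, MDBM_INSERT_DUP);  /* always inserts a new record */
      mdbm_unlock(db);
  }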

There is a new mdbm_config tool that will help you select MDBM configuration parameters for your dataset.

What should be the ratio between main memory size and MDBM size (in order to maintain its performance)?

MDBM expects all of its data pages to be in physical memory. MDBM databases grow in powers of 2, and not all of the mapped pages necessarily have data on them. The mdbm_stat utility can analyze a database and show the various efficiencies (how full the pages are, how many non-empty pages there are).

Why is building an offline MDBM slow, and my resulting file is highly fragmented?

If you are building offline and you have a known maximum size of the MDBM:
  • Create the MDBM with the initial size set to the final size
  • Use mdbm_limit_size_v3 to ensure that MDBM doesn’t split in the future
  • Make sure that your physical memory is larger than the resulting MDBM

Setting the initial size to the final size will avoid MDBM splits, which also avoids the latency incurred during the split. The resulting MDBM directory will also have fewer levels (enabling faster lookups).
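
A minimal sketch of this setup (the 1GB final size and 8K page size are illustrative assumptions):

  #include <mdbm.h>

  MDBM *create_for_offline_build(const char *path)
  {
      int page_size  = 8 * 1024;
      int final_size = 1 << 30;             /* known final size: 1GB */
      MDBM *db = mdbm_open(path, MDBM_O_RDWR | MDBM_O_CREAT | MDBM_O_TRUNC,
                           0666, page_size, final_size);
      if (db == NULL)
          return NULL;
      /* Cap at the same size as the presize so the MDBM never splits. */
      mdbm_limit_size_v3(db, final_size / page_size, NULL, NULL);
      return db;
  }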

If the resulting MDBM is highly fragmented, you probably have a highly fragmented disk. Either use a non-fragmented disk, or use a ramdisk to build the MDBM and then copy the MDBM out of the ramdisk.

How can I speed up fetches in my read-only web application?

Frequently, web applications open an MDBM on a per-user-request basis. This is a very bad idea because each open can take several hundred (or more) microseconds. The best thing that you can do to improve your performance is to open the MDBM once at an application level, not at a per-request level. For a single (unlocked) memory-resident lookup, it should take ~4 microseconds on standard hardware.

Not only is an open call slow compared to a fetch, but concurrent calls to mdbm_open are serialized when the locks are created for the first time (very slow) and when the shared MDBM is mapped into the process address space. After initial lock creation, subsequent mdbm_open calls are still serialized while each call creates state in its new handle and maps the shared MDBM into the process address space.

If there are no writes taking place on the MDBM, and you are not using mdbm_replace, you can disable all locking overhead by specifying MDBM_OPEN_NOLOCK in your mdbm_open flags. This avoids creating the mutexes used for locking during mdbm_open.

Fetching a non-existent key is slower than fetching a key that exists. Non-existent keys require checking all keys on a page before determining that a key does not exist. In this regard, MDBM V3 format files are faster than MDBM V2 due to some saved key hash data.

The number of key comparisons will influence your fetch time. Larger pages typically hold more keys/data and thus have a slower lookup time; using a smaller page size can yield better performance.

Don’t use large objects; they are a little slower to reference.

Data Access and Management

Does MDBM automatically do garbage collection?

No. MDBM doesn’t do this for you. Data-specific garbage collection can be implemented using the “shake” function that is registered by using mdbm_limit_size_v3.
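
A minimal sketch of registering a shake callback (the eviction policy is left as a placeholder; consult the mdbm_limit_size_v3 documentation for how to mark a page’s items for deletion):

  #include <mdbm.h>

  /* Called when a store needs space on a full page. Examine the page’s
   * items via 'info' and mark entries to evict; return nonzero once enough
   * space has been freed, or 0 to let the store fail. */
  static int my_shake(MDBM *db, const datum *key, const datum *val,
                      struct mdbm_shake_data_v3 *info)
  {
      (void)db; (void)key; (void)val; (void)info;
      return 0;   /* placeholder: evict nothing */
  }

  /* Register at setup time, e.g. capping the MDBM at 4096 pages: */
  /* mdbm_limit_size_v3(db, 4096, my_shake, NULL); */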

What is the upper limit size for MDBM, before its performance degrades?

Even at maximum size, the design of MDBM requires only two database pages to be accessed for any single fetch. If your system’s RAM available to MDBMs cannot contain your “working set” of MDBM data, your performance will degrade.

How much extra empty storage does an MDBM require?

It depends on the hash collision rate of the keys. The better the hashed key distribution is, the less likely a leaf page is to be filled and require a premature database split.

Is it possible to shrink an MDBM without rebuilding the entire database?

It’s sometimes possible that mdbm_compress will be able to shrink the database because it rebalances the tree.

How does mdbm_store with flag MDBM_MODIFY work?

When using mdbm_store with flag MDBM_MODIFY to change an existing record, what happens if the new record is of the same size as the original record? What happens if the new record is of a different size than the original?

If the new record is of the same size as the original, the data is replaced but the location in memory stays the same. If the new record is of a different size (larger or smaller) than the original, the original record is deleted and a new one is inserted, so the location in memory may change.
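
A minimal sketch (the key name and counter value are illustrative assumptions):

  #include <mdbm.h>
  #include <stdint.h>

  /* Update an existing record; fails (rather than inserting) if absent. */
  int set_counter(MDBM *db, uint64_t n)
  {
      datum key = { (char *)"counter", 7 };
      datum val = { (char *)&n, sizeof(n) };
      int rc;

      mdbm_lock(db);
      rc = mdbm_store(db, key, val, MDBM_MODIFY);
      mdbm_unlock(db);
      return rc;   /* nonzero if the key did not already exist */
  }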

When do I use mdbm_popen vs. mdbm_open in PHP?

Usage depends on the frequency of access. For example, a PHP script running in a yapache context that accesses the MDBM during all or most requests should use mdbm_popen.

Can I use mdbm_fetch without taking out a lock, to just check for key existence?

The short answer is that you must always take out a lock for a fetch operation if there are concurrent writes (store or delete operations).

Here is the reasoning on why you must lock around mdbm_fetch if there are concurrent writes.

  • An unlocked mdbm_fetch cannot guarantee a coherent use of a key’s fields for offset and size. An intervening write might produce a mismatched offset and/or size for a key.
  • Depending on your hardware architecture for memory byte-ordering of read/write access and atomicity of access, reading a field (ex., offset) as it’s being written could generate an undefined result. The operations that write key meta-data information are not atomic operations. For x86 architectures, this might not be an issue under some situations (ex., 2 or 4-byte writes with aligned access). This issue requires additional investigation. A further consideration is that the underlying access for mdbm_fetch is doing a 3-byte read.

If you have concurrent writes, and you do not lock an mdbm_fetch, the following consequences are possible, but (highly) unlikely:

  • False-positives – mdbm_fetch could indicate that a key is present when it is not

  • False-negatives – mdbm_fetch could indicate that a key is not present when it is

  • SEGV – if a key resolves to an off-page address, and that page has not been mapped, it is an access violation. Some related issues:

    • If you do get a crash, it will be very difficult to determine the overall reason for the crash. An invalid address access will be obvious, but there will be insufficient information to develop a scenario to explain the crash. The bad access generated by mdbm_fetch is transient, and not deterministically reproducible.
    • If a crash happens while an mdbm_store operation is taking place, the meta-data on the page could become corrupted. An mdbm_store can result in shuffling the meta-data on the page (to make room for the store) which could result in a partial update.

Miscellaneous

Is MDBM thread-safe?

MDBM V2 is not thread-safe in any context. There are known problems with mdbm_replace, as well as other issues that have not been fully characterized.

MDBM V3 is thread-safe if used in a specific way:
  • Only 1 thread may use an MDBM handle at a time.
  • If an application needs concurrent MDBM access, then additional MDBM handles are required.

It is up to the app to decide how to ensure that a handle is used by only 1 thread at a time. Handles contain state, and you cannot use a handle concurrently across threads even if you are only doing fetches (reads). An MDBM handle is a context object; as is frequently done in reentrant programming, you pass a context object to a reentrant routine so that the routine can read/write thread-specific state rather than use external (global) state.

If you have a multi-threaded app, you will probably want to use dup handles to avoid remapping the same MDBM multiple times (use mdbm_dup_handle to create a new handle instead of mdbm_open).

There are various approaches for using MDBM handles in a threaded context. For high-performance applications, it’s recommended to use thread-local storage (TLS) containing an MDBM handle. Alternatively, it would be possible to create a pool of MDBM handles, but that would require a lock to acquire a handle, which might have an unwanted impact on latency.
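
A minimal sketch of the TLS approach with dup handles (pthread key creation and cleanup are simplified, and the global g_db is assumed to be opened once at startup):

  #include <mdbm.h>
  #include <pthread.h>

  static MDBM *g_db;               /* opened once by the main thread */
  static pthread_key_t g_tls_key;  /* created once via pthread_key_create */

  static MDBM *thread_handle(void)
  {
      MDBM *h = pthread_getspecific(g_tls_key);
      if (h == NULL) {
          /* Duplicate instead of reopening: avoids remapping the MDBM. */
          h = mdbm_dup_handle(g_db, 0);
          pthread_setspecific(g_tls_key, h);
      }
      return h;
  }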

Can I open an MDBM, then use it in forked child processes?

No, you cannot expect MDBM handles (MDBM *) to be valid across fork() calls. You will have to call mdbm_open() in the child process after forking that child process. The MDBM handle contains data items such as lock counts that cannot be expected to remain consistent when copied to another process’s memory space.

How do I know which processes are using a particular MDBM?

lsof | grep yourMdbmFile
