mdbm_big_data_builder.pl

SYNOPSIS

mdbm_big_data_builder.pl [-c] [-d dbsize] [--delete-intermediate-files] [-D] [-h|--help] [--input-directory directory] [--input-prefix prefix] [-l] [--mdbm name] [-n|--num-buckets bucket-count] [--nodelete-intermediate-files] [--output-directory directory] [-p pagesize] [-s|--hash-function hash-function] [-S store-flag] [-y maxpages] [-z spillsize] input-files ...

DESCRIPTION

mdbm_big_data_builder.pl speeds up the build of an MDBM that is larger than physical memory. It takes the specified source data files and builds a V3 MDBM. Multiple input files may be specified when creating a new V3 database, and an existing V3 database may be named to have more data added to it. The mdbm is built in a virtual-memory-restricted manner: only a contiguous range of virtual memory is constructed at a time, which avoids paging and swapping.

OPTIONS

-c

The cdb format flag. Input data files are in cdb format.

-d dbsize

Used by mdbm_import to presize the database. Initializing an mdbm to the expected final size will avoid dynamically growing that mdbm, and reduce build time.

--delete-intermediate-files

Delete the buckets when done. The default is to keep them in the output directory. This option is negatable (i.e., --nodelete-intermediate-files).

-D

Delete keys flag, used by mdbm_import.

-h, --help

Shows usage of the script then exits.

--input-directory directory

The directory containing input data files in CDB or db_dump format.

--input-prefix prefix

The prefix of the input data file names, used to identify which files are the source input data files.

-l

This is the large object support flag, used by mdbm_import.

--mdbm name

The name of the mdbm file to be created or, if it already exists, to have more data added to it.

--nodelete-intermediate-files

Do not delete the buckets from the output directory when done. This is the default.

-n, --num-buckets bucket-count

Number of buckets to use when splitting the source input data.
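As a rough illustration of what splitting input into buckets means, the sketch below hash-partitions records into bucket files so that a given key always lands in the same bucket. This page does not describe how the script actually assigns records to buckets, so everything here is an assumption: the function name `split_into_buckets`, the one-record-per-line `key<TAB>value` input (not the real cdb or db_dump formats), the `bucket.N` file names, and the simple stand-in hash.

```shell
# split_into_buckets INPUT NUM_BUCKETS OUTDIR
# Hypothetical helper: NOT the script's real bucketing logic.
# Assumes one "key<TAB>value" record per line with ASCII keys.
split_into_buckets() {
    awk -F '\t' -v n="$2" -v dir="$3" '
        # Build a character-to-code table, since awk has no ord().
        BEGIN { for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i }
        {
            # Stand-in polynomial hash of the key (field 1).
            h = 0
            for (i = 1; i <= length($1); i++)
                h = (h * 31 + ord[substr($1, i, 1)]) % 2147483647
            # Append the whole record to the bucket file chosen by the hash.
            print > (dir "/bucket." (h % n))
        }
    ' "$1"
}
```

Each bucket file can then be loaded on its own, which is the general idea behind keeping only part of the database active at a time.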

--output-directory directory

This is a REQUIRED parameter. It specifies the directory used to output the bucket files and new database.

-p pagesize

This option is used by mdbm_import when creating a new mdbm.

-s, --hash-function hash-function

This is a REQUIRED parameter. The valid values can be either a number or the associated name (case is ignored) as follows:

hash-function	Name
0	CRC32
1	EJB
2	PHONG
3	OZ
4	TOREK
5	FNV32
6	STL
7	MD5
8	SHA1
9	Jenkins
10	Hsieh
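A wrapper script that accepts either form could normalize the value before passing it on. The sketch below is a minimal example of that; the helper name `hash_function_id` is hypothetical, and the mapping is copied from the table above.

```shell
# Resolve a -s/--hash-function value (number or name, any case) to its number.
# Hypothetical helper; the mapping mirrors the table in this section.
hash_function_id() {
    name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$name" in
        0|CRC32)   echo 0 ;;
        1|EJB)     echo 1 ;;
        2|PHONG)   echo 2 ;;
        3|OZ)      echo 3 ;;
        4|TOREK)   echo 4 ;;
        5|FNV32)   echo 5 ;;
        6|STL)     echo 6 ;;
        7|MD5)     echo 7 ;;
        8|SHA1)    echo 8 ;;
        9|JENKINS) echo 9 ;;
        10|HSIEH)  echo 10 ;;
        *) echo "unknown hash function: $1" >&2; return 1 ;;
    esac
}
```

For example, `hash_function_id fnv32` prints 5, which matches the `-s 5` used in the EXAMPLES section.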

-S store-flag

This option is used by mdbm_import.

Flag	Option	Description
0	MDBM_INSERT	Stores will fail if there is an existing key
1 [Default]	MDBM_REPLACE	Stores will replace an existing entry, or create a new one
2	MDBM_INSERT_DUP	Stores will add duplicate entries for an existing key
3	MDBM_MODIFY	Stores will fail if the key does not already exist

-y maxpages

Used by mdbm_import.

-z spillsize

Used by mdbm_import.

RETURN VALUE

Returns 0 upon success, non-zero upon failure.
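A calling script can branch on that exit status. The following is a minimal sketch; the `run_builder` function and the `BUILDER` variable are hypothetical names introduced here, with `BUILDER` overridable so the wrapper can be tried without MDBM installed.

```shell
# Command to run; override BUILDER to point at a different path or a stub.
BUILDER=${BUILDER:-mdbm_big_data_builder.pl}

# run_builder ARGS...
# Runs the builder with the given arguments and reports success or failure
# based on its exit status (0 means success, non-zero means failure).
run_builder() {
    if "$BUILDER" "$@"; then
        echo "build succeeded"
    else
        status=$?
        echo "build failed with exit status $status" >&2
        return "$status"
    fi
}
```

The function returns the builder's own exit status, so it can be chained with `&&` or `||` in the same way as the script itself.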

EXAMPLES

Create a new V3 DB and load it with the data from cdb_dump format source files in the input directory /tmp/cdb:

mdbm_big_data_builder.pl -id /tmp/cdb -if cdb -s 5 -d 2 -S 2 -mdbm saturn.mdbm -od /tmp/saturnoutput -c -nb 10

Load an existing V3 DB with data from db_dump format source files:

mdbm_big_data_builder.pl -id /tmp/db_dump -mdbm saturn.mdbm -od /tmp/saturnoutput -ip bigdata -s 5 dbdumpdata1 dbdumpdata2

CONTACT

mdbm-users <mdbm-users@yahoo-inc.com>
