Thursday, January 6, 2011

Getting Started with MongoDB

The name MongoDB comes from "humongous". Its most notable features: it is FLOSS (Free/Libre Open Source Software), horizontally scalable, very fast, schema-less and document-oriented. It is written in C++ and has native drivers for most programming languages, e.g. C, C++, C# & .NET, ColdFusion, Erlang, Factor, Java, JavaScript, PHP, Python, Ruby and Perl. It belongs to the so-called NoSQL category of DBMSs. It manages collections (the equivalent of tables in RDBMSs) of JSON (JavaScript Object Notation) documents (the equivalent of rows in RDBMSs), which are stored in a binary format called BSON (Binary JSON).

The following compares the terminology of relational DBMSs (MySQL, Oracle, PostgreSQL) with that of MongoDB (NoSQL):

MySQL, PostgreSQL,...      MongoDB
- Database(s)              - Database(s)
- Table(s)                 - Collection(s)
- Row(s)                   - Document(s)
- Column(s)                - Field(s)
- Index                    - Index
- Join                     - Embedding and linking

The concepts of server, database and index are very similar in both, but table/collection, row/document and column/field differ considerably.
In RDBMSs a table is a rectangle made of columns and rows. Each row has a fixed number of columns; if we add a new column, we add it to each and every row.
In MongoDB a collection is more like a really big box and each document is like a little bag of stuff in that box. Each bag contains whatever it needs in a totally flexible manner (read schema-less).
Note that schema-less does not mean type-less: in MongoDB every document has its own schema, which it may or may not share with other documents.
In practice it is normal for all documents in a collection to share the same schema. The closest thing to an RDBMS column is what MongoDB calls a field (a key/value pair). As noted above, though, we can create, read, update or delete individual fields of one particular document in a collection. This differs from creating, reading, updating or deleting a column in an RDBMS, which affects every row in the table.
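To make the "bag of stuff" picture concrete, here is a minimal sketch that uses plain Java maps as stand-ins for documents. It does not talk to MongoDB at all; the class name, the helper doc() and the sample fields are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a collection: a list of documents with no shared schema. */
final class SchemalessSketch {

    /** Builds a document from alternating key/value arguments (hypothetical helper). */
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    /** Returns a small collection whose documents carry different fields. */
    static List<Map<String, Object>> sampleCollection() {
        List<Map<String, Object>> people = new ArrayList<>();
        people.add(doc("name", "alice", "age", 30));
        people.add(doc("name", "bob", "email", "bob@example.com"));
        return people;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = sampleCollection();
        // Adding a field touches only this one document, unlike adding a column.
        people.get(0).put("nickname", "al");
        // Removing a field is equally local.
        people.get(1).remove("email");
        for (Map<String, Object> person : people) {
            System.out.println(person.keySet());
        }
    }
}
```

Each value still has a type (a String, an Integer, ...), which is the sense in which schema-less is not type-less.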
Indexes are more or less the same in RDBMSs and MongoDB. Joins, however, do not exist in MongoDB; instead we can embed documents in, or link them with, other documents.
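The difference between embedding and linking can be sketched with plain Java maps. This is only an illustration of the two data shapes, not driver code; the field names (author, author_id) and all method names are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Two ways to relate a post to its author without a join (illustrative names only). */
final class EmbedOrLinkSketch {

    /** Embedding: the author document lives inside the post document. */
    static Map<String, Object> embeddedPost() {
        Map<String, Object> author = new LinkedHashMap<>();
        author.put("name", "bob");
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("title", "Getting Started");
        post.put("author", author);
        return post;
    }

    /** Linking: the post stores only the _id of an author kept elsewhere. */
    static Map<String, Object> linkedPost(Object authorId) {
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("title", "Getting Started");
        post.put("author_id", authorId);
        return post;
    }

    /** A stand-in for a separate authors collection, keyed by _id. */
    static Map<Object, Map<String, Object>> authorsById() {
        Map<String, Object> bob = new LinkedHashMap<>();
        bob.put("_id", 1);
        bob.put("name", "bob");
        Map<Object, Map<String, Object>> authors = new HashMap<>();
        authors.put(bob.get("_id"), bob);
        return authors;
    }

    /** Resolves the link with a second lookup, done by the application. */
    static Map<String, Object> resolveAuthor(Map<String, Object> post,
                                             Map<Object, Map<String, Object>> authors) {
        return authors.get(post.get("author_id"));
    }

    public static void main(String[] args) {
        System.out.println(embeddedPost());
        System.out.println(resolveAuthor(linkedPost(1), authorsById()));
    }
}
```

Embedding gives us the author with a single read; linking avoids duplicating the author in every post but costs a second lookup.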
Good resources on GUIs can be found . Also, assuming a mongod process is running on localhost then we can access some statistics at http://localhost:28017/ and http://localhost:28017/_status.

The software implements a protocol called GridFS that is used to store and retrieve files from the database. This file storage mechanism has been used in plug-ins for NGINX and lighttpd.
MongoDB uses memory-mapped files, which limits data size to 2GB on 32-bit machines (64-bit systems allow far larger data sizes).
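For readers who have not met memory-mapped files before, the sketch below shows the general idea in Java: a file region is mapped into the process address space and then read and written like memory. It is not how MongoDB itself is implemented (MongoDB is written in C++), and the class and method names are made up. The small addressableBytes helper shows the arithmetic behind the limit: 32-bit pointers can address 2^32 bytes (4 GiB) in total, and roughly speaking only part of that is left for mapped data once the operating system and the process take their share.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrates memory-mapped file access in general; unrelated to MongoDB internals. */
final class MappedFileSketch {

    /** Number of bytes addressable with pointers of the given width. */
    static long addressableBytes(int pointerBits) {
        return 1L << pointerBits;
    }

    /** Writes an int through one mapping of a temp file and reads it back through another. */
    static int roundTripOnTempFile(int value) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".dat");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping read/write beyond the current size grows the file as needed.
                MappedByteBuffer map =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
                map.putInt(0, value);
                map.force(); // flush the mapped region to the file
                MappedByteBuffer again =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, Integer.BYTES);
                return again.getInt(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripOnTempFile(42));
        System.out.println(addressableBytes(32) / (1024L * 1024L * 1024L) + " GiB");
    }
}
```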

Getting Started:
Download MongoDB from link:
Unzip the downloaded file and follow the instructions below.

By default MongoDB will store data in \data\db, but it won't automatically create that folder, so we do so here:
C:\> mkdir \data
C:\> mkdir \data\db

The important binaries for a first run are:
• mongod.exe - the database server
• mongo.exe - the administrative shell
To run the database, click mongod.exe in Explorer, or run it from a CMD window.
Start the administrative shell, either by double-clicking mongo.exe in Explorer, or from the CMD prompt. By default mongo.exe connects to a mongod server running on localhost and uses the database named test. Run mongo --help to see other options.
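If the shell cannot connect, a quick way to see whether anything is listening at all is a plain TCP check. The sketch below uses only the Java standard library, not the MongoDB driver. It assumes mongod listens on port 27017 on localhost (the port used in the Java sample in this post); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Checks whether something accepts TCP connections on a host and port. */
final class PortCheckSketch {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unresolved host: treat all as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // 27017 is assumed to be the port mongod listens on; adjust if yours differs.
        System.out.println("mongod reachable: " + isListening("localhost", 27017, 1000));
    }
}
```

A successful connection only tells us that some process is listening on that port, not that it is actually mongod.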

By now MongoDB should be running. You can connect to it using the following Java sample. Download the Java driver from link:

import java.util.Set;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

// Wrapper class (the name is chosen for this example) so that the sample compiles as-is
public class MongoQuickStart {
    public static void main(String[] args) throws Exception {
        // Connect to MongoDB running on localhost, using the default port
        Mongo m = new Mongo();

        // Two other ways to connect to MongoDB running on localhost
        // Mongo m = new Mongo( "localhost" );
        // Mongo m = new Mongo( "localhost" , 27017 );

        // At this point, the db object will have a connection to the MongoDB server for the specified database
        DB db = m.getDB("mydb");

        // Authentication (optional): if name and password are valid for the database, auth will be true
        // boolean auth = db.authenticate(myUserName, myPassword);

        // Getting a list of collections
        Set<String> colls = db.getCollectionNames();

        for (String s : colls) {
            System.out.println(s);
        }

        // Once you have this collection object, you can do things like insert data, query for data, etc.
        DBCollection coll = db.getCollection("testCollection");

        /*
         * Example to insert a document in the following format:
         * {
         *   "name" : "MongoDB",
         *   "type" : "database",
         *   "count" : 1,
         *   "info" : { x : 203, y : 102 }
         * }
         */
        BasicDBObject doc = new BasicDBObject();

        doc.put("name", "MongoDB");
        doc.put("type", "database");
        doc.put("count", 1);

        BasicDBObject info = new BasicDBObject();

        info.put("x", 203);
        info.put("y", 102);

        doc.put("info", info);

        // Store the document in the collection
        coll.insert(doc);

        // findOne() operation to get the first document in the collection
        DBObject myDoc = coll.findOne();
        System.out.println(myDoc);
    }
}

Advantages of using MongoDB:
MongoDB combines the functionality of key-value stores, which are fast and highly scalable, with that of traditional RDBMSs, which provide rich queries and deep functionality. It is designed for problems that are difficult to solve with traditional RDBMSs, for example databases spanning many servers.
The database is document-oriented, so it manages collections of JSON-like documents. Many applications can thus model data in a more natural way, as data can be nested in complex hierarchies and still be queryable and indexable.
Right now (October 2010) we can use MongoDB from at least C, C++, C# & .NET, ColdFusion, Erlang, Factor, Java, JavaScript, PHP, Python, Ruby, Perl. Of course, there might be more languages available in the future.
Scalability, fault tolerance and load balancing. Big words! Sounds scary, sounds hard, sounds expensive. All true, except when armed with MongoDB.
Ad hoc queries
In MongoDB, any field can be queried at any time. MongoDB supports range queries, regular expression searches, and other special types of queries in addition to exactly matching fields. Queries can also include user-defined JavaScript functions (if the function returns true, the document matches).
Queries can return specific fields of documents (instead of the entire document), and results can be sorted, skipped and limited.
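The building blocks just listed (range condition, regular expression, returning a single field, sorting, skipping, limiting) can be mimicked in memory with Java streams. The sketch below runs against a plain list of maps rather than a MongoDB server, so it only illustrates what such a query means; the class, method and field names are invented for the example.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** In-memory analogue of an ad hoc query; not MongoDB driver code. */
final class AdHocQuerySketch {

    /** Builds a user document with a name and an age (hypothetical fields). */
    static Map<String, Object> user(String name, int age) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("name", name);
        d.put("age", age);
        return d;
    }

    /**
     * Returns the names of users whose age lies in [minAge, maxAge] and whose
     * name matches nameRegex, sorted by age, after skipping and limiting.
     */
    static List<String> query(List<Map<String, Object>> users, int minAge, int maxAge,
                              String nameRegex, int skip, int limit) {
        Pattern pattern = Pattern.compile(nameRegex);
        return users.stream()
                .filter(u -> {                       // range condition
                    int age = (Integer) u.get("age");
                    return age >= minAge && age <= maxAge;
                })
                .filter(u -> pattern.matcher((String) u.get("name")).find()) // regex
                .sorted(Comparator.comparingInt(
                        (Map<String, Object> u) -> (Integer) u.get("age")))  // sort
                .skip(skip)
                .limit(limit)
                .map(u -> (String) u.get("name"))    // return one field only
                .collect(Collectors.toList());
    }
}
```

For example, query(users, 20, 40, "^b", 0, 10) asks for users aged 20 to 40 whose name starts with "b", youngest first.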
Querying nested fields
Queries can "reach into" embedded objects and arrays. If the following object is inserted into the users collection:
{
"username" : "bob",
"address" : {
"street" : "123 Main Street",
"city" : "Springfield",
"state" : "NY"
}
}

We can query for this document (and all documents with an address in New York) with:
> db.users.find({"address.state" : "NY"})
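What "reaching into" a document with a key like "address.state" amounts to can be sketched in a few lines of Java over nested maps. This is an in-memory illustration under our own naming (resolve, find, bob), not the way the server evaluates queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Resolves dotted field paths such as "address.state" in nested maps. */
final class DottedPathSketch {

    /** Walks the path one key at a time; returns null if any step is missing. */
    static Object resolve(Map<String, Object> document, String dottedPath) {
        Object current = document;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-document value
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Returns the documents whose value at the given path equals expected. */
    static List<Map<String, Object>> find(List<Map<String, Object>> collection,
                                          String dottedPath, Object expected) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> document : collection) {
            if (expected.equals(resolve(document, dottedPath))) {
                matches.add(document);
            }
        }
        return matches;
    }

    /** The sample user from the text above. */
    static Map<String, Object> bob() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "123 Main Street");
        address.put("city", "Springfield");
        address.put("state", "NY");
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("username", "bob");
        user.put("address", address);
        return user;
    }
}
```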
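Array elements can also be queried. For example, using a collection we will call food (the collection name is assumed here):
> db.food.insert({"fruit" : ["peach", "pear", "plum"]})
> db.food.find({"fruit" : "pear"})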
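The matching rule at work here is that a condition on an array field is satisfied if any element of the array equals the given value. A minimal in-memory sketch of that rule, with hypothetical class and method names:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Matches a value against a field that may hold a single value or an array. */
final class ArrayMatchSketch {

    /** True if the field equals value, or is a collection that contains value. */
    static boolean matches(Map<String, Object> document, String field, Object value) {
        Object actual = document.get(field);
        if (actual instanceof Collection) {
            return ((Collection<?>) actual).contains(value);
        }
        return value.equals(actual);
    }

    /** The sample document from the text above. */
    static Map<String, Object> fruitDocument() {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("fruit", Arrays.asList("peach", "pear", "plum"));
        return document;
    }

    public static void main(String[] args) {
        System.out.println(matches(fruitDocument(), "fruit", "pear"));
        System.out.println(matches(fruitDocument(), "fruit", "apple"));
    }
}
```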

When not to use MongoDB:
1. We need strict transactional behavior for every query/write (read: ACID), as is often required by applications in the financial or scientific domain. Note, however, that for ordinary use cases the level of ACID provided by MongoDB is by and large sufficient.
2. Our data is very relational. In this case one should just stick to one of the many RDBMSs (Relational Database Management Systems) out there.
3. Related to 2, we want to be able to do joins on the server (and cannot use embedded objects/arrays instead).
4. We need triggers on our tables (called collections in MongoDB parlance). Note: triggers might become available soon.
5. Related to 4, we rely on triggers (or similar functionality) to do cascading updates or deletes. As with #4, this issue probably goes away once triggers are available.
6. We need the database to enforce referential integrity (MongoDB has no notion of this at all).
7. We currently (October 2010) need 100% per-node durability. Major improvements are coming with version 1.8, however. Go here for more information.
8. We need a write-ahead log. MongoDB currently (October 2010) does not have one, simply because it does not need one: we believe true durability can only be achieved with a cluster setup and never with a single node (this is true for any DBMS). However, since there is great demand for improved per-node durability, the decision has been made to improve it with version 1.8 or sooner.
9. We need reporting of the kind Crystal Reports provides: dynamic aggregation with ad hoc queries. Such reporting systems (business intelligence) must be able to aggregate and apply mathematical expressions to multiple joined sets of data (e.g. across collections). This is something MongoDB cannot handle very well, or at all. Data warehousing, large graph representations (with efficient traversal) and many other types of data and analysis just do not fit well into the restrictions and choices MongoDB has made; unlike most of those, however, reporting is a generic need, and it is not well supported.
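To show what points 3, 6 and 9 mean in practice, here is a sketch of the work that moves into the application when the server neither joins nor enforces references: we join two collections by hand and aggregate the result. It runs on plain Java lists of maps, and the collections (customers, orders) and their fields are invented for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Application-side join and aggregation over two in-memory collections. */
final class ClientSideJoinSketch {

    /** Builds a document from alternating key/value arguments (hypothetical helper). */
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    /** Sums order amounts per customer city, joining orders to customers by _id. */
    static Map<String, Integer> totalByCity(List<Map<String, Object>> customers,
                                            List<Map<String, Object>> orders) {
        // Step 1: index the customers by _id (the "join" side).
        Map<Object, String> cityById = new HashMap<>();
        for (Map<String, Object> customer : customers) {
            cityById.put(customer.get("_id"), (String) customer.get("city"));
        }
        // Step 2: walk the orders, follow each link, and aggregate.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Object> order : orders) {
            String city = cityById.get(order.get("customer_id"));
            if (city == null) {
                continue; // dangling reference: nothing enforced referential integrity
            }
            totals.merge(city, (Integer) order.get("amount"), Integer::sum);
        }
        return totals;
    }
}
```

An RDBMS would express this as a single JOIN with GROUP BY; here both steps, and the handling of dangling references, are our own responsibility.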

Useful tips:
1. Use the 64-bit version. The 32-bit version has a limit of about 2.5GB of stored data. That is probably enough for playing around, but when you start configuring your production (or staging) system, remember to choose the 64-bit flavor: you can't just "fix" that later on, you'll have to reinstall everything.
2. The _id and _ns elements are added to your document automatically by MongoDB. Remember, MongoDB reserves element names that start with _ for internal use.
3. "Ctrl+C" is a valid way to shut down MongoDB.

Issues faced:
1. Exception in initAndListen std::exception: dbpath (/data/db/) does not exist, terminating
Solution: On Windows I could start it only by passing the exact dbpath argument:
C:\> mongod.exe --dbpath c:\data\db

