A video showcase of our project can be found here.
We are all familiar with the word SQL and the terminology related to it. Lately, a new concept has surfaced named NoSQL.
Databases under this classification provide a data storage and retrieval mechanism modeled differently from the traditional tabular approach. NoSQL covers several types of database engines, including column databases, key/value stores, graph databases and document databases. Memcached and Redis are good examples of key/value stores, where data is manipulated by means of keys. Column databases are praised for their capacity to process large amounts of data, the most popular being Cassandra and HBase (the column store of the Hadoop ecosystem). Relations between entities are modeled by graph databases such as Neo4j. Document databases are represented by MongoDB and CouchDB.
MongoDB Core Concepts
MongoDB is schemaless. The smallest unit in a document database is the document. Documents are similar, up to a point, to rows in a SQL table. While a row must adhere to the structure of its table, a document in MongoDB can have a dynamic structure (i.e. two documents can have different fields, and the same field can have different data types). Another important feature: a document can contain arrays and/or sub-documents. MongoDB groups documents into collections, and multiple collections make up a database.
MongoDB stores structured data in JSON-like documents. Internally, MongoDB stores its data in BSON, or “Binary JSON”, format. Why BSON? One of the reasons is that it is fast to scan. Since JSON is just a string, finding a specific key means scanning every single character in that string, keeping track of the level of nesting, until you find that key. That can be a lot of data to scan. BSON, however, stores the length of values. To find a specific key you can therefore skip past the values you are not interested in and read the next key.
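To make the length-prefix idea concrete, here is a minimal sketch that encodes a small subset of BSON (strings, embedded documents and 32-bit integers only) and then looks up a top-level key by jumping over values instead of scanning them. The function names `encodeDocument` and `findTopLevelKey` and the sample data are made up for this illustration; this is not how MongoDB or any driver is implemented.

```php
<?php
// Illustrative subset of BSON: string (0x02), embedded document (0x03), int32 (0x10).
// encodeDocument() and findTopLevelKey() are hypothetical helper names.

function encodeDocument(array $doc): string
{
    $body = '';
    foreach ($doc as $key => $value) {
        if (is_array($value)) {
            // Embedded document: it carries its own total length up front.
            $body .= "\x03" . $key . "\x00" . encodeDocument($value);
        } elseif (is_int($value)) {
            $body .= "\x10" . $key . "\x00" . pack('V', $value);
        } else {
            $value = (string) $value;
            // String: the length prefix counts the bytes plus the trailing NUL.
            $body .= "\x02" . $key . "\x00" . pack('V', strlen($value) + 1) . $value . "\x00";
        }
    }
    // Total size = 4 size bytes + elements + terminating NUL.
    return pack('V', strlen($body) + 5) . $body . "\x00";
}

function findTopLevelKey(string $bson, string $wanted): ?array
{
    $end = unpack('V', $bson, 0)[1] - 1; // offset of the terminating NUL
    $pos = 4;                            // first element starts after the size
    while ($pos < $end) {
        $type = ord($bson[$pos]);
        $keyEnd = strpos($bson, "\x00", $pos + 1);
        $key = substr($bson, $pos + 1, $keyEnd - $pos - 1);
        $valueStart = $keyEnd + 1;
        switch ($type) {
            case 0x02: // 4-byte length prefix + string bytes (incl. NUL)
                $valueLength = 4 + unpack('V', $bson, $valueStart)[1];
                break;
            case 0x03: // embedded document length includes its own prefix
                $valueLength = unpack('V', $bson, $valueStart)[1];
                break;
            case 0x10: // int32 is always 4 bytes
                $valueLength = 4;
                break;
            default:
                throw new InvalidArgumentException("Unsupported type in this sketch: $type");
        }
        if ($key === $wanted) {
            return ['type' => $type, 'offset' => $valueStart, 'length' => $valueLength];
        }
        // Skip the whole value in one jump, however large it is.
        $pos = $valueStart + $valueLength;
    }
    return null;
}

$bson = encodeDocument([
    'profile' => ['bio' => str_repeat('x', 1000), 'city' => 'Cluj'],
    'name'    => 'Ana',
    'age'     => 30,
]);
$hit = findTopLevelKey($bson, 'name');
// Strip the 4-byte length prefix and the trailing NUL to get the string itself.
echo substr($bson, $hit['offset'] + 4, $hit['length'] - 5), PHP_EOL; // prints "Ana"
```

The lookup of `name` never reads the roughly 1 KB `profile` sub-document: it reads its length and jumps past it, which is the property the paragraph above describes.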
MongoDB takes a different approach to functionality and scaling than relational databases. MongoDB offers horizontal scalability and high availability out of the box through sharding and replication. But due to the way these features operate, joins and transactions are not feasible. Atomicity in MongoDB is only obtained at the level of a single document. At first sight, not allowing operations between collections (joins) may seem like a major drawback. However, with the support of arrays and sub-documents, related data can be embedded in a single document, which covers many of the tasks that would require a join in a relational database.
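As a small illustration of embedding instead of joining, the sketch below models a blog post whose author and comments live inside the post document itself. The field names and sample values are invented for this example; note that the two comments do not share the same set of fields, which a document database allows.

```php
<?php
// Hypothetical sample document: a post with an embedded author and comments.
$post = [
    '_id'      => 1,
    'title'    => 'Replica sets in practice',
    'author'   => ['name' => 'Ana', 'email' => 'ana@example.com'],
    'comments' => [
        ['user' => 'dan', 'text' => 'Nice write-up'],
        // Same array, different shape: this comment has an extra field.
        ['user' => 'ioana', 'text' => 'What about arbiters?', 'likes' => 3],
    ],
];

// In a relational schema this would be a join between posts and comments;
// here the data is already part of the document.
function commentAuthors(array $post): array
{
    return array_column($post['comments'] ?? [], 'user');
}

echo json_encode($post, JSON_PRETTY_PRINT), PHP_EOL;
echo implode(', ', commentAuthors($post)), PHP_EOL;
```

Because the whole post is one document, updating it (for example appending a comment) stays within the single-document atomicity mentioned above.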
MongoDB replication
Replication offers a wide variety of benefits, the most notable being high availability and redundancy. With multiple copies of the data spread across multiple machines, we can ensure that our application remains available in the event of a hardware failure. Second, read performance can be greatly improved once replication is used to divide data reads across multiple machines. Data copies can also be maintained in different data centers to boost locality and availability for distributed applications.
MongoDB handles replication in a way similar to the master-slave configuration concept. A group of mongod instances hosting the same data set forms a replica set. One mongod, the primary, is used as the base for applying changes to the secondary members. There are multiple types of replica set members.
Primary member: The primary handles all write operations from clients. All changes to the primary's data set are recorded in its oplog. A replica set can have only one primary.
Secondary members: Multiple secondary members may exist in a replica set. Secondary members replicate the primary’s oplog and apply the changes to their own data sets. By default, applications query the primary for read operations; however, you can configure clients to send read operations to secondaries. If the primary is unavailable, one of the secondary members will be elected as the new primary.
Arbiter: An arbiter is an optional member that does not hold a copy of the data set. An arbiter's sole purpose is to act as a tie-breaker in elections. Replica sets can therefore have an odd number of voting members without the overhead of an extra member that replicates data.
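The tie-breaker role follows from simple arithmetic: a primary can only be elected with the votes of a majority of the voting members. The sketch below works through that arithmetic; the function names are made up for this illustration and are not part of MongoDB or Mongo-Board.

```php
<?php
// Hypothetical helpers illustrating the majority rule used in elections.

function votesNeededForMajority(int $votingMembers): int
{
    return intdiv($votingMembers, 2) + 1;
}

function canElectPrimary(int $votingMembers, int $reachableMembers): bool
{
    return $reachableMembers >= votesNeededForMajority($votingMembers);
}

// Two data-bearing members, one goes down: the survivor has 1 of 2 votes.
var_dump(canElectPrimary(2, 1)); // bool(false)

// Same two data-bearing members plus an arbiter: 2 of 3 votes remain.
var_dump(canElectPrimary(3, 2)); // bool(true)

// How many members can fail while a primary can still be elected.
foreach ([3, 4, 5] as $n) {
    echo $n, ' voting members tolerate ', $n - votesNeededForMajority($n), " failure(s)\n";
}
```

Going from three to four voting members does not increase the number of failures the set can tolerate, which is why an arbiter is a cheap way to reach an odd count.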
Our project
The pace of change in technology is frantic with software becoming more and more complex, constantly offering new ways to tackle different problems.
While we put a strong emphasis on delivering, we have also adopted a learning culture, encouraging developers to learn new technologies and to improve their own skills.
We research each technology rigorously to assess whether to include it in our technology stack, and we do this every time we find something of interest. This is how we stay on top of our game. Depending on the technology evaluated, we use a framework of standard tests to provide a baseline for comparing different solutions, and craft individual tests for special features. When analyzing the results, we try to deduce the best use case for that particular technology and how it would provide value to our clients.
When designing scalable, reliable systems one has to consider ways to manage the resulting infrastructure. There are some great tools out there that provide powerful automation. MongoDB documentation does a great job at describing all administration procedures involved in replication and sharding, but there is a lack of management platforms that allow central administration of these features or tools to support live testing of various scenarios/operations.
We have built high-performance systems at scale using MongoDB. It is paramount to plan for infrastructure-wide monitoring and to understand the importance of automation when devising an application architecture. Building our custom monitoring solution for MongoDB did not pose any great challenges, which left us with the problem of automation. We wanted to see what building a MongoDB replica set and sharding management platform would entail. And this is how we started a MongoDB replica set manager project called Mongo-Board!
Mongo-Board overview
Mongo-Board is a project that provides a graphical user interface for MongoDB replica set management and core process operations. Mongo-Board allows users to connect to multiple remote servers, start and stop mongod instances, deploy them into a replica set and perform member configuration and maintenance operations. Users can view a log of all the commands executed on each instance or server to fulfill the requested action. This makes the project a great asset for testing various replica set use cases in a real environment.
Mongo-Board is built upon a standalone core library that provides the actual automation of MongoDB replica set administrative operations. The Mongo-Board UI is an educational tool that showcases how the underlying library can be used.
Mongo-Board is an entirely PHP-based project.
Mongo-Board core library
Below I will provide a basic summary of the API of Mongo-Board's underlying library and its components.
SSH
Many replica set operations involve starting or restarting a member's core database process, mongod. We therefore needed a small MongoDB process management library. The first step was to create a wrapper component over SSH that lets us connect to the remote servers where the mongod processes will reside.
Mongo-Board uses the PHP SSH2 extension to accomplish this task. Several authentication handlers are available, such as Password or PublicKeyFile.
For brevity, the PHP opening tags are omitted from the library examples below. This is how you can connect to a remote server using password authentication.
$authentication = array(
    "type"     => "password",
    "username" => $username,
    "password" => $password
);
$ssh = MongoBoard\SSH\SSHFactory::getSSH2Instance($host, $port, $authentication);
Process manager
We have process wrappers over mongod, mongodump and mongorestore. Currently these are only supported on Linux.
// mongod process wrapper operations
$processOptions = array(
    'dbpath'  => $fullMongodDbPath,
    'port'    => $uniquePort,
    'fork'    => '',
    'logpath' => $logPath
);
$mongodProcess1 = new MongodProcessManager($ssh, $processOptions);
$mongodProcess1->startProcess();
$mongodProcess1->stopProcess();
$mongodProcess1->restartProcess();
$mongodProcess2 = new MongodProcessManager($ssh, array("config" => "/etc/mongodb/rs-test/sv2.conf", "port" => 27026));
$mongodProcess3 = new MongodProcessManager($ssh, array("config" => "/etc/mongodb/rs-test/sv3.conf", "port" => 27027));
$mongodProcess4 = new MongodProcessManager($ssh, array("config" => "/etc/mongodb/rs-test/sv4.conf", "port" => 27028));
// a process group manager also exists
$mongodGroupManager = new ProcessGroupManager();
$mongodGroupManager->addProcessManager($mongodProcess1);
$mongodGroupManager->addProcessManager($mongodProcess2);
$mongodGroupManager->addProcessManager($mongodProcess3);
$mongodGroupManager->startProcess();
Member configuration
// connection to the primary, used to get a replica set manager instance when the replica set is already deployed
$mongoConnectionToPrimary = null;
$replSetInstance = new MongoBoard\ReplicaSetManager\ReplicaSetManager($mongoConnectionToPrimary, $replSetName);
// $mongodGroupManager is a mongod ProcessGroupManager;
// this starts the mongod processes accordingly and deploys the replica set
$replSetInstance->deployReplicaSet($mongodGroupManager, $replSetName);
// add a new member to an existing replica set
$replSetInstance->addMember($mongodProcess4);
// add a new arbiter to an existing replica set
$replSetInstance->addArbiter($mongodProcess4);
// remove a member from the replica set
$replSetInstance->removeMember("192.168.0.123:27018");
Mongo Autodiscovery
The core library contains a mongo auto-discovery component.
$mongoDiscovery = new MongoBoard\MongoDiscovery\MongoSrvDiscovery($ssh);
// will contain data about each mongod instance residing on the server
$data = $mongoDiscovery->getMongodSrvOverview();
// the auto-discovery component provides various helpers that let a factory create a replica set instance
$replicaSetInstance = SomeReplicaSetFactory::getMemberReplicaSetManagerInstance($host, $mongoPort);
Adjust Priority for Replica Set Member
// change member’s priority value to 2
// will cause an election
$replicaSetInstance->setMemberConfOption("192.168.0.123:27018", "priority", 2);
// prevent secondary from becoming primary by assigning priority 0
$replicaSetInstance->setMemberConfOption("192.168.0.123:27019", "priority", 0);
Configure a Hidden Replica Set Member
// change member’s priority and hidden option
$options = array(
    "priority" => 0,
    "hidden"   => true
);
$replicaSetInstance->setMemberConfOptions("192.168.0.123:27018", $options);
Configure a Delayed Replica Set Member
// hide the member, keep it from becoming primary and delay replication by 3600 seconds (one hour)
$options = array(
    "priority"   => 0,
    "hidden"     => true,
    "slaveDelay" => 3600,
);
$replicaSetInstance->setMemberConfOptions("192.168.0.123:27018", $options);
Configure a Non-Voting Replica Set Member
// set member votes value to 0
$replicaSetInstance->setMemberConfOption("192.168.0.123:27018", "votes", 0);
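The options used in these member configuration examples (priority, hidden, slaveDelay, votes) are fields of a member entry in the `members` array of the replica set configuration document; newer MongoDB releases name the delay field `secondaryDelaySecs`. The sketch below shows, on a plain PHP array, what changing such options amounts to. `withMemberOptions` is a hypothetical helper written for this illustration; it does not show how Mongo-Board implements `setMemberConfOptions`, and it does not talk to a server.

```php
<?php
// Hypothetical helper: returns a copy of a replica set config array with new
// options applied to one member. A real reconfiguration would then submit the
// result to the replica set.

function withMemberOptions(array $config, string $host, array $options): array
{
    $found = false;
    foreach ($config['members'] as $i => $member) {
        if ($member['host'] !== $host) {
            continue;
        }
        $updated = array_merge($member, $options);
        // Hidden and delayed members must not be eligible to become primary.
        $priority = $updated['priority'] ?? 1;
        $needsZeroPriority = !empty($updated['hidden']) || ($updated['slaveDelay'] ?? 0) > 0;
        if ($needsZeroPriority && $priority != 0) {
            throw new InvalidArgumentException("Hidden or delayed member $host must have priority 0");
        }
        $config['members'][$i] = $updated;
        $found = true;
    }
    if (!$found) {
        throw new InvalidArgumentException("No member with host $host");
    }
    // A reconfiguration is submitted with a higher version than the current one.
    $config['version']++;
    return $config;
}

// Sample configuration, invented for this example.
$config = [
    '_id'     => 'rs-test',
    'version' => 1,
    'members' => [
        ['_id' => 0, 'host' => '192.168.0.123:27017'],
        ['_id' => 1, 'host' => '192.168.0.123:27018'],
        ['_id' => 2, 'host' => '192.168.0.123:27019'],
    ],
];

$newConfig = withMemberOptions($config, '192.168.0.123:27018', ['priority' => 0, 'hidden' => true]);
print_r($newConfig['members'][1]);
```

Returning a modified copy instead of mutating the input keeps the previous configuration around, which is convenient if the change has to be reverted.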
Convert a Secondary to an Arbiter
// convert a secondary member into an arbiter
$replicaSetInstance->convertSecToArb("192.168.0.123:27018");
Replica set maintenance
Change Oplog Size
// oplog size for a replica set member
$replicaSetInstance->changeOplogSizeOps("192.168.0.123:27018", $sizeInBytes);
// oplog size for a standalone mongod
$mongodProcess->setOption("oplogSize", $sizeInBytes);
// restart for the change to take effect
$mongodProcess->restartProcess();
Resync a Member
$replicaSetInstance->autoResynchStaleMember("192.168.0.123:27018");
Force a member to become primary
// force member to become primary by setting its priority high
$replicaSetInstance->forceMemberToPrimaryPriority("192.168.0.123:27018");
// force member to become primary using database commands
$replicaSetInstance->forceMemberToPrimaryDbCmd("192.168.0.123:27018");
Configure a Secondary’s Sync Target
$replicaSetInstance->setSyncFrom($server, $targetServer);
Configure Replica Set Tag Sets
// set the tags of a replica set member
$replicaSetInstance->setReplicaMemberTags($server, $tags);
Manage Chained Replication
// allow secondaries to replicate from other secondaries
$replicaSetInstance->allowChainingReplication();
// disable chained replication
$replicaSetInstance->disableChainingReplication();
Change Hostnames
$replicaSetInstance->changeHostname($oldHostname, $newMongodInstance);
Reconfigure replica set with unavailable members
$replicaSetInstance->reconfReplSetCaseUnav();
Troubleshoot Replica Set
// get replica set member status
$replicaSetInstance->getReplicaMemberStatus($hostPort);
// get replication info
$replicaSetInstance->getReplicationInfo();
// get replica set members optime
$replicaSetInstance->getReplicaMembersOptime();
// get replication lag for a member
$replicaSetInstance->getReplicationLag($hostPort);
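Replication lag is commonly understood as the difference between the time of the last operation applied on the primary and the last one applied on a given member. The sketch below computes it from an array shaped like the `members` list of a replica set status report. The field names `name`, `stateStr` and `optimeDate` follow that report, but the helper `replicationLagSeconds`, the use of plain Unix timestamps and the sample values are assumptions made for this illustration, not Mongo-Board's implementation of `getReplicationLag`.

```php
<?php
// Hypothetical helper: lag of one member relative to the primary, in seconds.
// Returns null when the primary or the member cannot be found.

function replicationLagSeconds(array $members, string $hostPort): ?int
{
    $primaryOptime = null;
    $memberOptime = null;
    foreach ($members as $member) {
        if ($member['stateStr'] === 'PRIMARY') {
            $primaryOptime = $member['optimeDate'];
        }
        if ($member['name'] === $hostPort) {
            $memberOptime = $member['optimeDate'];
        }
    }
    if ($primaryOptime === null || $memberOptime === null) {
        return null;
    }
    // A member cannot be ahead of the primary; clamp clock noise to zero.
    return max(0, $primaryOptime - $memberOptime);
}

// Sample status data, invented for this example (optimeDate as Unix timestamps).
$members = [
    ['name' => '192.168.0.123:27017', 'stateStr' => 'PRIMARY',   'optimeDate' => 1700000120],
    ['name' => '192.168.0.123:27018', 'stateStr' => 'SECONDARY', 'optimeDate' => 1700000075],
    ['name' => '192.168.0.123:27019', 'stateStr' => 'SECONDARY', 'optimeDate' => 1700000120],
];

echo replicationLagSeconds($members, '192.168.0.123:27018'), PHP_EOL; // prints 45
```

A member whose lag keeps growing is a candidate for the resync operation described earlier.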