In our last activity, you completed a MongoDB bulk load. Now that your MongoDB database has some content, we will work through some exercises that use this set of data.
A Quick Introduction to JSON
JSON is a structured way to store information, as opposed to free-format HTML text. A well-known predecessor of JSON is XML. JSON has been gaining popularity because more and more information on the internet, such as documents, benefits from a structured organization without being restricted to the relations and tables found in SQL-based databases.
A JSON object is a list of key-value pairs, much like a Python dictionary. Here is an example of a single JSON object.
Figure 1: A single JSON object
This JSON object has four key-value pairs, with the keys address, borough, cuisine, and name. In the address field, the value corresponding to the key is itself a JSON object that contains three key-value pairs: building, street, and zipcode. A JSON key can be any valid string enclosed in a pair of double quotes. Keys can even contain white space, but to make reading and programming a bit easier, it is best to use underscores instead. A JSON value can be one of six data types [4]:
- strings;
- numbers;
- objects;
- arrays;
- Boolean (true/false);
- null
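To make the comparison with a Python dictionary concrete, here is a minimal sketch that parses one JSON object with Python's standard json module and reports the Python type that each JSON value type maps to. The object reuses the restaurant entry that appears later in this activity; the keys is_open and phone are hypothetical and were added only so that the Boolean and null types appear.

```python
import json

# One JSON object as text. "is_open" and "phone" are hypothetical keys,
# added only so that all six value types are represented.
text = """
{
  "address": {
    "building": "921",
    "coord": [-73.9691347, 40.6389857],
    "street": "Cortelyou Rd",
    "zipcode": "11218"
  },
  "borough": "Brooklyn",
  "cuisine": "Other",
  "name": "Cold Press'D",
  "is_open": true,
  "phone": null
}
"""

# json.loads turns the JSON text into nested Python objects.
restaurant = json.loads(text)

# Record the Python type that each JSON value type was converted to.
value_types = {
    "string": type(restaurant["name"]).__name__,
    "number": type(restaurant["address"]["coord"][0]).__name__,
    "object": type(restaurant["address"]).__name__,
    "array": type(restaurant["address"]["coord"]).__name__,
    "boolean": type(restaurant["is_open"]).__name__,
    "null": type(restaurant["phone"]).__name__,
}

for json_type, python_type in value_types.items():
    print(f"JSON {json_type:8} -> Python {python_type}")

# A nested object is reached one key at a time.
print(restaurant["address"]["street"])
```

Because the nested address value becomes an ordinary dictionary, it is read with a second pair of square brackets, exactly as with any nested Python dictionary.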
A JSON file can contain a list of JSON objects, very much like a Python list: the individual JSON objects are separated by commas and enclosed in a pair of square brackets, '[' and ']'. Here is an example of a list of three JSON objects.
Figure 2: A list of JSON objects
In the JSON file "small-dataset.json" you loaded in our last activity, for example, there are three JSON objects, each of which contains a set of information, similar to the one shown in Figure 2.
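The list form can be sketched in Python as well. The three objects below are hypothetical stand-ins, not the actual content of "small-dataset.json"; the point is only that a JSON array of objects becomes a Python list of dictionaries.

```python
import json

# Three hypothetical restaurant objects inside one JSON array.
text = """
[
  {"name": "Example Bakery", "borough": "Manhattan", "cuisine": "Bakery"},
  {"name": "Example Diner", "borough": "Brooklyn", "cuisine": "American"},
  {"name": "Example Cafe", "borough": "Queens", "cuisine": "Other"}
]
"""

# A JSON array becomes a Python list; each element is a dictionary.
restaurants = json.loads(text)

print(type(restaurants).__name__, len(restaurants))
for entry in restaurants:
    print(entry["name"], "-", entry["borough"])
```

For a file on disk, json.load(file_object) performs the same conversion, assuming the file holds a single JSON array as described above.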
Read and Display JSON Objects
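As a small illustration of what this section is about, the sketch below prints the same JSON object twice: once with the built-in print, which puts everything on one long line, and once with pprint from Python's standard library, which breaks the nested structure across lines. This is not the course's read_json.py program, only a minimal stand-in; the object is the restaurant entry used later in this activity, and the width of 60 characters is an arbitrary choice.

```python
import json
from pprint import pformat, pprint

# The restaurant entry used later in this activity, as JSON text.
raw = """{"address": {"building": "921", "coord": [-73.9691347, 40.6389857],
"street": "Cortelyou Rd", "zipcode": "11218"}, "borough": "Brooklyn",
"cuisine": "Other", "grades": [], "name": "Cold Press'D",
"restaurant_id": "50018995"}"""

entry = json.loads(raw)

# Plain print: the whole dictionary on a single long line.
print(entry)

# pprint: the same dictionary wrapped to at most 60 characters per line.
pprint(entry, width=60)

# pformat returns the pretty-printed text as a string instead of printing it.
formatted = pformat(entry, width=60)
```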
One advantage of using a NoSQL database, and of representing its information as objects such as JSON objects, is that the content of the database is easier for us to read. On the other hand, information represented as text can sometimes be hard to format properly for reading. If you examine the content of "small-dataset.json" using a text editor, you will get a sense of this issue. Python has a standard library module called pprint (for pretty-printing) that displays JSON objects (or any nested list-like objects) in a readable layout. Download, read, and run this example Python program to see how pretty-printing works.
Examine The Contents of MongoDB
You bulk-loaded a number of addresses into your MongoDB database in our last exercise. Now you will go into MongoDB and run some queries to examine its content. Refer to this list of commands to accomplish the following queries.
You first need to log into your MongoDB using the command line interface.
mongo --host eg-mongodb.bucknell.edu -u username -p --authenticationDatabase database database
Then try the following queries.
- List all collections (you should have only one, Address, at this point).
- Print all information in the Address collection.
- Find all restaurants that are in Manhattan.
- Find all restaurants that are bakeries.
- Insert a new entry into the collection Address with the value of
{"address": {"building": "921", "coord": [-73.9691347, 40.6389857], "street": "Cortelyou Rd", "zipcode": "11218"}, "borough": "Brooklyn", "cuisine": "Other", "grades": [], "name": "Cold Press'D", "restaurant_id": "50018995"}
Revisit Python Interface of MongoDB
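The kinds of queries listed in the previous section can also be issued from Python through pymongo. The following is a minimal sketch under several assumptions, not the course's pymongo-load-db.py: the helper names build_uri and run_queries are hypothetical, the authSource option is assumed to play the role of --authenticationDatabase in the shell command, and the filters use values from the new entry above (Brooklyn, Other, 11218) rather than the values the exercise asks for. The part that talks to the server is kept inside a function so that the rest can be run without pymongo or a database connection.

```python
from urllib.parse import quote_plus


def build_uri(username, password, host, database):
    """Return a MongoDB connection URI that authenticates against `database`."""
    # Percent-escape the credentials so characters such as '@' or ':' are safe.
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/{database}?authSource={database}"
    )


# A query filter is a plain dictionary: each key names a field and each
# value is what that field must equal.
borough_filter = {"borough": "Brooklyn"}
cuisine_filter = {"cuisine": "Other"}
# Dot notation reaches a field inside a nested object.
zipcode_filter = {"address.zipcode": "11218"}


def run_queries(uri, database):
    """Connect and run the filters; needs pymongo and a reachable server."""
    from pprint import pprint
    from pymongo import MongoClient  # imported here so the sketch loads without it

    client = MongoClient(uri)
    db = client[database]

    print(db.list_collection_names())   # list all collections
    address = db["Address"]
    for document in address.find(borough_filter):   # all matching documents
        pprint(document)
    print(address.count_documents(cuisine_filter))  # how many documents match
    client.close()


# Placeholder credentials, shown only to illustrate the shape of the URI.
print(build_uri("username", "p@ss word", "eg-mongodb.bucknell.edu", "database"))
```

Calling run_queries(build_uri(...), "database") with your own username, password, and database name would run the queries; inserting a document works the same way with address.insert_one(document).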
In our last exercise, we used a Python program to bulk-load a collection of information, Address, into MongoDB. You are now asked to create a second collection using any data of interest to you. As a default, you can use the set of book data from our earlier exercise. Here is the books.csv we used in HW02. You may choose to include just a few books in your new collection.
- Open the books.csv file and select a few books to add to the collection.
- Convert the selected books into JSON format. You may use any mechanism to complete the conversion. This website does a good job of converting bulk data.
- Use the pymongo-load-db.py program to insert the books into a new collection called Books. Note that you must revise the Python program so that it inserts into the new Books collection rather than the Address collection.
Once the new collection Books is in the database, try the following queries.
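One possible way to do the conversion step yourself is sketched below with Python's standard csv and json modules. It is only an illustration: the column names are taken from the sample book entry in this activity, the header order of the real books.csv may differ, the helper name rows_to_documents is hypothetical, and an in-memory string stands in for the file so the sketch runs on its own.

```python
import csv
import io
import json

# Stand-in for books.csv. The column names come from the sample entry in
# this activity; the real file's header order may differ.
sample_csv = io.StringIO(
    "Title,Author First Name,Author Last Name,Publisher,Pub Year,"
    "Translated or edited by\n"
    "The Female Persuasion,Meg,Wolitzer,Riverhead Books,2018,\n"
)


def rows_to_documents(csv_file):
    """Convert CSV rows into a list of dictionaries, one per book."""
    documents = []
    for row in csv.DictReader(csv_file):
        document = dict(row)
        # CSV values are always text; store the year as a number instead.
        document["Pub Year"] = int(document["Pub Year"])
        documents.append(document)
    return documents


books = rows_to_documents(sample_csv)

# json.dumps shows the result as JSON text, a list of objects.
print(json.dumps(books, indent=2))
```

With the real file, open("books.csv", newline="", encoding="utf-8") would replace the in-memory string. In pymongo, a list of dictionaries like books can be passed to insert_many on the Books collection.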
- List all collections. Now you should have two collections.
- Print all information in the Books collection.
- Find all the books published in 2015.
- Find all the books published by Viking.
- Insert a new entry into the collection Books with the value of
{'Author First Name': 'Meg', 'Author Last Name': 'Wolitzer', 'Pub Year': 2018, 'Publisher': 'Riverhead Books', 'Title': 'The Female Persuasion', 'Translated or edited by': ''}
Submission
Submit screen captures of the two sets of exercises along with the modified Python program that bulk-loads the Books collection.
References
- AskLIT. Accessed 2018-04-18.
- MongoDB tutorial by TutorialPoint. Accessed 2018-04-18.
- MongoDB commonly-used command list
- An Introduction to JSON by DigitalOcean. Accessed 2018-04-19.
- A full list of MongoDB commands by MongoDB. Accessed 2018-04-20.
- read_json.py A Python example program to read and print JSON objects.