A hands-on tutorial for real-time analytics on MongoDB using Rockset

Share

NoSQL databases, like MongoDB, scale more easily, are adept at handling less structured data and can simplify development in many respects. There are also a number of reasons why you may need a real-time analytics system to complement your NoSQL database:

  • Better performance on analytical queries
  • More complex analytics, including SQL aggregations and joins
  • Tap into the ecosystem of SQL users and SQL tools
  • Protect the performance of your primary database by moving analytics to a separate system

In this tutorial, we will create a sample app to analyze weather and pollution data, following these steps:

  1. Connect Rockset to MongoDB Atlas
  2. Continuously write weather and air quality data into MongoDB
  3. Formulate a query in Rockset on the real-time data coming from MongoDB
  4. Create an API based on the query we wrote
  5. Execute the API from our app

Some basics:

SEE ALSO: Schema Performance Tuning in MongoDB

Setting Up MongoDB

  1. After you’ve created an account on MongoDB Atlas, navigate to Clusters and click on Collections then click on Create database. We are going to create a weather_pollution_db and it’s going to have a weather_data collection.
  2. Under the weather_pollution_db, there is going to be a plus sign. Click on the plus sign, and add another collection, air_pollution_data.
  3. Navigate to Database Access → Custom Roles and click on Add New Custom Role named rockset-role. Add the following Actions or roles to the newly created rockset-role: find, changeStream, collStats, and enter the names of the database and collections we have just created.
  4. Navigate to the Database Users tab and click on Add New Database User. Create a new user named rockset-user and attach rockset-role to it. Remember the password here, because we will need to use it again, when we create an integration on Rockset.
  5. Create another database user that has Atlas admin privileges. We will be using this database user in our Python app. You can name this user yourName-admin. Make sure you remember the password here, because we will use this in our Python app.
  6. Navigate to the Network Access click on Add IP Address and whitelist these IPs:
    • 50.112.53.111/32
    • 35.160.119.116/32
    • 54.214.107.166/32
    • Your current IP address that you will be working from. Mine is not shown below, but make sure you add yours. This will be used for our Python app that we will build shortly.
  7. Go back to Clusters and click on Connect and click on Connect your application. Copy the string, because we will use it in Rockset. When we are in our Rockset account, the username is rockset-user and the password is the password you used when you created rockset-user. In our Python app, the username will be yourName-admin and the password associated with that database user.

Building Our Python App

  1. Create a project folder, and in that project, create a file .env.
  2. In the .env file add this:
    • Mongo_URI=“<uri string>”
    • The “<uri string>” is your connection string from MongoDB. Make sure you replace the username and password in the connection string with yourName-admin and the password you used when you created that database user.
    • It should look something like this:
      • mongodb://nadine-role-admin:password….
  3. If you use a virtualenv go ahead activate a env for this project. I personally use Pyenv, but feel free to use whatever you want!
  4. Install python-dotenv:
    • $ pip install python-dotenv
  5. Install pymongo and dnspython:
    • $ pip install pymongo
    • ○ $ pip install dnspython==1.16.0
  6. Inside our project folder, go ahead and create settings.py This file should look like this: https://gist.github.com/nfarah86/f87a9d37f1f72bb2d4a73d9b73dc87b4#file-settings-py
  7. Create another file in the project folder called mongo_config.py. It should look like this: https://gist.github.com/nfarah86/1fc7bc9987d27edbec0fa9b32be95163#file-mongo_config-py
  8. In the project folder, go ahead and create and run a file called script.py. All we’re going to do is make sure our Python app is connecting to MongoDB: https://gist.github.com/nfarah86/4d8e87ff6e70e1da1c017e80b8daeef2#file-script-py
  9. Under Clusters, click on the collections button. Go to weather_pollution_db and click on weather_data. You should see the entry you inserted.
  10. Now that we know we can insert data into MongoDB, let’s go ahead and create a ClimaCell developer account and get an API KEY.
    • Go ahead and put your API_KEY in the .env file:
      https://gist.github.com/nfarah86/118155308a4adae8a5ca037ffd99e7c4#file-env
  11. In settings.py go ahead and add this:
    • ○ CLIMACELL_API_KEY = os.environ.get('CLIMACELL_API_KEY')
  12. I chose ClimaCell because they offer realtime data for weather and air pollution. We’re going to work with this API. They have different parameters that can be added to the request. You can explore those here.
  13. In our project folder go ahead and pip install a few libraries:
    • $ pip install requests
    • $ pip install timeloop
  14. In script.py go ahead modify the packages we’re going to use: https://gist.github.com/nfarah86/a49cbaa033239c636ef4f3bbe1dca2d0#file-script-py
    • Timeloop a library that can run jobs at designated intervals
  15. Keep insert_to_mongo() and add this function in script.py to get the weather data: https://gist.github.com/nfarah86/d2e3cc9236547e2fa630fd368dfee994#file-script-py
    • That lat and lon correspond to Beijing.
  16. Now, we’re going to add this function to get the air quality: https://gist.github.com/nfarah86/c598dbea0274d43215f15c9f01eca672#file-script-py
  17. We’ll modify insert_to_mongo() to look like this: https://gist.github.com/nfarah86/e43f4ad2d8f7e3ca4b8d761408bc853c#file-script-py
  18. To make sure we’re running continuously, write this: https://gist.github.com/nfarah86/959d875ad5ffcc08e16e3bf25358385a#file-script-py
  19. 19. After, write main() like this: https://gist.github.com/nfarah86/831e295b663aceb93603d9986c815b43#file-script-py
  20. Here’s a gist of what your script.py should look like:
    https://gist.github.com/nfarah86/85caee5b14639e238e34715094cc5436#file-script-py
  21. Now, run:
    • $ python script.py to populate MongoDB.
  22. While the script is running, let’s get started on Rockset.

Getting Started on Rockset

  1. Login to Rockset and navigate to the Integrations tab on the left. Click on Add Integration. Click on MongoDB and click on start.
  2. Check the first box MongoDB Atlas. We’re going to name this integration Rockset-Mongo-Integration. For the username and password, go ahead and put rockset-user and the password you use when you created this database user. Paste the connection string in the next box and click on Save Integration.
      • Each integration can be used to access multiple databases and collections in the same MongoDB cluster

  3. Under Collections click on Create Collection. Select MongoDB as the source.
  4. Click on the rockset-mongo-integration.
  5. We’re going to name our new collection on Rockset weather_data_collection. This is not tied to MongoDB. Go ahead and fill out the rest of the page with the database and collection we created on MongoDB. We’re going to add 2 collections, but let’s start with the weather_data from MongoDB.
    • You see, Rockset is able to connect to MongoDB. You can verify what data will be ingested into the Rockset collection on the right-hand side. When you’ve created a collection and running a data-driven app in real-time, Rockset will continuously sync with MongoDB so your data can have the latest information.
    • Let’s click Create at the bottom.
  6. Follow the same steps 3-5 to create air_pollution_data_collection

SEE ALSO: NoSQL and automated databases – “DBAs can be the thought leaders”

Construct a Query on Rockset

  1. On the left bar, let’s navigate to the Query Editor.
  2. On the tab if we write:
    • Select * from commons.air_pollution_data_collection; we should see some output:
  3. We’re going to write this sample query:
    https://gist.github.com/nfarah86/2d9c5bc316d55cfd0fcf17b4ded9141f#file-average_pm_10_data-sql
    • We’re averaging the PM10 data and the weather temperature data. We’re going to join both of these collections based on the date.
  4. Run the query.
  5. After we run this query, we want to embed it in our app.

Build an API from Our Query

  1. On the top corner, click on Create Query Lambda. A Query Lambda is a way to make an API endpoint to the SQL query you write.
  2. Give your Query Lambda a name and description. Afterwards, you should see some code snippets on the next screen.
  3. Let’s navigate back on the Query Editor and write another query to get current weather in a new tab. Sometimes we may get a null field, so let’s go ahead and write this in the Query Editor: https://gist.github.com/nfarah86/4581c6bc09d30045ae75a5f330a8d72f#file-current_weather-sql
  4. Create another new Query Lambda.
  5. If we want to grab the code snippet or URL of the Query Lambdas we just created, navigate on the left side menu to Query Lambda and click on the lambda you created.

Execute APIs on our app

  1. Once you create a Query Lambda, you’ll see something like this:
  2. There are two ways I’ll show how we can execute a lambda:
    • Make an HTTP Request
    • Rockset’s Python client
  3. Make an HTTP Request:
    • Let’s go ahead and make an HTTP request to get the current_weather data. Here are the steps to do this:
  4. Use the Rockset Client to send a request:
  5. Then, we’re going to display the result: https://gist.github.com/nfarah86/a0d1e15319bc117ef55ce35187fb6480#file-script-py
  6. We’re going to change sample_job_every_120s()to add make_requests so we can execute the Query Lambdas and display the data: https://gist.github.com/nfarah86/0a54e082c9026aa5c9940b24836d9c65#file-script-py
  7. Write make_requests() so it looks like this:
    https://gist.github.com/nfarah86/dea06329b25887bb58a0ef74c4a12fb0#file-script-py
  8. After you run the script, you should see this: https://gist.github.com/nfarah86/32b35bd3269fbd1701dc57252fa783e4#file-results-txt
  9. That’s it! This wraps it up for the MongoDB-Rockset Python app. You can find the full project, including the SQL statements here.

The post A hands-on tutorial for real-time analytics on MongoDB using Rockset appeared first on JAXenter.

Source : JAXenter