Introduction to the MongoDB Aggregation Framework
mongo "mongodb://cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/aggregations?replicaSet=Cluster0-shard-0" --authenticationDatabase admin --ssl -u m121 -p aggregations --norc
Cluster0-shard-0:PRIMARY> show collections
air_airlines
air_alliances
air_routes
bronze_banking
customers
employees
exoplanets
gold_banking
icecream_data
movies
nycFacilities
silver_banking
solarSystem
stocks
system.views
mongosh "mongodb://cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/aggregations?replicaSet=Cluster0-shard-0" --authenticationDatabase admin --ssl -u m121 -p aggregations --norc
The Concept of Pipelines
Which of the following is true about pipelines and the Aggregation Framework?
Chapter 0: Introduction and Aggregation Concepts
Aggregation Structure and Syntax
https://www.mongodb.com/docs/manual/meta/aggregation-quick-reference/
Which of the following statements is true?
$match: Filtering documents
https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/?jmp=university
Which of the following is/are true of the $match stage?
Chapter 1: Basic Aggregation - $match and $project
Lab - $match
mongo "mongodb://cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/aggregations?replicaSet=Cluster0-shard-0" --authenticationDatabase admin --ssl -u m121 -p aggregations --norc
mongosh "mongodb://cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/aggregations?replicaSet=Cluster0-shard-0" --authenticationDatabase admin --ssl -u m121 -p aggregations --norc
Help MongoDB pick a movie our next movie night! Based on employee polling, we've decided that potential movies must meet the following criteria.
Assign the aggregation to a variable named pipeline, like:
var pipeline = [ { $match: { ... } } ]
As a hint, your aggregation should return 23 documents. You can verify this by typing db.movies.aggregate(pipeline).itcount()
Download the m121/chapter1.zip handout with this lab. Unzip the downloaded folder and copy its contents to the m121 directory.
Load validateLab1.js into mongo shell
load('validateLab1.js')
validateLab1(pipeline)
What is the answer?
-> 15
Chapter 1: Basic Aggregation - $match and $project
Shaping documents with $project
https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/?jmp=university
Which of the following statements are true of the $project stage?
Our first movie night was a success. Unfortunately, our ISP called to let us know we're close to our bandwidth quota, but we need another movie recommendation!
Using the same $match stage from the previous lab, add a $project stage to only display the title and film rating (title and rated fields).
var pipeline = [{ $match: {. . .} }, { $project: { . . . } }]
load('./validateLab2.js')
validateLab2(pipeline)
What is the answer?
-> 15
Chapter 1: Basic Aggregation - $match and $project
Lab - Computing Fields
Our movies dataset has a lot of different documents, some with more convoluted titles than others. If we'd like to analyze our collection to find movie titles that are composed of only one word, we could fetch all the movies in the dataset and do some processing in a client application, but the Aggregation Framework allows us to do this on the server!
Using the Aggregation Framework, find a count of the number of movies that have a title composed of one word. To clarify, "Cinderella" and "3-25" should count, where as "Cast Away" would not.
Make sure you look into the $split String expression and the $size Array expression
To get the count, you can append itcount() to the end of your pipeline
db.movies.aggregate([...]).itcount()
-> 8066
Chapter 1: Basic Aggregation - $match and $project
Optional Lab - Expressions with $project
This lab will have you work with data within arrays, a common operation.
Specifically, one of the arrays you'll work with is writers, from the movies collection.
There are times when we want to make sure that the field is an array, and that it is not empty. We can do this within $match
{ $match: { writers: { $elemMatch: { $exists: true } } }
However, the entries within writers presents another problem. A good amount of entries in writers look something like the following, where the writer is attributed with their specific contribution
"writers" : [ "Vincenzo Cerami (story)", "Roberto Benigni (story)" ]
But the writer also appears in the cast array as "Roberto Benigni"!
Give it a look with the following query
db.movies.findOne({title: "Life Is Beautiful"}, { _id: 0, cast: 1, writers: 1})
This presents a problem, since comparing "Roberto Benigni" to "Roberto Benigni (story)" will definitely result in a difference.
Thankfully there is a powerful expression to help us, $map. $map lets us iterate over an array, element by element, performing some transformation on each element. The result of that transformation will be returned in the same place as the original element.
Within $map, the argument to input can be any expression as long as it resolves to an array. The argument to as is the name of the variable we want to use to refer to each element of the array when performing whatever logic we want. The field as is optional, and if omitted each element must be referred to as $this:: The argument to in is the expression that is applied to each element of the input array, referenced with the variable name specified in as, and prepending two dollar signs:
writers: {
$map: {
input: "$writers",
as: "writer",
in: "$$writer"
}
}
in is where the work is performed. Here, we use the $arrayElemAt expression, which takes two arguments, the array and the index of the element we want. We use the $split expression, splitting the values on " (".
If the string did not contain the pattern specified, the only modification is it is wrapped in an array, so $arrayElemAt will always work
writers: {
$map: {
input: "$writers",
as: "writer",
in: {
$arrayElemAt: [
{
$split: [ "$$writer", " (" ]
},
0
]
}
}
}
Let's find how many movies in our movies collection are a "labor of love", where the same person appears in cast, directors, and writers
Hint: You will need to use $setIntersection operator in the aggregation pipeline to find out the result.
Note that your dataset may have duplicate entries for some films. You do not need to count the duplicate entries.
To get a count after you have defined your pipeline, there are two simple methods.
// add the $count stage to the end of your pipeline
// you will learn about this stage shortly!
db.movies.aggregate([
{$stage1},
{$stage2},
{...$stageN},
{ $count: "labors of love" }
])
// or use itcount()
db.movies.aggregate([
{$stage1},
{$stage2},
{...$stageN}
]).itcount()
How many movies are "labors of love"?
-> 1596
$addFields and how it is similar to $project
https://www.mongodb.com/docs/manual/reference/operator/aggregation/addFields/?jmp=university
Chapter 2: Basic Aggregation - Utility Stages
geoNear Stage
https://www.mongodb.com/docs/manual/reference/operator/aggregation/geoNear/?jmp=university
Chapter 2: Basic Aggregation - Utility Stages
Cursor-like stages: Part 1
Chapter 2: Basic Aggregation - Utility Stages
Cursor-like stages: Part 2
Chapter 2: Basic Aggregation - Utility Stages
https://www.mongodb.com/docs/manual/reference/operator/aggregation/sample/?jmp=university
Chapter 2: Basic Aggregation - Utility Stages
Lab: Using Cursor-like Stages
MongoDB has another movie night scheduled. This time, we polled employees for their favorite actress or actor, and got these results
favorites = [
"Sandra Bullock",
"Tom Hanks",
"Julia Roberts",
"Kevin Spacey",
"George Clooney"]
For movies released in the USA with a tomatoes.viewer.rating greater than or equal to 3, calculate a new field called num_favs that represets how many favorites appear in the cast field of the movie.
Sort your results by num_favs, tomatoes.viewer.rating, and title, all in descending order.
What is the title of the 25th film in the aggregation result?
-> The Heat
Chapter 2: Basic Aggregation - Utility Stages
Lab - Bringing it all together
Calculate an average rating for each movie in our collection where English is an available language, the minimum imdb.rating is at least 1, the minimum imdb.votes is at least 1, and it was released in 1990 or after. You'll be required to rescale (or normalize) imdb.votes. The formula to rescale imdb.votes and calculate normalized_rating is included as a handout.
What film has the lowest normalized_rating?
-> The Christmas Tree
The $group Stage
https://www.mongodb.com/docs/manual/reference/operator/aggregation/group/?jmp=university
Chapter 3: Core Aggregation - Combining Information
Accumulator Stages with $project
https://www.mongodb.com/docs/manual/reference/operator/aggregation/#group-accumulator-operators
Chapter 3: Core Aggregation - Combining Information
Lab - $group and Accumulators
In the last lab, we calculated a normalized rating that required us to know what the minimum and maximum values for imdb.votes were. These values were found using the $group stage!
For all films that won at least 1 Oscar, calculate the standard deviation, highest, lowest, and average imdb.rating. Use the sample standard deviation expression.
HINT - All movies in the collection that won an Oscar begin with a string resembling one of the following in their awards field
Won 13 Oscars
Won 1 Oscar
Select the correct answer from the choices below. Numbers are truncated to 4 decimal places.
-> { "highest_rating" : 9.2, "lowest_rating" : 4.5, "average_rating" : 7.5270, "deviation" : 0.5988 }
Chapter 3: Core Aggregation - Combining Information
The $unwind Stage
https://www.mongodb.com/docs/manual/reference/operator/aggregation/unwind/
Chapter 3: Core Aggregation - Combining Information
Lab - $unwind
Let's use our increasing knowledge of the Aggregation Framework to explore our movies collection in more detail. We'd like to calculate how many movies every cast member has been in and get an average imdb.rating for each cast member.
What is the name, number of movies, and average rating (truncated to one decimal) for the cast member that has been in the most number of movies with English as an available language?
Provide the input in the following order and format
{ "_id": "First Last", "numFilms": 1, "average": 1.1 }
-> { "_id": "John Wayne", "numFilms": 107, "average": 6.4 }
Chapter 3: Core Aggregation - Combining Information
The $lookup Stage
https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/
Chapter 3: Core Aggregation - Combining Information
The $lookup Stage
Which of the following statements is true about the $lookup stage?
Chapter 3: Core Aggregation - Combining Information
Lab - Using $lookup
Which alliance from air_alliances flies the most routes with either a Boeing 747 or an Airbus A380 (abbreviated 747 and 380 in air_routes)?
-> "SkyTeam"
Chapter 3: Core Aggregation - Combining Information
$graphLookup Introduction
Which of the following statements apply to $graphLookup operator? check all that apply
-> $graphLookup provides MongoDB a transitive closure implementation
-> Provides MongoDB with graph or graph-like capabilities
Chapter 3: Core Aggregation - Combining Information
$graphLookup: Simple Lookup
Which of the following statements is/are correct? Check all that apply.
Chapter 3: Core Aggregation - Combining Information
$graphLookup: Simple Lookup Reverse Schema
Chapter 3: Core Aggregation - Combining Information
$graphLookup: maxDepth and depthField
Chapter 3: Core Aggregation - Combining Information
$graphLookup: Cross Collection Lookup
Chapter 3: Core Aggregation - Combining Information
$graphLookup: General Considerations
Consider the following statement:
$graphLookup
is required to be the last element on the pipeline.
Which of the following is true about the statement?
Chapter 3: Core Aggregation - Combining Information
Lab: $graphLookup
Now that you have been introduced to $graphLookup, let's use it to solve an interesting need. You are working for a travel agency and would like to find routes for a client! For this exercise, we'll be using the air_airlines, air_alliances, and air_routes collections in the aggregations database.
The air_airlines collection will use the following schema:
{
"_id" : ObjectId("56e9b497732b6122f8790280"),
"airline" : 4,
"name" : "2 Sqn No 1 Elementary Flying Training School",
"alias" : "",
"iata" : "WYT",
"icao" : "",
"active" : "N",
"country" : "United Kingdom",
"base" : "HGH"
}
The air_routes collection will use this schema:
{
"_id" : ObjectId("56e9b39b732b6122f877fa31"),
"airline" : {
"id" : 410,
"name" : "Aerocondor",
"alias" : "2B",
"iata" : "ARD"
},
"src_airport" : "CEK",
"dst_airport" : "KZN",
"codeshare" : "",
"stops" : 0,
"airplane" : "CR2"
}
Finally, the air_alliances collection will show the airlines that are in each alliance, with this schema:
{
"_id" : ObjectId("581288b9f374076da2e36fe5"),
"name" : "Star Alliance",
"airlines" : [
"Air Canada",
"Adria Airways",
"Avianca",
"Scandinavian Airlines",
"All Nippon Airways",
"Brussels Airlines",
"Shenzhen Airlines",
"Air China",
"Air New Zealand",
"Asiana Airlines",
"Copa Airlines",
"Croatia Airlines",
"EgyptAir",
"TAP Portugal",
"United Airlines",
"Turkish Airlines",
"Swiss International Air Lines",
"Lufthansa",
"EVA Air",
"South African Airways",
"Singapore Airlines"
]
}
Determine the approach that satisfies the following question in the most efficient manner:
Find the list of all possible distinct destinations, with at most one layover, departing from the base airports of airlines from Germany, Spain or Canada that are part of the "OneWorld" alliance. Include both the destination and which airline services that location. As a small hint, you should find 158 destinations.
Select the correct pipeline from the following set of options:
db.air_alliances.aggregate([{
$match: { name: "OneWorld" }
}, {
$graphLookup: {
startWith: "$airlines",
from: "air_airlines",
connectFromField: "name",
connectToField: "name",
as: "airlines",
maxDepth: 0,
restrictSearchWithMatch: {
country: { $in: ["Germany", "Spain", "Canada"] }
}
}
}, {
$graphLookup: {
startWith: "$airlines.base",
from: "air_routes",
connectFromField: "dst_airport",
connectToField: "src_airport",
as: "connections",
maxDepth: 1
}
}, {
$project: {
validAirlines: "$airlines.name",
"connections.dst_airport": 1,
"connections.airline.name": 1
}
},
{ $unwind: "$connections" },
{
$project: {
isValid: { $in: ["$connections.airline.name", "$validAirlines"] },
"connections.dst_airport": 1
}
},
{ $match: { isValid: true } },
{ $group: { _id: "$connections.dst_airport" } }
])
Facets: Introduction
Chapter 4: Core Aggregation - Multidimensional Grouping
Facets: Single Facet Query
https://www.mongodb.com/docs/manual/reference/operator/aggregation/sortByCount/?jmp=university
Which of the following aggregation pipelines are single facet queries?
[
{"$match": { "$text": {"$search": "network"}}},
{"$sortByCount": "$offices.city"}
]
[
{"$unwind": "$offices"},
{"$project": { "_id": "$name", "hq": "$offices.city"}},
{"$sortByCount": "$hq"},
{"$sort": {"_id":-1}},
{"$limit": 100}
]
Chapter 4: Core Aggregation - Multidimensional Grouping
Facets: Manual Buckets
https://www.mongodb.com/docs/manual/reference/operator/aggregation/bucket/?jmp=university
Assuming that field1 is composed of double values, ranging between 0 and Infinity, and field2 is of type string, which of the following stages are correct?
{'$bucket': { 'groupBy': '$field2', 'boundaries': [ "a", "asdas", "z" ], 'default': 'Others'}}
Chapter 4: Core Aggregation - Multidimensional Grouping
Facets: Auto Buckets
https://www.mongodb.com/docs/manual/reference/operator/aggregation/bucketAuto/
Auto Bucketing will ...
Chapter 4: Core Aggregation - Multidimensional Grouping
Facets: Multiple Facets
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/?jmp=university
Which of the following statement(s) apply to the $facet stage?
Chapter 4: Core Aggregation - Multidimensional Grouping
Lab - $facets
How many movies are in both the top ten highest rated movies according to the imdb.rating and the metacritic fields? We should get these results with exactly one access to the database.
Hint: What is the intersection?
-> 1
Chapter 4: Core Aggregation - Multidimensional Grouping
The $sortByCount Stage
https://www.mongodb.com/docs/manual/reference/operator/aggregation/sortByCount/
The $redact Stage
https://www.mongodb.com/docs/manual/reference/operator/aggregation/redact/
Chapter 5: Miscellaneous Aggregation
The $out Stage
regarding the $out stage?
Chapter 5: Miscellaneous Aggregation
$merge Overview
In MongoDB 4.2, the $merge Aggregation stage:
Chapter 5: Miscellaneous Aggregation
$merge Syntax
Consider an Aggregation Pipeline using the new $merge stage that outputs to the employee_data collection.
If we are not expecting to find any matching documents in the employee_data collection, which of the following stages should we use?
{
$merge: {
into: "employee_data",
whenNotMatched: "insert",
whenMatched: "fail"
}
}
Chapter 5: Miscellaneous Aggregation
Using $merge for Single View
Chapter 5: Miscellaneous Aggregation
Using $merge for a Temporary Collection
Chapter 5: Miscellaneous Aggregation
Using $merge for Rollups
Chapter 5: Miscellaneous Aggregation
Homework: Using $merge
Consider a potential $merge stage that:
Which of the following $merge stages will perform all of the above functionality?
{
$merge: {
into: "analytics",
on: "name",
whenMatched: "merge",
whenNotMatched: "insert"
}
}
Chapter 5: Miscellaneous Aggregation
Views
Which of the following statements are true regarding MongoDB Views?
Aggregation Performance
https://www.mongodb.com/docs/manual/core/query-plans/
With regards to aggregation performance, which of the following are true?
Chapter 6: Aggregation Performance and Pipeline Optimization
Aggregation Pipeline on a Sharded Cluster
https://www.mongodb.com/docs/manual/core/aggregation-pipeline-sharded-collections/?jmp=university
What operators will cause a merge stage on the primary shard for a database?
Chapter 6: Aggregation Performance and Pipeline Optimization
Pipeline Optimization - Part 1
Chapter 6: Aggregation Performance and Pipeline Optimization
Pipeline Optimization - Part 2
Which of the following statements is/are true?
Final: Question 1
Consider the following aggregation pipelines:
Pipeline 1
db.coll.aggregate([
{"$match": {"field_a": {"$gt": 1983}}},
{"$project": { "field_a": "$field_a.1", "field_b": 1, "field_c": 1 }},
{"$replaceRoot":{"newRoot": {"_id": "$field_c", "field_b": "$field_b"}}},
{"$out": "coll2"},
{"$match": {"_id.field_f": {"$gt": 1}}},
{"$replaceRoot":{"newRoot": {"_id": "$field_b", "field_c": "$_id"}}}
])
Pipeline 2
db.coll.aggregate([
{"$match": {"field_a": {"$gt": 111}}},
{"$geoNear": {
"near": { "type": "Point", "coordinates": [ -73.99279 , 40.719296 ] },
"distanceField": "distance"}},
{"$project": { "distance": "$distance", "name": 1, "_id": 0 }}
])
Pipeline 3
db.coll.aggregate([
{
"$facet": {
"averageCount": [
{"$unwind": "$array_field"},
{"$group": {"_id": "$array_field", "count": {"$sum": 1}}}
],
"categorized": [{"$sortByCount": "$arrayField"}]
},
},
{
"$facet": {
"new_shape": [{"$project": {"range": "$categorized._id"}}],
"stats": [{"$match": {"range": 1}}, {"$indexStats": {}}]
}
}
])
Which of the following statements are correct?
Pipeline 2 is incorrect because $geoNear needs to be the first stage of our pipeline
Pipeline 3 fails because $indexStats must be the first stage in a pipeline and may not be used within a $facet
Pipeline 1 fails since $out is required to be the last stage of the pipeline
Final Exam
Final: Question 2
Consider the following collection:
db.collection.find()
{
"a": [1, 34, 13]
}
The following pipelines are executed on top of this collection, using a mixed set of different expression accross the different stages:
Pipeline 1
db.collection.aggregate([
{"$match": { "a" : {"$sum": 1} }},
{"$project": { "_id" : {"$addToSet": "$a"} }},
{"$group": { "_id" : "", "max_a": {"$max": "$_id"} }}
])
Pipeline 2
db.collection.aggregate([
{"$project": { "a_divided" : {"$divide": ["$a", 1]} }}
])
Pipeline 3
db.collection.aggregate([
{"$project": {"a": {"$max": "$a"}}},
{"$group": {"_id": "$$ROOT._id", "all_as": {"$sum": "$a"}}}
])
Given these pipelines, which of the following statements are correct?
Pipeline 1 is incorrect because you cannot use an accumulator expression in a $match stage.
Pipeline 3 is correct and will execute with no error
Pipeline 2 fails because the $divide operator only supports numeric types
Final Exam
Final: Question 3
Consider the following collection documents:
db.people.find()
{ "_id" : 0, "name" : "Bernice Pope", "age" : 69, "date" : ISODate("2017-10-04T18:35:44.011Z") }
{ "_id" : 1, "name" : "Eric Malone", "age" : 57, "date" : ISODate("2017-10-04T18:35:44.014Z") }
{ "_id" : 2, "name" : "Blanche Miller", "age" : 35, "date" : ISODate("2017-10-04T18:35:44.015Z") }
{ "_id" : 3, "name" : "Sue Perez", "age" : 64, "date" : ISODate("2017-10-04T18:35:44.016Z") }
{ "_id" : 4, "name" : "Ryan White", "age" : 39, "date" : ISODate("2017-10-04T18:35:44.019Z") }
{ "_id" : 5, "name" : "Grace Payne", "age" : 56, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 6, "name" : "Jessie Yates", "age" : 53, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 7, "name" : "Herbert Mason", "age" : 37, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 8, "name" : "Jesse Jordan", "age" : 47, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 9, "name" : "Hulda Fuller", "age" : 25, "date" : ISODate("2017-10-04T18:35:44.020Z") }
And the aggregation pipeline execution result:
db.people.aggregate(pipeline)
{ "_id" : 8, "names" : [ "Sue Perez" ], "word" : "P" }
{ "_id" : 9, "names" : [ "Ryan White" ], "word" : "W" }
{ "_id" : 10, "names" : [ "Eric Malone", "Grace Payne" ], "word" : "MP" }
{ "_id" : 11, "names" : [ "Bernice Pope", "Jessie Yates", "Jesse Jordan", "Hulda Fuller" ], "word" : "PYJF" }
{ "_id" : 12, "names" : [ "Herbert Mason" ], "word" : "M" }
{ "_id" : 13, "names" : [ "Blanche Miller" ], "word" : "M" }
Which of the following pipelines generates the output result?
var pipeline = [{
"$project": {
"surname_capital": { "$substr": [{"$arrayElemAt": [ {"$split": [ "$name", " " ] }, 1]}, 0, 1 ] },
"name_size": { "$add" : [{"$strLenCP": "$name"}, -1]},
"name": 1
}
},
{
"$group": {
"_id": "$name_size",
"word": { "$push": "$surname_capital" },
"names": {"$push": "$name"}
}
},
{
"$project": {
"word": {
"$reduce": {
"input": "$word",
"initialValue": "",
"in": { "$concat": ["$$value", "$$this"] }
}
},
"names": 1
}
},
{
"$sort": { "_id": 1}
}
]
Final Exam
Final: Question 4
$facet is an aggregation stage that allows for sub-pipelines to be executed.
var pipeline = [
{
$match: { a: { $type: "int" } }
},
{
$project: {
_id: 0,
a_times_b: { $multiply: ["$a", "$b"] }
}
},
{
$facet: {
facet_1: [{ $sortByCount: "a_times_b" }],
facet_2: [{ $project: { abs_facet1: { $abs: "$facet_1._id" } } }],
facet_3: [
{
$facet: {
facet_3_1: [{ $bucketAuto: { groupBy: "$_id", buckets: 2 } }]
}
}
]
}
}
]
In the above pipeline, which uses $facet, there are some incorrect stages or/and expressions being used.
Which of the following statements point out errors in the pipeline?
facet_2 uses the output of a parallel sub-pipeline, facet_1, to compute an expression
can not nest a $facet stage as a sub-pipeline.
Final Exam
Final: Question 5
Consider a company producing solar panels and looking for the next markets they want to target in the USA. We have a collection with all the major cities (more than 100,000 inhabitants) from all over the World with recorded number of sunny days for some of the last years.
A sample document looks like the following:
db.cities.findOne()
{
"_id": 10,
"city": "San Diego",
"region": "CA",
"country": "USA",
"sunnydays": [220, 232, 205, 211, 242, 270]
}
The collection also has these indexes:
db.cities.getIndexes()
[
{
"v": 2,
"key": {
"_id": 1
},
"name": "_id_",
"ns": "test.cities"
},
{
"v": 2,
"key": {
"city": 1
},
"name": "city_1",
"ns": "test.cities"
},
{
"v": 2,
"key": {
"country": 1
},
"name": "country_1",
"ns": "test.cities"
}
]
We would like to find the cities in the USA where the minimum number of sunny days is 200 and the average number of sunny days is at least 220. Lastly, we'd like to have the results sorted by the city's name. The matching documents may or may not have a different shape than the initial one.
We have the following query:
var pipeline = [
{"$addFields": { "min": {"$min": "$sunnydays"}}},
{"$addFields": { "mean": {"$avg": "$sunnydays" }}},
{"$sort": {"city": 1}},
{"$match": { "country": "USA", "min": {"$gte": 200}, "mean": {"$gte": 220}}}
]
db.cities.aggregate(pipeline)
However, this pipeline execution can be optimized!
Which of the following choices is still going to produce the expected results and likely improve the most the execution of this aggregation pipeline?
var pipeline = [
{"$match": { "country": "USA"}},
{"$addFields": { "mean": {"$avg": "$sunnydays"}}},
{"$match": { "mean": {"$gte": 220}, "sunnydays": {"$not": {"$lt": 200 }}}},
{"$sort": {"city": 1}}
]
Final Exam
Final: Question 6
Consider the following people collection:
db.people.find().limit(5)
{ "_id" : 0, "name" : "Iva Estrada", "age" : 95, "state" : "WA", "phone" : "(739) 557-2576", "ssn" : "901-34-4492" }
{ "_id" : 1, "name" : "Roger Walton", "age" : 92, "state" : "ID", "phone" : "(948) 527-2370", "ssn" : "498-61-9106" }
{ "_id" : 2, "name" : "Isaiah Norton", "age" : 26, "state" : "FL", "phone" : "(344) 479-5646", "ssn" : "052-49-6049" }
{ "_id" : 3, "name" : "Tillie Salazar", "age" : 88, "state" : "ND", "phone" : "(216) 414-5981", "ssn" : "708-26-3486" }
{ "_id" : 4, "name" : "Cecelia Wells", "age" : 16, "state" : "SD", "phone" : "(669) 809-9128", "ssn" : "977-00-7372" }
And the corresponding people_contacts view:
db.people_contacts.find().limit(5)
{ "_id" : 6585, "name" : "Aaron Alvarado", "phone" : "(631)*********", "ssn" : "********8014" }
{ "_id" : 8510, "name" : "Aaron Barnes", "phone" : "(944)*********", "ssn" : "********6820" }
{ "_id" : 6441, "name" : "Aaron Barton", "phone" : "(234)*********", "ssn" : "********1937" }
{ "_id" : 8180, "name" : "Aaron Coleman", "phone" : "(431)*********", "ssn" : "********7559" }
{ "_id" : 9738, "name" : "Aaron Fernandez", "phone" : "(578)*********", "ssn" : "********0211" }
Which of the of the following commands generates this people_contacts view?
var pipeline = [
{
"$sort": {"name": 1}
},
{
"$project": {"name":1,
"phone": {
"$concat": [
{"$arrayElemAt": [{"$split": ["$phone", " "]}, 0]} ,
"*********" ]
},
"ssn": {
"$concat": [
"********",
{"$arrayElemAt": [{"$split": ["$ssn", "-"]}, 2]}
]
}
}
}
];
db.createView("people_contacts", "people", pipeline);
Final Exam
Final: Question 7
Using the air_alliances and air_routes collections, find which alliance has the most unique carriers(airlines) operating between the airports JFK and LHR, in either directions.
Names are distinct, i.e. Delta != Delta Air Lines
src_airport and dst_airport contain the originating and terminating airport information.
-> OneWorld, with 5 carriers