Commit ef2ae35

DOCSP-50020 Remove EOL mentions (#133) (#135) (#136)
* DOCSP-50020 Remove EOL mentions
* MM review
* remove from compat table

(cherry picked from commit cebb98b)
(cherry picked from commit 6347035)

Co-authored-by: lindseymoore <[email protected]>
(cherry picked from commit 05c3590)
1 parent a272b68 commit ef2ae35

5 files changed

+712
-25
lines changed

source/aggregation.txt

Lines changed: 284 additions & 0 deletions
@@ -0,0 +1,284 @@
1+
.. _scala-aggregation:
2+
3+
====================================
4+
Transform Your Data with Aggregation
5+
====================================
6+
7+
.. facet::
8+
:name: genre
9+
:values: reference
10+
11+
.. meta::
12+
:keywords: code example, transform, computed, pipeline
13+
:description: Learn how to use the Scala driver to perform aggregation operations.
14+
15+
.. contents:: On this page
16+
:local:
17+
:backlinks: none
18+
:depth: 2
19+
:class: singlecol
20+
21+
.. TODO:
22+
.. toctree::
23+
:titlesonly:
24+
:maxdepth: 1
25+
26+
/aggregation/aggregation-tutorials
27+
28+
Overview
29+
--------
30+
31+
In this guide, you can learn how to use the {+driver-short+} to perform
32+
**aggregation operations**.
33+
34+
Aggregation operations process data in your MongoDB collections and
35+
return computed results. The MongoDB Aggregation framework, which is
36+
part of the Query API, is modeled on the concept of data processing
37+
pipelines. Documents enter a pipeline that contains one or more stages,
38+
and this pipeline transforms the documents into an aggregated result.
39+
40+
An aggregation operation is similar to a car factory. A car factory has
41+
an assembly line, which contains assembly stations with specialized
42+
tools to do specific jobs, like drills and welders. Raw parts enter the
43+
factory, and then the assembly line transforms and assembles them into a
44+
finished product.
45+
46+
The **aggregation pipeline** is the assembly line, **aggregation stages** are the
47+
assembly stations, and **operator expressions** are the
48+
specialized tools.
49+
50+
Compare Aggregation and Find Operations
51+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
52+
53+
The following table lists the different tasks that find
54+
operations can perform and compares them to what aggregation
55+
operations can perform. The aggregation framework provides
56+
expanded functionality that allows you to transform and manipulate
57+
your data.
58+
59+
.. list-table::
60+
:header-rows: 1
61+
:widths: 50 50
62+
63+
* - Find Operations
64+
- Aggregation Operations
65+
66+
* - | Select *certain* documents to return
67+
| Select *which* fields to return
68+
| Sort the results
69+
| Limit the results
70+
| Count the results
71+
- | Select *certain* documents to return
72+
| Select *which* fields to return
73+
| Sort the results
74+
| Limit the results
75+
| Count the results
76+
| Rename fields
77+
| Compute new fields
78+
| Summarize data
79+
| Connect and merge data sets
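
As a rough illustration of the overlap shown in the preceding table, the
following sketch expresses the same filter, sort, and limit first as a find
operation and then as an aggregation pipeline. The ``FindVersusAggregate``
object name, the ``"<connection string>"`` placeholder, the limit of ``5``,
and the 30-second timeout are assumptions made for this illustration and are
not part of the guide's sample code:

.. code-block:: scala

   import org.mongodb.scala._
   import org.mongodb.scala.model.{Aggregates, Filters, Sorts}

   import scala.concurrent.Await
   import scala.concurrent.duration._

   // Hypothetical object name used only for this sketch
   object FindVersusAggregate {
     // Pipeline stages that mirror the find operation in main()
     val pipeline = Seq(
       Aggregates.filter(Filters.equal("cuisine", "Bakery")),
       Aggregates.sort(Sorts.ascending("name")),
       Aggregates.limit(5)
     )

     def main(args: Array[String]): Unit = {
       // Placeholder: replace with your own connection string
       val client = MongoClient("<connection string>")
       val collection: MongoCollection[Document] =
         client.getDatabase("sample_restaurants").getCollection("restaurants")

       // Find operation: filter, sort, and limit
       val found = Await.result(
         collection.find(Filters.equal("cuisine", "Bakery"))
           .sort(Sorts.ascending("name"))
           .limit(5)
           .toFuture(),
         30.seconds
       )

       // Aggregation operation: the same steps expressed as stages
       val aggregated = Await.result(
         collection.aggregate(pipeline).toFuture(),
         30.seconds
       )

       found.foreach(doc => println(doc.toJson()))
       aggregated.foreach(doc => println(doc.toJson()))
       client.close()
     }
   }

The aggregation version can then be extended with stages that have no find
equivalent, such as grouping or computing new fields.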
80+
81+
Limitations
82+
~~~~~~~~~~~
83+
84+
Consider the following limitations when performing aggregation operations:
85+
86+
- Returned documents cannot violate the
87+
:manual:`BSON document size limit </reference/limits/#mongodb-limit-BSON-Document-Size>`
88+
of 16 megabytes.
89+
- Pipeline stages have a memory limit of 100 megabytes by default. You can exceed this
90+
limit by passing a value of ``true`` to the ``allowDiskUse()`` method and chaining the
91+
method to ``aggregate()``.
92+
- The :manual:`$graphLookup </reference/operator/aggregation/graphLookup/>`
93+
operator has a strict memory limit of 100 megabytes and ignores the
94+
value passed to the ``allowDiskUse()`` method.
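
The ``allowDiskUse()`` chaining described in the preceding list might look
like the following minimal sketch. The ``AllowDiskUseExample`` object name,
the choice of stages, the ``"<connection string>"`` placeholder, and the
timeout are assumptions chosen for illustration:

.. code-block:: scala

   import org.mongodb.scala._
   import org.mongodb.scala.model.{Accumulators, Aggregates, Sorts}

   import scala.concurrent.Await
   import scala.concurrent.duration._

   // Hypothetical object name used only for this sketch
   object AllowDiskUseExample {
     // Grouping and sorting stages can be memory intensive on large collections
     val pipeline = Seq(
       Aggregates.group("$borough", Accumulators.sum("count", 1)),
       Aggregates.sort(Sorts.descending("count"))
     )

     def run(collection: MongoCollection[Document]): Seq[Document] =
       Await.result(
         collection.aggregate(pipeline)
           .allowDiskUse(true) // lets stages write temporary data to disk
           .toFuture(),
         30.seconds
       )

     def main(args: Array[String]): Unit = {
       // Placeholder: replace with your own connection string
       val client = MongoClient("<connection string>")
       val collection: MongoCollection[Document] =
         client.getDatabase("sample_restaurants").getCollection("restaurants")

       run(collection).foreach(doc => println(doc.toJson()))
       client.close()
     }
   }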
95+
96+
.. _scala-run-aggregation:
97+
98+
Run Aggregation Operations
99+
--------------------------
100+
101+
.. note:: Sample Data
102+
103+
The examples in this guide use the ``restaurants`` collection in the ``sample_restaurants``
104+
database from the :atlas:`Atlas sample datasets </sample-data>`. To learn how to create a
105+
free MongoDB Atlas cluster and load the sample datasets, see the :atlas:`Get Started with Atlas
106+
</getting-started>` guide.
107+
108+
To perform an aggregation, pass a list containing the pipeline stages to
109+
the ``aggregate()`` method. The {+driver-short+} provides the ``Aggregates`` class,
110+
which includes helper methods for building pipeline stages.
111+
112+
To learn more about pipeline stages and their corresponding ``Aggregates`` helper
113+
methods, see the following resources:
114+
115+
- :manual:`Aggregation Stages </reference/operator/aggregation-pipeline/>` in the
116+
{+mdb-server+} manual
117+
- `Aggregates <{+api+}/org/mongodb/scala/model/Aggregates$.html>`__ in the API documentation
118+
119+
.. _scala-aggregation-example:
120+
121+
Filter, Group, and Count Documents
122+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123+
124+
This code example produces a count of the number of bakeries in each borough
125+
of New York City. To do so, it calls the ``aggregate()`` method and passes an aggregation
126+
pipeline as a list of stages. The code builds these stages by using the following
127+
``Aggregates`` helper methods:
128+
129+
- ``filter()``: Builds the :manual:`$match </reference/operator/aggregation/match/>` stage
130+
to filter for documents that have a ``cuisine`` value of ``"Bakery"``
131+
132+
- ``group()``: Builds the :manual:`$group </reference/operator/aggregation/group/>` stage to
133+
group the matching documents by the ``borough`` field, accumulating a count of documents for each
134+
distinct value
135+
136+
.. io-code-block::
137+
:copyable:
138+
139+
.. input:: /includes/aggregation.scala
140+
:start-after: start-match-group
141+
:end-before: end-match-group
142+
:language: scala
143+
:dedent:
144+
145+
.. output::
146+
:visible: false
147+
148+
{"_id": "Brooklyn", "count": 173}
149+
{"_id": "Queens", "count": 204}
150+
{"_id": "Bronx", "count": 71}
151+
{"_id": "Staten Island", "count": 20}
152+
{"_id": "Missing", "count": 2}
153+
{"_id": "Manhattan", "count": 221}
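
The contents of ``/includes/aggregation.scala`` are not shown on this page.
As a sketch of how the two stages described above could be assembled, and not
necessarily how the included file does it, consider the following code. The
``MatchGroupExample`` object name, the ``"<connection string>"`` placeholder,
and the blocking ``Await`` call are assumptions made for this illustration:

.. code-block:: scala

   import org.mongodb.scala._
   import org.mongodb.scala.model.{Accumulators, Aggregates, Filters}

   import scala.concurrent.Await
   import scala.concurrent.duration._

   // Hypothetical object name used only for this sketch
   object MatchGroupExample {
     val pipeline = Seq(
       // $match stage: keep documents whose cuisine is "Bakery"
       Aggregates.filter(Filters.equal("cuisine", "Bakery")),
       // $group stage: group by borough and count the documents in each group
       Aggregates.group("$borough", Accumulators.sum("count", 1))
     )

     def main(args: Array[String]): Unit = {
       // Placeholder: replace with your own connection string
       val client = MongoClient("<connection string>")
       val collection: MongoCollection[Document] =
         client.getDatabase("sample_restaurants").getCollection("restaurants")

       // Blocks until the results arrive, which keeps the sketch short
       val results = Await.result(
         collection.aggregate(pipeline).toFuture(),
         30.seconds
       )
       results.foreach(doc => println(doc.toJson()))
       client.close()
     }
   }

Passing ``1`` to ``Accumulators.sum()`` adds one to the ``count`` field for
each document in a group, which is a common way to count group members.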
154+
155+
Explain an Aggregation
156+
~~~~~~~~~~~~~~~~~~~~~~
157+
158+
To view information about how MongoDB executes your operation, you can
159+
instruct the MongoDB query planner to **explain** it. When MongoDB explains
160+
an operation, it returns **execution plans** and performance statistics.
161+
An execution plan is a potential way in which MongoDB can complete an operation.
162+
When you instruct MongoDB to explain an operation, it returns both the
163+
plan MongoDB executed and any rejected execution plans by default.
164+
165+
To explain an aggregation operation, chain the ``explain()`` method to the
166+
``aggregate()`` method. You can pass a verbosity level to ``explain()``,
167+
which modifies the type and amount of information that the method returns. For more
168+
information about verbosity, see :manual:`Verbosity Modes </reference/command/explain/#verbosity-modes>`
169+
in the {+mdb-server+} manual.
170+
171+
The following example instructs MongoDB to explain the aggregation operation
172+
from the preceding :ref:`scala-aggregation-example` example. The code passes a verbosity
173+
value of ``ExplainVerbosity.EXECUTION_STATS`` to the ``explain()`` method, which
174+
configures the method to return statistics describing the execution of the winning
175+
plan:
176+
177+
.. io-code-block::
178+
:copyable:
179+
180+
.. input:: /includes/aggregation.scala
181+
:start-after: start-explain
182+
:end-before: end-explain
183+
:language: scala
184+
:dedent:
185+
186+
.. output::
187+
:visible: false
188+
189+
{"explainVersion": "2", "queryPlanner": {"namespace": "sample_restaurants.restaurants",
190+
"indexFilterSet": false, "parsedQuery": {"cuisine": {"$eq": "Bakery"}}, "queryHash": "865F14C3",
191+
"planCacheKey": "0FC225DA", "optimizedPipeline": true, "maxIndexedOrSolutionsReached": false,
192+
"maxIndexedAndSolutionsReached": false, "maxScansToExplodeReached": false, "winningPlan":
193+
{"queryPlan": {"stage": "GROUP", "planNodeId": 3, "inputStage": {"stage": "COLLSCAN",
194+
"planNodeId": 1, "filter": {"cuisine": {"$eq": "Bakery"}}, "direction": "forward"}},
195+
...}
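
The following sketch shows one way that the ``explain()`` call described
above could be chained to ``aggregate()``. It is an illustration under
stated assumptions and might differ from the included sample file: the
``ExplainExample`` object name, the ``"<connection string>"`` placeholder,
and the timeout are hypothetical:

.. code-block:: scala

   import com.mongodb.ExplainVerbosity
   import org.mongodb.scala._
   import org.mongodb.scala.model.{Accumulators, Aggregates, Filters}

   import scala.concurrent.Await
   import scala.concurrent.duration._

   // Hypothetical object name used only for this sketch
   object ExplainExample {
     // Same stages as the filter-and-group example earlier in this guide
     val pipeline = Seq(
       Aggregates.filter(Filters.equal("cuisine", "Bakery")),
       Aggregates.group("$borough", Accumulators.sum("count", 1))
     )

     def main(args: Array[String]): Unit = {
       // Placeholder: replace with your own connection string
       val client = MongoClient("<connection string>")
       val collection: MongoCollection[Document] =
         client.getDatabase("sample_restaurants").getCollection("restaurants")

       // explain() returns a single document that describes the plan
       val explanation = Await.result(
         collection.aggregate(pipeline)
           .explain(ExplainVerbosity.EXECUTION_STATS)
           .toFuture(),
         30.seconds
       )
       println(explanation.toJson())
       client.close()
     }
   }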
196+
197+
Run an Atlas Full-Text Search
198+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
199+
200+
.. tip:: Only Available for Collections with an Atlas Search Index
201+
202+
The ``$search`` aggregation pipeline stage is available only for collections
203+
with an :atlas:`Atlas Search index </reference/atlas-search/index-definitions/>`.
204+
205+
To specify a full-text search of one or more fields, you can create
206+
a ``$search`` pipeline stage. The {+driver-short+} provides the
207+
``Aggregates.search()`` helper method to create this stage. The ``search()``
208+
method requires the following arguments:
209+
210+
- ``SearchOperator`` instance: Specifies the field and text to search for.
211+
- ``SearchOptions`` instance: Specifies options to customize the full-text
212+
search. You must set the ``index`` option to the name of the Atlas Search
213+
index to use.
214+
215+
This example creates pipeline stages to perform the following actions:
216+
217+
- Search the ``name`` field for text that contains the word ``"Salt"``
218+
- Project only the ``_id`` and ``name`` values of matching documents
219+
220+
.. io-code-block::
221+
:copyable:
222+
223+
.. input:: /includes/aggregation.scala
224+
:start-after: start-atlas-search
225+
:end-before: end-atlas-search
226+
:language: scala
227+
:dedent:
228+
229+
.. output::
230+
:visible: false
231+
232+
{"_id": {"$oid": "..."}, "name": "Fresh Salt"}
233+
{"_id": {"$oid": "..."}, "name": "Salt & Pepper"}
234+
{"_id": {"$oid": "..."}, "name": "Salt + Charcoal"}
235+
{"_id": {"$oid": "..."}, "name": "A Salt & Battery"}
236+
{"_id": {"$oid": "..."}, "name": "Salt And Fat"}
237+
{"_id": {"$oid": "..."}, "name": "Salt And Pepper Diner"}
238+
239+
.. important::
240+
241+
To run the preceding example, you must create an Atlas Search index on the ``restaurants``
242+
collection that covers the ``name`` field. Then, replace the ``"<search index name>"``
243+
placeholder with the name of the index. To learn more about Atlas Search indexes, see
244+
the :ref:`scala-atlas-search-index` guide.
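
As a hedged sketch of the search pipeline described in this section, the
following code builds a ``$search`` stage and a projection stage. It assumes
that the ``org.mongodb.scala.model.search`` package provides
``SearchOperator.text()``, ``SearchPath.fieldPath()``, and
``SearchOptions.searchOptions()`` as in recent driver versions, so verify
these names against the API documentation for your version. The
``AtlasSearchExample`` object name and the ``"<connection string>"``
placeholder are hypothetical:

.. code-block:: scala

   import org.mongodb.scala._
   import org.mongodb.scala.model.{Aggregates, Projections}
   import org.mongodb.scala.model.search.{SearchOperator, SearchOptions, SearchPath}

   import scala.concurrent.Await
   import scala.concurrent.duration._

   // Hypothetical object name used only for this sketch
   object AtlasSearchExample {
     val pipeline = Seq(
       // $search stage: full-text search of the name field for "Salt"
       Aggregates.search(
         SearchOperator.text(SearchPath.fieldPath("name"), "Salt"),
         // Placeholder: replace with the name of your Atlas Search index
         SearchOptions.searchOptions().index("<search index name>")
       ),
       // $project stage: return name; _id is included by default
       Aggregates.project(Projections.include("name"))
     )

     def main(args: Array[String]): Unit = {
       // Placeholder: replace with your own connection string
       val client = MongoClient("<connection string>")
       val collection: MongoCollection[Document] =
         client.getDatabase("sample_restaurants").getCollection("restaurants")

       val results = Await.result(
         collection.aggregate(pipeline).toFuture(),
         30.seconds
       )
       results.foreach(doc => println(doc.toJson()))
       client.close()
     }
   }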
245+
246+
Additional Information
247+
----------------------
248+
249+
MongoDB Server Manual
250+
~~~~~~~~~~~~~~~~~~~~~
251+
252+
To learn more about the topics discussed in this guide, see the following
253+
pages in the {+mdb-server+} manual:
254+
255+
- To view a full list of expression operators, see :manual:`Aggregation
256+
Operators </reference/operator/aggregation/>`.
257+
258+
- To learn about assembling an aggregation pipeline and to view examples, see
259+
:manual:`Aggregation Pipeline </core/aggregation-pipeline/>`.
260+
261+
- To learn more about creating pipeline stages, see :manual:`Aggregation
262+
Stages </reference/operator/aggregation-pipeline/>`.
263+
264+
- To learn more about explaining MongoDB operations, see
265+
:manual:`Explain Output </reference/explain-results/>` and
266+
:manual:`Query Plans </core/query-plans/>`.
267+
268+
.. TODO:
269+
Aggregation Tutorials
270+
~~~~~~~~~~~~~~~~~~~~~
271+
272+
.. To view step-by-step explanations of common aggregation tasks, see
273+
.. :ref:`scala-aggregation-tutorials-landing`.
274+
275+
API Documentation
276+
~~~~~~~~~~~~~~~~~
277+
278+
To learn more about the methods and types discussed in this guide, see the
279+
following API documentation:
280+
281+
- `aggregate() <{+api+}/org/mongodb/scala/MongoCollection.html#aggregate[C](pipeline:Seq[org.mongodb.scala.bson.conversions.Bson])(implicite:org.mongodb.scala.bson.DefaultHelper.DefaultsTo[C,TResult],implicitct:scala.reflect.ClassTag[C]):org.mongodb.scala.AggregateObservable[C]>`__
282+
- `Aggregates <{+api+}/org/mongodb/scala/model/Aggregates$.html>`__
283+
- `explain() <{+api+}/org/mongodb/scala/AggregateObservable.html#explain[ExplainResult](verbosity:com.mongodb.ExplainVerbosity)(implicite:org.mongodb.scala.bson.DefaultHelper.DefaultsTo[ExplainResult,org.mongodb.scala.Document],implicitct:scala.reflect.ClassTag[ExplainResult]):org.mongodb.scala.SingleObservable[ExplainResult]>`__
284+
