PyMongo Tutorial Page Translation

GRoovAllstar · December 7, 2020

ref.
https://pymongo.readthedocs.io/en/stable/tutorial.html#

Connecting with MongoClient

>>> import pymongo
>>> from pymongo import MongoClient
>>> client = MongoClient('mongodb://localhost:27017/')
>>> db = client.test_database
>>> db = client['test_database']

An important note about collections (and databases) in MongoDB is that they are created lazily.
Collections and databases are created when the first document is inserted into them.
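
The lazy creation described above can be checked directly. A minimal sketch, assuming `db` is a Database obtained as shown earlier; the helper name `collection_exists` is hypothetical and not part of PyMongo, while `list_collection_names()` is the same method used later in this tutorial.

```python
def collection_exists(db, name):
    """Return True if the server already holds a collection called `name`.

    `db` is expected to be a pymongo Database (or anything exposing
    list_collection_names()); this helper is illustrative, not a PyMongo API.
    """
    return name in db.list_collection_names()
```

With a fresh database, `collection_exists(db, 'posts')` should stay False until the first insert into `db.posts`.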

>>> collection = db.test_collection

Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents.
For example, the following dictionary can be used to represent a blog post.

>>> import datetime
>>> post = {'author' : 'Mike',
...         'text' : 'My first blog post!',
...         'tags' : ['mongodb', 'python', 'pymongo'],
...         'date' : datetime.datetime.now(tz=datetime.timezone.utc)}
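
Because a document is an ordinary dictionary, it can be built by ordinary Python code. The sketch below uses a hypothetical `make_post` helper (not part of PyMongo) and assumes a timezone-aware UTC timestamp is wanted.

```python
import datetime


def make_post(author, text, tags):
    """Build a blog-post document as a plain dict (hypothetical helper)."""
    return {
        'author': author,
        'text': text,
        'tags': list(tags),  # copy so later changes to `tags` don't leak in
        'date': datetime.datetime.now(tz=datetime.timezone.utc),
    }


post = make_post('Mike', 'My first blog post!', ['mongodb', 'python', 'pymongo'])
print(sorted(post))  # field names of the document
```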

Inserting a Document

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('5fcdd5b3c19cef988734099a')

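When a document is inserted, a special key, "_id", is automatically added if the document doesn't already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database.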

>>> db.list_collection_names()
['posts']

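Getting a Single Document With find_one()

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches).
It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection.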

>>> import pprint
>>> pprint.pprint(posts.find_one())
{'_id': ObjectId('5fcdd5b3c19cef988734099a'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 12, 7, 7, 11, 47, 793000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

If we try with a different author, like "Eliot", we'll get no result.

>>> posts.find_one({'author':'Eliot'})
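
Since find_one() returns None when nothing matches, calling code usually needs to handle that case before reading fields. A minimal sketch, assuming `posts` is the collection used above; `first_text_by` is a hypothetical helper name.

```python
def first_text_by(posts, author, default=None):
    """Return the 'text' of the first post by `author`, or `default`.

    `posts` is assumed to be a pymongo Collection; find_one() yields
    either a dict or None, so None is checked before reading fields.
    """
    document = posts.find_one({'author': author})
    if document is None:
        return default
    return document.get('text', default)
```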

Querying By ObjectId

We can also find a post by its _id, which in our example is an ObjectId.

>>> post_id
ObjectId('5fcdd5b3c19cef988734099a')

>>> pprint.pprint(posts.find_one({'_id':post_id}))
{'_id': ObjectId('5fcdd5b3c19cef988734099a'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 12, 7, 7, 11, 47, 793000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

A common task in web applications is to get an ObjectId from the request URL and find the matching document. In this case the string must be converted to an ObjectId before it is passed to find_one.

from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id':ObjectId(post_id)})
    return document
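
A string taken from a URL may not be a valid ObjectId at all. As a sketch, the check below uses only the standard library and assumes the usual 24-character hexadecimal string form, like the one printed for post_id above. `looks_like_object_id` is a hypothetical helper, meant as a pre-check before calling ObjectId(), not a replacement for it.

```python
import re

# An ObjectId is 12 bytes, so its string form is 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r'[0-9a-fA-F]{24}')


def looks_like_object_id(value):
    """Return True if `value` is a 24-character hex string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


print(looks_like_object_id('5fcdd5b3c19cef988734099a'))
print(looks_like_object_id('not-an-id'))
```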

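A Note On Unicode Strings

You may have noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). A short explanation is in order.

MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data.
Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded to UTF-8 first.
The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python Unicode string, not a regular str.
This note applies to Python 2. In Python 3 every str is already a Unicode string, which is why the sessions above show 'Mike' without the u prefix.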
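
The UTF-8 round trip described above can be seen with the standard library alone. This sketch assumes Python 3, where str is already a Unicode string; it does not call PyMongo and only mirrors the encode and decode steps.

```python
text = 'Mike'

# Encoding: what has to happen before a string can be stored as BSON.
encoded = text.encode('utf-8')

# Decoding: what happens when the string is read back.
decoded = encoded.decode('utf-8')
print(decoded == text)

# Bytes that are not valid UTF-8 cannot be decoded into a string.
try:
    b'\xff\xfe'.decode('utf-8')
    is_valid = True
except UnicodeDecodeError:
    is_valid = False
print(is_valid)
```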

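Bulk Inserts

To make querying a little more interesting, let's insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations by passing a list as the first argument to insert_many(). This inserts each document in the list while sending only a single command to the server.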

>>> new_posts = [
...     {
...         'author' : 'Mike',
...         'text' : 'Another post!',
...         'tags' : ['bulk', 'insert'],
...         'date' : datetime.datetime(2009, 11, 12, 11, 14)
...     },
...     {
...         'author' : 'Eliot',
...         'title' : 'MongoDB is fun',
...         'text' : 'and pretty easy too!',
...         'date' : datetime.datetime(2009, 11, 10, 10, 45)
...     }]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('5fcdd5b4c19cef988734099b'), ObjectId('5fcdd5b4c19cef988734099c')]

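There are a couple of interesting things to note about this example.

The result from insert_many() now returns two ObjectId instances, one for each inserted document. new_posts[1] has a different "shape" than the other posts: it has no "tags" field, and a new field, "title", has been added. This is what we mean when we say that MongoDB is schema-free.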
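
The difference in "shape" can be made concrete by comparing field names. A small sketch using plain dictionaries that mirror the two documents above (dates omitted for brevity):

```python
first = {'author': 'Mike', 'text': 'Another post!', 'tags': ['bulk', 'insert']}
second = {'author': 'Eliot', 'title': 'MongoDB is fun',
          'text': 'and pretty easy too!'}

# Fields present in one document but not in the other.
only_in_first = set(first) - set(second)
only_in_second = set(second) - set(first)
print(only_in_first, only_in_second)
```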

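Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection.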

>>> for post in posts.find():
...     pprint.pprint(post)
...
{'_id': ObjectId('5fcdd5b3c19cef988734099a'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 12, 7, 7, 11, 47, 793000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5fcdd5b4c19cef988734099b'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}
{'_id': ObjectId('5fcdd5b4c19cef988734099c'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}

Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is "Mike".

>>> for post in posts.find({'author': 'Mike'}):
...     pprint.pprint(post)
...
{'_id': ObjectId('5fcdd5b3c19cef988734099a'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 12, 7, 7, 11, 47, 793000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5fcdd5b4c19cef988734099b'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}

Counting

If we just want to know how many documents match a query, we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection:

>>> posts.count_documents({})
3

or just of those documents that match a specific query:

>>> posts.count_documents({'author':'Mike'})
2
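
The same call can be repeated for several filters. A minimal sketch, assuming `posts` is the collection used above; `count_by_author` is a hypothetical helper, and it issues one count_documents() call per author.

```python
def count_by_author(posts, authors):
    """Map each author name to the number of matching documents.

    `posts` is assumed to be a pymongo Collection.
    """
    return {author: posts.count_documents({'author': author})
            for author in authors}
```

With the three posts inserted so far, `count_by_author(posts, ['Mike', 'Eliot'])` would be expected to give two for Mike and one for Eliot.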

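Range Queries

MongoDB supports many different types of advanced queries. As an example, let's perform a query where we limit results to posts older than a certain date, and also sort the results by author.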

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({'date' : {'$lt' : d}}).sort('author'):
...     pprint.pprint(post)
...
{'_id': ObjectId('5fcdd5b4c19cef988734099c'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}
{'_id': ObjectId('5fcdd5b4c19cef988734099b'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}

Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.
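
To make the semantics of the query concrete, the same filter and sort can be written in plain Python over a list of dictionaries. This is only an illustration of what "$lt" and sort('author') select, using the dates of the three posts above (the first truncated to seconds); it is not how MongoDB executes the query.

```python
import datetime

docs = [
    {'author': 'Mike', 'date': datetime.datetime(2020, 12, 7, 7, 11, 47)},
    {'author': 'Mike', 'date': datetime.datetime(2009, 11, 12, 11, 14)},
    {'author': 'Eliot', 'date': datetime.datetime(2009, 11, 10, 10, 45)},
]
d = datetime.datetime(2009, 11, 12, 12)

# {'date': {'$lt': d}} keeps documents whose date is strictly before d;
# sort('author') orders the survivors by author, ascending.
matches = sorted((doc for doc in docs if doc['date'] < d),
                 key=lambda doc: doc['author'])
print([doc['author'] for doc in matches])
```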

Indexing

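Adding indexes can help accelerate certain queries and can also add functionality to querying and storing documents. In this example, we'll demonstrate how to create a unique index on a key, which rejects documents whose value for that key already exists in the index.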

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)], unique=True)
>>> sorted(list(db.profiles.index_information()))
['_id_', 'user_id_1']

Notice that we now have two indexes: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id we just created.

Now let's set up some user profiles.

>>> user_profiles = [
...     {'user_id' : 211, 'name' : 'Luke'},
...     {'user_id' : 212, 'name' : 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id is already in the collection.

>>> new_profile = {'user_id' : 213, 'name' : 'Drew'}
>>> duplicate_profile = {'user_id' : 212, 'name' : 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)
>>> result = db.profiles.insert_one(duplicate_profile)
DuplicateKeyError: E11000 duplicate key error collection
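
In application code the error would normally be caught rather than left to propagate. A minimal sketch, assuming `profiles` is the collection with the unique index created above; `insert_profile` is a hypothetical helper. DuplicateKeyError lives in pymongo.errors; the fallback class is an assumption added only so the sketch can be loaded on a machine without PyMongo installed.

```python
try:
    from pymongo.errors import DuplicateKeyError
except ImportError:
    # Assumption: stand-in so this sketch still loads without PyMongo.
    class DuplicateKeyError(Exception):
        pass


def insert_profile(profiles, profile):
    """Insert `profile`; return its _id, or None on a duplicate-key error."""
    try:
        return profiles.insert_one(profile).inserted_id
    except DuplicateKeyError:
        return None
```

Returning None keeps the caller simple; code that needs to tell the user why the insert failed may prefer to let the exception propagate instead.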