Python Basics III - Data Structure

MisCaminos · April 12, 2021

python & data analysis

Mutable vs. Immutable

Mutable data structures: list, dict, set
Immutable data structures: string, tuple
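A minimal sketch of the difference (the variable names are my own, not from a specific source): a list is modified in place and keeps the same identity, while a string rejects item assignment and has to be rebuilt as a new object.

```python
# A list (mutable) can be changed in place: the same object is modified.
nums = [1, 2, 3]
nums_id_before = id(nums)
nums.append(4)
print(nums)                        # [1, 2, 3, 4]
print(id(nums) == nums_id_before)  # True: still the same object

# A str (immutable) cannot be changed in place.
word = 'data'
try:
    word[0] = 'D'
except TypeError as err:
    print('TypeError:', err)       # prints the TypeError message

# "Changing" a string really builds a new string object.
new_word = 'D' + word[1:]
print(new_word, word)              # Data data
```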

Mutable Object

Mutable data types have methods that add, delete, and modify their contents in place.

1. list

The most widely used data structure in Python.
A list can store values of any Python data type.

-Define a list that contains another list, in one line
my_list = ['new_item1', 'new_item2', [1, 2, 3, 4], 'new_item3']

-Negative indices start from the last element and count back toward the first
my_list[-1]
==> Result: 'new_item3'
my_list[-2]
==> Result: [1, 2, 3, 4]

-Slicing includes the element at the start index and excludes the element at the end index: list[start:end]
print(my_list[0:2])
==> Result: ['new_item1', 'new_item2']

-Besides the start and end indices, you can also specify a step
list[start:end:step]
new_list = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

print(new_list[0:5:2])
==> Result: [2, 6, 10]
print(new_list[0:10:2])
==> Result: [2, 6, 10, 14, 18]
If the start index is omitted it defaults to 0, and if the end index is omitted it defaults to the length of the list.
new_list[::2]
==> Result: [2, 6, 10, 14, 18]

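-Create a copy of the list (a shallow copy: the outer list is new, but nested objects such as the inner list are shared with the original)
my_list_copy = my_list[:]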
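To make the "shallow copy" point concrete, here is a small sketch reusing my_list from above. copy.deepcopy() comes from the standard library copy module and is not part of the original post; it is shown only as the usual way to copy nested lists too.

```python
import copy

my_list = ['new_item1', 'new_item2', [1, 2, 3, 4], 'new_item3']
my_list_copy = my_list[:]              # shallow copy: a new outer list

my_list_copy[0] = 'changed'            # replacing an element affects only the copy
my_list_copy[2].append(5)              # but the nested list is shared by both lists

print(my_list[0])                      # new_item1
print(my_list[2])                      # [1, 2, 3, 4, 5]
print(my_list_copy is my_list)         # False: two different outer lists
print(my_list_copy[2] is my_list[2])   # True: one shared inner list

deep = copy.deepcopy(my_list)          # deepcopy() also copies the nested list
deep[2].append(6)
print(my_list[2])                      # [1, 2, 3, 4, 5]: unaffected
```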

-Insert a list at a specific position by index (insert)
new_list = [1, 1, 1]
another_list = ['a', 'b', 'c']
another_list.insert(2, new_list)
print(another_list)
==> Result: ['a', 'b', [1, 1, 1], 'c']

-Using the same start and end position produces an empty slice of the list. Any position works, as long as the start and end are the same.

What makes this useful is that you can assign to an empty slice. Assigning the list 'two' to an empty slice of the list 'one' actually inserts the elements of 'two' into 'one'.

one = ['a', 'b', 'c', 'd']
two = ['e', 'f']
one[2:2] = two
print(one)
==> Result: ['a', 'b', 'e', 'f', 'c', 'd']
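A short side-by-side sketch (list names a, b, c are illustrative) of why slice assignment differs from insert(): insert() adds the whole list as one nested element, while assigning to a slice splices the elements in one by one. The last case, replacing a non-empty slice with a shorter list, is an extra illustration of the same mechanism.

```python
two = ['e', 'f']

# insert() puts the whole list in as ONE nested element
a = ['a', 'b', 'c', 'd']
a.insert(2, two)
print(a)        # ['a', 'b', ['e', 'f'], 'c', 'd']

# assigning to an empty slice splices the elements in individually
b = ['a', 'b', 'c', 'd']
b[2:2] = two
print(b)        # ['a', 'b', 'e', 'f', 'c', 'd']

# a non-empty slice is replaced; the lengths do not need to match
c = ['a', 'b', 'c', 'd']
c[1:3] = ['X']
print(c)        # ['a', 'X', 'd']
```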

-Use the del statement to remove a value or a range of values from the list. Note that the remaining elements shift over to fill the gap.
new_list3 = ['a', 'b', 'c', 'd', 'e', 'f']
del new_list3[2]
print(new_list3)
==> Result: ['a', 'b', 'd', 'e', 'f']

del new_list3[1:3]
print(new_list3)
==> Result: ['a', 'e', 'f']

-Remove values from a list with the pop() and remove() methods.
pop(2) removes the element at index 2 and returns its value
new_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(new_list.pop(2))
==> Result: c
print(new_list)
==> Result: ['a', 'b', 'd', 'e', 'f', 'g']

remove('g') removes the first occurrence of 'g' in the list and returns nothing (None)!
new_list.remove('g')
print(new_list)
==> Result: ['a', 'b', 'd', 'e', 'f']
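Two behaviors worth knowing, sketched here with an illustrative list: pop() with no index removes the last element, and remove() raises ValueError when the value is absent (just as pop() raises IndexError on an empty list), so you may want to guard these calls.

```python
letters = ['a', 'b', 'd', 'e', 'f']

last = letters.pop()        # with no index, pop() removes and returns the last element
print(last, letters)        # f ['a', 'b', 'd', 'e']

result = letters.remove('a')
print(result)               # None: remove() returns nothing

try:
    letters.remove('z')     # the value is not in the list
except ValueError as err:
    print('ValueError:', err)

try:
    [].pop()                # popping from an empty list
except IndexError as err:
    print('IndexError:', err)
```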

-index() method: returns the index of a given value
new_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(new_list.index(4))
==> Result: 3
(4 is located at index 3 in new_list.)

-Change a value
new_list[4] = 30
print(new_list)
==> Result: [1, 2, 3, 4, 30, 6, 7, 8, 9, 10]
(changes the value of the element at index 4)

-Add a value
new_list.append(6)
print(new_list)
==> Result: [1, 2, 3, 4, 30, 6, 7, 8, 9, 10, 6]
index() returns the index of the first occurrence of the value
print(new_list.index(6))
==> Result: 5
(6 appears twice, but the index of the first match, 5, is returned.)

Use count() to return the number of elements equal to the given value
print(new_list.count(2))
==> Result: 1
print(new_list.count(6))
==> Result: 2

Sort the values in the list
new_list.sort()
print(new_list)
==> Result: [1, 2, 3, 4, 6, 6, 7, 8, 9, 10, 30]

Reverse the order of the values in the list
new_list.reverse()
print(new_list)
==> Result: [30, 10, 9, 8, 7, 6, 6, 4, 3, 2, 1]

Sorting list elements (sort, sorted)
sort(): sorts the original list in place (returns None!)
sorted(iterable): leaves the original list as it is and returns a new, sorted list.

namelist = ['Mary', 'Sams', 'Aimy', 'Tom', 'Michale', 'Bob', 'Kelly']
ret = namelist.sort()
print(ret)
print(namelist)
==> None
==> ['Aimy', 'Bob', 'Kelly', 'Mary', 'Michale', 'Sams', 'Tom']

namelist = ['Mary', 'Sams', 'Aimy', 'Tom', 'Michale', 'Bob', 'Kelly']
sorted_list = sorted(namelist)
r_sorted_list = sorted(namelist, reverse=True)
print(sorted_list)
print(r_sorted_list)
==> ['Aimy', 'Bob', 'Kelly', 'Mary', 'Michale', 'Sams', 'Tom']
==> ['Tom', 'Sams', 'Michale', 'Mary', 'Kelly', 'Bob', 'Aimy']
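A brief sketch of the practical consequences: sorted() leaves the original untouched and accepts any iterable (the tuple and string here are illustrative), while assigning the result of sort() only gives you None.

```python
namelist = ['Mary', 'Sams', 'Aimy', 'Tom', 'Michale', 'Bob', 'Kelly']

sorted_names = sorted(namelist)
print(namelist[0])         # Mary: the original order is untouched
print(sorted_names[0])     # Aimy

# sorted() accepts any iterable and always returns a new list
print(sorted((3, 1, 2)))   # [1, 2, 3]
print(sorted('cab'))       # ['a', 'b', 'c']

# a common mistake: assigning the result of sort() gives None, not the list
oops = namelist.sort()
print(oops)                # None (namelist itself is now sorted)
```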

-Extract every element of a list paired with its index (enumerate)
shopping_list = ['kitchen towel', 'washer', 'cereal', 'egg', 'broccolis']
ret = list(enumerate(shopping_list))
print(ret)
==> [(0, 'kitchen towel'), (1, 'washer'), (2, 'cereal'), (3, 'egg'), (4, 'broccolis')]

for i, body in enumerate(shopping_list):
    print('Shopping list item %d: %s' % (i, body))
==> Result:
Shopping list item 0: kitchen towel
Shopping list item 1: washer
Shopping list item 2: cereal
Shopping list item 3: egg
Shopping list item 4: broccolis
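enumerate() also takes an optional start argument, which is handy when the numbering shown to people should begin at 1. A minimal sketch using the same shopping list:

```python
shopping_list = ['kitchen towel', 'washer', 'cereal', 'egg', 'broccolis']

# start=1 makes the counter begin at 1 instead of 0 (the list itself is unchanged)
for i, item in enumerate(shopping_list, start=1):
    print('Shopping list item %d: %s' % (i, item))
# Shopping list item 1: kitchen towel
# ...
# Shopping list item 5: broccolis
```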

-Sum of all elements in a list (sum)
listdata = [2, 2, 1, 3, 8, 5, 7, 6, 3, 6, 2, 3, 9, 4, 4]
ret = sum(listdata)
print(ret)
==> 65

-Check whether the elements of a list are truthy (all, any)
all(list): True only when every element is truthy, otherwise False
any(list): True if at least one element is truthy, False if none is
listdata1 = [0, 1, 2, 3, 4]  # only 0 is falsy; every nonzero number is truthy
print(all(listdata1))
==> False
print(any(listdata1))
==> True
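A few more cases, as a hedged sketch (listdata2 is my own example): other falsy values besides 0, the empty-list edge case, and the common pattern of passing a generator expression to test a condition on every element.

```python
listdata1 = [0, 1, 2, 3, 4]
listdata2 = [1, 2, 3]

print(all(listdata2))                  # True: every element is truthy
print(any([0, '', None]))              # False: 0, '' and None are all falsy

# edge case: for an empty list, all() is True and any() is False
print(all([]), any([]))                # True False

# combined with a generator expression to test a condition on each element
print(all(x >= 0 for x in listdata1))  # True
print(any(x > 3 for x in listdata1))   # True
```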

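-You can also put a function in a list.

def myfunc():
    print('Hello')
func_list = [1, 2, myfunc]  # avoid naming it "list", which would shadow the built-in list()

Call it like this (func_list[2] is the function object, and () calls it):

func_list[2]()

==> Result: Hello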

2. set

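set: a collection of elements that has no order and does not allow duplicates.

A set can be created with the set() function or with curly braces such as {1, 2, 3}. An empty set, however, must be created with set(), because {} creates an empty dictionary.

myset = set([1, 2, 3, 4, 5])
print(myset)
==> Result: {1, 2, 3, 4, 5}

-Add to a set
myset.add(6)
print(myset)
==> Result: {1, 2, 3, 4, 5, 6}

-Adding an element that is already present has no effect: 4 was already in the set, so it still appears only once.
myset.add(4)
print(myset)
==> Result: {1, 2, 3, 4, 5, 6}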
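Because duplicates are discarded, converting a list to a set is a common way to remove repeated values. A small sketch with made-up data; note that the result has no order, so sort it if you need a list back.

```python
data = [3, 1, 3, 2, 1, 3]

unique = set(data)
print(unique)           # {1, 2, 3}
print(len(unique))      # 3

# a set has no order; convert back to a list (and sort it) if you need one
print(sorted(unique))   # [1, 2, 3]

# {} creates an empty dict, not an empty set
print(type({}), type(set()))   # <class 'dict'> <class 'set'>
```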

-Using set operations (sets are unordered, so the printed order of the string elements may differ from run to run):
s1 = set(['python','cpython','ironpython'])
s2 = set(['python','ironpython','pypy'])
print(s1.intersection(s2))
==> {'ironpython', 'python'}
print(s1.difference(s2))
==> {'cpython'}
print(s2.difference(s1))
==> {'pypy'}
print(s1.symmetric_difference(s2))
==> {'cpython', 'pypy'}
print(s2.symmetric_difference(s1))
==> {'cpython', 'pypy'}
print(s1.union(s2))
==> {'ironpython', 'python', 'pypy', 'cpython'}
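The same operations also have operator forms. The sketch below uses set literals for s1 and s2 and checks results rather than printing a fixed order. One difference to keep in mind: the operators require both operands to be sets, while the methods accept any iterable (for example s1.union(['a'])).

```python
s1 = {'python', 'cpython', 'ironpython'}
s2 = {'python', 'ironpython', 'pypy'}

print(s1 & s2)   # same as s1.intersection(s2)
print(s1 - s2)   # same as s1.difference(s2)
print(s1 ^ s2)   # same as s1.symmetric_difference(s2)
print(s1 | s2)   # same as s1.union(s2)

# membership tests and subset checks
print('pypy' in s2)        # True
print({'python'} <= s1)    # True: {'python'} is a subset of s1
```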

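set.pop(): removes and returns an arbitrary element
set.discard(value): removes the element equal to value if the set contains it (no error if it does not)
set1.intersection_update(set2): updates set1 to the intersection of set1 and set2 (set1 keeps only the elements that belong to both set1 and set2)
set.clear(): removes all elements from the set
set1.update(set3): updates set1 so that it also contains every element of set3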
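A minimal walk-through of the methods listed above, with illustrative sets set1, set2 and set3. Since pop() removes an arbitrary element, the sketch checks membership instead of assuming which element comes out.

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5}
set3 = {10, 11}

set1.discard(99)                 # no error even though 99 is absent
set1.discard(1)
print(set1)                      # {2, 3, 4}

set1.intersection_update(set2)   # keep only elements that are also in set2
print(set1)                      # {3, 4}

set1.update(set3)                # add every element of set3
print(sorted(set1))              # [3, 4, 10, 11]

removed = set1.pop()             # removes and returns an arbitrary element
print(removed in {3, 4, 10, 11}, removed in set1)  # True False

set1.clear()
print(set1)                      # set()
```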

3. dictionary

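dictionary: a collection of key-value pairs, where each key-value pair is one element; it is created with {}. Dictionaries are not sorted, and in Python 3.7 and later they keep their keys in insertion order.

Create an empty dictionary, then fill it.
myDict = {}
myDict['one'] = 'first'
myDict['two'] = 'second'
print(myDict)
==> Result: {'one': 'first', 'two': 'second'}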

-Check whether a key is in the dictionary
print('firstkey' in myDict)
==> False

-Add a key/value pair to the dictionary
myDict['firstkey'] = 'firstval'

-Print the contents of the dictionary
print(myDict)
==> {'one': 'first', 'two': 'second', 'firstkey': 'firstval'}

-Show the length of the dictionary (how many key/value pairs it contains)
print(len(myDict))
==> 3

-List the values of the dictionary
print(myDict.values())
==> dict_values(['first', 'second', 'firstval'])

-List the keys of the dictionary
print(myDict.keys())
==> dict_keys(['one', 'two', 'firstkey'])

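-How to look up the value for a key:

myDict2 = {'r_wing': 'Josh', 'l_wing': 'Frank', 'center': 'Jim', 'l_defense': 'Leo', 'r_defense': 'Vic'}

Method 1 for getting the value of a key: get() returns None (instead of raising an error) if the key is missing
print(myDict2.get('r_wing'))
==> Josh

Method 2 for getting the value of a key: square brackets raise a KeyError if the key is missing
print(myDict2['r_wing'])
==> Josh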
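A short sketch of how the two lookup methods differ for a key that is not in the dictionary ('goalie' is a hypothetical key chosen for illustration). get() also accepts a default value to return instead of None.

```python
myDict2 = {'r_wing': 'Josh', 'l_wing': 'Frank', 'center': 'Jim',
           'l_defense': 'Leo', 'r_defense': 'Vic'}

print(myDict2.get('r_wing'))               # Josh
print(myDict2.get('goalie'))               # None: missing key, no error
print(myDict2.get('goalie', 'no goalie'))  # no goalie: the default value

try:
    print(myDict2['goalie'])               # square brackets raise for a missing key
except KeyError as err:
    print('KeyError:', err)                # KeyError: 'goalie'
```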

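-How to traverse a dictionary:

Iterate over the items of the dictionary (each item is a (key, value) tuple):
for player in myDict2.items():
    print(player)

Iterating over myDict2 itself with a for loop goes through the keys by default:
for player in myDict2:
    print(player)

Unpack each key and value into separate variables, then print them:
for key, value in myDict2.items():
    print(key, value)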

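-Sorting:

Sorting a dictionary
sorted(names.items()) returns a list of (key, value) tuples sorted in ascending order of the keys.

names = {'Mary': 10999, 'Sams': 2111, 'Aimy': 9778, 'Tom': 20245,
         'Michale': 27115, 'Bob': 5887, 'Kelly': 7855}
ret1 = sorted(names.items())
print(ret1)

==> Result:
[('Aimy', 9778), ('Bob', 5887), ('Kelly', 7855), ('Mary', 10999), ('Michale', 27115), ('Sams', 2111), ('Tom', 20245)]

The key argument of sorted() specifies the value to sort by. It must be a function: sorted() calls it on each element and sorts by the value it returns.

def f1(x):
    return x[0]   # first element: for a key string, its first character
def f2(x):
    return x[-1]  # last element: for a key string, its last character

ret2 = sorted(names.keys(), key=f1)  # sort the keys by their first character
print(ret2)

==> Result:
['Aimy', 'Bob', 'Kelly', 'Mary', 'Michale', 'Sams', 'Tom']

ret3 = sorted(names.keys(), key=f2)  # sort by last character; ties keep their original order
print(ret3)

==> Result:
['Bob', 'Michale', 'Tom', 'Sams', 'Mary', 'Aimy', 'Kelly']

ret4 = sorted(names.items(), key=f1, reverse=True)  # for (key, value) tuples, x[0] is the whole key
print(ret4)

==> Result:
[('Tom', 20245), ('Sams', 2111), ('Michale', 27115), ('Mary', 10999), ('Kelly', 7855), ('Bob', 5887), ('Aimy', 9778)]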
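The same key argument can sort by value instead of by key. A minimal sketch using the names dictionary from above; the lambda and names.get are just two common ways to write the key function.

```python
names = {'Mary': 10999, 'Sams': 2111, 'Aimy': 9778, 'Tom': 20245,
         'Michale': 27115, 'Bob': 5887, 'Kelly': 7855}

# key receives each (key, value) tuple; item[1] is the value
by_value = sorted(names.items(), key=lambda item: item[1])
print(by_value[0], by_value[-1])   # ('Sams', 2111) ('Michale', 27115)

# the same idea on keys only: look each key's value up in the dictionary
keys_by_value = sorted(names, key=names.get, reverse=True)
print(keys_by_value[:3])           # ['Michale', 'Tom', 'Mary']

# sorted() returns a list; dict() turns it back into an insertion-ordered dictionary
print(dict(by_value))
```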

-Dictionary view objects:
Built-in functions supported by dictionary view objects
(1) len(dictview) - returns the number of entries in the dictionary
len(names.keys())
len(names.values())
len(names.items())

(2) iter(dictview) - returns an iterator over the keys, values, or items

Calling iter() on the dictionary itself always iterates over the keys only.

for i in iter(names):
    print(i)

To traverse the values or items instead, call iter() on the corresponding view, as below.

for i in iter(names.values()):
    print(i)

(3) reversed(dictview) - returns a reverse iterator over the keys, values, or items
Note!! reversed() on a dictionary or its views only works in Python 3.8 or later

(4) sorted(dictview) - returns a sorted list of the keys, values, or items of the dictionary
# sorted(dict) returns a sorted list of the keys ONLY!

(5) list(dictview) - returns a list of the keys, values, or items of the dictionary.
print(list(names.values()))
print(list(names.items()))
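A small sketch of these functions on a shortened copy of names (my own reduced example). It also shows that a view is live: it reflects entries added to the dictionary after the view was created.

```python
names = {'Mary': 10999, 'Sams': 2111, 'Aimy': 9778}

keys_view = names.keys()
print(len(keys_view))            # 3

names['Tom'] = 20245             # the view is live: it reflects this change
print(len(keys_view))            # 4
print(list(keys_view))           # ['Mary', 'Sams', 'Aimy', 'Tom']

print(sorted(names))             # ['Aimy', 'Mary', 'Sams', 'Tom']: keys only
print(sorted(names.values()))    # [2111, 9778, 10999, 20245]

# reversed() on a dict or a dict view requires Python 3.8+
print(list(reversed(names.items())))
# [('Tom', 20245), ('Aimy', 9778), ('Sams', 2111), ('Mary', 10999)]
```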

Note: iterating over a view while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
(A view object reflects the current state of the dictionary, so it makes sense that changing the dictionary while the view is in use causes problems.)
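A minimal sketch of the problem and one common workaround (the scores dictionary is illustrative): deleting entries while looping over the dictionary raises RuntimeError, whereas looping over a snapshot of the keys made with list() is safe.

```python
scores = {'a': 1, 'b': 0, 'c': 3}

try:
    for key in scores:               # iterating the dict (its keys) directly
        if scores[key] == 0:
            del scores[key]          # changes the size during iteration
except RuntimeError as err:
    print('RuntimeError:', err)      # dictionary changed size during iteration

# safe version: iterate over a snapshot of the keys made with list()
scores = {'a': 1, 'b': 0, 'c': 3}
for key in list(scores):
    if scores[key] == 0:
        del scores[key]
print(scores)                        # {'a': 1, 'c': 3}
```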

Immutable Object

1. tuple

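tuple: a list-like type that cannot be changed
Tuples are typically used to hold items of different data types, which are then accessed by unpacking or by indexing.
t2 = [1, 2, 3], [4, 5, 6]  # parentheses are optional: t2 is a tuple of two lists

Unlike a list, the elements of a tuple cannot be reassigned. (If an element is itself mutable, such as the lists inside t2, that element can still be modified in place.)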
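A short sketch using t2 from above: unpacking assigns one variable per element, reassigning an element raises TypeError, yet a list stored inside the tuple can still be changed because the tuple only holds a reference to it.

```python
t2 = [1, 2, 3], [4, 5, 6]          # parentheses are optional: this is a tuple
print(type(t2))                    # <class 'tuple'>

first, second = t2                 # unpacking: one variable per element
print(first, second)               # [1, 2, 3] [4, 5, 6]

try:
    t2[0] = [7, 8, 9]              # reassigning an element of the tuple is not allowed
except TypeError as err:
    print('TypeError:', err)

t2[0].append(4)                    # but a mutable element can still be changed in place
print(t2)                          # ([1, 2, 3, 4], [4, 5, 6])
```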

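-Create a tuple of numbers
tuple1 = (1, 2, 3, 4, 5)

-Create a tuple of strings
tuple2 = ('a', 'b', 'c')

-Create a tuple of mixed types
tuple3 = (1, 'a', 'abc', [1, 2, 3, 4, 5], ['a', 'b', 'c'])

-Error: a tuple element cannot be reassigned
# tuple1[0] = 6  # TypeError: 'tuple' object does not support item assignment

def myfunc():
    print('Hello')

-Create a tuple that also contains a function
tuple4 = (1, 2, myfunc)
tuple4[2]()
==> Result: Hello

t = 2
y = 6
def printall():
    print('from a to z')
tuple5 = 54542433245, [t, y], 'come on', printall
tuple5[3]()
==> Result: from a to z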

2. string

# Indexing a sequence
strdata = 'Time is money!!'
print(strdata[2])     # prints 'm'
print(strdata[-3])    # prints 'y'

listdata = [1, 2, [1, 2, 3]]
print(listdata[0])      # prints 1
print(listdata[-1])     # prints [1, 2, 3]
print(listdata[2][-1])  # prints 3

# Slicing a sequence
strdata = 'Time is money!!'
print(strdata[1:5])   # prints 'ime ' (including the trailing space)
print(strdata[:7])    # prints 'Time is'
print(strdata[9:])    # prints 'oney!!'
print(strdata[:-3])   # prints 'Time is mone'
print(strdata[-3:])   # prints 'y!!'
print(strdata[:])     # prints 'Time is money!!'
print(strdata[::2])   # prints 'Tm smny!'
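Indexing and slicing only read a string; because strings are immutable, assigning to an index fails. A brief sketch (the variable lowered is my own) of building a new string instead, with slicing or str.replace():

```python
strdata = 'Time is money!!'

try:
    strdata[0] = 't'                       # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# build a new string instead; the original is left unchanged
lowered = 't' + strdata[1:]
print(lowered)                             # time is money!!
print(strdata.replace('money', 'gold'))    # Time is gold!!
print(strdata)                             # Time is money!!
```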

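-String formatting

txt1 = 'java'
txt2 = 'python'
num1 = 5
num2 = 10
print('Compared to %s, I am more familiar with %s.' % (txt1, txt2))
print('%s is easier than %s by a factor of %d.' % (txt2, txt1, num1))
print('%d + %d = %d' % (num1, num2, num1 + num2))
print('Last year, global economic growth rose by %d%% points over the previous year.' % num1)  # %% prints a literal %

==> Result:
Compared to java, I am more familiar with python.
python is easier than java by a factor of 5.
5 + 10 = 15
Last year, global economic growth rose by 5% points over the previous year.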

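-Functions (string methods) for working with strings:
Transforming a string (these return a new string; the original is unchanged):
str.upper()
str.lower()
str.lstrip()
str.rstrip()
str.strip()

Inspecting what a string contains:
str.count('char' or 'substring')  # how many occurrences?
len(str)  # how long?
str.isalpha()
str.isdigit()
str.isalnum()

-list = str.split(): converts a string to a list, splitting on whitespace
-delimiter.join(list): joins the elements of a list (which must be strings) into one string, with delimiter between them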
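A minimal sketch of the methods listed above on made-up strings. repr() is used only to make leading and trailing spaces visible in the output.

```python
text = '  Python is fun  '

# these methods return new strings; repr() makes the remaining spaces visible
print(repr(text.strip()))    # 'Python is fun'
print(repr(text.lstrip()))   # 'Python is fun  '
print(repr(text.rstrip()))   # '  Python is fun'
print(text.strip().upper())  # PYTHON IS FUN
print(text.strip().lower())  # python is fun
print(repr(text))            # '  Python is fun  ': the original is unchanged

word = 'banana'
print(word.count('a'), word.count('an'), len(word))  # 3 2 6
print('abc'.isalpha(), '123'.isdigit(), 'abc123'.isalnum(), 'abc 123'.isalnum())
# True True True False

words = 'kitchen towel washer cereal'.split()   # split on whitespace
print(words)                     # ['kitchen', 'towel', 'washer', 'cereal']
print(', '.join(words))          # kitchen, towel, washer, cereal
print('-'.join(['2021', '04', '12']))  # 2021-04-12
```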

Call by Object Reference

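Python uses call by object reference (also described as "call by assignment" or "call by sharing").

Function arguments are passed as references to objects. More specifically:
-When an immutable object is passed, it behaves like call by value: the function cannot change the caller's object.
-When a mutable object is passed, it behaves like call by reference: changes made to the object in place are visible to the caller. Rebinding the parameter to a new object, however, never affects the caller.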

# Call by value example 1
string = "Geeks"

def test(string):
    string = "GeeksforGeeks"  # rebinds only the local name
    print("Inside Function:", string)

test(string)
# The original string is unchanged.
print("Outside Function:", string)

==> Result:
Inside Function: GeeksforGeeks
Outside Function: Geeks

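When working with immutable objects in Python (string, tuple, number, etc.), the behavior is the same as "call by value".
Passing an immutable string to a function does not copy it. When the function "changes" the string, a new string object is created and only the local parameter name is rebound to it, so the caller's string stays the same.
id() returns an object's identity: an integer that is unique among objects that exist at the same time and is used much like an address (in CPython it is the object's memory address). It is loosely comparable to hashCode in Java.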

# Call by value example 2
def changestr(mystr):
    print(id(mystr))
    mystr = mystr + '_changed'  # creates a new string object
    print('The string inside the function: {}'.format(mystr))
    print(id(mystr))            # a different id: a different object
    return

mystr = 'hello'
changestr(mystr)

# The original string is unchanged.
print(id(mystr))
print(mystr)

==> Result (the id values differ from run to run):
2070362916528
The string inside the function: hello_changed
2070385728304
2070362916528
hello

When working with mutable objects in Python (list, set, dictionary, etc.), the behavior is "call by reference".

# Call by reference example 1

def add_more(lst):  # named lst to avoid shadowing the built-in list
    lst.append(50)
    print("Inside Function", lst)

mylist = [10, 20, 30, 40]

add_more(mylist)
print("Outside Function:", mylist)

==> Result:
Inside Function [10, 20, 30, 40, 50]
Outside Function: [10, 20, 30, 40, 50]
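The "call by reference" behavior only applies to in-place changes. This sketch (replace_list is a hypothetical helper, and add_more is re-declared without its prints) shows that rebinding a list parameter leaves the caller untouched, and that passing a copy protects the caller's list.

```python
def add_more(lst):
    lst.append(50)           # mutates the caller's list object in place

def replace_list(lst):
    lst = [0, 0, 0]          # rebinds only the local name; the caller is unaffected
    lst.append(50)

mylist = [10, 20, 30, 40]
add_more(mylist)
print(mylist)                # [10, 20, 30, 40, 50]

replace_list(mylist)
print(mylist)                # [10, 20, 30, 40, 50]: unchanged

# to avoid mutating the caller's list, pass a copy
add_more(mylist[:])
print(mylist)                # [10, 20, 30, 40, 50]
```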

For a more detailed explanation of "call by reference", see the notes I wrote earlier on memory areas when I first learned Java.

Reference: https://blog.naver.com/sojun1221/222207801560

References: