[Pandas] DataFrame

sangwonYoon 2023. 3. 9. 00:22

This post looks at the DataFrame, one of the data structures provided by pandas, Python's data processing library.

DataFrame

A DataFrame is an object that stores tabular data.
A DataFrame is made up of Series, one per column.
The values (elements) of a DataFrame can be accessed as a NumPy ndarray through the values attribute.

import pandas as pd

raw_data = {"A":[1,4,7], "B":[2,5,8], "C":[3,6,9]}

# The keys of the dict become the columns of the DataFrame.
df = pd.DataFrame(raw_data, index = ['first', 'second', 'third'])
print(df)
# Output:
#         A  B  C
# first   1  2  3
# second  4  5  6
# third   7  8  9

print(df.values)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

print(type(df.values))
# Output: <class 'numpy.ndarray'>

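# Illustrative data (placeholder values) whose keys match the column names below.
raw_data = {"first_name": ["Alice", "Bob"], "last_name": ["Kim", "Lee"],
            "age": [30, 25], "city": ["Seoul", "Busan"]}
df = pd.DataFrame(raw_data, columns = ["first_name", "last_name", "age", "city"])

# Extracting a Series from a DataFrame.
# The two lines below do the same thing.
f_name = df.first_name
f_name = df["first_name"]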
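
As a quick check of the statement that Series make up a DataFrame, the sketch below rebuilds the small A/B/C table from three Series and inspects the types involved. The variable names (index, columns, rebuilt, col_a, arr) are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

index = ["first", "second", "third"]

# Three Series that share the same index, one per column.
columns = {
    "A": pd.Series([1, 4, 7], index=index),
    "B": pd.Series([2, 5, 8], index=index),
    "C": pd.Series([3, 6, 9], index=index),
}

# A dict of Series becomes a DataFrame; each key is a column name.
rebuilt = pd.DataFrame(columns)

# Selecting a single column gives back a Series.
col_a = rebuilt["A"]
print(type(col_a))

# The values are available as a NumPy array.
arr = rebuilt.to_numpy()
print(type(arr), arr.shape)
```

to_numpy() is used here in place of the values attribute; the pandas documentation recommends it, and for this all-integer table both return the same array.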

Extracting data from a DataFrame

There are three main ways to extract data from a DataFrame.

By column name
The loc indexer
The iloc indexer

1. Extracting by column name

Put the names of the columns to select in a list and pass that list inside the brackets.

dataframe[[column names to extract]]

raw_data = {"A":[1,4,7], "B":[2,5,8], "C":[3,6,9]}

df = pd.DataFrame(raw_data, index = ['first', 'second', 'third'])
print(df[["A", "C"]])
# Output:
#         A  C
# first   1  3
# second  4  6
# third   7  9

# A row cannot be extracted by its index label this way: print(df["second"]) raises a KeyError.
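
The bracket form matters here. A minimal sketch, using the same A/B/C table, of how a single name and a list of names return different types, and of what happens when a row label is passed instead of a column name. The variable names are illustrative.

```python
import pandas as pd

raw_data = {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]}
df = pd.DataFrame(raw_data, index=["first", "second", "third"])

one_col_series = df["A"]   # a single name: returns a Series
one_col_frame = df[["A"]]  # a list with one name: returns a one-column DataFrame

print(type(one_col_series).__name__)
print(type(one_col_frame).__name__)

# "second" is a row label, not a column name, so [] raises KeyError.
try:
    df["second"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```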

To select rows with plain brackets, pass a slice of row positions such as [:2]; a single index label is not accepted.

dataframe[[column names to extract]][row position slice]

raw_data = {"A":[1,4,7], "B":[2,5,8], "C":[3,6,9]}

df = pd.DataFrame(raw_data, index = ['first', 'second', 'third'])
print(df[:2])
# Output:
#         A  B  C
# first   1  2  3
# second  4  5  6

print(df[["A", "C"]][:2])
# Output:
#         A  C
# first   1  3
# second  4  6
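
The chained form performs two separate selections, one for columns and one for rows. The sketch below, with illustrative variable names, checks that the order of the two steps does not change the result here, and that the same selection can be written as a single loc call using the first two index labels.

```python
import pandas as pd

raw_data = {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]}
df = pd.DataFrame(raw_data, index=["first", "second", "third"])

# Columns first, then the first two rows by position.
cols_then_rows = df[["A", "C"]][:2]

# Rows first, then columns.
rows_then_cols = df[:2][["A", "C"]]

# One step: the labels of the first two rows, plus a list of column names.
one_step = df.loc[df.index[:2], ["A", "C"]]

print(cols_then_rows.equals(rows_then_cols))
print(cols_then_rows.equals(one_step))
```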

2. The loc indexer

dataframe.loc[row label, column label]

Note that, unlike ordinary Python slicing, a slice passed to loc includes the end of the range.

raw_data = {"A":[1,4,7], "B":[2,5,8], "C":[3,6,9]}

df = pd.DataFrame(raw_data, index = ['first', 'second', 'third'])
print(df.loc[:'second'])
# Output:
#         A  B  C
# first   1  2  3
# second  4  5  6    <- the row labeled 'second' is included in the output.
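
The example above only slices rows. A short sketch of loc with both a row part and a column part, using the same table; the variable names are illustrative. Label slices include their end point on both axes.

```python
import pandas as pd

raw_data = {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]}
df = pd.DataFrame(raw_data, index=["first", "second", "third"])

# Label slices on both axes: 'second' and 'B' are both included.
block = df.loc["first":"second", "A":"B"]
print(block)

# A single row label and a single column label give one value.
cell = df.loc["second", "B"]
print(cell)

# Lists of labels pick out specific rows and columns.
picked = df.loc[["first", "third"], ["A", "C"]]
print(picked)
```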

3. The iloc indexer

dataframe.iloc[row position, column position]

raw_data = {"A":[1,4,7], "B":[2,5,8], "C":[3,6,9]}

df = pd.DataFrame(raw_data, index = ['first', 'second', 'third'])
print(df.iloc[1, :2])
# Output:
# A    4
# B    5
# Name: second, dtype: int64
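
Because iloc works with positions, its slices follow ordinary Python rules and exclude the end position, in contrast to loc. A minimal sketch comparing the two on the same table, with illustrative variable names:

```python
import pandas as pd

raw_data = {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]}
df = pd.DataFrame(raw_data, index=["first", "second", "third"])

# iloc[:2] takes positions 0 and 1; position 2 is excluded.
first_two_by_position = df.iloc[:2]

# loc[:'second'] takes labels up to and including 'second'.
first_two_by_label = df.loc[:"second"]

same = first_two_by_position.equals(first_two_by_label)
print(same)

# Negative positions count from the end, as with Python lists.
last_row = df.iloc[-1]
print(last_row)

# Lists of positions select specific rows and columns.
corners = df.iloc[[0, 2], [0, 2]]
print(corners)
```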
