'utf-8' codec can't decode byte 0xe0 in position 39: invalid continuation byte

AI·빅데이터 융합 경영학 Study Note

SubjectOwner 2023. 11. 21. 16:03

https://gmnam.tistory.com/291

[Pandas] How to fix UnicodeDecodeError: 'utf-8' codec can't decode byte

Symptom: When reading a CSV file with the pandas API, the following UnicodeDecodeError can occur. df = pd.read_csv('test.csv') This error occurs because the file being read is not encoded in UTF-8. Solution: The pandas.read_csv function above

gmnam.tistory.com

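Fixing the error by detecting the encoding

The second approach is to find out which encoding the CSV file uses and pass it to read_csv.

This uses the chardet module. If it is not installed, you can install it as follows.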

pip install chardet 
# or 
conda install chardet

Once it is installed, run the following.

import chardet

with open('test.csv', 'rb') as rawdata:
    result = chardet.detect(rawdata.read(10000))

# check what the character encoding might be
print(result)

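chardet reports ISO-8859-1 with a confidence of about 73%. It is not 100% partly because the guess is based only on the first 10,000 bytes of the file, and the confidence value is a heuristic score, not a guarantee. In many cases a sample of this size is enough to identify the encoding.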
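Since the detection only sees the first 10,000 bytes, it can help to check where the first non-ASCII byte actually sits in the file. The following is a minimal sketch using only the standard library; the helper name `first_non_ascii_offset` and the throwaway demo file are assumptions made for illustration, not part of chardet or pandas.

```python
import os
import tempfile


def first_non_ascii_offset(path, chunk_size=65536):
    """Return the offset of the first byte >= 0x80, or None if the file is pure ASCII."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None
            for i, byte in enumerate(chunk):
                if byte >= 0x80:
                    return offset + i
            offset += len(chunk)


# Demo with a throwaway file: 12,000 ASCII bytes followed by a single 0xE0 byte.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"a" * 12000 + b"\xe0")
    demo_path = tmp.name

offset = first_non_ascii_offset(demo_path)
print(offset)  # 12000, which lies beyond a 10,000-byte sample
os.remove(demo_path)
```

If the reported offset is larger than the sample size, the detector only ever saw ASCII bytes, and reading a larger sample (or the whole file) before calling `chardet.detect` may give a more useful guess.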

Now let's read the file as ISO-8859-1. If the encoding matches, the file should be read without errors.
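To illustrate what "the encoding matches" means, here is a small standard-library sketch that reproduces the same kind of error on in-memory bytes and then decodes them with ISO-8859-1. The sample data and the helper `read_rows` are hypothetical. With pandas, the equivalent step would be passing the encoding name to `read_csv`, for example `pd.read_csv('test.csv', encoding='ISO-8859-1')`.

```python
import csv
import io

# Hypothetical sample: "voilà" encoded as ISO-8859-1, so "à" is the single byte 0xE0.
raw = "name,value\nvoilà,1\n".encode("iso-8859-1")


def read_rows(data, encoding):
    """Decode CSV bytes with the given encoding and return the parsed rows."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text)))


utf8_error = None
try:
    read_rows(raw, "utf-8")
except UnicodeDecodeError as exc:
    # 0xE0 starts a multi-byte UTF-8 sequence, but the next byte is a plain comma.
    utf8_error = exc
    print("utf-8 failed:", exc)

rows = read_rows(raw, "iso-8859-1")
print(rows)
```

One caveat: ISO-8859-1 assigns a character to every possible byte value, so decoding with it never raises. A clean read therefore does not prove the guess was right; if the text looks garbled, the file may be in another encoding such as cp949 or euc-kr, which are common for Korean data.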

...but I still keep getting the error.
