Garbled Chinese characters #2

Open
lixinyiabc123 opened this issue Nov 29, 2018 · 1 comment

lixinyiabc123 commented Nov 29, 2018

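While scraping the latest (2017) data, I found that some region names come out garbled.
The page source on the National Bureau of Statistics site declares its encoding as gb2312, but the content is actually GBK,
so the encoding has to be set manually: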
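import time

import requests
from fake_useragent import UserAgent  # assumption: UserAgent comes from fake_useragent


def getUrl(url, num_retries=5):
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(url, headers=headers)
        # The page declares gb2312 but is really GBK, so override the encoding.
        response.encoding = "GBK"
        data = response.text
        print(url)
        return data
    except Exception as e:
        if num_retries > 0:
            time.sleep(10)
            print(url)
            print("requests fail, retry!")
            return getUrl(url, num_retries - 1)  # recursive call
        else:
            print("retry fail!")
            print("error: %s" % e + " " + url)
            return  # returns None, so the caller will raise an error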
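
For anyone wondering why the override matters, here is a minimal sketch using only the standard library. GBK is a superset of GB2312, so a character such as "镕" can be encoded in GBK but has no GB2312 code point, and decoding its bytes as gb2312 produces garbage. The helper name `decode_html` and the sample character are my own illustration, not part of the project's code.

```python
def decode_html(raw: bytes, declared: str) -> str:
    """Decode page bytes, treating a declared gb2312 charset as GBK.

    Hypothetical helper for illustration only.
    """
    charset = declared.strip().lower()
    # GBK is a superset of GB2312, so decoding as GBK is safe for both.
    if charset == "gb2312":
        charset = "gbk"
    return raw.decode(charset)


# "镕" exists in GBK but not in GB2312.
sample = "镕".encode("gbk")

# Decoding with the declared charset garbles the character.
garbled = sample.decode("gb2312", errors="replace")
print(repr(garbled))

# Promoting the declared charset to GBK recovers it.
fixed = decode_html(sample, "gb2312")
print(fixed)
```

Setting `response.encoding = "GBK"` in `getUrl` has the same effect: requests then uses GBK instead of the declared charset when building `response.text`.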

dta0502 (Owner) commented Dec 2, 2018

Thanks! The code has been updated!
