Python Download Module Tutorial: A Practical Guide to requests and urllib - Python File Download Tips
- Python
- 2025-08-14
The Complete Guide to Python Download Modules: Downloading Files Efficiently
Author: Python Technical Expert
Last updated: October 15, 2023
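Why use a dedicated download module?
File downloads are a routine requirement in day-to-day development, but a robust implementation has to account for network failures, large files, progress reporting, and performance. Python offers several ways to download files; this tutorial covers the two most practical ones: requests and urllib.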
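Contents
- Downloading files with the requests library
- Basic downloads with the urllib module
- Adding a download progress bar
- Downloading large files in chunks
- Handling download exceptions and errors
- Setting request headers and parameters
- Optimizing with concurrent downloads
- Hands-on: an image downloader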
1. Downloading files with the requests library
requests is the most popular HTTP library for Python and is easy to install: pip install requests
Basic download example
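import requests

def download_file(url, save_path):
    # Always set a timeout; without one, requests can wait indefinitely.
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        # response.content holds the whole body in memory, so this suits small files.
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f"File saved to: {save_path}")
    else:
        print(f"Download failed, status code: {response.status_code}")

# Usage example
download_file('https://example.com/image.jpg', 'downloaded_image.jpg')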
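Streaming a large file download
def download_large_file(url, save_path):
    # stream=True defers fetching the body until it is iterated.
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(save_path, 'wb') as f:
            # Write the body in 8 KB chunks instead of loading it all at once.
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Large file download complete: {save_path}")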
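One gap in the streaming approach above is that an interrupted transfer leaves a truncated file at the destination path. A minimal sketch of one common remedy follows: write the chunks to a temporary file and rename it only after the last chunk arrives. The helper name `save_chunks_atomically` and the `.part` suffix are assumptions made for illustration and are not part of requests; `chunks` can be any iterable of bytes, such as `r.iter_content(chunk_size=8192)`.

```python
import os


def save_chunks_atomically(chunks, save_path):
    """Write an iterable of bytes to save_path via a temporary .part file.

    Hypothetical helper: the destination only appears once every chunk
    has been written, so a failed transfer never leaves a partial file there.
    """
    part_path = save_path + ".part"
    written = 0
    try:
        with open(part_path, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    f.write(chunk)
                    written += len(chunk)
        # os.replace overwrites an existing destination in one step.
        os.replace(part_path, save_path)
    except BaseException:
        # Clean up the partial file, then re-raise the original error.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
    return written
```

With requests, a call might look like `save_chunks_atomically(r.iter_content(chunk_size=8192), save_path)` inside the `with requests.get(url, stream=True, timeout=10) as r:` block.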
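2. Downloading files with the urllib module
urllib is part of the Python standard library, so it needs no extra installation and suits basic download needs.
Basic download method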
from urllib import request

def download_with_urllib(url, save_path):
    try:
        # urlretrieve fetches the URL and writes it straight to save_path.
        request.urlretrieve(url, save_path)
        print(f"Download succeeded: {save_path}")
    except Exception as e:
        print(f"Download failed: {e}")

# Usage example
download_with_urllib('https://example.com/document.pdf', 'downloaded_document.pdf')
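Catching a bare `Exception` hides why a download failed, and `urlretrieve` offers no way to set headers or a timeout. The sketch below shows one possible alternative using `urllib.request.Request` with `urlopen`, distinguishing HTTP errors from connection errors. The function name `download_with_urlopen` and its default values are assumptions chosen for illustration.

```python
import shutil
import urllib.error
import urllib.request


def download_with_urlopen(url, save_path, headers=None, timeout=10):
    """Stream url to save_path with urlopen; return True on success."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp, \
                open(save_path, "wb") as f:
            # copyfileobj moves the body in blocks rather than all at once.
            shutil.copyfileobj(resp, f)
        return True
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404.
        print(f"HTTP error {e.code}: {e.reason}")
    except urllib.error.URLError as e:
        # No usable response: DNS failure, refused connection, and so on.
        print(f"Connection problem: {e.reason}")
    except OSError as e:
        # Read timeouts and local file errors end up here.
        print(f"I/O problem: {e}")
    return False
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first; otherwise the more general handler would swallow it.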
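Adding a progress display
def progress_callback(count, block_size, total_size):
    # total_size is -1 (or 0) when the server sends no Content-Length.
    if total_size <= 0:
        print(f"Downloaded: {count * block_size} bytes", end='\r')
        return
    # The last block is usually partial, so cap the value at 100.
    percent = min(100, int(count * block_size * 100 / total_size))
    print(f"Download progress: {percent}%", end='\r')

def download_with_progress(url, save_path):
    request.urlretrieve(url, save_path, reporthook=progress_callback)
    print("\nDownload complete!")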
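A percentage alone can be turned into a text progress bar with a small formatting helper. The following is a minimal sketch; the names `render_progress` and `bar_reporthook` and the bar characters are assumptions for illustration. The hook has the same three-argument signature that `urlretrieve` passes to `reporthook`.

```python
def render_progress(downloaded, total, width=20):
    """Return a one-line text progress bar for the given byte counts."""
    if total <= 0:
        # Size unknown (no Content-Length): report bytes only.
        return f"{downloaded} bytes"
    # Clamp to [0, 1] because the final block can overshoot the total.
    fraction = min(max(downloaded / total, 0.0), 1.0)
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {int(fraction * 100):3d}%"


def bar_reporthook(count, block_size, total_size):
    """reporthook for urlretrieve that redraws the bar on one line."""
    print(render_progress(count * block_size, total_size), end="\r")


print(render_progress(50, 100, width=10))  # [#####-----]  50%
```

It could be passed as `request.urlretrieve(url, save_path, reporthook=bar_reporthook)`.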
3. Advanced download techniques
Custom request headers
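import requests

url = 'https://example.com/image.jpg'  # example URL; replace with your target
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Referer': 'https://example.com/'
}
response = requests.get(url, headers=headers, timeout=10)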
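Query parameters are the other common request setting. `requests.get` accepts a `params` argument that encodes a dict into the query string; the sketch below does comparable encoding with the standard library so the resulting URL can be inspected directly. The helper name `add_query_params` and the parameter names `file` and `v` are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_query_params(url, params):
    """Return url with params appended to its query string."""
    if not params:
        return url
    parts = urlsplit(url)
    extra = urlencode(params)  # percent-encodes keys and values
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


# Hypothetical parameters, for illustration only.
print(add_query_params("https://example.com/download",
                       {"file": "report 2024.pdf", "v": 2}))
# https://example.com/download?file=report+2024.pdf&v=2
```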
Retry mechanism
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry up to 3 times, with an exponentially growing pause between attempts.
retries = Retry(total=3, backoff_factor=0.1)
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))
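To make the idea behind retrying with a backoff factor concrete, here is a hand-rolled loop that needs no third-party library. It is an illustrative sketch, not a reproduction of urllib3's exact schedule; the name `retry_with_backoff` and the injectable `sleep` parameter are assumptions, the latter added so the delays can be observed without actually waiting.

```python
import time


def retry_with_backoff(fetch, attempts=3, backoff_factor=0.1, sleep=time.sleep):
    """Call fetch() until it succeeds or attempts are used up.

    Waits backoff_factor * 2**n seconds after the n-th failure (n from 0).
    """
    for n in range(attempts):
        try:
            return fetch()
        except OSError:
            # Network errors in urllib and requests derive from OSError.
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_factor * 2 ** n)
```

A call might look like `retry_with_backoff(lambda: requests.get(url, timeout=10))`, where `url` is whatever address is being fetched.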
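Speeding up downloads with multiple threads
import threading
import requests

def download_chunk(url, start, end, filename):
    # Ask the server for just this byte range.
    headers = {'Range': f'bytes={start}-{end}'}
    response = requests.get(url, headers=headers, timeout=30)
    # 206 Partial Content confirms the server honored the Range header.
    if response.status_code != 206:
        raise RuntimeError(f"Range request not honored: {response.status_code}")
    with open(filename, "r+b") as f:
        f.seek(start)
        f.write(response.content)

def parallel_download(url, num_threads=4):
    response = requests.head(url, allow_redirects=True, timeout=10)
    file_size = int(response.headers.get('content-length', 0))
    if file_size == 0:
        raise ValueError("Server did not report a Content-Length")
    chunk_size = file_size // num_threads
    # Pre-allocate the output file so each thread can write at its own offset.
    with open("downloaded_file", "wb") as f:
        f.truncate(file_size)
    threads = []
    for i in range(num_threads):
        start = i * chunk_size
        # The last thread also takes the remainder bytes.
        end = start + chunk_size - 1 if i < num_threads - 1 else file_size - 1
        thread = threading.Thread(
            target=download_chunk,
            args=(url, start, end, "downloaded_file")
        )
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    print("Multithreaded download complete!")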
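The byte-range arithmetic is the part of a multithreaded download that is easiest to get wrong, so it can help to isolate it in a function that is checked without any network access. The sketch below uses a hypothetical helper named `split_ranges`; unlike the code above, it spreads the remainder over the first parts rather than giving it all to the last one, which is a design choice and not a requirement.

```python
def split_ranges(file_size, num_parts):
    """Return inclusive (start, end) byte ranges covering file_size bytes."""
    if file_size <= 0 or num_parts <= 0:
        raise ValueError("file_size and num_parts must be positive")
    # Never create more parts than there are bytes.
    num_parts = min(num_parts, file_size)
    base, extra = divmod(file_size, num_parts)
    ranges = []
    start = 0
    for i in range(num_parts):
        # Spread the remainder over the first `extra` parts.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


print(split_ranges(10, 4))  # [(0, 2), (3, 5), (6, 7), (8, 9)]
```

Each tuple maps directly onto a `Range: bytes=start-end` header, since HTTP byte ranges are inclusive at both ends.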
4. Hands-on: an image downloader
Putting the techniques above together, here is a fuller-featured image downloader:
import os
import time
from urllib.parse import urlparse

import requests

def download_image(url, folder="images"):
    os.makedirs(folder, exist_ok=True)
    try:
        response = requests.get(url, stream=True, timeout=10)
        response.raise_for_status()
        # Derive the file name from the URL path
        parsed = urlparse(url)
        filename = os.path.basename(parsed.path)
        if not filename:
            # Fall back to a timestamped name when the URL has no file name
            filename = f"image_{int(time.time())}.jpg"
        save_path = os.path.join(folder, filename)
        # Download and show progress
        file_size = int(response.headers.get('content-length', 0))
        downloaded = 0
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                downloaded += len(chunk)
                f.write(chunk)
                progress = int(downloaded * 100 / file_size) if file_size > 0 else 0
                print(f"Download progress: {progress}%", end='\r')
        print(f"\nImage saved to: {save_path}")
        return True
    except Exception as e:
        print(f"Download failed: {e}")
        return False

# Usage example
download_image("https://example.com/sample.jpg")
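Features:
- Creates the save directory automatically
- Extracts the file name from the URL
- Shows progress in real time
- Handles exceptions and sets a timeout
- Streams the download to save memory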
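The downloader handles one URL at a time. For a batch of images, a thread pool is one way to run several downloads concurrently, since the work is dominated by waiting on the network. The sketch below assumes a hypothetical helper `download_many` that accepts any callable returning a truthy value on success, so a function like `download_image` could be passed in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(urls, download_one, max_workers=4):
    """Run download_one(url) for each URL in a thread pool.

    Returns a dict mapping each URL to True or False. An exception
    raised by download_one counts as a failure for that URL.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = bool(future.result())
            except Exception:
                results[url] = False
    return results
```

For example, `download_many(url_list, download_image)` would return a per-URL success map, with `url_list` being a list of image URLs supplied by the caller.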
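Mastering the core techniques of downloading with Python
This tutorial covered the key techniques for downloading files in Python:
- Using the requests library
- The urllib standard library
- Progress display
- Large file handling
- Retry mechanism
- Multithreaded acceleration
Applying these techniques in real projects lets you build efficient, reliable file download features.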
This article was published by XiangHui on 2025-08-14 on 吾爱品聚. If you have any questions, please contact us.
Article link: http://pjw.521pj.cn/20258135.html