[bb's Blog]

Updated 2020-03-05

Quick start

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
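from bs4 import BeautifulSoup
# Parse the document with the lxml parser (requires the lxml package)
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())      # pretty-printed, indented markup
print(soup.title.string)    # text of the <title> tag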
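
Beyond printing the title, the same soup object can be searched for tags. The sketch below is a minimal illustration on a trimmed copy of the markup above; it assumes the built-in "html.parser" backend instead of lxml so that no extra package is needed, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup, kept short for the example.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

# "html.parser" ships with Python; swap in 'lxml' if it is installed.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; class_ filters on the class attribute.
links = [(a["id"], a["href"], a.get_text())
         for a in soup.find_all("a", class_="sister")]

# find returns only the first match (or None).
second = soup.find(id="link2")

for link_id, href, text in links:
    print(link_id, href, text)
```

Attribute access such as a["href"] raises KeyError when the attribute is missing; a.get("href") returns None instead, which is usually safer on real pages.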


Summary of using the requests module

Updated 2020-03-05

Notes and summaries, Python modules

python | requests

Quick start

import requests
url = 'https://www.baidu.com'
# Send a GET request (note: the keyword argument is proxies, not proxy)
r = requests.get(url, headers=None, proxies=None, allow_redirects=False, timeout=None)

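Attribute	Description	Return type
r.text	Response body decoded to text	str
r.content	Response body as raw bytes	bytes
r.cookies	Cookies sent back by the server	RequestsCookieJar (dict-like)
r.url	Final URL of the response	str
r.headers	Response headers	case-insensitive dict, e.g. {'Content-Type': 'text/html'}
r.status_code	HTTP status code, e.g. 200	int (2XX, 4XX, 5XX)
r.request.headers	Headers of the request that was sent	case-insensitive dict, e.g. {'User-Agent': 'firefox'}
r.encoding	Encoding used to decode r.text	str, e.g. 'utf-8'
r.json()	Response body parsed as JSON (a method, not an attribute)	dict or list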
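
The attributes in the table can be collected in one place to see what a response looks like. The following is a minimal sketch, not a definitive pattern: describe_response and fetch are hypothetical helper names, and the User-Agent value and 5-second timeout are assumptions chosen for illustration.

```python
import requests


def describe_response(r: requests.Response) -> dict:
    """Collect the commonly used response attributes into a plain dict."""
    return {
        "status_code": r.status_code,            # int, e.g. 200
        "ok": r.ok,                              # True for status codes below 400
        "url": r.url,                            # final URL of the response
        "encoding": r.encoding,                  # encoding used for r.text
        "content_type": r.headers.get("Content-Type"),
        "body_bytes": len(r.content),            # r.content is bytes
        "text_preview": r.text[:60],             # r.text is str
    }


def fetch(url: str) -> dict:
    """Hypothetical helper: GET a URL with a timeout and summarise the response."""
    r = requests.get(url, headers={"User-Agent": "firefox"}, timeout=5)
    r.raise_for_status()  # raise an exception for 4XX/5XX responses
    return describe_response(r)
```

Calling fetch('https://www.baidu.com') performs a real network request. Setting an explicit timeout is generally advisable, since timeout=None lets a stalled connection block indefinitely.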


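Using the re module

Updated 2020-03-05

Notes and summaries, Python modules

python | re | regular expressions

Module overview

Regular expressions let you pick out the content you want from text that has no fixed structure.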
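
As a small illustration of pulling specific pieces out of free-form text, here is a sketch using re.findall. The sample sentence and the three patterns are made up for this example and are deliberately loose; they are not production-grade validators.

```python
import re

text = ("Order 1024 shipped to alice@example.com on 2020-03-05; "
        "order 2048 pending.")

# A capture group makes findall return only the digits after "Order"/"order".
order_ids = re.findall(r"[Oo]rder (\d+)", text)

# Four digits, dash, two digits, dash, two digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# A deliberately loose e-mail pattern: word characters or dots around "@".
emails = re.findall(r"[\w.]+@[\w.]+", text)

print(order_ids)
print(dates)
print(emails)
```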

Common patterns

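Pattern	Description
\w	Matches a letter, digit, or underscore
\W	Matches any character that is not a letter, digit, or underscore
\s	Matches any whitespace character; equivalent to [ \t\n\r\f\v]
\S	Matches any non-whitespace character
\d	Matches any digit; equivalent to [0-9] for ASCII text
\D	Matches any non-digit
\A	Matches only at the start of the string
\Z	Matches only at the end of the string in Python's re (in Perl-style engines it also matches just before a trailing newline)
\z	Matches the very end of the string in Perl-style engines; Python's re has traditionally used \Z for this
\G	Matches at the position where the previous match ended (Perl-style engines; not supported by Python's re)
\n	Matches a newline
\t	Matches a tab
^	Matches the start of the string (and the start of each line with re.MULTILINE)
$	Matches the end of the string or just before a trailing newline (and the end of each line with re.MULTILINE)
.	Matches any character except a newline; with the re.DOTALL flag it matches newlines as well
[...]	A set of characters: [amk] matches 'a', 'm', or 'k'
[^...]	A negated set: [^abc] matches any character except a, b, and c
*	Matches 0 or more repetitions of the preceding expression (greedy)
+	Matches 1 or more repetitions of the preceding expression (greedy)
?	Matches 0 or 1 repetition of the preceding expression; placed after another quantifier (*?, +?, ??) it makes that quantifier non-greedy
{n}	Matches exactly n repetitions of the preceding expression
{n,m}	Matches n to m repetitions of the preceding expression, greedy (no space inside the braces)
a|b	Matches a or b
( )	Matches the expression inside the parentheses and captures it as a group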
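
Several rows of the table are easier to see in action. The sketch below contrasts greedy and non-greedy quantifiers, uses groups to capture parts of a match, and shows the effect of re.DOTALL and alternation. The sample strings are invented for the example.

```python
import re

html = ('<a href="http://example.com/elsie">Elsie</a> '
        '<a href="http://example.com/lacie">Lacie</a>')

# Greedy: .* grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<a.*>", html).group()

# Non-greedy: .*? stops at the first ">" that lets the pattern succeed.
lazy = re.search(r"<a.*?>", html).group()

# Groups: findall returns one tuple per match, one item per group.
pairs = re.findall(r'<a href="(.*?)">(\w+)</a>', html)

# "." does not cross a newline unless re.DOTALL is given.
multi = "first line\nsecond line"
without_flag = re.match(r"first.*second", multi)
with_flag = re.match(r"first.*second", multi, re.DOTALL)

# Alternation: a|b matches either side.
animals = re.findall(r"cat|dog", "cat, dog, bird")

print(lazy)
print(pairs)
print(animals)
```

Non-greedy quantifiers are usually what you want when a delimiter such as ">" or a quote can appear more than once on a line.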


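Summary of using the xlwt module

Updated 2020-03-05

Notes and summaries, Python modules

python | excel | csv | xlrt

Installing the module

pip install xlwt

To download and install manually, see the official website.
Documentation

Quick start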

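import xlwt

# Create an Excel workbook (note the capital W in Workbook)
file = xlwt.Workbook(encoding='ascii')

# Add a sheet.
# Writing to the same cell more than once raises
# "Exception: Attempt to overwrite cell" by default;
# passing cell_overwrite_ok=True when creating the sheet avoids this.
# (Calling add_sheet twice with the same name would itself raise an error,
# so the sheet is created only once here.)
table = file.add_sheet('sheet name', cell_overwrite_ok=True)

# Write data: table.write(row, column, value)
table.write(0, 0, 'test')
file.save('demo.xls')  # save the file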
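
In practice a sheet is usually filled from a list of rows rather than one cell at a time. The following is a minimal sketch of that pattern; write_rows is a hypothetical helper, and the sample data, the sheet name "data", and the temporary output path are assumptions made for the example.

```python
import os
import tempfile

import xlwt


def write_rows(rows, path, sheet_name="data"):
    """Write an iterable of row tuples to a new .xls file and return its path."""
    book = xlwt.Workbook(encoding="utf-8")
    sheet = book.add_sheet(sheet_name, cell_overwrite_ok=True)
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            # write(row, column, value), both indices are zero-based
            sheet.write(row_index, col_index, value)
    book.save(path)
    return path


rows = [("name", "score"), ("alice", 90), ("bob", 85)]
out_path = write_rows(rows, os.path.join(tempfile.gettempdir(), "xlwt_demo.xls"))
print(out_path)
```

xlwt produces the legacy .xls format only; for .xlsx files a different library is needed.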


Summary of using the pytesseract module

Updated 2020-03-05

Notes and summaries, Python modules

python | pytesseract

Environment setup

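Python-tesseract requires Python 2.6+ or Python 3.x.
It requires the Python Imaging Library (PIL) or the Pillow fork. On Debian/Ubuntu, this is the python-imaging or python3-imaging package.
Install Tesseract OCR (see the additional information on how to install the engine on Linux, Mac OSX, and Windows). You must be able to invoke the tesseract command as "tesseract". If that is not the case, for example because tesseract is not on your PATH, you must change the "tesseract_cmd" variable, pytesseract.pytesseract.tesseract_cmd. On Debian/Ubuntu, you can use the tesseract-ocr package.
pytesseract documentation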
