<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<id>https://foreup.github.io/</id>
<title>ForeUP's Blog</title>
<updated>2022-02-15T09:21:10.238Z</updated>
<generator>https://github.com/jpmonette/feed</generator>
<link rel="alternate" href="https://foreup.github.io/"/>
<link rel="self" href="https://foreup.github.io/atom.xml"/>
<subtitle>Daily Notes</subtitle>
<logo>https://foreup.github.io/images/avatar.png</logo>
<icon>https://foreup.github.io/favicon.ico</icon>
<rights>All rights reserved 2022, ForeUP's Blog</rights>
<entry>
<title type="html"><![CDATA[0215-New C++]]></title>
<id>https://foreup.github.io/post/0215-new-c/</id>
<link href="https://foreup.github.io/post/0215-new-c/">
</link>
<updated>2022-02-15T09:20:17.000Z</updated>
<content type="html"><![CDATA[<h2 id="画树">Drawing a tree</h2>
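<pre><code class="language-python">from turtle import Turtle, done, colormode, title
from random import randint, uniform

title('Give me a little time, and I will give you a tree')
colormode(255)   # RGB values 0-255, so random colors can be generated
p1 = Turtle()    # create a pen
p1.width(20)     # initial pen width, used for the main trunk
p1.speed(0)      # 0 is the fastest speed (values above 10 are also treated as 0)
p1.pencolor(randint(0, 254), randint(0, 254), randint(0, 254))  # random initial color; use 0, 0, 0 for black
p1.hideturtle()  # hide the pen's cursor shape
l = 150          # initial trunk length
s = 45           # branching angle; change it and watch how the tree grows
p1.lt(90)        # the pen starts at the canvas center facing right; turn it to face up
p1.penup()       # lift the pen so moving it leaves no trace on the canvas
p1.bk(l)         # move backwards (down) to the trunk's starting point
p1.pendown()     # put the pen down again to draw
p1.fd(l)         # draw the main trunk
plist = [p1]     # keep pens in a list so each branch can be controlled

def draw_tree(plist, l, s, level):
    """level is the number of branching levels: larger values give more branches and take longer to draw."""
    l = l * uniform(0.7, 0.8)  # each branch is a random fraction of the previous length; adjustable
    lst = []
    for p1 in plist:
        w = p1.width()
        p1.width(w * 3 / 4)
        p1.pencolor(randint(0, 254), randint(0, 254), randint(0, 254))
        p2 = p1.clone()  # p1 grows the left branch, its clone p2 the right one
        p1.lt(s)
        p1.fd(l)
        p2.rt(s)
        p2.fd(l)
        lst.append(p1)
        lst.append(p2)
    if level > 0:
        draw_tree(lst, l, s, level - 1)

draw_tree(plist, l, s, 5)
done()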
</code></pre>
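<p>Drawing time grows quickly with <code>level</code>. Here is a minimal sketch in plain Python (no turtle) that counts how many branch segments <code>draw_tree()</code> draws. It assumes each call clones every pen once and then recurses once with the combined list of new pens. <code>count_branches</code> is a hypothetical helper, not part of the original script:</p>
<pre><code class="language-python">def count_branches(level):
    """Number of branch segments draw_tree(plist, l, s, level) draws from one starting pen."""
    pens = 1    # draw_tree is first called with plist = [p1]
    total = 0
    for _ in range(level + 1):  # one call each for level, level - 1, ..., 0
        pens *= 2               # every pen is cloned once per call
        total += pens           # and each of the resulting pens draws one segment
    return total

for level in range(6):
    print(level, count_branches(level))  # 0 2, 1 6, 2 14, 3 30, 4 62, 5 126
</code></pre>
<p>With the <code>level=5</code> used above, the tree has 2 + 4 + ... + 64 = 126 segments. Each extra level roughly doubles the drawing time.</p>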
<h1 id="开始c暂停python">Starting C++, pausing Python</h1>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[0214-Third-party modules]]></title>
<id>https://foreup.github.io/post/0214-di-san-fang-mo-kuai/</id>
<link href="https://foreup.github.io/post/0214-di-san-fang-mo-kuai/">
</link>
<updated>2022-02-14T11:13:28.000Z</updated>
<content type="html"><![CDATA[<h2 id="pillow模块">Pillow module</h2>
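<p><strong>Making a CAPTCHA image</strong></p>
<ul>
<li>Write functions that return a random color and a random character</li>
<li>Create the image canvas</li>
<li>Create the font for the text</li>
<li>Create an ImageDraw.Draw object</li>
<li>Fill the background with randomly colored pixels</li>
<li>Draw the text</li>
<li>Blur</li>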
</ul>
<pre><code class="language-python">from PIL import Image, ImageDraw, ImageFont, ImageFilter
import random

# Random uppercase letter (A-Z):
def rndChar():
    return chr(random.randint(65, 90))

# Random light color, used for the background:
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))

# Random darker color, used for the text:
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))

# 800 x 200 image:
width = 200 * 4
height = 200
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create a Font object (the path points to a local TrueType font file):
font = ImageFont.truetype(r"H:\APP_data\学习\CODE学习\Python\abc\black.ttf", 140)
# Create a Draw object:
draw = ImageDraw.Draw(image)
# Fill every pixel with a random color:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
# Draw four random letters:
for t in range(4):
    draw.text((140 * t + 120, 25), rndChar(), font=font, fill=rndColor2())
# Blur:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')
</code></pre>
<h2 id="requests">requests</h2>
<h2 id="chardet">chardet</h2>
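<p>According to what I found online, installing both <code>chardet</code> and <code>bs4</code> and using <code>UnicodeDammit</code> as follows gives more accurate results:</p>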
<pre><code class="language-python">from bs4 import UnicodeDammit

data = '离离原上草,一岁一枯荣'.encode('utf-8')  # UTF-8 encoded Chinese text
dammit = UnicodeDammit(data)
print(dammit.unicode_markup)             # the decoded text
print(dammit.detector.chardet_encoding)  # the encoding guessed by chardet
# Output:
# 离离原上草,一岁一枯荣
# utf-8
</code></pre>
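<p>Without <code>chardet</code>, a rough fallback is to try a few candidate encodings in order. This is a different and much cruder technique than chardet's statistical detection. It only finds an encoding that <em>can</em> decode the bytes, so stricter encodings such as UTF-8 should come first. A minimal sketch; <code>guess_decode</code> and its candidate list are assumptions for illustration:</p>
<pre><code class="language-python">def guess_decode(data, candidates=('utf-8', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes data without errors."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError('none of the candidate encodings could decode the data')

print(guess_decode('离离原上草,一岁一枯荣'.encode('utf-8')))
print(guess_decode('离离原上草,一岁一枯荣'.encode('gbk')))
</code></pre>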
<h1 id="图形界面开始">Starting GUI programming</h1>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[0213-Built-in modules]]></title>
<id>https://foreup.github.io/post/0213-nei-zhi-mo-kuai/</id>
<link href="https://foreup.github.io/post/0213-nei-zhi-mo-kuai/">
</link>
<updated>2022-02-13T10:54:09.000Z</updated>
<content type="html"><![CDATA[<h2 id="collections">collections</h2>
<p><strong>namedtuple</strong></p>
<blockquote>
<pre><code>>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2
</code></pre>
</blockquote>
<p><strong>deque</strong><br>
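Besides <code>append()</code> and <code>pop()</code>, it supports <code>appendleft()</code> and <code>popleft()</code>, so elements can be added and removed efficiently at both ends:</p>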
<blockquote>
<pre><code>>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])
</code></pre>
</blockquote>
<p><strong>defaultdict</strong></p>
<blockquote>
<pre><code>>> from collections import defaultdict
>> dd = defaultdict(lambda: 'N/A')
>> dd['key1'] = 'abc'
>> dd['key1'] # key1存在
'abc'
>> dd['key2'] # key2不存在,返回默认值
'N/A'
</code></pre>
</blockquote>
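<p>A common use is grouping values without first checking whether a key exists. A minimal sketch with <code>defaultdict(list)</code>; the word list is made up for illustration:</p>
<pre><code class="language-python">from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list() is called to create an empty list the first time a key is missing,
# so no "if key not in d" check is needed before appending
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
</code></pre>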
<p><strong>OrderedDict</strong></p>
<pre><code class="language-python">>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # since Python 3.7, a plain dict also keeps its keys in insertion order
{'a': 1, 'b': 2, 'c': 3}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # OrderedDict keys are ordered by insertion
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
</code></pre>
<p><strong>ChainMap</strong></p>
<pre><code class="language-python">from collections import ChainMap
import os, argparse

# Build the default parameters:
defaults = {
    'color': 'red',
    'user': 'guest'
}
# Build the command-line arguments:
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args()
command_line_args = {k: v for k, v in vars(namespace).items() if v}
# Combine into a ChainMap (lookup order: command line, then environment variables, then defaults):
combined = ChainMap(command_line_args, os.environ, defaults)
# Print the parameters:
print('color=%s' % combined['color'])
print('user=%s' % combined['user'])
</code></pre>
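<p>The script above reads the real command line and environment, so its output depends on how it is run. This minimal sketch replaces both with plain dicts holding hypothetical values. It shows the lookup order, and that writes go only to the first mapping:</p>
<pre><code class="language-python">from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
environment = {'user': 'admin'}    # stands in for os.environ
command_line = {'color': 'green'}  # stands in for the parsed arguments

combined = ChainMap(command_line, environment, defaults)
print(combined['color'])  # green: found in the first mapping
print(combined['user'])   # admin: the first mapping that contains 'user'

# Assignments always go to the first mapping only
combined['user'] = 'bob'
print(command_line)       # {'color': 'green', 'user': 'bob'}
print(environment)        # {'user': 'admin'}
</code></pre>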
<p><strong>Counter</strong></p>
<pre><code class="language-python">>>> from collections import Counter
>>> c = Counter()
>>> for ch in 'programming':
...     c[ch] = c[ch] + 1
...
>>> c
Counter({'r': 2, 'g': 2, 'm': 2, 'p': 1, 'o': 1, 'a': 1, 'i': 1, 'n': 1})
>>> c.update('hello') # update() can also count a whole sequence in one call
>>> c
Counter({'r': 2, 'o': 2, 'g': 2, 'm': 2, 'l': 2, 'p': 1, 'a': 1, 'i': 1, 'n': 1, 'h': 1, 'e': 1})
</code></pre>
<h2 id="base64">base64</h2>
<pre><code class="language-python">>>> import base64
>>> base64.b64encode(b'binary\x00string')
b'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode(b'YmluYXJ5AHN0cmluZw==')
b'binary\x00string'
>>> base64.b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd++//'
>>> base64.urlsafe_b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd--__'
>>> base64.urlsafe_b64decode('abcd--__')
b'i\xb7\x1d\xfb\xef\xff'
</code></pre>
<h2 id="struct">struct</h2>
<pre><code class="language-python">>>> import struct
>>> struct.pack('>I', 10240099)
b'\x00\x9c@c'
</code></pre>
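<p>Check whether an arbitrary file is a bitmap. If it is, print the image size and the color depth (bits per pixel). Only BMP files need to be detected:</p>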
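<pre><code class="language-python"># -*- coding: utf-8 -*-
import base64, struct

# bmp_data = base64.b64decode(...)  # placeholder: the image's base64 string goes here
def bmp_info(data):
    # BMP header: 'B', 'M', file size, reserved, data offset, header size, width, height, planes, bits per pixel
    s = struct.unpack('<ccIIIIIIHH', data[:30])
    if s[0] == b'B' and s[1] == b'M':
        print('This is a bitmap')
        return {
            'width': s[6],
            'height': s[7],
            'color': s[9]   # bits per pixel
        }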
</code></pre>
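<p>You can try <code>bmp_info()</code> without a real image by building a header with <code>struct.pack</code> and the same format string. A minimal sketch: the sizes are made up, and this variant returns <code>None</code> instead of printing when the data is not a BMP or is too short:</p>
<pre><code class="language-python">import struct

def bmp_info(data):
    """Return width, height and bits per pixel if data starts with a BMP header, else None."""
    if len(data) < 30:
        return None
    s = struct.unpack('<ccIIIIIIHH', data[:30])
    if s[0] == b'B' and s[1] == b'M':
        return {'width': s[6], 'height': s[7], 'color': s[9]}
    return None

# Hypothetical 30-byte header for a 240 x 160, 24-bit image
header = struct.pack('<ccIIIIIIHH',
                     b'B', b'M',  # signature
                     0,           # file size (not checked here)
                     0,           # reserved
                     54,          # offset of the pixel data
                     40,          # size of the info header
                     240, 160,    # width, height
                     1,           # color planes
                     24)          # bits per pixel
print(len(header))                      # 30
print(bmp_info(header))                 # {'width': 240, 'height': 160, 'color': 24}
print(bmp_info(b'GIF89a' + bytes(24)))  # None
</code></pre>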
<p><strong>hashlib / hash algorithms</strong><br>
Why user passwords are stored as "salted" MD5 hashes, and how;</p>
<pre><code class="language-python">import hashlib
md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?'.encode('utf-8'))
print(md5.hexdigest())
sha1 = hashlib.sha1()
sha1.update('how to use sha1 in '.encode('utf-8'))
sha1.update('python hashlib?'.encode('utf-8'))
print(sha1.hexdigest())
</code></pre>
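<p>The "salting" idea works like this: store a random per-user salt next to the hash of <code>salt + password</code>. Identical passwords then get different hashes, and precomputed lookup tables stop working. A minimal sketch; <code>hash_password</code> and <code>check_password</code> are hypothetical helpers. MD5 is fast to brute-force, so in real systems a deliberately slow function such as <code>hashlib.pbkdf2_hmac</code> is the better choice:</p>
<pre><code class="language-python">import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, hex MD5 of salt + password), generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16).hex()  # 32 hex characters, different for every user
    digest = hashlib.md5((salt + password).encode('utf-8')).hexdigest()
    return salt, digest

def check_password(password, salt, stored_digest):
    # Recompute with the stored salt; compare_digest avoids an early-exit comparison
    return hmac.compare_digest(hash_password(password, salt)[1], stored_digest)

# Two users who happen to choose the same password
salt_a, hash_a = hash_password('123456')
salt_b, hash_b = hash_password('123456')
print(hash_a == hash_b)                          # False: different salts
print(check_password('123456', salt_a, hash_a))  # True
print(check_password('654321', salt_a, hash_a))  # False
</code></pre>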
<p><strong>hmac</strong></p>
<pre><code class="language-python">>>> import hmac
>>> message = b'Hello, world!'
>>> key = b'secret'
>>> h = hmac.new(key, message, digestmod='MD5')
>>> # If the message is long, you can call h.update(msg) several times
>>> h.hexdigest()
'fa4ee7d173f2d97ee79022d1a7355bcf'
</code></pre>
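<p>On the receiving side, the tag is recomputed with the shared key and compared with the one received. A minimal sketch; <code>sign</code> and <code>verify</code> are hypothetical helpers and use SHA-256 instead of MD5. <code>hmac.compare_digest</code> avoids the early exit of <code>==</code>, which helps resist timing attacks:</p>
<pre><code class="language-python">import hmac

def sign(key, message):
    """Hex HMAC-SHA256 tag for message."""
    return hmac.new(key, message, digestmod='sha256').hexdigest()

def verify(key, message, tag):
    # Recompute the tag and compare without short-circuiting on the first difference
    return hmac.compare_digest(sign(key, message), tag)

key = b'secret'
tag = sign(key, b'Hello, world!')
print(verify(key, b'Hello, world!', tag))       # True
print(verify(key, b'Hello, world?', tag))       # False: message was changed
print(verify(b'other', b'Hello, world!', tag))  # False: wrong key
</code></pre>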
<p><strong>itertools</strong><br>
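<code>count()</code> creates an infinite iterator. The loop below prints the natural numbers and never stops on its own (interrupt it with Ctrl+C):</p>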
<pre><code class="language-python">>>> import itertools
>>> naturals = itertools.count(1)
>>> for n in naturals:
...     print(n)
...
1
2
3
...
</code></pre>
<p><code>cycle()</code> repeats the given sequence indefinitely:</p>
<pre><code class="language-python">>>> import itertools
>>> cs = itertools.cycle('ABC') # note that a string is also a sequence
>>> for c in cs:
...     print(c)
...
A
B
C
A
B
C
...
</code></pre>
<p><code>repeat()</code> repeats a single element indefinitely, unless a second argument limits the number of repetitions:</p>
<pre><code class="language-python">>>> ns = itertools.repeat('A', 3)
>>> for n in ns:
...     print(n)
...
A
A
A
</code></pre>
<p>Functions such as <code>takewhile()</code> use a condition to cut a finite sequence out of an infinite one:</p>
<pre><code class="language-python">>>> natuals = itertools.count(1)
>>> ns = itertools.takewhile(lambda x: x <= 10, natuals)
>>> list(ns)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
</code></pre>
<p><strong>chain()</strong></p>
<pre><code class="language-python">>>> for c in itertools.chain('ABC', 'XYZ'):
...     print(c)
...
# prints A, B, C, X, Y, Z, one per line
</code></pre>
<p><strong>groupby()</strong></p>
<pre><code class="language-python">>>> for key, group in itertools.groupby('AAABBBCCAAA'):
...     print(key, list(group))
...
A ['A', 'A', 'A']
B ['B', 'B', 'B']
C ['C', 'C']
A ['A', 'A', 'A']
</code></pre>
<p>Grouping case-insensitively (the key function maps each character to upper case):</p>
<pre><code class="language-python">>>> for key, group in itertools.groupby('AaaBBbcCAAa', lambda c: c.upper()):
...     print(key, list(group))
...
A ['A', 'a', 'a']
B ['B', 'B', 'b']
C ['c', 'C']
A ['A', 'A', 'a']
</code></pre>
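<p><code>groupby()</code> only groups <em>consecutive</em> equal elements, which is why the two runs of <code>A</code> above stay separate. That makes it a direct fit for run-length encoding. A minimal sketch (sort the input first if you want one group per distinct key):</p>
<pre><code class="language-python">import itertools

def run_length_encode(s):
    """'AAABBBCCAAA' -> [('A', 3), ('B', 3), ('C', 2), ('A', 3)]"""
    return [(key, len(list(group))) for key, group in itertools.groupby(s)]

def run_length_decode(pairs):
    return ''.join(ch * n for ch, n in pairs)

encoded = run_length_encode('AAABBBCCAAA')
print(encoded)                     # [('A', 3), ('B', 3), ('C', 2), ('A', 3)]
print(run_length_decode(encoded))  # AAABBBCCAAA
</code></pre>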
<p><strong>contextlib</strong><br>
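The <code>@contextmanager</code> decorator (<code>from contextlib import contextmanager</code>) turns a generator function into a context manager for use with the <code>with</code> statement:</p>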
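<pre><code class="language-python">from contextlib import contextmanager

@contextmanager
def tag(name):
    print("<%s>" % name)   # runs when the with block is entered
    yield
    print("</%s>" % name)  # runs when the with block exits

with tag("h1"):
    print("hello")
    print("world")
# Output:
# <h1>
# hello
# world
# </h1>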
</code></pre>
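<p>In <code>tag()</code> above, the code after <code>yield</code> never runs if the body of the <code>with</code> block raises an exception. Wrapping the <code>yield</code> in <code>try...finally</code> makes the cleanup unconditional. This minimal sketch records events in a list instead of printing them, so the result is easy to check (<code>section</code> is a hypothetical name):</p>
<pre><code class="language-python">from contextlib import contextmanager

@contextmanager
def section(name, log):
    log.append('begin ' + name)
    try:
        yield
    finally:
        # Runs whether the with-body finishes normally or raises
        log.append('end ' + name)

log = []
try:
    with section('h1', log):
        log.append('hello')
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass  # the exception still propagates out of the with block
print(log)  # ['begin h1', 'hello', 'end h1']
</code></pre>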
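<p>If an object has a <code>close()</code> method but cannot be used with <code>with</code> on its own, <code>closing()</code> turns it into a context manager that calls <code>close()</code> on exit:</p>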
<pre><code class="language-python">from contextlib import closing
from urllib.request import urlopen
with closing(urlopen('https://www.python.org')) as page:
    for line in page:
        print(line)
</code></pre>
<p><strong>urllib</strong><br>
The <code>request</code> module of <code>urllib</code></p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[0212-Scraping images-Random thoughts]]></title>
<id>https://foreup.github.io/post/0212-pa-qu-tu-pian-sui-xiang/</id>
<link href="https://foreup.github.io/post/0212-pa-qu-tu-pian-sui-xiang/">
</link>
<updated>2022-02-12T11:49:35.000Z</updated>
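<content type="html"><![CDATA[<p>Scraping wallpaper images with BeautifulSoup and requests:<br>
<code>r = requests.get(url, headers=headers, timeout = 30)</code> requests the given URL with the given request headers;<br>
<code>r.encoding = r.apparent_encoding</code> sets the encoding to the one detected from the page content;<br>
<code>return r.text</code> returns the page text;</p>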
<pre><code class="language-python">soup = BeautifulSoup(html, 'lxml')
img_ul = soup.find_all('img', {"class": "progressive__img progressive--not-loaded"})
</code></pre>
<p>The code above selects the page's <code>img</code> elements that have the given class;</p>
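<pre><code class="language-python">os.makedirs(r'path\img', exist_ok=True)  # open() cannot create missing folders; 'path' is a placeholder
with open(os.path.join(r'path\img', image_name), 'wb') as f:  # relative paths are resolved against the current working directory
    for chunk in r.iter_content(chunk_size=128):  # iterate over the response body in 128-byte chunks
        f.write(chunk)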
</code></pre>
<p>The code above writes the image to disk chunk by chunk;</p>
<pre><code class="language-python">import requests
from bs4 import BeautifulSoup
import os

def gethtmltext(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                                 'Chrome/51.0.2704.63 Safari/537.36'}
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print("Error")
        return None

save_dir = r'F:\MIUI_data\python_test\img'
os.makedirs(save_dir, exist_ok=True)  # open() cannot create missing folders
for i in range(11, 21):
    URL = "https://bing.wilii.cn/index.asp" + ("?page=%s" % i)
    html = gethtmltext(URL)
    if html is None:  # skip pages that failed to download
        continue
    soup = BeautifulSoup(html, 'lxml')
    img_ul = soup.find_all('img', {"class": "progressive__img progressive--not-loaded"})
    # print(soup)  # print the page source
    for img in img_ul:
        url = "https://bing.wilii.cn" + img['src']
        print(url)
        r = requests.get(url, stream=True, timeout=30)
        image_name = url.split('/')[-1]
        # image_name = image_name0.split('?')[0]
        with open(os.path.join(save_dir, image_name), 'wb') as f:
            for chunk in r.iter_content(chunk_size=128):
                f.write(chunk)
        print('Saved %s' % image_name)
</code></pre>
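<p>The commented-out line suggests that image URLs may carry a <code>?query</code> part, which would otherwise end up in the file name. A minimal sketch that uses <code>urllib.parse.urlsplit</code> to keep only the last path segment; <code>filename_from_url</code> and the example URLs are hypothetical:</p>
<pre><code class="language-python">import posixpath
from urllib.parse import urlsplit

def filename_from_url(url):
    """Last path segment of url, without any ?query or #fragment."""
    path = urlsplit(url).path  # drops scheme, host, query and fragment
    return posixpath.basename(path)

print(filename_from_url('https://example.com/photos/sea.jpg?w=1920'))  # sea.jpg
</code></pre>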
<h2 id="随想">Random thoughts</h2>
<ul>
<li>A script for batch-renaming files<br>
converting between <code>list</code> and <code>str</code>;<br>
slicing a <code>list</code>;</li>
<li>Copying files<br>
<code>shutil.copyfile</code><br>
<code>shutil.copytree</code></li>
</ul>
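<p>The two ideas above fit together: rename the matching files in a folder to a numbered scheme, then copy the whole folder with <code>shutil.copytree</code>. A minimal sketch; the <code>prefix_001.ext</code> naming, <code>batch_rename</code> and the temporary folder are assumptions for illustration:</p>
<pre><code class="language-python">import os
import shutil
import tempfile

def batch_rename(folder, prefix, ext):
    """Rename files ending in ext to prefix_001ext, prefix_002ext, ... (sorted by old name)."""
    names = sorted(n for n in os.listdir(folder) if n.endswith(ext))
    for i, old in enumerate(names, start=1):
        new = '%s_%03d%s' % (prefix, i, ext)
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
    return sorted(os.listdir(folder))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src')
    os.makedirs(src)
    for name in ('b.jpg', 'a.jpg', 'notes.txt'):
        open(os.path.join(src, name), 'w').close()  # create empty files
    renamed = batch_rename(src, 'img', '.jpg')
    # copytree copies the folder recursively; the destination must not exist yet
    backup = shutil.copytree(src, os.path.join(tmp, 'backup'))
    copied = sorted(os.listdir(backup))

print(renamed)            # ['img_001.jpg', 'img_002.jpg', 'notes.txt']
print(copied == renamed)  # True
</code></pre>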
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[0211-Processes and threads-Scraping basics]]></title>
<id>https://foreup.github.io/post/0211-jin-cheng-xian-cheng-pa-chong-ji-chu/</id>
<link href="https://foreup.github.io/post/0211-jin-cheng-xian-cheng-pa-chong-ji-chu/">
</link>
<updated>2022-02-11T11:36:56.000Z</updated>
<content type="html"><![CDATA[<h1 id="进程和线程">Processes and threads</h1>
<h2 id="多进程">Multiprocessing</h2>
<h3 id="multiprocessing">multiprocessing</h3>
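<p>The <strong><code>multiprocessing</code></strong> module provides a <code>Process</code> class that represents a process object.<br>
The <code>join()</code> method waits for the child process to finish before continuing, and is typically used to synchronize processes.</p>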
<pre><code class="language-python">from multiprocessing import Process
import os

# Code to be run by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')
# Output (process IDs will differ):
# Parent process 928.
# Child process will start.
# Run child process test (929)...
# Child process end.
</code></pre>
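<h2 id="多进程-2">Multithreading</h2>
<p><code>threading</code> is a high-level module that wraps <code>_thread</code>; in the vast majority of cases, <code>threading</code> is all we need.<br>
To start a thread, pass a function to create a <code>Thread</code> instance, then call <code>start()</code> to run it:</p>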
<pre><code class="language-python">import time, threading

# Code run by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)
# Output:
# thread MainThread is running...
# thread LoopThread is running...
# thread LoopThread >>> 1
# thread LoopThread >>> 2
# thread LoopThread >>> 3
# thread LoopThread >>> 4
# thread LoopThread >>> 5
# thread LoopThread ended.
# thread MainThread ended.
</code></pre>
<h3 id="lock">Lock</h3>
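<p>With multiple processes, each process has its own copy of every variable, so processes cannot interfere with each other. With multiple threads, all variables are shared. The biggest danger of sharing data is that several threads modify the same variable at the same time and corrupt its value.<br>
<strong>Example:</strong></p>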
<pre><code class="language-python">import time, threading

# Assume this is your bank balance:
balance = 0

def change_it(n):
    # Deposit then withdraw, so the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(2000000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)
# Output (varies from run to run):
# 3
</code></pre>
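<p><code>balance = balance + n</code> is not atomic: a thread reads <code>balance</code>, computes the new value, then writes it back. When t1 and t2 interleave over enough iterations, one thread can overwrite the other's update, so <code>balance</code> is no longer guaranteed to be <code>0</code>.<br>
A lock is created with <code>threading.Lock()</code>:</p>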
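<pre><code class="language-python"># Continues the example above (reuses threading and change_it)
balance = 0
lock = threading.Lock()

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify balance:
            change_it(n)
        finally:
            # Always release the lock when done:
            lock.release()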
</code></pre>
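<p>A thread that acquires the lock must release it when it is done. Otherwise the threads waiting for the lock will wait forever and become dead threads. That is why <code>try...finally</code> is used to make sure the lock is always released.</p>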
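<p>A <code>Lock</code> is also a context manager, so <code>with lock:</code> does the acquire, <code>try...finally</code> and release in one statement. A minimal self-contained sketch of the balance example written that way:</p>
<pre><code class="language-python">import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    # Deposit then withdraw: the net effect should be 0
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for _ in range(100000):
        # Acquires the lock here and releases it when the block exits,
        # even if change_it() raises
        with lock:
            change_it(n)

threads = [threading.Thread(target=run_thread, args=(n,)) for n in (5, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 0
</code></pre>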
<h2 id="threadlocal">ThreadLocal</h2>
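<p>The global variable <code>local_school</code> is a <code>ThreadLocal</code> object. Every <code>Thread</code> can read and write its <code>student</code> attribute without affecting the other threads.<br>
Each attribute, such as <code>local_school.student</code>, is local to its thread. It can be read and written freely without interference and without managing locks, because <code>ThreadLocal</code> handles that internally.<br>
You can think of the global variable <code>local_school</code> as a <code>dict</code>. Besides <code>local_school.student</code>, you can bind other variables such as <code>local_school.teacher</code>:</p>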
<pre><code class="language-python">import threading

# Create a global ThreadLocal object:
local_school = threading.local()

def process_student():
    # Get the student bound to the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # Bind the ThreadLocal's student:
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()
# Output:
# Hello, Alice (in Thread-A)
# Hello, Bob (in Thread-B)
</code></pre>
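<p>A minimal sketch that checks the isolation directly. Each worker stores its own value, and the main thread, which never set the attribute, sees neither of them. The names <code>local_data</code>, <code>seen</code> and <code>worker</code> are hypothetical:</p>
<pre><code class="language-python">import threading

local_data = threading.local()  # one independent set of attributes per thread
seen = {}

def worker(name):
    local_data.value = name  # visible only inside this thread
    seen[threading.current_thread().name] = local_data.value

threads = [threading.Thread(target=worker, args=(n,), name='T-' + n)
           for n in ('Alice', 'Bob')]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)                          # {'T-Alice': 'Alice', 'T-Bob': 'Bob'}, order may vary
print(hasattr(local_data, 'value'))  # False: the main thread never set it
</code></pre>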
<h2 id="分布式进程">Distributed processes</h2>
<p>Use the <code>managers</code> module to wrap a <code>queue</code>, <code>register</code> it on the network, and set the port and the <code>authkey</code> used for authentication, as shown below.<br>
There are two <code>.py</code> files, <code>task_master.py</code> and <code>task_worker.py</code>;<br>
The code below may fail on Windows. For a fix, see the <strong><a href="https://www.liaoxuefeng.com/discuss/969955749132672/1441664361037857">comments on Liao Xuefeng's website</a></strong></p>
<pre><code class="language-python"># task_master.py
import random, time, queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

# Register both Queues on the network; the callable parameter links each name to a Queue object:
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# Bind to port 5000 and set the authkey to 'abc':
manager = QueueManager(address=('', 5000), authkey=b'abc')
# Start the manager:
manager.start()
# Get the Queue objects that are accessed over the network:
task = manager.get_task_queue()
result = manager.get_result_queue()
# Put some tasks in:
for i in range(10):
    n = random.randint(0, 10000)
    print('Put task %d...' % n)
    task.put(n)
# Read the results from the result queue:
print('Try get results...')
for i in range(10):
    r = result.get(timeout=10)
    print('Result: %s' % r)
# Shut down:
manager.shutdown()
print('master exit.')
</code></pre>
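<p>When adding tasks to the <code>Queue</code>, do not operate on the original <code>task_queue</code> directly, because that bypasses the <code>QueueManager</code> wrapper. Tasks must be added through the <code>Queue</code> interface obtained from <code>manager.get_task_queue()</code>.</p>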
<pre><code class="language-python"># task_worker.py
import time, sys, queue
from multiprocessing.managers import BaseManager

# Create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# This QueueManager only fetches the Queues from the network, so only the names are registered:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')
# Connect to the server, i.e. the machine running task_master.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# The port and authkey must match those set in task_master.py exactly:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# Connect over the network:
m.connect()
# Get the Queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# Take tasks from the task queue and write the results to the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n*n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('task queue is empty.')
# Done:
print('worker exit.')
</code></pre>
<p><strong>Output</strong><br>
First, start the task_master.py server process:</p>
<pre><code class="language-python">$ python3 task_master.py
Put task 3411...
Put task 1605...
Put task 1398...
Put task 4729...
Put task 5300...
Put task 7471...
Put task 68...
Put task 4219...
Put task 339...
Put task 7866...
Try get results...
</code></pre>
<p>After sending the tasks, the <code>task_master.py</code> process waits for results on the <code>result</code> queue. Now start the <code>task_worker.py</code> process:</p>
<pre><code class="language-python">$ python3 task_worker.py
Connect to server 127.0.0.1...
run task 3411 * 3411...
run task 1605 * 1605...
run task 1398 * 1398...
run task 4729 * 4729...
run task 5300 * 5300...
run task 7471 * 7471...
run task 68 * 68...
run task 4219 * 4219...
run task 339 * 339...
run task 7866 * 7866...
worker exit.
</code></pre>
<p>Once the <code>task_worker.py</code> process ends, the <code>task_master.py</code> process goes on to print the results:</p>
<pre><code class="language-python">Result: 3411 * 3411 = 11634921
Result: 1605 * 1605 = 2576025
Result: 1398 * 1398 = 1954404
Result: 4729 * 4729 = 22363441
Result: 5300 * 5300 = 28090000
Result: 7471 * 7471 = 55815841
Result: 68 * 68 = 4624
Result: 4219 * 4219 = 17799961
Result: 339 * 339 = 114921
Result: 7866 * 7866 = 61873956
</code></pre>
<h1 id="莫烦爬虫">Web scraping (Mofan Python tutorial)</h1>
<p><strong>Opening a web page</strong></p>
<pre><code class="language-python">from urllib.request import urlopen
# the page contains Chinese, so decode the bytes as UTF-8
html = urlopen(
"https://mofanpy.com/static/scraping/basic-structure.html"
).read().decode('utf-8')
print(html)
</code></pre>
<p><strong>BeautifulSoup</strong><br>
Using BeautifulSoup to <strong>crawl Baidu Baike</strong> entries.<br>
Key points:</p>
<pre><code class="language-python"># Note the form of the result and how the attributes are matched
soup.find_all('a', {"target":"_blank", "href":re.compile("/item/(%.+)+$")})
soup.find('h1').get_text()
# Note how one element is picked at random
sublink.append(random.sample(nexturl, 1)[0]['href'])
</code></pre>
<p>Code:</p>
<pre><code class="language-python">from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import random

sublink = ["/item/%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB/5162711"]
mainlink = "https://baike.baidu.com"
for i in range(20):
    alink = mainlink + sublink[-1]
    html = urlopen(alink).read().decode('utf-8')
    soup = BeautifulSoup(html, features='lxml')
    print(soup.find('h1').get_text(), ' url:', sublink[-1])
    nexturl = soup.find_all('a', {"target":"_blank", "href":re.compile("/item/(%.+)+$")})
    # print(nexturl)
    if len(nexturl) != 0:
        sublink.append(random.sample(nexturl, 1)[0]['href'])  # follow a random entry link
    else:
        sublink.pop()  # dead end: go back to the previous entry
</code></pre>
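<p>You can see which links the <code>href</code> filter keeps by trying the same regular expression on a few sample paths. Apart from the starting entry taken from the code above, the paths are made up. BeautifulSoup applies a compiled pattern to attribute values with <code>search()</code>, so <code>re.search</code> gives the same result. A minimal sketch; the last line shows that <code>random.choice</code> has the same effect as <code>random.sample(..., 1)[0]</code>:</p>
<pre><code class="language-python">import random
import re

item_re = re.compile("/item/(%.+)+$")

hrefs = [
    '/item/%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB/5162711',  # percent-encoded Chinese title: kept
    '/item/Python',                                         # plain ASCII title: dropped
    '/help/index.html',                                     # not an entry link: dropped
]
kept = [h for h in hrefs if item_re.search(h)]
print(kept)

pick = random.choice(kept)  # same effect as random.sample(kept, 1)[0]
</code></pre>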
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[0210-File operations]]></title>
<id>https://foreup.github.io/post/0210-wen-jian-cao-zuo/</id>
<link href="https://foreup.github.io/post/0210-wen-jian-cao-zuo/">
</link>
<updated>2022-02-10T09:46:31.000Z</updated>
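<content type="html"><![CDATA[<h1 id="io编程">IO programming</h1>
<p>This chapter covers synchronous IO only. Asynchronous IO is considerably more complex and will be covered later, together with server-side programming.</p>
<h2 id="文件读写">Reading and writing files</h2>
<p><strong>Reading files</strong><br>
<code>open()</code> opens a file and <code>f.close()</code> closes it;<br>
<code>read()</code> reads the whole file at once; <code>read(size)</code> reads at most <code>size</code> characters (text mode) or bytes (binary mode).<br>
<code>readline()</code> <strong>reads one line</strong> per call, while <code>readlines()</code> reads everything and <strong>returns a list of lines</strong></p>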
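<pre><code class="language-python">>>> with open('ABC.txt', 'r') as fname:
...     content = fname.read()  # the whole file as one str
...
>>> f = open('/Users/michael/test.jpg', 'rb') # binary mode returns bytes; bytes from a text file can be turned into a str with decode()
>>> f.read()
b'\xff\xd8\xff\xe1\x00\x18Exif\x00\x00...' # bytes, shown in hexadecimal
>>> f.close()
>>> with open('ABC.txt', 'r') as f:
...     for line in f.readlines():
...         print(line.strip()) # strip() with no argument removes the trailing '\n' and surrounding whitespace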
</code></pre>
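<p><strong>Writing files</strong><br>
Pass <code>'w'</code> or <code>'wb'</code> to <code>open()</code> to write a text file or a binary file. The file is created if it does not exist, and any existing content is erased.<br>
To append instead, use mode <code>'a'</code>.<br>
<strong>StringIO and BytesIO</strong></p>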
<ul>
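<li>The <code>tell</code> method returns the current position of the read/write pointer</li>
<li>The <code>seek</code> method moves the read/write pointer to a given position. It takes two arguments. The first, <strong>offset</strong>, is how far to move: <strong>positive moves forward, negative moves backward</strong>. The second, <strong>whence</strong>, is optional and defaults to <code>0</code>, the <strong>start</strong> of the file; <code>1</code> means relative to the <strong>current</strong> position and <code>2</code> means relative to the <strong>end</strong> of the file</li>
<li>When using <code>seek</code> on a file that was not opened in <b><code>'b'</code></b> (binary) mode, <code>offset</code> cannot be <strong>negative</strong>, and with <code>whence</code> <code>1</code> or <code>2</code> it must be <code>0</code></li>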
</ul>
<pre><code class="language-python"># StringIO: useful when you need to process data you have fetched but don't want to write it to the local disk
from io import StringIO
from io import BytesIO

def outputstring():
    return 'string \nfrom \noutputstring \nfunction'

s = outputstring()

# Read the function's returned data in memory
sio = StringIO(s)
# You can use StringIO's own method
print(sio.getvalue())
# or the file-like object methods
lines = sio.readlines()
for i in lines:
    print(i.strip())

# Write the function's returned data in memory
sio = StringIO()
sio.write(s)
# and view it with StringIO's own method
s = sio.getvalue()
print(s)

# If you view it with the file-like object methods, the data appears empty,
# because after write() the pointer is at the end of the stream
sio = StringIO()
sio.write(s)
for i in sio.readlines():
    print(i.strip())

# So move the pointer back to the start first;
# now the content is printed
sio = StringIO()
sio.write(s)
sio.seek(0, 0)
print(sio.tell())
for i in sio.readlines():
    print(i.strip())

# The code above used two methods: seek and tell
# StringIO only handles str; for binary data, use BytesIO
# The sio above cannot seek backwards from the current position; with data written in binary ('b') form, we can
bio = BytesIO()
bio.write(s.encode('utf-8'))
print(bio.getvalue())
bio.seek(-36, 1)  # move 36 bytes back from the current position (the end), i.e. to the start
print(bio.tell())
for i in bio.readlines():
    print(i.strip())
</code></pre>
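<p>The restriction in the last bullet above can be checked directly. A minimal sketch: a <code>BytesIO</code> accepts a negative offset relative to the current position, while a <code>StringIO</code> rejects any nonzero offset with <code>whence</code> <code>1</code> (the exact exception message may differ between Python versions):</p>
<pre><code class="language-python">from io import StringIO, BytesIO

data = 'string \nfrom \noutputstring \nfunction'  # 36 ASCII characters

bio = BytesIO(data.encode('utf-8'))
bio.seek(0, 2)     # whence=2: jump to the end
end = bio.tell()   # 36
bio.seek(-8, 1)    # binary streams accept a negative offset relative to the current position
tail = bio.read()  # b'function'
print(end, tail)

sio = StringIO(data)
sio.seek(0, 2)     # offset 0 with whence=2 is allowed for text streams
refused = False
try:
    sio.seek(-8, 1)  # but a nonzero offset with whence=1 is not
except (ValueError, OSError) as e:
    refused = True
    print('StringIO refused:', e)
</code></pre>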
<h2 id="操作文件与目录">Working with files and directories</h2>
<p><code>os.chdir(file_path)</code><br>
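changes the current directory to file_path;<br>
<code>os.path.abspath('.')</code><br>
returns the absolute path of the current directory;<br>
<code>os.path.join(pwd,x)</code><br>
joins path components;<br>
<code>os.listdir(pwd)</code><br>
lists all files and directories in the path (without descending into subdirectories);<br>
<code>os.path.isfile(os.path.join(pwd,x))</code><br>
checks whether the path in parentheses is a file;<br>
<code>os.path.isdir(os.path.join(pwd,x))</code><br>
checks whether the path in parentheses is a directory;<br>
The <code>shutil</code> module provides the <code>copyfile()</code> function.</p>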
<pre><code class="language-python">>>> os.path.split('/Users/michael/testdir/file.txt')
('/Users/michael/testdir', 'file.txt')
>>> os.path.splitext('/path/to/file.txt') # get the file extension
('/path/to/file', '.txt')
</code></pre>
<p><strong>Find</strong> files whose names contain a given string in the current directory and all of its subdirectories:</p>
<pre><code class="language-python">import os

def findfile(s, file_path):
    # Change into the current directory
    os.chdir(file_path)
    # Look for files in this directory whose names contain s
    L = [x for x in os.listdir('.') if os.path.isfile(x)]
    for x in L:
        # find() returns the index where s starts, or -1 if s is not found
        if x.find(s) != -1:
            print(os.path.join(file_path, x))
    # Search every subdirectory the same way
    Y = [x for x in os.listdir('.') if os.path.isdir(x)]
    for x in Y:
        file_path2 = os.path.join(file_path, x)
        findfile(s, file_path2)

def main():
    # path = input('Enter an absolute path: ')
    path = os.path.abspath('.')
    s = input('Enter the string to search for: ')
    findfile(s, path)  # prints each match as it is found
    print('Search finished.')

if __name__ == '__main__':
    main()
</code></pre>
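<p>The standard library's <code>os.walk()</code> already walks a directory tree. The same search can therefore be written without <code>os.chdir()</code> or explicit recursion, which leaves the working directory unchanged. A minimal sketch of that alternative (not the original approach); <code>find_files</code> and the temporary files are hypothetical:</p>
<pre><code class="language-python">import os
import tempfile

def find_files(s, top='.'):
    """Return the paths of all files under top whose names contain s."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):  # top, then every subdirectory
        for name in filenames:
            if s in name:
                matches.append(os.path.join(dirpath, name))
    return matches

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('notes.txt', 'a.py', os.path.join('sub', 'todo_notes.md')):
        open(os.path.join(tmp, rel), 'w').close()  # create empty files
    found = sorted(os.path.relpath(p, tmp) for p in find_files('notes', tmp))

print(found)  # ['notes.txt', 'sub/todo_notes.md'] (with a backslash on Windows)
</code></pre>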
<h2 id="序列化">Serialization</h2>
<p><strong>pickle</strong><br>
<code>pickle.dumps()</code> serializes any object into <code>bytes</code>;<br>
<code>pickle.loads()</code> deserializes an object from <code>bytes</code>;<br>
<code>pickle.dump()</code> serializes an object and writes it straight into a <code>file-like Object</code>, i.e. a file;<br>
<code>pickle.load()</code> deserializes an object straight from a <code>file-like Object</code>.</p>
<pre><code class="language-python">>>> import pickle
>>> d = dict(name='Bob', age=20, score=88)
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x03\x00\x00\x00ageq\x01K\x14X\x05\x00\x00\x00scoreq\x02KXX\x04\x00\x00\x00nameq\x03X\x03\x00\x00\x00Bobq\x04u.'
>>> f = open('dump.txt', 'wb')
>>> pickle.dump(d, f)
>>> f.close()
>>> f = open('dump.txt', 'rb')
>>> d = pickle.load(f)
>>> f.close()
>>> d
{'age': 20, 'score': 88, 'name': 'Bob'}
</code></pre>
<p><strong>JSON</strong></p>
<table>
<thead>
<tr>
<th>JSON type</th>
<th>Python type</th>
</tr>
</thead>
<tbody>
<tr>
<td>{}</td>
<td>dict</td>
</tr>
<tr>
<td>[]</td>
<td>list</td>
</tr>
<tr>
<td>"string"</td>
<td>str</td>
</tr>
<tr>
<td>1234.56</td>
<td>int or float</td>
</tr>
<tr>
<td>true/false</td>
<td>True/False</td>
</tr>
<tr>
<td>null</td>
<td>None</td>
</tr>
</tbody>
</table>
<p>In the <code>json</code> module, <code>dumps()</code> returns a <code>str</code> containing standard <code>JSON</code>. Similarly, <code>dump()</code> writes the <code>JSON</code> directly to a <code>file-like Object</code>.</p>
<pre><code class="language-python">>>> import json
>>> d = dict(name='Bob', age=20, score=88)
>>> json.dumps(d)
'{"age": 20, "score": 88, "name": "Bob"}'
>>> json_str = '{"age": 20, "score": 88, "name": "Bob"}'
>>> json.loads(json_str)
{'age': 20, 'score': 88, 'name': 'Bob'}
</code></pre>
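<p>The file-based pair <code>dump()</code>/<code>load()</code> works with any file-like object. A minimal sketch that uses an in-memory <code>io.StringIO</code> instead of a real file, so nothing touches the disk:</p>
<pre><code class="language-python">import io
import json

d = dict(name='Bob', age=20, score=88)

buf = io.StringIO()        # stands in for a real file opened with open(..., 'w')
json.dump(d, buf)          # like dumps(), but writes into buf
buf.seek(0)                # rewind before reading back
restored = json.load(buf)  # like loads(), but reads from buf

print(buf.getvalue())      # {"name": "Bob", "age": 20, "score": 88}
print(restored == d)       # True
</code></pre>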
<p>Converting a <strong><code>class</code></strong> instance to JSON:<br>
the <code>default</code> parameter of <code>dumps()</code> takes a function that converts the instance into a <code>dict</code>;</p>
<pre><code class="language-python">import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

s = Student('Bob', 20, 88)

def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }

print(json.dumps(s, default=student2dict))
# {"name": "Bob", "age": 20, "score": 88}
</code></pre>
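<p>Instances of a <code>class</code> usually have a <code>__dict__</code> attribute: a <code>dict</code> that stores the instance variables, so it can serve as the conversion function. A few classes are exceptions, such as those that define <code>__slots__</code>.</p>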
<pre><code class="language-python">print(json.dumps(s, default=lambda obj: obj.__dict__))
</code></pre>
<p>In the other direction, the <code>object_hook</code> function passed to <code>loads()</code> converts each decoded <code>dict</code> into a <code>Student</code> instance:</p>
<pre><code class="language-python">def dict2student(d):
    return Student(d['name'], d['age'], d['score'])

json_str = '{"age": 20, "score": 88, "name": "Bob"}'
print(json.loads(json_str, object_hook=dict2student))
# <__main__.Student object at 0x10cd3c190>  (the address varies)
</code></pre>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[0209-Advanced object-oriented programming-Errors and testing]]></title>
<id>https://foreup.github.io/post/0209-dui-xiang-gao-ji-bian-cheng-cuo-wu-ce-shi/</id>
<link href="https://foreup.github.io/post/0209-dui-xiang-gao-ji-bian-cheng-cuo-wu-ce-shi/">
</link>
<updated>2022-02-09T11:42:07.000Z</updated>
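<content type="html"><![CDATA[<h2 id="使用__slots_">Using __slots__</h2>
<p><code>__slots__</code> restricts which attributes instances of a class can bind, but a class's <code>__slots__</code> has no effect on its subclasses:</p>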
<pre><code class="language-python">class Student(object):
    __slots__ = ('name', 'age') # use a tuple to define the attribute names that may be bound

>>> s = Student() # create a new instance
>>> s.name = 'Michael' # bind attribute 'name'
>>> s.age = 25 # bind attribute 'age'
>>> s.score = 99 # bind attribute 'score': raises an error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Student' object has no attribute 'score'
# No effect on subclasses
>>> class GraduateStudent(Student):
...     pass
...
>>> g = GraduateStudent()
>>> g.score = 9999 # no error
</code></pre>
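<p>The exception is a subclass that defines <code>__slots__</code> itself. Its instances may then use their own slots plus those of the parent, and nothing else. A minimal sketch (the <code>grade</code> attribute is a made-up example):</p>
<pre><code class="language-python">class Student(object):
    __slots__ = ('name', 'age')

class GraduateStudent(Student):
    __slots__ = ('score',)  # allowed: own 'score' plus inherited 'name' and 'age'

g = GraduateStudent()
g.name = 'Michael'  # inherited slot
g.score = 99        # the subclass's own slot
try:
    g.grade = 'A'   # not in any __slots__ along the class hierarchy
    rejected = False
except AttributeError:
    rejected = True
print(rejected)     # True
</code></pre>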
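<h2 id="使用property">Using @property</h2>
<p>On its own, <code>@property</code> only defines a <code>getter</code>. To add a <code>setter</code>, decorate a second method with <code>@name.setter</code>, where <code>name</code> is the property's name.<br>
Below, <code>@property</code> gives a Screen object <code>width</code> and <code>height</code> attributes plus a read-only attribute <code>resolution</code>:</p>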
<pre><code class="language-python">class Screen(object):
    @property
    def width(self):
        return self._width

    @property
    def height(self):
        return self._height

    @property
    def resolution(self):
        # read-only: there is no resolution.setter
        return self._width * self._height

    @height.setter
    def height(self, x):
        if isinstance(x, int):
            self._height = x
        else:
            raise ValueError('check your value')

    @width.setter
    def width(self, x):
        if isinstance(x, int):
            self._width = x
        else:
            raise ValueError('check your value')

# Test:
s = Screen()
s.width = 1024