Notes on using cookies to skip login
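What cookies do
Cookies live on the client side, in our browser. Through them we send our personal identifier to the server, where a matching session holds our personal information (login state, preferences, and so on). Every request we send to the server carries the cookies, and the server uses them to respond with our personal information. That way we do not have to re-enter our login details each time we visit a site on the same domain.

Using cookies in a crawler
In a crawler, every request we send has to carry the cookies (I once left the cookies off some requests and spent a long time unable to find what was wrong). The browser's developer tools show our cookies; we only need to copy them, rewrite them as a dictionary, and pass them to the request.

Raw cookies
wordpress_test_cookie=WP%20Cookie%20check;...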
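The "rewrite as a dictionary" step can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the helper name cookie_string_to_dict and the second cookie in the sample string are made up, and only the wordpress_test_cookie pair comes from the excerpt above.

```python
def cookie_string_to_dict(raw: str) -> dict:
    """Split a raw 'name=value; name2=value2' cookie string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        # Skip empty fragments (e.g. after a trailing ';') and malformed pairs.
        if not pair or "=" not in pair:
            continue
        # Split on the first '=' only, because values may contain '='.
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Sample string: the second cookie is a hypothetical placeholder.
raw_cookies = "wordpress_test_cookie=WP%20Cookie%20check; example_session=abc=123;"
cookies = cookie_string_to_dict(raw_cookies)
print(cookies)
```

With the requests library, a dict like this is typically passed as requests.get(url, cookies=cookies), and it has to be included on every request that needs the login state.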
scrapy
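Overview
Below is a rough diagram of the Scrapy architecture. We will work through an example first and then introduce the pieces one by one.

Preparation
Make sure the Scrapy project is run from its root directory. The reason: a Scrapy project is normally a Python package, and when you run code from outside the project root, Python may fail to find the scrapytutorial package, so scrapytutorial.items cannot be resolved. This happens because Python's module search path (sys.path) does not include the Scrapy project's root directory.

Create a Scrapy project
scrapy startproject <project_name>
scrapy startproject scrapytutorial

Create a spider
cd <project_name>
scrapy genspider <spider_name> <domain>
cd scrapytutorial
scrapy genspider quotes...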
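The sys.path explanation above can be reproduced without Scrapy. The sketch below builds a throwaway package in a temporary directory (the name demo_scrapy_pkg and its items.py are hypothetical stand-ins for scrapytutorial) and shows that the import only succeeds once the project root is on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str) -> bool:
    """Return True if the module can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as project_root:
    # Hypothetical stand-in for a Scrapy project package with an items module.
    package_dir = Path(project_root) / "demo_scrapy_pkg"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "items.py").write_text("class QuoteItem:\n    pass\n")

    # The project root is not on sys.path yet, so the import fails.
    before = can_import("demo_scrapy_pkg.items")

    # Running from the project root has the same effect as this insert.
    sys.path.insert(0, project_root)
    after = can_import("demo_scrapy_pkg.items")
    sys.path.remove(project_root)

print(before, after)
```

Running a script from the project root usually puts that directory on sys.path automatically, which is why the import works there and fails elsewhere.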
Configuring a proxy in the terminal
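Which port should you use?
Once a proxy client is running, which port should traffic be sent to? The conclusion first:
If you are in a browser, a terminal, or Python code, use the local proxy 127.0.0.1:10808 (Mixed Port).
If you are configuring a remote proxy inside Clash/V2Ray, use the remote address 123.45.67.89:443 (the proxy server's port).

Why is that? Local software (browser, terminal, Python code, etc.) only needs to talk to the local proxy (Clash/V2Ray); it does not need to reach the remote proxy server directly. Clash/V2Ray opens a Mixed Port locally (for example 127.0.0.1:10808), and this port:
✅ accepts HTTP/SOCKS5 proxy requests (from the browser, Python code, etc.)
✅ automatically picks the best remote proxy server
✅ handles encryption, traffic routing, and other complex logic

Setting the proxy in the terminal
set http_proxy=http://127.0.0.1:10808 (the mixed port V2Ray exposes to the local machine)
set...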
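For the Python-code case mentioned above, one possible way to point requests at the local Mixed Port is sketched below with the standard library's urllib. The address 127.0.0.1:10808 is the example port from the text; the helper name build_proxy_opener is an assumption for illustration, and the snippet only builds the opener without sending any traffic.

```python
import urllib.request

# Local Mixed Port from the text above; adjust to whatever your client exposes.
LOCAL_PROXY = "http://127.0.0.1:10808"


def build_proxy_opener(proxy_url: str = LOCAL_PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener()

# Not executed here because it needs a running proxy client:
# response = opener.open("https://github.com", timeout=10)
```

The code talks only to the local port; choosing and reaching the remote server stays the job of Clash/V2Ray.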
How JavaScript anti-crawling works
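CSS and JS modify the browser's DOM
On pages we can view normally, the anti-crawling mechanism works by using CSS and JS to change the tag content of the original HTML file (the data is moved into JS), so we cannot get the data we want directly from the HTML file. It exploits a weakness of ordinary crawling tools: they have no JS interpreter and no CSS interpreter.

The DOM is the set of tags produced after the browser renders the page. CSS and JS cannot modify the tags in the HTML file itself, but they can modify the DOM.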
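A small sketch of why this defeats a plain HTML parser: the hypothetical page below keeps its value inside a script, so a parser that does not execute JS sees an empty tag. The markup, the id "price", and the class name SpanTextCollector are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page: the span is empty in the HTML file and is filled in by JS.
RAW_HTML = """
<html><body>
  <span id="price"></span>
  <script>document.getElementById("price").textContent = "42";</script>
</body></html>
"""


class SpanTextCollector(HTMLParser):
    """Collect the text found inside <span id="price"> in the raw HTML."""

    def __init__(self):
        super().__init__()
        self._in_span = False
        self.span_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("id", "price") in attrs:
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        # Script contents also arrive here, but never while inside the span.
        if self._in_span:
            self.span_text += data


parser = SpanTextCollector()
parser.feed(RAW_HTML)
print(repr(parser.span_text))
```

A browser would run the script and show the value in the rendered DOM, while the parser only ever sees the empty span from the HTML file.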
Python comprehensions
List Comprehension
List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.

The Syntax
newlist = [expression for item in iterable if condition == True]
The return value is a new list, leaving the old list unchanged.

Condition
The condition is like a filter that only accepts the items that evaluate to True.

Iterable
The iterable can be any iterable object, like a list, tuple, set etc.

Expression
The expression is the...
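A short illustration of the syntax described above, using a made-up list of fruit names. It shows the condition acting as a filter, the expression transforming each item, and the original list staying unchanged.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Condition as a filter: keep only the names that contain the letter "a".
with_a = [fruit for fruit in fruits if "a" in fruit]

# Expression as a transformation: the condition is optional.
upper = [fruit.upper() for fruit in fruits]

# The iterable can be any iterable object, here a range.
squares = [n * n for n in range(5)]

print(with_a)
print(upper)
print(squares)
```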
Failed to connect to github.com port 443 after 21193 ms: Timed out
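This is not the first time I have been unable to connect to GitHub; the problem has bothered me for a long time.

Solution: change the DNS server to 114.114.114.114 or another one.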
Tkinter
Introduction
The foundational element of a Tkinter GUI is the window. Windows are the containers in which all other GUI elements live. These other GUI elements, such as text boxes, labels, and buttons, are known as widgets.

window
The first thing you need to do is import the Python GUI Tkinter module:

import tkinter as tk

A window is an instance of Tkinter’s Tk class. Go ahead and create a new window and assign it to the variable window:

window = tk.Tk()

widgets
Use the tk.Label class to...
Basic syntax
string

User Input Strings
cin treats whitespace (spaces, tabs, etc.) as a terminating character, which means that it can only read a single word (even if you type many words):

string fullName;
cout << "Type your full name: ";
cin >> fullName;
cout << "Your name is: " << fullName;
// Type your full name: John Doe
// Your name is: John

That’s why, when working with strings, we often use the getline() function to read a line of text. It takes...
jQuery
jQuery Introduction
jQuery is a JavaScript Library. jQuery greatly simplifies JavaScript programming. jQuery also simplifies a lot of the complicated things from JavaScript, like AJAX calls and DOM manipulation. The jQuery library contains the following features: HTML/DOM manipulation, CSS manipulation, HTML event methods, effects and animations, AJAX, and utilities. Tip: In addition, jQuery has plugins for almost any task out there.

jQuery Syntax
The jQuery syntax is tailor-made for selecting...
Ajax
AJAX Introduction
AJAX is a developer’s dream, because you can: read data from a web server after the page has loaded; update a web page without reloading the page; send data to a web server in the background.

AJAX = Asynchronous JavaScript And XML. AJAX is not a programming language. AJAX just uses a combination of: a browser built-in XMLHttpRequest object (to request data from a web server); JavaScript and HTML DOM (to display or use the data).

AJAX applications might use XML to...
