from requests_html import HTMLSession

def get_url(search_text):
    session = HTMLSession()
    template = 'https://www.amazon.com/s?k={}'
    search_term = search_text.replace(' ', '+')
    url = template.format(search_term)
    url += '&pages={}'
    products = []
    # This is the part I'm confused about
    for x in range(1, 21):
        url_inc = url.format(x)
        r = session.get(url_inc)
        r.html.render(sleep=1)
        for item in r.html.xpath('//*[@class="a-size-medium a-color-base a-text-normal"]'):
            product = item.text
            products.append(product)
    return products
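This is one of the functions that will be called from main().
I want the function to update the value of x in &page={x} 20 times and run html.xpath on each page.
In the end, I want it to return a single list containing the item.text values from the XPath matches of all 20 iterations.
Currently it gives the output of the first iteration and prints the same result 20 times.
Am I missing something in the nested for loop?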
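The behavior described above (one request per page, all titles gathered into one list) can be sketched offline. This is a minimal sketch, assuming a hypothetical `fetch_titles(url)` callback that stands in for the `session.get` / `render` / XPath steps, so the loop logic can be checked without a browser or network access. The URL pattern, including its `pages` parameter, is copied unchanged from the question's code.

```python
from typing import Callable, List


def collect_titles(search_text: str,
                   fetch_titles: Callable[[str], List[str]],
                   pages: int = 20) -> List[str]:
    """Request each results page once and return all titles in one list."""
    # URL pattern copied from the question's code, including its `pages` parameter.
    template = 'https://www.amazon.com/s?k={}&pages={}'
    search_term = search_text.replace(' ', '+')
    products: List[str] = []
    for x in range(1, pages + 1):
        url_inc = template.format(search_term, x)  # a fresh URL for every page
        # HYPOTHETICAL hook: stands in for session.get + render + the XPath loop.
        products.extend(fetch_titles(url_inc))
    return products
```

With requests_html, `fetch_titles` could be a small function that calls `session.get(url)`, then `r.html.render(sleep=1)`, and returns `[item.text for item in r.html.xpath(...)]` as in the question. Keeping the fetch step separate makes it easy to confirm that each iteration really uses a different URL.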
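There is nothing wrong with the code itself; an indentation error prevented the URLs from being generated. I fixed the indentation, and this is a working solution: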
from requests_html import HTMLSession

def get_url(search_text):
    session = HTMLSession()
    template = 'https://www.amazon.com/s?k={}'
    search_term = search_text.replace(' ', '+')
    url = template.format(search_term)
    url += '&pages={}'
    products = []
    for x in range(1, 21):
        url_inc = url.format(x)
        print(url_inc)
        r = session.get(url_inc)
        r.html.render(sleep=1)
        items = r.html.find('h2')
        for item in items:
            product = item.text
            print(product)
            products.append(product)
    return products

s_list = get_url('Laptop')
Output:
https://www.amazon.com/s?k=Laptop&pages=1
Laptop 14 Inch, Intel Celeron Processor J3455, Quad-Core, Windows 10, 6GB RAM, 128GB SSD Storage, HD IPS Display, Touch Numeric Keypad, Webcam Baffle, Winbook 140, Space Grey
Acer Aspire 5 Slim Laptop, 15.6 inches Full HD IPS Display, AMD Ryzen 3 3200U, Vega 3 Graphics, 4GB DDR4, 128GB SSD, Backlit Keyboard, Windows 10 in S Mode, A515-43-R19L, Silver
Lenovo IdeaPad 3 14" Laptop, 14.0" FHD 1920 x 1080 Display, AMD Ryzen 5 3500U Processor, 8GB DDR4 RAM, 256GB SSD, AMD Radeon Vega 8 Graphics, Narrow Bezel, Windows 10, 81W0003QUS, Abyss Blue
Acer Nitro 5 Gaming Laptop, 9th Gen Intel Core i5-9300H, NVIDIA GeForce GTX 1650, 15.6" Full HD IPS Display, 8GB DDR4, 256GB NVMe SSD, Wi-Fi 6, Backlit Keyboard, Alexa Built-in, AN515-54-5812
Acer Aspire 5 Slim Laptop, 15.6 inches Full HD IPS Display, AMD Ryzen 3 3200U, Vega 3 Graphics, 4GB DDR4, 128GB SSD, Backlit Keyboard, Windows 10 in S Mode, A515-43-R19L, Silver
HP 15-dy1036nr 10th Gen Intel Core i5-1035G1, 15.6-Inch FHD Laptop, Natural Silver
Acer Chromebook Spin 311 Convertible Laptop, Intel Celeron N4020, 11.6" HD Touch, 4GB LPDDR4, 32GB eMMC, Gigabit Wi-Fi 5, Bluetooth 5.0, Google Chrome, CP311-2H-C679
Acer Predator Helios 300 Gaming Laptop, Intel i7-10750H, NVIDIA GeForce RTX 2060 6GB, 15.6" Full HD 144Hz 3ms IPS Display, 16GB Dual-Channel DDR4, 512GB NVMe SSD, Wi-Fi 6, RGB Keyboard, PH315-53-72XD
ASUS F512JA-AS34 VivoBook 15 Thin and Light Laptop, 15.6” FHD Display, Intel i3-1005G1 CPU, 8GB RAM, 128GB SSD, Backlit Keyboard, Fingerprint, Windows 10 Home in S Mode, Slate Gray
ASUS F512DA-EB51 VivoBook 15 Thin And Light Laptop, 15.6” Full HD, AMD Quad Core R5-3500U CPU, 8GB DDR4 RAM, 256GB PCIe SSD, AMD Radeon Vega 8 Graphics, Windows 10 Home,Slate Gray
ASUS ROG G531GT-BI7N6 15.6" FHD Gaming Laptop Computer, Intel Hexa-Core i7-9750H Up to 4.5GHz, 8GB DDR4, 512GB SSD, NVIDIA GeForce GTX 1650, 802.11ac WiFi, HDMI, USB 3.0, Windows 10
2019 ASUS ROG 15.6" FHD Gaming Laptop Computer, Intel Hexa-Core i7-9750H Up to 4.5GHz, 16GB DDR4, 1TB HDD + 512GB SSD, NVIDIA GeForce GTX 1650, 802.11ac WiFi, HDMI, USB 3.0, Windows 10
Acer Aspire 5 Slim Laptop, 15.6 inches Full HD IPS Display, AMD Ryzen 3 3200U, Vega 3 Graphics, 4GB DDR4, 128GB SSD, Backlit Keyboard, Windows 10 in S Mode, A515-43-R19L, Silver
MSI GL65 Leopard 10SFK-062 15.6" FHD 144Hz 3ms Thin Bezel Gaming Laptop Intel Core i7-10750H RTX2070 16GB 512GB NVMe SSD Win 10
HP 15-dy1036nr 10th Gen Intel Core i5-1035G1, 15.6-Inch FHD Laptop, Natural Silver
iProda Laptop, 14.1 Inch Notebook (Intel Core i3-6157U 2.4GHz, 8GB RAM, 256GB SSD, Windows 10 Professional) with 1080P FHD Display, Lightweight, Best for Work at Home
...
https://www.amazon.com/s?k=Laptop&pages=2 #Now second URL
It prints the URLs from pages=1 through pages=20, but it prints the same XPath results 20 times instead of 20 different sets of results.
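One possible cause, not confirmed anywhere in this thread: the question text talks about `&page={x}`, but both versions of the code append `&pages={}`. If the site only recognizes a `page` query parameter, it would ignore `pages` and serve the first results page every time, which would match the repeated output. A minimal sketch of building the per-page URLs, assuming `page` is the parameter the site reads:

```python
from typing import List
from urllib.parse import urlencode


def build_search_urls(search_text: str, pages: int = 20,
                      base: str = 'https://www.amazon.com/s') -> List[str]:
    """Return one search URL per results page, numbered 1..pages."""
    # ASSUMPTION: the page number is read from a `page` query parameter.
    # urlencode() escapes the values and turns spaces into '+'.
    return [f'{base}?{urlencode({"k": search_text, "page": n})}'
            for n in range(1, pages + 1)]
```

Using `urlencode` instead of `str.replace(' ', '+')` also escapes characters such as `&` that may appear in a search term.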
In this case we need to use a different selector.
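The working answer replaces the long class-based XPath with `r.html.find('h2')`, which takes the text of every `<h2>` element on the rendered page. To illustrate what that selector picks up without a browser or requests_html, here is a stand-in built on the standard library's `html.parser`. This is a different tool than the answer uses, and it only approximates the text extraction; it is a sketch, not a replacement for rendering the page.

```python
from html.parser import HTMLParser
from typing import List


class H2TextCollector(HTMLParser):
    """Collect the text content of every <h2> element, similar to find('h2')."""

    def __init__(self) -> None:
        super().__init__()
        self.in_h2 = False
        self.current: List[str] = []
        self.titles: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h2':
            self.in_h2 = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == 'h2' and self.in_h2:
            self.in_h2 = False
            # Collapse runs of whitespace, roughly like a browser's visible text.
            text = ' '.join(''.join(self.current).split())
            if text:
                self.titles.append(text)

    def handle_data(self, data):
        if self.in_h2:
            self.current.append(data)


def h2_texts(html: str) -> List[str]:
    """Return the cleaned text of each <h2> in the given HTML string."""
    parser = H2TextCollector()
    parser.feed(html)
    parser.close()
    return parser.titles
```

Text inside nested tags such as `<a>` or `<span>` within the `<h2>` is included, which is why a plain tag selector can be more robust than matching an exact `class` string that the site may change.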
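@doghoney I've updated the answer.
Thanks for your help, but the nested for loop still has a problem: the outer loop runs 20 times, and only then does the inner loop run. How do I make it run outer once -> inner once -> next outer -> next inner -> ... 20 times?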
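A Python nested loop already runs in the order asked for: each pass of the outer loop runs the inner loop to completion before the outer loop advances. If the output looks like "outer loop 20 times, then the inner loop", the inner `for` is probably indented at the same level as the outer one (after it, not inside it), so it only sees the last response; that is consistent with the indentation fix mentioned in the answer. The sketch below records the order of events, using a hypothetical `fetch_titles(page)` stub in place of the network request.

```python
from typing import Callable, List, Tuple


def scrape_in_order(page_count: int,
                    fetch_titles: Callable[[int], List[str]]
                    ) -> Tuple[List[str], List[str]]:
    """Loop over pages; the inner loop finishes before the next page starts."""
    products: List[str] = []
    trace: List[str] = []
    for page in range(1, page_count + 1):      # outer loop: one page per pass
        trace.append(f'outer {page}')
        for title in fetch_titles(page):       # inner loop: this page's items only
            trace.append(f'  inner {page}: {title}')
            products.append(title)
    return products, trace
```

The key detail is that the inner `for` line is indented one level deeper than the outer `for`, so it belongs to the outer loop's body.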
Thanks for following up! That helps a lot.