Newest 'beautifulsoup' Questions - Stack Overflow

Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

1 vote · 0 answers · 23 views

Crawling over a website directories using BeautifulSoup?

This is my code: https://pastebin.com/R11qiTF4 from bs4 import BeautifulSoup as soup from urllib.request import urlopen as req from urllib.parse import urljoin import re urls = ["https://www.helios-...
0 votes · 1 answer · 28 views

Using Beautifulsoup to scrape tags from website but skip / ignore some others

I'm using the following code to scrape the tags amongst others: for content in soup.find_all(): try: link = content.find('enclosure') link = link.get('url') print "\...
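
A minimal sketch (Python 3, with placeholder feed markup, since the asker's source isn't shown) of one way to skip items where find() comes back empty instead of letting the loop raise:

```python
from bs4 import BeautifulSoup

# Placeholder feed markup; the real page structure is not shown in the excerpt.
xml = """
<item><enclosure url="https://example.com/a.mp3"/></item>
<item><title>no enclosure here</title></item>
"""
soup = BeautifulSoup(xml, "html.parser")

for content in soup.find_all("item"):
    enclosure = content.find("enclosure")
    if enclosure is None:        # skip items that lack the tag instead of raising
        continue
    url = enclosure.get("url")   # .get() returns None rather than raising KeyError
    if url:
        print(url)
```
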
1 vote · 2 answers · 37 views

Scraping data from a page body that isn't visible on initial load

I am trying to use beautiful soup to scrape data from this website. If you scroll down to the Individual Plays section, click "share and more > get table as csv" a CSV form of the tabulated data will ...
0 votes · 1 answer · 18 views

How to change the page no. of a page's search result for Web Scraping using Python?

I am scraping data from a webpage which contains search results using Python. I am able to scrape data from the 1st search result page. I want to loop using the same code, changing the search result ...
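
A minimal sketch of the usual looping pattern, assuming the site exposes the page number as a query parameter; the URL and parameter names below are placeholders, not the asker's site:

```python
import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/search"   # placeholder; the real site isn't named above

for page in range(1, 6):                  # search result pages 1 to 5
    # Many sites expose the page number as a query parameter; the names here are guesses
    resp = requests.get(base_url, params={"q": "keyword", "page": page})
    soup = BeautifulSoup(resp.text, "html.parser")
    print(page, soup.title)
```
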
0 votes · 2 answers · 25 views

Redfin scraper for getting redfin estimate

I have a few posts about this but I have found a new issue. You will notice from link1 and link2 that depending on whether the house is on the market or not the page will have a different way of where ...
2 votes · 0 answers · 22 views

How to convert Navigable String to File Object

I am trying to get some data from a website (using the modules named requests & BeautifulSoup) and print it in a text file but every time I try to do so, it says the following: TypeError: ...
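
One common cause of a TypeError here is passing a NavigableString (or a whole Tag) to file.write(), which expects a plain str; a minimal sketch with a placeholder URL:

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com")   # placeholder URL
soup = BeautifulSoup(resp.text, "html.parser")

title = soup.find("title")
with open("output.txt", "w", encoding="utf-8") as f:
    # write() needs a plain str, so convert the NavigableString explicitly
    f.write(str(title.string))
```
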
1 vote · 2 answers · 40 views

Python parse XML file into pandas dataframe

I have the below xml structure and I am trying to convert the xml data into a structured pandas dataframe. I have read a number of stackoverflow posts about xml conversion using both xml.etree....
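
A minimal sketch using xml.etree (one of the modules the question mentions) on a made-up XML structure: build one dict per record and hand the list to pandas:

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Made-up XML standing in for the structure in the question
xml_data = """
<records>
    <record><name>Alice</name><score>90</score></record>
    <record><name>Bob</name><score>85</score></record>
</records>
"""

root = ET.fromstring(xml_data)
rows = [{child.tag: child.text for child in record}   # one dict per <record>
        for record in root.findall("record")]
df = pd.DataFrame(rows)
print(df)
```
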
1 vote · 2 answers · 37 views

How to unwrap parent using beautiful soup4

Given an html page source such as: <html> <head></head> <body> <p><nobr><a href="...">Some link text</a></nobr><p> </body> &...
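
bs4's Tag.unwrap() removes a tag but keeps its children, so unwrapping the link's parent drops the <nobr> wrapper; a minimal sketch on the markup from the question:

```python
from bs4 import BeautifulSoup

html = '<p><nobr><a href="https://example.com">Some link text</a></nobr></p>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find("a")
if a.parent.name == "nobr":
    a.parent.unwrap()   # removes <nobr> but keeps the <a> (and any siblings) in place

print(soup)             # <p><a href="https://example.com">Some link text</a></p>
```
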
0 votes · 0 answers · 24 views

soup.select('.r a') in f'https://google.com/search?q={query}' brings back empty list in Python BeautifulSoup. **NOT A DUPLICATE**

The "I'm Feeling Lucky!" project in the "Automate the boring stuff with Python" ebook no longer works with the code he provided. Specifically, the linkElems = soup.select('.r a') I've already tried ...
0 votes · 2 answers · 36 views

How do we select the child element tbody after extracting the entire HTML?

I'm still a Python noob trying to learn BeautifulSoup. I looked at solutions on Stack Overflow but was unsuccessful. Please help me to understand this better. I have extracted the HTML, which is as shown below &...
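
A minimal sketch, assuming the extracted HTML contains an ordinary table; the CSS child combinator in select() picks the tbody that sits directly under the table, then its rows:

```python
from bs4 import BeautifulSoup

# Placeholder table; the asker's extracted HTML is only hinted at in the excerpt
html = "<table><tbody><tr><td>cell 1</td></tr><tr><td>cell 2</td></tr></tbody></table>"
soup = BeautifulSoup(html, "html.parser")

for row in soup.select("table > tbody > tr"):   # tbody directly under table, then its rows
    print(row.get_text(strip=True))
```
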
0 votes · 1 answer · 24 views

Use BeautifulSoup on HTML content in a pandas DataFrame

I have a text file sent to me regularly with HTML content in one of the columns. I was hoping I could run BeautifulSoup against that column, but it seems like sources are limited out there. sample.csv: ...
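
A minimal sketch, with a made-up frame standing in for sample.csv: apply BeautifulSoup to the HTML column row by row and keep the visible text:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up frame standing in for sample.csv; only the HTML column matters here
df = pd.DataFrame({"html_col": ["<p>Hello <b>world</b></p>", "<div>Second row</div>"]})

# Parse each cell and keep just the visible text
df["text_col"] = df["html_col"].apply(
    lambda markup: BeautifulSoup(markup, "html.parser").get_text(strip=True)
)
print(df)
```
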
1 vote · 1 answer · 33 views

Can't find specific table using BeautifulSoup

I have been using BeautifulSoup to scrape the pricing information from "https://www.huaweicloud.com/pricing.html#/ecs" I want to extract the table information of that website, but I get nothing. I ...
1 vote · 2 answers · 26 views

How to use pd.DataFrame method to manually create a dataframe from info scraped using beautifulsoup4

I made it to the point where all the tr data has been scraped and I am able to get a nice printout. But when I go to implement pd.DataFrame, as in df = pd.DataFrame({"A": a}) etc., I get a syntax ...
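
A syntax error at that point is often just a bracket left open on an earlier line; the DataFrame call itself only needs the scraped lists, as in this minimal sketch with placeholder lists:

```python
import pandas as pd

# Placeholder lists standing in for the values pulled out of each <tr>
names = ["Alice", "Bob"]
scores = [90, 85]

# Each key becomes a column; all the lists must be the same length
df = pd.DataFrame({"Name": names, "Score": scores})
print(df)
```
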
0 votes · 1 answer · 31 views

Downloading pandas output as saved files instead of links

I'm trying to add an image to each episode for a podcast scrape. The RSS feed is a great help, but it seems the unique link for each episode can't be used as it's not an actual link as there's no .jpg ...
0 votes · 3 answers · 33 views

Python beautifulsoup printing the element desired

After I find the desired element I have this: [<div class="statsValue">$1,615,422</div>, <div class="statsValue">1</div>, <div class="statsValue">2</div>] I would ...
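
A minimal sketch on the exact markup shown: find_all() returns a ResultSet, and get_text() on each element strips away the surrounding div tags:

```python
from bs4 import BeautifulSoup

html = ('<div class="statsValue">$1,615,422</div>'
        '<div class="statsValue">1</div>'
        '<div class="statsValue">2</div>')
soup = BeautifulSoup(html, "html.parser")

values = soup.find_all("div", class_="statsValue")
print(values[0].get_text())             # just the first: $1,615,422
print([v.get_text() for v in values])   # or all of them as plain strings
```
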
0 votes · 2 answers · 47 views

Python Scrape of Wikipedia table then export to csv

I have followed a tutorial to scrape a table and then export the data to a CSV file. I am getting an error in PyCharm when I try to execute the file, saying "Traceback (most recent call last): ...
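
Without the rest of the traceback the exact fix is a guess, but one hedged alternative to hand-rolling the CSV step: pandas.read_html() (it needs lxml or html5lib installed) pulls every table on the page into DataFrames, and to_csv() writes the one you want. The article URL and table index below are arbitrary examples:

```python
import pandas as pd

# read_html() returns one DataFrame per <table> on the page; which index you
# want depends on the article. The URL here is an arbitrary example page.
url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
tables = pd.read_html(url)
tables[0].to_csv("table.csv", index=False)
```
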
0 votes · 1 answer · 40 views

How to scrape data from a JavaScript environment that doesn't have any table in its source?

I'm developing a code to get into a javascript environment, then I want to scrape the data from the website using BeautifulSoup. The point is that I realized that there isn't any table in the ...
0 votes · 4 answers · 54 views

Given an html paragraph and a link, is there a way to retrieve the text before and the text after the link inside the paragraph in Python?

I am using urllib3 to get the html of some pages. I want to retrieve the text from the paragraph where the link is, with the text before and after the link stored separately. For example: import ...
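
A minimal sketch, assuming the text nodes sit directly next to the <a> inside the paragraph: previous_sibling and next_sibling give the text before and after the link:

```python
from bs4 import BeautifulSoup

html = '<p>Text before the <a href="https://example.com">link</a> and text after.</p>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find("a")
before = str(a.previous_sibling)   # "Text before the "
after = str(a.next_sibling)        # " and text after."
print(repr(before), repr(after))
```
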
0 votes · 0 answers · 29 views

(python) urlopen doesn't work and gives an error [on hold]

I'm trying to write the code below and I don't see any mistakes, but it still gives errors. My code: from bs4 import BeautifulSoup as soup from urllib.request import urlopen url = 'https://www....
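
The excerpt cuts off before the actual error, but one frequent cause with urlopen is the site rejecting urllib's default User-Agent; a hedged sketch that sends a browser-like header via Request, with a placeholder URL:

```python
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "https://www.example.com"   # placeholder; the asker's URL is cut off above
# Some sites answer urllib's default User-Agent with an HTTP error, so send a browser-like one
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
with urlopen(req) as resp:
    soup = BeautifulSoup(resp.read(), "html.parser")
print(soup.title)
```
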
0 votes · 0 answers · 40 views

Convert Python selenium code to work faster

The code below is working fine but it is taking too much time. Is there any way I can do this faster? I tried BeautifulSoup but was unable to fetch the nested table entry data. Can anyone help with how this ...
0 votes · 0 answers · 31 views

Randomly throws “AttributeError: 'NoneType' object has no attribute 'text'” error during web-scraping a list of URLs

I have a list of URLs which I'd like to loop through and scrape. My code works fine for each URL alone! But when I loop through the list, it starts giving me this error: Traceback (most recent ...
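
find() returns None for pages that lack the element, and calling .text on None raises exactly this AttributeError; a minimal sketch of the guard, with placeholder URLs and a hypothetical target tag:

```python
import requests
from bs4 import BeautifulSoup

urls = ["https://example.com", "https://example.org"]   # placeholders for the asker's list

for url in urls:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    element = soup.find("h1")          # hypothetical target element
    if element is None:                # missing on this page: skip instead of crashing on .text
        print(url, "-> element not found, skipping")
        continue
    print(url, "->", element.text.strip())
```
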
0 votes · 1 answer · 29 views

Missing text when extracting with BeautifulSoup

I am using Beautiful Soup to extract data from ul and li tags. I can get a date, but some words are missing and there is no space between the lines. <li>Developing <span class="bte bte-78432-...
0 votes · 1 answer · 54 views

Python / BeautifulSoup Image Scraping Does Not Save Animated GIFs Correctly

I have a piece of Python code that helps me with scraping some images from a website every morning - for a daily project I am responsible for. It all works fine and I get JPGs and PNGs with no issues. ...
0 votes · 1 answer · 31 views

Scraping variants in a <ul> tag with BS4 and Python

I would like to scrape this webpage https://www.off---white.com/en/GB/men/products/omia139f198000403020# / view-source:https://www.off---white.com/en/GB/men/products/omia139f198000403020# for the ...
1 vote · 5 answers · 47 views

How to extract links from elements?

I am trying to extract the links of every individual member but I am not getting output: from bs4 import BeautifulSoup import requests r = requests.get('https://www.asklaila.com/search/Delhi-NCR/-/...
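
A minimal sketch of the usual pattern, with a placeholder URL since the real search URL is truncated above; href=True skips anchors without a link, and .get() avoids KeyErrors:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"          # placeholder; the real search URL is truncated above
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# href=True skips anchors that have no href attribute at all
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)
```
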
0 votes · 1 answer · 37 views

Scraping tables from Training.gov.au

I am trying to automate some of my work. The website in question is training.gov.au, where specific pages contain nested tables, e.g. https://training.gov.au/Training/Details/BSBWHS402. Really what I ...
0 votes · 3 answers · 51 views

Separate elements from BeautifulSoup Resultset

I'm working on a project using Python (3.7) and BeautifulSoup (4) in which I need to scrape some data without knowing the exact structure of the HTML, but by making an assumption that the user's relevant ...
3 votes · 3 answers · 33 views

Using beautiful soup to get html attribute value

What I'm trying to do is use Beautiful Soup to get the value of an HTML attribute. What I have so far is: soup = BeautifulSoup(html, "html.parser") print("data-sitekey=" + soup.find("div", {"class" :...
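
A minimal sketch on hypothetical markup (the class name is a guess, since the excerpt cuts it off): subscripting the tag returns the attribute value, and .get() is the forgiving variant:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; data-sitekey usually sits on a captcha div, class name guessed
html = '<div class="g-recaptcha" data-sitekey="abc123"></div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", {"class": "g-recaptcha"})
print("data-sitekey=" + div["data-sitekey"])            # raises KeyError if missing
print("data-sitekey=" + str(div.get("data-sitekey")))   # .get() returns None instead
```
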
0 votes · 1 answer · 32 views

Web-scraping code using selenium and beautifulsoup not working properly

I wrote the python code for web-scraping Sydney morning herald newspaper. This code first clicks all the show more button and then scrape all the articles. Selenium part is working correctly. But I ...
-2 votes · 2 answers · 54 views

Trying To Scrape HTML After JavaScript Using Python… No Luck

I've tried just about everything... PyQt5, Selenium, BS4, requests_html, etc... Still cannot get what I'm looking for. I am trying to web scrape the data from https://www.tokenanalyst.io/exchange , ...
0 votes · 3 answers · 31 views

How to get the URL that soup represents?

How to get the url of the page after BeautifulSoup? res = requests.get('http://www.example.com') soup = BeautifulSoup(res.text, 'lxml') How to get http://www.example.com from soup?
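
The soup object only ever sees the HTML text, so the URL has to come from the response object; requests keeps it (after any redirects) on res.url:

```python
import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.example.com")
soup = BeautifulSoup(res.text, "lxml")   # lxml must be installed, as in the excerpt

print(res.url)   # the final URL (after any redirects) lives on the response, not the soup
```
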
1 vote · 2 answers · 50 views

Why can't I access information in tbody?

[This is the source code of the website][1] I am doing web scraping with BeautifulSoup but cannot find tr in tbody; there actually is a tr in tbody in the source code of the website; however, the ...
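
Browsers often insert <tbody> into the DOM shown in DevTools even when the HTML actually served has none, so searching for the rows directly usually sidesteps the problem; a minimal sketch with a placeholder URL:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"                     # placeholder for the asker's site
soup = BeautifulSoup(requests.get(url).text, "html.parser")

table = soup.find("table")
rows = table.find_all("tr") if table else []    # go straight for the rows, skipping tbody
print(len(rows))
```
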
0 votes · 2 answers · 41 views

Trying to retrieve substring but keep getting error [on hold]

Attached is a small web scraping script. The goal is to grab the 4-letter "ticker symbol" between the "|" characters. I'm trying to return the location of the substring "|" so I can retrieve the ...
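
If the symbol sits between two "|" characters, splitting on the pipe is usually simpler than juggling find() offsets; a minimal sketch on a made-up line of text:

```python
# Made-up line of scraped text with the ticker between pipe characters
text = "Some Company | ABCD | NASDAQ"

parts = text.split("|")        # split on the delimiter instead of tracking find() offsets
if len(parts) >= 2:
    ticker = parts[1].strip()
    print(ticker)              # ABCD
```
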
0 votes · 2 answers · 26 views

How to find the direct children (not the children of children) of a div in html using BeautifulSoup?

Markup : <div class = "parent-div"> <div class = "child-1"> <div class = "child-1.1"> </div> </div> <div class = "child-2"> <...
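
find_all(..., recursive=False) restricts the search to direct children; a minimal sketch on the markup from the question:

```python
from bs4 import BeautifulSoup

html = """
<div class="parent-div">
    <div class="child-1"><div class="child-1.1"></div></div>
    <div class="child-2"></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

parent = soup.find("div", class_="parent-div")
direct = parent.find_all("div", recursive=False)   # direct children only, not grandchildren
print([d["class"] for d in direct])                # [['child-1'], ['child-2']]
```
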
1 vote · 0 answers · 21 views

Reformat Beautiful Soup Output to include CSS

I am trying to parse through the text of emails to expedite my workflow using Python. I first save the email as a .htm file on my local drive. Then, I want to try pulling certain pieces of information out ...
0 votes · 1 answer · 25 views

How to extract values from UL and LI?

I am using BeautifulSoup; however, I am unable to get the value in each <li>. I want to get the values "Phnom Penh" and "full-time". <ul class="key-list"> <li class="clearfix"> <...
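
A minimal sketch on a reconstruction of that list (exactly where the text sits inside each <li> is assumed): select the items under ul.key-list and take their text:

```python
from bs4 import BeautifulSoup

# Reconstructed from the excerpt; where the text sits inside each <li> is assumed
html = """
<ul class="key-list">
    <li class="clearfix">Phnom Penh</li>
    <li class="clearfix">full-time</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

values = [li.get_text(strip=True) for li in soup.select("ul.key-list li")]
print(values)   # ['Phnom Penh', 'full-time']
```
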
2 votes · 1 answer · 37 views

Finding Audio and Text between two <td> tags Python BeautifulSoup

I am working with this website http://www.nemoapps.com/phrasebooks/hebrew. And for every td element, for example, I would like to get the first mp3 audio file /audio/mp3/HEBFND1_1395.mp3 and then ...
2 votes · 1 answer · 43 views

Scraping attributes from drop down sub menu

I am attempting to scrape specific Data from a website. The data exists only in drop down sub menu of another drop down, and gets generated only after selecting specific option of main drop down menu. ...
0 votes · 0 answers · 21 views

How to convert a list to a dataframe with 1 row [duplicate]

row = ['Gr', 'http://www.purriodictableofcats.com/images/b-grumpycat.jpg', 'Female', 'Real Name: Tardar Sauce', 'Hit the internet: 2012', "Interesting Facts: Being both grumpy and adorableNamed ...
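
Wrapping the list in another list makes pandas treat it as one row rather than one column; a minimal sketch with a shortened row and made-up column names:

```python
import pandas as pd

row = ['Gr', 'http://www.purriodictableofcats.com/images/b-grumpycat.jpg', 'Female']

# [row] is a list containing one list, i.e. one row; the column names are made up
df = pd.DataFrame([row], columns=["Symbol", "Image", "Gender"])
print(df)
```
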
-3 votes · 0 answers · 49 views

Can someone explain why and how this piece of code works [on hold]

I have been following a web scraping course on Udemy and I came across this, can someone please explain how the find_all function is able to take a function as its argument without the function ...
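
find_all() accepts a callable: it calls the function once per tag and keeps the tags for which it returns True, which is why you pass the function object itself without parentheses; a minimal sketch along the lines of the example in the bs4 documentation:

```python
from bs4 import BeautifulSoup

html = '<p class="a">keep</p><p class="a" id="x">skip</p><div>skip too</div>'
soup = BeautifulSoup(html, "html.parser")

def has_class_but_no_id(tag):
    # find_all calls this once per tag and keeps the tags where it returns True
    return tag.has_attr("class") and not tag.has_attr("id")

print(soup.find_all(has_class_but_no_id))   # [<p class="a">keep</p>]
```
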
1 vote · 2 answers · 27 views

How to convert a string variable with special characters in Python to print properly

Sorry if the title is confusing but I have literally researched this for two hours and have no idea how I am supposed to ask this question so anyone feel free to edit this post. I have this string ...
1 vote · 2 answers · 31 views

Beautiful Soup changes html code in a wrong way

I'm trying to parse html code using Beautiful Soup. I make get-requests via requests module in Python and then convert html code to bs object. But I was faced with a problem. When I make BeautifulSoup ...
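
What looks like Beautiful Soup "changing" the HTML is usually the parser repairing broken markup, and different parsers repair it differently; a minimal sketch comparing them on the same snippet (lxml and html5lib have to be installed to show their output):

```python
from bs4 import BeautifulSoup

broken = "<p>unclosed <b>tag"
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        print(parser, "->", BeautifulSoup(broken, parser))
    except Exception as exc:                 # parser not installed
        print(parser, "-> not available:", exc)
```
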
0 votes · 2 answers · 45 views

How to scrape a web page with search bar results when the search query does not appear in the URL

I am trying to scrape search results within a webpage, but when I type into the search bar (i.e. ABC) it does not reflect the search in the URL, so when I use BeautifulSoup4 to scrape the URL it gives me '...
-1 votes · 2 answers · 63 views

Nested for loop keeps repeating

I have a Python scraper whose main purpose is: read a list of postcodes from a text file into an array; for each postcode in the array, search 10 pages and pull out certain content. I seem to be getting the results like: page 1 ...
0 votes · 2 answers · 51 views

Get all tags except a list of tags BeautifulSoup

I have to extract text from a website with the text boundary, i.e. enclosed within a tag. I want to filter out all unwanted tags such as 'style', 'script', 'head', 'title', 'meta', '[document]' and ...
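
A minimal sketch of one common approach: decompose() the unwanted tags in place and then read whatever text remains ('[document]' is the soup object's own name, not a tag that needs removing):

```python
from bs4 import BeautifulSoup

html = ("<html><head><title>t</title><meta charset='utf-8'/></head>"
        "<body><style>p {}</style><p>keep me</p><script>var x;</script></body></html>")
soup = BeautifulSoup(html, "html.parser")

# soup([...]) is shorthand for find_all() with a list of tag names
for tag in soup(["style", "script", "title", "meta"]):
    tag.decompose()                  # remove the tag and its contents from the tree

print(soup.get_text(strip=True))     # keep me
```
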
-2 votes · 1 answer · 45 views

Scraping an HTML table using Python Selenium

I have written code for scraping an HTML table using Python Selenium, but the issue is that it returns an empty object from the code I have written. Thank you. I can scrape other values but not in ...
-2 votes · 0 answers · 18 views

Need help getting CSRF Token with Beautiful Soup located in a hidden field

Can anyone help with getting the CSRF Token from nakedcph.com with beautifulsoup? I am able to retrieve cookies, but for some reason cannot retrieve the csrftoken (which I can see in the websites ...
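
A hedged sketch of the usual approach: fetch the page inside a Session so the cookies and token come from the same visit, then read the hidden input's value. The field name and URL below are assumptions, and if the site injects the token with JavaScript it won't be in the served HTML at all:

```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()                       # keep cookies and token from the same visit
resp = session.get("https://example.com/login")    # placeholder URL

soup = BeautifulSoup(resp.text, "html.parser")
# The field name "csrftoken" is a guess; inspect the form for the real name attribute
token_input = soup.find("input", {"name": "csrftoken"})
print(token_input["value"] if token_input else None)
```
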
0 votes · 2 answers · 50 views

Unable to read the page with requests

I want to read the page https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh. Here is my code: import requests from bs4 import BeautifulSoup headers = {'User-Agent': 'Mozilla/5.0 (...
0 votes · 2 answers · 44 views

BS4 can't find text

I am trying to print this text: https://i.imgur.com/SLl1URt.png. I used soup.find_all("p", class_="review") and tried to use .getText or check inside .contents, but none of them worked. Web link: https:/...
0 votes · 1 answer · 19 views

Python requests and bs4 work/don't work based on directory

I am very new to Python and I am running into a few problems. One of them is that I have both the requests and bs4 libraries installed, and the problem is that they work or don't work based on where my ....