Newest 'web-scraping' Questions - Stack Overflow

Questions tagged [web-scraping]

Web scraping is the process of extracting specific information from websites that do not readily provide an API or other methods of automated data retrieval. Questions about "How To Get Started With Scraping" (e.g. with Excel VBA) should be *thoroughly researched* as numerous functional code samples are available. Web scraping methods include 3rd-party applications, development of custom software, or even manual data collection in a standardized way.

0
votes
0answers
8 views

How to loop through pages using Beautiful Soup 4, Python, and Selenium?

I'm fairly new to Python and am using Beautiful Soup for the first time, though I have some experience with Selenium. I am trying to scrape a website ("http://cbseaff.nic.in/cbse_aff/schdir_Report/userview.aspx")...
-2
votes
0answers
7 views

Scrape secret Facebook group post comments

I'm looking for a script, software, API or anything to scrape data from comments in a secret Facebook group which I neither administer nor moderate. I'm only trying to get data from a single post's ...
0
votes
0answers
13 views

Is there a simple means of converting Highcharts series data into Python dictionary data?

I was wondering whether there is a simple means of converting scraped Highcharts series data into Python dictionary data with a certain software package. I've referred to this helpful previous ...
-1
votes
0answers
18 views

BeautifulSoup taking too long to parse a page

I was trying to parse a My Anime List page to obtain the "ID" field, which I later need for making an API call. The function prepare_soup is taking too long (a good 2 minutes or more), which is a problem....
0
votes
0answers
11 views

PhantomJS and WebdriverIO browse website as mobile

const phantomjs = require('phantomjs-prebuilt') const webdriverio = require('webdriverio') const wdOpts = { desiredCapabilities: { browserName: 'phantomjs' } } export default class SomeClass { ...
0
votes
2answers
26 views

Scrape a web page's contents using Python/selenium

I'm trying to scrape the contents of a table. I believe the table is rendered in JavaScript, so I'm using the selenium package and Python 3. To do such a task, I've seen others find the table's XPath in ...
0
votes
0answers
39 views

Exit a function in R

I'm writing a web scraping project in R to extract the username from a Gmail account. I have written code where e is a list of usernames. Function f contains the code for scraping & a for loop ...
-1
votes
2answers
37 views

Beautiful soup returns empty list on one website, but works on another website

I'm currently learning Python through "Automate the Boring Stuff with Python". I'm doing the Web Scraping part at the moment. I wrote code that gets the price of a product from one website. However, ...
0
votes
1answer
29 views

How can I emulate a click with PHP or JavaScript?

I want to perform web scraping on this page http://www.rfea.es/web/estadisticas/ranking.asp. But to show me all the content, I must press the "Mostrar Ranking" button. I have been looking for ...
1
vote
0answers
11 views

Scraping with Python and running the script from a PHP form + passing data

I'm currently working on how to scrape everything on a page with Python, then execute the script and pass in the URL input from PHP, and then save the scraped page to a CSV file to later import into a database. I'm ...
0
votes
0answers
14 views

Is there a way to accomplish the scraping of a site with the information divided into blocks and with recaptcha?

In Python 3 I need to scrape a site. Each query on the site generates a series of information that I want to capture: 1 - number of the legal process, 2 - type of the process (example - ...
0
votes
0answers
17 views

While scraping a website some data is being skipped

I am trying to scrape the following website, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, and it scrapes almost all the data perfectly, but in certain situations where there are many ...
0
votes
1answer
24 views

Executing page scripts before retrieving its contents

I have a page where I need to automate some tasks and scrape some data, but the page runs some JS after loading to inject some data into the DOM, which I cannot intercept (not in a good format anyway), I ...
0
votes
0answers
20 views

How to Scrape Usernames From an Instagram Hashtag? [on hold]

I am attempting to collect as many Instagram users as possible from a specific hashtag. I am unable to find a simple method to collect these users from a hashtag using Python. I tried to go to the ...
2
votes
3answers
57 views

How do I extract text after <i class> tag?

I am trying to print out the text 'Dealer' from a div class by using BeautifulSoup, but I do not know how to extract it. I tried to print the i class, but the text Dealer did not come out. url = 'https:...
0
votes
2answers
28 views

How to get the class name of a span using XPath

I was trying to get the class name of the first span of header class. In this case, I would like to print out "all-star 40 main-title-rating". I successfully printed out the user name, in this case, "...
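
One way this kind of lookup might be sketched, using Python's standard-library `xml.etree.ElementTree` (which supports only a limited XPath subset) rather than whatever library the asker used. The markup and the `first_span_class` helper are hypothetical; only the class string comes from the excerpt above. The sketch also assumes the fragment is well-formed XML, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the real page.
SAMPLE = """
<body>
  <div class="header">
    <span class="all-star 40 main-title-rating"></span>
    <span class="user-name">some_user</span>
  </div>
</body>
"""


def first_span_class(markup):
    """Return the class attribute of the first <span> inside div.header."""
    root = ET.fromstring(markup)
    # find() returns the first match in document order, or None.
    span = root.find(".//div[@class='header']/span")
    return span.get("class") if span is not None else None


print(first_span_class(SAMPLE))
```

With a full XPath engine such as lxml, the attribute can be selected directly with an expression along the lines of `//div[@class='header']/span[1]/@class`.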
0
votes
1answer
21 views

How to view the address of the POST request made when an HTML button is clicked?

I am creating a project involving web scraping and web automation. I would like to first submit this form (http://rgsntl.rgs.cuhk.edu.hk/rws_prd_applx2/Public/tt_dsp_timetable.aspx) then once you ...
0
votes
1answer
35 views

How Do I Extract a link of a button that is defined using a class within a '<p>' tag in html using selenium

I have been trying to web-scrape a website that does not offer APIs and I have run into a problem where I want the link of a button but the button is not defined using the traditional button or input ...
2
votes
2answers
27 views

How do I gather all the urls in this table using rvest?

I am trying to get all the links in the first column of the table here. I can only get the first link/row. library(rvest) url <- "https://di.hkex.com.hk/di/NSSrchPersonList.aspx?sa1=pl&...
1
vote
1answer
37 views

Web scraping from Google Finance: returned data list always empty

I would like to scrape data (e.g., market capitalization, PE-ratio, etc.) from Google Finance using the BeautifulSoup-library of Python. However, when I try to extract certain passages (like "div", "...
1
vote
1answer
33 views

How to webscrape nested div and ol classes

I am trying to scrape this webpage. I am looking to download some photos from the 'photo-stream container', but without any success. Below is the codeblock I am currently working with. Looking for ...
1
vote
1answer
19 views

Print title importing from one location to another

I've created a VBA script to parse the title of different posts along with the editing status of those posts from a website. What I wish to do now is let my script parse the title from its landing ...
1
vote
2answers
44 views

Extracting relevant information from a sentence through web scraper?

I am scraping this website. I have a script which scrapes the sentence that contains the relevant information. Now what I want to do is extract the following information from the scraped sentence. The ...
1
vote
2answers
49 views

Inner loop design for web scraping

I want to import restaurant data like Restaurant name, phone number, website & address to excel but unfortunately, I am getting sponsored results & also not getting website & full address ...
1
vote
1answer
16 views

How to extract absolute URL of href with relative path?

I am trying to extract download links from this link. Here is the page source (viewing in Google Chrome) of that link: When I point at ../matlab/licensing.pdf on the page source, a link https://www....
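
A minimal sketch of one way to resolve such relative paths, using `urljoin` from Python's standard library. The page URL below is hypothetical (the real one is truncated in the excerpt), and `absolute_links` is an illustrative helper, not code from the question.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page the hrefs were scraped from.
PAGE_URL = "https://www.example.com/docs/help/index.html"


def absolute_links(page_url, hrefs):
    """Resolve each href against the URL of the page it appeared on."""
    return [urljoin(page_url, href) for href in hrefs]


links = absolute_links(
    PAGE_URL,
    [
        "../matlab/licensing.pdf",           # relative to the page's directory
        "/root.pdf",                         # relative to the site root
        "https://other.example.org/a.pdf",   # already absolute, left unchanged
    ],
)
for link in links:
    print(link)
```

The base passed to `urljoin` should be the URL of the page that contained the link, since `..` is resolved against that page's directory.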
-2
votes
1answer
15 views

How to integrate google image search into my app? [duplicate]

I am trying to build an app that can take an input image and look for similar images with Google's image search engine. Then, on the basis of the similar images' information, it can give me information about ...
0
votes
0answers
15 views

Scraping website through a for loop using Page(QWebEnginePage) resets python and stops running script [duplicate]

I am trying to scrape a website using QWebEnginePage through a for loop, but after the second iteration Python resets and the code stops running. I tried different webpages as well and I still get the ...
0
votes
0answers
16 views

Scrapy - Handling [Errno 10060] after getting the response

This is the page that I am crawling. https://classi.com.br/carros/busca/carro#{%22pagina%22:1} The AJAX request and payload are implemented correctly, and spider extracts the data without any ...
0
votes
2answers
68 views

Issue with the divisions while importing data

I have been extracting data from different websites successfully so far, but now I am stuck on one website. I have modified my code according to the website, and I am new to web scraping....
0
votes
2answers
39 views

How can I extract phrases that begin with phone and finish with }

How can I extract phrases that begin with phone and finish with '}' with regex and Python? I tried to extract data from a page source. this {"meta":{"subtitle":"Apartment for Rent in Marina Gate 1, ...
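
A hedged sketch of the kind of pattern the question describes, using Python's `re` module. The sample string and the `phone_phrases` helper are hypothetical; the real page source is truncated above, so the field names here are illustrative assumptions.

```python
import re

# Hypothetical page-source fragment; the real data is elided in the question.
SOURCE = (
    '{"meta":{"subtitle":"x"},'
    '"agent":{"phone":"+000 0000","name":"A"},'
    '"owner":{"phone":"+111","id":2}}'
)

# "phone", then any run of characters that are not "}", then the closing "}".
PHONE_TO_BRACE = re.compile(r"phone[^}]*\}")


def phone_phrases(text):
    """Return every substring that starts at 'phone' and ends at the next '}'."""
    return PHONE_TO_BRACE.findall(text)


for phrase in phone_phrases(SOURCE):
    print(phrase)
```

If the page source turns out to be valid JSON, parsing it with `json.loads` and reading the keys directly is likely to be more robust than a regular expression.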
1
vote
1answer
29 views

Trouble defining a selector meant to locate two types of trs

I've written two expressions to locate some elements from a webpage. The elements are within tr which are within a table. The problem is there are two types of tr's like: <tr class="even"> <...
2
votes
3answers
45 views

Using getElementsBy??Name in Excel VBA

I'm attempting to use VBA to scrape the link to a .gif file from this HTML fragment: <div class="row"> <div class="col-md-12"> <div id='imageDiv' ...
0
votes
0answers
17 views

Not able to reach end of page in GOIBIBO using infinite scroll

city name city_name="Goa" # trip dates year="2019" start_month="06" start_date="21" end_month="06" end_date="23" start_total=year+start_month+start_date # combine the start date end_total=year+...
-2
votes
0answers
27 views

How to download an embedded .pdf with VBA or any other method?

Link: https://app.box.com/embed/s/l500hjo00if7jbdrrfrie27iuqaucf50 Hi, I want to download an embedded PDF file from this link by any method. If you know how, please help with my project.
0
votes
1answer
44 views

How can I move from one side of the slider bar to the other?

I am webscraping https://www.rogers.com/web/totes/wireless/build-plan and as you click on the phone and go to the page that contains the different permutations of plans you can move the slider to ...
1
vote
1answer
33 views

I have a button that I'm trying to click with Selenium, but there are more CSS selectors than buttons

I am trying to click on the 'details' button on https://www.rogers.com/web/totes/wireless/choose-phone. However, it seems as though I can't click it. The page seems to be dynamic and so the link does ...
-2
votes
0answers
11 views

Scraping with Python using the Google API

I want to scrape Google search results using the Google API. I have the below code that gives me the desired information. Please help me integrate the Google API into the below code. I have written a query in ...
0
votes
1answer
18 views

How to access inner text in DOM tree with VBA

I am writing a macro to access some text in the DOM tree of a website but I cannot seem to get access to the text. Please see attached image for the html section in question and see attached code for ...
1
vote
2answers
34 views

How to use Python and BeautifulSoup to parse classes

I am trying to parse only the independent claims off of google.com/patents, but they use the same class name as the children dependent claims. I am new, but I think what I am trying to ask is how do I ...
0
votes
3answers
39 views

Scrapy program is not scraping all data

I am writing a program in Scrapy to scrape the following page, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, and it is only scraping the first line of data and not the rest. I ...
1
vote
1answer
34 views

Data scraper: the contents of the div tag is empty (??)

I am data scraping a website to get a number. This number changes dynamically every split second, but upon inspection, the number is shown. I just need to capture that number but the div wrapper that ...
1
vote
2answers
54 views

Variable from a for loop not showing in another loop

I've been practicing web scraping with the nba.com player list, but I've run into a problem where a link that I scraped in one for loop does not appear when I call on it in another for loop. I have ...
1
vote
0answers
38 views

Scraping data with VBA: Why can't I get access to html elements on certain webpages?

On certain webpages I cannot get access to HTML elements using VBA. What am I doing wrong? For example I have two different pages on the same website. This code returns number of matches on the page. ...
0
votes
1answer
29 views

Using Excel VBA to change HTML attribute

I am trying to change the HTML attribute within a class <div class="treeNodeStyle" id="trMenu_14" nowrap="" style="visibility: visible;"> <div class="treeNodeWrapperStyle" nowrap=""> ...
0
votes
1answer
37 views

How can I extract data from a chart using Selenium and Python?

I want to know how I can extract data from a chart using Selenium and Python. I want to extract data from this website. I want to extract all the points that are in the chart. Thanks. I have ...
-2
votes
2answers
82 views

Puppeteer: proper selection of inner text

I want to grab a string that has a particular class name, let's say 'CL1'. This is what I used to do, and it worked (we are inside an async function): var counter = await page.evaluate(() => { ...
1
vote
1answer
20 views

How do I import a list of songs from the Billboard website to a Google Sheet?

I have seen online that you can pull data from websites using the IMPORTHTML function in Google Sheets. However, you can only import tables and lists. I have tried both of these on the Billboard top ...
0
votes
0answers
29 views

Request with proxy get stuck without error code in Node JS

I'm trying to figure out why sometimes some requests just get stuck running. I parsed some errors but it continues getting stuck from time to time. What it does: I have a forEach iterating an array ...
0
votes
0answers
31 views

How can I write this equivalent code using gocolly

I am trying to use https://github.com/gocolly/colly. I want to visit a URL and save the full response to my local disk, for example visit google.com and save the full response body of the URL to google.html. &...
0
votes
0answers
25 views

Puppeteer - How to scrape request response

Hey, I want to get a specific file within the response to a client request. Request www.example.com -----> example.com Respond ----> Request called by website -----> *Response I want to scrape* ...