In this tutorial, we are going to show you how to scrape information from Yahoo Finance.
For Yahoo Finance, you can also use the ready-made "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is enter a few parameters and the task is ready to go. For further details, see: Task Templates
To follow through, you may want to use this URL in the tutorial:
It is hard to set up pagination on Yahoo Finance. However, the URLs of the different pages follow a pattern, so we can turn the pagination problem into a batch URL input problem.
In this case, appending "?count=50&offset=0" to the original website URL (https://finance.yahoo.com/cryptocurrencies) opens the page with the first 50 rows of information. We will use Octoparse to scrape data such as the Symbol and Name from the cryptocurrency table.
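To make the URL pattern concrete, here is a minimal Python sketch that builds a list of page URLs which could then be pasted into Octoparse as a batch. It assumes that later pages are reached by increasing "offset" in steps of "count" (only the offset=0 case is shown above), and the function name build_page_urls and its parameters are hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://finance.yahoo.com/cryptocurrencies"


def build_page_urls(num_pages, page_size=50):
    """Return one URL per page.

    Assumption: page N starts at offset N * page_size, which matches the
    "?count=50&offset=0" pattern for the first page.
    """
    urls = []
    for page in range(num_pages):
        query = urlencode({"count": page_size, "offset": page * page_size})
        urls.append(f"{BASE_URL}?{query}")
    return urls


if __name__ == "__main__":
    # Print the URLs for the first three pages, one per line.
    for url in build_page_urls(3):
        print(url)
```

The printed list can be copied into the URL box one per line. If the site uses a different step between pages, adjust page_size accordingly.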
Here are the main steps in this tutorial: [Download task file here ]
- "Go To Web Page" - to open the targeted web page
- Create a "Loop Item" - to loop through and extract the elements in each row
- Extract data - to select the data for extraction
- Start extraction - to run the task and get data
1. "Go To Web Page" - to open the targeted web page
- Click "+ Task" to start a new task
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on
Octoparse can automatically generate URLs that follow the same format with its batch URL input function.
Check this tutorial for more details: Batch URL input
2. Create a "Loop Item" - to loop through and extract the elements in each row
- Click the name "Bitcoin USD" in the first row.
- Click the "Expand" icon on the "Action Tips" panel
Octoparse will automatically select the item. The selected item will be highlighted in green while other items with the same structure will be highlighted in red.
The data is presented as a table, so we want to extract it by rows rather than by columns. Expanding the selected area helps us select whole rows.
- Click "Select all sub-element" and then click "Select all" to create a loop list
Octoparse will detect all the sub-elements with similar structures.
- Click "Extract data in the Loop"
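For readers who want to see what "extracting by rows" means outside the Octoparse interface, here is a small Python sketch using only the standard library. It is not how Octoparse works internally. The sample HTML is made up for illustration, and the names RowParser and extract_rows are hypothetical.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample table; a real page will have more columns and markup.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>BTC-USD</td><td>Bitcoin USD</td></tr>
  <tr><td>ETH-USD</td><td>Ethereum USD</td></tr>
</table>
"""


def extract_rows(html):
    """Return one dict per data row, keyed by the header cells."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    for record in extract_rows(SAMPLE_HTML):
        print(record)
```

Each loop item corresponds to one row, and each extracted field to one cell in that row, which is the same shape of output the loop list produces.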
3. Extract data - to select the data for extraction
After you click "Extract data in the loop", Octoparse will extract all selected elements in the same row.
- Edit each field name by selecting one from the pre-defined list of names or by typing your own
Here is a sample of the field names.
4. Save and start extraction - to run the task and get data
- Click “Start Extraction” on the upper left side
- Select “Local Extraction” to run the task on your computer, or select “Cloud Extraction” to run the task in the Cloud (for premium users only)
Here is the sample output: