w3resource

Python Web Scraping: Retrieves an arbitrary Wikipedia page of "Python" and creates a list of links on that page

Python Web Scraping: Exercise-10 with Solution

Write a Python program that retrieves an arbitrary Wikipedia page of "Python" and creates a list of links on that page.

Sample Solution:

Python Code:

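from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://en.wikipedia.org/wiki/Python")  # download the "Python" page
bsObj = BeautifulSoup(html)  # no parser named: bs4 picks one and warns (see end of sample output)
# Visit every <a> tag and print its href when the attribute is present
for link in bsObj.find_all("a"):
    if 'href' in link.attrs:
        print(link.attrs['href'])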

Sample Output:

#mw-head
#p-search
https://en.wiktionary.org/wiki/Python
https://en.wiktionary.org/wiki/python
#Snakes
#Ancient_Greece
#Media_and_entertainment
#Computing
#Engineering
#Roller_coasters
#Vehicles
#Weaponry
#See_also
/w/index.php?title=Python&action=edit&section=1
/wiki/Pythonidae
/wiki/Python_(genus)
/w/index.php?title=Python&action=edit&section=2
/wiki/Python_(mythology)
/wiki/Python_of_Aenus
/wiki/Python_(painter)
/wiki/Python_of_Byzantium
/wiki/Python_of_Catana
/w/index.php?title=Python&action=edit&section=3
/wiki/Python_(film)
/wiki/Pythons_2
/wiki/Monty_Python
/wiki/Python_(Monty)_Pictures
/w/index.php?title=Python&action=edit&section=4
/wiki/Python_(programming_language)
/wiki/CPython
/wiki/CMU_Common_Lisp
/wiki/PERQ#PERQ_3
/w/index.php?title=Python&action=edit&section=5
/w/index.php?title=Python&action=edit&section=6
/wiki/Python_(Busch_Gardens_Tampa_Bay)
/wiki/Python_(Coney_Island,_Cincinnati,_Ohio)
/wiki/Python_(Efteling)
/w/index.php?title=Python&action=edit&section=7
/wiki/Python_(automobile_maker)
/wiki/Python_(Ford_prototype)
/w/index.php?title=Python&action=edit&section=8
/wiki/Colt_Python
/wiki/Python_(missile)
/w/index.php?title=Python&action=edit&section=9
/wiki/Cython
/wiki/Pyton
/wiki/File:Disambig_gray.svg
/wiki/Help:Disambiguation
//en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Python&namespace=0
https://en.wikipedia.org/w/index.php?title=Python&oldid=845762125
/wiki/Help:Category
/wiki/Category:Disambiguation_pages
/wiki/Category:Disambiguation_pages_with_short_description
/wiki/Category:All_article_disambiguation_pages
/wiki/Category:All_disambiguation_pages
/wiki/Category:Animal_common_name_disambiguation_pages
/wiki/Special:MyTalk
/wiki/Special:MyContributions
/w/index.php?title=Special:CreateAccount&returnto=Python
/w/index.php?title=Special:UserLogin&returnto=Python
/wiki/Python
/wiki/Talk:Python
/wiki/Python
/w/index.php?title=Python&action=edit
/w/index.php?title=Python&action=history
/wiki/Main_Page
/wiki/Main_Page
/wiki/Portal:Contents
/wiki/Portal:Featured_content
/wiki/Portal:Current_events
/wiki/Special:Random
https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en
//shop.wikimedia.org
/wiki/Help:Contents
/wiki/Wikipedia:About
/wiki/Wikipedia:Community_portal
/wiki/Special:RecentChanges
//en.wikipedia.org/wiki/Wikipedia:Contact_us
/wiki/Special:WhatLinksHere/Python
/wiki/Special:RecentChangesLinked/Python
/wiki/Wikipedia:File_Upload_Wizard
/wiki/Special:SpecialPages
/w/index.php?title=Python&oldid=845762125
/w/index.php?title=Python&action=info
https://www.wikidata.org/wiki/Special:EntityPage/Q747452
/w/index.php?title=Special:CiteThisPage&page=Python&id=845762125
/w/index.php?title=Special:Book&bookcmd=book_creator&referer=Python
/w/index.php?title=Special:ElectronPdf&page=Python&action=show-download-screen
/w/index.php?title=Python&printable=yes
https://commons.wikimedia.org/wiki/Category:Python
https://af.wikipedia.org/wiki/Python
https://als.wikipedia.org/wiki/Python
https://bn.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8_(%E0%A6%A6%E0%A7%8D%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%B0%E0%A7%8D%E0%A6%A5%E0%A6%A4%E0%A6%BE_%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A6%B8%E0%A6%A8)
https://be.wikipedia.org/wiki/Python
https://bg.wikipedia.org/wiki/%D0%9F%D0%B8%D1%82%D0%BE%D0%BD_(%D0%BF%D0%BE%D1%8F%D1%81%D0%BD%D0%B5%D0%BD%D0%B8%D0%B5)
https://cs.wikipedia.org/wiki/Python_(rozcestn%C3%ADk)
https://da.wikipedia.org/wiki/Python
https://de.wikipedia.org/wiki/Python
https://eo.wikipedia.org/wiki/Pitono_(apartigilo)
https://eu.wikipedia.org/wiki/Python_(argipena)
https://fa.wikipedia.org/wiki/%D9%BE%D8%A7%DB%8C%D8%AA%D9%88%D9%86
https://fr.wikipedia.org/wiki/Python
https://ko.wikipedia.org/wiki/%ED%8C%8C%EC%9D%B4%EC%84%A0
https://hr.wikipedia.org/wiki/Python_(razdvojba)
https://io.wikipedia.org/wiki/Pitono
https://id.wikipedia.org/wiki/Python
https://ia.wikipedia.org/wiki/Python_(disambiguation)
https://is.wikipedia.org/wiki/Python
https://it.wikipedia.org/wiki/Python_(disambigua)
https://he.wikipedia.org/wiki/%D7%A4%D7%99%D7%AA%D7%95%D7%9F
https://ka.wikipedia.org/wiki/%E1%83%9E%E1%83%98%E1%83%97%E1%83%9D%E1%83%9C%E1%83%98_(%E1%83%9B%E1%83%A0%E1%83%90%E1%83%95%E1%83%90%E1%83%9A%E1%83%9B%E1%83%9C%E1%83%98%E1%83%A8%E1%83%95%E1%83%9C%E1%83%94%E1%83%9A%E1%83%9D%E1%83%95%E1%83%90%E1%83%9C%E1%83%98)
https://kg.wikipedia.org/wiki/Mboma_(nyoka)
https://la.wikipedia.org/wiki/Python_(discretiva)
https://lb.wikipedia.org/wiki/Python
https://hu.wikipedia.org/wiki/Python_(egy%C3%A9rtelm%C5%B1s%C3%ADt%C5%91_lap)
https://mr.wikipedia.org/wiki/%E0%A4%AA%E0%A4%BE%E0%A4%AF%E0%A4%A5%E0%A5%89%E0%A4%A8_(%E0%A4%86%E0%A4%9C%E0%A5%8D%E0%A4%9E%E0%A4%BE%E0%A4%B5%E0%A4%B2%E0%A5%80_%E0%A4%AD%E0%A4%BE%E0%A4%B7%E0%A4%BE)
https://nl.wikipedia.org/wiki/Python
https://ja.wikipedia.org/wiki/%E3%83%91%E3%82%A4%E3%82%BD%E3%83%B3
https://no.wikipedia.org/wiki/Pyton
https://pl.wikipedia.org/wiki/Pyton
https://pt.wikipedia.org/wiki/Python_(desambigua%C3%A7%C3%A3o)
https://ru.wikipedia.org/wiki/Python_(%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F)
https://sd.wikipedia.org/wiki/%D8%A7%D8%B1%DA%99
https://sk.wikipedia.org/wiki/Python
https://sh.wikipedia.org/wiki/Python
https://fi.wikipedia.org/wiki/Python
https://sv.wikipedia.org/wiki/Pyton
https://th.wikipedia.org/wiki/%E0%B9%84%E0%B8%9E%E0%B8%97%E0%B8%AD%E0%B8%99
https://tr.wikipedia.org/wiki/Python
https://uk.wikipedia.org/wiki/%D0%9F%D1%96%D1%84%D0%BE%D0%BD
https://ur.wikipedia.org/wiki/%D9%BE%D8%A7%D8%A6%DB%8C%D8%AA%DA%BE%D9%88%D9%86
https://vi.wikipedia.org/wiki/Python
https://zh.wikipedia.org/wiki/Python_(%E6%B6%88%E6%AD%A7%E4%B9%89)
https://www.wikidata.org/wiki/Special:EntityPage/Q747452#sitelinks-wikipedia
//en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License
//creativecommons.org/licenses/by-sa/3.0/
//wikimediafoundation.org/wiki/Terms_of_Use
//wikimediafoundation.org/wiki/Privacy_policy
//www.wikimediafoundation.org/
https://wikimediafoundation.org/wiki/Privacy_policy
/wiki/Wikipedia:About
/wiki/Wikipedia:General_disclaimer
//en.wikipedia.org/wiki/Wikipedia:Contact_us
https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_contribute
https://wikimediafoundation.org/wiki/Cookie_statement
//en.m.wikipedia.org/w/index.php?title=Python&mobileaction=toggle_view_mobile
https://wikimediafoundation.org/
//www.mediawiki.org/
/usr/local/lib/python3.6/dist-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 4 of the file /tmp/sessions/0f56b56f1170593f/main.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")
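
The warning above goes away once a parser is named explicitly. The following is a minimal sketch of that change. It assumes the parser built into Python ("html.parser") rather than the "lxml" one the warning suggests, so no extra package is required. The helper name extract_links and the sample markup are hypothetical and serve only as illustration.

```python
from bs4 import BeautifulSoup


def extract_links(markup):
    """Return the href of every <a> tag in markup that has one."""
    # Naming the parser keeps behaviour the same across systems and
    # suppresses the "No parser was explicitly specified" warning.
    soup = BeautifulSoup(markup, "html.parser")
    # href=True keeps only the <a> tags that carry an href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]


# Hypothetical sample markup standing in for a downloaded page
sample = (
    '<p><a href="/wiki/CPython">CPython</a> '
    '<a name="top">top</a> '
    '<a href="#See_also">See also</a></p>'
)
print(extract_links(sample))
```

To run it against the page from the exercise, pass the object returned by urlopen("https://en.wikipedia.org/wiki/Python") in place of sample.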
 

Flowchart:

Python Web Scraping Flowchart: Retrieves an arbitrary Wikipedia page of 'Python' and creates a list of links on that page
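
The sample output mixes in-page fragments (#Snakes), site-relative paths (/wiki/Pythonidae), protocol-relative links (//shop.wikimedia.org) and absolute URLs. One possible way to turn them into a uniform list is to resolve each href against the page URL with urljoin from the standard library. This sketch is not part of the original solution: the helper name absolute_links, and the choice to skip fragment-only links and duplicates, are assumptions made for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://en.wikipedia.org/wiki/Python"


def absolute_links(hrefs, base=PAGE_URL):
    """Resolve hrefs against base, skipping in-page fragments and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if href.startswith("#"):
            continue  # a fragment only points within the same page
        url = urljoin(base, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# A few hrefs copied from the sample output
hrefs = [
    "#Snakes",
    "/wiki/Pythonidae",
    "//shop.wikimedia.org",
    "https://de.wikipedia.org/wiki/Python",
    "/wiki/Pythonidae",
]
for url in absolute_links(hrefs):
    print(url)
```

In the exercise's loop, the hrefs could be collected into a list instead of printed and then passed to this function.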

https://w3resource.com/python-exercises/web-scraping/web-scraping-exercise-10.php