Python Project: Extract Information from URLs
URL Analyzer: Build a program that analyzes and extracts information from a given URL.
Input values:
User provides a URL to be analyzed.
Output value:
Extracted information and analysis results for the given URL.
Example:
Input values:
URL to analyze: https://www.example.com/about-us
Output value:
Analysis results:
- Domain: example.com
- Protocol: HTTPS
- Path: /about-us
- Query parameters: None
- HTTP status: 200 OK
- Page title: About Us - Example
- Meta description: Learn more about our company and our mission.

Input values:
URL to analyze: https://www.example.com/products?category=electronics
Output value:
Analysis results:
- Domain: example.com
- Protocol: HTTPS
- Path: /products
- Query parameters: category=electronics
- HTTP status: 200 OK
- Page title: Products - Example
- Meta description: Browse our wide selection of electronics products.

Input values:
URL to analyze: https://www.example.com/non-existent-page
Output value:
Analysis results:
- Domain: example.com
- Protocol: HTTPS
- Path: /non-existent-page
- Query parameters: None
- HTTP status: 404 Not Found
- Error message: The requested page does not exist.
Solution: Using requests and urllib Modules
Code:
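A minimal sketch of one way to write such an analyzer, assuming the third-party requests package is installed. The names parse_url, extract_metadata, analyze_url, print_results and MetadataParser are illustrative choices, and the standard-library html.parser module is used here for the title and meta description as an assumption about how the metadata could be read.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MetadataParser(HTMLParser):
    """Collects the first <title> text and the description <meta> tag."""

    def __init__(self):
        super().__init__()
        self.description = None
        self._title_parts = []
        self._in_title = False
        self._title_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._title_done:
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._title_done = True

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self):
        text = "".join(self._title_parts).strip()
        return text or None


def parse_url(url):
    """Split the URL into the fields shown in the analysis results."""
    parts = urlparse(url)
    return {
        "Domain": parts.netloc,
        "Protocol": parts.scheme.upper(),
        "Path": parts.path,
        "Query parameters": parts.query or None,
    }


def extract_metadata(html):
    """Return the page title and meta description found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"Page title": parser.title, "Meta description": parser.description}


def analyze_url(url, timeout=10):
    """Parse the URL, fetch it with a GET request, and collect the results."""
    import requests  # third-party; imported here so the helpers above run without it

    results = parse_url(url)
    try:
        response = requests.get(url, timeout=timeout)
    except requests.RequestException as exc:
        # Connection failures, timeouts, invalid URLs and similar errors end up here.
        results["Error message"] = str(exc)
        return results
    results["HTTP status"] = response.status_code
    results.update(extract_metadata(response.text))
    return results


def print_results(results):
    print("Analysis results:")
    for key, value in results.items():
        print(f"- {key}: {value}")
```

To try it, call print_results(analyze_url(url)) with a URL read from the user, for example via input("URL to analyze: "). In this sketch a 404 response is not treated as an exception: its status code and any title on the error page are reported like those of any other page.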
Output:
Analysis results:
- Domain: www.w3resource.com
- Protocol: HTTPS
- Path:
- Query parameters: None
- HTTP status: 200
- Page title: Web development tutorials | w3resource
- Meta description: Web development tutorials on HTML, CSS, JS, PHP, SQL, MySQL, PostgreSQL, MongoDB, JSON and more.

Analysis results:
- Domain: www.w3resource.com
- Protocol: HTTPS
- Path: /privacy/
- Query parameters: None
- HTTP status: 404
- Page title: 404 Not Found
- Meta description: None
Explanation:
- URL Parsing: Extracts protocol, domain, path, and query parameters from the URL.
- HTTP Request: Sends a GET request and retrieves HTTP status and HTML content.
- Metadata Extraction: Extracts page title and meta description from the HTML.
- Error Handling: Catches errors raised while making the request (such as connection failures) so the program reports the problem instead of crashing.
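The URL parsing step can also be tried on its own, without any network access. The sketch below uses only urllib.parse. The helper name split_url is hypothetical, and stripping a leading "www." is an assumption made here so the domain reads example.com as in the example at the top; the sample output above keeps the www. prefix.

```python
from urllib.parse import parse_qs, urlparse


def split_url(url):
    """Hypothetical helper: break a URL into protocol, domain, path and query."""
    parts = urlparse(url)
    host = parts.hostname or ""  # lower-cased host name without any port
    # Assumption: drop a leading "www." so the domain reads "example.com".
    domain = host[4:] if host.startswith("www.") else host
    return {
        "protocol": parts.scheme.upper(),
        "domain": domain,
        "path": parts.path,
        # parse_qs maps each parameter name to a list of its values.
        "query": parse_qs(parts.query),
    }


info = split_url("https://www.example.com/products?category=electronics")
print(info)
# {'protocol': 'HTTPS', 'domain': 'example.com', 'path': '/products', 'query': {'category': ['electronics']}}
```

Using parse_qs rather than the raw query string makes repeated parameters easy to handle, since each name maps to a list of values.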