Wikipedia
Website: https://wikipedia.org
CLI Tool: curl
Authentication: None required for reading (anonymous access allowed)
Description
Wikipedia is the world's largest free online encyclopedia, with over 60 million articles across more than 300 language editions. AI agents can access Wikipedia content through the MediaWiki API, which provides structured access to articles, search, and metadata. The API is designed for programmatic access and returns JSON or XML responses.
Commands
Search Articles
# Search Wikipedia articles
curl "https://en.wikipedia.org/w/api.php?action=opensearch&search=Artificial%20Intelligence&limit=10&format=json"
Search for Wikipedia articles by keyword. Returns article titles, descriptions, and URLs. Use URL encoding for spaces and special characters.
Get Article Content (Plain Text)
# Get article extract in plain text
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Artificial%20Intelligence&prop=extracts&explaintext=true&format=json"
Retrieve article content as plain text. Use explaintext=true to strip HTML formatting. Returns the full article text, or only the introduction when combined with exintro=true.
Get Article Content (HTML)
# Get article content with HTML formatting
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Machine%20Learning&prop=extracts&format=json"
Retrieve article content with HTML formatting preserved. Useful for preserving structure and links.
Get Article Summary
# Get article summary (first section only)
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Python&prop=extracts&exintro=true&explaintext=true&format=json"
Get just the introduction/summary of an article. Use exintro=true to limit to the first section. Ideal for quick lookups.
Get Article by Page ID
# Get article by numeric page ID
curl "https://en.wikipedia.org/w/api.php?action=query&pageids=1234567&prop=extracts&explaintext=true&format=json"
Retrieve article using its numeric page ID instead of title. Useful when you have stored page IDs.
Get Multiple Articles
# Get multiple articles in one request
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Python|JavaScript|Ruby&prop=extracts&exintro=true&explaintext=true&format=json"
Fetch multiple articles in a single API call. Separate titles with pipe character (|). Maximum 50 titles per request.
Get Article Metadata
# Get article info (page ID, last edit, length)
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Wikipedia&prop=info&format=json"
Retrieve metadata about an article including page ID, last revision timestamp, page length, and protection status.
Get Article Categories
# Get categories for an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories&format=json"
List all categories assigned to an article. Useful for understanding article classification.
Get Article Links
# Get all links from an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Quantum%20Computing&prop=links&pllimit=50&format=json"
Get all internal Wikipedia links from an article. Use pllimit to control number of results (max 500).
Get Article Images
# Get images used in an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Solar%20System&prop=images&format=json"
List all images included in an article. Returns image filenames.
Get Image URL
# Get actual URL of an image file
curl "https://en.wikipedia.org/w/api.php?action=query&titles=File:Example.jpg&prop=imageinfo&iiprop=url&format=json"
Get the full URL to download an image file. Prefix the title with File: when looking up images.
Search with Suggestions
# Get search suggestions (autocomplete)
curl "https://en.wikipedia.org/w/api.php?action=opensearch&search=Quantum&limit=10&format=json"
Get search suggestions for partial queries. Useful for autocomplete functionality; results may include fuzzy matches for near-miss queries.
Advanced Search (Full Text)
# Full-text search with snippets
curl "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=artificial%20intelligence&srlimit=10&format=json"
Perform full-text search across Wikipedia. Returns snippets showing search term context. More detailed than opensearch.
Get Random Article
# Get random article
curl "https://en.wikipedia.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&format=json"
Get a random Wikipedia article. Use rnnamespace=0 for main articles only (excludes talk pages, etc.).
Get Article Revisions
# Get revision history for an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Blockchain&prop=revisions&rvlimit=10&rvprop=timestamp|user|comment&format=json"
Get revision history showing who edited an article and when. Use rvlimit to control number of revisions returned.
Get Article in Different Language
# Get article title in other languages
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Computer&prop=langlinks&lllimit=50&format=json"
Get links to the same article in other language editions of Wikipedia. Useful for multilingual content.
Check if Article Exists
# Check if page exists
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Example%20Article&format=json"
Check if an article exists. Response includes "missing" key if page doesn't exist.
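A scripted version of this check, as a minimal sketch: jq -e sets the exit code from the last value it outputs, so the pipeline itself reports existence.
# Exit 0 if the page exists, non-zero if it is missing (sketch)
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Example%20Article&format=json" \
  | jq -e '.query.pages[] | has("missing") | not' >/dev/null && echo "exists" || echo "missing"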
Get Article Coordinates
# Get geographic coordinates for an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Eiffel%20Tower&prop=coordinates&format=json"
Get GPS coordinates for articles about places. Returns latitude and longitude.
Get Page View Statistics
# Get page view count (requires different API)
curl "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Python/daily/20240101/20240131"
Get page view statistics for an article over a date range. Uses Wikimedia REST API (separate from MediaWiki API).
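The response carries one object per day under items[], so a monthly total is a short jq fold away; this sketch assumes the items[].views structure of the pageviews endpoint.
# Sum the daily view counts for the requested date range
curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Python/daily/20240101/20240131" | jq '[.items[].views] | add'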
Examples
Simple Article Lookup Workflow
# Search for article
SEARCH=$(curl -s "https://en.wikipedia.org/w/api.php?action=opensearch&search=Python%20programming&limit=5&format=json")
echo "$SEARCH" | jq -r '.[1][0]' # First result title
# Get article summary
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Python%20(programming%20language)&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'
Research Topic Workflow
# Get main article
TOPIC="Artificial Intelligence"
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=$TOPIC&prop=extracts|categories|links&explaintext=true&format=json" > article.json
# Extract text
jq '.query.pages[].extract' article.json
# Get related topics via links
jq '.query.pages[].links[].title' article.json | head -20
# Get categories
jq '.query.pages[].categories[].title' article.json
Multi-Language Content Access
# Get article in English
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'
# Get same article in German
curl -s "https://de.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'
# Get article in French
curl -s "https://fr.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'
Fact-Checking Workflow
# Search for topic
curl -s "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=climate%20change&format=json" | jq '.query.search[] | {title: .title, snippet: .snippet}'
# Get full article with metadata
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Climate%20change&prop=extracts|info|revisions&explaintext=true&rvlimit=1&format=json" > climate.json
# Check when last updated
jq '.query.pages[].revisions[0].timestamp' climate.json
# Get article text
jq '.query.pages[].extract' climate.json
Image Extraction Workflow
# Get images from article
IMAGES=$(curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Mars&prop=images&format=json")
echo "$IMAGES" | jq '.query.pages[].images[].title'
# Get URL for first image (percent-encode spaces in the filename)
IMAGE_NAME=$(echo "$IMAGES" | jq -r '.query.pages[].images[0].title')
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=${IMAGE_NAME// /%20}&prop=imageinfo&iiprop=url&format=json" | jq '.query.pages[].imageinfo[0].url'
Python Script Example
import requests

def get_wikipedia_summary(title):
    """Get a Wikipedia article summary."""
    url = "https://en.wikipedia.org/w/api.php"
    # A descriptive User-Agent is recommended by the API etiquette guidelines
    headers = {"User-Agent": "MyBot/1.0 (contact@example.com)"}
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "exintro": True,
        "explaintext": True,
        "format": "json",
    }
    response = requests.get(url, params=params, headers=headers)
    data = response.json()
    # Pages are keyed by page ID, not by position
    pages = data["query"]["pages"]
    page_id = list(pages.keys())[0]
    if "missing" in pages[page_id]:
        return None
    return pages[page_id]["extract"]

# Usage
summary = get_wikipedia_summary("Machine Learning")
print(summary)
Batch Article Processing
# Create list of topics
TOPICS=("Python" "JavaScript" "Ruby" "Go" "Rust")
# Fetch all articles
for topic in "${TOPICS[@]}"; do
echo "=== $topic ==="
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=$topic&prop=extracts&exintro=true&explaintext=true&format=json" | jq -r '.query.pages[].extract' | head -5
echo ""
done
Monitor Article Changes
# Get current revision info
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Bitcoin&prop=revisions&rvlimit=1&rvprop=timestamp|user|comment&format=json" > bitcoin_latest.json
# Check latest edit
jq '.query.pages[] | {
timestamp: .revisions[0].timestamp,
user: .revisions[0].user,
comment: .revisions[0].comment
}' bitcoin_latest.json
Geographic Data Extraction
# Get articles with coordinates near a location
curl -s "https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gscoord=37.7749|-122.4194&gsradius=10000&gslimit=10&format=json" | jq '.query.geosearch[] | {title: .title, dist: .dist}'
# Get coordinates for specific place
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Golden%20Gate%20Bridge&prop=coordinates&format=json" | jq '.query.pages[].coordinates[0] | {lat: .lat, lon: .lon}'
Notes
- No Authentication Required: Wikipedia's API is fully open for reading. No API keys or registration needed for anonymous access.
- Rate Limits:
  - No hard rate limit for anonymous users, but excessive usage may be throttled
  - Recommended: max 200 requests per second for bursts, average 1-2 requests per second
  - Use a User-Agent header to identify your bot: curl -A "MyBot/1.0 (contact@example.com)"
  - Respectful usage is enforced by community guidelines, not technical limits
- API Endpoints:
  - Action API: https://en.wikipedia.org/w/api.php (main API, used in all examples)
  - REST API: https://en.wikipedia.org/api/rest_v1/ (newer, mobile-focused)
  - Wikimedia API: https://wikimedia.org/api/rest_v1/ (cross-wiki statistics)
- Language Support: change the domain for other language editions:
  - English: en.wikipedia.org
  - Spanish: es.wikipedia.org
  - French: fr.wikipedia.org
  - German: de.wikipedia.org
  - Full list: https://meta.wikimedia.org/wiki/List_of_Wikipedias
- Output Formats:
  - JSON (recommended): format=json
  - XML: format=xml
  - PHP serialized (format=php) and YAML (format=yaml) appear in older docs but were removed from modern MediaWiki; do not rely on them
  - Always use format=json for AI agent consumption
- Text Extraction Options (see the exsentences example below):
  - explaintext=true: returns plain text without HTML
  - exintro=true: returns only the introduction section
  - exsentences=N: returns the first N sentences
  - exchars=N: returns the first N characters (approximate)
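A quick sketch of the sentence limiter: the same query pattern as the summary commands above, swapping exintro for exsentences.
# Get just the first two sentences of an article
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Moon&prop=extracts&exsentences=2&explaintext=true&format=json" | jq -r '.query.pages[].extract'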
- API Limits Per Request:
  - Multiple titles: max 50 per request (use the pipe separator: Python|Java|C++)
  - Links: max 500 per request (use pllimit=500)
  - Categories: max 500 per request
  - Images: max 500 per request
  - Revisions: max 500 per request (use rvlimit=500)
- Best Practices for AI Agents:
  - Always include a descriptive User-Agent header
  - Cache responses to avoid repeated requests for the same content
  - Use exintro=true for summaries instead of full articles when possible
  - Batch requests using pipe-separated titles when fetching multiple articles
  - Use the continue parameter for paginated results (see the pagination sketch after this list)
  - Handle "missing" pages gracefully in your code
  - Respect Wikipedia's content licenses (CC BY-SA 4.0)
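A minimal pagination sketch: keep requesting until the API stops returning a continue token. A robust client should echo back every field of the .continue object; this version follows only the links-specific plcontinue cursor.
# Collect all internal links from one article, page by page
CONTINUE=""
while :; do
  RESP=$(curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Quantum%20Computing&prop=links&pllimit=500&format=json$CONTINUE")
  echo "$RESP" | jq -r '.query.pages[].links[]?.title'
  TOKEN=$(echo "$RESP" | jq -r '.continue.plcontinue // empty')
  [ -z "$TOKEN" ] && break
  CONTINUE="&plcontinue=$(printf %s "$TOKEN" | jq -sRr @uri)"
done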
- Error Handling:
  - Missing page: response includes a "missing": "" key
  - Invalid title: response includes an "invalid": "" key
  - API errors: check the "error" key in the response
  - Network timeouts: implement retry logic with exponential backoff (sketch below)
  - Always check the response structure before accessing nested fields
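A retry sketch with exponential backoff; the three-attempt policy and ten-second timeout are arbitrary choices, not API requirements.
# Retry on network failure or an API-level "error" key
for i in 1 2 3; do
  RESP=$(curl -s --max-time 10 "https://en.wikipedia.org/w/api.php?action=query&titles=Wikipedia&format=json") \
    && ! echo "$RESP" | jq -e '.error' >/dev/null && break
  sleep $((2 ** i))  # wait 2s, 4s, 8s between attempts
done
echo "$RESP" | jq -r '.query.pages[].title'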
- URL Encoding:
  - Spaces: use %20 or + in URLs
  - Special characters: percent-encode using standard URL encoding
  - Bash: quote the URL so the shell does not interpret & and |, and let curl encode parameter values for you (see the --data-urlencode sketch below)
  - Python: the requests library handles encoding automatically
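One way to avoid hand-encoding entirely: curl -G converts --data-urlencode pairs into a percent-encoded query string.
# Let curl do the percent-encoding, including the space in the title
curl -sG "https://en.wikipedia.org/w/api.php" \
  --data-urlencode "action=query" \
  --data-urlencode "titles=Albert Einstein" \
  --data-urlencode "prop=extracts" \
  --data-urlencode "exintro=true" \
  --data-urlencode "explaintext=true" \
  --data-urlencode "format=json"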
- Content Parsing Tips:
  - Use jq for JSON parsing in bash scripts
  - The page ID is the key in the .query.pages object (not always sequential)
  - Extract page content: .query.pages[].extract
  - Handle multiple pages: iterate over .query.pages | to_entries (example below)
  - Remove HTML: use explaintext=true or parse the HTML with a library
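A short example of the to_entries pattern: page IDs come out as keys, page objects as values.
# Print "pageid: title" for each page in a multi-title response
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Python|JavaScript&prop=extracts&exintro=true&explaintext=true&format=json" \
  | jq -r '.query.pages | to_entries[] | "\(.key): \(.value.title)"'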
- Page Namespaces:
  - 0: Main articles (default)
  - 1: Talk pages
  - 2: User pages
  - 6: Files/Images
  - 14: Categories
  - Use rnnamespace=0 to limit random results to main articles
- MediaWiki API Documentation:
  - Full API docs: https://www.mediawiki.org/wiki/API:Main_page
  - API sandbox (interactive): https://en.wikipedia.org/wiki/Special:ApiSandbox
  - Query examples: https://www.mediawiki.org/wiki/API:Query
  - Help for a specific action: add action=help&modules=query to any request
- Content Licensing:
  - Text: Creative Commons BY-SA 4.0 and GFDL
  - Images: various licenses (check individual file pages)
  - You must attribute Wikipedia and preserve the license
  - Commercial use is allowed with proper attribution
  - See: https://en.wikipedia.org/wiki/Wikipedia:Copyrights
- Alternative Tools:
  - wikipedia Python library: simplified Wikipedia API wrapper
  - wptools Python library: advanced Wikipedia/Wikidata tool
  - wtf_wikipedia JavaScript library: Wikipedia text parser
  - mwclient Python library: MediaWiki API client
  - pywikibot Python framework: bot framework for Wikipedia
- Advanced Features:
  - Wikidata integration: get structured data via the Wikidata API (sketch below)
  - Page previews: use the REST API for mobile-optimized previews
  - Nearby pages: use geosearch for location-based queries
  - Citation extraction: parse references from article HTML
  - Infobox data: parse from HTML or use the Wikidata API
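A sketch of the Wikidata hop: prop=pageprops yields an article's Wikidata ID, and wbgetentities fetches its structured data. Q42 (Douglas Adams) is used as the example ID.
# Find the Wikidata ID for a Wikipedia article
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Douglas%20Adams&prop=pageprops&format=json" | jq -r '.query.pages[].pageprops.wikibase_item'
# Fetch structured data for that ID from the Wikidata API
curl -s "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&props=labels|descriptions&languages=en&format=json" | jq '.entities.Q42.descriptions.en.value'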
- Performance Optimization:
  - Use compression: add an Accept-Encoding: gzip header (curl --compressed does this and decodes the response; example below)
  - Request only needed properties: limit the prop= parameter
  - Use page IDs when possible: faster than title lookups
  - Enable HTTP/2: supported on all Wikipedia domains
  - Keep-alive connections: reuse TCP connections for multiple requests
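A compressed, trimmed request as a one-line sketch: --compressed negotiates gzip and decompresses automatically, and prop= is limited to extracts only.
# Compression plus a minimal property list in one request
curl -s --compressed "https://en.wikipedia.org/w/api.php?action=query&titles=Mars&prop=extracts&exintro=true&explaintext=true&format=json"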
- Common Gotchas:
  - Page titles are case-sensitive (except the first character)
  - Disambiguation pages have a "(disambiguation)" suffix
  - Redirects: check for a "redirect": "" key in the response (sketch below)
  - Some articles are protected (can't be edited)
  - Mobile and desktop versions sometimes serve different content
  - Infoboxes and tables are difficult to parse from plain text; use the HTML
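A redirect-resolution sketch: passing redirects=1 makes the action API follow redirects and report the mapping under .query.redirects. The example assumes the title "UK" still redirects to "United Kingdom".
# Resolve a redirect and show the from/to mapping
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=UK&redirects=1&format=json" | jq '.query.redirects'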
- Mobile/Summary API (Alternative):
  # Simpler summary endpoint
  curl "https://en.wikipedia.org/api/rest_v1/page/summary/Python_(programming_language)"
  - Returns a structured summary, image, and coordinates
  - Easier to parse than the main API
  - Mobile-optimized content
  - Includes a thumbnail image URL