Content Discovery
Discovering content on a website is an important step in the reconnaissance phase of web pentesting: the goal is to find files, directories, and features that are not immediately visible or linked from the site itself.
Robots.txt
The robots.txt file is a document that tells search engines which pages they are and aren't allowed to show in their search results, or bans specific search engines from crawling the website altogether.
http(s)://site.com/robots.txt
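The Disallow entries are usually the most interesting part, since they mark locations the owner does not want indexed. A hypothetical robots.txt might look like this:

User-agent: *
Allow: /
Disallow: /staff-portal

Here /staff-portal would be worth visiting manually.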
Favicon
Frameworks often ship with a default favicon; downloading the favicon, hashing it, and looking the hash up in the OWASP favicon database can reveal which framework is in use.
https://wiki.owasp.org/index.php/OWASP_favicon_database
Command:
curl https://static-labs.tryhackme.cloud/sites/favicon/images/favicon.ico | md5sum
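If the printed MD5 hash matches an entry in the database linked above, you have identified the framework that shipped the favicon, and with it a likely technology stack, without touching anything else on the site.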
Sitemap.xml
Unlike the robots.txt file, which restricts what search engine crawlers may look at, the sitemap.xml file lists every page the website owner wants to appear in search engine results.
http(s)://site.com/sitemap.xml
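A sitemap is a plain XML file; a minimal example (with hypothetical URLs) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://site.com/</loc></url>
  <url><loc>http://site.com/old-but-still-live-page</loc></url>
</urlset>

Entries pointing to areas that are hard to navigate to, or to old pages that are still live, are the ones worth following up.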
HTTP Headers
When we make requests to the web server, the server returns various HTTP headers. These headers can sometimes contain useful information such as the web server software and possibly the programming or scripting language in use.
Command:
curl http://10.10.252.141 -v
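In the verbose output, the response headers (prefixed with <) are what matter. An illustrative, not real, response might include:

< HTTP/1.1 200 OK
< Server: nginx/1.18.0
< X-Powered-By: PHP/7.4.3

Here both the web server (nginx 1.18.0) and the language (PHP 7.4.3) leak version information that can be checked against known vulnerabilities.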
Wappalyzer extension
Wappalyzer (https://www.wappalyzer.com) identifies the technology stack of a website: CMS, frameworks, analytics tools, payment processors, and sometimes even version numbers. It is available as a browser extension and as an online lookup.
Wayback Machine
Search cached websites and webpages using the Wayback Machine (https://archive.org/web/); old snapshots can reveal pages and files that are no longer linked but may still be live.
S3 Buckets
S3 Buckets are a storage service provided by Amazon AWS, allowing people to save files and even static website content in the cloud, accessible over HTTP and HTTPS. The owner of the files can set access permissions to make files public, private, or even writable. Sometimes these permissions are set incorrectly and inadvertently allow access to files that shouldn't be available to the public.
http(s)://{name}.s3.amazonaws.com
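Bucket names are commonly derived from the company name, so candidates can be guessed and probed by hand. A minimal sketch using HEAD requests, with hypothetical bucket names:

curl -I https://site-com.s3.amazonaws.com
curl -I https://site-com-assets.s3.amazonaws.com
curl -I https://site-com-backup.s3.amazonaws.com

A 404 response means the bucket does not exist; a 403 or 200 means it does, and a 200 on the bare bucket URL usually indicates the file listing is public.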
Automated tools
ffuf
ffuf -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt -u http://10.10.252.141/FUZZ
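ffuf substitutes each wordlist entry for the FUZZ keyword in the URL and reports which requests return interesting status codes. Its -e flag additionally appends extensions to every word; the extension list below is an illustrative choice:

ffuf -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt -u http://10.10.252.141/FUZZ -e .php,.txt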
dirb
dirb http://10.10.252.141/ /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt
gobuster
gobuster dir --url http://10.10.252.141/ -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt
Finding files:
gobuster dir -u http://10.10.10.191/ -w /usr/share/seclists/Discovery/Web-Content/raft-medium-files.txt
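gobuster can also append file extensions to each word with the -x flag, which helps when hunting for files with a plain directory wordlist; the extensions below are an illustrative choice:

gobuster dir -u http://10.10.10.191/ -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt -x php,txt,bak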
Finding API endpoints by matching all HTTP responses:
ffuf -w ~/Tools/SecLists/Discovery/Web-Content/big.txt -u http://prd.m.rendering-api.interface.htb/api/FUZZ -mc all -fs 50 -X POST
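Flag breakdown: -mc all makes ffuf consider every response status code instead of its default allow-list, -fs 50 then filters out responses whose body is exactly 50 bytes (presumably the size of this API's boilerplate error reply), and -X POST sends POST requests, since many API endpoints only respond to POST.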