Learn the various ways of discovering hidden or private content on a webserver that could lead to vulnerabilities.
Robots.txt: a document that tells search engines which pages they are and aren’t allowed to show in their search results, or bans specific search engines from crawling the website altogether.
Take a look at the robots.txt file on the Acme IT Support website to see what it reveals.
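You can also fetch the file from the terminal. A minimal sketch, using a placeholder target address:

curl http://TARGET_IP/robots.txt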
Favicon: a small icon displayed in the browser’s address bar or tab, used for branding a website.
Enter the URL from the task. In the AttackBox terminal, run the above command; it will download the favicon and get its MD5 hash value.
Then look up the hash in the OWASP favicon database: https://wiki.owasp.org/index.php/OWASP_favicon_database
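A minimal sketch of that command, assuming a Linux terminal with md5sum available (the favicon URL below is a placeholder for the one given in the task):

curl -s http://TARGET_IP/favicon.ico | md5sum

The resulting hash can then be searched for in the OWASP favicon database to identify the framework the site was built with.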
The sitemap.xml file gives a list of every file the website owner wishes to be listed on a search engine. These can sometimes reveal areas of the website that are a bit more difficult to navigate to, or even list old web pages that the current site no longer uses but which are still working behind the scenes.
Take a look at the sitemap.xml file on the Acme IT Support website, where we can find the secret area.
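A quick way to pull the sitemap and list only the URLs it contains (placeholder target; the pattern assumes GNU grep with -P support):

curl -s http://TARGET_IP/sitemap.xml | grep -oP '(?<=<loc>).*?(?=</loc>)'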
When we make a request to the web server, the server returns various HTTP headers. These headers can sometimes contain useful information such as the web server software and possibly the programming/scripting language in use.
Run the curl command against the web server to view these headers.
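For example, against a placeholder target, the -v flag prints both the request and response headers alongside the page body, while -I sends a HEAD request and returns only the headers:

curl -v http://TARGET_IP/
curl -I http://TARGET_IP/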
Once you’ve established the framework of a website, you can then locate the framework’s website. From there, we can learn more about the software and other information, possibly leading to more content we can discover.
Go to the framework’s website and click on the documentation.
Then we can see the documentation. It says ‘navigate to the /thm-framework-login path on your website. You can login with the username admin and password admin’, so go to that path.
Log in with the admin credentials, and we can get the flag.
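Before opening it in the browser, you can confirm the path exists with a quick status-code check (placeholder target; the curl write-out option prints only the HTTP response code):

curl -s -o /dev/null -w "%{http_code}\n" http://TARGET_IP/thm-framework-login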
OSINT, or Open-Source Intelligence, refers to external resources that can help in discovering information about your target website.
Google filtering: advanced search operators such as site: and filetype: can be combined to narrow search results down to a specific target, for example site:example.com admin.
Wappalyzer is an online tool and browser extension that helps identify the technologies a website uses, such as frameworks, Content Management Systems, payment processors and much more, and it can even find version numbers as well.
The Wayback Machine is a historical archive of websites that dates back to the late 90s. You can search a domain name, and it will show you each time the service scraped the web page and saved its contents. https://archive.org/web/
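The archive also exposes a simple availability API that can be queried from the terminal; this is a sketch using example.com as a placeholder domain:

curl -s "https://archive.org/wayback/available?url=example.com"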
Git is a version control system that tracks changes to files in a project. When users have finished making their changes, they commit them with a message and then push them to a central location so that other users can pull those changes down to their local machines.
GitHub is a hosted version of Git on the internet.
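If you find a public repository belonging to the target, its commit history can reveal old files or leftover credentials. A minimal sketch, with the repository URL and commit hash as placeholders:

git clone https://github.com/ORG/REPO.git
cd REPO
git log --oneline        # list past commits
git show COMMIT_HASH     # inspect a specific commit for leaked config or credentials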
S3 Buckets are a storage service provided by Amazon AWS, allowing people to save files and even static website content in the cloud, accessible over HTTP and HTTPS. The owner of the files can set access permissions to make files public, private, or even writable. The format of the S3 bucket URL is http(s)://{name}.s3.amazonaws.com, where {name} is decided by the owner.
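A simple way to probe a guessed bucket name is to request the URL directly and see whether it returns a file listing or an access-denied error ({name} is a placeholder for the guess):

curl https://{name}.s3.amazonaws.com/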
This process is automated, as it usually involves hundreds, thousands, or even millions of requests to a web server. These requests check whether a file or directory exists on a website, giving us access to resources we didn’t previously know existed. This process is made possible by using a resource called wordlists.
Wordlists are just text files that contain a long list of commonly used words.
I used Gobuster.
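A minimal sketch of the command I ran, assuming a SecLists wordlist is installed at the path below (target and wordlist path are placeholders; adjust them for your machine):

gobuster dir -u http://TARGET_IP/ -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt

Gobuster then reports each path from the wordlist that exists on the server, along with its HTTP status code.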