Information Gathering – Sub-domains

One of the talks I was most looking forward to at the excellent SecurityFest recently was Frans Rosén's, about his time as a bug bounty hunter. Regular readers will know that these are projects I also participate in, so I was eager to see what techniques the professionals use, in order to beef up my own bug hunting career.

The videos from SecurityFest are also available at the above link, and I would highly recommend that you check them out.

One of the things that Frans dedicated at least one slide to was sub-domain enumeration. This is where you try to ascertain all the sub-domains of a particular target or host, for example api.example.com or blog.example.com. If you are looking for bugs for Uber, then you are much more likely to find them if you include blog.uber.com as well as www.uber.com.

Essentially you are increasing your attack surface. The problem is that not all of these are easily discovered. If you were to visit a site such as Google, there is no way to click through links to all of the 150-ish sub-domains they have. They do not really publish a list either, so as a bug hunter it is up to you to figure out what they are, since obviously you want to increase the likelihood of finding a bug by something like 150 times.

Previously, I was doing this by simple brute-forcing. I used the absolutely excellent subbrute by TheRook. I feed it a domain name, either something I am researching, or a domain of a bug bounty program, and it will happily churn away at a list of several thousand “most common” sub-domains generated by searching through thousands of open source projects.

This has its limits, as it will not find obscure sub-domains such as blog.ponies.are.great.example.com, but it returns around 10-30 sub-domains of interest each time I run it against a target. It also has its advantages, as it uses a list of “open resolvers” to query each candidate sub-domain against, which are essentially DNS servers on the internet that anyone can ask whether or not they hold records for a given name.

That last part is very useful, as it means that at no point do you send any traffic to the target domain itself, which removes the possibility of being DNS rate limited, or of your IP being added to some kind of blacklist.
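To illustrate the open resolver idea, here is a minimal sketch using the dnspython library (an assumption purely for demonstration, not necessarily what subbrute does internally), pointing lookups at a public resolver rather than the system default:

```python
import dns.resolver  # assumes the dnspython (2.x) package is installed

# Point lookups at an open/public resolver rather than the system default,
# so the queries never go anywhere near the target's own nameservers.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["8.8.8.8"]  # any open resolver will do

try:
    answer = resolver.resolve("blog.example.com", "A")  # hypothetical sub-domain
    for record in answer:
        print(record.address)
except dns.resolver.NXDOMAIN:
    # The candidate sub-domain simply does not exist.
    pass
```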

You will note in Frans’ slide that there are a couple of places he goes to check which sub-domains are recorded for those particular sites, such as VirusTotal, RateIP and SimilarWeb. As far as I can tell, they all operate some kind of DNS infrastructure, record each query made to it, and then make that data available in some structured form, either through open APIs or monetised.

Obviously I was inspired by the talks at the conference, and was eager to start hacking at something when I got back, and this seemed like an excellent thing to start with. My goals were pretty much to take some of the leg (mouse?) work out of information gathering and enumerating attack surface.

I wanted to be able to simply start a command line tool that would go off and find all the information about the domain I wanted, along with anything else of interest. I set aside a Saturday, and came up with the list of what this script would do.

  1. Take a domain at the command line.
  2. Enumerate all the sub-domains using subbrute.
  3. Ask some kind of passive DNS structure what other sub-domains it knew of.
  4. Parse the robots.txt (if any) for a list of places the web developers did not want crawled.
  5. Output everything in a vaguely formatted way for later use.
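The first step is plain argument parsing. A minimal sketch of how that could look (the argument names here are my own, purely illustrative):

```python
import argparse

# Take a target domain, plus any sub-domains already found (e.g. with subbrute).
parser = argparse.ArgumentParser(description="Sub-domain information gathering")
parser.add_argument("domain", help="the target domain, e.g. example.com")
parser.add_argument("subdomains", nargs="*",
                    help="extra sub-domains to include, e.g. subbrute output")
args = parser.parse_args()
```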

I started off with about 37 lines of code, which I wrote the Friday after returning, and it kind of ballooned after that. I ended up with 92 lines, as you can see just below.

As we can all tell, I am still *not a developer*, but we can go through the functions one by one.

The find_and_parse_robots function is not really very special: it just fetches robots.txt from the domain and checks each line for the word “Disallow”; if it is there, it clips the directive off the beginning, prepends the domain, and adds the result to a list. Something I noticed when trying it out on several test domains is that some sites will repeat the same disallowed path for multiple crawlers (Bing/Google/Yahoo etc.), so I had to put in some duplicate checking too.
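A sketch of that logic, using the requests library (the real function may differ in the details, and plain HTTP is assumed here):

```python
import requests

def find_and_parse_robots(domain):
    """Fetch robots.txt and return the disallowed paths as full URLs."""
    disallowed = []
    try:
        # Plain HTTP is an assumption; the target may require HTTPS.
        response = requests.get("http://{}/robots.txt".format(domain), timeout=10)
    except requests.RequestException:
        return disallowed

    for line in response.text.splitlines():
        if "Disallow" in line:
            # Clip "Disallow:" off the front and prepend the domain.
            path = line.split(":", 1)[1].strip()
            url = "http://{}{}".format(domain, path)
            # Some sites repeat the same rule for multiple crawlers,
            # so skip anything we have already seen.
            if url not in disallowed:
                disallowed.append(url)
    return disallowed
```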

Next I made the create_sub_list function, which starts off by sleeping for 15 seconds. This increases the total time the tool takes to run from 1 second to around 16. It is there because I am asking the VirusTotal API to tell me what sub-domains it has on record for the domain I am asking about. It is a rate-limited API that allows a maximum of 4 requests per minute, so I hard-coded that limit in.

The function returns a list of all of the sub-domains, not only from the VirusTotal API, but also those that the user supplied to the tool as command line arguments. These are typically ones I got from subbrute.
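A sketch of what that might look like, assuming the v2 domain report endpoint (the exact request and response handling in the real script may well differ):

```python
import time
import requests

VT_API_KEY = "your-api-key-here"  # free key from virustotal.com

def create_sub_list(domain, user_supplied):
    """Ask VirusTotal for known sub-domains and merge in any the user supplied."""
    # The public API allows at most 4 requests per minute, so wait 15 seconds
    # before each call rather than risk being throttled.
    time.sleep(15)

    response = requests.get(
        "https://www.virustotal.com/vtapi/v2/domain/report",
        params={"apikey": VT_API_KEY, "domain": domain},
        timeout=30,
    )
    data = response.json()

    subs = list(user_supplied)  # e.g. results already gathered with subbrute
    for sub in data.get("subdomains", []):
        if sub not in subs:
            subs.append(sub)
    return subs
```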

Finally, I created the resolve_sub_domains function, which attempts to figure out which IP address each of the sub-domains is associated with, since both blog.example.com and api.example.com could be hosted on the same IP address. I also wanted the results to be displayed in a somewhat useful manner, so I used the pprint module, which pretty-prints things so that the dictionary would not look bad when displayed.
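Roughly how that could look, as a sketch using only the standard library:

```python
import socket
from pprint import pprint

def resolve_sub_domains(sub_domains):
    """Map each sub-domain to the IP address it resolves to (or None)."""
    results = {}
    for sub in sub_domains:
        try:
            results[sub] = socket.gethostbyname(sub)
        except socket.gaierror:
            # The name turned up in a list somewhere but does not resolve.
            results[sub] = None
    return results

if __name__ == "__main__":
    # Hypothetical names purely for illustration.
    pprint(resolve_sub_domains(["blog.example.com", "api.example.com"]))
```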

The output of the tool looks somewhat like this.

There is usually another section at the bottom, with just a sorted list of the IP addresses too. It is a real output of the tool, but with the names of the innocent changed, and some of their octets too.

Overall, I was very happy with how that Saturday turned out, and I look forward to adding more passive DNS querying to this script, as it really does take the leg work out of guessing. The VirusTotal API is super useful for enumerating sub-domains that are actually in use, rather than guessing at all of them. On the other hand, it did not return all the sub-domains that the subbrute tool managed to identify, so they both have their uses.

I really wanted to be able to include the subbrute tool in mine somehow, but looking at the source code just made me feel a bit dizzy, and trying to run that tool from within my tool would not work either, as for some reason the subbrute tool never exits on my machine. I only found out that it had finished by switching it to verbose mode and seeing what it was up to.

I would like to encourage you, whoever you are, to use the code above (you will have to get your own API key from VirusTotal, but they are free) to enumerate the surface area of whichever application or domain you are looking at, because the more surface there is, the easier it is to find a bug, which can then be patched before the bad guys figure it out.

I will put up something next week on a vulnerable VM I have been hacking away at, as I am about to embark on the OSCP, which will consume my life for some time. Fortunately though, it is *not* the DVWA this time!

I hope you have a fantastic week.
