Writing a Web Application Scanner in Python : Coding for Cyber Security (Program №6)

Anandita
3 min read · Nov 20, 2020

In this article, I have explained the steps for writing a basic web crawler.

WHAT IS A WEB CRAWLER?


Let’s look at this program from an ethical perspective. To find a bug in a website, we first need to know which web pages or files the website contains. Some of them are publicly visible on the internet, while others are hidden so that not everyone can access them. We can list these pages manually, or we can write a program that automates the process of crawling them and listing them so they can be tested. The program explained below is a basic one that extracts the links from a website and displays them on the terminal automatically.

I have written this code in Python 2, so make sure Python 2 is installed on your machine. I used PyCharm to run the code with a Python 2.7 interpreter.

ENUMERATING SUB-DOMAINS OR WRITING A BASIC CRAWLER

Step 1 : Importing modules.

import requests   # sends the HTTP requests to the target
import re         # regular expressions, used to pull href values out of the HTML
import urlparse   # Python 2 module for resolving relative URLs (urllib.parse in Python 3)

Step 2 : Taking user input and creating a list to keep track of the links that have already been discovered.

url = raw_input("Enter the domain >> ")   # raw_input is Python 2; use input() in Python 3
links = []                                # every link discovered so far, to avoid re-crawling

Step 3 : We need to extract the links from the website. For this, I have defined the following function.

def extract_links_from(url):
    # Request the page and return every href="..." value found in its HTML
    response = requests.get(url)
    return re.findall('(?:href=")(.*?)"', response.content)
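
To see what the regular expression returns, here is a quick check you can run on its own (the HTML snippet is an invented example, not output from any real site):

import re

sample_html = '<a href="/about">About</a> <a href="https://example.com/blog">Blog</a>'
print(re.findall('(?:href=")(.*?)"', sample_html))
# ['/about', 'https://example.com/blog']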

Step 4 : Defining a function to perform crawling.

def crawl(target_url):
    href_links = extract_links_from(target_url)
    for link in href_links:
        # Turn relative links such as "/about" into absolute URLs
        link = urlparse.urljoin(target_url, link)

        # Drop the fragment so "page#section" and "page" are treated as the same link
        if "#" in link:
            link = link.split("#")[0]

        # Follow only links that belong to the target domain and haven't been seen before
        if url in link and link not in links:
            links.append(link)
            print(link)
            crawl(link)
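
The urljoin call is what turns relative links into full URLs that can be requested again, and it also shows why the "if url in link" check is needed: absolute links to other domains pass through urljoin unchanged, and the check stops the crawler from wandering off the target site. A quick illustration (example.com and other-site.com are just placeholder domains):

import urlparse   # already imported above; repeated so this snippet runs on its own

print(urlparse.urljoin("https://example.com/blog/", "post1.html"))   # https://example.com/blog/post1.html
print(urlparse.urljoin("https://example.com/blog/", "/contact"))     # https://example.com/contact
print(urlparse.urljoin("https://example.com/blog/", "https://other-site.com/x"))   # https://other-site.com/x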

Step 5 : The last step is to call the crawl function on the target URL (passed as a string). When the user presses Ctrl+C, the program exits with a message on the terminal.

try:
    crawl(str(url))
except KeyboardInterrupt:
    print("\rCtrl+c detected...... Quitting.......!!!")
    exit(0)

We are all done now! Our code looks like this —
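
(The listing below is simply the snippets from Steps 1 to 5 combined, in order, into a single crawler.py.)

import requests
import re
import urlparse

url = raw_input("Enter the domain >> ")
links = []

def extract_links_from(url):
    # Request the page and return every href="..." value found in its HTML
    response = requests.get(url)
    return re.findall('(?:href=")(.*?)"', response.content)

def crawl(target_url):
    href_links = extract_links_from(target_url)
    for link in href_links:
        # Turn relative links such as "/about" into absolute URLs
        link = urlparse.urljoin(target_url, link)

        # Drop the fragment so "page#section" and "page" are treated as the same link
        if "#" in link:
            link = link.split("#")[0]

        # Follow only links that belong to the target domain and haven't been seen before
        if url in link and link not in links:
            links.append(link)
            print(link)
            crawl(link)

try:
    crawl(str(url))
except KeyboardInterrupt:
    print("\rCtrl+c detected...... Quitting.......!!!")
    exit(0)

If you only have Python 3 installed, three small changes make the same script work: urlparse is replaced by urllib.parse (import urljoin from there), raw_input() becomes input(), and response.content should become response.text so that the regular expression runs on a string instead of bytes.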

RUNNING THE CRAWLER —

1. You can run the above program after installing its only external dependency, the requests library -
$ pip install requests
$ python crawler.py

2. This basic web crawler is also available on GitHub. It can be fetched and run with —

$ git clone https://github.com/An4ndita/Basic-web-crawler
$ cd Basic-web-crawler
$ python crawler.py
Enter the domain to crawl >>

3. You can also use other subdomain enumeration tools that offer more features. Their usage is covered in detail in my previous article on Some tools for Bug bounty hunting and how to use them.

Thanks for reading. 😊 Keep supporting! This content is made available for educational & informational purposes only!🌼
