Writing a Web Application Scanner in Python: Coding for Cyber Security (Program №6)

Requesting a webpage

Below, I explain the steps for writing a basic web crawler.

WHAT IS A WEB CRAWLER?


Let’s look at this program from an ethical perspective. To find a bug in a website, we first need to know which web pages or files the site contains. Some of them are publicly visible on the internet, while others are hidden so that visitors cannot easily reach them. We can map them out manually, or we can write a program that automates the process of crawling the pages and listing them so that they can be tested. The program explained here is a basic crawler that extracts the links from a website and displays them on the terminal automatically.

I have written this code in Python 2, so make sure Python 2 is installed on your machine. I used PyCharm with a Python 2.7 interpreter to run it.

ENUMERATING SUB-DOMAINS OR WRITING A BASIC CRAWLER

Step 1: Importing the modules.

import requests   # third-party library for sending HTTP requests
import re         # regular expressions, used to extract the links
import urlparse   # Python 2 standard library module for resolving URLs
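If you only have Python 3 on your machine, the rough equivalents of these imports are sketched below; the rest of this article sticks to Python 2 as written.

# Python 3 rough equivalents (a sketch, not part of the Python 2 script):
import requests
import re
from urllib import parse as urlparse   # the urlparse module moved to urllib.parse

# Also note: raw_input() becomes input(), and response.content is bytes in
# Python 3, so pass response.text to re.findall instead.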

Step 2: Taking user input and creating a list to keep track of the links we have already found.

url = raw_input("Enter the domain >> ")   # the target URL, including the scheme
links = []                                # links found so far, to avoid revisiting
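For example, assuming a hypothetical target at http://example.com, the prompt would be answered with a full URL including the scheme, since requests.get() needs it:

Enter the domain >> http://example.com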

Step 3: To extract the links from each page, I have defined the following function.

def extract_links_from(url):
    # Request the page and return everything found inside href="..." attributes
    response = requests.get(url)
    return re.findall('(?:href=")(.*?)"', response.content)
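As a quick illustration of what the regular expression captures (the HTML snippet here is made up), it returns whatever sits between href=" and the next quote:

>>> import re
>>> html = '<a href="/about">About</a> <a href="https://example.com/contact">Contact</a>'
>>> re.findall('(?:href=")(.*?)"', html)
['/about', 'https://example.com/contact']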

Step 4: Defining a recursive function to perform the crawling.

def crawl(target_url):
    href_links = extract_links_from(target_url)
    for link in href_links:
        # Resolve relative links against the page they were found on
        link = urlparse.urljoin(target_url, link)

        # Drop fragment identifiers so that page#a and page#b
        # are not crawled as two different pages
        if "#" in link:
            link = link.split("#")[0]

        # Follow only links on the target domain that we have not seen before
        if url in link and link not in links:
            links.append(link)
            print(link)
            crawl(link)
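The urlparse.urljoin() call is what lets the crawler follow relative links: it resolves them against the page they were found on, while absolute links pass through unchanged. A small illustration with made-up URLs:

>>> import urlparse
>>> urlparse.urljoin("http://example.com/index.html", "/login")
'http://example.com/login'
>>> urlparse.urljoin("http://example.com/index.html", "http://example.com/about")
'http://example.com/about'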

Step 5: The last step is to start the crawl on the user-supplied URL. When the user presses CTRL+C, the program exits with a message on the terminal.

try:
    crawl(str(url))
except KeyboardInterrupt:
    # CTRL+C raises KeyboardInterrupt, so we catch it for a clean exit
    print("\rCtrl+c detected...... Quitting.......!!!")
    exit(0)

We are all done now! Assembled from the five steps above and saved as crawler.py, our code looks like this —
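import requests
import re
import urlparse

url = raw_input("Enter the domain >> ")
links = []

def extract_links_from(url):
    response = requests.get(url)
    return re.findall('(?:href=")(.*?)"', response.content)

def crawl(target_url):
    href_links = extract_links_from(target_url)
    for link in href_links:
        link = urlparse.urljoin(target_url, link)

        if "#" in link:
            link = link.split("#")[0]

        if url in link and link not in links:
            links.append(link)
            print(link)
            crawl(link)

try:
    crawl(str(url))
except KeyboardInterrupt:
    print("\rCtrl+c detected...... Quitting.......!!!")
    exit(0)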

RUNNING THE CRAWLER —

1. You can run the above program after installing its only external dependency, the requests library -

$ pip install requests
$ python crawler.py

2. This basic web crawler is also available on GitHub. It can be cloned and run with —

$ git clone https://github.com/An4ndita/Basic-web-crawler
$ cd Basic-web-crawler
$ python crawler.py
Enter the domain to crawl >>

3. You can also use other subdomain enumeration tools that offer more features. Their usage is covered in detail in my previous article, Some tools for Bug bounty hunting and how to use them.

Thanks for reading. 😊 Keep supporting! This content is made available for educational & informational purposes only!🌼
