Spidey: Python Web Crawler

I created a web crawler using Python and its modules. It follows certain conditions; for example, it reads robots.txt before crawling a page. If robots.txt allows the page to be crawled, Spidey crawls it. It dives in recursively, but I have set certain limitations. It does not go beyond 20 pages, as…

Socket Programming: Handling multiclients

Referring to the previous post, we will continue with the same code. In this post we will try to handle multiple clients, meaning that multiple client programs can connect to the server program. For this we will use threads. Threads are like lightweight processes that run only the part that is required (we decide what is…

We will be understanding the basics of socket programming using Python. You may be wondering: why socket programming? Well, it's because if you need to send data over a network, you need to know about sockets. HTTP websites run on port 80. Each website has an IP address. So when you request a website, your browser is actually trying to get data from someipaddress:80. 80 is the default port that the browser uses unless otherwise specified. You might have used Apache. What it does is create a server (we call it the Apache server) and bind it to 127.0.0.1 and port 80. 127.0.0.1 is called the loopback address, as it tells the browser to look on the user's own computer instead of searching the web.
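To make the address:port idea concrete, here is a minimal sketch of how a client could work out which (host, port) pair to connect to for a given URL. The helper name `endpoint_for` and the `DEFAULT_PORTS` table are hypothetical, introduced only for illustration; a real browser does much more than this.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

# Assumed defaults for the two schemes a browser most commonly uses.
DEFAULT_PORTS = {'http': 80, 'https': 443}

def endpoint_for(url):
    """Return the (host, port) pair a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly,
    # so fall back to the default port for the scheme.
    if parts.port is not None:
        port = parts.port
    else:
        port = DEFAULT_PORTS[parts.scheme]
    return (parts.hostname, port)

print(endpoint_for('http://example.com/index.html'))
print(endpoint_for('http://127.0.0.1:8080/'))
```

The first URL names no port, so the sketch falls back to 80; the second names 8080 explicitly, so that value is used as-is.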

Let's start. First of all, a socket is a connection point, like your phone. You connect to some socket (someone on the other side of the phone) and send and receive messages or data (chat).
Requires: Python 2.6+
Let's try to create a server first.


# Import all from module socket
from socket import *

# Defining server address and port
host = ''  #'' listens on all interfaces; 'localhost' or '127.0.0.1' listens on this machine only
port = 52000 #Use a port above 1024; ports below it are reserved

#Creating socket object
sock = socket()
#Binding socket to an address. bind() takes a tuple of host and port.
sock.bind((host, port))
#Listening at the address
sock.listen(5) #5 is the number of clients that can wait in the queue

#Accepting incoming connections
conn, addr = sock.accept()

#Sending message to connected client
conn.send('Hi! I am server') #send only takes string
#Receiving from client
data = conn.recv(1024) # 1024 stands for bytes of data to be received
print data

#Closing connections
conn.close()
sock.close()

Now a client.

from socket import *

host = 'localhost' # '127.0.0.1' can also be used
port = 52000

sock = socket()
#Connecting to socket
sock.connect((host, port)) #connect takes tuple of host and port

data = sock.recv(1024)
print data
sock.send('HI! I am client.')

#Closing the socket
sock.close()

Everything is explained in comments.
  • The program above uses a blocking, wait-for-the-other-side style of programming. Look at the send and recv parts in both programs: if the server is sending something, the client must know when the server will send (and vice versa), or it must wait for that data. That is, of course, if you don't want to drop anything you are sending.
  • # is used for comments. The exception is a first line starting with #!, which tells the system where the Python program is. Use the path to python.exe on Windows.
  • Always use sock.close() to close the socket; otherwise Python may raise an "address already in use" error the next time you run the program. You may also see it if the program terminates partway through.
  • The 5 in sock.listen is of no use right now, as server.py will terminate as soon as it is done with the first client.
  • sock.recv() blocks until it receives something.
  • print data will not work with Python 3.0+. Use print(data).
There will be more on this. Until then, if you want to read more about it, see here.
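The notes above on matching send/recv order and always closing sockets can be sketched in a single script. This is only an illustration under a few assumptions that are not in the original code: the server runs in a background thread so that both sides fit in one file, port 0 asks the operating system for any free port, SO_REUSEADDR is set to reduce "address already in use" errors on restart, and messages are byte strings so the script also runs on Python 3. The names `serve_once` and `run_demo` are hypothetical.

```python
import socket
import threading

def serve_once(server_sock, results):
    """Accept one client, greet it, read its reply, then close the connection."""
    conn, addr = server_sock.accept()
    try:
        conn.sendall(b'Hi! I am server')         # the server speaks first...
        results['server_got'] = conn.recv(1024)  # ...then blocks until the client replies
    finally:
        conn.close()                             # close even if something failed

def run_demo():
    results = {}
    server_sock = socket.socket()
    # Assumption: allow quick restarts on the same address.
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        server_sock.bind(('127.0.0.1', 0))       # port 0: the OS picks a free port
        server_sock.listen(5)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock, results))
        worker.start()

        client = socket.socket()
        try:
            client.connect(('127.0.0.1', port))
            results['client_got'] = client.recv(1024)  # the client listens first...
            client.sendall(b'HI! I am client.')        # ...then answers
        finally:
            client.close()
        worker.join()                            # wait for the server side to finish
    finally:
        server_sock.close()
    return results

results = run_demo()
print(results['client_got'])
print(results['server_got'])
```

Swapping the recv and sendall calls on the client would leave both sides sending first and then waiting, which happens to work for tiny messages but is exactly the kind of ordering the notes warn you to keep track of.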