Overview of Web Applications
CPSC 330 - Fall 2003
web history
CERN, 1989 - Tim Berners-Lee, document exchange, hypertext links
client/server model
html - hypertext markup language
url - uniform resource locator (scheme:location)
http - hypertext transfer protocol
"pull" - have to re-request to see changes at server (vs. "push")
explosive growth
search engines
html tags
name
...
url schemes
http    http://servername[:port][/pathname][?arguments][#html_anchor]
        arguments: argument1&argument2&... (spaces given by "%20")
ftp
nntp
mailto
telnet
defaults ~user -> /home/user/public_html
directory -> look for index.html
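A minimal sketch of the URL pieces and the two server defaults, using Python's standard urllib.parse. The example URL, the doc_root value, and the resolve_path helper are assumptions made up for illustration; a real server checks the file system rather than a trailing slash to decide whether a path is a directory.

```python
import posixpath
from urllib.parse import urlsplit, unquote

# Hypothetical URL; the host, path, and arguments are invented for this sketch.
url = "http://www.example.edu:8080/~user/notes/?topic=web%20apps&year=2003#intro"

parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.hostname)  # www.example.edu
print(parts.port)      # 8080
print(parts.path)      # /~user/notes/
print(parts.query)     # topic=web%20apps&year=2003
print(parts.fragment)  # intro

# Arguments are separated by "&"; "%20" decodes to a space.
arguments = [unquote(arg) for arg in parts.query.split("&")]
print(arguments)       # ['topic=web apps', 'year=2003']


def resolve_path(url_path, doc_root="/var/www/html"):
    """Map a URL path to a file name using the two defaults in the notes.

    doc_root is an assumed location; the notes only state the ~user rule.
    """
    if url_path.startswith("/~"):
        # ~user -> /home/user/public_html
        user, _, rest = url_path[2:].partition("/")
        file_path = posixpath.join("/home", user, "public_html", rest)
    else:
        file_path = posixpath.join(doc_root, url_path.lstrip("/"))
    # Stand-in for "is a directory": the joined path ends with a slash.
    if file_path.endswith("/"):
        file_path += "index.html"
    return file_path


print(resolve_path(parts.path))  # /home/user/public_html/notes/index.html
```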
http
request/reply protocol - one resource per request
simple access control - either open or password-protected
content types
static - disk file
dynamic - results from running a program or script file
(1) server side - e.g., CGI (common gateway interface) - C/C++/Perl/etc.
appl. or script runs on server; however, increases load on server
(2) client side - applet or script runs on client; however, client's
firewall or browser may not allow/support
(3) scripting languages - JavaScript, VBScript
(4) other tools that provide macros and access to databases
SSI (server side includes, ".shtml")
PHP (PHP: Hypertext Preprocessor, ".php")
ASP (active server pages - script in html document rather than
having to invoke a separate program like CGI, ".asp")
CFM (ColdFusion, ".cfm")
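As a rough illustration of server-side dynamic content, the sketch below builds a page from the request arguments instead of reading a disk file. It follows the CGI convention of header lines, a blank line, then the body, with the arguments arriving in the QUERY_STRING environment variable. The argument called "name" and the render_page function are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def render_page(query_string):
    """Build a CGI-style response: header lines, a blank line, then the body."""
    args = parse_qs(query_string)
    # "name" is a hypothetical argument chosen for this sketch.
    name = args.get("name", ["world"])[0]
    # Escape user input before placing it in HTML.
    body = "<html><body><p>Hello, {}!</p></body></html>".format(html.escape(name))
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A CGI-capable server would set QUERY_STRING before running the script.
    sys.stdout.write(render_page(os.environ.get("QUERY_STRING", "")))
```

Because the program runs once per request, every page view costs the server a process launch, which is the load increase noted for CGI.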
MIME types - if not directly supported by browser, then plug-in available
(e.g., Adobe Acrobat, QuickTime)
text/html
image/gif
application/zip
...
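A server usually picks the MIME type from the file extension. A small sketch with Python's standard mimetypes module follows; the file names are hypothetical, and only their extensions matter.

```python
import mimetypes

# A fresh MimeTypes() instance starts from the module's built-in table.
table = mimetypes.MimeTypes()

# Hypothetical file names; only the extensions matter here.
guessed = {}
for filename in ["index.html", "logo.gif", "notes.zip"]:
    content_type, _encoding = table.guess_type(filename)
    guessed[filename] = content_type
    print(filename, "->", content_type)
```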
http 1.0 - separate TCP connection for each invocation
http 1.1 - persistent TCP connection that lasts over several invocations
web client - browser            name server          web server
--------------------            -----------          ----------
(1) DNS query using UDP  -->    (port 53)
                         <--    DNS response
                                message gives
                                IP address
(2) HTTP get using TCP   ------------------------->  (port 80)
    get
     :
                         <-------------------------  HTTP response message
                                                     200 OK
                                                     404 not found
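The messages in step (2) are plain text. The sketch below formats a GET request and picks apart the status line of a reply; the server name and the canned reply are made up so that no network access is needed, and the two helper functions are hypothetical names.

```python
def build_get_request(host, path="/", version="1.1"):
    """Format the text of an HTTP GET request message."""
    lines = ["GET {} HTTP/{}".format(path, version)]
    if version == "1.1":
        # HTTP/1.1 requires a Host header; connections persist by default.
        lines.append("Host: {}".format(host))
    # A blank line ends the request headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(response_text):
    """Return (version, status_code, reason) from the first response line."""
    status_line = response_text.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# Hypothetical server name and canned reply.
request = build_get_request("www.example.edu", "/index.html")
reply = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n<html>...</html>"

print(request)
print(parse_status_line(reply))  # ('HTTP/1.1', 404, 'Not Found')
```

In a live exchange the request text would be sent over a TCP connection to port 80 of the address returned by the DNS lookup in step (1).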
proxy                             gateway
* client side                     * server side
* can serve some requests but     * translate protocols
  rewrites and forwards most      * authenticates
.-------------------.             .-------------------.
| client     proxy  |             | gateway    server |
| .----.     .----. |             | .----.     .----. |
| |    |---->|    |------>   ------>|    |---->|    | |
| |    |<----|    |<------   <------|    |<----|    | |
| `----'     `----' |             | `----'     `----' |
`-------------------'             `-------------------'
web caching
content delivery networks (CDN)
proxy caches of large files, hold content closer to network edges
Akamai - 13K servers in datacenters and ISPs around the world
expiration when age > time_to_live
DNS routes name resolution query to Akamai server, which responds with
IP address of nearest regional server
load balancing
"For example, when a Web surfer connected to an ISP in Boston clicks on
a photo of New England Patriots quarterback Tom Brady on CNN.com's Web
site, the photo is "pulled" to the Akamai server closest to that user's
ISP. Anyone else in the network who subsequently requests Brady's photo
gets it from the same cache, rather than CNN's origin server, thus cutting
down on bandwidth and eliminating router hops. Meanwhile, infrequently
accessed content and content that has passed its "time-to-live" freshness
date are regularly flushed out of the network."
[http://www.apertura.com.au/articles.html]
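The expiration rule (flush when age > time_to_live) can be sketched as a toy cache. This is only an illustration of the rule, not Akamai's design; the TTLCache class, the origin function, and the injected clock are all hypothetical.

```python
import time


class TTLCache:
    """Toy edge cache: an entry expires once its age exceeds time_to_live."""

    def __init__(self, time_to_live, fetch_from_origin, clock=time.monotonic):
        self.time_to_live = time_to_live
        self.fetch_from_origin = fetch_from_origin  # called on a miss
        self.clock = clock
        self.entries = {}  # url -> (content, time stored)

    def get(self, url):
        now = self.clock()
        if url in self.entries:
            content, stored_at = self.entries[url]
            if now - stored_at <= self.time_to_live:
                return content, "hit"
            del self.entries[url]  # age > time_to_live: flush the stale copy
        content = self.fetch_from_origin(url)
        self.entries[url] = (content, now)
        return content, "miss"


# Usage with a fake clock so the example needs no real waiting.
fake_now = [0.0]
origin_requests = []


def origin(url):
    origin_requests.append(url)
    return "content of " + url


cache = TTLCache(60, origin, clock=lambda: fake_now[0])
print(cache.get("/photo.jpg")[1])  # miss: pulled from the origin server
print(cache.get("/photo.jpg")[1])  # hit: served from the cache
fake_now[0] = 61.0
print(cache.get("/photo.jpg")[1])  # miss: the copy passed its time to live
```

Only two of the three requests reach the origin, which is the bandwidth saving the quoted passage describes.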
-----
CNN 9/11 case study [http://www.tcsa.org/lisa2001/cnn.txt]
CNN.com: Facing A World Crisis
William LeFebvre, CNN Internet Technologies
Who are we? CNN.com (Turner Broadcasting)
- 50 web sites (CNN.com, cartoon network, etc)
- 200 servers
Network Bandwidth (on 9/11)
- 2 OC-12 1,244 Mbps
- 7 OC-3 1,085 Mbps
- Total 2,329 Mbps
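The bandwidth totals follow from the nominal line rates. A quick check, assuming the rounded rates of 622 Mbps per OC-12 and 155 Mbps per OC-3 that the figures above imply:

```python
# Nominal line rates in Mbps, rounded as the totals in the notes imply.
OC12_MBPS = 622
OC3_MBPS = 155

oc12_total = 2 * OC12_MBPS   # 2 OC-12 links
oc3_total = 7 * OC3_MBPS     # 7 OC-3 links
total = oc12_total + oc3_total

print(oc12_total, oc3_total, total)  # 1244 1085 2329
```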
Hardware
- Standard web server: Sun 420R 4x4 (4 CPU, 4GB RAM)
- CNN.com normally used a 15 server pool
- Load balancers front-end all web services
Typical Loads
              Peak        Total     Total
  Date        Hits/min    Hits      Page Views
- 9/11/00       220K      148M       11.8M     One year prior
- 11/8/00     1,217K      722M      139.4M     Day after Election Day
- 9/10/01       156K      104M       14.4M     Day before
11/8/00 was the record page views to date
Managing Unexpected Loads
- swing servers (move servers from one web service to another)
- add additional servers
- reduce page complexity (remove advertisements, pictures, text)
Reducing Page Complexity
- Three page styles (standard, split, ultra light)
- Standard
- Split (half page with link to more info)
- Ultra light (minimal information, with links for more info)
On the morning of Sept 11, there were 10 servers providing CNN.com.
Loads on 9/11-12
              Peak        Total     Total
  Date        Hits/min    Hits      Page Views
- 9/10/01       156K      104M       14.4M     Day before
- 9/11/01     1,110K      411M      132.4M     Day of
- 9/12/01       948K      797M      304.8M     Day after
On 9/11, the peak demand was estimated at 1.8M hits per minute, or
20 times normal.
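To get a feel for the size of the surge, the sketch below expresses each day's figures as a multiple of the day before (9/10/01). It reads the middle column as total hits for the day, which is an assumption. The notes do not say what baseline "normal" refers to, so these ratios are not expected to reproduce the 20x figure.

```python
# (peak hits/min, total hits, page views) copied from the 9/11-12 table.
loads = {
    "9/10/01": (156_000, 104_000_000, 14_400_000),
    "9/11/01": (1_110_000, 411_000_000, 132_400_000),
    "9/12/01": (948_000, 797_000_000, 304_800_000),
}


def ratio_to_baseline(date, baseline_date="9/10/01"):
    """Return each figure for `date` as a multiple of the baseline day."""
    baseline = loads[baseline_date]
    return [value / base for value, base in zip(loads[date], baseline)]


for date in loads:
    ratios = ratio_to_baseline(date)
    print(date, " ".join("{:.1f}x".format(r) for r in ratios))
```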
-----
[http://www.dvwebvideo.com/2000/0500/gordon0500.html]
CNN Interactive uses a custom-designed live-capture system that encodes
five different streaming media files in a single pass. This is accomplished
by taking an analog feed out of the Media 100, then splitting it five ways
to separate PCs: two for RealNetworks streams and two for Windows Media at
28kbps and 80kbps data rates. The fifth machine creates an AVI file that's
converted separately to QuickTime. All render at a frame size of 176x132
pixels.