Overview of Web Applications
CPSC 330 - Fall 2003

web history
  CERN, 1989 - Tim Berners-Lee, document exchange, hypertext links
  client/server model
  html - hypertext markup language
  url  - uniform resource locator (scheme:location)
  http - hypertext transfer protocol
    "pull" - have to re-request to see changes at server (vs. "push")
  explosive growth
  search engines

html tags
  name ...

url schemes
  http - http://servername[:port][/pathname][?arguments][#html_anchor]
           arguments given as argument1&argument2&...
           spaces given by "%20"
  ftp
  nntp
  mailto
  telnet

defaults
  ~user     -> /home/user/public_html
  directory -> look for index.html

http request/reply protocol - one resource per request

simple access control - either open or password-protected

content types
  static  - disk file
  dynamic - results from running a program or script file
    (1) server side - e.g., CGI (common gateway interface) -
        C/C++/Perl/etc. appl. or script runs on server; however,
        increases load on server
    (2) client side - applet or script runs on client; however,
        client's firewall or browser may not allow/support
    (3) scripting languages - Javascript, VBscript
    (4) other tools that provide macros and access to databases
        SSI (server side includes, ".shtml")
        PHP (PHP: hypertext preprocessor, ".php")
        ASP (active server pages - script in html document rather than
             having to invoke a separate program like CGI, ".asp")
        CFM (cold fusion, ".cfm")

MIME types - if not directly supported by browser, then plug-in
             available (e.g., Adobe Acrobat, Quicktime)
  text/html
  image/gif
  application/zip
  ...

http 1.0 - separate TCP connection for each invocation
http 1.1 - persistent TCP connection that lasts over several invocations

web client - browser         name server         web server
--------------------         -----------         ----------
(1) DNS query using UDP  --> (port 53)
                         <-- DNS response message
                             gives IP address
(2) HTTP get using TCP   -------------------> (port 80)
                                                    get
                                                    :
                         <------------------- HTTP response message
                                                    200 OK
                                                    404 not found

proxy                               gateway
 * client side                       * server side
 * can serve some requests but       * translate protocols
   rewrites and forwards most        * authenticates

.-------------------.             .-------------------.
|  client    proxy  |             |  gateway  server  |
| .----.     .----. |             | .----.     .----. |
| |    |---->|    |------>   ------>|    |---->|    | |
| |    |<----|    |<------   <------|    |<----|    | |
| `----'     `----' |             | `----'     `----' |
`-------------------'             `-------------------'

web caching
  content delivery networks (CDN)
    proxy caches of large files, hold content closer to network edges
    Akamai - 13K servers in datacenters and ISPs around the world
    expiration when age > time_to_live
    DNS routes name resolution query to Akamai server, which responds
      with IP address of nearest regional server
    load balancing

    "For example, when a Web surfer connected to an ISP in Boston
    clicks on a photo of New England Patriots quarterback Tom Brady on
    CNN.com's Web site, the photo is "pulled" to the Akamai server
    closest to that user's ISP. Anyone else in the network who
    subsequently requests Brady's photo gets it from the same cache,
    rather than CNN's origin server, thus cutting down on bandwidth and
    eliminating router hops. Meanwhile, infrequently accessed content
    and content that has passed its "time-to-live" freshness date are
    regularly flushed out of the network."
    [http://www.apertura.com.au/articles.html]

-----

CNN 9/11 case study [http://www.tcsa.org/lisa2001/cnn.txt]

CNN.com: Facing A World Crisis
William LeFebvre, CNN Internet Technologies

Who are we?
  CNN.com (Turner Broadcasting)
  - 50 web sites (CNN.com, cartoon network, etc)
  - 200 servers

Network Bandwidth (on 9/11)
  - 2 OC-12    1,244 Mbps
  - 7 OC-3     1,085 Mbps
  - Total      2,329 Mbps

Hardware
  - Standard web server: Sun 420R 4x4 (4 CPU, 4GB RAM)
  - CNN.com normally used a 15 server pool
  - Load balancers front-end all web services

Typical Loads
                 Peak      Total     Total
    Date       Hits/min  Hits/min  Page Views
  - 9/11/00      220K      148M      11.8M    One year prior
  - 11/8/00    1,217K      722M     139.4M    Day after Election Day
  - 9/10/01      156K      104M      14.4M    Day before

  11/8/00 was the record page views to date

Managing Unexpected Loads
  - swing servers (move servers from one web service to another)
  - add additional servers
  - reduce page complexity (remove advertisements, pictures, text)

Reducing Page Complexity
  - Three page styles (standard, split, ultra light)
    - Standard
    - Split (half page with link to more info)
    - Ultra light (minimal information, with links for more info)

On the morning of Sept 11, there were 10 servers providing CNN.com.

Loads on 9/11-12
                 Peak      Total     Total
    Date       Hits/min  Hits/min  Page Views
  - 9/10/01      156K      104M      14.4M    Day before
  - 9/11/01    1,110K      411M     132.4M    Day of
  - 9/12/01      948K      797M     304.8M    Day after

On 9/11, the peak demand was estimated at 1.8M hits per minute, or 20
times normal.

-----

[http://www.dvwebvideo.com/2000/0500/gordon0500.html]

CNN Interactive uses a custom-designed live-capture system that encodes
five different streaming media files in a single pass. This is
accomplished by taking an analog feed out of the Media 100, then
splitting it five ways to separate PCs: two for RealNetworks streams and
two for Windows Media at 28kbps and 80kbps data rates. The fifth machine
creates an AVI file that's converted separately to QuickTime. All render
at a frame size of 176x132 pixels
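
-----

As a rough check on the arithmetic in the CNN 9/11 case study, the sketch below recomputes the bandwidth total and compares the 9/11-12 loads against the day before. The nominal line rates (155 Mbps for OC-3, 622 Mbps for OC-12) are rounded standard values assumed here, not figures stated in the talk, and the function and variable names are illustrative only.

```python
# Rounded nominal SONET line rates in Mbps (assumed; the exact rates are
# 155.52 Mbps for OC-3 and 622.08 Mbps for OC-12).
OC3_MBPS = 155
OC12_MBPS = 622


def total_bandwidth_mbps(num_oc12, num_oc3):
    """Aggregate bandwidth in Mbps for a mix of OC-12 and OC-3 links."""
    return num_oc12 * OC12_MBPS + num_oc3 * OC3_MBPS


# Loads copied from the case-study table: (peak hits/min, total page views).
LOADS = {
    "9/10/01": (156_000, 14.4e6),     # day before (used as the baseline)
    "9/11/01": (1_110_000, 132.4e6),  # day of
    "9/12/01": (948_000, 304.8e6),    # day after
}


def load_ratios(date, baseline="9/10/01"):
    """Return (peak ratio, page-view ratio) of `date` against the baseline day."""
    peak, views = LOADS[date]
    base_peak, base_views = LOADS[baseline]
    return peak / base_peak, views / base_views


if __name__ == "__main__":
    print("total bandwidth:", total_bandwidth_mbps(2, 7), "Mbps")
    for date in ("9/11/01", "9/12/01"):
        peak_ratio, view_ratio = load_ratios(date)
        print(f"{date}: peak x{peak_ratio:.1f}, page views x{view_ratio:.1f}")
```

With these inputs the bandwidth total comes to 2,329 Mbps, matching the notes. Measured peak hits per minute on 9/11 come out at roughly 7 times the previous day, and page views on 9/12 at roughly 21 times.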