Today, 25 years after the World Wide Web came into existence, its users take it for granted. For developers, however, it is still a platform that encourages the creation, learning and implementation of new technologies that would ultimately transform the web into the Matrix. Or so they would hope.
I have been developing on the web for the last three years. I started out with ASP.NET / C# and to this day I continue to use the Microsoft stack of web technologies for development at work. Very recently I started experimenting with other programming languages and frameworks and discovered the many new technologies that drive web development today. Then, the other day, a thought occurred to me. I had started looking into Node.js and wanted to build a simple web app using HTML5 with Node as the backend. Everywhere I looked I found that this approach required the use of Express.js, a web application framework based on Node. I wanted to tie the client to the server without the use of any framework, and began wondering how web pages had gone from being static files to dynamic entities since Tim Berners-Lee built the first ever website at CERN back in ’91. How did HTML pages have access to databases before web frameworks and web services came into existence?
The first step towards a dynamic web was CGI, the Common Gateway Interface, a specification introduced in 1993 that serves as the mechanism for transferring data between the web server and a CGI program. CGI programs can be written in a variety of programming languages and execute on the server to perform tasks that HTML is not capable of, for example reading information from database tables via SQL queries. Once the program has the information, it can format it and send it to the client. In this case, the CGI program serves as a gateway to the database. The one thing to keep in mind is that the web server must implement the CGI specification for the scripts to work. Apache and IIS both have CGI modules that need to be configured: Apache uses mod_cgi, and IIS requires the CGI role service to be added. We can invoke a CGI script from HTML like this:
<img src="/path/to/cgi/counter.cgi" />
CGI scripts may be invoked with the extension determined by the scripting language used or with .cgi, depending on how the server is configured. The output from a CGI program should be formatted as a type that is supported by the client browser. This is done by explicitly defining the Content-type in the program so that it is the first line returned as output, followed by a blank line. Otherwise the output is likely to be rejected, typically by the server, before the browser can render it. I won’t go into the details of writing CGI programs, because that’s not what this article is about. Here’s a simple Perl CGI script:
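#!/usr/bin/perl
use strict;
use warnings;

# The Content-type header comes first, followed by a blank line, then the body.
print "Content-type: text/html\n\n";
print "<html><body><h1>Hello World!";
print "</h1></body></html>\n";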
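To make the CGI contract a little more concrete, here is a minimal sketch of the same idea in Python. It assumes the server passes the query string in the QUERY_STRING environment variable, as the CGI specification describes; the function name run_cgi and the "name" parameter are made up for this illustration.

```python
import html
import io
import os
import sys
from urllib.parse import parse_qs


def run_cgi(environ, out):
    """Write a CGI-style response to `out` using request data from `environ`."""
    # The server hands the URL's query string to the program via QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["World"])[0]

    # Headers first, then a blank line, then the body.
    out.write("Content-Type: text/html\r\n\r\n")
    # Escape user input before placing it in HTML.
    out.write("<html><body><h1>Hello, {}!</h1></body></html>\n".format(html.escape(name)))


if __name__ == "__main__":
    # When run by a web server as a CGI script, the response goes to stdout.
    run_cgi(os.environ, sys.stdout)
```

Keeping the logic in a function that takes the environment and an output stream is only a convenience here: it lets the script be exercised without a web server, for example by passing a plain dictionary and an io.StringIO buffer.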
(All web servers need modules to extend their functionality. These are usually written in C. It is with the help of such modules that a web server is able to execute server-side code written in different programming languages.)
Then came Server Side Includes (SSI), a simple interpreted server-side scripting language used almost exclusively for the Web. SSI directives, applied to an HTML document, provide interactive real-time features such as echoing the current time, conditional execution based on logical comparisons and, depending on the server, including other files or the output of external programs, all without writing a full CGI script. An SSI directive consists of a special sequence of characters (a token) on an HTML page. As the page is sent from the HTTP server to the requesting client, the server scans the page for these special tokens. When a token is found, the server interprets the data in the token and performs an action based on it.
For example, you might place a directive into an existing HTML page, such as:
<!--#echo var="DATE_LOCAL" -->
And, when the page is served, this fragment will be evaluated and replaced with its value:
Tuesday, 15-Jan-2013 19:28:54 EST
SSI is a great way to add small pieces of information, such as the current time shown above. The output type of all SSI directives is text. Again, the web server needs to implement SSI. Apache and IIS both support SSI, but it needs to be enabled.
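The scan-and-replace step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular server implements it: it handles just the echo directive, and the function name expand_echo and the "(none)" placeholder for unknown variables are choices made for this sketch.

```python
import re

# Matches directives of the form <!--#echo var="NAME" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_echo(page, variables):
    """Replace each echo directive in `page` with the value of its variable."""
    def substitute(match):
        # Unknown variables get a placeholder instead of raising an error.
        return variables.get(match.group(1), "(none)")

    return ECHO_DIRECTIVE.sub(substitute, page)
```

Calling expand_echo on a page containing the DATE_LOCAL directive, with a dictionary that maps DATE_LOCAL to a formatted timestamp, returns the page with the directive replaced by that timestamp.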
After this era came PHP and ColdFusion in 1995, closely followed by ASP (Active Server Pages) in ’96. It is important to note that PHP is a server-side scripting language plus a scripting engine, while ASP is a framework / scripting environment. ASP uses VBScript as its default scripting language. Other scripting languages such as Perl, Python and JScript (Microsoft’s implementation of JavaScript) can also be used, provided the appropriate script engines are installed. There is one feature that makes PHP in itself comparable to a web framework: web server integration. PHP quickly took over from Perl as the preferred language for web development, as it was faster to run PHP using mod_php on Apache than to run Perl CGI scripts. PHP code can also be embedded in an HTML document within <?php ?> tags; such pages are saved as .php files. Similarly, ASP includes scripts inside <% %> tags and the pages are .asp files. ColdFusion was another framework; it used its own ColdFusion Markup Language (CFML) to include server-side code in HTML within special tags. It was developed to make it easy to connect to a database from HTML. Initially, ColdFusion was written using Microsoft Visual C++ and was largely limited to running on Windows. In its sixth iteration, ColdFusion MX 6, it was rewritten in Java and thus could run on any OS / web server that supported the JRE. ColdFusion today, with the correct configuration and use of connectors, works with IIS and Apache. ColdFusion can also be configured to use its own web server.
Around this time, Brendan Eich developed JavaScript while working at Netscape. It has, as we all know, gone on to become the de facto scripting language for the web. How it behaves depends on each browser’s implementation of the language. All modern browsers support the ECMAScript standard that forms the basis of JS.
Next in line were Java Servlets in 1997. A servlet is a Java class that extends the capabilities of a web server by responding to requests (primarily HTTP). The generated content is usually HTML. Then there was JavaServer Pages (JSP), a scripting engine very similar in functionality to PHP: in JSP, Java code is embedded in HTML. To run both Servlets and JSP we need a servlet engine / container such as Apache Tomcat. Tomcat can also serve as a simple HTTP web server without requiring Apache, so we may use Tomcat alone to host Servlet / JSP applications. However, in most production environments Apache is used as a front end to Tomcat, as the Apache HTTP Server has additional modules that provide an extensive array of functionality. Apache receives requests and serves static content, and forwards only Servlet / JSP requests to Tomcat for processing. The two are connected using mod_jk.
1998 saw XML officially become a standard. This paved the way for web services. SOAP, which originated at Microsoft, followed in ’99. The advent of web services and their evolution is a story for a later article.
2002 saw Microsoft release ASP.NET as part of the first iteration of its .NET Framework. It was the successor to ASP. ASP.NET web pages, known officially as Web Forms, are the main building blocks for application development. Web Forms are contained in files with an ".aspx" extension; these files typically contain static (X)HTML markup, as well as markup defining server-side Web Controls and User Controls. Additionally, dynamic code which runs on the server can be placed in a page within a block <% -- dynamic code -- %>, which is similar to other web development technologies such as PHP, JSP and ASP. With ASP.NET 2.0, Microsoft introduced a new code-behind model which allows static text to remain on the .aspx page, while dynamic code lives in an .aspx.vb, .aspx.cs or .aspx.fs file (depending on the programming language used: Visual Basic, C# or F#).
I’ll take a moment here to talk about the web server. We have seen that for the server to have access to any non-HTML data source, it needs to implement various specifications. CGI and SSI are examples. IIS was the first to implement ISAPI (Internet Server Application Programming Interface) as an alternative to CGI. Each time a client requests a CGI application, a new instance of the executable is created. This becomes very expensive for the web server in times of high traffic. ISAPI instead relies on Dynamic Link Libraries (DLLs). Each ISAPI application (ISAPI extension) takes the form of a single DLL that is loaded into the same address space as the web server upon the first request and remains in memory to answer all subsequent requests for the application until explicitly released. However, this means that ISAPI extensions need to be thread-safe, so that multiple threads can execute inside the DLL for multiple simultaneous requests without causing problems.
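The thread-safety requirement is easier to see with a small analogy. The Python sketch below is not ISAPI code; the class HitCounterExtension and the function simulate are hypothetical stand-ins for a module that is loaded once and then called from many request threads, so any state it shares between requests has to be guarded.

```python
import threading


class HitCounterExtension:
    """Stand-in for an extension loaded once and shared by all request threads."""

    def __init__(self):
        self._hits = 0
        self._lock = threading.Lock()

    def handle_request(self, path):
        # Many threads call this concurrently, so shared state is guarded by a lock.
        with self._lock:
            self._hits += 1
            count = self._hits
        return "{} served, hit #{}".format(path, count)

    @property
    def hits(self):
        with self._lock:
            return self._hits


def simulate(extension, threads=8, requests_per_thread=100):
    """Send requests to one shared extension instance from several threads."""
    def worker():
        for _ in range(requests_per_thread):
            extension.handle_request("/counter")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return extension.hits
```

With CGI, each request would get a fresh process and a counter like this would have to live in a file or database; with an in-process model the counter can stay in memory, at the cost of needing the lock.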
ISAPI has another component, filters. ISAPI filters are also DLLs. Filters always run on the IIS server filtering every request until they find one they need to process, contrary to extensions, which are loaded only for a specific request. The ability to examine and modify both incoming and outgoing streams of data makes ISAPI filters powerful and flexible. Filters are registered at either the site level or the global level (that is, global filters apply to all sites on the IIS server), and are initialized when the worker process is started. A filter listens to all requests to the site on which it is installed.
IIS 6.0 and previous versions allowed the development of .NET application components via the ASP.NET platform. ASP.NET integrated with IIS via an ISAPI extension, and exposed its own application and request processing model. This effectively exposed two separate server pipelines, one for native ISAPI filters and extension components (no CLR / garbage collection), and another for managed application components (ASP.Net). ASP.NET components would execute entirely inside the ASP.NET ISAPI extension bubble and only for requests mapped to ASP.NET in the IIS script map configuration.
IIS 7.0 integrates the ASP.NET runtime with the core web server, providing a unified request processing pipeline that is exposed to both native and managed components known as modules.
ISAPI extensions are also supported in Apache (on Windows) via the mod_isapi module. Most Apache modules, ISAPI extensions, ISAPI filters, etc. are written in C. Web servers, by themselves, do little more than serve static HTML. (Maybe not entirely true.) For most other processing, they require modules to extend their functionality.
In 2004 came the most successful full-stack MVC (Model View Controller) framework, Ruby on Rails. Again, RoR needs an application server such as Thin, Unicorn or Phusion Passenger; Apache may be used in front of them in a production scenario. This was followed by Django, a Python MVC framework, in 2005. Django can run on Apache with mod_wsgi, an Apache module that implements the Web Server Gateway Interface (WSGI). WSGI is a specification (a Python standard) that describes how a web server communicates with web applications, and how web applications can be chained together to process one request. Microsoft released ASP.NET MVC, its own MVC framework, in 2009.
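Since WSGI is the one interface here that is small enough to show in full, a minimal sketch follows. It uses only the Python standard library; the names hello_app, uppercase_middleware and call_app are invented for this example. An application is a callable that takes the request environment and a start_response function, and chaining is done by wrapping one application in another.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A WSGI application: receives the request environment, returns body chunks."""
    body = b"Hello from " + environ["PATH_INFO"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def uppercase_middleware(app):
    """Wrap another WSGI app and upper-case its response body."""
    def wrapped(environ, start_response):
        return [chunk.upper() for chunk in app(environ, start_response)]
    return wrapped


def call_app(app, path):
    """Invoke a WSGI app directly, the way a server would, and collect the result."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

A server such as Apache with mod_wsgi plays the role of call_app here: it builds the environment from the HTTP request, calls the application and sends the returned bytes back to the client.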
Since then there has been a surge in new technologies being introduced for the web. A large number of JavaScript libraries (jQuery, Knockout, Backbone, Spine, CanJS) and frameworks (AngularJS, Ember, Meteor, Batman) have come up. (Libraries can be used without imposing a structure on your code, unlike frameworks, which define how you write an application.) Client-side MVC frameworks, some mentioned above, have made Single Page Applications (SPAs) very popular in recent times. By nature and design they are intended to be stateful, unlike earlier web applications, as most of the code resides on the client (browser) side. The full page is fetched from the server only once, on the first request, when the server sends back the basic HTML structure (shell), JavaScript and CSS. All further HTML creation / DOM manipulation is taken care of within the browser depending on user interaction. In traditional web applications, the server is responsible for creating the entire HTML for each request and sending it to the client. Usually each separate request maps to a separate URL and there is no state maintenance between these requests. In SPAs, there is actually only a single page that uses multiple views to extend functionality. Using the routers provided by these client-side MVC frameworks, we are able to maintain state by assigning separate URLs to separate views within the same page.
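The routing idea can be sketched independently of any particular framework. Real client-side routers are written in JavaScript and hook into the browser's URL; the Python below only illustrates the mapping from URL fragments to views, and the Router class and the ":id" pattern syntax are assumptions made for this sketch.

```python
import re


class Router:
    """Maps URL fragments such as '#/users/42' to view functions."""

    def __init__(self):
        self._routes = []

    def add(self, pattern, view):
        # Turn '#/users/:id' into a regex with a named group for each ':param'.
        # Patterns are assumed to contain no other regex metacharacters.
        regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
        self._routes.append((re.compile("^" + regex + "$"), view))

    def resolve(self, fragment):
        # The first matching route wins; its parameters are passed to the view.
        for regex, view in self._routes:
            match = regex.match(fragment)
            if match:
                return view(**match.groupdict())
        return "not found"
```

Registering a view for "#/users/:id" and resolving "#/users/42" calls that view with id set to "42", which is roughly what happens in the browser when the fragment changes and the framework swaps the visible view without reloading the page.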
The number of technologies that have evolved is so vast that, as a developer, it is almost impossible to learn them all. However, awareness is something we must not ignore. Knowing what is available may on any given day help us decide what we should use to develop an application based on its functional requirements. Anyhow, I believe I must now stop. I hope this write-up helps all new web developers (in the making) understand where and how it all started.