Sieve: A proxy server filter for the Scoop web log system

Sieve is a program that acts as an intermediary between a web browser and a Scoop-based web log system. It relays pages between the two and filters out low-scored comments along the way, which can enhance your Scoop experience.

Documentation updates:

  2005-02-09: First release
  2005-02-10: Document configuration file
  2005-02-15: Document (sieve) link
  2005-03-23: Doubled main buffer size

Disclaimer:

The source code to Sieve could be used in textbooks as an example of insecure programming. Potential buffer overflows abound. Sieve is intended to be run in conjunction with a web browser on a single computer with a single user. Any other configuration is fraught with peril.

Availability:

Sieve is licensed under the terms of the GNU General Public License (GPL) and is available from the author's web site at:

  http://www.heurtley.com/richard/sieve

Function:

Sieve opens a TCP/IP port (4004 by default) on the computer on which Sieve is running. HTTP requests made to this port are relayed to a web server (http://www.ip-wars.net by default). Replies are relayed back to the requesting agent. By default anonymous comments are deleted if they have no score or if the score is 3.0 or less.

Sieve is a command line program. It is invoked by entering the following into a command prompt window:

  Linux:   [user@host user]# sieve
  Windows: C:\>sieve

Sieve first displays its configuration, followed by the names and IP addresses of the web server and of the computer on which Sieve is running. Then Sieve displays information on incoming and outgoing data as it is relayed.

Sieve is accessed by entering into a web browser's address bar the name of the computer on which Sieve is running, a colon, and the port number. For example:

  http://linuxbox.mynet:4004

While Sieve is running, the URL http://linuxbox.mynet:4004 should behave exactly the same as the server's URL except for the comment filtering function. Any other deviation is a bug and should be fixed.

Sieve is stopped by pressing Control-C.

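The default rule for anonymous comments described above can be sketched in C as follows. This is an illustration only and is not taken from sieve.c. The function name drop_anonymous_comment and its parameters are hypothetical, and the 3.0 threshold is simply the default stated in this document.

```c
/* Hypothetical helper, NOT code from sieve.c.
 *
 * has_score: nonzero if the comment has been rated, zero if it has no score.
 * score:     the comment's rating; ignored when has_score is zero.
 *
 * Returns 1 if the comment would be deleted under the default rule,
 * 0 if it would be relayed to the browser unchanged.
 */
int drop_anonymous_comment(int has_score, double score)
{
    /* Default threshold for anonymous comments, as stated above. */
    const double default_threshold = 3.0;

    /* A comment with no score is deleted. */
    if (!has_score)
        return 1;

    /* A rated comment is deleted if its score is 3.0 or less. */
    return score <= default_threshold;
}
```

Under these assumptions an unrated anonymous comment and one rated exactly 3.0 would both be deleted, while one rated 3.5 would be passed through.
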
Configuration:

Sieve configures itself from the file sieve.conf, which must be in the local directory. A copy of the default sieve.conf is reproduced here:

------------------------------------------------------------
# sieve.conf: The Sieve configuration file

# The '#' character at the beginning of a line indicates a comment.
# All configuration lines in this file are commented out.
# Remove the '#' character in front of a configuration line
# to configure Sieve.

# If you want to see all unrated posts then set "nonescore" to 5.0.
# A post is filtered if its score is
# less than or equal to the poster's threshold.
# To plonk someone set his threshold to 5.0.
# If defined, the threshold for member "*" applies to all members
# for whom a threshold not explicitly defined.

# The name of the web server with which Sieve corresponds (default)
#server www.ip-wars.net

# The name of computer on which Sieve is running (optional)
#host mycomputer.mydomain

# The TCP/IP port to which Sieve listens (default)
#port 4004

# Assigned score for unrated posts (default)
#nonescore 0.0

# Hold Potential Recruits to a higher standard (default)
#3.0 Potential Recruit

# Plonk this dweeb (optional)
#5.0 heurtley

# Don't view off topic/irrelevant posts (optional)
#2.0 *

# Threshold to assign (sieve) link additions.
# If sievescore is set to 0.0 then the (sieve) link is
# disabled. (default)
#sievescore 0.0
------------------------------------------------------------

If a post by a member who has no threshold defined in sieve.conf is rated at or below the "sievescore" threshold, then Sieve inserts a "(sieve)" link into the post after the member's name. Clicking on this link adds the member to sieve.conf with the threshold specified by "sievescore".

Construction:

Sieve is contained in a single C language source code file, sieve.c, that compiles cleanly with high warning settings under Watcom 11.0c (Windows), Microsoft 11.0 (Windows) and gcc 3.3.3 (Linux).

Sieve is built in Linux with the following command:

  [user@host user]# gcc -o sieve sieve.c

Issues:

(1) The name of the computer on which Sieve runs must contain a dot for cookies to work. If the name that Sieve finds for the computer does not contain a dot then Sieve appends ".localdomain" to the computer's name and tries to resolve that combined name. This is a horrible kludge that should be replaced with something else as it is not expected to work on most computers. In the meantime add a line to the computer's hosts file:

      Linux:   /etc/hosts
      Windows: c:\winnt\system32\drivers\etc\hosts (approximately)

(2) Sieve uses large static buffers and is vulnerable to buffer overflow attacks. If this is a concern then it would be a good idea to use a firewall to block outside access to Sieve's TCP/IP port.

(3) The main buffer is 2MB. If a Scoop reply is larger than 2MB then the reply is ignored and not relayed to the web browser. 2MB is about 1200 comments.

(4) The HTTP 1.1 keep-alive headers are removed in the interest of simplifying the relaying logic.

Roadmap:

Sieve has quite a lot of potential. The program could easily be adapted for use with non-Scoop based web sites such as Slashdot. Sieve could also be made considerably more intelligent. One possibility is to use some kind of heuristic to score posts automatically based on the author and the post content.

Copyright (c) 2005 Richard Heurtley. Verbatim copying and distribution of this entire document is permitted in any medium, provided this notice is preserved.