Article by Kristian Köhntopp
To our knowledge, most of the problems and objections here have not been addressed by PICS or any other ICR&S scheme.
The presence provider or web hoster provides the means to serve this content to the Internet by running the machines, the server software, and maintaining a network connection.
The recipient is either a person to be protected against harmful content, or an adult, who should still be able to access harmful, but not prohibited, content. The recipient's hardware and software is maintained by system services, which is a separate department in a school or library situation, in an Internet cafe, or within a company. The recipient's system may be a single system or a network of systems with proxy servers and intranet servers.
A rating service will apply the rating system and the rules that come with it to create ratings. These ratings, an identifier for the rated content, a date, an identifier for the rating source, and additional information (e.g. a checksum against the rated content and a digital signature) are collected to form a content label.
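As a rough illustration of the fields listed above, a content label could be modelled as follows. This is a minimal sketch and not the PICS label syntax: the class name `ContentLabel`, the helper `make_label`, the field names, and the choice of SHA-256 as the checksum are all assumptions made for this example, and the digital signature is left as an empty placeholder.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class ContentLabel:
    """Hypothetical container for the fields a content label carries."""
    content_id: str         # identifier for the rated content, e.g. a URL
    ratings: dict           # rating dimension -> value under one rating system
    rated_on: date          # date the rating was made
    rating_source: str      # identifier for the rating service
    checksum: str           # checksum of the rated content (SHA-256 assumed)
    signature: bytes = b""  # digital signature; only a placeholder here

    def matches(self, content: bytes) -> bool:
        """True if the content is still byte-identical to what was rated."""
        return hashlib.sha256(content).hexdigest() == self.checksum


def make_label(content_id, content, ratings, source, rated_on):
    """Build a label for the given content bytes."""
    checksum = hashlib.sha256(content).hexdigest()
    return ContentLabel(content_id, dict(ratings), rated_on, source, checksum)
```

With a checksum in the label, any later change to the content makes `matches` return False, which is what allows a filter to notice that a label no longer describes what is being served.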
Content filter vendors create software which can regulate access to content, depending on local settings (filtering rules) and content labels.
Content filter control is often exercised by the party who controls a machine, that is, the adult party in a household, the dean of a school, the directorate of a library, and so on. These filter settings are then deployed, often by system services mentioned above, sometimes by an access provider located upstream.
Attackers may be content providers, recipients, or other parties who want to communicate outside of the control of a filtering system.
In Second Party Rating, the recipient provides ratings and shares them with other recipients. This is sometimes referred to as a Community Rating process. Since the sharing of ratings again involves a Label Bureau, for the purpose of this discussion Third Party Rating and Second Party Rating can be treated alike.
First Party Rating is different because here the sender provides a Content Label with the content itself. Usually, this label is embedded into the content or sent with the content. A Label Bureau is not needed in this context.
USENET articles are authored on the fly. Often they contain no keywords, subject lines of little use, or are inappropriately addressed. It is very unlikely that authors of USENET articles will take the time required to do proper First Party Rating for their articles, even if current USENET access programs offered such a feature.
Chatrooms and the Internet Relay Chat consist of even faster communication and no structure at all. IRC channels are often created on the fly and, unlike USENET, communication on IRC is not stored but only passed through by the servers. Also, the basic unit of communication in IRC is a single line of text which is much too small to rate properly. An IRC channel at any point in time is a very complex web of rapidly changing and overlapping individual and nonindividual communications that make up the overall tone and content of the channel.
Also, the public discussions in USENET newsgroups or IRC channels are only the publicly visible part of a much larger communication process which involves private communication by mailing lists, private e-mail, server based message communication, and direct client-client communication. Often the public communication channel is only used to meet other persons with similar interests, and then a more secluded communication channel is established.
Because of these properties of the more interactive communication mechanisms, Internet Content Rating and Selection will be ineffective in these types of media.
It can be safely assumed that this kind of cooperation will not be available in many cases and that it specifically will not be available in the most severe cases where the content provider places content on the web with a malign purpose.
Thus, it is insufficient to block pages that are rated as containing harmful or prohibited content. Instead, all unrated pages must be blocked as well, so that the pages of non-cooperating content providers will be made inaccessible. In this case, the rating of harmful and prohibited content becomes a moot point because such content will be blocked automatically. The Internet as viewed from behind a filter will be immediately clean in such a scenario because it will initially contain no content at all. Clearly, this is not very attractive and most of the targeted audience for content filtering services will not tolerate such filters if large portions of harmless or valuable content will be made unavailable to them by the content filter.
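The consequence of blocking unrated pages can be sketched as a filter decision function. This is only an illustration under assumptions: the function name `is_accessible`, the representation of a label as a dictionary, and the rule that a dimension without a configured limit allows only the value 0 are invented for this example and are not taken from any particular filter product.

```python
def is_accessible(label, limits, block_unrated=True):
    """Decide whether a filter lets a piece of content through.

    label:  dict of rating dimension -> value, or None if content is unrated
    limits: dict of rating dimension -> highest value the recipient may see
    """
    if label is None:
        # Unrated content: to catch non-cooperating providers it must be
        # blocked, which also blocks every harmless unrated page.
        return not block_unrated
    # Rated content passes only if no dimension exceeds its limit.
    # Assumption: a dimension with no configured limit allows only value 0.
    return all(value <= limits.get(dim, 0) for dim, value in label.items())
```

Note that with `block_unrated=True` the ratings of harmful pages never enter the decision for non-cooperating providers; the only labels that change the outcome are those attached to acceptable content.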
For a Content Rating Solution, it becomes a requirement to provide Content Labels for non-objectionable content, so that at least some pages will be available to those persons behind the selection filter. Thus, all burden and cost of content rating, labelling, and label distribution is placed on the providers of perfectly legal, and in most cases, valuable content.
This situation is not different from, for example, the movie and video industry where a film without a rating is automatically rated as not suitable for minors (at least this is the case e.g. in Germany and the United States). Unlike the movie industry, many content providers have little or no incentive to have their content rated because in many cases they do not sell content or do not cater specifically to a younger audience. Specifically, we can expect the large commercial websites positioned towards a younger audience to provide content labels on a voluntary basis, while most private content and content not specifically geared towards a young audience will not have labels at all.
Consequently, it may be necessary to enforce the provision of labels with content to make a sufficiently large portion of the Internet available to minors. Some content providers will probably try to evade such a requirement by providing a "safe" default rating, such as "harmful content, not suitable for minors", because they do not want to do the work to properly evaluate and individually label their content. Others will do this because they are unsure about the correct rating for their content, and will rate their content as more harmful than it actually is, just to be legally safe. For a requirement to provide labels to be effective, it may be necessary to treat the provision of a label that is too high just like the provision of a label that is too low.
This turns the situation perfectly upside down: To protect against harmful and prohibited content, all work and all legal liability is at the side of the well-behaved members of the community, while it is technically completely unnecessary for the providers of harmful content to do anything.
In some countries, rating your own content cannot be legally enforced (see below).
A rating system is a cultural code, too, because with the dimensions and values provided with any such metric comes a description of how to interpret and apply them. As a cultural code, a rating system cannot be objective or valid outside a specific culture. Specifying the dimensions of the rating system, for example, defines which issues can be addressed by a particular rating system. Issues and values outside of the dimensions of the rating system cannot be addressed, and are therefore inaccessible to reflective discussion within the space of the rating system. For the purpose of the rating system, such issues and values essentially do not exist.
Values within the dimensions of a rating system can be used to define proximity relationships if these values are ordered. Many people tend to think of content that is in close proximity as being similar, being of similar value, or promoting similar values. Some content providers insist that their content is not to be rated with some content rating systems because rating them with that particular system would place them into the proximity of other content they do not like, and with which they do not want to be lumped together [1][2]. For example, both hardcore pornographic bondage content and a documentation of torture methods in the Holocaust would receive very similar content ratings within the RSACi rating system. The RSACi system is too coarse and has no appropriate dimensions to differentiate between these two types of content. More complex rating systems raise ease-of-use issues in First Party Rating situations, though. More context sensitive rating systems raise other issues, too (see below).
Establishing a metric in a social context and granting rewards under this metric tends to create dysfunctional social systems. For example, measuring the efficiency of programmers by measuring the lines of code produced by them tends to work in favour of code that has been produced by cutting and pasting large segments of code. Such code is generally large, ineffective, and expensive to maintain. Thus, while the metric provides useful information when used short-term, it tends to work destructively when used for longer periods of time (Tom DeMarco, "Why does software cost so much?", Essay 2). Essentially, degradation begins when the metric is used for a period long enough to allow for feedback loops.
A classic dysfunctional metric that can be observed on the web is rating the efficiency of a banner ad on a web page by counting hits to that banner ad. Using this metric for a longer time tends to encourage people to put content on their pages that creates page impressions and to load many banner ads on such pages. The result is your average free porn page. It also encourages spamming search engines and the network with ads for a particular page to increase page impressions. The ultimate dysfunctional perversion is to load the page with JavaScript code that opens further pages with banner ads when the user tries to leave or close the page. While the banner ads are not recognized by the user, at best, and create a negative image for the product advertised, at worst, such pages do generate lots of hits on a counter and are therefore favoured by the metric.
Because cultural and moral values cannot be expressed and measured directly like volts and ohms, any such metric will show dysfunctional behaviour and finally invalidate itself when used for a longer time. Obvious examples are things like exempting news sites from the requirement to rate content to relieve them of the burden of providing ratings for rapidly changing pages - this will provoke providers of content that rates bad under any given rating system to present their content in the format of a news site to get the bonus of exemption. Similarly, exempting web discussion forums from a requirement to provide ratings will cause badly rated sites to adapt and choose a forum-like format.
Because of the inability to address cultural or moral values objectively and outside of the context of a particular culture or moral system, it would be only logical and very tempting to create context sensitive rating systems. Context sensitivity here goes two ways: it may rate a specific content component within the presentation context, and it may rate specific content within a specific cultural context.
For example, singular pornographic images may rate bad in a certain system, but a photomosaic(tm) of girlie-power icon Lara Croft consisting of lots of such images may rate good within the same system because of artistic value and the message promoted by this artwork. So while the individual photo may still rate bad, in this particular context of presentation it will rate good. Also, a photo of James Dean leaning against his car and smoking a cigarette may rate good in some cultural contexts by showing a celebrity in a pop-art context, and bad in others as promoting environmentally bad modes of transport and nicotine drug abuse, depending on the priority of values of the culture perceiving the image. These priorities may change within the same culture or even the same recipient, depending on fashion or even personal mood.
What is more, the very process of filtering content that rates bad in a given context may itself create objectionable content in another context. For example, suppressing all images which rate bad in one context may create a pattern that conveys an objectionable message [3].
So while context specific labelling is seductive as a way out of the dysfunctionality problem, it is also error prone and unenforceable (Would you please try to rate your web pages within the cultural context of the Amish people and the values of the current Japanese, Spanish and Scandinavian societies?), and favouring one specific cultural context is of course an act of cultural imperialism. It is also a value rating ("The context rating on your web page/for our web page in your rating service implies that you have authority to decide what Christian/Scientologist/Buddhist/... values are and that our page is bad according to what you assert these values are.") with all the problems that come with such ratings, including a ton of liability issues.
Some proposals offer functionality that can be used to bundle presets in different rating systems into a single profile. For example, the PICSRules extension to the PICS system allows a user to define a single profile that contains an arbitrary number of selective criteria applied to different PICS compliant content rating systems.
A user of such a PICSRule may be able to view content that rates this-or-that under SafeSurf ratings or such-and-such under RSACi ratings. Using such a system may create the impression that the this-or-that SafeSurf rating and the such-and-such RSACi rating are equivalent and that it is therefore possible to map one system onto the other. This is a thoroughly false impression.
What does work (in PICSRules) is to bundle arbitrary selective preferences into a profile, but this profile is in general (and probably in practice in most cases) not consistent or well-defined. Consider content that has been rated under two different rating systems and a majority of reviewers agreed in both cases that these ratings are correctly applied to this particular content. Assume further a recipient that is asked to define a filtering profile for his or her child in both of the rating systems reflecting the personal cultural values that this child should be able to expose itself to. Unless the page has a marginal rating (i.e. totally harmless or totally unacceptable), it will be common that the page will be rated accessible under one rating system, but inaccessible under the other, i.e. the ratings do not functionally map onto each other because of the different implied cultural values within the metric itself.
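The inconsistency can be made concrete with a toy example. Everything in this sketch is invented for illustration: the two rating systems `system_a` and `system_b`, their dimensions and scales, the page ratings, and the limits. It does not use PICSRules syntax and does not reproduce SafeSurf or RSACi values.

```python
# Ratings that reviewers agreed on for one page, under two unrelated systems.
page_ratings = {
    "system_a": {"language": 2},   # scale 0..4, invented for this example
    "system_b": {"profanity": 5},  # scale 0..9, invented for this example
}

# Limits a parent chose separately within each system for the same child.
profile = {
    "system_a": {"language": 2},
    "system_b": {"profanity": 4},
}


def verdicts(ratings, profile):
    """Return one allow/deny verdict per rating system in the profile."""
    result = {}
    for system, limits in profile.items():
        rated = ratings.get(system, {})
        result[system] = all(rated.get(dim, 0) <= limit
                             for dim, limit in limits.items())
    return result
```

Here `verdicts(page_ratings, profile)` allows the page under `system_a` and denies it under `system_b`, although both ratings and both limits were set in good faith. The profile bundles the criteria but gives no consistent answer.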
On the Internet, all persons have been cut out except for the sender and the recipient. Production, reproduction and distribution of content are automated or controlled by either party. In many cases, content location has been automated, too, by using search engines, portal sites, web rings, or bookmark lists. In some cases, even content processing has been automated where automatic translation services, summary generator services, or XML postprocessing are already deployed, thus limiting the personal involvement of the receiver as well. Also, in cases where content is being produced automatically, such as in catalog systems, database driven web shops, or in community content systems like slashdot.org, the involvement of the sender in content creation is limited, too.
This does not mean that the absolute number of people involved in the creating or processing of content has decreased. In fact, there are many people needed to keep the Internet running and to update web content. But these people are no longer involved in a single specific project (working as craftsmen), instead they keep running a generalized infrastructure which is used to transport and produce many, many different works (they are working as industrial workers).
E-commerce directly benefits from this, shortening supply and retail chains to a length of 1, or where processing is completely automated, even to a length of 0. E-commerce requires that concealed, tamperproof 1:1 communication is possible, i.e. that the infrastructure can be used to transport certain transactional data, but without the ability to gain knowledge of the actual transactional content, and without the ability to change that content.
This is essentially the opposite of the requirements for ICR&S, because ICR&S is all about gaining knowledge about the transported content and, in some cases, tampering with it (e.g. replacing the original content with a "this content is inaccessible" message). You cannot build secure e-commerce on any network that is capable of ICR&S and you cannot build functioning ICR&S on any network that provides secure e-commerce services - see below.
Most local content filters can be easily deactivated by setting a few registry keys, rewriting a line in an *.ini file, or by copying some original operating system files that have been replaced by the filter back into the system. The process only has to be reverse-engineered once and can then be automated, requiring no expertise at all from the user of such an automated attack tool.
For static pages, current webservers offer a lot of facilities to negotiate the actual content which is being served in response to a specific URL. In particular, serving language specific content is very popular. For example, the homepages of multinational projects (like the Debian Linux project, http://www.debian.org) are negotiated based on the language preferences that were expressed with an HTTP request. Also, many pages serve different content, depending on the IP address or domain the request originated from, or customize their content depending on the current time [4].
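A minimal sketch of why negotiated content defeats URL-keyed ratings follows. The URL, the two page variants, and the function `serve` are hypothetical, and the language matching is deliberately naive (it looks only at the first entry of the Accept-Language value and ignores quality values).

```python
# Variants of one resource, keyed by language; the bodies are placeholders.
VARIANTS = {
    "en": "Welcome",
    "de": "Willkommen",
}


def serve(url, accept_language="en"):
    """Pick the response body from a request header, not from the URL."""
    first = accept_language.split(",")[0].strip()
    lang = first[:2].lower()
    return VARIANTS.get(lang, VARIANTS["en"])


URL = "http://www.example.org/"

# A third-party rater that requested the page with English preferences
# has rated this body, and can file the rating only under the URL.
rated_body_by_url = {URL: serve(URL, "en")}
```

A recipient sending `de-DE,de;q=0.9` receives a body the rater never saw, yet a lookup by URL returns the rating made for the English variant.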
The problem worsens when it comes to dynamically generated pages, where the actual webpage which the client receives never exists on the server, but is created just as it is requested. The contents of such a page may change from one request to the next, even if all other parameters are identical. The server may take any parameters into account when creating the page, including internal state (memory of previous requests). Because internal state is kept on the server only, it is impossible to fully describe all request parameters needed to recreate a specific content without thorough knowledge of the application running on that particular server.
But even if we have perfectly static, non-negotiated content, URLs are inadequate to address content. URLs describe storage and display objects such as images and HTML pages, but content can be smaller or larger than such pages.
For example, the start page of any portal, news site or discussion forum contains many small, unrelated articles which are presented in overview form. Each individual article is a semantic unit with different properties regarding a rating system, but with URLs we can only address the whole page and assign a rating to the entire page.
Conversely, a page or a site may be pieced together from individual documents which form a greater semantic unit that deserves a certain rating only when viewed as a whole. This may be true, for example, for an AIDS information site or a holocaust memorial site which may contain texts and images that deserve a very different rating when taken out of their original context.
HTML provides no mechanism to address subentities of a page or to address pages collectively. Some extensions to the XML specification will slightly improve this situation.
The problem is worsened for both ICR&S systems and search engines because there is no mechanism for web pages to notify indexing and rating services of a change. Web sites in general do not know which other services keep pointers to them and need to be notified of updates. Also, there is no mechanism to bind content identities to certain versions of that content, i.e. there is no way to work a modification date and/or a content checksum into a URL which designates certain content.
Third Party Rating is also impossible for content that is available only after passing a password protection (closed user groups; [5] talks about a "publicly indexable Web"). Some closed user groups are not so closed after all, though: with a simple registration, the protected area opens up. In such cases, the password protection essentially keeps all automated content processors and some casual browsers outside.
To get labels from a Label Bureau, the client system has to ask the Label Bureau each time a piece of content is being requested. Because it can be expected that Label Bureaus provide their services for money, we can assume that each of these requests is identified and authenticated, so that the Label Bureau can charge for its services. Label Bureaus will therefore automatically get large and detailed trails of all requests a certain user creates while surfing the web, creating a large user profile of (identity, time, URL) triples.
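How such a trail accumulates as a side effect of ordinary lookups can be sketched in a few lines. The class `LabelBureau` and its method names are hypothetical; the sketch models no real Label Bureau protocol and only assumes that every lookup carries an authenticated user identity.

```python
import time


class LabelBureau:
    """Hypothetical label bureau that answers authenticated lookups."""

    def __init__(self, labels):
        self.labels = labels  # URL -> label
        self.trail = []       # list of (identity, time, URL) triples

    def lookup(self, user_id, url, now=None):
        """Return the label for a URL and record who asked, and when."""
        timestamp = time.time() if now is None else now
        # Billing requires knowing who asked; the same record is a
        # complete browsing history of that user.
        self.trail.append((user_id, timestamp, url))
        return self.labels.get(url)

    def history(self, user_id):
        """All URLs a given user has requested, in order."""
        return [url for uid, _, url in self.trail if uid == user_id]
```

Note that the trail also records requests for unrated URLs, since the client has to ask before it can learn that no label exists.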
In current implementations, there is no standard procedure by which a content provider is notified that content has been rated or on which grounds these ratings have been established. In fact, some (usually proprietary, non-PICS) rating services do not expose their rating criteria for public inspection or even encrypt their ratings in trying to hide which services they disapprove of [6]. These services often argue that this procedure is necessary so that their rating service cannot be inverted, i.e. turned into a search engine for objectionable content. Decryption of such rating files invariably showed questionable or overly broad ratings, though. For example, in such rating files it is common to find blocking entries for websites that are critical of ICR&S or of this particular rating service. Also, there are often very vague and general blocking entries to be found, for example wildcard blocks for anything that has the letters "xxx" or "sex" in a URL [7].
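The effect of such wildcard entries is easy to reproduce. The following sketch assumes a plain case-insensitive substring match, which is the simplest reading of a wildcard block; the function name and the example URLs are invented for illustration.

```python
def blocked_by_substring(url, patterns=("xxx", "sex")):
    """Return True if any blocked letter sequence occurs anywhere in the URL."""
    lowered = url.lower()
    return any(pattern in lowered for pattern in patterns)


examples = [
    "http://www.example.org/poets/anne-sexton.html",  # a writer's name
    "http://www.example.org/essex-county/library/",   # a place name
    "http://www.example.org/news/",                   # unaffected
]
blocked = [url for url in examples if blocked_by_substring(url)]
```

Both the page about the poet and the page about the county end up in `blocked`, although neither has anything to do with the content the entry was meant to catch.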
Even if a content provider is notified that some content has been rated, there is no standard complaints procedure with which a content provider can contest a rating that he feels to be inadequate. Also, the whole rating process is opaque and inaccessible to revision: there are no standard provisions that enable any involved party to detect why false ratings have been created. Was it a genuinely false rating, a storage problem, a transmission problem, or a misconfiguration? On the other hand, false filtering can have substantial financial impact or can be equivalent to libel.
But blocking unrated content unconditionally is open to unexplored civil damages issues.
Third Party Rating cannot keep up with the change of the web because of the sheer number of pages and because there is no way for content providers to communicate changes to Third Parties (see above).
First Party Rating puts the burden of rating content onto the content providers themselves, which is good because it will automatically solve the problem of communicating changes. First Party Ratings must be controlled and validated, though, because a First Party will rate with a bias to promote its own interests.
Thus, First Party Rating can be viewed as a mechanism to scale down the rating problem. Instead of n pages of content that have to be rated, a supervising authority now has the problem of controlling the correctness of ratings provided by m rating providers. We can only guess the relation of n:m, but we assume it to be in the range of approximately 100. m would still be a multi-million number.
The second reason for tamperproof, digitally signed labels is that the central authority controlling all First Party Raters has to have an instrument to put pressure on First Parties that do not rate according to the rating guidelines. The central authority would revoke the certificate needed to generate digitally signed labels from such parties, rendering all labels generated by them invalid.
Generating and administering a multi-million number of digital signatures is expensive, even if such signatures are low security (unfit to sign monetary transactions). Someone has to provide a database of all certificates issued, to verify the identities of applicants so that banned First Parties cannot apply again for a new certificate, and to perform all other administrative duties that come with running a certificate authority. Even if the cost for an individual certificate can be kept low, the other factor in the equation is still a multi-million number.
It is impossible for these sites to rate their content, even with a general rating that applies to the whole site, because the content on the sites changes much faster than even First Party Raters can react and because in this case the First Party sometimes does not even have full control over the content on their own site.
On the other hand, exemption from rating for such sites will quickly generate dysfunctional behaviour and it will also violate the principle of equal treatment.
Current implementations of ICR&S systems ignore this requirement. They require content labels to be sent /before/ the actual content within the HTTP header, or to be part of the HTML document header. If the content label has to include a content checksum and a digital signature or even if it is just being generated dynamically, the label can only be generated after the actual content has been written. Content generation now becomes a two pass process, in which the content generator first writes out the actual content, creating the content checksum and the digital signature, and then begins sending content label and checksum first, then the actual content.
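The two pass process can be sketched as follows. This is an illustration under assumptions, not a description of any existing server: the function name is invented, SHA-256 is assumed as the checksum, and an HMAC stands in for the digital signature because the Python standard library has no public key signatures.

```python
import hashlib
import hmac


def respond_with_label_first(chunks, key):
    """Two pass delivery: buffer everything, sign, then send label and body.

    chunks: iterable of bytes produced by a dynamic page generator
    key:    secret for the stand-in signature (HMAC, not a real signature)
    """
    # Pass 1: the whole page must be generated and held in a buffer,
    # because checksum and signature depend on every byte of it.
    body = b"".join(chunks)
    label = {
        "checksum": hashlib.sha256(body).hexdigest(),
        "signature": hmac.new(key, body, hashlib.sha256).hexdigest(),
    }
    # Pass 2: only now can the label be sent, followed by the body.
    return label, body
```

The generator cannot stream its output to the client as it is produced; nothing leaves the server until the last chunk has been written into the buffer.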
Buffer sizes for a busy site can easily become gigantic. For example, the Slashdot site generates much content dynamically at request time. A Slashdot page can easily be as large as 200 KB, depending on the number of discussion entries on such a page. Assuming ISDN transfer speeds, sending such a page consumes approximately 25 seconds. Slashdot can easily have 100 to 200 requests per second, or 2500 to 5000 simultaneous connections. Simple multiplication shows that this will consume between 500 and 1000 MB of buffer space in the form of either RAM or harddisk bandwidth (the problem with disk buffers is often not space, but read/write capacity). For Slashdot, dynamic content with digitally signed content labels translates easily into twice the hardware it has now.
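The multiplication can be written out explicitly. The sketch assumes that "ISDN transfer speeds" means a single 64 kbit/s channel, which reproduces the 25 second figure, and that 1 MB is counted as 1000 KB; the function name is invented for this example.

```python
def buffer_estimate(requests_per_s, page_kb=200, link_kbit_per_s=64):
    """Return (transfer seconds, simultaneous connections, buffer in MB)."""
    # 200 KB at 64 kbit/s (8 KB/s) takes 25 seconds to send.
    transfer_s = page_kb * 8 / link_kbit_per_s
    # Every request stays open for the whole transfer time.
    connections = requests_per_s * transfer_s
    # Each open connection holds one fully buffered page.
    buffer_mb = connections * page_kb / 1000
    return transfer_s, connections, buffer_mb


low = buffer_estimate(100)   # (25.0, 2500.0, 500.0)
high = buffer_estimate(200)  # (25.0, 5000.0, 1000.0)
```

The buffer requirement grows linearly with both request rate and transfer time, so slow clients are what makes the buffering expensive.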
Streaming media is potentially infinite in length, so checksumming becomes a much more complicated process because checksums can only be calculated for chunks of data and must be embedded into a multimedia stream. This requires that the multimedia format used anticipates the need for such a feature or must be redefined (i.e. existing software must be rewritten). Another problem occurs with compound content in which some components are optional. Conventional checksums don't work in this situation.
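Chunk-wise checksumming of a stream might look like the following sketch. The chunk size, the use of SHA-256, and the function name are assumptions; how the checksums would be embedded into an actual multimedia format is exactly the part that is left open here.

```python
import hashlib
import io


def chunk_checksums(stream, chunk_size=4096):
    """Yield (chunk, checksum) pairs so each chunk can be verified alone."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream; a live stream might never get here
        yield chunk, hashlib.sha256(chunk).hexdigest()


# Usage with an in-memory stand-in for a media stream.
demo = list(chunk_checksums(io.BytesIO(b"x" * 10000)))
```

The receiver has to verify each pair as it arrives, so the stream format must carry the checksums interleaved with the data, and a checksum over the whole stream never exists.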
Finally, sending Content Labels after the content is not a good solution performance-wise, either: it increases latency on the client side (that is, the PICS designers put the label before the content for a reason). When a client downloads content from the web, it has to decide whether to display the content or not. The client can only decide this after it has seen the label for that content. If sending the label is delayed until after the actual content has been sent, building the display will slow down substantially because the client has to hold back content until it has seen the label for it. Incremental building of a display, as is customary in current browsers, is out of the question in such cases. Besides, it will be frustrating and expensive for users if their client downloads a large file, only to throw it away after the download has been completed because of the label for that content.
In the case of third party rating, the label bureau is a single point of failure: without access to the label bureau, no access to the network will be granted. The label bureau may also become a performance bottleneck.
Design changes at the local platform are necessary to enable mixed use (filtering and non-filtering) of that platform. For example, private web caches have to be redesigned. All current browsers have such caches and they are generally accessible via the filesystem with typically no access control. As a result, it is currently possible for somebody to access information directly in the web cache which would not be available via a web browser due to its content label. Also, to associate certain access rights with certain users, it is necessary to identify that user. That is, user authentication (a login prompt) becomes mandatory on systems where ICR&S is being deployed.
In many jurisdictions (e.g. in Germany) making use of a constitutional right so difficult that persons will no longer make use of this particular right is equivalent to illegally limiting that right. This implies that the filtering component of an ICR&S system must clearly signal its presence to the user and must clearly advertise instructions on how it can be turned off.
Also, ICR&S systems that are difficult to turn off are unlikely to be accepted by users on systems that see mixed use. Things that become inconvenient are simply uninstalled.
A high number of false positives will not only create the impression of an unreliable and untrustworthy ICR&S system, but it will also undermine trust in the Internet as a reliable platform for communication of political ideas and social processes. If ICR&S systems are publicly perceived primarily as a tool to leverage political or commercial interests, their value as a tool for the protection of minors is gone. This implies that ICR&S systems must have some mechanism to defend themselves against such an attack.
'Jonathan Wallace, thus, in an article called "Why I Will Not Rate My Site" asks how he is to rate "An Auschwitz Alphabet", his powerful and deeply chilling work of reportage on the Holocaust. The work contains descriptions of violence done to camp inmates' sexual organs. A self-rating system, Wallace fears, would likely force him to choose between the unsatisfactory alternatives of labeling his work as suitable for all ages, on the one hand, or "lump[ing it] together with the Hot Nude Woman page" on the other. It seems to me that at least some of the rating services' problems in assigning ratings to individual documents are inherent. It is the nature of the process that no ratings can classify documents in a perfectly satisfactory manner, and this theoretical inadequacy has important real-world consequences.'
'Kiyoshi Kuromiya, founder and sole operator of Critical Path Aids Project, has a web site that includes safer sex information written in street language with explicit diagrams, in order to reach the widest possible audience. Kuromiya doesn't want to apply the rating "crude" or "explicit" to his speech, but if he doesn't, his site will be blocked as an unrated site. If he does rate, his speech will be lumped in with "pornography" and blocked from view. Under either choice, Kuromiya has been effectively blocked from reaching a large portion of his intended audience - teenage Internet users as well as adults. [ ... ] Kuromiya could distribute the same material in print form on any street corner or in any bookstore without worrying about having to rate it. In fact, a number of Supreme Court cases have established that the First Amendment does not allow government to compel speakers to say something they don't want to say - and that includes pejorative ratings.'
Both modules enable the server to deliver different static pages for the same URL, depending on other information that is part of the browser's request and that is /not/ part of the URL. It is the URL, though, which ties a third party rating to a specific page.
"The coverage of the major Web search engines investigated varies by an order of magnitude (variation will differ for different queries, e.g. more popular queries)."
"The major Web search engines index only a fraction of the total number of documents on the Web. No engines indexes more than about one third of the "publicly indexable Web"."
http://www.liii.com/~just4fun/news/article1.htm reports an incident where material of writer Anne Sexton is being blocked, because her name matches the string "sex".
http://www.anonymizer.com/ offers services like
Some parts quoted in http://www.inet-one.com/cypherpunks/dir.97.08.28-97.09.03/msg00149.html
Report submitted by the Expert Group to G8 Ministers and Chief Advisors of Science and Technology (Carnegie Group)
http://www2.echo.lu/legal/en/internet/wp2en.html "Illegal and harmful content on the Internet Interim report on Initiatives in EU Member States with respect to Combating", Version 7, (June 4, 1997)
http://www2.echo.lu/legal/de/internet/resolde.html "ENTSCHLIESSUNG DES RATES ZU ILLEGALEN UND SCHÄDLICHEN INHALTEN IM INTERNET"
http://www.gtlaw.com.au/pubs/newdarkage.html "Censorship and Amendments to the Broadcasting Services Act" (Australia, April 1999)
The article discusses non-PICS methods of blocking content and why they do create even more harm than PICS. It serves as an introduction to this article.
Last modified: 15-Feb-2004 15:10:43. URL: http://kris.koehntopp.de/artikel/rating_does_not_work.html