This section will hold notes on character sets until we get it all finally straight.
This was on the tomcat site
Character encoding has been the source of quite a bit of debate on the tomcat- dev list in recent weeks. There have been a few changes (see summary below) as a result. Essentially some additional configuration options have been provided. The UTF-8 issue (also reported in bug 22666) has also been fixed. Character encoding summary ========================== There are a number of situations where there may be a requirement to use non- US ASCII characters in a URI. These include: - Parameters in the query string - Servlet paths There is a standard for encoding URIs (http://www.w3.org/International/O-URL- code.html) but this standard is not consistently followed by clients. This causes a number of problems. The functionality provided by Tomcat (4 and 5) to handle this less than ideal situation is described below. 1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which if set to true will use the request body encoding to decode the URI query parameters. - The default value is true for TC4 (breaks spec but gives consistent behaviour across TC4 versions) - The default value is false for TC5 (spec compliant but there may be migration issues for some apps) 2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to ISO-8859-1. 3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding field which defaults to the URIEncoding. It must be set before the parameters are parsed to have an effect. Things to note regarding the servlet API: 1. HttpServletRequest.setCharacterEncoding() normally only applies to the request body NOT the URI. 2. HttpServletRequest.getPathInfo() is decoded by the web container. 3. HttpServletRequest.getRequestURI() is not decoded by container. Other tips: 1. Use POST with forms to return parameters as the parameters are then part of the request body.