
Java i18n

May 2nd, 2009 Animesh

Trivia: in “i18n”, the 18 is the number of letters between the “i” and the “n” in “internationalization”.

Basics:

Most traditional character encoding standards are 8-bit and can represent only 256 different characters. For internationalization this becomes a bottleneck, since 256 characters cannot accommodate all the characters used in the world’s writing systems. The solution to this problem is the adoption of a universal character encoding: Unicode.

Unicode is an industry standard designed to bring together the texts and symbols of all the world’s writing systems by providing a unique number (not a glyph) for every character.

This means that Unicode represents each character as a number (its code point), and the application renders the character (symbol, font, size, or shape) with its own rendering/mapping algorithm.
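To make the “unique number” idea concrete, here is a small Java sketch that prints the code point behind a few characters. The class and method names are made up for illustration. One Java-specific detail: strings are stored as UTF-16, so a character outside the Basic Multilingual Plane, such as an emoji, occupies two char values while still being a single code point.

```java
// Sketch: each character is identified by a Unicode code point (a number);
// choosing a glyph, font, and size to draw it is left to the renderer.
final class CodePoints {

    // Returns the code point of the first character in s, formatted as U+XXXX.
    static String codePointOf(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    // Counts code points rather than UTF-16 char values.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Escapes keep this source file ASCII-only: A, e-acute, euro sign, grinning face.
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(codePointOf(s) + " (" + s.length() + " char value(s), "
                + countCodePoints(s) + " code point)");
        }
    }
}
```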

Unicode data can be represented in several encoding forms, known as Unicode Transformation Formats (UTFs). A UTF is an algorithmic mapping from every Unicode code point to a unique byte sequence.

Several UTFs are available, such as UTF-8, UTF-16, and UTF-32, but the preferred character encoding in web environments is UTF-8, which is:
a) a variable-length character encoding able to represent any character in the Unicode standard, and
b) backward compatible with ASCII: code points 0 to 127 are encoded as the same single bytes as in ASCII. It is not byte-compatible with Latin-1 (ISO-8859-1), however, because characters above 127 are encoded as multi-byte sequences in UTF-8.
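The variable-length property and the ASCII/Latin-1 difference are easy to check from Java. This sketch (hypothetical class name; it uses only String.getBytes and the StandardCharsets constants) prints the bytes produced for the same characters: “A” is 41 in both UTF-8 and Latin-1, while “é” is the single byte E9 in Latin-1 but the two bytes C3 A9 in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: compare the bytes produced for the same text by UTF-8 and Latin-1.
final class Utf8Bytes {

    // Encodes s with the given charset and returns the bytes as hex, e.g. "C3 A9".
    static String hex(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A, e-acute, euro sign, grinning face (escaped to keep the source ASCII-only).
        // UTF-8 uses 1, 2, 3, and 4 bytes for them respectively.
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("UTF-8: " + hex(s, StandardCharsets.UTF_8));
        }
        // Latin-1 uses a single byte for e-acute.
        System.out.println("Latin-1 e-acute: " + hex("\u00E9", StandardCharsets.ISO_8859_1));
    }
}
```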

Enabling full internationalization in a typical Java web system:

A typical flow looks like this:

Client <-> Internet <-> Web Server <-> Application Server <-> DBMS
Let’s look at each layer and enable i18n in it.

Client: Web browsers (such as Internet Explorer, Mozilla Firefox, Safari, and Opera) represent the client side of a web application. The best way to tell a browser that a page is UTF-8 encoded is to put the character-set information in the Content-Type header of the HTTP response, for example:

Server: Apache-Coyote/1.1
Pragma: No-cache
Cache-Control: no-cache
Expires: <date>
Content-Type: text/html;charset=utf-8
Transfer-Encoding: chunked
Date: <date>
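To watch this header travel from server to client without a servlet container, the sketch below uses the JDK’s built-in com.sun.net.httpserver server as a lightweight stand-in; it is not what Tomcat or JBoss use. In a servlet, the equivalent step would be response.setContentType("text/html;charset=utf-8"). The class name, the loopback address, and the sample body are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Sketch: a server that encodes its HTML body as UTF-8 and says so in the
// Content-Type header, plus a client call that reads the header back.
final class Utf8HttpDemo {

    static HttpServer start() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "<p>caf\u00E9</p>".getBytes(StandardCharsets.UTF_8);
            // Declare the charset that was actually used to encode the body.
            exchange.getResponseHeaders().set("Content-Type", "text/html;charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Starts the server, fetches "/", and returns the Content-Type the client saw.
    static String fetchContentType() {
        HttpServer server = null;
        try {
            server = start();
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes(); // drain the body
            }
            String contentType = conn.getContentType();
            conn.disconnect();
            return contentType;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchContentType());
    }
}
```

The important point is that the charset in the header matches the charset passed to getBytes(); declaring utf-8 while encoding the body in another charset would reproduce the very problem the header is meant to prevent.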

Web Server:

1.      Most web servers use the operating system’s default encoding, which the JVM exposes through the system property file.encoding.

This property is usually set to
a) ISO-8859-1 on Unix-based systems, or
b) Cp1252 on Windows systems.

To ensure UTF-8 support, the file.encoding property has to be redefined at system startup.

2.      Apache 2 on Windows NT uses UTF-8 for all filename encodings. Otherwise, the recommended approach is to change the Tomcat/JBoss startup script (run/catalina) to add the switch

-Dfile.encoding=UTF-8

to the JVM startup command, so that the HTTP response encoding defaults to UTF-8. This default can still be overridden within Java servlet code as needed.

3.      Static hypertext documents should include the following at the top of the <head> section:
<meta http-equiv="content-type" content="text/html; charset=utf-8">

4.      A JavaScript block or file should include the charset attribute:
<script src="scriptFile.js" type="text/javascript" charset="utf-8"></script>
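Item 1 above is easy to check from code. The sketch below (hypothetical class name) prints file.encoding and the JVM’s default charset, then decodes the same UTF-8 bytes with an explicit UTF-8 charset, which gives the same answer on any platform, and with ISO-8859-1, which turns “café” into “cafÃ©”. Note that these defaults depend on the JDK: from JDK 18 onward the default charset is UTF-8 on every platform (JEP 400), so the -Dfile.encoding switch mainly matters on older JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: what the JVM assumes by default versus what an explicit charset gives you.
final class DefaultCharsetDemo {

    // The platform default, influenced by the OS, the locale, the JDK version,
    // and any -Dfile.encoding=... switch given at startup.
    static String platformDefault() {
        return Charset.defaultCharset().name();
    }

    // Decoding with an explicit charset gives the same result on every platform.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + platformDefault());

        // The UTF-8 bytes of "caf" followed by e-acute.
        byte[] utf8 = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals("caf\u00E9"));
        // Read as ISO-8859-1, the same bytes turn into mojibake (A-tilde, copyright sign).
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals("caf\u00C3\u00A9"));
    }
}
```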

Application Server: Application servers are programs that sit between the web server and back-end business applications or databases.

1.      Java source files need no UTF-8 configuration as long as they contain only ASCII characters (otherwise they should be compiled with javac -encoding UTF-8), whereas JSP files enable UTF-8 encoding through a page directive at the top of the file that includes the pageEncoding and contentType attributes:
<%@ page contentType="text/html;charset=utf-8" pageEncoding="utf-8" %>

This page directive should also be used in every JSP file that is included with the <jsp:include> action (not with the <%@ include %> directive).

2.      Moreover, if a JSP file contains an (X)HTML <head> tag, it should include the UTF-8 meta tag:
<meta http-equiv="content-type" content="text/html; charset=utf-8">

3.      When the sendRedirect() method is used, query string parameters should be encoded with the java.net.URLEncoder.encode() method.

4.      HTML forms should include the accept-charset attribute:

<form action="processData.jsp" method="post" accept-charset="utf-8">
……..
</form>

The form above submits its data in UTF-8. In addition, a servlet filter must set the request’s character encoding before any form parameter is read, and the response should declare UTF-8 as well:

request.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");

5.      When a form is submitted through JavaScript with the “GET” method, multilingual query string parameters should be encoded with the JavaScript encodeURI function, and so should the URLs in all standard HTML hyperlink tags (<a href="">).

6.      Java dictionary files (message bundles) are key-value tables used to look up internationalized data. The file format provides no mechanism for indicating its encoding.

Therefore, non-ASCII characters in these files have to be converted manually into Unicode escapes. Java comes with a native2ascii converter, which takes an -encoding switch indicating the encoding of the source file, followed by the names of the source and target files:

native2ascii -encoding UTF-8 SourceFile TargetFile
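Item 6 is about how .properties files are decoded, and java.util.Properties, which reads the same file format, shows the problem directly. In this sketch (class and method names are illustrative), load(InputStream) always decodes as ISO-8859-1 and mangles a UTF-8 file, load(Reader) decodes with whatever charset the Reader uses, and an ASCII-only file with Unicode escapes such as \u00e9, the kind native2ascii produces, loads correctly either way. On newer JDKs the manual step is no longer needed: since Java 9, PropertyResourceBundle reads .properties files as UTF-8 by default (JEP 226), and the native2ascii tool was removed in JDK 9.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Sketch: three ways a .properties file with a non-ASCII value can be loaded.
final class BundleEncodingDemo {

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1.
    static String loadFromStream(byte[] bytes, String key) {
        Properties p = new Properties();
        try {
            p.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    // Properties.load(Reader) lets the caller choose the charset, here UTF-8.
    static String loadFromUtf8Reader(byte[] bytes, String key) {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return loadFromReader(reader, key);
    }

    // An ASCII-only file using backslash-u escapes, as written by native2ascii,
    // decodes the same way whatever charset is used to read it.
    static String loadEscaped(String asciiText, String key) {
        return loadFromReader(new StringReader(asciiText), key);
    }

    private static String loadFromReader(Reader reader, String key) {
        Properties p = new Properties();
        try (Reader r = reader) {
            p.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        byte[] utf8File = "greeting=caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadFromStream(utf8File, "greeting").equals("caf\u00C3\u00A9")); // mojibake
        System.out.println(loadFromUtf8Reader(utf8File, "greeting").equals("caf\u00E9"));
        System.out.println(loadEscaped("greeting=caf\\u00e9", "greeting").equals("caf\u00E9"));
    }
}
```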
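Items 3 and 4 above both come down to percent-encoding text as UTF-8 and decoding it with the same charset. The sketch below uses the URLEncoder.encode(String, Charset) and URLDecoder.decode(String, Charset) overloads, which need Java 10 or later (older code passes the charset name, "UTF-8", instead). The page name result.jsp and the parameter q are hypothetical. Decoding the same bytes as ISO-8859-1, the fallback many servlet containers use when no request encoding has been set, shows why the filter must call setCharacterEncoding before any parameter is read.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: percent-encoding query parameters as UTF-8, and decoding them back.
final class UrlEncodingDemo {

    // Builds a redirect target such as "result.jsp?q=caf%C3%A9"; the page and
    // parameter are placeholders for whatever is passed to sendRedirect().
    static String redirectUrl(String page, String name, String value) {
        return page + "?" + URLEncoder.encode(name, StandardCharsets.UTF_8)
            + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decodes a parameter as UTF-8: the result when the request encoding is set correctly.
    static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Decodes the same parameter as ISO-8859-1: the result of the fallback.
    static String decodeAsLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // URLEncoder encodes a space as "+" and e-acute as its UTF-8 bytes %C3%A9.
        System.out.println(redirectUrl("result.jsp", "q", "caf\u00E9 au lait"));
        System.out.println(decodeAsUtf8("caf%C3%A9").equals("caf\u00E9"));
        System.out.println(decodeAsLatin1("caf%C3%A9").equals("caf\u00C3\u00A9")); // mojibake
    }
}
```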

Database: Database management systems (DBMS) require character-set information when a new database or table is created.

1.      For databases that don’t use UTF-8 by default, the default character set has to be defined, for example in MySQL’s configuration file (my.ini):
default-character-set=utf8

2.      Database drivers usually require extra configuration as well, for example when connecting to a MySQL database through a Java Database Connectivity (JDBC) driver:
Connection db = DriverManager.getConnection("jdbc:mysql://localhost/myDatabase?useUnicode=true&characterEncoding=utf-8", "username", "password");


Synchronized Java

April 24th, 2009 Animesh

In the Java language, the job of managing coordination between threads is largely pushed onto the developer. The primary tool for this coordination is the synchronized keyword. In its absence, the JVM is free to take a great deal of liberty in the timing and ordering of operations executing in different threads (see the JLS, the Java Language Specification). Most of the time this is desirable, but if it is not managed properly, such optimizations can compromise a program’s correctness.

What is Synchronized?

Think of the Java Memory Model (JMM) as a world in which “each thread runs on its own processor with its own local memory, each talking to and synchronizing with a shared main memory.”

While the semantics of synchronized include mutual exclusion (mutex) and atomicity, in reality it is far more than that. It guarantees that only one thread at a time has access to the protected section, but there are also rules about how the synchronizing thread interacts with main memory. In particular, acquiring or releasing a lock triggers a memory barrier: a forced synchronization between the thread’s local memory and main memory. When a thread exits a synchronized block, it performs a write barrier: it must flush any variables modified in that block out to main memory before releasing the lock. Similarly, when entering a synchronized block, it performs a read barrier: it is as if the local memory has been invalidated, and the thread must fetch any variables referenced in the block from main memory.
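A minimal illustration of the visibility half of that guarantee, in the spirit of the model above: the worker thread spins until it sees a flag set by another thread. Because both accessors are synchronized on the same object, the write in requestStop() is guaranteed to become visible to isStopRequested(); with a plain, unsynchronized field the JVM would be allowed to hoist the read out of the loop, and the worker might never stop. The class and method names are hypothetical.

```java
// Sketch: synchronized accessors give both mutual exclusion and visibility.
final class StopFlagDemo {
    private boolean stopRequested; // guarded by "this"

    synchronized void requestStop() {
        stopRequested = true; // made visible when the lock is released
    }

    synchronized boolean isStopRequested() {
        return stopRequested; // re-read after the lock is acquired
    }

    // Starts a spinning worker, asks it to stop, and reports whether it did.
    static boolean workerStops(long timeoutMillis) {
        StopFlagDemo flag = new StopFlagDemo();
        Thread worker = new Thread(() -> {
            while (!flag.isStopRequested()) {
                // busy work would go here
            }
        });
        worker.setDaemon(true); // never keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);
            flag.requestStop();
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(workerStops(5_000));
    }
}
```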

Benefits:

When threads A and B synchronize on the same object, the JMM guarantees that once B acquires the lock after A has released it, B sees the changes A made, and that the changes A made inside the synchronized block appear atomically to B (B sees either all of them or none of them). The JMM also ensures that synchronized blocks that synchronize on the same object appear to execute in the same order as they do in the program.

Consequences of failing to synchronize:

Data corruption and race conditions: a race condition is a situation in which two or more threads read or write shared data and the final result depends on the timing of how the threads are scheduled. Such conditions can cause programs to crash, behave unpredictably, or produce incorrect results. Worse, they are likely to occur only rarely and sporadically, which makes the problem hard to detect and reproduce.
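The simplest race to reproduce is a lost update on a shared counter. In this sketch (hypothetical names), each thread increments two counters: one with a plain count++, which is a separate read, add, and write, and one inside a synchronized method. The synchronized total always matches the expected value; the unsynchronized total can come out lower, and by a different amount on each run, which is exactly the sporadic behavior described above.

```java
// Sketch: the classic lost-update race on count++.
final class CounterRace {
    private int unsafeCount;
    private int safeCount;

    void incrementUnsafe() {
        unsafeCount++; // read, add, write: two threads can read the same value
    }

    synchronized void incrementSafe() {
        safeCount++; // one thread at a time, and visible to the next lock holder
    }

    // Runs `threads` threads, each incrementing both counters `perThread` times.
    // Returns {unsafeCount, safeCount}.
    static int[] run(int threads, int perThread) {
        CounterRace c = new CounterRace();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.incrementUnsafe();
                    c.incrementSafe();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] {c.unsafeCount, c.safeCount};
    }

    public static void main(String[] args) {
        int[] result = run(4, 100_000);
        // safe is always 400000; unsafe can be lower and varies between runs.
        System.out.println("unsafe=" + result[0] + " safe=" + result[1]);
    }
}
```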

Performance:

When tuning an application’s use of synchronization, we should try hard to reduce the amount of actual contention, rather than simply trying to avoid synchronization altogether. When threads contend for a lock, there are several thread switches and system calls, which raises the performance penalty substantially. According to one source, a synchronized call to an empty method may be 20 times slower than an unsynchronized call to an empty method.

Volatile?

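It is a commonly held belief that, because the JLS guarantees that 32-bit reads are atomic, you do not need to acquire a lock simply to read an object’s fields. This intuition is incorrect: atomicity only means that a read never sees a half-written value, not that it sees the latest one. Unless the fields are declared volatile (or accessed under a common lock), the JMM guarantees neither that a write by one thread becomes visible to another nor any particular ordering. Under the original JMM, volatile was also weaker than many assumed: it prevented writes to volatile variables from being reordered with respect to one another and ensured that they were flushed to main memory immediately, but it still permitted volatile reads and writes to be reordered with respect to nonvolatile reads and writes. The revised JMM introduced in Java 5 (JSR 133) closed this gap: a write to a volatile variable happens-before every subsequent read of that variable, so everything a thread wrote before the volatile write is visible to a thread that reads the new value.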
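Under the Java 5 and later memory model, a volatile flag is enough to publish ordinary fields safely, as long as the reader checks the flag before reading them. In the sketch below (illustrative names), the write to data happens-before the volatile write to ready, which happens-before the reader’s volatile read that sees true, so the reader is guaranteed to observe the published value. Thread.onSpinWait() is only a performance hint and requires Java 9 or later.

```java
// Sketch: safe publication through a volatile flag under the Java 5+ memory model.
final class VolatilePublication {
    private int data;               // ordinary field
    private volatile boolean ready; // volatile flag

    void publish(int value) {
        data = value; // (1) ordinary write
        ready = true; // (2) volatile write: (1) happens-before any read that sees true
    }

    int awaitAndRead() {
        while (!ready) {           // volatile read
            Thread.onSpinWait();   // spin-loop hint (Java 9+)
        }
        return data; // guaranteed to observe the value written before ready = true
    }

    // Starts a reader thread, publishes a value, and returns what the reader saw.
    static int demo(int value) {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = p.awaitAndRead());
        reader.start();
        p.publish(value);
        try {
            reader.join(); // makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(42));
    }
}
```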

Techniques to write thread-safe programs:

Latency and scalability are two factors that affect a program’s performance. Latency describes how long it takes for a given task to complete, and scalability describes how a program’s performance varies under increasing load or with increased computing resources. A high degree of contention is bad for both. When multiple threads contend for the same monitor, the JVM has to maintain a queue of threads waiting for that monitor (and this queue must be synchronized across processors), which means more time is spent in JVM or OS code and less in your program code. Contention also impairs scalability because it forces the scheduler to serialize operations: while one thread is executing a synchronized block, any thread waiting to enter that block is stalled, and processors may sit idle.

In short, we must reduce contention for critical resources in order to be able to maintain the scalability of our program.

Idea 1: Do only what needs to be done: Make synchronized blocks as short as possible. Do any thread-safe pre-processing or post-processing outside of the synchronized block.
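A small sketch of Idea 1, using a hypothetical in-memory event log: building the log line touches only local variables, so it runs before the lock is taken, and only the add to the shared list happens inside the synchronized block. The snapshot method applies the same idea on the way out by copying under the lock and returning the copy.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Idea 1: keep only the shared-state update inside the lock.
final class EventLog {
    private final List<String> lines = new ArrayList<>(); // guarded by "lines"

    void record(String user, int status) {
        // Pre-processing touches only local data, so it runs outside the lock.
        String line = user + " -> " + status;
        synchronized (lines) {
            lines.add(line); // the only step that touches shared state
        }
    }

    List<String> snapshot() {
        List<String> copy;
        synchronized (lines) {
            copy = new ArrayList<>(lines); // copy under the lock...
        }
        return copy; // ...so callers can post-process without holding it
    }
}
```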

Idea 2: Lock what needs to be safeguarded: guard independent pieces of state with separate locks instead of a single lock for everything, spreading your synchronization over more locks.
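Idea 2 is often called lock splitting. In this sketch (hypothetical class), the set of users and the set of queries are independent, so each is guarded by its own private lock object, and a thread registering a query never waits for one updating users. The technique is only safe when no invariant spans both pieces of state; if one did, both would need to be guarded by the same lock.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of Idea 2 (lock splitting): two independent sets, two locks.
final class ServerStatus {
    private final Object usersLock = new Object();
    private final Object queriesLock = new Object();
    private final Set<String> users = new HashSet<>();   // guarded by usersLock
    private final Set<String> queries = new HashSet<>(); // guarded by queriesLock

    void addUser(String user) {
        synchronized (usersLock) { users.add(user); }
    }

    void removeUser(String user) {
        synchronized (usersLock) { users.remove(user); }
    }

    void addQuery(String query) {
        synchronized (queriesLock) { queries.add(query); } // never waits on usersLock
    }

    int userCount() {
        synchronized (usersLock) { return users.size(); }
    }

    int queryCount() {
        synchronized (queriesLock) { return queries.size(); }
    }
}
```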

Idea 3: Provide a synchronized wrapper: The Collections Framework classes, such as HashMap and ArrayList, are a good example of this technique. They are unsynchronized, but for each interface defined in the framework there is a synchronized wrapper (for example, Collections.synchronizedMap()) that wraps each method with a synchronized version.
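A short sketch of the wrapper in use (the helper names are illustrative; Collections.synchronizedMap is the real JDK method). Each individual call on the wrapper is synchronized, but, as the Collections Javadoc points out, iteration over the map’s views must be done while holding the wrapper’s lock, and the same applies to compound check-then-act sequences such as the read-then-write increment below. Because the wrapper synchronizes on itself, locking on the map object extends its protection to the whole sequence.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch of Idea 3: an unsynchronized HashMap behind a synchronized wrapper.
final class SynchronizedWrapperDemo {

    // Every method of the returned view synchronizes on the view itself.
    static Map<String, Integer> newCounterMap() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    // get() and put() are each thread-safe, but the pair is not, so the
    // whole read-then-write sequence holds the wrapper's lock.
    static void increment(Map<String, Integer> counts, String key) {
        synchronized (counts) {
            Integer current = counts.get(key);
            counts.put(key, current == null ? 1 : current + 1);
        }
    }

    // Iterating a view of a synchronized map also requires holding its lock.
    static int total(Map<String, Integer> counts) {
        synchronized (counts) {
            int sum = 0;
            for (int v : counts.values()) {
                sum += v;
            }
            return sum;
        }
    }
}
```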

Idea 4: Collapse locks: obtain a broader lock once. In the code below, a lock on the vector object is obtained before the loop starts. When elementAt() then tries to acquire the same lock, the JVM sees that the current thread already holds it (Java monitors are reentrant), so the call does not have to compete for it. Holding the lock for the whole loop also keeps v.size() and v.elementAt(i) consistent with each other, since no other thread can modify the vector mid-loop. This lengthens the synchronized block and therefore violates Idea 1, but it can be considerably faster, because less time is lost to scheduling overhead.

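Vector<String> v; // shared vector, populated elsewhere
...
synchronized (v) {
    // One lock acquisition for the whole loop; the elementAt() calls
    // re-enter the lock this thread already holds.
    for (int i = 0; i < v.size(); i++) {
        String s = v.elementAt(i);
        ...
    }
}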

Idea 5: Reduce contention by giving each thread its own copy of certain critical objects using ThreadLocal. This bypasses the complexity of deciding when to synchronize, and it improves scalability because no synchronization is required: each thread holds a separate copy of the critical object.
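A common example of Idea 5 is java.text.SimpleDateFormat, which is not thread-safe. Instead of sharing one instance behind a lock, the sketch below (hypothetical class) gives every thread its own lazily created copy through ThreadLocal.withInitial, so format() needs no synchronization. On Java 8 and later, java.time.format.DateTimeFormatter is immutable and thread-safe and can simply be shared; SimpleDateFormat is used here only because it illustrates the technique.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch of Idea 5: one SimpleDateFormat per thread instead of one shared, locked instance.
final class PerThreadFormatter {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    static String format(Date date) {
        // No lock needed: the instance returned by get() is used only by this thread.
        return FORMAT.get().format(date);
    }

    // True if another thread is handed a different formatter instance than this one.
    static boolean threadsGetSeparateCopies() {
        SimpleDateFormat mine = FORMAT.get();
        SimpleDateFormat[] theirs = new SimpleDateFormat[1];
        Thread other = new Thread(() -> theirs[0] = FORMAT.get());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return mine != theirs[0];
    }
}
```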



Image courtesy Phoenix Synchronized Swimming