Archive for the ‘Technology’ Category

Java i18n

May 2nd, 2009 Animesh No comments

Trivia: in “i18n”, 18 is the number of letters between the “i” and the “n” in “internationalization”.

Basics:

Most traditional character encoding standards are 8-bit and can represent only 256 different characters. For internationalization this becomes a bottleneck, since 256 characters cannot accommodate every character in use. The solution to this problem is the adoption of a universal character encoding: Unicode.

Unicode is
a) an industry standard designed
b) to bring together texts and symbols from all the writing systems of the world by
c) providing a unique number (not glyph) for every character.

This means that Unicode represents a character as a number, and the underlying application renders the character (symbol, font, size, or shape) with some rendering/mapping algorithm.

There are several possible representations of Unicode data indicated by the Unicode Transformation Format (UTF). UTF is an algorithmic mapping from every Unicode code point to a unique byte sequence.

There are various UTF algorithms available, such as UTF-8, UTF-16 and UTF-32, but the preferred character encoding in web environments is UTF-8, which is
a) a variable-length character encoding able to represent any character in the Unicode standard, yet
b) consistent with ASCII in its initial byte codes and character assignments, though not with Latin-1, because the characters greater than 127 are encoded differently.
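
To make the variable-length property concrete, here is a minimal Java sketch that counts how many bytes a character needs in UTF-8 and in Latin-1. The class name Utf8Widths and its helper are made up for illustration. Non-ASCII characters are written as Unicode escapes so that the encoding of the source file itself does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how many bytes a character needs per encoding.
class Utf8Widths {

    // Number of bytes the given text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "A";       // U+0041, inside the ASCII range
        String latin = "\u00e9";  // U+00E9, e with acute accent
        String euro  = "\u20ac";  // U+20AC, euro sign

        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));      // 1
        System.out.println(byteLength(latin, StandardCharsets.UTF_8));      // 2
        System.out.println(byteLength(euro, StandardCharsets.UTF_8));       // 3
        System.out.println(byteLength(latin, StandardCharsets.ISO_8859_1)); // 1
    }
}
```

The ASCII letter takes one byte in both encodings, while the accented letter takes one byte in Latin-1 but two in UTF-8, which is exactly the incompatibility above 127.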

Enabling full internationalization in a typical Java web system:

A typical flow looks something like this:

Client <-> Internet <-> Web Server <-> Application Server <-> DBMS

Let’s look at each layer and enable i18n in it.

Client: Web browsers (like Internet Explorer, Mozilla Firefox, Safari, and Opera) represent the client side of a web application. The best way to tell a browser about UTF-8 encoding is by putting the character-set information in the HTTP response header:

Server: Apache-Coyote/1.1
Pragma: No-cache
Cache-Control: no-cache
Expires: <date>
Content-Type: text/html;charset=utf-8
Transfer-Encoding: chunked
Date: <date>

Web Server:

1.      Most web servers use the encoding of the operating system, defined in the system property file.encoding.

This property is usually defined as
a) ISO-8859-1 on Unix-based systems or
b) Cp1252 on Windows systems.

To ensure UTF-8 support, the file.encoding property has to be redefined during system startup.

2.      Apache 2 on Windows NT uses UTF-8 for all filename encodings. Otherwise, it is recommended to change the Tomcat/JBoss startup script (run/catalina) to add the switch

-Dfile.encoding=UTF-8

to the JVM startup call, so that the HTTP response encoding defaults to UTF-8. However, this can be overridden within the Java servlet code as needed.
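
A quick way to check whether the switch took effect is to ask the running JVM what it considers the default. This is only a sketch using standard JDK calls; the class name DefaultCharsetCheck is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports the charset the JVM falls back to
// when code does not name an encoding explicitly.
class DefaultCharsetCheck {

    // True when the JVM's default charset is UTF-8.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        if (!defaultIsUtf8()) {
            System.out.println("Consider starting the JVM with -Dfile.encoding=UTF-8");
        }
    }
}
```

Running it once without and once with the switch shows the difference on a given machine. Code that names its charset explicitly does not depend on this default at all.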

3.      Static hypertext documents should include, at the top of the <head> section:
<meta http-equiv="content-type" content="text/html; charset=utf-8">

4.      A JavaScript block or file reference should include the charset attribute:
<script src="scriptFile.js" type="text/javascript" charset="utf-8"></script>

Application Server: Application servers are programs that sit between the web server and backend business applications or databases.

1.      Java files do not require any UTF-8 configuration, whereas JSP files enable UTF-8 encoding by placing a page directive at the top of the file that includes the pageEncoding and contentType attributes:
<%@ page contentType="text/html;charset=utf-8" pageEncoding="utf-8" %>

This page directive should be used in all JSP files that are included with the <jsp:include> tag (not the <%@ include %> page directive).

2.      Moreover, if a JSP file contains an (X)HTML <head> tag, it should also include the UTF-8 meta tag:
<meta http-equiv="content-type" content="text/html; charset=utf-8">

3.      When the sendRedirect() method is used, query string parameters should be encoded with the java.net.URLEncoder.encode() method.
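
As a rough illustration of the redirect case, the sketch below builds a target URL whose query parameter is UTF-8 encoded with java.net.URLEncoder. The class RedirectUrls, the page name search.jsp and the parameter name q are invented for the example. The resulting string is what would be handed to sendRedirect().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building redirect targets with encoded parameters.
class RedirectUrls {

    // Percent-encodes text as UTF-8 for use in a query string.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Appends a single encoded name=value pair to the given path.
    static String withParam(String path, String name, String value) {
        return path + "?" + encode(name) + "=" + encode(value);
    }

    public static void main(String[] args) {
        // The value is "caf" + e-acute + " au lait".
        System.out.println(withParam("search.jsp", "q", "caf\u00e9 au lait"));
        // prints search.jsp?q=caf%C3%A9+au+lait
    }
}
```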

4.      HTML forms should declare the charset with the accept-charset attribute:

<form action="processData.jsp" method="post" accept-charset="utf-8">
……..
</form>

The form above submits its data in UTF-8. A filter must also be implemented to set the request character encoding before any form parameters are read, alongside the response content type:

request.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");
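
Why the encoding has to be fixed before the parameters are read becomes obvious when the same bytes are decoded with the wrong charset. The sketch below does not use the Servlet API. It only simulates, with plain JDK calls and a made-up class name, what happens when UTF-8 form data is interpreted as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with the wrong and the right charset.
class WrongCharsetDemo {

    // Encodes text as UTF-8 (what the browser sends) and decodes it as
    // ISO-8859-1 (what a server may assume when no encoding has been set).
    static String decodeUtf8AsLatin1(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes with the charset they were written in.
    static String decodeUtf8AsUtf8(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent
        // Wrong charset: one character turns into two (A-tilde, copyright sign).
        System.out.println(decodeUtf8AsLatin1(original).length()); // 2
        // Right charset: the text survives the round trip.
        System.out.println(decodeUtf8AsUtf8(original).equals(original)); // true
    }
}
```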

5.      When a request is submitted through JavaScript or through a form’s “GET” method, multilanguage query string parameters should be encoded with the JavaScript encodeURI method, and so should the targets of all standard HTML hyperlink tags <a href="">.

6.      Java dictionary files (message bundles) are key-value tables used to look up internationalized data. They do not provide a mechanism for indicating their encoding.

Therefore they have to be converted manually. Java comes with a native2ascii converter, which takes an -encoding switch to indicate the encoding of the source file, followed by the name of the source file and the name of the target file:

native2ascii -encoding UTF-8 SourceFile TargetFile
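
To show what the converted file contains, here is a small sketch that mimics the conversion in plain Java: every character outside ASCII becomes a backslash-u escape, and java.util.Properties turns the escape back into the real character when the bundle is loaded. The class AsciiEscapes and the key greeting are invented for the example, and this is not the implementation of native2ascii itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo of ASCII-escaped message bundle entries.
class AsciiEscapes {

    // Replaces every character outside 7-bit ASCII with a backslash-u escape,
    // similar in spirit to what native2ascii writes to its target file.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Loads properties from in-memory text and returns the value for the key.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory text
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String line = "greeting=" + escape("caf\u00e9");
        System.out.println(line);                                          // pure ASCII
        System.out.println(loadValue(line, "greeting").equals("caf\u00e9")); // true
    }
}
```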

Database: Database management systems (DBMS) require character-set information when a new database or table is created.

1.      For databases that don’t support UTF-8 by default, the default character set has to be defined explicitly, for example in MySQL’s configuration file (my.ini):
default-character-set=utf8

2.      Database drivers usually require extra configuration, for example when connecting to a MySQL database using a Java Database Connectivity (JDBC) driver:
Connection db = DriverManager.getConnection("jdbc:mysql://localhost/myDatabase?useUnicode=true&characterEncoding=utf-8", "username", "password");

Open Source Gospels

April 29th, 2009 Animesh 1 comment

Last night, I was reading “The Cathedral and the Bazaar” by Eric S. Raymond, an old article (dated 1998/11/22) about how Linux was developed by breaking the then-common beliefs and practices of the programming world, and what lessons there were to learn. In general, the article talks about the cultural shift in programming methodologies and urges developers to open up and collaborate in order to reap benefits from the crowd.

Eric called them Gospels. I thought I would list them here for my own future revision.

  • Every good work of software starts by scratching a developer’s personal itch.
    Perhaps this should have been obvious; it’s long been proverbial that “Necessity is the mother of invention”.
  • Good programmers know what to write. Great ones know what to rewrite (and reuse).
  • “Plan to throw one away; you will, anyhow.” (Fred Brooks, “The Mythical Man-Month”, Chapter 11)
  • If you have the right attitude, interesting problems will find you.
  • When you lose interest in a program, your last duty to it is to hand it off to a competent successor.
  • Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.
    The power of this effect is easy to underestimate. In fact, until Linus Torvalds proved differently, everyone in the open-source world drastically underestimated how well it would scale up with number of users and against system complexity.

    Linus’s cleverest and most consequential hack was not the construction of the Linux kernel itself, but rather his invention of the Linux development model. To this, he comments: “I’m basically a very lazy person who likes to get credit for things other people actually do.” Lazy like a fox, or, as Robert Heinlein might have said, too lazy to fail.
  • Release early. Release often. And listen to your customers.
    Linus was directly aiming to maximize the number of person-hours thrown at debugging and development, even at the possible cost of instability in the code and user-base burnout if any serious bug proved intractable. Linus was behaving as though he believed something like this:
  • Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.
  • Smart data structures and dumb code works a lot better than the other way around.
    Brooks, Chapter 9: “Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won’t usually need your [code]; it’ll be obvious.”
  • If you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource.
  • The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.
  • If you are completely and self-deprecatingly truthful about how much you owe other people, the world at large will treat you like you did every bit of the invention yourself and are just being becomingly modest about your innate genius. We can all see how well this worked for Linus!
  • Often, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.
  • It’s often time to ask not whether you’ve got the right answer, but whether you’re asking the right question. Perhaps the problem needs to be reframed.
  • Don’t hesitate to throw away superannuated features when you can do it without loss of effectiveness. Antoine de Saint-Exupéry (who was an aviator and aircraft designer when he wasn’t being the author of classic children’s books) said: “Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
  • Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.
  • Don’t disturb the data stream  — and *never* throw away information unless the recipient forces you to!
  • A security system is only as secure as its secret. Beware of pseudo-secrets.
  • To solve an interesting problem, start by finding a problem that is interesting to you.
  • Provided the development coordinator has a communications medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one.

Synchronized Java

April 24th, 2009 Animesh 1 comment

In the Java language the job of managing coordination between threads is largely pushed on to the developer. The primary tool for managing coordination between threads in Java programs is the synchronized keyword, in the absence of which the JVM is free to take a great deal of liberty in the timing and ordering of operations executing in different threads (see the JLS, the Java Language Specification). Most of the time this is desirable, but if not managed properly, such optimizations can compromise a program’s correctness.

What is Synchronized?

Think of Java as a model in which “each thread runs on its own processor with its own local memory, each talking to and synchronizing with a shared main memory.”

While the semantics of synchronized include mutual exclusion (mutex) and atomicity, it’s far more than this in reality. It’s a guarantee that only one thread has access to the protected section at one time, but there are rules about the synchronizing thread’s interaction with main memory. In particular, the acquisition or release of a lock triggers a memory barrier — a forced synchronization between the thread’s local memory and main memory. When a thread exits a synchronized block, it performs a write barrier — it must flush out any variables modified in that block to main memory before releasing the lock. Similarly, when entering a synchronized block, it performs a read barrier — it is as if the local memory has been invalidated, and it must fetch any variables that will be referenced in the block from main memory.

Benefits:

When threads A and B synchronize on the same object, the Java Memory Model (JMM) guarantees that thread B sees the changes made by thread A, and that changes made by thread A inside the synchronized block appear atomically (either the whole block executes or none of it does) to thread B. The JMM also ensures that synchronized blocks that synchronize on the same object appear to execute in the same order as they do in the program.
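
A small sketch of this guarantee, assuming a hypothetical SafeCounter class: several threads increment a shared counter, and because both methods synchronize on the same object, no increment is lost and the final read sees every update.

```java
// Hypothetical demo: a counter whose methods synchronize on the same object.
class SafeCounter {
    private int count = 0;

    // Mutual exclusion plus visibility: only one thread increments at a time.
    synchronized void increment() {
        count++;
    }

    // Reading under the same lock guarantees the latest value is seen.
    synchronized int get() {
        return count;
    }

    // Starts the given number of threads, each incrementing perThread times,
    // waits for all of them and returns the final count.
    static int runWorkers(int threads, int perThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 10000)); // 40000
    }
}
```

Without the synchronized keyword on increment(), the count++ read-modify-write could interleave between threads and the total could come out lower.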

Consequences of failing:

Data corruption and race conditions: they can cause programs to crash, behave unpredictably, or produce incorrect results. (A race condition is a situation in which two or more threads are reading or writing some shared data, and the final result depends on the timing of how the threads are scheduled.) Worse, these conditions tend to occur only rarely and sporadically, which makes the problem hard to detect and reproduce.

Performance:

In tuning an application’s use of synchronization we should try hard to reduce the amount of actual contention, rather than simply trying to avoid synchronization altogether. When threads contend for a lock there will be several thread switches and system calls, which raises the performance penalty substantially. According to one source, a synchronized call to an empty method may be 20 times slower than an unsynchronized call to an empty method.

Volatile?

It is a commonly held belief that, since the JLS guarantees that 32-bit reads will be atomic, you do not need to acquire a lock simply to read an object’s fields. This intuition is incorrect. Unless the fields are declared volatile, the JMM guarantees no cache coherency and no sequential consistency. Note, however, that while the original JMM prevents writes to volatile variables from being reordered with respect to one another and ensures that they are flushed to main memory immediately, it still permits reads and writes of volatile variables to be reordered with respect to nonvolatile reads and writes. The revised memory model in Java 5 (JSR-133) closed this gap: writes made before a volatile write are visible to a thread that subsequently reads that volatile variable.
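
The typical use of volatile is a simple flag that one thread sets and another polls. Here is a minimal sketch with a hypothetical StopFlag class. Without volatile on the field, nothing in the memory model obliges the worker to ever see the update.

```java
// Hypothetical demo: a volatile flag used to stop a polling worker thread.
class StopFlag {
    // volatile: a write by one thread is visible to later reads by other threads.
    private volatile boolean stopped = false;

    void stop() {
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }

    // Starts a worker that spins until stop() is called, then reports whether
    // the worker terminated within the given timeout.
    static boolean workerStopsWithin(long timeoutMillis) {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            while (!flag.isStopped()) {
                Thread.yield(); // stand-in for real work
            }
        });
        worker.start();
        flag.stop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(workerStopsWithin(5000)); // true
    }
}
```

A volatile flag gives visibility only. Compound operations such as count++ still need synchronized or an atomic class.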

Techniques to write thread-safe programs:

Latency and scalability are two factors affecting a program’s performance. Latency describes how long it takes for a given task to complete, and scalability describes how a program’s performance varies under increasing load or given increased computing resources. A high degree of contention is bad for both. When multiple threads contend for the same monitor, the JVM has to maintain a queue of threads waiting for that monitor (and this queue must be synchronized across processors), which means more time spent in the JVM or OS code and less time spent in your program code. Contention also impairs scalability because it forces the scheduler to serialize operations. When one thread is executing a synchronized block, any thread waiting to enter that block is stalled and processors may sit idle.

In short, we must reduce contention for critical resources in order to be able to maintain the scalability of our program.

Idea 1: Do only what needs to be done: Make synchronized blocks as short as possible. Do any thread-safe pre-processing or post-processing outside of the synchronized block.

Idea 2: Lock what needs to be safeguarded: Spread your synchronizations over more locks.
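
The first two ideas can be sketched together. In the hypothetical HitStats class below, two independent counters are guarded by two separate locks, so threads updating one do not block threads updating the other, and the string processing happens before the lock is taken.

```java
// Hypothetical demo: short critical sections and one lock per independent field.
class HitStats {
    private final Object pageLock = new Object();
    private final Object errorLock = new Object();
    private long pageHits = 0;
    private long errors = 0;

    // Idea 1: normalize the input outside the lock; only the update is guarded.
    void recordPage(String url) {
        String normalized = url.trim().toLowerCase(); // works on local data only
        if (normalized.isEmpty()) {
            return;
        }
        synchronized (pageLock) {
            pageHits++;
        }
    }

    // Idea 2: errors have their own lock and never contend with page hits.
    void recordError() {
        synchronized (errorLock) {
            errors++;
        }
    }

    long pageHits() {
        synchronized (pageLock) {
            return pageHits;
        }
    }

    long errors() {
        synchronized (errorLock) {
            return errors;
        }
    }
}
```

Splitting locks like this is only safe when the guarded fields are truly independent. If an invariant spans both counters, they need a common lock.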

Idea 3: Provide a synchronized wrapper: The Collections classes are a good example of this technique; they are unsynchronized, but for each interface defined in the framework, there is a synchronized wrapper (for example, Collections.synchronizedMap()) that wraps each method with a synchronized version.
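
A short sketch of the wrapper in use; the class name SynchronizedWrapperDemo is made up. Individual calls such as put() and get() are synchronized by the wrapper, but iterating over the map still requires holding the wrapper’s lock manually, as the Collections documentation requires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of Collections.synchronizedMap().
class SynchronizedWrapperDemo {

    // Wraps a plain HashMap so that every single method call is synchronized.
    static Map<String, Integer> newSharedMap() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    // Iteration spans many calls, so the caller must lock the wrapper itself.
    static int sum(Map<String, Integer> shared) {
        synchronized (shared) {
            int total = 0;
            for (int value : shared.values()) {
                total += value;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> shared = newSharedMap();
        shared.put("a", 1);
        shared.put("b", 2);
        System.out.println(sum(shared)); // 3
    }
}
```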

Idea 4: Collapse locks: Obtain a broader lock once. In the code below, a broader lock is obtained on the vector object before the loop starts. So, when elementAt() attempts to acquire the lock, the JVM sees that the current thread already holds it and proceeds without contention. This lengthens the synchronized block and so violates Idea 1, but it can be considerably faster, as less time is lost to scheduling overhead.

Vector v;

...

// Take the vector's lock once for the whole loop
synchronized (v) {
    for (int i = 0; i < v.size(); i++) {
        // elementAt() re-enters the lock this thread already holds
        String s = (String) v.elementAt(i);
        ...
    }
}

Idea 5: Give each thread its own copy: Reduce contention by giving each thread its own copy of certain critical objects using ThreadLocal. This bypasses the complexity of determining when to synchronize, and it improves scalability because no synchronization is required: each thread holds a separate copy of the critical object.
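
A common candidate for ThreadLocal is SimpleDateFormat, which is not thread-safe. The sketch below, with a hypothetical PerThreadFormatter class, gives every thread its own formatter instead of sharing one behind a lock. The date pattern and the UTC time zone are arbitrary choices for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo: one SimpleDateFormat per thread, no locking needed.
class PerThreadFormatter {

    // Each thread lazily creates its own formatter on first use.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
                format.setTimeZone(TimeZone.getTimeZone("UTC"));
                return format;
            });

    // Formats with the calling thread's private formatter instance.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // 1970-01-01
    }
}
```

The trade-off is memory: every thread that calls format() keeps its own formatter alive for as long as the thread lives.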


References:

Image courtesy Phoenix Synchronized Swimming

Wikipedia Primer

April 9th, 2009 Animesh No comments

What is Wikipedia? It’s a free online encyclopedia of and about anything one can think of, created by anyone, and edited, modified and controlled by everyone. Simply put, it’s a huge dashboard where anybody can contribute to any article, or create one themselves. Futurist Jaron Lanier called Wikipedia an example of “Digital Maoism”: the closest humanity has ever come to a functioning mob rule.

Why should we care? Because,

  1. Wikipedia’s articles are among the top results of internet searches.
  2. Wikipedia’s standards of inclusion affect the work of journalists.
  3. Studies have found that Wikipedia’s articles are remarkably accurate, despite criticism from academic experts over the very fact that they are written, edited and controlled by large numbers of volunteers.
  4. The Wikipedia community has seen explosive growth.
  5. If the stuff in Wikipedia didn’t seem true enough, you wouldn’t keep going back to the website.
  6. Its more than 7 million registered users have evolved a set of policies and procedures to eradicate untruths.

With all those people and their different perspectives, is truth getting hampered somewhere?

What are those policies and procedures?

  1. Objective truth has no value. (There is no place for objective truth here. No place for anything obvious. Something is true only if it is verifiable, vetted by verifiable publications, newspapers, magazines or journals or anywhere reachable by a mouse click.) The threshold for inclusion in Wikipedia is verifiability, not truth. Wikipedia also defines an order of reliability for cited sources:
    a.    Peer-reviewed journals and books published by university presses
    b.    University-level textbooks
    c.    Books published by respected publishing houses
    d.    Mainstream newspapers (excluding opinion pages)
  2. No original work, and
  3. Neutral point of view.

References:
1. http://www.technologyreview.com/web/21558/?a=f

Categories: Technology

Firefox and Apache

April 4th, 2009 Animesh No comments

Came across this frustrating fact while trying to connect to Tomcat proxied behind Apache through an Ajax connection… and continuously getting “page not found”:

  1. Firefox doesn’t pass the content-length header on POST requests.
  2. Apache, through the mod_security module, has a flag “SecFilterEngine” which, if set to ‘On’, tells Apache to reject requests without a content-length header.

Dilemma:

  1. Shouldn’t Firefox send the content-length header?
  2. Or, why should Apache demand it?

Answers?

Categories: Did you know?