Internationalization: Automating localized UI using Selenium WebDriver

In my last company, we had a web app that supported seven languages. The challenge was to develop automated UI tests that could run against all of the localized UIs. We used Selenium WebDriver with its Java bindings to simulate browser interactions with the web app. At the time we had already automated a significant portion of the app for the English UI. All test development was done on Windows in the Eclipse IDE. Eclipse wouldn't render native characters from most of the languages; even after I played around with changing the encoding to UTF-8 and so on, the text appeared garbled. (My more recent experience on OS X is that native text appears correctly in Eclipse, which perhaps has to do with the default encoding on OS X.)

So my research began and I started looking into the internals of WebDriver and how it worked. I learned that Selenium WebDriver uses the JSON Wire Protocol to communicate with the browser, and that the protocol supports UTF-8. A quick note from the JSON Wire Protocol documentation: "Although the server may be extended to respond to other content-types, the wire protocol dictates that all commands accept a content-type of application/json;charset=UTF-8. Likewise, the message bodies for POST and PUT request must use an application/json;charset=UTF-8 content-type." That tells me that the underlying WebDriver implementation for any browser will encode the request body as UTF-8 when communicating with the browser. However, I still had to deal with Eclipse errors and garbled characters. So my next step was to find a tool that could convert native characters into something both Eclipse and WebDriver could handle. The JDK provides an excellent utility called native2ascii, which converts native characters into their ASCII representation as \uXXXX Unicode escapes. Since this was a one-time (or occasional) process, I decided to leave this step manual. However, there is a native2ascii Maven plugin, and an API, which could be used if you want to automate this step; you can also specify the desired source encoding. So let's take the example below (if you are developing on OS X, this step is perhaps not required):

source.txt
ラドクリフ、マラソン五輪代表に1万m出場にも含み

command
native2ascii -encoding utf8 source.txt output.txt

output.txt
\u30e9\u30c9\u30af\u30ea\u30d5\u3001\u30de\u30e9\u30bd\u30f3\u4e94\u8f2a
\u4ee3\u8868\u306b1\u4e07m\u51fa\u5834\u306b\u3082\u542b\u307f
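If you ever want to automate this conversion instead of running native2ascii by hand, the transformation is simple to reproduce. The sketch below is mine, not part of any library: it escapes every non-ASCII character as a backslash-u sequence, mirroring native2ascii's output for the common case of characters in the Basic Multilingual Plane.

```java
public class NativeToAscii {
    // Escape every non-ASCII character as a Unicode escape (a backslash,
    // the letter u, and four lowercase hex digits), like native2ascii does.
    static String escapeNonAscii(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("エラー"));
    }
}
```

Running this prints `\u30a8\u30e9\u30fc`, the same escapes native2ascii produces for エラー.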

Try this quick sample (the string literal is joined with + because a Java string literal cannot span multiple lines):

WebDriver driver = new FirefoxDriver();
driver.get("http://www.google.com.hk/");
String textToSend = "\u30e9\u30c9\u30af\u30ea\u30d5\u3001\u30de\u30e9\u30bd\u30f3\u4e94\u8f2a"
        + "\u4ee3\u8868\u306b1\u4e07m\u51fa\u5834\u306b\u3082\u542b\u307f";
WebElement searchBox = driver.findElement(By.name("q"));
searchBox.sendKeys(textToSend);

You will notice that the search box actually gets populated with ラドクリフ、マラソン五輪代表に1万m出場にも含み. Nice! Now I had a way to easily transform the text into something both Selenium WebDriver and my IDE could understand. This was promising indeed. So I decided to externalize the text I was verifying into properties files (key-value pairs), one per locale. Obviously the keys would remain the same across locales; only the values would differ. Based on the locale under test, I would load the appropriate properties file and the automation would run seamlessly.

I was happy with this approach and went ahead with it. Our application contained many pages, and two weeks later I realized that my properties files were growing big. I had seven properties files for seven languages to maintain, and things were getting out of control. I wanted to retrospect on my approach and see if I could change course quickly before it was too late. Agile taught me well! I posted questions on forums but didn't get any convincing answers.

I went to our front-end developers to understand the actual implementation of the localized UIs and learned that the backend implementation was independent of locale. They received locale files from external vendors who translated the text into the desired languages. These locale files were essentially key/value pairs, similar to my earlier approach: the keys were the same across locale files, but the values changed per language. All HTML files for the individual locales were pre-compiled as part of the build; based on the user's language preference at the login screen, the corresponding pre-compiled resources would be loaded, while the backend remained the same. On some pages, JavaScript would return dynamic, locale-specific text based on user actions (success/error messages and so on).

I then realized that I had been reinventing the wheel and could use these resource bundles from the vendors directly. After further thought, I decided to take the vendor locale files as-is, parse them as Properties files, and read them in my code. Before parsing them, I converted all resource bundles from native characters to ASCII (Unicode escapes) so that Eclipse wouldn't complain and WebDriver could understand them as well, just as I showed above:

native2ascii -encoding utf8 japanese_native.txt japanese_ascii.txt
native2ascii -encoding utf8 chinese_native.txt chinese_ascii.txt
and so on....

To put things into perspective, below is what we would get from the vendors. The columns indicate the locale files, and the two rows show sample content from those files.

english.txt              japanese_native.txt        chinese_native.txt
SUBMIT_LABLE=Submit      SUBMIT_LABLE=提出する       SUBMIT_LABLE=提交
ERROR=Error              ERROR=エラー                ERROR=錯誤

The native2ascii tool easily converted these native source files into ASCII files, which were then fed into my automation suite. The converted files would look like this:

english.txt              japanese_ascii.txt                         chinese_ascii.txt
SUBMIT_LABLE=Submit      SUBMIT_LABLE=\u63d0\u51fa\u3059\u308b      SUBMIT_LABLE=\u63d0\u4ea4
ERROR=Error              ERROR=\u30a8\u30e9\u30fc                   ERROR=\u932f\u8aa4
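It is worth noting why this round-trips cleanly: java.util.Properties decodes Unicode escapes back into native characters at load time. A minimal self-contained check (using a StringReader in place of the actual japanese_ascii.txt file):

```java
import java.io.StringReader;
import java.util.Properties;

public class PropertiesEscapeDemo {
    public static void main(String[] args) throws Exception {
        // One line of japanese_ascii.txt, exactly as native2ascii produced it.
        String asciiLine = "ERROR=\\u30a8\\u30e9\\u30fc";
        Properties props = new Properties();
        props.load(new StringReader(asciiLine));
        // Properties.load turns the escapes back into native text.
        System.out.println(props.getProperty("ERROR")); // prints エラー
    }
}
```

This is why the escaped files can be compared directly against what WebDriver reads from the localized UI: by the time the value leaves Properties, it is the native string again.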

And here is some pseudo-code to give you an idea. Based on the preferred language, I load the appropriate locale file:

Properties properties = new Properties();
if (language.equalsIgnoreCase("japanese")) {
    properties.load(new FileInputStream("japanese_ascii.txt"));
} else if (language.equalsIgnoreCase("spanish")) {
    properties.load(new FileInputStream("spanish_ascii.txt"));
}
// and so on...
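Since all of the converted files follow the same naming convention, the if/else chain can also be collapsed into a single lookup. Here is a small sketch; the class and method names are mine, and the naming convention simply matches the example files above:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class LocaleLoader {
    // Derive the locale file name from the language under test,
    // e.g. "Japanese" -> "japanese_ascii.txt".
    static String localeFileName(String language) {
        return language.toLowerCase() + "_ascii.txt";
    }

    // Load the key/value pairs for the given language.
    static Properties loadLocale(String language) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(localeFileName(language))) {
            props.load(in);
        }
        return props;
    }
}
```

Adding an eighth language then means dropping in one more converted file, with no code change.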

My page objects would then contain something like the following. So, just like the application code, my automation suite only depended on loading the correct file; the page objects themselves were independent of the locale.

String key = properties.getProperty("SUBMIT_LABLE");
submitButton.getAttribute("value").equals(key);

Benefits of this approach

  • No need to create and maintain our own locale files.
  • UI functional automated tests ran on all locales seamlessly.
  • Straightforward and less complicated: UI automation is difficult in itself; don't complicate it further.
  • The automation suite found many locale-specific bugs:
    • Some languages were missing a few keys after translation due to obvious human error. Such errors caused the default English text to appear where locale-specific text was expected.
    • Integration bugs with JavaScript were detected, such as in dynamic success/error messages.

The approach worked great and provided significant value from a testing standpoint. The lesson I learned from this experience is that we need to retrospect constantly and improve our testing practices. Fail fast so that you can recover quickly. By working closely with developers and product stakeholders, I was able to understand the internals of our web application, and that led me to the answers I was looking for. UI automation is a hard problem to solve, not to mention the ever-changing, ever-evolving web app and the need for testing on different browsers, platforms, and so on. I believe that complicating browser automation suites further by adding more layers to support testing different locales is just overhead and destined to fail. In my case the web app used resource bundles; other apps may have different implementations. For testing localized UIs, we should leverage the application's own internationalization solution as much as possible.

What do you think? How would you approach this?