Friday, December 1, 2017

A Month With Amazon Echo Dot.

Amazon Echo was released in India last month and I bought one right away. I had been waiting for any of the voice assistants to launch here; I would personally have preferred the Google Home, as I have been trying to build Google Assistant apps lately. Also, Google has most of my personal data, so it could crunch it and give me the right information at the right time, like Google Now does on phones. I had recently run Google Assistant and Amazon Alexa successfully on a Raspberry Pi 3, but the features there were very limited. The Echo Dot was very cheap, and I had a spare Bose sound dock with a Bluetooth receiver that I could connect to it. Here are my impressions after using it for a month:
  • Voice recognition is amazing (both the wake word "Alexa" and the question that follows). Even from the other corner of the room, when I ask something not very clearly (e.g. in a grumpy voice), it recognizes it. It even worked with my friends, who have a different accent than mine. Great job there, Amazon.
  • I am a big-time radio listener, and it can play and switch to almost any internet radio channel from TuneIn in a second. It's really painful to search and change stations from a Bluetooth-connected phone, and any call or notification on the phone interrupts the stream. I usually play BBC World Service, VOA, 977 Adult Hits, NDTV India, etc.
  • I can also play any song (primarily English and Hindi) from Prime Music as it occurs to me, right from my desk or sofa while working on a laptop or playing with my daughter. When you play from the phone, it can be interrupted by notifications and calls, the phone has to be near me, and I need to locate the music app and search for the song. Also, when you want to immediately pause/stop/lower the volume, it's not straightforward on a phone with a lot of apps running. There is no custom playlist yet that you can make and let Alexa play; it just learns from your patterns and tries to play artists you may like when you don't say which songs to play. I am waiting for the Prime Music app to launch on mobile. I will probably cancel my Google Music subscription (I can't play it on Chromecast) and go completely Amazon.
  • I don't really have any smart lights yet as they are very expensive, but they could be useful. I am planning to buy a TP-Link HS100 Wi-Fi Smart Plug.
  • It can read the news headlines from your choice of sources, i.e. NDTV, TOI, and Cricinfo, when you ask for Flash Briefings, but once you enable the sources it plays all of them one by one, which is very annoying. Why can't I ask for just the source I want? In most cases, the content is the same across sources.
  • I am not able to add my work calendar yet (Microsoft Exchange) as it requires my company's IT admin rights. I don't really use Google Calendar much except for reminders to pay the credit card bill.
  • I am not really interested in asking trivial questions (like "how are you?" and "how old are you?"); they can be funny in a demo but are pretty much useless. Even the weather does not change much in Hyderabad.
  • I am a podcast listener as well, but the AnyPod skill is not really easy to manage and it can't pause an episode.
  • Once in a while, I have asked for cricket scores during the current Ashes and India–Sri Lanka series.
  • I have set reminders, and they are more useful than a smartphone's, as it just shouts out loud even if you don't get to check your phone.
  • I have not found a good skill to ask about my investments in the stock market (app idea?).
  • The Uber skill is buggy, as it goes into a loop, and I am yet to try Ola. I hope they fix the skill soon.
  • My most wanted feature is the ability to make calls, but then again, I am not sure if I would ever call someone with the speakerphone on.

As you can see, I am mostly using it for radio and music; however, I am looking for ideas to build skills myself that can make things easier for me. Feel free to share ideas in the comments :)

Sunday, November 19, 2017

GDG Devfest Hyderabad 2017 - Building a Google Assistant App

Growing up, we read stories where we talk to robots and get things done. There has been a decent amount of advancement in robotics to help us in the manufacturing industry and space exploration. However, it is still not something one can easily own and command by voice. Voice-based interaction and computing is a new and exciting area. With so much investment in Amazon's Alexa, Google's Assistant, Apple's Siri, Microsoft's Cortana, etc., it's definitely going to be a serious business in the near future. Right now most of us are just having fun with knowledge-based questions and puzzles/riddles. These platforms also allow us to create our own app and make it accessible to the whole world to interact with, using just the user's voice.

Lately, I have been trying to build such apps with Google Voice Actions and got a chance to demo one at GDG Devfest Hyderabad this year. Here is the slide; however, it was mostly a live demo.

I tried to create a Google Assistant app that responds with the stock price of any company registered on the NSE.

Creating an Agent and Intent

  1. Head over to the Dialogflow console to create your own agent. Name it "Market-Guru". Note that every agent is an app once integrated with Google Assistant.
  2. In the Intents section, you can change the default and fallback intents to customize what it should respond when you enter the app (e.g. "Welcome to Market Guru") and when it does not recognize any intent (e.g. "I can't tell that; you can try asking for the stock price of a company.").
  3.  Create a new Intent called "Stock-Teller". This is something that should be invoked when you ask the app a certain question. 
  4. You can enter this question and its variants in the "User says" section of the intent.
  5. In the "Text Response" section at the bottom, you can mention what it should answer when you ask the question.
  6. Once you save the question, you will get a toast saying "Agent training started". This is when Dialogflow trains its model using the sample questions you provided for the intent and its own language knowledge.
  7. You can test your agent right now by typing questions in the top right section, e.g. "What is the price of Infosys limited?", and you should get the hardcoded response you typed in earlier. However, if you ask something like "How are you?", it does not match the questions in the "Stock-Teller" intent or any other defined intent, so it falls back to the fallback intent and responds with the generic answer you may have hardcoded there.
  8. You can also ask a variant of the question, like "What is the share price of Infosys Limited?", and it should already be smart enough to match the "Stock-Teller" intent, although this exact question is not listed.
  9. Now you may want to recognize the company name spoken in the question. Click on the phrase "Infosys Limited", create a parameter, and name it company_name. The entity of the parameter should be @sys.any because it could be any text; right now we don't have an exhaustive list of company names.
  10. You can change the response from a hardcoded string to "I don't know $company_name price yet." and test again. It should respond with the recognized company name.
  11. Try adding more variants of the question and tag the company name in each for better accuracy, e.g. "How much does Infosys cost?". This is how the agent learns to extract keywords/parameters from questions.
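To make the pieces above concrete, here is roughly what a matched query resolves to in Dialogflow's v1 response JSON once company_name is tagged. The exact keys can differ by console and API version, so treat this as an illustrative sketch:

```json
{
  "result": {
    "resolvedQuery": "what is the price of Infosys Limited",
    "metadata": { "intentName": "Stock-Teller" },
    "parameters": { "company_name": "Infosys Limited" },
    "fulfillment": { "speech": "I don't know Infosys Limited price yet." }
  }
}
```

The "parameters" object is what your webhook will later read to look up the actual price.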


So far we have just hardcoded the response. To fetch the required data dynamically, you can make the agent talk to a web service.
  1. Since Dialogflow calls the web service and expects the response in a specific format, you have to create an endpoint to plug in.
  2. You can create a cloud function in the Google Cloud Platform Console and use the following Node.js code to return the stock price given the company name. It calls another web service to fetch the stock price and formats the result in a way Dialogflow can understand.
  3. In the Fulfillment section, add the generated endpoint URL.
  4. After saving, go to the "Stock-Teller" intent and check the "Use webhook" box at the bottom, under the Fulfillment section. This makes it call the webhook URL instead of responding with the hardcoded response.
  5. Now you can test again; if the URL call is successful, it should return the stock price, and you can press the play button against the response to hear it read aloud. If the URL call fails, it falls back to the default hardcoded response.
  6. If something goes wrong, check the JSON request/response on the bottom right and verify whether the correct parameters were sent and whether the webhook response code is HTTP 200. If the webhook execution fails, check the cloud function log for problems in the Node.js code.

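When debugging the webhook, note that a successful call must return HTTP 200 with a body in the shape Dialogflow understands; anything else triggers the fallback to the hardcoded text. Assuming the v1 format, a valid body looks like:

```json
{
  "speech": "The current price of Infosys Limited is 968.50 rupees.",
  "displayText": "The current price of Infosys Limited is 968.50 rupees."
}
```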

Finally, we want to make it available in Google Assistant.
  1. In the Integrations tab, enable "Google Assistant" so that it calls the Dialogflow API to answer your questions (Google Assistant converts speech to text and sends the query to Dialogflow in text format). Note that this screen offers a lot more options, like Skype and Slack, to create a chatbot; however, we need a voice conversation.
  2. Once you click the Google Assistant integration, click "Test" to make your app available for simulation and then "View" to head over to the Google Voice Actions simulator. You can also select the agent you created at the top.
  3. In the simulator, you can type or speak using the mic button to say "Talk to my test app", and it should pull up your recently created agent from Dialogflow.
  4. First, it will greet you with the default welcome response you configured.
  5. Since your app is already up for simulation, you can now try this live from any of your devices (your phone's Google Assistant, a Google Home, or a Raspberry Pi) signed in with the same Google account that you used to create your test app.
  6. You can also just import the whole agent, which I have exported, to set everything up at once instead of creating it from scratch in the console. You may have to adjust the webhook URL to make sure it points to the current one. I may take down the demo cloud function to avoid costs.
  7. Everything you do within the Dialogflow console can also be done programmatically using the Dialogflow APIs.


If at any time during testing the intents and parameters are not recognized properly, you can head over to the Training tab and correct the behaviour for any bad executions in the history. This re-trains the machine learning model, helping with future conversations.

Other Possibilities

Here are some more things you can do to improve the UX:
  1. Make the parameters in the "Action" section of the intent mandatory so the app asks the user back if they give incomplete information in the question.
  2. You can create a follow-up intent for any intent to confirm or get the user's choice before executing what they ask.
  3. For follow-up questions, you can use parameters from the existing intent's context, i.e. #context_name.param_name retrieves an earlier parameter, so the user does not have to repeat the whole question just because they missed a specific piece of information.
  4. You can set an output context and take decisions based on the intent name from within the webhook programmatically.
  5. You can optionally send rich output to your phone's Google Assistant, which can include images, cards, etc.
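For the last point, a webhook can attach Assistant-specific rich content under a `data.google` key in its response. The sketch below follows the Dialogflow v1 / Actions on Google rich response format as I understand it; the card text and values are placeholders:

```json
{
  "speech": "Infosys is trading at 968.50 rupees.",
  "data": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          { "simpleResponse": { "textToSpeech": "Infosys is trading at 968.50 rupees." } },
          {
            "basicCard": {
              "title": "Infosys Limited",
              "subtitle": "NSE: INFY",
              "formattedText": "Last traded price: 968.50 INR"
            }
          }
        ]
      }
    }
  }
}
```

On a voice-only device the simple response is spoken; on a phone the card is rendered as well.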

Saturday, May 6, 2017

The curious case of W & V in Indian English

You have to agree that we Indians are among the best non-native speakers of the English language. In my opinion, the measure is the probability that the other party can understand you, irrespective of where they are from and whether they are a native speaker. For example, the Indian accent stresses most of the syllables, which makes words easier to understand, although it is sometimes very different from how native speakers (British/American) would say them. In this era, English is definitely not just the way the English (people from England?) speak. I think the way Indians speak is far more understandable than the way people speak in most European countries minus Britain; after Brexit I can just say most countries in the European Union. Probably that is the reason why we have a very good relationship with the USA from an IT/software industry perspective.

Having said that, within India itself there are so many ways people speak English, depending on their first language. I am sure this is the case in the USA and Britain as well, given that the southern states of the USA speak a bit differently than the north-eastern ones. Here in India, most of the time you can very accurately tell where a person is from when you hear them speaking English. For example, the same word "bus" is spoken by a person from Bengal as "baas", but an Odia person would say "boss". Probably the correct pronunciation is somewhere between the two. The same goes for the words "bug", "cup", etc. The sound of "u" used in the word "bus" does not exist in either of the two languages, so speakers just map it to one of the closest vowels.

If you go to western India, like Gujarat, you may hear the word "snack" as "snake" and "wrap" as "rape".

South India (for me, the state of Andhra Pradesh and southwards) is a totally different ball game. I tend to generalise because it all sounds the same to me; I am sure the same is true from the perspective of a person from the south about the north Indian states. If you are an Indian, it's probably a piece of cake for you to distinguish people from south India by their English accent, sometimes from the tone itself. Speaking of specific words, you may hear "fixed" as "fix-ed", "against" as "egg-nest", and "environment" as "en-vee-ron-ment". These differences become more prominent the further south you go in India. The way words starting with "H" are pronounced also has a pattern, i.e. "honest" is spoken as "hon-est", not "on-est". Same for "honour": people may say "hon-or", not "on-or". Putting all these deviations of English pronunciation aside, I must say people in the south are better at English than in other states, because of their lack of familiarity with Hindi; English is the only fallback language if you are talking to someone from another state. If I can take the liberty of stereotyping, they learn English well because the ultimate goal is to settle in the USA :)

Now let's talk about the north!

Most of the northern states speak Hindi. When you hear English there, you may observe the letters "W" and "V" used interchangeably, in fact swapped most of the time. "Venus" is spoken as "Winus" and "wicket" is spoken as "Viket". In fact, I have friends who write "ve" instead of "we" in chat; note it is not a shorthand, because both are two-letter words. It's just how they think it should sound, I guess :)

It's very hard for a native Hindi speaker to realise this mistake, as no one really teaches the specifics of pronouncing the letter "V", i.e. the upper teeth should touch the lower lip, not just "we". I think the culprit is the Hindi letter "व" (wo), which people end up mapping the letter V to. However, in Bengal and Odisha, people map it to the closest sound (ভ in Bangla and ଭ in Odia) because there is no equivalent of the Hindi "व". So some people may end up saying "Bhinus" for "Venus" and "Bherification" for "verification".

The exclusivity of some sounds to a specific language plays an important role when we pronounce names. We can do literal translations of words into our own language, but we can't do the same for names. One has to pronounce a name as it was given to the person in their language, because that is the point of giving a name; it does not have to have a meaning, it is just to identify and address people. Some people just say a name the way it is written in English, but I believe that's not a good idea. A name is written in English in a particular way because there is usually no sound in the English alphabet to depict the sounds that constitute the name. A name is spoken first and then written, so no matter how it is written in English letters, one must pronounce it the way the person himself does. The sound takes priority. For example, if the name is written as "Jose", I can't say "Jo-se" just because I am speaking in English; it must not be translated, it has to be kept the same as the original ("Ho-se") no matter what language you are speaking.

Ambiguity can also arise because the same English letter can have two sounds, e.g. if you see the name "Serge", it could be "ser-ge" or "ser-je"; I have also heard "surge".

Well, I can't blame any of them; if you don't have an equivalent sound for a letter from a foreign language and are not taught how to make that sound, there is no option other than mapping it to the closest sound from your mother tongue. However, some people can adapt very quickly and speak with the foreign sound, while some just can't, which gives rise to stereotypes. Hey, that also gives a lot of content for humour :)

While we can make fun of each other, we have to understand that the variations will exist no matter what. They exist between people from Europe and South Asia as well. India is such a diverse country that the variation exists even within a state for the same regional language. The only goal should be to speak in such a way that the other person can understand. You could be travelling or seeking a job, and there could be a very small window in which you can express yourself without putting a lot of cognitive load on the other person. To take full advantage of that, it's never too late to step back and look at the words again to see how they could be spoken in a more neutral way, without any accent. Again, it does not have to be phony to impress someone or mimic a native speaker from America or Britain; it just has to be standard enough to be understood by the other person without having to repeat yourself.

Here are some videos, books and DVDs from Dr. Ranbir Sinha; give them a try if you think the way you speak needs improvement.

  • Modern Spoken English for Science Students Textbook + Timed Audio CD

UPDATE: I also forgot to mention that it could be a little annoying and difficult to understand (not just the accent, but also if your colleague is unable to form a simple sentence to convey their thought) if you are not speaking words the way most fluent speakers do. This does not just apply to Indian English speakers, but also to the French and Germans. Probably the key is to take your time, speak less and slowly, and make sure the other party understands. Meetings in engineering jobs can become unproductive if ideas and solutions cannot be explained properly. There is another book Dr. Sinha has written that I would recommend for scientists and engineers. There are fancier and costlier English-speaking courses in the market, but this one is based directly on personal and professional industry experience, not only academics. Give it a try.

Sunday, January 1, 2017

New Year Resolution

I never understood what the deal is with New Year. It's just the first day of the Gregorian calendar. Why not other calendars, like the lunar calendar (I'm not sure what its first day is) or the regional calendars (which start somewhere in April for the Bangla and Odia calendars)? Call me asocial, but in my opinion the financial calendar (April 1st to March 31st) matters the most, as a lot of day-to-day work and planning depends on it. Well, the very fact that I am using Gregorian dates and months to define the other calendars tells you it is what everyone is most familiar with, even in the eastern world. However, 31st Dec and Jan 1 are still, I think, not very important as days, except that some coupons and offers expire. Anyway, if someone wants to party and make a resolution (and stick to it) with New Year as the reason, I have absolutely no problem. It also so happens that people who have not been in touch with their friends and distant family for a while find a reason to do so.

Well, this is turning out to be an abstract post. If this inspires me to make a new post on a neglected personal blog, so be it :) Let that be my new year resolution. Haha.. I don't really believe in picking a specific date for starting a habit; I think it should be started as soon as possible. Here is a list of things I want to start doing; I think it applies to all.
  • Blog frequently, understanding that it's a public thing, so everything personal can't be written (maybe private posts?). I think pointless discussion and small talk kill a lot of time unnecessarily.
  • Use a task management tool to make a list instead of cluttering the mind with all the TODO things.
  • When bothered by a lot of short-term and long-term tasks, start doing them in scrum style (at any point in time you either have an action item or a blocker, no other excuse) instead of just thinking about them.
  • Write more code.
  • Always take notes with pen and paper / Google Notes in meetings so that I don't have to interrupt others while they are speaking out of fear of forgetting my points. It's important for healthy discussion; I try to do my best, but I think if a single person becomes impatient, the meeting descends into chaos.
  • Volunteer to send MOMs (minutes of meeting) so that people can agree or disagree with something. If people ignore it, it means my understanding is correct.