Saturday, September 24, 2016

Wrong calls and missed calls in Atlanta

I used to receive many misdirected SMS messages on my mobile, intended for Keith. After talking to the senders, I realized my number had previously been used by someone called Keith. It seems that in the US (or at least in GA), mobile numbers are recycled. That means that when I leave, my number will be given to someone else too. I often got missed calls as well. Once Keith's mates learned what had happened, they stopped messaging or calling my mobile. But the worse part is yet to come.

Then come the marketing calls to my landline! First they wanted to sell me security services and auto insurance. In many cases, they are just bots calling you. Some are in fact scammers. One claimed that they had found my auto insurance was faulty (I do not even drive!). Another claimed something to do with the US tax authority. If you answer them patiently, they transfer your call to a human. Those humans are far worse than the bots.

The conversation goes like: "Hello, this is Cathy, how are you doing today?". If you say something like "Hi" or "Hello", the bot proceeds to the next level, straight into its marketing/sales pitch. If you say something the bot cannot process, it disconnects. The sad part was that the bots called me more often than real people did. :D

One thing I learned is that it is easy to receive calls from these bots. But when you really want to reach a hospital billing service, it is a nightmare. I made the mistake of giving Piedmont Urgent Care, by WellStreet, access to two of my debit cards.

I initially gave them my debit card. But later, the insurance company paid only 25% or so of the hospital's original bill, leaving me with a large bill. (I learned that health care in the USA is very expensive. You had better not get sick. If you do get sick, you had better avoid visiting the doctor if at all possible. On top of that, my insurance is a joke, unlike the Swedish one I had in Europe. I should have got a real, working insurance.) So I visited them again and asked them to charge my HSA savings card instead of the debit card I had previously authorized. They accepted my request. What happened next was funny! Although I had requested the change well in time, they charged my debit card, and then they also authorized a payment of the same amount from the second card (HSA), with a pending payment scheduled!

I learned it the hard way. Never give a vendor access to multiple cards. With a single card, the usual security measures mostly apply: a vendor cannot charge you twice on the same card, as the bank will usually decline the duplicate transaction. But when you give them access to multiple cards, they can of course charge both, since you are giving away this protection offered by the bank.

I called the billing service to sort this out and make sure they were not charging me twice. First, the bot informed me that I must call them Mon - Fri, 9 - 5. So I called them during those hours. It starts with this long, useless message: "Hi there, Welcome to the WellStreet Urgent Care Services. As our valued customer, your time is very important to us. Listen carefully as most of our options have changed lately. We will be happy to help you.. bla bla bla.. To continue in English, press 1"

Then eventually, once you have done all the shitty things that moronic phone bot expects of you, you get the message that all their customer representatives are busy and cannot take your call. The bot asks you to leave your full name, phone number, and details so that someone will get back to you within 24 hours. No one did. Worse, when I somehow managed to get hold of a human after hours of attempts, she disconnected after I mentioned the situation! Probably an honest mistake; she may have accidentally dropped the call, though I have my suspicions. Eventually, after days of frustrating calls, which made me realize that no customer service exists, I sent an email to the billing department and public relations. They immediately cancelled the duplicate payment with an apology.

In the end, email worked better than the phone. It seems they hire uneducated and untrained staff for their customer care and billing hotlines. Sometimes these employees are not much better than the bots. Hence my conclusion: here, it is easy to get a call from a useless bot, but when you really need to get something done, it is impossible to get hold of a responsible human.

When I was in Sri Lanka, it was easy to assume that developed countries handle these issues better. When I was in Portugal, it was equally easy to assume that the situation would be better in an English-speaking country. So here I am, in the USA, a developed and English-speaking country. Not much has changed. Bad customer service is everywhere, whether you are in a developed English-speaking nation or a developing alien-language-speaking country.

I should of course give credit to the awesome customer service offered by many other organizations here. For example, dealing with GA Power for electricity was always smooth. They have a working website, and the most helpful social media team ever, willing to go beyond their duty to assist you. So in the end, it is all about the teams, not the country. Hoping for better.

Friday, September 23, 2016

Winning the customers' trust back

A $100 bonus from Wells Fargo. Any takers?
It has been quite noisy here with the Wells Fargo scandal, with many asking the CEO to step down. If you are unaware of the news: Wells Fargo is a bank in the US that cheated its customers and stakeholders by creating fake accounts and credit cards. They met their business targets at the expense of their customers. In today's world, banks already earn a lot just by holding their customers' assets. Customers trust them. Wells Fargo broke that trust. It will take time to recover, and they need a strategy. The CEO should of course go, for having overlooked such a mass-scale scam.

On the other hand, I received the above letter. I have lived here in the US for 6 months now, and this is the first time I have received a letter from Wells Fargo. I am not sure where they got my contact details. But I assure you, this is not the time to try to acquire more customers. I am especially doubly suspicious of their $100 bonus: too good to be true, from an apparently bogus bank. Please rebuild the trust of your existing customers first, before trying to get new ones. I am definitely not going to sign up, even if you lure me with a $10,000 bonus!

Query disjoint databases in parallel and combine, compose, and return the output using Drill

Instead of using MongoDB as a single or clustered data store, we may partition the data across independent, remotely hosted MongoDB instances. Then we may use Drill's UNION ALL operator to combine the results.

Why do we need to do this? 
i) Because the data may already be partitioned across different sources.
ii) Using domain knowledge, we may do a better job of partitioning the data.
iii) Even with naive partitioning, Drill scales and performs well.
iv) There are some interesting research questions on leveraging data locality to provide better and faster outputs than a clustered or distributed Mongo deployment.

Be warned that Drill has limitations with certain data structures that may hurt performance, for example, nested complex schemas such as multi-dimensional arrays. We have previously discussed a workaround for this.
In this post, we will see the simplest example of achieving this.

1. Define the Mongo Storage Plugins
For each Mongo server, define a separate storage plugin in Drill.

Multiple definitions of the Mongo storage plugin, pointing to various Mongo deployments

For example, mongo3 above is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://184.72.102.246:27017/",
  "enabled": true
}
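Defining each plugin by hand in the web UI works, but with many deployments the same JSON can be generated and submitted from a script. The sketch below assumes Drill's REST endpoint `POST /storage/{name}.json` on the default web port 8047 (check the Drill REST API reference for your version). Only mongo3's address comes from this post; the `mongo` and `mongo2` URIs and the helper names are hypothetical placeholders.

```python
import json
import urllib.request

# mongo3's URI is from this post; the other two are hypothetical placeholders.
MONGO_DEPLOYMENTS = {
    "mongo": "mongodb://localhost:27017/",
    "mongo2": "mongodb://mongo2.example.org:27017/",
    "mongo3": "mongodb://184.72.102.246:27017/",
}


def plugin_payload(name: str, connection: str) -> dict:
    """Build the request body: the plugin name plus the same config as above."""
    return {
        "name": name,
        "config": {"type": "mongo", "connection": connection, "enabled": True},
    }


def register_plugin(name: str, connection: str,
                    drill_url: str = "http://localhost:8047") -> int:
    """POST one plugin definition to Drill's REST API (assumed endpoint); return the HTTP status."""
    body = json.dumps(plugin_payload(name, connection)).encode("utf-8")
    request = urllib.request.Request(
        f"{drill_url}/storage/{name}.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `register_plugin(name, uri)` for each entry of `MONGO_DEPLOYMENTS` would create or update the three plugins against a running Drill; the payload builder is kept separate so the JSON can be inspected without one.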

2. Now query through the query browser:
Querying the multiple Mongo deployments and combining the results with UNION ALL.



select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo



Now you may execute this and get the results. Depending on the nature of the query and the partitioning and scale of the data, you may see performance benefits from the data partitioning. How to actually partition the data across the MongoDB deployments, with related items co-located in a single partition, is a research question, and probably deserves another post.
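As the number of partitions grows, writing the UNION ALL by hand gets tedious. A minimal sketch that generates the query above from (plugin, column) pairs; the helper name and its default table `employee.empinfo` are assumptions for this example, not part of Drill. Note that UNION ALL keeps duplicate rows coming from different partitions; in standard SQL, plain UNION would also remove duplicates, at extra cost.

```python
from typing import Iterable, Tuple


def union_all_query(sources: Iterable[Tuple[str, str]],
                    table: str = "employee.empinfo",
                    alias: str = "id") -> str:
    """Build one query reading the same table from several storage plugins.

    Every branch projects its column under the same alias so that the
    UNION ALL branches line up.
    """
    branches = [f"select {column} as {alias} from {plugin}.{table}"
                for plugin, column in sources]
    if not branches:
        raise ValueError("need at least one (plugin, column) pair")
    return "\nunion all\n".join(branches)


query = union_all_query([("mongo", "last_name"),
                         ("mongo2", "first_name"),
                         ("mongo3", "first_name")])
print(query)
```

The printed text is the same three-branch query shown above, ready to paste into the query browser.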

Monday, September 19, 2016

Drill Integration into Bindaas

Apache Drill has been integrated into Emory BMI's Bindaas Data Server as a data source provider. The screencast below shows the basic usage of the Drill provider. Please note that the Drill provider is currently experimental and only available in the maven-restructure and maven-restructure-dev branches.

While maven-restructure remains a stable branch following a major restructuring of Bindaas to improve its usability for developers, maven-restructure-dev is a branch built on top of maven-restructure. These branches are mostly in sync, although minor later developments may only be available in maven-restructure-dev until the merge.

These later developments will eventually be merged into the master branch. The released versions of Bindaas can be found here.


If your Drill instance is configured with JPam for authentication, the operating system user also functions as the Drill user, as defined in your Drill instance's configuration.

As the Drill provider is based on the JDBC driver, its Drill JDBC URL has the same form as other JDBC URLs. However, the username and password are optional for the Drill provider. If your Drill instance is not configured with JPam, leave the username and password entries blank when you define the data source in the Data Provider Creation step shown in the screencast above.

An example would be
jdbc:drill:drillbit=localhost:31010 for a stand-alone Drill instance.
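The two common forms of the Drill JDBC URL can be assembled as below. This is a sketch: the direct form matches the example above, while the ZooKeeper form assumes Drill's default ZooKeeper root `drill` and cluster ID `drillbits1`; change them if your `drill-override.conf` sets different values. The helper names are hypothetical.

```python
from typing import Sequence

DEFAULT_USER_PORT = 31010  # Drill's default client (user) port, as in the example above


def direct_url(host: str = "localhost", port: int = DEFAULT_USER_PORT) -> str:
    """Connect straight to one drillbit, e.g. a stand-alone instance."""
    return f"jdbc:drill:drillbit={host}:{port}"


def zookeeper_url(quorum: Sequence[str], zk_root: str = "drill",
                  cluster_id: str = "drillbits1") -> str:
    """Let ZooKeeper pick a drillbit registered under zk_root/cluster_id (defaults assumed)."""
    if not quorum:
        raise ValueError("need at least one ZooKeeper host:port")
    return f"jdbc:drill:zk={','.join(quorum)}/{zk_root}/{cluster_id}"
```

For a stand-alone instance, `direct_url()` yields the URL above; with JPam disabled, that URL plus blank username and password is all the Drill provider needs.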

P.S.: This screencast was captured using gtk-recordmydesktop. It works well on my Ubuntu 16.04. I highly recommend it for your screencasts.

Friday, September 16, 2016

Messing with the data schema to make it work with Drill (without using Drill's additional functions)

I must warn that this is not practical: you may not have the access or capacity to modify the schema of the data you want to query in the first place. Unless the data is taken as a dump and queried a million times, there is no performance benefit in the attempt below. But for research purposes, why not? :)

I had a multi-dimensional array in my data that was impossible to query with Drill due to its complex schema. Before readers point out that it is indeed possible to query complex arrays: what I mean is that it is impossible to query them with the same level of performance, as we need to use the flatten function, which ruins both the performance and the output format. On the other hand, knowing the exact array indices in advance is impractical too.

Here is the multi-dimensional array that blocked me:
"coordinates":[[ [.. , ..] , [.. , ..] , [.. , ..] ]]

Error message was:
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST

Fragment 0:0

[Error Id: 5cc520ff-9594-4b9b-998d-20bf8569981b on llovizna:31010] (state=,code=0)


I had to transform the above into the structure below to make it work without any Drill functions such as flatten.

"coordinates" : [ { "x" : .., "y" : .. }, { "x" : .., "y" : .. }, { "x" : .., "y" : .. } ]

0: jdbc:drill:zk=local> create table dfs.tmp.camic as select * from dfs.`/home/pradeeban/programs/apache-drill-1.6.0/head2.json`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1                          |
+-----------+----------------------------+
1 row selected (1,392 seconds)



Essentially, what I did was convert the multi-dimensional array into an array of maps.

0: jdbc:drill:zk=local> select * from dfs.tmp.camic;
+-----+------+-----------+---------+---------------+-------------+---+---+------------+------+----------+-----------+------------+------------+-------------+
| _id | type | parent_id | randval | creation_date | object_type | x | y | normalized | bbox | geometry | footprint | properties | provenance | submit_date |
+-----+------+-----------+---------+---------------+-------------+---+---+------------+------+----------+-----------+------------+------------+-------------+
| {"$oid":"56a784647b7b51c562"} | Feature | self | 0.3712421875 | 2026-11-16 01:17:13.101 | nucleus | 0.049646965 | 0.435353796 | true | [0.042729646965,0.85353796,0.8105608,0.7145562075] | {"type":"Polygon","coordinates":[{"x":0.04795445442,"y":0.87641187789917},{"x":0.0427805229,"y":0.87187789917}]} | 17.0 | {"scalar_features":[{"ns":"http://u24.bi.rk.eu/v1","nv":[{"name":"Hty","value":242.50489875},{"name":"ty","value":25.0},{"name":"Hty","value":-12.11},{"name":"ee","value":2.11}]}]} | {"image":{"case_id":"TC-2-00-01-01-T2","subject_id":"TC-02-000"},"analysis":{"execution_id":"ta-test","study_id":"tdma:::tue-jan-6-19:17:13-est-2011","source":"computer","computation":"segmentation"},"data_loader":"1.3"} | 2016-01-16 01:17:13.102 |
+-----+------+-----------+---------+---------------+-------------+---+---+------------+------+----------+-----------+------------+------------+-------------+
1 row selected (1,31 seconds)

Now, this works. :P
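The restructuring happens outside Drill, before the data is loaded. A minimal Python sketch of that preprocessing step, assuming the input has one JSON document per line (as `mongoexport` writes by default) and that the array sits at `geometry.coordinates`, as in the output above. The function names are hypothetical; this illustrates the idea rather than reproducing the exact script behind this post.

```python
import json
from typing import Any, Dict, List


def rings_to_points(coordinates: List[List[List[float]]]) -> List[Dict[str, float]]:
    """Turn [[ [x, y], [x, y], ... ]] into [{"x": .., "y": ..}, ...].

    All rings are concatenated into one list, so ring boundaries (e.g. holes
    in a polygon) are lost; fine for single-ring polygons like this one.
    """
    return [{"x": point[0], "y": point[1]} for ring in coordinates for point in ring]


def rewrite_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one document with geometry.coordinates rewritten."""
    out = dict(record)
    geometry = record.get("geometry")
    if isinstance(geometry, dict) and "coordinates" in geometry:
        out["geometry"] = dict(geometry,
                               coordinates=rings_to_points(geometry["coordinates"]))
    return out


def rewrite_file(src: str, dst: str) -> None:
    """Rewrite a newline-delimited JSON file so Drill can query it without flatten."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(json.dumps(rewrite_record(json.loads(line))) + "\n")
```

The rewritten file can then be used in the `create table ... as select * from dfs.` statement above in place of the original.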

Thursday, September 15, 2016

Apache Drill and the lack of support for nested arrays

Apache Drill is very efficient and fast, until you try to use it with a huge single file (such as a few GB) or attempt to query a complex data structure with nested data. That is exactly what I am trying to do right now: query large segments of data with a dynamic structure and a nested schema.
 
I can construct a Parquet data source from a nested array, as below:
 
create table dfs.tmp.camic as ( select camic.geometry.coordinates[0][0] as geo_coordinates from dfs.`/home/pradeeban/programs/apache-drill-1.6.0/camic.json` camic);
 
Here I am giving the indices of the array. 
 
Then I can query the data efficiently. For example,  
select * from dfs.tmp.camic;
 
However, giving the indices won't work for me, as I do not need just the first element. Rather, I need all the elements of a large and dynamic array, representing the GeoJSON coordinates.
 
 
$ create table dfs.tmp.camic as ( select camic.geometry.coordinates[0] as geo_coordinates from dfs.`/home/pradeeban/programs/apache-drill-1.6.0/camic.json` camic);
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST

Fragment 0:0

[Error Id: a6d68a6c-50ea-437b-b1db-f1c8ace0e11d on llovizna:31010]

  (java.lang.UnsupportedOperationException) Unsupported type LIST
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():225
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():187
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():172
    org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():155
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():103
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():744 (state=,code=0)
 
 
Here, I am trying to query a multi-dimensional array, which is not straightforward.

(I made the error messages above verbose using SET `exec.errors.verbose` = true;)
 
The commonly suggested options to query multi-dimensional arrays are:

1. Using array indices in the select query: This is impractical. I do not know how many elements (coordinates) this GeoJSON will have. It may be millions, or as few as 3.
2. The flatten function: I am using Drill on top of Mongo, and I am finding an interesting case where Drill, through distributed execution, outperforms Mongo alone on certain queries. Using flatten basically kills all the performance benefits I would otherwise have with Drill. Flatten is simply an expensive operation at the scale of my data (around 48 GB, though I can split it into chunks of a few GB each).
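To see why flatten is costly here, consider what it produces: one output row per array element, with the other columns copied into each row. The in-memory sketch below imitates that behaviour for a single field; it is an illustration, not Drill's implementation, and the function name is hypothetical (in this sketch, rows with a missing or empty array yield nothing). A geometry with a million coordinates becomes a million rows, which also explains the changed output format.

```python
from typing import Any, Dict, Iterator, List


def flatten_rows(rows: List[Dict[str, Any]], field: str) -> Iterator[Dict[str, Any]]:
    """Rough analogue of FLATTEN on one repeated field.

    Each element of row[field] becomes its own output row, with all the
    other columns of the original row copied alongside it.
    """
    for row in rows:
        for element in row.get(field) or []:
            out = dict(row)       # copy the other columns
            out[field] = element  # replace the array with one of its elements
            yield out
```

For example, two documents holding 3 and 1 coordinate pairs come out as 4 rows, each repeating every other column of its source document.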
 
This is a known limitation of Drill. However, it significantly reduces Drill's usability, as the proposed workarounds are either impractical or inefficient.

Wednesday, September 14, 2016

Moments with Llovizna: Random Thoughts of a Gypsy Student

I started my blog in 2009, mostly as an internship diary, and then continued blogging about my Google Summer of Code projects, my final year project, and relevant findings on programming and software in general. I blogged about AbiWord almost 50 times, and never thought I would blog about anything else until I moved to Lisboa for my master's, EMDC. I enjoy presenting my PhD (EMJD-DC) work at conferences. Due to the mandatory mobility of my master's and PhD programs, I have travelled and migrated across countries and continents, which made Llovizna a travel blog too.


I like to stay awake during take-off and landing, as I usually get to sit in the middle seat. However, for some unknown reason, I mostly fall asleep just before take-off and wake up just as the flight reaches cruising altitude. I like watching movies on long flights. They come with subtitles, so I can understand foreign-language movies; mostly I watch Asian movies. Flights shorter than 7 hours mostly do not have movies. Airlines in the US, especially the domestic ones, are terrible, even on flights as long as 5 hours. I remember a 5.5-hour flight from Atlanta to San Francisco on United Airlines that did not offer meals, citing that this is the norm on domestic flights.

View from the Empire State Building
When I am not watching movies, sleeping, or having meals, I just look out the window at the big world outside. I often do not sleep the night before a flight, as I get busy with packing, and also because I fear not waking up on time for the flight. This gives me the typical zombie walk: walking without really sensing the environment, just like a zombie.

I enjoy conferences: meeting, listening to, and sharing ideas with fellow researchers. I enjoy travelling. However, I must admit it is hard to be in a new and beautiful city (which most conference venues are) while having to focus on the conference and networking instead. Recently, I have somehow mastered the balance, overcoming the jet lag and the tiredness of long flights. Perks of being a gypsy student. In addition to conference-related travel, I enjoy small trips when I have time and money at the same time, such as the recent trip to NYC over a long weekend.

We enjoyed cooking in Atlanta. It has fresh food at decent prices. YDFM was our favourite. As the time to return to Portugal approaches, I foresee yet another intercontinental flight this year. I usually end up packing at the last moment, throwing things away at the eleventh hour or packing/storing things in bulk. One exception was my move back from Rijeka to Lisboa, where I managed to finish all the food items I had bought, including oils, rice, and vegetables. Since 2012, I have had this weird migration pattern: Sri Lanka -> Portugal -> Sri Lanka -> Sweden -> Portugal -> Croatia -> Portugal -> USA -> Portugal -> Belgium (expected) -> Portugal (expected). This summarizes my gypsy life so far, and in the near future.

I still remember my first walk from my apartment to my lab through the beautiful lanes of Atlanta, with a map. There was a guy walking in front of me. I was under the impression that he was also going to the university. I was wrong; he turned in the opposite direction. Luckily, I was not blindly following him (which I never do, actually :) ). Summer in Atlanta is much longer. When we return to Lisboa in two weeks, it will already be autumn there, getting colder and wetter. Hopefully it will still be warm enough for some walks in the Parque das Nações.

Saturday, September 10, 2016

A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes.

This week, our paper titled "A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes." was presented at the Second International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH'16), co-located with the 42nd International Conference on Very Large Data Bases (VLDB 2016), Sep. 2016.

Abstract: Medical research use cases are population centric, unlike the clinical use cases which are patient or individual centric. Hence the research use cases require accessing medical archives and data source repositories of heterogeneous nature. Traditionally, in order to query data from these data sources, users manually access and download parts or whole of the data sources. The existing solutions tend to focus on a specific data format or storage, which prevents using them for a more generic research scenario with heterogeneous data sources where the user may not have the knowledge of the schema of the data a priori.

In this paper, we propose and discuss the design, implementation, and evaluation of Data Café, a scalable distributed architecture that aims to address the shortcomings in the existing approaches. Data Café lets the resource providers create biomedical data lakes from various data sources, and lets the research data users consume the data lakes efficiently and quickly without having a priori knowledge of the data schema.

Thursday, September 8, 2016

Finding your student apartment in Lisbon and Porto

Originally written in 2014, this post was updated recently.

It is not always easy to move to a country that does not speak the same language as you. Moreover, the visa requirements make it a time-consuming process. Following my blog post on how to apply for a visa to study in Portugal (mostly focusing on Sri Lankan students, but also applicable to students from many other third countries), I started receiving questions from students seeking information on apartments and studies in Lisbon. There are a few issues you may have to face. This post tries to address those concerns.

1. Universities provide a list of landlords to students. But in my observation, these apartments tend to get booked faster and are also more expensive than the other available options. You are left with no choice other than to follow this list, as it is safer to reserve a room your university recommends than a random one, given that you will have to send one month's rent to the landlord as a reservation/deposit. You will also have to be quick, both for the visa process and to make sure you do not miss out on the good and affordable apartments.

2. Not all landlords understand English, and communicating with them about visa requirements such as accommodation letters may be hard.

3. You are reserving a room without seeing it yourself, unless you are already in the city. Mostly, only written descriptions are available, and they are written by the landlords themselves.

4. It is very hard to send money to a third party by bank transfer from Sri Lanka, due to local regulations. The banks handle this case by case. It took me considerable effort to send the reservation fee to the landlord. A transfer by PayPal would be more convenient, though not many landlords accept that. This issue may be specific to Sri Lankans.

UniPlaces provides a solution to these issues. UniPlaces.com is a third-party website that provides accommodation options to students away from home. It lets landlords list their apartments for free.

1. UniPlaces has partnered with the major universities in Lisbon, including the University of Lisbon (ULisboa). This makes UniPlaces a trusted website for students. It also offers a large number of options to choose from.

2. With a proactive support service that is very fluent in English (and Portuguese, of course), it is easy to communicate with them via their support chat or by phone. If you need assistance, you may leave your number and details for UniPlaces to contact you.

3. UniPlaces provides neutral descriptions of the apartments, with photos. Searching for an apartment with specific requirements such as price, room type, and other features works very well. As you may not be able to see your room before arriving in Lisbon, having a verification from a third party makes your journey to Lisbon stress-free.

4. UniPlaces provides a secure payment option via PayPal. For those who have trouble with PayPal, UniPlaces still offers the option to pay by bank transfer.

5. As a start-up with an international team of geeks, fresh graduates, and students, UniPlaces has managed to keep a student feel in its listings. Learn more about the recommended areas in Lisbon from UniPlaces.

Apart from Lisbon, UniPlaces currently also lets you find apartments in London.


Later update on the 8th of September, 2016
As a company grows, it is expected that its approach changes considerably. Recently, I booked an apartment through UniPlaces. The landlord accepted the booking, and hence the full first month's rent as well as the service fee were deducted from my account.

However, after 10 days, the landlord cancelled the booking through UniPlaces without even informing me, as a matter of courtesy and professionalism. Funnily, after accepting my payment, the landlord also claimed that one of the roommates has a dog, and asked whether I was fine with that. I told him I was fine with the dog. This contradicts his UniPlaces listing, which states "No Pets Allowed".
 

Also, after cancelling my booking for no reason, he immediately made the same apartment available for the same dates! See (October 1st, 2016 listed as available, the same dates I had booked).

I suspect there is a serious scam going on with this landlord. Perhaps he is collecting tenants' details. Or perhaps he is just plain crazy and nasty.

But what saddens me is UniPlaces' don't-care policy. I warned them, with all the details, and got no proper response. The landlord will now go on to scam other students, wasting their time and leaving their money locked up in bank transactions. I am waiting for the money to be refunded to my account. UniPlaces claimed that it has already been refunded, so I guess I will get it back.

While I will still use UniPlaces for bookings, as I know the team and they are legit, I must urge them to care more about students. If I, as a student, cancel my booking, my service fee is gone, and in most cases I will not get back the security deposit (the first month's rent paid to the landlord through UniPlaces). On the other hand, the landlord can get away with his tactics, accepting a student/tenant at will and rejecting them later at will, with no reason whatsoever.

Users are the major pillar of any business. I hope UniPlaces will learn to respond more positively to user feedback given to them privately, so that I do not need to update this blog post.

Students searching for apartments, feel free to use UniPlaces.com. However, beware of scam artists posing as landlords. Also, be warned that prices are higher (which is reasonable for multiple reasons. First, UniPlaces needs to be paid for its services and employees. Second, there is an 8% flat rate on the monthly rent charged to landlords, which the landlord in turn collects back from the student by raising the price. In addition, there is a 25% charge for the first month as well). Moreover, some landlords tend to raise the price arbitrarily further; an unsuspecting student may pay more, so why not. For comparison, I was able to find an almost identical studio in Alameda for 450 Euro, while this one was listed at 675 Euro.
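To put these numbers in perspective, a quick check, assuming the 450 Euro studio is a fair baseline and that the landlord passes the full 8% fee on to the tenant (both are assumptions for illustration). The helper names are hypothetical.

```python
def markup_percent(baseline: float, listed: float) -> float:
    """How much higher `listed` is than `baseline`, in percent."""
    return (listed - baseline) / baseline * 100


def with_fee_passed_on(baseline: float, fee_rate: float) -> float:
    """Rent after a landlord adds a flat fee rate (0.08 = 8%) on top."""
    return baseline * (1 + fee_rate)


baseline, listed = 450.0, 675.0
print(f"Listing vs. baseline: +{markup_percent(baseline, listed):.1f}%")
print(f"Baseline with 8% fee passed on: {with_fee_passed_on(baseline, 0.08):.2f} EUR")
```

Passing on the 8% fee would explain a rent of about 486 Euro; the 675 Euro listing is 50% above the baseline, so most of the difference is the landlord's own increase.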

I wish the services were more transparent and caring towards all their customers, rather than focusing on a specific subset (in this case, only the landlords, as landlords tend to be permanent customers, while students are one-time tenant customers).

If you are looking for an apartment in Lisboa, NEVER go with this landlord, who hides the existence of a dog (going so far as to say "Pets are not allowed" in the listing; double standards!), tells you only after you have paid, and also cancels the booking once you have paid! Most probably his descriptions are fake too.
I wish good luck and success to my colleagues at UniPlaces despite these issues. I hope they will filter out these scam landlords sooner, and care more about their student customers too, as they used to in the good old days.