Plan for Q3 (Personal PhD update #3.1)

Two quarters down! Here is the plan for the next quarter (Jan-Feb-Mar 2016).
I have spent the last few weeks exploring and homing in on a curriculum for the next quarter. Here it is.

Computer Science / Math track

As I was going through the Coursera Machine Learning course, learning about Speech Recognition, and reading papers, I saw that one gap in my knowledge is statistics. For example, terms like these: Hidden Markov Models, posterior probabilities, Bayesian statistics, cross entropy, Gaussian, negentropy, doubly stochastic process, Cauchy density. I have heard of simpler things like p-values or confidence intervals, but I wouldn’t know how to produce them if I needed to.

Therefore I decided that for the CS track I will focus on statistics and probability for the next quarter. (The beauty of making up my own curriculum is being able to adjust in real time to the needs I have at work.)

I looked at a few stats courses and settled on Udacity’s: Intro to Descriptive Stats (https://www.udacity.com/courses/ud827) & Intro to Inferential Stats (https://www.udacity.com/courses/ud201).

Udacity has a mobile app, so I can watch videos or do quizzes while on the bus, for example, which adds to the convenience. Also, Udacity recently came out with a Machine Learning Engineer Nanodegree, which looks very interesting. It includes a very new course on Deep Learning in collaboration with Google. (https://www.udacity.com/course/deep-learning–ud730)

Another course that I explored, as it was very highly recommended by a colleague, was MIT’s Introduction to Probability – The Science of Uncertainty. (https://courses.edx.org/courses/MITx/6.041x_1/1T2015/info) From what I can tell from week 1 and the syllabus, it goes much deeper than Udacity’s intro stats courses.

 

To throw in some fun reading about statistics I plan to

* finish reading The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t by Nate Silver (amazon)

* read Superforecasting: The Art and Science of Prediction by Tetlock and Gardner (amazon)

I commit to finishing the Intro to Descriptive Stats course this quarter and will see how far I get with the rest.

Linguistics track

For the linguistics track I will study and refresh my Spanish for a month (30 days). It’s kind of cheating as it is not really linguistics 🙂 but there are two reasons: 1) I am planning to go on vacation in a Spanish-speaking country and 2) it is actually helpful for my job to know foreign languages. The other day, for example, my knowledge of Russian came in handy while looking at experiment results.

Paper Reading

Continue reading the draft of the Deep Learning book (http://www.deeplearningbook.org/). I finished last quarter with Chapter 9. There are 20 chapters in total, and I plan to finish the book.

Writing

Still the same old – strive to write and publish an update every two weeks. Hopefully one extra post on how I finally figure out a system for myself to track the scientific papers that I read and want to read next.

Second Quarter Summary [Personal PhD Update #2.5]

The second quarter of my Personal PhD has concluded and it’s time to look at what I have accomplished.
Highlight: I finished the Machine Learning Coursera course by Andrew Ng on time, on the last day of the quarter!
Lowlight: I wasn’t consistent with studying regularly, and there was a period of 3 weeks when I didn’t do anything for the ML course.
Below follows how I did against my plan for the second quarter.

Computer Science track

1. Finish reading Programming Collective Intelligence book (Chapters 6 to 12)
DONE. Even though the book was published in 2007, which by tech book standards is ancient, I still think it’s the best intro to Machine Learning for people who have absolutely no idea about Machine Learning. I am saying this because I was ready to write this book off before even looking at it, but on my colleague’s recommendation I read it anyway. The book helped me see how Machine Learning fits among other analysis methods. For example, I have seen k-means clustering applied to bioinformatics data before, but nobody called it a Machine Learning method. Turns out the same k-means clustering method, used for prediction in a different community, is now called a Machine Learning method.
I highly recommend reading the book before or slightly ahead of the Coursera ML course. Get general intuition from the book and then get mathematical intuition in the Coursera course.
Score: 1.0 out of 1.0.
2. Do rest of the Machine Learning course by Andrew Ng on Coursera (Weeks 6-11)
DONE. With the Thanksgiving and Christmas holidays I totally procrastinated on the Coursera course. On weekends and holidays I am usually totally disconnected, so unfortunately they are not an opportunity to spend more time on my Personal PhD. For Thanksgiving I went on an epic road trip from San Francisco to Mojave to the Grand Canyon to Sedona to Phoenix to Joshua Tree National Park and back to San Francisco. For Christmas I went to Latvia and then to the UK.
I got back from my Christmas travels on Sunday night, December 27th. I had 4 days left in the second quarter of my Personal PhD and I had weeks 9, 10 and 11 of the course materials to finish.
So I spent 1.5 hours Monday morning before work, 2 hours after work, 1.5 hours Tuesday morning before work, 1 hour in the evening after work, and again Wednesday morning and evening. (I was getting up at 6am.) Thursday I had off and it was the final push! Right before noon I finally submitted the final quiz! Then a screen appeared that said congratulations on finishing the course with a 96.5% grade on quizzes and programming assignments. I am glad that the final weeks of the Coursera course didn’t have programming assignments. Those are more time consuming and I might not have finished in time.
So yeah, that’s what I get for procrastinating 🙂 but I am glad I still finished on time (by my own self-imposed deadline). After that I could celebrate New Year’s Eve with no remorse.
I hope I won’t procrastinate like this again. Getting back to it after a prolonged break, I definitely wasted some time re-learning the previous weeks’ materials, as each week builds on the ones before it.
Score: 1.0 out of 1.0.

Linguistics track

For my linguistics track for the second quarter I chose to read a book called Watching the English (the perks of setting my own curriculum 🙂 ). I have seen before, when studying French, that culture and language are intertwined, so I decided to read this book on the culture and language of the English. It is a pretty long and dense book, but I really enjoyed it, especially since I spent a few days in the UK at the end of the year and was able to confirm some observations.
Maybe someday I will write up my takeaways from the book, but it’s not gonna be today, as I want to get this update out on a somewhat reasonable time scale 🙂
Score: 1.0 out of 1.0.

Paper Reading

4. Read 10 more scientific papers on Speech Recognition.
I didn’t really end up reading any new papers. However, I am reading Ian Goodfellow’s draft book on Deep Learning (http://www.deeplearningbook.org/) and discussing with my colleagues how it relates to the work we do. I think each chapter of that book counts as a paper, as each chapter is long and dives into existing research. I have read through 9 chapters, which is 352 pages.
I am glad I did the majority of the Coursera ML course before reading this book; it was really good background for this reading.
Score: 0.9 out of 1.0

Writing

5. Write an update every other week.
Yeah, so I haven’t been doing well in this department. I wrote 5 updates in 12 weeks, which is the same as the previous quarter but doesn’t quite hit the target of writing every two weeks. I think this is the hardest part of it all.
Score: 0.83 out of 1.0

Summary

Overall score for the second quarter: 0.95 (compared to 0.90 for the previous quarter), and I am happy that I managed to finish the Coursera course.
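
The overall number is consistent with a plain average of the five item scores above. Here is a minimal sketch of that arithmetic. Two things are assumptions on my part: that every item is weighted equally, and that the writing score is simply updates written divided by updates planned (the function names are made up for illustration).

```python
# Item scores for the second quarter, as listed above.
scores = {
    "Programming Collective Intelligence": 1.0,
    "Coursera ML course": 1.0,
    "Watching the English": 1.0,
    "Paper reading": 0.9,
    "Writing": 0.83,
}


def overall_score(item_scores):
    """Unweighted mean of the item scores (equal weighting is an assumption)."""
    return sum(item_scores.values()) / len(item_scores)


def writing_score(updates_written, weeks, weeks_per_update=2):
    """Fraction of the planned updates that got written (assumed formula)."""
    planned = weeks // weeks_per_update
    return updates_written / planned


print(round(overall_score(scores), 2))  # 0.95
print(round(writing_score(5, 12), 2))   # 0.83
```

With 5 updates written against 6 planned (one every two weeks for 12 weeks), the writing score comes out to 0.83, which matches the number above.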

I will take some time to finalize the curriculum for the third quarter.

P.S. The formatting of this post is wacky, but I would rather get it out imperfect than not do it at all.

Hard work pays off [Personal PhD update #2.4]

At work a teammate started a reading group to read the book Deep Learning by Ian Goodfellow (https://goodfeli.github.io/dlbook/). Even though it is ahead of my curriculum, I can’t miss the chance to read the book and discuss it with my coworkers. We are now at Chapter 8.

This book is definitely harder than what I have been studying so far, but there is one thing I am very glad about. The progress I have made with the Coursera ML course is directly relevant to understanding the deep learning material. For example, in the Coursera course I learned really good basics of regularization, so now, when reading about fancy methods of regularization in the Deep Learning book, it is easier to understand what’s what.

Other than that, I am behind schedule on other things, but I still have 2 weeks left in this quarter 🙂 Now I am off for a week of holiday visiting family.

Happy Holidays!

Personal PhD update #2.3

So I am totally behind on writing these updates; the last one was almost 3 weeks ago. But hey, better late than never.

 

The Coursera course on Machine Learning is also a bit behind. Right now I am almost done with week 7. In order to finish the rest of the course by the end of December I would have to go at a pace of one week of lectures per week, as opposed to my current pace, which is twice as slow. Doable, but holidays are not exactly helping as I tend to be offline during vacations.

 

Take Thanksgiving, for example! We managed to do an epic road trip from San Francisco to the Grand Canyon to Phoenix to Joshua Tree National Park and then back to San Francisco.

 

On the flip side, I got a lot of reading done. I finished reading the Programming Collective Intelligence book (as part of the Computer Science curriculum) as well as the book Watching the English: The Hidden Rules of English Behavior (vaguely part of the Linguistics curriculum, but hey, it’s my curriculum!).

 

Fun stuff
Are these names Pokémon names or big data companies? https://pixelastic.github.io/pokemonorbigdata/ (Thanks Matt for the link!)

 

ML in the news
Google open sourced TensorFlow: a library for Machine Learning. https://www.tensorflow.org/
Here is a Wired article about it.

 

That’s all for today!

Personal PhD update #2.2

Here is another update on my Personal PhD progress. This time short and sweet 🙂

 

My progress
I have finished week 6 of the Coursera Machine Learning course by Andrew Ng. This section of the course was a little easier than week 5. Week 5 was probably the hardest week of all (I had to implement the backpropagation algorithm). Week 6 talked about training error, cross-validation error and test error, and the usefulness of plotting them. Now I really understand how to interpret the learning curve graphs I have seen at work! So this is more proof that taking this course is helpful for my work.
I also read a few more chapters (6 and 7) of the Programming Collective Intelligence book.
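
To make the learning curve idea concrete, here is a small sketch in Python with NumPy (not the course's own code or language). Everything in it is an assumption for illustration: the data is a synthetic noisy straight line, the model is plain least-squares linear regression, and the function names are made up. It trains on the first m examples and records the error on those m examples and on a held-out cross-validation set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): a noisy straight line.
x = rng.uniform(-3, 3, size=120)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Keep the last 40 points aside as the cross-validation set.
x_train, y_train = x[:80], y[:80]
x_cv, y_cv = x[80:], y[80:]


def design(inputs):
    """Design matrix with a bias column, for y = theta0 + theta1 * x."""
    return np.column_stack([np.ones_like(inputs), inputs])


def half_mse(theta, inputs, targets):
    """Squared-error cost J = 1/(2m) * sum((prediction - target)^2)."""
    residual = design(inputs) @ theta - targets
    return float(residual @ residual) / (2 * len(targets))


def learning_curve(sizes):
    """Fit on the first m training examples; return (train, cv) error lists."""
    train_err, cv_err = [], []
    for m in sizes:
        theta, *_ = np.linalg.lstsq(design(x_train[:m]), y_train[:m], rcond=None)
        train_err.append(half_mse(theta, x_train[:m], y_train[:m]))
        cv_err.append(half_mse(theta, x_cv, y_cv))
    return train_err, cv_err


sizes = [2, 5, 10, 20, 40, 80]
train_err, cv_err = learning_curve(sizes)
for m, j_train, j_cv in zip(sizes, train_err, cv_err):
    print(f"m={m:3d}  train={j_train:.3f}  cv={j_cv:.3f}")
```

Plotting the two lists against m gives the learning curve. Roughly speaking, if both errors level off close together at a high value, the model is underfitting (high bias) and more data won't help much. If a large gap remains between a low training error and a high cross-validation error, the model is overfitting (high variance) and more data is likely to help.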

 

Machine Learning in the news:

Plan for Q2 (Personal PhD update #2.1)

A little on the late side, but here is my plan for the second quarter of my Personal PhD (Oct-Nov-Dec 2015).
Computer Science track:
1. Finish reading Programming Collective Intelligence book (Chapters 6 to 12)
   – why? The first 5 chapters have been great at putting Machine Learning in the context of other ways of solving problems, and I am looking forward to the rest.
2. Do rest of the Machine Learning course by Andrew Ng on Coursera (Weeks 6-11)
   – why? The first 5 weeks have been super useful and have instilled good basics including the math, so I will continue with it. I also like the quiz format and programming exercises as they force me to pay attention.
(I am pausing the NYU Automatic Speech Recognition course as it goes over my head right now.)
Linguistics track:
3. Last quarter I focused on brushing up on German. This time I actually want to focus on English as opposed to yet another foreign language. So the item for this quarter is to read the book Watching the English: The Hidden Rules of English Behavior.
   – why? Here are the reasons: 1. At least in my personal experience, so much of language understanding comes from understanding the culture. 2. My boyfriend is English 🙂 3. The topic sounds fascinating and this is my curriculum so I can make it whatever I would like it to be 🙂 yay, the benefits of a “personal” learning path 🙂
Paper Reading
4. Read 10 more scientific papers on Speech Recognition.
Writing:
5. Write an update every other week.
This quarter is going to be tight, with starting quite late and having the Thanksgiving and Christmas holidays, but we will see what I can do 🙂 (during the holidays I may get more reading done while on planes, etc., but not any studying).

Reflections on first quarter and how did I do? (Personal PhD update #1.6)

The first quarter of Personal PhD is done! I finished right around October 1st, but took a while to put this update together.
This update contains my summary and thoughts on how the inaugural quarter of my Personal PhD went.
Observations
Overall, I am glad I started this Personal PhD project! The first quarter was an interesting and rewarding experience.
I think I learned more than I would have if I hadn’t structured it. Having a structure and goals meant I had to keep doing things, and you all are watching me 🙂
There were unexpected side benefits from sharing this project with other people. Turns out other people are interested in doing something like this as well! I love hearing from others that they are interested in self-education as well.
I now keep noticing how much machine learning there is around us: to select fashion models, to select resumes, price prediction, etc.
Doing this has definitely helped me in my job. I now understand more of the lunch conversations my coworkers have, which is fun!
All that Computer Science theory that I learned in college and thought would be useless in the real world: not true. Here I am, wrangling weighted finite state transducers (an extension of finite state automata) at work for speech recognition.
Slight adjustments to my Personal PhD plan
I am a big fan of launch and iterate, so here is an adjustment to the original plan of how to structure the Personal PhD time-wise.
Still keep quarters (3-month-long periods), but spend the first two weeks of the quarter devising a curriculum, aim to finish coursework two weeks before the end of the quarter (also so that it doesn’t coincide with the actual work quarter), then use the last two weeks to tie up any loose ends and to write up a reflection on the quarter.
 
Goals and what actually got done
Here is a summary of my original plan and how I actually did.
Computer science track:
  • Goal: Do half (5.5 weeks) of Andrew Ng’s Machine Learning course on Coursera (https://www.coursera.org/learn/machine-learning), watch videos and do quizzes and homework programming assignments (automatically graded).
    • What actually got done: I did 5 weeks of Andrew Ng’s Machine Learning course. I watched all the videos and did all the quizzes and exercises, which I think was worthwhile. Going through the hassle of implementing the backpropagation algorithm was useful. I consider this done, as 5 weeks is close enough to 5.5 and stopping in the middle of a week is quite awkward.
  • Goal: Go through all of the Automatic Speech Recognition class at NYU (http://www.cs.nyu.edu/~eugenew/asr13/)
    • What actually got done: I went through only half of the NYU ASR class videos.
  • What else got done:
    • I read chapters 1-5 of the book Programming Collective Intelligence. This book is an older one but very useful in putting ML in the context of other ways of solving problems.
    • I also read through Neural Networks tutorial: http://karpathy.github.io/neuralnets/
  • Score: 0.75 (out of 1)

Linguistics track:

  • Goal: Brush up on my German using a similar method as I did for Spanish – study German itself for 30 days + learn more about the linguistics aspect of it.
    • Done! I did study German for 30+ days, and I did go to Germany and was able to use it to get around, buy train tickets, order food, and have a conversation at a store about which prepaid cell phone plan I would like 🙂 Overall I declare it a success. 🙂 I did study German in high school for 3 years, but hadn’t really used it since. This was more like a review. Yes, foreign language skills get dusty very quickly, but now I see that it’s possible to revive them in a month-long time frame.
  • I also learned about linguistic concepts such as homographs, homonyms and words pronounced letter by letter.
  • Score: 1.0 (out of 1)

Reading:

  • Goal: Come up with a system for myself to keep track of papers I read and start reading papers.
    • End result: I tried out Mendeley – that didn’t work as it is a separate app and not at all integrated into my workflow and the tools I use. Also, its search wasn’t as good as Google Scholar search. So I ended up using Google Scholar search + labels to keep track of papers to read. I also have a spreadsheet to keep track of papers I have read, and for each paper: notes, things to learn, and cited papers to potentially read later.
    • I read 11 papers on topics related to Speech Recognition. Most of them were hard and I didn’t understand most of what was in them, but some were reasonable.
  • Score: 1.0

Writing:

  • Goal: send updates on my progress via email every other week.
    • I wrote 5 updates in 12 weeks, so that comes out to almost every other week.
  • Oh, I also finally migrated my blog from Blogger to WordPress 🙂
  • Score: 0.83
Overall score for Q1: 0.9, which I am quite happy about 🙂
Now onto the second quarter! 🙂

Whiiii, the first quarter is done!! (Personal PhD update #1.5)

Whiiii, the first quarter is done!!
I didn’t do everything as I had planned, but nonetheless, I am very happy with my progress!
I managed 5 weeks of the Machine Learning course on Coursera (vs 5.5 weeks planned). So I will call this a success!
For the Automatic Speech Recognition videos, I watched the first 6 lectures out of 12, whereas I had planned to do them all. I got half done, but that is still more than nothing! Next time I will know better that 1.5 courses per quarter is more than I can do.
I will write a more detailed review of my quarter soon, but wanted to get this out first 🙂
Next up: I will take two weeks to plan out the syllabus for the next quarter. (Such a good feeling to have the freedom to make it up myself and make it as relevant as possible to what’s going on at work right now!)
Machine Learning in the news:
Learning to learn, or the advent of augmented data scientists – no, machine learning will not replace data scientists 🙂
New book:

A new popular science book on machine learning has just come out (on Sept 22nd). It looks interesting and I plan to read it: The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World (amazon)

Chugging along (Personal PhD update #1.4)

Here is an update on my progress via an infographic.

Machine learning resources:
I have discovered more Machine Learning resources:
Other progress:
I am also reading a book called Programming Collective Intelligence. Although it is an older book, it has a very intuitive intro to machine learning, explains where it fits among all the other methods of processing data, and shows why machine learning is useful (complete with practical examples).
In contrast, the Coursera course on Machine Learning starts off with much more math. In the end, both resources are very useful to learn from, but if I had known this from the beginning I would have read the Collective Intelligence book before starting the Coursera course.
Speech recognition, though, is still hard. I don’t understand most of the lectures. But that’s ok: even understanding just 10% helps me do my job better and better understand what other people in my bigger team are doing.
I have learned a little more about linguistics, i.e. what heterographs and heteronyms are.

 

 
Machine learning in the news
Fascinating, machine learning is used in all sorts of places: A machine learning algorithm picks out the fashion models most likely to succeed.

Personal PhD update #1.3

A few weeks ago I was in Germany. I arrived at the Frankfurt train station. I was in no rush, so when I saw a big bookstore I went in. I love hanging out in bookstores, so the fact that most of the books would be in German wasn’t a deterrent. One book piqued my interest. It was titled “The devil lies in the detail”. I flipped through it and saw this chart:

So I bought the book. (Seems I can’t come back from an international trip without a book or two.) It’s a German book, in German, about the English language. My German is not that good yet, so I have to read it with a dictionary, but the first chapter is already hilarious. The author recounts a scene he overheard. A German lady in England asked at an ice-cream stand: “Can I please have two ice balls”. To which the seller replied: “My ice balls are not for sale, Ma’am!” If she had said “two scoops of ice-cream” it wouldn’t have been such a noteworthy conversation.

Anyways, this is an example of how I follow my curiosity. One thing will lead to another, and I am reading a book in German.

It is the same with this Personal PhD project. I don’t know where it is going to get me in 5 years, but I am convinced it’s gonna be some place awesome.

Already it has taken me places I didn’t plan for in the beginning. For example, I explored how to visualize my progress on the curriculum (thanks Dominique for the suggestion!)

See here:

https://infogr.am/personal_phd_progress_q1

I picked infogr.am as it’s a startup I have heard a lot about (it is a Latvian startup). The result is the best I could do with the free version in half an hour. 🙂

Oh, and by the way, that brushing up on my German was very useful. While I was on vacation in Germany I was able to buy train tickets, order food, and check in at the hotel. Even though the total studying time didn’t add up to much, hearing it a little every day helped a lot. If anything, I was able to keep the conversation going in German, however broken, as opposed to the other person switching to English right away. So I am done with my 30 days of learning German, but I will extend it to read the new book.
Another installment of machine learning in the news: Airbnb is using machine learning to predict the price at which a place is going to rent out. http://www.forbes.com/sites/ellenhuet/2015/06/05/how-airbnb-uses-big-data-and-machine-learning-to-guide-hosts-to-the-perfect-price/