How was the sprint format? [Personal PhD update #5.2]

I have finally finished my first sprint of the Personal PhD. I finished it later than my target of Dec 25th, but hey, at least I finished what I set out to do. (Here is my first sprint announcement post).

What got done

I finished watching all the videos and completed all the in-course exercises of Udacity’s Introduction to Inferential Statistics. I am glad I did! Having never taken statistics in school, I found this very useful. Now I understand what p-values mean and how to use t-tests, z-scores, confidence intervals, hypothesis testing, ANOVA, r^2, etc.

I read the book The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy by Sharon Bertsch McGrayne, and it was a fascinating read. I now have a more intuitive understanding of Bayes’ rule, and I enjoyed learning the history of how it was applied.

I also listened to all episodes of the Talking Machines podcast (http://www.thetalkingmachines.com/). I didn’t necessarily understand everything, but it was still interesting.

The format of the sprint

I like the six-week sprint better than the three-month quarter. It is shorter, so there is less chance for life to get in the way. The deadline is sooner, so if life does get in the way, I have to summon the energy to finish things sooner. As a result, less debt accumulates, which makes it easier to get to the end.

On note-taking

Throughout school and college I always took notes with pen and paper. So far in my Personal PhD I haven’t. I have taken a few notes in a Google doc, but they are not nearly as comprehensive as the notes I used to take.

I am starting to think that taking no notes, or far fewer notes digitally, is not as productive for learning as taking more physical notes, at least for me.

I have known for a while that I am a visual learner. I remember in school, when we had to learn poems by heart and then recite them, I would write them down on a piece of paper, take the paper with me everywhere, and learn the poem from it. When I had to recite, I would see in my mind a picture of the piece of paper with the poem and basically read it out from that picture.

If you give me verbal directions (left, right, straight, left at the light), there is no chance I will remember them. But give me a quick glance at a map with the directions and I will be able to find my way.

The video presentations of online MOOCs like Udacity’s work well for me (way better than just listening to a lecture), but I think I am missing out on the note-taking part. If I write down a formula by hand in my notes, I get to see it again and again as I continue taking notes and referring back to it, thus engraving a visual picture in my mind. Also, without physical note-taking I don’t have a summary cheat-sheet of the main concepts. So for the next courses I will try physical note-taking with pen and paper.

How much time did I spend?

Dominique once asked me how much time I spend on this project. For this sprint I kept track. Studying on Udacity took 16.7 hours. By far the majority of that was in the last week between Christmas and New Year’s, when I was doing about an hour a day to finish before New Year’s.

There are 41 episodes in the podcast, with a total length of almost 27 hours. That seems like quite a lot, but it didn’t feel like I was spending extra time on them. I found the time while doing other activities or while waiting: I listened while running, working out, walking, waiting, on airplanes, at the dentist’s, etc.

Don’t know how long it took to read the book, but it felt like quite a long read 🙂

 

How I picked which online Statistics course to study [Personal PhD update #5.2]

The prep week of the first Personal PhD sprint is done. I have chosen my curriculum. I have received a few questions on how I pick courses among a sea of options, so I thought I would share my process in this case.

In general my philosophy is that if there are many different course options, it doesn’t really matter content-wise which specific course you pick. You will learn more from any of them than if you are paralyzed by too many choices, never make a decision, and don’t study at all. This is one of the reasons I have explicit prep time built into my structure, so that 1) I give myself time to play around with different options and 2) I have a deadline by which I have to pick the material. I feel that strikes a balance between making a somewhat informed decision and not getting bogged down in endless research.

Previously I took Udacity’s Intro to Descriptive Statistics course; now it’s time to learn about Inferential Statistics.

I did some brief Googling on online statistics courses, incorporated coworker recommendations, looked at the next course in Udacity’s sequence, and picked three options to check out in detail. They were the following:

  1. edX Introduction to Statistics by University of California Berkeley.
  2. Probability and Statistics on Stanford Online platform.
  3. Udacity’s Intro to Inferential Statistics course.

I spent roughly an hour on each and went through the first few lessons.

For me personally, the main criterion for picking one course over another in this case ended up being how mobile-friendly it was.

I spend a lot of time with my butt in a seat, be it on a plane, train or shuttle. I want to use that time productively, and one of the ways I spend it is studying. So having bite-sized lessons available on the phone is important.

If you always study at a set time at a desk with a computer, then this doesn’t apply to you at all.

Here are my thoughts on each course.

edX Berkeley course:

  • Consists mostly of video lessons with the professor speaking quite slowly. On the desktop it is possible to increase the speed of the video, but not on the mobile site.
  • The video screen is small by default on mobile and has to be expanded for each video; in the expanded mode there is no way to advance to the next video. So unzoom, advance, zoom... too clumsy for a mobile interface.
  • There is additional reading and exercises on a different webpage that is not really integrated with the course. It is not always clear exactly which sections need to be read and in which order relative to the videos. Too many micro-decisions cause decision fatigue, which leaves less willpower for the actual studying. It is also very clumsy on a smartphone to jump back and forth between different webpages.

The Stanford Online platform also uses edX to power its courses. However, its Probability and Statistics course is very heavy on text materials rather than just video, so it wasn’t that painful to not be able to speed up the occasional video on the phone. It had an excellent introduction on metacognition, aimed at being a more effective learner in a MOOC setting, which I thought was a very nice touch.

Udacity has a dedicated mobile app, which is quite good. It is all video based, but it is much more than a professor lecturing with slides: the speaker is quite fast and the screen is so busy with graphs, pointers, and other things happening that there is no need to speed up the video.

 

So for me it really boils down to ease of use on the go on a smartphone. The edX Berkeley one is out, but it really is a toss-up between the Stanford and Udacity courses.

In the end I picked Udacity’s Intro to Inferential Statistics, as I had already done its Intro to Descriptive Statistics. Perhaps some day, as a review and to cement the basics even more, I will come back to the Stanford course.

To round out the theoretical material with some lighter material for this sprint, I also plan to read the book The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy by Sharon Bertsch McGrayne and listen to the podcast http://www.thetalkingmachines.com/ about Machine Learning. The book has been on my reading list since the beginning of the year, so I will start with that, but I have received some other great recommendations, so keep them coming!

Kicking off the first sprint [Personal PhD update #5.1]

In my previous post I explained why I am changing the format of my Personal PhD from three-month quarters to six-week sprints.

Today I am starting the second year of the project with its first sprint. It starts today (November 7, 2016) and will last until December 25, 2016. That is actually seven weeks instead of six, due to one week of vacation during Thanksgiving. In the previous year I discovered that I really like to disconnect during vacation and no studying gets done, so this year I want to account for that in the schedule.

The first week, Nov 7-Nov 13, is a prep week where I will try out different resources and decide on the curriculum.

The five weeks of Nov 14-Dec 18 will be 4 weeks of studying and 1 week of vacation.

The final week, Dec 19-25, will be a wrap-up week to tie up loose ends and write up a summary post.
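The dates for a sprint like this can be worked out with a few lines of Python. This is only a sketch: the function name `sprint_schedule` and its parameters are made up for illustration, and it assumes the vacation week is simply folded into the study block.

```python
from datetime import date, timedelta

def sprint_schedule(start, study_weeks=4, vacation_weeks=0):
    """Return (phase, first_day, last_day) tuples for one sprint."""
    # One week of prep, a study block (plus any vacation), one week of wrap-up.
    phases = [
        ("prep", 1),
        ("study + vacation", study_weeks + vacation_weeks),
        ("wrap-up", 1),
    ]
    schedule = []
    first_day = start
    for name, weeks in phases:
        last_day = first_day + timedelta(weeks=weeks) - timedelta(days=1)
        schedule.append((name, first_day, last_day))
        first_day = last_day + timedelta(days=1)
    return schedule

# The sprint described above: starts Monday, November 7, 2016,
# with one extra week for the Thanksgiving vacation.
schedule = sprint_schedule(date(2016, 11, 7), study_weeks=4, vacation_weeks=1)
for name, first_day, last_day in schedule:
    print(f"{name}: {first_day:%b %d} to {last_day:%b %d}")
```

With the default `vacation_weeks=0` the same function gives the regular six-week sprint.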

The main topic for this sprint is Statistics. I started learning Statistics in the third quarter with the basics of descriptive statistics; now it’s time for the basics of inferential statistics.

I recently started listening to the Talking Machines podcast about Machine Learning. So far it has been very interesting, so for this sprint I plan to listen to all of the episodes from the very beginning.

 

Changing the format: from three month quarters to six week sprints

I took a quarter-long summer break from my Personal PhD project and I am ready to get back into it again!

One thing that I am changing for my second year of Personal PhD is the format.

In the previous year I did 4 quarters, each 3 months long, where the first 2 weeks were prep, 10 weeks were studying and 2 weeks were wrap-up time. The built-in prep and wrap-up time is super important. Because I get to decide my curriculum, I can make the length and rhythm of the periods whatever works best for me.

The prep time is for figuring out the curriculum for the next period. It is for trying out a few different resources before settling on a specific curriculum.

Even with the best intentions, a side project inevitably gets delayed, so the wrap-up time is a life saver. It’s a time to catch up, finish the last chapters, and write up a post summarizing the period. It’s a chance to start the next period on time instead of feeling down about delaying it.

Last year I found that a quarter is a very long time and that working on several big things at once is harder. A quarter-length project is also not agile enough. With shorter periods I can focus on one topic at a time and choose topics as the need arises. With shorter sprints I can also plan better around vacations. From what I saw last year, I am usually totally disconnected during vacations and get very behind on studying.

So going forward I will do shorter sprints. Each sprint will be six weeks long: one week of prep time (trying out a few resources and finalizing the curriculum), four weeks for studying, and one week of wrap-up time.

I still plan to write updates: one at the beginning, one in the middle and one at the end.

Fourth quarter of my Personal PhD project (April – June 2016) [update #4.1]

It is hard to believe, but I am now a year into my self-learning project dubbed Personal PhD.

I will write up a summary of my year a bit later, but for now, what happened with the fourth quarter?

It did not go as well as previous quarters. For one, this is the only update about the fourth quarter. I didn’t manage to write an update at the beginning of the quarter laying out my plan. One reason was that it took longer than anticipated to wrap up the third quarter. I wrote the final post on the third quarter on April 27th, when the new quarter was supposed to start on April 1st.

The goal for the fourth quarter was to follow along with Stanford’s course CS224d Deep Learning for Natural Language Processing in real time as the course was happening. It wasn’t officially a MOOC, but all the materials are posted online here: cs224d.stanford.edu/syllabus.html

The topic is directly relevant to what I do at work, and in fact a few of my coworkers were taking the class as well, so we even started a study group.

Unfortunately it didn’t pan out as I had hoped. So far I have only managed to get through the first 3 of the 18 lectures.

Let’s see why.

My quarter was supposed to be from April through end of June, but as I said above I only really wrapped up Q3 at the very end of April. The reason for that was that I went on vacation for 10 days trekking in Patagonia at the end of March. When I am on vacation, I am on vacation, so no work and no studying. After I got back I had to catch up with a lot of work and life stuff.

Then in May I was sick for a week, which is so atypical for me, and worst of all, I couldn’t even read. At the end of May I was in NYC for 10 very busy days, so I couldn’t study then either, and again I had lots of stuff to catch up on after I got back.

My takeaway from this experience is that whenever I go on a vacation or a trip I need to build in catch-up time. This is especially true for longer trips. Several times this year I found myself really behind on personal email and errands after coming back from trips.

The CS224d course itself is great! Even the first three lectures were super interesting and useful.

So what’s the score for the quarter? 3 out of 18 lectures and 1 out of 6 updates is 0.17 (compare that to previous quarter’s score of 0.92).
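The score above looks like a plain average of the two completion ratios. Here is a minimal sketch of that arithmetic in Python, assuming lectures and updates are weighted equally (the function name `quarter_score` is made up for illustration):

```python
def quarter_score(ratios):
    """Average a list of (done, planned) pairs into a single 0-1 score."""
    fractions = [done / planned for done, planned in ratios]
    return sum(fractions) / len(fractions)

# 3 of 18 lectures watched, 1 of 6 planned updates written.
q4 = quarter_score([(3, 18), (1, 6)])
print(round(q4, 2))
```

Both ratios happen to be exactly one sixth, so the average is one sixth as well.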

Anyway, I am cutting my losses and declaring the 4th quarter over now, instead of spending a month catching up and potentially getting further behind 🙂

Next up, I will be taking a “summer break” from my Personal PhD program 🙂 I have a two-week vacation planned in August, right now I am about to head out on a road trip/camping trip to Crater Lake National Park, and I have a few more road trips planned in July.

Initially I thought I could take a three-month long summer break, but I have a feeling I won’t be able to last that long without learning something new, so don’t worry, I will be back soon enough! 🙂

 

 

Third quarter summary [Personal PhD update #3.4]

This update is long overdue, but it finally marks the finish of the third quarter in my Personal PhD journey.

One big reason for the delay was that I went on a 10 day trip to Chile and 5 of those days I spent completely off the grid trekking in Patagonia. I was able to do some limited studying on the planes from printed papers and the Udacity app, but the trip yet again confirmed that I prefer focusing on one thing at a time. If I am on vacation, I am on vacation.

Hopefully some day I will get to write a more detailed post about my travels, but in the meantime, here are some pictures.

After I got back from Chile, I had to catch up with my third quarter curriculum.

I managed to finish Udacity’s Intro to Descriptive Statistics course. On the one hand, it was pretty easy and straightforward. On the other hand, all those exercises and quizzes really hammered the material home, compared to, for example, just reading the formula for some concept once.

I finished reading the Deep Learning book draft. That was hard. I didn’t understand most of it, especially the math in the third part of the book. Still, I think it was a good use of my time, and I got an overview of deep learning.

I also read The Master Algorithm by Pedro Domingos. It was meh. The style of writing didn’t really speak to me. Perhaps a person who doesn’t know anything at all about science could enjoy the tone, but to me it read like a tone reserved for speaking to a small child.

Here is a summary of my third quarter.

Computer Science / Math track

The focus this quarter was statistics.

Score: 1 out of 1.

Linguistics

Learning Spanish – I refreshed my Spanish before my trip to Chile using Memrise, Duolingo (reached a 47-day streak right before Patagonia) and Coffee Break Spanish podcasts. It’s incredible how 5 minutes here and 5 minutes there daily add up. In Chile I was able to successfully use my Spanish to get around, order food, buy bus ticket exchanges, even talk to police (all ended well 🙂 ).
Score: 1 out of 1.

Paper reading

It was a slog, but I did finish the Deep Learning book. I also read 2 other papers.
Score: 1 out of 1.

Writing

This is the 4th update out of the goal of 6 (1 every two weeks).
Score: 0.7 out of 1.

Overall score for the quarter: 0.92 out of 1. I am pretty happy about this 🙂
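For what it is worth, averaging the four track scores reproduces the 0.92, provided the writing score is kept as the unrounded 4/6 rather than 0.7. That is an assumption about how the number was computed, sketched here in Python with made-up variable names:

```python
# Track scores for the third quarter; writing is 4 updates out of a goal of 6.
track_scores = {
    "computer science / math": 1.0,
    "linguistics": 1.0,
    "paper reading": 1.0,
    "writing": 4 / 6,  # shown rounded as 0.7 in the summary above
}

# Every track counts equally toward the overall score (an assumption).
overall = sum(track_scores.values()) / len(track_scores)
print(f"{overall:.2f}")
```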

 

Short and sweet [Personal PhD Update #3.3]

This update is going to be super short as I need to head out to the airport in 10 minutes for a 10 day vacation trekking in Patagonia and I am not taking my laptop with me. I can’t remember the last time I went somewhere without my laptop. It probably was 6 years ago (for another trekking trip)!

Also I am so overdue with an update, that I can’t afford to wait another 10 days 🙂
Here it is:
  • 45-day streak on the Duolingo and Memrise apps to learn Spanish.
  • Finished up to and including Lesson 4 of Udacity’s Intro to Descriptive Stats course.
  • Continuing on the stats theme, I finished reading the book Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner.
  • Watched videos from Weeks 1 and 2 of Coursera’s course on Natural Language Processing https://www.coursera.org/course/nlangp (this wasn’t part of the planned curriculum, but a coworker wanted to study it so we decided to join forces).
  • Read another chapter (chap 13) from the Deep Learning book.
That’s all for now and maybe next time I will write about how to study on the go. (I have some reading material with me!)

25 day streak on Duolingo [Personal PhD Update #3.2]

I am having a hard time dedicating regular time to studying. Last year my schedule was somehow more consistent and regular. Now it is less so. It seems my schedule is now more project based, focusing on one thing at a time (which could be traveling, dancing, exercising, or doing a big spring cleaning). That’s ok though 🙂 I can do the same with studying for my Personal PhD and do batches of studying every two weeks. We will see how this system works for me.

Writing these updates, that’s a whole other story. I have had a draft of this one for about a week, but only got to actually finish it just now.

Now onto the update.

So far this quarter I have done 3.5 lessons (out of 7) from Udacity’s Intro to Statistics course. It is quite basic and at times can feel too slow paced, but I think it will instill really solid basics.

I finished reading The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t by Nate Silver. It was a fascinating read and went into the history of IBM Deep Blue winning at chess, the history of weather forecasting, the history of earthquake forecasting, and poker. It also had a very good explanation of applying Bayesian statistics to making and adjusting predictions.

I have restarted studying Spanish. I am using Memrise, Duolingo (and now I am on a 25-day streak!) and the Coffee Break Spanish Season 4 podcast. All this for my upcoming trip to Patagonia at the end of March! Woohoo!

For paper reading: I have read chapters 10-12 of the Deep Learning book and a few other papers.

Plan for Q3 (Personal PhD update #3.1)

Two quarters down! Here is the plan for the next quarter (Jan-Feb-Mar 2016).
I have spent the last few weeks exploring and homing in on the curriculum for the next quarter. Here it is.

Computer Science / Math track

As I was going through the Coursera Machine Learning course, learning about Speech Recognition, and reading papers, I saw that one gap in my knowledge is statistics. For example, terms like these: Hidden Markov Models, posterior probabilities, Bayesian statistics, cross entropy, Gaussian, negentropy, doubly stochastic process, Cauchy density. I have heard of simpler things like p-values or confidence intervals, but I wouldn’t know how to produce them if I needed to.

Therefore I decided that for the CS track I will focus on statistics and probability for the next quarter. (That is the beauty of making up my own curriculum and being able to adjust in real time to the needs I have at work.)

I looked at a few stats courses and I have settled on Udacity’s Intro to Descriptive Stats (https://www.udacity.com/courses/ud827) & Intro to Inferential Stats (https://www.udacity.com/courses/ud201).

Udacity has a mobile app, so I can watch videos or do quizzes while on the bus, for example, which does add to the convenience. Also, Udacity recently came out with a Machine Learning Engineer Nanodegree which looks very interesting. It includes a very new course on Deep Learning in collaboration with Google. (https://www.udacity.com/course/deep-learning--ud730)

Another course that I explored, as it was very highly recommended by a colleague, was MIT’s Introduction to Probability – The Science of Uncertainty. (https://courses.edx.org/courses/MITx/6.041x_1/1T2015/info) From what I can tell from week 1 and the syllabus, it goes much deeper than Udacity’s intro stats courses.

 

To throw in some fun reading about statistics I plan to

* finish reading The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t  by Nate Silver (amazon)

* read Superforecasting: The Art and Science of Prediction by Tetlock and Gardner (amazon)

I commit to finishing the Intro to Descriptive Stats course this quarter and will see how far I get with the rest.

Linguistics track

For the linguistics track I will study and refresh my Spanish for a month (30 days). It’s kind of cheating as it is not really linguistics 🙂 but there are two reasons: 1) I am planning to go on vacation in a Spanish-speaking country and 2) knowing foreign languages is actually helpful for my job. The other day, for example, my knowledge of Russian came in handy while looking at experiment results.

Paper Reading

Continue reading the draft of the Deep Learning book (http://www.deeplearningbook.org/). I finished last quarter with Chapter 9 and there are 20 chapters in total, so I plan to finish it.

Writing

Still the same old – strive to write and publish an update every two weeks. Hopefully also one extra post on how I finally figured out a system for myself to track the scientific papers that I have read and want to read next.

Second Quarter Summary [Personal PhD Update #2.5]

The second quarter of my Personal PhD has concluded and it’s time to look at what I have accomplished.
Highlight: I finished the Machine Learning Coursera course by Andrew Ng on time on the last day of the quarter!
Lowlight: I wasn’t consistent with studying regularly and there was a period of 3 weeks where I didn’t do anything for the ML course.
Below is how I did against my plan for the second quarter.

Computer Science track

DONE. Even though the book was published in 2007, which by tech book standards is ancient, I still think it’s the best intro to Machine Learning for people who have absolutely no idea about Machine Learning. I am saying this because I was ready to write this book off before even looking at it, but on my colleague’s recommendation I read it anyway. The book helped me see how Machine Learning fits among other analysis methods. For example, I have seen k-means clustering applied to bioinformatics data before, but nobody called it a Machine Learning method. It turns out that the same k-means clustering method, used for prediction in a different community, is called a Machine Learning method.
I highly recommend reading the book before or slightly ahead of the Coursera ML course. Get the general intuition from the book and then get the mathematical intuition from the Coursera course.
Score: 1.0 out of 1.0.
2. Do the rest of the Machine Learning course by Andrew Ng on Coursera (Weeks 6-11)
DONE. With the Thanksgiving and Christmas holidays I totally procrastinated on the Coursera course. On weekends and holidays I am usually totally disconnected, so unfortunately they are not an opportunity to spend more time on the Personal PhD. For Thanksgiving I went on an epic road trip from San Francisco to Mohave to Grand Canyon to Sedona to Phoenix to Joshua Tree National Park and back to San Francisco. For Christmas I went to Latvia and then to the UK.
I got back from my Christmas travels on Sunday night, December 27th. I had 4 days left in the second quarter of my Personal PhD and I had weeks 9, 10 and 11 of the course materials to finish.
So I spent 1.5 hours Monday morning before work and 2 hours after work, 1.5 hours Tuesday morning before work and 1 hour in the evening after work, and again Wednesday morning and evening. (I was getting up at 6am.) Thursday I had off, and it was the final push! Right before noon I finally submitted the final quiz! Then a screen appeared congratulating me on finishing the course with a 96.5% grade on quizzes and programming assignments. I am glad that the final weeks of the Coursera course didn’t have programming assignments. Those are more time consuming and I might not have finished in time.
So yeah, that’s what I get for procrastinating 🙂 but I am glad I still finished on time (by my own imposed deadline). After that I could celebrate New Year’s Eve with no remorse.
I hope I won’t procrastinate like this again. Getting back to it after a prolonged break, I definitely wasted some time re-learning the previous weeks’ materials, as each week builds on the one before.
Score: 1.0 out of 1.0.

Linguistics track

For my linguistics track for the second quarter I chose to read a book called Watching The English (the perks of setting my own curriculum 🙂 ). I saw before, when studying French, that culture and language are intertwined, so I decided to read this book on the culture and language of the English. It is a pretty long and dense book, but I really enjoyed it, especially since I spent a few days in the UK at the end of the year and was able to confirm some observations.
Maybe someday I will write up my takeaways from the book, but it’s not going to be today, so that I can get this update out on a somewhat reasonable time scale 🙂
Score: 1.0 out of 1.0.

Paper Reading

4. Read 10 more scientific papers on Speech Recognition.
I didn’t really end up reading any new papers. However, I am reading Ian Goodfellow’s draft book on Deep Learning http://www.deeplearningbook.org/ and discussing with my colleagues how it relates to the work we do. I think each chapter of that book counts as a paper, as each chapter is long and dives into existing research. I have read through 9 chapters, which is 352 pages.
I am glad I did the majority of the Coursera ML course before reading this book; it was really good background for this reading.
Score: 0.9 out of 1.0

Writing

5. Write an update every other week.
Yeah, so I haven’t been doing well in this department. I wrote 5 updates in 12 weeks, which is the same as the previous quarter, but it doesn’t quite hit the target of writing every two weeks. I think this is the hardest part of it all.
Score: 0.83 out of 1.0

Summary

Overall score for the second quarter: 0.95 (compared to 0.90 for the previous quarter), and I am happy that I managed to finish the Coursera course.

I will take some time to finalize the curriculum for the third quarter.

P.S. The formatting of this post is wacky, but I would rather get it out imperfect than not do it at all.