What if we could save time, and have most potential interview questions answered right here?
I hold an M.Sc. in CS and code in many languages (have used: C, C++, Python, Perl, bash + core Unix utils, R, JavaScript, PHP, Julia, Prolog, LISP, Pascal, Basic, & a few more).
It would be difficult to pin me into the narrow box of a job title. I'm a combination of a Computer Scientist, Machine-Learning engineer, & Data Scientist, with both back-end and front-end skills & experience.
I have built successful production-ready systems & software for:
I've applied data-mining and machine learning techniques to personal investing. I do this in my little free time, and after a long time working on it, I'm seeing exciting and positive results. The goal is to go way beyond broad asset-allocation with periodic tactical re-balancing on a mix of low-cost, broad index and asset classes while capturing a large number of latent interactions between market segments, with possible lags under very stochastic and non-stationary conditions.
The material on my finance.yendor.com and Planet Quant (a past collaboration project with two friends) sites is long. To point to a small sample of old articles:
I wrote an introduction to "R", the popular statistical and machine learning programming environment which I rarely use now. I also gave a short lightning talk @ BARUG (Bay Area R Users Group) on one particular reason to like R.
Here's a mini visualization project leading to insight: profiling a distributed web application using R/ggplot.
A mini-project applying machine-learning to customized/personal weight-loss lifestyles. The idea is that people & genomes are different. Question: which lifestyle may lead to your very own weight-loss? Why not experiment yourself, with your own data? This project went viral after someone posted a link to it on Hacker News, and now has over 3200 stars on github.
In 2022, I taught myself JavaScript & D3.js
and wrote a generally-applicable "data to insights" tool:
It is a converter of a relational table + an "attention" config
into an entity-relationship force-directed graph. The result
is an interactive (D3.js) visualization of entities and
the relationships between them.
The Data to E-R transformation is config-driven, so one can apply
the tool to many domains & visualize any type of relation they want.
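To make the config-driven idea concrete, here is a minimal Python sketch (not the actual tool, which is written in JavaScript). The config layout, the `table_to_graph` name, and the sample columns are all hypothetical, chosen only for illustration. It turns table rows into the `nodes` / `links` structure that D3 force-directed layouts commonly consume:

```python
import json
from collections import Counter


def table_to_graph(rows, config):
    """Convert table rows (list of dicts) into a nodes/links graph.

    config is a hypothetical format, assumed for this sketch:
      {"entities": [column, ...], "relations": [[src_column, dst_column], ...]}
    """
    nodes = {}
    link_weights = Counter()
    for row in rows:
        # Every distinct value of an entity column becomes one node.
        for col in config["entities"]:
            value = row.get(col)
            if value is not None:
                nodes[(col, value)] = {"id": f"{col}:{value}", "group": col}
        # Every configured column pair becomes a link; repeats add weight.
        for src_col, dst_col in config["relations"]:
            src, dst = row.get(src_col), row.get(dst_col)
            if src is not None and dst is not None:
                link_weights[(f"{src_col}:{src}", f"{dst_col}:{dst}")] += 1
    links = [
        {"source": s, "target": t, "value": w}
        for (s, t), w in link_weights.items()
    ]
    return {"nodes": list(nodes.values()), "links": links}


if __name__ == "__main__":
    rows = [
        {"user": "alice", "app": "mail"},
        {"user": "alice", "app": "chat"},
        {"user": "bob", "app": "mail"},
        {"user": "alice", "app": "mail"},
    ]
    config = {"entities": ["user", "app"], "relations": [["user", "app"]]}
    print(json.dumps(table_to_graph(rows, config), indent=2))
```

Changing only the config (which columns are entities, which pairs are relations) yields a different graph from the same table, which is the point of keeping the transformation config-driven.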
Here are two mini-app output examples of applying this technique:
I am experienced with many ML tools and frameworks.
Here's an introduction and demo of vowpal-wabbit, an open-source, liberally licensed machine-learning tool. A good read, no need to have a machine-learning background to "get it".
Debugging machine learning is hard, especially when opaque models with thousands of input features, weights, and biases are involved. After thinking about the challenge for some time, and realizing where Machine Learning practitioners most often get it wrong, I thought up and implemented an unconventional method to help improve models by detecting anomalies & irregularities during the learning process, as opposed to focusing on the end result (cross-validation, accuracy, precision/recall metrics, or similar). The idea presented above became the basis for an unsupervised multi-tenant anomaly-detection system at scale, focusing on security @ Netskope, which you can read a little about in this blog.
Security is a hard, probably unsolvable problem: a constant arms-race between black and white hats. I once implemented an effective, very fast classifier assigning a "maliciousness" (via surrogate) probability to arbitrary byte sequences, highlighting how a big challenge can be broken into smaller (iterative) actions and how to drive ideas all the way to production-quality ML.
Focusing on minimizing training error is arguably the most common mistake ML practitioners make. To explain this common issue, I wrote a cheat-sheet with optimal solutions to the TensorFlow playground problem-set. It explains how to design a minimalist model for each problem, while also achieving a low generalization error. You might appreciate the spoilers much more if you first try to solve these problems yourself at the TensorFlow playground (point-and-click ML, no coding necessary).
Human evolution has made visualization one of the best ways to understand data. I wrote a little exercise titled "A picture is worth a 1000 words" to convince you. Then, I GPL'd the little heat-map utility (hm) I used in the exercise and made it available here. Later, I wrote a newer version, in Python + pandas & matplotlib, called xyz, which you can find here. If you find your 3D numeric data hard to understand, xyz can help.
While working with data, I realized how limited the design of
one of the oldest Unix(R) utilities, cut, is. Seeing how many
others experienced the same issues, I wrote a replacement which
I called cuts for
"cut on steroids" (over 60 stars on github).
If you use cut but feel something is missing, you may
want to try cuts.
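To illustrate the kind of limitation meant here: classic cut splits on a single-character delimiter, always emits fields in input order, and cannot count fields from the end of the line. The Python sketch below is only an illustration of selection without those limits; it is not how cuts is implemented, and the `select_fields` name and its parameters are hypothetical:

```python
import re


def select_fields(line, indexes, delim=r"\s+"):
    """Return the requested fields of a line, in the requested order.

    delim is a regular expression, so mixed runs of spaces and tabs
    (or ", " style separators) count as one delimiter.
    indexes are 0-based; negative values count from the end of the line.
    """
    fields = re.split(delim, line.strip())
    return [fields[i] for i in indexes]


if __name__ == "__main__":
    # Last field first, then the first field: reordering + negative offsets.
    print(select_fields("a  b\tc", [-1, 0]))
    # A multi-character, regex-based delimiter.
    print(select_fields("x, y,z", [1], delim=r",\s*"))
```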
Codility.com is a site where programmers can test their coding skills. It evaluates coders objectively on their ability to solve well-defined "leet style" toy problems. It automatically checks code for both build-time and run-time errors, solution correctness (many input sets), and efficiency. On my first attempt there I scored 100% on both the Perl and C tests, placing me in the top 12.36% of 19,394 coders based on Codility's statistics.
I enjoy optimizing stuff: processes, performance, productivity. Here's an example of me trying to improve vowpal_wabbit (free machine learning software) by experimenting with the speed and collision rates of two hash functions, and another: tracking VW performance over time.
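As a rough illustration of what a collision-rate comparison can look like (this is not the actual VW experiment; `zlib.crc32` and `zlib.adler32` are stand-ins chosen only because they ship with Python, and the synthetic feature names and 18-bit table size are assumptions made for this sketch):

```python
import zlib


def collision_rate(keys, hash_fn, bits=18):
    """Fraction of keys that hash into an already occupied slot.

    Keys are assumed to be distinct strings; the hash is masked down
    to a table of 2**bits slots.
    """
    if not keys:
        return 0.0
    mask = (1 << bits) - 1
    seen = set()
    collisions = 0
    for key in keys:
        slot = hash_fn(key.encode("utf-8")) & mask
        if slot in seen:
            collisions += 1
        else:
            seen.add(slot)
    return collisions / len(keys)


if __name__ == "__main__":
    # Synthetic feature names, used only for this demonstration.
    keys = [f"feature_{i}" for i in range(100_000)]
    for name, fn in (("crc32", zlib.crc32), ("adler32", zlib.adler32)):
        print(f"{name}: {collision_rate(keys, fn):.4f}")
```

Timing each function over the same keys (for example with `timeit`) would cover the speed half of such a comparison.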
At Netskope I was one of the top contributors in moving the company to a CI/CD (Continuous Integration, Continuous Delivery) process. I implemented an early (dev stage) CI layer on top of git, providing:
After leaving Netskope I came up with a FOSS implementation of some of the above ideas in a better way. I called it clean-push: an automated git flow to produce safe, neat (rebased + squashed) pull-requests. Try it! It may solve many of the issues you may find working with git on a daily basis.
I was four times among the top-10 award winners in the coupons.com bi-yearly hackathon. One of the four was a 1st-place finish.
A mini team I led won the top two spots in the Pantheon 2022 Hackathon.
Projects were all data-driven, focused on visualization, animation, machine learning, prediction, and fraud detection.
My favorite current IDEs are LLMs like ChatGPT and Bard. Rather than displace me as a programmer, they made me a much better one, as they are faster at looking up APIs and typing, while I'm much better than they are at strategizing, planning, breaking up large problems into smaller chunks, and spotting errors. I feel everyone should use what makes them feel most productive and happy.
In my old days at Silicon Graphics I led and contributed to the free software on SGI project with several other colleagues. SGI customers loved it based on the many thanks we got. From humble beginnings, this effort eventually resulted in Linux running on SGI hardware, and SGI making Linux central to its strategy.
From time to time I try to help Linux and Analytics newcomers on forums
where I believe the only stupid question is the one you were too
timid to ask. See for example the
Ask Ubuntu,
Stack Overflow,
and
Cross Validated
sites, or browse through my StackExchange profile via the widget on the right:
I also try to contribute a bit by reporting bugs and helping
figure out issues on
Ubuntu Launchpad
Some of my code has been used in a free machine learning software package called PCP, by Ljubomir Buturovic of San Francisco State University.
I've sent some usability and speed related patches to both liblinear by Chih-Jen Lin et al. and svmlin by Vikas Sindhwani, two machine learning tools which I have used.
Around 2008, I started using John Langford's Vowpal Wabbit, which exceeded my expectations on many levels: scalability in both space and time; robustness; out-of-sample (test-like) loss estimate for the model as it trains; automatic generation of feature-interactions; a friendly user interface, with a more flexible and useful input format than what you find in similar programs; willingness of the main author to listen, educate, and accept patches. I have been contributing code and fixes to vowpal-wabbit via github for several years, helping newcomers on the Vowpal Wabbit mailing list, and reached #3 contributor status on github for a while in those early days. I was humbled by John's inclusion of my name in the Vowpal Wabbit early authors roster.
Here's my dream job from multiple perspectives: