Ariel Faigon: Online Resume

Ariel Faigon: prospective employer summary
(A resume + interview of sorts)

Introduction

Is there a better way to learn about someone than a conventional resume?
What if instead of a list of keywords, I could tell you more about:

What I actually love to do?
What motivates me the most, and can bring out the best in me?
A showcase of links to actual code, presentations, & projects I've written.
How we can help each other achieve our goals?

What if we could save time, and have most potential interview questions answered right here?

Click here for standard resume format & employment summary

Analytics, Machine Learning, Data Science

I have a passion for machine learning, data mining, predictive modeling, analytics, and data visualization. I love diving into data and uncovering insights that may change the way management looks at a business, lead to high-impact changes that move the needle, or highlight facts that were not known before.

I hold an M.Sc. in CS and code in many languages (I have used: C, C++, Python, Perl, Bash + core Unix utilities, R, JavaScript, PHP, Julia, Prolog, LISP, Pascal, BASIC, and a few more).

It would be difficult to pin me into the narrow box of a single job title. I'm a combination of a computer scientist, machine-learning engineer, and data scientist, with both back-end and front-end skills and experience.

I have built successful production-ready systems & software for:

Near real-time, large scale, multidimensional anomaly-detection (@ Netskope)
CI/CD leadership, Test-driven-design implementation (@ Netskope)
Language models, multi-class document classification (NLP, Word2Vec) (@ Netskope)
Recommender systems, revenue maximizing, and optimal ranking (@ coupons.com)
Deep data understanding via entity-relationship visualizations (@ Pantheon)
Content analysis, clustering, anomaly detection, spam & DDoS abuse auto-detection and mitigation (security/anti-abuse @ Yahoo!)
Data-mining tools, MineSet team (@ SGI)
Compilation, code generation, release automation, random testing (@ National Semiconductor)

I've applied data-mining and machine-learning techniques to personal investing. I do this in my limited free time, and after working on it for a long time, I'm seeing exciting and positive results. The goal is to go well beyond broad asset allocation with periodic tactical rebalancing over a mix of low-cost, broad index funds and asset classes, and to capture a large number of latent interactions between market segments, possibly with lags, under highly stochastic and non-stationary conditions.

The material on my finance.yendor.com site and on Planet Quant (a past collaboration project with two friends) is extensive. Here is a small sample of older articles:

Interactive bubble charts
Portfolio optimization using Simulated Annealing
Does momentum exist? and how can it be quantified?
Using heat-maps for feature selection and model optimization
Other lessons from the Netflix challenge: on the power of ensemble approaches in ML.
Early thoughts on Short-Term-Contrarian (STC) strategies plus some simulations thereof

I wrote an introduction to R, the popular statistical and machine-learning programming environment (which I rarely use now). I also gave a short lightning talk @ BARUG (Bay Area R Users Group) on one particular reason to like R.

Here's a mini visualization project leading to insight: profiling a distributed web application using R/ggplot.

A mini-project applying machine learning to personalized weight-loss lifestyles. The idea is that people and genomes differ. The question: which lifestyle may lead to your own weight loss? Why not experiment on yourself, with your own data? This project went viral after someone posted a link to it on Hacker News, and it now has over 3,200 stars on GitHub.

In 2022, I taught myself JavaScript and D3.js and wrote a general-purpose "data to insights" tool: it converts a relational table plus an "attention" config into an entity-relationship force-directed graph. The result is an interactive (D3.js) visualization of entities and the relationships between them. The data-to-E-R transformation is config-driven, so the tool can be applied to many domains to visualize any type of relation. Here are two mini-app output examples of applying this technique:

I have a more impressive demo to show privately (please ask me about it).
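To make the "relational table + config → entity-relationship graph" idea concrete, here is a minimal Python sketch. It is not the tool described above: the config keys (`source_col`, `target_col`, `relation`) and the `table_to_graph` function are hypothetical, chosen only to show how rows can be folded into the `nodes`/`links` arrays that a D3 force layout typically consumes (with `d3.forceLink().id(d => d.id)` matching links to node ids).

```python
import json
from collections import Counter


def table_to_graph(rows, config):
    """Fold table rows into a D3-style {"nodes": [...], "links": [...]} dict.

    Hypothetical config format (an assumption for this sketch):
      source_col: column whose values become source entities
      target_col: column whose values become target entities
      relation:   label attached to every link
    """
    src, dst = config["source_col"], config["target_col"]
    relation = config.get("relation", "related_to")
    weights = Counter()  # (source_id, target_id) -> number of supporting rows
    groups = {}          # node id -> entity type (the column it came from)
    for row in rows:
        s, t = row.get(src), row.get(dst)
        if s is None or t is None:
            continue  # skip rows missing either endpoint
        # Prefix ids with the column name so equal values in different columns stay distinct
        s_id, t_id = f"{src}={s}", f"{dst}={t}"
        groups[s_id], groups[t_id] = src, dst
        weights[(s_id, t_id)] += 1
    nodes = [{"id": n, "group": g} for n, g in sorted(groups.items())]
    links = [{"source": s, "target": t, "relation": relation, "weight": w}
             for (s, t), w in sorted(weights.items())]
    return {"nodes": nodes, "links": links}


rows = [
    {"user": "alice", "device": "laptop-1"},
    {"user": "bob", "device": "laptop-1"},
    {"user": "alice", "device": "laptop-1"},
    {"user": "carol", "device": None},
]
config = {"source_col": "user", "target_col": "device", "relation": "logs_in_from"}
print(json.dumps(table_to_graph(rows, config), indent=2))
```

Repeated rows become a link `weight` rather than duplicate edges, which is one simple way to let a force layout emphasize strong relationships.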

I am experienced with many ML tools and frameworks.

Here's an introduction to and demo of vowpal-wabbit, an open-source, liberally licensed machine-learning tool. It's a good read; no machine-learning background is needed to "get it".

Debugging machine learning is hard, especially when opaque models with thousands of input features, weights, and biases are involved. After thinking about the challenge for some time, and realizing where machine-learning practitioners most often get it wrong, I came up with and implemented an unconventional method to help improve models by detecting anomalies and irregularities during the learning process, as opposed to focusing on the end result (cross-validation, accuracy, precision/recall metrics, or similar). This idea became the basis for an unsupervised, multi-tenant, at-scale anomaly-detection system focused on security @ Netskope, which you can read a little about in this blog.
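The method itself isn't described here, so the following is only a generic illustration, not the author's approach: watching the learning process rather than the final metric can be as simple as streaming a per-example quantity (here, training loss) through a running z-score check. The class name, the `z_threshold=4.0` default, and the `warmup=30` default are all assumptions for this sketch; running mean and variance use Welford's online algorithm, so memory stays O(1).

```python
import math


class RunningAnomalyMonitor:
    """Flag values (e.g., per-example training loss) far above the running mean.

    Hypothetical helper for illustration; thresholds are assumptions.
    """

    def __init__(self, z_threshold=4.0, warmup=30):
        self.z_threshold = z_threshold
        self.warmup = warmup  # don't flag anything until this many values are seen
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Return True if x is anomalous relative to values seen so far, then absorb x."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and (x - self.mean) / std > self.z_threshold:
                flagged = True
        # Welford's online update of mean and m2
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged


losses = [0.5 + 0.01 * (i % 7) for i in range(100)]
losses[60] = 5.0  # an injected outlier, e.g. a mislabeled or corrupted example
monitor = RunningAnomalyMonitor()
flagged = [i for i, loss in enumerate(losses) if monitor.update(loss)]
print(flagged)  # [60]
```

This sketch absorbs flagged values into the statistics too; excluding them keeps the baseline cleaner, but risks flagging everything after a genuine distribution shift.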

Security is a hard, probably unsolvable problem: a constant arms race between black hats and white hats. I once implemented an effective, very fast classifier that assigns a "maliciousness" probability (via a surrogate) to arbitrary byte sequences. The project highlights how a big challenge can be broken into smaller, iterative steps, and how to drive an idea all the way to production-quality ML.

Focusing on minimizing training error is arguably the most common mistake ML practitioners make. To illustrate this issue, I wrote a cheat-sheet with optimal solutions to the TensorFlow Playground problem set. It explains how to design a minimalist model for each problem while still achieving a low generalization error. You may appreciate the spoilers much more if you first try to solve these problems yourself at the TensorFlow Playground (point-and-click ML, no coding necessary).

Human evolution has made visualization one of the best ways to understand data. I wrote a little exercise titled "A picture is worth a 1000 words" to convince you. Then, I GPL'd the little heat-map utility (hm) I used in the exercise and made it available here. Later, I wrote a newer version, called xyz, in Python using pandas and matplotlib; you can find it here. If you find your 3D numeric data hard to understand, xyz can help.

Passion for Coding

I've been coding for many years. Here's some C code I wrote a long time ago, just out of college. Some of it is translated into C from the pseudo-code in classic texts (Sedgewick's "Algorithms in C" was published years later; I worked from the pseudo-code in the first edition), and some is 100% original. Shortly after "the web" happened, I dusted it off and published it on GitHub under a FOSS license. While the code is pretty straightforward, I hope it shows how much I care about writing well-structured, clear, well-documented, and well-tested code.

While working with data, I realized how limited the design of cut, one of the oldest Unix(R) utilities, is. Seeing how many others had experienced the same issues, I wrote a replacement, which I called cuts for "cut on steroids" (over 60 stars on GitHub). If you use cut but feel something is missing, you may want to try cuts.
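A few well-known limits of standard cut: `-d` accepts only a single delimiter character, so runs of spaces produce empty fields; `-f` prints fields in input order no matter how you list them (`-f3,1` still prints field 1 first); and there is no way to count fields from the end of the line. The Python sketch below only illustrates what lifting those limits looks like. It is not how cuts is implemented, and `select_fields` is a hypothetical name.

```python
import re


def select_fields(line, fields, delim=None):
    """Return the requested fields of `line`, in the order requested.

    fields: 0-based indices; negative indices count from the end (Python style).
    delim:  None splits on runs of whitespace; otherwise a regular expression,
            so multi-character or mixed separators work.
    Out-of-range indices are silently skipped (a design choice for this sketch).
    """
    parts = line.split() if delim is None else re.split(delim, line.rstrip("\n"))
    return [parts[i] for i in fields if -len(parts) <= i < len(parts)]


print(select_fields("a b  c", [-1, 0]))               # ['c', 'a']
print(select_fields("x,y;z", [2, 1], delim=r"[,;]"))  # ['z', 'y']
```

Skipping missing fields keeps ragged input from crashing a pipeline; a stricter tool might instead emit an empty placeholder so output columns stay aligned.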

Codility.com is a site where programmers can test their coding skills. It evaluates coders objectively on their ability to solve well-defined, "leet-style" toy problems, automatically checking code for build and run-time errors, solution correctness (many input sets), and efficiency. On my first attempt there, I scored 100% on both the Perl and C tests, placing me in the top 12.36% of 19,394 coders based on Codility's statistics.

I enjoy optimizing stuff: processes, performance, productivity. Here are two examples: experimenting with the speed and collision rates of two hash functions in an attempt to improve vowpal_wabbit (free machine-learning software), and tracking VW performance over time.
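Hashing-trick learners such as VW map feature names into a table of 2^b weights, so a hash function's speed and its collision rate at a given bit width both matter. The sketch below shows one way to measure both. The two functions (32-bit FNV-1a and `zlib.crc32`), the synthetic `feature_{i}` keys, and `bits=18` are assumptions for illustration; they are not the functions or data from the experiment mentioned above.

```python
import time
import zlib


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (offset basis 0x811C9DC5, prime 0x01000193)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def collision_stats(hash_fn, keys, bits):
    """Return (fraction of distinct keys that land in an occupied slot, seconds)."""
    mask = (1 << bits) - 1  # keep the low `bits` bits, like a 2**bits weight table
    start = time.perf_counter()
    slots = [hash_fn(k) & mask for k in keys]
    elapsed = time.perf_counter() - start
    collisions = len(keys) - len(set(slots))
    return collisions / len(keys), elapsed


keys = [f"feature_{i}".encode() for i in range(50_000)]  # distinct synthetic keys
for name, fn in [("fnv1a_32", fnv1a_32), ("crc32", crc32)]:
    rate, secs = collision_stats(fn, keys, bits=18)
    print(f"{name}: collision rate {rate:.4f}, {secs * 1000:.1f} ms")
```

In pure Python the timings mostly reflect interpreter overhead (`zlib.crc32` runs in C), so only the collision rates carry over; a speed comparison that matters for VW would have to be done in C/C++.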

At Netskope, I was one of the top contributors to moving the company to a CI/CD (Continuous Integration, Continuous Delivery) process. I implemented an early (dev-stage) CI layer on top of git, providing:

A simplified quick integration development flow
Support for personal preferences of integrated tests
Integrated enforcement of quality checks
Block-chain signing / qualification to allow merging into main-line.

After leaving Netskope, I wrote a better FOSS implementation of some of the above ideas. I called it clean-push: an automated git flow that produces safe, neat (rebased + squashed) pull requests. Try it! It may solve many of the issues you run into when working with git daily.

Organizational Culture, Programmer Productivity

At SGI, I was twice a runner-up for the "Spirit Award", an employee-excellence recognition. I wrote the SGI intranet search engine "Sniff" (robot, indexer, and user search UI) way back in 1995-1996, long before Google existed. My paper, Lessons I learned from Sniff, in which I tell the story of this adventure, won the best paper award at the SGI worldwide software conference. In that paper, you can learn a bit about my work on Sniff and about my beliefs on what makes organizations great.

I was four times among the top-10 award winners in the coupons.com bi-yearly hackathon; one of the four was a 1st place.

A mini-team I led won the top two spots in the Pantheon 2022 Hackathon.

Projects were all data-driven, focused on visualization, animation, machine learning, prediction, and fraud detection.

Free Software

Free software is more than just a tool for me. My desktop has been GNU/Linux since early 1994 (before version 1.0 came out), when you had to use a stack of floppies to install it. I feel tremendous gratitude to Richard Stallman and Linus Torvalds, and to the, by now, millions of people who write, share, debug, use, and help improve free software every day.

My favorite current IDEs are LLMs like ChatGPT and Bard. Rather than displace me as a programmer, they have made me a much better one: they are faster at looking up APIs and typing, while I'm much better than they are at strategizing, planning, breaking large problems into smaller chunks, and spotting errors. I feel everyone should use whatever makes them most productive and happy.

Back in my days at Silicon Graphics, I led and contributed to the "Free Software on SGI" project with several colleagues. Judging by the many thanks we got, SGI customers loved it. From humble beginnings, this effort eventually resulted in Linux running on SGI hardware, and in SGI making Linux central to its strategy.

From time to time, I try to help Linux and analytics newcomers on forums; I believe the only stupid question is the one you were too timid to ask. See, for example, the Ask Ubuntu, Stack Overflow, and Cross Validated sites, or browse through my StackExchange profile.

I also try to contribute a bit by reporting bugs and helping figure out issues on Ubuntu Launchpad.

Some of my code has been used in a machine learning free software called PCP, by Ljubomir Buturovic of San Francisco State University.

I've sent some usability- and speed-related patches to liblinear by Chih-Jen Lin et al. and to svmlin by Vikas Sindhwani, two machine-learning tools I have used.

Around 2008, I started using John Langford's Vowpal Wabbit, which exceeded my expectations on many levels: scalability in both space and time; robustness; an out-of-sample (test-like) loss estimate for the model as it trains; automatic generation of feature interactions; a friendly user interface; a more flexible and useful input format than what you find in similar programs; and the willingness of the main author to listen, educate, and accept patches. I have contributed code and fixes to vowpal-wabbit via GitHub for several years, helped newcomers on the vowpal wabbit mailing list, and, for a while in those early days, reached #3 contributor status on GitHub. I was humbled by John's inclusion of my name in the Vowpal Wabbit early authors roster.
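The "out-of-sample loss estimate as it trains" is commonly called progressive validation: each example is scored before the model learns from it, so the running average loss behaves like a held-out estimate without a separate test set. Below is a minimal Python sketch of that idea using plain SGD logistic regression. It is not VW's code: the function name, the dict-of-features input, the learning rate, and the synthetic data are all assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def progressive_validation_logloss(examples, lr=0.1):
    """Online logistic regression (plain SGD) with progressive validation.

    examples: iterable of (features: dict[str, float], label: 0 or 1).
    Each example is predicted *before* the update, so its loss is out-of-sample.
    Returns (average progressive log-loss, learned weights).
    """
    weights = {}
    total_loss, n = 0.0, 0
    for features, label in examples:
        z = sum(weights.get(f, 0.0) * v for f, v in features.items())
        p = min(max(sigmoid(z), 1e-15), 1 - 1e-15)  # clip to avoid log(0)
        total_loss += -(label * math.log(p) + (1 - label) * math.log(1 - p))
        n += 1
        # SGD step on log-loss: d(loss)/d(w_f) = (p - label) * x_f
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) - lr * (p - label) * v
    return total_loss / n, weights


random.seed(0)
data = []
for _ in range(2000):
    x = random.uniform(-1, 1)
    data.append(({"bias": 1.0, "x": x}, 1 if x > 0 else 0))
loss, w = progressive_validation_logloss(data)
print(f"progressive log-loss: {loss:.3f}, w_x = {w['x']:.2f}")
```

Because every example is used once for evaluation and then once for training, the estimate costs nothing extra in a single pass, which is why it suits online learners.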

Dream Job

If you are a recruiter, CTO, or VP of engineering and see a good match for the dream job below, please contact me.

Here's my dream job from multiple perspectives:

Work on a challenging/interesting problem. Ideally something that can improve the world: AI, clean energy, breakthroughs in genetic research, physics, cosmology, or agriculture, or automating anything that is hard and in great need.
A hands-on technical leadership job with a focus on engineering and data-science.
An individual contributor job. I can mentor, technically lead by example, and teach well. (Not looking for management or administration roles).
A manager who sets high-level goals and doesn't micro-manage. One who shares a vision and gives freedom to pick the best path to achieve big goals. Willing to experiment, take risks, and sometimes fail. Pulls and inspires rather than pushes or controls. Bob Green, the best manager I ever had, described his role to me this way: "My job is to hire the best people, and then help them succeed". Sunil Bopardikar, another great manager I had, once taught me about message delivery: "it's not what you say, it's HOW you say it".
Environments which appreciate self-directed people & continuous progress.
Small-teams of complementary skills forming organically/naturally.
Leaders who believe in moving fast to get a first result: prototyping, seeking feedback, then iterating towards perfection.
A culture fostering truth and full transparency, where substance and common purpose drive change.
Impactful actions that actually move the needle, over ceremony, pomp & circumstance.
Peers focused on growing the pie rather than on how to slice it.
A small-to-medium-sized, fast-growing company, ideally one with 20-300 employees.
A company where machine learning is central to its success, and where I can make a real impact solving big problems.
A remote job: during the COVID-19 lock-down I've learned that work-from-home works really well for me. My most productive days were those where I didn't have to spend hours commuting. I live in the south Bay Area (Los Altos).

Past employment and projects I worked on

Standard-format resume on Google docs

Other/Misc

I'm a Math & Computer Science major (MSc). I had the honor of going through a great Computer Science program (@ HUJI) and had two Turing Award recipients as teachers: Michael O. Rabin (probabilistic algorithms) and Jeffrey D. Ullman (VLSI Design). I spent 3 years in medical school before switching to Computer Science.
My Myers-Briggs personality type is INTP (several tests at different ages: very clear on the I dimension; borderline, but always consistent, on the N, T, and P dimensions).
I'm a US Citizen.
