Ariel Faigon: prospective employer summary
(A resume + interview of sorts)

Ariel in Crater Lake

Introduction

Is there a better way to learn about someone than a conventional resume?
What if instead of a list of keywords, I could tell you more about:

What if we could save time, and have most potential interview questions answered right here?

Click here for standard resume format & employment summary

Analytics, Machine Learning, Data Science

I have a passion for machine-learning, data-mining, predictive modeling, analytics, & data visualization. I love diving into data and uncovering insights that may change the way management looks at a business, lead the way to high-impact changes that move the needle, or highlight facts that were not known before.

I hold an M.Sc. in CS and code in many languages (I have used: C, C++, python, perl, bash + core Unix utils, R, javascript, PHP, Julia, Prolog, LISP, Pascal, Basic, & a few more).

It would be difficult to pin me into the narrow box of a job title. I am a combination of Computer-Scientist, Machine-Learning engineer, & Data-Scientist, with both back-end and front-end skills & experience.

I have built successful production-ready systems & software for:

I've applied data-mining and machine learning techniques to personal investing. I do this in what little free time I have, and after a long time working on it, I'm seeing exciting and positive results. The goal is to go way beyond broad asset-allocation with periodic tactical re-balancing on a mix of low-cost, broad index and asset classes, while capturing a large number of latent interactions between market segments, with possible lags, under very stochastic and non-stationary conditions.

The material on my finance.yendor.com and Planet Quant (a past collaboration project with two friends) sites is extensive, so here is a small sample of old articles:

I wrote an introduction to "R", the popular statistical and machine learning programming environment which I rarely use now. I also gave a short lightning talk @ BARUG (Bay Area R Users Group) on one particular reason to like R.

Here's a mini visualization project leading to insight: profiling a distributed web application using R/ggplot.

A mini-project applying machine-learning to customized/personal weight-loss lifestyles. The idea is that people & genomes are different. Question: which lifestyle may lead to your very own weight-loss? Why not experiment yourself, with your own data? This project went viral after someone posted a link to it on Hacker News, and now has over 3200 stars on github.

In 2022, I taught myself javascript & D3.js and wrote a generally-applicable "data to insights" tool: a converter of a relational-table + an "attention" config into an entity-relationship force-directed graph. The result is an interactive (D3.js) visualization of entities and the relationships between them. The Data to E-R transformation is config-driven, so one can apply the tool to many domains & visualize any type of relation they want. Here are two mini-app output examples of applying this technique:

I have a more impressive demo to show privately (please ask me about it).
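The table-to-graph idea described above can be sketched in a few lines of python. This is only an illustration, not the actual tool: the config keys ("entities", "relations"), the function name table_to_graph, and the sample rows are all made up for the example. The output uses the nodes/links shape that D3 force layouts commonly consume.

```python
import json


def table_to_graph(rows, config):
    """Turn relational rows into a {"nodes": [...], "links": [...]} dict.

    `config` is a hypothetical format invented for this sketch:
      entities:  columns whose values become nodes
      relations: (source column, target column, label) triples that become links
    """
    nodes = {}  # node id -> node dict
    links = {}  # (source id, target id, label) -> link dict

    def node_id(column, value):
        return f"{column}:{value}"

    for row in rows:
        # Every distinct value of an entity column becomes one node.
        for column in config["entities"]:
            key = node_id(column, row[column])
            node = nodes.setdefault(key, {"id": key, "type": column, "count": 0})
            node["count"] += 1
        # Every row contributes weight to the links it supports.
        for source_col, target_col, label in config["relations"]:
            source = node_id(source_col, row[source_col])
            target = node_id(target_col, row[target_col])
            link = links.setdefault(
                (source, target, label),
                {"source": source, "target": target, "relation": label, "weight": 0},
            )
            link["weight"] += 1

    return {"nodes": list(nodes.values()), "links": list(links.values())}


# Made-up sample data, for illustration only.
ROWS = [
    {"user": "alice", "app": "mail", "country": "US"},
    {"user": "alice", "app": "chat", "country": "US"},
    {"user": "bob", "app": "mail", "country": "FR"},
]
CONFIG = {
    "entities": ["user", "app"],
    "relations": [("user", "app", "uses")],
}

graph = table_to_graph(ROWS, CONFIG)
print(json.dumps(graph, indent=2))
```

Changing only CONFIG (for example, adding "country" as an entity and a ("user", "country", "located_in") relation) yields a different graph from the same table, which is the point of keeping the transformation config-driven.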

I am experienced with many ML tools and frameworks.

Here's an introduction and demo of vowpal-wabbit, an open-source, liberally licensed machine-learning tool. A good read, no need to have a machine-learning background to "get it".

Debugging machine learning is hard, especially when opaque models with thousands of input features, weights, and biases are involved. After thinking about the challenge for some time, and realizing where Machine Learning practitioners most often get it wrong, I thought up and implemented an unconventional method to help improve models by detecting anomalies & irregularities during the learning process, as opposed to focusing on the end result (cross-validation, accuracy, precision/recall metrics, or similar). This idea became the basis for an unsupervised, multi-tenant anomaly-detection system at scale, focusing on security @ Netskope, which you can read a little about in this blog.

Security is a hard, probably unsolvable problem: a constant arms-race between black and white hats. I once implemented an effective, very fast classifier assigning a "maliciousness" (via surrogate) probability to arbitrary byte sequences, highlighting how a big challenge can be broken into smaller (iterative) actions and how to drive ideas all the way to production-quality ML.

Focusing on minimizing training error is arguably the most common mistake ML practitioners make. To explain this common issue, I wrote a cheat-sheet with optimal solutions to the TensorFlow playground problem-set. It explains how to design a minimalist model for each problem, while also achieving a low generalization error. You might appreciate the spoilers much more if you first try to solve these problems yourself at the TensorFlow playground (click and point ML, no coding necessary).

Human evolution has made visualization one of the best ways to understand data. I wrote a little exercise titled "A picture is worth a 1000 words" to convince you. Then, I GPL'd the little heat-map utility (hm) I used in the exercise and made it available here. Later, I wrote a newer version, in python + pandas & matplotlib, called xyz which you can find here. If you find your 3D numeric data hard to understand, xyz can help.

Passion for Coding

I've been coding for many years. Here's some C code I wrote a long time ago, just out of college. Some of it is a translation of pseudo-code from classical texts into C (Sedgewick's "algorithms in C" was published years later, I worked with the pseudo-code first edition), and some is 100% original code. Shortly after "the web" happened, I dusted them off and published them on github under a FOSS license. While they are pretty straightforward, I hope they show how much I care about writing well-structured, clear, well-documented, and well-tested code.

While working with data, I realized how limited the design of one of the oldest Unix(R) utilities, cut, is. Seeing how many others experienced the same issues, I wrote a replacement which I called cuts for "cut on steroids" (over 60 stars on github). If you use cut but feel something is missing, you may want to try cuts.

Codility.com is a site where programmers can test their coding skills. It evaluates coders objectively on their ability to solve well defined "leet style" toy problems. It automatically checks code for build-time and run-time errors, solution correctness (many input sets), and efficiency. On my first attempt there I scored 100% on both the Perl and C tests, placing me in the top 12.36% of 19,394 coders based on Codility's statistics.

I enjoy optimizing stuff: processes, performance, productivity. Here's an example of me trying to improve vowpal_wabbit (free machine learning software) by experimenting with the speed and collision rates of two hash functions, and another one: tracking VW performance over time.
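For readers who want to try this kind of experiment themselves, here is a minimal sketch in python of comparing two hash functions on speed and collision rate. The two functions used here (a hand-written 32-bit FNV-1a and zlib's CRC-32) are stand-ins chosen for the example, not necessarily the ones compared in the linked experiment, and the synthetic keys and 18-bit table size are assumptions as well.

```python
import time
import zlib


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: xor each byte in, then multiply by the FNV prime."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def crc32(data: bytes) -> int:
    """CRC-32 from the standard library, used here as a second hash."""
    return zlib.crc32(data) & 0xFFFFFFFF


def collision_rate(hash_fn, keys, bits):
    """Fraction of keys landing in an already-occupied slot of a 2**bits table."""
    mask = (1 << bits) - 1
    seen = set()
    collisions = 0
    for key in keys:
        slot = hash_fn(key) & mask
        if slot in seen:
            collisions += 1
        else:
            seen.add(slot)
    return collisions / len(keys)


def seconds_to_hash(hash_fn, keys):
    """Wall-clock time to hash every key once."""
    start = time.perf_counter()
    for key in keys:
        hash_fn(key)
    return time.perf_counter() - start


# Synthetic feature names; real feature names would be a better benchmark.
KEYS = [f"feature_{i}".encode() for i in range(20000)]
BITS = 18

for name, fn in (("fnv1a_32", fnv1a_32), ("crc32", crc32)):
    rate = collision_rate(fn, KEYS, BITS)
    elapsed = seconds_to_hash(fn, KEYS)
    print(f"{name}: collision rate {rate:.4f}, {elapsed:.4f}s")
```

Timings from a pure-python loop say little about a compiled implementation, so the collision rates are the more transferable part of this sketch.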

At Netskope I was one of the top contributors in moving the company to a CI/CD (Continuous Integration, Continuous Delivery) process. I implemented an early (dev stage) CI layer on top of git, providing:

After leaving Netskope I came up with a FOSS implementation of some of the above ideas in a better way. I called it clean-push: an automated git flow to produce safe, neat (rebased + squashed) pull-requests. Try it! It may solve many of the issues you may find working with git on a daily basis.

Organizational Culture, programmer's productivity

At SGI I was twice runner-up for the "Spirit Award", an employee excellence recognition. I wrote the SGI intranet search engine "Sniff" way back in 1995-1996 (all of robot, indexer, and user-search UI), long before Google existed. My paper, Lessons I learned from Sniff, in which I tell about this adventure, won the best paper award in the SGI worldwide software conference. You can learn a bit about my work on Sniff, and my beliefs about what makes great organizations, in that paper.

I was among the top-10 award winners four times in the coupons.com bi-yearly hackathon. One of the four was 1st place.

A mini team I led won the top two spots in the Pantheon 2022 Hackathon.

Projects were all data-driven, focused on visualization, animation, machine learning, prediction, and fraud detection.

Free Software

Free software is more than just a tool for me. My desktop has been GNU/Linux since early 1994 (before version 1.0 came out), when one had to use a bunch of floppies to install. I feel tremendous gratitude to both Richard Stallman and Linus Torvalds, and to the by now millions who write, share, debug, use, and help improve free software every day.

My favorite current IDEs are LLMs like ChatGPT and Bard. Rather than displace me as a programmer, they made me a much better one, as they are faster at looking up APIs and typing, while I'm much better than they are at strategizing, planning, breaking up large problems into smaller chunks, and spotting errors. I feel everyone should use what makes them feel most productive and happy.

In my old days at Silicon Graphics I led and contributed to the free software on SGI project with several other colleagues. SGI customers loved it based on the many thanks we got. From humble beginnings, this effort eventually resulted in Linux running on SGI hardware, and SGI making Linux central to its strategy.

From time to time I try to help Linux and Analytics newcomers on forums where I believe the only stupid question is the one you were too timid to ask. See for example the Ask Ubuntu, Stack Overflow, and Cross Validated sites, or browse through my StackExchange profile (arielf).

I also try to contribute a bit by reporting bugs and helping figure out issues on Ubuntu Launchpad.

Some of my code has been used in a free machine learning software package called PCP, by Ljubomir Buturovic of San Francisco State University.

I've sent some usability and speed related patches to both liblinear by Chih-Jen Lin et al. and svmlin by Vikas Sindhwani, two machine learning tools which I have used.

Around 2008, I started using John Langford's Vowpal Wabbit, which exceeded my expectations on many levels: scalability in both space and time; robustness; out of sample (test-like) loss estimate for the model as it trains; automatic generation of feature-interactions; a friendly user interface; a more flexible and useful input format than what you find in similar programs; and the willingness of the main author to listen, educate, and accept patches. I have been contributing code and fixes to vowpal-wabbit via github for several years, helping newcomers on the vowpal wabbit mailing list, and reached #3 contributor status on github for a while in those early days. I was humbled by John's inclusion of my name in the Vowpal Wabbit early authors roster.
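The "out of sample loss estimate as it trains" point deserves a small illustration. In online learning, each example can be scored before the model learns from it, so the running average of those losses behaves like a test-set estimate (often called progressive validation). The python below is a minimal sketch of that idea for a plain linear model with squared loss. It is not Vowpal Wabbit's code, and the function name, learning rate, and synthetic data are assumptions made for the example.

```python
import random


def progressive_validation(examples, learning_rate=0.05):
    """Online SGD for linear regression with a predict-then-learn loss.

    Returns the final weights and the running average of the squared
    errors measured on each example *before* it was used for an update.
    """
    weights = [0.0] * len(examples[0][0])
    total_loss = 0.0
    history = []
    for count, (features, label) in enumerate(examples, start=1):
        # 1. Predict with the current weights: the example is still unseen.
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - label
        total_loss += error * error
        history.append(total_loss / count)
        # 2. Only now learn from the example (gradient step on squared loss).
        for i, x in enumerate(features):
            weights[i] -= learning_rate * error * x
    return weights, history


# Synthetic data: a noisy linear target with made-up "true" weights.
rng = random.Random(0)
TRUE_WEIGHTS = [2.0, -1.0, 0.5]
EXAMPLES = []
for _ in range(2000):
    x = [rng.uniform(-1, 1) for _ in TRUE_WEIGHTS]
    y = sum(w * v for w, v in zip(TRUE_WEIGHTS, x)) + rng.gauss(0, 0.1)
    EXAMPLES.append((x, y))

weights, history = progressive_validation(EXAMPLES)
print("learned weights:", [round(w, 2) for w in weights])
print("average loss after 10 examples:", round(history[9], 4))
print("average loss after all examples:", round(history[-1], 4))
```

Because no example is ever scored after the model has seen it, the running average needs no held-out set, which is what makes it convenient to watch while training is still in progress.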

Dream Job

If you are a recruiter, CTO, or VP of engineering, and see a good match for the dream job below, please contact me.

Here's my dream job from multiple perspectives:

Past employment and projects I worked on

Standard-format resume on Google docs

Other/Misc