Monday, May 8, 2017

Readings in Data Science models


As I've previously mentioned, I recently started a 6 part series on LinkedIn called "ex-libris" of a Data Scientist. 

Part I was on "data and databases": "ex-libris" of a Data Scientist - Part i

I just posted part II, "models": "ex-libris" of a Data Scientist - Part II

It does cover machine learning, but before going there, I cover Metrics, Operations Research, Econometrics and Time Series, and Statistics. Even more fundamentally, I start with Math.

Yes, if you slept through linear algebra or calculus (or analysis, as it is called in certain parts of the world), check the list out. You will find book suggestions and links to videos and other resources.

Also, a reminder: Python specific books will show up in part IV.

Francois Dion

Monday, April 24, 2017

Meet Eliza #AI

I will be presenting and directing a discussion on artificial intelligence, from various angles including the arts, on Tuesday, April 25th at Wake Forest in Winston-Salem, NC.

Details here:

Francois Dion

Thursday, April 13, 2017

Readings in data and databases

Recent readings (can you guess/decipher some of them?)

I've been fairly quiet on this particular blog this year. Besides a lot of data science work, I've done presentations at meetups and conferences, including a recent tutorial on "Getting to know your data at scale" at the IEEE SouthEastCon 2017. Notebooks will be posted on GitHub soon.

But, in the meantime...


Something else I've been doing is publishing a few articles here and there. Just recently, I started a 6 part series on LinkedIn called "ex-libris" of a Data Scientist. I think many readers of this blog will appreciate this series, and particularly this first installment on "data and databases":

"ex-libris" of a Data Scientist - Part i

It covers a good variety of books on the subject, some of them pretty much must-reads for whatever corner of the computer science world you live in. Also of interest will be the Postgres, Hadoop and graph database pointers and a list of over 20 curated must-read papers in the field.

Python specific books will show up in part IV.

Francois Dion

Monday, November 7, 2016

Something For Your Mind, Polymath Podcast episode 3

"Improving your communications and Entrepreneurship"

In this episode, we will cover these two different topics. Specifically, we will talk about kinematic displays, improving your communications, coworking spaces, entrepreneurship, innovation and more. This episode will wrap up with a learn more section, on what an entrepreneur is.
"these are people that can't help but generate ideas all the time..."

A few selected links are on the episode web page, and include a few items related to colors (some for all platforms, others Linux only).

Something for your mind is available on


Francois Dion

Thursday, October 20, 2016

Stemgraphic, a new visualization tool

PyData Carolinas 2016

At PyData Carolinas 2016 I presented the talk Stemgraphic: A Stem-and-Leaf Plot for the Age of Big Data.


The stem-and-leaf plot is one of the most powerful tools not found in a data scientist or statistician’s toolbox. If we go back in time thirty-some years, we find the exact opposite. What happened to the stem-and-leaf plot? Finding the answer led me to design and implement an improved graphical version of the stem-and-leaf plot, as a Python package. As a companion to the talk, a printed research paper was provided to the audience (a PDF is now available through

The talk

Thanks to the organizers of PyData Carolinas, videos of all the talks and tutorials have been posted on YouTube. At just 30 minutes, the video is a great way to learn more about stemgraphic and the history of the stem-and-leaf plot for EDA work. This updated version does include the animated intro sequence, but unfortunately the sound was recorded from the microphone, and not the mixer. You can see the intro sequence in higher audio and video quality on the main page of the website below.

I've created a web site for stemgraphic, where I'll be posting tutorials and demoing some of the more advanced features. In particular, I'll show how stemgraphic can be used in a data science pipeline: as a data wrangling tool, as an intermediary to big data on HDFS, as a visual validation for building models and as a superior distribution plot, particularly when faced with non-uniform distributions or distributions showing a high degree of skewness (long tails).
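
For readers who have never seen one, here is a minimal sketch of the classic text stem-and-leaf plot. It is not stemgraphic's implementation or API: the function name `stem_and_leaf` is made up for this illustration, and it assumes non-negative integers, with the last digit as the leaf and the remaining digits as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a text stem-and-leaf plot.

    Illustrative only: assumes non-negative integers. The last digit of
    each value is its leaf, the remaining leading digits are its stem.
    """
    if not values:
        return []
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


for row in stem_and_leaf([12, 15, 21, 23, 23, 38, 41, 47, 49, 62]):
    print(row)
```

Each row keeps the actual digits of the data, so the shape of the distribution and the individual values are visible at the same time, which is something a histogram cannot do.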

Github Repo

Francois Dion

Tuesday, October 11, 2016

PyData Carolinas 2016 Tutorial: Datascience on the web

PyData Carolinas 2016

Don Jennings and I presented a tutorial at PyData Carolinas 2016: Datascience on the web.

The plan was as follows:


Learn to deploy your research as a web application. You have been using Jupyter and Python to do some interesting research, build models, visualize results. In this tutorial, you’ll learn how to easily go from a notebook to a Flask web application which you can share.


Jupyter is a great notebook environment for Python based data science and exploratory data analysis. You can share the notebooks via a github repository, as html or even on the web using something like JupyterHub. How can we turn the work we have done in the notebook into a real web application?
In this tutorial, you will learn to structure your notebook for web deployment, how to create a skeleton Flask application, add a model and add a visualization. We assume you have some experience with Jupyter and Python, but no previous web application experience.
Bring your laptop and you will be able to do all of these hands-on things:
  1. get to the virtual environment
  2. review the Jupyter notebook
  3. refactor for reuse
  4. create a basic Flask application
  5. bring in the model
  6. add the visualization
  7. profit!
Now that it has been presented, the artifacts are a GitHub repo and a YouTube video.
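
To give a flavor of steps 4 and 5, here is a minimal sketch of a skeleton Flask application with a model hooked in. This is not the code from the tutorial repo: the `/predict` route, the JSON payload shape and the `predict` function (a stand-in that just averages its inputs instead of loading a trained model) are all assumptions made for this illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a trained model: averages the inputs."""
    return sum(features) / len(features)


@app.route("/")
def index():
    return "POST a JSON list of numbers to /predict"


@app.route("/predict", methods=["POST"])
def predict_route():
    # silent=True yields None instead of raising on a missing or bad body.
    payload = request.get_json(silent=True)
    is_valid = (
        isinstance(payload, list)
        and len(payload) > 0
        and all(isinstance(v, (int, float)) for v in payload)
    )
    if not is_valid:
        return jsonify(error="expected a non-empty JSON list of numbers"), 400
    return jsonify(prediction=predict(payload))
```

Saved as, say, app.py (a hypothetical file name), this can be served locally with `FLASK_APP=app.py flask run`. Keeping `predict` as a plain function, separate from the route, is the point of the "refactor for reuse" step: the same function can be called from the notebook and from the web application.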

Github Repo

After the fact

The unrefactored notebook is here while the refactored one is here.
Once you run through the whole refactored notebook, you will have train and test sets saved in data/ and a trained model in trained_models/. To make these available in the tutorial directory, you will have to run the script. On a Unix-like environment (macOS, Linux, etc.):
chmod a+x


The whole session is now on YouTube: Francois Dion & Don Jennings Datascience on the web

Francois Dion

Thursday, October 6, 2016

Improving your communications: Professional Audio-Video Production on Linux

Pro AV on Linux

I'll be presenting on the subject of Professional Audio-Video Production on Linux, next week at TriLug.

From concept to finished product, it has never been easier to obtain professional results when it comes to audio-video production on Linux.

We will cover some of the hardware that should be part of your production suite, from microphones to jog wheels, and highlight some of the top tools for animation, audio, broadcasting, effects, modeling, music, transcoding and video. We will also go beyond the usual suspects and introduce some tools that might not be typically used for AV production.
By the end of the presentation, you will have all the tools you need to improve the quality of your communications, for your personal enjoyment, your career, or your business.


Thursday, 13 October 2016 - 7:00pm to 9:00pm
The Frontier, 800 Park Offices Drive, Durham, NC
Francois Dion