publications


This page includes select material from formal print and electronic publications, plus certain guest posts I’ve written for others’ sites.

The material listed here does not include my blog, Complex Machinery (my newsletter on AI, risk, and related topics), Block & Mortar (my newsletter on all things web3), my O’Reilly Radar articles, or my one-pager websites like “Will AI Help Here?” and “How Do I Do AI?”.

Understanding Patterns of Disruption

(January 2018)

https://www.safaribooksonline.com/library/view/understanding-patterns-of/9781492027225/

I paired up with longtime co-author Ken Gleason to explore how shifts in the technology landscape have opened (or even reopened) the door to disruptive, market-shifting business models. In this paper, we mix technology, economics, and markets to show how to spot a potential business disruptor and how to make the most of it.

On Leadership

(September 2015)

http://radar.oreilly.com/2015/09/on-leadership.html

Many people migrate from hands-on technical roles to leadership positions without formal management training. They often learn the hard way that being an engineering manager or analytics lead is not a natural progression of their technical skill set, but instead requires them to develop a very different kind of muscle. In this O’Reilly Radar piece, a colleague and I offer guidance for newly minted leaders and leaders-to-be.

Planting a Seed: Setting a New Direction for Tech Noncompetes

(October 2014)

https://medium.com/@qethanm/planting-a-seed-setting-a-new-direction-for-tech-noncompetes-c5d723eb1e46

In the technology space, employment contracts and noncompete agreements can be thorny issues. While it definitely behooves prospective employees to review those documents with an attorney, there’s still room for employers to create better agreements from the start. Coauthor Ken Gleason and I propose that the tech sector borrow some ideas from the world of finance.

Business Models for the Data Economy

(October 2013)

http://oreilly.com/data/business-models-for-the-data-economy.csp

The recent surge in data collection and analysis opens up a number of business models, though only a couple of them get much attention. Business Models for the Data Economy explores eight ways to add value and generate revenue in the world of data.

This blog post on O’Reilly Radar, “Building a Business on Data,” describes the paper in greater detail. You can also download it for free through the O’Reilly catalog.

Steering the ship that is data science

(May 2013 - O’Reilly Radar)

http://radar.oreilly.com/2013/05/steering-the-ship-that-is-data-science.html

This was the second in a set of O’Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We explored some parallels between today’s data science boom and the late-1990s tech boom. In particular, we asked: how can data science reap the rewards of being the Hot New Thing while avoiding its pitfalls?

Leading Indicators

(April 2013 - O’Reilly Radar)

http://radar.oreilly.com/2013/04/leading-indicators.html

This was the first in a set of O’Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We pondered how someone on the outside, such as a prospective job candidate, could size up an organization’s data science efforts.

Bad Data Handbook: Mapping the World of Data Problems

(November 2012)

http://shop.oreilly.com/product/0636920024422.do

A road map of data problems and solutions. This book describes various real-world data problems, from the hands-on technical grunt work to the high-level strategic issues.

I was the book’s editor, which means I was responsible for developing the concept and leading the project.
I supported and coordinated the efforts of nineteen contributing authors. I also co-wrote a chapter, “Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough”.

Parallel R: Data Analysis in the Distributed World

(October 2011)

http://shop.oreilly.com/product/0636920021421.do

Parallel R describes strategies for getting R to work in the Big Data era. In other words, Stephen Weston and I explain how to work past R’s limitations (it is memory-bound and single-threaded) and let R work in a parallel, distributed manner suited to modern datasets.

The book covers well-known R packages for parallelism (Snow, Multicore, Parallel) as well as newer, Hadoop-related tools (RHIPE, Segue, Hadoop streaming). Much of my contribution explores how to mix R and Hadoop.

Managing RPM-Based Systems with Kickstart and Yum

(March 2007)

http://www.oreilly.com/catalog/9780596513825

An exploration of automated builds and systems management, using the Red Hat Kickstart and yum tools.

APR Networking & the Reactor Pattern

(2006/10/03 - Dr. Dobb’s Journal)

Introduction to Apache Portable Runtime (APR) networking. I use the classic Reactor pattern as an example.

What Is Jetty

(2006/06/14 - OnJava (O’Reilly Network))

http://www.oreillynet.com/pub/a/onjava/2006/06/14/what-is-jetty.html

A page from the O’Reilly “What Is” series, this article describes the Jetty servlet container and its underlying API. Jetty is designed with embedding in mind; that is, you can add webapp (servlet, JSP, web services) functionality to a Java application without having to repackage it as a formal WAR.

GNU Autoconf

(2005/12/09 - C/C++ Users Journal)

Use GNU Autoconf to simplify cross-platform builds of your native-code apps. Familiar with the standard ./configure; make; make install routine? Autoconf is the tool that generates the ./configure script behind that first step.

App-Managed DataSources with commons-dbcp

(2005/11/17 - Java.net)

http://today.java.net/pub/a/today/2005/11/17/app-managed-datasources-with-commons-dbcp.html

I’m all for standards, such as J2EE’s container-managed database connection pooling. Sometimes, though, you have to take a different path. This article explains how to create a database connection pool inside your application using two Jakarta libraries, commons-pool and commons-dbcp.

Processing XML with Xerces and SAX

(2005/11/10 - OnLAMP (O’Reilly Network))

http://www.oreillynet.com/pub/a/onlamp/2005/11/10/xerces_sax.html

Second in a two-part series, this article explains how to use the SAX side of the (Apache) Xerces C++ library to process XML documents.

The Perl-Compatible Regular Expressions Library

(2005/09/28 - C/C++ Users Journal)

Want the power of Perl’s regular expressions (regexps) in your C and C++ apps? Use the Perl-Compatible Regular Expressions Library, or PCRE.

Processing XML with Xerces and the DOM

(2005/09/08 - OnLAMP (O’Reilly Network))

http://www.oreillynet.com/pub/a/onlamp/2005/09/08/xerces_dom.html

First in a two-part series, this article explains how to use the DOM side of the (Apache) Xerces C++ library to process XML documents.

Simplify Network Programming with libCURL

(2005/05/05 - Linux DevCenter)

http://www.oreillynet.com/pub/a/linux/2005/05/05/libcurl.html

The curl command-line tool is a Swiss Army knife of URL handling and downloading. Use its backend libCURL library to add file-transfer power to your native-code applications.

Pre-Patched Kickstart Installs

(2005/02/17 - Linux DevCenter)

http://www.oreillynet.com/pub/a/linux/2005/02/17/kickstart_updates.html

Third in a series, this article explains how to create a pre-patched Kickstart tree (that is, one with the updates already applied) and add some change control to your yum cron jobs.

Custom Containers & Iterators for STL-Friendly Code

(2005/02/15 - C/C++ Users Journal)

Many C++ STL container objects look and act alike, but they don’t share a parent class. Learn how to extend existing containers or create new ones using STL’s “concepts,” a kind of loosely-enforced polymorphism.
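As a rough illustration of the “concepts” idea (a minimal sketch, not code from the article), the hypothetical `IntBag` container below derives from nothing. `std::accumulate` and `std::find` accept it only because its iterator supplies the member types and operations (`*`, `->`, `++`, `==`, `!=`) that the forward-iterator concept expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical fixed-size container. It has no parent class; STL
// algorithms accept it because its iterator models the forward-iterator
// concept (the expected member types plus *, ->, ++, ==, !=).
template <std::size_t N>
class IntBag {
public:
    class iterator {
    public:
        // Member types that std::iterator_traits looks up.
        using iterator_category = std::forward_iterator_tag;
        using value_type = int;
        using difference_type = std::ptrdiff_t;
        using pointer = int*;
        using reference = int&;

        iterator() : p_(nullptr) {}  // forward iterators must be default-constructible
        explicit iterator(int* p) : p_(p) {}

        reference operator*() const { return *p_; }
        pointer operator->() const { return p_; }
        iterator& operator++() { ++p_; return *this; }
        iterator operator++(int) { iterator tmp = *this; ++p_; return tmp; }
        bool operator==(const iterator& other) const { return p_ == other.p_; }
        bool operator!=(const iterator& other) const { return p_ != other.p_; }

    private:
        int* p_;
    };

    int& operator[](std::size_t i) {
        assert(i < N);  // bounds check in debug builds
        return data_[i];
    }

    iterator begin() { return iterator(data_); }
    iterator end() { return iterator(data_ + N); }

private:
    int data_[N] = {};  // zero-initialized storage
};

// Usage: fill a bag, then hand its iterators to unmodified STL algorithms.
int example_total() {
    IntBag<4> bag;
    for (std::size_t i = 0; i < 4; ++i) {
        bag[i] = static_cast<int>(i + 1) * 10;  // 10, 20, 30, 40
    }
    assert(std::find(bag.begin(), bag.end(), 30) != bag.end());
    return std::accumulate(bag.begin(), bag.end(), 0);  // 100
}
```

The same `std::accumulate` call would work on a `std::vector<int>` or a plain array; the algorithm is written against the iterator concept, not against any base class.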

The Watchful Eye of FAM

(2004/12/16 - Linux DevCenter)

http://www.oreillynet.com/pub/a/linux/2004/12/16/fam.html

Watching for changes in a file or directory? Calling poll() can be expensive. Let the File Alteration Monitor, or FAM, watch for you and report results to your code.

Advanced Linux Installations and Upgrades with Kickstart

(2004/11/04 - Linux DevCenter)

http://www.oreillynet.com/pub/a/linux/2004/11/04/advanced_kickstart.html

Second in a series, this article shows how to customize your Kickstart process and leverage Kickstart for OS upgrades.

Migrating to Page Controllers

(2004/10/14 - OnLAMP)

http://www.oreillynet.com/pub/a/php/2004/10/14/page_controller.html

Use the Page Controller pattern in your PHP web applications to separate business logic from the HTML.

Hands-Off Fedora Installs with Kickstart

(2004/08/19 - Linux DevCenter)

http://www.oreillynet.com/pub/a/linux/2004/08/19/kickstart.html

First in a series, this article is an introduction to the Kickstart automated OS-install tool for Linux. Why click through the installer a few (hundred) times? For Red Hat, Fedora, CentOS, and other RPM-based Linux distros, let Kickstart do the work so you can hang out at the pub.

Building a PHP Front Controller

(2004/07/08 - OnLAMP)

http://www.oreillynet.com/pub/a/php/2004/07/08/front_controller.html

Apply the Front Controller design pattern to your PHP apps, and in return you’ll get a single entry point through which to apply common services (such as security or page templating).

Programming Linux 2.6

(2004/06/15 - Linux Magazine)

A review of the developer-oriented features in Linux kernel 2.6.

Changing a Program’s Identity

(2004/04/15 - Linux Magazine)

Learn how to safely use the setuid() and setgid() system calls to make your app change its identity at runtime.

Writing a Trace System

(2004/03/15 - Linux Magazine)

You can’t always use a debugger in production! Add a configurable trace (logging) system to your app so you can track down problems at runtime.
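A configurable trace system can be as small as a level threshold checked before each write. The sketch below is a hypothetical design, not the article’s: the `Tracer` class, the `TraceLevel` names, and the `APP_TRACE_LEVEL` environment variable are all assumptions. It shows the core idea that the threshold is runtime configuration, so verbosity can be raised in production without a rebuild or a debugger.

```cpp
#include <cassert>
#include <cstdlib>
#include <ostream>
#include <sstream>
#include <string>

// Trace levels, from most to least verbose. Off is only a threshold.
enum class TraceLevel { Debug = 0, Info = 1, Warn = 2, Error = 3, Off = 4 };

class Tracer {
public:
    explicit Tracer(std::ostream& out, TraceLevel threshold = TraceLevel::Warn)
        : out_(out), threshold_(threshold) {}

    void set_threshold(TraceLevel t) { threshold_ = t; }

    // Read the threshold (0-4) from a hypothetical environment variable;
    // keep the current setting if the variable is unset or malformed.
    void configure_from_env(const char* var = "APP_TRACE_LEVEL") {
        const char* v = std::getenv(var);
        if (v == nullptr) return;
        char* end = nullptr;
        long n = std::strtol(v, &end, 10);
        if (end != v && *end == '\0' && n >= 0 && n <= 4) {
            threshold_ = static_cast<TraceLevel>(n);
        }
    }

    bool enabled(TraceLevel level) const {
        return level != TraceLevel::Off && level >= threshold_;
    }

    void trace(TraceLevel level, const std::string& msg) {
        assert(level != TraceLevel::Off && "Off is a threshold, not a message level");
        if (!enabled(level)) return;  // cheap early exit when filtered out
        out_ << '[' << name(level) << "] " << msg << '\n';
    }

private:
    static const char* name(TraceLevel l) {
        switch (l) {
            case TraceLevel::Debug: return "DEBUG";
            case TraceLevel::Info:  return "INFO";
            case TraceLevel::Warn:  return "WARN";
            case TraceLevel::Error: return "ERROR";
            default:                return "OFF";
        }
    }

    std::ostream& out_;
    TraceLevel threshold_;
};

// Usage: route traces into a string buffer, then raise verbosity.
std::string demo_trace() {
    std::ostringstream buf;
    Tracer t(buf);                        // default threshold: Warn
    t.trace(TraceLevel::Info, "skipped");
    t.trace(TraceLevel::Error, "disk full");
    t.set_threshold(TraceLevel::Debug);   // e.g. while chasing a bug
    t.trace(TraceLevel::Debug, "entering parse()");
    return buf.str();
}
```

Writing to a `std::ostream&` rather than straight to `std::cerr` keeps the destination configurable too; a file stream or a string buffer (as in `demo_trace()`) can be swapped in without touching the call sites.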

Software Packaging with RPM

(2004/02/15 - Linux Magazine)

The RPM is the unit of measurement in Red Hat Linux and its derivatives (Fedora, CentOS, and so on). Learn how to package your software as an RPM, so you can take advantage of the OS’s package management system.