Data Science Community Matures with Diversity: Conference Observations, Highlights, and Interviews (Strata Data Conference, New York, September 11–13, 2018)

Abstract

This publication covers the highlights and interviews at the Strata Data Conference held in New York on September 11–13, 2018.

Section 1: General Observations

New York's Strata Data Conference “Make Data Work”^* was presented by O'Reilly^† and Cloudera.^‡ (Congrats to Cloudera on its acquisition of Hortonworks!) The event's goals are to “help you put big data, cutting-edge data science (DS), and new business fundamentals to work.”

This premier data conference occupied the premier real estate of the Javits Center.^§ Unlike one long-forgotten night of November 8, 2016, for three memorable, beautiful September days the center was bursting at the seams with excited and exciting people full of fresh DS, artificial intelligence (AI), machine learning (ML), analytics, and Internet of Things (IoT) ideas, tools, services, and products. Day 1 (September 11) was all about training and tutorials, whereas days 2 and 3 (September 12 and 13) were for keynotes and sessions. Clearly, it is impossible to cover even most of what happened at the conference. So, what are my personal highlights of this mega-event?

Let us start with five general observations. First, it is very telling and appropriate that this year the conference changed its name from "Strata Big Data and Hadoop Conference" to just "Strata Data Conference." Second, the efforts toward gender parity were much greater this year. That was evident on several fronts. For one, and for the first time in my experience as a regular at the conference, there were long lines for both the men's and the women's restrooms. Also, and seriously, the numbers of keynotes, plenaries, and company presentations given by women were the highest of any of these conferences!

Third, the conference's diversity was not limited to gender parity; it was on full display in the breadth and depth of the business applications. They included, for example, Infoworks' agile data engineering software, Timescale's scalable time-series database, RLR Company's Hadoop desktop server, Integris Software's data privacy solution for GDPR (the European Union's new General Data Protection Regulation), and SmartCover's IoT diagnostics and prevention technology for sewer systems (all described in Section 2 below). This year the lion's share of the offerings and products were thoughtful, to the point, and actionable. Altogether, this clearly indicates the diversity and maturity of the DS community at large, and not just of the so-called "data-industrial complex" (Tim Cook, CEO of Apple).¹ My hat goes off to both O'Reilly and Cloudera for being true Friends of Data & Analytics (FDAs)²⁻⁴ and for their many years of support for our community!

Fourth, there is another, less obvious, and somewhat subjective remark on the community's maturity. This one comes from my personal observations over more than a decade. When I moved from academia to industry to become one of the first Chief Data Officers (CDOs) worldwide, our community was very small, and the number of abbreviations in use was very small too. Nowadays, including at the Strata Data Conference, this number continues to go through the roof. And I am not only talking about well-known abbreviations, such as AI, DS, GDPR, IoT, and ML, but also about others, such as AGI, DL, FDR, GOBS, NLP, and 5Vs. Finally, for all lovers of abbreviations (guilty as charged), let me introduce two of the most recent additions: CSP and DIC (the youngest relative of the more than half-century-old MIC). These abbreviations are defined at the end of Section 2.

Fifth, multiple presentations exemplified the conference's goal of translating data into superior business outcomes. One of the best keynotes was presented by Cassie Kozyrkov, Chief Decision Scientist (have you heard of this role before?) at Google Cloud.^** Cassie brilliantly covered important topics ranging from actionable insights and decision science to the future of data science and Type III errors (finding the right answers to the wrong questions).⁵ Avoiding Type III errors is indispensable to our relentless focus on business outcomes, and it keeps us from the generation of BS (GOBS).⁶ My interview with Cassie is in Section 3 below, and her bio is in the Supplementary Data (available online at www.liebertpub.com/big). Still, her bottom line was quite simple: "all those complex mathematical endeavors need to be directed well, otherwise the results may be at best useless and at worst harmful… The world is collecting so much data – our goal should be to make it as useful as possible." Amen!

Section 2: Conference Highlights

Here is a short, narrowly focused, and subjective write-up of just five presentations and/or company products I saw at the conference.

Agile data engineering is an extremely important activity for many of us in the trenches. Amar Arsikere, CEO and Founder of Infoworks,^†† introduced the company's new agile data engineering software. This software automates and accelerates big data analytics projects through the company's Autonomous Data Engine, which has been adopted by some of the largest enterprises in the world. Using a code-free environment, the engine allows organizations to quickly create and manage data pipelines and workflow processes from source to consumption. Customers deploy big data projects to production within days, dramatically increasing analytics agility and time-to-value. And if you want to learn more about simplifying data operations, please visit Infoworks.^‡‡

At the conference (and 1.5 years after its launch), Ajay Kulkarni, CEO and Co-Founder of Timescale,^§§ announced TimescaleDB 1.0. This release signals the maturity and enterprise readiness of this open-source time-series database, built on top of PostgreSQL. It combines the reliability and tooling of a database with more than 20 years of development behind it with a powerful extension framework. TimescaleDB ingests millions of data points per second, scales tables to hundreds of billions of rows and tens of terabytes, and returns quick responses to complex queries.^*** It is architected for time-series data and includes many specialized features, such as automatic space–time partitioning, a hypertable abstraction layer, adaptive chunk sizing, time-series analytics in SQL, geospatial analysis, JSON support, and easy schema management. To my knowledge (please let me know if I am wrong!), it is the only time-series database that scales to these workloads while still supporting full SQL.
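To make "automatic time partitioning" a little more concrete, here is a minimal Python sketch of the general idea behind time-based chunking: each row is assigned to a fixed-width time chunk, and a query over a time window only touches the chunks that can overlap that window. This is a hypothetical illustration, not TimescaleDB's code or API; the functions chunk_start, partition, and chunks_for_range are invented for this example, the one-day interval is an arbitrary assumption, and in TimescaleDB itself hypertables are created and queried in SQL inside PostgreSQL (for example, through its create_hypertable function).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical illustration of time-based chunking; NOT TimescaleDB's API.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts: datetime, interval: timedelta) -> datetime:
    """Return the start of the fixed-width time chunk that contains ts."""
    n = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    return EPOCH + n * interval


def partition(rows, interval: timedelta):
    """Group (timestamp, value) rows into chunks keyed by chunk start time."""
    chunks = defaultdict(list)
    for ts, value in rows:
        chunks[chunk_start(ts, interval)].append((ts, value))
    return dict(chunks)


def chunks_for_range(chunks, start: datetime, end: datetime, interval: timedelta):
    """Keep only chunks that can overlap [start, end); all others are skipped."""
    return {k: v for k, v in chunks.items() if k < end and k + interval > start}


if __name__ == "__main__":
    utc = timezone.utc
    day = timedelta(days=1)  # assumed chunk width for this example
    readings = [
        (datetime(2018, 9, 11, 9, 0, tzinfo=utc), 1.0),
        (datetime(2018, 9, 11, 17, 30, tzinfo=utc), 2.0),
        (datetime(2018, 9, 12, 8, 15, tzinfo=utc), 3.0),
        (datetime(2018, 9, 13, 12, 0, tzinfo=utc), 4.0),
    ]
    chunks = partition(readings, day)
    for start, rows in sorted(chunks.items()):
        print(start.date(), len(rows))
    # A query for September 12 only needs to scan one of the three chunks.
    hits = chunks_for_range(chunks, datetime(2018, 9, 12, tzinfo=utc),
                            datetime(2018, 9, 13, tzinfo=utc), day)
    print(sorted(hits))
```

Aligning chunk boundaries to a fixed epoch keeps the chunk lookup a constant-time calculation and lets time-bounded queries skip irrelevant chunks; the adaptive chunk sizing mentioned above would additionally tune the interval, which this sketch does not attempt.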

At the conference, the Ricker Lyman Robotic Company^††† debuted its first product, Hivecell One, which lets developers have a cluster on their desktop for working with Hadoop (it is still alive and kicking!). Hivecell is a small stackable server with a six-core 64-bit ARMv8 processor, 256 CUDA GPU cores, and 8 GB of RAM, designed for true linear scalability: to add compute power, you simply place another Hivecell on the stack. The patent-pending Baranovsky connectors pass power and Ethernet through the stack of Hivecells, eliminating the clutter of wires from the developer's desktop. Hivecell has a built-in, patent-pending provisioning system that lets developers install Hadoop on a cluster with a single click. It also supports Mesos, Kubernetes, and Kafka. My interview with CEO and Co-Founder Jeff Ricker is in Section 4 below, and the joint bios of Jeff and his longtime friend, President and Co-Founder Paul Lyman, are in the Supplementary Data.

Data privacy issues, including GDPR, were discussed in several sessions. Integris Software^‡‡‡ was founded to help companies meet the requirements of GDPR, the new California privacy law, and other privacy obligations. The starting point is for companies to realize that privacy is fundamentally a data issue and has to be an outcome of a comprehensive data protection strategy. Few technology executives actually know what data are sitting on their systems, and relying on the old, manual, survey-based collection methods is not realistic in the age of big data. What is needed is data privacy automation, a new field that uses ML to help organizations discover and map their data and set policies for it, so they do not have to lock it all down. Integris Software, led by CEO and Co-Founder Kristina Bergman, is a pioneer in data privacy automation.^§§§ Integris helps companies build privacy into the design of their modern data architecture, enabling them to protect customer privacy while keeping their data unlocked.

Lastly, Dr. Greg Quist, CEO and Co-Founder of SmartCover Systems,^**** told a fascinating story about how to predict and prevent sewer spills. Unfortunately, for many sewer systems, getting data is a challenge. This is particularly true for utility leaders, who must maintain an aging infrastructure in the midst of increasingly frequent and severe storm events and, hence, are up many nights. SmartCover helps these utility leaders step out of the dark and, through a robust IoT solution (sensors, satellite communications, analytics, real-time data, and event notifications), lets the sewers tell them how things are going. This results in reduced costs, optimized operations, and the elimination of overflows and spills. Best of all, the clients can now sleep and let their sewers do the talking! My interview with Greg is in Section 5, and his bio is in the Supplementary Data.

Overall, this year's Strata Data Conference at New York's Javits Center was well organized, amazing, and empowering! Hopefully these observations, highlights, and interviews will convey some of that to you, our dear readers of Big Data.

Finally, the promised definitions of the additional abbreviations are: AGI (artificial general intelligence),⁷ DL (deep learning),⁸ FDR (false discovery rate),⁹ GOBS (generation of BS), NLP (natural language processing),¹⁰ and 5Vs (five challenges of big data: value, variety, velocity, veracity, and volume).¹¹ To my knowledge, this article is the first to define two new abbreviations: CSP as "consulting, services, products" and DIC as "data-industrial complex." Interestingly, in computer science, CSP stands for "constraint satisfaction problem."⁸ Clearly, DIC reminds us of the infamous MIC (military-industrial complex), a term introduced by President Eisenhower in 1961¹² that has since gained huge popularity. Do you expect that DIC will gain the same uber-popularity?

This article would not have been possible without the help, assistance, and encouragement of the following individuals: Zoran Obradovic, Sophie Mohin, Maureen Jennings, Amar Arsikere, Andrey and Michael Baranovsky, Kristina Bergman, Tricia Bush, Marjorie Cannon, Mary Eggert, Justin Hahn, Jacob Javits, Benjamin, Evelyne, and Natali Kolker, Cassie Kozyrkov, Ajay Kulkarni, Paul Lyman, Joe Manguno, Lucas Mayer, Cassie McAllister, Jacinda Mein, Vural Ozdemir, Jeff Ricker, Greg Quist, Graham Symmonds, and Jenny Wang.

I am looking forward to next year's Strata Data Conference in New York, again at the Javits Center and again in September!^†††† Please let me know if you have any questions, suggestions, or ideas at ekolker@nyu.edu.

Section 3: Interview with Cassie Kozyrkov, Google

Dr. Eugene Kolker: Is there anything about the data science business that has surprised you?

Dr. Cassie Kozyrkov: Something that jumped out at me while attending Strata NYC 2018 was a subtle tone in many conversations I heard. To be fair, I have been noticing this since my grad school days, but the surprise is that you still hear it today (and I hope we will all work together to make it less common). It is a tone of casual acceptance of a gulf between data scientists and business leaders/decision-makers. Anyone not perturbed by a wide gulf (or worse, an adversarial relationship) between those who analyze data and those who make decisions based on it surprises me deeply. Data scientists depend on high-quality requests from decision-makers. Their work loses its value when the relationship is not collaborative. There is no point to all that complicated math if there is no improvement in how actions can be taken based on it, so decision-makers need to be involved in the process. I hope the data science community moves toward a stronger, more intentionally collaborative attitude. This also means business leaders need to start doing their part, seeking the training and skills to participate effectively in their own vital role in the data science process.

Dr. Kolker: What is the most important aspect of your role?

Dr. Kozyrkov: I work to ensure that data projects result in something useful, rather than just dissipated heat and some numbers no one ever cares about. This means helping good ideas and well-thought-out projects flourish, as well as identifying ill-advised projects so these can be shut down before they begin.

Dr. Kolker: What is the hardest thing you do?

Dr. Kozyrkov: Working to get teammates with diverse skills and perspectives aligned so they can find a common language and achieve a common goal.

Dr. Kolker: Can you explain to our readers the one major message from your Strata keynote that you are most passionate about?

Dr. Kozyrkov: The world is collecting data like never before. What is the point if we never make it useful? If a data point falls in a forest, does it matter? We need to focus on making our data useful, not hoarding it and doing meticulous, pointless calculations on it. And as part of making it useful, as a society we need to build the skills that safe, effective use of data requires. Up until now, there has unfortunately been too little focus on the skills of the decision-maker and project leader, and that is something I hope our community will work to change.

Dr. Kolker: What is a Type III error?

Dr. Kozyrkov: It is where you correctly reject the wrong null hypothesis. So it is a stats joke of sorts; it means using all the right math (and data) to solve the wrong problem. Though it is brought up in statistics class as a joke (it gets giggles and is promptly forgotten), I do not think we should be laughing. It is becoming a more and more serious matter as the world collects more data and asks more people to work with it.

Type III error means you should not have been pursuing the problem you are solving in the first place; you should have been doing something else that is more useful. When you go down the wrong rabbit hole with data, at best you are wasting everybody's time and at worst you are doing something harmful.

Forgetting Type III error is a bad mistake for society. We have got so much data. Now we need to really talk about making it useful. Meticulously answering the wrong question is painful for everybody. Let us have a discipline oriented around bringing down Type III error. And that is what is at the beating heart of Decision Intelligence: doing the right thing properly.

Dr. Kolker: What is Decision Intelligence?

Dr. Kozyrkov: Decision Intelligence is the discipline of turning information into more beneficial actions. You can think of it as applied data science++: applied data science augmented with the social and managerial sciences, as well as nuggets of decision-making wisdom from other disciplines.

Dr. Kolker: How can it provide value for the data science community? Business community at large? General public?

Dr. Kozyrkov: Data science community: no one wants to work very hard on useless things. The data science function is bookended by other business functions. If anyone drops the baton, there is no purpose to the data scientists' efforts. Unfortunately, since data scientists are not at the start of the process (that is the decision-makers), they are at the mercy of upstream skills. Making sure those skills and processes are in place means data scientists are guaranteed that what they are asked to do carefully is actually worth doing carefully. And so when they work, their investment of time and energy is much more likely to pay off and go somewhere.

Business community at large: data science has a lot of promise, and there is a reason businesses are investing in it. Unfortunately, that investment goes nowhere if businesses are not able to use data science effectively. Decision Intelligence is a way to multiply the impact of data science and make sure that investment pays off.

General public: the world is collecting data and there is a lot of human benefit that is locked in those data. We can unlock that benefit if we have the skills and abilities to make those data useful. Building those bridges is the key to success.

Dr. Kolker: Where do you see the industry going in the next 2, 5, 10, and 15 years?

Dr. Kozyrkov: I see the growth of applied data science. The research side will always be strong, and there will always be new classes of problem to solve and new directions to discover. But up to now, research was the primary orientation, especially in machine learning and AI. In the near future, the world will begin to realize that thanks to the efforts of those researchers, everyone else can now stand on the shoulders of giants. They will not need to reinvent the researchers' wheel; they will need to focus on application and tap into their own creativity. Add to this the fact that tools are becoming easier to use and more accessible, and you have something truly beautiful. Focus on application is going to grow a whole lot. To use an analogy, just because you do not need to build your own microwave (because the researchers have figured microwaves out for you) does not mean it is easy to run a fast-food business at global scale. It is not easy; there are a lot of things you need to think about. Cooking with data at scale is what I mean by the applied discipline, Decision Intelligence, and we can expect to see it come into its own rapidly. I am excited to see what society will do with greater, more vibrant, more creative application of a very powerful technology that it does not need to rebuild from scratch every time.

Dr. Kolker: If our readers, unfortunately, forget everything we have talked about up until now, what is the single thing you want them to remember?

Dr. Kozyrkov: When it comes to applied data science, all those complex mathematical endeavors need to be directed well, otherwise the results may be at best useless and at worst harmful. That is why thinking about how to direct them well is really important. The world is collecting so much data—our goal should be to make it as useful as possible!

Section 4: Interview with Jeff Ricker, RLR Company

Dr. Eugene Kolker: Is there anything about the data science and robotics businesses that has surprised you?

Dr. Jeff Ricker: The hardware market is at the beginning of a huge pendulum swing from cloud computing to fog computing. By 2022, 75% of enterprise-generated data will be created and processed outside data centers and the cloud (up from 10% now).

Dr. Kolker: What is the most important aspect of your role?

Dr. Ricker: Creating a culture for success. Our success is founded on creating an environment in which engineers can reach their full potential. If engineers are unencumbered, they will unleash their creativity and solve problems for customers in astounding ways. We are very attentive to company culture: creative, hardworking, open, egalitarian, friendly, humorous, and family first.

Dr. Kolker: What is the key outcome of your work?

Dr. Ricker: Democratize data. Simple. We see that occurring in three steps:

1. Deliver the revolution in hardware that matches the revolution that has occurred in software.
2. Create the standard building block for edge computing: the personal data center.
3. Build a fog computing ecosystem for adopting new distributed software and sharing compute power securely peer to peer.

Dr. Kolker: Can you explain to our readers one major feature of your new product, Hivecell?

Dr. Ricker: Software has undergone a revolution. Starting with Hadoop, all new major frameworks are distributed. Software has dramatically changed, but hardware has not. Hivecell is specifically designed to meet the needs of distributed software development.

Dr. Kolker: Why is this so important?

Dr. Ricker: When Hadoop was released, software changed profoundly. It was as big a change as the World Wide Web. Distributed computing left the laboratory and became mainstream. Hadoop was just the trigger. Now we see it everywhere, in NoSQL databases, Docker containers, microservices, stream-based processing, blockchain, and machine learning; all these new technologies use a distributed computing pattern. Software is now designed to run on multiple servers.

Dr. Kolker: How can it provide value for the data science community? Business community at large? General public?

Dr. Ricker: Computer companies are still building huge servers with dozens of cores, designed for an era when an application such as a relational database had to run on a single server. These servers are too big for most uses. As a result, the industry has created virtual servers, software that makes a large server behave like several small servers. Meanwhile, Hadoop, Mesos, and other distributed software make several servers behave like one big server. Why not just have small servers? Why not build hardware that meets the needs of modern software?

There is an immediate, pressing need in the market. There is a significant barrier to learning and developing on these distributed software frameworks. Furthermore, installing and configuring (provisioning) distributed frameworks is extremely difficult. There are hundreds of parameters to be set. Most of the professional services provided by companies such as Cloudera and Hortonworks are just for helping clients with provisioning. The barrier to learning is preventing the supply of developers from meeting the exploding demand for big data, AI, and machine learning expertise.

Dr. Kolker: Where do you see the industry going in the next 3, 5, 10, and 15 years?

Dr. Ricker: Mark Twain is reputed to have said, “History doesn't repeat itself but it often rhymes.” Looking back at the progress of innovation, we saw the mainframe lead to the minicomputer lead to the microcomputer lead to the personal computer. Now we see the data center lead to the mini data center lead to the microdata center. The logical end step is the personal data center.

A personal data center works in the office and the home, just as the personal computer did 30 years ago. Blockchain, fog computing, and personal data centers all work together. Blockchain enables individuals to own their own data and share it peer to peer. Fog computing enables vendors to provision software to the edge, removing the complexity from the user. Personal data centers enable individuals to store the data and run the software that they own and control but can still use online.

Dr. Kolker: What are the unsolved problems for you? And how are you working to solve them?

Dr. Ricker: The cloud arose as a by-product. The compute infrastructure Amazon needs to meet the demands of Black Friday is more than double what it needs for the rest of the year. As a result, half of Amazon's compute power was left unused 80% of the time. Amazon decided to resell that compute power, and the cloud was born.

Amazon's case is acute, but most production systems follow a similar pattern of having a peak usage that exceeds normal usage. For instance, most of the trading on Wall Street occurs at the open of the market and at the close of the market. Two hours of peak usage, 6 hours of mild usage, and 16 hours of idleness. The pattern is found everywhere.

With the growth of AI and machine learning, there is a growing need for the ability to buy and sell spare computing power at the edge. In a growing number of cases, it has to be at the edge, because using the cloud is too slow to be practical. However, sharing compute power at the edge peer to peer is a significant challenge. Trust can be achieved if the hardware is secure to the metal and both parties know that the other is using the same hardware.

Dr. Kolker: What else that is very important did we not cover in the questions above?

Dr. Ricker: Hivecell is obviously hardware, but it is just as much software. Our infrastructure enables one to install and configure distributed software as easily as on the cloud, or even more easily. The user has a web-based management console for his or her hives, just as one would have for the cloud. The console shows the hives and their status. From the console, the user can provision or reprovision a hive with distributed software such as Hadoop, Spark, and Kubernetes. It works just like an app store on your smartphone. Hivecell One may be the first server designed for reprovisioning.

Dr. Kolker: If our readers, unfortunately, forget everything we talked about here, what is the one thing you want them to remember?

Dr. Ricker: There is a growing backlash to large companies such as Facebook and Google owning, mining, and sharing our personal data. Blockchain will democratize the ownership of data, but if all the data are still held by three cloud giants, then we have achieved nothing. Democratized data demand democratized compute power.

Section 5: Interview with Greg Quist, SmartCover

Dr. Eugene Kolker: When and how did you start SmartCover?

Dr. Greg Quist: My partner and cofounder David Drake and I divested ourselves of the Cryptosporidium detection business in early 2005, and in early February we were sitting around David's living room looking for some way for us two technologists to help the water industry, which we felt was generally hopelessly mired in the 1960s from a technology standpoint. So we called three of our insider pals in the industry and asked: "What keeps you up at night?"

The answers we received from our friends were both shocking and uniform: "What can you guys do about sewer spills? Our sewers overflow, we have strict liability, and we pay fines and get bad press. Can you guys help us out?" So we jumped on the Internet and found no solutions, and there were no patents on this at the USPTO. So on February 5, 2005, Hadronex (the official corporate name for SmartCover Systems) was formed.

Starting with a blank sheet of paper, we turned back to our water friends and asked: "What do you want it to do? How do you want it to function? How much would you pay?" The answers we got were: "keep it simple," "easy to install and service," "give me the answers I want," "make it affordable for large numbers," "no confined space entry," "keep the sensors out of the water," and "tell us if the manhole has been opened." Armed with a specification directly from the industry, David and I set out to build a solution with no preconceived notions. With our own meager funds, working elbow to elbow in David's home workshop, we built a start-up IoT solution before IoT became a buzzword.

For us, what the customers said translated into "plug-and-play solution," "reliable and dependable sensors," "dependable two-way wireless communications," and "built-in power." Fortunately, in 2005, reasonable answers existed for the technology required to make this happen. SmartCover could not have started 10 years earlier. The technology was not there yet. Timing is everything.

Working with our customers and starting literally from scratch, we had our first prototype in the field by May, fixed our problems, had our second prototype in the field by July, fixed those problems, and by November had our full solution available for the market. David and I did engineering, quality control, R&D, finance, customer service, sales, and everything else. So we had designed, tested, fielded, and sold a complete end-to-end IoT solution from a blank sheet of paper in 9 months.

We started sales locally in San Diego County, to be sure our product was reliable, then expanded to Southern California, then to all of California, then nationwide and internationally. At each step, we made sure our solution was getting robust enough to succeed without close babysitting. The customer was always at the center of our focus. We listened and made modifications.

We have taken in two rounds of financing to put gasoline in the company engine. First, in late 2007 to early 2008, we took in private equity to beef up our sales, customer service, and engineering. We tripled sales the next year. Then, in 2016, we took in funding from XPV Water Partners to help us reach the next level. XPV has been an excellent partner and invests only in water companies.

Dr. Kolker: What is the most important aspect of your business?

Dr. Quist: I think it is a little surprising, and maybe a little cliché, but the most important aspect of our business is not the technology, it is the customer service. It shocks me how some businesses stay alive without a complete balls-out commitment to customer service. This is particularly true for utilities looking to install new and innovative technologies into their culture. If we cannot make it simple and easy for them, and provide rock-solid and effective customer service, they will eventually give up or go away. And in this business, reputation has a way of hanging around.

Dr. Kolker: What makes SmartCover different from others in your space?

Dr. Quist: We were the first of our kind in our industry. There has been a relatively long history of monitoring in sewers, but traditionally it has been done to support modeling, using short-term flow meters to validate flow models. Our solution was low-cost, ubiquitous, and robust permanent sensing that gives our customers visibility into what was previously invisible. We are a company continuously innovating and integrating new technologies and techniques into our product solutions to provide both the best available technology and the best value for our customers.

Besides best-in-class customer service, examples of our current technical differentiation are as follows: (1) we use the Iridium® low-earth-orbit satellite system for our two-way wireless communications. There are multiple reasons why we do this, including ubiquity (Iridium works anywhere in the world) and availability under the most demanding conditions (our system worked flawlessly in the New York City region during Hurricane Sandy, when all other wireless systems were down); (2) we do not require confined space entry for installation or service; and (3) our system provides a real-time intrusion alarm telling our customers their sewer has been breached.

Dr. Kolker: What is the hardest thing you do?

Dr. Quist: In my opinion, the hardest thing we do is try to convince civil engineering firms of the value and utility of our technology. These firms are paid by our customers to provide a variety of services to support operations and engineering. The new digital technology stuff has not yet reached the level of universal acceptance in the water and wastewater world, so until it does, we rarely (and there are exceptions) find civil engineering firms including our type of solution in the packages they offer to customers. The flip side is that, if they do, they should be able to demonstrate the value of the technology and show a technical discriminator that helps them win a competitive bid.

Dr. Kolker: Is there anything about the water and wastewater utility business that has surprised you?

Dr. Quist: That is a great question. Having sat on the other side of the dais for 28 years now, I have a pretty decent handle on how our industry thinks, so how slowly we move and change is not a surprise. I think the biggest surprise is the inertia that exists in the regulatory bodies. Consent decrees are still being written based on 20-year-old technology, and the governments, state and federal, have for the most part not kept up with technology, so regulators give our customers no encouragement to try to keep up. We are actually trying to change that by working with NACWA and other industry groups to get language into bills that encourages the use of digital technology.

Dr. Kolker: How does your product work and how does it provide value for the sewer industry?

Dr. Quist: Our basic solution for the wastewater industry consists of four components: the sensor package, the two-way communications, the analysis and data fusion, and decision support. Our customers really do not care specifically about anything but decision support.

Our sensors are purpose-built for the sewer environment and able to withstand a corrosive atmosphere, high humidity, dirt, and shock. We typically measure water level and derive flow from it, and we add new sensors carefully to be sure our customers get the performance they expect from SmartCover. For example, next year we will be bringing an H₂S sensor to market. We can measure other parameters as well, including pressure, temperature, and pH.
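The interview does not say how SmartCover converts level to flow. One standard textbook approach for gravity sewers is Manning's equation applied to a partially full circular pipe, sketched below as an illustration only. It assumes SI units, steady uniform flow, a circular pipe of diameter `D`, a roughness coefficient `n`, and a pipe slope `S`; the function name and example values are hypothetical and are not taken from SmartCover's product.

```python
import math


def manning_flow_circular(depth_m: float, diameter_m: float,
                          n: float, slope: float) -> float:
    """Hypothetical helper: flow (m^3/s) in a partially full circular pipe.

    Uses Manning's equation in SI units, assuming steady uniform flow and a
    free surface (the pipe is not surcharged).
    """
    if not 0.0 <= depth_m <= diameter_m:
        raise ValueError("depth must lie between 0 and the pipe diameter")
    if depth_m == 0.0:
        return 0.0  # dry pipe: no flow (also avoids 0/0 in the hydraulic radius)

    # Central angle (radians) subtended at the pipe centre by the water surface.
    theta = 2.0 * math.acos(1.0 - 2.0 * depth_m / diameter_m)

    area = diameter_m ** 2 / 8.0 * (theta - math.sin(theta))  # wetted cross-section
    wetted_perimeter = diameter_m * theta / 2.0                # wetted arc length
    hydraulic_radius = area / wetted_perimeter

    # Manning's equation (SI): Q = (1/n) * A * R^(2/3) * S^(1/2)
    return (1.0 / n) * area * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(slope)


# Example: a 0.3 m pipe, n = 0.013 (a value commonly used for concrete), 0.5 % slope.
full = manning_flow_circular(0.3, 0.3, n=0.013, slope=0.005)
half = manning_flow_circular(0.15, 0.3, n=0.013, slope=0.005)
print(f"full pipe: {full:.4f} m^3/s, half full: {half:.4f} m^3/s")
```

A half-full pipe has half the area of a full one but the same hydraulic radius (`D/4`), so it carries exactly half the full-pipe flow, which makes a convenient sanity check. Real sewers can surcharge or run under non-uniform conditions, where this simple relation no longer holds.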

Communication is accomplished through the Iridium satellite constellation. Iridium is best suited to our application because when the big storms come, such as Hurricane Florence this year and Hurricane Irma last year (and New York, of course, remembers Hurricane Sandy), you want to know what is going on in your sewer. Most terrestrial systems, such as cellular networks, fail under high stress. That is why the U.S. military is a big Iridium customer, too.

We perform a great deal of automated analysis on our data, both at the measurement site to ensure quality measurements and on our cloud servers to improve the value of the data. This includes fusing our data with other sources such as NOAA and USGS, giving our customers great visibility into how their sewer systems respond to storms, snow, floods, and tides.
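The interview mentions quality checks and fusion with external sources such as NOAA but gives no details. A minimal sketch, assuming hourly level readings and a rainfall series already downloaded as sorted `(timestamp, value)` pairs: it drops readings outside a plausible range or with an implausible jump from the last accepted reading, then attaches the most recent rainfall observation at or before each reading (an "as-of" join). The function names, thresholds, and data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def quality_filter(readings, min_level, max_level, max_jump):
    """Hypothetical check: keep (time, level) readings that look plausible.

    A reading is rejected if it lies outside [min_level, max_level] or differs
    from the previously *accepted* reading by more than max_jump.
    """
    accepted = []
    for t, level in readings:
        if not (min_level <= level <= max_level):
            continue  # outside the sensor's plausible range: likely a fault
        if accepted and abs(level - accepted[-1][1]) > max_jump:
            continue  # sudden jump relative to the last good value
        accepted.append((t, level))
    return accepted


def asof_join(readings, rainfall):
    """Attach the latest rainfall value at or before each reading's time.

    Both inputs must be sorted by time; readings earlier than the first
    rainfall observation get None.
    """
    rain_times = [t for t, _ in rainfall]
    fused = []
    for t, level in readings:
        i = bisect_right(rain_times, t)  # count of rain observations <= t
        rain = rainfall[i - 1][1] if i > 0 else None
        fused.append((t, level, rain))
    return fused


# Hypothetical data: hourly levels (m) and rainfall totals (mm).
start = datetime(2018, 9, 11)
levels = [(start + timedelta(hours=h), v)
          for h, v in enumerate([0.21, 0.22, 9.99, 0.24, 0.80, 0.31])]
rain_mm = [(start, 0.0), (start + timedelta(hours=3), 12.5)]

clean = quality_filter(levels, min_level=0.0, max_level=2.0, max_jump=0.3)
for row in asof_join(clean, rain_mm):
    print(row)
```

Here the 9.99 m reading is dropped as out of range and the 0.80 m reading as a jump. A fixed jump threshold can also reject genuine rapid rises during storms, which is exactly when fusing with rainfall data helps distinguish real events from sensor glitches.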

Ultimately, the information we provide to our customers answers the question: “So what do I do now?” Of course, most of the time the answer is “nothing, all's good.” But simply having that assurance that the system is operating well is valuable, because before the advent of SmartCover, operators were effectively blind to the real-time conditions of their sewers. As Peter Drucker says, “what's measured improves.” We help our customers act both proactively, using predictive methods, and reactively, responding to rapidly changing conditions that are often driven by external events such as rain.

It is our goal—and I think we are succeeding—to help our customers save money, reduce operational risk, extend the lifetime of their assets with no increased risk, minimize or avoid spills, and simply do their job better with less hassle and cost.
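The interview does not say which predictive methods SmartCover uses. As one simple illustration of acting proactively to avoid spills, the sketch below fits an ordinary least-squares line to recent level readings and extrapolates when the level would reach an assumed overflow threshold. The function name, threshold, and readings are hypothetical, and linear extrapolation is a deliberately naive stand-in for a real forecasting model.

```python
def hours_until_overflow(times_h, levels_m, overflow_m):
    """Hypothetical alert helper: hours until the level reaches overflow_m.

    times_h and levels_m are equal-length sequences of recent readings
    (hours, metres). Returns hours measured from the last reading, 0.0 if the
    last reading is already at or above the threshold, or None if the fitted
    trend is flat or falling.
    """
    n = len(times_h)
    if n < 2 or n != len(levels_m):
        raise ValueError("need at least two paired readings")
    if levels_m[-1] >= overflow_m:
        return 0.0

    # Ordinary least-squares slope of level versus time.
    mean_t = sum(times_h) / n
    mean_y = sum(levels_m) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, levels_m))
    if sxx == 0:
        raise ValueError("readings must span more than one timestamp")
    slope = sxy / sxx  # metres per hour

    if slope <= 0:
        return None  # flat or falling: no overflow predicted
    # Project from the last observed level using the fitted rate of rise.
    return (overflow_m - levels_m[-1]) / slope


# Level rising about 0.1 m per hour toward a hypothetical 1.5 m overflow point.
eta = hours_until_overflow([0, 1, 2, 3], [0.9, 1.0, 1.1, 1.2], overflow_m=1.5)
print(f"estimated hours to overflow: {eta:.1f}")
```

Returning `None` for a falling trend keeps the "nothing, all's good" case distinct from an imminent alarm; a production system would also account for rainfall forecasts and pipe hydraulics rather than a straight-line trend.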

Dr. Kolker: Where do you see the industry going in the next 5, 10, and 15 years?

Dr. Quist: I think we are beginning to see the light of data science and data technology shine onto the water and wastewater industries. As has already happened in much of the power industry, the concept of “smart infrastructure” is beginning to gain a foothold in the water and wastewater industries. You can live without outside power, but you cannot live very long without water and sanitary sewers. Water is critical to life, so it makes sense to operate conservatively.

I mean, the iPhone came out in 2007, 2 years after we started, so SmartCover is 2 years older than the iPhone, and I would say the iPhone has met with broader acceptance than IoT for sewers. But we are seeing the peak of the baby boomers like me start to retire. We call it the “Silver Tsunami,” and the younger generation coming in is less resistant to digital technology, partly due to the iPhone. And ratepayers, the customers of our customers, are becoming much more demanding because of the smartphones in their hands, and they expect higher and better performance from the utilities they pay their bills to. I think we are going to see much deeper and broader adoption by the industry within the next 5 to 10 years, and those who are not on board will slowly disappear.

Dr. Kolker: What are the unsolved problems for you? And how are you working to solve them?

Dr. Quist: As far as we can tell, the biggest unsolved problem lies with data integration. As our customers rely more and more on remote monitoring, real-time data, and real-time and predictive analysis, they are going to need to figure out how to integrate various sensing and data systems. So I predict one of two things will happen. Either a major system integrator, maybe Honeywell or Siemens, will step up, gobble up all of the IoT companies, and provide the result to customers as a single platform, or a major software company, maybe Google, Amazon, or Apple, will provide a universal software platform that integrates a variety of IoT products, probably setting standards as it goes.

What we are doing now is making our digital data and predictive analysis as easy to integrate into a variety of standard utility platforms as possible. And we are doing it one step at a time.

Dr. Kolker: Besides sewer utilities, where else can your solution fit?

Dr. Quist: Our strength lies in monitoring things in locations where power and communications are hard to come by, typically remote sites. You would not normally think of a manhole as a remote site, but getting power and communications to manholes in the middle of roads, for example, is harder (or at least far more expensive) than you can imagine, especially if you want to monitor a whole bunch of manhole sites. So anywhere you need to know what is going on and do not have power or communications easily available, we tend to shine. And our Iridium communications extend that reach even further: except for political restrictions, there is no place on earth where we cannot operate.


References

1. Schechner, Peker. 2018. Apple CEO condemns ‘data-industrial complex.’ Wall Street Journal. Available online at https://www.wsj.com/articles/apple-ceo-tim-cook-calls-for-comprehensive-u-s-privacy-law-1540375675 (last accessed November 25, 2018).

2. Davies. 2015. Improving healthcare (and pharma's) performance through analytics. Eye for Pharma. Available online at http://social.eyeforpharma.com/market-access/improving-healthcareand-pharmas-performance-throughanalytics (last accessed November 25, 2018).

3. Koster, Stewart, Kolker. Health care transformation: A strategy rooted in data and analytics. Acad Med. 2016; 91:165–167.

4. Belissent, Kisker, Cullen. 2016. Case study: Seattle Children's Hospital adopts a healthy approach to data and analytics. Forrester Research Report. Available online at https://www.forrester.com/report/Case+Study+Seattle+Childrens+Hospital+Adopts+A+Healthy+Approach+To+Data+And+Analytics/-/E-RES129450 (last accessed November 25, 2018).

5. Kimball. Errors of the third kind in statistical consulting. J Am Stat Assoc. 1957; 52:133–142.

6. Holzman, Kolker. Statistical analysis of global gene expression data: Some practical considerations. Curr Opin Biotechnol. 2004; 15:52–57.

7. Gubrud. Nanotechnology and international security. In: Fifth Foresight Conference on Molecular Nanotechnology, Palo Alto, CA, 1997.

8. Dechter. Learning while searching in constraint-satisfaction problems. In: Proceedings of the 5th National Conference on Artificial Intelligence, Philadelphia, 1986.

9. Benjamini, Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995; 57:289–300.

10. Bates. Models of natural language understanding. Proc Natl Acad Sci USA. 1995; 92:9977–9982.

11. Higdon, Haynes, Stansberry, et al. Unraveling the complexities of life sciences data. Big Data. 2013; 1:42–50.

12. Eisenhower. 1961. Eisenhower's farewell address to the nation. Available online at http://mcadams.posc.mu.edu/ike.htm (last accessed November 25, 2018).

