Predictive analytics?

Or, just give me the data I need, please...

About 1,800 children die each year from abuse and neglect. Wouldn’t it be nice if we could use technology to predict and prevent those tragedies, as well as the thousands of other abusive injuries children suffer? After all, if Amazon and Facebook can predict my next purchase, why can’t we leverage technology to predict child abuse?

The answer is not a simple one and is complicated by two basic problems: human nature and the correlation-causation fallacy, which really geeky people might refer to as a “cum hoc ergo propter hoc” error.

It is, of course, now technologically possible to design a machine-learning program that will predict whether a specific family is going to become further involved with the child welfare system. According to the ACLU, at least half the states have considered adopting such technology, and 11 states as well as some large-city child protection agencies have adopted predictive analytics and use it daily to determine which families are most at risk of child protection system intervention. Prime among the early adopters was Allegheny County, Pennsylvania, which has had its predictive analytics model in place for several years now.

I’m a big fan of using data to improve child welfare’s response, but I’m also sensitive to the concerns of many (e.g., here and here) that a poorly designed predictive analytics model will simply catch already-vulnerable families in a “feedback loop.” In other words, the factors that can produce situations of high need but low danger of maltreatment (poverty, being on public benefits, living in a crime-ridden neighborhood, minority status) will be misinterpreted by “the machine” as signs that these families present a risk to their children. More to the point, correlation is not causation.

At the same time, we all agree that knowledge is power, and the more information we are able to get about a family (from the minute we receive a CPS call through investigation, family preservation, foster care, family finding, etc.), the better the outcomes we can help that family achieve. There’s a great article from a year or so ago that discusses how an “ethical” system of predictive analytics can succeed.[1] The authors emphasize that such a system must be both transparent and accountable: there can be no “black boxes” that spit out a risk score. Rather, the system must demonstrate that its methods are responsible, that its recommendations are explainable, accurate, and auditable, and that its outcomes are fair.

There are a couple of basic steps that I believe any child welfare system could take in moving toward predictive analytics while maintaining these ethical standards. The first is to use prior medical data to flag situations in which a child may have suffered prior incidents of abuse. The second is to use “birth match” data to flag newborns whose families have had serious prior system involvement.

On the first issue, there are numerous studies[2] showing that when a child suffers serious physical abuse, it is highly likely that the child suffered a prior incident of physical abuse that was not recognized as such. Often those children saw a medical provider for the prior “incident,” but medical professionals missed the abuse. By connecting medical diagnosis (ICD-10) codes with child welfare records in real time, we could flag situations in which an “accident” actually appears to be part of a larger pattern.
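To make the idea concrete, here is a minimal sketch of what such a flag might look like. Everything in it is an assumption of mine for illustration: the `Encounter` record, the one-year window, the two-visit threshold, and especially the short list of ICD-10 code prefixes, which is a placeholder and not a validated clinical screening list. A real system would rely on clinician-reviewed code sets and careful record linkage.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, illustrative injury-code prefixes only.
# This is NOT a validated clinical screening list.
SENTINEL_PREFIXES = ("S06", "S22", "S72")


@dataclass(frozen=True)
class Encounter:
    child_id: str
    visit_date: date
    icd10_code: str


def flag_repeat_injuries(encounters, window_days=365, min_visits=2):
    """Return IDs of children with at least `min_visits` injury-coded
    visits falling within `window_days` of one another."""
    visits_by_child = defaultdict(list)
    for enc in encounters:
        # str.startswith accepts a tuple of prefixes.
        if enc.icd10_code.startswith(SENTINEL_PREFIXES):
            visits_by_child[enc.child_id].append(enc.visit_date)

    window = timedelta(days=window_days)
    flagged = set()
    for child_id, dates in visits_by_child.items():
        dates.sort()
        # Compare each visit with the one (min_visits - 1) positions later.
        for i in range(len(dates) - min_visits + 1):
            if dates[i + min_visits - 1] - dates[i] <= window:
                flagged.add(child_id)
                break
    return flagged


# Made-up sample data for illustration.
encounters = [
    Encounter("child-A", date(2020, 1, 10), "S72.001A"),
    Encounter("child-A", date(2020, 6, 2), "S06.5X0A"),
    Encounter("child-B", date(2020, 3, 5), "S72.001A"),
    Encounter("child-C", date(2020, 2, 1), "J06.9"),
]
flagged = flag_repeat_injuries(encounters)
```

In this sketch only the child with two injury-coded visits inside the window is flagged. The flag is meant as a prompt for a trained professional to take a second look, not as a finding of abuse.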

On the second issue, most professionals agree that if a mother loses a child permanently due to abuse or neglect, we as a child protective system at least need to know if she has another baby. We would hope that whatever issue caused the first tragedy has been resolved, but at a minimum we would want our social workers to check in on the mother and ensure she has all she needs to parent safely. As Lanier notes, a number of states have instituted “birth match” programs that link records of births with CPS records and flag those situations in which parents who previously lost a child to CPS have had a new baby. Over the past year, Georgia has begun linking together birth records and child welfare records in a way that will make this sort of “flagging” possible.
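At its simplest, a birth match is a join between two lists. The sketch below assumes, purely for illustration, that birth records and CPS records already share a reliable parent identifier; the `BirthRecord` type, the field names, and the sample IDs are all hypothetical. In practice agencies rarely share such an identifier, so the matching step would likely need to compare names, dates of birth, and similar fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BirthRecord:
    infant_id: str
    parent_ids: tuple  # identifiers of the parents listed on the birth record


def birth_match(births, parents_with_prior_removal):
    """Return (infant_id, matched_parent_ids) for each birth where a listed
    parent appears in the set of parents with a prior permanent removal."""
    matches = []
    for birth in births:
        matched = sorted(set(birth.parent_ids) & parents_with_prior_removal)
        if matched:
            matches.append((birth.infant_id, matched))
    return matches


# Made-up sample data for illustration.
prior_removals = {"parent-9", "parent-4"}
births = [
    BirthRecord("infant-1", ("parent-1", "parent-2")),
    BirthRecord("infant-2", ("parent-9", "parent-3")),
]
matches = birth_match(births, prior_removals)
```

Here only the second birth is matched, because one of its listed parents appears in the prior-removal set. A match would trigger a check-in by a social worker, not an automatic intervention.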

No one wants to create a system in which those who are already unfairly caught up in child protective services are made to suffer more due to an algorithm. Some would argue that trying to predict severe abuse is not worth its cost: Lanier, for example, makes a solid point that in trying to use predictive analytics to prevent abusive child deaths, we are trying to predict a “black swan,” an outcome that occurs less than 1% of the time even in the highest-risk group. At the same time, there are ways to use data that will help child protective services professionals find and assist those families whose children are truly at risk of serious abuse or neglect. The key is to build those systems in a way that reflects our values of keeping families together, respecting family integrity, and ensuring that we can show a solid causal connection between a “risk factor” and the serious danger of abuse.
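The “black swan” problem is easy to see with a little arithmetic. The sketch below uses the standard positive predictive value formula; the 1% prevalence echoes the figure above, while the 90% sensitivity and 90% specificity are hypothetical numbers I picked for illustration, not the performance of any real model.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged cases that are true positives (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical model quality; prevalence taken as 1% in the highest-risk group.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"Share of flagged families that are true positives: {ppv:.1%}")
```

Under those assumptions, roughly 8% of flagged families would be true positives, which means about eleven families would be flagged in error for every one correctly identified. That is why the narrower, better-grounded flags described above seem to me a safer starting point than a general risk score.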


[1] P. Lanier et al., “Preventing Infant Maltreatment with Predictive Analytics: Applying Ethical Principles to Evidence-Based Child Welfare Policy,” 35 J. Family Violence 1 (2020).


[2] W. King et al., “Child Abuse Fatalities: Are We Missing Opportunities for Intervention?” 22 Ped. Emerg. Care 211 (April 2006); D. Scott et al., “The Utility and Challenges of Using ICD Codes in Child Maltreatment Research: A Review of Existing Literature,” 33 Child Abuse & Neglect 791 (Nov. 2009); T. Sieswerda-Hoogendoorn, “Abusive Head Trauma in Young Children in the Netherlands: Evidence for Multiple Incidents of Abuse,” 102 Acta Paediatrica e497