The Mosaic

THE PORTRAIT YOU NEVER SAT FOR  |  PART 1 OF 5

In 2000, Latanya Sweeney sat at a computer in the Carnegie Mellon Data Privacy Lab and ran a test. She took three pieces of information: zip code, birth date, and sex. She checked them against 1990 census data. The result stopped her cold. Those three fields, which appear on everything from prescription forms to loyalty cards, uniquely identified 87 percent of Americans. Not a subset. Not a vulnerable population. Nearly nine out of ten people in the country could be pinpointed with information that no one considers sensitive.

That was 2000. Before smartphones. Before fitness trackers. Before credit card transactions uploaded to data broker servers in real time. Before the average American carried a device that broadcasts location to dozens of applications simultaneously. Before AI systems could join five databases in thirty seconds.

Sweeney called it re-identification: the process of combining individually harmless data points until they converge on a single person. Intelligence professionals call the result a mosaic: a portrait assembled from fragments, each piece innocuous alone, each piece essential to the whole. Federal courts have started to call it what it actually is: a Fourth Amendment problem that current law has no name for.

This piece is about the mosaic. Specifically, it is about the legal infrastructure that makes it possible to build one on any of your clients without a warrant, without their knowledge, and without violating a single statute currently on the books.

What You Generate Without Trying

Start with a single workday. Your client drives to your office. Automated license plate readers photograph their vehicle at multiple points between home and your parking structure. Their cell phone pings towers along the route, recording movement in logs their carrier retains. Their credit card pays for parking.
Their phone’s operating system records GPS coordinates continuously before they reach your lobby. The fitness tracker on their wrist logs their route, pace, and heart rate. Their car’s telematics system transmits location data to the manufacturer.

No investigator followed your client. No warrant issued. No one collected this data to monitor them. It accumulated because collection is the business model, and because nothing in current law prohibits it.

Call it data exhaust: the byproduct of existing in a networked world. Passive, continuous, and permanent.

And it accumulates into something that has a name. Intelligence analysts call it pattern-of-life data: a granular record of where a person goes, when they go there, how long they stay, and how frequently they return. A 2013 study in Scientific Reports by de Montjoye and colleagues found that just four spatiotemporal data points uniquely identify 95 percent of individuals in a mobility dataset, even after analysts coarsened the location data by time and geography. In 2022, Google paid $391.5 million to forty states to settle claims that it misled users about how much of this data it collected. Even when users disabled the Location History setting, Google continued collecting location data through a separate Web and App Activity setting that Google had enabled by default.

Your client did not have to do anything wrong. They just had to exist.

The Join Key Problem

Here is where 2025 is categorically different from 2015. Individual data streams have existed for decades. Credit card transaction logs, cell tower records, license plate scans, app location data. None of these are new.

What changed is the join key: the common field that lets a machine link one database to another. A name. A phone number. A device identifier. A Social Security number. Before AI-powered correlation tools, joining five databases required a team of analysts, weeks of work, and a specific investigative purpose. The friction was functional.
It limited who could build a mosaic and on whom.

That friction is gone. Modern commercial data correlation platforms can ingest location pings, transaction records, social media activity, public court filings, and data broker profiles and surface connections no human analyst would find manually. The same technology that helps financial crimes investigators identify money laundering networks now operates in the commercial insurance, employment screening, and data enrichment industries. The capability is no longer exceptional. It is a software subscription.

Sweeney’s 2000 finding identified the theoretical vulnerability: three fields, uniquely identifying. AI-powered correlation tools removed the practical barrier that kept that vulnerability theoretical.

The Legal Architecture of Permitted Surveillance

American constitutional law developed for a world in which collection and correlation were separate problems. They no longer are.

The Fourth Amendment prohibits unreasonable government searches. It does not prohibit purchases. The Defense Intelligence Agency confirmed this interpretation in a January 2021 memo to Senator Ron Wyden, writing that it does not construe Carpenter to require a judicial warrant for purchase or use of commercially available data. The Department of Homeland Security, the FBI, the IRS Criminal Investigation division, and Immigration and Customs Enforcement all purchase commercial location data from data brokers. No warrant. No probable cause. A subscription fee, not a constitutional standard.

In Carpenter v. United States, 585 U.S. 296 (2018), the Supreme Court held that the government needs a warrant to compel a wireless carrier to produce historical cell-site location records, because comprehensive location tracking over time crosses a constitutional threshold.
Chief Justice Roberts wrote that digital technology had produced seismic shifts in the relationship between citizens and the state, and noted explicitly that the holding was a narrow one. Federal agencies read that narrowness as permission to route around the ruling. If the data is not compelled, Carpenter does not apply.

Courts have named the underlying doctrine without resolving it. Justice Alito’s concurrence in United States v. Jones, 565 U.S. 400 (2012), articulated what scholars now call the mosaic theory: that surveillance of sufficient duration and breadth can violate the Fourth Amendment even when no single act of observation would. The Fourth Circuit applied this reasoning in Leaders of a Beautiful Struggle v. Baltimore Police Department, 2 F.4th 330 (4th Cir. 2021), holding that persistent aerial surveillance of an entire city required a warrant. The problem was not the individual photograph. It was the pattern the photographs assembled.

That reasoning stops at the government’s door. Commercial data aggregators operate in a different legal world entirely. The Federal Trade Commission has jurisdiction over deceptive trade practices. HIPAA protects medical records held by covered entities. The Fair Credit Reporting Act governs data used in credit and employment decisions. Each statute addresses a slice. None addresses the aggregation layer where the actual profiling happens: the join that converts individually regulated data streams into a comprehensive portrait.

This gap is not a drafting error. It is the product of a legal system that categorizes data by type and regulates each type separately, in a world where value comes from combining types.

What Sweeney Found Next

The 87 percent finding was not Sweeney’s only demonstration. In 1997, three years before publishing it, she purchased the Cambridge, Massachusetts voter registration rolls for approximately $20.
She combined them with de-identified health records the state had released from its Group Insurance Commission, records the state considered safe to publish after it stripped names and addresses from each entry. Using only birth date, sex, and zip code, she re-identified Governor William Weld’s medical records and mailed them to his office. The data the state called anonymous was a three-field query away from a named patient.

Sixteen years later, she went further still. Her 2013 paper in Technology Science obtained Washington State hospital discharge data and crossed it against news coverage of accidents, crimes, and injuries. She re-identified 35 of 81 patients whose incidents had appeared in news reports. The hospital system followed every applicable rule. The rules were not designed for the join operation.

A 2019 study in Nature Communications estimated that fifteen demographic attributes are enough to correctly re-identify 99.98 percent of Americans in any dataset, even one that has been heavily sampled and anonymized. The authors concluded that complete anonymization of high-dimensional data is “practically infeasible.”

Vendors anonymized your clients’ data and sold it to purchasers who received assurances it was safe. Those assurances may have been accurate when made. They may not survive the next join.

What This Means for Your Practice

The mosaic effect is not an abstract privacy concern. It is an evidence problem, a confidentiality problem, and a competence problem, and each operates independently.

Evidence first. Location data, transaction records, and fitness tracker data generated during a matter are discoverable. Courts have moved from reluctance to routine on wearable device data in less than a decade. Fitbit production is now treated as electronically stored information under the Federal Rules of Civil Procedure. If opposing counsel knows to ask for it, it exists. [See “Your Fitness Tracker Is a Spy, Part 2,” Morris Legal Technology Blog, February 2026.]

Confidentiality next.
Model Rule 1.6(c) requires reasonable efforts to prevent unauthorized disclosure of information relating to the representation. If your client’s location data establishes they visited your office fourteen times during a period their adversary claims they had no legal representation, that data, purchased legally from a commercial broker, disclosed something about the representation. No breach occurred. No one acted negligently. The disclosure happened because your client lived in the world.

Competence completes the triad. ABA Model Rule 1.1, Comment 8 (2012) extended the duty of competence explicitly to technology. An attorney who does not understand how commercially available data can profile a client’s activities cannot assess the risks that profile creates in litigation, investigation, or regulatory contexts. The FTC’s enforcement action against data broker InMarket in 2024 revealed the scale of what is available: location data from apps on more than 390 million devices, with users placed into purchasable audience segments. One of those segments may describe your client. [See “Every Phone in the Room, Part 2: The Data Broker Your Client Never Hired,” Morris Legal Technology Blog, 2025.]

Where This Argument Has Limits

The mosaic theory has boundaries, and I want to name them before the skeptic does.

First, the practical barrier to building a mosaic on any individual remains higher than this piece may suggest. Commercial data correlation tools are expensive. They require technical expertise. Most clients, most of the time, do not justify the effort. Risk is not evenly distributed.

Second, not all data exhaust remains retrievable indefinitely. Some carriers purge location records after eighteen months. Some fitness applications allow deletion, and deletion may be effective. The portrait is not always as complete or as permanent as a worst-case reading implies.

Third, legal protections are moving.
In January 2025, the Texas Attorney General filed the first enforcement action under the Texas Data Privacy and Security Act, against an insurer for using location data in underwriting. Montana closed the data broker loophole for government surveillance in May 2025. Regulation lags the technology. It does not stand still.

The steelman of the current system is legitimate: the same data flows that enable discriminatory profiling also enable fraud detection, public health surveillance, and the investigative work that identified the Boston Marathon bombers in four days using 800,000 photographs and converging commercial data streams. The question is not whether the capability should exist. The question is who holds it, under what authority, and with what accountability structure.

That argument deserves a serious answer. Parts 2 through 5 of this series attempt to provide one.

Your Thursday Action

Pull one active client file. List every type of data your client generated during this matter: location data from their devices, credit and debit card transactions, fitness tracker records, cell carrier logs, court appearance records, email metadata, social media activity, app usage.

You do not need to obtain this data. You need to inventory its existence. That inventory maps what opposing counsel can subpoena, what a government investigator can purchase, and what a commercial data aggregator may already hold.

If the inventory surprises you, that is the point. The data existed before you made the list. The list just makes the exposure visible.

In 2000, Latanya Sweeney published her 87 percent finding. Privacy scholars read it with alarm. The data industry read it with something closer to interest. In the years that followed, the industry built exactly what Sweeney’s paper described: more data types, more persistent identifiers, more join operations, more accurate portraits of people who never sat for them.

The portrait on file for your client is not a future risk.
It already exists. The question worth asking, before opposing counsel asks it first, is what it shows.

About the Author

JD Morris is Co-Founder and COO of LexAxiom, an AI platform for the business of law. He holds a Master of Legal Studies from Texas A&M University School of Law, a Master of Engineering from George Washington University, and dual MBAs from Columbia Business School and UC Berkeley Haas. He writes the Morris Legal Technology Blog under the series banner “The Technology Blind Spot.” Connect with him on LinkedIn at http://www.linkedin.com/in/jdavidmorris, on X at @JDMorris_LTech, or on Bluesky at @JDMorris-ltech.bsky.social.

References

1. Sweeney, Latanya. “Simple Demographics Often Identify People Uniquely.” Carnegie Mellon University Data Privacy Lab Working Paper 3. 2000.
2. Sweeney, Latanya. “Matching Known Patients to Health Records in Washington State Data.” Technology Science. June 26, 2013.
3. de Montjoye, Yves-Alexandre, et al. “Unique in the Crowd: The Privacy Bounds of Human Mobility.” Scientific Reports 3, no. 1376 (2013). DOI: 10.1038/srep01376.
4. Rocher, Luc, et al. “Estimating the Success of Re-identifications in Incomplete Datasets Using Generative Models.” Nature Communications 10, no. 3069 (2019). DOI: 10.1038/s41467-019-10933-3.
5. Carpenter v. United States, 585 U.S. 296 (2018).
6. United States v. Jones, 565 U.S. 400 (2012) (Alito, J., concurring).
7. Leaders of a Beautiful Struggle v. Baltimore Police Department, 2 F.4th 330 (4th Cir. 2021) (en banc).

Originally published on LinkedIn Newsletter: The Technology Blind Spot
