as of
11-04-24
@_juliarosenberg
julia@julia-rosenberg.com
currently: ventures lead, uniswap labs
previously:
+ co-founder & ceo, metropolis
(fka orca protocol)
+ co-founder,
+ m&a, acreage holdings
+ student, NYU
WRITING
09-30-24 data trespassing
08-15-24 building in vegas
09-30-24
What we’re seeing now is rampant “data trespassing”—a term that seems increasingly apt. The public web is a free-for-all, allowing AI models to siphon off public data (in ethically and legally murky ways). Cases like hiQ Labs v. LinkedIn carved out a grey area where scrapers don’t technically violate laws like the CFAA. But how long will this loophole last? Today’s AI models depend on free, publicly available data, but this reliance is unlikely to endure as regulatory and economic pressures mount.
As AI becomes more specialized (more domain-specific, more “intelligent”), the demand for both general knowledge and curated, private datasets will surge. We’re already familiar with information companies turning their data pipelines into a lucrative commodity (from Pinterest’s 2015 Promoted Pins to X’s 2023 API access shift). Traditional data sellers are wrestling with pricing strategies and usage boundaries as AI companies strike deals with media firms to secure vast swaths of private data for training. This is just the start of increasing data privatization and monetization.
So, what about our personal data? The debate over user privacy is, at its core, a conversation about data monetization. In isolation, our data may seem worthless, but in a diverse collective, it’s invaluable. Companies like Vana are enabling data portability and consumer data monetization, allowing users to self-upload their data or contribute to data collectives. Will we ever reap the benefits of our digital trails? Maybe. There’s plenty of room to further privatize (and thus monetize) internet data, but it’s complicated—data scraping disruptors, complex pricing models, digital ownership precedents, etc.
And what about the free, open internet? The web, once a digital commons, is colliding with the growing monetization of data. This isn’t about ad-targeting anymore. We’re entering a realm of AI models that anticipate and shape human behavior on a level far beyond today’s algorithms. The new competitive edge is the quality of data fed into the machine: better data in = better data out. As AI’s appetite for data grows, so too will the conflict over ownership, profit, and access. The challenge isn’t just technical; it’s legal, monetary, and ethical. The future of AI will shape and be shaped by what we feed it, and the question is no longer just how much data, but what data and who provides it.