In September 2023, WordPress.com quietly updated a developer page about accessing a "Firehose" of around a million daily WordPress posts, clarifying that these feeds are intended for search engines, AI products, and market intelligence providers. The change went largely unnoticed at the time, but it has sparked discussion since revelations that Automattic, the owner of WordPress.com and Tumblr, plans to share user data with OpenAI and Midjourney.
The discussion centers on which WordPress blogs are included, whether opt-outs apply retroactively, and Automattic's lack of transparency. The company has been selling access to post data for years, including through the Jetpack plugin, but has recently announced exclusions for select AI partners.
Automattic's deals with OpenAI and Midjourney are particularly contentious because they are meant to enhance generative AI tools. This data sharing, facilitated by partners like SocialGist and DataStreamer, offers insights into market trends and user behavior but raises concerns about privacy and data usage.
SocialGist, a major player in data aggregation, emphasizes its access to WordPress and Tumblr posts, catering to market research and AI training. Despite assurances from Automattic about data usage, questions about terms of use, privacy features, and enforcement mechanisms remain unanswered.
The complex data supply chain underscores the challenges of tracking data usage and enforcing policies. Users have limited visibility into how their content is shared and utilized by third parties, raising ethical and privacy concerns.
As the landscape of data sharing evolves, users are left grappling with the implications of their digital footprint being used for purposes beyond their control. The lack of transparency and accountability in the data economy highlights the need for greater regulation and user empowerment.