The Good Journal #7 - It is your data, not their dataset

Whether you write, paint, sing, draw, design, calculate, report, present or play the theremin, it’s very likely that you’ll have files related to these pursuits, and these files are your property.
At The Good Cloud, such files are securely stored and available to you, but we don’t use them for anything else. Your work will never become part of any kind of model while it is stored with us, which ensures that it cannot easily be recreated by an AI and protects you from the kind of copyright infringement some authors and journalists are now facing.
In recent times, the fine print of cloud services has begun to include the potential use of your data in large language models and visual models. For instance, if you use Adobe Creative Cloud, you should check your settings: unless you have opted out of “content analysis,” you have given them permission to use your work in their “techniques, such as machine learning.”
In December, Dropbox teamed up with OpenAI. They claim that data is shared only when users activate a specific feature, and in such cases we can only trust that this means the same thing we ourselves envision.
The 2023 update to Google’s privacy policy had a similar impact: they now reserve the right to scrape your data and behavior. While this might be limited to publicly available data, we again have to trust in a mutual understanding of what that means. Add Microsoft’s partnership with OpenAI, and your data may have very few safe havens left. Even if that mutual understanding holds, I urge you all to consider with whom you store your data. Maintain control of your ownership.
This is not to join the general outrage parade against AI itself. Honestly, I consider the entire debate to be overly focused on a symptom rather than a cause. The rush to offer AI is driven by overwhelming financial incentives that lead to ill-considered shortcuts, and as with all emerging tech, misuse is initially rampant.
Personally, I see a lot of potential in the right methods of feeding a dataset or model and in the informed use of the resulting tools. That being said, participation should be, and should have been, opt-in instead of opt-out.
Within our service and the Nextcloud software we utilize, you will find more and more options for integrating certain AI services, such as ChatGPT or LocalAI. This will always remain opt-in and only results in the tool being available within the service. It may help you correct the report you’re writing, but it will never scan or use the poem you’ve stored.
And, crucially, you don’t just have to take our word for it. You can always jump into the Nextcloud community and have a good look at the code: https://github.com/nextcloud
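For the technically inclined, here is a rough sketch of what opt-in looks like in practice. Nothing AI-related runs on an account unless an administrator deliberately installs and enables an integration app; the app name below (integration_openai, Nextcloud’s OpenAI/LocalAI connector) is given as one example, and the exact commands and paths depend on the setup:

    # install and enable an AI integration app explicitly (illustrative example)
    sudo -u www-data php occ app:install integration_openai
    sudo -u www-data php occ app:enable integration_openai

Until something like this has been done, and until you then connect the feature yourself, no AI component touches your files.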
So, will we be offering AI as a service?
No. While we initially considered it, we have decided not to offer a service like LocalAI hosted on our platform. We have not found a large language model that does not have a dubious origin story, nor have we found one that would not, on occasion, proclaim something dangerous or downright wrong. We will, however, be on hand to help you figure out how to connect what you would like to connect.
Take care of what you create. It’s precious.
Image: EasyDiffusion SD5
Text spelling: OnlyOffice autocorrect
Text copy editing: OpenAI’s ChatGPT 4 and Grammarly