Artificial intelligence is speeding up software development
Artificial intelligence (AI) has taken the world by storm and is becoming a common accelerator in software development as well. Although AI in general is not entirely new, with its roots reaching back to the 1950s, generative AI has been soaring since the first widely known large language models (LLMs) were launched a few years ago. Generative AI tools are trained on content referred to as training data (input), and they generate new content (output), such as software code, based on the user’s instruction, known as a prompt. The output may resemble human-generated text, pictures, sounds, videos or, for that matter, software.
From a legal perspective, generative AI raises various questions concerning intellectual property rights (IPRs) in the context of software development. These can be roughly divided into three main categories:
- IPRs in the input code of AI tools;
- IPRs in the AI (software) tools themselves, and the IPR indemnity protection provided by the AI tool’s licensor against third party IPR infringement claims; and
- IPRs in the AI-generated output code, specifically whether the output may infringe third party IPRs and whether the output is eligible for IPR protection in the first place.
Use of free and open source software as the AI tools’ training data
The training data used to teach generative AI tools for software development often consists of an enormous amount of code, usually free and open source software (FOSS), simply because it is freely available on the internet under license terms that permit running the program for any purpose (including the training of AI tools) while also granting rights to copy, modify and distribute the code. Despite the seemingly unrestricted nature of FOSS, the use of FOSS may be subject to a wide range of license conditions that must be complied with when the code is distributed. These range from copyright notice requirements to source code distribution obligations and the so-called copyleft effect, which may extend the FOSS license to proprietary code combined with the FOSS code.
While the new section 13 b of the Finnish Copyright Act (and the equivalent provisions in other European Union jurisdictions implementing the relevant directive) also authorizes the reproduction of certain copyrighted content for data mining purposes, such as the training of AI tools, unless the copyright holder has expressly reserved that right, it does not allow further distribution of the content. FOSS licenses, by contrast, by definition also allow further distribution of the code for any purpose, provided, of course, that the FOSS license conditions are met at each stage of (re)distribution.
Does the output of generative AI tools infringe third party IPRs?
When FOSS is used as training data, IPR questions arise if the AI tool generates output that includes copied parts of FOSS code protected by copyright and/or patents. If a (partial) copy of such FOSS code is redistributed at any step of the software development process, whether by the AI tool when it produces the output in response to a prompt or by the user when further distributing the output, the conditions of the applicable FOSS licenses must be complied with. If they are not, the FOSS license rights may be lost (many FOSS licenses provide for automatic termination upon breach), resulting in unauthorized use of the FOSS code. Such unauthorized use constitutes both a breach of the FOSS license terms and an infringement of IPRs (copyrights and/or patents) in the copied and redistributed parts of the FOSS code.
The U.S. District Court for the Northern District of California has treated this risk as a concrete one: in case 4:22-cv-06823-JST, it held that certain FOSS developers had standing to sue GitHub and OpenAI for monetary damages based on a substantial risk that the defendants’ AI tools would reproduce their FOSS code in the tools’ output in breach of the applicable FOSS licenses. The outcome is effectively the same, i.e. breach of contract and copyright infringement, where copyrighted content is first lawfully used for data mining under section 13 b of the Copyright Act but is then redistributed as part of the output without authorization.
Managing the risks of third party IPR infringement by output
The challenge is that it is nearly impossible for the user of a generative AI tool to know whether the output may be subject to third parties’ copyrights (or patents), especially if copyright notices and license references from the input have not been retained in the output code. Consequently, it is also impossible for the user to comply with the applicable FOSS license terms. Ways to manage this risk include requesting that the licensors of third party AI tools indemnify the user (licensee) against third party IPR infringement claims, and performing code scans; however, both measures may limit the risks only partially. Proper IPR indemnities may be hard to obtain, and code scans only compare the source code of the output against public source code repositories. Developing a proprietary AI tool and training it only on owned and/or otherwise known input to avoid third party IPR infringements may be feasible only for big tech companies, if even for them. Consequently, users will likely have to live with some residual third party copyright and patent risk: the known unknowns of potential third party IPR infringements.
On the other hand, right holders (independent software developers and big corporations alike) should take care to include proper copyright management information in each source code file to make it easier for users to identify the right holders of the source code. Companies using generative AI tools should also realize that they may not be in a position to dispose of all the rights in the output independently if the output becomes encumbered with prior copyrights in copied input code, be it FOSS or proprietary. If, for example, GPL-licensed source code ends up in the output, further distribution of the output could, depending on the software architecture, fall entirely under the GPL due to its copyleft effect, including the source code distribution requirement. In the worst case, this could happen without the user even being aware of it.
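To illustrate why scanning the output can only partially mitigate the risk, the following is a minimal, hypothetical Python sketch of a notice scan that flags copyright notices and license identifiers that happen to have survived in AI-generated code. It assumes that such markers take the form of SPDX short-form tags (e.g. `SPDX-License-Identifier: GPL-3.0-or-later`) or conventional "Copyright (C) 2021 ..." lines. The function name, the regular expressions and the (deliberately incomplete) list of copyleft identifiers are illustrative assumptions, not a description of any commercial scanning tool, which would typically compare code against indexed public repositories instead.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately incomplete selection of SPDX identifiers for
# licenses with a copyleft effect. A real policy would need a maintained list.
COPYLEFT_IDS = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
    "LGPL-2.1-only", "LGPL-2.1-or-later",
}

# SPDX short-form tag; captures only the first identifier of a compound
# expression such as "GPL-2.0-only OR MIT".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")
# Simple copyright notice, e.g. "Copyright (C) 2021 Jane Doe".
COPYRIGHT_RE = re.compile(r"Copyright\s*(?:\(c\)|©)?\s*\d{4}", re.IGNORECASE)


@dataclass
class ScanResult:
    license_ids: list[str] = field(default_factory=list)
    copyright_lines: list[str] = field(default_factory=list)

    @property
    def copyleft_hits(self) -> list[str]:
        # License identifiers found in the code that appear in COPYLEFT_IDS.
        return [i for i in self.license_ids if i in COPYLEFT_IDS]

    @property
    def has_markers(self) -> bool:
        # True if any license tag or copyright notice was found at all.
        return bool(self.license_ids or self.copyright_lines)


def scan_for_license_markers(code: str) -> ScanResult:
    """Flag license identifiers and copyright notices retained in code.

    Only markers that survived in the text are detected; an empty result
    does NOT mean the code is free of third party rights."""
    result = ScanResult()
    for line in code.splitlines():
        m = SPDX_RE.search(line)
        if m:
            result.license_ids.append(m.group(1))
        if COPYRIGHT_RE.search(line):
            result.copyright_lines.append(line.strip())
    return result


if __name__ == "__main__":
    snippet = '''# SPDX-License-Identifier: GPL-3.0-or-later
# Copyright (C) 2021 Example Author
def add(a, b):
    return a + b
'''
    print(scan_for_license_markers(snippet).copyleft_hits)  # ['GPL-3.0-or-later']
    # The same function with its notices stripped passes unnoticed:
    print(scan_for_license_markers("def add(a, b):\n    return a + b\n").has_markers)  # False
```

The second call shows the limitation discussed above: once copied code has lost its notices, a check of this kind has nothing left to find, which is why right holders' copyright management information and contractual safeguards remain relevant.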
Is AI-generated output eligible for IPR protection?
While software code may be eligible for copyright and patent protection under certain conditions, in most jurisdictions such protection requires sufficient human contribution, as copyright automatically vests in the author and patent protection may only be applied for in respect of an inventor’s invention. Whether AI-generated content is eligible for IPR protection is therefore a significant question, as users may want to gain exclusivity over the output. As the applicable statutory IPR laws, case law and guidelines issued by authorities are territorial, differences between jurisdictions may lead global corporations to concentrate the creation of AI-generated software assets in AI-friendly jurisdictions in order to gain some form of IPR protection for their outputs. In the absence of proper IPR protection, trade secrets and contract-based protection mechanisms become more important in bridging the gap.
Finally
All in all, the use of generative AI tools poses many questions and challenges that generative AI tool developers and users alike must consider. It is vital that companies applying generative AI tools in their software development functions understand their legal position and employ the necessary safeguards to mitigate the risks sufficiently. As with FOSS, AI offers significant efficiency gains in software development. Prohibiting the use of FOSS or AI is not a smart option; instead, both should be used in a controlled manner. If your organisation faces questions related to generative AI, IPRs and/or FOSS, please do not hesitate to contact us at Dittmar & Indrenius.