On July 18, 2025, the European Commission (the “Commission”) published its non-binding Guidelines on the scope of the obligations for general-purpose AI (“GPAI”) models (the “Guidelines”)[1] under the European Union’s Regulation 2024/1689, also known as the EU AI Act (the “AI Act”)[2]. This was followed on July 24, 2025 by publication of its template document for use by GPAI providers in publicly summarising their models’ training content (the “Template”). Both quickly follow publication of the Commission’s Code of Practice for General Purpose AI on July 10, 2025 (the “Code”)[3],[4]. This alert summarises the key provisions of the Guidelines and the Template and identifies practical takeaways for providers of GPAI models and other stakeholders in the AI value chain.
Key Takeaways
1. No exhaustive list. As with the previous guidelines on the definition of “AI systems”[1], the Commission has unsurprisingly refrained from giving an exhaustive list or methodology for determining the scope of a GPAI model, its systemic risk, or when a modification to a model results in a “new model”. However, helpful indicative criteria (both quantitative and qualitative) are provided, allowing model developers at least to focus their initial analysis. AI developers will need to carefully monitor actual and estimated training compute, and adjust their compliance framework based on the thresholds their model meets.
2. A safe harbour for solely non-EU developers? For model developers based outside the EU who have no intention of bringing their model to the EU market, the Guidelines state that such developers can avoid provider status for uses by downstream actors in the EU, provided that they clearly and unequivocally state that their model may not be placed on the EU market. The Guidelines provide no further guidance on how this exclusion must be given, but for model developers seeking to deploy specific models on a regional basis (based on client demand or to avoid compliance obligations), this may represent an opening to do so.
3. No “stop the clock”. Despite calls from several stakeholders to “stop the clock” on AI Act obligations, the Commission is pressing ahead with the deadlines set out in the AI Act. Nevertheless, the Guidelines make clear that the European Commission’s AI Office (the “AI Office”) will take a collaborative, staged and proportionate approach to assessing providers’ compliance, and providers are encouraged to proactively engage with the AI Office to receive this treatment.
4. The Code is King. In the Guidelines the Commission has made clear that, although the Code is voluntary, those who do not sign up to the Code’s obligations (which seek to enable signatories to comply with their AI Act GPAI obligations with respect to copyright, model documentation, and safety and security (for GPAI with systemic risk), and which we summarised in our previous client alert[2]) may be subject to enhanced scrutiny, including a larger number of requests for information and for access to conduct model evaluations than signatories of the Code. While some providers have already indicated that they do not intend to sign the Code, GPAI model providers should carefully consider whether the burden of the Code’s enhanced commitments outweighs the potential benefits of streamlined compliance and reduced scrutiny.
5. Training data disclosure requirements unveiled. With the publication of the Template, providers of GPAI models now have the AI Office template in which they are obliged under the AI Act to publish a “sufficiently detailed summary” of their model’s training content, as well as some guidance on how the Commission interprets both “sufficient detail” and “content used for training”. The publication is timely, as providers of GPAI models first placed on the market in the EU from August 2, 2025 will have an immediate obligation (upon placing the model on the market) to publish a completed Template (with existing providers having until August 2, 2027 to complete and publish their Templates).
Summary of the Guidelines
The Guidelines are intended to “increase legal clarity and to provide insights into the Commission’s interpretation of the provisions relating to general-purpose AI systems” under the AI Act, which will apply from August 2, 2025. Specifically, the Guidelines address: (i) what are considered to be GPAI models; (ii) what is considered a “systemic risk” relating to such models; (iii) what is the “lifecycle” of a model for compliance; (iv) when entities become “providers” of GPAI models; (v) the scope of exemptions for open-source GPAI models; and (vi) the Commission’s proposed approach to enforcement.
- Scope of GPAI Models. The AI Act defines a GPAI model as “an AI model…that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications…”. The Guidelines make it clear that, given the focus on capabilities in this definition and the wide variety of potential capabilities and use cases, it is not possible for the Commission to provide a precise definition of GPAI models. Instead, the Guidelines set out an ‘indicative criterion’ for a model to be considered a GPAI model, where the model: (i) has a training compute (i.e. the computational resources used to train the model) exceeding 10²³ FLOPS (which the Guidelines note is benchmarked against the guidance given in Recital 98 AI Act regarding GPAI models having at least a billion parameters and using a large amount of data for training); and (ii) can generate language (in text or audio form), text-to-image or text-to-video (which the Guidelines note is benchmarked against the guidance given in Recital 99 AI Act that GPAI models with these capabilities are more able to accommodate a range of distinct tasks). While the Guidelines are very clear that these criteria are not determinative (in particular, a model with only a narrow range of capabilities and use cases will not be a GPAI model (e.g., a model specifically trained just to increase the resolution of images)), the Commission considers that they will be an important objective indicator of whether GPAI obligations apply to a model.
- Systemic risks and lifecycles. Under the AI Act, providers of GPAI models with systemic risk are subject to enhanced obligations in relation to assessment and mitigation of those risks. The AI Act states that a GPAI model will have systemic risk if: (i) it has a “risk that is specific to the high-impact capabilities of general-purpose AI models, having a significant impact on the Union market due to their reach, or due to actual or reasonably foreseeable negative effects on public health, safety, public security, fundamental rights, or the society as a whole, that can be propagated at scale across the value chain”, where “high-impact capabilities” are “capabilities that match or exceed the capabilities recorded in the most advanced general-purpose AI models”; or (ii) the Commission has otherwise designated the GPAI model as having systemic risk based on various criteria set out in the AI Act (e.g. number of parameters, size of data set, compute required for training, nature of modalities, etc.). The AI Act further notes that a GPAI model will be presumed to have systemic risk if the compute required for training is greater than 10²⁵ FLOPS.
The Guidelines primarily discuss the processes by which providers of GPAI models must notify the Commission if they consider their model to have systemic risk, and by which they can challenge any decision by the Commission that their model has systemic risk. In this regard, most notably, the Commission makes it clear that the burden is on the provider to evidence why a Commission decision is ill-founded, including by providing appropriate information on the model’s achieved or anticipated capabilities, including actual or forecasted benchmark results (e.g. based on scaling analyses). Further, providers are ‘strongly advised’ to include other relevant information, such as model architecture, number of parameters, number of training examples, data curation and processing techniques, training techniques, input and output modalities, expected tool use and expected context length. The Guidelines are clear that the existence of mitigations alone will not prevent designation as a model with systemic risk; instead, mitigations should be built into the monitoring and mitigation obligations for the provider of that model under the AI Act.
The Guidelines also note that, when determining whether their model requires notification, providers should estimate the cumulative training compute of their model before the large pre-training run (“the foundational run conducted on a large amount of data to build the model’s general capabilities”) and then re-notify the Commission of any changes once training is complete. Technical guidance for calculating this compute estimation is set out in the annex to the Guidelines. GPAI model providers should therefore proactively monitor training compute figures, particularly where these figures are close to systemic risk thresholds.
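By way of illustration only, threshold monitoring of the kind described above reduces to simple arithmetic. The sketch below is not the Guidelines’ methodology (the annex to the Guidelines sets out the authoritative estimation approaches); it assumes the widely used rule of thumb that training compute is roughly 6 FLOP per parameter per training token, and all model figures are hypothetical:

```python
# Hypothetical sketch of training-compute threshold monitoring.
# Assumption: training FLOP ~= 6 * parameters * training tokens (a common
# rule of thumb, not the Guidelines' prescribed method). Figures are
# illustrative only and carry no legal weight.

GPAI_THRESHOLD_FLOP = 1e23      # indicative GPAI criterion per the Guidelines
SYSTEMIC_RISK_FLOP = 1e25       # presumption of systemic risk per the AI Act


def estimate_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Architecture-based estimate: roughly 6 FLOP per parameter per token."""
    return 6.0 * n_parameters * n_tokens


def classify(flop: float) -> str:
    """Map an estimated compute figure onto the two indicative thresholds."""
    if flop >= SYSTEMIC_RISK_FLOP:
        return "presumed GPAI model with systemic risk - notification required"
    if flop >= GPAI_THRESHOLD_FLOP:
        return "indicatively a GPAI model - provider obligations may apply"
    return "below the indicative GPAI compute criterion"


# Example: a hypothetical 10-billion-parameter model trained on 2 trillion
# tokens lands at 1.2e23 FLOP, just above the indicative GPAI criterion.
flop = estimate_training_flop(10e9, 2e12)
print(f"{flop:.2e}: {classify(flop)}")
```

As the Guidelines stress, the compute figure is an indicator rather than a determinative test, so any such monitoring would sit alongside a qualitative capabilities assessment.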
- The GPAI model lifecycle. Several obligations under the AI Act (especially for those models with systemic risk) apply for the lifecycle of the model. According to the Guidelines, the lifecycle of a GPAI model begins at the start of the large pre-training run (which may follow smaller experimental training runs), with any subsequent downstream development of the GPAI model forming part of that model’s lifecycle.
- Scope of “Provider”. Following various definitions and Recitals of the AI Act, the Guidelines note that:
- Consideration is first required of when a model is “placed on the market”. The examples given in the Guidelines are generally unsurprising (e.g. distributing the model in a software package or as a physical copy, or making the model available in the market via software, mobile applications, APIs, cloud or direct web interfaces). The Guidelines also provide some less obvious examples, in particular the use of models for internal processes essential for providing a product or service that “affect the rights of natural persons in the Union”. This has potentially wide-reaching implications and highlights the broad application, and extra-territorial effect, of the AI Act.
- The “provider” of that model will generally be the entity that developed that model (or had it developed on its behalf) and then actually places the model on the EU market (even if using third-party hosting or equivalent services to do so). However, to add nuance to this position, the Guidelines note that: (i) if there is a consortium or collaboration developing the model, the coordinator of that consortium or collaboration will be considered the provider; and (ii) a downstream actor that integrates a GPAI model into its AI system will not generally be considered the provider (although will need to comply with other obligations under the AI Act in relation to that AI system), unless the upstream developer makes the GPAI model available for the first time outside the EU with clear and unequivocal exclusions on distribution of the model on the EU market, but the downstream integrator places the integrated model on the EU market anyway (in which case, the downstream actor is the provider of the model).
- The Guidelines consider in more detail the situation where a downstream actor modifies or fine-tunes a GPAI model (with or without integrating it into an AI system). The AI Act suggests that in certain circumstances such modification or fine-tuning could amount to a new model. The Commission provides guidance that there will only be a “new model” for which a downstream modifier is a provider of a GPAI model “if the modification leads to a significant change in the model’s generality, capabilities, or systemic risk”. The Guidelines note that the indicative criterion for such a significant change is that the training compute used for the modification is greater than one third of the training compute used to train the original model or, if the downstream modifier cannot be expected to know this value, one third of 10²³ FLOPS for a GPAI model without systemic risk or one third of 10²⁵ FLOPS for a GPAI model with systemic risk. However, the Commission concurrently clarifies that the downstream modifier’s obligations as a provider apply only in relation to the scope of the modification (not the model as a whole). This relatively high bar will provide businesses integrating existing GPAI models into their AI systems with minimal modifications (i.e. ‘white-labelling’ existing GPAI models) with comfort that they will not become subject to GPAI provider obligations (and the Guidelines note that “currently few modifications may meet the criterion”).
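The indicative one-third criterion described above is again simple arithmetic. The following sketch is illustrative only (the figures are hypothetical and the criterion is indicative, not determinative), applying the fallback baselines the Guidelines give for the case where the modifier cannot be expected to know the original model’s training compute:

```python
# Hypothetical sketch of the indicative "new model" criterion for
# downstream modifiers. Fallback baselines (per the Guidelines) apply
# where the original model's training compute is unknown to the modifier.
from typing import Optional

FALLBACK_GPAI_FLOP = 1e23        # GPAI model without systemic risk
FALLBACK_SYSTEMIC_FLOP = 1e25    # GPAI model with systemic risk


def becomes_provider(modification_flop: float,
                     original_training_flop: Optional[float] = None,
                     original_has_systemic_risk: bool = False) -> bool:
    """True if the modification's compute exceeds one third of the
    relevant baseline (original compute, or the fallback values)."""
    if original_training_flop is not None:
        baseline = original_training_flop
    elif original_has_systemic_risk:
        baseline = FALLBACK_SYSTEMIC_FLOP
    else:
        baseline = FALLBACK_GPAI_FLOP
    return modification_flop > baseline / 3.0


# A light fine-tune (1e21 FLOP) of a model known to have been trained
# with 5e23 FLOP falls well under one third of the original compute:
print(becomes_provider(1e21, original_training_flop=5e23))   # False

# A heavy modification (4e24 FLOP) of a non-systemic-risk model whose
# training compute is unknown exceeds one third of the 1e23 fallback:
print(becomes_provider(4e24))                                # True
```

Consistent with the Guidelines, even where this check is met, the modifier’s provider obligations would extend only to the scope of the modification, not the model as a whole.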
- Exemptions for Open-Source Models. Under Arts. 53(2) and 54(6) AI Act, certain GPAI provider obligations (e.g., technical documentation and authorised representatives for non-EU providers) do not apply to models released under free and open-source licences. The key focus in the Guidelines is that, to benefit from this exemption, the GPAI model must be freely available (including details of parameters, weights and architecture) without any requirements for payment (or other compensation) or restrictions on use or modification (including discriminatory access terms), other than any requirements to attribute authorship and to on-license on the same terms. The Commission makes it clear that requirements to purchase support, training or maintenance services in order to use the relevant version of the model would remove that model from the scope of the exemption (but without limiting the provider’s right to offer a premium version of the model making use of such ancillary services).
- Enforcement. In discussing its enforcement objectives and focus, the Commission unsurprisingly heavily references its recently released Code. The Guidelines repeatedly encourage GPAI model providers to adhere to the Code (or any other code of practice approved by the AI Office), noting that signatories will benefit from “increased trust from the Commission and other stakeholders”, with adherence taken into account in any fining decisions. Notably, the Guidelines state that providers that are not signatories of the Code should conduct a gap analysis to demonstrate their compliance, and warn that such providers may be subject to a larger number of requests for information than signatories.
The Commission may also take into account commitments implemented in line with an approved code of practice as mitigating factors when fixing the amounts of fines. The Commission also notes that the obligations on GPAI model providers will come into effect as intended on August 2, 2025 (without any “stopping the clock”, as requested by many in the industry), with enforcement of non-compliance possible from August 2, 2026. However, the Guidelines note that: (i) in line with the AI Act, GPAI models placed on the market prior to August 2, 2025 will not need to comply until August 2, 2027, with no need for such models to be re-trained or otherwise to unlearn any training where this is not possible or would impose a disproportionate burden; and (ii) the Commission understands that, even for GPAI models placed on the market after August 2, 2025, providers may need time to adapt their policies and procedures, so the Commission will take a “collaborative, staged and proportionate” approach to its enforcement activities, including “supporting providers in taking the necessary steps to comply with their obligations”. The Guidelines and the Code therefore both suggest that, while the Commission will not be stopping the clock, it is at least willing to engage with GPAI providers that are making good faith efforts to comply but are potentially unable to do so within the tight time limits set by the AI Act.
Summary of the Template
The Template has been published to enable all GPAI model providers (including those releasing models under free and open-source licences and downstream actors who become providers) to comply with their AI Act obligation to “make publicly available a sufficiently detailed summary about the content used for training the GPAI model”. The Template is intended to be a simple, consistent and effective means of providing a “comprehensive overview of the data used to train a model, list main data collections and explain other sources used”, with a view to balancing the rights of GPAI model providers (e.g. to protect trade secrets) against the objective of enhancing the transparency of GPAI models, enabling parties with legitimate interests (notably, copyright holders, data subjects and downstream providers integrating a model) to exercise their rights under EU law. As with the other obligations of GPAI model providers, the obligation to complete and publish the Template will come into effect on August 2, 2025, with enforcement of non-compliance possible from August 2, 2026 (save for GPAI models placed on the market prior to August 2, 2025, which will not need to comply until August 2, 2027). We have summarised the key requirements of the Template below:
- Scope of training content. The Template (and AI Act) requires the disclosure of “content used for training”. The guidance accompanying the Template clarifies that the Commission interprets this as covering all data used in all stages of model training, from pre-training to post-training (including alignment and fine-tuning), but not input data used only during operation (e.g. through RAG).
- Level of detail required. In addition to basic information about the model, GPAI model providers must disclose varying levels of detail in respect of training content depending on the source of that content (and the nature of the GPAI provider). The Template requires the most extensive disclosure in relation to any data crawled and scraped online (unless falling within the scope of a “publicly available dataset” as summarised below), including a “comprehensive description” of the type of content and online sources crawled (including geography, language and types of websites), a list of the top 10% of domain names by size of content scraped (with a lower but still significant threshold for SMEs), details of the crawlers used, and the period of collection. For publicly available datasets (being datasets compiled by a third party and made available publicly for free), the provider must give a general description of the content (e.g. the relevant modalities (such as text or images), whether the data is personal data, copyright-protected and/or synthetic, linguistic characteristics, and the collection period) and, if any modality in the dataset exceeds 3% of the size of all publicly available datasets used for training, a link to the dataset where available. For private datasets, the provider needs only to confirm whether such datasets were licensed from rightsholders or obtained from intermediaries, and the relevant modalities. For user data and synthetic data, the disclosure obligations are more limited still (e.g. to disclose the modalities, the service through which user data is collected, and details of AI models used to generate synthetic data) or excluded (e.g. where user data is only used to fine-tune).
- Copyright text and data mining (“TDM”) and illegal content. Although not going so far as to require GPAI model providers to disclose the details of specific crawled or scraped data and works used in training (which the Commission’s guidance states would go beyond the obligation to provide a “summary”), the Template requires GPAI model providers to describe the measures implemented to respect copyright holders’ reservations of rights from TDM (including to state whether they have signed the Code) and to avoid or remove illegal content (such as child sexual abuse material and terrorist material) from training data.
- Publication and beyond. The Commission’s guidance is that providers must publish the Template on their official website (clearly visible and accessible) and through model distribution channels, in each case by no later than the model being placed on the EU market, and thereafter (where the model undergoes further training) update the Template at least every six months or (if earlier) whenever a materially significant update is required. Where a downstream actor becomes a provider (as discussed in the Guidelines), the Commission’s guidance is that the downstream GPAI model provider needs only to complete the Template in relation to the content used for model modification.
Conclusion
Although the contents of the Guidelines and Template are generally as expected, the Commission has provided some useful guidance on several fundamental aspects of the AI Act’s GPAI regime, most significantly providing indicative methods for model developers and modifiers to understand whether the resulting model will be considered a GPAI model, whether it carries a systemic risk and/or when those entities may be considered the provider, as well as the steps needed to comply with providers’ obligation to publish “sufficiently detailed” information about their model’s training content. The Guidelines also indicate that the Commission is aware that the obligations in the AI Act place a significant burden on model providers and, as such, that enforcement may not be immediate even when possible (although with the cautionary warning that the Code is the only safe harbour that exists).
Unsurprisingly though, given the nature of the subject matter, the Guidelines leave room for further development and moving of goalposts as AI technology continues to develop. We may yet see further developments as the AI Office begins collecting information on GPAI models through the AI Act’s transparency obligations, and as the Commission’s enforcement powers come into play from August 2026. In the meantime, the Guidelines and the Template, together with the Code, provide the only handrails for GPAI developers and downstream providers heading into the August 2, 2025 compliance deadline.
* * *
[1] Paul, Weiss commentary on these guidelines available here: https://www.paulweiss.com/insights/client-memos/european-commission-provides-guidance-on-scope-of-ai-systems-under-the-eu-ai-act.
[2] Paul, Weiss commentary on the Code available here: https://www.paulweiss.com/insights/client-memos/eu-commission-publishes-its-code-of-practice-for-general-purpose-ai-what-you-need-to-know.
[1] Available here: https://digital-strategy.ec.europa.eu/en/policies/guidelines-gpai-providers.
[2] Available here: https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng.
[3] Available here: https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai.
[4] Paul, Weiss commentary on the Code available here: https://www.paulweiss.com/insights/client-memos/eu-commission-publishes-its-code-of-practice-for-general-purpose-ai-what-you-need-to-know.