A Guide to Generative AI Security Risks And Best Practices

The advent of GenAI systems, such as DeepSeek, OpenAI’s ChatGPT, or Gemini, has changed the game for businesses forever. The surge in productivity is expected to reach the next level. This lures organizations to integrate these systems with apps.

The latest McKinsey report on the Economic Impact of GenAI on Productivity says that GenAI can add up to $4.4 trillion in value annually. However, many security experts worry that GenAI adoption is outpacing the industry’s ability to fully comprehend the risks. The security implications can be significant. It raises the question of how to deal with these new security challenges.

This blog serves as a quick reference guide for best practices for those looking to protect against security threats posed by GenAI systems.

Understanding GenAI Security Risks

Generative AI cybersecurity risks include cybersecurity risks, such as altering the behavior of models through prompt injection, sensitive information leakage, corrupting LLM models to generate unintended outcomes, instructional jailbreaks, or supply chain risks.

Additionally, generative AI systems may cause several other security risks, such as technical controls, policies, governance risks. Certain other types of generative AI security risks, such as LLMjacking, where hackers gain unauthorized access to LLMs using compromised cloud credentials. They can use stolen AWS access keys to tamper with large language models.

CISOs and business leaders are concerned about these risks. Several studies point out the security concerns. A study done by BlackBerry says that 75% of organizations are contemplating, or have already banned, generative AI applications owing to security concerns.

However, a blanket ban on GenAI systems doesn’t serve any purpose, as it will deprive organizations of the immense potential benefits they offer.

Common Types of GenAI Risks

Below, we have presented a list of most common generative AI security risks:

01. Prompt injection

Prompt injection is the most common GenAI security risk in which threat actors create malicious prompts to alter a model’s behavior in unexpected ways. Prompt injection vulnerabilities can be created directly by injecting prompts in a model, or indirectly by forcing a model to accept inputs from external sources.

02. RAG poisoning

Retrieval-Augmented Generation (RAG) refers to a scenario which involves manipulation of retrieval data sources to generate biased, false, or misleading results.

Instructional jailbreaks

Instructional jailbreaks occur when hackers use a set of instructional prompts to bypass the security, ethical, or content filters of large language models. Hackers do it by using specific prompt structures, or input patterns, to get a response which would otherwise be blocked.

Multi-agent prompt relay attack

A multi-agent prompt relay attack is a sophisticated prompt injection technique that involves the collaboration of multiple AI agents. To perform such attacks, attackers exploit the instructional nature of LLMs. One of the attackers embeds malicious instructions that are relayed by other agents. Eventually, this forces a model to generate unexpected outcomes.

Plugin-level persistence

Plugin-level persistence is a serious security risk caused by third-party plugins or extensions. It arises when malicious code is embedded within plugins that have access to the AI system.

These compromised plugins can easily bypass security measures and access sensitive information. Plugin-level risks are even more dangerous, as they can survive routine cleanup since they can reload or reboot automatically. This way, threat actors maintain unauthorized access without detection.

Fine-tuning abuse with stealth triggers

Fine-tuning abuse with stealth triggers refers to generative AI security risks caused by hidden, embedded activation patterns in AI models. This can occur by manipulating the fine-tuning process of LLMs using concealed triggers. These triggers activate malicious behaviors only upon receiving specific inputs.

Supply chain risks

LLM supply chains are susceptible to multiple vulnerabilities that can impact the integrity of data, models, and platforms. This can lead to the manipulation of large language models to generate biased outputs, and may even result in security breaches or system failure. These risks often arise from relying on third-party datasets, models, or plugins without proper security vetting.

Model poisoning

Model poisoning risks refer to the process of inserting corrupted or malicious data into an LLM’s training dataset. Threat actors can do this by identifying a target, gaining access, crafting malicious data, and injecting it into the training models. It negatively impacts a model’s reliability and accuracy.

Sensitive data exposure

This can occur due to several factors, such as unencrypted data storage or weak API protections. Unintended data exposure can cause significant security issues, including identity theft, financial loss, or severe regulatory penalties.

Common Examples of GenAI Cyberattacks

As the usage of GenAI is increasing, so are the instances of cyberattacks. Here is a list of the latest cyberattacks:

SAP AI flaws: The vulnerability allowed hackers to gain unauthorized service takeover and customer data access.
NVIDIA container Vulnerability: Known as CVE-2024-0132, this critical vulnerability allowed hackers a full host system access.
Hugging face architecture: The security loopholes in Inference API compromised several AI-as-a-Service providers.
DeepSeek data exposure: This happened in January 2025, when sensitive data including chat histories and API secrets were exposed.

How Red Teams Can Secure GenAI Systems

1. Define scope

This is the first stage, red teams ascertain the scope of testing. They do it to know where the vulnerability lies and which of the components of the GenAI system need protection, APIs, models, plugins, data connectors, and hosting setups. Based on the evaluation, the red team focuses security efforts on these areas to ensure no critical gap gets overlooked.

2. Map the attack surface

Next, they break down the existing GenAI system into multiple components, which includes API layer, input preprocessing, model inference engine, third-party plugins/ extensions, and output flow engine.

Mapping the attack surface helps them understand how attackers could interact with the system and use vulnerabilities to exploit it. Additionally, it helps them visualize blind spots in the system.

3. Perform threat modelling

Red teams prepare detailed possible attack scenarios that hackers can use to target GenAI systems. While doing so, they think about potential tactics, such as prompt injection, model inversion, plugin integration exploit, data poisoning, and sensitive data leaks, etc. Since all risks are not the same, they prioritize the severity of risks based on the likelihood of a specific event.

Next, offensive security teams evaluate the infrastructure where GenAI is deployed. This includes cloud or on-prem settings, where API access controls lie, how encryption is done, and how plugins are integrated. This ensures basic security measures are in place to prevent suspicious activity.

4. Identify attack vectors

The attack vectors are a subset of the cyber kill chain. They include all possible actions that attackers can take to conduct offensive cyber operations. These are the components that help an attacker breach system defenses to gain access and modify large language models. Some popular attack vectors include the following:

Password brute-forcing and cracking
Encryption/authentication vulnerabilities (in the access control system)
Intentional backdoors in algorithms, protocols, or products
Exploitation of exposed credentials
Code vulnerabilities
AI-specific attack vectors, such as prompt-triggered code execution, model extraction, or model distillation

5. Simulate attacks

Red teams run offensive tests specifically designed for GenAI systems. It includes injecting malicious prompts, automatic data extraction, or adding adversarial inputs. It helps in proactive identification of vulnerabilities.

6. Report and recommend

Finally, they document all findings clearly that includes the vulnerabilities identified, testing methodology, and potential impact. The report suggests clear recommendations to plug detected vulnerabilities.

Best Practices for Prevention And Mitigation of GenAI Risks

Managing the risks of generative AI is crucial for risk managers to take advantage of the technology. There is a lot at stake for businesses as suddenly they have to manage totally new and amplified risks. Below, we have outlined the list of best practices for prevention and mitigation of generative AI risks:

Model development and output

Provide specific instructions about the model’s role, capabilities, and limitations within the system prompt.
Specify clear output formats to validate adherence to these formats.
Apply semantic filters and use string-checking to scan for non-allowed content.
Implement human-in-the-loop controls for critical operations to prevent unauthorized actions.
Use red team assessment and adversarial tactics to regularly detect vulnerabilities.

Data integrity

Only use data from reputable, verified sources.
Periodically conduct security audit external knowledge bases, APIs, and other resources to eliminate introduction of malicious/misleading information.
Always track and monitor data ingestion and retrieval activities for anomalies or unexpected changes.

LLM input validation and sanitization

Sanitize and clean all user-provided data or prompts to remove malicious content or code. For example, remove all SQL commands or scripting tags from user inputs before using them in the AI model.
Follow secure coding best practice, such as applying OWASP’s. recommendations in ASVS (Application Security Verification Standard)
Use SAST, DAST, IAST in development pipelines.
Test and validate inputs in CI/CD pipeline. .

Output handling and monitoring

Check and review AI outputs to avoid sensitive data leaks.
Regularly log and monitor outputs to keep track of anomalies, or policy violations.
Make human approval mandatory for high-risk or sensitive outputs.

Plugin and extension management

Always use plugins and extensions from reliable sources and review their code or security documentation.
Prohibit or limit the unnecessary extensions that LLM agents are allowed to call. For example, an LLM used for text analysis should not have access to URL-retrieval plugins.

Minimize extension functionality. For example, if an extension needs to access email content to summarise the messages, they should not contain other functionalities.
Restrict the permissions and data access granted to unnecessary plugins and extensions.
Periodically track and monitor for vulnerabilities in installed plugins and apply updates or patches without delay.
Review all extension requests and vet it properly against security policies.

Access control and authentication

Use Principle of Least Privileges (POLP) while granting users and systems access.
Always provide minimum access necessary to perform their tasks.
Implement Multi-Factor Authentication (MFA) in conjunction with robust password policies to verify user identities before providing access.
Regularly review and update access rights.
Always authorize actions in downstream systems. Avoid doing it through LLMs.
Clearly specify ownership for key tasks like model validation, data protection, and incident response.
Use a RACI chart to clearly clarify who is responsible, accountable, consulted, and Informed for each security activity.

Resource and service protection

Make authentication for all API access mandatory.
Regularly review access and usage of GenAI resources to detect anomalous activities.
Use resource Isolation technique to separate GenAI infrastructure from other critical systems. It limits the blast radius of potential breaches. For example, deploy GenAI servers in a separate virtual network and make sure it has restricted connectivity with the system’s internal databases.

Audit supply chain risks

Prepare a comprehensive model of threats to LLM components and trust boundaries.
Maintain a tight control over training data, pipelines, and models.
Regularly audit and test third-party providers regularly.
Carefully verify vendor resilience testing and SLAs.
Prepare a benchmark generative AI cybersecurity metrics.

Utilize zero-trust model framework

Know who is interacting with the resources.
Understand what actions they are performing.
Put restrictive control access to sensitive data and its use.
Identify which application or data is being accessed and by whom.
Track and monitor user behavior throughout the interaction.

Monitor audit trails

Make sure that all stages of model development, deployment, and usage are properly logged.
Set-up data version control (DVC) to track changes in datasets and detect manipulation.
Maintain a log of all details such as data changes, model updates, and access events to ensure every incident is tracked.
Regularly review audit logs, such as User access logs, Input/output interactions, API usage monitoring, API requests, model/data changes, permission modifications etc.

Prioritize AI Bill of Materials (AI-BOM)

Prepare a comprehensive inventory of all GenAI models, datasets, and any third-party components used while creating models.
Regularly update the AI-BOM so that it always reflects the latest version.
Periodically review the AI-BOM to proactively evaluate exposure to new vulnerabilities.

Data security and privacy

Avoid using personal identifiers (PII) in datasets being utilized in training or testing.
Using Principle of Least Privilege, collect, and use the minimum necessary data for your AI application.
To detect suspicious activity, keep track of who accessed the data and what information they evaluated and when.
Use tools such as OWASP CycloneDX or ML-BOM to track the origins and transformations of data.
Confirm the validity and reliability of the data at every stage.
To find indications of poisoning, thoroughly vet data vendors and compare model outputs to reliable sources.
Always use data version control (DVC) to monitor and track changes in datasets. Versioning helps in maintaining integrity of models.

To learn more about the best practices for prevention and mitigation of data security and privacy risks in GenAI systems, click here.

Governance and data management

Create a comprehensive policy that includes all possible aspects, such as good conduct,data protection, data classification, plugins, third-party software use, and usage limit.
Prepare a SOP to evaluate risk assessments, and fix governance responsibility.
Make sure GenAI solutions include DevOps and cybersecurity to ensure comprehensive AI safety.

Clearly communicate the policy to relevant stakeholders.
Enforce the compliance through periodical reviews.

Bottomline

In conclusion, generative AI is undoubtedly a game changing technology, and it will have a transformative impact on businesses. However, the key to reap the advantages of generative AI lies in how early your risk managers identify its security risks and take appropriate steps to proactively prevent associated risks.

CISOs need to understand the subtle and complex landscape of generative AI risks and take appropriate steps to ensure robust security measures to protect their AI models and its entire ecosystem. GenAI security threats are already in your backyard. Our red team experts can guide you to prepare in advance. Contact us now to schedule a call!

// SecureLayer7

How SecureLayer7 helps

SecureLayer7 red teams GenAI and LLM deployments for prompt injection, jailbreaks, RAG and model poisoning, and plugin-level persistence. We test how your AI systems behave under real attacker techniques and give you fixes ranked by impact.

Test Your AI

Frequently Asked Questions

What is prompt injection in generative AI?

Attackers craft malicious prompts that change how a model behaves. Direct injection feeds the prompt straight into the model, while indirect injection hides instructions in external sources the model pulls in. It is the most common GenAI security risk.

What is LLMjacking?

Hackers gain unauthorized access to large language models using stolen cloud credentials. With compromised AWS access keys, they can tamper with models, run queries on the victim’s account, and rack up usage costs. It treats the LLM as a hijacked cloud resource.

How does RAG poisoning work?

Retrieval-Augmented Generation pulls external data into model responses. Poisoning manipulates those retrieval sources so the model returns biased, false, or misleading output. The model itself stays unchanged, but the corrupted data shapes what it produces.

What is the difference between model poisoning and fine-tuning abuse?

Model poisoning inserts corrupted data into the training set to degrade accuracy and reliability. Fine-tuning abuse plants hidden activation patterns during fine-tuning that trigger malicious behavior only on specific inputs. One damages general output, the other stays dormant until a stealth trigger fires.

Should organizations ban generative AI tools over security concerns?

A BlackBerry study found 75% of organizations have banned or are considering banning GenAI apps. A blanket ban cuts off real productivity gains and is not a fix. Apply technical controls, governance policies, and security vetting of third-party models and plugins instead.