cloudformation package is always generating a new zip #3131

Open
izidorome opened this issue Feb 6, 2018 · 46 comments
Labels
cloudformation, package-deploy, customization (Issues related to CLI customizations, located in /awscli/customizations), feature-request (A feature should be added or improved), p3 (This is a minor priority issue)

Comments

@izidorome

izidorome commented Feb 6, 2018

I have a Golang lambda with the following template:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Billing Api Create Application

Resources:
  BillingCreate:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: billing-create
      Handler: main
      CodeUri: ./build
      Runtime: go1.x
      Policies: AWSLambdaDynamoDBExecutionRole

Even when the code didn't change (the Go build produces the same compiled binary), the aws cloudformation package command generates a new zip file.

izidorome changed the title from "package is always generating a new zip" to "cloudformation package is always generating a new zip" on Feb 6, 2018
@kyleknap
Contributor

kyleknap commented Feb 9, 2018

Could you elaborate a little more on why it is an issue that a new zip is created for every package command? It may be difficult to avoid making a new zip every time, because the package command takes an MD5 of the zip it creates and compares it against the existing artifact's MD5 to decide whether it needs to re-upload the code to S3.
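(A minimal sketch of the flow being described, for illustration only; this is not the actual CLI code, and the helper name and boto3 calls here are assumptions. The point is that the MD5 of the whole zip becomes the artifact key, so any byte-level change in the archive forces a re-upload.)

import hashlib

import boto3

def upload_if_changed(zip_path, bucket):
    # MD5 of the whole archive becomes the S3 key, so any change to the
    # zip bytes (including entry timestamps) produces a new key.
    with open(zip_path, "rb") as f:
        key = hashlib.md5(f.read()).hexdigest()

    s3 = boto3.client("s3")
    try:
        s3.head_object(Bucket=bucket, Key=key)   # already uploaded: skip
    except s3.exceptions.ClientError:
        s3.upload_file(zip_path, bucket, key)    # new bytes: re-upload
    return key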

kyleknap added the closing-soon label (This issue will automatically close in 4 days unless further comments are made.) on Feb 9, 2018
@izidorome
Author

Imagine a scenario where you have a CloudFormation file with more than one Lambda declared.
For now, let's call them FN1 and FN2; one lives in fn1.go and the second in fn2.go.

I build both of them, which generates two binaries, fn1 and fn2.

I run cloudformation package, and it generates 2 zip files and sends them to S3.

One week later, I change the fn1 function, but not fn2. My CI builds both of them, but only the first has a different MD5 (the second has the same MD5 as before).

The problem is that the package command will generate a new zip for the second one too, even though the file did not change, which causes all of the functions declared in my CloudFormation template to be redeployed.

@jakul

jakul commented Feb 27, 2018

I'm having the same issue with Python code. Every time I run aws cloudformation package it creates/uploads a new zip file and changes the CloudFormation template

@jakul

jakul commented Feb 27, 2018

@rizidoro Can you download the zip files from S3, unzip them locally and diff them? It turns out I had one file which was actually different, because it included a "generated at" date that was being updated every time I built the CloudFormation script.

@jakul

jakul commented Feb 27, 2018

You also need to check for timestamp differences amongst the files

@jmassara

jmassara commented Apr 15, 2018

the package command does an md5 of the zip it creates to see if it needs to reupload the code to s3 by comparing the two

That is exactly the issue. If the timestamps on files inside the zip differ, the MD5 is different even when the contents are the same. In the case of scripting languages, this is probably not an issue. However, with Go, each time you run go build a new binary is created, and thus a new timestamp.

This is especially troublesome if you are trying to use CodePipeline and CodeBuild (see https://docs.aws.amazon.com/lambda/latest/dg/automating-deployment.html) because no matter what, package is always going to create a zip with a different md5.

Perhaps package should md5 each file in the zip instead of the zip as a whole. As it is now, it's not an accurate comparison.

@izidorome
Author

@jmassara That is exactly the problem I'm facing right now. The final binary that go build generates changes the timestamp.

@jmassara

jmassara commented Apr 15, 2018

@rizidoro Yes. This is a bug with package. It should probably create a temporary file that has a list of the md5 hashes of all files going into the zip. Then md5 this temporary file and use that value as the name of the S3 object.
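(A rough sketch of that idea in Python; this is a hypothetical helper, not part of the CLI. Each file's MD5 goes into a manifest and the manifest itself is hashed, so the result depends only on file contents and ignores zip metadata such as timestamps.)

import hashlib
import os

def manifest_fingerprint(source_root):
    # Per-file MD5s form a "manifest"; its MD5 becomes the artifact name.
    # Sorting keeps the walk order stable across machines and runs.
    lines = []
    for root, dirs, files in os.walk(source_root):
        dirs.sort()
        for name in sorted(files):
            with open(os.path.join(root, name), "rb") as f:
                lines.append(hashlib.md5(f.read()).hexdigest())
    manifest = "\n".join(lines).encode("utf-8")
    return hashlib.md5(manifest).hexdigest()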

@atamgp

atamgp commented May 26, 2018

I have the same issue. I have a CodeCommit repo with a sam.yml containing multiple lambdas.

When I run the AWS CLI twice in a row from my VM, the first run uploads a .zip for every lambda. The second one does nothing because nothing changed, which is correct.

But... doing exactly the same from CodePipeline and CodeBuild (aws cloudformation package bla bla), it does not work. You can trigger the pipeline with "Release Change" without needing a commit. It starts an AWS CLI docker container for CodeBuild, gets the input sources from S3 and unzips them, then calls cloudformation package, which DOES re-upload unchanged code for every lambda, causing redeployment in the next steps.

  1. How does anyone using CodePipeline and Lambdas not run into this?
  2. It seems that fetching unchanged sources from S3, unzipping them and running package leads to a different MD5, which is NOT OK.

Does anyone know a workaround, and when this bug will be fixed?

kyleknap removed the closing-soon label (This issue will automatically close in 4 days unless further comments are made.) on Jun 14, 2018
@paul-wilkinson

I am having the same issue. I'm finding reviewing CloudFormation change sets painful because they are polluted with changes to Lambda resources that didn't materially change.

@rmmeans

rmmeans commented Jul 20, 2018

I'm seeing the same problem as @jmassara reported above, with Node. This one is painful for us as we are trying to use CodePipeline to deploy Lambda@Edge functions with the CDN in the stack: even if we don't touch the functions, the CLI during packaging thinks the files changed, resulting in a CDN update (wait 15 min) even though we didn't change anything in the function code. It is far more than just an unnecessary version publish in the change set; it slows the entire CD process down unnecessarily because of how slow CloudFront updates are.

@vaibhavkewl

Hi, is there any progress with this feature request?
Comparing the md5sum of each file within the zip instead of the md5sum of the zip file sounds like a good possible solution to this problem.
I'd appreciate your thoughts and a possible fix for this. We have a CI/CD pipeline with many Lambda functions, and this problem is causing a new version of each Lambda to be deployed every time, unnecessarily.

@mruckli

mruckli commented Nov 20, 2018

We are also facing this exact issue.

@Umkus

Umkus commented Dec 4, 2018

@rmmeans I have exactly the same issue. This not only slows down deployments, but also rollbacks.

@okovalov

okovalov commented Dec 9, 2018

Guys, my question is not 100% related to this particular bug (I bypassed it by having different, separate lambdas), but there is something I really can't get past and I am giving up on it. I would really appreciate any help/suggestions; please take a look at this error:

[screenshot of the error omitted]

The package command fails when I have too many dependencies added to my package.json, and unfortunately, due to the nature of the lambda, there is no way to decrease the number of files.

So, is there any way to actually run it with zip64 support? Please help; I have already given up on this...

@bjorg

bjorg commented Feb 1, 2019

The solution may depend on the programming language (and therefore may not be possible for some). We solved it in the λ# CLI as follows:

.NET Core has a deterministic build system, which means that if the source files and NuGet packages have not changed, the resulting compiled binaries remain identical as well. During the build phase of the package, the CLI creates a checksum of the file contents and filenames instead of the ZIP file itself; the latter contains dates and timestamps that would cause the checksum to change with every build. The result is a package filename that only changes when the underlying code changes, which, in turn, updates Lambda functions (or Lambda layers) only when required.

@Anheurystics

Any updates on this issue?

@dan-lind

I'm facing the exact same problem

@wmonk

wmonk commented Aug 22, 2019

I've also been suffering from this issue. I am using the sam-cli and have been trying to optimise the time it takes to run sam package and sam deploy. So far I've got to a nice place using a node script to pre-package each of the 29 lambdas into its own directory with the required node_modules. This is important so that I can make code changes in one file, run the deployment, and have it very quickly deploy only the lambdas affected by that file change. In the best case it affects 1 lambda and my deployment takes a few seconds.

As per the rest of the conversation in this issue, the md5 of the zip is different each time. Here is a demonstration:

~/C/t/test ❯❯❯ mkdir out
~/C/t/test ❯❯❯ touch out/test
~/C/t/test ❯❯❯ echo "Hello world" > out/test
~/C/t/test ❯❯❯
~/C/t/test ❯❯❯ md5 out/test
MD5 (out/test) = f0ef7081e1539ac00ef5b761b4fb01b3

~/C/t/test ❯❯❯ zip -rqX out.zip out
~/C/t/test ❯❯❯ md5 out.zip
MD5 (out.zip) = 5f28021c0b6fc266abbfb1b36870fa1d
~/C/t/test ❯❯❯
~/C/t/test ❯❯❯ zip -rqX out2.zip out
~/C/t/test ❯❯❯ md5 out2.zip
MD5 (out2.zip) = 5f28021c0b6fc266abbfb1b36870fa1d
~/C/t/test ❯❯❯ # Same md5!

~/C/t/test ❯❯❯ echo "Hello world" > out/test
~/C/t/test ❯❯❯ md5 out/test
MD5 (out/test) = f0ef7081e1539ac00ef5b761b4fb01b3
~/C/t/test ❯❯❯ # Same md5 for file!

~/C/t/test ❯❯❯ zip -rqX out3.zip out
~/C/t/test ❯❯❯ md5 out3.zip
MD5 (out3.zip) = 1a8ec423697ce9c657b6f1c12c51476f
~/C/t/test ❯❯❯ # Different zip file md5!

Digging into the source code for the zipping + uploading functionality you can see that the code walks the file tree and adds each file to the zipfile:

def make_zip(filename, source_root):
    zipfile_name = "{0}.zip".format(filename)
    source_root = os.path.abspath(source_root)
    with open(zipfile_name, 'wb') as f:
        zip_file = zipfile.ZipFile(f, 'w', zipfile.ZIP_DEFLATED)
        with contextlib.closing(zip_file) as zf:
            for root, dirs, files in os.walk(source_root, followlinks=True):
                for filename in files:
                    full_path = os.path.join(root, filename)
                    relative_path = os.path.relpath(
                        full_path, source_root)
                    zf.write(full_path, relative_path)
    return zipfile_name

My proposal would be that this step also md5s each file as it is added to the zip, and then finally md5s the total. I'm not sure what the perf impact of doing this would be, but it should make the final deployment significantly faster when doing this kind of thing.


I've tested locally on a lambda with a small 😛 sized node_modules, total directory size ~20mb:

~/C/g/a/.s/Api ❯❯❯ time find . -type f -exec md5 \{\} >> ../out.md5 \;
       10.51 real         3.18 user         6.76 sys
~/C/g/a/.s/Api ❯❯❯ md5 ../out.md5
MD5 (../out.md5) = 6e6584c968e3974b60ba7b4e244a84b5

This was for 3098 files.

@bjorg

bjorg commented Aug 22, 2019

Yes, that's close to how it's done in λ# for the .NET zip packages. Make sure to sort the files by their full path first, then MD5 the file contents and the file path. If you omit the latter, the MD5 doesn't change when you change the capitalization of a file!

See details at https://github.com/LambdaSharp/LambdaSharpTool/blob/9767b96fda1c459f21ebf68c1dd18670970c012d/src/LambdaSharp.Tool/Internal/StringEx.cs#L164
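(In Python terms, a sketch of that recipe, for illustration only. Compared to the manifest sketch earlier in the thread, this also mixes the sorted relative paths into the digest, so renames and capitalization changes are detected.)

import hashlib
import os

def deterministic_dir_hash(source_root):
    # Sort by relative path, then feed both the path and the contents into
    # one digest; renaming or re-capitalizing a file changes the result.
    paths = []
    for root, dirs, files in os.walk(source_root):
        for name in files:
            full = os.path.join(root, name)
            paths.append((os.path.relpath(full, source_root), full))
    digest = hashlib.md5()
    for rel_path, full in sorted(paths):
        digest.update(rel_path.encode("utf-8"))
        with open(full, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()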

@wmonk

wmonk commented Aug 23, 2019

@stealthycoin would there be any appetite for a PR implementing this?

@wmonk

wmonk commented Sep 13, 2019

@stealthycoin any update on this? I'd be happy to take a crack at a PR to implement the behaviour discussed.

@hatim-heffoudhi

Hello guys, any updates please? :) I'm facing the same issue; I have multiple lambdas in a monorepo.
Once I update one lambda, sam package generates new S3 zip files for the others even if I didn't make any changes.
Is this a bug or a feature request?

@gpiccinni

Hi all, I've created a pull request which seems to solve the issue we were facing: basically, we compute the checksum on the entire function content (after installing all requirements) rather than computing it on the resulting ZIP file (the current behavior).
The main difference is that the checksum computed on the ZIP changes every time a file is created (it takes into account file mtime and ctime), even if there is no actual change in the file content.

It would be great if this pull request gets accepted and merged.
Thanks.
G

@wmonk

wmonk commented Dec 23, 2019

@gpiccinni I implemented a similar solution to yours in September here #4526, but unfortunately nothing ever came of it.

@gpiccinni

@wmonk Many thanks for pointing this out. By looking at your pull request I realized that in my case the checksum does not change when filenames change (which, in my opinion, it should), whereas your code already addresses this!

I'll look into other libraries such as dirhash, where the filename and path are included in the checksum, and update my pull request accordingly.

Thanks
G

@hatim-heffoudhi

hatim-heffoudhi commented Dec 23, 2019

@gpiccinni, awesome, and thanks! I hope your PR can be merged quickly; this could fix a lot of pipelines.

@rsodha

rsodha commented Jan 23, 2020

That is exactly the issue. If timestamps on files are different in the zip file, even if the contents are the same, the md5 is different. In the case of scripting languages, this is probably not an issue. However with Go, each time you run go build, a new binary is created and thus a new timestamp.

@jmassara This problem exists for scripting languages as well. I am facing the same problem with Node.js lambdas. It looks like it is due to zip headers. Have a look at this Stack Overflow discussion.

@rehanvdm

Well, the CDK team does not have this problem, right? Find out what they are doing and do the same.

@wmonk

wmonk commented Jan 23, 2020

After being frustrated by this issue for a while, I've fixed it in my own deploy scripts. Hopefully this can help some others, and maybe pick up some optimisations! I'm not sure if this is the "right" way to do it, but it's been working fine for us. One big benefit I've found is that I can make config changes without having to redeploy every function whose code hasn't changed.

# Hash the source files, then use the hash as the artifact name
find src -type f -exec md5sum {} \; > tmp-md5
find node_modules -type f -exec md5sum {} \; >> tmp-md5
CODE_MD5=$(md5sum tmp-md5 | cut -c 1-32)

# zip appends .zip to the archive name, so check for that file
if [ ! -f "$CODE_MD5.zip" ]; then
    zip -q -r $CODE_MD5 src node_modules # more files here
fi

# Upload only if an object with that hash doesn't already exist
aws s3 ls s3://bucket-name/$CODE_MD5 || aws s3 cp $CODE_MD5.zip s3://bucket-name/$CODE_MD5

sam deploy --parameter-overrides CodeUriKey=$CODE_MD5

And in the template:

Parameters:
  CodeUriKey:
    Type: String
    NoEcho: true

Lambda:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri:
      Bucket: bucket-name
      Key: !Ref CodeUriKey

@rsodha

rsodha commented Jan 30, 2020

I have found another workaround (may be easier for those who have many lambda functions in one pipeline) to this issue.

The key to this workaround was to find out what makes the MD5 of a zip differ even when the contents of the files within it have not changed. I found the files' modified timestamps to be the culprit. So the idea is: if we set a consistent modified timestamp on all files just before the 'aws cloudformation package' or 'sam package' command runs, the produced zip files will have a consistent MD5 across build executions.

find . -exec touch -m --date="2020-01-30" {} \; # date does not matter as long as it  is never changed.
aws cloudformation package --template-file template.yml --s3-bucket <bucket> --output-template-file package-template.yml

The above trick has worked for me so far.

@Al-tekreeti

That does not work for me.

kdaily added the customization label (Issues related to CLI customizations, located in /awscli/customizations) on Nov 12, 2020
@ShengHow95

I have a similar issue, but with Lambda layers instead. I have my template in CodeCommit and I created a CodePipeline with a CodeBuild project that automates the cloudformation package and deploy process. However, every time there is any change to the CodeCommit repo, even if the Lambda layer did not change, it will still create a new Lambda layer.

Does anyone here have any alternatives?

@ryancabanas

I have a similar issue, but with Lambda layers instead. I have my template in CodeCommit and I created a CodePipeline with a CodeBuild project that automates the cloudformation package and deploy process. However, every time there is any change to the CodeCommit repo, even if the Lambda layer did not change, it will still create a new Lambda layer.

Does anyone here have any alternatives?

I have this same issue. While the suggestion from @rsodha does work to prevent most duplicate packages from being uploaded by the aws cloudformation package command, the AWS::Serverless::LayerVersion layer that I've created keeps getting re-uploaded, even when there are no package changes. I believe the reason is the CODEBUILD_SRC_DIR path, which is different every time an AWS::CodeBuild::Project is generated as part of my CodePipeline run. This CODEBUILD_SRC_DIR path is saved inside the package.json files that are created when I download the needed npm packages for my Node Lambdas (but it doesn't appear to be an issue for the Python packages). Because of this, the layer hash is always different and, therefore, the layer gets re-uploaded every time.

If there were a way to manually set the CODEBUILD_SRC_DIR path to a static value every time the AWS::CodeBuild::Project is generated in the CodePipeline's CloudFormation template, that might be a solution to this issue.

@ryancabanas

After many attempts, I still could not prevent a new Lambda Layer from being generated during each CodePipeline run. I tried the following:

  • In my buildspec.yaml, for the CODEBUILD_SRC_DIR variable and the path that is automatically generated in the AWS::CodeBuild::Project resource, right from the start I renamed the src... path to the static value src123456789 and updated the CODEBUILD_SRC_DIR variable accordingly. Unfortunately, this still resulted in a new Layer being created, even though it successfully provided a consistent source path between CodePipeline runs.
  • I also tried using sam package, just in case there was a difference between that and aws cloudformation package, but this didn't make a difference either.

I've downloaded a couple of Lambda Layer .zip files that didn't change between CodePipeline runs and checked their MD5 hash values, and they are indeed different for some reason. The sizes of the files are different too (for example, 16,461,107 bytes vs. 16,461,114 bytes), but I can't figure out what the differences are between the two, as I've unzipped them and performed a directory comparison using the comparison tool Meld and it doesn't report any file differences.

So, I'm out of ideas as to why a new Lambda Layer is always generated and how to stop this from happening.

Any other ideas out there? Thanks.

@bjorg

bjorg commented Oct 5, 2021

@ryancabanas The file dates are probably different. Different values also mean a different compression level. I had to solve this problem for LambdaSharp.Net as well. You have to MD5 only the file paths and file contents in the ZIP file to make it an idempotent process.

@ryancabanas

@bjorg Thanks for helping! I am using the suggestion above from @rsodha and resetting the modified date for all the files, so they are consistent in that respect from build to build.

Any suggestions on how to go about determining what else could be different between the files from build to build? Thanks!

@bjorg

bjorg commented Oct 6, 2021

@ryancabanas I'm not sure, but isn't there both a modified and a created timestamp on files? Could that be it? Do folders have timestamps? Does the zip file itself have an internal timestamp?

I'd recommend you write a little app that opens both zips and compares the metadata of all entries. If the files are the same, it must be the metadata. Most zip libraries are pretty easy to use; it's almost identical to comparing two folders. This might be frustrating, but so is guessing blindly.

Sorry I couldn't be of more assistance.

@ryancabanas

@bjorg Okay. I'll dig further in the ways you've mentioned. Thanks!

@kyptov

kyptov commented Oct 6, 2021

@ryancabanas Did you try aws-cdk? It looks like it generates the same hash for the same contents each time.

@rehanvdm

rehanvdm commented Oct 6, 2021

CDK fanboy here. They don't have this problem; cdk-assets does things like normalizing file dates and line endings before zipping.

But @ryancabanas, what you are describing, the CODEBUILD_SRC_DIR being different, has an impact on package.json. TL;DR: it is the wild, wild west inside the node_modules directory; it mutates after installation and is the cause of non-deterministic hashing.

Some packages embed the absolute path in their package.json after installation, and because CODEBUILD_SRC_DIR is different, that forces the package.json to be different. I wrote about it here: https://www.rehanvdm.com/blog/cdk-shorts-1-consistent-asset-hashing-nodejs It is not actually a CDK or CFN problem but rather an NPM one.

The solution is either to remove the package.json from every node_modules package so that they are excluded when the hash is calculated, or, better, to use bundling: a tool like esbuild tree-shakes and bundles all your code into a single .js file. That is then the only file in the zip, so there is no package.json anywhere.
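(To illustrate the first option, a hedged sketch, reading "remove" as "exclude those files from the hash" rather than deleting them from disk; the helper name is hypothetical.)

import hashlib
import os

def fingerprint_without_npm_manifests(source_root):
    # Same sorted path+contents digest as in earlier sketches, but
    # package.json files under node_modules are skipped because npm may
    # rewrite them with absolute install paths (e.g. CODEBUILD_SRC_DIR).
    digest = hashlib.md5()
    for root, dirs, files in os.walk(source_root):
        dirs.sort()
        inside_node_modules = "node_modules" in root.split(os.sep)
        for name in sorted(files):
            if inside_node_modules and name == "package.json":
                continue
            full = os.path.join(root, name)
            digest.update(os.path.relpath(full, source_root).encode("utf-8"))
            with open(full, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()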

@ryancabanas

@rehanvdm Thanks for your article! Yes, what you said about the package.json metadata, namely the CODEBUILD_SRC_DIR path, is exactly what I discovered. I performed a test where, in CodeBuild, before anything else, I changed the src... folder name to a consistent name (for example, I always change it to src123456789), and this resulted in .zip file contents that were the same from build to build, but a new Lambda Layer is still always uploaded, even when it hasn't changed from build to build. I also used the suggestion above and changed the dates of all the files to a consistent date, but this hasn't solved the problem either.

I'm new to development and AWS, so I haven't used CDK before, or bundling. I will have to look into these. Thanks for the help!

@ryancabanas

Got it!

So I used the folder-hash package that @rehanvdm mentioned in his article, and this helped reveal differences between my Lambda Layer assets. I had already taken care of the CODEBUILD_SRC_DIR issue in the package.json files for Node, but I'm also using a couple of Python packages, and it seems the .pyc files in the __pycache__ folders differ from build to build. So after installing the packages, I delete these .pyc files, and now no more unnecessary Lambda Layers are being created and uploaded! Thanks for the help!
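(For reference, a minimal sketch of that cleanup step, run after installing the Python packages and before hashing/packaging; the path and function name are illustrative.)

import pathlib
import shutil

def strip_python_bytecode(layer_root):
    # .pyc files and __pycache__ directories differ between builds,
    # so drop them before the layer is hashed and zipped.
    root = pathlib.Path(layer_root)
    for pyc in root.rglob("*.pyc"):
        pyc.unlink()
    for cache in root.rglob("__pycache__"):
        shutil.rmtree(cache, ignore_errors=True)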

@KyleThen

For me, the issue was that I was creating the bundled zip using Linux's zip command. I needed to use the -X option so it didn't add all the extra attributes to the created zip. I also deleted the .pyc files and set the last modified date of all the files to be the same, so I'm not positive which combination of these is needed.

@ConnorKirk
Contributor

I've also encountered this issue when using CodeBuild to package Lambda functions and layers in a CloudFormation template.

As a workaround, the sam cli does not seem to have this behaviour (anymore?), and it is included in the aws/codebuild/standard:6.0 CodeBuild image. I was able to swap aws cloudformation package for sam package in my CodeBuild buildspec to work around this issue.

tim-finnigan added the p3 label (This is a minor priority issue) on Nov 14, 2022
@jtheuer

jtheuer commented Jul 21, 2023

I still have the same problem with aws cloudformation package for Lambda functions that point to a local .py file. Even setting the mtime of my source files to a fixed date didn't help: touch -a -m -t"201001010000.00".

The generated zip file always has a different checksum.

What I would like is: when running cloudformation package and cloudformation deploy on the same source files, cloudformation must not re-deploy unchanged resources.

Are you able to implement that?
