a >hv@s ddlZddlZddlZddlZddlZddlZddlmZddlmZddl m Z ddl Z ddl m Z ddlmZddlmZddlmZdd lmZdd lmZdd lmZdd lmZdd lmZmZmZm Z ddl!m"Z"m#Z#ddl$m%Z%ddl&m'Z'm(Z(m)Z)ddZ*GdddeZ+dS)N)ProcessPoolExecutor)datetime)getpwuid) __version__) SoSComponent) SoSIPParser) SoSMacParser)SoSHostnameParser)SoSKeywordParser)SoSUsernameParser) SoSIPv6Parser)SoSReportArchiveSoSReportDirectorySoSCollectorArchiveSoSCollectorDirectory)DataDirArchiveTarballArchive)InsightsArchive)get_human_readable import_moduleImporterHelpercCs ||SN)obfuscate_arc_files)arcflistr8/usr/lib/python3.9/site-packages/sos/cleaner/__init__.pyr)src sNeZdZdZdZdgggdgdddddgd ZdJfd d ZdKd d ZdLddZdMddZ dNddZ e ddZ ddZ ddZe ddZddZddZd d!Zd"d#Zd$d%Zd&d'Zd(d)Zd*d+Zd,d-ZdOd.d/Zd0d1Zd2d3Zd4d5Zd6d7Zd8d9Zd:d;Zdd?Z!d@dAZ"dBdCZ#dDdEZ$dFdGZ%dHdIZ&Z'S)P SoSCleanera- This function is designed to obfuscate potentially sensitive information from an sos report archive in a consistent and reproducible manner. It may either be invoked during the creation of a report by using the --clean option in the report command, or may be used on an already existing archive by way of 'sos clean'. The target of obfuscation are items such as IP addresses, MAC addresses, hostnames, usernames, and also keywords provided by users via the --keywords and/or --keyword-file options. For every collection made in a report the collection is parsed for such items, and when items are found SoS will generate an obfuscated replacement for it, and in all places that item is found replace the text with the obfuscated replacement mapped to it. These mappings are saved locally so that future iterations will maintain the same consistent obfuscation pairing. In the case of IP addresses, support is for IPv4 and IPv6 - effort is made to keep network topology intact so that later analysis is as accurate and easily understandable as possible. If an IP address is encountered that we cannot determine the netmask for, a private IP address from 172.17.0.0/22 range is used instead. For IPv6, note that IPv4-mapped addresses, e.g. ::ffff:10.11.12.13, are NOT supported currently, and will remain unobfuscated. For hostnames, domains are obfuscated as whole units, leaving the TLD in place. For instance, 'example.com' may be obfuscated to 'obfuscateddomain0.com' and 'foo.example.com' may end up being 'obfuscateddomain1.com'. Users will be notified of a 'mapping' file that records all items and the obfuscated counterpart mapped to them for ease of reference later on. This file should be kept private. z6Obfuscate sensitive networking information in a reportautoN /etc/sos/cleaner/default_mappingF) archive_typedomainsdisable_parsersskip_cleaning_filesjobskeywords keyword_filemap_file no_updatekeep_binary_filestarget usernamesc s|st|||d|_n|d|_|d|_|d|_|d|_|d|_d|_t|jdsl|jj |j_ d |j_ t d |_t d |_tjtj|jd dd |||_td||_|j|_|jjd |_|jjrtj|jjnd}|j||jj g}t!|t"|t#|t$|t%|t&|g|_'|jj(D]v}|j'D]h} | j)*j+dddd} | ,} |*,| krL|-d| |j.d|d|j'/| qLqBt0t1t2t3t4t5t6g|_7d|_8|-d|jdS)NToptionstmpdirsys_tmppolicymanifestFr&rsosZsos_uicleanerexist_ok?z/etc/sos/cleanerparser)maxsplitrzDisabling parser: zDisabling the 'zP' parser. Be aware that this may leave sensitive plain-text data in the archive.z#Cleaner initialized. From cmdline: )9super__init__ from_cmdlineoptsr/r0r1r2hasattrthreadsr&r"loggingZ getLoggersoslogui_logosmakedirspathjoinreview_parser_values load_map_fileZcleaner_mappingumaskin_placeZget_preferred_hash_name hash_name components add_section cleaner_mdr)dirnamer%r rr rr r parsersr$namelowersplitstriplog_infowarningremoverr rrrrr archive_typesnested_archive) selfr8argsZcmdlinerKZ hook_commons cleaner_dirZ parser_args_parserZ_loadedZ_tempZ _loaded_name __class__rrr<fst                 zSoSCleaner.__init__cCsd|rd|ndd|S)Nz[cleaner:r!z] rr[msgcallerrrr _fmt_log_msgszSoSCleaner._fmt_log_msgcCs|j|||dSr)rBdebugrerbrrr log_debugszSoSCleaner.log_debugcCs|j|||dSr)rBinforerbrrrrVszSoSCleaner.log_infocCs|j|||dSr)rBerrorrerbrrr log_errorszSoSCleaner.log_errorcCs|d||jdS)NzSoS Cleaner Detailed Help)Z set_titleadd_text__doc__)clssectionrrr display_helps zSoSCleaner.display_helpc Csi}d}tj|jjr,td|jjdtj|jjs`|jj|kr|d|jjdnt|jjdddz}zt |}WnZt j y|d Yn>ty}z&|d |jjd |WYd }~n d }~00Wd n1s0Y|S) zVerifies that the map file exists and has usable content. If the provided map file does not exist, or it is empty, we will print a warning and continue on with cleaning building a fresh map r zRequested map file z is a directoryzERROR: map file z6 does not exist, will not load any obfuscation matchesrutf-8encodingzOERROR: Unable to parse map file, json is malformed. Will not load any mappings.zERROR: Could not load '': N) rDrFisdirr>r) ExceptionexistsrjopenjsonloadZJSONDecodeError)r[Z_confZ default_mapmferrrrrrIs* <zSoSCleaner.load_map_filec Cs|d}|jdtd|j||jjsz tdWnVtyf|jd|dYn0t y}z|d|WYd}~n d}~00dS) zWhen we are directly running `sos clean`, rather than hooking into SoSCleaner via report or collect, print a disclaimer banner aThis command will attempt to obfuscate information that is generally considered to be potentially sensitive. Such information includes IP addresses, MAC addresses, domain names, and any user-provided keywords. Note that this utility provides a best-effort approach to data obfuscation, but it does not guarantee that such obfuscation provides complete coverage of all such data in the archive, or that any obfuscation is provided to data that does not fit the description above. Users should review any resulting data and/or archives generated or processed by this utility for remaining sensitive content before being passed to a third party. z sos clean (version z) z- Press ENTER to continue, or CTRL-C to quit. z Exiting on user cancelr9N) Z_fmt_msgrCrhrr>ZbatchinputKeyboardInterrupt_exitrv)r[rcerrrprint_disclaimers     zSoSCleaner.print_disclaimercCsd|_|dd}|jdddd|jdd gd d d |jd dgdd|jddgddd|jdddgddd|jdddtdd|jddgdd d|jd!dd"d#d$|jd%d&d'd(d)|jd*d+d,d-d.d/|jd0d,d-d1d2d3|jd4d5gdd6d/dS)7Nzsos clean|mask TARGET [options]zCleaner/Masking Optionsz7These options control how data obfuscation is performedr,ZTARGETz%The directory or archive to obfuscate)metavarhelpz--archive-typer)rreportZcollectZinsightszdata-dirtarballz8Specify what kind of archive the target was generated as)defaultchoicesrz --domainsextendz!List of domain names to obfuscate)actionrrz--disable-parsersr$zCDisable specific parsers, so that those elements are not obfuscated)rrdestrz--skip-cleaning-filesz--skip-masking-filesr%zBList of files to skip/ignore during cleaning. Globs are supported.z-jz--jobsrz&Number of concurrent archives to clean)rtyperz --keywordsr'zList of keywords to obfuscatez--keyword-filer(z&Provide a file a keywords to obfuscate)rrrz --map-filer)r z;Provide a previously generated mapping file for obfuscation)rrrz --no-updater*F store_truezr,)r[rFrrrset_target_path0szSoSCleaner.set_target_pathcCsd}|jjdkrN|jjdd}|jD]$}|j|kr&||jj|j|jj}q&n4|jD],}||jjrT||jj|j|jj}qqT|sdS||_ |j ||j r|j ||j |||_|jr|jj|j_dS)zThe target path is not a directory, so inspect it for being an archive or an archive of archives. In the event the target path is not an archive, abort. Nr-_)r>r"replacerY type_namer,r/r+Z check_is_type main_archive report_pathsappend is_nestedrZget_nested_archivesrXrZ descriptionui_name)r[Z_arcZ check_typearchiverrrrinspect_target_archive7s0        z!SoSCleaner.inspect_target_archivecCsJ|jjD]&}t|ddkrtd|dqdd|jjD|j_dS)zCheck any values passed to the parsers via the commandline: - For the --domains option, ensure that they are valid for the parser in question. - Convert --skip-cleaning-files from globs to regular expressions. .zInvalid value 'z0' given: --domains values must be actual domainscSsg|]}t|qSr)fnmatch translate).0prrr cz3SoSCleaner.review_parser_values..N)r>r#lenrTrvr%)r[Z_domrrrrHWs  zSoSCleaner.review_parser_valuesc Cs|jjd|j_|jjdddd|_|jr>|g|_tj |jjst|j d|jj| d||js|j d| dg|_|jD]}|jdkr|jq||||js|jrd S|j d | d|j d t|jd |}||}||||jrZd d|jD}||fSd }t|jdkrx|}n|jd}|j}| |j}|d ur|!|ddd|j"} t#tj $|j%| ddd} | &|Wd n1s0Y|'tj $|j%|!|dd}t()||t*|} |j d||j d|d|j dt+| j,|j dt-| j.j/d|j d|0d S)a,SoSCleaner will begin by inspecting the TARGET option to determine if it is a directory, archive, or archive of archives. In the case of a directory, the default behavior will be to edit the data in place. For an archive will we unpack the archive, iterate over the contents, and then repack the archive. In the case of an archive of archives, such as one from SoSCollector, each archive will be unpacked, cleaned, and repacked and the final top-level archive will then be repacked as well. /z.tarrz*Invalid target: no such file or directory r9z'No valid archives or directories found zHostname ParserNz#No reports obfuscated, aborting... z Successfully obfuscated z report(s) cSsg|] }|jqSr)final_archive_path)rarrrrrz&SoSCleaner.execute..rwrqrrz2A mapping of obfuscated elements is available at z) The obfuscated archive is available at  z Size z Owner zcPlease send the obfuscated archive to your support representative and keep the mapping file private)1r>r,rstriprTarc_namer=rrrDrFrwrCrirrcompleted_reportsrQrRmappingZset_initial_countspreload_all_archives_into_mapsgenerate_parser_item_regexesobfuscate_report_pathsrKrhrcompile_mapping_dictwrite_map_for_archivewrite_map_for_configwrite_stats_to_manifestrebuild_nested_archiverget_new_checksumobfuscate_stringrLrxrGr0writewrite_cleaner_logshutilmovestatrst_sizerst_uidpw_nameZcleanup) r[r8_mapmap_pathZ arc_pathsZ final_pathZarc_pathrchecksumZ chksum_namecfZarcstatrrrexecutefs                *    zSoSCleaner.executec Cs|jd}|j|d|jD]J}|jdd}||j}|durd|d|j}|jj||dqt |j j D]X\}}}|D]H} t j || } | |j j d}|d}|jj| |dt | qqv|jd d |j|jjS) zHandles repacking the nested tarball, now containing only obfuscated copies of the reports, log files, manifest, etc... z -obfuscated)rRrrNz checksums/rrTr)rZ setup_archiverrrTrrLrZ add_stringrDwalkrZextracted_pathrFrGlstripadd_filerXrfinalizer>Zcompression_type) r[rrZarc_destrZdnameZdirnrfilesfilenamefnamerrrrs"      z!SoSCleaner.rebuild_nested_archivecCs2i}|jD]"}i||j<||j|q |S)aBuild a dict that contains each parser's map as a key, with the contents as that key's value. This will then be written to disk in the same directory as the obfuscated report so that sysadmins have a way to 'decode' the obfuscation locally )rQZ map_file_keyupdateZget_map_contents)r[rr8rrrrs   zSoSCleaner.compile_mapping_dictcCsFt|ddd$}|tj|ddWdn1s80Y|S)zjWrite the mapping to a file on disk that is in the same location as the final archive(s). rrqrrr)indentN)rxrrydumps)r[rrFr{rrrwrite_map_to_files2zSoSCleaner.write_map_to_filec Cshz,tj|j||jd}|||WStyb}z|d|WYd}~dSd}~00dS)Nz -private_mapz"Could not write private map file: ) rDrFrGr0rrrrvrj)r[rrr|rrrrsz SoSCleaner.write_map_for_archivec Cs|jjr|jjstj|jj}z6tj|dd|||jj|d|jjWn4t y}z| d|WYd}~n d}~00dS)z}Write the mapping to the config file so that subsequent runs are able to provide the same consistent mapping Tr5zWrote mapping to z&Could not update mapping config file: N) r>r)r*rDrFrPrErrgrvrj)r[rr]r|rrrrszSoSCleaner.write_map_for_configcCstj|j|jd}t|ddd6}|jd|jD]}| |q>Wdn1sb0Y|r| ||j j |dddS) zWhen invoked via the command line, the logging from SoSCleaner will not be added to the archive(s) it processes, so we need to write it separately to disk z-obfuscation.logrrqrrrNzsos_logs/cleaner.logr) rDrFrGr0rrxZ sos_log_fileseek readlinesrobfuscate_filerr)r[rZlog_nameZlogfilelinerrrrs * zSoSCleaner.write_cleaner_logc Cszhd}t|dF}t|j}||}|s.q:||q|dWdWS1s\0YWn4ty}z|d|WYd}~n d}~00dS)zvCalculate a new checksum for the obfuscated archive, as the previous checksum will no longer be valid irbrNz!Could not generate new checksum: ) rxhashlibnewrLreadrZ hexdigestrvrg)r[ archive_pathZ hash_sizeZ archive_fpZdigestZhashdatar|rrrrs    0&zSoSCleaner.get_new_checksumcCszdt|jd|jjd}|j||jjr>|jd|jD]"}|jd|j| |qD|j r| | |j Wn(t y|jdt dYn0dS) zPerform the obfuscation for each archive or sos directory discovered during setup. Each archive is handled in a separate thread, up to self.opts.jobs will be obfuscated concurrently. zFound z. total reports to obfuscate, processing up to z! concurrently within one archive zpWARNING: binary files that potentially contain sensitive information will NOT be removed from the final archive z Obfuscating zExiting on user cancelr}N)rrr>r&rCrhr+rWrobfuscate_reportrZ_replace_obfuscated_archivesrrDr)r[rcZ report_pathrrrr1s&     z!SoSCleaner.obfuscate_report_pathscCsV|jD]J}t|j|jj}|jdd}tj ||}t |j|||_qdS)zWhen we have a nested archive, we need to rebuild the original archive, which entails replacing the existing archives with their obfuscated counterparts rrN) rrDrXrrZrrrTrFrGrr)r[rrrZ dest_namerrrrOs  z'SoSCleaner._replace_obfuscated_archivescCs|jD] }|qdS)zFor the parsers that use prebuilt lists of items, generate those regexes now since all the parsers should be preloaded by the archive(s) as well as being handed cmdline options and mapping file configuration. N)rQZgenerate_item_regexes)r[r8rrrr\s z'SoSCleaner.generate_parser_item_regexesc Cs.|jD]}|jd}|||D]}||}|sBq.|d|d|d|j| D]T}z| |Wqht y}z(|d|d|d|WYd}~qhd}~00qhq.| ||} | r|d|d |j| D]} |j | q|j|D]} |j | qq||jdS) a* For each archive we've determined we need to operate on, pass it to each prepper so that we can extract necessary files and/or items for direct regex replacement. Preppers define these methods per parser, so it is possible that a single prepper will read the same file for different parsers/mappings. This is preferable to the alternative of building up monolithic lists of file paths, as we'd still need to manipulate these on a per-archive basis. :param archive: The archive we are currently using to prepare our mappings with :type archive: ``SoSObfuscationArchive`` subclass :param prepper: The individual prepper we're using to source items :type prepper: ``SoSPrepper`` subclass rz Prepping z parser with file z from zFailed to prep z map from : Nz mapping with items from )rQrRrSrTrUZget_parser_file_listZget_file_contentrgr splitlinesZ parse_linervZget_items_for_mapraddZ regex_itemsZadd_regex_item set_parsers) r[rprepperr^ZpnameZ_fileZcontentrr|Z map_itemsitemZritemrrr_prepare_archive_with_prepperds4     z(SoSCleaner._prepare_archive_with_prepperccsZttjj}g}|D]}|td|qt|dddD]}||jdVqBdS)a Discover all locally available preppers so that we can prepare the mappings with obfuscation matches in a controlled manner :returns: All preppers that can be leveraged locally :rtype: A generator of `SoSPrepper` items zsos.cleaner.preppers.cSs|jSr)priority)xrrrrz)SoSCleaner.get_preppers..)key)r.N) rr3r4ZpreppersZ get_modulesrrsortedr>)r[helperZprepsZ_preprrrr get_prepperss   zSoSCleaner.get_prepperscCsB|d|D]}|jD]}|||qq|j|jdS)aBefore doing the actual obfuscation, if we have multiple archives to obfuscate then we need to preload each of them into the mappings to ensure that node1 is obfuscated in node2 as well as node2 being obfuscated in node1's archive. z.Pre-loading all archives into obfuscation mapsN)rVrrrrrrQ)r[rrrrrrs    z)SoSCleaner.preload_all_archives_into_mapsc szjj}t}|d|js4dt d}}}fddt j j D}tj j jd^}|t|fddt j j D} | D]"\} } } || 7}|| 7}|| 7}qWdn1s0YzWn<tyB} z"jd | jd WYd} ~ n d} ~ 00zWn<ty} z"jd | jd WYd} ~ n d} ~ 00js0}|r$d z j|WnRty"} z8d jd| d| WYd} ~ WdSd} ~ 00jt}|d||d|||d||d|d}|rd}||}d|Wn@ty} z&jdjd| WYd} ~ n d} ~ 00dS)zIndividually handle each archive or directory we've discovered by running through each file therein. Positional arguments: :param archive str: Filepath to the directory or archive start_timezBeginning obfuscation...rcsg|]}qSrrrirrrrrz/SoSCleaner.obfuscate_report..) max_workersZ initializercsg|]}|djjqSr)r>r&r) file_listr[rrrrNz!Failed to obfuscate directories: rdzFailed to obfuscate symlinks: zRe-compressing...zArchive z failed to compress: zFailed to re-compress archive: end_timeZrun_timeZfiles_obfuscatedZtotal_substitutionsr!z! [removed %s unprocessable files]zObfuscation completedzException while processing r) rOrN archive_namerZnow add_fieldZ is_extractedextractZ report_msglistZ get_filesranger>r&rZload_parser_entriesmaprobfuscate_directory_namesrvrVobfuscate_symlinksrZget_compressionZrename_top_dirrcompressrgrrrCrh)r[rZarc_mdrZfiles_obfuscated_countZtotal_sub_countZremoved_file_countZ archive_listexecutorZfuturesZfocZtscZrfcr|methodrZrmsgr)rrr[rrs       (        zSoSCleaner.obfuscate_reportcCs|j|gdSr)rr)r[rrrrrszSoSCleaner.obfuscate_filec s|jd|jd|D]}z||jddfdd|jD}|sb|ddWq|jd |jdt |}tj |j| }| |}||ks||krt |t||Wqty}z"|d |d |WYd }~qd }~00qd S) aIterate over symlinks in the archive and obfuscate their names. The content of the link target will have already been cleaned, and this second pass over just the names of the links is to ensure we avoid a possible race condition dependent on the order in which the link or the target get obfuscated. :param archive: The archive being obfuscated :type archive: ``SoSObfuscationArchive`` zObfuscating symlink namesrr9rcs(g|] }tfdd|jDs|qS)c3s|]}|VqdSr)match)rZ_skipZ_symrr *rz;SoSCleaner.obfuscate_symlinks...)anyZ skip_patterns)rZ_prrrr(sz1SoSCleaner.obfuscate_symlinks..z Skipping obfuscation of symlink z due to skip pattern matchzObfuscating symlink zError obfuscating symlink 'rtN)rVrZ get_symlinksrTrrrQrgrDreadlinkrFrGrrXsymlinkrv)r[rrZ_parsers_targetZ _ob_sym_nameZ _ob_targetr|rrrrs2         zSoSCleaner.obfuscate_symlinkscCs|d|jt|ddD]~}t|D]n}tj||}||j d}tj |r0| |}||kr0| |}tj|j | d|}t||q0q"dS)zFor all directories that exist within the archive, obfuscate the directory name if it contains sensitive strings found during execution z'Obfuscating directory names in archive T)reverserrN)rVrrZget_directory_listrDlistdirrFrGrTrrurrrrename)r[rdirpath_nameZ_dirnameZ_arc_dirZ _ob_dirnameZ _ob_arc_dirrrrrEs"   z$SoSCleaner.obfuscate_directory_namesc CsT|jD]H}z||}WqtyL}z|d|WYd}~qd}~00q|S)NzError obfuscating string data: )rQZparse_string_for_keysrvrV)r[Z string_datar8r|rrrr\s  (zSoSCleaner.obfuscate_stringcCsL|jd}|jD]4}||jdd}|dt|jj qdS)zLWrite some cleaner-level, non-report-specific stats to the manifest rQ rentriesN) rOrNrQrRrrSrrrZdatasetkeys)r[Z parse_secr8Z_secrrrrds  z"SoSCleaner.write_stats_to_manifest)NNNFN)N)N)N)N)F)(__name__ __module__ __qualname__rlZdescZ arg_defaultsr<rergrVrj classmethodrorIrrrrrHrrrrrrrrrrrrrrrrrrrr __classcell__rrr_rr-sf'N      1 a    , i.r),rryrArDrrconcurrent.futuresrrpwdrZsos.cleaner.preppersr3rZ sos.componentrZsos.cleaner.parsers.ip_parserrZsos.cleaner.parsers.mac_parserrZ#sos.cleaner.parsers.hostname_parserr Z"sos.cleaner.parsers.keyword_parserr Z#sos.cleaner.parsers.username_parserr Zsos.cleaner.parsers.ipv6_parserr Zsos.cleaner.archives.sosr rrrZsos.cleaner.archives.genericrrZsos.cleaner.archives.insightsrZ sos.utilitiesrrrrrrrrr s.