Advice on using many different methods, some lawful, some not.
Category Archives: oa.search
Measuring Metadata Impacts: Books Discoverability in Google Scholar – The Scholarly Kitchen
“The scholarly publishing community talks a LOT about metadata and the need for high-quality, interoperable, and machine-readable descriptors of the content we disseminate. However, as we’ve reflected on previously in the Kitchen, despite well-established information standards (e.g., persistent identifiers), our industry lacks a shared framework to measure the value and impact of the metadata we produce.
In 2021, we embarked on a Crossref-sponsored study designed to measure how metadata impacts end-user experiences and contributes to the successful discovery of academic and research literature via the mainstream web. Specifically, we set out to learn if scholarly books with DOIs (and associated metadata) were more easily found in Google Scholar than those without DOIs.
Initial results indicated that DOIs have an indirect influence on the discoverability of scholarly books in Google Scholar — however, we found no direct linkage between book DOIs and the quality of Google Scholar indexing or users’ ability to access the full text via search-result links. Although Google Scholar claims to not use DOI metadata in its search index, the results of our mixed-methods study of 100+ books (from 20 publishers) demonstrate that books with DOIs are generally more discoverable than those without DOIs….”
Preprints als Informationsquelle besser nutzbar machen (Making preprints more usable as a source of information) – TH Köln
From Google’s English: “In the project PIXLS – Preprint Information eXtraction for Life Sciences, TH Köln and ZB MED will develop an application over the next three years that automatically opens up the preprint server. This enables the research community to make better use of current information that was published on preprint servers – and therefore hardly appears in classic detection and search systems.
The German Research Foundation (DFG) is funding the project as part of the e-Research Technologies framework programme.”
DFG – Deutsche Forschungsgemeinschaft – Deutsche Forschungsgemeinschaft schafft Grundlagen für die Veröffentlichung von Abschlussberichten (German Research Foundation lays the groundwork for publishing final reports)
From Google’s English: “Recipients of grants from the German Research Foundation (DFG) are obliged to report on their work and the results obtained after completing their project. The reports serve to account for the use of public funds and provide information about the success of the funding and for the further development of funding programs….
In order to broaden the scientific information base and to contribute to the necessary culture change in scientific publishing, the DFG Executive Board has decided to make final reports of DFG projects easier to access and to make the scientific results section from project reports publicly accessible….
In future, grant recipients will be asked to make part of the final report intended for publication accessible in suitable repositories. The publication is supported by corresponding templates, which specify a structuring into a part intended for publication and a non-public part. In addition, the DFG provides a non-binding white list that identifies at least one possible place of publication for each scientific area according to tested quality standards….
For most applications approved after January 1, 2023, the templates provided are mandatory when preparing the final report. Projects that were approved at an earlier point in time can also use the templates. From summer 2023 it will be possible to send the link to the repository to the DFG via the elan application portal and link the reports in GEPRIS….”
The State of Journal Production and Access 2022: Report on survey of independent academic publishers
“Among the main findings on the topic of journal production were:
• Compared to 2020, there was apparent growth in journals producing HTML articles.
• Full-text XML article production remained flat since 2020 (38% in 2020 and 2022).
• 50%+ respondents included ORCIDs and DOIs in metadata, but other PIDs like author/contributor roles, funder IDs, and organizational IDs had lower adoption rates. That said, some PIDs increased across the two surveys, including Funder ID (20% in 2022 versus 16% in 2020) and CRediT (22% in 2022 versus 16% in 2020).
• Most respondents said PDF and HTML are the most important article formats for their readers, as well as reaching publishing program goals.
• When asked to rate their publishers’ primary production goals, most respondents chose “journal/article search engine optimization” (86% reported that this was “very” or “somewhat” important).
Among the main findings on the topic of journal access were:
• 95% of respondents said at least one of their publisher’s journals offered OA options.
• 80% of respondents said their organization utilizes fully-OA publishing models.
• When asked to rate their publishers’ primary funding/revenue priorities, most respondents chose “identifying viable funding model(s) for publishing one or more fully-OA journals” (68% reported it’s “very” or “somewhat” important).
• Institutional subsidies and grants were seen as having the highest OA funding potential…”
Metadata on open access books in BASE – ScienceOpen
Abstract: BASE (Bielefeld Academic Search Engine) is one of the world’s largest search engines for academic documents on the web, with references to around 310 million documents. Cross-system metadata communication and standardization are fundamental prerequisites for the development of BASE and comparable information systems. The presentation focuses on the metadata of open access books and book chapters in BASE, whose share of the total index has increased significantly in recent years, and illustrates the challenge of further dissemination of open access book metadata.
Google’s Got A Secret – Knuckleheads’ Club
“Bandwidth costs money, so there’s a limit to how much and how often website operators will let their websites be crawled. This limit means that website operators are picky about who they let crawl their websites. Only a select few crawlers are allowed access to the entire web, and Google is given extra special privileges on top of that. This isn’t illegal and it isn’t Google’s fault, but this monopoly on web crawling that has naturally emerged prevents any other company from being able to effectively compete with Google in the search engine market.
There Should Be A Public Cache Of The Web
All of Google’s competitors in the search engine market have failed in their own way but most of them have complained bitterly about how Google has such an advantage when it comes to web crawling. We think that there is clearly a failure in this market and government intervention is required to break Google’s hold on the natural monopoly of crawling the web….”
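One common way website operators express which crawlers they admit is a robots.txt file; the excerpt above does not name a specific mechanism, so treat this as an illustrative assumption. The sketch below uses Python's standard `urllib.robotparser` to show how a policy that welcomes one crawler and turns away all others would be evaluated. The robots.txt content, the `ExampleBot` name, and the `example.org` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch everything,
# every other crawler is disallowed from the whole site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_crawlers(robots_txt, user_agents, url):
    """Return a dict mapping each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


if __name__ == "__main__":
    result = allowed_crawlers(
        EXAMPLE_ROBOTS_TXT,
        ["Googlebot", "ExampleBot"],
        "https://example.org/articles/1",
    )
    for agent, allowed in result.items():
        print(agent, "allowed" if allowed else "blocked")
```

Under a policy like this, a new search engine's crawler is blocked by default, which is the kind of asymmetry the excerpt describes.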
Anna’s Archive
“Search engine of shadow libraries: books, papers, comics, magazines. Z-Library, Library Genesis, Sci-Hub. Fully resilient through open source code and data. Spread the word: everyone is welcome here!…”
We Need an Open-Source Approach to Weed Out Bad Quality Patents
“Open-source development can solve these problems. Everyone has an interest in ensuring that only valid patents get issued. Patent owners cannot continue to spend millions of dollars defending their patents. And accused infringers cannot continue to face the decision to modify their products, spend millions in litigation or risk billions in patent damages.
Solving this problem in a fair, fast, and thorough way will require a cooperative effort. This immediately brings to mind an open-source approach. Open-source software (OSS) is a significant driver of freedom, trust and innovation in the digital age….”
Google Scholar – Platforming the Scholarly Economy | Internet Policy Review
Abstract: Google Scholar has become an important player in the scholarly economy. Whereas typical academic publishers sell bibliometrics, analytics and ranking products, Alphabet, through Google Scholar, provides “free” tools for academic search and scholarly evaluation that have made it central to academic practice. Leveraging political imperatives for open access publishing, Google Scholar has managed to intermediate data flows between researchers, research managers and repositories, and built its system of citation counting into a unit of value that coordinates the scholarly economy. At the same time, Google Scholar’s user-friendly but opaque tools undermine certain academic norms, especially around academic autonomy and the academy’s capacity to understand how it evaluates itself.
Discover, Create, and Publish your research paper | SciSpace by Typeset
“Our struggle with Word and LaTeX in formatting journal submissions and academic assignments led us to build SciSpace. We realised that no one had designed a platform that was dedicated to meet the needs of people like you, who generate billions of pieces of academic work each year. We found that Word and Google Docs are unstructured and need constant re-editing and re-formatting, while LaTeX is too hard for most researchers. SciSpace intends to be the perfect bridge – ease of intuitive writing and collaboration, with the rigor and power of LaTeX.
We have been working at it since 2014 and have been in beta for over a year. During this period we’ve collected feedback from thousands of you, and we are grateful to our early users. It helped us identify pain points and build industry-leading features on SciSpace. What you see today, is the work of thousands of man-hours that have created self-learning journal and thesis builders, that make sure you have a 100% compliant submission with zero errors.
We are committed to Open Standards as well as keeping our platform open, and you can export every letter you write on SciSpace without any ado, if we fail to live up to your expectations. Till date, we’ve created journal builders for over 14,000 journals and scores of assignment, and thesis templates.
We are adding to our library by the hundreds every week, and every dollar that you spend on SciSpace is invested in building out more features that will help you save time, get accuracy and enjoy the process of writing research.
Go ahead, give our baby a test drive and let us know what you feel on feedback@typeset.io
And yes, if you like our work, please do consider joining the growing SciSpace Community and spread the word.”
GoTriple
“The TRIPLE project was launched in October 2019. The acronym TRIPLE stands for “Transforming Research through Innovative Practices for Linked Interdisciplinary Exploration”. TRIPLE consists of a consortium of 21 partners from 13 European countries and is coordinated from France by the Research Infrastructure Huma-Num, a unit of the French National Centre for Scientific Research (CNRS). Project and scientific coordinator is Suzanne Dumouchel, who is also Co-coordinator of OPERAS and Member of the Board of Directors of the EOSC Association. She is supported by the TRIPLE team with currently around 90 staff members working in one or more of the 8 work packages.
At the heart of the project is the development of the GoTriple platform, an innovative multilingual and multicultural discovery solution. It will be one of the dedicated services of OPERAS, the Research Infrastructure supporting open scholarly communication in the social sciences and humanities (SSH) in the European Research Area….”
Similarity search on millions of books, in-browser / Benjamin Schmidt / Observable
“Keyword search remains dominant for books, but at some point, whether they know it or not, everyone will probably be searching vectorized representations. This notebook tries out some methods for textual similarity search across a large corpus of books based on vectorized representations.
Back in 2018, it took me a lot of effort to set up an approximate nearest-neighbors search on a server. Now in 2022, new technologies and new tricks that make it possible to search across 2 million+ books in dozens of languages without even having a server. In this demo notebook, I load exactly 2 million books; it would be quite easy to scale up significantly higher, although it might take a minute to download representations of ten to twenty million books….”
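To make "searching vectorized representations" concrete, here is a minimal sketch of similarity search by exact cosine similarity over a handful of toy vectors. It is not the notebook's method, which the excerpt describes only in general terms and which would need approximate techniques at the scale of millions of books. The book labels, the three-dimensional vectors, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_id, vectors, k=2):
    """Return the ids of the k books closest to query_id (brute force)."""
    query = vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), book_id)
        for book_id, vec in vectors.items()
        if book_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [book_id for _, book_id in scored[:k]]


# Hypothetical toy "embeddings"; real book vectors have far more dimensions.
TOY_VECTORS = {
    "whaling_novel": [0.9, 0.1, 0.0],
    "sea_voyage_memoir": [0.8, 0.2, 0.1],
    "organic_chemistry": [0.0, 0.1, 0.9],
    "biochemistry_primer": [0.1, 0.0, 0.8],
}

if __name__ == "__main__":
    print(most_similar("whaling_novel", TOY_VECTORS, k=1))
```

Brute-force comparison like this scales linearly with the number of books, which is why large collections typically rely on approximate nearest-neighbor indexes instead.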
How to search for images you can (legally) use for free – The Verge
“If you’re looking for an image that you can repurpose for one of your projects and aren’t able to take a photo yourself, there are a ton of free images you can use online without running into any copyright issues — you just have to know where to look.
Here, we’ll go over different places where you can search for free images on the web. It’s worth noting that when searching for free images, you’ll often come across the Creative Commons (CC) license that lets you use an image for free. But depending on the type of CC license an image has, there may be some limitations that require you to credit the original artist or prevent you from making modifications to the image….”